I got an update on Enterprise Miner™ from the folks at SAS recently. Enterprise Miner is their development tool for data miners and predictive analytics specialists. It is a graphical environment for designing and executing the steps involved in creating a predictive analytic model. Enterprise Miner 7.1 was part of the SAS 9.3 release, which shipped on July 12th, 2011. The key elements in the new release were:

- improved handling of the time dimension (survival analysis and time series data mining);
- insurance pricing models for ratemaking, which is a challenge because so many people make no claims compared with those who make very large claims;
- credit scorecard extensions;
- full-spectrum SAS data mining in Teradata 13.
Managing the time dimension of data is important in data mining because changes over time are some of the most predictive elements of your data. As a result, time series data mining has always been part of how folks do data mining.

SAS has added tools for reducing the dimensionality of data. These enable similarity analysis, such as finding transactions that resemble ones known to be fraudulent. They also allow matching of patterns that span multiple transactions over time. Automated data preparation steps make it easier to include temporal relationships in modeling, for instance by automatically creating average daily balances by time period.

Survival analysis has been added to predict event probabilities at discrete time intervals, along with competing risks. The overall survival function is overlaid with a hazard function that breaks the risk down by discrete time period. For instance, an overall churn survival rate over 24 months might be broken down into discrete intervals. These could show a much higher risk of churn in the first month and the last. This function also lets you calculate “mean residual life” for customers, which is an essential ingredient in customer lifetime value calculations. After all, if you don’t know how long you are likely to keep a customer, you can hardly calculate their value to you.
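The discrete-time survival and mean residual life ideas can be made concrete with a minimal Python sketch of a life-table calculation. This is not how Enterprise Miner implements survival analysis. The function names and the 24-month hazard profile are hypothetical.

The sketch assumes each customer has a duration (the month in which they churned or were last observed) and a churn flag. It then works in three steps:

- It estimates the hazard h_t as churn events in month t divided by customers still at risk.
- It builds the survival curve S(t) as the running product of (1 - h_k).
- It computes mean residual life at month t as the sum of S(k) for k > t divided by S(t), truncated at the last observed month.

```python
from typing import List, Sequence


def life_table_hazards(durations: Sequence[int], churned: Sequence[bool],
                       horizon: int) -> List[float]:
    """Discrete-time hazard for months 1..horizon.

    durations[i] is the month in which customer i churned (if churned[i])
    or was last observed (if censored). h_t = churns in month t divided by
    customers still at risk at the start of month t.
    """
    hazards = []
    for t in range(1, horizon + 1):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, churned) if c and d == t)
        hazards.append(events / at_risk if at_risk else 0.0)
    return hazards


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """S(0) = 1 and S(t) = S(t-1) * (1 - h_t)."""
    surv = [1.0]
    for h in hazards:
        surv.append(surv[-1] * (1.0 - h))
    return surv


def mean_residual_life(surv: Sequence[float], t: int) -> float:
    """Expected further whole months survived given survival to month t,
    truncated at the last month covered by the survival curve."""
    if surv[t] == 0.0:
        return 0.0
    return sum(surv[t + 1:]) / surv[t]


if __name__ == "__main__":
    # Hypothetical 24-month hazard profile: high churn in the first and last months.
    hazards = [0.10] + [0.01] * 22 + [0.15]
    surv = survival_curve(hazards)
    print(f"24-month survival: {surv[24]:.3f}")
    for month in (0, 1, 12):
        print(f"Mean residual life at month {month}: "
              f"{mean_residual_life(surv, month):.1f} months")
```

Because this mean residual life is truncated at the observation horizon, it understates the remaining lifetime of long-lived customers. A lifetime value model built on it would need some assumption about what happens after the last observed month.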
Insurance ratemaking is a big deal for insurance companies. It is very similar to credit scorecarding in two ways: the regulation involved, and the need to bin predictor variables to produce an explicable rating structure. The problem is that most customers generate very few interactions. Customers use credit products all the time, but most insurance customers don’t make a claim in a given year. Managing such a sparse data set is a particular challenge. The tool lets you create models for claim frequency, severity and pure premium.
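Frequency, severity and pure premium fit together simply:

- Frequency is claims per unit of exposure.
- Severity is average loss per claim.
- Pure premium is frequency times severity, which equals losses per unit of exposure.

The sketch below is a hypothetical one-way summary by rating bin in Python, not SAS's modeling approach. The `Policy` fields and bin labels are illustrative assumptions. A bin with no claims gets zero frequency and has no meaningful severity, which is the sparsity problem described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Policy:
    rating_bin: str      # hypothetical binned rating variable, e.g. a driver age band
    exposure: float      # policy-years in force
    claim_count: int     # number of claims in the period (usually 0)
    claim_amount: float  # total incurred losses in the period


def rate_by_bin(policies: List[Policy]) -> Dict[str, Dict[str, float]]:
    """One-way frequency, severity and pure premium per rating bin."""
    exposure = defaultdict(float)
    claims = defaultdict(int)
    losses = defaultdict(float)
    for p in policies:
        exposure[p.rating_bin] += p.exposure
        claims[p.rating_bin] += p.claim_count
        losses[p.rating_bin] += p.claim_amount

    result = {}
    for b in exposure:
        frequency = claims[b] / exposure[b] if exposure[b] else 0.0
        # Severity is undefined for a bin with no claims; this sketch reports 0.0.
        severity = losses[b] / claims[b] if claims[b] else 0.0
        result[b] = {
            "frequency": frequency,
            "severity": severity,
            # Equals losses / exposure whenever the bin has at least one claim.
            "pure_premium": frequency * severity,
        }
    return result


if __name__ == "__main__":
    # Invented portfolio: most policies have no claims, as is typical.
    book = [
        Policy("under_25", 1.0, 0, 0.0),
        Policy("under_25", 1.0, 1, 4000.0),
        Policy("under_25", 0.5, 0, 0.0),
        Policy("25_plus", 1.0, 0, 0.0),
        Policy("25_plus", 1.0, 0, 0.0),
        Policy("25_plus", 1.0, 1, 2000.0),
        Policy("25_plus", 1.0, 0, 0.0),
    ]
    for bin_name, stats in rate_by_bin(book).items():
        print(bin_name, {k: round(v, 2) for k, v in stats.items()})
```

Real ratemaking usually fits regression models rather than relying on raw bin averages like these. A common choice is a Poisson GLM for frequency and a gamma GLM for severity, which lets sparse bins borrow strength from the rest of the data.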
SAS has also upgraded its credit scoring component to handle adverse reason codes, which explain why an applicant is being turned down. The component now also supports constrained optimized variable binning. Binning is a critical step in building risk scorecards. For each variable in the model, an analyst must identify the upper and lower boundaries of ranges whose values will be treated as having the same impact on risk. In other words, any value that falls within a bin’s range is considered equivalent when it comes to calculating risk. This is a time-consuming activity, and regulation heavily constrains the results. SAS has developed some new automated tools (which they are patenting) to make this process quicker and more accurate.
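The patented SAS binning method is not described here, so the following is only a generic Python sketch of constrained binning under stated assumptions. It works in three stages:

1. It starts with one bin per distinct value.
2. It merges adjacent bins using the pool-adjacent-violators approach until the bad rate moves in one direction.
3. It merges any bin smaller than a hypothetical `min_count` threshold into a neighbour.

Finally, it computes each bin's weight of evidence (WoE), a common way to express a bin's impact on risk in a scorecard. All function names and the 0.5 smoothing constant are illustrative choices.

```python
import math
from typing import List, Sequence

# Hypothetical bin representation for this sketch: [lower, upper, count, bad_count].
Bin = List[float]


def bad_rate(b: Bin) -> float:
    return b[3] / b[2]


def _violates(left: Bin, right: Bin, increasing: bool) -> bool:
    """True if this adjacent pair breaks the required direction of the bad rate."""
    if increasing:
        return bad_rate(left) > bad_rate(right)
    return bad_rate(left) < bad_rate(right)


def monotonic_bins(values: Sequence[float], bads: Sequence[int],
                   increasing: bool = True, min_count: int = 1) -> List[Bin]:
    # Start with one bin per distinct value of the predictor.
    groups = {}
    for v, is_bad in zip(values, bads):
        g = groups.setdefault(v, [v, v, 0, 0])
        g[2] += 1
        g[3] += int(is_bad)

    # Pool adjacent violators: merge neighbours until the bad rate is monotonic.
    stack: List[Bin] = []
    for g in sorted(groups.values(), key=lambda grp: grp[0]):
        stack.append(list(g))
        while len(stack) >= 2 and _violates(stack[-2], stack[-1], increasing):
            right = stack.pop()
            left = stack[-1]
            left[1] = right[1]
            left[2] += right[2]
            left[3] += right[3]

    # Minimum-size constraint: merge a small bin into its right neighbour,
    # or into its left neighbour if it is the last bin.
    i = 0
    while i < len(stack) and len(stack) > 1:
        if stack[i][2] >= min_count:
            i += 1
            continue
        j = i + 1 if i + 1 < len(stack) else i - 1
        lo, hi = min(i, j), max(i, j)
        left_bin, right_bin = stack[lo], stack[hi]
        stack[lo] = [left_bin[0], right_bin[1],
                     left_bin[2] + right_bin[2], left_bin[3] + right_bin[3]]
        del stack[hi]
        i = lo
    return stack


def weight_of_evidence(bins: List[Bin]) -> List[float]:
    """WoE = ln(share of goods / share of bads), adding 0.5 per cell to avoid log(0)."""
    total_bad = sum(b[3] for b in bins)
    total_good = sum(b[2] - b[3] for b in bins)
    k = len(bins)
    woe = []
    for b in bins:
        pct_good = (b[2] - b[3] + 0.5) / (total_good + 0.5 * k)
        pct_bad = (b[3] + 0.5) / (total_bad + 0.5 * k)
        woe.append(math.log(pct_good / pct_bad))
    return woe


if __name__ == "__main__":
    # Toy data (invented): higher predictor values should mean lower risk.
    bins = monotonic_bins([1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 0, 0],
                          increasing=False, min_count=2)
    for b, w in zip(bins, weight_of_evidence(bins)):
        print(f"[{b[0]}, {b[1]}] n={b[2]} bad_rate={bad_rate(b):.2f} WoE={w:.2f}")
```

Merging two adjacent bins in a monotonic sequence produces a bad rate between the two originals. For that reason, the minimum-size pass cannot undo the monotonicity enforced by the first pass.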
Finally, the integration with Teradata brings everything to Teradata 13. Scoring was previously supported, but modeling was not done in-database, and customers had asked for logistic regression in particular. Now data preparation, summarization, transformations, variable selection, modeling AND scoring are all done in-database. This also takes advantage of Teradata’s capabilities for managing the time dimension. It means that SAS’ support for Teradata is as good as anyone’s.
Other enhancements to the product include:

- support for Support Vector Machines;
- increased PMML 4.0 support in scoring code, including some support for specifying the transformations a model needs using the new transformation schema;
- improvements to interactive decision tree pruning;
- improvements to the SAS Rapid Predictive Modeler interface.