The folks from SAS gave me a quick update the other day on SAS Enterprise Miner and SAS Model Manager, two of SAS’ data mining/predictive analytics products. I often blog about SAS, as you would expect, but I have not done any product posts. SAS’ Business Analytics Framework is focused on helping organizations find the answers to questions and make decisions. These decisions can, of course, be everything from transactional or embedded operational decisions to more collaborative tactical decisions to transformative, strategic decisions. SAS Analytics range from data visualization to statistics, data mining, forecasting, econometrics, operational research, text analytics and model management and deployment.
SAS Enterprise Miner, now at version 6.1, has been around for a decade or so. It is a data mining workbench used primarily to build classification and predictive models. The product also supports clustering, market basket analysis and web path analysis, among other tasks. It is built around a process view of model development so you can define a repeatable flow to produce a model. It supports groups of modelers collaborating on a project, which is increasingly common, and it supports multi-threaded and grid-enabled modeling for very large, distributed problems.
The tools are organized into the Sample, Explore, Modify, Model and Assess (SEMMA) model development process:
- Sample and Explore includes selecting data and understanding variable relationships. It can use data from the rest of the SAS stack and also handles things like oversampling of rare target events, balancing, and data partitioning to help prevent overfitting. Evaluating the data relies on statistical measures like variable importance rankings, correlations, k-means and cluster analysis, as well as outlier identification. One nice feature is the ability to incorporate profit and cost estimates into the model to maximize revenue or minimize cost explicitly, rather than optimizing against a common statistical measure such as misclassification rate or average squared error. SAS also provides native access engines for IBM DB2, Oracle, Teradata, Microsoft and a host of other databases, as well as SQL access for all the major database platforms.
- Modify and Model supports binning and scaling, replacing or recoding variables, imputing missing values, etc. Some of these tools are interactive – the decision tree tool, for example, can be used interactively to support multiple targets (segment first on loyalty, then on profitability, say) when developing segmentation strategies. Lots of modeling methods are available including neural nets, gradient boosting, memory-based reasoning, logistic regression, least angular regression splines, etc. Modeling routines run against the training, validation and testing data. Some of the SAS regression algorithms can be executed inside the Teradata database engine, with plans to support additional databases in future.
- Assess supports integrated model comparison based on several criteria, such as lift at a given depth of file, misclassification and profit. Model comparison statistics are computed for all of the partitioned data sources to help ensure the model does not overfit. There are lots of profiling tools, as well as support for training and monitoring champion/challenger models in the development environment. The goal is to select the best model, one that will generalize well when applied to new data (i.e. model scoring).
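To make the assessment criteria above a little more concrete, here is a minimal sketch in plain Python of lift at a given depth of file, misclassification rate and a profit-based comparison. This is not SAS code and not how Enterprise Miner computes these measures. The function names, the toy scores and the profit and cost figures are all assumptions of mine, made up for illustration.

```python
# Illustrative only: toy data and hypothetical profit/cost figures.

def lift_at_depth(scores, targets, depth):
    """Response rate in the top `depth` fraction of the file, ranked by
    score, divided by the response rate of the whole file."""
    ranked = sorted(zip(scores, targets), key=lambda p: p[0], reverse=True)
    n_top = max(1, int(round(len(ranked) * depth)))
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    base_rate = sum(targets) / len(targets)
    return top_rate / base_rate


def misclassification_rate(scores, targets, threshold):
    """Share of records where the thresholded prediction is wrong."""
    errors = sum(
        1 for s, t in zip(scores, targets) if (s >= threshold) != (t == 1)
    )
    return errors / len(targets)


def total_profit(scores, targets, threshold, profit_tp, cost_fp):
    """Profit from acting on every record scored at or above the threshold:
    earn `profit_tp` per true responder, lose `cost_fp` per non-responder."""
    total = 0.0
    for s, t in zip(scores, targets):
        if s >= threshold:
            total += profit_tp if t == 1 else -cost_fp
    return total


# Ten scored records; 1 marks an actual responder.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for cutoff in (0.75, 0.5):
    print(
        cutoff,
        misclassification_rate(scores, targets, cutoff),
        total_profit(scores, targets, cutoff, profit_tp=100.0, cost_fp=10.0),
    )
print(lift_at_depth(scores, targets, depth=0.2))
```

With these made-up numbers the higher cutoff makes fewer classification errors, while the lower cutoff earns more, because a missed responder costs far more than a wasted contact. That is the kind of trade-off that optimizing against profit rather than misclassification is meant to surface.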
SAS Enterprise Miner also generates scoring code – not just for the model but also for the transformations, binning, etc. (i.e. the characteristics and calculations for the whole scoring process). Base SAS is the traditional deployment target for batch scoring, for instance, and customers can just call the code from Base SAS to score their records. SAS Scoring Accelerators for Netezza, Teradata and IBM DB2 allow models to be deployed to these systems for scoring purposes, while PMML, C and Java score code generation is also available. Both batch and real-time scoring are supported.
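The point about scoring code covering the whole process is easier to see in a small example. The sketch below is plain Python, not generated SAS score code, and every constant in it (the imputation value, the bin edges, the coefficients) is hypothetical. It only shows the idea that imputation, binning and the model formula travel together in one scoring function, so a raw record can be scored without re-running any preparation steps.

```python
import math

# Hypothetical values that would be frozen when the model is trained.
INCOME_TRAIN_MEDIAN = 52000.0           # used to impute missing income
AGE_BIN_EDGES = [30, 45, 60]            # bins: <30, 30-44, 45-59, 60+
AGE_BIN_WEIGHTS = [-0.4, 0.0, 0.3, 0.6]
INTERCEPT = -1.0
INCOME_COEF = 0.00002


def score(record):
    """Score one raw record: impute, bin, then apply a logistic model."""
    # Imputation: replace a missing income with the training median.
    income = record.get("income")
    if income is None:
        income = INCOME_TRAIN_MEDIAN

    # Binning: count how many bin edges the age has reached or passed.
    age_bin = sum(1 for edge in AGE_BIN_EDGES if record["age"] >= edge)

    # Model: logistic regression on the prepared inputs.
    logit = INTERCEPT + INCOME_COEF * income + AGE_BIN_WEIGHTS[age_bin]
    return 1.0 / (1.0 + math.exp(-logit))


# The same function serves a batch loop or a single real-time call.
batch = [{"age": 50, "income": None}, {"age": 27, "income": 38000.0}]
print([score(r) for r in batch])
```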
SAS Enterprise Miner is a classic and very robust graphical workbench for modelers to develop and deploy predictive models and conduct other data mining tasks. SAS Enterprise Miner can be used with a second workbench, SAS Model Manager.
SAS Model Manager is designed to support organizations so they can get models out into production. To do this you need to validate the models outside the development tool, deploy them and then measure and monitor them to keep them up to date. The product supports a set of functions:
- Register candidate models, compare them and declare a champion
- Validate, freeze and deploy
- Validate scoring, monitor performance and ultimately either retire a model or request a new challenger model
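As a rough illustration of the lifecycle in the list above, here is a small Python sketch of registering candidates, declaring a champion, monitoring it and flagging when a new challenger is needed. It is not the SAS Model Manager API. The class names, the use of validation lift as the comparison measure and the 80% degradation rule are all assumptions I have made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    name: str
    validation_lift: float
    status: str = "candidate"      # candidate -> champion -> retired
    monitored_lift: list = field(default_factory=list)


class ModelRegistry:
    """Toy registry; the degradation rule below is a hypothetical policy."""

    def __init__(self, min_lift_ratio=0.8):
        self.models = {}
        self.champion = None
        self.min_lift_ratio = min_lift_ratio

    def register(self, name, validation_lift):
        """Register a candidate model with its validation lift."""
        self.models[name] = ModelRecord(name, validation_lift)

    def declare_champion(self):
        """Promote the best candidate and retire the previous champion."""
        candidates = [m for m in self.models.values() if m.status == "candidate"]
        best = max(candidates, key=lambda m: m.validation_lift)
        if self.champion is not None:
            self.champion.status = "retired"
        best.status = "champion"
        self.champion = best
        return best.name

    def record_performance(self, lift):
        """Log production lift; return True if a challenger should be requested."""
        self.champion.monitored_lift.append(lift)
        return lift < self.min_lift_ratio * self.champion.validation_lift


registry = ModelRegistry()
registry.register("tree", 2.1)
registry.register("logit", 2.6)
first_champion = registry.declare_champion()
still_fine = registry.record_performance(2.5)
needs_challenger = registry.record_performance(1.9)

registry.register("boost", 2.9)
second_champion = registry.declare_champion()
print(first_champion, still_fine, needs_challenger, second_champion)
```

A real deployment would also record who approved each step and when, which is where the auditability and separation of roles come in.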
Many of these steps can be performed by non-modelers using SAS Model Manager – the product is not designed specifically for modelers, unlike SAS Enterprise Miner. Roles and responsibilities are clearly defined and registered within the tool. Auditability and deployment decision tracking are critical for lots of organizations, and controls, tracking and separation of roles are all supported.
SAS Model Manager offers a central repository, pre-built templates for registering models, model storage and access administration, validation of models before production deployment, reporting to compare and manage performance, and lifecycle management. Customers have hundreds or even thousands of models across many different areas of the business and of many different types. A project-based hierarchy, managed mapping of data sources, multiple versions of champions and challengers, and strong performance reporting and comparison tools complete the product.
As I said, this product is not necessarily aimed at modelers – modelers supply models into the modeling environment but others (IT, the business) may use SAS Model Manager. You do not need a statistician's skills to use it, but you do need some understanding of concepts like lift curves, statistical comparisons, etc.
The combination of the two – SAS Enterprise Miner and SAS Model Manager – is interesting: a graphical, repository-based environment for building and deploying models, combined with a management tool aimed at non-modelers so they can be part of the process of monitoring and using models. I have often said that I don’t see enough engagement of the business or IT in modeling. SAS Model Manager makes this possible by providing a tool to facilitate the deployment process. An organization will still need to implement best practices for its business processes, but SAS Model Manager provides the analytic deployment and monitoring environment they will need to do so.