Model Builder, FICO’s predictive analytics workbench, remains focused on superior scorecard technology with advanced inferencing and reject inference (described in my review of Model Builder 7.1). Integration with the Blaze Advisor BRMS and support for shared variable libraries remain largely unique features. The focus of the newly announced Model Builder 7.2 is segmented ensemble modeling.
Ensemble models are a hot topic right now, and the banking industry has been using a form of ensemble model for a long time without calling it one. Much risk decisioning uses a decision tree with leaf-node models, taking advantage of the unique predictive patterns of each segment while predicting for the population as a whole. This combination of a decision tree and multiple predictive scorecards is an ensemble model. To reflect this, FICO has renamed its Segmentation ART module the Segmented Ensembles Module and updated it in 7.2.
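To make the "tree plus leaf scorecards" idea concrete, here is a minimal sketch. The segments, characteristics, and point values are invented for illustration only; they are not FICO's.

```python
# A segmented ensemble in miniature: a routing tree assigns each applicant
# to a segment, and a segment-specific scorecard produces the score.
# All segment names, characteristics, and points below are hypothetical.

def route_segment(applicant: dict) -> str:
    """Toy decision tree: two splits yield three leaf segments."""
    if applicant["months_on_file"] < 12:
        return "thin_file"
    if applicant["revolving_utilization"] > 0.5:
        return "high_utilization"
    return "established"

# One scorecard per leaf:
# characteristic -> (threshold, points_if_below, points_if_at_or_above)
SCORECARDS = {
    "thin_file":        {"inquiries_6m": (3, 40, 10)},
    "high_utilization": {"inquiries_6m": (2, 35, 5)},
    "established":      {"inquiries_6m": (4, 55, 25)},
}
BASE_POINTS = 600

def score(applicant: dict) -> int:
    """Route to a segment, then apply that segment's scorecard."""
    segment = route_segment(applicant)
    total = BASE_POINTS
    for char, (threshold, below, at_or_above) in SCORECARDS[segment].items():
        total += below if applicant[char] < threshold else at_or_above
    return total
```

The point is that routing and scoring together form one predictive model for the whole population, even though each scorecard only ever sees its own segment.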
The algorithm has been updated and refined to accelerate the search, better estimate the performance of a segmentation, and give modelers greater control. The traditional approach uses CHAID or CART to build a decision tree with "pure" leaf nodes containing a coherent group of targets; a model then has to be built for each node. FICO's algorithm instead generates thousands of candidate trees and trains scorecards for each leaf node in every tree or partial tree generated. The overall performance of each tree and its associated models is then considered to determine the best segmentation schemes. Coarse binning and automated variable selection help ensure robust performance estimates with the small populations typical of nodes in deep trees. Many parameters can be specified, such as depth of search or candidate splitters, and the module remembers "favorite" trees for comparison across subsequent runs. The tree search lets you grow a new tree, start with an existing tree (imported via PMML) and extend it, or define a tree manually. Tree comparisons can include new trees discovered by the algorithm as well as existing trees or scores from existing setups.
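The search loop can be caricatured as: enumerate candidate segmentations, fit a model per segment, and rank each candidate by whole-population performance. This toy version uses a single splitter variable, per-segment means instead of scorecards, and squared error as the performance estimate; all of that is my simplification, not FICO's actual algorithm.

```python
# Sketch of the "many candidate trees, one performance number each" loop.
import statistics

def candidate_splits(xs, n=5):
    """Candidate thresholds on a single splitter variable."""
    lo, hi = min(xs), max(xs)
    return [lo + (hi - lo) * i / (n + 1) for i in range(1, n + 1)]

def evaluate_segmentation(threshold, xs, ys):
    """Fit a per-segment mean model, then score squared error
    over the whole population (lower is better)."""
    segments = {True: [], False: []}
    for x, y in zip(xs, ys):
        segments[x < threshold].append(y)
    means = {k: statistics.mean(v) if v else 0.0 for k, v in segments.items()}
    return sum((y - means[x < threshold]) ** 2 for x, y in zip(xs, ys))

def best_split(xs, ys):
    """Pick the candidate segmentation with the best overall performance."""
    return min(candidate_splits(xs), key=lambda t: evaluate_segmentation(t, xs, ys))
```

The key design idea carried over from the text is that each candidate segmentation is judged by how well the *combined* leaf models predict for everyone, not by the purity of individual nodes.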
Model Builder provides a number of reports on performance as well as structure, variable usage, and so on. Reports can show, for example, which variables are used in which leaf-node model. They also highlight when a leaf-node model is out of sync with the others in some way, to help ensure a cohesive segmented model.
Users can also bring in their own segmentation model. This bypasses the search algorithm but lets modelers manage the models as a single ensemble (rather than as a segmentation tree and a collection of predictive scorecards that must be kept synchronized manually) and deploy the ensemble using Model Builder's deployment capabilities.
Any tree can be selected and promoted for ongoing management. Leaf nodes can be named, and models retrained and engineered to be more precise. The overall model tracks the status of each node model, so as modelers work on leaf-node models the ensemble reflects the changes. The modeling and retraining environment for the leaf models is the standard scorecard editor, except that it incorporates reason code assignments based on the model selection. This reflects the fact that a leaf node's scorecard is not the only source of points and reason codes; the segmentation model also contributes. The whole thing can be deployed as a single unit to Blaze Advisor: Model Builder generates a ruleflow with steps for rules, variable calculation, and model scoring, and pushes it into the shared repository. It can also be exported as PMML, though that requires some stitching together afterward.
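One common convention for reason codes, which I'll assume here purely for illustration, ranks characteristics by points lost relative to the maximum attainable; in a segmented ensemble the segment assignment itself can also supply a reason, which is the interplay the text describes. A sketch:

```python
# Hypothetical reason-code assignment: rank characteristics by points lost
# versus their maximum, and let the segment contribute a reason of its own.
# This is a generic convention, not FICO's documented logic.

def reason_codes(segment, assigned, maximum, top_n=2):
    """assigned/maximum map characteristic -> points earned / max possible."""
    lost = {c: maximum[c] - assigned[c] for c in maximum}
    ranked = sorted(lost, key=lost.get, reverse=True)
    # The segment assignment itself may carry a reason
    # (e.g. limited credit history for a thin-file segment).
    return [f"segment:{segment}"] + [f"low_points:{c}" for c in ranked[:top_n]]
```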
Some additional bits and pieces:
- Model Tracking Reports have been extended and enhanced with population stability and characteristic analysis reports that monitor recent scores. These are helpful when there is a lag between a score's calculation and the business results, as they help you assess whether the population is shifting in some significant way long before the business results arrive.
- Faster creation and modification of predictive variables: modelers can write a script and use automation to create families of similar variables, such as "number of times payment was X days late in last Y months".
- Automated binning: improved runtime performance, a reduced memory footprint, and enhanced defaults that produce more palatable bins, determined from variable metadata.
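The population stability report in the first bullet is conventionally a Population Stability Index computed over score bands. Assuming that standard formula (FICO may compute it differently):

```python
# Standard PSI: compare the fraction of recent scores falling in each
# score band against the fraction in the development sample.
import math

def psi(expected_pct, actual_pct):
    """PSI = sum((actual - expected) * ln(actual / expected)) over score bands."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_pct, actual_pct))

# Commonly cited rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
# > 0.25 significant shift worth investigating.
```

A drifting PSI flags a changing population well before delinquency or other business outcomes are observable, which is exactly the lag problem the bullet describes.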
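The scripted variable creation in the second bullet can be pictured as template-driven generation: one template, many (X, Y) combinations. This sketch assumes payment history arrives as (months_ago, days_late) pairs; the representation and names are mine, not Model Builder's scripting syntax.

```python
# Generate a family of "times payment was >= X days late in last Y months"
# variables from a single template. Data layout is a hypothetical stand-in.
from itertools import product

def make_variable(x_days, y_months):
    def var(history):
        return sum(1 for months_ago, days_late in history
                   if months_ago < y_months and days_late >= x_days)
    var.__name__ = f"times_{x_days}d_late_last_{y_months}m"
    return var

# One pass creates the whole family of nine variables.
VARIABLES = {f.__name__: f
             for f in (make_variable(x, y)
                       for x, y in product([30, 60, 90], [6, 12, 24]))}
```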
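Automated binning of the kind described in the last bullet often starts from equal-frequency cut points that are then refined using variable metadata and target statistics; this sketch shows only that naive first step, not FICO's enhanced defaults.

```python
# Naive equal-frequency binning: choose cut points so each bin holds
# roughly the same number of observations.

def equal_frequency_bins(values, n_bins):
    """Return the interior bin edges for n_bins equal-count bins."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the bin a value falls into (0 .. len(edges))."""
    return sum(value >= e for e in edges)
```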
Don’t forget the Decision Management Technology Map