- Text analytics and semantic scorecards
- High Volume Processing with Hadoop – including algorithms moved to MapReduce
- R Integration
- And drifting away from Big Data, Economic Impact Modeling.
One of the core challenges in a Big Data world is its variety – especially the inclusion of more unstructured data. From an analytic perspective you need to be able to use data that does not have a traditional, repeatable schema. This data might be human generated (tweets) or machine generated (sensors) or machine recorded data of human activity (weblogs). This data is sometimes unstructured but often semi-structured. Initially the opportunity FICO sees is to include this information with other data to improve supervised model building – using it to help more accurately predict something like risk or fraud.
FICO Model Builder 7.4 therefore adds what FICO calls a Semantic Scorecard, which combines structured and unstructured data. For example, a dataset might include notes about loan applications or collectors' notes. The modeler begins by identifying the target variable they want to predict and then runs the text miner tool to see how predictive the unstructured field might be of that target. Model Builder provides Entity Extraction that indexes all the terms found in the unstructured field, and users can identify synonyms, plurals and so on to generate a list of the terms that correlate most strongly with the target variable. The algorithm can identify single words, or the modeler can define the maximum length of the phrases they want to consider.
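To make the term-indexing step concrete, here is a minimal sketch in plain Python. It is not FICO's Entity Extraction: the function names, the tokenizing rule and the way stop words are handled are all assumptions made for illustration. It indexes every word and phrase up to a chosen maximum length and counts how many records each appears in.

```python
import re
from collections import Counter


def extract_terms(text, max_len=2, stop_words=frozenset()):
    """Return the distinct words and phrases (up to max_len words) in one note.

    Hypothetical helper: tokenization is a simple lowercase regex split,
    and any phrase containing a stop word is skipped.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    terms = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(tok in stop_words for tok in gram):
                continue
            terms.add(" ".join(gram))
    return terms


def build_term_index(notes, max_len=2, stop_words=frozenset()):
    """Count, for each term, the number of notes it occurs in."""
    index = Counter()
    for note in notes:
        # extract_terms returns a set, so each note counts a term once.
        index.update(extract_terms(note, max_len, stop_words))
    return index


notes = ["Customer promised payment", "customer disputes the payment"]
index = build_term_index(notes, max_len=2, stop_words={"the"})
```

Synonym and plural handling, which the product exposes to the modeler, would sit on top of an index like this, for example by mapping variants to one canonical term before counting.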
The occurrence, weight of evidence and information value of each term are calculated, and an exemplar viewer shows specific examples to build confidence in the terms. Stop words can be identified and other edits made to the list. All these edits and refinements can be managed and shared between model projects. Finally, the modeler can add the terms they think are most useful to the dataset, creating a true/false field for each that shows whether the term appeared in the text for that record. These new fields can then be included in the standard modeling algorithms, so a modeler can use both structured data and the terms extracted from the unstructured data in familiar ways. FICO’s evidence is that including this kind of text analysis produces a modest but genuine lift.
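Weight of evidence and information value are standard scorecard measures, and a small sketch shows how they apply to a true/false term field. This is an illustration under stated assumptions, not Model Builder's implementation: the function name, the smoothing constant and the sign convention (log of the non-event share over the event share) are choices made here.

```python
import math


def term_woe_iv(flags, targets, smoothing=0.5):
    """Weight of evidence and information value for one true/false term field.

    flags   -- True where the term appears in the record's text
    targets -- 1 where the outcome being predicted occurred, else 0
    smoothing is an assumed additive constant that avoids log(0)
    when a term never appears with one of the outcomes.
    """
    n_event = sum(targets)
    n_non_event = len(targets) - n_event
    woe = {}
    iv = 0.0
    for present in (True, False):
        events = sum(1 for f, t in zip(flags, targets) if f == present and t == 1)
        non_events = sum(1 for f, t in zip(flags, targets) if f == present and t == 0)
        # Share of all events / non-events that fall in this bin.
        p_event = (events + smoothing) / (n_event + 2 * smoothing)
        p_non_event = (non_events + smoothing) / (n_non_event + 2 * smoothing)
        woe[present] = math.log(p_non_event / p_event)
        iv += (p_non_event - p_event) * woe[present]
    return woe, iv


# Toy data: the term appears only in records where the outcome occurred.
woe, iv = term_woe_iv(
    flags=[True, True, False, False],
    targets=[1, 1, 0, 0],
)
```

With this convention a negative weight of evidence for `True` means the term is associated with the outcome occurring, and a larger information value means the field separates the two outcomes more strongly. A term unrelated to the target yields an information value of zero.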
When it comes to predictive analytic modeling the volume and velocity of Big Data are also problems – too much data is arriving too fast. To deal with this many companies are turning to Hadoop and MapReduce to federate massive amounts of data onto a network of commodity hardware and then run algorithms against it on those same commodity nodes – bringing the computing to the data. Using Hadoop and Model Builder together on a large distributed hardware setup offers three types of benefit:
- Together they can be used to process more data in less time, particularly in up-front activities like data cleansing, feature generation and variable library calculations. It is also possible to use user-defined MapReduce tasks.
- As a result you can build suites of higher resolution models for many time horizons, different outcomes, more entities (customers as well as products).
- Finally you can update all these models every day.
To enable this, FICO has parallelized many of its algorithms to run on MapReduce. An example calculation on 1Bn records currently takes 65 minutes. A single-node Hadoop setup pushes that up to 75 minutes, but 16 nodes drive it down to 5 minutes. Similarly, a complex calculation generating 510 features from 200M records takes 700+ minutes on a single-threaded solution. A single Hadoop node takes slightly longer. Two or four threads reduce the time proportionally, but the number of useful threads tops out at the number of cores. The Hadoop approach does not hit that ceiling, however, and so surpasses the performance of the multi-threaded approach.
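The timings quoted for the 1Bn record example can be turned into speedup and efficiency figures with some simple arithmetic. The sketch below uses only the three numbers given above; the helper names are illustrative.

```python
def speedup(baseline_minutes, parallel_minutes):
    """How many times faster the parallel run is than the baseline."""
    return baseline_minutes / parallel_minutes


def parallel_efficiency(baseline_minutes, parallel_minutes, nodes):
    """Speedup per node: 1.0 would be perfect linear scaling."""
    return speedup(baseline_minutes, parallel_minutes) / nodes


current = 65.0       # minutes, existing approach
one_node = 75.0      # minutes, single-node Hadoop
sixteen_nodes = 5.0  # minutes, 16-node Hadoop

vs_current = speedup(current, sixteen_nodes)                       # 13.0
vs_one_node = speedup(one_node, sixteen_nodes)                     # 15.0
efficiency = parallel_efficiency(one_node, sixteen_nodes, 16)      # 0.9375
hadoop_overhead = one_node / current - 1                           # about 0.15
```

In other words, moving to a single Hadoop node costs roughly 15% in overhead, but 16 nodes run the job 13 times faster than the current approach, which is close to linear scaling relative to the single-node Hadoop run.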
The new release is also integrated with the R open source modeling algorithms. This enables a Model Builder user to go beyond the scorecards, neural networks, decision trees and segmentation that make up the core of predictive analytics (especially in banking). With integration to R the modeler can use any of the available algorithms as part of their modeling process. This is designed to let a Model Builder user take advantage of R, rather than to act as a primary working environment for those just using R.
Outside of support for Big Data, FICO has added an Economic Impact modeling approach that injects economic outlooks into models. This allows you, for instance, to correct a risk model to allow for the fact that you are building it with data from a boom. An optional add-on for 7.4 allows you to correlate your portfolio performance to macroeconomic variables to see (and quantify) which macroeconomic indicators seem to have an impact on the performance or accuracy of your models.
FICO continues to be very focused on how to deploy the models developed in Model Builder to ensure you can get value from your analytics. Not only does Model Builder allow you to package up models for deployment, it is also integrated with Model Central (reviewed here) for broader model management and deployment capabilities.
You can get more information on FICO Model Builder here and FICO is one of the vendors in our Decision Management Systems Platform Technology Report.