I got a chance to catch up with 11Ants Analytics recently. 11Ants Analytics is a spin-off from the University of Waikato (the source of the WEKA project) and has commercialized technology for automating the production of predictive models. The technology was inspired by research at Waikato, a leading center for machine learning. 11Ants has taken this machine learning technology and built a commercial product – the 11Ants Model Builder – aimed at making these algorithms available to users who are not expert data miners.
11Ants Analytics sees these non-experts as having limited options when it comes to data mining. They could become data miners themselves, investing lots of time in learning the techniques and technology; partner with someone who already is a data miner; or outsource their data mining tasks. 11Ants Analytics wants them to be able to build the models for themselves while continuing to do their day jobs. It should also be noted that even experts spend a lot of time doing “grunt” work like preparing data sets, and struggle to find the time to compare many modeling techniques or develop ensemble models to see which is most predictive.
The result combines a library of 11 widely used machine learning algorithms (Decision Tree, Gaussian Processes, Logistic Regression, Logit Boost, Model Tree, Naïve Bayes, Nearest Neighbors, PLS, Random Forest, Ridge Regression, Support Vector Machine) with a proprietary HyperLean™ supervisory layer. This layer acts as a kind of autopilot for building models. The Model Builder product is deployed as four buttons in Excel – Split, Analyze, Predict and Manage – and all user interaction with the application happens in Excel. 11Ants Model Builder allows a user both to build predictive models and to determine which variables are predictive. The predictive models can predict a number (a risk score, say), a category (which segment someone is in) or a propensity (how likely something is to be true).
11Ants Model Builder for Excel is the core product (for modeling with up to 1M records). It is paired with a new product (currently in beta) called 11Ants MegaLearn, designed for “Big Data” with hundreds of millions of records. MegaLearn can monitor data for new records and rebuild and retune models incrementally as new data flows in. The predictive models can be deployed into 11Ants Predictor for execution, which allows the constructed models to update a database with results in real time. The product is highly automated, and 11Ants claims it also generates effective predictions (they cite examples from competitions such as the KDD Cup to back this up).
Using it is simple. First, a user brings data into Excel. The tool then guides the user through a several-step setup process. It begins by dividing the data into training and test datasets. The user then picks the target column and the variable columns, confirms which columns hold discrete or continuous variables, and presses a button to start the modeling process.
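To make the split step concrete, here is a minimal Python sketch. It assumes a simple random 70/30 partition with a fixed seed; the post does not say how 11Ants actually divides the data, and `split_rows` and its parameters are hypothetical names used only for illustration.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=42):
    """Randomly partition rows into (train, test).

    Hypothetical helper: the 30% test share and the seed are assumptions,
    not 11Ants defaults.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = [{"age": a, "churned": a % 3 == 0} for a in range(20, 40)]
train, test = split_rows(rows)
print(len(train), len(test))  # 14 6
```

Holding the test rows back until the end is what lets the later evaluation step say something honest about how predictive the chosen model is.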
The engine then builds a large number of models (over 5,000 is typical) using some of the 11 algorithms (not all are relevant for each kind of prediction). The engine also tries various ensembles of these algorithms, different splits, different parameters and so on. It prioritizes big predictive gains early, and as the models are built it keeps a running top 10 that the user can see. In addition, the best model for each algorithm and the history of improvements as the machine learning algorithms run can be viewed. Once done, or even while processing continues in the background, the user can also see the top input influencers. These are calculated by measuring how much less predictive the models become when specific columns are left out; the columns that make the biggest difference are the major influencers.
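The “leave a column out” idea behind the influencers can be sketched as drop-column importance. This is a minimal illustration, not the proprietary 11Ants method: it assumes a k-nearest-neighbors regressor (one of the 11 algorithm families listed above) as a stand-in model and mean absolute error on the test rows as the measure; `knn_predict`, `holdout_error` and `influencers` are hypothetical names.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def holdout_error(train_X, train_y, test_X, test_y, cols):
    """Mean absolute error on the test rows using only the given column indexes."""
    tr = [[row[c] for c in cols] for row in train_X]
    te = [[row[c] for c in cols] for row in test_X]
    preds = [knn_predict(tr, train_y, x) for x in te]
    return sum(abs(p - y) for p, y in zip(preds, test_y)) / len(test_y)


def influencers(train_X, train_y, test_X, test_y, names):
    """Rank columns by how much the test error rises when each one is left out."""
    all_cols = list(range(len(names)))
    baseline = holdout_error(train_X, train_y, test_X, test_y, all_cols)
    scores = {}
    for c in all_cols:
        without = [i for i in all_cols if i != c]
        scores[names[c]] = holdout_error(train_X, train_y, test_X, test_y, without) - baseline
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


X = [[i, (i * 7) % 5] for i in range(30)]  # x0 drives the target, x1 is noise
y = [3 * i for i in range(30)]
ranking = influencers(X[0::2], y[0::2], X[1::2], y[1::2], ["x0", "x1"])
print(ranking[0][0])  # x0
```

A column whose removal barely changes (or even lowers) the error contributes little; a real engine would repeat this across many models rather than a single one.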
The engine continues to tweak and change things for as long as the user allows, but generally the improvements get smaller and take longer to find. This allows a user to cut it off after a reasonable time. Small datasets get through thousands of candidate models in seconds, while large ones (a million rows and 20 input variables, for instance) might run overnight. When the engine is done or the user stops the modeling, the best-performing candidate model is identified and recorded (as are the 9 runners-up).
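A running top 10 with a “stop when gains dry up” rule can be sketched with a bounded heap. This is a hypothetical illustration of the behavior described above, not HyperLean itself: candidates arrive as `(name, error)` pairs, lower error is better, and the `patience` and `min_gain` stopping parameters are assumptions.

```python
import heapq
import itertools


def search(candidates, keep=10, patience=50, min_gain=1e-3):
    """Keep the `keep` lowest-error models; stop once `patience` candidates
    in a row fail to beat the best error by at least `min_gain`."""
    board = []                      # heap of (-error, order, name): worst kept model at board[0]
    order = itertools.count()       # tie-breaker so names are never compared
    best, stale = float("inf"), 0
    for name, error in candidates:
        entry = (-error, next(order), name)
        if len(board) < keep:
            heapq.heappush(board, entry)
        elif error < -board[0][0]:  # better than the worst model on the board
            heapq.heapreplace(board, entry)
        if best - error >= min_gain:
            best, stale = error, 0
        else:
            stale += 1
            if stale >= patience:   # improvements have dried up: cut it off
                break
    # best first; among equal errors, the earliest model first
    return [(name, -neg) for neg, _, name in sorted(board, key=lambda e: (-e[0], e[1]))]


errors = [0.9, 0.5, 0.4, 0.35] + [0.35] * 100  # quick gains, then a plateau
top = search((f"model_{i}", e) for i, e in enumerate(errors))
print(top[0])  # ('model_3', 0.35)
```

The first entry plays the role of the recorded best model and the remaining nine are the runners-up.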
The best model (or one of the runners-up) is then tested against the test data to see how predictive it is. The results are shown in Excel using a scatter plot of predicted versus actual values, predictive error rate graphs and some statistical measures. Because the results are presented in Excel, the user also has all the usual Excel features available for further analysis.
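The post does not say which statistical measures are reported, so the following is only a sketch of common ones for a numeric prediction on the test set: mean absolute error, root mean squared error and R². The function name `regression_report` is hypothetical.

```python
import math


def regression_report(actual, predicted):
    """Common accuracy measures for numeric predictions on held-out data."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)  # spread of the actual values
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    ss_res = sum(e * e for e in errors)              # squared prediction errors
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }


r = regression_report([3, 5, 7, 9], [2.5, 5, 7.5, 9])
print(f"MAE={r['mae']:.3f} RMSE={r['rmse']:.3f} R2={r['r2']:.3f}")
# MAE=0.250 RMSE=0.354 R2=0.975
```

The same actual/predicted pairs are what a predicted-versus-actual scatter plot would show.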
Users can also just type or select some values and apply a model to them to get a prediction – the Excel buttons can apply any of the generated models to selected data. The generated models are also available as files (in an 11Ants proprietary format) for execution outside of Excel using 11Ants Predictor. This can be hooked up to a database or to Excel, where it provides an Excel function that takes inputs and returns the prediction. For now that is the only deployment option, but 11Ants is working on generating SAS code and is considering PMML (though not all of their 11 algorithms are supported yet).
The base product works only from the variables provided; it does not attempt to generate potentially predictive variables derived from the data. The MegaLearn product, however, does look for opportunities like deltas between columns to see if such calculated attributes might be predictive.
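Generating delta columns can be sketched as adding the difference of every pair of numeric columns. This is a hypothetical illustration of the idea, not how MegaLearn works internally; `add_delta_columns` and the example column names are made up.

```python
from itertools import combinations


def add_delta_columns(rows, numeric_cols):
    """Return copies of rows with an extra 'a-b' column for every pair of numeric columns."""
    out = []
    for row in rows:
        new_row = dict(row)  # leave the original row unchanged
        for a, b in combinations(numeric_cols, 2):
            new_row[f"{a}-{b}"] = row[a] - row[b]
        out.append(new_row)
    return out


rows = [{"spend_q1": 120, "spend_q2": 80, "region": "north"}]
print(add_delta_columns(rows, ["spend_q1", "spend_q2"]))
# [{'spend_q1': 120, 'spend_q2': 80, 'region': 'north', 'spend_q1-spend_q2': 40}]
```

The derived columns would then go through the same model search and influence scoring as the originals to see whether any of them turn out to be predictive.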
Besides the platform products, 11Ants is releasing two specific solutions: Customer Churn Analyzer and Customer Response Modeler. These have a similar style to the base product, with a set of Excel buttons. Data is split and the customer data is then analyzed in the same way. Once built, the churn or response models can be tested and used to score customers. The results of the models are presented as a lift chart, and the user can enter lifetime values, cost to communicate and so on to see a profit curve. Top churn factors can be shown, and the effect of values in these factors on churn can be seen; for instance, if City is a high churn predictor, the distribution of churned and loyal customers by City can be shown. The various graphs and options are specific to customer churn or response, allowing the interface to be even less technical and more business-friendly than the base product.
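How a lift chart and a profit curve relate to the model scores can be sketched as follows. This assumes a deliberately simple economic model: each contacted customer costs `cost`, and each contacted customer who actually churns (or responds) is worth `value`. The real products may use richer inputs, and `lift_and_profit` is a hypothetical name.

```python
def lift_and_profit(scores, outcomes, value, cost):
    """For each cutoff k (contact the top-k scored customers), return (k, lift, profit)."""
    ranked = sorted(zip(scores, outcomes), key=lambda t: t[0], reverse=True)
    base_rate = sum(outcomes) / len(ranked)  # hit rate if customers were picked at random
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive outcomes")
    hits, curve = 0, []
    for k, (_, outcome) in enumerate(ranked, start=1):
        hits += int(outcome)
        lift = (hits / k) / base_rate        # how much better than random the top k are
        profit = value * hits - cost * k     # assumed economics: value per hit, cost per contact
        curve.append((k, lift, profit))
    return curve


curve = lift_and_profit(
    scores=[0.9, 0.8, 0.7, 0.4, 0.3, 0.1],
    outcomes=[1, 1, 0, 1, 0, 0],
    value=100,
    cost=20,
)
best = max(curve, key=lambda t: t[2])
print(best)  # (4, 1.5, 220)
```

Plotting the lift column against k gives the lift chart, and the profit column gives the profit curve, whose peak suggests how many customers are worth contacting.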
An evaluation version of 11Ants Model Builder can be downloaded from www.11AntsAnalytics.com. Some short videos of the tool in action can be seen here: