I got an update from BigML recently – I first blogged about them back in December. BigML has continued to grow since then, opening the product to general use (you can sign up here). They now have over 4,600 users and more than 100 paying customers.
The product still offers a simple interface underpinned by a RESTful API for adding data sources (by uploading or linking to cloud-based sources), creating datasets, and using these datasets to create models and run a variety of predictive analytics tasks. They have also added several new features, however:
- Just as a model can be built from a dataset with one click, an ensemble can now likewise be built in one click – or configured according to a variety of settings. Bagging or random forest approaches can be used, building multiple decision trees from small samples of the data and then creating an ensemble model from the set of resulting trees.
- Users can still view the decision tree, looking at the distribution between the branches at each node and picking a path to see how they get to that node. BigML has added a nice new visualization (one showing outcome distributions is shown at right) that they call a sunburst view. This has a ring for each layer in the tree and shows how much of the set goes into each branch within that ring. This shows the distribution nicely and can be shaded by prediction or by confidence. Each segment can be clicked on to become the center, zooming in on a piece of the model.
- Besides scoring in the cloud, users can now download models, or ensembles, as working Python code, PMML, Ruby, Java, Objective-C and a handful of other formats. This allows for multiple deployment options as well as cloud-based scoring through their API.
- The web interface now offers a simple one-click evaluation of 80/20 splits, for instance, to allow the comparison of a model to a hold-out set.
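To make the ensemble and evaluation ideas above a little more concrete, here is a minimal sketch in plain Python. It is not BigML's implementation or API: the function names, the toy data, the 30% sample size and the use of one-split trees ("stumps") in place of full decision trees are all assumptions made for illustration. It bags a set of small trees built from random samples of the training data, combines them by majority vote, and scores the result against a 20% hold-out set.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-split tree: pick the (feature, threshold) with fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue  # a split must send rows down both branches
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split: fall back to always predicting the majority class.
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label

def train_bagged(rows, n_trees=25, sample_fraction=0.3, rng=None):
    """Bagging: train each tree on a small sample drawn with replacement."""
    rng = rng or random.Random(0)
    k = max(2, int(len(rows) * sample_fraction))
    return [train_stump([rng.choice(rows) for _ in range(k)])
            for _ in range(n_trees)]

def predict_bagged(ensemble, x):
    """Combine the trees by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in ensemble)
    return votes.most_common(1)[0][0]

def holdout_split(rows, train_fraction=0.8, rng=None):
    """Shuffle and split rows into training and hold-out sets (80/20 by default)."""
    rng = rng or random.Random(0)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(ensemble, rows):
    return sum(predict_bagged(ensemble, x) == y for x, y in rows) / len(rows)

# Hypothetical toy data: the label depends only on the first feature.
rows = [((i / 100, (i * 37 % 100) / 100), "yes" if i >= 50 else "no")
        for i in range(100)]
train, test = holdout_split(rows)
ensemble = train_bagged(train)
print(f"hold-out accuracy: {accuracy(ensemble, test):.2f}")
```

A random forest would additionally restrict each split to a random subset of the features; that step is left out here to keep the sketch short.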
BigML has also updated the payment model from an exclusively pay-as-you-go approach to also offering a monthly subscription based on the maximum size of model being developed. This allows users to iterate their models more often without incurring any additional cost – and is the approach that most of BigML’s users are taking.
BigML focuses on creating decision trees, a proven approach that is also effective at reducing the number of variables/features in a dataset to those that are predictive (allowing very wide datasets to be fed in and then letting the algorithm figure out which ones are potentially predictive).
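As a rough illustration of how a decision tree ends up ignoring unhelpful variables, the sketch below ranks features by information gain, one common criterion for choosing a split. This is an assumption for illustration only: I don't know which splitting criterion BigML uses, and the field names and toy rows are hypothetical. A feature whose gain is near zero tells the tree nothing about the outcome, so it never gets picked for a split.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, feature, target="label"):
    """Reduction in entropy of the target after splitting on one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(group) / len(rows) * entropy(group)
                    for group in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical toy dataset: "plan" determines the label, "color" is noise.
rows = [
    {"plan": "free", "color": "red",  "label": "churn"},
    {"plan": "free", "color": "blue", "label": "churn"},
    {"plan": "paid", "color": "red",  "label": "stay"},
    {"plan": "paid", "color": "blue", "label": "stay"},
]

gains = {feature: information_gain(rows, feature) for feature in ("plan", "color")}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked, gains)
```

On this toy data "plan" ranks first and "color" contributes no gain at all, which is the sense in which a wide dataset can be fed in and the uninformative columns simply drop out of the tree.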
You can get more information on BigML here and they are one of the vendors in our Decision Management Systems Platform Technologies Report.