BigML Update 2015

I last got an update from BigML back in 2013, when they briefed us on their predictive analytics/machine learning platform. Since then they have been adding new algorithms, especially unsupervised machine learning techniques, and generally updating the interface and workflow. With about 15,000 registered users, BigML is increasingly moving into enterprise customers, with both private cloud and on-premise options. BigML is also seeing adoption from developers who use the platform to build predictive apps and services (both internal/enterprise and commercial).

Specific updates:

  • Text analysis has been added, allowing text to be tokenized into words for use in the algorithms. Support for date and time variables has been added as well.
  • Integration with Dropbox (as well as Azure) for data loading
  • Cluster analysis and anomaly detection algorithms have been added to the platform
  • As before, once loaded, data is profiled with an in-situ histogram and some basic statistical analysis. Histograms for text fields are shown as tag clouds.
  • Users can add fields to their dataset with various data transformations like discretizing by percentile, normalizing data, replacing missing data, running JSON scripts etc., all without going back to the original data source.
  • Algorithms like the decision tree have been extended with more configuration options and thresholds as well as more advanced sampling approaches. Random forests, ensembles with various weighting and threshold approaches, cluster analysis and anomaly detection are all supported.
  • Performance has improved too, and the results are still visualized using intuitive, interactive tools: both the original decision tree visualizations and new ones for clusters and anomalies.
  • When using a decision tree, any segment of the tree can be downloaded as a dataset or exported as a rule. Clusters can be similarly downloaded.
  • A new model summary report shows which fields have the most impact on the model. The same feature is also available for ensembles.
  • Models can be downloaded via various language exports such as Node.js and Python, as well as into Excel and Tableau.
  • Integration with Tableau, enabling users to use BigML predictive models to score data within Tableau.
  • The interface generated to allow scoring of data against a model is now interactive thanks to new client-side prediction libraries which show the change in prediction as data is changed.
  • BigML has automated an 80/20 split for the purpose of evaluating a predictive model or ensemble. Models can also be evaluated, as before, using hold-out datasets etc. A confusion matrix has been added to show false positives/negatives, along with various ways to evaluate performance by outcome, prediction etc. Evaluations can be compared.
  • All of this, and indeed everything in the product, is available in the API, so it can be scripted and embedded in systems.
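
To make the evaluation bullet above concrete, here is a minimal sketch in plain Python of an 80/20 split followed by a confusion matrix. It does not use BigML's API or reproduce its implementation: the function names, the toy data and the threshold "model" are all assumptions made purely for illustration.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test) at 80/20."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy dataset: one numeric field and a yes/no outcome.
rows = [(x, "yes" if x >= 50 else "no") for x in range(100)]
train, test = split_80_20(rows)

# Stand-in "model" (an assumption, not a BigML algorithm): a single
# threshold halfway between the class means seen in the training data.
threshold = (mean([x for x, y in train if y == "yes"])
             + mean([x for x, y in train if y == "no"])) / 2

actual = [y for _, y in test]
predicted = ["yes" if x >= threshold else "no" for x, _ in test]
matrix = confusion_matrix(actual, predicted, positive="yes")
print(matrix)
```

The model is trained on the 80% portion only and scored on the remaining 20%, so the four counts in the matrix always add up to the size of the held-out set.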

BigML is one of the vendors in our Decision Management Systems Platform Technologies Report and you can get more information and access to the product at http://www.bigml.com
