While there are many machine learning packages out there, BigML is trying to make machine learning usable by people with only a little experience, flattening the learning curve of machine learning techniques. At the same time, they are addressing the scale challenge, as gigabytes of data overwhelm traditional algorithms. Their product is therefore cloud-based and highly automated, and it supports a four-stage process:
- Data sources
Any data can be turned into a data source. Flat files can be dragged into the environment, and other data can be loaded through a robust API with bindings for many different languages. They also offer the ability to bind to a URL, to Excel and databases, and to R and Ruby. Additional ways to move data into the cloud are coming soon. These will allow more incremental uploads of data from multiple locations, helping address the bandwidth problem that arises when large amounts of data must be moved into the cloud from a single location.
- Datasets
Once data sources are defined, a dataset can be created from a data source in a one-click process. Today a dataset comes from a single data source, but they are working on supporting data from several different sources in a single dataset. These datasets drive modeling, and the automated process of converting data sources handles the automatic detection of categorical fields, the distributions of continuous variables, and so on. A dataset viewer allows the data to be explored online.
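As a rough illustration of what automatic field detection involves, the sketch below labels each column of a small CSV as numeric or categorical and then summarizes it (range and mean for numeric fields, value counts for categorical ones). The heuristic, a column is numeric if every non-empty value parses as a number, and the names `infer_field_types` and `summarize` are hypothetical assumptions for illustration; this is not BigML's actual detection logic.

```python
import csv
import io
import statistics
from collections import Counter


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_field_types(rows: list[dict[str, str]]) -> dict[str, str]:
    """Hypothetical heuristic: a column is 'numeric' if every non-empty
    value parses as a number, otherwise it is 'categorical'."""
    types = {}
    for field in rows[0].keys():
        values = [r[field] for r in rows if r[field] != ""]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[field] = "numeric" if is_numeric else "categorical"
    return types


def summarize(rows: list[dict[str, str]], types: dict[str, str]) -> dict:
    """Min/max/mean for numeric fields, value counts for categorical ones."""
    summary = {}
    for field, kind in types.items():
        values = [r[field] for r in rows if r[field] != ""]
        if kind == "numeric":
            nums = [float(v) for v in values]
            summary[field] = {"min": min(nums), "max": max(nums),
                              "mean": statistics.mean(nums)}
        else:
            summary[field] = dict(Counter(values))
    return summary


data = """age,plan,churned
34,basic,no
51,premium,yes
29,basic,no
"""
rows = list(csv.DictReader(io.StringIO(data)))
types = infer_field_types(rows)
print(types)
print(summarize(rows, types))
```

A real system would need more care than this, for example treating a numeric column with only a handful of distinct values (such as 0/1 flags) as categorical.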
- Models
The user selects the variable they want to predict, and a one-click process derives a model. Decision trees are the only technique supported at launch, with logistic regression, time series, naïve Bayes and some random tree techniques to follow. These are home-grown algorithms developed specifically for the engine, although they are based on the industry-standard approaches described in the literature. The decision tree model can be viewed and explored interactively on the website, for instance to see the path to a specific node or outcome.
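Decision tree learners in the standard CART tradition grow a tree by repeatedly choosing the split that most reduces class impurity. The sketch below shows that core step using Gini impurity on numeric fields, recursing until a node is pure or a depth limit is reached. It is a generic textbook version, not BigML's home-grown implementation; `gini`, `best_split` and `build_tree` are hypothetical names, and the tree is represented as nested dicts purely for illustration.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: list[dict], labels: list, fields: list[str]):
    """Return the (field, threshold) pair that minimizes the weighted Gini
    impurity of the two children, or None if no split improves on the parent."""
    best, best_score = None, gini(labels)
    for field in fields:
        for threshold in sorted({r[field] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[field] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[field] > threshold]
            if not left or not right:
                continue  # a split must send records both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (field, threshold), score
    return best


def build_tree(rows, labels, fields, depth=0, max_depth=3) -> dict:
    """Recursively grow a tree; each leaf predicts its majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels, fields) if depth < max_depth else None
    if split is None:
        return {"predict": majority}
    field, threshold = split
    left_idx = [i for i, r in enumerate(rows) if r[field] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[field] > threshold]
    return {
        "field": field,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                           fields, depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                            fields, depth + 1, max_depth),
    }


rows = [{"age": 25, "visits": 1}, {"age": 40, "visits": 8},
        {"age": 35, "visits": 2}, {"age": 50, "visits": 9}]
labels = ["no", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["age", "visits"])
print(tree)
```

On this toy data the first candidate that separates the classes perfectly (`age <= 35`) wins, and both children are pure, so the tree stops after one split. Production learners add refinements such as categorical splits, missing-value handling and pruning.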
- Predictions
Once you have a model, you can pass in a set of data (using a form for testing, or using the API) and get a prediction back: a scoring API. You can also execute the model against a dataset for batch scoring. Users can also download a JSON document that describes the model, and this document could, in theory, be used in a local implementation. They have what they describe as "a PMML compatible approach" and have given some thought to implementing a converter.
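The downloadable JSON model suggests how local scoring could work: start at the root, follow the branch whose condition the input record satisfies, and stop at a leaf. The sketch below assumes a simplified, hypothetical JSON layout (`field`, `threshold`, `left`, `right`, `predict`); BigML's actual model document uses its own schema, which is not reproduced here. `predict` scores a single record, and `batch_predict` applies the same model to every record in a dataset.

```python
import json

# A hypothetical, simplified model document; BigML's real JSON schema differs.
MODEL_JSON = """
{
  "field": "age", "threshold": 35,
  "left":  {"predict": "no"},
  "right": {"field": "visits", "threshold": 5,
            "left":  {"predict": "no"},
            "right": {"predict": "yes"}}
}
"""


def predict(node: dict, record: dict):
    """Score one record by walking from the root down to a leaf."""
    while "predict" not in node:
        branch = "left" if record[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["predict"]


def batch_predict(node: dict, records: list[dict]) -> list:
    """Batch scoring: apply the same model to every record."""
    return [predict(node, r) for r in records]


model = json.loads(MODEL_JSON)
print(predict(model, {"age": 42, "visits": 7}))
print(batch_predict(model, [{"age": 30, "visits": 9},
                            {"age": 50, "visits": 2},
                            {"age": 60, "visits": 6}]))
```

Walking the tree iteratively rather than recursively keeps scoring cheap and avoids recursion limits on deep trees, which matters when the same model is applied to many records in a batch.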
BigML is invite-only at this point but has about 80 users so far. They are targeting developers who want to integrate analytics into their web-based applications, as well as people interested in using analytics who lack a technical background in machine learning. Pricing is based on credits that are consumed by data uploads, model-creation computation and predictions.
BigML is one of the vendors listed in our Decision Management Systems Platform Technologies report.