I got an update from in2clouds recently. Since I last wrote about them (see this First Look on in2clouds) they have made 3 key updates – they have added support for ensemble models, moved to allow private/hybrid cloud deployment and completed their service definition API.
Ensemble models first. While using an ensemble model does not always help, it often increases accuracy significantly. As with their standard model development/deployment approach, in2clouds begins with multiple data sources defined by the user and handles the extract, transform, load and sample process within the platform. Various model types can then be built against the data, each having its own training data and technique-specific transformation. Performance of these models is measured individually and any or all of them can also be deployed in a champion/challenger environment to compare the effectiveness of one model with another in a live production setting.
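To make the champion/challenger idea concrete, here is a minimal sketch of how live traffic might be split between two models and the business outcome tracked for each. This is not in2clouds' implementation: the names `assign_model`, `OutcomeLog` and `challenger_share` are hypothetical, and hashing the request id is just one common way to get a stable split.

```python
import hashlib
from collections import defaultdict


def assign_model(request_id: str, challenger_share: float = 0.1) -> str:
    """Deterministically route a request to 'champion' or 'challenger'.

    Hypothetical helper: the same request_id always lands in the same arm.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "challenger" if bucket < challenger_share else "champion"


class OutcomeLog:
    """Accumulates a business outcome (revenue, conversions, ...) per model."""

    def __init__(self):
        self._totals = defaultdict(lambda: [0, 0.0])  # [count, sum of outcomes]

    def record(self, model: str, outcome: float) -> None:
        entry = self._totals[model]
        entry[0] += 1
        entry[1] += outcome

    def mean_outcome(self, model: str):
        count, total = self._totals[model]
        return total / count if count else None
```

Comparing `mean_outcome("champion")` with `mean_outcome("challenger")` after a period in production gives the kind of live comparison described above, measured on the business result rather than on model accuracy.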
With the new ensemble feature, several models can be combined. This combination can use voting, an average of all the models or it can use a custom combination defined by the client. In2clouds have found some clients getting a 15% accuracy improvement from combining 2 or more models in an ensemble. Champion/challenger can be used to determine the models or model versions that go into the ensemble as well as to ensure that the ensemble is better than any of the component models.
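The three combination styles mentioned here (voting, averaging, and a client-defined combination) can be sketched in a few lines. This is an illustration only, assuming each component model returns either a class label or a numeric score; the function names are hypothetical and the weighted average is just one example of what a custom combination might look like.

```python
from collections import Counter
from statistics import fmean
from typing import Callable, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def average_score(scores: Sequence[float]) -> float:
    """Return the plain average of the component model scores."""
    return fmean(scores)


def weighted_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Example of a custom combination: a weighted average of the scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def combine(scores: Sequence[float],
            combiner: Callable[[Sequence[float]], float] = average_score) -> float:
    """Apply any client-supplied combiner to the component scores."""
    return combiner(scores)
```

A client-defined rule is then just another callable, for example `combine(scores, lambda s: weighted_score(s, [0.5, 0.25, 0.25]))`.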
As always it is important to remember that the critical measure is the impact on the business, not the accuracy of the model. This means that sometimes even a model that appears more accurate must be run in production for a while to see how the business responds before it can be adopted. Because in2clouds are using a cloud-based platform, they find that their clients use more model runs and more challengers at the same time, since available compute power is less of an issue.
The second new feature is support for different cloud types. In2clouds are increasingly seeing a mix of private and public clouds. For instance, a client’s private cloud could request resources from the public cloud (code libraries, say); the public cloud authenticates the request and delivers the updates. The private cloud then executes transactions using private resources. To support private and hybrid cloud models, in2clouds offers a charge-per-transaction pricing model in addition to their charge-for-data-processed model. This is rolling out with pilot customers and will roll out fully at the end of the year.
Finally they have been extending and completing their service definition REST API. Data extraction and manipulation, model creation, scoring, results presentation and troubleshooting APIs are all now available and documented. This allows in2clouds to be embedded into any application. JSON is the only supported format for these APIs, and the results can easily be viewed and understood in a regular web browser, helping with setup and debugging.
An in2clouds service is the basic interface and each service has a unique identifier. The URL http://cloud-analytics.appspot.com/service/UID gives immediate access to a basic API for the service. Links are provided to get definitions, stop a service, clean or do a backup or get the actions for a service. The definition URL allows access to more information about the service and includes the set of available actions. These actions include the various kinds of tasks supported by in2clouds such as data extraction, model creation, model update, scoring etc. Actions can use data from other actions to allow a client to specify a chain of API calls. So, for instance, actions might extract data from client systems, others might take this data and de-dupe and otherwise clean it and still others might take this data and use it to build a model. Each action can be accessed using a URL (REST/JSON) so a particular model can be passed data to get a score back for instance. Even the various graphs and UI widgets either from their pre-configured products or generically based on models (like a decision tree viewer for a decision tree model for instance) can be accessed and embedded in a UI using the REST/JSON APIs.
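As a rough illustration of the service URL and the chaining of actions, the sketch below builds the URL for a service and works out the order of a simple chain from a JSON definition. Only the base URL comes from the description above. The JSON field names (`actions`, `name`, `input`), the action names and the helper functions are all assumptions made for the example, and no network call is made.

```python
import json
from typing import List
from urllib.parse import quote

# Base URL as given in the post; everything below it is hypothetical.
BASE_URL = "http://cloud-analytics.appspot.com/service"


def service_url(uid: str) -> str:
    """Build the URL for a service from its unique identifier."""
    return f"{BASE_URL}/{quote(uid, safe='')}"


# Hypothetical definition document: the real field names are not documented here.
SAMPLE_DEFINITION = """
{
  "actions": [
    {"name": "extract", "input": null},
    {"name": "clean", "input": "extract"},
    {"name": "build_model", "input": "clean"}
  ]
}
"""


def execution_order(definition_json: str) -> List[str]:
    """Order a linear chain of actions so each runs after the one it reads from."""
    actions = json.loads(definition_json)["actions"]
    # Map each input name to the action that consumes it (assumes one consumer each).
    by_input = {action["input"]: action["name"] for action in actions}
    order: List[str] = []
    current = None  # the first action has no input
    while current in by_input and len(order) < len(actions):
        current = by_input[current]
        order.append(current)
    return order
```

With the sample definition, `execution_order(SAMPLE_DEFINITION)` yields the extract, clean and build steps in that order, mirroring the kind of chain described above.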
While regular clients could use this API to develop custom extensions, obviously a key buyer for this API is a SaaS vendor that needs advanced analytics to improve its business. Given how analytically weak most SaaS vendors are, this seems like it should have a strong market as these vendors begin to understand the power of more advanced analytics.