I met the folks from Yottamine at Predictive Analytics World and recently got a demo and an update. Yottamine is focused on helping companies build predictive models and sees three main challenges in building good ones:
- An ever-increasing amount of data makes building models harder and requires more storage and compute power to handle
- Advanced algorithms require a great deal of computing power, but less computationally intensive modeling techniques can produce less accurate models. This means that modeling success depends not only on mathematics but also on using IT infrastructure effectively to build the models
- Finally, it has traditionally been hard to build models without in-depth knowledge of the field in which the model will be used
To address these issues, Yottamine has built a Predictive Platform that uses cloud-based compute power to create a highly scalable, affordable platform that creates models quickly. It also avoids needing a great deal of problem domain know-how thanks to the use of machine learning techniques like Support Vector Machines.
The software itself is web-based, allowing data files to be loaded up into one of many folders. The dataset is a standard flat analytic dataset that can contain numbers (treated as continuous unless the user specifies them as discrete), strings (treated as discrete), dates etc. Clients can upload CSV files or ARFF files (ARFF is the WEKA format and this format allows more detailed definitions of the variables – identifying numbers to be treated as discrete variables for instance).
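To make the CSV/ARFF difference concrete, here is a minimal sketch of how a numeric column can be declared discrete in ARFF. The column names, the sample rows and the `csv_to_arff` helper are all made up for illustration, and this is not Yottamine's own converter. It only relies on the basic ARFF layout of `@relation`, `@attribute` and `@data` sections.

```python
import csv
import io

# Hypothetical sample data; region_code is a number that should be discrete.
CSV_TEXT = """age,region_code,churned
34,1,yes
51,3,no
27,1,no
"""


def csv_to_arff(csv_text, relation, discrete_columns):
    """Convert CSV text to ARFF text, declaring the named columns as nominal."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        values = [row[index] for row in data]
        if name in discrete_columns:
            # Nominal attribute: list the distinct values in first-seen order.
            distinct = list(dict.fromkeys(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(distinct)))
        else:
            # Everything else is declared as a continuous number.
            lines.append("@attribute %s numeric" % name)
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines) + "\n"


arff_text = csv_to_arff(CSV_TEXT, "customers", {"region_code", "churned"})
print(arff_text)
```

In the CSV version nothing marks `region_code` as a category, so it would be read as a continuous number unless the user says otherwise. The ARFF header carries that decision with the file.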
Once the data is loaded, the client can specify the target column (what it is they want to predict) and whether it is a regression or classification problem. The software can separate data into training and testing data and allows the customer to specify different testing approaches, including uploading a separate test data set.
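As a rough illustration of what choosing a target column and holding out test data involves, the sketch below shuffles rows and splits them. The function name, the 25% test fraction and the sample rows are assumptions for the example. How the platform actually performs the split is not described here.

```python
import random


def split_rows(rows, target, test_fraction=0.25, seed=0):
    """Shuffle rows, hold out a test portion, and separate the target column."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    test_part, train_part = shuffled[:n_test], shuffled[n_test:]

    def separate(part):
        features = [{k: v for k, v in row.items() if k != target} for row in part]
        labels = [row[target] for row in part]
        return features, labels

    return separate(train_part), separate(test_part)


# Hypothetical rows; "churned" plays the role of the target column.
rows = [{"age": 20 + i, "region": i % 3, "churned": i % 2 == 0} for i in range(8)]
(train_x, train_y), (test_x, test_y) = split_rows(rows, target="churned")
print(len(train_x), "training rows,", len(test_x), "test rows")
```

Uploading a separate test data set, the other option mentioned above, would simply replace the held-out portion with rows from a second file.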
Yottamine’s software then creates a suitable EC2 cluster, sized according to the amount of data uploaded, for building the predictive model. The algorithms Yottamine has implemented are highly parallelized, allowing them to take advantage of the largest clusters available. The company claims increased accuracy for its modeling approach over standard ones, as well as improved performance thanks to its ability to scale out to large clusters. The modeling techniques supported include Linear, Polynomial and Gaussian Support Vector Machines as well as Local Model (based on a K Nearest Neighbor/SVM combination). Local Model can achieve accuracy equal to or better than that of already highly accurate SVM models.
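For readers less familiar with the three SVM variants, they differ in the kernel function used to compare two records. The sketch below gives the standard textbook forms of the linear, polynomial and Gaussian kernels. The parameter names (`degree`, `coef0`, `gamma`) and their defaults are assumptions for illustration, and none of this describes Yottamine's parallelized implementation.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Dot product shifted by coef0 and raised to a power."""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Similarity that decays with squared distance (also called RBF)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
print("linear:    ", linear_kernel(u, v))
print("polynomial:", polynomial_kernel(u, v))
print("gaussian:  ", gaussian_kernel(u, v))
```

The linear kernel is the cheapest to compute. The polynomial and Gaussian kernels can capture curved decision boundaries at the cost of extra parameters to tune.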
All the parameters for a model job can be saved for later use, allowing a client to build a library of model creation jobs that can be run against updated data in the future. Clients can also set up a single batch job that automatically builds multiple models of different types from different data sets.
Once a job is complete, its results are displayed. These can include graphics such as an error rate heat map that allows a modeler to focus their efforts on particular combinations of parameters, tuning the model to get the lowest error rate. The best model for each run is also kept and is available for download, currently for use in a piece of standalone software called Yottamine Predictor, which scores records using the model. The proprietary model format will soon be joined by a PMML download option. This, of course, would allow the model to be loaded into a Business Rules Management System or other PMML consumer. An API that allows clients to control all this from an application is also under development.
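The idea behind an error rate heat map can be sketched as a grid of parameter combinations, each with a measured error rate, from which the lowest is picked. Everything below is hypothetical: the parameter names `C` and `gamma`, the grid values, and especially `toy_error_rate`, a synthetic stand-in for actually training and testing a model. It is not output from the platform.

```python
import math


def grid_search(evaluate, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return the grid plus the best pair."""
    grid = {(c, g): evaluate(c, g) for c in c_values for g in gamma_values}
    best = min(grid, key=grid.get)
    return grid, best


def render_heat_map(grid, c_values, gamma_values):
    """Format the grid as a text table, one row per C value."""
    header = "C \\ gamma" + "".join("%9g" % g for g in gamma_values)
    lines = [header]
    for c in c_values:
        cells = "".join("%9.3f" % grid[(c, g)] for g in gamma_values)
        lines.append("%-9g" % c + cells)
    return "\n".join(lines)


def toy_error_rate(c, gamma):
    """Synthetic stand-in for a real train-and-test run; lowest near C=10, gamma=0.01."""
    return 0.1 + 0.05 * abs(math.log10(c) - 1) + 0.05 * abs(math.log10(gamma) + 2)


c_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1]
grid, best = grid_search(toy_error_rate, c_values, gamma_values)
print(render_heat_map(grid, c_values, gamma_values))
print("lowest error at C=%g, gamma=%g" % best)
```

A graphical heat map shows the same table with color in place of numbers, which makes it easier to see where to search more finely.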
For now, the Yottamine Predictive Platform will be attractive to modelers who know how to create the right analytic data set to feed into a model and who want to use scalable cloud resources to build their final model. Yottamine claims it is the first on-demand model-building platform to charge clients by hourly usage rather than a fixed monthly subscription fee. Though the platform is based on Amazon’s EC2 cloud computing, clients do not need an EC2 account to use the service. A free trial of the platform for 15 days is available by applying here.
Good to see that support for the PMML data mining standard is in the works!