EigenDog was founded in 2011 and launched their service for scalable predictive modeling in December 2011. Their objective was to provide scalability in machine learning: more data tends to yield better predictive models, and better models can drive better business results. EigenDog's perspective is that established approaches were developed for single-processor, desktop machines, and that scaling these approaches up is a challenge. EigenDog aims to provide highly scalable algorithms to those without either the engineering teams or the equipment necessary to scale the algorithms out themselves.
Behind the scenes, EigenDog runs on Amazon EC2, where it can spin up the number of compute resources needed to handle large inputs. They provide a cloud-based, parallelized algorithm for binary classification, an iterative, numerical algorithm where the data must be traversed multiple times.
The process is simple:
- The user creates a training set – a plain-text file in a well-defined format called ARFF (a bit more complex than CSV, but more robust). The file contains a column corresponding to the target the user is trying to predict. The user then compresses the file before uploading it to Amazon S3 (cloud storage).
- They point EigenDog at this file, set some parameters, and go.
- The results file represents a mathematical function for calculating a probability.
- Payment is in proportion to the work effort involved.
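The first step above can be sketched in a few lines. This is an illustration only: the feature names and values are invented, though the structure (@relation, @attribute, @data) is the standard ARFF layout the text refers to.

```python
import gzip

# A minimal ARFF training set: two numeric features plus a nominal
# target column ("class") that the model will learn to predict.
arff_text = """@relation example
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {0,1}
@data
1.2,3.4,0
2.5,0.7,1
0.9,4.1,0
"""

# Compress the file before uploading it to S3, per the workflow above.
with gzip.open("training.arff.gz", "wt") as f:
    f.write(arff_text)
```

The compressed `training.arff.gz` is what would then be uploaded to an S3 bucket for EigenDog to read.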
The core algorithm is based on Stochastic Gradient Boosting of decision trees. A simple interface lets you specify the algorithm's various parameters, including the learning rate (a way to balance model quality and cost), maximum tree depth (which controls the complexity of each decision tree) and termination criteria (when to stop iterating – either an automated rightsizing criterion or specific targets for accuracy, false positive rate, etc.). You can also specify a cost cap, beyond which the algorithm will be aborted.
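To make the role of these parameters concrete, here is a minimal from-scratch sketch of stochastic gradient boosting for binary classification, using depth-1 trees (stumps) and logistic loss. EigenDog's actual implementation is not public, so every name and default below is illustrative, not theirs.

```python
import math
import random

def fit_stump(X, resid):
    # Fit a depth-1 regression tree to the residuals: pick the
    # feature/threshold split that minimizes squared error.
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [r for i, r in enumerate(resid) if X[i][j] <= t]
            right = [r for i, r in enumerate(resid) if X[i][j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm

def fit_gbm(X, y, n_rounds=20, learning_rate=0.3, subsample=0.8, seed=0):
    rng = random.Random(seed)
    trees = []
    F = [0.0] * len(X)  # current ensemble scores (log-odds)
    for _ in range(n_rounds):
        # Gradient of the logistic loss: residual = y - sigmoid(F).
        resid = [yi - 1 / (1 + math.exp(-fi)) for yi, fi in zip(y, F)]
        # "Stochastic": each tree is fit on a random subsample of rows.
        idx = rng.sample(range(len(X)), max(2, int(subsample * len(X))))
        tree = fit_stump([X[i] for i in idx], [resid[i] for i in idx])
        trees.append(tree)
        # The learning rate shrinks each tree's contribution: smaller
        # values mean better quality but more rounds (i.e. more cost).
        F = [fi + learning_rate * tree(row) for fi, row in zip(F, X)]
    return trees

def predict_proba(trees, row, learning_rate=0.3):
    # Must use the same learning rate the model was trained with.
    score = sum(learning_rate * t(row) for t in trees)
    return 1 / (1 + math.exp(-score))

# Toy data: the target is 1 when the single feature exceeds 0.5.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_gbm(X, y)
```

Note how each round re-reads the full dataset to update the scores `F` – this is the multiple-traversal, iterative behavior the article mentions, and the reason a fixed number of rounds (or a cost cap) is a natural termination control.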
The web interface also allows users to manage multiple jobs and see the history of work done. Models can be re-trained from this interface (with different parameters) at any time. For each model, summary information is provided, including the confusion matrix (showing how many records were true positives, true negatives, false negatives and false positives), AUC (area under curve), accuracy, precision and recall.
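The summary metrics follow directly from the four confusion-matrix counts. A small worked example, with invented counts:

```python
# Hypothetical counts from scoring 200 records:
tp, fn = 80, 20   # actual positives predicted positive / negative
tn, fp = 90, 10   # actual negatives predicted negative / positive

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction correct overall
precision = tp / (tp + fp)                   # of predicted positives, how many are real
recall = tp / (tp + fn)                      # of actual positives, how many were found
```

Here accuracy is 0.85, precision about 0.89 and recall 0.80; AUC, by contrast, summarizes the whole ranking of scores rather than a single threshold.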
The result file is well documented, and consists of their own JSON-formatted text describing the model. To score the model, you can use a simple web interface or a web API (passing in a record programmatically). For users who want to embed scoring into their systems (for high throughput and low latency), the model can be downloaded and evaluated by EigenDog's open source libraries (Java and R are now available). The web API can also be used to train models, fetch job history, get billing details, etc.
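Because the model file is JSON, embedded scoring amounts to parsing the file and evaluating the function it describes. EigenDog's actual schema is not reproduced here, so the format below (a list of single-split trees plus a learning rate) is purely hypothetical:

```python
import json
import math

# A hypothetical JSON description of a two-tree boosted model; the real
# EigenDog schema, documented in their result files, will differ.
model_json = """
{
  "learning_rate": 0.5,
  "trees": [
    {"feature": 0, "threshold": 2.0, "left": -1.0, "right": 1.0},
    {"feature": 1, "threshold": 0.5, "left": -0.5, "right": 0.8}
  ]
}
"""

def score(model, record):
    # Sum each tree's leaf value, scaled by the learning rate, then map
    # the resulting log-odds to a probability with the logistic function.
    total = 0.0
    for tree in model["trees"]:
        if record[tree["feature"]] <= tree["threshold"]:
            total += model["learning_rate"] * tree["left"]
        else:
            total += model["learning_rate"] * tree["right"]
    return 1 / (1 + math.exp(-total))

model = json.loads(model_json)
probability = score(model, [3.0, 0.2])
```

In production one would use the vendor's Java or R library rather than re-implementing the evaluator, but the shape of the computation is the same: parse once, then score records locally at low latency.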
As with most predictive-analytics-in-the-cloud vendors, uploading the data to build the model can be a challenge, but many of their customers already have data elsewhere in the cloud, which mitigates this. They also plan to support building a model from multiple S3 files, which would allow training data to be uploaded simultaneously from multiple points, reducing the upload bottleneck. They are also working on regression and multi-class classification algorithms.
Despite being a new company, they have some real customers who are building models and downloading them for use with their open source libraries.
EigenDog is one of the vendors listed in our Decision Management Systems Platform Technologies report.