EigenDog was founded in 2011 and launched their service for scalable predictive modeling in December 2011. Their objective was to provide scalability in machine learning. As we are all aware these days, more data tends to result in better predictive analytic models while better models can result in better business results. EigenDog’s perspective is that established approaches were developed for single-processor, desktop machines and that scaling up these approaches is a challenge. EigenDog aims to provide very scalable algorithms to those without either the engineering teams or equipment necessary to scale out these algorithms.
Behind the scenes, EigenDog runs on Amazon EC2, where it can spin up the number of compute resources needed to handle large inputs. They provide a cloud-based, parallelized algorithm for binary classification, an iterative, numerical algorithm where the data must be traversed multiple times.
The process is simple:
- The user creates a training set – a plain text file in a well-defined format called ARFF (a bit more complex than CSV but more robust). The file contains a column corresponding to the target the user is trying to predict. The user then compresses the file before uploading it to Amazon S3 (cloud storage).
- They point EigenDog at this file, set some parameters, and go.
- The results file describes the model: a mathematical function that calculates a probability for the target.
- Payment is in proportion to the work effort involved.
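The first step above can be sketched in Python as follows. This is a minimal illustration, not EigenDog's tooling: the column names and rows are made up, the `to_arff` helper is hypothetical, and gzip is an assumption (the post only says the file is compressed). The upload to S3 is left out.

```python
import gzip

def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, type) pairs, where type is either the
    string "numeric" or a list of nominal values such as ["yes", "no"].
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical churn data; the last column is the binary target.
attributes = [
    ("age", "numeric"),
    ("monthly_spend", "numeric"),
    ("churned", ["yes", "no"]),
]
rows = [(34, 52.5, "no"), (51, 18.0, "yes"), (27, 75.25, "no")]

arff_text = to_arff("churn", attributes, rows)

# Compress before upload (gzip is an assumption; the upload is not shown).
compressed = gzip.compress(arff_text.encode("utf-8"))
```

Unlike CSV, the ARFF header declares each column's name and type, which is what makes the format more robust for a fully automated service.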
The core algorithm is based on Stochastic Gradient Boosting of decision trees. A simple interface lets you specify the algorithm’s various parameters, including the learning rate (a way to balance model quality and cost), maximum tree depth (it’s a decision tree algorithm so this determines its behavior in part) and termination criteria (when to stop iterating – either using an automated rightsizing criterion or a specific accuracy, false positive rate, etc.). You can also specify a cost cap, beyond which the algorithm will be aborted.
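To make the roles of these parameters concrete, here is a toy, single-machine sketch of stochastic gradient boosting for binary classification. It is not EigenDog's implementation. It assumes log loss, fixes the tree depth at one (decision stumps), uses the mean residual as the leaf value rather than a per-leaf line search, and stops after a fixed number of trees. All function names are illustrative.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals, idx):
    """Best one-split tree (feature, threshold, left value, right value)
    by squared error, fitted only on the sampled row indices idx."""
    mean = sum(residuals[i] for i in idx) / len(idx)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [residuals[i] for i in idx if X[i][f] <= t]
            right = [residuals[i] for i in idx if X[i][f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if err < best[0]:
                best = (err, f, t, lm, rm)
    return best[1:]

def train(X, y, n_trees=100, learning_rate=0.3, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(X)
    scores = [0.0] * n  # current log-odds prediction for every row
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to the score: y - p.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        # "Stochastic": each tree only sees a random subset of the rows.
        k = max(1, min(n, int(subsample * n)))
        idx = rng.sample(range(n), k)
        f, t, lv, rv = fit_stump(X, residuals, idx)
        # The learning rate shrinks each tree's contribution.
        stump = (f, t, learning_rate * lv, learning_rate * rv)
        stumps.append(stump)
        for i in range(n):
            scores[i] += stump[2] if X[i][f] <= t else stump[3]
    return stumps

def predict_proba(stumps, x):
    return sigmoid(sum(lv if x[f] <= t else rv for f, t, lv, rv in stumps))

# Tiny made-up data set: the target is 1 when the single feature exceeds 4.
X = [[float(v)] for v in range(1, 9)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = train(X, y)
```

Every pass over the data adds one tree, which is why a lower learning rate (more, smaller steps) costs more compute, and why a scheme like this has to traverse the data many times.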
The web interface also allows users to manage multiple jobs and see the history of work done. Models can be re-trained from this interface (with different parameters) at any time. For each model, summary information is provided, including the confusion matrix (showing how many positive records were predicted to be positive, negative-negative, positive-negative and negative-positive), AUC (area under curve), accuracy, precision and recall.
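For readers less familiar with these summary statistics, the following sketch shows how they are conventionally computed from actual and predicted labels. The function names and sample data are illustrative only and say nothing about how EigenDog calculates them internally.

```python
def confusion_matrix(actual, predicted):
    """Counts of true/false positives/negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }

def summarize(actual, predicted):
    cm = confusion_matrix(actual, predicted)
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total if total else 0.0,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }

def auc(actual, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(actual, predicted)
stats = summarize(actual, predicted)
```

Accuracy, precision and recall depend on the threshold used to turn a probability into a yes/no prediction, while AUC is computed from the raw scores and is threshold-independent.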
The result file is well documented, and consists of their own JSON-formatted text describing the model. To score the model, you can use a simple web interface or a web API (passing in a record programmatically). For users who want to embed scoring into their systems (for high throughput and low latency), the model can be downloaded and evaluated by EigenDog’s open source libraries (Java and R are now available). The web API can also be used to train models, fetch job history, get billing details, etc.
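The idea of embedded scoring can be illustrated with a short sketch. The JSON schema below is invented purely for illustration and is not EigenDog's documented format. The field names (`bias`, `trees`, `feature`, `threshold`, `left`, `right`, `value`) and the scoring functions are all assumptions about what a boosted-tree model file might contain.

```python
import json
import math

# Hypothetical model file: NOT EigenDog's actual format.
MODEL_JSON = """
{
  "bias": -0.2,
  "trees": [
    {"feature": "age", "threshold": 40,
     "left": {"value": -0.6},
     "right": {"feature": "monthly_spend", "threshold": 30,
               "left": {"value": 0.9}, "right": {"value": 0.1}}},
    {"feature": "monthly_spend", "threshold": 50,
     "left": {"value": 0.3}, "right": {"value": -0.4}}
  ]
}
"""

def eval_tree(node, record):
    """Walk from the root to a leaf and return the leaf's value."""
    while "value" not in node:
        go_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

def score(model, record):
    """Sum the tree outputs and map the total to a probability."""
    raw = model["bias"] + sum(eval_tree(t, record) for t in model["trees"])
    return 1.0 / (1.0 + math.exp(-raw))

model = json.loads(MODEL_JSON)
p_high = score(model, {"age": 51, "monthly_spend": 18.0})
p_low = score(model, {"age": 30, "monthly_spend": 60.0})
```

Because scoring is just a handful of comparisons and additions per record, evaluating a downloaded model in-process avoids a network round trip, which is the appeal for high-throughput, low-latency use.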
As with most vendors of predictive analytics in the cloud, uploading the data to build the model can be a challenge, but many of their customers already have data elsewhere in the cloud, which mitigates this. They also plan to support building a model from multiple S3 files, which would allow training data to be uploaded simultaneously from multiple points, reducing the upload bottleneck. They are working on regression and multi-class classification algorithms too.
Despite being a new company, they have some real customers who are building models and downloading them for use with their open source libraries.
EigenDog is one of the vendors listed in our Decision Management Systems Platform Technologies report.
Dear James Taylor,
thanks for your interesting article about EigenDog. I found the article interesting and directly visited the EigenDog web page. But then I started to wonder…
Moving data for data mining into the cloud requires trust. So who are the people behind EigenDog?
Their web page neither has an imprint nor does it list a company address or who the team behind the company is.
Would you seriously consider uploading customer data or any other critical data to such a company?
Would you consider allowing them to access your data on an Amazon EC2 server once you uploaded the data there?
Whoever answers these two questions with “yes” makes me wonder how seriously they take data protection and the risks associated with uploading data to strangers or granting strangers access to your data…
Please do not get me wrong. I am a big fan of cloud computing and software-as-a-service (SaaS). However, trust is essential for SaaS in a cloud, and a company that does not disclose its legal information, company address, or responsible management team does a lousy job of building the required trust. In many countries it is actually illegal to provide company web sites without such legal information.
Best regards,
Peter Meyeri
Peter
As you say, trust is essential in cloud AND analytics. I will let the folks at EigenDog answer the specifics in their case, but you ask the right questions.
Peter
Mika Illouz here, founder at EigenDog.
First, thanks for your thoughtful comments. Trust is an essential ingredient in any successful business relationship, and I agree with you and JT that the SaaS environment is no exception.
To earn the trust of its customers, EigenDog has taken several measures to assure the security of data and the confidentiality of customer interactions with EigenDog. First, our web site and API use 256-bit SSL. Second, we delegate data storage functions to Amazon S3, a mature service with comprehensive access-control options. Third, as EigenDog is a completely automated service, EigenDog employees do not manually transfer, copy, or otherwise handle users’ data. Finally, the training set format we use (ARFF) readily enables our customers to anonymize their data.
Our customers receive timely support through support@eigendog.com. If you have additional questions, feel free to contact us through this email, or at our Silicon Valley office (+1 650.50E.IGEN). EigenDog is just now emerging publicly, and we can do a better job of providing this contact information in an “about us” section of our website.
best,
Mika Illouz
mika@eigendog.com