clario Analytics was founded back in 2002, largely by folks from Fingerhut. The team had been working on mailstream optimization: how to manage the stream of catalogs sent to each customer. The best customers of a catalog marketer can literally get 100 catalogs per year, and this is not good. Initially a consulting company, clario Analytics raised money in 2006 and launched clario Stream, a product for developing customized/personalized outbound contact streams (contact optimization, or the evolution of mailstream optimization). Now they have launched their generic data mining and analytics platform, clario.
clario was originally designed as an interface for internal analysts to work on the models in clario Stream. Like most analysts, the clario Analytics analysts spent 60-70% of their time on data integration, manipulation and quality issues, so this was the initial focus. With these data handling capabilities and some statistical and modeling nodes added, it became clear that it was usable as a standalone tool, leading to the launch this year.
clario is a web-based, cloud-backed data mining and analytics tool. It is a general purpose tool with a node-driven workbench (allowing you to assemble the model development process as a workflow of nodes) that is intended for analysts who build models. The user of clario works through the web browser (the front-end is in Adobe Flex) and connects to an application server that controls access and state. Secure FTP is used to put data in the cloud, and analysts can then develop workflows to process the data. All the heavy duty work is done on Amazon's compute cloud using standard capabilities, though clario has developed the dynamic scaling needed to manage multiple execution engines in support of data mining tasks.
The cloud-based approach has a couple of advantages. First, it allows for better collaboration, letting outsourced analysts and worldwide clients share an analytic workspace. Second, it takes advantage of the cloud to give analyst teams a lot of computing power while they are building a model (typically a very compute-intensive exercise) without paying for it while they are thinking about the model.
clario processes flat files. Its nodes include a variety of data manipulation nodes (sort, transform, join, filter, score, rank, append, aggregate), statistics nodes (univariate, sample, bivariate, logistic, linear, factor) and best practices nodes (a kind of macro node for things like detecting outliers, handling missing values, eliminating unhelpful data elements etc.). Once the model is done, clario either outputs CSV files (scored data) or displays visual results. Modelers can export to pseudo code or to Excel. Right now they haven’t decided on a model output format; personally I hope they will support PMML soon.
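To make the node idea a little more concrete, here is a minimal Python sketch of a flat-file workflow built from small, chainable steps. This is not clario's API or implementation (I have not seen either); the function names, the sample data and its column names are all invented for illustration, and it only mimics a few of the node types listed above (transform, filter, sort, rank, aggregate) ending in a CSV output.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file input; the columns are invented for this sketch.
RAW = """customer_id,region,orders,spend
1,north,3,120.50
2,south,0,0.00
3,north,7,410.00
4,south,2,95.25
"""

def read_node(text):
    """Source node: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform_node(rows):
    """Transform node: cast the numeric columns from strings."""
    return [{**r, "orders": int(r["orders"]), "spend": float(r["spend"])}
            for r in rows]

def filter_node(rows, predicate):
    """Filter node: keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]

def sort_node(rows, key, descending=False):
    """Sort node: order rows by a single column."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)

def rank_node(rows, name="rank"):
    """Rank node: number rows in their current order, starting at 1."""
    return [{**r, name: i} for i, r in enumerate(rows, start=1)]

def aggregate_node(rows, group_by, column):
    """Aggregate node: sum one column within each group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r[column]
    return dict(totals)

def write_node(rows):
    """Output node: serialize rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Assemble the workflow by wiring one node's output into the next.
rows = transform_node(read_node(RAW))
active = filter_node(rows, lambda r: r["orders"] > 0)
ranked = rank_node(sort_node(active, "spend", descending=True))
spend_by_region = aggregate_node(active, "region", "spend")
scored_csv = write_node(ranked)
```

Each step takes rows in and hands rows out, which is what lets a workbench treat them as interchangeable boxes that an analyst can rearrange without touching code.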
Pricing is cool – $300/month to get 100GB of data transfer and 50GB of secure storage with 720 hours of processing. Typically users run 4-6 workflows simultaneously during the workday and still stay below the total. They also have a 30-day free trial where you can “kick the tires”.
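For a rough feel of what 720 hours means, note that it equals one engine running around the clock for a 30-day month. The back-of-the-envelope Python below spreads that allowance across concurrent workflows. It rests on two assumptions of mine, not clario's: 22 workdays in a month, and each running workflow consuming one processing hour per hour of execution. How clario actually meters processing time was not part of the briefing.

```python
HOURS_INCLUDED = 720       # processing hours in the $300/month plan
WORKDAYS_PER_MONTH = 22    # assumption: a typical month of workdays

def hours_per_workflow_per_day(concurrent_workflows,
                               workdays=WORKDAYS_PER_MONTH,
                               hours_included=HOURS_INCLUDED):
    """Hours each workflow could execute per workday within the allowance.

    Assumes every workflow uses one processing hour per hour it runs.
    """
    return hours_included / (workdays * concurrent_workflows)

# Budget per workflow for the 4-6 simultaneous workflows mentioned above.
budget = {n: round(hours_per_workflow_per_day(n), 1) for n in (4, 5, 6)}
```

Under those assumptions each workflow would have somewhere between five and eight hours of actual execution per workday, and since workflows sit idle while the analyst is thinking, it is plausible that typical users stay under the cap.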
The product is still fairly new, and their immediate plans are to add decision tree (CART, CHAID and MARS) and clustering (EM and k-means) nodes over the next 6-9 months. Usability improvements to make it easier to edit existing workflows are planned, as are scheduling, workflow templates and some new collaboration tools. They are also taking advantage of being cloud-based to add some web services nodes, such as the USPS address cleaning service.
The cloud seems like a perfect fit for data mining, with its ability to deliver lots of compute power when you need it without requiring you to own those computers. I think data mining in the cloud is going to be a hot space…