I got a chance to chat with Data Applied, a new startup that, like Clario (reviewed previously), offers real data mining in the cloud. The product is aimed at business users: it is a very data-centric, web-based application for visualization and data mining/deep analysis. They are targeting users with hundreds of thousands of records rather than the many millions of records that some organizations have. Immediate interest is coming from some partners and from cost-sensitive BRIC countries.
Users begin by uploading data from Excel or CSV (with Google Docs, Microsoft Dynamics CRM and Salesforce.com to come in the future). Each dataset is a flat file and each is shown as a tab in the user interface. Throughout the interface, and in all the various analysis tools, you can export data to Excel and search it. Search doubles as a filter: typing in a simple string narrows the results down to those that match. The application supports zooming at any point and to any level, though this just shows the same information at a larger scale. All reports and graphics can also be commented on (multiple users can manage a shared set of comments, a nice feature), exported as graphics (PDFs to come) and published to a gallery of static images that gives other users non-interactive access. For long-running tasks, emails can be sent when the analysis is complete. Drilling into the data behind a graph is supported too. Today the product has seven main tools – Super Pivots, Forecasting, Correlation, Association rules, Clusters, Decision Trees and Similarity Maps.
- Super Pivots offer cross tabulation and reports with graphs, broken down by various categories for instance. Reports include lines, pies, tree maps, heat maps etc. Designing these reports is drag and drop, with control over grouping, graph type, binning etc. Binning can be done automatically (finding equal groups) or manually and applied to the graphs easily. This is a nice feature you don’t see often enough.
- Forecasting supports smoothing and Monte Carlo simulation, and uses error estimation to see how stable a model is. This allows it to forecast and display, for instance, the 95% confidence range. The tool can find regions of a time series that look similar, find the minimum and maximum in a region etc.
- Correlations allow you to see a grid or a network of nodes with the links showing the associations and their strengths.
- Association rule mining can also be done automatically. The search function allows this to be focused on different areas, as before. The association rules show their strength relative to the population as a whole or to the set being filtered by a search. You can export the description of the rules and the members impacted. Unlike many association rule mining tools, it has a nice visual interface and a rich representation of the results.
- Clusters work similarly. Automatic cluster detection is supported or you can specify the number of clusters. Each cluster can be analyzed and the various attributes graphed visually. Similarity between clusters is shown in the same way as correlations. You can center the diagram on a particular cluster, see the clusters as a grid etc. They do not yet support predicting which cluster a new record will go into – you can’t generate a set of rules that define a cluster for instance. Clusters can be driven by selected fields. Automatic detection of outliers and anomalies, noise detection and sampling are all supported.
- Decision trees are very interactive with lots of data in the nodes and a nice navigable overview of the tree. A single variable must be selected as the target. Maximum depth, split, density etc can all be specified as well as the fields you want to consider. Tree information can be exported as text.
- Similarity Maps project each record onto a 2D map. Each cell is a group of records that are REALLY similar, and groups of similar records are placed close together.
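To make the automatic binning mentioned under Super Pivots a little more concrete, here is a minimal Python sketch of equal-frequency binning, which is one common reading of "finding equal groups". The function name and the sample data are made up for illustration, and this is not necessarily how Data Applied implements it.

```python
def equal_frequency_bins(values, n_bins):
    """Split values into n_bins groups of (nearly) equal size.

    Hypothetical helper for illustration only: values are sorted and
    cut at evenly spaced positions, so each bin holds about the same
    number of records regardless of how wide its value range is.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins          # first index of this bin
        end = (i + 1) * n // n_bins      # one past the last index
        bins.append(ordered[start:end])
    return bins


# Made-up sample: nine customer ages split into three bins.
ages = [52, 23, 41, 35, 60, 25, 38, 31, 44]
age_bins = equal_frequency_bins(ages, 3)
for group in age_bins:
    print(group[0], "to", group[-1], ":", len(group), "records")
```

Note that this simple version can place identical values in neighbouring bins when there are ties at a cut point. A production tool would probably handle that case more carefully.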
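The point about association rules showing their strength relative to the whole population or to a filtered set can be illustrated with the standard support, confidence and lift measures. The Python sketch below uses made-up records and a hypothetical `rule_metrics` helper. It is an assumption that the product reports measures like these, since the measures were not named in the conversation.

```python
def rule_metrics(item_sets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper for illustration. Returns None when a measure
    is undefined (no records, or antecedent/consequent never occur).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(item_sets)
    n_a = sum(1 for s in item_sets if antecedent <= s)
    n_c = sum(1 for s in item_sets if consequent <= s)
    n_both = sum(1 for s in item_sets if (antecedent | consequent) <= s)
    if n == 0 or n_a == 0 or n_c == 0:
        return None
    support = n_both / n                 # share of records matching the whole rule
    confidence = n_both / n_a            # P(consequent | antecedent)
    lift = confidence / (n_c / n)        # confidence relative to the baseline rate
    return support, confidence, lift


# Made-up records: (region, set of items bought).
records = [
    ("north", {"bread", "butter"}),
    ("north", {"bread", "butter", "milk"}),
    ("north", {"bread"}),
    ("north", {"milk"}),
    ("south", {"bread", "milk"}),
    ("south", {"butter"}),
    ("south", {"milk"}),
    ("south", {"bread", "butter"}),
]

# Strength of "bread -> butter" against the whole population...
whole = rule_metrics([items for _, items in records], {"bread"}, {"butter"})
# ...and against only the records a search for "north" would keep.
north = rule_metrics([items for region, items in records if region == "north"],
                     {"bread"}, {"butter"})
print("whole population:", whole)
print("north only:", north)
```

The same rule can look stronger or weaker depending on which set it is measured against, which is why being able to switch the baseline with a search is useful.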
The product has a secure API that allows you to push the data up and then download the analysis, making it easy to integrate with third party applications etc.
The product is visually very appealing and looks very easy to use – delivering data mining results without a need for a lot of data mining know-how. A very nice option for those who want to do more than report on their data but don’t have the time, budget or skill set to use an advanced analytic workbench. Here is a link to their free online version: http://www.data-applied.com/Web/TryNow/Overview.aspx