I recently got an update from TIBCO Spotfire, with my usual focus on operational decisions. Spotfire aims at what it calls the “analysis gap”. Its customers typically have applications that are tailored and customized but too expensive to change; reporting that is widely distributed but inefficient, prompting users to turn to additional tools when their needs change; and data mining or predictive analytic tools that are powerful but complex. Spotfire aims to fill this gap, and increasingly to provide predictive analytics while doing so.
S+, which is based on the same underlying language as R, was acquired by TIBCO some time back. S+ has many users and TIBCO continues to develop it. For some time Spotfire has been embedding S+ predictive analytics into its visual applications, its bread-and-butter functionality. It has now made it possible to package any statistical method or script so that someone working with the visualizations can easily include it in an analysis. Meanwhile, R has exploded in popularity over the last few years, bringing a great deal of energy, new algorithms and more.
To address the growing demand for predictive analytics and the increased interest in R, TIBCO has developed Spotfire Statistics Services, which integrates R or S+ analytics interchangeably into Spotfire. This is designed to let TIBCO continue to develop S+ while embracing R, and to make it easier to bring analytics into the overall TIBCO product set. Because it uses an SOA approach, a single statistical model can be developed once and then used in visual applications and/or called programmatically: “write once, call anywhere”. This would allow, for instance, a model that predicts the fraud risk of a claim to be used directly in the real-time processing of new claims and also as part of a visualization of the claims portfolio as a whole.
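To make “write once, call anywhere” concrete, here is a minimal Python sketch that assumes nothing about the actual Spotfire Statistics Services API. A model is registered once under a name in a hypothetical in-process registry (`MODEL_REGISTRY`, `register`), and two different callers, a real-time claim scorer and a batch portfolio scorer, look up and use the same function. The fraud rules and point values are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a statistics service: each model is
# registered once under a name and then looked up by any caller.
MODEL_REGISTRY: Dict[str, Callable[[dict], int]] = {}


def register(name: str):
    """Decorator that publishes a scoring function under `name`."""
    def wrap(fn: Callable[[dict], int]) -> Callable[[dict], int]:
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap


@register("claim_fraud_risk")
def claim_fraud_risk(claim: dict) -> int:
    """Toy fraud score from 0 to 100; the rules and points are invented."""
    score = 10
    if claim["amount"] > 10_000:
        score += 40
    if claim["days_since_policy_start"] < 30:
        score += 30
    return min(score, 100)


def score_new_claim(claim: dict) -> int:
    """Real-time path: score one incoming claim."""
    return MODEL_REGISTRY["claim_fraud_risk"](claim)


def score_portfolio(claims: List[dict]) -> List[int]:
    """Batch path: score every claim, e.g. to feed a portfolio visualization."""
    model = MODEL_REGISTRY["claim_fraud_risk"]
    return [model(c) for c in claims]


if __name__ == "__main__":
    new_claim = {"amount": 25_000, "days_since_policy_start": 12}
    print(score_new_claim(new_claim))  # 80
    print(score_portfolio([new_claim, {"amount": 500, "days_since_policy_start": 400}]))  # [80, 10]
```

In a real SOA deployment the registry would be a network service rather than a dictionary, but the design point is the same: callers depend on the model's name and inputs, not on how it is implemented.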
An analyst can develop a modeling workflow in S+ or R and then register it with Spotfire Statistics Services so that the server knows what data it requires and what outputs it produces. The service is then available in Spotfire for inclusion without coding: the product maps the data already being manipulated in a report or visualization to the required inputs, and can then include the defined outputs as data in the report or visualization. So simulations, segmentation or other predictive models can run in the background on a dataset being manipulated in Spotfire, without requiring the user to have any special knowledge of statistics.
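The register-then-map idea can be sketched as a service descriptor that declares the inputs a workflow requires and the outputs it produces, plus a step that maps a report's own column names onto those inputs. Everything here (`ServiceSpec`, `run_service`, the toy `segment` function, the column names) is hypothetical and is not Spotfire's API; the sketch only shows how a mapping, rather than the user, bridges report data and model inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceSpec:
    """Hypothetical descriptor for a registered analytic workflow."""
    name: str
    inputs: List[str]    # input names the workflow requires
    outputs: List[str]   # output names it returns per row
    fn: Callable[[List[dict]], List[dict]]


def run_service(spec: ServiceSpec, rows: List[dict],
                column_map: Dict[str, str]) -> List[dict]:
    """Map report columns to the service's inputs, run it, append its outputs."""
    missing = [i for i in spec.inputs if i not in column_map]
    if missing:
        raise ValueError(f"{spec.name}: unmapped inputs {missing}")
    model_rows = [{i: row[column_map[i]] for i in spec.inputs} for row in rows]
    results = spec.fn(model_rows)
    return [{**row, **{o: res[o] for o in spec.outputs}}
            for row, res in zip(rows, results)]


def segment(rows: List[dict]) -> List[dict]:
    """Toy segmentation written against the service's own input names."""
    return [{"segment": "high" if r["annual_spend"] >= 1000 else "low"}
            for r in rows]


if __name__ == "__main__":
    spec = ServiceSpec("segmentation", ["annual_spend"], ["segment"], segment)
    report = [{"Customer": "A", "Spend ($)": 1500},
              {"Customer": "B", "Spend ($)": 200}]
    for row in run_service(spec, report, {"annual_spend": "Spend ($)"}):
        print(row)
```

Declaring inputs and outputs up front is what lets the host tool validate the mapping before running anything, which is why an unmapped input raises an error instead of failing inside the model.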
Here’s an example scenario.
- Take a set of customers, some of whom have a particular product and some of whom do not.
- A predictive model might be built to predict the likelihood that a customer has that product (useful for targeting cross-sell offers for that product at new customers).
- Spotfire visualization tools let the modeler investigate the various attributes available, and provide tools for identifying the most significant ones.
- The user can select the target variable and the attributes to use, then apply standard R or S+ algorithms (they showed me a decision tree) to build a model.
- The results can be analyzed and visualized in the tool, for instance by seeing how the population varies across the nodes of the tree. This makes it clear how the groups identified by the decision tree break down by product ownership. A nice use of visualization as part of modeling.
- TIBCO Business Events can have rule functions that import and use rules generated from Spotfire, turning the decision tree into a ruleset that can be loaded directly into TIBCO Business Events. The user can then edit and deploy these rules, which then execute in TIBCO Business Events, using the bus, monitoring the event stream and so on. A nice use of business rules as a deployment platform for analytics.
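The step of turning a decision tree into rules can be illustrated with a short Python sketch. It assumes a simplified representation (a binary tree with `<=`/`>` splits and hypothetical `Node`, `tree_to_rules`, `format_rule` and `fire` helpers), not the format Spotfire exports or Business Events imports: each root-to-leaf path becomes one if-then rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]       # (feature, "<=" or ">", threshold)
Rule = Tuple[List[Condition], str]       # (conditions, predicted outcome)


@dataclass
class Node:
    """Binary tree node: a split if `prediction` is None, otherwise a leaf."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # taken when feature <= threshold
    right: Optional["Node"] = None   # taken when feature > threshold
    prediction: Optional[str] = None


def tree_to_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into one if-then rule."""
    if node.prediction is not None:
        return [(list(path), node.prediction)]
    return (tree_to_rules(node.left, path + ((node.feature, "<=", node.threshold),))
            + tree_to_rules(node.right, path + ((node.feature, ">", node.threshold),)))


def format_rule(rule: Rule) -> str:
    """Render a rule as readable text, e.g. for a business user to edit."""
    conditions, outcome = rule
    text = " and ".join(f"{f} {op} {v:g}" for f, op, v in conditions)
    return f"if {text} then owns_product = {outcome}"


def fire(rules: List[Rule], record: dict) -> Optional[str]:
    """Return the outcome of the first rule whose conditions all hold."""
    for conditions, outcome in rules:
        if all(record[f] <= v if op == "<=" else record[f] > v
               for f, op, v in conditions):
            return outcome
    return None


if __name__ == "__main__":
    tree = Node("age", 40,
                left=Node(prediction="no"),
                right=Node("balance", 5000,
                           left=Node(prediction="no"),
                           right=Node(prediction="yes")))
    rules = tree_to_rules(tree)
    for rule in rules:
        print(format_rule(rule))
    print(fire(rules, {"age": 55, "balance": 8000}))  # yes
```

Because the paths of a tree are mutually exclusive, at most one rule matches any record, so the rules can be edited or reordered individually without changing what the original tree predicts.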
This is one scenario, but there are others. While there is not yet a globally shared set of metadata between the products, Business Events allows Spotfire to query objects held in memory. This gives the modeling environment access to the cached data, so models can be built directly against it, eliminating the need for data mapping. PMML export is also supported from the S+ tools, with some limited import possible on the Business Events side. For models that cannot be reduced to rules, the Statistics Services engine can make calculations available to Business Events, for example by exposing a web service that returns a risk score or the result of a neural net.
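As an illustration of the PMML route, the following sketch serializes a small decision tree as a PMML `TreeModel` using only Python's standard `xml.etree.ElementTree`. It assumes PMML 4.4 and two-way splits on continuous fields; the tree, field names and the helpers `_add_node` and `tree_to_pmml` are made up, the output has not been validated against the PMML schema, and it is not what the S+ exporter actually emits.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def _add_node(parent: ET.Element, tree: dict, predicate) -> None:
    """Append a PMML <Node>; `predicate` is None for the root (always true)."""
    node = ET.SubElement(parent, "Node")
    if "score" in tree:                        # leaf: carries the prediction
        node.set("score", tree["score"])
    if predicate is None:
        ET.SubElement(node, "True")
    else:
        field, operator, value = predicate
        ET.SubElement(node, "SimplePredicate",
                      field=field, operator=operator, value=str(value))
    if "feature" in tree:                      # split: two child nodes
        f, t = tree["feature"], tree["threshold"]
        _add_node(node, tree["left"], (f, "lessOrEqual", t))
        _add_node(node, tree["right"], (f, "greaterThan", t))


def tree_to_pmml(tree: dict, inputs: list, target: str, classes: list) -> str:
    """Build a minimal PMML document containing one classification TreeModel."""
    root = ET.Element("PMML", xmlns=PMML_NS, version="4.4")
    ET.SubElement(root, "Header", description="decision tree export sketch")
    dd = ET.SubElement(root, "DataDictionary", numberOfFields=str(len(inputs) + 1))
    for name in inputs:
        ET.SubElement(dd, "DataField", name=name, optype="continuous", dataType="double")
    target_field = ET.SubElement(dd, "DataField", name=target,
                                 optype="categorical", dataType="string")
    for c in classes:
        ET.SubElement(target_field, "Value", value=c)
    model = ET.SubElement(root, "TreeModel", functionName="classification")
    schema = ET.SubElement(model, "MiningSchema")
    for name in inputs:
        ET.SubElement(schema, "MiningField", name=name)
    ET.SubElement(schema, "MiningField", name=target, usageType="target")
    _add_node(model, tree, None)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tree = {"feature": "age", "threshold": 40,
            "left": {"score": "no"},
            "right": {"feature": "balance", "threshold": 5000,
                      "left": {"score": "no"}, "right": {"score": "yes"}}}
    print(tree_to_pmml(tree, ["age", "balance"], "owns_product", ["no", "yes"]))
```

The appeal of an interchange format like this is that the consuming engine only needs a PMML reader, not any knowledge of the tool that trained the model.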
Spotfire, of course, also allows alternative models to be compared, scenarios to be tested and so on in the visualization tool. Business Events is good at monitoring real-time data, so it can be used not only to execute rules but also to track model performance or catch unexpected data changes that violate model assumptions. It can even invoke Spotfire to display the monitored data at issue, adding contextual data from other sources for root cause analysis or for rebuilding the model.
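Tracking model performance and assumption violations on an event stream can be sketched as two checks: are incoming values inside the ranges seen at training time, and has recent accuracy fallen well below the baseline? The `ModelMonitor` class, its window size and its tolerance are hypothetical choices for illustration, not Business Events features.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class ModelMonitor:
    """Hypothetical stream monitor for one deployed model."""
    training_ranges: Dict[str, Tuple[float, float]]  # (min, max) seen in training data
    baseline_accuracy: float                         # accuracy when the model was built
    window: int = 100                                # outcomes kept for the rolling check
    tolerance: float = 0.10                          # allowed drop before alerting
    outcomes: Deque[bool] = field(init=False)

    def __post_init__(self) -> None:
        # A bounded deque silently drops the oldest outcome once full.
        self.outcomes = deque(maxlen=self.window)

    def out_of_range(self, event: dict) -> List[str]:
        """Features whose value lies outside the training range (an assumption violation)."""
        return [f for f, (lo, hi) in self.training_ranges.items()
                if not lo <= event[f] <= hi]

    def record_outcome(self, predicted: str, actual: str) -> None:
        """Log whether a prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def performance_degraded(self) -> bool:
        """True once a full window's accuracy falls below baseline minus tolerance."""
        if len(self.outcomes) < self.window:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline_accuracy - self.tolerance


if __name__ == "__main__":
    monitor = ModelMonitor({"age": (18, 90), "balance": (0, 250_000)},
                           baseline_accuracy=0.85, window=4)
    print(monitor.out_of_range({"age": 17, "balance": 1_000}))  # ['age']
    for predicted, actual in [("yes", "yes"), ("no", "yes"), ("no", "no"), ("yes", "no")]:
        monitor.record_outcome(predicted, actual)
    print(monitor.performance_degraded())  # True: 2/4 = 0.5 < 0.75
```

Either signal could then trigger the hand-off described above, such as opening the offending data in a visualization for root cause analysis.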
Hopefully I will get an update from the Business Events team before too long to complete what is a very interesting picture.