I recently got a chance to catch up with the IBM SPSS team for an update. Analytics, in IBM’s view and mine, are increasingly necessary as digitization increases the scale of business data and digital disruptors make good decisions harder to reach. For those being disrupted, analytics offers a powerful way to fight back. CEOs who are outperforming in this difficult environment are focusing increasingly on predictive analytics (not just analytics) and on streaming/operationalized solutions (not just visualization). In this environment IBM wants to offer a comprehensive platform for analytics, with data connectors for all kinds of data, data preparation, analytics at scale, and insight to action through deployment. The full suite includes:
- IBM SPSS Predictive Analytics (Last review here)
  - Statistics
  - Modeler
  - Analytic Server
- IBM Prescriptive Analytics
  - CPLEX Optimization Studio
  - Decision Optimization Center
  - Decision Optimization on Cloud (Reviewed here)
Plus there’s pre-configured and configurable content on Customer Analytics, Operational Analytics and Threat/Fraud Analytics. All of these (SPSS Modeler, the decision management capabilities and the optimization engine) are part of IBM SPSS Modeler Gold.
The Predictive Analytics stack is focused on creating value faster by offering a mix of long-standing and new capabilities:
- Simplified, scalable, code-free deployment
- Advanced Model Management including Champion/Challenger
- In-database and In-Hadoop modeling
- Batch/Real Time/Streaming deployment
- Analytic Decision Management deployment
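To make the Champion/Challenger idea in the list above concrete, here is a minimal sketch in plain Python. It is not the SPSS model management API. The names `Candidate`, `accuracy`, `pick_champion` and the `min_gain` threshold are all hypothetical, and the "models" are toy threshold rules. The point is only the pattern: score the current champion and each challenger on the same holdout data, and promote a challenger only if it wins by a clear margin.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A "model" here is just a function from a feature row to a class label.
Model = Callable[[Sequence[float]], int]


@dataclass
class Candidate:
    name: str
    predict: Model


def accuracy(model: Model, rows: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of holdout rows the model labels correctly."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def pick_champion(
    champion: Candidate,
    challengers: List[Candidate],
    rows: List[Sequence[float]],
    labels: List[int],
    min_gain: float = 0.01,  # hypothetical margin a challenger must win by
) -> Tuple[Candidate, float]:
    """Return the candidate that should be the champion, with its score."""
    best, best_score = champion, accuracy(champion.predict, rows, labels)
    for challenger in challengers:
        score = accuracy(challenger.predict, rows, labels)
        # Promote only on a clear improvement over the current best.
        if score >= best_score + min_gain:
            best, best_score = challenger, score
    return best, best_score


# Toy holdout data: one feature, label is 1 when the feature is "high".
rows = [[0.2], [0.4], [0.6], [0.8]]
labels = [0, 0, 1, 1]

current = Candidate("champion", lambda r: int(r[0] > 0.7))
challenger_a = Candidate("challenger_a", lambda r: int(r[0] > 0.5))
challenger_b = Candidate("challenger_b", lambda r: int(r[0] > 0.3))

winner, score = pick_champion(current, [challenger_a, challenger_b], rows, labels)
print(winner.name, score)
```

In a real deployment the comparison would typically run on fresh production data and use a business metric rather than raw accuracy, but the promote-on-clear-win logic is the same.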
One of the key areas of focus is scaling these capabilities on big data systems, because customers overwhelmingly intend to deploy to Hadoop, Spark, cloud and streaming environments and the products have to reflect that. IBM SPSS has two approaches to scale:
- Parallelism with support for Hadoop, Spark and streaming
- In-database across a wide range of database technologies
Spark is clearly a critical element, and IBM is making a large commitment to it. IBM SPSS allows users to deploy models on Spark clusters, for instance. Recently IBM has made more of the algorithms in SPSS massively parallel so that they scale up to support Big Data volumes without the need for Analytic Server. New algorithms have also been added in the area of geospatial analytics.
IBM is also focused on involving developers, data scientists and business analysts in the predictive analytic process. This means allowing users of the Watson Analytics smart data discovery environment to collaborate with those using more advanced predictive analytics tools like SPSS Modeler. Some of the same underlying technology is used in Watson Analytics, albeit with a different UI, and the intent is to allow users of Watson Analytics to access models developed using the more robust workflow management in SPSS Modeler.
Open Source is, of course, a big deal in analytics, so SPSS has been supporting R, Python and Spark. These can be scripted directly, but data scientists can also encapsulate this code behind a simple UI and make it available as a node in SPSS Modeler. An increasing array of these extensions is available in the IBM SPSS Predictive Analytics Gallery. Several of these also use the Watson APIs. Various Python and other extensions can also be loaded into the Modeler environment to make it easier to use a wider range of open source algorithms and scripting approaches in the workflows being managed in SPSS Modeler.
From a deployment perspective, Predictive Analytics on Bluemix allows models to be easily deployed to and then used in the cloud. The developer just needs to have access to the model project and they can create a scoring service in the cloud.
IBM has also recently launched the Data Science Experience, leveraging RStudio, notebooks and more. This is a web-based community environment for programmers and “hackers”, with a strong focus on downloadable examples, tutorials and community content. All of the open source tooling can be integrated into the notebook metaphor, and apps can be created using Shiny. It is primarily focused on exploratory data science, so deployment today means taking the scripts and loading them into SPSS; the different environments have a shared understanding of open source scripting languages. IBM sees this as complementary to SPSS Modeler and sees more integration and overlap in the future.
You can get details on recent additions to SPSS Modeler here.