It has been a while since my last briefing on IBM SPSS Modeler, and IBM recently gave me an update. IBM SPSS Modeler is, of course, IBM’s primary data mining and predictive analytics workbench. It uses a standard workflow metaphor, letting you string together nodes that process data, run algorithms, score data and so on. Both structured and unstructured data are supported, analytic tasks can be pushed back into your database or data warehouse, and IBM SPSS Modeler produces the models that are consumed in IBM Analytical Decision Management. IBM SPSS Modeler comes in Professional and Premium editions, with the Premium edition adding support for unstructured data through text analytics, as well as entity analytics and social network analysis.
The most recent release is IBM SPSS Modeler 15. This added improvements in four areas:
- Entity Analytics
This functionality identifies whether two entities are really the same or not. Organizations often struggle with multiple entries that should be linked but are not: multiple CRM records for the same customer, for instance. In some scenarios, like fraud, organizations have to find links that someone is deliberately obscuring. Entity Analytics uses “context accumulation” (the consideration of the things around something) to make the resulting matches more actionable. In IBM SPSS Modeler, the Entity Analytics engine maps various data sources into the repository (to identify which fields are meant to contain similar data, like a phone number or middle name) and then matches entities across the different data sources to create a resolved or composite entity. Nodes in the workflow can process new cases and update the repository, or take all of the resolved entities and process them through additional nodes. Users can control how aggressive the matching is (how close a match is required), and the engine continually reconsiders its matches as new data arrives.
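To make the map-then-match idea concrete, here is a minimal Python sketch. It is not how the Entity Analytics engine works internally (IBM did not describe that); the field names, the equal weighting of name and phone, and the `threshold` parameter are all assumptions made for illustration. It maps two hypothetical sources onto shared features, scores pairs of records, and merges records that score above the threshold into composite entities.

```python
from difflib import SequenceMatcher

# Hypothetical field mappings: each source's column names -> shared feature names.
FIELD_MAPS = {
    "crm": {"full_name": "name", "tel": "phone"},
    "billing": {"customer": "name", "phone_no": "phone"},
}


def normalize(source, record):
    """Map a source record onto the shared features and clean the values."""
    mapping = FIELD_MAPS[source]
    out = {mapping[f]: v for f, v in record.items() if f in mapping}
    out["name"] = out.get("name", "").strip().lower()
    out["phone"] = "".join(ch for ch in out.get("phone", "") if ch.isdigit())
    return out


def similarity(a, b):
    """Score two normalized records between 0 and 1 (assumed equal weights)."""
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    phone_score = 1.0 if a["phone"] and a["phone"] == b["phone"] else 0.0
    return 0.5 * name_score + 0.5 * phone_score


def resolve(records, threshold):
    """Group record indexes into composite entities using union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


records = [
    normalize("crm", {"full_name": "Jon Smith", "tel": "555-0100"}),
    normalize("billing", {"customer": "John Smith", "phone_no": "(555) 0100"}),
    normalize("crm", {"full_name": "Mary Jones", "tel": "555-0199"}),
]

print(resolve(records, threshold=0.8))   # looser matching
print(resolve(records, threshold=0.99))  # stricter matching
```

Raising the threshold plays the role of a less aggressive setting: the two Smith records merge at 0.8 but stay separate at 0.99. Re-running `resolve` whenever records are added is a crude stand-in for reconsidering matches as new data arrives.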
- Big Data Analytics
IBM SPSS Modeler is typically deployed as a client-server architecture and supports in-database mining using SQL pushback. The latest release added support for new databases such as SAP HANA and EMC Greenplum and allows users to leverage database UDFs in a model stream. Additional support for in-database algorithms from IBM Netezza was added in version 15, extending existing support (IBM InfoSphere Warehouse, Oracle Data Miner and Microsoft SQL Server algorithms are also supported). Release 15 also added more support for in-database scoring through generated UDFs using Scoring Adapters (previously only certain models could be pushed back for in-database scoring), with support for Teradata, IBM Netezza and DB2 for z/OS.
- Predictive Techniques and Visualizations
Social network analysis was added to identify groups and the leaders of those groups (group analysis) from connection data, such as call detail records. It can also use existing churn information to see who the churner might influence to also leave (diffusion analysis). Generalized Linear Mixed Models, previously supported in IBM SPSS Statistics, were added, as were various mapping visualizations (coordinates, regions, minicharts on maps).
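As a rough illustration of group analysis and diffusion analysis on call detail records, the Python sketch below treats calls as an undirected graph. The algorithms IBM SPSS Modeler actually uses were not described, so the choices here are simplifying assumptions: groups are connected components, the leader is the member with the most distinct contacts, and the people a churner might influence are simply their direct contacts. The call data and function names are hypothetical.

```python
from collections import defaultdict


def build_graph(calls):
    """Build an undirected contact graph from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def find_groups(graph):
    """Return groups as connected components (a simplifying assumption)."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        members = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        groups.append(members)
    return groups


def leader(graph, members):
    """Pick the member with the most distinct contacts; ties go alphabetically."""
    return max(sorted(members), key=lambda n: len(graph[n]))


def at_risk(graph, churner):
    """Direct contacts of a churner, as a crude stand-in for diffusion."""
    return sorted(graph.get(churner, set()))


# Hypothetical call detail records: (caller, callee).
calls = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("eve", "fay"),
]

graph = build_graph(calls)
for members in find_groups(graph):
    print(sorted(members), "leader:", leader(graph, members))
print("contacts of churner bob:", at_risk(graph, "bob"))
```

A real diffusion model would weight contacts by call frequency or duration and propagate influence beyond direct neighbors; this sketch only shows the shape of the inputs and outputs.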
- Various Productivity Enhancements
This release also includes various usability improvements, better handling of stream parameters and data import, among other things, and improved integration with IBM SPSS Statistics and IBM Cognos Business Intelligence.
IBM SPSS Modeler is one of the products in our Decision Management Systems Platform Technologies report and you can get more information on IBM SPSS Modeler here.