I spoke to Zementis back in June of 2011 and got an update on their Universal PMML Plug-in, among other things. Since then they report growing client interest, with a particular focus on real-time decision-making that uses real-time scoring, for instance in fraud detection. They have also been updating their products. ADAPA, their analytic decision deployment infrastructure, has been expanded, and they have just released ADAPA 3.5. This release includes support for model ensembles, segmentation, chaining and composition: multiple models used together as a set to improve predictive power. For instance, a decision tree combined with a set of regression models, one for each node of the tree, allows a customer to be placed in a useful segment and then scored in a way that is highly predictive for that particular segment. ADAPA's support for this includes weighted and balanced ensemble models.
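To make the segmentation idea concrete, here is a minimal Python sketch of a tree-plus-regressions model and a simple weighted-average ensemble. It is not ADAPA code and not PMML; it only illustrates, under stated assumptions, the kind of logic that a PMML MiningModel with segmentation expresses declaratively. The field names (tenure_months, monthly_spend), the hand-written tree, the segment names and all coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical customer record: field names are illustrative only.
Customer = Dict[str, float]


@dataclass
class LinearModel:
    """A simple linear regression: intercept + sum(coef * feature)."""
    intercept: float
    coefs: Dict[str, float]

    def score(self, c: Customer) -> float:
        return self.intercept + sum(w * c[name] for name, w in self.coefs.items())


def segment(c: Customer) -> str:
    """Stand-in for a trained decision tree: routes a customer to a segment."""
    if c["tenure_months"] < 12:
        return "new"
    return "loyal_high_spend" if c["monthly_spend"] >= 100 else "loyal_low_spend"


# One regression per segment (coefficients are made up for illustration).
SEGMENT_MODELS: Dict[str, LinearModel] = {
    "new": LinearModel(0.30, {"monthly_spend": 0.002}),
    "loyal_high_spend": LinearModel(0.05, {"monthly_spend": 0.001}),
    "loyal_low_spend": LinearModel(0.10, {"tenure_months": -0.001}),
}


def segmented_score(c: Customer) -> float:
    """The tree picks the segment, then that segment's regression scores it."""
    return SEGMENT_MODELS[segment(c)].score(c)


def weighted_average(
    models: List[Tuple[float, Callable[[Customer], float]]], c: Customer
) -> float:
    """Weighted-average ensemble: combine several models' scores by weight."""
    total_weight = sum(w for w, _ in models)
    return sum(w * m(c) for w, m in models) / total_weight
```

For a customer with 6 months of tenure and a monthly spend of 50, the tree selects the "new" segment and its regression gives 0.30 + 0.002 × 50, roughly 0.40. Keeping the routing (the tree) separate from the per-segment models mirrors the point in the paragraph above: each segment gets a model tuned to it, and the ensemble function shows one simple way several scores can be combined.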
Interestingly, support for ensemble models, which had previously been added to PMML 4.0 (the Predictive Model Markup Language that Zementis and others use to move predictive analytic models from the modeling environment to deployment), is now being extended in PMML 4.1. PMML 4.1 is described here; it also adds some new model types, including scorecards with reason codes, while further improving pre- and post-processing support.
Zementis’ focus on PMML means they take a vendor-neutral approach to modeling: any model from any vendor can be deployed. Today they offer four deployment options: ADAPA for real-time decision making, deployed in the cloud, embedded or on a server, plus their Universal PMML Plug-in for in-database deployment. The plug-in allows PMML models to be pushed into the database as a function and is supported by partners EMC Greenplum and Sybase IQ. In addition, a partnership with Datameer allows predictive analytic models to be deployed to Hadoop environments. Zementis hopes and expects to add more deployment options using this plug-in.
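The idea of pushing a model "into the database as a function" can be illustrated with Python's built-in sqlite3 module, which lets you register a Python callable as a SQL function. This is only an analogy under stated assumptions: it is not the Universal PMML Plug-in, and it says nothing about how Greenplum or Sybase IQ implement this. The churn_score function, its coefficients and the customers table are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def churn_score(tenure_months: float, monthly_spend: float) -> float:
    """Hypothetical scoring function standing in for a deployed model."""
    return 0.30 + 0.002 * monthly_spend - 0.001 * tenure_months


def score_customers(conn: sqlite3.Connection) -> List[Tuple[int, float]]:
    """Score every row in place with a plain SQL query."""
    cur = conn.execute(
        "SELECT id, churn_score(tenure_months, monthly_spend) "
        "FROM customers ORDER BY id"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# Register the model as a two-argument SQL function, so queries can call it
# right next to the data instead of exporting rows to a separate scoring engine.
conn.create_function("churn_score", 2, churn_score)

conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, "
    "tenure_months REAL, monthly_spend REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 6, 50), (2, 24, 150)],
)
results = score_customers(conn)
```

The design point is that the data never leaves the database: the query applies the model row by row, which is the appeal of in-database deployment for large tables.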
Zementis is at the forefront of one of the main debates about PMML. A typical predictive analytic model involves a large amount of pre-processing: data is transformed from the way it is stored into more analytically meaningful attributes that are then used in the model. The challenge this creates is that moving the model definition is only half the battle, because the model needs attributes that may not be readily available in the production environment and must be derived on the fly. PMML 4.0 added broader support for these transformations, and a growing number of modeling tools include them when they export PMML. Some, such as KNIME and IBM/SPSS, already provide extensive support for PMML pre-processing. Even when the transformations are included in the model, the way these attributes should be calculated in production systems may not match how the modeler thinks about them; the IT department may have different ideas about how to derive them more efficiently. Zementis tries to accommodate both approaches: it supports the standard PMML transformation definitions and also provides a tool for adding pre-processing definitions to PMML files that lack them.
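The tension between the modeler's definition of a derived attribute and IT's more efficient implementation can be sketched in a few lines of Python. The example is hypothetical: the Transaction record, the two attributes (avg_amount, days_since_last) and the -1 sentinel for "no history" are assumptions made for illustration, not PMML constructs. The modeler derives attributes by scanning the full transaction history; IT maintains running aggregates as transactions arrive. Whichever approach is used, production must produce the same values the model was trained on.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Transaction:
    day: date
    amount: float


def derive_features_modeler(txns: List[Transaction], today: date) -> Dict[str, float]:
    """Modeler's definition: derive attributes by scanning the full history."""
    amounts = [t.amount for t in txns]
    return {
        "avg_amount": sum(amounts) / len(amounts) if amounts else 0.0,
        # -1 is a hypothetical sentinel meaning "no transactions yet".
        "days_since_last": (today - max(t.day for t in txns)).days if txns else -1,
    }


@dataclass
class RunningTotals:
    """IT's alternative: keep aggregates up to date as transactions arrive."""
    count: int = 0
    total: float = 0.0
    last_day: Optional[date] = None

    def add(self, t: Transaction) -> None:
        self.count += 1
        self.total += t.amount
        self.last_day = t.day if self.last_day is None else max(self.last_day, t.day)


def derive_features_incremental(rt: RunningTotals, today: date) -> Dict[str, float]:
    """Same attributes, computed from the running aggregates in O(1)."""
    return {
        "avg_amount": rt.total / rt.count if rt.count else 0.0,
        "days_since_last": (today - rt.last_day).days if rt.last_day else -1,
    }
```

A simple check that both functions return the same dictionary for the same history is one way to catch the mismatch described above before a model goes live.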
I am a big fan of PMML, and I see increasing interest in it. The accelerating move to real-time scoring is driving a need to deploy models into production systems (rather than simply applying them in batch to existing data), and the need to model once yet deploy into heterogeneous environments makes an investment in a common deployment language pay off. The folks at Zementis would add that analytic modeling resources are scarce and that PMML lets you be flexible about which tools people use to build models: as long as a tool generates PMML, the result can be deployed.
Zementis will be one of the vendors listed in the forthcoming report on Decision Management Systems platform technologies.