≡ Menu

In-database analytics and Decision Management

Share

One of the hot topics these days in analytics is “in-database” or “in-warehouse” analytics. I have blogged about multiple products in this space including the work SAS has done with Teradata, Aster, DB2 , Netezza  and Greenplum, the work Revolution Analytics has done with Netezza, Angoss and FuzzyLogix (described in this report). With all this interest it seemed to me that I should write a piece on the role of in-database analytics and Decision Management.

First, let’s differentiate between the two major cases of in-database analytics – in-database model creation and in-database model execution.

In-database Model Creation

Model creation first. In a standard analytic environment an analyst extracts an analytic dataset from the database or data warehouse – integrating multiple sources, flattening the structure and cleaning things up – before creating potentially large numbers of variables from the data. These variables often collapse behavior over time into a variable – taking payment records and creating a variable “Number of times payment was late in the last 12 months” for instance. This enhanced analytic dataset (original data plus calculated variables) is then run through multiple modeling algorithms and either the best algorithm is selected or an ensemble of the models is built.

In-database model creation involves running some part of this process on the database server – either the data integration and variable creation process alone or the whole process. Not only does this use the server processing capacity it also eliminates the need to extract and move the data. By accessing the data directly from the same server performance can be significantly improved, especially in the integration and variable creation pieces, less so in the modeling piece. Integration, cleansing and variable creation can also be easily shared across multiple models, reducing the management and calculation overhead for multiple models.

In-database Model Execution

The second element is one of model execution – taking the model produced and calculating the result of the model for a specific record or group of records – a customer, a claim or all accounts in the system for instance. Historically lots of model algorithms are re-coded by IT departments for performance or scalability (or simply to allow it to run on their hardware). Some models can be exported as code and then executed while others are exported as Predictive Model Markup Language (PMML) and loaded into a Business Rules Management System or BRMS. Only once this is done can the score, segment assignment, association rules or whatever actually be used in a report or a system.

In-database analytics model execution means pushing this execution onto the database server also. This  means being able to score the records in the database or warehouse by running them through the algorithm and storing the result. More interestingly it can also mean being able to treat the model as though it is an attribute and calculate it live when the attribute is requested. So, for instance, the segment to which a customer belongs can be treated by calling applications as though it is a stored a data element even though it is calculated in real-time from a model when it is needed. In this approach the data is not persisted and the in-database engine acts like a layer on top of the stored data, enhancing it with the model results.

Some in-database partnerships or products do both model creation and model execution, some only do the model creation piece and some only do the analytic data set generation part of model creation. In theory some could do the model execution piece only but I am not aware of any. With that primer on in-database analytics, how does Decision Management work with in-database analytics?

Decision Management and In-database Analytics

Decision Management ensures that predictive analytic insights are effectively applied to the decisions that impact the business. The use of in-database analytics reduces the time to build and deploy models, making it quicker and easier to enhance a business decision with predictive analytic models. The support for rapid model updates and the ability to reuse and share elements of the model creation process also mean that models are likely to be refreshed and updated more often, reducing the problem of stale models that are no longer very predictive.

When the predictive analytic model is applied to the decision-making process, especially when this is described and managed in a business rules management system, the operational value becomes very clear to the business. In addition the need to conduct experiments, to test and learn, can be easily handled with Decision Services (which can manage the changing models and logic for these experiments) with easy collaboration across IT, the business and the analytic group.

When to Use a BRMS for Model Execution

If you have in-database analytics it is worthwhile  to re-think the balance of pushing models into the database for execution and exporting them using PMML to a BRMS. The former will work well for something like a Regression Model or Neural Network and for models where it is “all or nothing” and the business users are comfortable with the idea of a model as a “black box”. When you need to interact with the model (to see which association rules to use for instance) or when the visibility of the model is critical to the business, a BRMS is proven to be a very effective way to  deploy the models. Exposing the models as readable, manageable business rules makes it easier to gain acceptance and to integrate the model results with the rest of the business rules involved in a business decision. Regardless, building the model in-database is still worthwhile.

 

Personally I see the increasing support for in-database analytics as reflective of the increased interest across industries in using predictive analytics to improve operational decision making.

Share

Comments on this entry are closed.

  • Herve Dhelin May 3, 2011, 9:52 am

    You are right James, but it’s an old Data Mining functionality that some vendors are only supporting now. SPSS was supporting that within Modeler for years (maybe 10), long time before I joined SPSS… 🙂 I ran seminar on this subejct with a database vendor before the IBM acquisition…
    Regards from Paris
    Hervé

  • Michael Zeller May 5, 2011, 9:02 am

    Hi James,

    Excellent article, thanks for pointing out the Predictive Model Markup Language (PMML) standard as an important element which facilitates interoperability. It allows portability of data mining models not only across vendors and tools, but also enables the same models to support both concepts that you highlight – in-database execution and deployment as part of a business rules management system.

    Having one common standard (PMML) across the organization to support production deployment and operational integration of predictive analytics reduces complexity and accelerates time to market for better decisions — and that is where the true value of predictive analytics will materialize.

    One example for an in-database, “execution-only” partnership is the Zementis Universal PMML Plug-in for EMC Greenplum. As the name indicates, it is based on the PMML standard. . Please see the Zementis EMC Whitepaper

    Regards,
    Mike

  • Bala Narasimhan February 22, 2012, 4:53 am

    Hi James,
    This is indeed one of the best descriptions of in-database analytics and its benefits. I’d like to point your readers to ParAccel’s capabilities for in-database analytics. We introduced our extensibility framework in version 3.0 that allows users to embed analytics inside the platform and invoke them via standard SQL. In addition, we also ship with a number of analytic modules out of the box.

    We are now further extending it to be able to bring data from multiple sources easily and efficiently. This will allow one to further accelerate the model creation phase.

    Here is a link to ParAccel’s capabilities in this area:
    http://www.paraccel.com/resources/Datasheets/ParAccel-Extensibility-Framework.pdf