Charlie Berger of Oracle presented on Powering Next-Generation Predictive Applications with Oracle Data Mining (ODM). Charlie joined Oracle from Thinking Machines about a decade ago and has been putting machine learning algorithms into the Oracle kernel ever since. Data Mining, in database or otherwise, sifts through data to find hidden patterns, discover new insights and make predictions. For instance:
- Predict customer behavior (classification)
- Estimate a value (regression)
- Segment a population (clustering)
- Identify most important factors (attribute importance)
- Find profiles of targeted people (decision trees)
- …
He recommended the classic book Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management and then worked through an example of how ODM works on data in the database. The database contains records, or cases, some of which have a known target value. An ODM algorithm learns from these and predicts target values for the other records, each with a confidence. Doing this in the database improves performance and scalability and makes it easier to combine the results with other queries. I like this because it means the business rules that automate decisions can use these results too.
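To make "learn from the cases with known targets, then score the rest" concrete, here is a minimal Python sketch of a categorical Naive Bayes classifier. Naive Bayes is one of the classification algorithms ODM offers, but this is not Oracle's implementation: the function names (train_naive_bayes, predict), the one-dictionary-per-case layout and the smoothing choice are all assumptions made for illustration. Each scored case comes back with a predicted target value plus a confidence, mirroring the prediction-and-confidence pair described above.

```python
from collections import Counter, defaultdict


def train_naive_bayes(cases, target):
    """Hypothetical helper: count class frequencies and per-class
    attribute-value frequencies from cases whose target is known."""
    labeled = [c for c in cases if c.get(target) is not None]
    class_counts = Counter(c[target] for c in labeled)
    # value_counts[cls][attr][value] -> labeled cases of cls with that value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    attr_values = defaultdict(set)
    for c in labeled:
        for attr, value in c.items():
            if attr == target:
                continue
            value_counts[c[target]][attr][value] += 1
            attr_values[attr].add(value)
    return class_counts, value_counts, attr_values


def predict(model, case, target):
    """Return (predicted_class, confidence) for one case.

    Uses Laplace smoothing with one extra slot for unseen values, so an
    attribute value never seen with a class does not zero out that class.
    """
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(cls)
        for attr, value in case.items():
            if attr == target or attr not in attr_values:
                continue
            k = len(attr_values[attr])
            score *= (value_counts[cls][attr][value] + 1) / (n + k + 1)
        scores[cls] = score
    best = max(scores, key=scores.get)
    # Normalise the scores so the confidence is a probability in [0, 1].
    return best, scores[best] / sum(scores.values())


# Illustrative data: the last case has no target yet and gets scored.
cases = [
    {"age": "young", "income": "low", "buys_insurance": "no"},
    {"age": "young", "income": "high", "buys_insurance": "yes"},
    {"age": "old", "income": "high", "buys_insurance": "yes"},
    {"age": "old", "income": "low", "buys_insurance": "yes"},
    {"age": "young", "income": "low", "buys_insurance": "no"},
    {"age": "old", "income": "low", "buys_insurance": None},
]
model = train_naive_bayes(cases, "buys_insurance")
scored = [predict(model, c, "buys_insurance") for c in cases if c["buys_insurance"] is None]
```

Inside Oracle the equivalent step is a SQL query against a stored model rather than Python code; the sketch only shows the shape of the computation.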
ODM offers a dozen or so algorithms in 11g and has a new UI, Data Miner. This UI can also generate the SQL you need to use the models; once built, a model is used just like any other piece of SQL. This also means the models can be embedded in any kind of UI, for example in Oracle BI once a designer hooks them up, so the analytics can be delivered in reports and dashboards built with Oracle BI EE. ODM helps eliminate data movement and collapse latency (see this post of mine about Richard Hackathorn’s latency model). The database also supports a range of statistical functions, such as ranking, lag/lead, reporting and statistical aggregates, and descriptive statistics like min/max and mode/median. Having all this in the database means that things like adaptive control (A/B testing) can be analyzed there too.
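The ranking and lag/lead functions mentioned above are SQL analytic functions. As a hedged illustration of what they compute, the Python sketch below mimics the semantics of RANK() (ties share a rank and the next rank skips ahead) and LAG()/LEAD() over an already ordered series. The helper names are hypothetical; in the database you would write these as window functions inside the query itself.

```python
def sql_rank(values, descending=True):
    """Mimic SQL RANK(): tied values share a rank, and the next rank skips."""
    ordered = sorted(values, reverse=descending)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)  # keep the first position of each value
    return [first_pos[v] for v in values]


def sql_lag(values, offset=1, default=None):
    """Mimic SQL LAG(): the value `offset` rows earlier, else `default`."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def sql_lead(values, offset=1, default=None):
    """Mimic SQL LEAD(): the value `offset` rows later, else `default`."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


monthly_sales = [100, 120, 120, 90]
ranks = sql_rank(monthly_sales)
previous = sql_lag(monthly_sales)
# Month-over-month change; the first month has no predecessor.
change = [None if p is None else s - p for s, p in zip(monthly_sales, previous)]
```

For example, with monthly sales of [100, 120, 120, 90], sql_rank returns [3, 1, 1, 4], and subtracting the lagged series gives the month-over-month change [None, 20, 0, -30].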
Data Miner demo: you select raw data in the database and Data Miner displays a sample of it. Various tools are provided for looking at and drilling into the data, which helps you understand it. You can also simply ask the tool to identify what predicts whether people, say, buy an insurance product. It might use one of several algorithms to find the important attributes. This does no causal analysis, nor does it link attributes together, but it does generate insight automatically. If you want something more sophisticated, you can select specific algorithms to build a model, with control over each algorithm's settings, though the defaults are good. The results are presented as a model, along with all the usual data mining tools for assessing accuracy, predictive power and so on. The model is stored in the database and can be applied to new records, producing a prediction and a confidence for each. Reporting, dashboards or applications can all use the same predictive models. Very nice.
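The "what predicts buying an insurance product" step is an attribute importance ranking. ODM's own attribute importance algorithm is based on Minimum Description Length; the sketch below swaps in a simpler, well-known measure, information gain (the drop in target entropy after splitting the cases on one attribute), purely to show the shape of the computation. The function names and the one-dictionary-per-case layout are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of target values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(cases, attr, target):
    """Target entropy minus the weighted entropy left after grouping by attr."""
    groups = defaultdict(list)
    for c in cases:
        groups[c[attr]].append(c[target])
    remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return entropy([c[target] for c in cases]) - remainder


def rank_attributes(cases, target):
    """Hypothetical helper: (attribute, gain) pairs, most predictive first."""
    attrs = [a for a in cases[0] if a != target]
    scored = [(a, information_gain(cases, a, target)) for a in attrs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data: has_kids fully determines the target, region does not.
cases = [
    {"has_kids": "yes", "region": "north", "buys": "yes"},
    {"has_kids": "yes", "region": "south", "buys": "yes"},
    {"has_kids": "no", "region": "north", "buys": "no"},
    {"has_kids": "no", "region": "south", "buys": "no"},
]
ranked = rank_attributes(cases, "buys")
```

Like the tool in the demo, this scores attributes one at a time: it says nothing about causality and will miss attributes that only matter in combination.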
ODM may not replace a high-end analytic tool like SAS, at least not completely, but it does turn the database into an analytical database.