I got an update from Oracle on Oracle Data Mining (ODM) recently. ODM is an in-database data mining and predictive analytics engine that allows you to build and use advanced predictive analytic models on any data that can be accessed through your Oracle data infrastructure. I wrote about ODM at length last year in First Look – Oracle Data Mining, and since then Oracle has released ODM 11.2.
The fundamental architecture has not changed, of course. ODM remains a “database-out” solution, surfaced through SQL and PL/SQL APIs and executing in the database. It has the 12 algorithms and 50+ statistical functions I discussed before, and model building and scoring are both done in-database. Oracle Text functions are integrated so that text mining algorithms can take advantage of them. Additionally, because ODM mines star schema data, it can handle an unlimited number of input attributes, transactional data, and unstructured data such as CLOBs, tables or views.
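To make the "surfaced through SQL and PL/SQL" point a little more concrete, here is a minimal sketch in Python that composes the text of an anonymous PL/SQL block calling DBMS_DATA_MINING.CREATE_MODEL. The model, table and column names are hypothetical, the helper functions are my own illustration rather than part of any Oracle tool, and the sketch only builds the statement text. Running it against a database would need a connection and driver, which are not shown.

```python
import re

# Accept only plain, unquoted identifiers so the illustrative names
# cannot smuggle extra SQL into the generated text.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def _ident(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def create_model_block(model: str, table: str, case_id: str, target: str) -> str:
    """Return an anonymous PL/SQL block that builds a classification model.

    No settings table is passed, so the database would pick its default
    algorithm for the classification function.
    """
    return (
        "BEGIN\n"
        "  DBMS_DATA_MINING.CREATE_MODEL(\n"
        f"    model_name          => '{_ident(model)}',\n"
        "    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,\n"
        f"    data_table_name     => '{_ident(table)}',\n"
        f"    case_id_column_name => '{_ident(case_id)}',\n"
        f"    target_column_name  => '{_ident(target)}');\n"
        "END;"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_model_block("CHURN_MODEL", "CUSTOMERS_BUILD", "CUST_ID", "CHURNED"))
```

The point of the sketch is that a model build is just a PL/SQL call naming a table that already lives in the database, so no data has to be extracted to a separate analytics server.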
This release takes the preview GUI I discussed last time and officially releases it. The new GUI is an extension to SQL Developer 3.0 (which is available for free and has been downloaded by millions of SQL/database people). The “Classic” interface (wizard-based access to the APIs) is still available, but the new interface is much more in line with the state of the art as far as analytic tools go.
The GUI allows you to manage database Connections, within which there are Projects that in turn contain Workflows. These look very much like those in SAS Enterprise Miner or IBM SPSS Modeler, with a variety of action nodes that can be linked into a data mining flow. Workflows have immediate access to the tables, joins and views in the database, as well as to remote data being managed through the Oracle Database. Other nodes include those to create views, explore data, merge and clean data, transform and aggregate data, create models and so on. A filter node allows you to take out noisy data elements, drop attributes that aren’t important, sample the data and more. All the statistics needed by the analysis and reporting nodes are run inside the database too, and all the steps in the workflow produce SQL breadcrumbs.
Nodes like Classification run multiple models at the same time and allow defaults to be overridden so you can play with different modeling approaches. There are also some power user features to allow tweaking of parameters. The user can select one or more models to push on through the workflow. Once a model is ready it can be applied to a database table or embedded in the database as a live column. With Exadata, models are pushed down to the storage level for execution to maximize performance.
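As a rough illustration of what applying a model to a table looks like at the SQL level, the sketch below (again in Python, and again only composing text) builds a scoring query with Oracle's PREDICTION and PREDICTION_PROBABILITY SQL functions, then wraps it in a view. Treating a view as the "live column" is my assumption about one way to get scores computed at query time, not a description of what the GUI generates. All object names are hypothetical.

```python
import re

# Accept only plain, unquoted identifiers for the illustrative names.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def _ident(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def scoring_query(model: str, table: str, case_id: str) -> str:
    """Return a SELECT that scores every row of `table` with `model`."""
    m = _ident(model)
    return (
        f"SELECT {_ident(case_id)},\n"
        f"       PREDICTION({m} USING *) AS predicted_value,\n"
        f"       PREDICTION_PROBABILITY({m} USING *) AS probability\n"
        f"FROM {_ident(table)}"
    )


def scoring_view(view: str, model: str, table: str, case_id: str) -> str:
    """Wrap the scoring query in a view so scores are computed when queried."""
    return (
        f"CREATE OR REPLACE VIEW {_ident(view)} AS\n"
        f"{scoring_query(model, table, case_id)}"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(scoring_view("CUSTOMER_SCORES", "CHURN_MODEL", "CUSTOMERS_APPLY", "CUST_ID"))
```

Because the prediction is an ordinary SQL expression, it can sit in a SELECT list next to regular columns, which is what makes in-database scoring easy to reuse from reports and applications.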
The extension includes help for data miners to use the tool plus training “Cue Cards” and sample data as well as an embedded “infomercial” to introduce data mining concepts to DBAs and others who might come across it while using SQL Developer. There are also an increasing number of training classes available online.
One of the most interesting facts about ODM remains its use in other Oracle solutions. As noted previously, it is being embedded in various applications, from Fusion Human Capital Management to Oracle Business Intelligence Enterprise Edition (OBIEE). There is also some ongoing work with Oracle Real Time Decisions (RTD); see my review of the most recent version of RTD. This, plus sharing the BI sales team, is increasing uptake and broadening the reach of ODM both within Oracle and across its customer base.
In-database analytics is a hot topic these days and the ability of ODM to build and execute models completely in-database makes it a good candidate for anyone interested in using predictive analytics to make the most of the operational data they have.