Marty Gubar presented on Deliver Depper Insight by Combining Data Mining and OLAP. Marty presented on Oracle’s analytic spectrum and how OLAP and Data Mining fit and can be combined. OLAP and data mining are embedded in the Oracle database and share security, the partitioning, ETL etc. All can be accessed using PL/SQL so cubes (OLAP) can be joined with data mining and spatial etc. This supports what Marty calls the Analysis Continuum:
- How are my value customers
Easy to answer, simple query to select customers based on total purchases say.
- Describe a high value customer, which customers will respond to a promotion or what product should be offered next to a customer
Harder and more interesting. Uses data mining to analyze detailed and customer-level data. Turn uncertainty into probability in other words. Finds hidden patterns to find anomalies, make predictions and make associations
- What is the contribution of my high value customers, what are the trends across segments and how will the new promotion affect the bottom line
OLAP being used to assess the broader impact on my company. Supports ad-hoc data exploration, simplifies complex queries and is faster than accessing a database directly. Of course this is analysis of the results of the actions I took as a result of the decisions I made so this is analyzing at one step removed, as it must.
Data mining can enrich OLAP with new metrics that can be analyzed in aggregate as well as focus on the important information (determined by data mining) not the noise
He then got into demo mode and walked through how a company might use OLAP and data mining to determine if there is potential fraud in expense reports. If expenses, for instance, are rising is this legitimate or is there fraud? If there is, what’s the impact? Process is basically:
- Start with expense reports in the database
Objects are: Expense_Normal (known OK expenses), Expense_View (all of the expenses), Potential_Fraud_Results, Expense_Fraud_View that joins the potential fraud table with the raw expense information to show potential fraud cost, Expense_Analysis
- Use data mining to do anomaly detection on each item
He makes a couple of selections based on what works and does not work in data mining, removing some attributes and reducing the outlier rate for instance. The data mining wizard for anomaly detection runs and builds the model.
- Calculate a potential fraud score for each item
The model is then applied to the main expense view. Creates a table with additional columns for potential fraud prediction and a probability measures for this. The new table and columns are accessible everywhere using SQL.
- Load potential fraud score and expense information into an OLAP cube
A 3 dimensional model (time, organization and expense category) is created against the data. Different dimensions have hierarchies (calendar or fiscal year hierarchy, expense category hierarchy). The dimensions are then mapped to a source, a table, and then the cube created.
- Use the cube to see what this tells you about potential fraud over time, over divisions, over categories etc.
Note that this scenario just looks for anomalies, there is no closed loop to actually mark those that were, in fact, fraudulent (not just anomalous) and use that to drive the fraud engine. The ability to have a set of rules that use the probability of fraud to drive an auto-approve decision was also not discussed, though clearly it would be easy to use the same data mining results.