Syndicated from Smart Data Collective
John Elder presented a collection of case studies to showcase the ROI of data mining. John started by making the point that many of his case studies were a technical success but not a business success, an interesting observation. John sees three major ways that predictive analytics can help: streamlining, eliminating the bad, or discovering the good.
John and his team do a tremendous amount of work with data mining and predictive analytic tools and know how well they work, but they also consider the human aspect critical. After all, computers are both powerful and mindless, so the human work of putting them to use is key.
Gartner publishes hype cycles for technologies, and data mining, unlike artificial intelligence, is on the plateau of productivity. The focus of data mining on bottom-line activities is part of why it is already considered productive. In addition, most corporate processes are already fairly well performed, so small improvements (using data mining) really matter. Cases on streamlining or automating decisions came first:
- HSBC wanted to cross-sell products and used their historical data to find out what might interest a customer next. They wanted to take a customer contact that was a pure cost and make it a benefit by targeting inbound calls or contacts at a branch. They used data mining and visualization to present new ideas to people.
- Anheuser-Busch wanted to see how their products are displayed in stores. Knowing this helps them see what works and does not work and helps them manage their products in a store. Used analytics to take an image and automate the definition of a plan for the shelf. Easier than other visual recognition because products and brands make it easy to spot what’s what. Got a 90% accuracy rate, dramatically improving the process.
- Lumidigm is a biometrics company that uses how your skin reflects infrared light to identify you. They originally wanted to use this to diagnose disease but found that person-specific factors were overwhelming it. Using the differences required analytics to predict how likely it is that someone is who they say they are. The fact that none of the models were 100% accurate did not mean it could not be used: Disney uses it for tying people to their tickets, for instance.
- Peregrine Systems wanted to develop a “Sim City” for IT that would let an IT department simulate the impact of staffing, service level agreements, etc. The analytics allowed IT departments to answer questions like where to add staff or what the impact of upgrading laptops would be. One of the key learnings was to keep uncertainty throughout the calculations.
- Social Security Administration wanted to improve a 2-year disability process where about a third were accepted and half of those declined succeeded on appeal. They needed a way to fast track “easy” applications. But what is an easy application? They used text mining of the application data to identify, with 90% accuracy, the 20% of cases that were easy.
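The Peregrine lesson about keeping uncertainty throughout the calculations can be illustrated with a small Monte Carlo sketch. This is not the model described in the talk. The ticket volumes, productivity figures, and function names below are all hypothetical, chosen only to show the idea of carrying a distribution through the calculation instead of collapsing each input to a single average.

```python
import random
import statistics


def simulate_backlog(staff, trials=5_000, seed=1):
    """Simulate the weekly ticket backlog for a given staff level.

    All numbers are hypothetical. Each trial draws its own inputs, so the
    result is a distribution of outcomes rather than one point estimate.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        tickets = rng.gauss(1000, 100)  # hypothetical weekly ticket volume
        per_person = rng.gauss(45, 5)   # hypothetical tickets closed per person
        outcomes.append(max(0.0, tickets - staff * per_person))
    return outcomes


def summarize(outcomes):
    """Report the median and the 90th percentile of the simulated outcomes."""
    ordered = sorted(outcomes)
    return {
        "median": statistics.median(ordered),
        "p90": ordered[int(0.9 * (len(ordered) - 1))],
    }


if __name__ == "__main__":
    for staff in (20, 25):
        print(staff, summarize(simulate_backlog(staff)))
```

Comparing the median with the 90th percentile for each staffing level shows how much of the answer would be lost by reporting only an average.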
The next two were about detecting and eliminating “bad” results:
- IRS wanted to detect fraud for a particular kind of refund. There were plenty of fraud examples in this case but the fraud was so easy to perpetrate that they were drowning in cases. They were finding 1 in 100 anyway but when they automated the detection they found 25 in every 100!
- Service fraud detection at a consumer electronics firm – warranty fraud. Got some tips from folks but not much else. Automated the decision to score claims and focused the investigators on the top ones. Recovered $20M in 9 months!
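Both fraud cases come down to the same pattern: score every case, rank by score, and send investigators to the top of the list. A minimal sketch of that pattern follows, assuming a model has already produced a score per claim. The `Claim` fields and function names are hypothetical and are not taken from either project.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    score: float    # hypothetical model score: higher means more suspicious
    is_fraud: bool  # known outcome, available only for historical claims


def top_claims(claims, k):
    """Return the k highest-scoring claims, most suspicious first."""
    return sorted(claims, key=lambda c: c.score, reverse=True)[:k]


def hit_rate(claims):
    """Fraction of the given claims that turned out to be fraudulent."""
    if not claims:
        return 0.0
    return sum(c.is_fraud for c in claims) / len(claims)


def lift(claims, k):
    """Hit rate among the top-k claims divided by the overall hit rate."""
    base = hit_rate(claims)
    if base == 0:
        raise ValueError("no fraud in the sample, so lift is undefined")
    return hit_rate(top_claims(claims, k)) / base
```

Going by the IRS figures quoted above, moving from 1 fraud found in 100 cases reviewed to 25 in 100 corresponds to a lift of 25.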
Final ones were mining for gold – finding the hidden good results.
- WestWind Foundation had a hedge fund strategy that tried to manage trades based on predictive models and market timing. It managed to do better over the year than the market as a whole but was still very volatile and not always better. It felt like it could be luck, but they were able to develop a model of the model to see how likely it was that this was “real” or just luck. In this case they found there was almost no likelihood that this was random. Monitoring is critical.
- Pharmacia and Upjohn had a drug they were about to abandon because it did not seem useful. They were comparing it to a placebo, and placebos “work” (especially if the placebo has side-effects)! Analyzing the data showed, for instance, a group of people who really got better on the placebo as well as others who felt worse. The drug did much better but only a complex, sophisticated visualization made this clear. The scientists were applying the FDA test whereas the data miners just looked at the data.
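The write-up does not say how the “model of the model” in the hedge fund case was built. One common way to ask whether a result is real or just luck is a randomization test: shuffle the timing signals many times and count how often random timing does at least as well as the actual strategy. The sketch below shows that swapped-in technique under simple assumptions (signals are 1 for in the market and 0 for out, and returns are simply summed). The function names are hypothetical.

```python
import random


def strategy_return(signals, market_returns):
    """Total return from being in the market (signal 1) or out of it (signal 0)."""
    return sum(s * r for s, r in zip(signals, market_returns))


def luck_probability(signals, market_returns, trials=10_000, seed=0):
    """Estimate how often randomly re-timed signals match or beat the real ones."""
    rng = random.Random(seed)
    actual = strategy_return(signals, market_returns)
    shuffled = list(signals)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # same number of "in" periods, random timing
        if strategy_return(shuffled, market_returns) >= actual:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (trials + 1)
```

A small value means random timing rarely matches the strategy, which is the kind of evidence needed before concluding that the result was not luck. Rerunning the check as new returns arrive is one way to act on the point that monitoring is critical.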
The bottom line for these projects is interesting. At HSBC they lost the champion, and at Anheuser-Busch 9/11 happened, and both projects died. Lumidigm found a solution with Disney and Peregrine got a solution that was a success. The SSA project died with a change of management. The IRS and the consumer electronics fraud detection systems both worked. The market timing system lasted 8 years before the market caught up and the “edge” disappeared.
So, lessons learned. You need:
- Potential gains: either leverage, where an incremental improvement helps, or low-hanging fruit that no one has attempted yet (though the latter is increasingly rare).
- Interdisciplinary team
- Data vigilance – capture and maintain the data you have
- Time for learning cycles
- A Business Champion!
Great advice from John who has a new book coming – Handbook of Statistical Analysis and Data Mining Applications.