Table of contents for BIWA Summit 2008
- Live from BIWA Summit – Competing on Analytics
- Powering Next-Generation Predictive Applications with Oracle Data Mining (ODM)
- Critical Success Factors for successful BI and analytic implementations
- From Data Warehousing to Strategic Data Assets
- Getting to the Right Price with Oracle Data Mining
- Oracle’s BI Strategy
- Intelligent OLAP: Data Mining and OLAP
- Fraud Detection with Oracle Data Mining
Charlie Berger and some others presented on using data mining for fraud detection. Fraud is a huge issue – for instance, there is $31B annually in insurance claims fraud (10-15%), with 25% of all claims having some fraud and more than one in three bodily-injury claims from car crashes involving fraud. Other industries have similar numbers; fraud is widespread and expensive. Many people think it is OK to defraud insurers, for instance, and 30% would not report someone else who defrauded an insurer. Physicians often game the system to get coverage for patients. And so on.
Fraud is often tricky because there are relatively few fraud cases in a large population, so you have to look at lots of data. As a result it is hard to use many algorithms, as there are not enough “bads” among the “goods”. ODM 11g has a One-Class Support Vector Machine, which can be used to find anomalous records when you lack many examples. This is the anomaly detection algorithm in ODM; it considers multiple attributes in various combinations to see what marks a record as anomalous. It first characterizes the “normal” and then identifies how unlike this each record is – there is no labeled sample set. The algorithm can also use unstructured data (text) and nested transactional data (all the claims for a person, for example). Charlie walked through the very automated process of creating such a model and made the point that multiple algorithms can and often should be applied to enrich the analysis. As before, this information is accessible to the BI and reporting tools.
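To make the one-class idea concrete: you train only on records assumed to be normal and then score new records by how far they fall from that learned boundary. ODM's implementation is Oracle-proprietary, so this is just a minimal sketch of the same technique using scikit-learn's `OneClassSVM` on made-up synthetic claim data – the feature names and thresholds are illustrative assumptions, not anything from the talk.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "normal" claims: (claim amount, prior-claim count) -- made-up data
normal = np.column_stack([
    rng.normal(1000, 200, 500),   # typical claim amounts
    rng.poisson(2, 500).astype(float),  # typical prior-claim counts
])
# Two records that look nothing like the normal population
suspicious = np.array([[9000.0, 15.0], [8500.0, 12.0]])

scaler = StandardScaler().fit(normal)
# nu bounds the fraction of training records treated as outliers;
# note there are no fraud labels anywhere -- we fit on "normal" only
model = OneClassSVM(nu=0.05, gamma="scale").fit(scaler.transform(normal))

print(model.predict(scaler.transform(suspicious)))  # -1 marks an anomaly
```

The key point matches what Charlie described: the model learns what "normal" looks like and scores each record by how unlike that it is, with no fraud examples needed for training.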
Karl Rexer (publisher of this nice report) came up next to give some examples, ranging from warranty abuse and false repairs to tax fraud and counterfeiting. He made the point that these algorithms identify suspicious behavior – the probability of fraud – but that companies have to decide how to act on these suspicions. Predictive models, of course, make predictions; they don’t take actions – that takes people or rules-based decisions. Lots of fraud goes uncaught, partly because companies don’t want to admit it or because they accept it as a cost of doing business. They also fear that detecting fraud can affect customer service (which it can – false positives are really good at upsetting people). Karl had some suggestions:
- Leverage domain expertise
- Create metrics and red-flag thresholds
- Find outliers
- Look for individuals and groups
- Cluster analysis, anomaly detection and link analysis are particularly useful
- Automate flagging – something that resonates with me given the volumes of transactions and customers most companies have
- Use common sense! Think about the rules that can be defined, natural boundaries etc.
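Karl's "metrics and red-flag thresholds" plus "automate flagging" suggestions amount to a simple rules layer that runs over every transaction. A minimal sketch, assuming hypothetical rule names and thresholds (none of these specifics came from the session):

```python
# Hypothetical red-flag rules keyed by name; each takes a claim dict.
# The thresholds are illustrative assumptions, not real fraud criteria.
RED_FLAGS = {
    "large_amount": lambda c: c["amount"] > 10_000,
    "rapid_refiling": lambda c: c["days_since_last_claim"] < 30,
    "round_amount": lambda c: c["amount"] % 1000 == 0,
}

def flag_claim(claim):
    """Return the names of all red-flag rules this claim trips."""
    return [name for name, rule in RED_FLAGS.items() if rule(claim)]

claim = {"amount": 12_000, "days_since_last_claim": 12}
print(flag_claim(claim))  # this claim trips all three hypothetical rules
```

This is the "common sense" layer: cheap, explainable rules catch the obvious cases at volume, and the statistical models (clustering, anomaly detection, link analysis) handle the patterns no one thought to write a rule for.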
Natascha from UC San Diego talked about their work on using analytics to detect Medicare and Medicaid fraud nationally. She made the point that fraudsters will change their behavior as fraud is detected more aggressively. This post-payment fraud analysis is particularly good at catching organized networks, but it is “pay and chase” rather than preventing fraudulent payment (something done by products like Fair Isaac’s Payment Optimizer, for instance).