John Elder, one of my favorite presenters, introduced a series of customer stories around text mining/text analytics. He calls this "the wild west" of analytics with lots of startups and innovation. He points out that these kinds of analytics must be designed to complement human capabilities, not least because the human brain is good at text. Examples:
- US Customs and Border Protection
Using text analytics to detect unusual patterns of activity in border crossings. Also mining comments about container shipments to see if they match the codes used on the shipment, so that both can be compared with the physical results from x-ray machines etc.
Discover and even prevent leaks – unauthorized disclosure of information – using past patterns of disclosure and by detecting unusual information movement.
- Social Security Administration
The process of applying for disability is long and complex: only 1/3 of applications are approved, and half of those declined are eventually approved. The text around these applications is typed by staff based on what applicants say is wrong with them, so it is a rich, focused set of information. 20% of those who would eventually be approved could be identified and approved automatically. Processing the text is complex thanks to misspellings, multi-word phrases like "learning disability", synonyms and stemming (learn and learning, for instance).
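To make those four text problems concrete, here is a minimal sketch of a normalization step in Python. It is not the approach described in the talk: the word lists, the similarity cutoff and the suffix rules are all invented for illustration, and the crude suffix stripping stands in for a real stemmer.

```python
import difflib
import re

# Hypothetical vocabularies, invented for illustration only.
PHRASES = {("learning", "disability"): "learning_disability"}
SYNONYMS = {"unable": "cannot", "hurt": "pain", "ache": "pain"}
VOCAB = ["learning", "disability", "cannot", "pain", "walk", "back", "severe"]
SUFFIXES = ("ing", "ed", "s")


def correct(token):
    """Map a possibly misspelled token to the closest vocabulary word."""
    match = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else token


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    if "_" in token:  # leave merged multi-word phrases intact
        return token
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Spell-correct, merge known phrases, map synonyms, then stem."""
    tokens = [correct(t) for t in re.findall(r"[a-z]+", text.lower())]
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])  # two words become one token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return [stem(SYNONYMS.get(t, t)) for t in merged]


print(normalize("Lerning disabilty, unable to walk"))
```

The order of the steps matters in this sketch: spelling is corrected first so that the phrase and synonym lookups see clean words, and stemming comes last so that it cannot break a phrase apart.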
- National Center for Medical Intelligence
Looking for infectious animal diseases by monitoring the web. For instance, news reports of spontaneous abortions or bleeding in sheep might show that an outbreak of Rift Valley fever is happening. The approach is to find the words around the key phrases and then find documents using those words. The process includes a review step in which an expert is presented with a document that looks interesting to the engine and says yes or no. The engine then uses this review to re-prioritize the remaining documents.
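The review step can be sketched as a simple feedback loop. This is only an illustration under my own assumptions, not the engine described in the talk: documents are scored by summing per-word weights, the expert is a hypothetical callback returning yes or no, and each answer nudges the weights of the words in the reviewed document up or down.

```python
from collections import Counter
import re


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def score(doc, weights):
    """Sum the current weight of each distinct word in the document."""
    return sum(weights[w] for w in set(tokens(doc)))


def review_loop(docs, seed_terms, expert, step=1.0):
    """Show the top-scoring unreviewed document, ask the expert,
    update the word weights, and repeat until nothing is left."""
    weights = Counter({t: 1.0 for t in seed_terms})
    remaining = list(docs)
    order = []
    while remaining:
        best = max(remaining, key=lambda d: score(d, weights))
        remaining.remove(best)
        relevant = expert(best)  # the expert's yes/no
        order.append((best, relevant))
        delta = step if relevant else -step
        for w in set(tokens(best)):
            weights[w] += delta  # re-prioritizes the remaining documents
    return order


# Hypothetical documents and a stand-in "expert", for illustration only.
docs = [
    "sheep bleeding reported near river farms",
    "football results from the weekend",
    "river farms report livestock abortions",
    "weekend market prices for wool",
]
order = review_loop(
    docs,
    seed_terms=["sheep", "bleeding"],
    expert=lambda d: "sheep" in d or "livestock" in d,
)
for doc, relevant in order:
    print("yes" if relevant else "no ", doc)
```

In this toy run the third document shares no seed terms with the query, yet it moves ahead of the football story because the expert's first "yes" raised the weights of "river" and "farms".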
John wrapped up with some practical advice on text mining:
- Know the gain you expect, whether from low-hanging fruit or something otherwise leverageable
- Have an interdisciplinary team
- Be vigilant about data, capturing and maintaining the information you use
- Allow for multiple learning cycles – time to learn
- Have a business champion – someone willing to take the risk
And some steps to follow:
- Assess data assets – lots of data warehouses are data mausoleums and you can find serious problems. Get the data owner on board
- Identify pain points on the front line – the decisions that would make a difference, to use my terms
- Brainstorm a process, allowing for it to be VERY different than what is done today
- Conduct a pilot project – aiming for a quick win and the potential for a big win
- Have key people work with analytic experts
- Prove the ROI