I got a chance to chat with some folks from SAS about their text analytics. SAS’ Business Analytics framework contains an Analytics component, which includes products such as Enterprise Miner and Model Manager as well as capabilities like Operations Research, Forecasting and Text Analytics.
The text analytics products fall into two broad categories based on the approach used – some are domain-driven and some are statistically driven. The four current products are:
- Enterprise Content Categorization
For help building taxonomies in a company
- Ontology Management
Tools for an organization’s semantic repository
- Text Miner 4.2
Now integrating more Natural Language Processing and part of Enterprise Miner
- Sentiment Analysis
Handling both complex rules and unstructured text analysis for sentiment analysis
Of these, the most interesting to me is Text Miner, as it integrates with the other analytics pieces.
SAS sees most users of Text Miner doing discovery work with text analytics: finding clusters of customer chatter, warranty claims or similar and showing how these clusters are changing relative to other data (the size of a population, for instance), or finding phrases, terms or groups of terms that are outside the realm of usual conversation. Lots of this information is shown in graphs and reports, but it is also possible to build text analytics into the standard modeling process supported by Enterprise Miner and deploy the results into a more operational environment.
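To make the "outside the realm of usual conversation" idea concrete, here is a minimal sketch in plain Python. It is not how Text Miner works internally; it simply compares how often each term appears in a recent batch of text against a baseline batch and flags terms whose rate has jumped. The function names, the thresholds and the sample comments are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


def unusual_terms(baseline_docs, recent_docs, min_count=2, ratio=3.0):
    """Flag terms that are much more frequent recently than in the baseline.

    min_count and ratio are illustrative thresholds, not recommended values.
    """
    base = Counter(t for d in baseline_docs for t in tokenize(d))
    recent = Counter(t for d in recent_docs for t in tokenize(d))
    base_total = sum(base.values()) or 1
    recent_total = sum(recent.values()) or 1

    flagged = []
    for term, count in recent.items():
        if count < min_count:
            continue
        recent_rate = count / recent_total
        # Add one to the counts so unseen terms do not divide by zero.
        base_rate = (base[term] + 1) / (base_total + 1)
        lift = recent_rate / base_rate
        if lift >= ratio:
            flagged.append((term, round(lift, 2)))
    return sorted(flagged, key=lambda pair: -pair[1])


baseline = ["the charger works fine", "battery life is fine", "the screen is fine"]
recent = ["battery overheating again", "overheating near the charger",
          "the battery is overheating"]
print(unusual_terms(baseline, recent))
```

A real deployment would work on far more text and would normally handle stemming, phrases and stop words, but the shape of the comparison is the same.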
Those users focused on text analytics in this context are generally mixing structured and unstructured data in their models, using the text analytics techniques to extract additional value from a record that contains text fields or has associated documents. Some other interesting uses include:
- Using text mining to replace explicit coding of problems or calls by staff.
Letting them describe the call or problem freely and then using text analytics to classify the call, avoiding the problems caused by people having a few “favorite” classifications.
- Voice transcription
Taking calls and transcribing them and then using text analytics to search for things like “otherwise I will close my account”. The text analytics help mitigate transcription errors and variations in phrasing.
- Categorizing the last 20% of a document set
Lots of companies that specify rules to categorize documents run out of steam at 70 or 80%, and text analytics can handle the rest automatically (though you could use text analytics to come up with the rules in the first place, especially when combined with a decision tree algorithm).
- Handling fields with too many values for data mining algorithms
Even in structured data there are sometimes fields that have too many potential values for a typical data mining algorithm – prescription codes, for example, are often grouped together in data mining. Text analytic techniques can be used on these fields.
- Using text analytics to analyze reasons for overrides
When people override analytic or rules-based decisions, they are often asked to explain why. Text analytics can be used to provide a feedback loop that classifies these reasons into groups and then makes these reasons available in a drop-down list for subsequent use.
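As an illustration of the "last 20%" use above, here is a rough sketch of the pattern: apply hand-written rules first, then let a text-similarity step pick a category for whatever the rules miss. Everything here is an assumption for the sake of the example. The keyword rules are invented, and the fallback (word overlap with documents the rules already labelled) is a simple stand-in for a proper text mining model, not the SAS approach.

```python
import re

# Hypothetical keyword rules of the kind a company might write by hand.
RULES = {
    "billing": ["invoice", "refund", "charge"],
    "shipping": ["delivery", "courier", "tracking"],
}


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def rule_category(text):
    # Return the first category whose keywords appear in the text, else None.
    words = tokens(text)
    for category, keywords in RULES.items():
        if words & set(keywords):
            return category
    return None


def fallback_category(text, labelled):
    # Pick the label of the most similar rule-labelled document (Jaccard overlap).
    words = tokens(text)
    best_label, best_score = None, 0.0
    for other_text, label in labelled:
        other = tokens(other_text)
        union = words | other
        score = len(words & other) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def categorize(docs):
    labelled, leftover = [], []
    for doc in docs:
        category = rule_category(doc)
        if category:
            labelled.append((doc, category))
        else:
            leftover.append(doc)
    results = dict(labelled)
    for doc in leftover:
        results[doc] = fallback_category(doc, labelled)
    return results


docs = [
    "Please refund the duplicate charge on my card",
    "The courier lost my parcel and tracking shows nothing",
    "My parcel never arrived and nothing shows online",
    "I was billed twice on my card",
]
for doc, category in categorize(docs).items():
    print(category, "<-", doc)
```

The last two documents match no rule, so they are assigned by similarity to the documents the rules did catch.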
In general, SAS sees customers both creating structured data using text analytics (feeding the structured results into their usual analytic process) and creating new text analytic models and merging the models themselves with models built from structured data. An example of the first would be to extract entities discussed in call notes, or to score a call as to how likely the customer is to be upset, and then add this data as new fields for data mining. An example of the second would be to score claims for risk, say, using structured data and then rank within that using a model based on text analytics. In both cases, text analytics can be deployed using the same approach as other data mining and predictive models.
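The two patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list, the field names (`notes`, `upset_score`, `risk_band`, `text_score`) and the scoring itself are hypothetical, and a real model would be far richer than counting words.

```python
# Hypothetical lexicon of words suggesting an upset customer.
UPSET_TERMS = {"angry", "cancel", "unacceptable", "complaint"}


def upset_score(note):
    # Share of words in the note that appear in the lexicon.
    words = note.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in UPSET_TERMS)
    return hits / len(words)


def add_text_fields(record, text_field="notes"):
    # Pattern 1: turn free text into a new structured field for data mining.
    enriched = dict(record)
    enriched["upset_score"] = upset_score(record.get(text_field, ""))
    return enriched


def rank_claims(claims):
    # Pattern 2: order by the structured risk band first, then use the
    # text-based score to rank claims within each band.
    return sorted(claims, key=lambda c: (-c["risk_band"], -c["text_score"]))


call = {"customer_id": 1, "notes": "Customer is angry and wants to cancel."}
print(add_text_fields(call))

claims = [
    {"id": "A", "risk_band": 2, "text_score": 0.1},
    {"id": "B", "risk_band": 3, "text_score": 0.2},
    {"id": "C", "risk_band": 3, "text_score": 0.9},
]
print([c["id"] for c in rank_claims(claims)])
```

In the first pattern the text-derived score becomes just another column alongside the structured fields; in the second, the text model only reorders claims that the structured model already placed in the same band.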
There are some interesting trends around text analytics. At Predictive Analytics World this spring, for instance, there was a lot more talk of text analytics than in previous years. At the same time, the recent Rexer Analytics survey did not show much uptick in text analytics. The folks at SAS see more companies adopting text analytics at a strategic level (having an overall text analysis strategy) and more using text analytics to get at the complex logic they need and take it into ongoing daily operations. They also see a shift towards more internet-oriented text for analysis and away from internal text. Finally, they see about half their customers approaching text analytics with some form of taxonomy or categorization already developed and about half starting from scratch and using text mining to find out what the categorization should be.