Brenda Dietrich from IBM Research wrapped up the morning with a discussion of some of IBM's research. This involved both managing uncertain data at scale and driving analytics against that data. Projects cover systems of people, the future of Watson, outcome-based business, and resilient business and services.
As everyone knows, there is a lot more data out there, and it is not only different from traditional data in format, it is also fundamentally less certain. In fact IBM expects most data to be of this uncertain, unstructured kind: high Volume, high Velocity, high Variety and issues with Veracity (because it arrives at different times or in different ways, for instance). Uncertainty comes from process uncertainty (traffic patterns, say), “inherent” variability such as yield, data uncertainty (spelling and other editing errors, GPS error, ambiguous labels, rumors and conflicting data) and filtering (we interpret or use data in a way that creates uncertainty).
The growth in this uncertain data does not mean you cannot make decisions. You can categorize the uncertainty, disambiguate, find methods of analysis that are robust in the face of ambiguity, and more. You can use data to find and improve data, you can cross-reference data and consider it spatially to see how “close” it might be, and you can consider time, as the past influences the future but not the reverse. And, of course, the smaller your target segment, the more data you need (forcing you to use this data).
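To make the cross-referencing idea concrete, here is a minimal sketch (my own illustration, not from the talk) of disambiguating a noisy record against a reference source by combining string similarity with spatial closeness. The function names, thresholds and data are all hypothetical:

```python
import math
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; tolerates spelling errors."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def distance_km(p: tuple, q: tuple) -> float:
    """Approximate great-circle distance between (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(h))

def match(record, candidates, max_km=0.5, min_sim=0.8):
    """Cross-reference a noisy record against a second source: accept a
    candidate only if the name is similar AND the reported GPS position
    is nearby, so neither error source decides on its own."""
    best, best_score = None, 0.0
    for c in candidates:
        sim = name_similarity(record["name"], c["name"])
        if sim >= min_sim and distance_km(record["pos"], c["pos"]) <= max_km:
            if sim > best_score:
                best, best_score = c, sim
    return best

stores = [{"name": "Main Street Cafe", "pos": (40.7128, -74.0060)}]
noisy = {"name": "Main Steet Caffe", "pos": (40.7131, -74.0057)}
print(match(noisy, stores))  # matches despite the spelling errors
```

The point is simply that two uncertain signals, each unreliable alone, can cross-check each other, which is one reading of “use data to find and improve data.”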
Systems of people came up next. Brenda asserted that traditional analytics systems were focused on process automation and that the move to people-centric processes means you need a new approach to analytics. I am not sure about this, as many processes now being transformed by analytics were long regarded as knowledge-centric, people-centric, “soft” processes. Nevertheless there are clear issues with using analytics to drive collaborative and people-centric processes. This requires three things:
- People enablement (adaptive, context-aware collaboration tools that keep track of what was done)
- People content (skills profiles, resumes etc. in a structured way)
- People analytics (analysis of this new data to see what works, how teams might be formed and so on)
Finally, some thoughts on the future of analytics in three areas:
- An explosion of unstructured data
- A skills gap, even if only temporary, on both the consumption and supply sides (not the same problem)
- Deploying analytics at scale
New data can come from an increase in decision-making scope or from new formats and sources. Broadening scope is hard organizationally, so much of the explosion comes from new data.
Analytic tools must expand to ingest and analyze new data sources and bring them into the analytic stack (feature extraction, entity identification and so on). Plus you need to feed the results of decisions back into this analytic process for continuous improvement.
Analytic decision-making involves a series of steps (sketched in code after the list):
- Data acquisition
- Filtering and extraction
- Core analytic algorithms
- Composition and packaging
- Deployment
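As a rough illustration (again my own sketch, not IBM's actual platform), the five steps plus the feedback loop mentioned above might be wired together like this; every name here is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Pipeline:
    """Skeletal analytic decision pipeline mirroring the five steps above."""
    acquire: Callable[[], list]              # data acquisition
    extract: Callable[[list], list]          # filtering and extraction
    analyze: Callable[[list, list], Any]     # core analytic algorithms
    package: Callable[[Any], Any]            # composition and packaging
    deploy: Callable[[Any], Any]             # deployment
    feedback: list = field(default_factory=list)

    def run(self):
        raw = self.acquire()
        features = self.extract(raw)
        result = self.analyze(features, self.feedback)  # past outcomes inform the analysis
        artifact = self.package(result)
        outcome = self.deploy(artifact)
        self.feedback.append(outcome)  # feed decision results back for continuous improvement
        return outcome

# Toy usage with stand-in steps:
p = Pipeline(
    acquire=lambda: ["Raw text record ..."],
    extract=lambda raw: [r.lower() for r in raw],        # stand-in feature extraction
    analyze=lambda feats, fb: {"score": len(feats) + len(fb)},
    package=lambda model: model,
    deploy=lambda artifact: artifact,
)
print(p.run())
```

Keeping each step a swappable callable is one way to read the “common platform” goal: the composition logic stays fixed while the algorithms and deployment targets vary underneath it.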
Research here involves bringing new and existing algorithms to a common platform while making the composition and packaging process independent of the hardware and deployment environment, removing the need to program it by hand.
Finally, some discussion of Watson. As IBM seeks new uses for Watson, one key theme is the ability to be more interactive, letting Watson's algorithms ask for clarification or additional data; another is providing not just answers but also the reasoning and evidence behind those answers.
A whirlwind tour extracted from a longer presentation, so this is a little choppy.