
PARC Research – Exploiting unstructured data for predictive applications #pawcon


Bo and Lawrence from PARC presented some work on contextual intelligence research designed to exploit unstructured data for novel predictive applications. PARC is now an independent business unit focused on the Business of Breakthroughs, working with Xerox and with other companies. The new focus means they work on a wide range of problems and aim to develop new business products.

There are three sources of improvement in predictive power: improved data quality, broader data sources, and enrichment through understanding context. The last of these is the focus of this team. Data can be enriched using content analysis techniques like tagging and annotation, and there is some interesting research on image tagging. Context analysis is also possible: modeling relationships across documents, using social analytics, and user activity modeling (associating people with information and analyzing the data exhaust from their activity).

The contextual intelligence work at PARC focuses on disambiguating the meaning of structured data by delivering relevant and timely documents, messages, recommendations and ads. For instance, GPS coordinates are structured, but they are ambiguous about what I should actually do. Finding pertinence is a two-step process:

  1. Activity identifies the type of information needed
    As businesses go through the activities within the broad areas of acquiring supplies, producing products and distributing these products, they need different types of information. This is true of individuals too: people also acquire information, produce something (adding value) and distribute it to others, though the network is more complex.
  2. Interests identify pertinence within type
    Interests defined by job function, specialty or personal preferences determine which information is pertinent.
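The two steps above can be sketched as a filter followed by a ranking. The talk gave no implementation details, so this is a minimal Python sketch under stated assumptions: `Item`, `ACTIVITY_TO_TYPES`, `pertinent_items` and the topic-overlap score are all hypothetical, chosen only to show activity selecting the *type* of information and interests ordering items *within* that type.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    info_type: str   # hypothetical content type, e.g. "restaurant"
    topics: set      # hypothetical topic tags attached to the item

# Hypothetical mapping from an inferred activity to the information types it needs.
ACTIVITY_TO_TYPES = {
    "eating": {"restaurant"},
    "shopping": {"shop"},
}

def pertinent_items(activity: str, interests: set[str], items: list[Item]) -> list[Item]:
    # Step 1: the inferred activity decides which types of information are needed.
    wanted = ACTIVITY_TO_TYPES.get(activity, set())
    candidates = [it for it in items if it.info_type in wanted]
    # Step 2: interests rank candidates within those types
    # (assumed score: number of topic tags shared with the user's interests).
    return sorted(candidates, key=lambda it: len(it.topics & interests), reverse=True)
```

For a user who is eating and likes sushi, a sushi bar would rank above a burger joint, and a bookshop would be filtered out entirely because it is the wrong type for the activity.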

PARC’s hybrid context engine links a person's current situation (their physical context, such as location, and their electronic context, such as calendar, calls and emails) with their previous patterns of interest, behavior and social network. These are combined using different weights to identify the person’s activity and interests, which can then be used for information retrieval, targeted ads, lifestyle advice, workgroup coordination and so on.
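One simple way to picture "combined using different weights" is a weighted sum of per-source activity scores, normalised into a distribution. This is an assumption for illustration, not PARC's actual engine: the source names (`location`, `calendar`, `history`), the weights and `combine_context` are all hypothetical.

```python
from __future__ import annotations

def combine_context(signals: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> dict[str, float]:
    """Combine per-source activity scores into one normalised distribution.

    signals: context source name -> {activity: score from that source}
    weights: context source name -> weight given to that source
    """
    combined: dict[str, float] = {}
    for source, scores in signals.items():
        w = weights.get(source, 0.0)  # sources without a weight contribute nothing
        for activity, score in scores.items():
            combined[activity] = combined.get(activity, 0.0) + w * score
    total = sum(combined.values())
    if total == 0:
        return combined
    # Normalise so the activity scores sum to 1.
    return {activity: s / total for activity, s in combined.items()}

signals = {
    "location": {"eating": 0.7, "shopping": 0.3},  # physical context
    "calendar": {"meeting": 1.0},                  # electronic context
    "history": {"eating": 0.5, "shopping": 0.5},   # previous patterns
}
weights = {"location": 0.5, "calendar": 0.2, "history": 0.3}
print(combine_context(signals, weights))  # eating ~0.5, shopping ~0.3, meeting ~0.2
```

Keeping the weights as an explicit dictionary makes it easy to tune them per person, which matters later in the talk.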

One example was a project that took an offline leisure guide and turned it into an activity-aware leisure guide. It used the user's location, the time and recent messages to infer their current activity and suggest leisure activities that might be relevant. This matters in Japan, where the trial was conducted, because groups often meet halfway between their locations, meaning they end up somewhere none of the group knows. The application captured feedback, which was used to improve the results. For instance, around lunchtime it might suggest mostly restaurants, but not only restaurants, as perhaps shopping is on the user's mind. Users can accept suggestions or use the interface to tell it what they are looking for, getting new results based on that. They can also change their location or push the time forward to see, say, what they would be recommended at the weekend.

The engine first determines how many slots to present for each content type. It uses all the context (physical and electronic) it has to drive this; for instance, a lot of SMS messages to friends about movies might show what the user is planning. The probability of each possible next activity is estimated from various context inputs, each with its own weight. The weights start from the prior activity of the population and are then learned for each individual. Comparing different models against observed user behavior, they found that a model using place (what do people do here), time (what do people do now) and learned visit behavior hit 82%, against a baseline of about 62% for random selection.
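The talk did not say how activity probabilities become slot counts. One plausible reading, sketched here purely as an assumption, is to split a fixed number of slots in proportion to the probabilities, using largest-remainder rounding so the counts always add up. `allocate_slots` and the activity names are hypothetical.

```python
import math

def allocate_slots(activity_probs: dict, n_slots: int) -> dict:
    """Split n_slots across activities in proportion to their probability.

    Uses largest-remainder rounding so the slot counts sum exactly to n_slots.
    """
    total = sum(activity_probs.values())
    if total <= 0:
        raise ValueError("activity probabilities must sum to a positive number")
    # Ideal (fractional) share of slots for each activity.
    quotas = {a: n_slots * p / total for a, p in activity_probs.items()}
    slots = {a: math.floor(q) for a, q in quotas.items()}
    leftover = n_slots - sum(slots.values())
    # Give the remaining slots to the activities with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda a: quotas[a] - slots[a], reverse=True)
    for a in by_remainder[:leftover]:
        slots[a] += 1
    return slots

# Around lunchtime: mostly restaurants, but not only restaurants.
print(allocate_slots({"eating": 0.5, "shopping": 0.25, "movies": 0.25}, 6))
```

Because every activity with a non-trivial probability can still win a slot, this keeps the "mostly, but not only, restaurants" behavior described for the leisure guide.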

Once the slots were determined, personal preferences (specified and learned) were combined with factors such as distance/time, collaborative filtering and text analytics to determine which items go into the slots. For instance, a person's preferences and what they have been reading about restaurants might determine which restaurants fill the slots assigned to eating. It is a very hybrid, ensemble-style model with many weighted inputs feeding in.
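Filling the slots can be pictured as scoring each candidate with a weighted sum of signals and taking the top items per content type. This is a minimal sketch, assuming hypothetical feature names (`preference`, `proximity` as an inverted distance, `cf` for a collaborative-filtering score) and hypothetical weights; PARC's actual features and weights were not given.

```python
def score_item(features: dict, weights: dict) -> float:
    # Weighted sum of the item's signals; missing signals count as 0.
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def fill_slots(slots: dict, candidates: dict, weights: dict) -> dict:
    """Pick the best-scoring items for each content type.

    slots: content type -> number of slots allocated to it
    candidates: content type -> list of (item name, feature dict)
    """
    result = {}
    for ctype, count in slots.items():
        ranked = sorted(candidates.get(ctype, []),
                        key=lambda c: score_item(c[1], weights),
                        reverse=True)
        result[ctype] = [name for name, _ in ranked[:count]]
    return result

weights = {"preference": 0.5, "proximity": 0.3, "cf": 0.2}  # hypothetical
candidates = {
    "eating": [
        ("Ramen Ya",    {"preference": 0.9, "proximity": 0.5, "cf": 0.4}),
        ("Steak House", {"preference": 0.2, "proximity": 0.9, "cf": 0.3}),
        ("Cafe",        {"preference": 0.6, "proximity": 0.8, "cf": 0.9}),
    ],
}
print(fill_slots({"eating": 2}, candidates, weights))
```

Here the cafe edges out the ramen shop because strong collaborative-filtering and proximity signals make up for a weaker stated preference, which is the kind of trade-off a weighted ensemble allows.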

When they compared how people rated the recommendations, they found that individual models over- or underestimate utility and that the combined model was closest to the user's rating. Affinity clustering tended to overestimate ratings, as did distance, while stated preferences underestimated them. Interestingly, the effectiveness of different models varied a lot from person to person, so varying the weights for each individual was critical.
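The talk did not describe how per-person weights were learned. As a stand-in, here is one simple heuristic, inverse-error weighting: each model's weight for a given user is proportional to one over its mean squared error against that user's ratings. The model names and numbers are hypothetical, chosen only to mirror the over- and underestimation pattern described above.

```python
def per_user_weights(model_preds: dict, ratings: list, eps: float = 1e-9) -> dict:
    """Weight each model by 1 / (mean squared error) on one user's ratings.

    model_preds: model name -> predicted ratings for the user's rated items
    ratings: the user's actual ratings for the same items, in the same order
    """
    if not ratings:
        raise ValueError("need at least one rating to fit weights")
    inverse_error = {}
    for model, preds in model_preds.items():
        if len(preds) != len(ratings):
            raise ValueError(f"{model}: prediction count does not match ratings")
        mse = sum((p - r) ** 2 for p, r in zip(preds, ratings)) / len(ratings)
        inverse_error[model] = 1.0 / (mse + eps)  # eps avoids division by zero
    total = sum(inverse_error.values())
    return {m: v / total for m, v in inverse_error.items()}

def combined_prediction(preds_for_item: dict, weights: dict) -> float:
    # Weighted average of the individual models' predictions for one item.
    return sum(w * preds_for_item[m] for m, w in weights.items())

ratings = [4, 3, 5]
model_preds = {
    "affinity": [5, 4, 5],        # tends to overestimate
    "distance": [5, 5, 5],        # also overestimates
    "stated": [3.5, 2.5, 4.5],    # tends to underestimate
}
weights = per_user_weights(model_preds, ratings)
print(weights)
print(combined_prediction({"affinity": 5, "distance": 5, "stated": 4}, weights))
```

Since the weights are fitted per user, a different person whose stated preferences are less reliable would end up trusting the other models more, which is the individual variation the team found to be critical.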

PARC's conclusion is that contextual intelligence (contextual awareness) enables new applications: enterprise and personal services that help you surf the data tsunami rather than drown in it.

