I got a chance to attend some of the Enterprise Search Summit West yesterday (thanks to my friends at Semantra). I sat in on a couple of sessions and browsed the exhibit hall.
The first session was entitled “What is Semantic Search”. Seth Earley gave a rapid 30-minute overview of semantic search. He touched on explicit and implicit metadata, the difficulties of extracting semantic meaning from documents (in part because context is so important) and the value of implicit metadata such as the number of occurrences of a term, source and structure. All of this, he said, is basic capability in most search engines.
He went on to say that the problem is that search terms are short, ambiguous and an approximation of the user’s real information need, so it is hard to get context from search terms alone. He discussed how there is a lot of focus on how the brain works and on natural language analysis to try to make it more practical for users to ask their actual question, but this has lots of problems because language is very ambiguous. Like Seth, I think that trying to solve these problems in the general case is very difficult and, for most users, not really necessary. After all, most of the ambiguity and complexity comes from the number of things people do and talk about. If I were focused on a single decision area then I could make all sorts of assumptions about language and meaning and so greatly improve my results. Not only that, but building (and maintaining) a knowledge base or ontology is much more practical for a focused area. Seth talked about roles in this context, but I think decisions would be a better focus.
Seth also discussed how many small facts or micro-theories can be chained together. For instance, a search such as “looking for photos of people smiling” could use theories about what makes people smile (being happy) and what makes them happy (seeing someone they love achieve a milestone), and so deliver a photo of a mother whose child is taking its first steps. Although I have not thought this through, it seems to me that these kinds of micro-theories are awfully rule-like, and so some combination of search and business rules might be interesting.
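To make this concrete (and purely as my own sketch, not anything Seth showed), here is roughly how a couple of such micro-theories might chain together as simple rules. The facts, predicates and photo metadata are all invented for illustration.

```python
# Toy forward chaining over a few invented "micro theories" about photos.
facts = {
    ("photo_123", "depicts", "mother"),
    ("photo_123", "event", "childs_first_steps"),
    ("childs_first_steps", "is_a", "milestone"),
}

def milestones_make_people_happy(facts):
    """Micro-theory 1: seeing someone you love reach a milestone makes you happy."""
    new = set()
    for photo, _, person in (f for f in facts if f[1] == "depicts"):
        for photo2, _, event in (f for f in facts if f[1] == "event"):
            if photo2 == photo and (event, "is_a", "milestone") in facts:
                new.add((person, "is", "happy"))
    return new

def happy_people_smile(facts):
    """Micro-theory 2: happy people smile."""
    return {(s, "is", "smiling") for (s, p, o) in facts if p == "is" and o == "happy"}

rules = [milestones_make_people_happy, happy_people_smile]

# Keep applying the rules until nothing new can be inferred (naive forward chaining).
changed = True
while changed:
    new_facts = set().union(*(rule(facts) for rule in rules)) - facts
    changed = bool(new_facts)
    facts |= new_facts

# A query for "photos of people smiling" can now match photo_123.
smiling = {s for (s, p, o) in facts if p == "is" and o == "smiling"}
print({photo for (photo, p, person) in facts if p == "depicts" and person in smiling})
# {'photo_123'}
```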
Finally he pointed out that any solution to searching had to address two kinds of needs:
- Someone looking for something specific
- Someone who knows they have a gap but does not know what might be in it
He gave a number of useful references – check out www.earley.com, www.readwriteweb.com and wordnet.princeton.edu.
The second session was on “Semantic Web – Mastering Discovery”. This was a combined presentation by Siderean and a customer. Siderean went first, discussing how, particularly in ad-hoc investigation, the issue is navigation, not search – people want to find what’s out there (this is the second of Seth’s two types of search from the first session). To do this, he argued, you need four things:
- A shared, unified view of results across sources
- An ability to rapidly zero in on things
- An ability to find and follow relationships
- A way to share discoveries with others
Siderean, he said, uses RDF ontologies and focuses on relationships (where many search algorithms throw relationships away), along with auto-categorization and entity extraction. He was followed by his customer, spendmatters.com, a business blog on supply chain and spend management. The folks at Spend Matters wanted to deliver more, and more relevant, information and to get some sense of the value of a particular piece of information. To do this they developed relational navigation to create a hub from a lot of different sources, such as supplier databases, blogs, trade publications and analysts. The searching is based on ontologies developed for these sources using Siderean. They wanted to support very directed search while also supporting the browse or newspaper effect, so that people find things they were not looking for. He was a little rushed and so we did not see much of the resulting interface, but as someone who maintains several blogs and a wiki and pulls information from lots of sources, I can see why this would be useful.
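I have no insight into how Siderean actually builds this, but the underlying idea (keep the relationships and let people navigate along them) can be sketched with nothing more than a pool of subject-predicate-object triples pulled from several sources. The sources, predicates and entities below are all made up for illustration.

```python
# A toy sketch of relational navigation: pool subject-predicate-object triples
# from several sources, then narrow results by following relationships rather
# than by keyword matching alone. Everything here is invented for illustration.

triples = {
    # From a (fictional) supplier database
    ("AcmeLogistics", "category", "Logistics"),
    ("AcmeLogistics", "region", "EMEA"),
    # From blog posts
    ("post_42", "mentions", "AcmeLogistics"),
    ("post_42", "topic", "spend management"),
    # From an analyst report
    ("report_7", "covers", "Logistics"),
    ("report_7", "mentions", "AcmeLogistics"),
}

def objects_of(subject, predicate):
    """Follow a relationship forward from a subject to all of its objects."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def subjects_of(obj, predicate):
    """Navigate the other way: which subjects point at this object via this predicate?"""
    return {s for (s, p, o) in triples if o == obj and p == predicate}

# Start from a topic, hop to the suppliers mentioned alongside it, then on to
# everything else that mentions those suppliers - the "find and follow
# relationships" part of the pitch.
posts_on_topic = subjects_of("spend management", "topic")
suppliers = {s for post in posts_on_topic for s in objects_of(post, "mentions")}
also_mentioning = {doc for s in suppliers for doc in subjects_of(s, "mentions")}
print(suppliers)        # {'AcmeLogistics'}
print(also_mentioning)  # {'post_42', 'report_7'}
```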
Finally I went to the expo. I won’t go on too much, but there were a few interesting trends:
- Automated entity extraction is improving. Lots more discussion of automated extraction of entities from text, and of doing so based on structured data, such as using a customer table to find candidate customer entities in text (a rough sketch of this idea follows the list).
- Search across structured and unstructured data. Many more interfaces for searching across both databases and text documents with a single query, for a more federated approach.
- Support for actions on results. A number of vendors had ways to get results back and identify them as business objects or events on which actions could be taken, such as starting a process or running a service. This is key to using search in automated systems.
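To make the entity extraction point concrete: the customer-table idea can be sketched as simple dictionary matching, where names pulled from a structured table become the lookup list run against free text. The table, the text and the matching below are all invented for illustration; real products do far more sophisticated matching (fuzzy names, aliases, disambiguation and so on).

```python
# A toy sketch of using structured data to drive entity extraction: names from
# a (fictional) customer table become a dictionary matched against free text.
import re

customer_table = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "Globex"},
]

text = "The renewal discussion with Acme Corp went well, unlike the Globex call."

# Build one regex from the customer names (longest first so longer names win).
names = sorted((c["name"] for c in customer_table), key=len, reverse=True)
pattern = re.compile("|".join(re.escape(n) for n in names))

# Each match becomes a candidate customer entity tied back to its database row.
candidates = [
    {"customer_id": next(c["id"] for c in customer_table if c["name"] == m.group(0)),
     "name": m.group(0),
     "offset": m.start()}
    for m in pattern.finditer(text)
]
print(candidates)
# Candidate entities for Acme Corp and Globex, each with its character offset.
```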
All in all an interesting few hours.