First session today is Jayne Dutra of NASA on Re-Thinking Search in a Web 2.0 World. Jayne started by going over some of the basics, talking about Web 1.0 with portals/websites/search moving to Web 2.0 with blogs, wikis, RSS, social networking and community portals. She used a Mills Davis slide that talked about Web 3.0 being focused on mashups, semantic search and virtual worlds. In general more people are publishing to the web, more tools exist for managing the wisdom of crowds, and those tools are increasingly ubiquitous. These tools have led to an explosion of content, especially content in shared spaces and with shared tags etc. This explosion is also true inside the firewall and thus it impacts enterprise search.
Search has to evolve inside the enterprise too. Web crawling does not meet needs as information is available in lots of formats. People also want to use metadata to improve search but metadata has been entered in many different ways in different systems. Thus a search becomes time-consuming and frustrating. For instance, it was particularly hard to find design and engineering rationales for historical projects.
Focus now is on a new search metaphor:
- Search multiple repositories at once
- Find an object without knowing where it is stored
- Combine keyword queries with faceted navigation to assist in discovery
- Save and subscribe to searches
- Apply and manage tags
- See relationships as these make sense of things found in searches
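To make the first few items concrete, here is a minimal sketch of searching several repositories at once with a keyword plus facet filters. Nothing here comes from the NASA system: the `Document` class, the `federated_search` function, the repository names and the sample documents are all hypothetical, and a real implementation would query each repository through its own index rather than scanning in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def federated_search(repositories, keyword, facets=None):
    """Return (repository name, document) pairs from every repository
    that contain the keyword and match all requested facet values."""
    facets = facets or {}
    needle = keyword.lower()
    hits = []
    for repo_name, documents in repositories.items():
        for doc in documents:
            if needle not in f"{doc.title} {doc.text}".lower():
                continue
            if all(doc.metadata.get(k) == v for k, v in facets.items()):
                hits.append((repo_name, doc))
    return hits


# Hypothetical repositories and documents, for illustration only.
REPOSITORIES = {
    "drawings": [
        Document("Rover wheel drawing", "wheel assembly",
                 {"mission": "Mars rover", "type": "drawing"}),
    ],
    "reports": [
        Document("Flight readiness review", "certification of wheel motors",
                 {"mission": "Mars rover", "type": "report"}),
        Document("Orbiter wheel study", "reaction wheel notes",
                 {"mission": "Orbiter", "type": "report"}),
    ],
}

for repo, doc in federated_search(REPOSITORIES, "wheel", {"mission": "Mars rover"}):
    print(repo, "-", doc.title)
```

The caller never says where a document lives; the repository name only comes back as part of the result.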
To build the search engine they developed a lot of use cases for things like “all documents relating to certification for flight readiness” or “types of data returned from earlier missions to help write a follow-on proposal” or “status of all drawings for a given mission”. Many of these requests are not repository or format dependent.
They have focused on taxonomies and consistent metadata so they can deliver “information services”. They are making this match the business processes being used by the teams involved and developing patterns that come up a lot. They are hoping to deliver results in lots of different ways – lists of documents, graphs of relationships etc. Metadata elements are being defined, some mandatory and some conditional, and some use of controlled terms (a taxonomy) is being considered also. The taxonomy is small and tightly coupled to the core metadata and owned by the SMEs of the various domains – the “gold sources”. The taxonomy also evolves constantly as things change – missions come and go, technology changes, organizational roles change etc. The initial focus is on everything around Mars – to avoid boiling the ocean.
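A rough sketch of what mandatory and conditional metadata elements backed by a small controlled vocabulary could look like in code. The element names, the controlled terms and the conditional rule are all invented for illustration; the talk did not show the actual schema.

```python
# Hypothetical schema: every name and term below is an assumption.
MISSION_TERMS = {"Mars rover", "Mars orbiter"}   # controlled terms (taxonomy)
MANDATORY = {"title", "mission"}                 # always required
# Conditional: element is required only when another element has a given value.
CONDITIONAL = {"drawing_number": ("type", "drawing")}


def validate(record):
    """Return a list of problems found in a metadata record (empty if valid)."""
    problems = []
    for name in sorted(MANDATORY):
        if not record.get(name):
            problems.append(f"missing mandatory element: {name}")
    for name, (trigger, value) in CONDITIONAL.items():
        if record.get(trigger) == value and not record.get(name):
            problems.append(f"missing conditional element: {name}")
    mission = record.get("mission")
    if mission and mission not in MISSION_TERMS:
        problems.append(f"mission not in controlled terms: {mission}")
    return problems


print(validate({"title": "Wheel layout", "mission": "Venus probe", "type": "drawing"}))
```

Because the taxonomy evolves constantly, keeping the controlled terms as data rather than code means the domain owners can change them without touching the validation logic.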
They are using Unstructured Information Management Architecture (UIMA) – an open standard emerging from IBM and OASIS. This helps automate the creation of metadata. This is important as people hate entering metadata and are bad at it. UIMA has an aggregate analysis engine that pulls together many analysis engines. These analysis engines annotate the data iteratively – each iteration knows more and enriches the metadata. Then the data and the metadata are exposed to the consumers. They use taxonomies, linguistic analysis, rules-based categorization and more to automate the entity extraction and metadata recording. They use a product called Inxight to extract the data entities.
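The idea of an aggregate engine running several analysis engines in sequence, each one adding to what the previous ones found, can be sketched in a few lines. This is plain Python and not the UIMA API; the annotator functions, the term list and the category rule are hypothetical stand-ins for the taxonomy lookups, linguistic analysis and rules-based categorization mentioned above.

```python
import re

# Hypothetical controlled terms; a real system would load these from the taxonomy.
MISSION_TERMS = {"mars rover": "Mars rover", "mars orbiter": "Mars orbiter"}


def annotate_missions(text, metadata):
    """Taxonomy lookup: record any known mission names found in the text."""
    lowered = text.lower()
    found = sorted(label for term, label in MISSION_TERMS.items() if term in lowered)
    if found:
        metadata["missions"] = found


def annotate_doc_type(text, metadata):
    """Very crude pattern match standing in for linguistic analysis."""
    if re.search(r"\bdrawing\b", text, re.IGNORECASE):
        metadata["type"] = "drawing"


def annotate_category(text, metadata):
    """Rule that builds on annotations made by the earlier engines."""
    if metadata.get("type") == "drawing" and "missions" in metadata:
        metadata["category"] = "mission engineering"


def aggregate_engine(text, engines):
    """Run each engine in order; every step sees the metadata so far."""
    metadata = {}
    for engine in engines:
        engine(text, metadata)
    return metadata


ENGINES = [annotate_missions, annotate_doc_type, annotate_category]
print(aggregate_engine("Drawing of the Mars rover wheel", ENGINES))
```

Order matters here: the category rule only fires because the two earlier engines have already enriched the metadata, which is the "each iteration knows more" point.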
The end result is search driven by the way the user thinks. This might be keyword based or by structured keywords/taxonomy. Results display with all the metadata and allow you to navigate to other related items using the structured metadata. It also allows some graphical display of the results to see what kinds of materials are being returned – how many of the returned items match metadata of a specific type, for instance. You can pivot on the metadata, visualize related documents etc.
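Counting how many returned items match each metadata value, and then pivoting on one of those values, is simple once the metadata is consistent. A small sketch, assuming each result is a plain dictionary of metadata; the function names and the sample results are made up.

```python
from collections import Counter


def facet_counts(results, element):
    """Count how many results carry each value of one metadata element."""
    return Counter(r[element] for r in results if element in r)


def pivot(results, element, value):
    """Narrow the result set to items whose element has the given value."""
    return [r for r in results if r.get(element) == value]


# Hypothetical search results, for illustration only.
RESULTS = [
    {"title": "Wheel layout", "type": "drawing", "mission": "Mars rover"},
    {"title": "Readiness review", "type": "report", "mission": "Mars rover"},
    {"title": "Antenna layout", "type": "drawing", "mission": "Mars orbiter"},
]

print(facet_counts(RESULTS, "type"))
print([r["title"] for r in pivot(RESULTS, "mission", "Mars rover")])
```

The counts are what a graphical summary of the results would be drawn from.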
Clearly automation of metadata creation/entity extraction is important, even when a “gold source” taxonomy is being created – no manual process for creating metadata is going to cut it. Relationships between things also matter a great deal. Equally clearly a variety of display approaches adds value. It occurs to me that this kind of delivery of information can be integrated with automated decisioning systems both to do some statistical analysis (how many documents match) and to deliver supporting information for a decision.