FindUR: Knowledge Assisted Search


Ontological Issues in Knowledge-Enhanced Search

Deborah McGuinness AT&T Labs Research Florham Park, NJ 07932 USA contact email: [email protected]

FindUR homepage: http://www.research.att.com/~dlm/findur

FOIS ’98

Contributors

Tom Beattie, Beth Cataldo, Ihung (Kyle) Chang, Curtis Chen, Lisa Croel, Martha Desmond, Paul Fuoss, Karrie Hanson, Pam Kirkbride, Dave Kormann, Harley Manning, Russ Maulitz, Mark Plotnick, Lori Alperin Resnick, Beth Robinson, Steve Solomon

Outline

• Motivation
• Basic Solution
• Ontological Issues
• Future Work
• Conclusion

Motivation

Queries miss relevant documents because:
• queries are naive
• documents do not contain “perfect” content

Solutions

• Augment Documents
  - tag all pages with (controlled vocabulary) meta tags
  - labor is distributed and must be trained
  - approach is unscalable especially if content changes
• Augment Index
  - centralized labor cost
  - must re-index every time meta tag language changes
• Augment Query (manually)
  - requires user training
• Augment Query (automatically)
  - no user training or content provider training
  - centralized labor cost, no rework needed
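The "Augment Query (automatically)" option can be illustrated with a minimal sketch. It assumes a hypothetical in-memory mapping from a topic to related terms. The names TOPIC_SETS, augment_query, and matches, and the sample terms, are illustrative assumptions and are not FindUR's or Verity's actual interfaces.

```python
# Hypothetical topic sets: topic name -> related terms (illustrative data only).
TOPIC_SETS = {
    "artificial intelligence": ["knowledge representation", "description logics"],
}


def augment_query(query, topic_sets=TOPIC_SETS):
    """Return the normalized query plus any related terms, without duplicates."""
    key = query.strip().lower()
    terms = [key]
    for related in topic_sets.get(key, []):
        if related not in terms:
            terms.append(related)
    return terms


def matches(document_text, terms):
    """A document matches if it mentions any of the expanded terms."""
    text = document_text.lower()
    return any(term in text for term in terms)


expanded = augment_query("Artificial Intelligence")
print(expanded)
print(matches("A tutorial on description logics", expanded))
```

In this sketch the user types only the general phrase and the document author adds no tags; the expansion table is the single, centrally maintained piece of knowledge.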

FindUR

• Address issues of recall, rank ordering, and browsing
  - utilizing available knowledge
  - in a standard search platform
• Deploy, test, and maintain on websites

Background Knowledge Supports:

• Retrievals of previously missed relevant documents
• More relevant retrievals scored higher than less relevant documents
• Simple user generation and refinement of queries
• User expectation setting

FindUR Architecture

[Architecture diagram; recoverable labels listed below]

Content to Search: Content (Web Pages or Databases)
• Research Site
• Technical Memorandum
• Calendars (Summit 2005, Research)
• Yellow Pages (Directory Westfield)
• Newspapers (Leader)
• Internal Sites (Rapid Prototyping)
• AT&T Solutions
• Worldnet Customer Care
• Medical Information

Content Classification

Search Technology:
• Search Engine: Verity (and topic sets)
• Domain Knowledge: CLASSIC Knowledge Representation System

User Interface:
• Query Input
• GUI supporting browsing and selection
• Results (standard format)
• Results (domain specific)
• Collaborative Topic Set Tool

Implemented with: Verity SearchScript, Javascript, HTML, CGI, CLASSIC

FindUR improves search by:

• Retrieving previously missed relevant documents
• More appropriately ordering search results
• Facilitating simple user generation and refinement of queries
• Setting user expectations about the content domain

Selected FindUR implementations:

• Electronic Yellow Pages: www.quintillion.com/westfield
• Event Calendars: www.quintillion.com/calendar/[summit |westfield]
• Medical Information (P-CHIP, POS)
• Computer Science Research Information
• Competitive Intelligence Sites
• Staff Augmentation and Vendor Procurement Info
• Network Service Realization
• Rapid Prototyping Info and Services
• Technical Memorandum Access
• Online Newspapers
• Hometown Sites
• Intellectual Capital

Common Site Conditions

• Short document length
• Few related content words per document
• Unfamiliar vocabularies
• Variability in specificity of documents
• Inconsistent or irregular meta tagging
• Higher (relevance) value for general documents over specific documents

Evidence types

• Synonyms
• Subclasses
• Subparts
• Products
• Companies
• Associated Standards
• Key People
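One way to picture a topic definition built from these evidence types is sketched below. The slides give no weights or data model, so the Topic class, the EVIDENCE_WEIGHTS numbers, and the "strongest evidence wins" scoring rule are all assumptions made for illustration, not a description of Verity topic sets.

```python
import re
from dataclasses import dataclass, field

# Assumed weights per evidence type; the slides do not give any numbers.
EVIDENCE_WEIGHTS = {
    "synonym": 1.0,
    "subclass": 0.8,
    "subpart": 0.6,
    "product": 0.5,
    "company": 0.5,
    "associated standard": 0.4,
    "key person": 0.4,
}


def mentions(term, text):
    """True if the term occurs in the text as a whole word or phrase."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


@dataclass
class Topic:
    name: str
    # evidence type -> list of terms that count as evidence for the topic
    evidence: dict = field(default_factory=dict)

    def score(self, text):
        """Return the weight of the strongest evidence found in the text."""
        best = 1.0 if mentions(self.name, text) else 0.0
        for kind, terms in self.evidence.items():
            weight = EVIDENCE_WEIGHTS.get(kind, 0.0)
            if any(mentions(term, text) for term in terms):
                best = max(best, weight)
        return best


ai = Topic(
    name="artificial intelligence",
    evidence={
        "synonym": ["AI"],
        "subclass": ["knowledge representation", "description logics"],
    },
)
print(ai.score("An overview of knowledge representation"))
```

Whole-word matching is used so that a short synonym such as "AI" does not fire on unrelated words that merely contain those letters.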

Issues

• Topic Set Inclusion
• Topic Exposure
• Description Logic Usage

FindUR/Smart Search Benefits

• Retrieves documents otherwise missed
• More appropriately organizes documents according to relevance (useful for large numbers of retrievals)
• Browsing support (navigation, highlighting)
• Simple user query building and refinement
• Full query logging and trace
• Facilitates use of advanced search functions without requiring knowledge of a search language
• Automatically searches the right knowledge sources according to information about the context of the query

FindUR Future Work

Topic Set Generation

• Distributed Collaborative Topic Set Building Environment
• Use tagged content to generate candidate topic sets
• Information Retrieval (use clustering to analyze documents and suggest topic definitions)
• Machine Learning (use query logs as training data)
• Reuse topic sets for different purposes using views of knowledge

Knowledge Representation Integration

• Use knowledge base to check definitions and determine overlaps
• Expand beyond subclass, instance, and synonym relationships and incorporate more structured information
• Maintain information about how and when to use topic information
• Maintain descriptions of content sources

Evaluation and Interface Evolution

• Evaluate on effectiveness of retrievals, relevance ranking, ease of query refinement, ease of content input into category scheme
• Java-based interface for scalability, rapid changing, understandability

AT&T Labs Research Site

• FindUR has a taxonomy of background information which includes “knowledge representation” as a sub-category of “artificial intelligence.”
• The category/sub-category relationships are displayed in the user interface. Users can construct queries by simply clicking categories and sub-categories, invoking background knowledge in the process.

AT&T Labs Research Site

• With background knowledge the search returns 696 relevant listings.
• Documents of a more general nature such as bibliographies and departmental overviews float higher in the list. Without background knowledge, a reference manual was the first retrieval.
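The effect described on this slide can be approximated with a small sketch. It assumes a hypothetical acyclic taxonomy and a simple scoring rule (count how many distinct taxonomy terms a document mentions), under which broad documents that touch many sub-categories rise above narrow ones. The names TAXONOMY, DOCS, descendants, and rank, and all sample text, are illustrative assumptions; the slides do not state how FindUR or Verity actually computes relevance.

```python
# Hypothetical taxonomy: category -> direct sub-categories (assumed acyclic).
TAXONOMY = {
    "artificial intelligence": ["knowledge representation", "machine learning"],
    "knowledge representation": ["description logics"],
}

# Illustrative documents: title -> text.
DOCS = {
    "bibliography": "Papers on knowledge representation, description logics, "
                    "and machine learning",
    "reference manual": "Reference manual for a description logics system",
    "cafeteria menu": "Soup and sandwiches",
}


def descendants(category, taxonomy=TAXONOMY):
    """Return the category followed by all of its sub-categories, depth first."""
    found = [category]
    for child in taxonomy.get(category, []):
        for term in descendants(child, taxonomy):
            if term not in found:
                found.append(term)
    return found


def rank(documents, category, taxonomy=TAXONOMY):
    """Rank documents by how many distinct taxonomy terms they mention."""
    terms = descendants(category, taxonomy)
    scored = []
    for title, text in documents.items():
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        if hits:
            scored.append((hits, title))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


print(rank(DOCS, "artificial intelligence"))
```

Neither sample document contains the phrase "artificial intelligence", yet both are retrieved, and the general bibliography is ranked ahead of the specific reference manual.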

General Nature of Descriptions

a WINE (concept)
  general categories (superconcepts): a LIQUID, a POTABLE
  structured components (roles, with number restrictions and value restrictions):
    grape: chardonnay, ... [>= 1]
    sugar-content: dry, sweet, off-dry
    color: red, white, rose
    price: a PRICE
    winery: a WINERY
  interconnections between parts:
    grape dictates color (modulo skin)
    harvest time and sugar are related
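The parts of the WINE description (superconcepts, roles, number restrictions, value restrictions) can be mirrored in a small data structure. The sketch below is plain Python, not CLASSIC syntax; the Role and Concept classes and the violations method are hypothetical names. The grape values are left unrestricted because the slide elides them, and the PRICE and WINERY type restrictions and the interconnections between parts are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Role:
    name: str
    allowed: Optional[frozenset] = None  # value restriction; None = unrestricted
    at_least: int = 0                    # number restriction: minimum fillers


@dataclass
class Concept:
    name: str
    superconcepts: list = field(default_factory=list)
    roles: list = field(default_factory=list)

    def violations(self, individual):
        """List the restrictions that an individual (role -> fillers) breaks."""
        problems = []
        for role in self.roles:
            fillers = individual.get(role.name, [])
            if len(fillers) < role.at_least:
                problems.append(
                    f"{role.name}: needs at least {role.at_least} filler(s)"
                )
            if role.allowed is not None:
                for filler in fillers:
                    if filler not in role.allowed:
                        problems.append(
                            f"{role.name}: {filler!r} is not an allowed value"
                        )
        return problems


WINE = Concept(
    name="WINE",
    superconcepts=["LIQUID", "POTABLE"],
    roles=[
        Role("grape", at_least=1),
        Role("sugar-content", allowed=frozenset({"dry", "sweet", "off-dry"})),
        Role("color", allowed=frozenset({"red", "white", "rose"})),
    ],
)

print(WINE.violations({"grape": ["chardonnay"], "color": ["white"]}))
print(WINE.violations({"color": ["blue"]}))
```

A description logic system would also classify concepts and propagate such restrictions automatically; this sketch only checks a single individual against one description.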

URLs

• FindUR Home Page: http://www.research.att.com/~dlm/findur
• Description Logic Home Page: http://dl.kr.org/dl
• Implemented Description Logic-based systems: http://www.ida.liu.se/labs/iislab/people/patla/DL/systems.html
• The CLASSIC Knowledge Representation System: http://www.research.att.com/sw/tools/classic
• Deborah McGuinness: http://www.research.att.com/info/dlm

Observations and Needs

• Users demand “smarter search”, browsing support, and personalization
• Development demands standard platforms
• Web sites evolve to include information in many formats
• Content may be dynamically generated
• Content may be of varying form and quality (may be limited, inconsistent, tagged, formatted, etc.)
• Sites may cater to limited domains
• Background knowledge may be available



Current and Future Work

• Collaborative topic set generation
  – interactive environment for updating knowledge
  – could incorporate other techniques using IR, ML, dictionaries, etc.
  – about to be deployed on research and AT&T Solutions sites
• Knowledge representation integration for topic maintenance
  – CLASSIC maintenance on definitions, overlaps, inconsistencies
  – CLASSIC-VERITY translator
  – beyond subclass, instance, synonym relationships
  – content source description
• Evaluation and interface evolution
  – effectiveness of retrievals, relevance ranking, ease of query refinement, ease of content generation and maintenance
• Datalogs evaluation and incorporation of findings

FindUR improves search by:

• Retrieving previously missed relevant documents
• More appropriately ordering search results
• Facilitating simple user generation and refinement of queries
• Setting user expectations about the content domain
• Providing an environment to maintain evolving background knowledge sets
• Integrating multiple forms of documents
• Supporting browsing (navigation, highlighting)
• Automatically searching the correct knowledge sources (according to information about the context of the query)

Status

• Platform is robust and integrated into Business Unit endorsed environment.

• FindUR is deployed on five sites including the Summit 2005 Calendar site, The Westfield Leader (newspaper) site, and the Technical Memorandum Research database as well as Directory Westfield and the Research Web Site.

• AT&T Solutions is funding a FindUR based solution to contractor and vendor management.

• FindUR contains some topic sets that could be leveraged in other projects.

• The collaborative topic set building environment could be used independently of the rest of FindUR to generate and maintain background knowledge sets.

Extras

Editing Evidence


AT&T Labs Research Site

with Gilmore (main site) & Malkhi (tm database)

• A simple text search for artificial intelligence only retrieves listings for pages that contain the phrase “artificial intelligence.” Pages about branches of AI such as description logics typically do not contain this more general phrase.

• This results list for the phrase “artificial intelligence” misses 582 listings relevant to the concept of artificial intelligence.
