Ontology-enhanced Search for Primary Care Literature

Deborah L. McGuinness
Associate Director and Senior Research Scientist
Knowledge Systems Laboratory
Stanford University
Stanford, CA 94305
650-723-9770
[email protected]
(work supported by AT&T Labs Research, Florham Park, NJ in conjunction with NIST)
Outline
• Background and Motivation
• (Simple) Medical Applications
• Collaborative Ontology Maintenance Environment
• Discussion
Background
• Description Logics
- Co-author of widely used DL - CLASSIC
- Knowledge Sharing Committee producing KRSS
- Co-editor of forthcoming DL book
- Conceptual Modeling
- Co-organizer DL2000 (attended and/or organized since ’84)
• Research
- Making KR&R systems usable (explanation, markup languages, expressiveness and/or functionality extensions: part-of, epistemic, …)
- Collaborative ontology environments (merging, diagnostics, annotating, difference, focus of attention, libraries)
• Applications
- Configuration
- Online services (electronic yellow pages, online calendars, Healthsite, Hometown…)
- E-commerce
- Medicine
FindUR
(McGuinness et al., WWW6 ’97; McGuinness, FOIS ’98)
• Ontology-enhanced online search
• Motivated by AT&T Personal Online Services' need for “friendlier and smarter” support for browsing and search
• Exploits background knowledge and structured (or semi-structured) sites to provide query expansion in limited contexts
• Applications: yellow pages, online calendars, competitive intelligence, Worldnet homepages, TM search, customer care, medical applications, ...
Collaborators: Lori Alperin Resnick, Tom Beattie, Harley Manning, Steve Solomon, Harry Moore
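The query expansion mentioned above can be illustrated with a minimal sketch. It is not FindUR's implementation (which the slides describe as built on Verity topic sets and CLASSIC); the toy ontology, the function names `expand` and `search`, and the sample documents are all invented for illustration.

```python
# Toy ontology: every term and relation here is invented for illustration.
SYNONYMS = {
    "heart attack": {"myocardial infarction"},
}
NARROWER = {  # broader term -> narrower terms (subclasses or instances)
    "heart disease": {"heart attack", "angina"},
}


def expand(term):
    """Return the term plus its synonyms and narrower terms, followed transitively."""
    seen = set()
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SYNONYMS.get(current, ()))
        stack.extend(NARROWER.get(current, ()))
    return seen


def search(query, documents):
    """Return sorted ids of documents whose text contains any expanded term."""
    terms = expand(query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


# Hypothetical document collection.
DOCS = {
    "d1": "Managing angina in primary care",
    "d2": "Aspirin after myocardial infarction",
    "d3": "Seasonal flu vaccination schedule",
}

print(search("heart disease", DOCS))
```

None of the sample documents contains the literal phrase "heart disease", so a plain keyword match would return nothing; the expanded query reaches the two documents that use narrower or synonymous terms.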
FindUR Architecture
[Architecture diagram; the recoverable labels are listed below]
Content to Search: P-CHIP; Research Site; Technical Memorandum; Calendars (Summit 2005, Research); Yellow Pages (Directory Westfield); Newspapers (Leader); AT&T Solutions; Worldnet Customer Care
Search and Representation Technology: Content (Web Pages, Documents, Databases); Content Classification; Search Engine; Classic; Domain Knowledge; Collaborative Topic Building Tool; Verity Topic Sets
User Interface: Search Parameters; Query Input; Results (std. format); Results (domain spec.)
Verity SearchScript, Javascript, HTML, CGI
P-CHIP: Primary Care Health Information Provider
Russ Maulitz, Ihung Kyle Chang, Wes Hutchison, Eric Vogel, Bob Grealish, Nick DiCianni, George Garcia, Chris Sparks, Sudip Ghatak, …
Vision: Ubiquitous access to ever-changing documents
- Online documents
- Partially marked up data (“pearl”, author, date, …)
- Initial users: doctors; other users: health care workers, health care students, patients in waiting rooms
Traits
- Documents may not contain exact terms in queries (causing low recall)
- Sites may contain exploitable structure
- Vocabularies may vary
- Users may benefit from help forming queries
- Users may require varying granularity
- Search within contexts
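One way to picture "search within contexts" over partially marked up data is sketched below. This is an assumption-laden illustration, not the P-CHIP design: the `Doc` class, the `search_in_context` function, the reading of “pearl” as a short take-away field, and all sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    body: str
    # Marked-up fields such as "pearl", "author", "date"; markup may be partial,
    # so any of them can be missing.
    fields: dict = field(default_factory=dict)


def search_in_context(docs, term, context=None):
    """Match `term` inside one marked-up field if `context` is given, else in the body.

    Documents that lack the requested field are skipped.
    """
    term = term.lower()
    hits = []
    for doc in docs:
        text = doc.body if context is None else doc.fields.get(context, "")
        if term in text.lower():
            hits.append(doc.doc_id)
    return hits


# Invented sample documents and author names.
DOCS = [
    Doc("d1", "Notes on hypertension follow-up.",
        {"author": "Smith", "pearl": "Recheck blood pressure at every visit"}),
    Doc("d2", "Blood pressure cuffs: a buyer's guide.", {"author": "Jones"}),
]

print(search_in_context(DOCS, "blood pressure"))           # body search
print(search_in_context(DOCS, "blood pressure", "pearl"))  # restricted to pearls
```

Restricting the match to a field lets the same query term return different documents depending on the context the user selects, which is the kind of exploitable site structure the traits list refers to.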
Discussion
• Simple ontologies enhanced search and browsing experience
• Mark-up and structure can be exploited
• Critically dependent on ontologies (and their maintenance)
- Ontology environments (for naïve and advanced users)
- Validation
- (Semi)-automatic input
- Merging
• Mark-up and structure can be exploited
- Expressive markup languages
- Automatic markup support
- Markup validation tools
What is different now?
• Size
• Speed
• Ontology “pull” in the market place
• Tools for semi-automatic ontology generation and import
• Tools for automatic markup generation
• Availability of marked up data
• Commercial search support
• Research on ontology environments

Pointers
• FindUR: www.research.att.com/~dlm/findur
• CLASSIC: www.research.att.com/sw/tools/classic
• Chimaera: www.ksl.svc.stanford.edu:5915/doc/people/rice/chimaera/chimaera-movie.avi
• Deborah McGuinness: www.ksl.stanford.edu/people/dlm
Extra Slides
Acknowledgements
PCHIP: Russ Maulitz, Ihung (Kyle) Chang, Eric Vogel, Bob Grealish, Wes Hutchison, Nick DiCianni, George Garcia, Chris Sparks, Sudip Ghatak
FindUR: Lori Alperin Resnick, Tom Beattie, Harley Manning, Steve Solomon, Mark Plotnick, Dave Kormann
Applications
• P-CHIP
• Business Directories (Directory Westfield)
• Telephone Listings (Directory Westfield, Rainbow Pages (predecessor to anywho.com))
• Project Information Resource (Research)
• Public Events & News (Summit 2005, Westfield Calendar, Westfield Leader, AT&T Research)
• AT&T Solutions Vendor Management
• Network Service Realization Process Support
• AT&T Labs Industry Relations Site
• Technical Memorandum Database

FindUR: Advantages
Challenge | Non-Enhanced | Search Enhanced with Domain Knowledge
Access Large Amount of Information Easily | Publish Content on Intranet | Provide an Intuitive UI to easily find useful information
Quick Access to Available Information | Hours of Surfing; Many Retrievals to Sift Through | 1. Finds All Relevant Matches; 2. Lists Most Relevant Matches First
Facilitate Searching for Novice Users | Users Need to Know Search Terms | Relevant Terms are Pre-Defined
Create a “Learning Organization” | No way to easily share domain knowledge | Provides Collaboration Environment for Topic Building
Make Iterative Improvements to Speed of Finding Relevant Information | No Visibility to Actual Queries | Incorporates Query Logging for Machine Learning and UI Refinement
FindUR Benefits
• Retrieves documents otherwise missed
• More appropriately organizes documents according to relevance (useful for large number of retrievals)
• Browsing support (navigation, highlighting)
• Simple User Query building and refinement
• Full Query Logging and Trace
• Facilitate use of advanced search functions without requiring knowledge of a search language
• Automatically search the right knowledge sources according to information about the context of the query
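Two of the benefits above, relevance ordering and query logging, can be sketched together. The scoring scheme is an assumption made for illustration (hits on the user's own term count more than hits on expansion terms); it is not the ranking used by Verity or FindUR, and the names `rank`, `logged_search`, and `QUERY_LOG` are hypothetical.

```python
QUERY_LOG = []  # in-memory stand-in for a persistent query log


def rank(query, expanded_terms, documents, exact_weight=2.0, expanded_weight=1.0):
    """Order matching documents so that hits on the original term outrank expansion-only hits."""
    q = query.lower()
    scored = []
    for doc_id, text in documents.items():
        low = text.lower()
        score = exact_weight * low.count(q)
        score += expanded_weight * sum(low.count(t) for t in expanded_terms if t != q)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def logged_search(query, expanded_terms, documents):
    """Run a ranked search and record the query, the terms used, and the hit count."""
    results = rank(query, expanded_terms, documents)
    QUERY_LOG.append(
        {"query": query, "terms": sorted(expanded_terms), "hits": len(results)}
    )
    return results


# Invented sample documents.
DOCS = {
    "d1": "Asthma action plans for children with asthma",
    "d2": "Wheezing in toddlers",
    "d3": "Knee pain after running",
}

print(logged_search("asthma", {"asthma", "wheezing"}, DOCS))
print(QUERY_LOG)
```

A log of this shape is what later steps such as interface refinement or learning from queries would consume; the fields recorded here are only one plausible choice.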

Future Work

• Topic Set Generation
- Distributed Collaborative Topic Set Building Environment
- Use tagged content to generate candidate topic sets
- Information Retrieval (use clustering to analyze documents and suggest topic definitions)
- Machine Learning (use query logs as training data)
- Reuse topic sets for different purposes using views of knowledge
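As a rough illustration of generating candidate topic sets from tagged content, the sketch below proposes, for each tag, the words that recur across documents carrying that tag. The document-frequency threshold, the stopword list, the function name `candidate_topic_sets`, and the sample data are assumptions; the slides do not specify a method.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def candidate_topic_sets(tagged_docs, min_docs=2):
    """Map each tag to the words found in at least `min_docs` documents with that tag."""
    doc_freq = defaultdict(Counter)
    for tags, text in tagged_docs:
        # Use a set so each word counts once per document.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        for tag in tags:
            doc_freq[tag].update(words)
    return {
        tag: sorted(word for word, n in counts.items() if n >= min_docs)
        for tag, counts in doc_freq.items()
    }


# Invented tagged documents: (tags, text).
TAGGED = [
    (["cardiology"], "Chest pain and angina in adults"),
    (["cardiology"], "Angina management with nitrates"),
    (["pediatrics"], "Fever in infants"),
    (["pediatrics", "cardiology"], "Murmurs in infants"),
]

print(candidate_topic_sets(TAGGED))
```

The output is only a list of candidates; in a collaborative topic-building setting a person would still review and edit each proposed set.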


• Knowledge Representation Integration
- Use knowledge base to check definitions and determine overlaps
- Expand beyond subclass, instance, and synonym relationships and incorporate more structured information
- Maintain information about how and when to use topic information
- Maintain descriptions of content sources
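The overlap check in the first item can be approximated with plain set comparison, as in the sketch below. This is a simple stand-in, not description-logic reasoning as a system like CLASSIC would perform; the function name `topic_overlaps` and the sample topics are invented.

```python
from itertools import combinations


def topic_overlaps(topics):
    """Return (name_a, name_b, shared_terms, relation) for each pair of topics sharing terms."""
    report = []
    for (a, terms_a), (b, terms_b) in combinations(sorted(topics.items()), 2):
        shared = terms_a & terms_b
        if not shared:
            continue
        if terms_a == terms_b:
            relation = "identical"
        elif terms_a < terms_b:  # proper subset
            relation = f"{a} contained in {b}"
        elif terms_b < terms_a:
            relation = f"{b} contained in {a}"
        else:
            relation = "partial overlap"
        report.append((a, b, sorted(shared), relation))
    return report


# Invented topic definitions: topic name -> set of terms.
TOPICS = {
    "cardiac": {"angina", "heart attack", "arrhythmia"},
    "chest pain": {"angina", "heart attack"},
    "respiratory": {"asthma"},
}

for row in topic_overlaps(TOPICS):
    print(row)
```

A report like this could flag topic definitions that duplicate or subsume one another so that maintainers can decide whether to merge them.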


• Evaluation and Interface Evolution
- Evaluate effectiveness of retrievals, relevance ranking, ease of query refinement, ease of content input into category scheme
- Java-based interface for scalability, rapid changing, understandability

What is an Ontology?
[Figure: spectrum of ontology types, in order of increasing expressiveness]
Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints
Selected Experiences
• Online Configurators: PROSE/QUESTAR family of configurator applications for AT&T and Lucent
• Data Mining applications for AT&T and NCR
• Knowledge-enhanced web search: FindUR application family (electronic yellow pages, online calendars, competitive intelligence, staffing)
• Ontology mgmt applications and environments: Chimaera, Collaborative Topic Builder, e-commerce ontologies, ...
• Government ontology efforts: HPKB, intrusion detection, RKF, Army
• Commercial Search: Cisco, Worldnet
• KR&R Researcher: Description Logics, co-author of CLASSIC, explanation of reasoning, meta languages for pruning, usability issues, ontology environments
• Executive council for AAAI, Board of ontology.org, Board of Adsura.com
Ontologies - extra
• Simple ontologies can be built by non-experts
- Consider Verity’s Topic Editor, Collaborative Topic Builder, GFP interface, Chimaera, etc.
• Ontologies can be semi-automatically generated
- from crawls of sites such as yahoo!, amazon, excite, etc.
- Semi-structured sites can provide starting points
• Ontologies are exploding (business pull instead of technology push)
- most e-commerce sites are using them - MySimon, Affinia, Amazon, Yahoo! Shopping, etc.
- Controlled vocabularies (for the web) abound - SIC codes, UMLS, UN/SPSC, Open Directory, Rosetta Net
- DTDs and ontologies are a natural pairing to facilitate automatic extraction
- KM applications require them
Other Topics of Interest
• Description Logics
• Ontology Libraries
• Ontology Tools - Merging, pruning, explanation, etc.
• Representation and Reasoning applications - configuration, completing records, customer care, etc.
Ontologies and importance to E-Commerce
Simple ontologies provide:
• Controlled shared vocabulary
• Organization (and navigation support)
• Expectation setting (left side of many web pages)
• Browsing support (tagged structures such as Yahoo!)
• Search support (query expansion approaches such as FindUR, e-Cyc)
• Sense disambiguation
Ontologies and importance to E-Commerce II
• Foundation for expansion and leverage
• Conflict detection
• Completion
• Regression testing/validation/verification support foundation
• Configuration support
• Structured, comparative search
• Generalization/Specialization
…
E-Commerce Search
(starting point Forrester, modified by McGuinness)
• Ask Queries
- multiple search interfaces (surgical shoppers, advice seekers, window shoppers)
- set user expectations (interactive query refinement)
- anticipate anomalies
• Get Answers
- basic information (multiple sorts, filtering, structuring)
- modify results (user defined parameters for refining, user profile info, narrow query, broaden query, disambiguate query)
- suggest alternatives (suggest other comparable products even from competitor’s sites)
• Make Decisions
- manipulate results (enable side by side comparison)
- dive deeper (provide additional info, multimedia, other