Ontology-enhanced Search for Primary Care Literature
Deborah L. McGuinness
Associate Director and Senior Research Scientist
Knowledge Systems Laboratory
Stanford University
Stanford, CA 94305
650-723-9770
[email protected]
(work supported by AT&T Labs Research, Florham Park, NJ in conjunction with NIST)
Outline
Background and Motivation
(Simple) Medical Applications
Collaborative Ontology Maintenance Environment
Discussion
Background
Description Logics
Research
Co-author of widely used DL - CLASSIC
Knowledge Sharing Committee producing KRSS
Co-editor of forthcoming DL book
Conceptual Modeling
Co-organizer DL2000 (attended and/or org since ’84)
Making KR&R systems usable (explanation, markup languages, expressiveness and/or functionality extensions – part-of, epistemic, …)
Collaborative ontology environments (merging, diagnostics, annotating, difference, focus of attention, libraries)
Applications
Configuration
Online services (electronic yellow pages, online calendars, Healthsite, Hometown…)
E-commerce
Medicine
FindUR
(McGuinness et al., WWW6 ’97; McGuinness, FOIS ’98)
Ontology-enhanced online search
Motivated by the AT&T Personal Online Services need for “friendlier and smarter” support for browsing and search
Exploits background knowledge and structured (or semi-structured) sites to provide query expansion in limited contexts
Applications: yellow pages, online calendars, competitive intelligence, Worldnet homepages, TM search, customer care, medical applications, ...
Collaborators: Lori Alperin Resnick, Tom Beattie, Harley Manning, Steve Solomon, Harry Moore
FindUR Architecture
(Figure: FindUR architecture diagram; recoverable labels below)
Content to Search: P-CHIP; Research Site; Technical Memorandum; Calendars (Summit 2005, Research); Yellow Pages (Directory Westfield); Newspapers (Leader); AT&T Solutions; Worldnet Customer Care
Search and Representation Technology: Content (Web Pages, Documents, Databases); Content Classification; Search Engine; Classic; Domain Knowledge; Collaborative Topic Building Tool; Verity Topic Sets
User Interface: Search Parameters; Query Input; Results (std. format); Results (domain spec.); Verity SearchScript, Javascript, HTML, CGI
P-CHIP: Primary Care Health Information Provider
Russ Maulitz, Ihung Kyle Chang, Wes Hutchison, Eric Vogel, Bob Grealish, Nick DiCianni, George Garcia, Chris Sparks, Sudip Ghatak, …
Vision:
Ubiquitous access to ever-changing documents
Online documents
Partially marked up data (“pearl”, author, date,…)
Initial users: doctors; other users: health care workers, health care students, patients in waiting rooms
Traits
Documents may not contain exact terms in queries (causing low recall)
Sites may contain exploitable structure
Vocabularies may vary
Users may benefit from help forming queries
Users may require varying granularity
Search within contexts
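A minimal sketch of how background knowledge can address the low-recall and varying-vocabulary traits listed above. It is not FindUR's implementation (which the slides describe as built on Verity topic sets and CLASSIC); the `Topic` class, the function names, and the small cardiology vocabulary are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One node in a small hand-built topic hierarchy (hypothetical structure)."""
    name: str
    synonyms: list = field(default_factory=list)
    instances: list = field(default_factory=list)
    subclasses: list = field(default_factory=list)


def expand(topic):
    """Return the topic name plus its synonyms, instances, and all descendant terms."""
    terms = [topic.name, *topic.synonyms, *topic.instances]
    for child in topic.subclasses:
        terms.extend(expand(child))
    # Deduplicate case-insensitively while keeping first-seen order.
    seen, unique = set(), []
    for term in terms:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            unique.append(term)
    return unique


def build_index(topics):
    """Map every topic name and synonym (lowercased) to its Topic, recursively."""
    index = {}
    for topic in topics:
        for label in (topic.name, *topic.synonyms):
            index[label.lower()] = topic
        index.update(build_index(topic.subclasses))
    return index


def expand_query(query, index):
    """Expand a query that names a known topic; otherwise return it unchanged."""
    topic = index.get(query.strip().lower())
    return expand(topic) if topic else [query]


# Illustrative vocabulary only; not taken from the P-CHIP topic sets.
heart_attack = Topic("heart attack", synonyms=["myocardial infarction"])
arrhythmia = Topic("arrhythmia")
heart_disease = Topic("heart disease", synonyms=["cardiac disease"],
                      subclasses=[heart_attack, arrhythmia])
INDEX = build_index([heart_disease])

if __name__ == "__main__":
    print(expand_query("heart disease", INDEX))
```

A query for the broad topic picks up the narrower terms, while a query for a narrower topic expands only to its own synonyms, which is one simple way to give users varying granularity.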
Discussion
Simple ontologies enhanced search and browsing experience
Mark-up and structure can be exploited
Critically dependent on ontologies (and their maintenance)
Ontology environments (for naïve and advanced users)
Validation
(Semi)-automatic input
Merging
Mark-up and structure can be exploited
Expressive markup languages
Automatic markup support
Markup validation tools
What is different now?
Size
Speed
Ontology “pull” in the market place
Tools for semi-automatic ontology generation and import
Tools for automatic markup generation
Availability of marked up data
Commercial search support
Research on ontology environments
Pointers
FindUR: www.research.att.com/~dlm/findur
CLASSIC: www.research.att.com/sw/tools/classic
Chimaera: www.ksl.svc.stanford.edu:5915/doc/people/rice/chimaera/chimaera-movie.avi
Deborah McGuinness: www.ksl.stanford.edu/people/dlm
Extra Slides
Acknowledgements
PCHIP: Russ Maulitz, Ihung (Kyle) Chang, Eric Vogel, Bob Grealish, Wes Hutchison, Nick DiCianni, George Garcia, Chris Sparks, Sudip Ghatak
FindUR: Lori Alperin Resnick, Tom Beattie, Harley Manning, Steve Solomon, Mark Plotnick, Dave Kormann
Applications
P-CHIP
Business Directories (Directory Westfield)
Telephone Listings (Directory Westfield, Rainbow Pages (predecessor to anywho.com))
Project Information Resource (Research)
Public Events & News (Summit 2005, Westfield Calendar, Westfield Leader, AT&T Research)
AT&T Solutions Vendor Management
Network Service Realization Process Support
AT&T Labs Industry Relations Site
Technical Memorandum Database
FindUR: Advantages
Challenge | Non-Enhanced | Search Enhanced with Domain Knowledge
Access Large Amount of Information Easily | Publish Content on Intranet | Provide an Intuitive UI to easily find useful information
Quick Access to Available Information | Hours of Surfing; Many Retrievals to Sift Through | 1. Finds All Relevant Matches; 2. Lists Most Relevant Matches First
Facilitate Searching for Novice Users | Users Need to Know Search Terms | Relevant Terms are Pre-Defined
Create a “Learning Organization” | No way to easily share domain knowledge | Provides Collaboration Environment for Topic Building
Make Iterative Improvements to Speed of Finding Relevant Information | No Visibility to Actual Queries | Incorporates Query Logging for Machine Learning and UI Refinement
FindUR Benefits
Retrieves documents otherwise missed
More appropriately organizes documents according to relevance (useful for large number of retrievals)
Browsing support (navigation, highlighting)
Simple User Query building and refinement
Full Query Logging and Trace
Facilitate use of advanced search functions without requiring knowledge of a search language
Automatically search the right knowledge sources according to information about the context of the query
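To make the first two benefits concrete, here is a hedged sketch of ranking with an expanded query. The slides do not say how FindUR computed relevance, so the weighting scheme (hits on the user's own term count double) is purely an assumption, as are the function name `rank` and the three sample documents.

```python
def rank(documents, original, expanded, original_weight=2.0, expanded_weight=1.0):
    """Score each document by weighted counts of the expanded terms.

    Occurrences of the user's original term count more than occurrences of
    terms added by expansion (an assumed scheme, for illustration only).
    Returns (doc_id, score) pairs, best first; non-matching documents are dropped.
    """
    results = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        score = 0.0
        for term in expanded:
            count = lowered.count(term.lower())  # naive substring match
            if count:
                is_original = term.lower() == original.lower()
                score += (original_weight if is_original else expanded_weight) * count
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical documents, not taken from the P-CHIP collection.
DOCS = {
    "a": "Heart attack warning signs and heart attack care",
    "b": "Managing myocardial infarction in primary care",
    "c": "Seasonal allergy advice",
}

if __name__ == "__main__":
    print(rank(DOCS, "heart attack", ["heart attack", "myocardial infarction"]))
```

Document "b" never mentions the user's term and would be missed by a literal search; with expansion it is retrieved but ranked below the document that uses the exact phrase.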
Future Work
Topic Set Generation
Distributed Collaborative Topic Set Building Environment
Use tagged content to generate candidate topic sets
Information Retrieval (use clustering to analyze documents and suggest topic definitions)
Machine Learning (use query logs as training data)
Reuse topic sets for different purposes using views of knowledge
Knowledge Representation Integration
Use knowledge base to check definitions and determine overlaps
Expand beyond subclass, instance, and synonym relationships and incorporate more structured information
Maintain information about how and when to use topic information
Maintain descriptions of content sources
Evaluation and Interface Evolution
Evaluate on effectiveness of retrievals, relevance ranking, ease of query refinement, ease of content input into category scheme
Java-based interface for scalability, rapid changing, understandability
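The evaluation item above mentions effectiveness of retrievals without naming a measure. A common choice is precision and recall against a hand-judged set of relevant documents; the sketch below assumes that choice, and the document ids in it are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    Either value is reported as 0.0 when its denominator is empty.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments for one query.
RELEVANT = {"d1", "d2", "d3", "d4"}
PLAIN_RESULTS = {"d1"}                      # literal keyword search
ENHANCED_RESULTS = {"d1", "d2", "d3", "d5"}  # search with query expansion

if __name__ == "__main__":
    print(precision_recall(PLAIN_RESULTS, RELEVANT))
    print(precision_recall(ENHANCED_RESULTS, RELEVANT))
```

In this made-up case expansion raises recall from 0.25 to 0.75 at the cost of one irrelevant hit, which is the trade-off such an evaluation would need to track.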
What is an Ontology?
(Figure labels:)
Catalog/ID
Terms/glossary
Thesauri (“narrower term” relation)
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Disjointness, Inverse, part-of…
General Logical constraints
Selected Experiences
• Online Configurators: PROSE/QUESTAR family of configurator applications for AT&T and Lucent
• Data Mining applications for AT&T and NCR
• Knowledge-enhanced web search – FindUR application family: electronic yellow pages, online calendars, competitive intelligence, staffing
• Ontology mgmt applications and environments: Chimaera, Collaborative Topic builder, e-commerce ontologies, ...
• Government ontology efforts: HPKB, intrusion detection, RKF, Army
• Commercial Search - Cisco, Worldnet
• KR&R Researcher: Description Logics, co-Author of CLASSIC, explanation of reasoning, meta languages for pruning, usability issues, ontology environments
• Executive council for AAAI, Board of ontology.org, Board of Adsura.com
Ontologies - extra
Simple Ontologies can be built by non-experts
Consider Verity’s Topic Editor, Collaborative Topic Builder, GFP interface, Chimaera, etc.
Ontologies can be semi-automatically generated from crawls of sites such as Yahoo!, Amazon, Excite, etc.
Semi-structured sites can provide starting points
Ontologies are exploding (business pull instead of technology push)
most e-commerce sites are using them - MySimon, Affinia, Amazon, Yahoo! Shopping, etc.
Controlled vocabularies (for the web) abound - SIC codes, UMLS, UN/SPSC, Open Directory, Rosetta Net
DTDs and ontologies are a natural pairing to facilitate automatic extraction
KM applications require them
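As a rough illustration of how a semi-structured site can provide a starting point, the sketch below turns directory-style category paths into a nested is-a hierarchy. The crawl itself is out of scope; the path format, the function names, and the sample paths are assumptions, not data from any of the sites named above.

```python
def build_taxonomy(paths, sep="/"):
    """Build a nested dict (category -> narrower categories) from category paths."""
    root = {}
    for path in paths:
        node = root
        for part in path.split(sep):
            part = part.strip()
            if not part:
                continue  # skip empty segments such as a leading separator
            node = node.setdefault(part, {})
    return root


def narrower_terms(taxonomy, term):
    """Return the sorted direct children of `term`, or [] if absent or a leaf."""
    for name, children in taxonomy.items():
        if name == term:
            return sorted(children)
        found = narrower_terms(children, term)
        if found:
            return found
    return []


# Hypothetical category paths of the kind a directory crawl might yield.
PATHS = [
    "Health/Medicine/Cardiology",
    "Health/Medicine/Pediatrics",
    "Health/Fitness",
]
TAXONOMY = build_taxonomy(PATHS)

if __name__ == "__main__":
    print(TAXONOMY)
    print(narrower_terms(TAXONOMY, "Medicine"))
```

The result is only a candidate hierarchy; a person would still need to review it, which is why the slide says semi-automatically.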
Other Topics of Interest
Description Logics
Ontology Libraries
Ontology Tools - Merging, pruning, explanation, etc.
Representation and Reasoning applications - configuration, completing records, customer care, etc.
Ontologies and importance to E-Commerce
Simple ontologies provide:
Controlled shared vocabulary
Organization (and navigation support)
Expectation setting (left side of many web pages)
Browsing support (tagged structures such as Yahoo!)
Search support (query expansion approaches such as FindUR, e-Cyc)
Sense disambiguation
Ontologies and importance to E-Commerce II
Foundation for expansion and leverage
Conflict detection
Completion
Regression testing/validation/verification support
foundation
Configuration support
Structured, comparative search
Generalization/ Specialization
…
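One way to read the conflict detection item above is as a check against declared disjointness: if an ontology states that two categories cannot overlap, an item tagged with both is flagged. The sketch below assumes that reading; the category names and the function `find_conflicts` are hypothetical.

```python
def find_conflicts(labels, disjoint_pairs):
    """Return sorted pairs of labels that are declared disjoint yet both present."""
    conflicts = []
    for pair in disjoint_pairs:
        if pair <= labels:  # both members of the disjoint pair were assigned
            conflicts.append(tuple(sorted(pair)))
    return sorted(conflicts)


# Hypothetical disjointness declarations for a product catalog.
DISJOINT = [
    frozenset({"laptop", "desktop"}),
    frozenset({"new", "refurbished"}),
]

if __name__ == "__main__":
    print(find_conflicts({"computer", "laptop", "desktop", "new"}, DISJOINT))
```

A richer ontology would also inherit disjointness down the subclass hierarchy; this sketch checks only the pairs that are declared directly.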
E-Commerce Search
(starting point Forrester modified by McGuinness)
Ask Queries
- multiple search interfaces (surgical shoppers, advice seekers, window shoppers)
- set user expectations (interactive query refinement)
- anticipate anomalies
Get Answers
- basic information (multiple sorts, filtering, structuring)
- modify results (user defined parameters for refining, user profile info, narrow query, broaden query, disambiguate query)
- suggest alternatives (suggest other comparable products even from competitor’s sites)
Make Decisions
- manipulate results (enable side by side comparison)
- dive deeper (provide additional info, multimedia, other