THE INTERSPACE PROTOTYPE An Analysis Environment for
Technologies of the Interspace
Peer-Peer Semantic Indexing
Bruce Schatz
CANIS Laboratory
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
www.canis.uiuc.edu
Graduate School of Informatics
Kyoto University, November 21, 2001
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
SCALABLE SEMANTICS
Automatic indexing
Domain-Independent indexing
Statistical clustering
Compute Context of
concepts within documents
documents within repositories
CROSS-OVERS IN SEMANTIC INDEXING
COMPUTING CONCEPTS
’92: 4,000 (molecular biology)
’93: 40,000 (molecular biology)
’95: 400,000 (electrical engineering)
’96: 4,000,000 (engineering)
’98: 40,000,000 (medicine)
SIMULATING A NEW WORLD
Obtain discipline-scale collection
  MEDLINE from NLM, 10M bibliographic abstracts
  human classification: Medical Subject Headings (MeSH)
Partition discipline into Community Repositories
  32K nodes with core terms (classification tree)
  4 core terms per abstract for MeSH classification
  Community is all abstracts classified by a core term
Simulating World of Medical Communities
  40M abstracts across repositories (10M abstracts × 4 core terms each) containing 280M concepts
  concept spaces took 2 days on the NCSA Origin 2000
  10K repositories with > 1K abstracts (1K with > 10K)
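The partitioning step can be sketched in Python. This is a minimal illustration, assuming each abstract arrives with its core MeSH terms already attached; the record layout, the function names `partition_by_core_term` and `large_repositories`, and the sample terms are hypothetical. It shows why the repositories together hold more abstracts than the collection does: an abstract is filed under every one of its core terms, so 10M abstracts with 4 core terms each yield about 40M repository entries.

```python
from collections import defaultdict

# Hypothetical record layout: (abstract_id, core_terms).
# Real MEDLINE records carry MeSH headings; here core terms are given directly.
Abstract = tuple[str, list[str]]


def partition_by_core_term(abstracts: list[Abstract]) -> dict[str, list[str]]:
    """Build one community repository per core term.

    An abstract lands in every repository whose core term it carries, so the
    repositories together hold (number of abstracts) x (core terms per abstract)
    entries, not just the number of abstracts.
    """
    repositories: dict[str, list[str]] = defaultdict(list)
    for abstract_id, core_terms in abstracts:
        for term in set(core_terms):  # file once per distinct core term
            repositories[term].append(abstract_id)
    return dict(repositories)


def large_repositories(repos: dict[str, list[str]], min_size: int) -> list[str]:
    """Core terms whose repository holds more than min_size abstracts."""
    return sorted(term for term, ids in repos.items() if len(ids) > min_size)


if __name__ == "__main__":
    sample = [
        ("a1", ["Neoplasms", "Genes"]),
        ("a2", ["Neoplasms", "Lung"]),
        ("a3", ["Genes"]),
    ]
    repos = partition_by_core_term(sample)
    print(repos["Neoplasms"])  # ['a1', 'a2']
    print(sum(len(ids) for ids in repos.values()))  # 5 entries from 3 abstracts
    print(large_repositories(repos, 1))  # ['Genes', 'Neoplasms']
```

The size filter mirrors the slide's "10K repositories with > 1K abstracts" count, here with a toy threshold.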
COMMUNITY PROCESSING
Existing Technologies
Extracting Concepts (AI)
  Canonical noun phrases
  Generic statistical parser
Computing Context (IR)
  Co-occurrence frequency within the collection
  Useful for interactive suggestions, not as a strict ranking
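The Computing Context step reduces to counting which phrases occur together. The Python sketch below assumes noun-phrase extraction has already turned each document into a set of phrases. Its weight (documents containing both terms divided by documents containing the first term) is a simple asymmetric stand-in chosen for illustration, not the exact Interspace weighting; `concept_space` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def concept_space(docs: list[set[str]]) -> dict[str, list[tuple[str, float]]]:
    """Related terms for each term, ranked by a simple co-occurrence weight.

    docs: each document as the set of noun phrases extracted from it
    (extraction is assumed to have happened upstream).
    weight(a -> b) = (#docs containing a and b) / (#docs containing a)
    """
    df = Counter()    # document frequency of each phrase
    pair = Counter()  # co-occurrence count of each unordered phrase pair
    for phrases in docs:
        df.update(phrases)
        for a, b in combinations(sorted(phrases), 2):
            pair[(a, b)] += 1  # each unordered pair stored once

    related: dict[str, list[tuple[str, float]]] = {t: [] for t in df}
    for (a, b), n in pair.items():
        related[a].append((b, n / df[a]))
        related[b].append((a, n / df[b]))
    for term in related:
        # Highest weight first; ties broken alphabetically for stable output.
        related[term].sort(key=lambda tw: (-tw[1], tw[0]))
    return related


if __name__ == "__main__":
    docs = [
        {"gene expression", "tumor"},
        {"gene expression", "tumor", "p53"},
        {"p53", "apoptosis"},
    ]
    space = concept_space(docs)
    print(space["apoptosis"])  # [('p53', 1.0)]
    print(space["p53"])  # [('apoptosis', 0.5), ('gene expression', 0.5), ('tumor', 0.5)]
```

The asymmetry is deliberate: a rare term such as "apoptosis" points strongly at "p53", while "p53" spreads its weight across several neighbours. Lists like these suit interactive browsing even though the weights do not define a strict relevance ordering.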
CONCEPT NAVIGATION
Semantic Indexes for Community Repositories
Navigating Abstractions within Repository
concept space
category map
Interactive browsing by Community experts
(Figure panels: Category Map, Category Navigation, Concept Navigation)
CONCEPT SWITCHING
“Concept” versus “Term”
set of “semantically” equivalent terms
Concept switching
region to region (set to set) match
(Figure: a term and its semantic region, shown in two Concept Spaces)
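Region-to-region matching can be illustrated with plain set overlap. In this Python sketch, which is an assumption rather than the Interspace algorithm, a term's semantic region is the term plus its k strongest neighbours in a concept space, and switching picks the term in another community's space whose region has the highest Jaccard overlap with the source region. The `Space` layout, the value of k, and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical concept-space layout: term -> related terms, strongest first.
Space = dict[str, list[tuple[str, float]]]


def semantic_region(space: Space, term: str, k: int) -> set[str]:
    """A term plus its k most strongly related terms (illustrative definition)."""
    neighbours = [t for t, _ in space.get(term, [])[:k]]
    return {term, *neighbours}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set-to-set overlap: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(source: Space, term: str, target: Space,
                   k: int = 3) -> tuple[Optional[str], float]:
    """Find the target-space term whose region best overlaps the source region."""
    region = semantic_region(source, term, k)
    best_term: Optional[str] = None
    best_score = 0.0
    for candidate in target:
        score = jaccard(region, semantic_region(target, candidate, k))
        if score > best_score:
            best_term, best_score = candidate, score
    return best_term, best_score


if __name__ == "__main__":
    cardiology = {
        "heart attack": [("myocardial infarction", 0.9), ("chest pain", 0.7), ("troponin", 0.5)],
    }
    emergency = {
        "MI": [("myocardial infarction", 0.8), ("chest pain", 0.6), ("ECG", 0.4)],
        "fracture": [("x-ray", 0.8)],
    }
    print(switch_concept(cardiology, "heart attack", emergency))  # ('MI', 0.3333333333333333)
```

The match succeeds even though "heart attack" and "MI" share no words, because their surrounding regions overlap.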
Medicine Session
Categories and Concepts
Concept Switching
Document Retrieval
Future Technologies
Concept Switching
Path Matching
Spreading activation, similarity clusters
Aggregating indexes across many repositories
Dynamic Indexing
On-the-fly collections built during a session
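Spreading activation, listed under Path Matching, is not specified further on the slide. The Python sketch below is a generic version over a weighted concept graph: activation starts at seed terms and flows along links, attenuated by the link weight and a decay factor. The decay, step count, threshold, and the function name `spread_activation` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical graph layout: term -> list of (neighbour, link weight).
Graph = dict[str, list[tuple[str, float]]]


def spread_activation(graph: Graph, seeds: dict[str, float], decay: float = 0.5,
                      steps: int = 2, threshold: float = 0.01) -> dict[str, float]:
    """Propagate activation from seed terms along weighted concept links.

    At each step every newly activated node passes energy * weight * decay to
    its neighbours; pulses below `threshold` are dropped. Activation accumulates.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, []):
                pulse = energy * weight * decay
                if pulse >= threshold:
                    next_frontier[neighbour] += pulse
        for node, pulse in next_frontier.items():
            activation[node] += pulse
        frontier = next_frontier
    return dict(activation)


if __name__ == "__main__":
    graph = {"a": [("b", 1.0)], "b": [("c", 0.8)]}
    print(spread_activation(graph, {"a": 1.0}))  # {'a': 1.0, 'b': 0.5, 'c': 0.2}
```

Terms ranked by the resulting activation give candidate paths outward from the user's starting concepts.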
Peer-Peer Computations
Local Interaction
Global Merging
Your PC does small computations
e.g., the SETI@Home screensaver
Partition computation into small parts
Each local result forms part of the global whole
Large-Scale Distribution
3M users of SETI@Home
Public health: www.intel.com/cure
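One way to fit semantic indexing into this local-compute, global-merge pattern is to have each peer count terms and term pairs in its own shard of documents; because counts add, the global merge is a sum and matches a single-machine run. This Python sketch simulates the peers in one process. The round-robin `partition`, the function names, and the count layout are hypothetical, and network transport between peers is left out.

```python
from collections import Counter
from itertools import combinations


def local_counts(shard: list[set[str]]) -> Counter:
    """Work done on one peer: term and term-pair counts for its own documents."""
    counts = Counter()
    for phrases in shard:
        for term in phrases:
            counts[(term,)] += 1  # single-term key
        for pair in combinations(sorted(phrases), 2):
            counts[pair] += 1     # unordered pair key
    return counts


def global_merge(partials: list[Counter]) -> Counter:
    """Combine peer results; counts are additive, so merging is a plain sum."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total


def partition(docs: list[set[str]], n_peers: int) -> list[list[set[str]]]:
    """Split the collection round-robin into n_peers shards."""
    return [docs[i::n_peers] for i in range(n_peers)]


if __name__ == "__main__":
    docs = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    merged = global_merge([local_counts(s) for s in partition(docs, 3)])
    print(merged == local_counts(docs))  # True
```

Any weighting built from these counts, such as a co-occurrence concept space, can then be computed once from the merged totals.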
THE NET OF THE 21st CENTURY
Beyond Objects to Concepts
Beyond Search to Analysis
Problem Solving via Cross-Correlating Multimedia Information across the Net
Every community has its own special library
Every community does semantic indexing
Zen of Information Retrieval
Searching without Searching
Indexing without Indexing
Navigate from concepts to documents
Based on interactive recognition
Compute context on dynamic collections
Based on distributed extraction
Sharing without Sharing
Record paths during user sessions
Based on community practices