THE INTERSPACE PROTOTYPE An Analysis Environment for

Technologies of the Interspace
Peer-Peer Semantic Indexing
Bruce Schatz
CANIS Laboratory
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
www.canis.uiuc.edu, [email protected]
Graduate School of Informatics
Kyoto University, November 21, 2001
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
SCALABLE SEMANTICS

Automatic indexing
Domain-Independent indexing
Statistical clustering

Compute Context of
  concepts within documents
  documents within repositories
CROSS-OVERS IN SEMANTIC INDEXING
COMPUTING CONCEPTS
'92: 4,000 (molecular biology)
'93: 40,000 (molecular biology)
'95: 400,000 (electrical engineering)
'96: 4,000,000 (engineering)
'98: 40,000,000 (medicine)
SIMULATING A NEW WORLD
Obtain discipline-scale collection
  MEDLINE from NLM, 10M bibliographic abstracts
  human classification: Medical Subject Headings
Partition discipline into Community Repositories
  4 core terms per abstract for MeSH classification
  32K nodes with core terms (classification tree)
Community is all abstracts classified by core term
  40M abstracts containing 280M concepts
  concept spaces took 2 days on NCSA Origin 2000
Simulating World of Medical Communities
  10K repositories with > 1K abstracts (1K w/ > 10K)
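The partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not the CANIS implementation: the function name, abstract ids, and terms are hypothetical, and it assumes each abstract already carries its list of core terms. Because an abstract joins every community it is classified under, the total membership count is the number of abstracts times the terms per abstract, which is consistent with the slide's figures (10M abstracts, 4 core terms each, 40M abstracts across repositories).

```python
from collections import defaultdict


def partition_into_communities(abstracts):
    """Group abstract ids by core term.

    An abstract is placed in every community whose core term it carries,
    so one abstract can appear in several community repositories.
    """
    communities = defaultdict(set)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            communities[term].add(abstract_id)
    return dict(communities)


# Toy data: hypothetical ids and terms, not taken from MEDLINE.
abstracts = {
    "a1": ["Neoplasms", "Genes", "Mice", "Apoptosis"],
    "a2": ["Neoplasms", "Genes", "Humans", "Mutation"],
    "a3": ["Genes", "Mice", "Humans", "Mutation"],
}

communities = partition_into_communities(abstracts)

# Total memberships = abstracts x core terms per abstract (3 x 4 here).
total_memberships = sum(len(members) for members in communities.values())
```

In this toy run every abstract lands in four communities, so three abstracts yield twelve memberships.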
COMMUNITY PROCESSING
Existing Technologies
Extracting Concepts (AI)
  Canonical noun phrases
  Generic statistical parser
Computing Context (IR)
  Co-occurrence frequency, in collection
  Useful interactively, not strict ordering
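Co-occurrence context can be illustrated with a short Python sketch. It assumes concepts (noun phrases) have already been extracted per document, counts how many documents contain each pair, and ranks related concepts by raw count. The slides do not give the weighting actually used, so the raw count here is an assumption, and the function names and sample phrases are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered concept pair, the documents containing both."""
    pair_counts = Counter()
    for concepts in documents:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related_concepts(pair_counts, concept, top_n=5):
    """Rank other concepts by how often they co-occur with `concept`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return scores.most_common(top_n)


# Toy collection: each document is its list of extracted concepts.
documents = [
    ["gene expression", "protein binding", "cell cycle"],
    ["gene expression", "cell cycle"],
    ["protein binding", "drug resistance"],
]

counts = cooccurrence_counts(documents)
suggestions = related_concepts(counts, "gene expression")
```

The ranked list is meant as a set of suggestions for a person to browse, in the spirit of "useful interactively, not strict ordering".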
CONCEPT NAVIGATION
Semantic Indexes for Community Repositories
Navigating Abstractions within Repository
  concept space
  category map
Interactive browsing by Community experts
Category Map
Category Navigation
Concept Navigation
CONCEPT SWITCHING
“Concept” versus “Term”
  set of “semantically” equivalent terms
Concept switching
  region to region (set to set) match
[Figure labels: Semantic region, term, Concept Space, Concept Space]
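A region-to-region match can be sketched as a set comparison. The slides do not say which similarity measure is used; the sketch below assumes Jaccard overlap as a stand-in, and the function names, region names, and terms are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two term sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(source_region, target_regions):
    """Return (name, score) of the target region that best overlaps the source."""
    return max(
        ((name, jaccard(source_region, terms)) for name, terms in target_regions.items()),
        key=lambda pair: pair[1],
    )


# A semantic region in the source concept space (toy terms).
source = {"heart attack", "myocardial infarction", "MI"}

# Candidate regions in the target concept space (toy terms).
targets = {
    "region_a": {"myocardial infarction", "heart attack", "cardiac infarction"},
    "region_b": {"stroke", "cerebral infarction"},
}

best_name, best_score = switch_concept(source, targets)
```

Matching sets rather than single terms lets the switch succeed even when the two spaces share only some of the equivalent terms.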
Medicine Session
Categories and Concepts
Concept Switching
Document Retrieval
Future Technologies
Concept Switching
  Spreading activation, similarity clusters
Path Matching
  Aggregating indexes, many repositories
Dynamic Indexing
  On-the-fly collections, during session
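Spreading activation, named above, can be sketched generically: activation starts at seed concepts and flows along weighted links, attenuated at each hop. The decay factor, step count, graph, and function name are assumptions for illustration; the slides do not specify the variant intended.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed concepts along weighted links.

    graph maps a concept to {neighbor: link_weight}. At each step, every
    concept activated in the previous step passes energy * weight * decay
    to each of its neighbors.
    """
    activation = {concept: 1.0 for concept in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for concept, energy in frontier.items():
            for neighbor, weight in graph.get(concept, {}).items():
                passed = energy * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        # Accumulate what arrived this step into the running totals.
        for concept, energy in next_frontier.items():
            activation[concept] = activation.get(concept, 0.0) + energy
        frontier = next_frontier
    return activation


# Toy concept graph with hypothetical link weights.
graph = {
    "A": {"B": 1.0, "C": 0.5},
    "B": {"D": 1.0},
    "C": {"D": 1.0},
}

result = spread_activation(graph, ["A"])
```

Concept D is reached by two paths, so its activation is the sum of both contributions, which is how indirectly related concepts can still surface.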
Peer-Peer Computations
Local Interaction
  Your PC does small computations
  e.g. screensaver for SETI
Global Merging
  Partition computation into small parts
  Each local forms part of global whole
Large-Scale Distribution
  3M users of SETI@Home
  Public Health. www.intel.com/cure
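The local/global split can be illustrated with co-occurrence indexing, since document-level pair counts are additive across disjoint partitions. In this sketch each "peer" indexes its own slice and the partial results are summed. It is a single-process simulation under that assumption, with hypothetical function names and data, not a networked implementation.

```python
from collections import Counter
from itertools import combinations


def local_index(documents):
    """One peer's small computation: pair counts over its own documents."""
    counts = Counter()
    for concepts in documents:
        counts.update(combinations(sorted(set(concepts)), 2))
    return counts


def global_merge(partial_indexes):
    """Sum the partial counts so each local result forms part of the whole."""
    merged = Counter()
    for partial in partial_indexes:
        merged.update(partial)
    return merged


# Toy collection: each document is its list of extracted concepts.
collection = [
    ["gene", "protein", "cell"],
    ["gene", "cell"],
    ["protein", "drug"],
    ["gene", "drug"],
]

# Partition the computation into small parts, one per simulated peer.
parts = [collection[i::2] for i in range(2)]
merged = global_merge(local_index(part) for part in parts)
```

Merging the two partial indexes gives the same counts as indexing the whole collection on one machine.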
THE NET OF THE 21st CENTURY
Beyond Objects to Concepts
Beyond Search to Analysis
Problem Solving via Cross-Correlating Multimedia Information across the Net
Every community has its own special library
Every community does semantic indexing
Zen of Information Retrieval
Searching without Searching
  Navigate concepts into documents
  Based on interactive recognition
Indexing without Indexing
  Compute context on dynamic collections
  Based on distributed extraction
Sharing without Sharing
  Record paths during user sessions
  Based on community practices