2003ICADLinvited.ppt

Case Studies in the US National
Science Digital Library (NSDL):
DL-in-a-box, CITIDEL, OCKHAM
ICADL 2003, Dec. 8-11, 2003
Kuala Lumpur, Malaysia
Edward A. Fox
[email protected]
CS / DLRL, Virginia Tech, USA
http://fox.cs.vt.edu http://www.dlib.vt.edu
ACKNOWLEDGEMENTS
• Helpful sponsorship by many organizations, especially Adobe,
AOL, CONACyT, DFG, FIPSE (US Dept. Education), IBM,
Mellon, Microsoft, NSF (IIS-9986089, 0086227, 0080748,
0325579; DUE-0121679, 0136690, 0121741, 0333601),
OCLC, SOLINET, SUN, SURA, UNESCO, VTLS, many
governments (Australia, Brazil, Germany, India, …), …
• Colleagues at Virginia Tech (faculty, staff, students), and
collaborators at many universities
– Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy
Frumkin, Lee Giles, Martin Halbert, Rex Hartson, JAN Lee,
Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Layne
Watson, …
– Yuxin Chen, Fernando Das Neves, Marcos Goncalves, Rohit
Kelapure, Aaron Krowne, Ming Luo, Paul Mather, Ryan
Richardson, Rao Shen, Hussein Suleman, Wensi Xi, Baoping
Zhang, Qinwei Zhu, …
Outline
• Context
• Digital Libraries for Education (DLE)
• National Science Digital Library (NSDL)
• OAI, ODL, DL-in-a-box
• OCKHAM
• CITIDEL (incl. GrapeZone, PIPE)
• Conclusions
Information Life Cycle
(Figure: life cycle diagram with stage labels Authoring, Modifying, Using, Creating, Retention / Mining, Organizing, Indexing, Accessing, Filtering, Storing, Retrieving, Distributing, Networking)
Digital Libraries in Education
• Analytical Survey, ed. Leonid Kalinichenko
• © 2003, www.iite-unesco.org, [email protected]
• Transforming the Way to Learn
• DLs of Educational Resources & Services
• Integrated/Virtual Learning Environment
• Educational Metadata
• Current DLEs: US (NSDL, DLESE, CITIDEL,
NDLTD), Europe (Scholnet, Cyclades), UK
(Distributed National Electronic Resource)
Digital Libraries in Education - 2
• Advanced Frameworks & Methodologies
– Instructional course development with learning
module repositories, Learning Object reuse
– Community organization around DLEs
– Other content for science and research
– Cyberinfrastructure, data grids
– Curriculum-based interfaces (see Krowne et al.)
– Concept-based organization of learning
materials and courses (CMs, ontologies)
DLEs: Future Vision (p. 6)
• Global learning environment of the future:
• Student-centered
• Interactive and dynamic
• Enabling group work on real world problems
• Enabling students to determine their own
learning routes (styles, personalization)
• Supporting lifelong learning
DLEs: Objectives (p. 11)
• Long-range: lifelong/distance/anytime-anywhere
• Intermediate goals
– Support for students, teachers, parents
– Enhanced student performance
– More students excited about science
– More Internet-based science educational resources
• with increased quality and comprehensiveness,
• easy to discover and retrieve,
• preserved and universally available
DLEs: Guiding Principles (p. 12)
• Driven by educational and science needs
• Facilitating educational innovation
• Stable, reliable, permanent
• Accessible to all
• Leveraging prior research: DL, courseware, …
• Adaptable to new technologies
• Supporting decentralized services
• Resource integration thru tools/organization
NSDL Visioning: Learning Environments and
Resources Network for STEM Education
“The network is the library.”
NSDL Tracks
(Figure: concept map. NSDL Tracks include CI (Core Integration), Research, Collections, and Services; "include" links lead from tracks to the projects CITIDEL, GetSmart, and OCKHAM; "supports" links lead to Concept Maps and P2P libraries.)
Expectations of NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of
resource collection and service providers; and ensure
reliable and extensible access to and usability of the
resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of
the digital library’s content within a coherent theme /
specialty
• Services: increase the impact, reach, efficiency, and
value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact
on one or more of the other three tracks
Collections
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body
of content, but other possibilities are also
encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling,
simulation, or visualization
• Reviewed commentary on learning materials and
pedagogy
Services
• Help services, frequently asked questions, etc.
• Synchronous/asynchronous collaborative learning
environments using shared resources
• Mechanisms for building personal annotated
digital information spaces
• Reliability testing for applets or other digital
learning objects
• Audio, image, and video search capability
• Metadata system translation
• Community feedback mechanisms
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
(Figure: layered architecture diagram. Labels: User Interfaces (Portals & Clients); Core NSDL “Bus”; NSDL Collections (referenced items & collections, Special Databases); Core Collection-Building Services (metadata gathering, protocols, harvesting); Other NSDL Services; Usage Enhancement; Core CI Services (information retrieval, browsing, authentication, personalization, discussion, annotation))
OAI, ODL, DL-in-a-box
• Open Archives Initiative
– since 1999, www.openarchives.org
• Open Digital Libraries
– since 2001, from www.dlib.vt.edu
– with Hussein Suleman (now U. Cape Town)
• DL-in-a-box
– NSDL support since 2001
– Aimed to help new collections / services projects
– http://dlbox.nudl.org
Open Archives Initiative (OAI)
• Advocacy for interoperability
• Standard for transferring metadata among
digital libraries
– Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
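As a rough illustration of the harvesting idea above, the following sketch builds a PMH request URL and reads record headers from a response. The verb name and XML namespace come from the OAI-PMH 2.0 specification; the base URL, the identifiers, and the sample response are made up for illustration, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, args=None):
    """Return a PMH request URL: the archive's base URL plus verb and arguments."""
    query = {"verb": verb}
    query.update(args or {})
    return base_url + "?" + urlencode(query)


# Hand-written stand-in for a harvested response (not real data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:etd-1</identifier>
      <datestamp>2003-11-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:etd-2</identifier>
      <datestamp>2003-11-15</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_identifiers(xml_text):
    """Extract (identifier, datestamp) pairs from a ListIdentifiers response."""
    root = ET.fromstring(xml_text.strip())
    return [
        (
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]


url = build_request("http://example.org/oai", "ListIdentifiers",
                    {"metadataPrefix": "oai_dc"})
headers = parse_identifiers(SAMPLE_RESPONSE)
```

A real harvester would fetch `url` over HTTP and follow resumption tokens; both are omitted here to keep the sketch self-contained.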
OAI = Technical Umbrella for
Practical Interoperability…
(Figure: communities under the umbrella: Reference Libraries, Museums, Publishers, E-Print Archives)
…that can be exploited by different communities
OAI – Repository Perspective
Required: Protocol
(Figure: a repository shown as a set of MDO items exposed through the protocol, with DO items behind them)
OAI – Black Box Perspective
(Figure: seven open archives, OA 1 through OA 7, each drawn as a black box)
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
The World According to OAI
(Figure labels: Service Providers; Discovery; Current Awareness; Preservation; Data Providers)
(Figure sequence. Each figure shows four kinds of digital objects, Document, Program, Video, and Image, drawn as stacks of bit strings.)
(Figure: users, a "?" in place of the system, and the digital objects)
(Figure: digital library as a monolithic and/or custom-built web-based application over the digital objects)
(Figure: componentized digital library, with "?" marking each unspecified connection between components)
(Figure: open digital library, with open archives (OA) connected by PMH and XPMH links)
Protocol for Metadata Harvesting -> Extended OAI-PMH -> Open Digital Library Protocol
Open Archive -> Extended Open Archive -> Open Digital Library Component
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center
(www.cstc.org)
• Computing and Information Technology
Interactive Digital Educational Library
(www.citidel.org)
• Open Archives Distributed (NSF, DFG) –
enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Open Digital Library
• Network of Extended Open Archives where
each node acts as a provider of data,
services, or both.
• Component = Node
• Protocol = Arc
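The node-and-arc view above can be sketched as a small labelled graph. This is only an illustration of the idea, not ODL code: the class, the method names, and the example topology are assumptions made for this sketch.

```python
from collections import defaultdict


class OpenDigitalLibrary:
    """Toy model: components are nodes, protocol links are labelled arcs."""

    def __init__(self):
        self.roles = {}                 # component name -> set of roles
        self.arcs = defaultdict(list)   # consumer -> [(provider, protocol)]

    def add_component(self, name, *roles):
        # A node may provide data, services, or both.
        self.roles[name] = set(roles)

    def connect(self, consumer, provider, protocol):
        if consumer not in self.roles or provider not in self.roles:
            raise KeyError("both ends of an arc must be registered components")
        self.arcs[consumer].append((provider, protocol))

    def upstream(self, name):
        """All components that `name` depends on, directly or transitively."""
        seen, stack = set(), [name]
        while stack:
            for provider, _protocol in self.arcs[stack.pop()]:
                if provider not in seen:
                    seen.add(provider)
                    stack.append(provider)
        return seen


# Hypothetical topology: two archives merged by a union, which feeds search.
odl = OpenDigitalLibrary()
odl.add_component("ETD-1", "data")
odl.add_component("ETD-2", "data")
odl.add_component("Union", "data", "services")
odl.add_component("Search", "services")
odl.connect("Union", "ETD-1", "PMH")
odl.connect("Union", "ETD-2", "PMH")
odl.connect("Search", "Union", "XPMH")
search_sources = odl.upstream("Search")
```

Because every arc is a protocol link, swapping a component only requires that its replacement speak the same protocol on its arcs.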
Open Digital Library Components
• Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or generalized
– Union, browse, recent, filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text
• Others in process
– Classification/categorization
– Registry (and other connections with web services)
Example Open Digital Library
ETD DL for the Networked Digital
Library of Theses and Dissertations
(www.ndltd.org)
(Figure: ETD collections ETD-1 (Document), ETD-2 (Program), ETD-3 (Image), and ETD-4 (Video) connected over PMH links, through ODLUnion and Filter components, to the ODLRecent, ODLBrowse, and ODLSearch components; these supply the Recent, Browse, and Search functions of the USER INTERFACE used by students and researchers)
Open Digital Library: Extended
(Figure: extended open digital library. Components shown:
– XML File Coll. & Data Provider 1, 2, 3
– DBUnion Archive Merger Component (harvest from data providers)
– What’s New Engine (as What’s New service provider)
– IRDB-1 Search Engine (as metadata search service provider)
– DBBrowse Browse Engine (as metadata browse service provider)
– Recommend / Rate Engine (as recommend & rate service provider)
– Annotation Engine and IRDB-2 Search Engine (as annotation search service provider)
– Filter, OAI-PMH Data Provider, Submit Archive
– OAIB (NCSA: from RDBMS))
New ODL Component:
Generalized
Search Platform
CS6604 Clients: Patrick Fan, Wensi Xi
Group Members: Ming Luo, Rui Yang, Xiaoyan Yu
Introduction
• Background
– The importance of search service in a digital
library
– Problems of search engines in DLRL
• IRDB: low search effectiveness, insufficient parsing component
• ESSEX: less scalability due to in-memory index
• MARIAN: low search efficiency
Algorithms
• Phrase Searching Algorithms
– Adjacency of terms
• Ranking Functions
– Okapi (baseline)
– GP-based ranking function
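For orientation, here is a minimal sketch of an Okapi-style baseline. It assumes the baseline means Okapi BM25, which the slides do not state, and it uses common default parameters (k1 = 1.2, b = 0.75) and the non-negative IDF variant; the tokenization and the sample documents are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a BM25-style formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more matches.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "digital library search engine",
    "genetic programming ranking function",
    "search engine ranking for a digital library collection",
]
scores = bm25_scores("ranking function", docs)
```

A discovered ranking function would be judged against a baseline like this one on the same queries and relevance judgments.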
Genetic Programming (GP)
• A problem solving system designed based on
principles of evolution and heredity
(Figure: training data and relevance feedback feed Ranking Function Discovery, which outputs a ranking function f)

Input (training data)
Order  Doc.  Rele.
1      A     1
2      B     0
3      C     0
4      D     1
5      E     0
6      F     1
7      G     1

Output (ranked by function f)
Order  Doc.  Rele.
1      A     1
2      D     1
3      F     1
4      G     1
5      B     0
6      C     0
7      E     0
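One way to quantify the improvement from the input ordering to the output ordering is average precision, sketched below. The document labels and relevance values are those on the slide; using average precision as the fitness measure is an assumption, since the slides do not name one.

```python
def average_precision(relevance_in_rank_order):
    """Mean of precision values taken at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(relevance_in_rank_order, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Relevance judgments from the slide (1 = relevant, 0 = not relevant).
relevance = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0, "F": 1, "G": 1}
input_order = list("ABCDEFG")    # order before applying f
output_order = list("ADFGBCE")   # order produced by ranking function f

ap_input = average_precision([relevance[doc] for doc in input_order])
ap_output = average_precision([relevance[doc] for doc in output_order])
```

Here `ap_output` reaches 1.0 because every relevant document precedes every non-relevant one, while `ap_input` is lower; a GP search could use such a score to compare candidate functions.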
An Example of GP-based RF
Terminals:
tf: Query term frequency in the document (vector)
tf_query: Query term frequency in the query (vector)
tf_max: The maximum term frequency in a document (scalar)
Length: Document length in the number of words (scalar)
Length_avg: Average document length in the number of words (scalar)
N: Number of documents in the collection (scalar)
tf_avg: Average term frequency in the current document (scalar)
tf_avg_Col: Average term frequency for all the documents in the collection (scalar)
df_max_Col: Maximum document frequency for a word in the collection (scalar)
df: Document frequency for the query words (vector)

Discovered function (excerpt):
(log (+ (* df
(log (log (* (* (/ n df)
(* (* (/ n df)
(* (* df_max_Col tf)
(+ df_max_Col
tf_avg)))
(* (/ tf tf_max)
(log tf_avg_Col))))
(* (/ (* (* (/ n df)
(* (* df_max_Col
tf)
(+ df_max_Col
tf_avg)))
(* (/ tf tf_max)
(log
tf_avg_Col)))
(+ (* length df)
tf_avg_Col))
(log tf_avg_Col))))))
(+ (* (* df_max_Col tf)
(/ (* (* (/ (/ (* tf 6.720)
(/ df N))
(* df_max_Col tf))
(* (* tf N)
(+ df_max_Col tf_avg)))
(* (/ tf tf_max)
(log tf_avg_Col)))
(+ (* length df)
(* (* (/ tf tf_max)
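The prefix expression above is cut off in this transcript, so it cannot be evaluated as shown. The sketch below shows how such an expression tree could be evaluated in general. The small example tree is hypothetical and is not the discovered function, and the protected division and logarithm (which avoid runtime errors on zero) are assumptions often made in GP systems rather than details taken from the slides.

```python
import math


def tokenize(text):
    """Split a parenthesized prefix expression into tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn a token list into nested lists, numbers, and terminal names."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return node
    try:
        return float(token)
    except ValueError:
        return token   # a terminal name such as tf or df


def protected_div(a, b):
    return a / b if b != 0 else 1.0


def protected_log(a):
    return math.log(abs(a)) if a != 0 else 0.0


OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": protected_div,
    "log": protected_log,
}


def evaluate(node, env):
    """Evaluate a parsed tree with terminal values taken from `env`."""
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]
    op, *args = node
    return OPS[op](*(evaluate(arg, env) for arg in args))


# Hypothetical tf-idf-like tree over the terminals defined on the slide.
tree = parse(tokenize("(* (/ tf tf_max) (log (/ N df)))"))
score = evaluate(tree, {"tf": 3.0, "tf_max": 6.0, "N": 1000.0, "df": 10.0})
```

For vector-valued terminals such as tf and df, a real system would evaluate the tree once per query term and combine the results; this sketch treats them as scalars for a single term.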
Parser
• Flexibility
– TREC Style SGML/HTML
– Configurable tagging
• Abbreviation and number detection
• Case sensitive
• Phrase parsing
Interface (I)
(Figure: Servlet, JDBC, Socket, Search Engine, and Database, with arrows numbered 1 to 6)
1. Receive user query
2. Send query to search engine
3. Get ranked list
4. Search database
5. Get document information
6. Return results to user
Interface (II)
As an ODL component thru ODL’s XOAI searching protocol
(Figure: Perl Adaptor, Socket, Search Engine, and OAI data provider, with arrows numbered 1 to 6)
1. Receive user query
2. Send query to search engine
3. Get ranked list
4. Request metadata
5. Get metadata
6. Return results in format complying with ODL’s searching protocol
OCKHAM Initiative, Contact Info
• Supported by DL Federation, Mellon, NSF, …
• P2P University Network involving:
Emory, Notre Dame, U. Arizona, Virginia Tech, …
• PI: Martin Halbert
• Phone 404-727-2204
Email: [email protected]
• OCKHAM URL:
http://ockham.library.emory.edu
The Problem
• Digital library development is complex and
expensive.
• Various DL development communities (in the
USA at least) are not working together well.
• Results exhibit much incompatibility, little
common practice, slow progress, and no
leverage on investment.
• If this continues, we are just going to languish
and fester.
Lightweight Protocols
• “Lightweight”, or relatively small and
simple protocols seem to have clear
advantages over “Full” protocols that
attempt to be comprehensive.
• The success of protocols considered
lightweight is illuminating.
• Examples: TCP/IP, HTTP, LDAP, and the
OAI PMH
Reference Models
• Reference Model: a common vocabulary
and description of components, services,
and inter-relationships that comprise a
system under consideration
• Useful as a tool to foster consensus and
common understanding in a time of rapid
change and/or disagreement
• Explored in CS6604 class project with 2
focus groups: librarians, education experts
Current Focus: Peer-to-Peer (P2P)
Lightweight (Protocol) Reference Models
• Builds on successful example of the OAI
PMH, clearly understood minimalist concept
of metadata distribution, implemented in
simple protocols (e.g., ODL)
• Leads to developing simple reference
models of specific subsystems, with
associated simple protocols and standards
• Testing in NSDL, connecting university
libraries to support teaching & learning
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry – prototype in CS6604 now
• (plus others such as from adapted ODL)
Computing and Information Technology
Interactive Digital Educational Library
Assessment
Lillian Cassel
Technical Development
Content Collection
Edward Fox (director)
Community
Development
John Impagliazzo
John A. N. Lee
Manuel Pérez-Quiñones
CSTC
Search Engines
Deborah Knox
C. Lee Giles
http://www.citidel.org/
CITIDEL -> NSDL
• CITIDEL is a collection project in the
US National STEM (science, technology,
engineering, and mathematics) education
Digital Library
• NSDL (www.nsdl.org)
Multi-dimensional Categorization
(Figure: three categorization dimensions.
Quality: Peer reviewed, Editor reviewed, Nominated, Identified by crawl
Topic: Algorithms, Java, Multimedia
Language: English, Spanish)
CITIDEL: Computing & Information Technology
Interactive Digital Education Library
CITIDEL Technology Features
• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components
• Built using open standards & technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering, with multi-lingual community-based
translation of screens and content (Spanish, …)
– Perl: component integration
– ESSEX: search engine functionality
• Very fast, utilizing in-memory processing
• Includes snapshots for persistence
• Multi-scheming
– Integrates multiple classifications / views through maps, closure
Programming Team DL Project
Logan Hanks, [email protected]
Mike Scarborough, [email protected]
Stafford Fuller, [email protected]
Problem Description:
• VT has multiple programming teams, and has sent a team to the
ACM world finals every year for the past decade.
• Each week during the semester, the teams practice using a
problem set from a past regional or international contest.
• Each practice generates multiple solutions for each problem.
• What is needed is a digital library to collect these solutions and
serve as a reference.
Programming Team DL Project
Requirements:
• Importing and classifying problem statements and solutions.
• Solutions should be classified based on what algorithms and
methods they use and what problems they solve.
• Interface for browsing problem statements and their solutions.
• A search engine for finding problem statements or solutions
based on their classifications.
Deliverables:
• Problem statement and solution importer/archiver.
• Classification framework for problems and solutions.
• Search engine for the DL to locate problems and solutions by
their relevance to a set of classes given as input.
• Web interface for browsing problems and solutions as well as
accessing all of the above deliverables.
• Integration with CITIDEL.
Browsing and Searching with Filters
Users are placed in sub-communities of their choosing and can filter
results by those sub-communities, with further customization available.
Alternatively, users may view all results. Users may set up multiple
filters, simple or complex, based on factors such as education level,
role, resource type, language, and source. This lets users narrow
results to what they want and screen out what they do not. At any
time, users are free to disable these filters or see the results
excluded by them.
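The filtering behavior described above might look roughly like the sketch below. The record fields follow the factors named in the text; the class, the function, and the sample records are hypothetical and do not reflect CITIDEL's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    title: str
    education_level: str
    resource_type: str
    language: str


def apply_filters(resources, filters, enabled=True):
    """Split resources into (kept, excluded).

    `filters` maps a field name to the set of values the user accepts.
    With `enabled=False` every resource is kept, as when a user turns
    the filters off.
    """
    if not enabled:
        return list(resources), []
    kept, excluded = [], []
    for resource in resources:
        matches = all(
            getattr(resource, field) in allowed
            for field, allowed in filters.items()
        )
        (kept if matches else excluded).append(resource)
    return kept, excluded


catalog = [
    Resource("Sorting applet", "undergraduate", "applet", "English"),
    Resource("Tutorial de Java", "undergraduate", "tutorial", "Spanish"),
    Resource("Compiler thesis", "graduate", "thesis", "English"),
]
user_filters = {"education_level": {"undergraduate"}, "language": {"English"}}
kept, excluded = apply_filters(catalog, user_filters)
everything, _ = apply_filters(catalog, user_filters, enabled=False)
```

Returning the excluded list alongside the kept list is what lets an interface show users the results their filters removed.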
Searching
CITIDEL searching, which is driven by the ESSEX search engine for
relevance computation (fast, in-RAM processing with checkpoints),
also provides a list of relevant categories within the classification
schemes.
Enjoy in GrapeZone
• Derived from Carrot2 project
(http://www.cs.put.poznan.pl/dweiss/carrot/index.php/index.xml?lang=en)
• Online Grape
Cluster search results from CITIDEL
• Offline Grape
Cluster a static collection
Cluster search results from CITIDEL
Cluster a member collection (from a
content source) in CITIDEL
• The Computer Science Teaching Center
(CSTC)
• NDLTD-Computing
• ACM Digital Library
• …
Cluster CSTC
Cluster NDLTD-Computing
Cluster ACM
MOCA Algorithm
PIPE: Personalization by
Partial Evaluation
• Interactions at existing web sites are
predefined by the site designer
• Personalization is achieved by the designer’s
anticipation of users’ expectations
• PIPE allows automatic personalization of a
web site without designer anticipation
– Recognized with the 2001 New Century Technology
Council Innovation award
CITIDEL + PIPE
• Adds Interaction Personalization to CITIDEL
• Automatically handles multi-modal conversion
to cell phone, PDA, etc.
• Can be adapted to any digital data set; requires
only an XML file of the content with its
hierarchy maintained.
PIPE provides
Mixed-Initiative Interaction
• Involves an extra specification window
(e.g., a toolbar)
• system-initiated + user-initiated modes of
interaction
Traditional browser: the user merely
clicks on available hyperlinks.
PIPE window: the user can type in
any information out-of-turn
Can also mix-n-match
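The out-of-turn idea can be illustrated by treating a site as a nested hierarchy and simplifying it with respect to whatever the user supplies, at any level. This is only a toy model under that assumption; the slides do not show PIPE's implementation, and the site contents and function name here are invented.

```python
def specialize(tree, term):
    """Simplify a site hierarchy with respect to one user-supplied term.

    Branches that never mention the term are pruned, and the level where
    the term appears is removed because that choice has been made.
    Returns None when nothing in the tree matches.
    """
    if not isinstance(tree, dict):
        return None              # leaf reached without seeing the term
    if term in tree:
        return tree[term]        # choice made: splice its subtree upward
    pruned = {}
    for label, subtree in tree.items():
        kept = specialize(subtree, term)
        if kept is not None:
            pruned[label] = kept
    return pruned or None


# Hypothetical site: topic first, then language, then resources.
site = {
    "Algorithms": {
        "English": ["sorting tutorial"],
        "Spanish": ["tutorial de ordenamiento"],
    },
    "Java": {
        "English": ["applet guide"],
    },
}

in_turn = specialize(site, "Java")                    # a top-level choice
out_of_turn = specialize(site, "Spanish")             # a deeper choice first
mixed = specialize(out_of_turn, "Algorithms")         # then a top-level one
```

Clicking a hyperlink corresponds to supplying a term from the current top level; typing a term from a deeper level reaches the same content in a different order.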
Features of PIPE
• Applicable to many information system
technologies
– web sites (even third-party)
– Digital Libraries (currently working on CITIDEL
integration)
– voice-activated systems (e.g., pizza ordering, movie
information, and flight reservation services)
• PIPE is available for licensing and is ready for
commercialization, through VTIP
• PIPE has been featured in IEEE Internet
Computing, IEEE IT Professional, and the Appian
Web Personalization Report.
PIPE system architecture
Conclusions
• UNESCO analytical survey: DLE in every nation
• NSDL as an example; case studies inside it
• OAI -> ODL -> DL-in-a-box -> OCKHAM as
framework for collaboration on services
• CITIDEL to highlight NSDL collection efforts
– Many sources for computing resources
– Software deployed from above efforts, refined, and
then the results made available for reuse
– Even class projects can lead to useful DL
components!