20060823USP.ppt

Digital Libraries:
1991-2006 and beyond,
with
Electronic Theses and Dissertations
University of Sao Paulo, Brazil
23 August 2006
Edward A. Fox, [email protected], http://fox.cs.vt.edu
Professor, Department of Computer Science
Director, Digital Library Research Laboratory
Virginia Tech, Blacksburg, VA 24061 USA
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
USP, Brazil, August 2006
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves,
Shahrooz Feizabadi, Robert France, Marcos
Gonçalves, Doug Gorton, Nithiwat Kampanya, Rohit
Kelapure, S.H. Kim, Neil Kipp, Aaron Krowne, Bing
Liu, Ming Luo, Paul Mather, Uma Murthy,
Unni Ravindranathan, Ryan Richardson,
Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo
Torres, Srinivas Vemuri, Wensi Xi, Seungwon Yang,
Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich,
Joanne Eustis, Weiguo Fan, James Flanagan, C.
Lee Giles, Eberhard Hilf, John Impagliazzo, Filip
Jagodzinski, Douglas Knight, Deborah Knox,
Alberto Laender, Gail McMillan, Claudia
Medeiros, Manuel Perez-Quinones, Naren
Ramakrishnan, Layne Watson, …
Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP, USP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides, services for NDLTD)
Acknowledgements: Support
• Course: UNESCO, CETREDE, IFLA-LAC,
AUGM, CLEI, UFC
• Sponsors: ACM, Adobe, AOL, CAPES, CNI,
CONACyT, DFG, IBM, Microsoft, NASA,
NDLTD, NLM, NSF (IIS-9986089, 0086227,
0080748, 0325579; ITR-0325579; DUE-0121679,
0136690, 0121741, 0333601), OCLC, SOLINET,
SUN, SURA, UNESCO, US Dept. Ed. (FIPSE),
VTLS
Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”
Overview
• Digital Libraries: Sources, Chatham, Rome, …
• Curriculum Development Project: 5S
• NDLTD (Networked Digital Library of Theses
and Dissertations)
• Conclusions
• Challenges
Libraries of the Future
JCR Licklider, 1965, MIT Press
World
Nation
State
City
Community
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axes are Communications (bandwidth, connectivity), Computing (flops), and Digital content (less to more); the Digital Libraries technology trajectory leads toward intellectual access to globally distributed information]
Note: we should consider 4 dimensions: computing, communications, content, and community (people)
Information Life Cycle
[Figure: cycle of activities: Authoring, Modifying, Using, Creating, Retention / Mining, Organizing, Indexing, Accessing, Filtering, Storing, Retrieving, Distributing, Networking]
Sources For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994, covering since 1991)
– MIT Press: Arms, plus by Borgman, Licklider (1965)
– Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
– ECDL: www.ecdl2005.org
– ICADL: www.icadl.org
– JCDL: www.jcdl2006.org
• Associations
– ASIS&T DL SIG
– IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral consortia)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu
DL Terminology
• Digital / electronic / virtual library
• Born digital, hybrid (digital/physical)
• Universal access (all people/places/times)
– Accommodate disabilities (color, visual, auditory)
– Mobile (office, home, laptop, PDA, mobile)
• Archiving, self-archiving
• Open (source, standards, archives)
Digital Libraries Shorten the Chain
[Figure: from a chain of Editor, Reviewer, Publisher, A&I, Consolidator, Library to a Digital Library directly linking Author, Teacher, Reader, Editor, Reviewer, Learner, Librarian]
Reagan Moore, Ed Fox (June 2002, for NSF)
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Motivation for Theory, Curriculum
• Digital Libraries (DLs): what are they??
– No definitional consensus
– Conflicting views
– Makes interoperability a hard problem
• DLs are not benefiting from formal theories as are other CS
fields: DB, IR, PL, etc.
• DL construction: difficult, ad-hoc, lack of support for
tailoring/customization
• Conceptual modeling, requirements analysis, and methodological
approaches are rarely supported in DL development.
– Lack of specific DL models, formalisms, languages
DL Definitions - 1
• “A digital library is an organized and focused
collection of digital objects, including text, images,
video, and audio, along with methods of access and
retrieval, and for selection, creation, organization,
maintenance, and sharing of the collection.”
• Witten & Bainbridge – “How to Build a Digital
Library” – Morgan Kaufmann 2003
DL Definitions - 2
• “Digital libraries are organizations that provide the
resources, including the specialized staff, to select,
structure, offer intellectual access to, interpret,
distribute, preserve the integrity of, and ensure the
persistence over time of collections of digital works so
that they are readily and economically available for
use by a defined community or set of communities”
• Waters,D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and
institutions, moving them to an electronic box in a
Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using
Digital Library Content
Content Types
• Text Documents: Articles, Reports, Books
• Video
• Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users
DL Manifesto - 1
• DL Reference Model
• In support of the future European Digital Library
• Developed by team connected with DELOS
(Candela, Castelli, Ioannidis, Koutrika, Meghini,
Pagano, Ross, Schek, Schuldt)
• Draft 2.2 presented in Frascati, near Rome, June
2006 – 79 pages
• Could be integrated with work of DLF, JISC, etc.
DL Manifesto – 2: 3 Tiers
DL Manifesto – 3: Main Concepts
DL Manifesto – 4: Actor Roles
Curriculum Development Project
• Collaborative Research launched by:
– Department of Computer Science, Virginia Tech
– School of Information and Library Science,
University of North Carolina, Chapel Hill
• Three year (2006 - 2008) funded project
Project Teams/NSF Grant
• Project Team at VT (IIS-0535057):
– PI: Dr. Edward A. Fox ([email protected])
– GRA: Seungwon Yang ([email protected])
• Project Team at UNC-CH (IIS-0535060):
– Co-PI: Dr. Barbara Wildemuth
([email protected])
– Co-PI: Dr. Jeffrey Pomerantz
([email protected])
– GRA: Sanghee Oh ([email protected])
Project Links
• Homepage
http://curric.dlib.vt.edu/DLcurric.html
- Overview, proposal, progress diary, news &
interviews, contact information
• Wiki
http://curric.dlib.vt.edu/wiki
- Resources will be added here
- Coming soon
What We Do:
• Identify, develop and test educational DL
modules, guided by
- Experts and international collaborators
- Computing Curriculum 2001
- 5S framework
- Analysis of DL course syllabi
- Development of module template
Taxonomy of DL Educational Resources
Module Template (Draft)
1. Module name
2. Learning objectives
3. Level of effort required (in hours, for students, teachers)
4. Prereq knowledge required
5. Remedial materials
6. 5S characterization
7. Relationships with other modules and module topics
8. Resources (books, …)
9. Body of knowledge
• Topical outline with resources in context
• Theory and practice
• Learning activities
• Presentation materials
10. Concept maps
11. Exercises/learning activities
12. Evaluation of learning outcomes
13. Glossary
14. Useful links
CC2001 Information Management Areas
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
How to organize a DL course?
• Various frameworks
– What, Why, How
– History, Current status, Future (research)
– Economics: open source, sustainability
– Social: users/patrons, management
– Technical: HCI, HT, IR, LIS, Web
• So, we should see what is discussed
• And, we should generalize, so we have a stable
framework that is intuitive and formally based
5S Framework
• Developed at Digital Library Research Laboratory (DLRL,
Virginia Tech )
• Strong foundation for DL module development
– Intuitive as well as formal definitions
• Base ideas named with five S’s
- streams, structures, spaces, scenarios, societies
• Key aspects of DLs precisely defined using one or more of the Ss
• Set of metamodels for classes of DLs: minimal, archeological
(ETANA), practical, European DL, …
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5S Examples
5S Hypotheses
• A formal theory for DLs can be built based
on 5S.
• The formalization can serve as a basis for
modeling and building high-quality DLs.
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency graph of numbered formal definitions. Mathematical preliminaries (relation, sequence, graph, function, tuple, language, grammar, state, event, and measurable, measure, probability, vector, and topological spaces) support the 5S (streams, structures, spaces, scenarios, societies) and services. These in turn define structured stream, digital object, structural metadata specification, descriptive metadata specification, metadata catalog, collection, repository, transmission, hypertext, and the indexing, browsing, and searching services, which compose into the minimal digital library (d. 38).]
A Minimal DL in the 5S Framework
[Figure: the 5S (Streams, Structures, Spaces, Scenarios, Societies) give rise to Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, and services (indexing, browsing, searching) plus hypertext; these build Digital Object, Collection, Metadata Catalog, and Repository, which together form the Minimal DL]
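To make the composition above concrete, here is a minimal sketch in Python, assuming a toy in-memory design. The class and field names (`DigitalObject`, `MinimalDL`, `handle`, `submit`) are hypothetical illustrations, not part of the formal 5S definitions or of any 5S tool.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str      # unique identifier (hypothetical field name)
    streams: list    # raw content, e.g. text or bytes
    metadata: dict   # descriptive metadata, e.g. {"title": "..."}


@dataclass
class MinimalDL:
    # Repository: stores the digital objects of the collection by handle.
    repository: dict = field(default_factory=dict)
    # Index built by the indexing service: term -> set of handles.
    index: dict = field(default_factory=dict)

    def submit(self, obj):
        """Store an object and index the words of its title."""
        self.repository[obj.handle] = obj
        for term in obj.metadata.get("title", "").lower().split():
            self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query):
        """Searching service: handles whose title contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)

    def browse(self):
        """Browsing service: all handles in sorted order."""
        return sorted(self.repository)
```

A usage example: after submitting two objects titled "Digital Libraries" and "Digital Preservation", `search("digital")` returns both handles and `search("digital libraries")` returns only the first.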
Services Taxonomy
Infrastructure Services
– Repository-Building
• Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
• Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
– Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
[Figure: DL development workflow in four stages: Requirements (1), Analysis (2), Design (3), Implementation (4). Elements shown: 5S Meta Model, DL Expert, 5SGraph, DL Designer, 5SL DL Model, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), 5SLGen, Tailored DL Services; user roles Practitioner, Teacher, Researcher]
5SSuite: 5SGraph, 5SGen, Mapping Tool
DL Topics in 19 Modules (original)
Module Revision 3/27/06
STREAM
1. Collection Development
– Digitization
– Document and E-publishing Markup
– Harvesting
2. Digital objects/Composites/Packages
– Text Resources
– Multimedia streams/structures, Captures/representation, Compression/coding
• Content-based analysis, Multimedia indexing
• Multimedia presentation rendering
STRUCTURE
3. Metadata, Cataloging, Author submission
– Thesauri, Ontologies, Classification, Categorization
– Bibliographic information, Bibliometrics, Citations
4. Architecture (agents, buses, wrappers/mediators), Interoperability
Module Revision 3/27/06
SPACE
5. Spaces (conceptual, geographic, 2/3D, VR)
– Storage
– Repositories, Archives
SCENARIOS
6. Services (searching, linking, browsing, etc.)
– Info needs, Relevance, Evaluation, Effectiveness
– Search & search strategy, Info seeking behavior, User modeling, Feedback
– Routing, Filtering, Community filtering
– Sharing, Networking, Interchange
– Info summarization, Visualization
SOCIETIES
7. Intellectual property rights management, Privacy, Protection (watermarking) (ILS)
8. Social issues / Future DLs
9. Archiving and preservation integrity (ILS)
Ascertaining Priority Topics
• We’ve manually classified and analyzed publications
using 9-Modules (revised):
Source | Count
Proceedings, JCDL ’01 – ’05 | 354
Proceedings, ACM DL ’96 – ’00 | 189
Magazine articles, D-Lib ’95 – ’06 | 521
Session titles, JCDL, ACM DL, ECDL | 264
Distribution of Conference Papers across Module Topics
[Figure: bar chart; x-axis: Module ID (1 to 9); y-axis: Number of conference papers (0 to 200); series: ACM DL 96 through ACM DL 00 and JCDL 01 through JCDL 05]
Distribution of D-Lib Magazine Articles across Module Topics
[Figure: bar chart; x-axis: Module ID (1 to 9); y-axis: Number of D-Lib articles (0 to 200); series: D-Lib 95 through D-Lib 06]
Distribution of Session Titles across Module Topics
[Figure: bar chart; x-axis: Module ID (1 to 9); y-axis: Number of panel sessions (0 to 35); series: JCDL & ACM DL, ECDL, ICADL]
Textbook on DLs
• PI Fox, along with co-author Gonçalves, is
preparing a textbook on DLs based on 5S
• This work will rely on the 5S framework to ensure
that it provides integrated coverage of the many
concepts related to DLs
• Fox and Gonçalves are focused on a book for
teaching as well as reference
Textbook Outline
• Ch. 1: Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
Chapter 2 Overview
• Multiple media types and representation
– See ch. 4 for IR (except some here for non-text)
– Standards for each, and for some combinations
• Text
– Character strings, encoding (Unicode)
– Morphology -> Stemming
– Syntax, semantics -> stop words
• Images, Audio, Video, Graphics, Animation
– Capture, digitization, representation
– CBIR for each
Ch 3: Structures: Degrees of
Web: Chaotic
DLs: Organized
DBs: Structured
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered
• Surrogate for “real” object
– Not covered explicitly in metamodel for a minimal DL
– Crucial in metamodel for archaeology DL
Metadata Objects (MDOs)
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
Complex to Simple
+
thesis
MARC ($50)
Dublin Core (DC)
Also: Epub, SGML, XML
• 5S perspective: streams, structures, scenarios
• Authoring
• Rendering, presenting
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries
Textbook Outline
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
What is Fedora™?
Flexible Extensible Digital Object
Repository Architecture
• Slides courtesy Vinod Chachra of VTLS
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view: access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view: metadata necessary to manage the object
• Datastreams: protected view: content that makes up the “basis” of the object
– EAD, TEI, DC, MARC, VRA Core, MIX, etc.
– Images, E-books, E-journals, Music, Video, etc.
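The four parts of the digital object described above can be sketched as a small data structure. This is only an illustration under assumed names (`Datastream`, `FedoraLikeObject`, `disseminate`), not the Fedora API or its storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str       # e.g. "DC" or "IMAGE" (hypothetical identifiers)
    mime_type: str
    content: bytes


@dataclass
class FedoraLikeObject:
    pid: str                                              # persistent ID
    system_metadata: dict = field(default_factory=dict)   # internal view
    datastreams: dict = field(default_factory=dict)       # protected view
    disseminators: dict = field(default_factory=dict)     # public view: name -> callable

    def add_datastream(self, ds):
        """Attach a datastream under its own identifier."""
        self.datastreams[ds.ds_id] = ds

    def disseminate(self, method):
        """Run a named access method against this object's content."""
        return self.disseminators[method](self)
```

For example, registering a disseminator `"get_dc"` that decodes the `"DC"` datastream gives clients a public access method without exposing the stored bytes directly.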
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
[Figure: Fedora™ repository architecture. Clients (client application, batch program, web browser, server application) connect over HTTP and SOAP to the Manage, Access, and Search web services and to an OAI Provider through the web service exposure layer. Below it are the Management Subsystem (component management, object management, object validation, PID generation), the Security Subsystem (session management, user authentication, policy management, policy enforcement, users/groups), and the Access Subsystem (object reflection, object dissemination). An External Content Retriever reaches external content sources over HTTP and FTP; local and remote services are reached over HTTP and SOAP. The Storage Subsystem holds digital objects (XML files), datastreams, policies, and content, with a relational DB.]
Adapted from Slide by V. Chachra, VTLS
VITAL / Fedora Relationship
OAI - Open Archives Initiative
• Advocacy for interoperability
• Standard for transferring metadata among digital
libraries
– Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
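As a rough illustration of metadata harvesting with OAI-PMH, the sketch below builds a `ListRecords` request URL and pulls Dublin Core titles out of a response. The base URL, set name, and sample record are made-up placeholders; the verb, the `metadataPrefix` and `set` parameters, and the XML namespaces follow the OAI-PMH 2.0 and Dublin Core conventions. No network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the dc:title of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(OAI_NS + "record"):
        title = record.find(".//" + DC_NS + "title")
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


# Hand-written sample response (placeholder data, not from a real archive).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

A service provider would issue such requests against each data provider and merge the harvested records; a real harvester would also need to follow resumption tokens for large result sets, which this sketch omits.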
OAI = Technical Umbrella for
Practical Interoperability…
Reference
Libraries
Museums
Publishers
E-Print
Archives
…that can be exploited by different communities
OAI – Black Box Perspective
[Figure: open archives OA 1 through OA 7 shown as black boxes]
The World According to OAI
Service Providers
Discovery
Current
Awareness
Preservation
Data Providers
Textbook Outline
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
Quality and the Information Life Cycle
[Figure: information life cycle ring with phases Active, Semi-Active, and Inactive; activities Creation, Authoring, Modifying, Describing, Organizing, Indexing, Storing, Accessing, Filtering, Archiving, Distribution, Networking, Seeking, Searching, Browsing, Recommending, Utilization, Retention, Mining, Discard; annotated with quality dimensions Accuracy, Completeness, Conformance, Timeliness, Similarity, Preservability, Pertinence, Significance, Accessibility, Relevance]
Ellis & Kuhlthau’s Models Mapped to Info. Life Cycle
[Figure: information life cycle ring annotated with Ellis’s stages (E1: starting, E2: chaining, E3: browsing, E4: differentiating, E5: monitoring, E6: extracting) and Kuhlthau’s stages (K1: initiation, K2: selection, K3: exploration, K4: formulation, K5: collection, K6: presentation), alongside the life cycle activities and quality dimensions of the previous figure]
DL Success Constructs
DL quality
dimension
DL success
manifest variable
5S and
DL concept
DL success construct
accessibility
accuracy
completeness
consistence
conformance
pertinence
preservability
relevance
significance
similarity
timeliness
adequacy
relevance
reliability
scope
timeliness
understandability
stream, structure
digital object
metadata
collection
catalog
repository
information quality (IQ)
composability
efficiency
effectiveness
extensibility
reusability
reliability
accessibility
reliability
ease of use
joy of use
society, scenario,
space
service
system quality (SQ)
performance expectancy (PE)
DL visibility
society
social influence (SI)
NDLTD
• Networked Digital Library of Theses & Dissertations
• Members
– ~50 full members, ~200 associated members
– International (Australia, Canada, China, Germany, India, Jamaica,
Korea, South Africa, Sudan, Taiwan, Turkey, U.K., U.S.A., and
many more)
• Over 250K metadata records in Union Catalog
• URL http://www.ndltd.org
NDLTD Goals
• For Students:
– Gain knowledge and skills for the Information Age, especially about
Digital Libraries
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library – large, useful, many services
NDLTD Members - 1
Ball State University
Brigham Young University
California Institute of Technology
Consorci de Biblioteques Universitàries de Catalunya
Duke University
Georg August Universität Göttingen
George Washington University
Georgetown University
Georgia Institute of Technology
Georgia Southern University
Georgia State University
Government of Canada
Griffith University
Johns Hopkins University
Kauno Technologijos Universitetas
Louisiana State University
L'Université du Québec à Rimouski
McGill University
New Jersey Institute of Technology
North Carolina Central University
North Carolina State
Ohio University
NDLTD Members - 2
Oregon State University Library
Penn State University
Pontifícia Universidade Católica do Rio de Janeiro
Portugal National Library
Rita Chu (individual)
Simon Fraser University
State of Kansas
Texas Tech University
Universidad de las Américas, Puebla
Universität St. Gallen
University of Glasgow
University of Maine
University of Missouri
University of North Carolina Chapel Hill
University of Pittsburgh
University of Pretoria
University of Southern Florida
University of Tennessee
University of Waterloo
Virginia Tech
West Virginia University Libraries
Worcester Polytechnic Institute
Yale University
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
• Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org
How can a university get involved?
• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
• Join online, give us contact names
– www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission
Student Gets Committee
Signatures and Submits ETD
Signed
Grad School
Library Catalogs ETD, Access is
Opened to the New Research
WWW
NDLTD
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
ETD Union Collection (OAI)
[Figure: OAI data providers (Virginia Tech ETD Archive, OCLC ETD Archive, Brazil ETD Archive, …) are harvested via OAI into a Merged Metadata Collection, which feeds OAI service providers: VIRTUA, ODL (VT), and in future a recommender, …]
Union catalog: OCLC
• OCLC will expand OAI data provider on TDs.
• Is getting data from WorldCat (so, from many
sites!).
• Will harvest from all others who contact them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
OCLC SRU Interface
ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)
VTLS Union Catalog
Content Languages
• The VTLS NDLTD Union Catalog has data in 6 different languages. These are:
– English
– German
– Greek
– Korean
– Portuguese
– Spanish
• Examples follow
Language = German; hits = 137
Full record display
ETDs: Library Goals
• Improve library services
– Better turn-around time
– Always available
• Reduce work
– catalog from e-text
– eliminate handling: mailing to ProQuest,
bindery prep, check-out, check-in,
reshelving, etc.
• Save space
What are we doing?
• Aiding universities to enhance graduate
education, publishing and IPR efforts
• Helping improve the availability and content of
theses and dissertations
• Educating ALL future scholars so they can
publish electronically and effectively use digital
libraries (i.e., are Information Literate and can
be more expressive)
NDLTD Incorporation
• Networked Digital Library of Theses and Dissertations
incorporated May 20, 2003 in Virginia, USA
• Charitable and educational purposes (501 c 3)
• Officers
– Executive Director (Ed Fox)
– Secretary (Gail McMillan)
– Treasurer (Scott Eldredge)
Board of Directors (2006)
• Suzie Allard (ETD 2004, U. Kentucky)
• Denise A. D. Bedford (World Bank)
• Julia C. Blixrud (ARL, SPARC)
• José Luis Borbinha (Natl Lib Portugal)
• Alex Byrne (ETD 2005, ADT: Australia)
• Tony Cargnelutti (ETD 2005, Australia)
• Vinod Chachra (VTLS)
• William Clark (Ohio State U.)
• Susan Copeland (RGU, UK)
• Jude Edminster (Bowling Green St. U.)
• Scott Eldredge (Treasurer, ETD 2002, BYU)
• Edward A. Fox (Exec Director, Virginia Tech)
• John H. Hagen (West Virginia U.)
• Thomas B. Hickey (OCLC)
• Christine Jewell (U. Waterloo, Canada)
• Joan K. Lippincott (CNI)
• Mike Looney (Adobe)
• Austin McLean (ProQuest)
• Gail McMillan (Secretary, Virginia Tech)
• Joseph Moxley (ETD 2000, USF)
• Eva Müller (U. Uppsala, Sweden)
• Ana Pavani (PUC Rio, Brazil)
• Sharon Reeves (Natl Library Canada)
• Peter Schirmbacher (ETD 2003, Humboldt)
• Hussein Suleman (U. Cape Town, S. Africa)
• Shalini R. Urs (U. Mysore, India)
• Eric F. Van de Velde (ETD 2001, Caltech)
NDLTD Committees (Chairs)
• Awards (John Hagen)
• Conferences (Sharon Reeves)
• Development (Peter Schirmbacher)
• Executive (Edward Fox)
• Finance (Scott Eldredge)
• Implementation (Ana Pavani)
• Membership (Tony Cargnelutti)
• Nominating (Joan Lippincott)
• Standards (Thomas B. Hickey)
• Union Catalog (Vinod Chachra)
Selected Projects / Sponsors
• Australia (ADT)
• Brazil (BDT, IBICT)
• Canada
• Catalunya
• Chile (Cybertesis)
• China (CALIS)
• Germany
• India (Vidyanidhi)
• Korea
• OhioLINK: 79 colleges/univs
• Portugal (National Library)
• South Africa
• UK (British Library, JISC, Edinburgh, …)
• UNESCO (especially Latin America, Eastern Europe, Africa)
• …
Some Countries
• Australia
• Belgium
• Brazil
• Canada
• Chile
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• Greece
• India
• Italy
• Jamaica
• Korea
• Lithuania
• Malaysia
• Mexico
• Namibia
• Netherlands
• Norway
• Poland
• Russia
• Singapore
• S. Africa
• S. Korea
• Spain
• Sudan
• Sweden
• Switzerland
• Taiwan
• Thailand
• Turkey
• UK
• USA
• Venezuela
• Yugoslavia
UNESCO and ETDs
(by Axel Plathe at ETD2003)
• Promoting the use of the Internet as a tool for disseminating scientific
knowledge
• Facilitating the transfer of ETD expertise from developed to developing
countries
• 1998: Member of the NDLTD Steering Committee
• 1999: First UNESCO ETD meeting on ETD internationalisation
• 2002: “UNESCO Guide to Electronic Theses and Dissertations”
• 2003: Model training programmes and training courses
• 2003: Sponsor pilot projects
• 2003: Pilot projects (Africa, Europe, Latin-America)
Why ETD? Short Answer
• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library – large, useful, many services
• General:
– Save time and money
– Increased visibility for all associated with research results
Patrons, Queries
• User Profile Data (Oct. 2005 – May 2006)
• Online User Survey as part of User Modeling study
• Data from 1,100 users in total, including
– User survey: majors, specialties, years of experience, and
demographic information
– Tracking data: queries and detailed research interests, obtained
by a user tracking system embedded in the search user interface [4]
Categorization of Academic Subjects
• Created our own classification categories
• Based on colleges/faculties in five universities in VA
- Virginia Tech, University of Virginia, George Mason University,
VCU and Virginia State University
• Identified
- 7 categories and 77 subcategories
- Word patterns for each subcategory
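One way word patterns could drive such a categorization is sketched below. The patterns themselves are invented for illustration, since the study's actual patterns are not given in the slides; only the category names come from the presentation.

```python
import re

# Hypothetical word patterns per category (assumptions, not the study's data).
PATTERNS = {
    "Law": re.compile(r"\b(law|legal|court)\b", re.IGNORECASE),
    "Engineering and Applied Science": re.compile(
        r"\b(computer|software|electronics?)\b", re.IGNORECASE
    ),
    "Education": re.compile(r"\b(education|teaching|curriculum)\b", re.IGNORECASE),
}


def categorize(text):
    """Return the first category whose pattern matches, else the fallback."""
    for category, pattern in PATTERNS.items():
        if pattern.search(text):
            return category
    return "Others (unclassifiable)"
```

The same function can be applied both to ETD metadata and to user queries, so that supply and demand are counted against one shared set of categories.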
Categorization of Academic Subjects
• 7 categories and a selection of the 77 subcategories
Category: selected subcategories
1 Architecture and Design: ArchitectureConstruction, LandscapeArchitecture, …
2 Law: Law
3 Medicine, Nursing and Veterinary Medicine: Dentistry, Medicine, Pharmacy, Nursing, …
4 Arts and Science: Agriculture, AnimalPoultry, Biology, …
5 Engineering and Applied Science: ComputerScience, Material, Electronics, …
6 Business and Commerce: Business, Economics, Management, …
7 Education: Education
8 Others (unclassifiable)
Supply-Demand Comparison
[Figure: ETD Resources and User Demands (Number of Queries) in NDLTD. Series: ETDs, Demands. X-axis: Academic Categories 1 to 8 (1 Architecture and Design; 2 Law; 3 Medicine, Nursing and Veterinary Medicine; 4 Arts and Science; 5 Engineering and Applied Science; 6 Business and Commerce; 7 Education; 8 Others (unclassifiable)). Y-axis: 0% to 50%.]
Measuring Supply – Demand
• ETD Supply:
- Number of resources provided
- 242,688 ETDs classified into 7 categories and counted
• Patrons' Demand:
- Number of queries entered
- 4,519 queries (from 1,100 user records) classified into 7 categories
- “Sum of all queries” in each category calculated as
\[ \text{Demand of a Category} = \sum_{\text{user} \,\in\, \text{category}} \text{number of queries} \]
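The supply and demand measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each user record is assumed to carry a single category and a query count (the field names `category` and `num_queries` are invented here), and the numbers are made up rather than taken from the study.

```python
from collections import defaultdict


def demand_by_category(users):
    """Sum the number of queries over all users in each category."""
    demand = defaultdict(int)
    for user in users:
        demand[user["category"]] += user["num_queries"]
    return dict(demand)


def shares(counts):
    """Convert raw counts per category into fractions of the total."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Made-up illustration data, not figures from the study.
users = [
    {"category": "Law", "num_queries": 3},
    {"category": "Education", "num_queries": 5},
    {"category": "Law", "num_queries": 2},
]
supply = {"Law": 10, "Education": 30}  # hypothetical ETD counts per category

demand = demand_by_category(users)
demand_share = shares(demand)   # demand as a fraction of all queries
supply_share = shares(supply)   # supply as a fraction of all ETDs
```

Normalizing both sides to shares is what allows supply and demand to be plotted on the same percentage axis, even though the raw totals (ETDs versus queries) differ by orders of magnitude.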
Resource Distribution
[Figure: Resource Distribution in NDLTD. Legend: the eight academic categories, 1 Architecture and Design through 8 Others (unclassifiable).]
User Distribution
[Figure: User Distribution in NDLTD. Legend: the eight academic categories, 1 Architecture and Design through 8 Others (unclassifiable).]
Query Distribution
[Figure: Query Distribution in NDLTD. Legend: the eight academic categories, 1 Architecture and Design through 8 Others (unclassifiable).]
Supply-Demand of 77 Subcategories (1/2)
[Figure: Supply/Demand 77 Subcategories (1/2). Series: ETD Supply, User Demand. Y-axis: 0% to 12%. X-axis subcategories: AccountingFinance, Aerospace, Agriculture, AnimalPoultry, Anthropology, ApparelHousing, Archaeology, ArchitectureConstruction, Art, Astronomy, Biochemistry, BiologicalEngineering, Biology, Botany, Business, Chemical, Chemistry, Communication, ComputerScience, CropSoilEnvSciences, DairyScience, Dentistry, Ecology, Economics, Education, Electronics, EngineeringScience, English, Entomology, Environment, Family, Food, ForeignLanguagesLiteratures, Forestry, Geography, Geology, GovernmentInternationalAffair.]
Supply-Demand of 77 Subcategories (2/2)
[Figure: Supply/Demand 77 Subcategories (2/2). Series: ETD Supply, User Demand. Y-axis: 0% to 12%. X-axis subcategories: History, Horticulture, HospitalityTourism, HumanDevelopment, HumanNutritionExercise, Industrial, Informatics, Interdisciplinary, LandscapeArchitecture, Law, LibraryScience, Linguistics, Literature, Management, Materials, Mathematics, Mechanics, Medicine, Meteorology, MiningMineral, Music, Naval, Nuclear, Nursing, OceanEngineering, Pharmacy, Philosophy, Physics, Plant, Politics, Psychology, PublicAdministrationPolicy, PublicAffair, Sociology, Statistics, UrbanPlanning, Veterinary, Wildlife, Wood, Zoology.]
User Expertise Years
[Figure: Users' Expertise in Years. X-axis: Years (0 to 50). Y-axis: Users (0 to 200).]
Expertise Years and Demand
[Figure: Expertise Years and Demand. Series: Users, Demand. X-axis: Years (0 to 50, plus an "error" bin). Y-axis: 0% to 25%.]
Date Stamp of ETD
[Figure: Date Stamp of ETD. X-axis: Year (17xx, 18xx, 190x to 198x by decade, 1990 to 2006 by year, plus an "error" bin). Y-axis: 0 to 60,000.]
Future Work
• Use a widely used classification system
- e.g., Dewey Decimal Classification 22 ($375)
• More detailed classification of ETDs
- Include title, abstract and other subject field data
- Approx. 7,000 ETDs are available in oai_etdms as well as oai_dc
→ Utilize “discipline” in oai_etdms format records
• Use of user activity data
- e.g., Clicking of query results in NDLTD
• Visualization of NDLTD use and its community
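To make the idea of using the “discipline” field more concrete, the following Python sketch pulls discipline values out of an ETD-MS style record. The sample record, its namespace URI, and the placement of the discipline element under a degree element are assumptions made for illustration; real oai_etdms records harvested from a repository may be laid out differently, so the parser matches on the local element name only.

```python
import xml.etree.ElementTree as ET

# Illustrative record, not taken from NDLTD; the element layout and
# namespace URI are assumptions for this sketch.
SAMPLE_RECORD = """\
<thesis xmlns="http://www.ndltd.org/standards/metadata/etdms/1.0/">
  <title>An Example Thesis</title>
  <degree>
    <name>PhD</name>
    <discipline>Computer Science</discipline>
  </degree>
</thesis>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def disciplines(xml_text):
    """Return the text of every non-empty 'discipline' element, in any namespace."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "discipline" and element.text
    ]
```

The extracted discipline strings could then be fed to the same category mapping used for queries, instead of relying on title keywords alone.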
Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• Scalability, sustainability, interoperability
• DL industry - critical mass by covering libraries,
archives, museums, corporate info, govt info, personal
info - “quality WWW” integrating IR, HT, MM, ...
– Need tools & methods to make them easier to build