20060917ECDLtutorialFoxSlides.ppt

ECDL 2006 Tutorial
(Alicante, Spain – 17 Sept 2006)
“Learning/Teaching DLs: An Overview”
Based on Draft Book
“Foundations for Information Systems:
Digital Libraries and the 5S Framework”
by Edward A. Fox and
Marcos André Gonçalves
• See content of Preface in the next slides.
• See table of contents / outline, and then
corresponding content, following.
1
Acknowledgements (selected)
• Colleagues: Lillian Cassel, Debra Dudley,
Weiguo Fan, Marcos Gonçalves, Doug Gorton,
Rohit Kelapure, Neill Kipp, Aaron Krowne, Ming
Luo, Uma Murthy, Manuel Perez, Ananth
Raghavan, Rao Shen, Hussein Suleman,
Srinivas Vemuri, Layne Watson, …
• Sponsors: ACM, AOL, CAPES, DFG, IBM,
Microsoft, NSF (IIS-9986089, 0086227,
0080748, 0325579, 0535057, 0535060; ITR-0325579; DUE-0121679, 0136690, 0121741,
0333601), SUN, …
Introductions
• Country, City, Languages you speak
• Main discipline of training
• # of digital libraries (DLs) used:
– Please list:
• # of DL conferences attended? ECDLs?
• Why taking this course?
• Your goals for today?
3
For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
– MIT Press: Arms, plus books by Borgman, Licklider (1965)
– Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
– ECDL: www.ecdl2006.org
– ICADL: http://www.icadl.org
– JCDL: www.jcdl2006.org
• Associations
– ASIS&T DL SIG
– IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral
consortia)
• NSF: www.dli2.nsf.gov
4
• Labs: VT: www.dlib.vt.edu
5
DL Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• Scalability, sustainability, interoperability
• DL industry - critical mass by covering
libraries, archives, museums, corporate info,
govt info, personal info - “quality WWW”
integrating IR, HT, MM, ...
– Need tools & methods to make them easier to
build
DL Challenges – 2: Terminology
• Digital / electronic / virtual library
• Born digital, hybrid (digital/physical)
• Universal access (all people/places/times)
– Accommodate disabilities (color, visual, auditory)
– Mobile (office, home, laptop, PDA, mobile)
• Archiving, self-archiving
• Open (source, standards, archives)
7
How to organize a DL course?
• Various frameworks
– What, Why, How
– History, Current status, Future (research)
– Economics: open source, sustainability
– Social: users/patrons, management
– Technical: HCI, HT, IR, LIS, Web
• Suggest that concept maps be drawn by
readers to help in working with this book
• Instructors can access “expert” maps with
IHMC tools
8
CC2001 Information Management Areas
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
* Core components
9
DL Curriculum Framework
(Diagram labels: course structure, core DL topics, related topics)
Semester 1:
DL collections:
development/creation
Digitization
Storage
Interchange
Metadata
Cataloging
Author
submission
Digital objects
Composites
Packages
Semester 2:
DL services and
sustainability
Architectures
(agents, buses,
wrappers/mediators)
Interoperability
Spaces
(conceptual,
geographic,
2/3D, VR)
Documents
E-publishing
Markup
Multimedia
streams/structures
Capture/representation
Compression/coding
Bibliographic
information
Bibliometrics
Citations
Content-based
analysis
Multimedia
indexing
Naming
Repositories
Archives
Services
(searching,
linking,
browsing, etc.)
Archiving and
preservation
Integrity
Architectures
(agents, buses,
wrappers/mediators)
Interoperability
Thesauri
Ontologies
Classification
Categorization
Multimedia
presentation,
rendering
Info. Needs
Relevance
Evaluation
Effectiveness
Intellectual property
rights mgmt.
Privacy
Protection (watermarking)
Routing
Filtering
Community
filtering
Search & search strategy
Info seeking behavior
User modeling
Feedback
Info
summarization
Visualization
10
Book Parts
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
11
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
12
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
13
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
14
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
15
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das
Neves, Shahrooz Feizabadi, Robert
France, Marcos Gonçalves, Nithiwat
Kampanya, S.H. Kim, Aaron Krowne, Bing
Liu, Ming Luo, Paul Mather, Unni
Ravindranathan, Ryan
Richardson, Rao Shen, Ohm Sornil,
Hussein Suleman, Ricardo Torres, Wensi
Xi, Baoping Zhang, Qinwei Zhu, …
16
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger
Ehrich, Joanne Eustis, Weiguo Fan,
James Flanagan, C. Lee Giles, Eberhard
Hilf, John Impagliazzo, Filip Jagodzinski,
Rohit Kelapure, Neill Kipp, Douglas
Knight, Deborah Knox, Aaron Krowne,
Alberto Laender, Gail McMillan, Claudia
Medeiros, Manuel Perez, Naren
Ramakrishnan, Layne Watson, …
17
Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides on digital repositories, NDLTD)
18
Acknowledgements: Support
• Course: UNESCO, CETREDE, IFLALAC, AUGM, CLEI, UFC
• Sponsors: ACM, Adobe, AOL, CAPES,
CNI, CONACyT, DFG, IBM, Microsoft,
NASA, NDLTD, NLM, NSF (IIS-9986089,
0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690,
0121741, 0333601), OCLC, SOLINET,
SUN, SURA, UNESCO, US Dept. Ed.
(FIPSE), VTLS
Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”
20
Chapter 1 - Introduction
21
Chapter 1 Overview
• Why do we need this book?
• What are digital libraries (DLs)?
• Why is 5S helpful in a DL book?
• How do digital libraries work?
• History: Memex, 1990s, proliferation
• Related areas: LIS, linguistics, IR, AI, DBs,
knowledge management, content
management, probability/statistics
22
Synchronous
Scholarly Communication
Same time, Same or different place
23
Asynchronous, Digital Library
Mediated Scholarly Communication
Different time and/or place
24
DL Overview
Why of Global Interest?
• National projects can preserve antiquities and
heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to
economic and technological growth, education
• DL - a domain for international collaboration
– wherein all can contribute and benefit
– which leverages investment in networking
– which provides useful content on Internet & WWW
– which will tie nations and peoples together more
strongly and through deeper understanding
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S:
Table of related areas and their coverage
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
Libraries of the Future
JCR Licklider, 1965, MIT Press
World
Nation
State
City
Community
27
Locating Digital Libraries in Computing and
Communications Technology Space
[Figure: axes are Computing (flops), Communications (bandwidth, connectivity), and Digital content (less to more). The digital libraries technology trajectory leads toward intellectual access to globally distributed information.]
Note: we should consider 4 dimensions:
computing, communications,
content, and community (people)
AmericanSouth.Org – Roles, Content
SOLINET
Libraries (Data Providers)
Scholars
Intellectual Organization
Controlled vocabulary
Metadata extension development
Collection Decisions
Selection Criteria
Selection Criteria
Controlled vocabulary
Central Server Maintenance
Local Server Maintenance
Provision of Context
Metadata Repository
Metadata Creation/Maintenance
Organizational Structure and
Annotation Tools
Central Interface Design/Maintenance
Local Interface Design/Maintenance
Selection of Other
Annotation
Tools
Central Indices Creation/Maintenance
Local Indices
Selection of Thesauri
Coordination of Metadata Gateway
Development
Gateway Implementation
Concept Mapping
Digital Objects
29
Information Life Cycle
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries:
http://www-lis.gseis.ucla.edu/DL/
30
Information Life Cycle
Authoring
Modifying
Using
Creating
Retention
/ Mining
Organizing
Indexing
Accessing
Filtering
Storing
Retrieving
Distributing
Networking
31
Digital Libraries
Shorten the Chain from
Editor
Reviewer
Publisher
A&I
Consolidator
Library
32
DLs Shorten the Chain to
Author
Teacher
Digital
Reader
Editor
Reviewer
Learner
Library
Librarian
33
Reagan Moore, Ed Fox (June 2002, for NSF)
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Crosscutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
34
Running Examples
– Institutional repositories linked worldwide
• Building from partial vs. complete
coverage
• Partial1: ETDS -> NDLTD
• Partial2: Courseware -> NSDL -> DLEs
– Archaeological information worldwide
• Sites -> ETANA -> …
35
Motivation
• Digital Libraries (DLs): what are they??
– No definitional consensus
– Conflicting views
– Makes interoperability a hard problem
• DLs are not benefiting from formal theories as are
other CS fields: DB, IR, PL, etc.
• DL construction: difficult, ad-hoc, lack of support
for tailoring/customization
• Conceptual modeling, requirements analysis, and
methodological approaches are rarely supported in DL
development.
– Lack of specific DL models, formalisms, languages
36
DL Definitions - 1
• “A digital library is an organized and
focused collection of digital objects,
including text, images, video, and audio,
along with methods of access and
retrieval, and for selection, creation,
organization, maintenance, and sharing of
the collection.”
• Witten & Bainbridge – “How to Build a
Digital Library” – Morgan Kaufmann 2003
37
DL Definitions - 2
• “Digital libraries are organizations that
provide the resources, including the
specialized staff, to select, structure, offer
intellectual access to, interpret, distribute,
preserve the integrity of, and ensure the
persistence over time of collections of
digital works so that they are readily and
economically available for use by a defined
community or set of communities”
• Waters,D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
38
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
39
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing
systems and institutions, moving them to
an electronic box in a Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using
40
Digital Library Content
Content types:
– Text documents: articles, reports, books
– Video
– Audio: speech, music
– Geographic information: (aerial) photos
– Software, programs: models, simulations
– Bio information: genome (human, animal, plant)
– Images and graphics: 2D, 3D, VR, CAT
41
Content Area Description
Audio
Digital
Finding
Aid
MSS
Other
Photo
Video
MF
Print
Total
African-American cultural life
6
4
6
9
4
12
3
10
18
72
Agricultural crisis of late 19th century
1
1
3
1
1
4
8
19
Codification of segregation laws
1
3
2
1
8
16
Configuration of white supremacy
1
3
3
1
9
20
Cultural values and activities
3
5
17
4
15
1
5
20
71
Disenfranchising movements
1
2
2
1
2
1
6
15
Educational movements
6
1
18
6
21
3
27
98
1
1
7
10
1
1
Emergence of Holiness & Pentecostal Groups
Emergence of new musical forms
3
…
Total Each Format
3
2
…
…
…
41
14
51
5
1
1
Emergence of organized groups expressing
farmers concerns
1
1
1
2
8
2
1
8
13
… … … … … … …
161
38
133
13
79
301
831
42
Outline
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
43
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
44
5S Layers | 5 Elements
Societies | Fire
Scenarios | Wood
Spaces | Earth
Structures | Metal
Streams | Water
45
Hypotheses
• A formal theory for DLs can be built
based on 5S.
• The formalization can serve as a
basis for modeling and building high-quality DLs.
46
Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss
and high-level DL concepts?
4. How can we allow digital librarians to easily express
those relationships?
5. Which are the fundamental quality properties of a DL?
Can we use the formalized DL framework to
characterize those properties?
6. Where in the life cycle of digital libraries can key aspects
of quality be measured and how?
47
5Ss
Ss | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
48
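The 5S table can be held as a small data structure, which is one way to make the informal definitions easier to query in an exercise. The sketch below is only an illustration: the class name `SConstruct`, the tuple `FIVE_S`, and the helper `constructs_for` are hypothetical names, and the example lists are abbreviated from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SConstruct:
    """One row of the 5S table: name, sample instances, and objective."""
    name: str
    examples: tuple
    objective: str


# Rows abbreviated from the slide; wording of objectives is shortened.
FIVE_S = (
    SConstruct("Streams", ("text", "video", "audio", "image"),
               "Describes properties of the DL content"),
    SConstruct("Structures",
               ("collection", "catalog", "hypertext", "document", "metadata"),
               "Specifies organizational aspects of the DL content"),
    SConstruct("Spaces",
               ("measure", "measurable", "topological", "vector", "probabilistic"),
               "Defines logical and presentational views of DL components"),
    SConstruct("Scenarios", ("searching", "browsing", "recommending"),
               "Details the behavior of DL services"),
    SConstruct("Societies", ("service managers", "learners", "teachers"),
               "Defines managers, actors, and relationships among them"),
)


def constructs_for(example: str) -> list:
    """Return the names of the Ss whose example list mentions `example`."""
    return [s.name for s in FIVE_S if example.lower() in s.examples]


print(constructs_for("catalog"))
```

A lookup such as `constructs_for("browsing")` shows which S a given DL concept was filed under in the table.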
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: diagram showing how each definition builds on others. Definitions shown, with numbers as given on the slide:]
relation (d. 1), function (d. 2), sequence (d. 3), tuple (d. 4)*, language (d. 5), graph (d. 6), grammar (d. 7)
measurable (d. 12), measure (d. 13), probability (d. 14), vector (d. 15), topological (d. 16) spaces
event (d. 10), state (d. 18)
5S: streams (d. 9), structures (d. 10), spaces (d. 18), scenarios (d. 21), societies (d. 24)
services (d. 22), transmission (d. 23)
structural metadata specification (d. 25), descriptive metadata specification (d. 26)
structured stream (d. 29), digital object (d. 30), collection (d. 31), metadata catalog (d. 32), repository (d. 33)
indexing service (d. 34), searching service (d. 35), hypertext (d. 36), browsing service (d. 37)
digital library (minimal) (d. 38)
49
50
ETANA-DL
• Archaeological DL
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata
Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
51
Initial ETANA-DL Member Locations
Canadian University College
Andrews University
CWRU
Walla Walla College
Willamette University
Virginia Tech
Vanderbilt University
Mississippi State University
Map courtesy: www.enchantedlearning.com
52
53
54
Lahav Website
55
Megiddo Opening Screen
56
Locus Screen:
Pictures
View all
57
Area Screen
58
59
ETANA-DL Approach
• Applying and extending Digital Library (DL)
techniques to solve key problems: making primary
data available, data preservation, and interoperability
• Modeling archaeological information systems using
5S to better understand the domain and design the
system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous
archaeological data using componentized
frameworks:
– eliciting requirements
– refining metamodel and union schema
– modeling sites
– mapping
– harvesting
– providing useful services
60
ETANA-DL Website
61
Marking – writing
notes for
a specific user
Marking Items
62
Sender, Date,
Object OAI ID
Sender
Comments
Options:
View Record,
Add record to Items Of Interest,
Re-mark item (Redirect),
Unmark item (Remove item from list)
Marked Items Display
63
Discussions
about an
object
View/Post
messages,
create new
threads
Discussions Page
64
Items recommended
on the basis of
similar interests
Recommendations
65
ETANA-DL Searching Service
Search
66
ETANA-DL Multi-dimensional Browsing
3 new sites
2 new types of artifacts
67
ETANA-DL Visual Browsing Service
By site
Visual Browse
68
Visual Browsing Nimrin:
Topographical Drawings
Square:
N40/W20
Full site
North west quadrant
69
Visual Browsing Nimrin : Square information
Square:
N40/W20
Locus: 86
Loci layout
70
Visual Browsing Nimrin : locus sheet
71
Visual Browsing
Bab edh-Dhra'
Cemetery
Pottery # 25
72
Visual Browsing
Bab edh-Dhra'
Cemetery
Pottery # 25
73
ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork
settings, or local and national governmental
bodies)
3. Project directors
4. Technical staff (consisting of photographers,
technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of
excavation)
6. Camp staff (e.g., camp managers, registrars, tool
stewards)
7. General public (e.g., educators, learners, citizens)
74
ETANA Societies
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they
represent?
4. Who has publication rights?
5. What interactions took place between those
at the site studied, and others? What
theories are proposed by whom about this?
75
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building
surveys, consulting historical and other documentary sources, and
managing the sites and monuments
4. Excavation
1. Detailed information is recorded, including for each layer of soil, and for
features such as pole holes, pits, and ditches.
2. Data about each artifact is recorded together with information about its
exact find spot.
3. Numerous environmental and other samples are taken for laboratory
analysis, and the location and purpose of each is carefully recorded.
4. Large numbers of photographs are taken, both general views of the
progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public
76
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by
archaeologists)
3. Metric or vector spaces
1. used to support retrieval operations, and to
calculate distance (and similarity)
2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and
visualize archaeological ruins
5. 2D interfaces for human-computer interaction
77
ETANA Structures
1. Site Organization
1. Region, site, partition, sub-partition, locus,
…
2. Temporal orderings (ages, periods)
3. Taxonomies
1. for bones, seeds, building materials, …
4. Stratigraphic relationships
1. above, beneath, coexistent
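The stratigraphic relationships listed above can be treated as a binary relation over loci. The following is a minimal sketch under stated assumptions: the locus identifiers are invented, the function names `beneath` and `layers_below` are hypothetical, and treating "above" as transitive is an assumption made for illustration, not something the slides state.

```python
def beneath(above_pairs):
    """Derive the 'beneath' relation as the inverse of 'above'."""
    return {(lower, upper) for upper, lower in above_pairs}


def layers_below(above_pairs, locus):
    """Collect every locus directly or indirectly beneath `locus`.

    Assumes 'above' is transitive: if A is above B and B is above C,
    then A is above C.
    """
    found, frontier = set(), [locus]
    while frontier:
        current = frontier.pop()
        for upper, lower in above_pairs:
            if upper == current and lower not in found:
                found.add(lower)
                frontier.append(lower)
    return found


# Hypothetical loci: each pair (upper, lower) means upper lies above lower.
above = {("locus-1", "locus-2"), ("locus-2", "locus-3")}
print(sorted(layers_below(above, "locus-1")))
```

Storing only the "above" pairs and deriving "beneath" keeps the two relations consistent by construction.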
78
ETANA Streams
1. successive photos and drawings of
excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation
activities and discussions
3. textual reports
4. 3D models used to reconstruct and
visualize archaeological ruins.
79
Exercise 1
• Form groups of 2.
• Select a digital library you wish to build,
improve, or study.
• As was done for ETANA, discuss it using
the 5S perspective.
• Present a summary to the class and lead a
discussion.
80
Chapter 2 Overview
• Multiple media types and representation
– See ch. 4 for IR (except some here for non-text)
– Standards for each, and for some combinations
• Text
– Character strings, encoding (Unicode)
– Morphology -> Stemming
– Syntax, semantics -> stop words
– ** POS tagging, phrases
• Images, Audio, Video, Graphics, Animation
– Capture, digitization, representation
– CBIR for each
• ** Compression, processing, analysis
• **Synchronization, rendering, presentation, interchange
– RealVideo, SMIL, QoS
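The text-processing items in this overview (stemming, stop words) can be illustrated with a toy pipeline. This is a rough sketch, not a real stemmer: the stop-word list is a small illustrative sample, the suffix stripping is deliberately crude and is not the Porter algorithm, and the names `naive_stem` and `index_terms` are hypothetical.

```python
# Illustrative stop-word sample; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in"}

# Crude suffix list, checked in order; NOT the Porter stemmer.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def index_terms(text):
    """Lowercase, drop punctuation and stop words, then stem each token."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]


print(index_terms("Searching the libraries of images"))
```

The resulting stems need not be dictionary words; they only need to map related word forms to the same index term.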
82
Integrated CCLINC
Translingual Information System
DARPA
CCLINC
SERVER
Translation
It seems that North Korea launch a missile again
After North Korea launched a Daipodong missile
last month, NK is perceived to proceed to an additional
test launch. Korea, US and Japan enter into an alert
state, and prepare for a joint response policy. Korea
estimates that the additional launch will be on 09/05.
Japan estimates that NK’s missile range is short. US
information says that there is no sign of launch yet.
83
Structured Video Browser
(making video into hypermedia)
www.learn.umd.edu
• IBrowse
• Expository multimedia
• Narrative Structures
84
 MPEG7
MPEG-7 Video Library Systems Tech.
Video Library Systems Tech.
Architecture
Video Data
Description Generator
Description
Scheme
Description Schemes
Design Tool
Player
Video
Database
Retrieval Server
Module
Presentation Module
Meta
Database
and Communication
85
ICU Information
University
Tides in Early Texas History –
Stephen F. Austin University
86
VITAL Web Portal
Clicking on the thumbnail image from
this screen will launch the VITAL Hi-Res Image Navigator – a tool which
provides for detailed examination of
these wavelet compressed image files
Institutions have considerable flexibility in
the way they present their collections – the
examples here show two different
approaches to presenting EAD (Encoded
Archival Description) metadata objects
87
VITAL Web Portal
MrSID and JPEG2000 wavelet
compressed images can be stored in
the repository and displayed to the
user via the integrated VITAL Hi-Res
Image Navigator
88
The AMICO Library™
89
VITAL Web Portal
The AMICO Library in VITAL
90
Implementation Options
The Fedora™
package
Fedora™ open
source software
(free)
VTLS installation,
training, and
support
91
Implementation Options
 The Full VITAL package
 Fedora™ open source
software (free)
 VTLS software and
hardware extensions,
with features and
workflows
 VTLS installation,
training, support,
integration and
documentation
92
Implementation Options
 VITAL Hosted Solution
 VTLS provides ASP
services for your digital
collections
 VTLS Professional Digital
Imaging Services
 Imaging services and
project consulting can be
combined with any of the
above packages to provide
a solution tailored to your
needs
93
DL Student Research: Torres
• Search in collections of fish images
• using combination of
• image properties (CBIR) and
• textual descriptions
94
Textual information retrieval
Query on Google using Sunset and Rio de Janeiro
Query
result
95
Content Based
Information
Retrieval
96
Torres: Visualizations
Concentric Rings Pattern
Spiral Pattern
97
Chapter 3 Overview
• Digital Objects
– Documents, digitization, packaging (METS), interchange,
standards, format conversion
– Genre: plays, encyclopedia, dictionaries, educational resources:
courses (e.g., syllabi) and lessons
– Structural organizations (books, chapters, sections),
excerpts/spans (mark, superimposed info)
• Metadata: standards, markup
• Knowledge Structures & Representations
– Databases, Schema, Ontologies, Thesauri, Lexicons, Authority
files, Concept maps, Semantic networks
• Indexes
– Inverted files, signature files, R-trees, Quad trees, etc.
• Clusters & Classification Schemes
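Of the index types named above, the inverted file is the simplest to show in a few lines. The sketch below assumes whitespace tokenization and a tiny invented document set; `build_inverted_file` and `boolean_and` are hypothetical helper names.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, *terms):
    """Return ids of documents that contain every query term."""
    sets = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Invented sample documents for illustration only.
docs = {
    1: "digital library services",
    2: "digital objects and metadata",
    3: "library catalogs and metadata",
}
index = build_inverted_file(docs)
print(boolean_and(index, "library", "metadata"))
```

Looking up a term costs one dictionary access, which is why inverted files suit Boolean and ranked text retrieval.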
99
Degree of Structure
Web
DLs
DBs
Chaotic
Organized
Structured
100
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered
• Surrogate for “real” object
– Not covered explicitly in metamodel for a
minimal DL
– Crucial in metamodel for archaeology DL
101
Metadata Objects (MDOs)
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
102
Complex to Simple
+
thesis
MARC ($50)
Dublin Core (DC)
103
Also Important: Epub, SGML, XML
• 5S perspective: streams, structures,
scenarios
• Authoring
• Rendering, presenting
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries
104
105
106
107
108
109
Databases
• 5S perspective: structures, streams,
scenarios
• Extending database technology
• Structured and unstructured info
• Multimedia databases
• Link databases
• Performance, transaction processing
• Replicated storage, rollback/recovery
110
PACS Automatic Classification
111
Chapter 4 Overview
• Retrieval models
– Boolean, extended Boolean
– Vector, LSI
– Probabilistic: classical, belief network,
inference network, language models
• User interfaces and visualization
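For the vector model listed above, a common textbook formulation ranks documents by the cosine of the angle between query and document term vectors. The sketch below uses raw term frequencies as weights, which is a simplifying assumption (no idf weighting), and all function names and sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return ids of documents with nonzero similarity, best first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), doc_id)
              for doc_id, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


# Invented sample collection.
docs = {
    "d1": "fish images retrieval",
    "d2": "metadata harvesting protocol",
    "d3": "fish fish images",
}
print(rank("fish images", docs))
```

Unlike the Boolean model, this produces a ranked list, so partially matching documents can still be returned.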
113
User interfaces and visualization
• 2D interfaces
• 3D interfaces
• GIS
• Other paradigms
• Stepping Stones and Pathways
– http://fox.cs.vt.edu/SSP/
114
Chapter 5 Overview
• Recall OO for streams – now have objects as
well as scenarios – ex interface components
• Information Access
– Searching: ad hoc, filtering/routing
– Browsing: using an organization, using a
visualization, using links (i.e., hypertext, hypermedia)
– Workflow: sessions, feedback, etc.
• Scenario-based Design
• Usability: goals, tasks, claims
• NOTE: this is covered in the outline
116
Chapter 6 Overview
• User communities
– Authors, editors, teachers, students, readers
– Personal(ization), group(ware), community, global
– Accessibility, universal access
• Librarians: reference, acquisition, operations
• Research community
– Associations, conferences, publications, labs, projects
• Economics
– Copyright, intellectual property rights, digital rights
management, authorization, authentication, security,
privacy, self-archiving (eprints)
– Publishers, catalogers, distributors, sustainability
– Open source, commercial, hybrid
118
119
Outline – Part 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
120
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: diagram showing how each definition builds on others. Definitions shown, with numbers as given on the slide:]
relation (d. 1), function (d. 2), sequence (d. 3), tuple (d. 4)*, language (d. 5), graph (d. 6), grammar (d. 7)
measurable (d. 12), measure (d. 13), probability (d. 14), vector (d. 15), topological (d. 16) spaces
event (d. 10), state (d. 18)
5S: streams (d. 9), structures (d. 10), spaces (d. 18), scenarios (d. 21), societies (d. 24)
services (d. 22), transmission (d. 23)
structural metadata specification (d. 25), descriptive metadata specification (d. 26)
structured stream (d. 29), digital object (d. 30), collection (d. 31), metadata catalog (d. 32), repository (d. 33)
indexing service (d. 34), searching service (d. 35), hypertext (d. 36), browsing service (d. 37)
digital library (minimal) (d. 38)
121
Streams
image
contains
metadata
specifications


describes
Collection
Catalog
text
audio
video
contains
Structures
is_version_of/
cites/links_to
describes
digital
object
Index
stores
Measurable
is_a
Measure
employs
produces
Topological
Repository
employs
produces
is_a
is_a Vector Metric
Probabilistic
Spaces
employs
produces
inherits_from/includes
runs
Service

extends
reuses
Scenario
precedes
contains
happens_before
event
Scenarios
Societies
Service
Manager
uses
participates_in Actor
recipient

association
operation
executes
122
redefines
invokes
Chapter 7 Overview
• Terminology: set, “database”
• Distributed: basis, efficiency/effectiveness
• Parallelism: federation, harvesting
• Scale: object size, compression, replication,
stream splitting
• Intelligence/processing granularity: object,
cluster, collection, repository
• NOTE: covered in outline
124
Chapter 8 Overview
• OPACs
• Distributed vs. centralized
• Coverage, breadth
• Specificity, depth
• Management: versioning, works
• NOTE: covered in outline
126
Chapter 9 Overview
• Naming, identifiers
• Architectures, interoperability
– OAI: harvesting
– SRU/SRW: federation
• Preservation, archives
– LOCKSS, UVC, emulation/migration
• Scalability, storage
128
LOCKSS
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels
• Initial content: journals
• Emory (Martin Halbert)
– Help deploy and adapt
– Help apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)
– NDIIPP: AmericanSouth, MetaArchive
129
Digitization and Preservation
Community and Activity (selected)
• Archivists worldwide
• International collaboration
– Million book project in US, China, India (Reddy, Chen,
Balakrishnan)
• US Library of Congress
– Matching funds
– American Memory
– Infrastructure: NDIIPP
• Dutch National Library + IBM
• Associations: ARL, DLF
• People
– Harnad: Self-archiving movement
– Lorie: Universal virtual computer
– Gladney: technology, philosophy
(http://home.pacbell.net/hgladney/ddq_3_1.htm)
– Besser, Trant, …
130
OAI - Open Archives Initiative
• Advocacy for interoperability
• Standard for transferring metadata among
digital libraries
– Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
131
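A PMH harvest is an ordinary HTTP request whose query string names a verb, so the simplicity claimed above can be shown in a few lines. The following is a minimal sketch, not a full harvester: the base URL and the sample response are made up for illustration, and only the `ListRecords` verb, the `oai_dc` metadata prefix, and the two namespaces are taken from the OAI-PMH 2.0 specification. The helper names `list_records_url` and `parse_records` are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, first Dublin Core title) for each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        ident = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = None
        for node in rec.iter("{%s}title" % DC_NS):
            title = node.text
            break
        records.append((ident, title))
    return records


# Illustrative response; a real one would come from an HTTP GET on the URL above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A production harvester would also need to follow resumption tokens and handle error responses, which this sketch leaves out.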
OAI = Technical Umbrella for
Practical Interoperability…
Reference
Libraries
Museums
Publishers
E-Print
Archives
…that can be exploited by different communities
132
OAI – Repository Perspective
Required: Protocol
[Figure: a repository holding metadata objects (MDO) and digital objects (DO)]
133
OAI – Black Box Perspective
[Figure: seven open archives, OA 1 through OA 7, each shown as a black box]
134
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
135
The World According to OAI
Service Providers
Discovery
Current
Awareness
Preservation
Data Providers
136
Institutional Repositories - 1
• “Institutional repositories are digital
collections that capture and preserve the
intellectual output of a single university or
a multiple institution community of colleges
and universities.”
• Crow, R. “Institutional repository checklist
and resource guide”, SPARC, Washington,
D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
140
Institutional Repositories - 2
• “A university-based institutional repository is a set
of services that a university offers to the members
of its community for the management and
dissemination of digital materials created by the
institution and its community members. It is most
essentially an organizational commitment to the
stewardship of these digital materials, including
long-term preservation where appropriate, as well
as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7,
Feb. 2003, www.arl.org/newsltr/226/ir.html
141
142
What is a
Digital Object Repository?
Also called: digital rep., digital asset rep.,
institutional repository
Stores and maintains digital objects (assets)
Provides external interface for Digital
Objects
Creation, Modification, Access
Enforces access policies
Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
143
Goals of Institutional Repositories
(by Stevan Harnad, U. Southampton)
• Self Archiving of Institutional Research
– Theses and Dissertations (VTLS NDLTD Project)
– Article preprints and postprints
– Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic Publishing of journals, books, posters,
maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
144
Chapter 10 Overview
• Taxonomy of services
• Ontology, composition, reuse
• Evaluation
• Key services in-depth:
– Crawling, indexing
– Clustering, classifying
– Recommending, using social networks
– Logging
146
147
Ontology: Applications
• Expand definition of minimal DL by
characterizing
– typical DL services
– in the context of “employs” and “produces”
relationships
• Use characterization to:
– Reason about how DL services can be built
from other DL components
– As well as be composed with other services
through extension or reuse
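One way to picture this kind of reasoning is to record, for each service, what it employs and what it produces, and then ask which services can be assembled from a given starting set of components. The sketch below is only an illustration under assumed data: the three services and their employs/produces sets are a hypothetical fragment, not the ontology from the book, and `buildable` is an invented helper name.

```python
# Hypothetical fragment of a service ontology: each service lists the
# components it employs (inputs) and the components it produces (outputs).
SERVICES = {
    "Indexing": {"employs": {"collection"}, "produces": {"index"}},
    "Searching": {"employs": {"index", "query"}, "produces": {"result set"}},
    "Recommending": {"employs": {"user model", "collection"},
                     "produces": {"result set"}},
}


def buildable(available, services=SERVICES):
    """Return the services that can be composed, in a workable order,
    starting from the given components and reusing produced ones."""
    have = set(available)
    order = []
    progress = True
    while progress:
        progress = False
        for name in sorted(services):
            spec = services[name]
            if name not in order and spec["employs"] <= have:
                order.append(name)
                have |= spec["produces"]  # outputs become reusable inputs
                progress = True
    return order


print(buildable({"collection", "query"}))
```

Here Searching becomes possible only after Indexing has produced an index, which is the composition-through-reuse idea in miniature.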
148
Infrastructure Services
• Repository-Building
– Creational: Acquiring, Cataloging, Crawling (focused), Describing,
Digitizing, Federating, Harvesting, Purchasing, Submitting
– Preservational: Conserving, Converting, Copying/Replicating,
Emulating, Renewing, Translating (format)
• Add Value: Annotating, Classifying, Clustering, Evaluating,
Extracting, Indexing, Measuring, Publicizing, Rating,
Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
• Browsing, Collaborating, Customizing, Filtering, Providing access,
Recommending, Requesting, Searching, Visualizing
149
Ontology: Applications
150
Composition of key fundamental /
infrastructure services
[Figure: employs (e) and produces (p) relationships among Authoring, Digitizing, Describing, Cataloguing, Acquiring, Submitting, Indexing, and Linking, over the universal collection, collection C, catalog DMC, index Ic, and Hypertext]
151
[Figure: composition of infrastructure services (Add_Value: Rating, Indexing, Training) with information satisfaction services (Browsing, Requesting, Recommending, Searching, Filtering, Binding, Visualizing, Expanding query) through employs (e) and produces (p) relationships. Items exchanged include {digital object}, Collection, Index, classifier, user model, query/category, query, query’, binder, transformer, handle, anchor, space, and (digital object, actor, rate) triples from Society actors. Legend: fundamental vs. composite services.]
152
[Figure: 5S-based DL development. Stages: Formal Theory/Metamodel, Requirements, Analysis, Design, Implementation, Test, Evaluation. Tools and artifacts: 5S, 5SGraph, 5SL, 5SLGen, OO Classes, Workflow, Components, DL, DL XML Log.]
153
XML-based DL Log Standard
• Log analysis
– is a source of information on:
• How patrons really use DL services
• How systems behave while supporting user information
seeking activities
• Used to:
– Evaluate and enhance services
– Guide allocation of resources
• Common practice in the web setting
– Supported by web servers, proxy caches
• DL Logging can be more detailed
154
DL Logging Features
• Captures high level user and system
behaviors
• Organized according to the 5S framework
– Hierarchical organization (XML-based)
– Centered on the notions of events
• Record only events related to initial user inputs
and final system outputs
• Help to understand user interactions and the
perceived value of responses
155
The XML Log Format
Log
Transaction SessionId MachineInfo Timestamp
Event
StatusInfo
Search
SearchBy
SessionInfo
RegisterInfo
Timestamp
Statement
Action
Browse
QueryString
Statement
Update
Collection Catalog
StoreSysInfo
Timeout
PresentationInfo
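To make the hierarchical, event-centered idea concrete, the sketch below writes one search event as XML. The element names are taken from the labels in the figure above, but the nesting chosen here (Log > Transaction > Event > Search) is an assumption made for illustration and may not match the actual schema; `search_event_log` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def search_event_log(session_id, timestamp, collection, query):
    """Serialize a single search event using element names from the figure.
    The nesting is assumed, not taken from the published log standard."""
    log = ET.Element("Log")
    transaction = ET.SubElement(log, "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "Timestamp").text = timestamp
    event = ET.SubElement(transaction, "Event")
    search = ET.SubElement(event, "Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "QueryString").text = query
    return ET.tostring(log, encoding="unicode")


print(search_event_log("s-001", "2006-09-17T10:00:00", "ETD", "digital libraries"))
```

Because each entry is a self-describing tree, an analysis script can select, for example, every QueryString in a session without parsing free-form web server log lines.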
156
What is a Crawler?
• A Program
• An Important Module For Web Search Engine
• Crawls On The Web According To Its
Algorithm
• Retrieves Web Pages
• Gets Useful Information
• Stores The Web Pages For Future Refining
157
Jobs For Threads
Get A New URL
From Buffer
Put New URLs
Into Buffer
Contact The Server
For File Type
Parse The
Web Page
Download The
File
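The thread jobs listed above can be sketched as a small worker loop around a shared URL buffer. This is a toy illustration, not the crawler described in the tutorial: the "web" is an in-memory dictionary instead of real HTTP, only HTML pages are stored and parsed (an assumption about what the file-type check is for), and the names `FAKE_WEB` and `crawl` are hypothetical.

```python
import queue
import re
import threading

# Toy in-memory "web" standing in for real servers (illustrative data only).
FAKE_WEB = {
    "http://a.example/": ("text/html",
                          '<a href="http://a.example/b">b</a> '
                          '<a href="http://a.example/c.pdf">c</a>'),
    "http://a.example/b": ("text/html", '<a href="http://a.example/">home</a>'),
    "http://a.example/c.pdf": ("application/pdf", "%PDF..."),
}


def crawl(seed, num_threads=3):
    buffer = queue.Queue()      # shared URL buffer
    seen = {seed}               # URLs already queued
    stored = {}                 # pages kept for later refining
    lock = threading.Lock()
    buffer.put(seed)

    def worker():
        while True:
            url = buffer.get()                  # get a new URL from the buffer
            if url is None:                     # sentinel: no more work
                buffer.task_done()
                return
            try:
                # "Contact the server for file type" and "download the file"
                content_type, body = FAKE_WEB.get(url, (None, ""))
                if content_type != "text/html":
                    continue
                links = re.findall(r'href="([^"]+)"', body)  # parse the page
                new_urls = []
                with lock:
                    stored[url] = body
                    for link in links:
                        if link not in seen:
                            seen.add(link)
                            new_urls.append(link)
                for link in new_urls:           # put new URLs into the buffer
                    buffer.put(link)
            finally:
                buffer.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    buffer.join()               # wait until every queued URL is processed
    for _ in threads:
        buffer.put(None)
    for t in threads:
        t.join()
    return stored


print(sorted(crawl("http://a.example/")))
```

The `seen` set keeps threads from re-queuing the same URL, which matters as soon as pages link back to each other.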
158
Advanced Functions
• Backward Linkage Information Collector
A Web Page
159
Chapter 11 Overview
• Architectures
– Client-server, service-oriented
– P2P, Grid
• System descriptions and comparisons
– Personal DLs; Institutional to global
– DSpace, Eprints, Fedora, Greenstone, Kepler
• ODL
• 5S Suite: language, visualization,
generation, logging
161
Architectural Issues
• Independent system vs. part of federation
• Centralized vs. distributed vs. open services
• Monolithic vs. modular vs. componentized
• Topologies: bus vs. star vs. hierarchical vs. network
• Decompositions vary
– search engine, browser, DBMS, MM support
– repository, handle server, client
– information resources + mediators, bus or agent
collection + client with workspace/environment
162
Clusters
• How can computer clusters scale with
collections and user communities to achieve
cost-effective solutions for DLs?
• Paul Mather dissertation by mid 2005
• Modeling and simulation
• Cluster size
• Communication fabric and patterns
• Disks and nodes
• Characterize DL collections: file sizes
• Characterize user workload: logs
• Special considerations:
– Linear hashing of names
– Replication of popular objects
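The two special considerations can be illustrated with a small placement function. Note the substitution: this sketch uses plain modular hashing of a name digest, not true linear hashing (which additionally lets the bucket count grow one bucket at a time), and the popularity threshold, copy count, and function names are all hypothetical.

```python
import hashlib


def home_node(name, num_nodes):
    """Map an object name to a node by hashing the name (modular hashing,
    used here as a simple stand-in for linear hashing)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def placement(name, num_nodes, request_count,
              popular_threshold=1000, extra_copies=2):
    """Return the nodes that should hold the object; popular objects get
    extra replicas on the following nodes to spread the read load."""
    first = home_node(name, num_nodes)
    copies = 1
    if request_count >= popular_threshold:
        copies += extra_copies
    copies = min(copies, num_nodes)
    return [(first + i) % num_nodes for i in range(copies)]


print(placement("etd-1.pdf", 8, request_count=5))
print(placement("etd-1.pdf", 8, request_count=5000))
```

In a simulation, request counts would come from the logs used to characterize user workload.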
163
Also Important: Agents
• 5S perspective: societies, streams, spaces,
scenarios, structures
• Protocols: light-weight
• Knowledge interchange: mediators, wrappers
• Negotiation, registries
• Distributed issues
• Webbots (automatic indexing)
• Ontologies (standard upper)
164
What is Fedora™?
Flexible Extensible Digital Object
Repository Architecture
• Slides courtesy Vinod Chachra of VTLS
172
History of Fedora™
• 1997-Present
– DARPA and NSF-funded research project at Cornell
(Conceptual framework developed by Sandra Payette and Carl
Lagoze)
– Reference implementation developed at Cornell
• 1999-2001
– University of Virginia digital library prototype (Thornton
Staples and Ross Wayland)
• 2002-Present
– Andrew W. Mellon Foundation granted Virginia and Cornell $1
million to develop a production-quality Fedora system
– Fedora 1.0 released in May 2003 as Open Source under the
Mozilla public license.
173
Fedora™ Terms
Metadata
Digital Objects (data)
Complex Objects (Object consisting of many
objects in a complex/hierarchical relationship)
Content (Data and Metadata together)
Data-streams (are content for dissemination)
Disseminators (are services) – A
dissemination is defined as a stream of data
that manifests a view of the digital object's
content.
174
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view; access methods
for obtaining “disseminations” of digital object content
• System Metadata: internal view; metadata
necessary to manage the object
• Datastreams: protected view; content that makes up
the “basis” of the object
– EAD, TEI, DC, MARC, VRA Core, MIX, etc.
– Images, E-books, E-journals, Music, Video, etc.
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
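The four parts named on this slide (PID, disseminators, system metadata, datastreams) can be mirrored in a small data structure. This is a conceptual sketch only and not the Fedora API: the class names, field names, and the `get_dc_record` function are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """Protected view: one piece of content or metadata inside the object."""
    ds_id: str
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    pid: str                                            # persistent identifier
    system_metadata: Dict[str, str] = field(default_factory=dict)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Public view: named methods that turn datastreams into disseminations.
    disseminators: Dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict)

    def disseminate(self, method: str) -> bytes:
        """Run a named disseminator and return the resulting byte stream."""
        return self.disseminators[method](self)


def get_dc_record(obj: DigitalObject) -> bytes:
    """Hypothetical disseminator: return the Dublin Core datastream as-is."""
    return obj.datastreams["DC"].content


obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
obj.datastreams["DC"] = Datastream("DC", "text/xml", b"<dc>Sample</dc>")
obj.disseminators["GetDCRecord"] = get_dc_record
print(obj.disseminate("GetDCRecord"))
```

Clients call disseminators rather than reading datastreams directly, which is what lets metadata and content be treated uniformly behind one access interface.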
175
Digital Object w. multiple datastreams
[Figure: a digital object whose datastreams include DC, EAD, and administrative metadata]
176
Example Disseminators
Persistent ID (PID)
Disseminators
• Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
• Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
System Metadata
Datastreams
177
[Figure: Fedora™ repository architecture. Client applications, batch programs, web browsers, and server applications reach the Manage, Access, and Search web services over HTTP/SOAP; an OAI Provider is exposed over HTTP. Below the web service exposure layer are the Management, Security, Access, and Storage subsystems. Functions shown: session management, user authentication, policy management, policy enforcement, object reflection, object dissemination, object validation, object management, component management, users/groups, PID generation. Storage holds digital objects (XML files), datastreams, policies, and content, with a relational DB. An external content retriever reaches external content sources over HTTP/FTP; local and remote services are reached over HTTP/SOAP.]
Adapted from Slide by V. Chachra, VTLS
178
Fedora Advantage
• Extensible digital object model
• Repository exposed by Web services APIs
– Management (Creation, Deletion, Maintenance,
Validation)
– Access (Search, Disseminations)
• Scalable, persistent storage for content and
metadata
• Content can be local and/or remote
• Content versioning
• Open source solution
179
Comparison of DSpace and Fedora
• DSpace is a standalone product in a box, whereas
Fedora can be standalone or integrated with an ILS
• In Fedora the metadata and the content are treated
the same way, as datastreams; in DSpace the
metadata and content get separate treatments.
• Fedora can define complex objects more easily
• DSpace is not as extensible as Fedora, as it deals
with both repositories and workflows. Fedora
focuses only on the data model.
• Fedora uses the Mozilla licensing model and
DSpace uses GNU license. It makes it easier for
software companies to provide extensions to the
model.
180
Putting it all together - VITAL
Some observations:
It is easy to digitize and manage a few images;
scalable solutions are more difficult to create.
Quality has to be planned for before the project starts;
it cannot be introduced afterwards.
Productivity (cost control) in digitization projects is
essential for ultimate success, as the numbers are
large.
181
VITAL - Introduction
Digital Asset Management System - based
on the Fedora – Open-Source Digital
Object Repository Architecture
Software for creating, storing, managing,
cataloging, indexing, searching &
retrieving your digital collections
Backed by VTLS software and service
solutions designed to meet your needs
182
VITAL / Fedora Relationship
183
Four Components of VITAL
1. Fedora™ Repository
UNIX/LINUX, SUN and Windows
2. VITAL Manager
Based on Windows 2000 and XP
Integrated with
XML Cataloging Utility
Has a digital object loader
3. VITAL Web Portal
UNIX/LINUX and Windows
4. Oracle Database (Optional)
184
VITAL Manager
2.1 Collection Management
Functions
2.2 XML/METS Metadata Storage,
Linking, Retrieval & Export
2.3 XML (Dublin Core) Editing &
Indexing
2.4 Uses Fedora™ Digital Object
Search Tool
2.5 Easy Image Management &
Import with some Automatic
Metadata Creation and Linking
185
VITAL Collection Management
• Supports multiple
Fedora™ repositories
• Collections can be
dispersed across
locations
• Repositories can contain
diverse digital object types
• VITAL facilitates easy
loading and searching of
repositories
186
VITAL XML Metadata Storage
• Standards based
• XML/METS Schema
• Dublin Core
• EAD
• MARC
• Additional formats can
be added quickly, like
AMICO XML format
• Metadata may be
exported in XML for
use in other
applications
187
VITAL XML Editing
• Cataloging/editing
with XMLSpy
Software
• Templates planned
for Dublin Core, EAD,
and MARC
• Additional 3rd Party
XML tools may be
used
• XMetal from Corel
• Microsoft InfoPath
188
VITAL Image Management
• Easy import of digital
objects and images
– Watched Folder
– VITAL Import Tool
• Digital object versioning
– Changes made to the
digital objects are
recorded in the
repository
• VITAL automatically
creates technical metadata
for the digital object by
recognizing the imported
file's MIME type
189
VITAL Image Management
• Integrates with any
TWAIN source
scanning software or
imaging application
• Images can be
immediately verified
prior to load - through
the VITAL Manager
preview window
• Tools to facilitate the
digitization of all
materials, including
rare objects and
historical documents
190
VITAL Manager Client Details
191
VITAL Manager Client Details
• Search the repository to locate digital objects
and their associated image, text and metadata
• Launch the software of a TWAIN compliant
scanner or digital camera directly from VITAL
and load the digitized images in one step
• Import one or many image, text, sound and
other digital files into the repository and have
the basic metadata created dynamically based
on MIME type
• Configure a “watched” folder from your favorite
application to automatically move files into the
repository
192
VITAL Manager Client Details
The VITAL Manager Client
allows for easy navigation and
searching of your digital object
repository
193
VITAL Manager Client Details
A search of the repository
produces the digital object
reference and its associated
datastreams - reflected here
with a local digital image file
and the Dublin Core metadata
describing this object
194
VITAL Manager Client Details
Datastreams can be edited using
linked applications – Metadata
datastreams such as Dublin Core
are modified by integrated XML
editors such as XMLSpy or XMetal
195
VITAL Manager Client Details
VITAL facilitates adding
additional scanned images to
the repository by providing an
easy-to-use interface compliant
with any TWAIN scanning or
digital camera device
196
VITAL Manager Client Details
VITAL features an Import/Ingest
tool for loading digital images,
text, metadata, etc., from your
local or networked file system into
the repository – individual or
multiple files may be added to the
repository using this workflow
197
VITAL Web Portal
198
VITAL Web Portal
• Z39.50 compliant – compatible with
any integrated library system
• Sophisticated display for Encoded
Archival Description (EAD), Dublin
Core and MARC
• Includes the VTLS Hi-Res Image
Navigator – uses Wavelet compression
for incredibly detailed viewing of your
images
– Supports MrSID and JPEG2000
encoded image files
• Instant access to digital content
anytime, anywhere, to anyone with a
web browser
199
VITAL Web Portal
The VITAL Access Portal is a
Z39.50 compliant, web-based
software interface for searching
and retrieving digital objects from
the repository
200
VITAL Web Portal
The VITAL Access Portal has a
completely configurable interface
– institutions can create their own
look and feel for the front-end and
provide a variety of search options
including pre-defined searches to
assist their users in locating
groups of digital objects in the
repository
201
VITAL Web Portal
The results screen presents a list
of digital objects that satisfy the
search term(s) – clicking on the
hyperlinks to the left will bring up
the digital object summary screen
202
VITAL Web Portal
Text documents in Word, PDF and
DjVu may be launched into the
browser by clicking on the
“Content” datastream icon
Dublin Core and Digital File
(DjVu) Datastreams
203
Other Commercial DL Examples
• IBM Digital Library
• Virtua (www.vtls.com)
• Some systems from NSF DLI projects
– Google
204
The World According to OAI:
Open Archives Initiative –
Protocol for Metadata Harvesting
Service Providers
Discovery
Current
Awareness
Preservation
Data Providers
205
ODL: Open Digital Library
[Figure: users on one side, digital objects (documents, video, programs, images) on the other, with a “?” between them]
206
[Figure: digital library as a monolithic and/or custom-built web-based application over documents, programs, video, and images]
207
[Figure: componentized digital library; many unspecified components (“?”) sit between the documents, programs, images, and video]
208
[Figure: open digital library; open archives (OA) over documents, programs, images, and video, connected by PMH and extended PMH (XPMH)]
209
Protocol for
Metadata
Harvesting
Extended OAI-PMH
Open Digital Library Protocol
210
OPEN
ARCHIVE
Extended OPEN ARCHIVE
Open Digital Library Component
211
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center
(www.cstc.org)
• Computing and Information Technology
Interactive Digital Educational Library
(www.citidel.org)
• Open Archives Distributed (NSF, DFG) –
enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
212
Open Digital Library
• Network of Extended Open Archives
where each node acts as a provider
of data, services, or both.
• Component = Node
• Protocol = Arc
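The node/arc picture can be sketched with components that all expose the same harvesting method, so that any component can be plugged in behind any other. This is an illustrative toy under assumptions: real ODL components talk over extended OAI-PMH via HTTP, while here the "protocol" is a single Python method, and the class names are hypothetical.

```python
class Archive:
    """Data provider node: exposes its records through one harvesting call."""

    def __init__(self, records):
        self._records = dict(records)

    def list_records(self):
        return sorted(self._records.items())


class Union:
    """Node that consumes other nodes and is itself a provider."""

    def __init__(self, sources):
        self._sources = list(sources)

    def list_records(self):
        merged = {}
        for source in self._sources:      # follow each arc to a source node
            merged.update(source.list_records())
        return sorted(merged.items())


class Search:
    """Service node: answers queries over whatever node it is attached to."""

    def __init__(self, source):
        self._source = source

    def query(self, term):
        term = term.lower()
        return [rid for rid, title in self._source.list_records()
                if term in title.lower()]


etd1 = Archive({"etd-1:1": "Digital Library Logging"})
etd2 = Archive({"etd-2:7": "Focused Crawling for Digital Libraries"})
search = Search(Union([etd1, etd2]))
print(search.query("digital librar"))
```

Because Union offers the same interface as Archive, the Search node does not need to know whether it sits on one collection or a merged set.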
213
Open Digital Library
Components
• Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or
generalized
– Union, browse, recent, filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text
• Others in process
– Classification/categorization
– Registry (and other connections with web services)
214
Example Open Digital Library
[Figure: ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org). Students and researchers reach a user interface offering Recent, Browse, and Search, backed by ODLRecent, ODLBrowse, and ODLSearch components. These connect by PMH, through Filter and ODLUnion (Union) components, to the ETD collections ETD-1 through ETD-4.]
215
OAI, ODL, DL-in-a-box
• Open Archives Initiative
– since 1999, www.openarchives.org
• Open Digital Libraries
– since 2001, from www.dlib.vt.edu
– with Hussein Suleman (now U. Cape Town)
• DL-in-a-box
– NSDL support since 2001
– Aimed to help new collections / services projects
– http://dlbox.nudl.org
216
5S Modeling -> Systems
[Figure: levels linked by “represented by”, “interpreted as”, “instance of”, “used to compose”, and “abstracted from” relationships: Domain Concepts (theory), Modeling Language (Meta-Model), DL Architecture Model, Running DL, and the “Real” World (“real” world objects, actors)]
231
Tools/Applications
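[Figure: 5S tool chain. The 5S Meta Model feeds 5SGraph, which yields a 5SL DL Model; 5SLGen combines the model with a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, ...) to produce a Tailored DL, whose Logging Module writes an XML Log. Roles shown: DL Expert, DL Designer, Practitioner, Teacher, Researcher.]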
232
5SL: a DL design language
• Domain specific languages
– Address a particular class of problems by offering
specific abstractions and notations for the domain at
hand
– Advantages: domain-specific analysis, program
management, visualization, testing, maintenance,
modeling, and rapid prototyping.
• XML-based realization of 5S
– Interoperability
– Use of many sub-languages (e.g., MIME types, XML
Schemas, UML notations)
233
5SL – The Minimal DL Metamodel
Scenarios
(Meta-) Model
Societal
(Meta-) Model
Meta-Models
Meta-Models
Primitives
uses Actor
runs
Service
Scenario
receiver
Community
Service
Event
Manager
Interface
Manager
Index
Manager
Search
Manager
Collection
Index
User
Repository
Manager
Browsing
Manager
Catalog
Interface
Document
Metadata
Retrieval
Model
Text
Spatial
Stream
(Meta-) Model
(Meta-)Model
Video
Audio
Structural
(Meta-) Model
Image
234
Example of Document declaration in the Structures Model
<document name='ETD'>
  <stream_enumeration>
    <stream value='ETDText'/>
    <stream value='ETDAudio'/>
    ...
  </stream_enumeration>
  <structured_stream>
    %XMLSchema%
  </structured_stream>
</document>
Example of Actors declaration in the Societies Model
<Society>
  <Actor>
    <Community name='Patron'>
      <Attribute name='name' type='String'/>
      <Attribute name='ID' type='Integer'/>
      ...
    </Community>
    <Community name='Student'>
      <Service>Converting</Service>
    </Community>
    <Community name='ETDReviewer'>
      <Service>Reviewing</Service>
    </Community>
    <Community name='ETDCataloguer'>
      <Service>Cataloguing</Service>
    </Community>
  </Actor>
  ………
Example of Service declaration in the Scenario Model
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <NOTE>Simple scenario for an NDLTD site searching service</NOTE>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>SearchManager</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <PARAMETER name='Results'>WtdSet</PARAMETER>
    </EVENT>
    ….
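A scenario written this way is a sequence of events, each with a sender, a receiver, an operation, and parameters, so a tool can read it directly. The sketch below parses a well-formed excerpt of the searching scenario into a list of message records. It is a minimal illustration, not part of 5SLGen; the function name `scenario_events` is hypothetical and the excerpt is trimmed to two events with the elided parts closed off.

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the SimpleSearching scenario (elided parts omitted).
SCENARIO_XML = """
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
  </SCENARIO>
</SERVICE>
"""


def scenario_events(xml_text):
    """Return the scenario's events, in document order, as dictionaries."""
    root = ET.fromstring(xml_text)
    events = []
    for ev in root.iter("EVENT"):
        op = ev.find("OPERATION")
        events.append({
            "sender": ev.findtext("SENDER"),
            "receiver": ev.findtext("RECEIVER"),
            # Some events (e.g. a reply carrying results) name no operation.
            "operation": op.get("name") if op is not None else None,
            "parameters": [p.text for p in ev.findall("PARAMETER")],
        })
    return events


for event in scenario_events(SCENARIO_XML):
    print(event["sender"], "->", event["receiver"], event["operation"])
```

The ordered sender/receiver pairs are the raw material a generator needs to derive who calls whom, and in what order.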
235
5SGraph: A DL Modeling Tool
• Helps users model their own instances of a
digital library (DL) in the 5S language (5SL).
• A simple modeling process which enables rapid
generation of digital libraries
• Features
– 5SGraph loads and displays a metamodel in a
structured toolbox.
– The structured editor of 5SGraph provides a top-down
visual building environment for the DL designer.
– 5SGraph produces syntactically correct 5SL files
according to the visual model built by the designer.
236
Overview of 5SGraph
Workspace
(instance model)
Structured
toolbox
(metamodel)
237
5SGraph: Other Key Features
• Flexible and extensible architecture
• Reuse of models
– Load, save, and change common (sub-)
models
• Synchronization of views
• Enforcing of semantic constraints
242
5SGraph Evaluation: Usability Study
Measure | Task 1 | Task 2 | Task 3
Completion Rate (%) | 100 | 100 | 100
Mean Task Time (min) | 11.3 | 11.4 | 15.1
Mean Closeness to Expertise | 0.483 | 0.752 | 0.712
Mean Goal Achievement (%) | 97.4 | 97.4 | 98.2
[Figure: two charts with vertical scales from 0 to 10 and a horizontal axis from 1 to 16; series labels: Satisfaction, Usefulness, Pre-Understanding, Post-Understanding]
243
5SLGen
• Version 1 -- MARIAN as the target system
– Focused on rich structures: semantic networks
– Behavior attached to nodes/links
• Version 2 -- Shifted for later work to
componentized (ODL) approach
– Focused on scenarios/societies
– Structures/Spaces encapsulated within components
(e.g., relational tables, indexes)
– Only textual streams supported
244
5SLGen – Version 2: ODL,
Services, Scenarios
[Figure: 5SLGen version 2 pipeline, numbered as on the slide. 5SL-Societies Model (1), XPath/JDOM Transform (2), XMI:Class Model (3), Xmi2Java (4), Java Classes Model (5); 5SL-Scenario Model (6), XPath/JDOM Transform (7), StateChart Model (8), Scenario Synthesis (9), Deterministic FSM (10), SMC (11), Java Finite State Machine Class Controller (12), JSP User Interface View (13). The DL Designer supplies the 5SL models; ODL Search, ODL Browse, and other components from the Component Pool are imported through wrapping; the controller binds to the generated Java classes, which use the wrapped components as superclasses.]
245
Generated DL Services
5SLGen
• Proof of Concept: prototyping
– CITIDEL
– Viaduct
– NDLTD Union Catalog
– BDBComp
246
Outline – Part 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
247
OCKHAM Library Network
NSDL
Services
NSDL
OCKHAM
Library
Network
OCKHAM
Services
Library
Services
Teachers
Learners
Librarians
248
OCKHAM
• Simplicity (à la Occam’s razor)
• Support by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P
249
Lightweight Protocols
• “Lightweight”, or relatively small and
simple protocols seem to have clear
advantages over “Full” protocols that
attempt to be comprehensive.
• The success of protocols considered
lightweight is illuminating.
• Examples: TCP/IP, HTTP, LDAP, and
the OAI PMH
250
Reference Models
• Reference Model: a common vocabulary
and description of components, services,
and inter-relationships that comprise a
system under consideration
• Useful as a tool to foster consensus and
common understanding in a time of rapid
change and/or disagreement (e.g., OAIS)
251
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)
252
CS -> CSTC
• NSF and ACM Education Committee funded
a 2 year project “A Computer Science
Teaching Center” - CSTC http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization,
multimedia
• Multimedia part supported by a 2nd grant to
Virginia Tech and The George Washington
University (with curricular guidelines)
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia
packages, that become obsolete and are difficult to
re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules
that have been reviewed and tested.
• Use digital libraries to build a powerful base of
support for learners, upon which a variety of courses,
self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources
in Computing (JERIC): completed 2 co-EIC terms
255
Browsing (1)
256
Browsing (2)
257
Computing and Information
Technology Interactive Digital
Educational Library (CITIDEL)
• Domain: computing / information
technology
• Genre: one-stop-shopping for teachers &
learners: courseware (CSTC, JERIC),
leading DLs (ACM, IEEE-CS, DB&LP,
CiteSeer), PlanetMath.org, NCSTRL
(technical reports), …
• Submission & Collection: sub/partner
collections -> www.citidel.org
261
www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish
support)
• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)
Multi-dimensional Categorization
[Figure: resources categorized along three dimensions. Quality: peer reviewed, editor reviewed, nominated, identified by crawl. Topic: e.g., Algorithms, Java, Multimedia. Language: e.g., English, Spanish.]
263
Overview of CITIDEL architecture
USER PORTALS
DIGITAL LIBRARY SERVICES
REPOSITORIES
264
Distributed repository structure
Digital Library Services
OAI
Data
Provider
Applets
Repository
OAI
Data
Harvester
Union Metadata
Repository
Laboratories
Repository
Syllabi
Repository
Papers
Repository
...
265
Digital library architecture for local
and interoperable CITIDEL services
EDUCATORS
Multilingual
Searching
LEARNERS
Browsing
Union Metadata
Filtering
Filtering Profiles
OAI
Data
Provider
Annotating
ADMINISTRATORS
Revising
Administering
User Profiles
Annotations
PORTALS
SERVICES
REPOSITORIES
OAI
Data
Harvester
Remote and Peer Digital Libraries (eg. NSDL -CIS)
266
CITIDEL: Computing & Information Technology
Interactive Digital Education Library
267
CITIDEL Technology Features
•Component architecture (Open Digital Library)
•Re-use and compose re-deployable digital library components.
•Built Using Open Standards & Technologies
•OAI: Used to collect DL Resources and DL Interoperability
•XSL and XML: Interface rendering with multi-lingual community
based translation of screens and content (Spanish, …)
•Perl: Component Integration
•ESSEX: Search Engine Functionality
•Very fast, utilizing in-memory processing
•Includes snap-shots for persistence
•Multi-scheming
•Integrates multiple classifications / views through maps, closure
273
274
Cluster Search Results from CITIDEL
275
Cluster NDLTD-Computing
276
CITIDEL + PIPE
• Adds Interaction
Personalization to
CITIDEL
• Automatically
handles multi-modal
conversion to cell
phone, PDA, etc.
• Can be adapted to
any digital data set;
only requires an XML file
of content with
hierarchy maintained.
277
278
CITIDEL -> NSDL
• A collection project in the
• National STEM (science, technology,
engineering, and mathematics)
education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)
Connects:
Users: students, educators, life-long learners
Content: structured learning materials; large
real-time or archived datasets; audio, images,
animations; primary sources; digital learning
objects (e.g. applets); interactive (virtual,
remote) laboratories; ...
Tools: search; refer; validate; integrate; create;
customize; publish; share; notify; collaborate;
...
280
Supports:
Learning communities
Users
(profiles)
Application services
Tools
Customizable collections
(protocols)
Content
(metadata)
281
Enables:
Environments for
• Discovery
• Communication
• Collaboration
• Creation
• Validation
• Evaluation
• Recognition
• ...
AND
• Stability
• Reliability
• Reusability
• Interoperability
• Customizability
• ...
of Resources
282
NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource
collection and service providers; and ensure reliable and
extensible access to and usability of the resulting network of
learning environments and resources
• Collections: aggregate and actively manage a subset of the
digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of
the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on
one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or
approaches or users
283
Collections
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of
content, but other possibilities are also encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling,
simulation, or visualization
• Reviewed commentary on learning materials and
pedagogy
284
Services
• Help services, frequently asked questions, etc.
• Synchronous/asynchronous collaborative
learning environments using shared resources
• Mechanisms for building personal annotated
digital information spaces
• Reliability testing for applets or other digital
learning objects
• Audio, image, and video search capability
• Metadata system translation
• Community feedback mechanisms
285
286
287
288
289
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: layered NSDL architecture.
User Interfaces: Portals & Clients
Core NSDL “Bus”
NSDL Collections: referenced items & collections; special databases
Core Collection-Building Services: metadata gathering, protocols, harvesting
NSDL Services: other NSDL services, usage enhancement
Core CI Services: information retrieval, browsing, authentication, personalization, discussion, annotation]
290
Digital Libraries in Education
• Analytical Survey, ed. Leonid Kalinichenko
• © 2003, www.iite-unesco.org, [email protected]
• Transforming the Way to Learn
• DLs of Educational Resources & Services
• Integrated/Virtual Learning Environment
• Educational Metadata
• Current DLEs: US (NSDL, DLESE,
CITIDEL, NDLTD), Europe (Scholnet,
Cyclades), UK (Distributed National
Electronic Resource)
291
Digital Libraries in Education - 2
• Advanced Frameworks & Methodologies
– Instructional course development with learning
module repositories, Learning Object reuse
– Community organization around DLEs
– Other content for science and research
– Cyberinfrastructure, data grids
– Curriculum-based interfaces (see Krowne et al.)
– Concept-based organization of learning
materials and courses (CMs, ontologies)
292
DLEs: Future Vision (p. 6)
• Global learning environment of the
future:
• Student-centered
• Interactive and dynamic
• Enabling group work on real world problems
• Enabling students to determine their own
learning routes (styles, personalization)
• Supporting lifelong learning
293
DLEs: Objectives (p. 11)
• Long-range: lifelong/distance/anytime-anywhere
• Intermediate goals
– Support for students, teachers, parents
– Enhanced student performance
– More students excited about science
– More Internet-based science educ. resources
• with increased quality and comprehensiveness,
• easy to discover and retrieve,
• preserved and universally available
294
DLEs: Guiding Principles (p. 12)
• Driven by educational and science needs
• Facilitating educational innovation
• Stable, reliable, permanent
• Accessible to all
• Leverage prior research: DL, courseware, …
• Adaptable to new technologies
• Supporting decentralized services
• Resource integration thru tools/organization
295
A Digital Library Case Study
• Domain: graduate
education, research
• Genre: ETDs = electronic
theses & dissertations
• Submission:
http://etd.vt.edu
• Collection:
http://www.theses.org
Project:
Networked Digital
Library of Theses
& Dissertations
(NDLTD)
http://www.ndltd.org
297
NDLTD: How can a
university get involved?
• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
• Join online, give us contact names
– www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission
Student Gets Committee
Signatures and Submits ETD
Signed
Grad School
Library Catalogs ETD, Access is
Opened to the New Research
WWW
NDLTD
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
301
302
303
ETD Union Collection (OAI)
VIRTUA
ODL (VT)
Future: recommender, …
Merged Metadata
Collection
LEGEND
OAI Data Provider
Virginia
Tech ETD
Archive
OCLC
ETD
Archive
Brazil
ETD
Archive
…
OAI Service Provider
OAI Harvesting
304
Union catalog: OCLC
• OCLC will expand OAI data provider on
TDs.
• Is getting data from WorldCat (so, from
many sites!).
• Will harvest from all others who contact
them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
305
306
307
308
OCLC SRU Interface
309
310
311
ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)
312
313
VTLS Union Catalog
Content Languages
• The VTLS NDLTD Union Catalog has data in 6
different languages. These are:
– English
– German
– Greek
– Korean
– Portuguese
– Spanish
• Examples follow
314
Language = German; hits = 137
315
Full record display
316
317
318
319
ETDs: Library Goals
• Improve library services
–Better turn-around time
–Always available
• Reduce work
–catalog from e-text
–eliminate handling: mailing to
ProQuest, bindery prep, checkout, check-in, reshelving, etc.
• Save space
320
What are we doing?
• Aiding universities to enhance
graduate education, publishing and
IPR efforts
• Helping improve the availability and
content of theses and dissertations
• Educating ALL future scholars so they
can publish electronically and
effectively use digital libraries (i.e., are
Information Literate and can be more
expressive)
NDLTD Incorporation
• Networked Digital Library of Theses and
Dissertations incorporated May 20, 2003 in
Virginia, USA
• Charitable and educational purposes (501 c 3)
• Officers
– Executive Director (Ed Fox)
– Secretary (Gail McMillan)
– Treasurer (Scott Eldredge)
322
Board of Directors
• Suzie Allard (ETD 2004, U. Kentucky)
• Denise A. D. Bedford (World Bank)
• Julia C. Blixrud (ARL, SPARC)
• José Luis Borbinha (Natl Lib Portugal)
• Tony Cargnelutti (Ex Libris)
• Vinod Chachra (VTLS)
• William Clark (Ohio State U.)
• Susan Copeland (RGU, UK)
• Jude Edminster (Bowling Green St. U.)
• Scott Eldredge (Treasurer, ETD 2002, BYU)
• Edward A. Fox (Exec Director, Virginia Tech)
• John H. Hagen (West Virginia U.)
• Thomas B. Hickey (OCLC)
• Christine Jewell (U. Waterloo, Canada)
• Joan K. Lippincott (CNI)
• Austin McLean (Proquest)
• Gail McMillan (Secretary, Virginia Tech)
• Joseph Moxley (ETD 2000, USF)
• Eva Müller (U. Uppsala, Sweden)
• Ana Pavani (PUC Rio, Brazil)
• Janice Rickards (chair of ADT)
• Sharon Reeves (National Library Canada)
• Peter Schirmbacher (ETD 2003, Humboldt)
• Samson Soong (Hong Kong U. Science & Technology)
• Hussein Suleman (U. Cape Town, S. Africa)
• Shalini R. Urs (U. Mysore, India)
• Eric F. Van de Velde (ETD 2001, Caltech)
• Ellen Wagner (Adobe)
323
NDLTD Committees (Chairs)
• Awards (John Hagen)
• Conferences (Sharon Reeves)
• Development (Peter Schirmbacher)
• Executive (Edward Fox)
• Finance (Scott Eldredge)
• Implementation (Ana Pavani)
• Membership (Tony Cargnelutti)
• Nominating (Joan Lippincott)
• Standards (Thomas B. Hickey)
• Union Catalog (Vinod Chachra)
324
Selected Projects / Sponsors
• Australia (ADT)
• Brazil (BDT, IBICT)
• Canada
• Catalunya
• Chile (Cybertesis)
• China (CALIS)
• Germany
• India (Vidyanidhi)
• Korea
• OhioLINK: 79 colleges/univs
• Portugal (National Library)
• South Africa
• UK (British Library, JISC, Edinburgh, …)
• UNESCO (especially Latin America, Eastern Europe, Africa)
• …
325
Some Countries
• Argentina
• Australia
• Belgium
• Brazil
• Canada
• Chile
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• Greece
• India
• Italy
• Jamaica
• Korea
• Lithuania
• Malaysia
• Mexico
• Namibia
• Netherlands
• Norway
• Peru
• Poland
• Russia
• Singapore
• S. Africa
• S. Korea
• Spain
• Sudan
• Sweden
• Switzerland
• Taiwan
• Thailand
• Turkey
• UK
• Ukraine
• United Arab Emirates
• USA
• Venezuela
• Yugoslavia
326
UNESCO and ETDs
(by Axel Plathe at ETD2003)
• Promoting the use of the Internet as a tool for disseminating
scientific knowledge
• Facilitating the transfer of ETD expertise from developed to
developing countries
• 1998: Member of the NDLTD Steering Committee
• 1999: First UNESCO ETD meeting on ETD
internationalisation
• 2002: “UNESCO Guide to Electronic Theses and
Dissertations”
• 2003: Model training programmes and training courses
• 2003: Sponsor pilot projects
• 2003: Pilot projects (Africa, Europe, Latin-America)
327
Why ETD? Short Answer
• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit
thereby
• For the World:
– Global digital library – large, useful, many services
• General:
– Save time and money
– Increased visibility for all associated with research results
328
329
ETANA Case Study
• Some of the following slides were copied
to “Running examples” part of the
presentation according to the book outline.
• The full set about ETANA follows too.
330
331
ETANA-DL
• Archaeological DL
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata
Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
332
NSF ITR Funding
• IT Research
• Digital library:
• Integration of DB,
HCI, HT, IR, LIS,
MM, …
• Complexity!
• Variety!
• Distributed!
• => 5S Framework
• + OAI / ODL
• Archaeology Research
• Multiple sites
• Multiple kinds of artifacts
• Multiple terminologies
• General/special services
• Multiple views
• Hypothesis testing
• Rapid publication
333
Initial ETANA-DL Member Locations
Canadian University College
Andrews University
CWRU
Walla Walla College
Willamette University
Virginia Tech
Vanderbilt University
Mississippi State University
Map courtesy: www.enchantedlearning.com
334
Acknowledgements
Team:
• Joanne Eustis, CWRU
• Weiguo Fan, Virginia Tech
Contributors:
• Nick Fischio, CWRU
• Karen Borstad, MPP
• Paul Gherman, Vanderbilt U.
• Douglas Clark, Walla Walla College
• Marcos Goncalves, Virginia Tech
• Larry Herr, Canadian University College
• Christopher Holland, LRP
• Paul Jacobs, Mississippi State U.
• Doug Gorton, Virginia Tech (CS4624)
• Douglas Knight, Vanderbilt U.
• Likhita Krishnamurthy, VT (CS5604)
• Ming Luo, Virginia Tech
• Stan LaBianca, Andrews U.
• Ananth Raghavan, VT (CS5604)
• David McCreery, Willamette U.
• Divya Rangarajan, VT (Ind. Study)
• David Schloen, U. of Chicago
• Unni Ravindranathan, Virginia Tech
• Randall Younker, Andrews U.
• Jack Sasson, Vanderbilt U.
•...
• Rao Shen, Virginia Tech
• Ricardo Torres, U. Campinas, Brazil
• Srinivas Vemuri, Virginia Tech
335
336
337
Lahav Website
338
Megiddo Opening Screen
339
Locus Screen:
Pictures
View all
340
Area Screen
341
342
ETANA-DL Approach
• Applying and extending Digital Library (DL)
techniques to solve key problems: making primary
data available, data preservation, and interoperability
• Modeling archaeological information systems using
5S to better understand the domain and design the
system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous
archaeological data using componentized
frameworks:
– eliciting requirements
– refining metamodel and union schema
– modeling sites
– mapping
– harvesting
– providing useful services
343
ETANA-DL Architecture
DigKit
Users
Services
DigBase
Data
ETANA-DL
Union
Services
Users
344
ETANA-DL Architecture
DigBase and DigKit
Lahav
Nimrin
Umayri
Hisban
Megiddo
Jalul
…
New Sites
DATABASE WRAPPERS
ETANA-DL UNION CATALOG
USER INTERFACE
Search
Browse
Recommend
Note
Personalize
Review
Visualizations
Archaeology Specific
345
Work in progress
ETANA-DL Website
346
Marking – writing
notes for
a specific user
Marking Items
347
Sender, Date,
Object OAI ID
Sender
Comments
Options:
View Record,
Add record to Items Of Interest,
Re-mark item (Redirect),
Unmark item (Remove item from list)
Marked Items Display
348
Discussions
about an
object
View/Post
messages,
create new
threads
Discussions Page
349
Items recommended
on the basis of
similar interests
Recommendations
350
ETANA-DL Searching Service
Search
351
ETANA-DL Multi-dimensional Browsing
3 new sites
2 new types of artifacts
352
ETANA-DL Visual Browsing Service
By site
Visual Browse
353
Visual Browsing Nimrin:
Topographical Drawings
Square:
N40/W20
Full site
North west quadrant
354
Visual Browsing Nimrin : Square information
Square:
N40/W20
Locus: 86
Loci layout
355
Visual Browsing Nimrin : locus sheet
356
Visual Browsing
Bab edh-Dhra'
Cemetery
Pottery # 25
357
Visual Browsing
Bab edh-Dhra'
Cemetery
Pottery # 25
358
Outline – Part 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
359
Chapter 13 Overview
• Information life cycle
• Dimensions, Indicators
• Definitions
• Examples
• Evaluation
360
Describing Quality in
Digital Libraries
• What’s a “good” digital Library?
– Central Concept: Quality!
– Hypotheses of this work:
• Formal theory can help to define “what’s a good
digital library” by:
• New formalizations of quality indicators for DLs
within our 5S framework
• Contextualizing these measures within the
Information Life Cycle
361
Quality Dimensions
DL Concept
Digital object
Metadata specification
Collection
Catalog
Repository
Services
Dimensions of Quality
Accessibility
Pertinence
Preservability
Relevance
Similarity
Significance
Timeliness
Accuracy
Completeness
Conformance
Completeness
Impact Factor
Completeness
Consistency
Completeness
Consistency
Composability
Efficiency
Effectiveness
Extensibility
Reusability
Reliability
362
Digital Objects: Accessibility
• A digital object is accessible by a DL
actor or patron if
1. it exists in the DL collections,
2. it is retrievable from the repository, and
3. it is not restricted from access
– by metadata on rights
– for the actor or the actor’s society
363
Digital Objects: Pertinence
• $\mathrm{Inf}(do_i)$ = information carried by a digital object $do_i$
or any of its descriptions
• $\mathrm{IN}(ac_j)$ = information need of an actor $ac_j$
• $\mathrm{Context}_{jk}$ = an amalgam of societal factors
which can impact the judgment of pertinence by
$ac_j$ at time $k$.
– Factors include time, place, the actor's history
of interaction, task in hand, and factors
implicit in the interaction and ambient
environment.
364
Digital Objects: Pertinence
• The pertinence of a digital object $do_i$ to a
user $ac_j$ is an indicator function
$\mathrm{Pertinence}(do_i, ac_j): \mathrm{Inf}(do_i) \times \mathrm{IN}(ac_j) \times \mathrm{Context}_{jk}$, defined as:
– 1, if $\mathrm{Inf}(do_i)$ is judged by $ac_j$ to be informative
with regard to $\mathrm{IN}(ac_j)$ in context $\mathrm{Context}_{jk}$;
– 0, otherwise
365
Digital Objects: Relevance
• $\mathrm{Relevance}(do_i, q)$ =
1, if $do_i$ is judged by an external judge to be relevant to $q$
0, otherwise
• Relevance Estimate
– $\mathrm{Rel}(do_i, q) = \dfrac{do_i \cdot q}{|do_i| \times |q|}$
• Objective, public, social notion
– Established by a general consensus in the field, not
subjective, private judgment by an actor with an
information need
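The relevance estimate is the cosine of the angle between the object and query vectors. A minimal sketch, assuming both are represented as raw term-frequency vectors (the slides do not specify a weighting scheme; `term_vector` and `rel` are illustrative names):

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as raw term frequencies (an assumed weighting)."""
    return Counter(text.lower().split())


def rel(doc_vec, query_vec):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(weight * query_vec.get(term, 0) for term, weight in doc_vec.items())
    doc_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0  # an empty vector shares no terms with anything
    return dot / (doc_norm * query_norm)


doc = term_vector("digital library quality indicators")
print(rel(doc, term_vector("digital library")))
print(rel(doc, term_vector("pottery figurine")))
```

The first query shares two of the document's four terms and scores above zero; the second shares none and scores zero.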
366
Metadata Specifications and Metadata
Format: Completeness
• Refers to the degree to which values are present in the
description, according to a metadata standard. As far as
an individual property is concerned, only two situations
are possible: either a value is assigned to the property
in question, or not.
• $\mathrm{Completeness}(ms_x) = 1 - \dfrac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
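A minimal sketch of the completeness measure, assuming a metadata specification is a dictionary from attribute name to value and that an absent or empty value counts as missing. The four-attribute schema below is hypothetical, not ETD-MS or Dublin Core.

```python
def completeness(record, schema_attributes):
    """1 minus the fraction of schema attributes with no value in the record."""
    missing = sum(1 for attribute in schema_attributes if not record.get(attribute))
    return 1 - missing / len(schema_attributes)


# Hypothetical schema and record, for illustration only.
SCHEMA = ["title", "creator", "date", "language"]
record = {"title": "A Thesis", "creator": "A. Student", "date": "2006"}

print(completeness(record, SCHEMA))  # one of four attributes is missing
```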
367
368
Metadata Specifications and Metadata
Format: Completeness
• OCLC NDLTD Union catalog
[Bar chart: completeness, on a scale from 0 to 1, for each archive: WagUniv, UCL, CALTECH, UPSALLA, LAVAL, NSYSU, WATERLOO, CCSD, UTENN, MUENCHEN, USF, ETSU, GATECH, VIENNA, DRESDEN, BGMYU, OCLC, HUMBOLT, HKU, PITT, USASK, NCSU, VANDERBILT, VTINDIV, PHYSNET, UBC, MIT, VTETD, LSU, GWUD]
Metadata Specifications and Metadata
Format: Conformance
• An attribute $att_{xy}$ of a metadata specification $ms_x$ is
cardinally conformant to a metadata format/standard if:
– it appears at least once, if $att_{xy}$ is marked as
mandatory;
– its value is from the domain defined for $att_{xy}$;
– it does not appear more than once, if it is not marked
as repeatable.
• $\mathrm{Conformance}(ms_x) = \dfrac{\sum_{att_{xy} \in ms_x} \text{degree of conformance of } att_{xy}}{\text{total attributes}}$
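The three conformance rules can be sketched as below. Two assumptions are made here that the slide does not state: the degree of conformance of an attribute is taken as binary (1 if all three rules hold, 0 otherwise), and "total attributes" is taken as the number of attributes in the schema. The schema and the `AttributeRule` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttributeRule:
    mandatory: bool
    repeatable: bool
    in_domain: Callable[[str], bool]  # True if a value belongs to the attribute's domain


def attribute_conforms(values, rule):
    """Apply the three rules: mandatory, domain, repeatable."""
    if rule.mandatory and len(values) < 1:
        return False
    if not rule.repeatable and len(values) > 1:
        return False
    return all(rule.in_domain(value) for value in values)


def conformance(record, schema):
    """Fraction of schema attributes that conform (binary degree assumed)."""
    conforming = sum(
        attribute_conforms(record.get(name, []), rule) for name, rule in schema.items()
    )
    return conforming / len(schema)


# Hypothetical schema, for illustration only.
SCHEMA = {
    "title": AttributeRule(True, False, lambda v: len(v) > 0),
    "creator": AttributeRule(True, True, lambda v: len(v) > 0),
    "date": AttributeRule(False, False, lambda v: v.isdigit() and len(v) == 4),
    "language": AttributeRule(False, True, lambda v: v in {"en", "es"}),
}
record = {"title": ["A Thesis"], "date": ["2006", "2007"], "language": ["en"]}

print(conformance(record, SCHEMA))  # creator is missing and date is repeated
```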
369
370
Metadata Specifications and Metadata
Format: Conformance
• Based on ETD-MS
[Bar chart: conformance, on a scale from 0.75 to 1, for each archive: WagUniv, UCL, CALTECH, UPSALLA, LAVAL, NSYSU, WATERLOO, CCSD, UTENN, MUENCHEN, USF, ETSU, GATECH, VIENNA, DRESDEN, BGMYU, OCLC, HUMBOLT, HKU, PITT, USASK, NCSU, VANDERBILT, VTINDIV, PHYSNET, UBC, MIT, VTETD, LSU, GWUD]
Services: Efficiency / Effectiveness
• Effectiveness
– Very common measures: Precision, Recall, F1, 10-precision, R-Precision
– Other services may have different measures: e.g.,
Recommending, etc.
• Efficiency
– let $t(e)$ be the time of an event $e$
– let $e_{ix}$ and $e_{fx}$ be the initial and the final event of
service $se_x$.
– For service $se_x$, efficiency is defined as:
• $\mathrm{Efficiency}(se_x) = t(e_{fx}) - t(e_{ix})$
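A small sketch of three of the effectiveness measures (precision, recall, F1) and of efficiency as the elapsed time between a service's initial and final events. The identifiers and timestamps are made up for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and their harmonic mean (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def efficiency(t_initial, t_final):
    """Time of the final event minus time of the initial event."""
    return t_final - t_initial


# Made-up search result and relevance judgments.
print(precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8]))
# Made-up event times, in seconds.
print(efficiency(t_initial=10.0, t_final=12.5))
```

Note that under this definition a smaller efficiency value means a faster service.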
371
Services: Extensibility & Reusability
• A service Y reuses a service X if the
behavior of Y incorporates the
behavior of X.
• A service Y extends a service X if it
subsumes the behavior of X and
potentially includes additional
subflows of events.
372
Services: Extensibility & Reusability (2)
• Macro-Reusability(Serv) = no. of reused
services/ total number of services
• Micro-Reusability(Serv) = number of lines
of code of managers that implement (run)
reused services/ total lines of code
373
Services: Extensibility and Reusability
Service                       Component Based   LOC for implementing service   LOC reused from component   Total LOC
Searching – Back-end          Yes               -                              1650                        1650
Search Wrapping               No                100                            -                           100
Recommending                  Yes               -                              700                         700
Recommend Wrapping            No                200                            -                           200
Annotating – Back-end         Yes               50                             600                         600
Annotate Wrapping             No                50                             -                           50
Union Catalog                 Yes               -                              680                         680
User Interface Service        No                1800                           -                           1600
Browsing                      No                1390                           -                           1390
Comparing (objects)           No                650                            -                           650
Marking Items                 No                550                            -                           550
Items of Interest             No                480                            -                           480
Recent Searches/Discussions   No                230                            -                           230
Collections Description       No                250                            -                           250
User Management               No                600                            -                           600
Framework Code                No                2000                           -                           2000
Total                                           8280                           3630                        11910
Macro-Reusability = 4/16 = 0.25
Micro-Reusability = 3630 / 11910 = 0.304
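The two reusability ratios can be recomputed from the totals reported on the slide: 4 of 16 services are component based, and 3630 of 11910 lines of code are reused. The function names are illustrative.

```python
def macro_reusability(reused_flags):
    """Number of reused services divided by the total number of services."""
    return sum(reused_flags) / len(reused_flags)


def micro_reusability(reused_loc, total_loc):
    """Lines of code in reused components divided by total lines of code."""
    return reused_loc / total_loc


# 4 component-based services out of 16, as reported on the slide.
flags = [True] * 4 + [False] * 12
print(macro_reusability(flags))
# Totals as reported on the slide.
print(micro_reusability(3630, 11910))
```

The micro ratio evaluates to about 0.3048, which the slide reports truncated to 0.304.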
374
Quality and the Information Life Cycle
[Figure: the information life cycle annotated with quality dimensions.
Phases: Active, Semi-Active, Inactive.
Stages and activities: Creation, Authoring, Modifying, Describing, Organizing, Indexing, Storing, Accessing, Filtering, Archiving, Networking, Distribution, Seeking, Searching, Browsing, Recommending, Utilization, Mining, Retention, Discard.
Quality dimensions: Accuracy, Completeness, Conformance, Timeliness, Similarity, Preservability, Pertinence, Significance, Accessibility, Relevance.]
375
Quality Model: Evaluation
• Focus groups
– 3 librarians
– Major points
• Focus on DLs not traditional libraries
• Some indicators may have more theoretical than
practical use in some contexts
• Liked minimalist approach
• Interesting and potentially useful mainly for
education and evaluation
376
Prior Work on Measuring DL Success
inspection of NCSTRL
Usability of DLs
has an example
evaluation of ACM, IEEE-CS,
NCSTRL, and NDLTD
evaluation of ADL
evaluation of ADEPT
Technology
acceptance model
predict
Intention to re/use
has an example
Venkatesh
system usage
IS success model
has an indicator
has an example
DeLone et al.
Seddon
Ellis
Information seeking
behavior model
has an example
DL quality model
has an example
Kuhlthau
Gonçalves
377
DL Success Model
• Synthesize
– IS success and adoption models
– DL quality model (Gonçalves)
– Information life cycle model
– Information-seeking behavior models (Ellis’ and
Kuhlthau’s)
• From end user perspective
378
Gonçalves et al.
379
E: Ellis’ model
K: Kuhlthau’s model
E1:starting
380
DL Success Model
[Figure: DL success model.
Constructs: information quality (IQ), system quality (SQ), performance expectancy (PE), social influence (SI), satisfaction, behavioral intention to (re)use.
Indicators shown: relevance, adequacy, timeliness, reliability, understandability, scope; user interface, ease of use, accessibility, joy of use, reliability.]
381
DL quality
dimension
DL success
manifest variable
5S and
DL concept
DL success construct
accessibility
accuracy
completeness
consistence
conformance
pertinence
preservability
relevance
significance
similarity
timeliness
adequacy
relevance
reliability
scope
timeliness
understandability
stream, structure
digital object
metadata
collection
catalog
repository
information quality (IQ)
composability
efficiency
effectiveness
extensibility
reusability
reliability
accessibility
reliability
ease of use
joy of use
society, scenario,
space
service
system quality (SQ)
performance expectancy (PE)
DL visibility
society
social influence (SI)
382
Exercise 2
• Re-form into former groups of 2.
• Recall the digital library you selected earlier.
• Select the most important measures of quality
for that digital library (from those discussed or
others you feel are needed).
• Work out the details of an evaluation using those
measures.
• Present a summary to the class and lead a
discussion.
383
Chapter 14 Overview
• Integration
385
DL Integration
• What is “DL Integration”
– Hide distribution
– Hide heterogeneity
– Enable autonomy of individual component
• Why Integration
– island-DLs
– inability to seamlessly and transparently
access knowledge across DLs
Utilize various autonomous DLs in concert
386
Integration: Rationale
• We can read any paper book (ignoring
limitations of language, vision, …).
• Scholarship requires access, analysis,
and synthesis spanning disciplines and
sources.
• New theories, systems, and services build
upon our past accomplishments.
• Our “Small World” and the “Internet Age”
demand that we, and our computers, work
together and interoperate.
387
Integration: Urgency, Longevity
• If we collect, capture, acquire, or produce
information, will it be usable in 100 years?
• NSF Digital Archiving Program
• Library of Congress National Digital
Information Infrastructure and
Preservation Program
388
Integration: Standards
• Standards don’t exist in many areas.
• Standards that do exist create a
jumble:
– Conversion between (without loss?)
– Bridging gaps (Z39.50 -> OAI)
– Managing legacy content and systems
• Standards in DLs have focused on:
– Metadata (e.g., Dublin Core)
– Architecture (e.g., handles, repositories)
389
Integration: Challenges
• “Semantic Web” is vision, not reality.
• How can we integrate without a theory?
• How can we interoperate without a
common framework?
• How can we have a science of DLs if
we lack agreement on definitions (so
we can reason and discuss) and
measures of quality (so we can
compare and improve)?
390
Hypothesis and Research Questions
• The 5S framework provides effective
solutions to DL integration.
– Formally define the DL integration problem?
– Guide integration of domain focused DLs?
• How to formally model such domain specific DLs?
• How to integrate formally defined DL models into a
union DL model?
• How to use the union DL model to help design and
implement high quality integrated DLs?
– Assess the integration?
391
Related Work
DL interoperability approach
Consists of
Intermediary-based
Interrelated with
mapping-based
use
mediator
wrapper
use
agent
schema mapping
used in
two architectures
Consists of
federation
Union Archiving
use
hybrid mapper
has an example
SemInt
composite mapper
has an example
LSD
392
DL integration formalization
based on
DL interoperability approach
Consists of
Intermediary-based
Interrelated with
mapping-based
use
mediator
wrapper
use
agent
schema mapping
used in
two architectures
Consists of
federation
Union Archiving
use
hybrid mapper
composite mapper
trained by
GA
393
Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
– $R_i$ is a network accessible repository
– $DM_i$ is a set of metadata catalogs for all collections
– $Serv_i$ is a set of services
– $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
394
Formal Definition of DL Integration (Cont.)
• DL integration problem definition:
Given n individual libraries, integrate the n DLs
to create a UnionDL.
395
Union Catalog Quality Measurement
• Complete
– All the catalogs to be integrated are complete.
• Consistent
– All the catalogs to be integrated are consistent.
– Each descriptive metadata specification in the
union catalog describes only one digital object.
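The second consistency condition can be checked mechanically. A minimal sketch, assuming each catalog is a list of (metadata specification id, digital object id) pairs. This representation and the identifiers are assumptions for illustration, not the formal 5S definition of a catalog.

```python
def union_catalog(*catalogs):
    """Merge catalogs into a map from metadata spec id to described object ids."""
    union = {}
    for catalog in catalogs:
        for spec_id, object_id in catalog:
            union.setdefault(spec_id, set()).add(object_id)
    return union


def is_consistent(union):
    """True if every metadata specification describes exactly one digital object."""
    return all(len(object_ids) == 1 for object_ids in union.values())


# Hypothetical catalogs from two DLs.
catalog1 = [("ms:1", "do:pottery-25"), ("ms:2", "do:figurine-3")]
catalog2 = [("ms:3", "do:bone-17")]
conflicting = [("ms:1", "do:seed-9")]  # reuses ms:1 for a different object

print(is_consistent(union_catalog(catalog1, catalog2)))
print(is_consistent(union_catalog(catalog1, conflicting)))
```

In practice a mapping or harvesting step would need to assign fresh identifiers or reject the conflicting record before the union catalog is published.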
396
Architecture of a Union DL
[Figure: a Union DL built from DL1 and DL2.
DL1: Society, Service (Searching), Catalog1, Repository1.
DL2: Society, Service (Browsing), Catalog2, Repository2.
Union DL: Union Society, Union Service (Harvesting, Mapping, Searching, Browsing, Clustering, Visualization), Union Catalog, Union Repository.
Societies shown: archaeologists, general public.]
397
Example of Union Service: CitiViz
398
Multidimensional Browsing: Percentages of
Animal Bones Across Nimrin Cultural Phases
399
Integration of Domain Focused DLs
• Union archaeological metadata catalog
generation
• Modeling archaeological DLs (ArchDLs) in
the 5S framework
• ArchDL integration case study:
ETANA-DL
400
Union Catalog Integration
Virtual Nimrin
(VN)
VN Metadata
Format
Mapping
Tool
Union ArchDL
VN
Catalog
Halif DigMaster
(HD)
Wrapper
Union
Catalog
HD
Catalog
Global Metadata
Format
Wrapper
HD Metadata
Format
Mapping
Tool
401
Heterogeneous data handling
Site             Artifact Type        Original data source      Number of records harvested
Bab edh-Dhra’    Pottery              cp6 database file         786
Lahav            Figurine             Tab-delimited text file   563
Madaba           Locus field record   Tables in Access DB       786
Mozan            Publication          PDF files                 19
Nimrin, Umayri   Bone field record    Table in Oracle DB        7419
                 Seed field record    Table in Oracle DB        429
                 Locus field record   Table in Oracle DB        2101
                 Bone field record    2 tables in Access DB     2122
Total                                                           18404
402
ETANA-DL Schema Design
Owner
Locus
Collection
ID
Bone
Partition
ETANA-DL
Object
Subpartition
Seed
Figurine
Container
Animal
Name
Dimensions
Count
Species
Description
……
……
……
……
403
Visual Mapping Tool Architecture
Visualizing Components
Composite Mapper
Mapper1
Mapper2
Mapper3
Mapper4
404
Data Mapping (state-of-the-art)
405
local schema
global schema
406
Mapping recommendation
407
Mapping confirmation
Mapping history
408
No recommendation for “Tomb_Area”
409
User-decided mapping
410
Modeling ArchDLs in the 5S Framework
• Modeling archaeological information
systems using the 5S theory to better
understand the domain and design the
system and the supported services
• Minimal DL
• Minimal ArchDL
411
A Minimal DL in the 5S Framework
Streams
Structured
Stream
Structures
Spaces
Structural
Metadata
Specification
Scenarios
Societies
services
Descriptive
Metadata
Specification
indexing
browsing searching
hypertext
Digital Object
Collection
Metadata Catalog
Repository
Minimal DL
412
A Minimal ArchDL in the 5S Framework
Streams
Structures
Structured
Stream
Spaces
Descriptive
Metadata
specification
Scenarios
Societies
services
SpaTemOrg
StraDia
Arch Descriptive
Metadata specification
ArchObj
indexing
browsing searching
hypertext
ArchDO
Arch Metadata catalog
ArchColl
ArchDColl
ArchDR
Minimal ArchDL
413
Requirements (1)
5S
Meta
Model
DL
Expert
Analysis (2)
DL
Designer
5SGraph
Practitioner
5SL
DL
Model
component
pool
ODLSearch,
ODLBrowse,
ODLRate,
ODLReview,
…….
Teacher
Design (3)
Researcher
Tailored
DL
Services
5SLGen
Implementation (4)
5SSuite
5SGraph
5SGen
Mapping Tool
414
ArchDL Expert
5S Archaeology
MetaModel
ArchDL Designer
5SGraph
VN Metadata Format
Scenario
Sub-model
ETANA-DL
Union Services
Descriptions
ETANA-DL Metadata Format
VN
Catalog
HD
Catalog
Mapping Tool
Wrapper4VN
Harvesting
Mapping
Searching
Browsing
…
Wrapper4HD
Structure
Inverted FilesSub-model
Search
Service
XOAI
Browse DB
Browse
Service
Component
Pool
Services DB
5SGen
Other
XOAI
ETANA-DL
Services
Web Interface
Union
Catalog
Browsing
…
HD Metadata Format
415
Chapter 15 Overview
– Requirements gathering
– Modeling with 5S-based approach
– Identifying good fit among existing systems or
toolkits
– Adapting an existing DL to fit new needs
– Construction of new system from toolkit
– Domain specific enhancement
417
Chapter 16 Overview
• Future direction workshops
• Challenges
418
419
420
421
422
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users
423
DL Manifesto - 1
• DL Reference Model
• In support of the future European Digital Library
• Developed by team connected with DELOS
(Candela, Castelli, Ioannidis, Koutrika, Meghini,
Pagano, Ross, Schek, Schuldt)
• Draft 2.2 presented in Frascati, near Rome,
June 2006 – 79 pages
• Could be integrated with work of DLF, JISC, etc.
424
DL Manifesto – 2: 3 Tiers
425
DL Manifesto – 3: Main Concepts
426
DL Manifesto – 4: Actor Roles
427
428
As data, information, and knowledge play
increasingly central roles … digital library
research should focus on:
• Increasing the scope and scale of information
resources and services;
• Employing context at the individual,
community, and societal levels to improve
performance;
• Developing algorithms and strategies for
transforming data into actionable information;
• Demonstrating the integration of information
spaces into everyday life; and
• Improving availability, accessibility, and,
thereby, productivity.
429
An appropriate infrastructure program will
provide sustainability of digital knowledge
resources along five dimensions:
• Acquisition of new information resources;
• Effective access mechanisms that span
media type, mode, and language;
• Facilities to leverage the utilization of
humankind’s knowledge resources;
• Assured stewardship over humanity’s
scholarly and cultural legacy; and
• Efficient and accountable management
of systems, services, and resources.
430
Conclusions
• We have answered the almost 40-year-old
challenge of Licklider to build a unified CS / LIS
theory by
– Proposing and formalizing the first comprehensive
formal framework for digital libraries
• We have shown how to move from theory to practice by
– Applying the framework to the problems of
– Materializing these applications into languages, tools,
formats, etc.
– Explaining and evaluating these applications (usability
studies, focus groups, prototyping, etc.)
431
Future Work
• Theory
– Apply to formally describe other systems
– Complete formal definitions of all services with
further events
– Load axioms in knowledge base to automatically
assess quality of models (correctness, etc.)
• Applications/Tools
– Language
• Make different versions uniform
• Extend with METS, less complex scenario, society
models
• New metamodels
– Domain/application oriented (e.g., archaeology, education)
– For traditional libraries
432
Future Work (cont’d)
• Applications/Tools
– Visualization
• Integration with other tools
– through Wizard
• New visualizations
• Applying as educational tool
– Generation
• Use of Web services
• Incorporation of Native XML repositories
• Improvement of Scenario Algorithms
– Logging
• Promote use
• Consider privacy issues
• New actions
• Deal with scalability issues
433
Future Work (cont’d)
• Quality
– Development of more usage-oriented measures
• Current measures are mostly system-oriented
• Focus on log format and evaluation
– Development of Quality ToolKit (5SQual) for DL
managers with the following features:
• Mapping tool to map local log format to standard XML Log
format
• Components to implement all measures
• Visualization of data and measures
• Broken into several logical pieces to be used in the different
phases of the information life cycle
• Others, e.g., personalization
• Create theories, tools, languages, methods for
personalization based on 5S
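The log-mapping feature listed above can be illustrated with a minimal sketch. It assumes the local log uses an Apache-style access-log line and maps it to an XML record. The element names (LogEntry, Timestamp, Client, Event, Resource, Status) and the function map_line_to_xml are hypothetical, chosen only for illustration; they are not the schema of the standard XML Log format or the 5SQual implementation.

```python
import re
import xml.etree.ElementTree as ET

# Assumed local format: an Apache-style access-log line (illustration only).
LOCAL_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def map_line_to_xml(line):
    """Map one local log line to an illustrative XML record.

    Returns None when the line does not match the assumed local format.
    """
    match = LOCAL_LINE.match(line)
    if match is None:
        return None
    # Hypothetical element names; a real mapping would follow the target schema.
    entry = ET.Element("LogEntry")
    ET.SubElement(entry, "Timestamp").text = match.group("time")
    ET.SubElement(entry, "Client").text = match.group("host")
    event = ET.SubElement(entry, "Event", {"type": match.group("method")})
    ET.SubElement(event, "Resource").text = match.group("path")
    ET.SubElement(event, "Status").text = match.group("status")
    return entry

sample = ('127.0.0.1 - - [17/Sep/2006:10:00:00 +0200] '
          '"GET /search?q=5S HTTP/1.0" 200 512')
record = map_line_to_xml(sample)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the local-format pattern separate from the record builder means a different local log format would only need a new pattern, which is the kind of per-site adaptation a mapping tool would have to support.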
434
Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources)
– www.citidel.org
• NCSTRL (computing technical reports)
– www.ncstrl.org
• NDLTD (electronic theses and dissertations
worldwide)
– www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library)
– www.nsdl.org
• OAI (Open Archives Initiative)
– www.openarchives.org
• Virginia Tech Digital Library Research Laboratory
(DLRL, www.dlib.vt.edu)
– 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION,
ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …
435
Questions?
Discussion?
Thank You!
436