Streams, Structures, Spaces, Scenarios, and Societies (5S

Digging into Digital Libraries: From Archaeology to Formalism
Edward A. Fox
Virginia Tech, Dept. of CS
[email protected]
CSC Spring Colloquium
Villanova – February 20, 2006
1
Acknowledgements (selected)
• 5S Helpers: Weiguo Fan, Marcos Gonçalves,
Doug Gorton, Rohit Kelapure, Neill Kipp, Uma
Murthy, Ananth Raghavan, Rao Shen, Hussein
Suleman, Srinivas Vemuri, Layne Watson, …
• Sponsors: ACM, AOL, CAPES, DFG, IBM,
Microsoft, NSF (IIS-9986089, 0086227,
0080748, 0325579, 0535057, 0535060; ITR-0325579;
DUE-0121679, 0136690, 0121741, 0333601), SUN
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
• Summary and Conclusions
3
WWW and DLs
• Both emerged in the early 1990s.
• Convergence began around 1994.
• Example: Google spun off from the Stanford DL.
• Crawling the WWW is one way to build DLs.
• The WWW supports many portals to DLs.
• Parts of the WWW that have catalogs (e.g., Yahoo categories) are close to DLs.
• Web Services help move the WWW toward DLs, as the Semantic Web emerges.
4
Degree of Structure
Web: Chaotic
DLs: Organized
DBs: Structured
5
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: NSDL information architecture. Portals and clients (user interfaces) connect through the core NSDL “bus” to NSDL collections (referenced items and collections, special databases) and to three groups of services. Core collection-building services: metadata gathering, harvesting, protocols. Usage enhancement services: other NSDL services. Core CI services: information retrieval, browsing, authentication, personalization, discussion, annotation.]
6
7
8
9
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
– Definitions
– ETANA example
• Powerful DLs
• Why
• How
• Summary and Conclusions
10
Minimal Digital Libraries
• Key concepts, core ideas
• Minimalist perspective
• Underlying concepts: 5S (ETANA example)
• Higher DL constructs
• Bases:
– Literature
– Informal explanations
– Formal definitions
11
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
12
5Ss
Ss: Examples and Objectives
• Streams
– Examples: text; video; audio; image
– Objectives: describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures
– Examples: collection; catalog; hypertext; document; metadata
– Objectives: specifies organizational aspects of the DL content
• Spaces
– Examples: measure; measurable, topological, vector, probabilistic
– Objectives: defines logical and presentational views of several DL components
• Scenarios
– Examples: searching, browsing, recommending
– Objectives: details the behavior of DL services
• Societies
– Examples: service managers, learners, teachers, etc.
– Objectives: defines managers, responsible for running DL services; actors, that use those services; and relationships among them
13
Example of 5Ss: ETANA-DL
• Archaeological DL (Electronic Tools for Ancient Near Eastern Archaeology Digital Library)
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
– Work based on 5S framework
14
15
ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork
settings, or local and national governmental
bodies)
3. Project directors
4. Technical staff (consisting of photographers,
technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of
excavation)
6. Camp staff (e.g., camp managers, registrars, tool
stewards)
7. General public (e.g., educators, learners, citizens)
16
ETANA Societies – cont’d
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they
represent?
4. Who has publication rights?
5. What interactions took place between those
at the site studied, and others? What
theories are proposed by whom about this?
17
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
2. Data about each artifact is recorded together with information about its exact find spot.
3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypothesis generation and testing
7. Publications, museum displays
8. Information services for the general public
18
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by
archaeologists)
3. Metric or vector spaces
1. used to support retrieval operations, and to
calculate distance (and similarity)
2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and
visualize archaeological ruins
5. 2D interfaces for human-computer interaction
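As a rough illustration of item 3, the sketch below ranks records by cosine similarity in a small vector space. It is not ETANA-DL code: the record ids, the three-term vocabulary, and the weights are all hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, records):
    """Return record ids sorted from most to least similar to the query vector."""
    scored = [(cosine_similarity(query, vec), rec_id) for rec_id, vec in records.items()]
    return [rec_id for _, rec_id in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical term weights over an assumed vocabulary (bone, seed, pottery).
records = {
    "locus-1": [2.0, 0.0, 1.0],
    "locus-2": [0.0, 3.0, 0.0],
    "locus-3": [1.0, 0.0, 0.0],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], records)
```

A distance (for example Euclidean) could be substituted where a metric space rather than a similarity score is wanted.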
19
ETANA Structures
1. Site Organization
1. Region, site, partition, sub-partition, locus,
…
2. Temporal orderings (ages, periods)
3. Taxonomies
1. for bones, seeds, building materials, …
4. Stratigraphic relationships
1. above, beneath, coexistent
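The stratigraphic relationships can be pictured as a small directed graph. The sketch below is only an assumption about how one might store them: the class name, the locus ids, and the choice to treat “above” as transitive are illustrative and not taken from ETANA-DL.

```python
from collections import defaultdict


class Stratigraphy:
    """Stores 'above' and 'coexistent' relations between loci (hypothetical model)."""

    def __init__(self):
        self._below = defaultdict(set)   # locus -> loci recorded directly beneath it
        self._coexistent = set()         # unordered pairs stored as frozensets

    def add_above(self, upper, lower):
        self._below[upper].add(lower)

    def add_coexistent(self, a, b):
        self._coexistent.add(frozenset((a, b)))

    def is_above(self, upper, lower):
        """True if 'lower' is reachable from 'upper' (assumes 'above' is transitive)."""
        stack = list(self._below.get(upper, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current == lower:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._below.get(current, ()))
        return False

    def is_beneath(self, lower, upper):
        # 'beneath' is simply the inverse of 'above'.
        return self.is_above(upper, lower)

    def are_coexistent(self, a, b):
        return frozenset((a, b)) in self._coexistent


site = Stratigraphy()
site.add_above("L3", "L2")
site.add_above("L2", "L1")
site.add_coexistent("L2", "L2b")
```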
20
ETANA Streams
1. successive photos and drawings of
excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation
activities and discussions
3. textual reports
4. 3D models used to reconstruct and
visualize archaeological ruins.
21
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: concept map of the formal definitions, with definition numbers as shown on the slide]
• Mathematical preliminaries: relation (d. 1), function (d. 2), sequence (d. 3), tuple (d. 4)*, language (d. 5), graph (d. 6), grammar (d. 7), event (d. 10), measurable (d. 12), measure (d. 13), probability (d. 14), vector (d. 15), and topological (d. 16) spaces, state (d. 18)
• 5S: streams (d. 9), structures (d. 10), spaces (d. 18), scenarios (d. 21), societies (d. 24)
• DL constructs: services (d. 22), transmission (d. 23), structural metadata specification (d. 25), descriptive metadata specification (d. 26), structured stream (d. 29), digital object (d. 30), collection (d. 31), metadata catalog (d. 32), repository (d. 33), indexing service (d. 34), searching service (d. 35), hypertext (d. 36), browsing service (d. 37), digital library (minimal) (d. 38)
22
A Minimal DL in the 5S Framework
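[Figure: layered diagram of the minimal DL. Top layer: the 5Ss (streams, structures, spaces, scenarios, societies). Middle layer: structured stream, structural metadata specification, descriptive metadata specification, and services (indexing, browsing, searching, hypertext). Lower layers: digital object, collection, metadata catalog, repository, and finally the minimal DL.]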
23
[Figure: 5S concept map. Streams: text, audio, video, image. Structures: collection, catalog, metadata specifications, digital object, index, repository. Spaces: measurable, measure, topological, vector, metric, probabilistic. Scenarios: service, scenario, event. Societies: service manager, actor. Relationship labels shown: contains, describes, stores, is_version_of/cites/links_to, is_a, employs, produces, inherits_from/includes, runs, extends, reuses, precedes, happens_before, uses, participates_in, recipient, association, operation, executes, redefines, invokes.]
24
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
– Services
– Ontology
• Why
• How
• Summary and Conclusions
25
Infrastructure Services
• Repository-Building
– Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
– Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
• Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
• Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
26
Ontology: Applications
27
Ontology: Applications
• Expand definition of minimal DL by
characterizing
– typical DL services
– in the context of “employs” and “produces”
relationships
• Use characterization to:
– Reason about how DL services can be built
from other DL components
– As well as be composed with other services
through extension or reuse
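One way to picture reasoning over “employs” and “produces” is the sketch below, which orders services so that each runs only after what it employs is available. The service descriptions (for example, Indexing employing a collection and producing an index) are simplified assumptions for illustration, not the formal definitions from the 5S ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    employs: frozenset    # DL components the service needs
    produces: frozenset   # DL components the service creates


def runnable_order(services, available):
    """Order services so each one's 'employs' set is satisfied before it runs."""
    have = set(available)
    pending = list(services)
    order = []
    while pending:
        ready = [s for s in pending if s.employs <= have]
        if not ready:
            missing = {s.name: sorted(s.employs - have) for s in pending}
            raise ValueError(f"cannot satisfy: {missing}")
        for service in ready:
            order.append(service.name)
            have |= service.produces
            pending.remove(service)
    return order


# Hypothetical, simplified service descriptions.
indexing = Service("Indexing", frozenset({"collection"}), frozenset({"index"}))
searching = Service("Searching", frozenset({"index", "query"}), frozenset({"digital objects"}))

order = runnable_order([searching, indexing], available={"collection", "query"})
```

Here Searching can only be composed once Indexing has produced the index it employs, which is the kind of dependency the characterization is meant to expose.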
28
Composition of key fundamental /
infrastructure services
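[Figure: composition graph. The services Authoring, Digitizing, Describing, Cataloguing, Acquiring, Submitting, Indexing, and Linking are connected by edges labeled “p” and “e” to the items they involve, labeled universal collection, doi, mskj, C, DMC, Ic, and Hypertext; one edge is labeled “describes”.]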
29
[Figure: composition graph for infrastructure services (Add_Value) and information satisfaction services. Services shown: Rating, Indexing, Training, Browsing, Requesting, Recommending, Searching, Filtering, Binding, Visualizing, Expanding query. Edges labeled “p” and “e” connect them to items including {(digital object, actor, rate)}, index, society actor, handle, anchor, classifier, user model, query/category, {digital object}, collection, query, binder, transformer, space, and query’. The legend distinguishes fundamental from composite services.]
30
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
– Support DL education
– Practical systems
– Institutional repositories (DSpace)
– Personal DLs (SenseCam -> Memex)
– Support archaeology
• How
• Summary and Conclusions
31
DL Curriculum Framework
[Figure: matrix with rows for course structure, core DL topics, and related topics]
• Course structure
– Semester 1: DL collections: development/creation
– Semester 2: DL services and sustainability
• Topic boxes (core and related)
– Digitization, storage, interchange
– Metadata, cataloging, author submission
– Digital objects, composites, packages
– Architectures (agents, buses, wrappers/mediators), interoperability
– Spaces (conceptual, geographic, 2/3D, VR)
– Documents, e-publishing, markup
– Multimedia streams/structures, capture/representation, compression/coding
– Bibliographic information, bibliometrics, citations
– Content-based analysis, multimedia indexing
– Naming, repositories, archives
– Services (searching, linking, browsing, etc.)
– Archiving and preservation, integrity
– Thesauri, ontologies, classification, categorization
– Multimedia presentation, rendering
– Info. needs, relevance, evaluation, effectiveness
– Intellectual property rights mgmt., privacy, protection (watermarking)
– Routing, filtering, community filtering
– Search & search strategy, info seeking behavior, user modeling, feedback
– Info summarization, visualization
32
Foundations for Information Systems:
Digital Libraries and the 5S Framework
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
33
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
34
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
35
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
36
Practical Systems
• Commercial: IBM, VTLS, …
• Open Source
– Greenstone
– CWIS (for NSDL)
– Institutional repositories
• DSpace
• Fedora
37
Institutional Repositories
• “A university-based institutional repository is a set
of services that a university offers to the members
of its community for the management and
dissemination of digital materials created by the
institution and its community members. It is most
essentially an organizational commitment to the
stewardship of these digital materials, including
long-term preservation where appropriate, as well
as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7,
Feb. 2003, www.arl.org/newsltr/226/ir.html
38
39
ETANA-DL Global Architecture
[Figure: ETANA-DL global architecture. Sources (DigBase and DigKit, Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, …, new sites) feed through database wrappers into the ETANA-DL union catalog. The user interface offers Search, Browse, Recommend, Note, Personalize, Review, Visualizations, and archaeology-specific services.]
40
Work in progress
Megiddo Opening Screen
41
Locus Screen:
Pictures
View all
42
Area Screen
43
Global DL: Architecture of a Union DL
[Figure: a Union DL built from DL1 and DL2. Each source DL has a society (archaeologists, general public), a service (Searching in one, Browsing in the other), a catalog (Catalog1, Catalog2), and a repository (Repository1, Repository2). The Union DL has a union society, union services (harvesting, mapping, searching, browsing, clustering, visualization), a union catalog, and a union repository.]
44
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
– Components
– Metamodels, Models
– Graphical model building aids
– DL generators
– Integration
– Quality
• Summary and Conclusions
45
[Figure: documents, programs, images, and videos shown as raw bit streams, with question marks between them]
componentized digital library
46
The World According to OAI:
Open Archives Initiative –
Protocol for Metadata Harvesting
Service Providers
Discovery
Current
Awareness
Preservation
Data Providers
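To make the service provider / data provider split concrete, here is a minimal sketch of how a service provider might form an OAI-PMH harvesting request. The six verbs and the `metadataPrefix=oai_dc` argument come from the OAI-PMH 2.0 protocol; the base URL is a placeholder, and the helper name `oai_request_url` is an assumption made for this example.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, arguments=None):
    """Build the GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments or {})
    return f"{base_url}?{urlencode(query)}"


# Placeholder data provider address; substitute a real repository's base URL.
url = oai_request_url(
    "https://example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc"},
)
```

The data provider answers such a request with XML metadata records, which the service provider can then index for discovery, current awareness, or preservation services.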
47
48
49
Metamodels
• Completed
– Minimal
– Archaeological
• Planned
– Practical
– System oriented
• Doug Gorton’s thesis, so people can build models
for their systems, and have them generated to
work with a particular DL system
50
A Minimal DL in the 5S Framework
[Figure: the layered minimal DL diagram, repeated from the Minimal DLs part of the talk: the 5Ss; structured stream, structural and descriptive metadata specifications, and services (indexing, browsing, searching, hypertext); digital object, collection, metadata catalog, repository; minimal DL.]
51
5SL – The Minimal DL Metamodel
[Figure: the minimal DL metamodel in 5SL, built from primitives and organized into five (meta-)models: scenarios, societal, structural, spatial, and stream. Labels shown include Actor, Community, User, Service, Scenario, Event, Manager (Service, Interface, Index, Search, Repository, and Browsing Managers), Collection, Index, Catalog, Document, Metadata, Interface, Retrieval Model, Text, Video, Audio, and Image, with relationships uses, runs, and receiver.]
52
A Minimal ArchDL in the 5S Framework
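[Figure: the minimal DL diagram specialized for archaeology. Top layer: the 5Ss (streams, structures, spaces, scenarios, societies). Below: structured stream, descriptive metadata specification, services (indexing, browsing, searching, hypertext), and the archaeology-specific constructs SpaTemOrg, StraDia, Arch descriptive metadata specification, ArchObj, ArchDO, Arch metadata catalog, ArchColl, ArchDColl, ArchDR, and finally the minimal ArchDL.]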
53
[Figure: 5S-based DL development process. Phases: requirements, analysis, design, implementation, test, evaluation. Artifacts and tools: formal theory/metamodel (5S), 5SGraph, 5SL, 5SLGen, OO classes, workflow, components, DL, DL XML log.]
54
Overview of 5SGraph
[Figure: 5SGraph screenshot showing the workspace (instance model) and the structured toolbox (metamodel)]
55
Tools/Applications
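[Figure: tools and applications. Inputs: the 5S metamodel (DL expert) and a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …). The DL designer uses 5SGraph to produce a DL model in 5SL; 5SLGen produces a tailored DL, whose logging module writes an XML log. Other roles shown: practitioner, teacher, researcher.]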
56
5SGen – Version 2: ODL,
Services, Scenarios
[Figure: 5SLGen generation pipeline, with numbered steps as labeled on the slide]
(1) 5SL-Societies Model
(2) XPath/JDOM Transform
(3) XMI:Class Model
(4) Xmi2Java
(5) Java Classes Model
(6) 5SL-Scenario Model
(7) XPath/JDOM Transform
(8) StateChart Model
(9) Scenario Synthesis
(10) Deterministic FSM
(11) SMC
(12) Java Finite State Machine Class Controller
(13) JSP User Interface View
Other labels: DL Designer; Component Pool (ODL Search, ODL Browse, …) with wrapping and import; superclass; binds; 5SLGen; Generated DL Services
57
XML-based DL Log Standard
• Log analysis
– is a source of information on:
• How patrons really use DL services
• How systems behave while supporting user information
seeking activities
• Used to:
– Evaluate and enhance services
– Guide allocation of resources
• Common practice in the web setting
– Supported by web servers, proxy caches
• DL Logging can be more detailed
58
The XML Log Format
[Figure: element hierarchy of the XML log format. Elements shown: Log, Transaction, SessionId, MachineInfo, Timestamp, Statement, Event, Action, StatusInfo, SessionInfo, RegisterInfo, Search, SearchBy, QueryString, Browse, Update, Collection, Catalog, StoreSysInfo, Timeout, PresentationInfo]
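As an illustration only, the sketch below writes one search transaction using element names from the slide. The nesting chosen here (Log > Transaction > Event > Search) and all field values are assumptions; the actual schema of the log standard should be consulted for the real structure.

```python
import xml.etree.ElementTree as ET


def search_transaction(session_id, machine, timestamp, collection, query):
    """Build a Log element holding one search transaction (assumed nesting)."""
    log = ET.Element("Log")
    transaction = ET.SubElement(log, "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "MachineInfo").text = machine
    ET.SubElement(transaction, "Timestamp").text = timestamp
    event = ET.SubElement(transaction, "Event")
    search = ET.SubElement(event, "Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "QueryString").text = query
    return log


# Hypothetical values for illustration.
log = search_transaction(
    session_id="s-001",
    machine="192.0.2.10",
    timestamp="2006-02-20T10:15:00Z",
    collection="Megiddo",
    query="pottery",
)
xml_text = ET.tostring(log, encoding="unicode")
```

Recording the collection and query alongside session and machine information is what lets DL logs be more detailed than ordinary web server logs.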
59
DL Integration
• What is “DL Integration”?
– Hide distribution
– Hide heterogeneity
– Enable autonomy of individual components
• Why integration?
– Island DLs
– Inability to seamlessly and transparently access knowledge across DLs
• Utilize various autonomous DLs in concert
60
Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
– $R_i$ is a network accessible repository
– $DM_i$ is a set of metadata catalogs for all collections
– $Serv_i$ is a set of services
– $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
61
Formal Definition of DL Integration (Cont.)
• DL integration problem definition:
Given n individual libraries, integrate the n DLs
to create a UnionDL.
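A deliberately naive sketch of the problem statement follows: each DL is modeled as the 4-tuple (repository, metadata catalogs, services, society) and the UnionDL is formed by plain set union of each part. Treating every component as a set of names is a simplification made for this example; real integration also needs schema mapping and harvesting, and the slide does not give the definitions of UnionRep, UnionCat, UnionServices, or UnionSociety.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DL:
    repository: frozenset   # R_i, simplified to a set of object ids
    catalogs: frozenset     # DM_i, simplified to a set of catalog names
    services: frozenset     # Serv_i
    society: frozenset      # Soc_i, simplified to a set of community names


def union_dl(dls):
    """Combine n DLs by taking the set union of each component (naive assumption)."""
    dls = list(dls)

    def merged(attr):
        return frozenset().union(*(getattr(dl, attr) for dl in dls))

    return DL(merged("repository"), merged("catalogs"), merged("services"), merged("society"))


# Hypothetical member DLs.
dl1 = DL(frozenset({"obj-1"}), frozenset({"Catalog1"}),
         frozenset({"Searching"}), frozenset({"archaeologists"}))
dl2 = DL(frozenset({"obj-2"}), frozenset({"Catalog2"}),
         frozenset({"Browsing"}), frozenset({"archaeologists", "general public"}))

union = union_dl([dl1, dl2])
```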
62
ETANA-DL Approach
• Applying and extending Digital Library (DL)
techniques to solve key problems: making primary
data available, data preservation, and interoperability
• Modeling archaeological information systems using
5S to better understand the domain and design the
system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous
archaeological data using componentized
frameworks:
– eliciting requirements
– refining metamodel and union schema
– modeling sites
– mapping
– harvesting
– providing useful services
63
Example of Union Service: CitiViz
64
Union Catalog Integration
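[Figure: union catalog integration. Virtual Nimrin (VN) and Halif DigMaster (HD) each have their own metadata format and catalog. For each site, a mapping tool relates the local format to the global metadata format, and a wrapper feeds the site catalog into the union catalog of the Union ArchDL.]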
65
local schema
global schema
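The local-to-global mapping step can be sketched as a field renaming. The field names on both sides below are invented for illustration; they are not the actual Virtual Nimrin, Halif DigMaster, or ETANA-DL global schemas.

```python
def map_record(record, mapping):
    """Rename local-schema fields to global-schema fields.

    Returns the mapped record plus the local fields that had no mapping,
    so that gaps in the mapping can be reviewed rather than silently lost.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        if field in mapping:
            mapped[mapping[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


# Hypothetical local-to-global field mapping and local record.
local_to_global = {"sq": "partition", "loc": "locus", "desc": "description"}
local_record = {"sq": "N40/W20", "loc": "17", "desc": "bone fragment", "bag": "B-204"}

global_record, leftovers = map_record(local_record, local_to_global)
```

Reporting unmapped fields is a design choice of this sketch; it reflects that refining the union schema and the mappings is an iterative activity.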
66
Describing Quality in
Digital Libraries
• What’s a “good” digital library?
– Central concept: quality!
– Hypotheses of this work:
• Formal theory can help to define “what’s a good digital library” by:
– providing new formalizations of quality indicators for DLs within our 5S framework
– contextualizing these measures within the Information Life Cycle
67
Quality Dimensions
DL Concept: Dimensions of Quality
• Digital object: Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
• Metadata specification: Accuracy, Completeness, Conformance
• Collection: Completeness, Impact Factor
• Catalog: Completeness, Consistency
• Repository: Completeness, Consistency
• Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
68
Quality and the Information Life Cycle
[Figure: the information life cycle annotated with quality dimensions. Phases: active, semi-active, inactive. Stages: creation, distribution, seeking, utilization. Activities: authoring, modifying, describing, organizing, indexing, storing, accessing, filtering, archiving, networking, searching, browsing, recommending, mining, retention, discard. Quality dimensions placed around the cycle: accuracy, completeness, conformance, timeliness, similarity, preservability, pertinence, significance, accessibility, relevance.]
69
Summary and Conclusions
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
• -> Theory-based discipline and high
quality DL management systems (DLMS)
70
Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources)
– www.citidel.org
• NCSTRL (computing technical reports)
– www.ncstrl.org
• NDLTD (electronic theses and dissertations
worldwide)
– www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library)
– www.nsdl.org
• OAI (Open Archives Initiative)
– www.openarchives.org
• Virginia Tech Digital Library Research Laboratory
(DLRL, www.dlib.vt.edu)
– 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION,
ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …
71
Questions?
Discussion?
Thank You!
72