Introduction to Metadata for Digital Libraries

Creating Working Digital Libraries
Howard Besser
UCLA School of Education & Information
http://www.gseis.ucla.edu/~howard
Besser--LITA Dig Imaging Preconference 7/7/00

Creating Working Digital Libraries
- Moving from Digital Collections to Digital Libraries
- Interoperability
- Importance of Standards
- Longevity
- Best Practices for Managing Digital Projects
- Some Wild Musings

Moving from Digital Collections to Digital Libraries
- What’s the difference?
- Recent history of Library Automation

Developmental Stages
- Experiment with methods
- Build real operational systems
- Build interoperable operational systems

Traditional Digital Library Model
[Diagram: four digital libraries (DL), each with its own search & presentation layer facing the user]

Ideal Digital Library Model
[Diagram: four digital libraries (DL) behind a single shared search & presentation layer serving the users]

Developmental Stages
- Experiment with methods
- Build real operational systems
- Build interoperable operational systems
  - For DL Initiatives
  - For OPACs
  - For I & A Services
  - For Image Retrieval

Key problems we’re facing
- Discovery
- Interoperability
- Longevity

For Interoperability, Digital Libraries Need Standards
- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ... Terms & Conditions Metadata for controlling access ...

Metadata is not just indexing terms
- CBIR attributes used for retrieval on color, shape, texture, etc.
- Structural attributes used for page-turning
- Administrative attributes used for managing a digital work over time
- IPR attributes to limit unauthorized use
- Identification attributes to determine what application software is needed to view a particular digital work
- Can be located anywhere

Why are Standards and Metadata consensus important?
- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create applications that support this

Why Standards?
- Why do we need standards?
  - To make information universally available to users
  - To facilitate sharing and interchange of information
  - To preserve information (make it safe from changes in hardware and software)
- Standards only work if communities widely accept them, but they’re necessary for communities to work together

Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image

Journal Archiving
- License, don’t own; may not even be able to obtain the right to make an archival copy
- Increasingly no paper back-up at all
- Usually we don’t have the important redundancy factor
- Stanford’s LOCKSS Project (Lots of Copies Keep Stuff Safe) and its problems (http://lockss.stanford.edu)

The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?

The Scrambling Problem
Dangers from:
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce

The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?

The Custodial Problem
- How do we decide what to save?
- Who should save it?
- How should they save it?
  - methods for later access: emulation, migration, etc.
  - issues of authenticity and evidence

The Translation Problem
- Content translated into new delivery devices changes meaning
  - A photo vs. a painting
  - If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  - Behaviors

Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects
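
One common way a file can self-identify its format is a fixed signature ("magic number") in its first bytes. The sketch below is only an illustration of that idea, not a method described in the talk; the function name `identify_format` and the choice of formats are assumptions made for the example.

```python
# Leading signature bytes for a few widely used image formats.
MAGIC_NUMBERS = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(header: bytes) -> str:
    """Guess a format name from the first bytes of a file (hypothetical helper)."""
    for signature, name in MAGIC_NUMBERS:
        if header.startswith(signature):
            return name
    # No known signature: the object does not self-identify to this table.
    return "unknown"


# Example: pass in the first few bytes read from a file.
print(identify_format(b"II*\x00\x08\x00\x00\x00"))
```

A file whose leading bytes match none of the signatures comes back as "unknown", which is the situation the slide warns about.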

Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”

Metadata can be the first line of defense
- Can tell you
  - where the file is (if you can’t find the file)
  - where more info about the file is (if you have the file but most other metadata has become separated)
  - what the file format is
  - what the compression scheme is
  - what application program and version is needed for the file
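
The items above can be read as fields of a small record kept with, or alongside, each file. The following is a minimal sketch under that assumption; the class name, field names, and sample values are hypothetical and do not come from any particular metadata standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreservationRecord:
    """Hypothetical record holding the 'first line of defense' facts about one file."""
    file_location: Optional[str] = None        # where the file is
    more_info_location: Optional[str] = None   # where more info about the file is
    file_format: Optional[str] = None          # what the file format is
    compression: Optional[str] = None          # what the compression scheme is
    viewer_application: Optional[str] = None   # application program needed
    viewer_version: Optional[str] = None       # version of that application

    def missing_fields(self) -> List[str]:
        """Names of the fields that have not been recorded yet."""
        return [name for name, value in vars(self).items() if value is None]


# Illustrative values only.
record = PreservationRecord(file_format="TIFF", compression="none")
print(record.missing_fields())
```

Listing the unrecorded fields makes gaps visible at capture time, before the file and its metadata have a chance to separate.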

Groups Working on the Big Longevity Problem
http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/
- CPA Task Force
- Getty “Time & Bits” Conference & follow-up
- NEDLIB, CURL, Michigan
- Internet Archive
- Long Now

Migration/Refreshing
- Impact on evidential value

Best Practices for Managing Digital Projects
- Who will your users be?
- Best Practices Guidelines
- Workflow and Management Issues

Why are you Managing this Information?
- Organizational mission & type
- Users
- Uses

Scanning Best Practices
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the likely potential users/uses/material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)

Why Scale is important

Digital Object Behaviors
- Book example

Metadata Standards (from MOA2)
- Administrative Metadata
  - for enhancing resource management
- Structural Metadata
  - for reflecting internal hierarchies and relationships between parts
- Raw/Seared/Cooked

Workflow and Management Issues
- Managing multiple image files
- Persistent Identification
- Making your works accessible throughout the Net

The number of variant forms of a work can be enormous
- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...

Image Families
Identification/Provenance
- How to deal with different versions (browse, hi-res, medium res) derived from the same scan, or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
  - VRA Surrogate Categories
  - CIMI’s “Image Elements”

Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)

More Persistent IDs--the Approach for today
- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs when URNs become feasible
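
PURLs, Handles, and HTTP redirects all rest on the same indirection: a stable identifier is looked up in a table that holds the current location. The toy resolver below sketches that idea only; the class, the identifier scheme, and the example.org URLs are hypothetical, and it is not the actual PURL or Handle software.

```python
class PersistentIdResolver:
    """Toy resolver: work IDs stay fixed while their locations may change."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id, url):
        """Record (or update) the current location for a work ID."""
        self._locations[work_id] = url

    def resolve(self, work_id):
        """Return an (HTTP status, location) pair, as a redirect service might."""
        url = self._locations.get(work_id)
        if url is None:
            return 404, None
        return 302, url


resolver = PersistentIdResolver()
resolver.register("work/0001", "http://old.example.org/images/0001.tif")
# The file moves; references to "work/0001" do not need to change.
resolver.register("work/0001", "http://new.example.org/masters/0001.tif")
print(resolver.resolve("work/0001"))
```

Only the resolver's table is updated when a file moves, which is why keeping that table current becomes a business process issue when another organization holds the resource.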

Data Set Management
More issues with referencing IDs
- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies

Making your works accessible throughout the Net
- The DLF/Mellon meeting
- An administrative and political issue as much as a technical one

Some Wild Musings
- Movement towards packages and away from MARC
- The disappearance of OPACs

Containers and Packages of Metadata
Warwick, not MARC
- modular
- overlapping
- extensible
- community-based
- designed for a networked world to aid commonality between communities while still providing full functionality within each community

DC Qualifiers
- allows one community to express important nuances and qualifications, while still making the basic importance available to communities with simple needs
- our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under “title”
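
The title example can be sketched as a "dumb-down" step: a qualified name is reduced to its base element, so a simple search on "title" still finds every variant. The dotted notation, qualifier names, and sample values below are assumptions made for illustration, not official Dublin Core syntax.

```python
from collections import defaultdict


def dumb_down(qualified_pairs):
    """Collapse (qualified name, value) pairs onto their base element names."""
    simple = defaultdict(list)
    for name, value in qualified_pairs:
        # "title.alternative" -> "title"; an unqualified name is left unchanged.
        element = name.split(".", 1)[0]
        simple[element].append(value)
    return dict(simple)


# Hypothetical record using an illustrative dotted-qualifier notation.
record = [
    ("title.main", "Main Title Example"),
    ("title.alternative", "Alternate Title Example"),
    ("title.transliterated", "Transliterated Title Example"),
    ("creator", "Example Creator"),
]
print(dumb_down(record)["title"])
```

The qualified pairs stay available to the community that needs the nuance, while the collapsed view serves the simple search.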

Crosswalks
- mapping between differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries
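
In its simplest form a crosswalk is a lookup table from one scheme's field names to another's. The sketch below assumes that form; the two VRA Core to Dublin Core pairings follow the deck's crosswalk example, while the function name, the sample record, and the "Local Note" field are hypothetical.

```python
# VRA Core -> Dublin Core pairings; everything beyond these two is illustrative.
VRA_TO_DC = {
    "W1. Work Type": "Type",
    "W2. Title": "Title",
}


def crosswalk(record, mapping):
    """Rename fields via the mapping; return (translated, unmapped) dictionaries."""
    translated, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            translated[mapping[field]] = value
        else:
            # Keep fields with no counterpart so nothing is silently lost.
            unmapped[field] = value
    return translated, unmapped


vra_record = {
    "W1. Work Type": "photograph",
    "W2. Title": "Untitled",
    "Local Note": "no Dublin Core counterpart in this mapping",
}
dc_record, leftovers = crosswalk(vra_record, VRA_TO_DC)
print(dc_record)
print(leftovers)
```

Returning the unmapped fields separately shows where a crosswalk loses detail, one of the trade-offs of mapping between schemes rather than adopting a single one.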

Crosswalk Example
Schemes compared: CDWA, CIMI Schema, Object ID, VRA Core Categories, USMARC, FDA, Dublin Core
- Object/work row
  - CDWA: OBJECT/WORK (core); Object/Work-Type (core); Object/Work-Components
  - Object ID: Type of Object
  - CIMI Schema: objectNAME
  - VRA Core Categories: W1. Work Type
  - USMARC: 655 Genre/Form; 300a Physical Description-Extent
  - FDA: Document Classification-Catalog Level (core); Document Classification-Group Type; Document Classification-Document Type (core); Purpose-Purpose (Broad) (core); Purpose-Purpose (Narrow); Document Classification-Extent
  - Dublin Core: Type
- Title row
  - CDWA: TITLES OR NAMES (core)
  - Object ID: Title
  - CIMI Schema: objectTitle; bibliographic Title
  - VRA Core Categories: W2. Title
  - USMARC: 24Xa Title and Title-Related Information
  - FDA: Group/Item Identification-Repository Title; Group/Item Identification-Descriptive Title (core); Group/Item Identification-Inscribed Title
  - Dublin Core: Title
- Other entries: ORIENTATION/ARRANGEMENT (CDWA); quantity; Description

Do we still need OPACs?
- Why repeat almost identical bibliographic descriptions in each local system?
- Why not store only local information locally, and link to bibliographic descriptions stored in the major utilities?
- Could our acquisition systems for monographs begin to use the acquisition systems imposed on us by our parent organizations (like those for supplies)?

Creating Working Digital Libraries
- Moving from Digital Collections to Digital Libraries
- Interoperability
- Importance of Standards
- Longevity
- Best Practices for Managing Digital Projects
- Some Wild Musings

Creating Working Digital Libraries
Howard Besser
UCLA School of Education & Information
http://www.getty.edu/gri/standard/intrometadata/
http://www.ifla.org/II/metadata.htm
http://sunsite.Berkeley.EDU/Imaging/Databases/#standards
http://sunsite.Berkeley.EDU/moa2/
http://sunsite.Berkeley.EDU/Longevity/
http://purl.oclc.org/metadata/dublin_core/
http://www.gseis.ucla.edu/~howard/image-meta.html
http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/
http://sunsite.berkeley.edu/Metadata/sp2000.html
http://www.gseis.ucla.edu/~howard/