METS in context: standards for metadata
Putting together a METS profile
Questions to ask when setting down the METS path
● Should you design your own profile?
● Should you use someone else’s off the peg?
● Should you adapt someone else’s?
Finding a pre-existing profile
What’s in a METS profile?
● URI
● Short Title
● Abstract
● Creation Date
● Contact Information
● Related Profiles
● Extension Schema
● Rules of Description
● Controlled Vocabularies
● Structural Requirements
● Technical Requirements of Content, Behavior and Metadata Files
● Tools and Applications
● Appendix: Example Document
Currently registered profiles
● Oxford Digital Library METS Profile
● UCB Imaged Object Profile
● UCB Paged Text Object Profile
● Model Imaged Object Profile
● Model Paged Text Object Profile
● The University of Waikato Digital Library Group - Greenstone Project METS Profile [Draft]
● Library of Congress METS Profile for Audio Compact Discs
Putting together your own profile
● Descriptive metadata
● Administrative metadata
● File section
● Structural map
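The four areas above correspond to the main sections of a METS file. A minimal sketch of how they might fit together is shown here; all IDs, file names and the OBJID are hypothetical placeholders, and the choice of MODS and MIX by reference is only an assumption for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           OBJID="example-object-001">
  <!-- Descriptive metadata: here held externally and referenced -->
  <mets:dmdSec ID="DMD1">
    <mets:mdRef LOCTYPE="URL" MDTYPE="MODS" xlink:href="example-mods.xml"/>
  </mets:dmdSec>
  <!-- Administrative metadata: technical metadata for the image file -->
  <mets:amdSec ID="AMD1">
    <mets:techMD ID="TMD1">
      <mets:mdRef LOCTYPE="URL" MDTYPE="NISOIMG" xlink:href="example-mix.xml"/>
    </mets:techMD>
  </mets:amdSec>
  <!-- File section: inventory of the content files -->
  <mets:fileSec>
    <mets:fileGrp USE="archive image">
      <mets:file ID="FID1" MIMETYPE="image/tiff" ADMID="TMD1">
        <mets:FLocat LOCTYPE="URL" xlink:href="example-0001.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <!-- Structural map: ties the description and the file together -->
  <mets:structMap TYPE="physical">
    <mets:div TYPE="image" DMDID="DMD1">
      <mets:fptr FILEID="FID1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```

Each of the decisions discussed on the following slides changes one part of this skeleton.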
Descriptive metadata
● Embed within the METS file, or hold externally and reference from it?
● One metadata section or several?
● Which schemes?
● Which content rules to follow (AACR2, ISAD(G) etc)?
Embed or reference?
● Referencing
  – Allows metadata not in XML to be used (as a last resort)
  – Allows metadata files to be distributed and held anywhere (including different repositories)
  – Means that when metadata is updated, only the referenced file is changed, not the METS file
● Embedding
  – Requires metadata to be in XML
  – Keeps everything in one place for easier archiving (OAIS)
  – Prevents dead links
  – Allows easier processing
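The two options can be sketched side by side. In this fragment the first <dmdSec> references an external record with <mdRef>, and the second embeds a small Dublin Core description with <mdWrap> and <xmlData>. The IDs, the example.org URL and the metadata values are hypothetical.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Referencing: the record lives elsewhere and need not be XML
       (here, assumed to be a binary MARC record) -->
  <mets:dmdSec ID="DMD-REF">
    <mets:mdRef LOCTYPE="URL" MDTYPE="MARC" MIMETYPE="application/marc"
                xlink:href="http://example.org/records/0001.mrc"/>
  </mets:dmdSec>
  <!-- Embedding: the description travels inside the METS file -->
  <mets:dmdSec ID="DMD-EMBED">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Example title</dc:title>
        <dc:creator>Example creator</dc:creator>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- Fragment only: a complete METS file also needs a structMap -->
</mets:mets>
```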
One metadata section?
● Multiple <dmdSec> sections are allowed in a METS file
● Possible uses of multiple sections:
  – Multi-lingual objects, with descriptions in each language in separate sections
  – Different schemes revealing different facets of the object (iconography, intellectual content etc).
  – A simple main description and more detailed supplementary descriptions
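As an illustration of the multi-lingual case, the fragment below holds one description per language in its own <dmdSec> and attaches both to the same <div> through the DMDID attribute, which accepts a space-separated list of IDs. The IDs, titles and the use of Dublin Core are assumptions made for the sketch.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- English description -->
  <mets:dmdSec ID="DMD-EN">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title xml:lang="en">Example title</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- French description of the same object -->
  <mets:dmdSec ID="DMD-FR">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title xml:lang="fr">Titre d'exemple</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- One div points at both descriptions -->
  <mets:structMap>
    <mets:div TYPE="item" DMDID="DMD-EN DMD-FR"/>
  </mets:structMap>
</mets:mets>
```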
Which schemes to use?
● If possible, use schemes recommended by the METS Editorial Board (METS Extenders)
  – Dublin Core
  – MARC 21 XML Schema (MARCXML)
  – Metadata Object Description Schema (MODS)
Dublin Core
● 15 basic fields: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
● Can be qualified
● A set of suggested qualifiers is published by DC
● Problems:
  – Unqualified DC too vague for detailed descriptions
  – Qualifying DC reduces its interoperability
MARC-XML
● A translation of MARC into an XML Schema format
● Can move losslessly from MARC to MARCXML and vice versa
MODS
● “Metadata Object Description Schema”
● A subset of MARC intended particularly for digital items
● Richer than unqualified Dublin Core but more interoperable
● Easier for non-librarians than MARC-XML
● Generally seen as a good compromise solution for digital objects
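To give a feel for how MODS sits between Dublin Core and MARC, here is a small sketch of a MODS record embedded in a <dmdSec>. The element names come from MODS version 3; the ID and all values are hypothetical.

```xml
<mets:dmdSec ID="DMD-MODS"
             xmlns:mets="http://www.loc.gov/METS/"
             xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:mdWrap MDTYPE="MODS">
    <mets:xmlData>
      <mods:mods>
        <!-- Word-based element names instead of numeric MARC tags -->
        <mods:titleInfo>
          <mods:title>Example title</mods:title>
        </mods:titleInfo>
        <mods:name type="personal">
          <mods:namePart>Example, Author</mods:namePart>
          <mods:role>
            <mods:roleTerm type="text">creator</mods:roleTerm>
          </mods:role>
        </mods:name>
        <mods:originInfo>
          <mods:dateIssued>1900</mods:dateIssued>
        </mods:originInfo>
        <mods:language>
          <mods:languageTerm type="code" authority="iso639-2b">eng</mods:languageTerm>
        </mods:language>
      </mods:mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
```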
Content rules
● To ensure interoperability, metadata content should be controlled if possible
● Some possibilities:
  – AACR2, particularly if the collection digitizes library materials (allowing compatibility with the OPAC)
  – LCNAF for name authorities
  – ISAD(G) for archival materials
  – National Council of Archives rules for name authorities?
Administrative metadata
● Most of the same considerations apply to administrative as to descriptive metadata
  – Embed within the METS file, or hold externally and reference from it?
  – One metadata section or several?
  – Which schemes?
Schemas for administrative metadata
● Still images
  – MIX: NISO Technical Metadata for Digital Still Images Standards Committee
● Text
  – Schema for Technical Metadata for Text
● Video
  – VIDEOMD: Video Technical Metadata Extension Schema
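For still images, technical metadata could be embedded along the following lines. This is a sketch only: it assumes the MIX 2.0 namespace and element names, which should be checked against the version of the schema your profile adopts, and the ID and pixel dimensions are hypothetical. A <file> in the file section would point at the <techMD> through its ADMID attribute.

```xml
<mets:amdSec xmlns:mets="http://www.loc.gov/METS/"
             xmlns:mix="http://www.loc.gov/mix/v20">
  <mets:techMD ID="TMD-0001">
    <!-- NISOIMG is the METS MDTYPE value used for MIX -->
    <mets:mdWrap MDTYPE="NISOIMG">
      <mets:xmlData>
        <mix:mix>
          <mix:BasicImageInformation>
            <mix:BasicImageCharacteristics>
              <!-- Pixel dimensions of the image (hypothetical values) -->
              <mix:imageWidth>2400</mix:imageWidth>
              <mix:imageHeight>3600</mix:imageHeight>
            </mix:BasicImageCharacteristics>
          </mix:BasicImageInformation>
        </mix:mix>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:techMD>
</mets:amdSec>
```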
What files will you include in your <fileSec> and how will they be arranged?
● Archival images
  – Uncompressed TIFFs (colour or greyscale)
  – Group IV compressed TIFFs (bitonal)
  – Held on archival file server
● Deliverable images
  – JPEGs or GIFs
  – Possibly more than one to allow viewing at differing resolutions
● Thumbnails
  – JPEGs or GIFs
How will you arrange your <structMap>?
● Probably no internal structure if each METS file contains metadata for a single image only
● Possibly treat METS file as holder for collection of images
  – Group into categories?
  – Work out a logical sequence
The file inventory <fileSec>
● Which files to include, and in what format?
  – Image files
    ● Archival format (TIFF)
    ● Delivery format (JPEG)
    ● Thumbnails (JPEG)
  – Text
    ● XML-marked up text (preferably in TEI)
    ● Word files etc?
  – AV materials
    ● Video files (MPEG, MOV, WMV)
    ● Sound files (WAV, MP3?)
The file inventory <fileSec>
● Embed or reference?
  – Content may be embedded within METS file (as XML or Base 64 encoded data)
  – Embedding allows all data and metadata to be held together for archival purposes, but files can be huge!
  – Embedding is feasible with text, probably best avoided with image, sound, or video!
● How to organise them?
  – Group by referent?
  – Or by file type?
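The embed-or-reference choice for content files might look like this in practice. The fragment shows one file referenced with <FLocat>, one embedded as XML, and one embedded as Base 64 data inside <FContent>. The IDs, file name and content are hypothetical; the Base 64 string encodes the five characters "Hello".

```xml
<mets:fileGrp xmlns:mets="http://www.loc.gov/METS/"
              xmlns:xlink="http://www.w3.org/1999/xlink"
              USE="text">
  <!-- Referenced: the METS file only records where the content is -->
  <mets:file ID="FID-REF" MIMETYPE="application/xml">
    <mets:FLocat LOCTYPE="URL" xlink:href="example-text.xml"/>
  </mets:file>
  <!-- Embedded as XML (here a single TEI paragraph) -->
  <mets:file ID="FID-XML" MIMETYPE="application/xml">
    <mets:FContent>
      <mets:xmlData>
        <p xmlns="http://www.tei-c.org/ns/1.0">Example paragraph.</p>
      </mets:xmlData>
    </mets:FContent>
  </mets:file>
  <!-- Embedded as Base 64: workable for small files, huge for images -->
  <mets:file ID="FID-B64" MIMETYPE="text/plain">
    <mets:FContent>
      <mets:binData>SGVsbG8=</mets:binData>
    </mets:FContent>
  </mets:file>
</mets:fileGrp>
```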
<fileSec>
[Diagram: a fileSec contains a fileGrp, which contains several file elements, each holding an FLocat]
Grouping by referent
● Each <fileGrp> element contains the files for a given unit (page of a book, slide, section of video)
● Point at the <fileGrp> element from the <div> within the structural map corresponding to this unit
● Use the GROUPID attribute to differentiate between the types of file
<fileGrp ID="munahi010-aaa-fgrp-0001">
  <file GROUPID="0" ID="munahi010-aaa-0001-0" MIMETYPE="image/tiff"
        ADMID="munahi010-aaa-tmd-0001-0">
    <FLocat LOCTYPE="URL"
            xlink:href="file://hfs.ox.ac.uk/data/odl/munahi010/digObjects/aaa/0/munahi010-aaa-0001.tiff"/>
  </file>
  <file GROUPID="6" ID="munahi010-aaa-0001-6" MIMETYPE="image/jpeg"
        ADMID="munahi010-aaa-tmd-0001-6">
    <FLocat LOCTYPE="URL"
            xlink:href="http:odl/munahi010/digObjects/aaa/6/munahi010-aaa-00016.jpg"/>
  </file>
  <file GROUPID="3" ID="munahi010-aaa-0001-3" MIMETYPE="image/jpeg"
        ADMID="munahi010-aaa-tmd-0001-3">
    <FLocat LOCTYPE="URL"
            xlink:href="http:odl/munahi010/digObjects/aaa/3/munahi010-aaa-00013.jpg"/>
  </file>
</fileGrp>
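The matching structural map entry for this group might look like the sketch below, in which the <div> for the page points at the ID of the <fileGrp> rather than at an individual <file>. This assumes the profile allows FILEID to name a <fileGrp>; the attribute is an IDREF in the METS schema, so the reference validates, but software that expects FILEID to name a <file> may not follow it. The TYPE, ORDER and LABEL values are hypothetical.

```xml
<structMap xmlns="http://www.loc.gov/METS/" TYPE="physical">
  <div TYPE="page" ORDER="1" LABEL="Page 1">
    <!-- Points at the whole group of files for this page
         (archival TIFF and two JPEG derivatives) -->
    <fptr FILEID="munahi010-aaa-fgrp-0001"/>
  </div>
</structMap>
```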
Grouping by file type
● All files of the same type are listed under the same <fileGrp>, eg.
  – All archival images
  – All delivery images
  – All thumbnails
● The GROUPID attribute is used to indicate the referent (eg. page) of each image
● Each file is referenced separately in the structural map
<mets:fileGrp USE="archive image">
  <mets:file ID="FID1" MIMETYPE="image/tiff" GROUPID="GID1">
    <mets:FLocat xlink:href="bkm00002773a.tif" LOCTYPE="URL"/>
  </mets:file>
  <mets:file ID="FID2" MIMETYPE="image/tiff" GROUPID="GID2">
    <mets:FLocat xlink:href="bkm00002774a.tif" LOCTYPE="URL"/>
  </mets:file>
</mets:fileGrp>

<mets:div ORDER="1" TYPE="page" LABEL=" Page [1]">
  <mets:fptr FILEID="FID1"/>
  <mets:fptr FILEID="FID3"/>
  <mets:fptr FILEID="FID5"/>
</mets:div>
Organising the structural map
● Need to work out how users will want to browse through item and design structure accordingly
  – Images – should these be put into a sequence or collated into collections?
  – Book -> chapters -> sub-chapters -> page
  – Video -> sections -> segments -> timecodes
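The book hierarchy above could be expressed as nested <div> elements, roughly as follows. This is a sketch with hypothetical labels and IDs; FID1 and FID2 are assumed to be <file> IDs defined in the file section, and the sub-chapter level is left out for brevity.

```xml
<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="logical">
  <mets:div TYPE="book" LABEL="Example book">
    <mets:div TYPE="chapter" ORDER="1" LABEL="Chapter 1">
      <!-- Each page div points at the file for that page -->
      <mets:div TYPE="page" ORDER="1" LABEL="Page 1">
        <mets:fptr FILEID="FID1"/>
      </mets:div>
      <mets:div TYPE="page" ORDER="2" LABEL="Page 2">
        <mets:fptr FILEID="FID2"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
```

A browsing interface can walk this tree to build a table of contents, using LABEL for display and ORDER for sequence.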
One structural map or many?
● Do you need separate hierarchies?
  – eg. Physical vs logical hierarchies
● Usually one <structMap> is sufficient if hierarchies nest neatly
● If more than one hierarchy is used, how are they linked together?
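One possible answer to the linking question is the <structLink> section, which records links between <div> elements in different structural maps. The sketch below pairs a physical map (pages) with a logical map (a chapter) and links the chapter to the pages it occupies. All IDs are hypothetical, FID1 and FID2 are assumed to exist in the file section, and a given profile may prefer another mechanism.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Physical hierarchy: the pages as scanned -->
  <mets:structMap TYPE="physical">
    <mets:div TYPE="volume">
      <mets:div ID="PHYS-P1" TYPE="page" ORDER="1">
        <mets:fptr FILEID="FID1"/>
      </mets:div>
      <mets:div ID="PHYS-P2" TYPE="page" ORDER="2">
        <mets:fptr FILEID="FID2"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
  <!-- Logical hierarchy: the intellectual structure -->
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book">
      <mets:div ID="LOG-CH1" TYPE="chapter" LABEL="Chapter 1"/>
    </mets:div>
  </mets:structMap>
  <!-- Links from the logical chapter to the physical pages it spans -->
  <mets:structLink>
    <mets:smLink xlink:from="LOG-CH1" xlink:to="PHYS-P1"/>
    <mets:smLink xlink:from="LOG-CH1" xlink:to="PHYS-P2"/>
  </mets:structLink>
</mets:mets>
```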
Coming next…