Transcript Tiina Ison

National Library of Finland
Metadata in the Digitisation Process
Cultural unity and diversity of the Baltic Sea Region –
common history, different languages, mixed culture
Helsinki, 21st–22nd October 2010
Tiina Ison, Senior Analyst,
National Library of Finland
Outline
1. Front End - National Digital Library and Long Term
Preservation (KDK/PAS)
2. Back End - Digitisation Production Process, METS Profiles
3. Descriptive Metadata
4. Administrative/Technical Metadata
5. Structural Metadata
6. Wrapping things together: METS Profile
7. Processes towards distributed work, crowd sourcing,
annotation and ontologies
1. Front End: National Digital Library and Long-Term Preservation Infrastructure
Infrastructure Initiatives:
National Digital Library
National Long-Term Preservation
http://www.kdk2011.fi
Rights
Management
...
METS
profiles
Libraries / Archives / Museums
BACK END SYSTEMS
In their digitisation production
memory institutions produce
authentic, trustworthy digitised
content and collections
OPM-KD Project 2007-2009,
digitisation production reviewed
http://www.kansalliskirjasto.fi/extra/vanhat_bulletinit/bulletin09/article6.html
Ministry of Education www.kdk2011.fi/fi/tietoa-hankkeesta www.minedu.fi
National Digital Library Architecture (Kansallisen Digitaalisen Kirjaston Arkkitehtuuri)
http://www.kdk2011.fi/images/stories/Kokonaisarkkitehtuuri-yleiskuva-fi_iso.jpg
2. Back End: Digitisation Production Processes,
METS Profiles
Articles
Illustrations
Poems
LEVEL OF
MARK UP
Structural metadata
METS, ALTO
POST
PROCESSING
Administrative/technical metadata
MIX/PREMIS
CATALOGUING
Newspapers
Serials
Books
Parchments
Notes
Maps
SOURCE MATERIAL
Audio
PHYSICAL COLLECTIONS
Standards & OAI-PMH
compliant METS SIP
packages
METS EXPORT
Packages include:
JPEG2000
SCANNING
Descriptive metadata
DIGITAL RESOURCE
COMPREHENSIVE
DIGITAL COLLECTIONS
MARC21/MODS
Two Bibliographic
Records
OCR TXT as ALTO XML
PDF
JPEG(150)
METSXML
MARCXML
3. Descriptive Metadata
Catalogued Items
Un-catalogued Items – Minimal bibliographic record
Bar Code IDs – Unique IDs for Physical Items
Ingest of bibliographic metadata into digitisation production
MARC21 conversion into MARCXML (MODS)
Two bibliographic records – physical and digital (link 776)
Post cataloguing for minimal records
Enrichment of catalogue
CATALOGUING
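
The pairing of a physical and a digital bibliographic record through field 776 can be illustrated with a minimal sketch. The control numbers, title and link notes below are hypothetical, the records are far from complete MARC21, and nothing here is taken from the library's actual production system; the sketch only shows how two MARCXML records might point at each other via 776 $w.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def q(name):
    """Qualify a local element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def marc_record(control_number, title, linked_control_number, link_note):
    """Build a minimal MARCXML record with a 776 link to its counterpart."""
    record = ET.Element(q("record"))

    # 001: control number of this record (hypothetical value)
    ET.SubElement(record, q("controlfield"), {"tag": "001"}).text = control_number

    # 245 $a: title statement
    f245 = ET.SubElement(record, q("datafield"), {"tag": "245", "ind1": "0", "ind2": "0"})
    ET.SubElement(f245, q("subfield"), {"code": "a"}).text = title

    # 776: additional physical form entry, pointing at the other record
    f776 = ET.SubElement(record, q("datafield"), {"tag": "776", "ind1": "0", "ind2": "8"})
    ET.SubElement(f776, q("subfield"), {"code": "i"}).text = link_note
    ET.SubElement(f776, q("subfield"), {"code": "w"}).text = linked_control_number
    return record


# Two records for the same work: one for the physical item, one for the digital copy.
physical = marc_record("phys-0001", "Example title", "digi-0001", "Online version:")
digital = marc_record("digi-0001", "Example title", "phys-0001", "Print version:")

print(ET.tostring(digital, encoding="unicode"))
```

Each record carries the other's control number in 776 $w, so a catalogue can navigate from the physical item to its digitised copy and back.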
4. Administrative/Technical Metadata
SCANNING
An XML Schema designed for expressing technical metadata for digital still
images
Technical Metadata for Digital Still Images - (NISO Z39.87 Data Dictionary)
MIX: Image width, Color space, color profile, Scanner metadata, Digital
camera settings
Preservation Metadata/PREMIS (information about actions on objects, on
events, on the technical environment)
Rights Metadata (access restriction)
Persistent IDs
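
As a rough illustration of the MIX fields listed above, the following sketch writes image width, height and colour space into a small MIX fragment. The element names follow the MIX 2.0 schema as commonly published, but the fragment is deliberately incomplete and has not been validated against the schema; the pixel dimensions are invented example values.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/v20"
ET.register_namespace("mix", MIX_NS)


def q(name):
    """Qualify a local element name with the MIX namespace."""
    return f"{{{MIX_NS}}}{name}"


def mix_fragment(width, height, color_space):
    """Build a simplified MIX fragment holding basic image characteristics."""
    mix = ET.Element(q("mix"))
    basic = ET.SubElement(mix, q("BasicImageInformation"))
    chars = ET.SubElement(basic, q("BasicImageCharacteristics"))
    ET.SubElement(chars, q("imageWidth")).text = str(width)
    ET.SubElement(chars, q("imageHeight")).text = str(height)
    photo = ET.SubElement(chars, q("PhotometricInterpretation"))
    ET.SubElement(photo, q("colorSpace")).text = color_space
    return mix


# Example values only; a real workflow would read them from the scanned image.
mix = mix_fragment(4800, 6400, "RGB")
print(ET.tostring(mix, encoding="unicode"))
```

Scanner and camera settings would go into further MIX sections that are omitted here for brevity.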
5. Structural Metadata
Navigation, use and access?
Logical Structure
Physical Structure
METS structMap – relationships between parts
POST
PROCESSING
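
The difference between physical and logical structure can be sketched with two METS structMap elements that point at the same page files. The div types, labels and file IDs are hypothetical and chosen only for illustration; an actual METS profile would define its own vocabulary.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def struct_map(map_type, root_type, root_label, children):
    """Build a structMap; children is a list of (div_type, label, file_id)."""
    smap = ET.Element(q("structMap"), {"TYPE": map_type})
    root = ET.SubElement(smap, q("div"), {"TYPE": root_type, "LABEL": root_label})
    for div_type, label, file_id in children:
        div = ET.SubElement(root, q("div"), {"TYPE": div_type, "LABEL": label})
        # fptr links the structural division to a file listed in the fileSec
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})
    return smap


# Physical structure: the sequence of scanned pages.
physical_map = struct_map("PHYSICAL", "issue", "Newspaper issue", [
    ("page", "Page 1", "FILE-0001"),
    ("page", "Page 2", "FILE-0002"),
])

# Logical structure: intellectual units that reuse the same page files.
logical_map = struct_map("LOGICAL", "issue", "Newspaper issue", [
    ("article", "Lead article", "FILE-0001"),
    ("illustration", "Harbour view", "FILE-0002"),
])

print(ET.tostring(logical_map, encoding="unicode"))
```

Both maps reference the same FILEID values, which is what lets navigation switch between page order and article-level access.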
6. Level of Structural Mark Up
LEVEL OF
MARK UP
Material types: books, serials, newspapers, audio, projects
Granularity - different level of structural mark up
- i.e. article, illustration, poem
Granularity - all material types: pages, footnotes, running title, tables,
advertisements, image (captions and categories)
Labour intensive
Phased approach in production
Crowd sourcing
7. Wrapping things together; METS Profiles
METS profiles for different material types
• monographs, serials, newspapers, audio…
Export files :
JPEG2000 (lossless), PDF, OCR TXT as ALTO XML, JPEG (150dpi), METSXML and
MARCXML
The METS container (wrapper) provides an OAI-PMH-compliant SIP package for
delivery and exchange of digital objects across systems. It wraps
descriptive, administrative and structural metadata + PREMIS.
• MODS and MARCXML for descriptive and bibliographical metadata
(http://www.loc.gov/standards/mods/)
(http://www.loc.gov/standards/marcxml/)
• MIX for image technical metadata (http://www.loc.gov/standards/mix/)
• PREMIS for preservation metadata (http://www.loc.gov/standards/premis/)
(standards portfolio)
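
How the METS wrapper holds the different metadata types together can be sketched as an empty skeleton. The section IDs and the OBJID are hypothetical, the metadata payloads are left empty, and the result is not schema-valid METS (for example, the fileSec has no file groups); it only shows where MODS, MARCXML, MIX and PREMIS would sit inside the container.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def md_section(parent, tag, section_id, mdtype):
    """Add a metadata section with an mdWrap/xmlData slot for the payload."""
    section = ET.SubElement(parent, q(tag), {"ID": section_id})
    wrap = ET.SubElement(section, q("mdWrap"), {"MDTYPE": mdtype})
    return ET.SubElement(wrap, q("xmlData"))


def mets_skeleton(object_id):
    """Build an empty METS container with slots for each metadata type."""
    mets = ET.Element(q("mets"), {"OBJID": object_id})

    # Descriptive metadata: MODS and MARCXML
    md_section(mets, "dmdSec", "DMD-MODS", "MODS")
    md_section(mets, "dmdSec", "DMD-MARC", "MARC")

    # Administrative metadata: MIX (MDTYPE NISOIMG) and PREMIS
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD-1"})
    md_section(amd, "techMD", "TECH-MIX", "NISOIMG")
    md_section(amd, "digiprovMD", "PROV-PREMIS", "PREMIS")

    # File inventory and structural metadata
    ET.SubElement(mets, q("fileSec"))
    smap = ET.SubElement(mets, q("structMap"), {"TYPE": "PHYSICAL"})
    ET.SubElement(smap, q("div"), {"TYPE": "monograph"})
    return mets


mets = mets_skeleton("example-object-0001")
print(ET.tostring(mets, encoding="unicode"))
```

A profile for each material type would then constrain which of these sections are mandatory and which vocabularies they use.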
8. Processes towards distributed work, crowd
sourcing, annotation and ontologies
Content and context as part of
digitisation processes…
Automatic and semiautomatic
processes for data extraction …
Distributed work processes i.e.
for:
•Mark up level
•OCR correction
•Controlled annotation
•Social tagging
OCR Correction
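
One way distributed OCR correction could be prepared is to pick out the words the OCR engine was least sure about and hand those to human correctors first. The sketch below reads the WC (word confidence) attribute of ALTO String elements. The sample ALTO content, the word IDs and the 0.8 threshold are all assumptions made for illustration, not part of the presented workflow.

```python
import xml.etree.ElementTree as ET

# Invented sample; real ALTO files carry coordinates and many more attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String ID="w1" CONTENT="Helsinki" WC="0.98"/>
    <String ID="w2" CONTENT="kiriasto" WC="0.41"/>
    <String ID="w3" CONTENT="1810" WC="0.75"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def words_needing_review(alto_xml, threshold=0.8):
    """Return (id, content, confidence) for words below the threshold."""
    root = ET.fromstring(alto_xml)
    flagged = []
    for element in root.iter():
        # Compare local names so the sketch works across ALTO namespace versions.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        wc = element.get("WC")
        if wc is not None and float(wc) < threshold:
            flagged.append((element.get("ID"), element.get("CONTENT"), float(wc)))
    # Least confident words first
    return sorted(flagged, key=lambda item: item[2])


for word_id, content, confidence in words_needing_review(SAMPLE_ALTO):
    print(word_id, content, confidence)
```

The resulting list could feed a correction queue, with the corrected text written back into the CONTENT attribute of the same String elements.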
THANK YOU