METS and TEI - University of Oxford


METS and TEI

Richard Gartner Oxford University

Introduction (verbal)

• METS provides a framework within which any data or metadata can be referenced or embedded • This presentation shows how easily METS and TEI can be used in tandem • The context is an image database with the full OCR’d text encoded in TEI

Cobbett’s Parliamentary History

Incorporating TEI into METS


THE Parliamentary History OF ENGLAND, FROM THE EARLIEST PERIOD TO THE YEAR 1803.

FROM WHICH LAST-MENTIONED EPOCH IT IS CONTINUED DOWNWARDS IN THE WORK ENTITLED, "THE PARLIAMENTARY DEBATES." VOL. II. A.D. 1625–1642.

LONDON: PRINTED BY T. C. HANSARD, PETERBOROUGH-COURT, FLEET-STREET: FOR LONGMAN, HURST, REES, ORME, & BROWN; J. RICHARDSON; BLACK, PARRY, & CO.; J. HATCHARD; J. RIDGWAY; E. JEFFERY; J. BOOKER; J. RODWELL; CRADOCK & JOY; R. H. EVANS; J. BUDD; J. BOOTH; T. C. HANSARD.

1807.
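The XML for this slide did not survive extraction, so the exact markup used here is unknown. The sketch below shows the two ways METS can carry a TEI transcription, in line with the introduction's point that data can be referenced or embedded. A `<file>` either points to the TEI document with `<FLocat>` or wraps the whole TEI tree in `<FContent>/<xmlData>`. The helper names `m` and `tei_file_entry`, the IDs, and the `example.org` URL are hypothetical. The TEI fragment is P4-style (root `<TEI.2>`, no namespace).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a local name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def tei_file_entry(file_id, href=None, tei_root=None):
    """Return a METS <file> for a TEI transcription, referenced by URL or embedded."""
    if (href is None) == (tei_root is None):
        raise ValueError("give exactly one of href or tei_root")
    entry = ET.Element(m("file"), {"ID": file_id, "MIMETYPE": "text/xml"})
    if href is not None:
        # Referenced: <FLocat> points at a TEI file held elsewhere.
        ET.SubElement(entry, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    else:
        # Embedded: the TEI tree travels inside the METS file via <FContent>/<xmlData>.
        content = ET.SubElement(entry, m("FContent"))
        ET.SubElement(content, m("xmlData")).append(tei_root)
    return entry


if __name__ == "__main__":
    tei = ET.fromstring(
        "<TEI.2><text><body><p>THE Parliamentary History OF ENGLAND</p>"
        "</body></text></TEI.2>"
    )
    # Both forms are placed in one group purely for side-by-side comparison.
    grp = ET.Element(m("fileGrp"), {"USE": "transcription"})
    grp.append(tei_file_entry("tei-ref", href="http://example.org/modhis006-aab.xml"))
    grp.append(tei_file_entry("tei-emb", tei_root=tei))
    print(ET.tostring(grp, encoding="unicode"))
```

Referencing keeps the METS file small and lets the TEI be edited independently. Embedding gives a single self-contained package.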

OCR -> TEI

• TEI in Libraries level 1 – simplest level of encoding, designed for OCR texts – One <text> element enclosing the complete text – One <body> element within this – Page breaks marked with <pb/>
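The level 1 structure above is simple enough to generate directly. The sketch below assumes the OCR output has already been split into one string per page. It builds a minimal P4-style header, then one `<text>` containing one `<body>`, with an empty `<pb/>` at the start of each page. The function name `level1_tei` and the choice of one `<p>` per page are simplifications for illustration, not the Oxford workflow. The header strings in the usage come from the example file shown later in this presentation.

```python
import xml.etree.ElementTree as ET


def level1_tei(title, distributor, source_note, pages):
    """Wrap OCR'd page texts in a minimal TEI P4-style document.

    Layout: a short <teiHeader>, then exactly one <text> holding exactly one
    <body>, with an empty <pb/> milestone before each page's text.
    """
    tei = ET.Element("TEI.2")
    file_desc = ET.SubElement(ET.SubElement(tei, "teiHeader"), "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title
    pub = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub, "distributor").text = distributor
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = source_note
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for page_text in pages:
        ET.SubElement(body, "pb")                  # page break: empty milestone
        ET.SubElement(body, "p").text = page_text  # simplification: one <p> per page
    return tei


if __name__ == "__main__":
    doc = level1_tei("modhis006-aab OCR text", "Oxford Digital Library",
                     "OCR text from modhis006-aab",
                     ["Parliamentary History. VOL. II.", "Parliamentary History."])
    print(ET.tostring(doc, encoding="unicode"))
```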

OCR -> TEI (verbal)

• OCR’d text put into a skeletal TEI file with a minimal header • Page breaks in the file replaced with <pb/> • A simple stylesheet assigns a sequential ID to each <pb/> • Another stylesheet adds elements to the METS structural map pointing to the <pb/> elements
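The two stylesheet steps can be sketched as follows. The original used stylesheets whose output is not shown, so this Python version is a stand-in, and its pointer form is an assumption. The first function numbers every `<pb/>` in document order. The second creates one page `<div>` per `<pb/>` in a METS `<structMap>`, with an `<area>` whose `BETYPE="IDREF"` and `BEGIN` name that page break. This is one valid METS way of addressing an element inside a file. The function names, the `pb1, pb2, ...` ID scheme and the `FILEID` value are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag):
    """Qualify a local name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def number_page_breaks(tei_root, prefix="pb"):
    """First stylesheet's job: give every <pb/> a sequential ID, in document order."""
    ids = []
    for n, pb in enumerate(tei_root.iter("pb"), start=1):
        pb_id = f"{prefix}{n}"
        pb.set("id", pb_id)  # TEI P4-style 'id'; TEI P5 would use xml:id instead
        ids.append(pb_id)
    return ids


def page_struct_map(pb_ids, tei_file_id):
    """Second stylesheet's job: one page <div> per <pb/>, pointing into the TEI file."""
    struct_map = ET.Element(m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct_map, m("div"), {"TYPE": "volume"})
    for order, pb_id in enumerate(pb_ids, start=1):
        page = ET.SubElement(volume, m("div"), {"TYPE": "page", "ORDER": str(order)})
        fptr = ET.SubElement(page, m("fptr"))
        # <area> addresses part of a file; BEGIN holds the ID of this page's <pb/>.
        ET.SubElement(fptr, m("area"),
                      {"FILEID": tei_file_id, "BETYPE": "IDREF", "BEGIN": pb_id})
    return struct_map


if __name__ == "__main__":
    tei = ET.fromstring(
        "<TEI.2><text><body><pb/><p>Parliamentary History.</p>"
        "<pb/><p>VOL. II.</p></body></text></TEI.2>"
    )
    ids = number_page_breaks(tei)  # ['pb1', 'pb2']
    smap = page_struct_map(ids, "tei-modhis006-aab")
    print(ET.tostring(smap, encoding="unicode"))
```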

modhis006-aab OCR text Oxford Digital Library Put your OCR text here!

OCR text from modhis006-aab

Parliamentary History.

VOL. n.

.pb.

Parliamentary History.

VOL. n.

.pb.

.pb.

Why use METS and TEI together?

Images

Overlapping hierarchies

Verbal

• Images – up to and including P4, TEI’s image facilities are clumsy • Have to use entity references only – no URLs, URIs, etc. • No way to distinguish between inline images (which they were designed for) and whole-page images • No scope for administrative metadata • Overlapping hierarchies – CONCUR was the SGML mechanism for this – clumsy to use and gone in XML – various other approaches, all distinguished by notational complexity
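A small, hypothetical fragment makes the overlap problem concrete. Suppose a speech starts on one page and ends on the next. If both pages and speeches are encoded as container elements, their tags cross, and an XML parser rejects the result. TEI's usual workaround is to keep one hierarchy as containers and reduce the other to empty milestones such as `<pb/>`. The element name `<sp>`, the `<volume>`/`<page>` containers and the text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Pages and speeches both as containers: </page> closes while <sp> is still open.
OVERLAPPING = "<volume><page><sp>Mr. Speaker ...</page><page>... I move.</sp></page></volume>"

# Milestone workaround: speeches stay containers, pages shrink to empty <pb/>s.
MILESTONES = "<body><pb n='1'/><sp>Mr. Speaker ...<pb n='2'/>... I move.</sp></body>"


def is_well_formed(xml_text):
    """Return True if the string parses as a single well-formed XML tree."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))  # False
    print(is_well_formed(MILESTONES))   # True
```

In the milestone form, the page hierarchy is no longer a tree that the parser provides. Software has to rebuild page membership from the `<pb/>` positions.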

Images


Overlapping hierarchies

• Some approaches used with TEI – CONCUR (SGML) – MECS (Wittgenstein archive) – Stand-off markup: XLink mechanisms to impose markup (varying hierarchies) – TexMECS – Witt: PROLOG

Images in METS

• List all variants of image files in <fileSec> • Each can have extensive administrative or descriptive metadata attached • Reference them by URLs, URIs, etc., or embed them in the METS file • The FILEID attribute on <fptr> in the structural map indicates the exact correspondence of an image to part of the item
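A minimal sketch of the arrangement described above follows. Each image variant gets its own `<fileGrp>` in `<fileSec>`, and each file is referenced by URL through `<FLocat>`. Each page `<div>` in the structural map then names its files with `<fptr FILEID="...">`. The function name, the `USE` values (`master`, `thumbnail`), the ID scheme and the `example.org` URL templates are hypothetical. Administrative metadata (for example via `ADMID`) is left out to keep it short.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a local name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def image_mets(page_labels, variants):
    """Build <fileSec> plus a physical <structMap> for page images.

    variants maps a USE value to a URL template containing '{n}' (page number).
    """
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    for use, url_template in variants.items():
        grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
        for n in range(1, len(page_labels) + 1):
            f = ET.SubElement(grp, m("file"), {"ID": f"{use}-{n}"})
            ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL",
                                           f"{{{XLINK_NS}}}href": url_template.format(n=n)})
    struct_map = ET.SubElement(mets, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct_map, m("div"), {"TYPE": "volume"})
    for n, label in enumerate(page_labels, start=1):
        page = ET.SubElement(volume, m("div"),
                             {"TYPE": "page", "ORDER": str(n), "LABEL": label})
        for use in variants:
            # FILEID ties this page to exactly one file in each variant group.
            ET.SubElement(page, m("fptr"), {"FILEID": f"{use}-{n}"})
    return mets


if __name__ == "__main__":
    doc = image_mets(["i", "ii"], {"master": "http://example.org/tif/{n}.tif",
                                   "thumbnail": "http://example.org/thumb/{n}.jpg"})
    print(ET.tostring(doc, encoding="unicode"))
```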

Overlapping hierarchies
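The slides showing the METS solution did not survive extraction. A METS document may, however, contain more than one `<structMap>`, and each is a separate tree. A physical hierarchy (pages) and a logical one (for example, debates) can therefore overlap without any markup tricks. The sketch below assumes page images with hypothetical IDs `img1, img2, ...` and hypothetical section labels. Each logical section lists the pages it spans in an `<fptr>/<seq>` of `<area>` elements, so a page shared by two sections simply appears in both.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag):
    """Qualify a local name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def struct_maps(page_count, sections):
    """Return (physical, logical) structMaps over the same page images.

    sections is a list of (label, first_page, last_page); ranges may share pages.
    """
    physical = ET.Element(m("structMap"), {"TYPE": "physical"})
    vol = ET.SubElement(physical, m("div"), {"TYPE": "volume"})
    for n in range(1, page_count + 1):
        page = ET.SubElement(vol, m("div"), {"TYPE": "page", "ORDER": str(n)})
        ET.SubElement(page, m("fptr"), {"FILEID": f"img{n}"})

    logical = ET.Element(m("structMap"), {"TYPE": "logical"})
    work = ET.SubElement(logical, m("div"), {"TYPE": "volume"})
    for label, first, last in sections:
        sec = ET.SubElement(work, m("div"), {"TYPE": "section", "LABEL": label})
        # <seq> lists, in order, every page image this section touches.
        seq = ET.SubElement(ET.SubElement(sec, m("fptr")), m("seq"))
        for n in range(first, last + 1):
            ET.SubElement(seq, m("area"), {"FILEID": f"img{n}"})
    return physical, logical


if __name__ == "__main__":
    phys, logi = struct_maps(3, [("Debate A", 1, 2), ("Debate B", 2, 3)])
    print(ET.tostring(logi, encoding="unicode"))
```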

More information

• http://www.loc.gov/standards/mets • http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0205