Transcript METS and TEI - University of Oxford
METS and TEI
Richard Gartner Oxford University
Introduction (verbal)
• METS provides a framework within which any data or metadata can be referenced or embedded
• This presentation shows how easily METS and TEI can be used in tandem
• The context is an image database with full OCR’d text encoded in TEI
Cobbett’s Parliamentary History
Incorporating TEI into METS
FROM WHICH LAST-MENTIONED EPOCH IT IS CONTINUED DOWNWARDS IN THE WORK ENTITLED, "THE PARLIAMENTARY DEBATES." VOL. II. A.D. 1625–1642.
LONDON: PRINTED BY T. C. HANSARD, PETERBOROUGH-COURT, FLEET-STREET s �RLONGMAN, HURST, REES, ORME, & BROWN; J. RICHARDSON; BLACK, PARRY, & co,; j. HATCH ARD; J.RIDGWAY; E.JEFFERY; J.BOOKER; J- RODWELL; CRADOCK & JOY; R. H. EVANS; J. BUDD; J. BOOTH; T. C. HANSARD.
1807. ;
OCR -> TEI
• TEI in Libraries level 1
– simplest level of encoding, designed for OCR texts
– One element within this
– Page breaks marked with <pb>
OCR -> TEI (verbal)
• OCR’d text put into skeletal TEI file with minimal header
• Page-breaks in file replaced with <pb> elements
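The conversion described above can be sketched in Python. This is only an illustration under stated assumptions, not the script used for the project: the function name `ocr_pages_to_tei`, the use of a single `div1`, the `id` attribute on `pb`, and the second page identifier are all hypothetical, and the header holds placeholder text only.

```python
import xml.etree.ElementTree as ET

def ocr_pages_to_tei(title, pages):
    """Wrap OCR'd page texts in a skeletal TEI (P4-style) document.

    `pages` is a list of (page_id, page_text) pairs. The layout chosen
    here (one div1, one p per page) is an assumption for illustration.
    """
    tei = ET.Element("TEI.2")

    # Minimal header: title, publication and source statements only.
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = "Placeholder publication statement"
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "OCR of page images"

    # Body: each page break in the OCR output becomes an empty pb element.
    text = ET.SubElement(tei, "text")
    body = ET.SubElement(text, "body")
    div = ET.SubElement(body, "div1")
    for page_id, page_text in pages:
        ET.SubElement(div, "pb", {"id": page_id})
        ET.SubElement(div, "p").text = page_text
    return tei

# "modhis006-aab" appears in the slides; "modhis006-aac" is invented here.
tei = ocr_pages_to_tei(
    "Parliamentary History",
    [("modhis006-aab", "Parliamentary History. VOL. II."),
     ("modhis006-aac", "Text of the following page")],
)
print(ET.tostring(tei, encoding="unicode"))
```

Giving each `pb` an identifier is what would later let a METS file point at individual pages of the text.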
OCR text from modhis006-aab
□Parliamentary History.
VOL. n.
Why use METS and TEI together?
• Images
• Overlapping hierarchies
Verbal
• Images
– As far as P4, TEI's image facilities are clumsy
– Have to use entity references only: no URLs, URIs etc.
– No way to distinguish between inline images (which the facilities were designed for) and whole-page images
– No scope for administrative metadata
• Overlapping hierarchies
– CONCUR was the SGML mechanism for this: clumsy to use and gone in XML
– Various other approaches, all distinguished by notational complexity
Images
Overlapping hierarchies
• Some approaches used with TEI
– CONCUR (SGML)
– MECS (Wittgenstein archive)
– Stand-off markup: XLink mechanisms to impose markup (varying hierarchies)
– TexMECS
– Witt: PROLOG
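The general idea behind stand-off markup can be sketched in Python: the text stays unmarked, and each hierarchy is a separate list of ranges pointing into it. This is a toy illustration, not XLink syntax or any of the systems listed; the sample text, the labels, and the `crossing` helper are all hypothetical.

```python
# A shared, unmarked text that two hierarchies will describe.
text = "Speech one. Speech two starts here and runs over the page."

# Offsets are computed from the text rather than counted by hand.
page_break = text.index("and runs")
speech_break = text.index("Speech two")

# Each hierarchy is a list of (label, start, end) ranges over the text.
pages = [("page-1", 0, page_break), ("page-2", page_break, len(text))]
speeches = [("speech-1", 0, speech_break),
            ("speech-2", speech_break, len(text))]

def crossing(first, second):
    """Return label pairs whose ranges overlap without one nesting in the other."""
    found = []
    for label_a, start_a, end_a in first:
        for label_b, start_b, end_b in second:
            if (start_a < start_b < end_a < end_b
                    or start_b < start_a < end_b < end_a):
                found.append((label_a, label_b))
    return found

# The second speech starts on page one and ends on page two, so the
# two hierarchies cannot be nested inside a single XML tree.
print(crossing(pages, speeches))
```

A pair reported by `crossing` is exactly the case a single XML hierarchy cannot express, which is why the hierarchies are kept apart from the text.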
Images in METS
• List all variants of image files in <fileSec>
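As a rough sketch of what such a listing might look like, the Python below builds a METS file section with one file group per image variant. The element and attribute names (`fileSec`, `fileGrp`, `file`, `FLocat`, `USE`, `MIMETYPE`, `LOCTYPE`) come from the METS schema; the `build_file_sec` function, the variant names, the file IDs and the file paths are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def build_file_sec(variants):
    """Build a METS fileSec with one fileGrp per image variant.

    `variants` maps a USE label to a list of (file_id, mimetype, href).
    """
    file_sec = ET.Element(f"{{{METS_NS}}}fileSec")
    for use, files in variants.items():
        group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": use})
        for file_id, mimetype, href in files:
            file_el = ET.SubElement(
                group, f"{{{METS_NS}}}file",
                {"ID": file_id, "MIMETYPE": mimetype},
            )
            # FLocat points at the image file by URL via xlink:href.
            ET.SubElement(
                file_el, f"{{{METS_NS}}}FLocat",
                {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
            )
    return file_sec

# Hypothetical variants of a single page image.
file_sec = build_file_sec({
    "master": [("img-aab-master", "image/tiff", "images/master/aab.tif")],
    "reference": [("img-aab-ref", "image/jpeg", "images/ref/aab.jpg")],
    "thumbnail": [("img-aab-thumb", "image/jpeg", "images/thumb/aab.jpg")],
})
print(ET.tostring(file_sec, encoding="unicode"))
```

Each `file` carries its own `ID`, so other parts of the METS document can refer to a specific variant of a specific page.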
Overlapping hierarchies
More information
• http://www.loc.gov/standards/mets
• http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0205