METS Java Toolkit

DLF Spring Forum
May 10-12, 2002, Chicago, IL
Stephen L. Abrams
Harvard University Library
Why Do We Need a Toolkit?
• Automation for an archiving project with
multiple content providers.
– METS used in a hierarchical SIP (Submission
Information Package)
– Client-side tools to produce syntactically valid
SIPs
• Use of METS to encapsulate complex
objects with multiple content streams.
– Page turner, currently based on MOA2
DLF Spring Forum
2002
METS Java Toolkit
Functional Requirements
• Java API to support generic METS.
• Support for:
– Procedural construction of an in-memory
representation
– Validation
– Marshalling/unmarshalling to/from instance
documents
• Usable as a basis for application-specific tools.
– Subclass for specific functionality or restrictions
JAXB
• API based on Sun’s JAXB specification, but
not on its tools.
[Figure: JAXB processing. Labeled components: source schema, binding schema, JAXB compiler, schema classes, JAXB bind package, JAXB marshal package.]
Toolkit API
• Each schema element corresponds to a class.
Mets mets = new Mets();
• Accessor/mutator methods for each attribute.
mets.setID(id);
String id = mets.getID();
• Accessor/mutator methods for content model.
List content = mets.getContent();
content.add(child);
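A minimal sketch of the pattern above in plain Java: one class per element, an accessor/mutator pair per attribute, and a live content list for the content model. The class names ElementBase, MetsSketch, and MetsHdrSketch are hypothetical stand-ins, not the toolkit's classes; only the method names (setID, getID, setRECORDSTATUS, getContent) follow the slides. Generics are a modern convenience, since the 2002 toolkit targeted JDK 1.3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every element object owns a mutable content list.
abstract class ElementBase {
    private final List<Object> content = new ArrayList<>();

    // Returns the live list, so callers add children with getContent().add(child).
    List<Object> getContent() { return content; }
}

// Hypothetical class for <metsHdr>: one private field per schema attribute.
class MetsHdrSketch extends ElementBase {
    private String recordStatus;

    void setRECORDSTATUS(String s) { recordStatus = s; }
    String getRECORDSTATUS() { return recordStatus; }
}

// Hypothetical class for the root <mets> element.
class MetsSketch extends ElementBase {
    private String id;
    private String objid;

    void setID(String id) { this.id = id; }
    String getID() { return id; }
    void setOBJID(String objid) { this.objid = objid; }
    String getOBJID() { return objid; }
}
```

Because getContent() hands back the list itself rather than a copy, adding a child is a plain List operation, which is the style the slide's content.add(child) line relies on.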
Toolkit API UML
[Class diagram, reconstructed from the slide text; arrows and layout lost.]
• Interfaces: Element (+validateThis()); IdentifiableElement (+id() : String); RootElement (+validate(), +marshal(), +unmarshal())
• Base classes: ValidatableObject (-_valid : bool; +invalidate(), +validate(), +validateThis()); MarshallableObject; MarshallableRootElement
• Mets: -_ID, -_OBJID, -_LABEL, -_TYPE, -_PROFILE : String; -_content : List; +get*(), +set*(); +marshal(in m : Marshaller), +validate(in v : Validator), +validateThis(), +unmarshal(in u : Unmarshaller)
• Other element classes: MetsHdr, DmdSec, AmdSec, FileSec, StructMap, BehaviorSec
• PCData: +chars() : String; +chars(in chars : String)
• Validator: +validate(in vob : ValidatableObject)
• Marshaller: +marshal(in mob : MarshallableObject); +writer() : XMLWriter
• Unmarshaller: +scanner() : XMLScanner; +unmarshal() : MarshallableObject
• XMLScanner: +atAttribute(), +atAttributeValue(), +atEnd(), +atStart(), +atChars() : boolean; +takeAttributeName(), +takeAttributeValue(), +takeChars(), +takeEnd(), +takeStart() : String; +takeEmpty()
• XMLWriter: +start(in name : String); +end(in name : String); +attribute(in name : String, in value : String); +chars(in chars : String); +leaf(); +flush()
• Packages shown: javax.xml.bind, javax.xml.marshal
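One plausible reading of ValidatableObject (a private _valid flag plus invalidate(), validate(), and validateThis()) is cached validation: mutators clear the flag, and validate() runs the local validateThis() check and recurses into children only when the flag is clear. The sketch below assumes that reading; CachedValidatable and AgentSketch are hypothetical names, and for brevity a child's invalidation is not propagated to its parent.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical cached-validation base class, modelled on ValidatableObject.
abstract class CachedValidatable {
    private boolean valid = false;   // plays the role of the _valid flag
    int localChecks = 0;             // counts successful validateThis() runs (illustration only)

    // Called by every mutator: the object must be re-checked before it is trusted again.
    void invalidate() { valid = false; }

    // Local constraints of this element only; throws if one is violated.
    abstract void validateThis();

    // Child objects whose validity this object depends on (none by default).
    List<CachedValidatable> children() { return Collections.emptyList(); }

    // Runs local and child checks, but only if something changed since the last success.
    final void validate() {
        if (valid) return;
        validateThis();
        localChecks++;
        for (CachedValidatable c : children()) c.validate();
        valid = true;
    }
}

// Hypothetical <agent> element whose ROLE attribute is required.
class AgentSketch extends CachedValidatable {
    private String role;

    void setROLE(String role) {
        this.role = role;
        invalidate();                // any change forces re-validation
    }

    @Override
    void validateThis() {
        if (role == null) throw new IllegalStateException("agent: ROLE attribute is required");
    }
}
```

Caching mainly pays off when validate() is called repeatedly on a large tree that changes in only a few places.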
Why Do We Need a New API?
• Why not use DOM?
– Unnatural unit of granularity: elements and
attributes are both nodes in the DOM tree
• Why not JDOM?
– Need for explicit support for validation
• JAXB compiler could (potentially) be used
to support METS schema upgrades.
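The DOM granularity point can be seen with the JDK's built-in DOM parser (javax.xml.parsers and org.w3c.dom): the root's ID attribute and its first child element both come back as org.w3c.dom.Node objects, distinguished only by getNodeType(). The class name DomGranularity is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomGranularity {
    // Parses a small METS-like fragment and returns the DOM node types of the
    // root's ID attribute and of the root's first child.
    static short[] nodeTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            Attr id = root.getAttributeNode("ID");   // an attribute is a Node...
            Node child = root.getFirstChild();       // ...and so is a child element
            return new short[] { id.getNodeType(), child.getNodeType() };
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }
}
```

An element-per-class API instead models attributes as typed fields with accessors, so only elements appear in the content list.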
Procedural Construction
• The initial current element is <mets>
• For each child element in the current
element’s content model:
– Instantiate an appropriate element object
– Set its attributes
– Define its content model
– Add it to the content model of its parent
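The four steps above can be traced on a tiny tree. This sketch uses a hypothetical generic Elem class (a name, ordered attributes, and a content list) rather than the toolkit's typed classes, plus a render() method that prints compact XML-like text without escaping, so the resulting structure can be checked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic element used only to illustrate the construction order.
class Elem {
    final String name;
    final Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
    final List<Object> content = new ArrayList<>();          // child Elem objects or text

    Elem(String name) { this.name = name; }

    // Renders the subtree as compact XML-like text (no escaping; illustration only).
    String render() {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> a : attrs.entrySet())
            sb.append(' ').append(a.getKey()).append("=\"").append(a.getValue()).append('"');
        sb.append('>');
        for (Object c : content)
            sb.append(c instanceof Elem ? ((Elem) c).render() : c.toString());
        return sb.append("</").append(name).append('>').toString();
    }
}

class ProceduralBuild {
    static Elem build() {
        Elem mets = new Elem("mets");               // the initial current element
        mets.attrs.put("ID", "1234");

        Elem metsHdr = new Elem("metsHdr");         // 1. instantiate the child
        metsHdr.attrs.put("RECORDSTATUS", "DRAFT"); // 2. set its attributes

        Elem agent = new Elem("agent");             // 3. define its content model,
        agent.attrs.put("ROLE", "CREATOR");         //    repeating the steps with the
        Elem name = new Elem("name");               //    child as the current element
        name.content.add("S. Abrams");
        agent.content.add(name);
        metsHdr.content.add(agent);

        mets.content.add(metsHdr);                  // 4. add it to its parent
        return mets;
    }
}
```

Calling ProceduralBuild.build().render() yields a single line with metsHdr nested in mets, agent in metsHdr, and name in agent.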
Procedural Construction (Ex.)
Mets mets = new Mets();
mets.setID ("1234");
MetsHdr metsHdr = new MetsHdr();
metsHdr.setCREATEDATE(new Date());
Agent agent = new Agent();
agent.setROLE(Role.CREATOR);
Name name = new Name ();
name.getContent().add(new PCData ("S. Abrams"));
agent.getContent().add(name);
metsHdr.getContent().add(agent);
mets.getContent().add(metsHdr);
...
Validation
• Global
– ID uniqueness
– IDREF-to-ID consistency
• Local
– Existence of required attributes and content
model elements
Mets mets = new Mets();
...
mets.validate ();
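The global and local checks listed above can be sketched over a generic tree. VNode and MetsValidator are hypothetical names, and the required-attribute table is illustrative (a real validator would derive it from the METS schema). IDs are collected in one pass and IDREFs are resolved afterwards, because a reference may point forward in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical generic tree node standing in for toolkit element objects.
class VNode {
    final String name;
    final Map<String, String> attrs = new LinkedHashMap<>();
    final List<String> idrefs = new ArrayList<>();   // IDREF values this element points to
    final List<VNode> children = new ArrayList<>();

    VNode(String name) { this.name = name; }
    VNode attr(String key, String value) { attrs.put(key, value); return this; }
    VNode add(VNode child) { children.add(child); return this; }
}

class MetsValidator {
    // Illustrative required-attribute table, not derived from the schema.
    static final Map<String, List<String>> REQUIRED = new HashMap<>();
    static {
        REQUIRED.put("agent", Arrays.asList("ROLE"));
        REQUIRED.put("mdWrap", Arrays.asList("MDTYPE"));
    }

    // Returns every problem found; an empty list means the tree passed.
    static List<String> validate(VNode root) {
        List<String> errors = new ArrayList<>();
        Set<String> ids = new HashSet<>();
        List<String> refs = new ArrayList<>();
        walk(root, ids, refs, errors);
        // Global check 2: every IDREF must name an ID found somewhere in the tree.
        for (String ref : refs)
            if (!ids.contains(ref)) errors.add("dangling IDREF: " + ref);
        return errors;
    }

    private static void walk(VNode n, Set<String> ids, List<String> refs, List<String> errors) {
        // Local check: required attributes are present on this element.
        List<String> required = REQUIRED.getOrDefault(n.name, Collections.<String>emptyList());
        for (String a : required)
            if (!n.attrs.containsKey(a)) errors.add(n.name + ": missing required attribute " + a);
        // Global check 1: IDs are unique across the whole document.
        String id = n.attrs.get("ID");
        if (id != null && !ids.add(id)) errors.add("duplicate ID: " + id);
        refs.addAll(n.idrefs);
        for (VNode c : n.children) walk(c, ids, refs, errors);
    }
}
```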
Marshalling
• Serializing the in-memory representation to an
output stream.
Mets mets = new Mets();
...
FileOutputStream out =
new FileOutputStream("mets.xml");
mets.validate ();
mets.marshal(out);
out.close();
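The toolkit writes output through its own writer class. As a modern stand-in (JDK 1.3 predates it), the standard StAX XMLStreamWriter from javax.xml.stream can serialize a small METS tree to any OutputStream. StaxMarshal and its method names are hypothetical, and the tree is hard-coded rather than walked from element objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxMarshal {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Writes a tiny METS document (root, header, one agent) to the given stream.
    static void marshal(OutputStream out, String objid, String agentName) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("mets");
        w.writeDefaultNamespace(METS_NS);
        w.writeAttribute("OBJID", objid);          // the writer escapes '<', '&', etc.
        w.writeStartElement("metsHdr");
        w.writeStartElement("agent");
        w.writeAttribute("ROLE", "CREATOR");
        w.writeStartElement("name");
        w.writeCharacters(agentName);
        w.writeEndElement();                       // </name>
        w.writeEndElement();                       // </agent>
        w.writeEndElement();                       // </metsHdr>
        w.writeEndElement();                       // </mets>
        w.writeEndDocument();
        w.flush();
        w.close();                                 // does not close the underlying stream
    }

    // Convenience wrapper for in-memory use.
    static String marshalToString(String objid, String agentName) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            marshal(buf, objid, agentName);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

To mirror the slide, pass a FileOutputStream opened in a try-with-resources statement, so the file is closed even if marshalling fails.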
Unmarshalling
• Parsing an instance document and creating an in-memory representation.
• Implicit local validation during parsing;
global validation must be explicit.
• Internal parsing with James Clark’s XP parser.
FileInputStream in =
new FileInputStream("mets.xml");
Mets mets = Mets.unmarshal(in);
mets.validate ();
...
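Pull parsing with an implicit local check can be sketched with the standard StAX XMLStreamReader (a stand-in for XP, not the toolkit's parser). The loop mirrors the scanner style in the diagram: advance, test what the cursor is at, take the value. The single rule enforced here (agent must carry ROLE) is illustrative; StaxUnmarshal is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxUnmarshal {
    // Pull-parses the stream and returns start-element names in document order,
    // rejecting an <agent> without ROLE while parsing (a local check).
    static List<String> unmarshal(InputStream in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> names = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("agent") && r.getAttributeValue(null, "ROLE") == null)
                        throw new XMLStreamException("agent lacks required ROLE attribute", r.getLocation());
                    names.add(name);
                }
            }
        } finally {
            r.close();
        }
        return names;
    }

    // Convenience wrapper: parses a string and reports problems as unchecked exceptions.
    static List<String> unmarshal(String xml) {
        try {
            return unmarshal(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }
}
```

Global checks such as IDREF resolution still need a separate pass once the whole document has been read, which matches the slide's explicit validate() call.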
Extension Schemas
• Toolkit could be extended to include
explicit support for additional schemas.
• Generic namespace-aware Any class:
Any any = new Any("elem");
any.setAttribute("attr", value);
String attr = any.getAttribute("attr");
any.getContent().add(child);
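A generic namespace-aware element can be sketched as a prefix, a namespace URI, a local name, ordered attributes, and mixed content. AnyElem and its constructor are hypothetical (the slide's Any API is not reproduced), and the example URI is a placeholder. Unlike the sample output later in this deck, the sketch declares its own prefix, so the serialized fragment stands alone as namespace-well-formed XML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic element for extension-schema content.
class AnyElem {
    private final String prefix;        // namespace prefix, e.g. "my"
    private final String namespaceUri;  // URI bound to that prefix
    private final String localName;
    private final Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order
    private final List<Object> content = new ArrayList<>();               // text or child AnyElem

    AnyElem(String prefix, String namespaceUri, String localName) {
        this.prefix = prefix;
        this.namespaceUri = namespaceUri;
        this.localName = localName;
    }

    void setAttribute(String qname, String value) { attributes.put(qname, value); }
    String getAttribute(String qname) { return attributes.get(qname); }
    List<Object> getContent() { return content; }

    private String qname() { return prefix + ":" + localName; }

    // Serializes the element, declaring its prefix and escaping text and attribute values.
    String toXml() {
        StringBuilder sb = new StringBuilder("<").append(qname());
        sb.append(" xmlns:").append(prefix).append("=\"").append(escape(namespaceUri)).append('"');
        for (Map.Entry<String, String> a : attributes.entrySet())
            sb.append(' ').append(a.getKey()).append("=\"").append(escape(a.getValue())).append('"');
        if (content.isEmpty()) return sb.append("/>").toString();
        sb.append('>');
        for (Object c : content)
            sb.append(c instanceof AnyElem ? ((AnyElem) c).toXml() : escape(c.toString()));
        return sb.append("</").append(qname()).append('>').toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Nested AnyElem children redeclare their prefix; that is redundant but still valid XML, and it keeps each element independently serializable.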
Additional Work
• To be done any day now…
– Support for <area>, <par>, and <seq>
– Strict validation of sequence ordering
– Marshalling in non-UTF-8 encodings
– Base64 encoding/decoding methods for
binData and FContent
– Support for entity references
– Diagnostic error messages
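The planned Base64 helpers for binData and FContent could, on a current JDK (Java 8 or later), delegate to java.util.Base64. BinDataCodec is a hypothetical name. Decoding uses the MIME decoder, which ignores line breaks and other characters outside the Base64 alphabet, since content inside XML elements is often wrapped.

```java
import java.util.Base64;

// Hypothetical helpers for binData/FContent payloads.
class BinDataCodec {
    // Encodes raw bytes as a single unwrapped Base64 line.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes Base64 text, tolerating line breaks inside the element content.
    static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }
}
```

For example, the FContent value MS0yLTM= used in the sample document decodes to the bytes of the string 1-2-3.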
Distribution
• HUL’s intent is to make the toolkit freely
available under an Open Source license.
• Minimal support (if any).
• Community process for maintenance?
• Does an appropriate organizational home
exist?
Implementation
• METS schema, Version 1.0 (zeta)
• JAXB specification, Version 0.21
<http://java.sun/xml/jaxb>
• XP, Version 0.5 <http://jclark.com/xml/xp>
• Java J2SE and JDK 1.3.1
• Solaris 2.7
• Home page: <http://hul.harvard.edu/mets>
Marshal.java
import java.util.*;
import org.mets.xml.bind.*;
import org.mets.xml.mets.*;

public class Marshal
{
    public static void main (String [] args)
    {
        Mets mets = new Mets ();
        mets.setOBJID ("1234-5678(2002)9:1<>1.0.CO;9-X");
        mets.setLABEL ("METS Java toolkit");
        mets.setTYPE ("Article");

        MetsHdr metsHdr = new MetsHdr ();
        metsHdr.setCREATEDATE (new Date ());
        metsHdr.setRECORDSTATUS ("DRAFT");
        Agent agent = new Agent ();
        agent.setROLE (Role.CREATOR);
        Name name = new Name ();
        name.getContent ().add (new PCData ("S. L. Abrams"));
        agent.getContent ().add (name);
        Note note = new Note ();
        note.getContent ().add (new PCData ("HUL/OIS"));
        agent.getContent ().add (note);
        note = new Note ();
        note.getContent ().add (new PCData ("Special order, 2002/02/25"));
        agent.getContent ().add (note);
        metsHdr.getContent ().add (agent);
        AltRecordID doi = new AltRecordID ();
        doi.setTYPE ("DOI");
        doi.getContent ().add (new PCData ("10.1234/56789"));
        AltRecordID nrs = new AltRecordID ();
        nrs.setTYPE ("NRS");
        nrs.getContent ().add (new PCData ("nrs:hul.ois:10203"));
        metsHdr.getContent ().add (doi);
        metsHdr.getContent ().add (nrs);
        mets.getContent ().add (metsHdr);

        DmdSec dmdSec = new DmdSec ();
        dmdSec.setID ("xyz-123");
        MdRef mdRef = new MdRef ();
        mdRef.setLOCTYPE (Loctype.DOI);
        mdRef.setMDTYPE (Mdtype.DC);
        mdRef.setMIMETYPE ("text/xml");
        mdRef.setXlinkHref ("10.9876/54321");
        dmdSec.getContent ().add (mdRef);
        MdWrap mdWrap = new MdWrap ();
        mdWrap.setMDTYPE (Mdtype.MARC);
        BinData binData = new BinData ();
        binData.getContent ().add (new PCData ("AbC…Yz0123456789"));
        mdWrap.getContent ().add (binData);
        dmdSec.getContent ().add (mdWrap);
        mets.getContent ().add (dmdSec);

        AmdSec amdSec = new AmdSec ();
        TechMD techMD = new TechMD ();
        techMD.setID ("t-1234");
        mdWrap = new MdWrap ();
        mdWrap.setMDTYPE (Mdtype.OTHER);
        mdWrap.setOTHERMDTYPE ("MyTechMD");
        XmlData xmlData = new XmlData ();
        Any any = new Any ("my", "techMD");
        any.getAttributes ().add (new Attribute ("ID", "AB123"));
        any.getAttributes ().add (new Attribute ("my", "type", "TIFF"));
        any.getContent ().add (new PCData ("...technical MD..."));
        xmlData.getContent ().add (any);
        mdWrap.getContent ().add (xmlData);
        techMD.getContent ().add (mdWrap);
        amdSec.getContent ().add (techMD);

        RightsMD rightsMD = new RightsMD ();
        rightsMD.setID ("r-5678");
        mdWrap = new MdWrap ();
        mdWrap.setMDTYPE (Mdtype.OTHER);
        mdWrap.setOTHERMDTYPE ("MyRightsMD");
        xmlData = new XmlData ();
        any = new Any ("my", "rightsMD");
        any.getContent ().add (new PCData ("...rights MD..."));
        xmlData.getContent ().add (any);
        any = new Any ("your", "rightsMD");
        any.getContent ().add (new PCData ("...rights MD..."));
        xmlData.getContent ().add (any);
        any = new Any ("their", "rightsMD");
        any.getContent ().add (new PCData ("...rights MD..."));
        xmlData.getContent ().add (any);
        mdWrap.getContent ().add (xmlData);
        rightsMD.getContent ().add (mdWrap);
        amdSec.getContent ().add (rightsMD);

        SourceMD sourceMD = new SourceMD ();
        sourceMD.setID ("s-9012");
        mdWrap = new MdWrap ();
        mdWrap.setMDTYPE (Mdtype.OTHER);
        mdWrap.setOTHERMDTYPE ("MySourceMD");
        xmlData = new XmlData ();
        any = new Any ("my", "sourceMD");
        any.getAttributes ().add (new Attribute ("aat", "type",
                                                 new Integer (178684)));
        any.getContent ().add (new PCData ("...source MD..."));
        xmlData.getContent ().add (any);
        mdWrap.getContent ().add (xmlData);
        sourceMD.getContent ().add (mdWrap);
        amdSec.getContent ().add (sourceMD);

        DigiprovMD digiprovMD = new DigiprovMD ();
        digiprovMD.setID ("d-3456");
        mdWrap = new MdWrap ();
        mdWrap.setMDTYPE (Mdtype.OTHER);
        mdWrap.setOTHERMDTYPE ("MyDigiprovMD");
        xmlData = new XmlData ();
        any = new Any ("my", "digiprovMD");
        any.getContent ().add (new PCData ("...provenance MD..."));
        xmlData.getContent ().add (any);
        mdWrap.getContent ().add (xmlData);
        digiprovMD.getContent ().add (mdWrap);
        amdSec.getContent ().add (digiprovMD);
        mets.getContent ().add (amdSec);

        FileSec fileSec = new FileSec ();
        FileGrp fileGrp = new FileGrp ();
        fileGrp.getADMID ().add ("t-1234");
        fileGrp.getADMID ().add ("s-9012");
        File file = new File ();
        file.setID ("a1b2c3");
        FLocat flocat = new FLocat ();
        flocat.setLOCTYPE (Loctype.URN);
        flocat.setXlinkHref ("urn:nid:nss");
        file.getContent ().add (flocat);
        FContent fcontent = new FContent ();
        fcontent.getContent ().add (new PCData ("MS0yLTM="));
        file.getContent ().add (fcontent);
        fileGrp.getContent ().add (file);
        fileSec.getContent ().add (fileGrp);
        mets.getContent ().add (fileSec);

        StructMap structMap = new StructMap ();
        structMap.setID ("A125");
        structMap.setLABEL ("Individual volumes");
        Div div = new Div ();
        div.setORDER (25);
        div.setORDERLABEL ("xxv");
        div.setTYPE ("Chapter");
        Div sec = new Div ();
        sec.setTYPE ("Section");
        Div sub = new Div ();
        sub.setTYPE ("Sub-section");
        Fptr fptr = new Fptr ();
        fptr.setFILEID ("a1b2c3");
        sub.getContent ().add (fptr);
        sec.getContent ().add (sub);
        div.getContent ().add (sec);
        sec = new Div ();
        sec.setTYPE ("Section");
        Mptr mptr = new Mptr ();
        mptr.setID ("123-45-6789");
        mptr.setLOCTYPE (Loctype.OTHER);
        mptr.setOTHERLOCTYPE ("filepath");
        mptr.setXlinkHref ("dir/file.xml");
        sec.getContent ().add (mptr);
        div.getContent ().add (sec);
        structMap.getContent ().add (div);
        mets.getContent ().add (structMap);

        BehaviorSec behavior = new BehaviorSec ();
        behavior.setID ("killerapp");
        behavior.getSTRUCTID ().add ("A125");
        behavior.getSTRUCTID ().add ("s-9012");
        Mechanism mechanism = new Mechanism ();
        mechanism.setLOCTYPE (Loctype.URL);
        mechanism.setXlinkHref ("http://host/path");
        behavior.getContent ().add (mechanism);
        mets.getContent ().add (behavior);

        mets.validate ();
        mets.marshal (System.out);
    }
}
marshal.xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/
        http://www.loc.gov/standards/mets/mets.xsd"
      OBJID="1234-5678(2002)9:1&lt;&gt;1.0.CO;9-X"
      LABEL="METS Java toolkit" TYPE="Article">
  <metsHdr CREATEDATE="2002-03-15T161023"
           RECORDSTATUS="DRAFT">
    <agent ROLE="CREATOR">
      <name>S. L. Abrams</name>
      <note>HUL/OIS</note>
      <note>Special order, 2002/02/25</note>
    </agent>
    <altRecordID TYPE="DOI">10.1234/56789</altRecordID>
    <altRecordID TYPE="NRS">nrs:hul.ois:10203</altRecordID>
  </metsHdr>
  <dmdSec ID="xyz-123">
    <mdRef LOCTYPE="DOI" xlink:type="simple"
           xlink:href="10.9876/54321" MDTYPE="DC"
           MIMETYPE="text/xml"/>
    <mdWrap MDTYPE="MARC">
      <binData>AbCdEfGhIjKlMnOpQrStUvWxYz0123456789</binData>
    </mdWrap>
  </dmdSec>
  <amdSec>
    <techMD ID="t-1234">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyTechMD">
        <xmlData>
          <my:techMD ID="AB123" my:type="TIFF">...technical MD...</my:techMD>
        </xmlData>
      </mdWrap>
    </techMD>
    <rightsMD ID="r-5678">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyRightsMD">
        <xmlData>
          <my:rightsMD>...rights MD...</my:rightsMD>
          <your:rightsMD>...rights MD...</your:rightsMD>
          <their:rightsMD>...rights MD...</their:rightsMD>
        </xmlData>
      </mdWrap>
    </rightsMD>
    <sourceMD ID="s-9012">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MySourceMD">
        <xmlData>
          <my:sourceMD aat:type="178684">...source MD...</my:sourceMD>
        </xmlData>
      </mdWrap>
    </sourceMD>
    <digiprovMD ID="d-3456">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyDigiprovMD">
        <xmlData>
          <my:digiprovMD>...provenance MD...</my:digiprovMD>
        </xmlData>
      </mdWrap>
    </digiprovMD>
  </amdSec>
  <fileSec>
    <fileGrp ADMID="t-1234 s-9012">
      <file ID="a1b2c3">
        <FLocat LOCTYPE="URN" xlink:type="simple"
                xlink:href="urn:nid:nss"/>
        <FContent>MS0yLTM=</FContent>
      </file>
    </fileGrp>
  </fileSec>
  <structMap ID="A125" LABEL="Individual volumes">
    <div ORDER="25" ORDERLABEL="xxv" TYPE="Chapter">
      <div TYPE="Section">
        <div TYPE="Sub-section">
          <fptr FILEID="a1b2c3"/>
        </div>
      </div>
      <div TYPE="Section">
        <mptr ID="123-45-6789" LOCTYPE="OTHER"
              OTHERLOCTYPE="filepath"
              xlink:type="simple" xlink:href="dir/file.xml"/>
      </div>
    </div>
  </structMap>
  <behaviorSec ID="killerapp" STRUCTID="A125 s-9012">
    <mechanism LOCTYPE="URL" xlink:type="simple"
               xlink:href="http://host/path"/>
  </behaviorSec>
</mets>