Document Object Model
(DOM)
Cheng-Chia Chen
What is DOM?
• DOM (Document Object Model)
• A tree-view data model of XML documents
• An API for XML document processing
  – cross-language: usable from multiple programming languages
  – language neutral
  – defined in terms of CORBA IDL
  – language-specific bindings supplied for ECMAScript, Java, …
DOM (Document Object Model)
What is the Document Object Model of the following document?
<?xml version="1.0" encoding="UTF-8"?>
<TABLE>
<TBODY>
<TR>
<TD>紅樓夢</TD>
<TD>曹雪芹</TD>
</TR>
<TR>
<TD>三國演義</TD>
<TD>羅貫中</TD>
</TR>
</TBODY>
</TABLE>
Tree view (DOM view) of an XML Document
[Figure: DOM tree of the document above. Legend: document node (root), element node, text node. The leaf text nodes are 紅樓夢, 曹雪芹, 三國演義 and 羅貫中.]
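To reproduce the tree view with real code, here is a minimal sketch that assumes the JDK's built-in JAXP/DOM implementation and Java 11+ (for `String.repeat`). The class name `DomTreeDemo` is hypothetical. The sample document is written on one line on purpose: the indented version on the slide would also produce whitespace-only text nodes between the elements, which the figure omits.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTreeDemo {
    // The slide's document on one line (no whitespace-only text nodes);
    // Unicode escapes spell the two book titles and two authors.
    static final String SAMPLE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<TABLE><TBODY>"
        + "<TR><TD>\u7d05\u6a13\u5922</TD><TD>\u66f9\u96ea\u82b9</TD></TR>"
        + "<TR><TD>\u4e09\u570b\u6f14\u7fa9</TD><TD>\u7f85\u8cab\u4e2d</TD></TR>"
        + "</TBODY></TABLE>";

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Returns one line per node, indented two spaces per tree level. */
    static String render(Node node) {
        StringBuilder out = new StringBuilder();
        walk(node, 0, out);
        return out.toString();
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth));
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: out.append("document node (root)"); break;
            case Node.ELEMENT_NODE:  out.append("element node ").append(node.getNodeName()); break;
            case Node.TEXT_NODE:     out.append("text node ").append(node.getNodeValue()); break;
            default:                 out.append(node.getNodeName());
        }
        out.append('\n');
        // Visit the children left to right.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(parse(SAMPLE)));
    }
}
```

The output has 13 lines: the document node, TABLE, TBODY, two TR, four TD and four text nodes. The XML declaration itself does not become a node.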
Class/interface hierarchy of the DOM Core Level 1 & 2 specification
• Node
  – Attr
  – CharacterData
    – Comment
    – Text
      – CDATASection
  – Document
  – DocumentFragment
  – DocumentType
  – Element
  – Entity
  – EntityReference
  – Notation
  – ProcessingInstruction
• NodeList, NamedNodeMap, DOMImplementation (not derived from Node)
• DOMException (exception class)
Possible children of different kinds of nodes
• Document
  – Element (≤ 1), DocumentType (≤ 1), ProcessingInstruction, Comment
• Element, DocumentFragment, EntityReference, Entity
  – Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Attr
  – Text, EntityReference
• Text, CDATASection, Comment, Notation, ProcessingInstruction, DocumentType
  – are leaves (no children)
Notes: 1. Attr is not a child of any element.
2. Entities and notations defined in the DTD can be accessed via
getEntities() and getNotations() of DocumentType.
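An implementation enforces these child rules by throwing a DOMException with code HIERARCHY_REQUEST_ERR (3). The minimal sketch below assumes the JDK's built-in DOM, which performs these checks. `ChildRulesDemo` and `codeOf` are hypothetical helper names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class ChildRulesDemo {
    /** Creates an empty Document with the default JAXP builder. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs action; returns the DOMException code it raised, or -1 if it succeeded. */
    static short codeOf(Runnable action) {
        try {
            action.run();
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(codeOf(() -> doc.appendChild(doc.createElement("TABLE")))); // -1: first Element child
        System.out.println(codeOf(() -> doc.appendChild(doc.createComment("note"))));  // -1: Comment is allowed
        System.out.println(codeOf(() -> doc.appendChild(doc.createElement("TBODY")))); // 3: second Element child
        System.out.println(codeOf(() -> doc.appendChild(doc.createTextNode("x"))));    // 3: Text under Document
    }
}
```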
DOM Tree Classes – UML Model
Source: http://www.xml.com/1997/07/dom/dom.gif
Node and NodeType constants
public interface Node {
    // NodeType: there are 12 kinds of nodes
    public static final short ELEMENT_NODE                = 1;
    public static final short ATTRIBUTE_NODE              = 2;
    public static final short TEXT_NODE                   = 3;
    public static final short CDATA_SECTION_NODE          = 4;
    public static final short ENTITY_REFERENCE_NODE       = 5;
    public static final short ENTITY_NODE                 = 6;
    public static final short PROCESSING_INSTRUCTION_NODE = 7;
    public static final short COMMENT_NODE                = 8;
    public static final short DOCUMENT_NODE               = 9;
    public static final short DOCUMENT_TYPE_NODE          = 10;
    public static final short DOCUMENT_FRAGMENT_NODE      = 11;
    public static final short NOTATION_NODE               = 12;
    // ... the attributes and methods of Node are listed on the following slides
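Code that handles arbitrary nodes usually dispatches on these constants. Here is a small sketch that maps each nodeType code to the DOM interface it stands for; `NodeTypes.interfaceName` is a hypothetical helper, and only the constants come from `org.w3c.dom.Node`.

```java
import org.w3c.dom.Node;

class NodeTypes {
    /** Maps a nodeType code (1..12) to the name of the matching DOM interface. */
    static String interfaceName(short nodeType) {
        switch (nodeType) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.ATTRIBUTE_NODE:              return "Attr";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.ENTITY_REFERENCE_NODE:       return "EntityReference";
            case Node.ENTITY_NODE:                 return "Entity";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.DOCUMENT_TYPE_NODE:          return "DocumentType";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DocumentFragment";
            case Node.NOTATION_NODE:               return "Notation";
            default: throw new IllegalArgumentException("unknown nodeType " + nodeType);
        }
    }

    public static void main(String[] args) {
        for (short t = 1; t <= 12; t++) {
            System.out.println(t + " -> " + interfaceName(t));
        }
    }
}
```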
IDL2Java mapping of IDL attributes
// syntax of IDL attributes:
[readonly] attribute <type> <attrName> [// raises(<exception>)]*
// we will abbreviate it as
<type>[R]:<attrName>
which is translated into one or two Java methods:
• public <type> get<AttrName>() [throws <exceptions>];
  always (every attribute is readable), and
• public void set<AttrName>(<type> <newAttValue>) [throws <exceptions>];
  only if it is writable (not readonly).
Example:
• The following attributes of the Node interface:
    readonly attribute DOMString nodeName;
    attribute DOMString nodeValue;
        // raises(DOMException) on setting
        // raises(DOMException) on retrieval
    readonly attribute Node parentNode;
  are abbreviated as
    String[R]:nodeName,
    String:nodeValue,
    Node[R]:parentNode, respectively (DOMString maps to java.lang.String),
  and are mapped to 4 Java methods:
    public String getNodeName();
    public String getNodeValue() throws DOMException;
    public void setNodeValue(String nodeValue) throws DOMException;
    public Node getParentNode();
Node attributes
// nodeName, nodeType and nodeValue
• String[R] : nodeName;
• short[R] : nodeType;
• String : nodeValue;
    // raises(DOMException) on get/set
// namespace support: DOM Level 2 only
• String[R] : namespaceURI;
• String[R] : localName;
• String : prefix;
    // raises(DOMException) on setting
// node owner:
• Document[R]: ownerDocument;
Values of nodeName, nodeValue and attributes in a Node
Interface              | nodeName                  | nodeValue                           | attributes
Attr                   | name of attribute         | value of attribute                  | null
CDATASection           | #cdata-section            | content of the CDATA section        | null
Comment                | #comment                  | content of the comment              | null
Document               | #document                 | null                                | null
DocumentFragment       | #document-fragment        | null                                | null
DocumentType           | document type name        | null                                | null
Element                | tag name                  | null                                | NamedNodeMap
Entity                 | entity name               | null                                | null
EntityReference        | name of entity referenced | null                                | null
Notation               | notation name             | null                                | null
ProcessingInstruction  | target                    | entire content excluding the target | null
Text                   | #text                     | content of the text node            | null
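The table can be checked directly against an implementation. This sketch assumes the JDK's built-in DOM; `NodeNameValueDemo` and `describe` are hypothetical names, and the expected lines in the comments follow the table above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeNameValueDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Formats a node as "nodeName | nodeValue | attributes". */
    static String describe(Node n) {
        return n.getNodeName() + " | " + n.getNodeValue() + " | "
            + (n.getAttributes() == null ? "null" : "NamedNodeMap");
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element td = doc.createElement("TD");
        td.setAttribute("lang", "zh");
        doc.appendChild(td);
        Node[] samples = {
            doc,                                                // #document | null | null
            td,                                                 // TD | null | NamedNodeMap
            td.getAttributeNode("lang"),                        // lang | zh | null
            doc.createTextNode("hello"),                        // #text | hello | null
            doc.createComment("note"),                          // #comment | note | null
            doc.createCDATASection("a<b"),                      // #cdata-section | a<b | null
            doc.createProcessingInstruction("print", "page=1"), // print | page=1 | null
            doc.createDocumentFragment()                        // #document-fragment | null | null
        };
        for (Node n : samples) {
            System.out.println(describe(n));
        }
    }
}
```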
Node attributes
// node relatives
• Node[R] : parentNode, firstChild, lastChild;
• Node[R] : previousSibling, nextSibling;
• NodeList[R] : childNodes;
• NamedNodeMap[R] : attributes;
[Figure: the relatives of a node (this): parentNode, previousSibling, nextSibling, firstChild, lastChild and childNodes.]
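These relatives are enough to walk a whole subtree without recursion or NodeList objects. The following sketch (hypothetical class `RelativesWalk`, JDK built-in parser assumed) visits nodes in document order by going down via firstChild, otherwise climbing via parentNode until a nextSibling exists.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RelativesWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Element names in the subtree of root (inclusive), in document order. */
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        Node n = root;
        while (n != null) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
            if (n.getFirstChild() != null) {      // 1. go down if possible
                n = n.getFirstChild();
                continue;
            }
            // 2. otherwise climb until a node on the way up has a next sibling
            while (n != root && n.getNextSibling() == null) {
                n = n.getParentNode();
            }
            n = (n == root) ? null : n.getNextSibling(); // 3. then step right
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<TABLE><TBODY><TR><TD>a</TD><TD>b</TD></TR>"
            + "<TR><TD>c</TD><TD>d</TD></TR></TBODY></TABLE>");
        System.out.println(elementNames(doc.getDocumentElement()));
        // [TABLE, TBODY, TR, TD, TD, TR, TD, TD]
    }
}
```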
Node manipulation and testing Methods
public Node insertBefore(Node newChild, Node refChild)
public Node replaceChild(Node newChild, Node oldChild)
public Node removeChild(Node oldChild)
public Node appendChild(Node newChild)
// all of the above 4 methods throw DOMException
public boolean hasChildNodes();
public Node cloneNode(boolean deep);
// Introduced in DOM Level 2:
public boolean hasAttributes(); // true if this node is an element that has attributes
public void normalize(); // merge adjacent Text nodes in the whole subtree into one
                         // and remove empty Text nodes
public boolean isSupported(String feature, String version);
// same as hasFeature(feature, version) in DOMImplementation
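The manipulation methods can be exercised on a tiny TR element. In this sketch (hypothetical class `ManipulationDemo`, JDK built-in DOM assumed) each comment shows the children after the step; `normalize()` then merges the two adjacent Text nodes into one.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ManipulationDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lists the nodeNames of n's children, e.g. "A,C,#text". */
    static String childNames(Node n) {
        List<String> names = new ArrayList<>();
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return String.join(",", names);
    }

    /** Builds a TR element step by step and returns it. */
    static Element buildRow() {
        Document doc = newDocument();
        Element tr = doc.createElement("TR");
        doc.appendChild(tr);
        Element b = doc.createElement("B");
        tr.appendChild(b);                           // TR: B
        tr.insertBefore(doc.createElement("A"), b);  // TR: A, B
        tr.replaceChild(doc.createElement("C"), b);  // TR: A, C   (B is detached)
        tr.appendChild(doc.createTextNode("x"));
        tr.appendChild(doc.createTextNode("y"));     // TR: A, C, "x", "y"
        tr.normalize();                              // TR: A, C, "xy"
        return tr;
    }

    public static void main(String[] args) {
        Element tr = buildRow();
        System.out.println(childNames(tr));                      // A,C,#text
        System.out.println(tr.getLastChild().getNodeValue());    // xy
        System.out.println(childNames(tr.cloneNode(true)));      // A,C,#text (deep copy)
        System.out.println(tr.cloneNode(false).hasChildNodes()); // false (shallow copy)
        tr.removeChild(tr.getFirstChild());
        System.out.println(childNames(tr));                      // C,#text
    }
}
```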
NodeList and NamedNodeMap
public interface NodeList { // access a node collection by index
    public Node item(int index); // zero-based; null if index is out of range
    public int getLength();
}
public interface NamedNodeMap {
    public Node getNamedItem(String name); // by nodeName
    public Node setNamedItem(Node arg) throws DOMException;
        // insert/replace the node whose nodeName equals arg.getNodeName()
    public Node removeNamedItem(String name) throws DOMException;
    public Node item(int index);
    public int getLength();
    // Introduced in DOM Level 2:
    public Node getNamedItemNS(String namespaceURI, String localName);
    public Node setNamedItemNS(Node arg) throws DOMException;
    public Node removeNamedItemNS(String namespaceURI, String localName)
        throws DOMException;
}
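Two practical points about these collections, sketched below with the JDK's built-in DOM (class `CollectionsDemo` is hypothetical): a NamedNodeMap is iterated by index like a NodeList, but its order is not specified; and a NodeList returned by getElementsByTagName is live, so it reflects later changes to the tree.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CollectionsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** "name=value" for every attribute of e; the order is implementation-dependent. */
    static List<String> attributePairs(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            pairs.add(a.getNodeName() + "=" + a.getNodeValue());
        }
        return pairs;
    }

    /** Length of a getElementsByTagName list before and after adding a row. */
    static int[] liveLengths() {
        Document doc = parse("<TBODY><TR/><TR/></TBODY>");
        NodeList rows = doc.getElementsByTagName("TR");
        int before = rows.getLength();
        doc.getDocumentElement().appendChild(doc.createElement("TR"));
        return new int[] { before, rows.getLength() };  // live list: {2, 3}
    }

    public static void main(String[] args) {
        Element td = parse("<TD lang=\"zh\" title=\"novel\">x</TD>").getDocumentElement();
        System.out.println(attributePairs(td));
        System.out.println(td.getAttributes().getNamedItem("title").getNodeValue()); // novel
        int[] n = liveLengths();
        System.out.println(n[0] + " -> " + n[1]);                                  // 2 -> 3
    }
}
```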
Element
public interface Element extends Node {
    public String getTagName(); // String[R]:tagName, same as getNodeName()
    public String getAttribute(String name); // attribute value, or "" if absent
    // set/replace attr; value is not parsed; for a value with entity references,
    // use setAttributeNode instead
    public void setAttribute(String name, String value) throws DOMException;
    public void removeAttribute(String name) throws DOMException;
    public Attr getAttributeNode(String name);
    public Attr setAttributeNode(Attr newAttr) // add/replace newAttr;
        throws DOMException;                   // returns the replaced attr or null
    public Attr removeAttributeNode(Attr oldAttr)
        throws DOMException;
    public NodeList getElementsByTagName(String name); // descendants in document order; "*" matches all
    // and additional DOM2 methods …
Additional Element methods in DOM2
    // Introduced in DOM Level 2:
    String getAttributeNS(String namespaceURI, String localName);
    void setAttributeNS(String namespaceURI, String qualifiedName, String value)
        throws DOMException; // set/replace attribute; value not parsed
    void removeAttributeNS(String namespaceURI, String localName)
        throws DOMException;
    Attr getAttributeNodeNS(String namespaceURI, String localName);
    Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
    NodeList getElementsByTagNameNS(String namespaceURI, String localName);
    boolean hasAttribute(String name);
    boolean hasAttributeNS(String namespaceURI, String localName);
}
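A short sketch of the Element attribute methods, assuming the JDK's built-in DOM. The namespace URI `http://example.com/ns/books` and the prefix `b` are hypothetical values chosen only for illustration, as is the class name `ElementDemo`. Note that getAttribute returns the empty string, not null, for a missing attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ElementDemo {
    // Hypothetical namespace URI, used only for this illustration.
    static final String NS = "http://example.com/ns/books";

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A TD element with one plain and one namespaced attribute. */
    static Element buildCell() {
        Document doc = newDocument();
        Element td = doc.createElement("TD");
        doc.appendChild(td);
        td.setAttribute("lang", "zh");             // DOM Level 1 attribute
        td.setAttributeNS(NS, "b:role", "title");  // DOM Level 2 attribute with prefix "b"
        return td;
    }

    public static void main(String[] args) {
        Element td = buildCell();
        System.out.println(td.getTagName());                               // TD
        System.out.println(td.getAttribute("lang"));                       // zh
        System.out.println("[" + td.getAttribute("missing") + "]");        // [] (empty string)
        System.out.println(td.getAttributeNS(NS, "role"));                 // title
        System.out.println(td.getAttributeNodeNS(NS, "role").getPrefix()); // b
        td.removeAttribute("lang");
        System.out.println(td.hasAttribute("lang"));                       // false
    }
}
```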
the Document node
public interface Document extends Node {
    // 3 attributes:
    DocumentType[R]: doctype;
    DOMImplementation[R]: implementation;
    Element[R]: documentElement;
    // factory methods: <nodetype> create<nodetype>(data);
    Element createElement(String tagName) throws DOMException;
    DocumentFragment createDocumentFragment();
    Text createTextNode(String data);
    Comment createComment(String data);
    CDATASection createCDATASection(String data) throws DOMException;
    ProcessingInstruction createProcessingInstruction(String target, String data)
        throws DOMException;
the Document node (cont’d)
    Attr createAttribute(String name) throws DOMException;
    EntityReference createEntityReference(String name) throws DOMException;
    // end of factory methods
    NodeList getElementsByTagName(String tagname);
    // DOM 2
    Node importNode(Node importedNode, boolean deep) throws DOMException;
    Element createElementNS(String namespaceURI, String qualifiedName)
        throws DOMException;
    Attr createAttributeNS(String namespaceURI, String qualifiedName)
        throws DOMException;
    NodeList getElementsByTagNameNS(String namespaceURI, String localName);
    public Element getElementById(String elementId); // only attributes declared of type ID
                                                     // (e.g. in a DTD) count as IDs
}
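The factory methods are how a program builds a tree from scratch. This sketch (hypothetical class `BuildTable`, JDK built-in DOM assumed) rebuilds the TABLE/TBODY/TR/TD structure of the earlier sample; the cell strings in `main` are placeholders.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BuildTable {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds TABLE/TBODY/TR/TD elements for the given rows with Document factory methods. */
    static Document build(String[][] rows) {
        Document doc = newDocument();
        Element table = doc.createElement("TABLE");
        Element tbody = doc.createElement("TBODY");
        doc.appendChild(table);  // becomes doc.getDocumentElement()
        table.appendChild(tbody);
        for (String[] row : rows) {
            Element tr = doc.createElement("TR");
            for (String cell : row) {
                Element td = doc.createElement("TD");
                td.appendChild(doc.createTextNode(cell));
                tr.appendChild(td);
            }
            tbody.appendChild(tr);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Placeholder cell texts.
        Document doc = build(new String[][] {{"title 1", "author 1"}, {"title 2", "author 2"}});
        System.out.println(doc.getDocumentElement().getTagName()); // TABLE
        NodeList cells = doc.getElementsByTagName("TD");           // document order
        for (int i = 0; i < cells.getLength(); i++) {
            System.out.println(cells.item(i).getTextContent());
        }
    }
}
```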
CharacterData
public interface CharacterData extends Node {
    public String getData() throws DOMException;
    public void setData(String data) throws DOMException;
    public int getLength();
    public String substringData(int offset, int count) throws DOMException;
    public void appendData(String arg) throws DOMException;
    public void insertData(int offset, String arg) throws DOMException;
    public void deleteData(int offset, int count) throws DOMException;
    public void replaceData(int offset, int count, String arg) throws DOMException;
}
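The CharacterData editing methods work on character offsets. This sketch (hypothetical class `CharacterDataDemo`, JDK built-in DOM assumed) records the data after each call, and shows that an out-of-range offset raises INDEX_SIZE_ERR (1).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

class CharacterDataDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Applies the editing methods in turn and records the data after each step. */
    static List<String> steps() {
        Text t = newDocument().createTextNode("Hello");
        List<String> seen = new ArrayList<>();
        t.appendData(" World");
        seen.add(t.getData());            // Hello World
        t.insertData(5, ",");
        seen.add(t.getData());            // Hello, World
        t.replaceData(7, 5, "DOM");
        seen.add(t.getData());            // Hello, DOM
        t.deleteData(0, 7);
        seen.add(t.getData());            // DOM
        seen.add(t.substringData(0, 2));  // DO
        return seen;
    }

    /** Returns the DOMException code for an offset beyond the data length. */
    static short outOfRangeCode() {
        Text t = newDocument().createTextNode("DOM");
        try {
            t.substringData(10, 1);
            return -1;
        } catch (DOMException e) {
            return e.code;                // INDEX_SIZE_ERR
        }
    }

    public static void main(String[] args) {
        System.out.println(steps());          // [Hello World, Hello, World, Hello, DOM, DOM, DO]
        System.out.println(outOfRangeCode()); // 1
    }
}
```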
Attr, Text and Comment
public interface Attr extends Node {
    public String getName();
    public boolean getSpecified();   // true if the value was given explicitly (not a DTD default)
    public String getValue();
    public void setValue(String value) throws DOMException;
    public Element getOwnerElement(); // DOM2
}
public interface Text extends CharacterData {
    public Text splitText(int offset) throws DOMException;
}
public interface Comment extends CharacterData {
}
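A quick check of splitText and of the Attr accessors, assuming the JDK's built-in DOM (class `SplitTextDemo` is hypothetical). splitText keeps the head in the original node and inserts the tail as its next sibling; an Attr knows its owner element but has no parent node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class SplitTextDemo {
    /** A TD element with attribute lang="zh" and a single text child "HelloWorld". */
    static Element cell() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element td = doc.createElement("TD");
            doc.appendChild(td);
            td.setAttribute("lang", "zh");
            td.appendChild(doc.createTextNode("HelloWorld"));
            return td;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element td = cell();
        Text head = (Text) td.getFirstChild();
        Text tail = head.splitText(5);                       // head keeps "Hello", tail = "World"
        System.out.println(head.getData() + " / " + tail.getData()); // Hello / World
        System.out.println(td.getChildNodes().getLength());  // 2
        System.out.println(tail.getPreviousSibling() == head); // true

        Attr lang = td.getAttributeNode("lang");
        System.out.println(lang.getOwnerElement() == td);     // true
        System.out.println(lang.getParentNode());             // null: not a child of td
        System.out.println(lang.getSpecified());              // true
    }
}
```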
CDATASection, DocumentType and Notation
public interface CDATASection extends Text {}
public interface DocumentType extends Node {
    String getName();
    NamedNodeMap getEntities();   // general entities (internal and external) only;
                                  // parameter entities are excluded
    NamedNodeMap getNotations();
    // DOM2-only methods
    String getPublicId();         // public identifier and
    String getSystemId();         // system identifier of the external subset, if any
    String getInternalSubset();   // the internal subset as a string
}
public interface Notation extends Node {
    public String getPublicId();
    public String getSystemId();
}
Entity, EntityReference and ProcessingInstruction
public interface Entity extends Node { // a general (parsed) entity or an unparsed entity
    public String getPublicId();
    public String getSystemId();
    public String getNotationName(); // notation name for an unparsed entity; null for a parsed one
}
// An entity's replacement text, if available, is stored as its read-only childNodes.
public interface EntityReference extends Node { }
// The contents of the referenced entity appear as (read-only) children of this node.
public interface ProcessingInstruction extends Node {
    public String getTarget();
    public String getData();
    public void setData(String data) throws DOMException;
}
DOMException
public class DOMException extends RuntimeException {
    public DOMException(short code, String message) {
        super(message);
        this.code = code;
    }
    public short code;
    // ExceptionCode
    public static final short INDEX_SIZE_ERR              = 1;
    public static final short DOMSTRING_SIZE_ERR          = 2;
    public static final short HIERARCHY_REQUEST_ERR       = 3;
    public static final short WRONG_DOCUMENT_ERR          = 4;
    public static final short INVALID_CHARACTER_ERR       = 5;
    public static final short NO_DATA_ALLOWED_ERR         = 6;
    public static final short NO_MODIFICATION_ALLOWED_ERR = 7;
    public static final short NOT_FOUND_ERR               = 8;
    public static final short NOT_SUPPORTED_ERR           = 9;
    public static final short INUSE_ATTRIBUTE_ERR         = 10;
DOMException (cont’d)
    // DOM2-only DOMException codes
    public static final short INVALID_STATE_ERR           = 11;
    public static final short SYNTAX_ERR                  = 12;
    public static final short INVALID_MODIFICATION_ERR    = 13;
    public static final short NAMESPACE_ERR               = 14;
    public static final short INVALID_ACCESS_ERR          = 15;
}
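A common source of DOMException is WRONG_DOCUMENT_ERR (4): a node created by one Document cannot be inserted into another. The DOM2 remedy is Document.importNode, which makes a copy owned by the target document. This sketch assumes the JDK's built-in DOM; `ImportDemo`, `tryAppend` and `importAndAppend` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ImportDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends child to parent; returns the DOMException code, or -1 on success. */
    static short tryAppend(Node parent, Node child) {
        try {
            parent.appendChild(child);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Imports a deep copy of foreign into parent's document and appends it. */
    static Node importAndAppend(Element parent, Node foreign) {
        Node copy = parent.getOwnerDocument().importNode(foreign, true);
        return parent.appendChild(copy);
    }

    public static void main(String[] args) {
        Document source = newDocument();
        Element td = source.createElement("TD");
        td.appendChild(source.createTextNode("cell"));
        source.appendChild(td);

        Document target = newDocument();
        Element tr = target.createElement("TR");
        target.appendChild(tr);

        System.out.println(tryAppend(tr, td));  // 4 (WRONG_DOCUMENT_ERR)
        importAndAppend(tr, td);
        System.out.println(tr.getTextContent()); // cell
    }
}
```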
DOMImplementation and DocumentFragment
public interface DOMImplementation {
    public boolean hasFeature(String feature, String version);
    public DocumentType createDocumentType(String qualifiedName, String publicId,
                                           String systemId) throws DOMException;
    public Document createDocument(
        String namespaceURI,   // namespace URI of the document element
        String qualifiedName,  // QName of the document element
        DocumentType doctype) throws DOMException;
}
public interface DocumentFragment extends Node { }
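DOMImplementation can create a document together with its DOCTYPE in one step. This sketch obtains the implementation from a JAXP DocumentBuilder (JDK built-in DOM assumed); the class name `CreateDocumentDemo` and the system identifier `table.dtd` are hypothetical, and no DTD file is read.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class CreateDocumentDemo {
    /** Creates a document whose root is TABLE and whose DOCTYPE names TABLE. */
    static Document createTableDocument() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
            // "table.dtd" is a hypothetical system identifier; nothing is loaded.
            DocumentType doctype = impl.createDocumentType("TABLE", null, "table.dtd");
            return impl.createDocument(null, "TABLE", doctype);  // no namespace
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = createTableDocument();
        System.out.println(doc.getDoctype().getName());             // TABLE
        System.out.println(doc.getDoctype().getSystemId());         // table.dtd
        System.out.println(doc.getDocumentElement().getTagName());  // TABLE
        System.out.println(doc.getImplementation().hasFeature("XML", "2.0"));
    }
}
```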
Legal feature strings
Module                                        | Feature string
XML                                           | XML
HTML                                          | HTML
Views                                         | Views
StyleSheets                                   | StyleSheets
CSS                                           | CSS
CSS (extended interfaces)                     | CSS2
Events                                        | Events
User Interface Events (UIEvent interface)     | UIEvents
Mouse Events (MouseEvent interface)           | MouseEvents
Mutation Events (MutationEvent interface)     | MutationEvents
HTML Events                                   | HTMLEvents
Traversal                                     | Traversal
Range                                         | Range
Module dependence
Module          | Implies
Views           | XML or HTML
StyleSheets     | StyleSheets and XML or HTML
CSS             | StyleSheets, Views and XML or HTML
CSS2            | CSS, StyleSheets, Views and XML or HTML
Events          | XML or HTML
UIEvents        | Views, Events and XML or HTML
MouseEvents     | UIEvents, Views, Events and XML or HTML
MutationEvents  | Events and XML or HTML
HTMLEvents      | Events and XML or HTML
DOMParsers and DOMImplementations
Problems:
• How do we get a DOM object from an XML document?
  – use a DOMParser
• How do we construct DOM objects directly in a program?
  – get a DOMImplementation
• How do we get a DOM object from an XML document and then modify it in a program?
  – get a DOMParser, then get the DOMImplementation from the resulting DOM object
    (Document.getImplementation()).
XML Document → DOM Parser → DOM Document
Use Apache’s Xerces for DOM
• XML2DOM:
  // the Xerces DOM parser implementation class:
  import org.apache.xerces.parsers.DOMParser;
  DOMParser parser = new DOMParser();
  parser.setFeature("http://xml.org/sax/features/validation", true);
  parser.setFeature("http://xml.org/sax/features/namespaces", true); …
  parser.parse(url_or_inputSource);
  Document doc = parser.getDocument();
  DOMImplementation impl = doc.getImplementation();
• Construct DOM from scratch:
  // the Xerces DOMImplementation class:
  import org.apache.xerces.dom.DOMImplementationImpl;
  DOMImplementation dm = new DOMImplementationImpl();
  // or dm = DOMImplementationImpl.getDOMImplementation(); // Xerces-specific, not DOM API
  Document doc = dm.createDocument(…);
  Element e = doc.createElement(…);
  Attr attr = doc.createAttributeNS(…);
  Text txt = doc.createTextNode("…");
JAXP (Java API for XML Processing) 1.1
• Sun’s Java API for XML Processing
• three modules:
– for DOM Processing
– for SAX Processing
– for Transformation
• 5 packages
1. javax.xml.parsers
– Provides classes allowing the processing of XML documents.
– Two types of pluggable parsers are supported:
– SAX (Simple API for XML)
– DOM (Document Object Model)
2. javax.xml.transform (+ javax.xml.transform.dom,
javax.xml.transform.sax, javax.xml.transform.stream)
– APIs for processing transformation instructions, and performing a
transformation from source to result.
JAXP’s DOM pluggability mechanism
JAXP API for DOM
• javax.xml.parsers.DocumentBuilder
– Using this class, an application programmer can obtain a
Document from XML.
• javax.xml.parsers.DocumentBuilderFactory
– Defines a factory API that enables applications to obtain a
parser that produces DOM object trees from XML
documents.
– abstract class
– An instance of a concrete subclass is obtained with the static method:
DocumentBuilderFactory.newInstance()
– The desired capabilities of the parser are specified by setting
the various properties of the obtained factory instance.
Example Code
import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

DocumentBuilder builder;
DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
    builder = factory.newDocumentBuilder();
    Document doc1 = builder.parse(location);  // parse the document at this URI
    Document doc2 = builder.newDocument();    // empty document
} catch (SAXException se) {                   // parse error: handle error
} catch (IOException ioe) {                   // I/O error: handle error
} catch (ParserConfigurationException pce) {  // no parser with these settings: handle error
}
javax.xml.parsers.DocumentBuilder
• abstract DOMImplementation getDOMImplementation()
– Obtain an instance of a DOMImplementation object.
• abstract Document newDocument()
– Obtain a new instance of a DOM Document object to build a DOM tree
with.
• abstract boolean isNamespaceAware()
– Indicates whether or not this parser is configured to understand
namespaces.
• abstract boolean isValidating()
– Indicates whether or not this parser is configured to validate XML
documents.
• Document parse(File | InputSource | InputStream [, systemId] | uriString)
– Parse the content of the given source as an XML document and return a new
DOM Document object.
• abstract void setEntityResolver(EntityResolver er)
– Specify the EntityResolver to be used to resolve entities present in the
XML document to be parsed.
• abstract void setErrorHandler(ErrorHandler eh)
– Specify the ErrorHandler to be used to report errors present in the XML
document to be parsed.
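Setting an ErrorHandler lets the application decide what to do with parse errors instead of relying on the parser's default reporting. The sketch below assumes the JDK's built-in parser; `ErrorHandlingDemo.check` is a hypothetical helper that reports the line of the first fatal error, and treating recoverable errors as fatal is a design choice of this example, not a JAXP default.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ErrorHandlingDemo {
    /** Parses xml; returns "ok" or "line N: message" for the first fatal error. */
    static String check(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }  // ignore warnings
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b/></a>"));  // ok
        System.out.println(check("<a><b></a>"));   // line 1: ... (mismatched end tag)
    }
}
```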
javax.xml.parsers.DocumentBuilderFactory
• Object getAttribute(String name)
• void setAttribute(String name, Object value)
– Allows the user to set/get specific attributes on the underlying
implementation.
• boolean isIgnoringComments() ,
setIgnoringComments(boolean)
– Indicates whether or not the factory is configured to produce parsers
that ignore comments.
• Other properties:
– IgnoringElementContentWhitespace ; ExpandEntityReferences;
– Coalescing; // merge adjacent texts and CDATA into a text node
– NamespaceAware; Validating;
• abstract DocumentBuilder newDocumentBuilder()
– Creates a new instance of a DocumentBuilder using the currently
configured parameters.
• static DocumentBuilderFactory newInstance()
– Obtain a new instance of a DocumentBuilderFactory.
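The effect of the Coalescing and IgnoringComments properties is easy to observe on one small document. The sketch assumes the JDK's built-in parser; the class name `FactoryOptionsDemo` and the sample markup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FactoryOptionsDemo {
    static final String XML = "<p>x<![CDATA[y]]><!--c-->z</p>";

    /** Parses XML with the given factory settings and returns the root element. */
    static Element parseRoot(boolean coalescing, boolean ignoringComments) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setCoalescing(coalescing);             // CDATA becomes text, merged with neighbours
            f.setIgnoringComments(ignoringComments); // no Comment nodes are created
            return f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static String childNames(Node n) {
        List<String> names = new ArrayList<>();
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(childNames(parseRoot(false, false))); // #text,#cdata-section,#comment,#text
        Element p = parseRoot(true, true);
        System.out.println(childNames(p));                       // only #text nodes remain
        System.out.println(p.getTextContent());                  // xyz
    }
}
```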
How DocumentBuilderFactory finds its implementation
• Use the javax.xml.parsers.DocumentBuilderFactory system property.
• Use the same property in the file "%JAVA_HOME%/lib/jaxp.properties"
in the JRE directory.
• Look for the class name in the file
META-INF/services/javax.xml.parsers.DocumentBuilderFactory in the
jars available to the runtime.
• Fall back to the platform default DocumentBuilderFactory implementation, which is
"org.apache.crimson.jaxp.DocumentBuilderFactoryImpl" for
JAXP 1.1 with Crimson 1.1.
Bootstrap DOM (Level 3 Core)
• Problem: how to get a DOMImplementation?
  – implementation dependent prior to Level 3:
  – Xerces => org.apache.xerces.dom.DOMImplementationImpl
  – Crimson => org.apache.crimson.tree.DOMImplementationImpl
• two supporting types (class/interface):
  – DOMImplementationRegistry
  – DOMImplementationSource
public interface DOMImplementationSource {
    DOMImplementation getDOMImplementation(String features);
}
DOMImplementationRegistry
import java.util.StringTokenizer;
import java.util.Vector;
import org.w3c.dom.DOMImplementation;

public class DOMImplementationRegistry {
    // The system property that lists the DOMImplementationSource class names.
    public static final String PROPERTY =
        "org.w3c.dom.DOMImplementationSourceList";
    private static Vector sources = new Vector();
    private static boolean initialized = false;

    private static void initialize() throws ClassNotFoundException,
            InstantiationException, IllegalAccessException {
        initialized = true;
        String p = System.getProperty(PROPERTY);
        if (p == null) {
            return;
        }
        StringTokenizer st = new StringTokenizer(p);  // space-separated class names
        while (st.hasMoreTokens()) {
            Object source = Class.forName(st.nextToken()).newInstance();
            sources.addElement(source);
        }
    }

    public static DOMImplementation getDOMImplementation(String features)
            throws ClassNotFoundException,
            InstantiationException, IllegalAccessException {
        if (!initialized) { initialize(); }
        int len = sources.size();
        for (int i = 0; i < len; i++) {
            DOMImplementationSource source =
                (DOMImplementationSource) sources.elementAt(i);
            DOMImplementation impl = source.getDOMImplementation(features);
            if (impl != null) {
                return impl;  // the first source that supports the features wins
            }
        }
        return null;
    }

    /* Register an implementation source. */
    public static void addSource(DOMImplementationSource s)
            throws ClassNotFoundException,
            InstantiationException, IllegalAccessException {
        if (!initialized) { initialize(); }
        sources.addElement(s);
        // update the system property accordingly (it may not have been set yet)
        String p = System.getProperty(PROPERTY);
        String name = s.getClass().getName();
        System.setProperty(PROPERTY, p == null ? name : p + " " + name);
    }
}
Get Your DOMImplementation via DOMImplementationRegistry
1. Make all known DOMImplementationSource classes (or class names)
   known to your JVM, either:
   A. put all class names (space-separated) into the system
      property "org.w3c.dom.DOMImplementationSourceList":
      System.setProperty(DOMImplementationRegistry.PROPERTY, classnames);
   B. or register a source object directly:
      DOMImplementationRegistry.addSource(DOMImplementationSource);
2. Query the DOMImplementationRegistry:
   DOMImplementation impl = DOMImplementationRegistry
       .getDOMImplementation("XML 1.0");
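DOM Level 3 later standardized this bootstrapping class as org.w3c.dom.bootstrap.DOMImplementationRegistry, which is shipped with the JDK. Its API differs from the draft above: a registry object is obtained with the static newInstance(), and getDOMImplementation is an instance method. The sketch below assumes that standard class and the JDK's built-in implementation, which also supports the Load and Save ("LS") feature used here for serialization. `BootstrapDemo` and the `greeting` document are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class BootstrapDemo {
    /** Builds a one-element document via the standard registry and serializes it. */
    static String helloDocument() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            // Ask for Core XML 3.0 plus Load and Save support.
            DOMImplementation impl = registry.getDOMImplementation("XML 3.0 LS");
            if (impl == null) {
                throw new IllegalStateException("no DOM implementation with these features");
            }
            Document doc = impl.createDocument(null, "greeting", null);
            doc.getDocumentElement().appendChild(doc.createTextNode("hello"));
            LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
            serializer.getDomConfig().setParameter("xml-declaration", false);
            return serializer.writeToString(doc);
        } catch (ReflectiveOperationException e) {  // thrown by newInstance()
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(helloDocument());  // contains <greeting>hello</greeting>
    }
}
```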
Example: XDXTest
import java.io.File;
import org.w3c.dom.Document;
import org.apache.xerces.parsers.DOMParser;
public class XDXTest {
public void test(String xmlDocument, String outputFilename)
throws Exception {
File outputFile = new File(outputFilename);
DOMParser parser = new DOMParser();
// Get the DOM tree as a Document object
parser.parse(xmlDocument);
Document doc = parser.getDocument();
// Serialize
DOM2XML d2x = new DOM2XML();
d2x.toXML(doc, outputFile);
}
XDXTest (continued)
public static void main(String[] args) {
if (args.length != 2) {
System.out.println(
"Usage: java XDXTest " +
"[XML document to read] " +
"[filename to write out to]");
System.exit(1); // wrong number of arguments
}
try {
XDXTest tester = new XDXTest();
tester.test(args[0], args[1]); // input file, output file name
} catch (Exception e) {
e.printStackTrace();
}
}
}
DOMSerializer
import java.io.*;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.*;
public class DOM2XML {
    private String indent;         // Indentation to use
    private String lineSeparator;  // Line separator to use
    public DOM2XML() {
        indent = "";
        lineSeparator = "\n";
    }
    public void setIndent(String indent) { this.indent = indent; }
    public void setLineSeparator(String lineSeparator) { …}
    public void toXML(Document doc, OutputStream out)
            throws IOException {
        // UTF-8 matches the default encoding of the XML declaration written below
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        toXML(doc, writer);
    }
    public void toXML(Document doc, File file)
            throws IOException { … }
    public void toXML(Document doc, Writer writer)
            throws IOException {
        // Start serialization recursion with no indenting
        serializeNode(doc, writer, "");
        writer.flush();
    }
    // Escape the characters that must not appear literally in text or attribute values
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
    public void serializeNode(Node node, Writer writer, String indentLevel)
            throws IOException {
        // Determine action based on node type
        switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE:
            writer.write("<?xml version=\"1.0\"?>");
            writer.write(lineSeparator);
            // recurse on each child
            NodeList nodes = node.getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                serializeNode(nodes.item(i), writer, "");
            }
            break;
        case Node.ELEMENT_NODE:
            String name = node.getNodeName();
            writer.write(indentLevel + "<" + name);
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node current = attributes.item(i);
                writer.write(" " + current.getNodeName() + "=\"" +
                        escape(current.getNodeValue()) + "\"");
            }
            writer.write(">"); // end of start tag
            NodeList children = node.getChildNodes();
            int n = children.getLength();
            if (n > 0 && children.item(0).getNodeType() == Node.ELEMENT_NODE) {
                writer.write(lineSeparator);
            }
            for (int i = 0; i < n; i++) {
                serializeNode(children.item(i), writer, indentLevel + indent);
            }
            if (n > 0 && children.item(n - 1).getNodeType() == Node.ELEMENT_NODE) {
                writer.write(indentLevel);
            }
            writer.write("</" + name + ">");
            writer.write(lineSeparator);
            break;
        case Node.TEXT_NODE:
            writer.write(escape(node.getNodeValue()));
            break;
        case Node.CDATA_SECTION_NODE:
            writer.write("<![CDATA[" + node.getNodeValue() + "]]>");
            break;
        case Node.COMMENT_NODE:
            // write the comment text unchanged (no added spaces)
            writer.write(indentLevel + "<!--" + node.getNodeValue() + "-->");
            writer.write(lineSeparator);
            break;
        case Node.PROCESSING_INSTRUCTION_NODE:
            writer.write("<?" + node.getNodeName() + " " + node.getNodeValue() + "?>");
            writer.write(lineSeparator);
            break;
        case Node.ENTITY_REFERENCE_NODE:
            writer.write("&" + node.getNodeName() + ";");
            break;
        case Node.DOCUMENT_TYPE_NODE:
            DocumentType docType = (DocumentType) node;
            String publicId = docType.getPublicId();
            String systemId = docType.getSystemId();
            writer.write("<!DOCTYPE " + docType.getName());
            if (publicId != null) {
                writer.write(" PUBLIC \"" + publicId + "\" \"" + systemId + "\"");
            } else if (systemId != null) {
                writer.write(" SYSTEM \"" + systemId + "\"");
            }
            String subset = docType.getInternalSubset();
            if (subset != null && !subset.isEmpty()) {
                writer.write(" [" + subset + "]");
            }
            writer.write(">");
            writer.write(lineSeparator);
            break;
        }
    }
}
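For comparison, the JAXP transformation package (javax.xml.transform) can serialize a DOM tree without a hand-written serializer: an identity Transformer copies a DOMSource to a StreamResult and escapes special characters itself. This is a minimal sketch assuming the JDK's built-in TransformerFactory; `TransformSerializer` is a hypothetical name, and the XML declaration is omitted only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class TransformSerializer {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes node with an identity transform; text is escaped by the serializer. */
    static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element td = doc.createElement("TD");
        td.appendChild(doc.createTextNode("a<b & c"));
        doc.appendChild(td);
        System.out.println(toXml(doc));  // <TD>a&lt;b &amp; c</TD>
    }
}
```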