Transcript Document

1
Before today’s lecture
• Personal Project
– Due date (including a demo of your work): 4/12
– Grading scheme
Application (50%)
• All XML documents
• Schema documents
• Application source codes
• Web-based interfaces
• Other source codes
Paper (40%)
• Project paper
Demonstration (10%)
• Design layout and functionalities
2
Before today’s lecture
• Final Project
– Group members:
• Deadline (for grouping your members): Before 4/10
• Send the name list of your group members to 尚純 or 紹楷
• For those who can’t make a team, we’ll make a group for you. The group
members will be posted on 4/12
• If you want to make a change, the deadline is on 4/15
– Project Topics:
• Will be posted on the web; pick one and send your topic to 尚純 or 紹楷
• Alternatively, send a proposal for selecting your own topic.
• The proposal should include reference information of the topic and the
scope of the project.
• Teaching Assistants:
吳尚純 [email protected]
李紹楷 [email protected]
3
Simple API for XML (SAX)
Is SAX too hard for mortal programmers?
And is the domination of DOM a bad thing?
4
• Introduction
• XML Parsing Operations
• The SAX API
• How SAX Processing Works
• SAX-based parsers
• Events
• A SAX Example: Step by Step
• Example (SAX1.0): Tree Diagram
• SAX 2.0
• Example: Printing the notes in an XML document
• Summary
5
Introduction
• Processing XML
– Create a Parser object → Point the object to an XML doc. → Process
• Basic Operations for processing an XML document
– A basic XML processing architecture
– 3 key layers: the XML documents, the application, and the infrastructure for working with XML documents
[Figure: XML document(s), character stream, parser and serializer, standardized XML APIs, application]
6
Introduction (cont.)
• Basic Operations (cont.)
– Parsing is the first step that enables an application to work with an XML
doc.
– Parsing process breaks up the text of an XML document into small
identifiable pieces (nodes)
– Parser will break documents into pieces, recognized as start-end tags,
attribute value pairs, chunks of text content, processing instructions,
comments, and so on.
– These pieces are fed into application through well-defined APIs
implementing a particular parsing model
– Four parsing models are commonly in use:
7
Introduction (cont.)
• Basic Operations (cont.)
– Four parsing models are commonly in use:
1. Pull Parsing
• The application always asks the parser to give it the next piece of information
• It is as if the application has to “pull” the information out of the parser; the communication is initiated by the application
• The XML community has not yet defined standard APIs for “pull parsing”
• It could happen soon because of its popularity!
2. Push Parsing
• The parser sends notifications to the application during the parsing process
• The notifications are sent in “reading” order (i.e., their order of appearance in the document)
8
Introduction (cont.)
• Basic Operations (cont.)
2. Push Parsing (cont.)
• Notifications are typically implemented as event callbacks in the application
• Known as event-based parsing
• Simple API for XML (SAX) is the standard for push parsing
3. One-step Parsing
• The parser reads the whole XML document and generates a data structure (a parse tree) describing its entire contents (elements, attributes, etc.)
• W3C standard: the XML DOM (Document Object Model) specifies the types of objects that will be included in the parse tree, their properties, and operations
• The DOM is a language- and platform-independent API.
• Its biggest problems are memory overhead and computational cost
9
Introduction (cont.)
• Basic Operations (cont.)
4. Hybrid Parsing
• Combines the characteristics of two of the other parsing models to create efficient parsers for special scenarios
• Let's separate the concepts of loading and parsing to analyse the situation
– Loading the document: one-step parsing
– Parsing the rest of the document: providing partial information extracted from the document for the application
• For example, push + one-step parsing
– The application thinks it is working with a one-step parser; in reality, the parsing process has just begun
– As the application keeps accessing more objects on the DOM tree, the parsing continues incrementally
– Just enough of the XML document is parsed at any given point to give the application the objects it wants to see
10
An example of hybrid parsing
• In Sun's reference implementation, the
DOM API builds on the SAX API as
shown in the diagram,
• Sun's implementation of the Document
Object Model (DOM) API uses the SAX
libraries to read in XML data and
construct the tree of data objects that
constitutes the DOM.
• Sun's implementation also provides a
framework to help output the object tree
as XML data
11
Introduction (cont.)
• Why define many models?
– Trade-offs between memory efficiency, computational efficiency, and ease of
programming
– A table is presented to compare the trade-offs of the models
Model | Control of Parsing | Control of Context | Memory Efficiency | Computational Efficiency | Ease of Programming
Pull | Application | Application | High | Highest | Low
Push (SAX) | Parser | Application | High | High | Low
One-step (DOM) | Parser | Parser | Lowest | Lowest | High
One-step (JDOM) | Parser | Parser | Low | Low | Highest
Hybrid (DOM) | Parser | Parser | Medium | Medium | High
Hybrid (JDOM) | Parser | Parser | Medium | Medium | Highest
Introduction (cont.)
12
• How to choose between SAX and DOM: Whether you choose DOM or SAX is
going to depend on several factors:
– Purpose of the application:
• To make changes to the data and output it as XML, then in most cases, DOM is the way to go.
• SAX is much more complex to program, as you'd have to make changes to a copy of the data
rather than to the data itself.
– Amount of data: For large files, SAX is a better bet.
– How the data will be used: If only a small amount of the data will actually be used,
you may be better off using SAX to extract it into your application.
– On the other hand, if you know that you will need to refer back to large amounts of
information that has already been processed, SAX is probably not the right choice.
– The need for speed: SAX implementations are normally faster than DOM
implementations.
• It's important to remember that SAX and DOM are not mutually exclusive.
• Use DOM to create a stream of SAX events,
• Use SAX to create a DOM tree.
• In fact, most parsers used to create DOM trees are actually using SAX to do it!
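The point that SAX and DOM are not mutually exclusive can be sketched in Java. The following is a minimal sketch, assuming the JAXP classes bundled with a current JDK; the class name DomToSaxDemo and its helper NameCollector are hypothetical names chosen for illustration. It first builds a DOM tree, then replays that tree as a stream of SAX events through an identity transformation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: replays an in-memory DOM tree as SAX events.
class DomToSaxDemo {

    // Collects the names of the elements seen, in document order.
    static class NameCollector extends DefaultHandler {
        final StringBuilder names = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            // Fall back to the local name if no qualified name is reported.
            String name = (qName == null || qName.isEmpty()) ? localName : qName;
            names.append(name).append(' ');
        }
    }

    static String elementNames(String xml) throws Exception {
        // One-step parsing: build the whole DOM tree first.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Identity transformation: walk the tree and fire SAX callbacks.
        NameCollector handler = new NameCollector();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        return handler.names.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames(
            "<samples><server>UNIX</server><monitor>color</monitor></samples>"));
    }
}
```

The reverse direction (SAX events used to build a DOM tree) normally happens inside the DOM parser itself, so application code rarely has to write it.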
13
The SAX APIs
• SAX (The Simple API for XML )
– SAX is the Simple API for XML, originally a Java-only API.
– SAX was the first widely adopted API for XML in Java, and
is a “de facto” standard.
– The current version is SAX 2.0.x, and there are versions for
several programming language environments other than Java
– Another method for accessing an XML document’s contents
– Developed by XML-DEV mailing-list members
– Uses event-based model
• Notifications (events) are raised as document is parsed
14
The SAX APIs (cont.)
• SAX Parsing architecture: using the common abstract factory design pattern
1. Create an instance of SAXParserFactory (used to create an instance of SAXParser)
2. SAXReader: event trigger; when the parse() method is invoked, the reader starts firing events to the application by invoking registered callbacks
3. Those methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.
15
The SAX APIs (cont.)
• Here is a summary of the key objects in SAX APIs:
• SAXParserFactory
Creates an instance of the parser determined by the system property,
javax.xml.parsers.SAXParserFactory
• SAXParser
Defines several kinds of parse() methods. In general, you pass an XML data
source and a DefaultHandler object to the parser, which processes the XML and
invokes the appropriate methods in the handler object.
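• SAXReader
Carries on the conversation with the SAX event handlers you define. In SAX 2.0 the reader is the org.xml.sax.XMLReader interface, which you can obtain from a SAXParser with getXMLReader().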
16
The SAX APIs (cont.)
• DefaultHandler
Implements the ContentHandler, ErrorHandler, DTDHandler, and
EntityResolver interfaces (with null methods), so you can override only the
ones you're interested in.
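• ContentHandler
Defines methods such as startDocument, endDocument, startElement, and endElement, which are invoked as the document and its tags are recognized, and the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
• ErrorHandler
Defines the methods warning, error, and fatalError, which are invoked in response to various parsing errors.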
• DTDHandler
Defines methods you will generally never be called upon to use. Used when
processing a DTD to recognize and act on declarations for an unparsed entity.
17
The SAX APIs (cont.)
• Being event-based means that the parser reads an XML
document from beginning to end,
• Each time it recognizes a syntax construction, it notifies the
application that is running it
• The SAX parser notifies the application by calling methods
from the ContentHandler interface.
• For example, when the parser comes to a less than symbol
("<"), it calls the startElement method;
18
The SAX API (cont.)
• when it comes to character data, it calls the characters
method;
• when it comes to the less than symbol followed by a
slash ("</"), it calls the endElement method
• To illustrate, let's look at an example XML document
and walk through what the parser does for each line.
19
How SAX Processing Works
• SAX analyzes an XML stream as
it goes by, much like an old
ticker tape.
• Consider the following XML code snippet:
<?xml version="1.0"?>
<samples>
  <server>UNIX</server>
  <monitor>color</monitor>
</samples>
• A SAX processor analyzing this code snippet would generate, in general, the following events:
Start document
Start element (samples)
Characters (white space)
Start element (server)
Characters (UNIX)
End element (server)
Characters (white space)
Start element (monitor)
Characters (color)
End element (monitor)
Characters (white space)
End element (samples)
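The event list can be reproduced with a small handler that records each callback. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name EventLogger is hypothetical. A parser is free to deliver text in more than one characters() call, so the exact number of "Characters" entries may differ between parsers, and the parser also reports the end of the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every SAX event it receives.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("Start document"); }

    @Override
    public void endDocument() { events.add("End document"); }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        events.add("Start element (" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End element (" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        events.add(text.trim().isEmpty()
                ? "Characters (white space)"
                : "Characters (" + text + ")");
    }

    // Parses the given XML text and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n<samples>\n"
                   + "  <server>UNIX</server>\n"
                   + "  <monitor>color</monitor>\n</samples>";
        log(xml).forEach(System.out::println);
    }
}
```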
20
How SAX Processing Works (cont.)
• The SAX API allows a developer to capture these events and act on them
– What does “the developer” stand for?
• SAX processing involves the following steps:
1. Create an event handler.
2. Create the SAX parser.
3. Assign the event handler to the parser.
4. Parse the document, sending each event to the handler.
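The four steps map onto JAXP roughly as follows. This is a minimal sketch, not the only way to wire things up; the class name FourSteps and the ElementCounter handler are hypothetical, and the XML is read from a string only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class illustrating the four SAX processing steps.
class FourSteps {

    // Event handler that simply counts start tags.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            count++;
        }
    }

    static int countElements(String xml) throws Exception {
        // 1. Create an event handler.
        ElementCounter handler = new ElementCounter();

        // 2. Create the SAX parser.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();

        // 3. Assign the event handler to the parser.
        reader.setContentHandler(handler);

        // 4. Parse the document, sending each event to the handler.
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```

Steps 3 and 4 can also be combined by passing the handler directly to SAXParser.parse(), which is what the step-by-step example later in this lecture does.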
21
How SAX Processing Works (cont.)
• The pros and cons of event-based processing
– The advantages of this kind of processing are much like the
advantages of streaming media. (like interpreter?)
– Analysis can get started immediately, rather than waiting for all of
the data to be processed.
– Because the application simply examines the data as it goes by, it doesn't need to store it in memory:
– A huge advantage when it comes to large documents.
22
How SAX Processing Works (cont.)
• The pros and cons of event-based processing
– In fact, an application doesn't even have to parse the entire document; it can stop when certain criteria have been satisfied.
– In general, SAX is also much faster than the alternative, the DOM.
– On the other hand, because the application is not storing the data in any way, it is impossible to make changes to it using SAX, or to move backwards in the data stream.
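Stopping early is usually done by throwing a SAXException from a callback once the criterion is met. The sketch below is one way to do it, assuming the JAXP parser bundled with the JDK; the class name StopEarly, the Finder handler, and the done flag are hypothetical. The flag lets the caller tell a deliberate stop from a genuine parsing problem.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: returns the text of the first matching element
// and abandons the rest of the document.
class StopEarly {

    static class Finder extends DefaultHandler {
        private final String target;
        private final StringBuilder text = new StringBuilder();
        private boolean inside = false;
        private boolean done = false;

        Finder(String target) { this.target = target; }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            if (qName.equals(target)) inside = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inside) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (inside && qName.equals(target)) {
                done = true;
                // Criterion satisfied: abort the parse on purpose.
                throw new SAXException("stop: found <" + target + ">");
            }
        }
    }

    static String firstText(String xml, String elementName) throws Exception {
        Finder finder = new Finder(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
        } catch (SAXException e) {
            if (!finder.done) throw e; // a real parsing problem, not our stop
        }
        return finder.done ? finder.text.toString() : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstText(
            "<samples><server>UNIX</server><monitor>color</monitor></samples>",
            "server"));
    }
}
```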
23
SAX-based Parsers
• SAX-based parsers
– The textbook uses Sun Microsystems’ JAXP
• Tools
– A text editor: XML files are simply text. To create and read them, a text editor is
all you need.
– Java™ 2 SDK, Standard Edition version 1.4.x: SAX support is built into this version of Java (available at http://java.sun.com/j2se/1.4.2/download.html), so you won't need to install any separate classes. If you are using an earlier version of Java, such as Java 1.3.x, you'll also need
• an XML parser such as the Apache project's Xerces-Java (available at http://xml.apache.org/xerces2-j/index.html),
• or Sun's Java API for XML Parsing (JAXP), part of the Java Web Services Developer Pack (available at http://java.sun.com/webservices/downloads/webservicespack.html).
• You can also download the official version from SourceForge (available at http://sourceforge.net/project/showfiles.php?group_id=29449).
– Other Languages: Should you wish to adapt the examples, SAX implementations
are also available in other programming languages.
– You can find information on C, C++, Visual Basic, Perl, and Python
implementations of a SAX parser at http://www.saxproject.org/?selected=langs.
24
Some SAX-based parsers.
Product | Description
JAXP | Sun’s JAXP is available from java.sun.com/xml. JAXP supports both SAX and DOM.
Xerces | Apache’s Xerces parser is available at www.apache.org. Xerces supports both SAX and DOM.
MSXML 3.0 | Microsoft’s msxml parser is available at msdn.microsoft.com/xml. This parser supports both SAX and DOM.
25
Setup
• Java applications to illustrate SAX API
– Java 2 Standard Edition required
• Download at www.java.sun.com/j2se
• Installation instructions
– www.deitel.com/faq/java3install.htm
– JAXP required
• Download at java.sun.com/xml/download.html
26
Events
• SAX parser
– Invokes certain methods (Fig.
9.2) when events occur
– Programmers override these
methods to process data
Fig. 9.2 Methods invoked by the SAX parser
Method Name | Description
setDocumentLocator | Invoked at the beginning of parsing.
startDocument | Invoked when the parser encounters the start of an XML document.
endDocument | Invoked when the parser encounters the end of an XML document.
startElement | Invoked when the start tag of an element is encountered.
endElement | Invoked when the end tag of an element is encountered.
characters | Invoked when text characters are encountered.
ignorableWhitespace | Invoked when whitespace that can be safely ignored is encountered.
processingInstruction | Invoked when a processing instruction is encountered.
27
28
The SAX API – an Example
<priceList>
[parser calls startElement]
<coffee>
[parser calls startElement]
<name>Mocha Java</name>
[parser calls startElement, characters, and endElement]
<price>11.95</price>
[parser calls startElement, characters, and endElement]
</coffee>
[parser calls endElement]
</priceList>
[parser calls endElement]
• The default implementations of the methods that the parser calls do nothing
• You need to write a subclass implementing the appropriate methods to get
the functionality you want
• For example, suppose you want to get the price per pound for Mocha Java.
• You would write a class extending DefaultHandler (the default
implementation of ContentHandler) in which you write your own
implementations of the methods startElement and characters
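A handler for the Mocha Java lookup could look like the sketch below. It is only one possible design, and the class name PriceFinder and the method priceOf are hypothetical. Besides startElement and characters it also overrides endElement, because a parser may deliver an element's text in several characters() calls; the text is therefore buffered and only examined once the end tag arrives.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: finds the price of one named coffee in a priceList.
class PriceFinder extends DefaultHandler {
    private final String wantedName;
    private final StringBuilder text = new StringBuilder();
    private boolean nameMatched = false;
    private String price = null;

    PriceFinder(String wantedName) { this.wantedName = wantedName; }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        text.setLength(0);                            // start a fresh text buffer
        if (qName.equals("coffee")) nameMatched = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);               // text may arrive in pieces
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("name")) {
            nameMatched = value.equals(wantedName);
        } else if (qName.equals("price") && nameMatched && price == null) {
            price = value;
        }
        text.setLength(0);
    }

    // Returns the price text for the given coffee, or null if it is not listed.
    static String priceOf(String xml, String coffeeName) throws Exception {
        PriceFinder finder = new PriceFinder(coffeeName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), finder);
        return finder.price;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<priceList><coffee><name>Mocha Java</name>"
                   + "<price>11.95</price></coffee></priceList>";
        System.out.println(priceOf(xml, "Mocha Java"));
    }
}
```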
29
The SAX API – an Example (cont.)
• Your code has three tasks.
– Scan the command line for the name (or URI) of an XML file.
– Create a parser object.
– Tell the parser object to parse the XML file named on the command line, and
tell it to send your code all of the SAX events it generates.
• Step I: Scan the command line
– Look for an argument. If there isn't one, print an error message and exit.
– Otherwise, assume that the first argument is the name or URI of an XML file
public static void main(String argv[]) {
    if (argv.length == 0 || (argv.length == 1 && argv[0].equals("-help"))) {
        // Print an error message and exit (the message text below is illustrative)
        System.err.println("Usage: java PrintOutline uri");
        System.exit(1);
    }
    PrintOutline s1 = new PrintOutline();
    s1.parseURI(argv[0]);
}
30
The SAX API – an Example (cont.)
• Step II: Create a parser object
– To create a parser object, use JAXP's SAXParserFactory API to create a
SAXParser
public void parseURI(String uri) {
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
...
31
The SAX API – an Example (cont.)
• Step III: Parse the file and handle any events
– Now that we've created our parser object, we need to have it parse the file. That's done with the parse() method
– Notice that the parse() method takes two arguments. The first is the URI of the XML document, while the second is an object that implements the SAX event handlers
public void parseURI(String uri) {
    try {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        sp.parse(uri, this);
    } catch (Exception e) {
        System.err.println(e);
    }
}
32
The SAX API – an Example (cont.)
– In the case of PrintOutline, you're extending the SAX DefaultHandler class:
public class PrintOutline extends DefaultHandler{
…….
}
– DefaultHandler implements all of the event handlers. These implementations do nothing, so the class shields you from having to implement every handler: all your code has to do is implement handlers for the events you care about.
– Note: The exception handling above is sloppy; as an exercise for the reader, feel free to handle specific exceptions, such as SAXException or java.io.IOException.
33
The SAX API – an Example (cont.)
• Step IV: Implementing event handlers
– startDocument() event handler
– Simply write out a basic XML declaration, regardless of whether one was in the original XML document or not.
– Currently the base SAX API doesn't return the details of the XML declaration
public void startDocument() {
    System.out.println("<?xml version=\"1.0\"?>");
}
The SAX API – an Example (cont.)
• Next, here's what you do for startElement():
– Print the name of the element and its attributes
– Namespace URI in braces before the element's local name
– rawName contains the raw XML 1.0 name, which is used when the element doesn't have a namespace URI
public void startElement(String namespaceURI, String localName, String rawName, Attributes attrs)
{
    System.out.print("<");
    System.out.print(rawName);
    if (attrs != null) {
        int len = attrs.getLength();
        for (int i = 0; i < len; i++) {
            System.out.print(" ");
            System.out.print(attrs.getQName(i));
            System.out.print("=\"");
            System.out.print(attrs.getValue(i));
            System.out.print("\"");
        }
    }
    System.out.print(">");
}
34
35
The SAX API – an Example (cont.)
• More event handling
– characters(): since you're printing the XML document to the console, you simply print the portion of the character array that relates to this event
public void characters(char ch[ ], int start, int length)
{
    System.out.print(new String(ch, start, length));
}
– endElement(): simply write out the end tag
– endDocument(): included just for completeness; here it only prints a closing marker.
public void endElement(String namespaceURI, String localName, String rawName) {
    System.out.print("</");
    System.out.print(rawName);
    System.out.print(">");
}
public void endDocument() {
    System.out.println("End of Document");
}
The SAX API – an Example (cont.)
• Step V: Error handling:
– SAX defines the ErrorHandler interface;
– Implemented by DefaultHandler;
– contains three methods: warning, error, and fatalError (defined by the XML
specification )
• warning(): Issued in response to a warning
• error(): Issued in response to an error condition.
• fatalError(): Issued in response to a fatal error
public void warning(SAXParseException ex) {
    System.err.println("[Warning] "+ getLocationString(ex)+": "+ ex.getMessage());
}
public void error(SAXParseException ex) {
    System.err.println("[Error] "+ getLocationString(ex)+": "+ ex.getMessage());
}
public void fatalError(SAXParseException ex) throws SAXException {
    System.err.println("[Fatal Error] "+ getLocationString(ex)+": "+ ex.getMessage());
    throw ex;
}
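The handlers above call getLocationString(), which the slides do not show. The following is a hypothetical version, written only to illustrate what such a helper might return; the class name LocationStrings and the "file:line:column" format are assumptions. It relies on the getSystemId(), getLineNumber(), and getColumnNumber() accessors of SAXParseException.

```java
import org.xml.sax.SAXParseException;

// Hypothetical home for the helper; in PrintOutline it would be a private method.
class LocationStrings {

    // Builds a "file:line:column" string from the position stored in the exception.
    static String getLocationString(SAXParseException ex) {
        StringBuilder sb = new StringBuilder();
        String systemId = ex.getSystemId();
        if (systemId != null) {
            // Keep only the file name part of the URI.
            int slash = systemId.lastIndexOf('/');
            sb.append(slash >= 0 ? systemId.substring(slash + 1) : systemId);
        }
        sb.append(':').append(ex.getLineNumber());
        sb.append(':').append(ex.getColumnNumber());
        return sb.toString();
    }

    public static void main(String[] args) {
        SAXParseException ex = new SAXParseException(
                "example message", null, "file:///tmp/doc.xml", 3, 7);
        System.out.println(getLocationString(ex));
    }
}
```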
36
37
Example: Tree Diagram
• Java application
– Parse XML document with SAX-based parser
– Output document data as tree diagram
– extends org.xml.sax.HandlerBase
• implements interface EntityResolver
– Handles external entities
• implements interface DTDHandler
– Handles notations and unparsed entities
• implements interface DocumentHandler
– Handles parsing events
• implements interface ErrorHandler
– Handles errors
Fig. 9.3 Application to create a tree diagram for an XML document.
// Fig. 9.3 : Tree.java
// Using the SAX Parser to generate a tree diagram.

// imports specify the location of the classes needed by the application
import java.io.*;
import org.xml.sax.*; // for HandlerBase class
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Tree extends HandlerBase {
   private int indent = 0; // indentation counter

   // returns the spaces needed for indenting (assists in formatting)
   private String spacer( int count )
   {
      String temp = "";

      for ( int i = 0; i < count; i++ )
         temp += " ";

      return temp;
   }

   // method called before parsing
   // it provides the document location
   // (overridden to output the parsed document's URL)
   public void setDocumentLocator( Locator loc )
   {
      System.out.println( "URL: " + loc.getSystemId() );
   }

   // method called at the beginning of a document
   // (overridden method called when root node encountered)
   public void startDocument() throws SAXException
   {
      System.out.println( "[ document root ]" );
   }

   // method called at the end of the document
   public void endDocument() throws SAXException
   {
      System.out.println( "[ document end ]" );
   }

   // method called at the start tag of an element
   public void startElement( String name,
      AttributeList attributes ) throws SAXException
   {
      System.out.println( spacer( indent++ ) +
                          "+-[ element : " + name + " ]");

      // output each attribute's name and value (if any)
      if ( attributes != null )

         for ( int i = 0; i < attributes.getLength(); i++ )
            System.out.println( spacer( indent ) +
               "+-[ attribute : " + attributes.getName( i ) +
               " ] \"" + attributes.getValue( i ) + "\"" );
   }

   // method called at the end tag of an element
   public void endElement( String name ) throws SAXException
   {
      indent--;
   }

   // method called when a processing instruction is found
   public void processingInstruction( String target,
      String value ) throws SAXException
   {
      System.out.println( spacer( indent ) +
         "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
   }

   // method called when characters are found
   public void characters( char buffer[], int offset,
      int length ) throws SAXException
   {
      if ( length > 0 ) {
         String temp = new String( buffer, offset, length );

         System.out.println( spacer( indent ) +
                             "+-[ text ] \"" + temp + "\"" );
      }
   }

   // method called when ignorable whitespace is found
   public void ignorableWhitespace( char buffer[],
      int offset, int length )
   {
      if ( length > 0 ) {
         System.out.println( spacer( indent ) + "+-[ ignorable ]" );
      }
   }

   // method called on a non-fatal (validation) error
   public void error( SAXParseException spe )
      throws SAXParseException
   {
      // treat non-fatal errors as fatal errors
      throw spe;
   }

   // method called on a parsing warning
   // (a problem is detected but not considered an error)
   public void warning( SAXParseException spe )
      throws SAXParseException
   {
      System.err.println( "Warning: " + spe.getMessage() );
   }

   // main method (starts the application)
   public static void main( String args[] )
   {
      boolean validate = false;

      // allow command-line arguments (if we want to validate document)
      if ( args.length != 2 ) {
         System.err.println( "Usage: java Tree [validate] " +
                             "[filename]\n" );
         System.err.println( "Options:" );
         System.err.println( "   validate [yes|no] : " +
                             "DTD validation" );
         System.exit( 1 );
      }

      if ( args[ 0 ].equals( "yes" ) )
         validate = true;

      // SAXParserFactory can instantiate SAX-based parser
      SAXParserFactory saxFactory =
         SAXParserFactory.newInstance();

      saxFactory.setValidating( validate );

      try {
         // instantiate SAX-based parser
         SAXParser saxParser = saxFactory.newSAXParser();
         saxParser.parse( new File( args[ 1 ] ), new Tree() );
      }
      // handles errors (if any)
      catch ( SAXParseException spe ) {
         System.err.println( "Parse Error: " + spe.getMessage() );
      }
      catch ( SAXException se ) {
         se.printStackTrace();
      }
      catch ( ParserConfigurationException pce ) {
         pce.printStackTrace();
      }
      catch ( IOException ioe ) {
         ioe.printStackTrace();
      }

      System.exit( 0 );
   }
}
Fig. 9.4 XML document spacing1.xml.
1  <?xml version = "1.0"?>
2
3  <!-- Fig. 9.4 : spacing1.xml -->
4  <!-- Whitespaces in nonvalidating parsing -->
5  <!-- XML document without DTD -->
6
7  <test name = " spacing 1 ">
8     <example><object>World</object></example>
9  </test>
• XML document does not reference a DTD
• XML document with elements test, example and object
• Root element test contains attribute name with value “ spacing 1 ”

URL: file:C:/Tree/spacing1.xml
[ document root ]
+-[ element : test ]
 +-[ attribute : name ] " spacing 1 "
 +-[ text ] "
"
 +-[ text ] "
"
 +-[ element : example ]
  +-[ element : object ]
   +-[ text ] "World"
 +-[ text ] "
"
[ document end ]
• Note that whitespace is preserved: attribute value (line 7), line feed (end of line 7), indentation (line 8) and line feed (end of line 8)
Fig. 9.5 XML document spacing2.xml.
1  <?xml version = "1.0"?>
2
3  <!-- Fig. 9.5 : spacing2.xml -->
4  <!-- Whitespace and nonvalidated parsing -->
5  <!-- XML document with DTD -->
6
7  <!DOCTYPE test [
8  <!ELEMENT test (example)>
9  <!ATTLIST test name CDATA #IMPLIED>
10 <!ELEMENT element (object*)>
11 <!ELEMENT object (#PCDATA)>
12 ]>
13
14 <test name = " spacing 2 ">
15    <example><object>World</object></example>
16 </test>
• DTD checks document’s characters, so any “removable” whitespace is ignorable

URL: file:C:/Tree/spacing2.xml
[ document root ]
+-[ element : test ]
 +-[ attribute : name ] " spacing 2 "
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
  +-[ element : object ]
   +-[ text ] "World"
 +-[ ignorable ]
[ document end ]
• Line feed at line 14, spaces at beginning of line 15 and line feed at line 15 are ignored
Fig. 9.6 Well-formed XML document.
<?xml version = "1.0"?>

<!-- Fig. 9.6 : notvalid.xml -->
<!-- Validation and non-validation -->

<!DOCTYPE test [
<!ELEMENT test (example)>
<!ELEMENT example (#PCDATA)>
]>

<test>
   <?test message?>
   <example><item><![CDATA[Hello & Welcome!]]></item></example>
</test>
• Invalid document because element example cannot contain element item

Output with validation disabled:
URL: file:C:/Tree/notvalid.xml
[ document root ]
+-[ element : test ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ proc-inst : test ] "message"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
  +-[ element : item ]
   +-[ text ] "Hello & Welcome!"
 +-[ ignorable ]
[ document end ]
• Validation disabled, so document parses successfully
• Parser does not process text in CDATA section and returns character data

Output with validation enabled:
URL: file:C:/Tree/notvalid.xml
[ document root ]
+-[ element : test ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ proc-inst : test ] "message"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
Parse Error: Element "example" does not allow "item"
• Parsing terminates when fatal error occurs at element item
Fig. 9.7 Checking an XML document without a DTD for validity.
<?xml version = "1.0"?>

<!-- Fig. 9.7 : valid.xml -->
<!-- DTD-less document -->

<test>
   <example>Hello &amp; Welcome!</example>
</test>

First output:
URL: file:C:/Tree/valid.xml
[ document root ]
+-[ element : test ]
 +-[ text ] "
"
 +-[ text ] "
"
 +-[ element : example ]
  +-[ text ] "Hello "
  +-[ text ] "&"
  +-[ text ] " Welcome!"
 +-[ text ] "
"
[ document end ]
• Validation disabled in first output, so document parses successfully

Second output:
URL: file:C:/Tree/valid.xml
[ document root ]
Warning: Valid documents must have a <!DOCTYPE declaration.
Parse Error: Element type "test" is not declared.
• Validation enabled in second output, and parsing fails because DTD does not exist
49
Example: Tree Diagram (Summary)
• SAX 1.0 supported!
• When compiling, the following messages appear:
“Tree.java uses or overrides a deprecated API”
“Recompile with -deprecation for details”
• After compiling, 3 warnings (class has been deprecated) were issued:
1. HandlerBase should be replaced by DefaultHandler
2. & 3. AttributeList should be replaced by Attributes
• Better replace SAX 1.0 with SAX 2.0
• Problem with Xerces vs. JAXP
50
SAX 2.0
• SAX 2.0
– Recently released
– We have been using JAXP
– Xerces parser (Apache) supports SAX 2.0
51
SAX 2.0 (cont.)
• SAX 2.0 major changes
– Class HandlerBase replaced with DefaultHandler
– AttributeList replaced with Attributes
– Element and attribute processing support namespaces
– Loading and parsing processes have changed
• Alternative methods can be applied
– Methods for retrieving and setting parser properties
• e.g., whether parser performs validation
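In SAX 2.0, parser settings such as validation are read and written through named features on the XMLReader. The sketch below assumes the JAXP parser bundled with the JDK; the class name FeatureDemo and the method newReader are hypothetical, while the two feature URIs are the standard SAX 2.0 identifiers for validation and namespace processing.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical class showing how SAX 2.0 parser features are set and read.
class FeatureDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Creates a reader with namespace processing on and validation as requested.
    static XMLReader newReader(boolean validate) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        reader.setFeature(VALIDATION, validate);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader(true);
        System.out.println("validation: " + reader.getFeature(VALIDATION));
        System.out.println("namespaces: " + reader.getFeature(NAMESPACES));
    }
}
```

This plays the same role as the saxFactory.setValidating( validate ) call in the SAX 1.0 Tree example, but at the level of the SAX 2.0 reader rather than the JAXP factory.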
Fig. 9.10 Java application that indents an XML document.
// Fig. 9.10 : printXML.java
// Using the SAX Parser to indent an XML document.

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

// class HandlerBase is replaced with class DefaultHandler
public class PrintXML extends DefaultHandler {
   private int indent = 0; // indention counter

   // returns the spaces needed for indenting
   private String spacer( int count )
   {
      String temp = "";

      for ( int i = 0; i < count; i++ )
         temp += " ";

      return temp;
   }

   // method called at the beginning of a document
   // (provides same service as that of SAX 1.0)
   public void startDocument() throws SAXException
   {
      System.out.println( "<?xml version = \"1.0\"?>" );
   }
53
Fig. 9.10 Java application that indents an XML document. (Part 2)
   // method called at the end of the document
   public void endDocument() throws SAXException
   {
      System.out.println( "---[ document end ]---" );
   }

   // method called at the start tag of an element
   public void startElement( String uri, String eleName,
      String raw, Attributes attributes ) throws SAXException
   {
      System.out.print( spacer( indent ) + "<" + raw );

      if ( attributes != null )

         for ( int i = 0; i < attributes.getLength(); i++ )
            System.out.print( " "+ attributes.getLocalName( i ) +
               " = " + "\"" +
               attributes.getValue( i ) + "\"" );

      System.out.println( ">" );
      indent += 3;
   }

   // method called at the end tag of an element
   public void endElement( String uri, String eleName,
      String raw ) throws SAXException
   {
      indent -= 3;
      System.out.println( spacer(indent) + "</" + raw + ">");
   }

Notes:
• endDocument provides the same service as in SAX 1.0
• Method startElement now has four arguments (namespace URI, element name,
qualified element name and element attributes)
• Attributes are now stored in an Attributes object
• Method endElement now has three arguments (namespace URI, element name and
qualified element name)
54
Fig. 9.10 Java application that indents an XML document. (Part 3)
   // method called when characters are found
   public void characters( char buffer[], int offset,
      int length ) throws SAXException
   {
      if ( length > 0 ) {
         String temp = new String( buffer, offset, length );

         if ( !temp.trim().equals( "" ) )
            System.out.println( spacer(indent) + temp.trim() );
      }
   }

   // method called when a processing instruction is found
   public void processingInstruction( String target,
      String value ) throws SAXException
   {
      System.out.println( spacer( indent ) +
         "<?" + target + " " + value + "?>");
   }

   // main method
   public static void main( String args[] )
   {

Notes:
• The methods on this slide provide the same service as in SAX 1.0
55
Fig. 9.10 Java application that indents an XML document. (Part 4)
      try {
         XMLReader saxParser = ( XMLReader ) Class.forName(
            "org.apache.xerces.parsers.SAXParser" ).newInstance();

         saxParser.setContentHandler( new PrintXML() );
         FileReader reader = new FileReader( args[ 0 ] );
         saxParser.parse( new InputSource( reader ) );
      }
      catch ( SAXParseException spe ) {
         System.err.println( "Parse Error: " + spe.getMessage() );
      }
      catch ( SAXException se ) {
         se.printStackTrace();
      }
      catch ( Exception e ) {
         e.printStackTrace();
      }

      System.exit( 0 );
   }
}

Notes:
• Class.forName( ... ).newInstance() creates the Xerces SAX-based parser
• The SAX-based parser parses the InputSource
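To obtain the parser through JAXP instead of naming the Xerces class directly,
replace the try block in main (listing lines 86-92: create the parser, register
the handler, parse the input) with the following code. The catch clauses stay
unchanged.
      XMLReader xmlReader = null;
      try
      {
         SAXParserFactory spfactory = SAXParserFactory.newInstance();
         SAXParser saxParser = spfactory.newSAXParser();
         xmlReader = saxParser.getXMLReader();
         xmlReader.setContentHandler( new PrintXML() );
         // DefaultHandler also implements ErrorHandler
         xmlReader.setErrorHandler( new PrintXML() );
         // main's parameter is named args, not argv
         FileReader reader = new FileReader( args[ 0 ] );
         xmlReader.parse( new InputSource( reader ) );
      }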
56
Fig. 9.11 Sample execution of printXML.java
<?xml version = "1.0"?>
<!-- Fig. 9.11 : test.xml -->

<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>

<test>
   <example value = "100">Hello and Welcome!</example>

   <a>
      <b>12345</b>
   </a>
</test>

Note: the <?xml:stylesheet ... ?> line is a processing instruction that links to a stylesheet.

Output
<?xml version = "1.0"?>
<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>
<test>
   <example value = "100">
      Hello and Welcome!
   </example>
   <a>
      <b>
         12345
      </b>
   </a>
</test>
---[ document end ]---
57
Summary
• SAX is a faster, more lightweight way to read and manipulate XML data than
the Document Object Model (DOM).
• SAX is an event-based processor that lets you deal with elements, attributes,
and other data as they show up in the original document (streaming events).
• Because of this architecture, SAX is a read-only system, but that doesn't
prevent you from using the data: make a copy and process it!
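The "make a copy and process it" idea can be sketched as a handler that stores the text of each element while the events stream by. This is only an illustration under simple assumptions: the class name TextCollector and the helper collect are hypothetical, and only text sitting directly before an end tag is kept, so mixed content is not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: copy element text into a list during parsing so it can be processed later.
// TextCollector and collect are illustrative names, not part of the lecture code.
class TextCollector extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        current.setLength(0);                 // forget text seen before this start tag
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        current.append(buffer, offset, length);   // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        String text = current.toString().trim();
        if (!text.isEmpty())
            items.add(qName + "=" + text);    // keep our own copy of the data
        current.setLength(0);
    }

    // Parses the given XML text and returns the collected "element=text" entries.
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<test><example value=\"100\">Hello and Welcome!</example>"
                   + "<a><b>12345</b></a></test>";
        System.out.println(collect(xml));
    }
}
```

Once parsing has finished, the returned list is ordinary application data and can be sorted, filtered, or written out without touching the parser again.").equals(java.util.List.of("example=Hello and Welcome!", "b=12345"))) throw new AssertionError("unexpected items for sample document");
</test>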
58
Summary (cont.)
• Resources
– For a basic grounding in XML, read through the "Introduction to XML" tutorial (developerWorks, August 2002).
– See the official SAX 2.0 page (http://www.saxproject.org).
– Learn to use a SAX filter to manipulate data (developerWorks, October 2001).
– Read about using SAX filters for flexible processing (developerWorks, March 2003).
– Find out how to build SAX-like apps in PHP (developerWorks, March 2003).
– Learn how to set up a SAX parser (developerWorks, July 2003).
– Learn more about validation and the SAX ErrorHandler interface (developerWorks, June 2001).
– Understand how to stop a SAX parser when you have enough data (developerWorks, June 2002).
– Explore XSL transformations to and from a SAX stream (developerWorks, July 2002).
– Turn a SAX stream into a DOM or JDOM object with "Converting from SAX" (developerWorks, April 2001).
– Download the Java 2 SDK, Standard Edition version 1.4.2 (http://java.sun.com/j2se/1.4.2/download.html).
– SAX was developed by the members of the XML-DEV mailing list. Try the Java version, now a SourceForge project (http://sourceforge.net/project/showfiles.php?group_id=29449).
– Try SAX implementations available in other programming languages.
– Get IBM's XML-related tools such as the DB2 XML Extender, which provides a bridge between XML and relational systems. Visit the DB2 Developer Domain to learn more about DB2.
– Find out how you can become an IBM Certified Developer in XML and related technologies.
That’s it for today!
Have a nice and lovely spring holiday!
• Do not forget to check the web site for important
messages regarding the demo date of your personal
project.
60
getLocationString()
• The private method gives more details about the error.
• The SAXParseException class defines methods such as getLineNumber() and
getColumnNumber() to provide the line and column number where the error
occurred.
• getLocationString merely formats this information into a useful string.
• Putting this code into a separate method means you don't have to include this
code in every error handler.
private String getLocationString(SAXParseException ex)
{
   StringBuffer str = new StringBuffer();
   String systemId = ex.getSystemId();
   if (systemId != null){
      int index = systemId.lastIndexOf('/');
      if (index != -1)
         systemId = systemId.substring(index + 1);
      str.append(systemId);
   }
   str.append(':');
   str.append(ex.getLineNumber());
   str.append(':');
   str.append(ex.getColumnNumber());
   return str.toString();
}
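One way the helper might be shared by all three error callbacks is sketched here. The class name LocatingErrorHandler, the message prefixes, and the broken sample document are assumptions for illustration; warning, error, and fatalError are the standard ErrorHandler callbacks that DefaultHandler lets you override.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: every error callback reuses one location-formatting helper.
// LocatingErrorHandler is an illustrative name, not part of the lecture code.
class LocatingErrorHandler extends DefaultHandler {
    // Formats "file:line:column"; the file part is omitted when no system id is known.
    static String getLocationString(SAXParseException ex) {
        StringBuilder str = new StringBuilder();
        String systemId = ex.getSystemId();
        if (systemId != null) {
            int index = systemId.lastIndexOf('/');
            if (index != -1)
                systemId = systemId.substring(index + 1);
            str.append(systemId);
        }
        str.append(':').append(ex.getLineNumber());
        str.append(':').append(ex.getColumnNumber());
        return str.toString();
    }

    @Override
    public void warning(SAXParseException ex) {
        System.err.println("[Warning] " + getLocationString(ex) + ": " + ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) {
        System.err.println("[Error] " + getLocationString(ex) + ": " + ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        System.err.println("[Fatal Error] " + getLocationString(ex) + ": " + ex.getMessage());
        throw ex;   // a fatal error ends the parse
    }

    public static void main(String[] args) throws Exception {
        String broken = "<test>\n  <a>\n</test>";   // <a> is never closed
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(broken)), new LocatingErrorHandler());
        } catch (SAXParseException ex) {
            System.out.println("Parsing stopped at " + getLocationString(ex));
        }
    }
}
```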
61
Processing Instruction
• Processing Instructions
• An XML file can also contain processing
instructions that give commands or information to
an application that is processing the XML data.
• Processing instructions have the following format:
<?target instructions?>
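In SAX, the two parts of this format arrive as the two arguments of the processingInstruction callback. The sketch below records each one; the class name PiLister and the sort-order target are made up for illustration. Note that the XML declaration itself is not reported as a processing instruction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: record every processing instruction as "target -> instructions".
// PiLister and the sort-order target are illustrative assumptions.
class PiLister extends DefaultHandler {
    private final List<String> found = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        found.add(target + " -> " + data);
    }

    // Parses the given XML text and returns the processing instructions seen.
    static List<String> list(String xml) throws Exception {
        PiLister handler = new PiLister();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><?sort-order ascending?><test/>";
        System.out.println(list(xml));
    }
}
```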
62
• At the most basic level:
– An application can directly output XML markup
– In the figure, this is indicated by the application working with a
character stream
– Simple? Not really: it must handle all the basic syntax rules (start-end tags,
attribute quoting, etc.), which makes it a good topic for the final project!
• Parsing and serialization:
– Parse the XML document first
– Construct a data structure describing the XML document
– Serialize by emitting XML markup from a data structure
– Use the API for the processing methods
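To see why writing markup directly to a character stream is less simple than it looks, here is a minimal sketch of the escaping an application has to do itself. The class TinyXmlWriter and its methods are hypothetical; a real serializer would also have to deal with encodings, name validity, and nesting.

```java
// Sketch: emit one element by hand, escaping text and attribute values.
// TinyXmlWriter is an illustrative name; this is not a complete serializer.
class TinyXmlWriter {
    // Replaces the characters that would otherwise break the markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Builds <name attrName="attrValue">text</name> with matching start and end tags.
    static String element(String name, String attrName, String attrValue, String text) {
        return "<" + name + " " + attrName + "=\"" + escape(attrValue) + "\">"
             + escape(text) + "</" + name + ">";
    }

    public static void main(String[] args) {
        // prints <example value="100">Hello &amp; Welcome!</example>
        System.out.println(element("example", "value", "100", "Hello & Welcome!"));
    }
}
```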