Transcript Chapter 8

An Introduction to SAX
Introduction to SAX:
a standard interface for event-based XML parsing
Cheng-Chia Chen
Transparency No. 1
What is SAX ?
SAX : Simple API for XML
 Started as community-driven project
xml-dev mailing list
 Originally designed as Java API
Others (C++, Python, Perl) are now
supported
SAX2
 Namespaces
 configurable features and properties
Transparency No. 2
SAX Features
 Event-driven
 You provide various event handlers
 Fast and lightweight
 Document does not have to be entirely in memory
 Sequential read access only
 Does not support modification of document
Transparency No. 3
SAX Processing Model
Transparency No. 4
What is an Event-Based Interface?
Two major types of XML APIs:
 Tree-based APIs ==> DOM
 compiles an XML document into an internal tree structure,
then allows an application to navigate that tree.
 Event-based APIs. ==> SAX
 reports parsing events (such as the start and end of
elements) directly to the application through callbacks,
 usually does not build an internal tree.
 The application implements handlers to deal with the
different events, much like handling events in a graphical
user interface.
 Comparison: For tree-based APIs
 useful for many applications
 require more system resources, especially if the document
is large.
Transparency No. 5
How an event-based API works
 Sample document:





<?xml version="1.0"?>
<doc>
<para>Hello, world!</para>
</doc>
An event-based interface will break down the structure
of this document into a sequence of SAX events:







start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
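The event sequence above can be reproduced with a small handler that records each callback. This is only a sketch: the class name SaxEventTrace is hypothetical, the parser is obtained through JAXP's SAXParserFactory (an assumption, chosen because XMLReaderFactory is marked deprecated in newer JDKs), and whitespace-only character data between elements is skipped so that the trace matches the list on the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    /** Parses the XML text and returns one line per SAX event. */
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is reported
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override public void startDocument() { events.add("start document"); }
            @Override public void endDocument() { events.add("end document"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("start element: " + localName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("end element: " + localName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text (the line breaks between elements).
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) events.add("characters: " + text);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n<doc>\n<para>Hello, world!</para>\n</doc>";
        trace(xml).forEach(System.out::println);
    }
}
```

A parser is free to deliver character data in several chunks, so a production handler would normally accumulate text rather than treat each characters call as a complete string.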
Transparency No. 6
Quick Start for SAX2 Application Writers
1. Make sure you have the required libraries:
1. the SAX2 interfaces and classes, and
2. an XML parser that supports SAX2 (both are already included in JDK 5 and 6).
 Xerces => org.apache.xerces.parsers.SAXParser or
com.sun.org.apache.xerces.internal.parsers.SAXParser
2. Get the parser via XMLReaderFactory#createXMLReader():
 XMLReader parser = XMLReaderFactory.createXMLReader();
3. Create event handlers to receive information about the document.
 The most important one is the ContentHandler, which receives events
for the start and end of elements, character data, processing
instructions, and other basic XML structure.
 You can simply subclass the built-in adapter class DefaultHandler and
implement only the methods that you need.
Transparency No. 7
Example: (MyHandler.java)
 prints a message each time an element starts or ends:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import static java.lang.System.out;

public class MyHandler extends DefaultHandler {
    // Called for each start tag; localName is valid because Namespace
    // processing is on by default.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        out.println("Start element: " + localName);
    }

    // Called for each end tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        out.println("End element: " + localName);
    }
}
Transparency No. 8
The main program (SAXApp.java)



import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class SAXApp {
    // To use a specific parser, pass its class name to createXMLReader, e.g.
    // static final String parserClass = "org.apache.xerces.parsers.SAXParser";

    public static void main(String[] args) throws Exception {
        XMLReader xr = XMLReaderFactory.createXMLReader(/* parserClass */);
        DefaultHandler handler = new MyHandler();
        xr.setContentHandler(handler);
        // Each command-line argument is the system identifier (URL) of a document.
        for (int i = 0; i < args.length; i++) {
            xr.parse(args[i]);
        }
    }
}
Transparency No. 9
The input
 the input XML document (roses.xml):







<?xml version="1.0"?>
<poem>
<line>Roses are red,</line>
<line>Violets are blue.</line>
<line>Sugar is sweet,</line>
<line>and I love you.</line>
</poem>
 To parse this with your SAXApp application, you would
supply the absolute URL of the document on the
command line:
java SAXApp file://localhost/tmp/roses.xml or
java SAXApp file:///tmp/roses.xml
Transparency No. 10
The output
 The output should be as follows:
Start element: poem
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
End element: poem
Transparency No. 11
[Figure: SAX architecture. The application writer supplies the SAX driver's parser class name; the driver writer supplies the implementations of Parser, AttributeList, and Locator.]
Transparency No. 12
[Figure: SAX 2 architecture. The application writer supplies the SAX driver's parser class name (an XMLReader); the driver writer supplies the implementations of the parser (XMLReader), Attributes, and Locator.]
Transparency No. 13
SAX 2.0: Java Road Map
 The SAX Java distribution contains
 17 core classes/interfaces,
 10 helper classes
 2 extension interfaces + 6 extension implementations
 For application writers
 7 interfaces available, but most XML applications will need
only one or two of them.
Transparency No. 14
SAX classes and interfaces
 They fall into six groups:
1. Interfaces implemented by the parser:
 XMLReader, Attributes (required), and Locator (optional)
2. Interfaces implemented by the application:
 ContentHandler, ErrorHandler, DTDHandler, and
 EntityResolver
 (all optional; ContentHandler is the most important one for typical
XML applications)
 XMLFilter: for cascaded applications
 DeclHandler, LexicalHandler: for additional DTD/lexical events
3. Standard SAX classes supplied by SAX2:
 InputSource,
 SAXException,
 SAXParseException,
 SAXNotSupportedException, SAXNotRecognizedException
Transparency No. 15
SAX classes and interfaces
4. Helper classes in the org.xml.sax.helpers package:
 Default implementations:
 AttributesImpl, LocatorImpl, XMLFilterImpl
 Namespace support:
 NamespaceSupport
 Factory classes:
 XMLReaderFactory
5. Legacy SAX 1.0 classes:
Parser, ParserFactory, HandlerBase, AttributeList,
AttributeListImpl, DocumentHandler.
6. Conversion between the SAX 1.0 Parser and the SAX 2.0 XMLReader:
 ParserAdapter, XMLReaderAdapter
Transparency No. 16
Interfaces for Parser Writers (org.xml.sax package)
 A SAX-conformant XML parser needs to implement only
two or three simple interfaces:
1. XMLReader
 the main interface to a SAX parser:
 allows the user to register handlers for callbacks, to query and
set features and properties, and to start an XML parse.
2. Attributes
 allows users to iterate through an attribute list.
 a convenience implementation is available in
AttributesImpl.
3. Locator
 allows users to find the location of the current event in the
XML source document.
Transparency No. 17
Interfaces for Application Writers (org.xml.sax package)
 A SAX application may implement any or none of the
following interfaces, as required.
 may need only ContentHandler and possibly ErrorHandler.
 can implement all of these interfaces in a single class.
1. ContentHandler
 receive notification of basic document-related events like
the start and end of elements.
 applications use most often
 in many cases, it is the only one needed.
2. ErrorHandler
 used for special error handling.
Transparency No. 18
Interfaces for Application Writers (cont’d)
3. DTDHandler
 to receive notification of the NOTATION and unparsed
ENTITY declarations.
4. EntityResolver
 redirection of URIs in documents (or other types of
custom handling).
5. DeclHandler:
 to receive notification of element and attribute-list
declarations in the DTD.
6. LexicalHandler
 to receive notification of markup boundary events:
 comments, CDATA sections (begin and end),
 entity expansion (begin and end), …
7. XMLFilter:
 for cascading applications.
Transparency No. 19
Standard SAX Classes (org.xml.sax package)
1. InputSource
 Input for a parser.
 wrap information for a single input, including a public identifier,
system identifier, byte stream, and character stream (as appropriate).
 may be instantiated by EntityResolvers.
2. SAXException :
 represents a general SAX exception.
 SAXParseException : represents a SAX exception tied to a specific
point in an XML source document.
 SAXNotSupportedException, SAXNotRecognizedException
3. DefaultHandler (in the org.xml.sax.helpers package)
 default implementations for ContentHandler, ErrorHandler,
DTDHandler, and EntityResolver.
 users can subclass this to simplify handler writing.
Transparency No. 20
Helper Classes (org.xml.sax.helpers package)
 provided simply as a convenience for Java programmers.
1. XMLReaderFactory
 used to load SAX parsers dynamically at run time, based
on the class name.
2. AttributesImpl
 default implementation of Attributes.
 can be used to make a copy of an Attributes
3. LocatorImpl
 used to make a persistent snapshot of a Locator's values
at a specific point in the parse.
4. XMLFilterImpl
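XMLFilterImpl is the usual base class for cascaded applications: a filter sits between the parser and the application's handler and may change events as they pass through. The sketch below is illustrative only; UpperCaseFilterDemo and UpperCaseFilter are hypothetical names, and the parser is obtained through JAXP's SAXParserFactory as an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class UpperCaseFilterDemo {
    /** Filter that upper-cases element names before passing the events on. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, localName.toUpperCase(), qName.toUpperCase(), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, localName.toUpperCase(), qName.toUpperCase());
        }
    }

    /** Returns the element names as seen by a handler placed behind the filter. */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        XMLReader filter = new UpperCaseFilter(parser); // the filter wraps the parser
        List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                names.add(localName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml))); // parse through the filter
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<poem><line>Roses are red,</line></poem>"));
    }
}
```

Because a filter is itself an XMLReader, several filters can be chained by passing one as the parent of the next.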
Transparency No. 21
SAX2: Features and Properties
 standard methods to query and set features and properties in an
XMLReader.
 Features are boolean properties.
 can request an XMLReader
 to validate (or not to validate) a document, or
 to internalize (or not to internalize) all names,
 Use getFeature, setFeature, getProperty, and setProperty methods
to get/set feature/property of an XMLReader:
 EX: // check if a parser is doing validation!
try {
    if (xmlReader.getFeature("http://xml.org/sax/features/validation")) {
        out.println("Parser is validating.");
    } else {
        out.println("Parser is not validating.");
    }
} catch (SAXException e) {
    out.println("Parser may or may not be validating.");
}
Transparency No. 22
SAX2 features
 See SAX2 standard feature flags for more
 Anyone can define his own features (by designating a unique uri) .
 A feature may be read-only or read/write, and it may be modifiable only
when parsing, or only when not parsing.
 http://xml.org/sax/features/namespaces
 true: Perform Namespace processing
(URI + local part + prefix-mapping events are reported).
 false: Optionally do not perform Namespace processing (implies
namespace-prefixes).
 access: (parsing) read-only; (not parsing) read/write
 …/namespace-prefixes // qName + xmlns* attributes reported
 true: Report qualified names (pref:local) and namespace declarations
(xmlns*).
 false: no Namespace declarations reported, and optionally no
qualified names reported.
 access: (parsing) read-only; (not parsing) read/write
Transparency No. 23
standard Features supplied by SAX2
 …/string-interning
 true => All element names, prefixes, attribute names, Namespace URIs,
and local names are internalized using java.lang.String#intern().
 access: (parsing) read-only; (not parsing) read/write
 …/validation
 true => Report all validation errors (implies external-general-entities
and external-parameter-entities).
 access: (parsing) read-only; (not parsing) read/write
 …/external-general-entities
 true => Include all external general (text) entities.
 access: (parsing) read-only; (not parsing) read/write
 .../external-parameter-entities
 true: Include all external parameter entities, including the external DTD
subset.
 false: Do not include any external parameter entities, even the external
DTD subset.
 access: (parsing) read-only; (not parsing) read/write
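The standard feature flags can be queried and changed as sketched below. The class name FeatureProbe and its helper methods are hypothetical, and the reader comes from JAXP's SAXParserFactory with namespace awareness switched on (an assumption of this sketch); the values printed for the other features depend on the parser in use.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    static final String PREFIX = "http://xml.org/sax/features/";

    /** Creates a namespace-aware XMLReader through JAXP. */
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Returns the feature value as text, or the reason it could not be read. */
    static String query(XMLReader reader, String name) {
        try {
            return String.valueOf(reader.getFeature(PREFIX + name));
        } catch (SAXNotRecognizedException e) {
            return "not recognized";   // the parser does not know the identifier
        } catch (SAXNotSupportedException e) {
            return "not supported";    // known, but cannot be read right now
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        String[] names = {
            "namespaces", "namespace-prefixes", "string-interning", "validation",
            "external-general-entities", "external-parameter-entities"
        };
        for (String name : names) {
            System.out.println(name + " = " + query(reader, name));
        }
        // Features are read/write only while no parse is in progress.
        reader.setFeature(PREFIX + "namespace-prefixes", true);
        System.out.println("namespace-prefixes is now "
                + query(reader, "namespace-prefixes"));
    }
}
```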
Transparency No. 24
SAX2 Properties
 See standard SAX2 Properties for more
 http://xml.org/sax/properties/lexical-handler
 data type: org.xml.sax.ext.LexicalHandler
 description: The registered lexical handler. access: read/write
 …/declaration-handler
 data type: org.xml.sax.ext.DeclHandler
 description: The registered Declaration handler. access: read/write
 …/document-xml-version
 data type: java.lang.String; the XML version, "1.0" or "1.1"
 …/dom-node
 data type: org.w3c.dom.Node
 description: the current DOM node being visited, if this is a DOM tree
walker
 access: (parsing) read-only; (not parsing) read/write
 …/xml-string // not supported by Xerces
 data type: java.lang.String
 description: The string source for the current event.
 access: read-only
Transparency No. 25
SAX2 Namespace Support
 standardized Namespace support
 essential for higher-level standards like XSL, XML
Schemas, RDF, and XLink.
 Namespace processing affects only element and
attribute names.
 With Namespace processing:
name = [URI] + localName (must not contain ':'),
and qName may or may not be valid
 Without Namespace processing:
name = qName (the qualified name may contain ':')
 SAX2
 supports either of these views, or both simultaneously.
Transparency No. 26
Sax2 namespace support
 affects the ContentHandler and Attributes interfaces.
 In SAX2, the startElement and endElement callbacks in a content
handler look like this:
public void startElement (String uri, String localName,
String qName, Attributes atts)throws SAXException;
public void endElement (String uri, String localName,
String qName) throws SAXException;
 By default, an XML reader will report a Namespace URI and a local
name for every element, in both the start and end handler.
 Example:
<html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
 uri = "http://www.w3.org/1999/xhtml"
 localName = "hr"
 qName = "html:hr" or "", depending on whether the namespace-prefixes
feature is set
Transparency No. 27
startPrefixMapping, endPrefixMapping
 SAX2 also reports the scope of Namespace declarations,
so that applications can resolve prefixes in attribute
values or character data if necessary.
public void startPrefixMapping (String prefix,
String uri)
throws SAXException;
public void endPrefixMapping (String prefix)
throws SAXException;
Ex: Before the start-element event, the XML reader would call:
startPrefixMapping("html", "http://www.w3.org/1999/xhtml")
After the end-element event, the XML reader would call:
endPrefixMapping("html")
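The ordering of prefix-mapping events around an element can be observed with a handler like the following. This is a minimal sketch: PrefixMappingDemo is a hypothetical name, and the reader is created through JAXP's SAXParserFactory with namespace awareness enabled (an assumption).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PrefixMappingDemo {
    /** Records prefix-mapping and element events in the order they arrive. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix mappings need Namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override public void startPrefixMapping(String prefix, String uri) {
                log.add("startPrefixMapping(" + prefix + ", " + uri + ")");
            }
            @Override public void endPrefixMapping(String prefix) {
                log.add("endPrefixMapping(" + prefix + ")");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("startElement {" + uri + "}" + localName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement {" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html:hr xmlns:html=\"http://www.w3.org/1999/xhtml\"/>";
        events(xml).forEach(System.out::println);
    }
}
```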
Transparency No. 28
Configuring Namespace Support
 "http://xml.org/sax/features/namespaces" feature
 true [default] =>
Namespace URIs + local names are valid, and
start/endPrefixMapping events are reported.
 "http://xml.org/sax/features/namespace-prefixes" feature
 true =>
prefixed names (qName) are valid, and
Namespace declarations (xmlns* attributes) are reported
in the attributes.
 false [default] => qualified prefixed names (qName) may
optionally be reported (in practice, all are reported), but
xmlns* attributes must not be reported.
Note: At least one of the two features must be true.
Suggestion: 1. namespace-aware processing: use the default settings.
2. no use of namespaces: reverse the defaults (namespaces false,
namespace-prefixes true).
Transparency No. 29
Configuration Example
 Consider the following simple sample document:
<h:hello xmlns:h="http://www.greeting.com/ns/"
id="a1"
h:person="David"/>
 namespaces true, namespace-prefixes false (the default) => prefixMapping events are reported, plus (URI + localName):
 h:hello => "http://www.greeting.com/ns/" + "hello"
 xmlns:h => does not appear in the attributes
 id => "" (empty string) + "id"
 h:person => "http://www.greeting.com/ns/" + "person"
 namespaces and namespace-prefixes both true => prefixMapping events are reported, plus (URI + localName + qName):
 h:hello => "http://www.greeting.com/ns/" + "hello" + "h:hello"
 xmlns:h => "…" + "h" + "xmlns:h"
 id => "" (empty string) + "id" + "id"
 h:person => "http://www.greeting.com/ns/" + "person" + "h:person"
 namespaces false and namespace-prefixes true (URI + localName + qName):
 h:hello => "" + "" + "h:hello"
 xmlns:h => "" + "" + "xmlns:h"
 id => "" + "" + "id"
 h:person => "" + "" + "h:person"
Transparency No. 30
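The effect of the namespace-prefixes feature on the attribute list of the h:hello sample can be checked with a sketch like this one. NamespaceConfigDemo is a hypothetical name and the reader comes from JAXP's SAXParserFactory (an assumption); the exact URI and local name reported for the xmlns:h attribute vary between parsers, so the code only lists what it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceConfigDemo {
    static final String NS_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    /** Lists each attribute of every element as "{uri}localName qName=...". */
    static List<String> attributes(String xml, boolean namespacePrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // namespaces feature = true
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(NS_PREFIXES, namespacePrefixes);
        List<String> result = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    result.add("{" + atts.getURI(i) + "}" + atts.getLocalName(i)
                            + " qName=" + atts.getQName(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<h:hello xmlns:h=\"http://www.greeting.com/ns/\""
                + " id=\"a1\" h:person=\"David\"/>";
        System.out.println("namespace-prefixes = false:");
        attributes(xml, false).forEach(System.out::println);
        System.out.println("namespace-prefixes = true:");
        attributes(xml, true).forEach(System.out::println);
    }
}
```

With the default setting the xmlns:h declaration is absent from the list; with namespace-prefixes set to true it appears as an additional attribute.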
SAX2 packages
 3 packages
 org.xml.sax
 org.xml.sax.helpers
XMLReaderFactory
DefaultHandler
AttributesImpl
LocatorImpl
NamespaceSupport
XMLFilterImpl
AttributeListImpl,ParserAdapter,ParserFactory,
XMLReaderAdapter (sax 1.0 deprecated)
 org.xml.sax.ext
DeclHandler : for DTD declaration events
LexicalHandler : for Lexical events
DefaultHandler2
Locator2, Locator2Impl, EntityResolver2, Attributes2,
Attributes2Impl
Transparency No. 31
Package: org.xml.sax for SAX2
 Interfaces:
AttributeList
Attributes (extended by Attributes2)
ContentHandler
DocumentHandler
DTDHandler
EntityResolver (extended by EntityResolver2)
ErrorHandler
Locator (extended by Locator2)
Parser
XMLReader
XMLFilter
 Classes:
HandlerBase
InputSource
 Exceptions:
SAXException
SAXParseException
SAXNotRecognizedException
SAXNotSupportedException
Transparency No. 32
Interface org.xml.sax.AttributeList(SAX1.0 deprecated)
 Methods index:
 getLength()
 Return the number of
attributes in this list.
 getName(int index)
 Return the name of an
attribute in this list (by
position).
 getType(int index)
 Return the type of an attribute
in the list (by position).
 getIndex(String name)
 getType(String name)
 Return the type of an attribute
in the list (by name).
 getValue(int index)
 Return the value of an attribute
in the list (by position).
 getValue(String name)
 Return the value of an attribute
in the list (by name).
Transparency No. 33
interface org.xml.sax.Attributes, org.xml.sax.ext.Attributes2
 int getLength()
 int getIndex(String qName)
 int getIndex(String uri, String localName)
 Look up the index of an attribute
by qName or uri+localName.
 0-based
 String getLocalName(int index)
 String getQName(int index)
 String getURI(int index)
 isDeclared(index | qName | uri, local) [Attributes2]
 String getType(int index)
 String getType(String qName)
 String getType(String uri, String localName)
 possible results:
 "CDATA",
 "ID", "IDREF", "IDREFS",
"NMTOKEN" (+ enumeration),
"NMTOKENS", "ENTITY",
"ENTITIES", "NOTATION"
 String getValue(int index)
 String getValue(String qName)
 String getValue(String uri, String localName)
 isSpecified(index | qName | uri, local) [Attributes2]
Note: All methods return null if namespace processing does not support them,
e.g. if the namespaces feature is false => getValue(uri, localName) returns null.
Transparency No. 34
interface ContentHandler
 startDocument()
 endDocument()
 startElement( uri, localName,
qName, Attributes atts)
 endElement(uri, localName,
qName)
 startPrefixMapping(prefix, uri)
 Begin the scope of a prefix-URI
Namespace mapping.
 endPrefixMapping(prefix)
 no guarantee of proper nesting
among start- and end-prefix-mapping
events
 characters(char[] ch, int start,
int length)
 Receive notification of character
data.
 ignorableWhitespace(char[] ch,
int start, int length)
 processingInstruction(target,
data)
 setDocumentLocator(Locator
locator)
 Receive an object for locating the
origin of SAX document events.
 will be invoked only once and
before any other method is called.
 skippedEntity( name)
 Receive notification of a skipped
entity.
Transparency No. 35
interface ContentHandler
 skippedEntity(name)
 Receive notification of a
skipped entity.
 The Parser will invoke this
method once for each entity
skipped.
 Non-validating processors
may skip entities if they have
not seen the declarations
(because, for example, the
entity was declared in an
external DTD subset).
 All processors may skip
external entities, depending on
the values of the
http://xml.org/sax/features/external-general-entities and the
http://xml.org/sax/features/external-parameter-entities features.
 <test>
 <a/>&ge1;bc<c/>
 </test>
Transparency No. 36
Interface org.xml.sax.DTDHandler
 Method Index
 notationDecl(String, String,
String) throws SAXException
 Receive notification of a
notation declaration event.
 parameters: name + publicId + systemId
Ex:
<!NOTATION GIF PUBLIC "abc">
=> notationDecl("GIF", "abc", null)
 unparsedEntityDecl(name,
publicId, systemId, notationName)
 Receive notification of an
unparsed entity declaration
event.
Ex: <!ENTITY aPic SYSTEM "here"
NDATA GIF>
=> unparsedEntityDecl(
"aPic",
null, // publicId (null when none is given)
"here", // systemId
"GIF") // notationName
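The two DTDHandler callbacks can be observed by parsing a document whose internal DTD subset declares a notation and an unparsed entity. The sketch below uses the hypothetical class name DtdHandlerDemo and a reader obtained from JAXP's SAXParserFactory (an assumption); the system identifier is left out of the log because a parser may turn a relative one into an absolute URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdHandlerDemo {
    /** Records NOTATION and unparsed ENTITY declarations in declaration order. */
    static List<String> declarations(String xml) throws Exception {
        List<String> decls = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler implements DTDHandler, so only two methods are overridden.
        reader.setDTDHandler(new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                decls.add("notation " + name + " public=" + publicId);
            }
            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                decls.add("unparsed entity " + name + " notation=" + notationName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return decls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [\n"
                + "  <!NOTATION GIF PUBLIC \"abc\">\n"
                + "  <!ENTITY aPic SYSTEM \"here\" NDATA GIF>\n"
                + "]>\n"
                + "<doc/>";
        declarations(xml).forEach(System.out::println);
    }
}
```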
Transparency No. 37
Interface org.xml.sax.Parser(SAX1.0; skipped!)
Method index
 parse(InputSource)
 Parse an XML document.
 parse(String)
 Parse an XML document from
a system identifier (URI).
 setDocumentHandler(DocumentHandler)
 Allow an application to
register a document event
handler.
 setDTDHandler(DTDHandler)
 Allow an application to
register a DTD event handler.
 setEntityResolver(EntityResolver)
 Allow an application to
register a custom entity
resolver.
 setErrorHandler(ErrorHandler)
 Allow an application to
register an error event handler.
 setLocale(Locale)
 Allow an application to request
a locale for errors and
warnings.
 Note: all return types are void.
Transparency No. 38
interface XMLReader
 ContentHandler:
 getContentHandler()
 setContentHandler(ContentHandler handler)
 ErrorHandler:
 getErrorHandler()
 setErrorHandler(ErrorHandler handler)
 DTDHandler:
 getDTDHandler()
 setDTDHandler(DTDHandler handler)
 EntityResolver:
 getEntityResolver()
 setEntityResolver(EntityResolver resolver)
 parse:
 parse(InputSource input)
 parse(String systemId)
 Features and Properties:
 boolean getFeature(name)
 Object getProperty(name)
 setFeature(name, boolean value)
 setProperty(name, Object value)
Transparency No. 39
Interface org.xml.sax.DocumentHandler(SAX1.0 skipped)
Method Index
 characters(char[], int, int)
 Receive notification of
character data.
 endDocument()
 Receive notification of the end
of a document.
 endElement(String)
 Receive notification of the end
of an element.
 ignorableWhitespace(char[],
int, int)
 Receive notification of
ignorable whitespace in
element content.
 processingInstruction(String,
String)
 Receive notification of a
processing instruction.
 setDocumentLocator(Locator)
 Receive an object for locating
the origin of SAX document
events.
 startDocument()
 Receive notification of the
beginning of a document.
 startElement(String,
AttributeList)
 Receive notification of the
beginning of an element.
Transparency No. 40
Interface org.xml.sax.Locator, org.xml.sax.ext.Locator2
Method Index
 getColumnNumber()
 Return the column number
where the current document
event ends.
 getLineNumber()
 Return the line number where
the current document event ends.
 getPublicId()
 Return the public identifier for the
current document event.
 getSystemId()
 Return the system identifier for
the current document event.
 getEncoding(): String [Locator2]
 character encoding used
 getXMLVersion(): String [Locator2]
 XML version for the entity
 Note: If an implementation
supports Locator2,
XMLReader.getFeature("…/use-locator2")
will return true.
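A handler typically keeps the Locator it receives in setDocumentLocator and consults it inside later callbacks. The following sketch (LocatorDemo is a hypothetical name; the JAXP SAXParserFactory is an assumption) tags each element with the line number reported when its start tag has been read. Supplying a Locator is optional for parsers, so the null check is kept.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocatorDemo {
    /** Returns "name@line" for every start tag in the document. */
    static List<String> positions(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            private Locator locator; // valid only during the parse

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; // called once, before startDocument
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                int line = (locator == null) ? -1 : locator.getLineNumber();
                result.add(localName + "@" + line);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(positions("<doc>\n<para>Hello, world!</para>\n</doc>"));
    }
}
```

To keep a position after the callback returns, copy it with new LocatorImpl(locator), since the Locator object itself changes as the parse proceeds.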
Transparency No. 41
Interface org.xml.sax.EntityResolver, org.xml.sax.ext.EntityResolver2
 InputSource resolveEntity(String publicId, String systemId)
 InputSource resolveEntity(entityName, publicId, baseURI,
systemId) [EntityResolver2] // baseURI + systemId => absolute URI
 Allow the application to resolve external entities.
 The parser will call this method before opening any external entity, including:
 the external DTD subset (entityName is "[dtd]"),
 external entities referenced within the DTD or within the document
element
 parameter entity => %name; general entity => name
 InputSource getExternalSubset(rootName, baseURI) [EntityResolver2]
 Allows applications to provide an external subset for documents that don't explicitly
define one. // Either no DOCTYPE, or a DOCTYPE with no external subset given.
 rootName: document root name; baseURI: absolute, additional hint.
 To use version 2, you must call setFeature("…/use-entity-resolver2", true)
 Version 2 will hide version 1 if it is used.
Transparency No. 42
Special entity processing for XHTML dtd
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class MyResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        // publicId may be null, so the constants are placed on the left
        if ("-//W3C//DTD XHTML 1.0 Strict//EN".equals(publicId)
                || "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd".equals(systemId)) {
            // return my local XHTML 1.0 DTD
            Reader reader = new FileReader("myXhtmlDtdFile.dtd");
            return new InputSource(reader);
        }
        return null; // use the default behaviour
    }
}
Transparency No. 43
Interface org.xml.sax.ErrorHandler
Method Index
 error(SAXParseException)
 Receive notification of a
recoverable error.
 fatalError(SAXParseException)
 Receive notification of a non-recoverable error.
 warning(SAXParseException)
 Receive notification of a warning.
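A minimal ErrorHandler might log every problem and abort the parse on fatal errors, as in the sketch below. ErrorHandlerDemo is a hypothetical name, the reader is obtained through JAXP's SAXParserFactory (an assumption), and the policy of rethrowing only fatal errors is one possible choice rather than a requirement of SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ErrorHandlerDemo {
    /** Parses the text and returns a log of the problems that were reported. */
    static List<String> problems(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                log.add("warning at line " + e.getLineNumber());
            }
            @Override public void error(SAXParseException e) {
                log.add("error at line " + e.getLineNumber());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                log.add("fatal error at line " + e.getLineNumber());
                throw e; // stop parsing: the document is not well-formed
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            log.add("parse aborted");
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        // The end tag of <b> is missing, so the document is not well-formed.
        problems("<a>\n<b></a>").forEach(System.out::println);
    }
}
```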
Transparency No. 44
interface org.xml.sax.ext.DeclHandler
 attributeDecl(String eName, String aName, String type, String
valueDefault, String value)
 Report an attribute type declaration.
 valueDefault - "#IMPLIED", "#REQUIRED", "#FIXED" or null if none of
these applies.
 value - A string representing the attribute's default value, or null if
there is none.
 type for an enumeration or notation => [NOTATION](nm1|…|nmk)
 elementDecl(name, String model)
 Report an element type declaration.
 externalEntityDecl(name, publicId, systemId)
 Report a parsed external entity declaration.
 parameter entity => name begins with %.
 internalEntityDecl(name, String value)
 Report an internal entity declaration.
 parameter entity => name begins with %; value is replacement text.
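A DeclHandler is registered through the declaration-handler property rather than through a setter. The sketch below extends DefaultHandler2 from org.xml.sax.ext, which provides empty implementations of the DeclHandler methods; DeclHandlerDemo is a hypothetical name and the JAXP SAXParserFactory is an assumption. How a parser spells a content model string can differ, so the example declares the element as EMPTY.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DeclHandlerDemo {
    static final String DECL_HANDLER =
            "http://xml.org/sax/properties/declaration-handler";

    /** Records element, attribute and internal entity declarations from the DTD. */
    static List<String> declarations(String xml) throws Exception {
        List<String> decls = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setProperty(DECL_HANDLER, new DefaultHandler2() {
            @Override public void elementDecl(String name, String model) {
                decls.add("element " + name + " " + model);
            }
            @Override public void attributeDecl(String eName, String aName, String type,
                                                String mode, String value) {
                decls.add("attribute " + eName + "/" + aName + " " + type + " " + mode);
            }
            @Override public void internalEntityDecl(String name, String value) {
                decls.add("entity " + name + " = " + value);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return decls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [\n"
                + "  <!ELEMENT doc EMPTY>\n"
                + "  <!ATTLIST doc id ID #IMPLIED>\n"
                + "  <!ENTITY greet \"Hello\">\n"
                + "]>\n"
                + "<doc/>";
        declarations(xml).forEach(System.out::println);
    }
}
```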
Transparency No. 45
Interface org.xml.sax.ext.LexicalHandler
 optional extension handler for SAX2 to provide lexical
information about an XML document, such as comments and
CDATA section boundaries;
 XMLReaders are not required to support it.
 applies to the entire document, not just to the document element;
 all lexical handler events must appear between startDocument and
endDocument events.
 set a LexicalHandler/DeclHandler for an XMLReader:
try {
    xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", aLexicalHandler);
    xmlReader.setProperty("http://xml.org/sax/properties/declaration-handler", aDeclHandler);
} catch (SAXNotRecognizedException e) {
    // the parser does not know this property
} catch (SAXNotSupportedException e) {
    // the parser knows the property but cannot set it
}
Transparency No. 46
interface LexicalHandler
 startDTD(String name, String publicId, String systemId)
 Report the start of DTD declarations, if any.
 endDTD()
 Report the end of DTD declarations.
 startCDATA()
 Report the start of a CDATA section.
 endCDATA()
 Report the end of a CDATA section.
 comment(char[] ch, int start, int length)
 Report an XML comment anywhere in the document.
 endEntity(String name) // general or parameter entity
 Report the end of an entity [expansion].
 parameter entity begins with %
Transparency No. 47
interface LexicalHandler
 startEntity(String name)
 Report the beginning of an entity in document.
 name: name of the entity.
parameter entity => begins with '%'
external DTD subset => "[dtd]"
 NOTE:
 Entity references in attribute values -- and the start and
end of the document entity -- are never reported.
 Skipped entities will be reported through the skippedEntity
event, which is part of the ContentHandler interface.
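Lexical events interleave with the ordinary content events when one object is registered both as the ContentHandler and as the LexicalHandler. The following is only a sketch under stated assumptions: LexicalHandlerDemo is a hypothetical name, the reader comes from JAXP's SAXParserFactory, and the parser is assumed to support the lexical-handler property (support is optional).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalHandlerDemo {
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    /** Records comments, CDATA boundaries and character data in arrival order. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int length) {
                log.add("comment:" + new String(ch, start, length));
            }
            @Override public void startCDATA() { log.add("startCDATA"); }
            @Override public void endCDATA() { log.add("endCDATA"); }
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters:" + new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler); // may throw if unsupported
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        events("<doc><!--note--><![CDATA[x<y]]></doc>").forEach(System.out::println);
    }
}
```

The content of a CDATA section still arrives through the regular characters callback; startCDATA and endCDATA only mark its boundaries.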
Transparency No. 48
Class org.xml.sax.InputSource
 Constructors:
 InputSource()
 Zero-argument default
constructor.
 InputSource(InputStream)
 Create a new input source with
a byte stream.
 InputSource(Reader)
 Create a new input source with
a character stream.
 InputSource(String)
 Create a new input source with
a system identifier.
 access order:
 char stream, byte stream,
systemId, publicId.
 Methods
 getByteStream()
 Get the byte stream for this
input source.
 getCharacterStream()
 Get the character stream for
this input source.
 getEncoding()
 Get the character encoding for
a byte stream or URI.
 getPublicId()
 Get the public identifier for this
input source.
 getSystemId()
 Get the system identifier for
this input source.
Transparency No. 49
Class org.xml.sax.InputSource
 setByteStream(InputStream)
 Set the byte stream for this
input source.
 setCharacterStream(Reader)
 Set the character stream for
this input source.
 setEncoding(String)
 Set the character encoding, if
known.
 setPublicId(String)
 Set the public identifier for this
input source.
 setSystemId(String)
 Set the system identifier for
this input source.
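When the input is a byte stream without an encoding declaration, setEncoding lets the application tell the parser how to decode it. The sketch below feeds ISO-8859-1 bytes through an InputSource; InputSourceDemo is a hypothetical name and the reader is obtained from JAXP's SAXParserFactory (an assumption).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class InputSourceDemo {
    /** Parses the bytes using the given encoding and returns all character data. */
    static String text(byte[] bytes, String encoding) throws Exception {
        InputSource input = new InputSource(new ByteArrayInputStream(bytes));
        input.setEncoding(encoding); // applies to the byte stream
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in several chunks
            }
        });
        reader.parse(input);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // No XML declaration, so the parser could not guess ISO-8859-1 by itself.
        byte[] latin1 = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(text(latin1, "ISO-8859-1"));
    }
}
```

If a character stream is set as well, it takes precedence over the byte stream, in line with the access order listed for InputSource.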
Transparency No. 50
Class org.xml.sax.HandlerBase (SAX1.0 deprecated)




 Constructor:
 HandlerBase()
 Methods:
 characters(char[], int, int)
 Receive notification of character data inside an element.
 endDocument()
 Receive notification of the end of the document.
 endElement(String)
 Receive notification of the end of an element.
 error(SAXParseException)
 Receive notification of a recoverable parser error.
 fatalError(SAXParseException)
 Report a fatal XML parsing error.
 ignorableWhitespace(char[], int, int)
 Receive notification of ignorable whitespace in element content.
 notationDecl(String, String, String)
 Receive notification of a notation declaration.
 processingInstruction(String, String)
 Receive notification of a processing instruction.
Transparency No. 51
Class org.xml.sax.HandlerBase (cont’d)
 resolveEntity(String, String)
 Resolve an external entity.
 setDocumentLocator(Locator)
 Receive a Locator object for document events.
 startDocument()
 Receive notification of the beginning of the document.
 startElement(String, AttributeList)
 Receive notification of the start of an element.
 unparsedEntityDecl(String, String, String, String)
 Receive notification of an unparsed entity declaration.
 warning(SAXParseException)
 Receive notification of a parser warning.
Transparency No. 52
Class org.xml.sax.SAXException
 Constructors:
 SAXException(Exception)
Create a new SAXException wrapping an existing exception.
 SAXException(String)
Create a new SAXException.
 SAXException(String, Exception)
Create a new SAXException from an existing exception.
 Methods:
 getException()
Return the embedded exception, if any.
 getMessage()
Return a detail message for this exception.
 toString()
Return a string representation of this exception.
Transparency No. 53
Class org.xml.sax.SAXParseException
extends SAXException;
Constructors:
 SAXParseException(message, locator)
 Create a new SAXParseException from a message and a Locator.
 SAXParseException(message, locator, exception)
 Wrap an existing exception in a SAXParseException.
 SAXParseException(message, pubID, sysID, lineNo, colNo)
 Create a new SAXParseException.
 SAXParseException(message, pubID, sysID, lineNo, colNo, exception)
 Create a new SAXParseException with an embedded exception.
Transparency No. 54
Class org.xml.sax.SAXParseException
 Methods:
 getColumnNumber()
 The column number of the end of the text where the
exception occurred.
 getLineNumber()
 The line number of the end of the text where the exception
occurred.
 getPublicId()
 Get the public identifier of the entity where the exception
occurred.
 getSystemId()
 Get the system identifier of the entity where the exception
occurred.
Transparency No. 55
public class SAXNotRecognizedException
 extends SAXException
 Exception class for an unrecognized identifier.
 XMLReader will throw this exception when it finds an
unrecognized feature or property identifier;
 Constructor
 SAXNotRecognizedException(String message)
 Construct a new exception with the given message.
Transparency No. 56
public class SAXNotSupportedException
 extends SAXException
 Exception class for an unsupported operation.
 An XMLReader will throw this exception when it
recognizes a feature or property identifier, but cannot
perform the requested operation (setting a state or value)
 Constructor:
 SAXNotSupportedException(String message)
 Construct a new exception with the given message.
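A sketch of how an application might tell the two exceptions apart when configuring a reader. FeatureProbe and the identifier http://example.org/no-such-feature are hypothetical; the namespaces feature identifier is the standard SAX2 one.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical demo class.
class FeatureProbe {
    // Tries to set a feature and reports how the reader reacted.
    static String trySet(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return "set";
        } catch (SAXNotRecognizedException e) {
            return "not recognized"; // identifier unknown to this reader
        } catch (SAXNotSupportedException e) {
            return "not supported";  // identifier known, request refused
        }
    }

    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println(trySet(reader, "http://xml.org/sax/features/namespaces", true));
        // Made-up identifier that no parser is expected to know.
        System.out.println(trySet(reader, "http://example.org/no-such-feature", true));
    }
}
```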
Transparency No. 57
An Introduction to SAX
package org.xml.sax.helpers for SAX2
 AttributeListImpl implements AttributeList
 AttributesImpl implements Attributes
 DefaultHandler
 LocatorImpl implements Locator
 NamespaceSupport
 ParserAdapter
 ParserFactory
 XMLFilterImpl implements XMLFilter
 XMLReaderAdapter
 XMLReaderFactory
Transparency No. 58
An Introduction to SAX
public class org.xml.sax.helpers.AttributesImpl
 extends java.lang.Object implements Attributes
 Default implementation of the Attributes interface, with
the addition of manipulators so that the list can be
modified or reused.
 typical uses of this class:
 1. take a persistent snapshot of an Attributes object in a
startElement event;
 2. construct or modify an Attributes object in a SAX2
XMLReader or filter.
 replaces the deprecated SAX1 AttributeListImpl class;
 a much more efficient implementation using arrays
rather than Vector.
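The first typical use (a persistent snapshot) can be sketched as follows. AttributeSnapshots is a hypothetical handler; the copy constructor AttributesImpl(Attributes) is the one listed for this class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps a copy of every attribute list it sees.
class AttributeSnapshots extends DefaultHandler {
    final List<Attributes> saved = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The parser may reuse atts after this callback returns, so copy it.
        saved.add(new AttributesImpl(atts));
    }

    static List<Attributes> collect(String xml) throws Exception {
        AttributeSnapshots handler = new AttributeSnapshots();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.saved;
    }

    public static void main(String[] args) throws Exception {
        for (Attributes a : collect("<doc id='d1'><para lang='en'/></doc>")) {
            for (int i = 0; i < a.getLength(); i++) {
                System.out.println(a.getQName(i) + "=" + a.getValue(i));
            }
        }
    }
}
```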
Transparency No. 59
An Introduction to SAX
public class org.xml.sax.helpers.AttributesImpl, org.xml.sax.ext.Attributes2Impl
 Constructors:
 AttributesImpl()
 AttributesImpl(Attributes atts)
 Methods
 addAttribute(uri, localName, qName, type, value)
 clear()
 removeAttribute(int index)
 setAttribute(int index, uri, localName, qName, type, value)
 setLocalName(int index, localName)
 setQName(int index, qName)
 setType(int index, java.lang.String type)
 setURI(int index, java.lang.String uri)
 setValue(int index, java.lang.String value)
 setDeclared(index, boolean)…, setSpecified(index, boolean) (Attributes2Impl only)
 + methods declared in Attributes2
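A short sketch of the manipulators in use, building and then editing an attribute list by hand. AttributeEditing and the attribute names are illustrative only.

```java
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class.
class AttributeEditing {
    static AttributesImpl build() {
        AttributesImpl atts = new AttributesImpl();
        // No namespace: empty URI, local name and qName are the same.
        atts.addAttribute("", "id", "id", "ID", "d1");
        atts.addAttribute("", "lang", "lang", "CDATA", "en");
        atts.setValue(atts.getIndex("lang"), "fr"); // change a value in place
        atts.removeAttribute(atts.getIndex("id"));  // drop an attribute
        return atts;
    }

    public static void main(String[] args) {
        AttributesImpl atts = build();
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(atts.getQName(i) + "=" + atts.getValue(i));
        }
    }
}
```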
Transparency No. 60
An Introduction to SAX
public class DefaultHandler
 extends Object implements EntityResolver, DTDHandler,
ContentHandler, ErrorHandler
 a convenience base class for SAX2 applications:
 provides default implementations of all methods in 4 interfaces
(no-ops, except fatalError(), which rethrows the exception):
 EntityResolver
 DTDHandler
 ContentHandler
 ErrorHandler
 Application writers usually extend this class when they need to
implement only part of an interface;
 Constructor:
 public DefaultHandler()
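A minimal sketch of extending DefaultHandler and overriding a single callback. ElementCounter is a hypothetical class; all other callbacks keep their inherited defaults.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that only cares about startElement events.
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    static int count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<doc><para>Hello, world!</para></doc>"));
    }
}
```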
Transparency No. 61
An Introduction to SAX
org.xml.sax.ext.DefaultHandler2 extends DefaultHandler
 Empty implementations of the additional methods of 3
extension handlers:
 LexicalHandler
 DeclHandler
 EntityResolver2
 Constructor Summary
 DefaultHandler2()
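A sketch of using DefaultHandler2 as a LexicalHandler to collect comments. CommentCollector is a hypothetical class; the property identifier is the standard SAX2 lexical-handler property, and the JDK's built-in parser is assumed to support it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that records the text of every comment.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) throws Exception {
        CommentCollector handler = new CommentCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are delivered only to a handler registered as a property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<doc><!-- note --></doc>"));
    }
}
```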
Transparency No. 62
An Introduction to SAX
public class org.xml.sax.helpers.LocatorImpl
 extends java.lang.Object implements Locator
 a convenience implementation of Locator.
 available mainly for application writers, who can use it to make a
persistent snapshot of a locator at any point during a document parse:
Locator locator;
Locator startloc;
public void setDocumentLocator (Locator locator)
{ this.locator = locator; }
public void startDocument ()
{ // save the location of the start of the document
// for future use.
startloc = new LocatorImpl(locator);
}
Transparency No. 63
An Introduction to SAX
org.xml.sax.helpers.LocatorImpl, org.xml.sax.ext.Locator2Impl
 Constructor Summary
 Locator(2)Impl()
 Locator(2)Impl(Locator locator) : Copy constructor.
 Method Summary
 setColumnNumber(int columnNumber)
 setLineNumber(int lineNumber)
 setPublicId(String publicId)
 setSystemId(String systemId)
 setEncoding(String encoding) (Locator2Impl only)
 setXMLVersion(String version) (Locator2Impl only)
 + the getXXX() methods defined in Locator2.
Transparency No. 64
An Introduction to SAX
example: print the end location of an endElement event
public class myHandler extends DefaultHandler {
Locator loc ; // locator provided by setDocumentLocator(…)
…
public void setDocumentLocator(Locator l) {
loc = l;
}
…
public void endElement(String uri, String lName, String qName)
{
…
System.out.println("end of " + qName + " element at column: "
+ loc.getColumnNumber() + " line: " +
loc.getLineNumber());
…}
}
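A runnable variant of the example above, offered as a sketch: it records the end location of each element and uses LocatorImpl to freeze the values. EndLocations is a hypothetical class, and the parser is assumed to supply a locator (SAX encourages but does not require this).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical handler that records where each element ends.
class EndLocations extends DefaultHandler {
    private Locator loc; // live locator, updated by the parser
    final List<String> lines = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator l) {
        loc = l;
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        if (loc == null) return; // parser supplied no locator
        // Locator values are valid only during the callback; the copy keeps them.
        Locator snap = new LocatorImpl(loc);
        lines.add("end of " + qName + " at line " + snap.getLineNumber()
                + ", column " + snap.getColumnNumber());
    }

    static List<String> report(String xml) throws Exception {
        EndLocations handler = new EndLocations();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        report("<doc>\n<para>Hi</para>\n</doc>").forEach(System.out::println);
    }
}
```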
Transparency No. 65
An Introduction to SAX
public class org.xml.sax.helpers.NamespaceSupport
 extends java.lang.Object
 Encapsulates Namespace logic for use by SAX drivers.
 tracks the declarations currently in force for each context and
automatically processes XML 1.0 qNames into their
Namespace parts.
 Namespace support objects are reusable, but the reset
method must be invoked between each session.
 a simple session:
String[] parts = new String[3];
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("", "http://www.w3.org/1999/xhtml");
support.declarePrefix("dc", "http://www.purl.org/dc#");
Transparency No. 66
An Introduction to SAX
public class org.xml.sax.helpers.NamespaceSupport
parts = support.processName("p", parts, false);
// isAttribute=false
System.out.println("Namespace URI: " + parts[0]);
System.out.println("Local name: " + parts[1]);
System.out.println("Raw name: " + parts[2]);
parts = support.processName("dc:title", parts, false);
System.out.println("Namespace URI: " + parts[0]);
System.out.println("Local name: " + parts[1]);
System.out.println("Raw name: " + parts[2]);
support.popContext();
support.popContext();
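The session above can be packaged into a self-contained sketch. NamespaceSession and resolve are hypothetical names; the prefix declarations are the ones used in the session.

```java
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo class wrapping one NamespaceSupport session.
class NamespaceSession {
    // Returns {namespace URI, local name, raw name}, or null if the
    // prefix of qName is not declared.
    static String[] resolve(String qName) {
        NamespaceSupport support = new NamespaceSupport();
        support.pushContext();
        support.declarePrefix("", "http://www.w3.org/1999/xhtml");
        support.declarePrefix("dc", "http://www.purl.org/dc#");
        String[] parts = support.processName(qName, new String[3], false);
        support.popContext();
        return parts;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"p", "dc:title"}) {
            String[] parts = resolve(name);
            System.out.println("Namespace URI: " + parts[0]);
            System.out.println("Local name: " + parts[1]);
            System.out.println("Raw name: " + parts[2]);
        }
    }
}
```

An element name without a prefix picks up the default namespace; an undeclared prefix makes processName return null.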
Transparency No. 67
An Introduction to SAX
public class org.xml.sax.helpers.NamespaceSupport
 Field:
 static String XMLNS // The XML Namespace as a constant.
 Constructor Summary
 NamespaceSupport()
 Method Summary
 boolean declarePrefix(prefix, uri) // Declare a Namespace prefix.
 Enumeration getDeclaredPrefixes()
 Return an enumeration of all prefixes declared in this context.
 Enumeration getPrefixes()
 Return an enumeration of all active prefixes.
 String getURI(prefix)
 void popContext()
 Revert to the previous Namespace context.
 String[] processName(rawName, String[] parts, boolean isAttribute)
 Process a raw XML 1.0 name.
 void pushContext()
 Start a new Namespace context.
 void reset() // Reset this Namespace support object for reuse.
Transparency No. 68
An Introduction to SAX
public class org.xml.sax.helpers.XMLReaderFactory
 Contains static methods for creating an XML reader from
an explicit class name, or for creating an XML reader
based on the value of the org.xml.sax.driver system
property:
try {
XMLReader myReader = XMLReaderFactory.createXMLReader([aClassName]);
} catch (SAXException e) {
System.err.println(e.getMessage());
}
Transparency No. 69
An Introduction to SAX
public class org.xml.sax.helpers.XMLReaderFactory
 Method Summary
 static XMLReader createXMLReader()
 Attempt to create an XML reader from the system property
“org.xml.sax.driver”
 static XMLReader createXMLReader(String className)
 Attempt to create an XML reader from a class name.
 How to use XMLReaderFactory to create an XMLReader:
 1. XMLReader rd = XMLReaderFactory.
createXMLReader("org.apache.xerces.parsers.SAXParser");
 // or
 2.1 System.setProperty("org.xml.sax.driver",
"org.apache.xerces.parsers.SAXParser");
 2.2 XMLReader rd = XMLReaderFactory.createXMLReader();
 note: 2.1 can be replaced by
 java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser
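A sketch of variant 2.2 that does not assume the Apache Xerces jar is on the classpath: with no org.xml.sax.driver property set, the JDK is assumed to fall back to its built-in reader. ReaderLookup is a hypothetical class. Note that XMLReaderFactory is marked deprecated in newer JDKs.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class.
class ReaderLookup {
    // Asks the factory for whatever reader the platform provides.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static XMLReader defaultReader() throws SAXException {
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) {
        try {
            XMLReader rd = defaultReader();
            System.out.println(rd.getClass().getName());
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}
```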
Transparency No. 70
An Introduction to SAX
Apache Xerces: org.apache.xerces.parsers.SAXParser
 Implements org.xml.sax.Parser, org.xml.sax.XMLReader
 provides a parser which implements the SAX1 and SAX2
parser APIs
 Constructor Summary
 SAXParser() // Default constructor.
 Methods
 String[] getFeaturesRecognized()
 String[] getPropertiesRecognized()
 …
 How to create an XMLReader /SAX Parser directly :
 org.xml.sax.XMLReader rd = new SAXParser();
 org.xml.sax.Parser parser = new SAXParser();
Transparency No. 71
An Introduction to SAX
The pluggability mechanism of Sun’s JAXP
 http://java.sun.com/xml
 package: javax.xml.parsers
Class Summary
 DocumentBuilder: Defines the API to obtain DOM Document
instances from an XML document.
 DocumentBuilderFactory: Defines a factory API that enables
applications to obtain a parser that produces DOM object trees
from XML documents.
 SAXParser: Defines the API that wraps an XMLReader
implementation class.
 SAXParserFactory: Defines a factory API that enables
applications to configure and obtain a SAX based parser to parse
XML documents.
Transparency No. 72
An Introduction to SAX
sample code
SAXParser parser;
DefaultHandler handler = new MyApplicationParseHandler();
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true); // JAXP default is false; SAX2 XMLReaders default to true
factory.setValidating(true);
try {
parser = factory.newSAXParser();
parser.parse("http://myserver/mycontent.xml", handler);
} catch (SAXException se) {
// handle error
} catch (IOException ioe) {
// handle error
} catch (ParserConfigurationException pce) {
// handle error
}
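A self-contained variant of the sample, as a sketch: it parses an in-memory string instead of a URL and records the events it receives. EventTrace is a hypothetical stand-in for MyApplicationParseHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each event as a line of text.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("start document"); }
    @Override public void endDocument() { events.add("end document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start element: " + localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end element: " + localName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text run in several chunks.
        events.add("characters: " + new String(ch, start, length));
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be filled in
        SAXParser parser = factory.newSAXParser();
        EventTrace handler = new EventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<doc><para>Hello, world!</para></doc>").forEach(System.out::println);
    }
}
```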
Transparency No. 73
An Introduction to SAX
How JAXP’s SAXParserFactory.newInstance() finds an implementation
 The order used to find a SAXParserFactory
implementation class:
1. The javax.xml.parsers.SAXParserFactory system property.
2. The same property in the file
"lib/jaxp.properties" in the JRE directory.
3. The class name in the file META-INF/services/
javax.xml.parsers.SAXParserFactory in jars available to
the runtime.
4. The platform default SAXParserFactory instance, which is
"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
in JAXP 1.2, 1.3
Transparency No. 74
An Introduction to SAX
javax.xml.parsers.SAXParserFactory
 abstract boolean getFeature(String name)
 abstract void setFeature(String name, boolean value)
 get/set a particular feature in the underlying implementation of
org.xml.sax.XMLReader.
 boolean isNamespaceAware()
 void setNamespaceAware(boolean awareness)
 get/set the namespace support of the parser that would be produced
by this code.
 boolean isValidating()
 void setValidating(boolean validating)
 get/set the validating property of the produced parsers.
 static SAXParserFactory newInstance()
 Obtain a new instance of a SAXParserFactory.
 abstract SAXParser newSAXParser()
 Creates a new instance of a SAXParser using the currently configured
factory parameters.
Transparency No. 75
An Introduction to SAX
javax.xml.parsers.SAXParser
 abstract XMLReader getXMLReader()
 abstract Parser getParser()
 Returns the XMLReader or SAX parser that is
encapsulated by the implementation of this class.
 abstract Object getProperty(String name)
 abstract void setProperty(String, Object)
 abstract boolean isNamespaceAware()
 abstract boolean isValidating()
 void parse(input, handler)
 handler => DefaultHandler or HandlerBase
 input => File, InputSource, InputStream, or URI (String)
 void parse(InputStream, HandlerBase | DefaultHandler,
String uri)
 uri is used for resolving relative URIs.
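A sketch of configuring the factory and then reading the settings back from the parser it produces. FactoryConfig is a hypothetical class; the secure-processing feature is the standard constant from javax.xml.XMLConstants.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class.
class FactoryConfig {
    static SAXParser configured() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Passed through to the underlying XMLReader implementation.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = configured();
        System.out.println("namespace aware: " + parser.isNamespaceAware());
        System.out.println("validating: " + parser.isValidating());
        System.out.println("reader: " + parser.getXMLReader().getClass().getName());
    }
}
```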
Transparency No. 76
An Introduction to SAX
SAX2: Filters
 The SAX interface assumes two basic streams:
1. a stream of requests flowing from the application to the SAX
driver; and
2. a stream of events (and other information) flowing from the
SAX driver to the application.
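[Figure: the Application (ContentHandler, DTDHandler, ErrorHandler, …) calls setFeature(), setProperty() and parse(…) on the SAX driver (XMLReader); the driver reads the input source and sends events from startDocument() to endDocument() back to the Application.]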
Transparency No. 77
An Introduction to SAX
extend SAX model to support a processing chain
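[Figure: top, the basic model: the Application calls setFeature(), setProperty() and parse(…) on the SAX driver (XMLReader, the parent) and receives startDocument() … endDocument(). Bottom, the chain: an XMLFilter sits between the Application and the SAX driver (XMLReader), which reads the input source; requests pass through the filter to the driver and events pass back through the filter to the Application.]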
Transparency No. 78
An Introduction to SAX
SAX2 support of XMLFilter
 a new interface, org.xml.sax.XMLFilter, and
 a new helper class, org.xml.sax.helpers.XMLFilterImpl
 public interface XMLFilter extends XMLReader
 setParent(XMLReader)
 XMLReader getParent()
 public class XMLFilterImpl
 implements XMLFilter, ContentHandler, ErrorHandler,
DTDHandler, EntityResolver
 // passes every event it receives on to the corresponding
handler registered by the application.
 // note: XMLFilterImpl implements the same four handler
interfaces as DefaultHandler
Transparency No. 79
An Introduction to SAX
Example
a simple filter that changes the Namespace URI
http://www.foo.com/ns/ to http://www.bar.com/ns/ wherever
it appears in an element name
public class FooFilter extends XMLFilterImpl {
public FooFilter () { }
public FooFilter (XMLReader parent) { super(parent); }
public void startElement (String uri, String localName, String
qName, Attributes atts) throws SAXException
{ if (uri.equals("http://www.foo.com/ns/"))
uri = "http://www.bar.com/ns/";
super.startElement(uri, localName, qName, atts); }
Transparency No. 80
An Introduction to SAX
public void endElement (String uri, String localName,
String qName) throws SAXException {
if (uri.equals("http://www.foo.com/ns/"))
uri = "http://www.bar.com/ns/";
super.endElement(uri, localName, qName); }
}
[Figure: the parent reader calls startElement(…) on MySaxFilter, which calls super.startElement() in XMLFilterImpl; XMLFilterImpl.startElement(…) runs "if (cntHandler != null) cntHandler.startElement(..);" and so forwards startElement() to the Application (ContentHandler).]
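A runnable sketch of the same idea wired into a chain parser -> filter -> handler. UriRewriteFilter and urisSeen are hypothetical names; the two namespace URIs are the ones from the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that rewrites one element namespace URI.
class UriRewriteFilter extends XMLFilterImpl {
    static final String OLD_NS = "http://www.foo.com/ns/";
    static final String NEW_NS = "http://www.bar.com/ns/";

    UriRewriteFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(OLD_NS.equals(uri) ? NEW_NS : uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(OLD_NS.equals(uri) ? NEW_NS : uri, localName, qName);
    }

    // Runs the chain and returns the element URIs the application handler saw.
    static List<String> urisSeen(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader filter = new UriRewriteFilter(factory.newSAXParser().getXMLReader());
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                seen.add(uri);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(urisSeen("<a xmlns='http://www.foo.com/ns/'><b xmlns='urn:other'/></a>"));
    }
}
```

The application registers its handler on the filter, not on the parser; the filter installs itself as the parent's handler when parse(…) is called.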
Transparency No. 81
An Introduction to SAX
XMLWriter
 XMLReader : XML document (InputSource) => SAX events
 XMLWriter extends XMLFilterImpl
 SAX events => XML document (fragment)
 Ex:
XMLWriter w = new XMLWriter();
w.startDocument();
w.startElement("greeting");
w.characters("Hello, world!");
w.endElement("greeting");
w.endDocument();
=>output :
<?xml version="1.0" standalone="yes" ?>
<greeting>Hello, world!</greeting>
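XMLWriter is not among the org.xml.sax classes shipped with the JDK, so the following is only a sketch of the idea using the standard XMLFilterImpl callbacks. TinyWriter is a hypothetical class: it writes no XML declaration, ignores namespaces, and lacks the one-argument convenience methods used in the example above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical, much simplified writer: turns SAX events back into text.
class TinyWriter extends XMLFilterImpl {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
        super.startElement(uri, localName, qName, atts); // pass the event along
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        out.append(escape(new String(ch, start, length)));
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        out.append("</").append(qName).append('>');
        super.endElement(uri, localName, qName);
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    String result() { return out.toString(); }

    // Drives the writer by hand, the way the example drives XMLWriter.
    static String greeting() throws SAXException {
        TinyWriter w = new TinyWriter();
        w.startDocument();
        w.startElement("", "greeting", "greeting", new AttributesImpl());
        char[] text = "Hello, world!".toCharArray();
        w.characters(text, 0, text.length);
        w.endElement("", "greeting", "greeting");
        w.endDocument();
        return w.result();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(greeting());
    }
}
```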
Transparency No. 82