Transcript Document
Before today's lecture
• Personal Project
  – Due date (including the demo of your work): 4/12
  – Grading scheme:

    Item            Contents                                                  Weight
    Application     All XML documents, schema documents, application         50%
                    source code, web-based interfaces, other source code
    Paper           Project paper                                             40%
    Demonstration   Design layout and functionality                           10%

• Final Project
  – Group members:
    • Deadline for forming your group: before 4/10
    • Send the name list of your group members to 尚純 or 紹楷.
    • For those who can't form a team, we'll make a group for you. The group members will be posted on 4/12.
    • If you want to make a change, the deadline is 4/15.
  – Project topics:
    • Topics will be posted on the web; pick one and send your topic to 尚純 or 紹楷.
    • Alternatively, send a proposal for a topic of your own.
    • The proposal should include reference information on the topic and the scope of the project.
• Teaching Assistants: 吳尚純, 李紹楷

Simple API for XML (SAX)
Is SAX too hard for mortal programmers? And is the domination of DOM a bad thing?

Outline
• Introduction
• XML Parsing Operations
• The SAX API
• How SAX Processing Works
• SAX-based parsers
• Events
• A SAX Example: Step by Step
• Example (SAX 1.0): Tree Diagram
• SAX 2.0
• Example: Printing the notes in an XML document
• Summary

Introduction
• Processing XML
  – Create a parser object, point the object to an XML document, then process the document.
• Basic operations for processing an XML document
  – A basic XML processing architecture has 3 key layers: the XML documents, the application, and the infrastructure for working with XML documents.
  – [Figure labels: XML document(s), character stream, parser, serializer, standardized XML APIs, application]
  – Parsing is the first step that enables an application to work with an XML document.
  – The parsing process breaks up the text of an XML document into small identifiable pieces (nodes).
  – The parser recognizes these pieces as start and end tags, attribute-value pairs, chunks of text content, processing instructions, comments, and so on.
  – These pieces are fed to the application through well-defined APIs that implement a particular parsing model.
  – Four parsing models are commonly in use:
    1. Pull parsing
       • The application always asks the parser to give it the next piece of information.
       • It is as if the application has to "pull" the information out of the parser; the application initiates the communication.
       • The XML community has not yet defined standard APIs for pull parsing.
       • That could happen soon because of its popularity!
    2. Push parsing
       • The parser sends notifications to the application during the parsing process.
       • The notifications are sent in "reading" order (i.e., the order in which the items appear in the document).
       • Notifications are typically implemented as event callbacks in the application.
       • Also known as event-based parsing.
       • Simple API for XML (SAX) is the standard for push parsing.
    3. One-step parsing
       • The parser reads the whole XML document and generates a data structure (a parse tree) describing its entire contents (elements, attributes, etc.).
       • W3C standard: the XML DOM (Document Object Model) specifies the types of objects that will be included in the parse tree, their properties, and their operations.
       • The DOM is a language- and platform-independent API.
       • Its biggest problems are memory overhead and computational cost.
    4. Hybrid parsing
       • Combines characteristics of the other parsing models to create efficient parsers for special scenarios.
       • Let's separate the concepts of loading and parsing to analyse the situation:
         – Loading the document: one-step parsing
         – Parsing the rest of the document: providing partial information extracted from the document to the application
       • For example, push + one-step parsing:
         – The application thinks it is working with a one-step parser; in reality, the parsing process has just begun.
         – As the application keeps accessing more objects in the DOM tree, the parsing continues incrementally.
         – Just enough of the XML document is parsed at any given point to give the application the objects it wants to see.

An example of hybrid parsing
• In Sun's reference implementation, the DOM API builds on the SAX API. [Diagram not reproduced in the transcript]
• Sun's implementation of the Document Object Model (DOM) API uses the SAX libraries to read in XML data and construct the tree of data objects that constitutes the DOM.
• Sun's implementation also provides a framework to help output the object tree as XML data.

• Why define so many models?
  – There are trade-offs between memory efficiency, computational efficiency, and ease of programming.
  – The following table compares the trade-offs of the models:

    Model             Control of parsing   Control of context   Memory efficiency   Computational efficiency   Ease of programming
    Pull              Application          Application          High                Highest                    Low
    Push (SAX)        Parser               Application          High                High                       Low
    One-step (DOM)    Parser               Parser               Lowest              Lowest                     High
    One-step (JDOM)   Parser               Parser               Low                 Low                        Highest
    Hybrid (DOM)      Parser               Parser               Medium              Medium                     High
    Hybrid (JDOM)     Parser               Parser               Medium              Medium                     Highest

• How to choose between SAX and DOM: whether you choose DOM or SAX depends on several factors:
  – Purpose of the application:
    • If you need to make changes to the data and output it as XML, then in most cases DOM is the way to go.
    • With SAX this is much more complex to program, as you'd have to make changes to a copy of the data rather than to the data itself.
  – Amount of data: for large files, SAX is a better bet.
  – How the data will be used: if only a small amount of the data will actually be used, you may be better off using SAX to extract it into your application. On the other hand, if you know that you will need to refer back to large amounts of information that has already been processed, SAX is probably not the right choice.
  – The need for speed: SAX implementations are normally faster than DOM implementations.
• It's important to remember that SAX and DOM are not mutually exclusive:
  – You can use DOM to create a stream of SAX events.
  – You can use SAX to create a DOM tree.
  – In fact, most parsers used to create DOM trees actually use SAX to do it!

The SAX APIs
• SAX (the Simple API for XML)
  – SAX was originally a Java-only API.
  – SAX was the first widely adopted API for XML in Java, and is a "de facto" standard.
  – The current version is SAX 2.0.x, and there are versions for several programming language environments other than Java.
  – It is another method for accessing an XML document's contents.
  – It was developed by members of the XML-DEV mailing list.
  – It uses an event-based model: notifications (events) are raised as the document is parsed.
• SAX parsing architecture (uses the common abstract factory design pattern):
  1. Create an instance of SAXParserFactory, which is used to create an instance of SAXParser.
  2. SAXReader is the event trigger: when the parse() method is invoked, the reader starts firing events to the application by invoking registered callbacks.
  3. Those callback methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.
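The three architecture steps above can be sketched in a few lines of Java. This is only an illustrative sketch, not code from the lecture: the class names SaxArchitectureSketch and EventLog are hypothetical, and it assumes the JAXP implementation bundled with a current JDK, where the "SAXReader" of the slides is an org.xml.sax.XMLReader obtained from SAXParser.getXMLReader().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxArchitectureSketch {

    // Hypothetical handler: records the name of each callback the reader fires.
    static class EventLog extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            events.add("startElement:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement:" + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    // Factory -> parser -> reader -> registered callbacks, as in steps 1-3.
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // step 1
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();                  // step 2
        EventLog log = new EventLog();
        reader.setContentHandler(log);                             // step 3
        reader.setErrorHandler(log);
        reader.parse(new InputSource(new StringReader(xml)));
        return log.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<samples><server>UNIX</server></samples>")) {
            System.out.println(event);
        }
    }
}
```

Registering the handler on the reader, rather than passing it to SAXParser.parse(), is a choice made here only to make the reader's role visible; both routes deliver the same callbacks.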
• Here is a summary of the key objects in the SAX APIs:
  – SAXParserFactory: creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
  – SAXParser: defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
  – SAXReader: carries on the conversation with the SAX event handlers you define. (In SAX 2.0 the reader is an org.xml.sax.XMLReader, which SAXParser.getXMLReader() returns.)
  – DefaultHandler: implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces with empty (do-nothing) methods, so you can override only the ones you're interested in.
  – ContentHandler: defines methods such as startDocument, endDocument, startElement, and endElement, which are invoked as the parser recognizes the document structure, and the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
  – ErrorHandler: defines methods that are invoked in response to various parsing errors.
  – DTDHandler: defines methods you will generally never be called upon to use. It is used when processing a DTD to recognize and act on declarations for an unparsed entity.
• Being event-based means that the parser reads an XML document from beginning to end.
• Each time it recognizes a syntax construct, it notifies the application that is running it.
• The SAX parser notifies the application by calling methods from the ContentHandler interface.
• For example:
  – when the parser comes to a start tag (a less-than symbol, "<", followed by an element name), it calls the startElement method;
  – when it comes to character data, it calls the characters method;
  – when it comes to an end tag (the less-than symbol followed by a slash, "</"), it calls the endElement method.
• To illustrate, let's look at an example XML document and walk through what the parser does for each line.

How SAX Processing Works
• SAX analyzes an XML stream as it goes by, much like an old ticker tape.
• Consider the following XML code snippet:

    <?xml version="1.0"?>
    <samples>
      <server>UNIX</server>
      <monitor>color</monitor>
    </samples>

• A SAX processor analyzing this code snippet would generate, in general, the following events:

    Start document
    Start element (samples)
    Characters (white space)
    Start element (server)
    Characters (UNIX)
    End element (server)
    Characters (white space)
    Start element (monitor)
    Characters (color)
    End element (monitor)
    Characters (white space)
    End element (samples)

• The SAX API allows a developer to capture these events and act on them.
  – What does "the developer" stand for here?
• SAX processing involves the following steps:
  1. Create an event handler.
  2. Create the SAX parser.
  3. Assign the event handler to the parser.
  4. Parse the document, sending each event to the handler.
• The pros and cons of event-based processing
  – The advantages of this kind of processing are much like the advantages of streaming media. (Like an interpreter?)
  – Analysis can get started immediately, rather than waiting for all of the data to be processed.
  – Because the application is simply examining the data as it goes by, it doesn't need to store it in memory, which is a huge advantage when it comes to large documents.
  – In fact, an application doesn't even have to parse the entire document; it can stop when certain criteria have been satisfied.
  – In general, SAX is also much faster than the alternative, the DOM.
  – On the other hand, because the application is not storing the data in any way, it is impossible to make changes to it using SAX, or to move backwards in the data stream.

SAX-based Parsers
• SAX-based parsers
  – The textbook uses Sun Microsystems' JAXP.
• Tools
  – A text editor: XML files are simply text. To create and read them, a text editor is all you need.
  – Java 2 SDK, Standard Edition version 1.4.x: SAX support is built into this version of Java (available at http://java.sun.com/j2se/1.4.2/download.html), so you won't need to install any separate classes. If you are using an earlier version of Java, such as Java 1.3.x, you'll also need one of the following:
    • an XML parser such as the Apache project's Xerces-Java (available at http://xml.apache.org/xerces2-j/index.html), or
    • Sun's Java API for XML Parsing (JAXP), part of the Java Web Services Developer Pack (available at http://java.sun.com/webservices/downloads/webservicespack.html).
    • You can also download the official version from SourceForge (available at http://sourceforge.net/project/showfiles.php?group_id=29449).
  – Other languages: should you wish to adapt the examples, SAX implementations are also available in other programming languages. You can find information on C, C++, Visual Basic, Perl, and Python implementations of a SAX parser at http://www.saxproject.org/?selected=langs.

Some SAX-based parsers

    Product     Description
    JAXP        Sun's JAXP is available from java.sun.com/xml. JAXP supports both SAX and DOM.
    Xerces      Apache's Xerces parser is available at www.apache.org. Xerces supports both SAX and DOM.
    MSXML 3.0   Microsoft's MSXML parser is available at msdn.microsoft.com/xml. This parser supports both SAX and DOM.

Setup
• Java applications to illustrate the SAX API
  – Java 2 Standard Edition required
    • Download at www.java.sun.com/j2se
    • Installation instructions: www.deitel.com/faq/java3install.htm
  – JAXP required
    • Download at java.sun.com/xml/download.html

Events
• SAX parser
  – Invokes certain methods (Fig. 9.2) when events occur
  – Programmers override these methods to process data

Fig. 9.2 Methods invoked by the SAX parser

    Method name             Description
    setDocumentLocator      Invoked at the beginning of parsing.
    startDocument           Invoked when the parser encounters the start of an XML document.
    endDocument             Invoked when the parser encounters the end of an XML document.
    startElement            Invoked when the start tag of an element is encountered.
    endElement              Invoked when the end tag of an element is encountered.
    characters              Invoked when text characters are encountered.
    ignorableWhitespace     Invoked when whitespace that can be safely ignored is encountered.
    processingInstruction   Invoked when a processing instruction is encountered.

The SAX API – an Example

    <priceList>                      [parser calls startElement]
      <coffee>                       [parser calls startElement]
        <name>Mocha Java</name>      [parser calls startElement, characters, and endElement]
        <price>11.95</price>         [parser calls startElement, characters, and endElement]
      </coffee>                      [parser calls endElement]
    </priceList>                     [parser calls endElement]

• The default implementations of the methods that the parser calls do nothing.
• You need to write a subclass implementing the appropriate methods to get the functionality you want.
• For example, suppose you want to get the price per pound for Mocha Java.
• You would write a class extending DefaultHandler (the default implementation of ContentHandler) in which you write your own implementations of the methods startElement and characters.
• Your code has three tasks:
  – Scan the command line for the name (or URI) of an XML file.
  – Create a parser object.
  – Tell the parser object to parse the XML file named on the command line, and tell it to send your code all of the SAX events it generates.
• Step I: Scan the command line
  – Look for an argument. If there isn't one, print an error message and exit.
  – Otherwise, assume that the first argument is the name or URI of an XML file.

    public static void main(String argv[]) {
        if (argv.length == 0 ||
            (argv.length == 1 && argv[0].equals("-help"))) {
            // Print an error message and exit...
        }
        PrintOutline s1 = new PrintOutline();
        s1.parseURI(argv[0]);
    }
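The Mocha Java price lookup described above can be sketched as follows. This is an illustrative sketch rather than the lecture's own code: the class name MochaPrice and the method findPrice are hypothetical. The slide names startElement and characters as the methods to override; the sketch also overrides endElement, on the assumption that a parser may deliver an element's text in more than one characters call, so the text is only inspected once the element has ended.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MochaPrice extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inMocha = false; // true after <name>Mocha Java</name> in the current <coffee>
    private String price;            // stays null if no matching price is found

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        text.setLength(0);           // start collecting text for this element
        if (qName.equals("coffee")) {
            inMocha = false;         // each coffee entry starts unmatched
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("name")) {
            inMocha = value.equals("Mocha Java");
        } else if (qName.equals("price") && inMocha && price == null) {
            price = value;
        }
        text.setLength(0);
    }

    // Hypothetical helper: returns the price text, or null if none was found.
    static String findPrice(String xml) throws Exception {
        MochaPrice handler = new MochaPrice();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.price;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<priceList><coffee><name>Mocha Java</name>"
                   + "<price>11.95</price></coffee></priceList>";
        System.out.println(findPrice(xml));
    }
}
```

The handler keeps only two pieces of state (whether the current coffee entry is Mocha Java, and the price once found), which is typical of SAX code: the application, not the parser, tracks the context it cares about.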
• Step II: Create a parser object
  – To create a parser object, use JAXP's SAXParserFactory API to create a SAXParser.

    public void parseURI(String uri) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            ...

• Step III: Parse the file and handle any events
  – Now that we've created our parser object, we need to have it parse the file. That's done with the parse() method.
  – Notice that the parse() method takes two arguments. The first is the URI of the XML document, while the second is an object that implements the SAX event handlers.

    public void parseURI(String uri) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            sp.parse(uri, this);
        } catch (Exception e) {
            System.err.println(e);
        }
    }

  – In the case of PrintOutline, you're extending the SAX DefaultHandler class:

    public class PrintOutline extends DefaultHandler {
        …….
    }

  – DefaultHandler implements all of the event handlers with methods that do nothing, so your code only has to implement handlers for the events you care about. This is its major benefit: it shields you from having to implement every event handler yourself.
  – Note: The exception handling above is sloppy; as an exercise for the reader, feel free to handle specific exceptions, such as SAXException or java.io.IOException.
• Step IV: Implementing event handlers
  – startDocument() event handler
    • Simply writes out a basic XML declaration, regardless of whether one was in the original XML document or not.
    • Currently the base SAX API doesn't return the details of the XML declaration.

    public void startDocument() {
        System.out.println("<?xml version=\"1.0\"?>");
    }
• Next, here's what you do for startElement():
  – Print the name of the element and its attributes.
  – Namespace URI: in braces before the element's local name.
  – rawName contains the raw XML 1.0 name, which is the name to use when the element has no namespace URI.

    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes attrs) {
        System.out.print("<");
        System.out.print(rawName);
        if (attrs != null) {
            int len = attrs.getLength();
            for (int i = 0; i < len; i++) {
                System.out.print(" ");
                System.out.print(attrs.getQName(i));
                System.out.print("=\"");
                System.out.print(attrs.getValue(i));
                System.out.print("\"");
            }
        }
        System.out.print(">");
    }

• More event handling
  – characters(): since we are printing the XML document to the console, simply print the portion of the character array that relates to this event.

    public void characters(char ch[], int start, int length) {
        System.out.print(new String(ch, start, length));
    }

  – endElement(): simply writes out the end tag.
  – endDocument(): not really needed; included just for completeness.

    public void endElement(String namespaceURI, String localName,
                           String rawName) {
        System.out.print("</");
        System.out.print(rawName);
        System.out.print(">");
    }

    public void endDocument() {
        System.out.println("End of Document");
    }

• Step V: Error handling
  – SAX defines the ErrorHandler interface, which DefaultHandler implements.
  – It contains three methods: warning, error, and fatalError (the corresponding conditions are defined by the XML specification):
    • warning(): issued in response to a warning.
    • error(): issued in response to an error condition.
    • fatalError(): issued in response to a fatal error.
  – getLocationString() in the code below is a helper that is not shown on the slide.

    public void warning(SAXParseException ex) {
        System.err.println("[Warning] " +
            getLocationString(ex) + ": " + ex.getMessage());
    }

    public void error(SAXParseException ex) {
        System.err.println("[Error] " +
            getLocationString(ex) + ": " + ex.getMessage());
    }

    public void fatalError(SAXParseException ex) throws SAXException {
        System.err.println("[Fatal Error] " +
            getLocationString(ex) + ": " + ex.getMessage());
        throw ex;
    }

Example: Tree Diagram
• Java application
  – Parses an XML document with a SAX-based parser
  – Outputs the document data as a tree diagram
  – Extends org.xml.sax.HandlerBase, which
    • implements interface EntityResolver (handles external entities)
    • implements interface DTDHandler (handles notations and unparsed entities)
    • implements interface DocumentHandler (handles parsing events)
    • implements interface ErrorHandler (handles errors)

Fig. 9.3 Application to create a tree diagram for an XML document.

    // Fig. 9.3 : Tree.java
    // Using the SAX Parser to generate a tree diagram.

    import java.io.*;
    import org.xml.sax.*;  // for HandlerBase class
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Tree extends HandlerBase {
       private int indent = 0;  // indentation counter

       // returns the spaces needed for indenting (assists in formatting)
       private String spacer( int count )
       {
          String temp = "";

          for ( int i = 0; i < count; i++ )
             temp += " ";

          return temp;
       }

       // method called before parsing
       // it provides the document location (the parsed document's URL)
       public void setDocumentLocator( Locator loc )
       {
          System.out.println( "URL: " + loc.getSystemId() );
       }

       // method called at the beginning of a document
       public void startDocument() throws SAXException
       {
          System.out.println( "[ document root ]" );
       }

       // method called at the end of the document
       public void endDocument() throws SAXException
       {
          System.out.println( "[ document end ]" );
       }

       // method called at the start tag of an element
       public void startElement( String name,
          AttributeList attributes ) throws SAXException
       {
          System.out.println( spacer( indent++ ) +
             "+-[ element : " + name + " ]");

          // output each attribute's name and value (if any)
          if ( attributes != null )

             for ( int i = 0; i < attributes.getLength(); i++ )
                System.out.println( spacer( indent ) +
                   "+-[ attribute : " + attributes.getName( i ) +
                   " ] \"" + attributes.getValue( i ) + "\"" );
       }

       // method called at the end tag of an element
       public void endElement( String name ) throws SAXException
       {
          indent--;
       }

       // method called when a processing instruction is found
       public void processingInstruction( String target,
          String value ) throws SAXException
       {
          System.out.println( spacer( indent ) +
             "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
       }

       // method called when characters are found
       public void characters( char buffer[], int offset,
          int length ) throws SAXException
       {
          if ( length > 0 ) {
             String temp = new String( buffer, offset, length );

             System.out.println( spacer( indent ) +
                "+-[ text ] \"" + temp + "\"" );
          }
       }

       // method called when ignorable whitespace is found
       public void ignorableWhitespace( char buffer[],
          int offset, int length )
       {
          if ( length > 0 ) {
             System.out.println( spacer( indent ) + "+-[ ignorable ]" );
          }
       }

       // method called on a non-fatal (validation) error
       public void error( SAXParseException spe )
          throws SAXParseException
       {
          // treat non-fatal errors as fatal errors
          throw spe;
       }

       // method called on a parsing warning
       // (a problem is detected, but it is not considered an error)
       public void warning( SAXParseException spe )
          throws SAXParseException
       {
          System.err.println( "Warning: " + spe.getMessage() );
       }

       // main method: starts the application
       public static void main( String args[] )
       {
          boolean validate = false;

          // command-line arguments say whether to validate the document
          if ( args.length != 2 ) {
             System.err.println( "Usage: java Tree [validate] " +
                "[filename]\n" );
             System.err.println( "Options:" );
             System.err.println( "    validate [yes|no] : " +
                "DTD validation" );
             System.exit( 1 );
          }

          if ( args[ 0 ].equals( "yes" ) )
             validate = true;

          // SAXParserFactory can instantiate a SAX-based parser
          SAXParserFactory saxFactory =
             SAXParserFactory.newInstance();

          saxFactory.setValidating( validate );

          try {
             // instantiate the SAX-based parser and parse the file
             SAXParser saxParser = saxFactory.newSAXParser();
             saxParser.parse( new File( args[ 1 ] ), new Tree() );
          }
          // handle errors (if any)
          catch ( SAXParseException spe ) {
             System.err.println( "Parse Error: " + spe.getMessage() );
          }
          catch ( SAXException se ) {
             se.printStackTrace();
          }
          catch ( ParserConfigurationException pce ) {
             pce.printStackTrace();
          }
          catch ( IOException ioe ) {
             ioe.printStackTrace();
          }

          System.exit( 0 );
       }
    }

Fig. 9.4 XML document spacing1.xml.
• The XML document does not reference a DTD.
• It has the elements test, example and object.
• The root element test contains the attribute name with the value " spacing 1 ".

    1  <?xml version = "1.0"?>
    2
    3  <!-- Fig. 9.4 : spacing1.xml             -->
    4  <!-- Whitespaces in nonvalidating parsing -->
    5  <!-- XML document without DTD            -->
    6
    7  <test name = " spacing 1 ">
    8     <example><object>World</object></example>
    9  </test>

Output:

    URL: file:C:/Tree/spacing1.xml
    [ document root ]
    +-[ element : test ]
     +-[ attribute : name ] " spacing 1 "
     +-[ text ] " "
     +-[ text ] " "
     +-[ element : example ]
      +-[ element : object ]
       +-[ text ] "World"
     +-[ text ] " "
    [ document end ]

• Note that whitespace is preserved: the attribute value (line 7), the line feed (end of line 7), the indentation (line 8) and the line feed (end of line 8).

Fig. 9.5 XML document spacing2.xml.
• The DTD checks the document's characters, so any "removable" whitespace is ignorable.

    1  <?xml version = "1.0"?>
    2
    3  <!-- Fig. 9.5 : spacing2.xml             -->
    4  <!-- Whitespace and nonvalidated parsing -->
    5  <!-- XML document with DTD               -->
    6
    7  <!DOCTYPE test [
    8  <!ELEMENT test (example)>
    9  <!ATTLIST test name CDATA #IMPLIED>
    10 <!ELEMENT element (object*)>
    11 <!ELEMENT object (#PCDATA)>
    12 ]>
    13
    14 <test name = " spacing 2 ">
    15    <example><object>World</object></example>
    16 </test>

Output:

    URL: file:C:/Tree/spacing2.xml
    [ document root ]
    +-[ element : test ]
     +-[ attribute : name ] " spacing 2 "
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
      +-[ element : object ]
       +-[ text ] "World"
     +-[ ignorable ]
    [ document end ]

• The line feed at line 14, the spaces at the beginning of line 15 and the line feed at line 15 are ignored.

Fig. 9.6 Well-formed XML document.
• The document is invalid because element example cannot contain element item.

    <?xml version = "1.0"?>

    <!-- Fig. 9.6 : notvalid.xml -->
    <!-- Validation and non-validation -->

    <!DOCTYPE test [
    <!ELEMENT test (example)>
    <!ELEMENT example (#PCDATA)>
    ]>

    <test>
       <?test message?>
       <example><item><![CDATA[Hello & Welcome!]]></item></example>
    </test>

Output with validation disabled:

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ proc-inst : test ] "message"
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
      +-[ element : item ]
       +-[ text ] "Hello & Welcome!"
     +-[ ignorable ]
    [ document end ]
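The effect of the validation switch that Tree.java sets with saxFactory.setValidating( validate ) can be sketched in isolation. This is a hedged illustration, not part of the lecture's code: the class name ValidationToggle and the method parses are hypothetical, the document is a cut-down stand-in with the same shape as notvalid.xml, and the behaviour assumes the JAXP parser bundled with a current JDK. As in Tree.java, the handler rethrows validation errors so that they stop the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationToggle {

    // Same shape as notvalid.xml: the DTD says example holds only text,
    // but the instance nests an item element inside it.
    static final String NOT_VALID =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE test [\n"
        + "<!ELEMENT test (example)>\n"
        + "<!ELEMENT example (#PCDATA)>\n"
        + "]>\n"
        + "<test><example><item>Hello</item></example></test>";

    // Returns true if the document parses without a reported error.
    static boolean parses(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);

        // Treat non-fatal (validation) errors as fatal, as Tree.java does.
        DefaultHandler strict = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };

        try {
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), strict);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validation off: " + parses(NOT_VALID, false));
        System.out.println("validation on:  " + parses(NOT_VALID, true));
    }
}
```

Without the overridden error method, DefaultHandler would silently ignore the validation error and the parse would run to completion even with validation enabled, which is why Tree.java rethrows it.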
• In the first output for notvalid.xml, validation is disabled, so the document parses successfully.
• The parser does not process the text in the CDATA section and returns it as character data.

Fig. 9.6 Well-formed XML document. (Part 2)

Output with validation enabled:

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ proc-inst : test ] "message"
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
    Parse Error: Element "example" does not allow "item"

• Parsing terminates when the fatal error occurs at element item.

Fig. 9.7 Checking an XML document without a DTD for validity.

    <?xml version = "1.0"?>

    <!-- Fig. 9.7 : valid.xml -->
    <!-- DTD-less document    -->

    <test>
       <example>Hello &amp; Welcome!</example>
    </test>

Output with validation disabled:

    URL: file:C:/Tree/valid.xml
    [ document root ]
    +-[ element : test ]
     +-[ text ] " "
     +-[ text ] " "
     +-[ element : example ]
      +-[ text ] "Hello "
      +-[ text ] "&"
      +-[ text ] " Welcome!"
     +-[ text ] " "
    [ document end ]

Output with validation enabled:

    URL: file:C:/Tree/valid.xml
    [ document root ]
    Warning: Valid documents must have a <!DOCTYPE declaration.
    Parse Error: Element type "test" is not declared.

• Validation is disabled in the first output, so the document parses successfully.
• Validation is enabled in the second output, and parsing fails because the DTD does not exist.

Example: Tree Diagram (Summary)
• SAX 1.0 is supported!
• When compiling, the compiler reports "Tree.java uses or overrides a deprecated API" and "Recompile with -deprecation for details".
• After compiling, 3 warnings (class has been deprecated) were issued:
  1. HandlerBase should be replaced by DefaultHandler.
  2. & 3. AttributeList should be replaced by Attributes.
• It is better to replace SAX 1.0 with SAX 2.0.
• Problem with Xerces vs. JAXP

SAX 2.0
• SAX 2.0
  – Recently released
  – We have been using JAXP
  – The Xerces parser (Apache) supports SAX 2.0
• SAX 2.0 major changes
  – Class HandlerBase is replaced with DefaultHandler
  – AttributeList is replaced with Attributes
  – Element and attribute processing supports namespaces
  – The loading and parsing processes have changed
    • Alternative methods can be applied
  – There are methods for retrieving and setting parser properties
    • e.g., whether the parser performs validation

Fig. 9.10 Java application that indents an XML document.

    // Fig. 9.10 : printXML.java
    // Using the SAX Parser to indent an XML document.

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    // class HandlerBase is replaced with class DefaultHandler
    public class PrintXML extends DefaultHandler {
       private int indent = 0;  // indention counter

       // returns the spaces needed for indenting
       private String spacer( int count )
       {
          String temp = "";

          for ( int i = 0; i < count; i++ )
             temp += " ";

          return temp;
       }

       // method called at the beginning of a document
       // (provides the same service as in SAX 1.0)
       public void startDocument() throws SAXException
       {
          System.out.println( "<?xml version = \"1.0\"?>" );
       }

       // method called at the end of the document
       // (provides the same service as in SAX 1.0)
       public void endDocument() throws SAXException
       {
          System.out.println( "---[ document end ]---" );
       }

       // method called at the start tag of an element
       // startElement now has four arguments: namespace URI, element
       // name, qualified element name and element attributes
       public void startElement( String uri, String eleName,
          String raw, Attributes attributes ) throws SAXException
       {
          System.out.print( spacer( indent ) + "<" + raw );

          // attributes are now stored in an Attributes object
          if ( attributes != null )

             for ( int i = 0; i < attributes.getLength(); i++ )
                System.out.print( " " + attributes.getLocalName( i ) +
                   " = " + "\"" +
                   attributes.getValue( i ) + "\"" );

          System.out.println( ">" );
          indent += 3;
       }

       // method called at the end tag of an element
       // endElement now has three arguments: namespace URI, element
       // name and qualified element name
       public void endElement( String uri, String eleName,
          String raw ) throws SAXException
       {
          indent -= 3;
          System.out.println( spacer(indent) + "</" + raw + ">");
       }

       // method called when characters are found
       // (provides the same service as in SAX 1.0)
       public void characters( char buffer[], int offset,
          int length ) throws SAXException
       {
          if ( length > 0 ) {
             String temp = new String( buffer, offset, length );

             if ( !temp.trim().equals( "" ) )
                System.out.println( spacer(indent) + temp.trim() );
          }
       }

       // method called when a processing instruction is found
       // (provides the same service as in SAX 1.0)
       public void processingInstruction( String target,
          String value ) throws SAXException
       {
          System.out.println( spacer( indent ) +
             "<?" + target + " " + value + "?>");
       }

       // main method
       public static void main( String args[] )
       {

          try {
             // create the Xerces SAX-based parser
             XMLReader saxParser = ( XMLReader ) Class.forName(
                "org.apache.xerces.parsers.SAXParser" ).newInstance();

             saxParser.setContentHandler( new PrintXML() );
             FileReader reader = new FileReader( args[ 0 ] );

             // the SAX-based parser parses an InputSource
             saxParser.parse( new InputSource( reader ) );
          }
          catch ( SAXParseException spe ) {
             System.err.println( "Parse Error: " + spe.getMessage() );
          }
          catch ( SAXException se ) {
             se.printStackTrace();
          }
          catch ( Exception e ) {
             e.printStackTrace();
          }

          System.exit( 0 );
       }
    }

• To obtain the parser through JAXP instead of instantiating the Xerces class directly, replace the try block in main (lines 86-92 of the original numbered listing, from the creation of the XMLReader through the parse call) with the following code; the existing catch clauses stay as they are:

    XMLReader xmlReader = null;

    try {
       SAXParserFactory spfactory = SAXParserFactory.newInstance();
       SAXParser saxParser = spfactory.newSAXParser();
       xmlReader = saxParser.getXMLReader();
       xmlReader.setContentHandler( new PrintXML() );
       xmlReader.setErrorHandler( new PrintXML() );
       FileReader reader = new FileReader( args[ 0 ] );
       xmlReader.parse( new InputSource( reader ) );
    }

Fig. 9.11 Sample execution of printXML.java
• The document contains a processing instruction that links to a stylesheet.

    <?xml version = "1.0"?>

    <!-- Fig. 9.11 : test.xml -->

    <?xml:stylesheet type = "text/xsl" href = "something.xsl"?>

    <test>
       <example value = "100">Hello and Welcome!</example>

       <a>
          <b>12345</b>
       </a>
    </test>

Output:

    <?xml version = "1.0"?>
    <?xml:stylesheet type = "text/xsl" href = "something.xsl"?>
    <test>
       <example value = "100">
          Hello and Welcome!
      </example>
      <a>
         <b>
            12345
         </b>
      </a>
   </test>
   ---[ document end ]---

56 Output

57 Summary
• SAX is a faster, more lightweight way to read and manipulate XML data than the Document Object Model (DOM).
• SAX is an event-based processor that allows you to deal with elements, attributes, and other data as they show up in the original document (streaming events).
• Because of this architecture, SAX is a read-only system, but that doesn't prevent you from using the data. Make a copy and process it!

58 Summary (cont.)
• Resources
– For a basic grounding in XML, read through the "Introduction to XML" tutorial (developerWorks, August 2002).
– See the official SAX 2.0 page (http://www.saxproject.org).
– Learn to use a SAX filter to manipulate data (developerWorks, October 2001).
– Read about using SAX filters for flexible processing (developerWorks, March 2003).
– Find out how to build SAX-like apps in PHP (developerWorks, March 2003).
– Learn how to set up a SAX parser (developerWorks, July 2003).
– Learn more about validation and the SAX ErrorHandler interface (developerWorks, June 2001).
– Understand how to stop a SAX parser when you have enough data (developerWorks, June 2002).
– Explore XSL transformations to and from a SAX stream (developerWorks, July 2002).
– Turn a SAX stream into a DOM or JDOM object with "Converting from SAX" (developerWorks, April 2001).
– Download the Java 2 SDK, Standard Edition version 1.4.2 (http://java.sun.com/j2se/1.4.2/download.html).
– SAX was developed by the members of the XML-DEV mailing list. Try the Java version, now a SourceForge project (http://sourceforge.net/project/showfiles.php?group_id=29449).
– Try the SAX implementations available in other programming languages.
– Get IBM's XML-related tools such as the DB2 XML Extender, which provides a bridge between XML and relational systems. Visit the DB2 Developer Domain to learn more about DB2.
– Find out how you can become an IBM Certified Developer in XML and related technologies.

59 That’s it for today! Have a nice and lovely spring holiday!
• Do not forget to check the web site for important messages regarding the demo date of your personal project.

60 getLocationString()
• This private method gives more details about the error.
• The SAXParseException class defines methods such as getLineNumber() and getColumnNumber() to provide the line and column number where the error occurred.
• getLocationString merely formats this information into a useful string.
• Putting this code into a separate method means you don't have to repeat it in every error handler.

   private String getLocationString( SAXParseException ex )
   {
      StringBuffer str = new StringBuffer();

      // keep only the file name part of the system identifier
      String systemId = ex.getSystemId();
      if ( systemId != null ) {
         int index = systemId.lastIndexOf( '/' );
         if ( index != -1 )
            systemId = systemId.substring( index + 1 );
         str.append( systemId );
      }

      // append ":line:column"
      str.append( ':' );
      str.append( ex.getLineNumber() );
      str.append( ':' );
      str.append( ex.getColumnNumber() );

      return str.toString();
   }

61 Processing Instruction
• Processing Instructions
• An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data.
• Processing instructions have the following format: <?target instructions?>

62
• At the most basic level:
– An application can directly output XML markup
– In the figure, this is indicated by the application working with a character stream
– Simple? Not really: the application must handle all the basic syntax rules (start and end tags, attribute quoting, etc.). A good topic for the final project!
• Parsing and serialization:
– Parse the XML document first
– Construct a data structure describing the XML document
– Serialize, i.e., emit XML markup from the data structure
– Use the API for the processing methods
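
The point above about an application writing XML markup directly to a character stream can be made concrete with a small sketch. It covers only the syntax rules the slide names (matching start and end tags, attribute quoting) plus escaping of special characters. The class TinyXmlWriter and its methods escape and element are hypothetical names introduced for illustration; they are not part of the lecture code or of the SAX API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: writes one XML element as text.
public class TinyXmlWriter {

    // Replace characters that must not appear literally in text or
    // in double-quoted attribute values.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Build <name attr="value" ...>text</name>, so the start and end tags
    // always match and every attribute value is quoted.
    public static String element(String name, Map<String, String> attrs, String text) {
        StringBuilder out = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out.append(' ').append(e.getKey())
               .append("=\"").append(escape(e.getValue())).append('"');
        }
        out.append('>').append(escape(text));
        out.append("</").append(name).append('>');
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("value", "100");
        // prints: <example value="100">Hello &amp; Welcome!</example>
        System.out.println(element("example", attrs, "Hello & Welcome!"));
    }
}
```

Even this small case needs explicit escaping and tag bookkeeping. It does not validate element names or handle nesting, comments or processing instructions, which is why the slide calls hand-written output less simple than it looks.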