XML Tools

Leonidas Fegaras

CSE 6331 © Leonidas Fegaras XML Tools


XML Processing

[Figure: the XML processing pipeline. Components shown: XML document → document parser (well-formedness checks & reference expansion) → XML infoset → document validator (driven by a DTD or XML schema) → XML infoset (annotated) → application, storage system.]

Tools for XML Processing

• DOM: a language-neutral interface for manipulating XML data
  – requires that the entire document be in memory
• SAX: push-based stream processing
  – hard to write non-trivial applications
• XPath: a declarative tree-navigation language
  – beautiful and easy to use
  – is part of many other languages
• XSLT: a language for transforming XML based on templates
  – very ugly!
• XQuery: full-fledged query language
  – influenced by OQL
• XmlPull: pull-based stream processing
  – far better than SAX, but not a standard yet

DOM

The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of XML documents.

The following is part of the DOM interface:

    public interface Node {
        public String getNodeName ();
        public String getNodeValue ();
        public NodeList getChildNodes ();
        public NamedNodeMap getAttributes ();
    }

    public interface Element extends Node {
        public NodeList getElementsByTagName ( String name );
    }

    public interface Document extends Node {
        public Element getDocumentElement ();
    }

    public interface NodeList {
        public int getLength ();
        public Node item ( int index );
    }

Traversing the DOM Tree

• Finding all children of node n with a given tagname:

    NodeList nl = n.getChildNodes();
    for (int i=0; i …   [the rest of the loop is missing from the source]
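Since the loop above is cut off, the following is a minimal sketch of how such a traversal could look; it is not the original slide code. The class name `ChildrenByTag` and the helpers `children` and `parse` are hypothetical names introduced for illustration, and the input is parsed from a string so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class ChildrenByTag {
    // Returns the direct element children of n whose tag name equals tagname.
    static List<Element> children(Node n, String tagname) {
        List<Element> result = new ArrayList<>();
        NodeList nl = n.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node c = nl.item(i);
            // Text and comment nodes are children too, so filter on node type.
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(tagname))
                result.add((Element) c);
        }
        return result;
    }

    // Parses XML text into a DOM document (assumed helper for the demo).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>1</b><c>2</c><b>3</b></a>");
        System.out.println(children(doc.getDocumentElement(), "b").size());
    }
}
```

Unlike `getElementsByTagName`, which returns matching descendants at any depth, this helper looks only at direct children.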

DOM Example

/*[dept="cse"]/tel/text()

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    class Test {
        public static void main ( String args[] ) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("depts.xml"));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i=0; i …   [the rest of the program is missing from the source]
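Because the example is cut off, here is a hedged sketch of one way to finish the idea with plain DOM calls. It assumes the intent is: among the children of the root element, pick those that have a dept child equal to "cse" and collect the text of their tel children. The class name `DeptPhones`, its helpers, and the sample document structure are all assumptions; the layout of depts.xml is not given in the source.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; a sketch, not the original slide code.
class DeptPhones {
    // True if n has an element child named tag whose text equals value.
    static boolean hasChild(Node n, String tag, String value) {
        NodeList kids = n.getChildNodes();
        for (int k = 0; k < kids.getLength(); k++) {
            Node c = kids.item(k);
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && c.getNodeName().equals(tag)
                    && c.getTextContent().equals(value))
                return true;
        }
        return false;
    }

    // Collects the tel text of every root child whose dept is "cse".
    static List<String> phones(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                if (n.getNodeType() != Node.ELEMENT_NODE || !hasChild(n, "dept", "cse"))
                    continue;
                NodeList kids = n.getChildNodes();
                for (int k = 0; k < kids.getLength(); k++) {
                    Node c = kids.item(k);
                    if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals("tel"))
                        result.add(c.getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; the real depts.xml is not shown in the slides.
        String xml = "<people>"
                + "<person><dept>cse</dept><tel>555-0100</tel></person>"
                + "<person><dept>math</dept><tel>555-0199</tel></person>"
                + "</people>";
        System.out.println(phones(xml));
    }
}
```

The amount of bookkeeping needed for a one-line path expression is the point of the comparison: the nested loops and node-type checks are what XPath hides.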

Better Programming

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.util.Vector;

    class Sequence extends Vector {

        Sequence () { super(); }

        Sequence ( String filename ) throws Exception {
            super();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(filename));
            add((Object) doc.getDocumentElement());
        }

        Sequence child ( String tagname ) {
            Sequence result = new Sequence();
            for (int i = 0; i …   [the rest of the class is missing from the source]
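The Sequence idea (a list of nodes with navigation steps that return new lists) can be sketched as follows. This is an illustration under assumptions, not the author's class: it extends `ArrayList<Node>` rather than the raw `Vector`, and `fromString` and `text` are hypothetical additions so the example runs without an input file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// A sequence of DOM nodes supporting chained child steps (sketch only).
class Sequence extends ArrayList<Node> {
    Sequence() { super(); }

    // Hypothetical factory: a sequence holding the root element of the given XML text.
    static Sequence fromString(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Sequence s = new Sequence();
            s.add(doc.getDocumentElement());
            return s;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // For every node in this sequence, collect its element children named tagname.
    Sequence child(String tagname) {
        Sequence result = new Sequence();
        for (Node n : this) {
            NodeList c = n.getChildNodes();
            for (int k = 0; k < c.getLength(); k++) {
                Node m = c.item(k);
                if (m.getNodeType() == Node.ELEMENT_NODE && m.getNodeName().equals(tagname))
                    result.add(m);
            }
        }
        return result;
    }

    // Hypothetical helper: the text content of every node in the sequence.
    List<String> text() {
        List<String> out = new ArrayList<>();
        for (Node n : this) out.add(n.getTextContent());
        return out;
    }
}
```

With this shape, a path such as person/tel below the root reads as `Sequence.fromString(xml).child("person").child("tel").text()`, which keeps each navigation step composable.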

SAX

• SAX (Simple API for XML) lets you process a document as it is being read
  – in contrast to DOM, which requires the entire document to be read before it takes any action
• The SAX API is event based
  – the XML parser sends events, such as the start or the end of an element, to an event handler, which processes the information

Parser Events

• Receive notification of the beginning of a document
    void startDocument ()
• Receive notification of the end of a document
    void endDocument ()
• Receive notification of the beginning of an element
    void startElement ( String namespace, String localName, String qName, Attributes atts )
• Receive notification of the end of an element
    void endElement ( String namespace, String localName, String qName )
• Receive notification of character data
    void characters ( char[] ch, int start, int length )
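To see the order in which these callbacks fire, a small handler can record each one. The sketch below uses the JDK's built-in SAX parser; the class name `EventTracer`, the `trace` helper, and the short SD/SE/C/EE/ED labels are illustrative choices. Character data is buffered and flushed at element boundaries, because a parser is allowed to deliver one run of text in several `characters` calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Records SAX events as short strings (hypothetical helper class).
class EventTracer extends DefaultHandler {
    final List<String> log = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emits buffered character data, ignoring whitespace-only runs.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) log.add("C: " + s);
        text.setLength(0);
    }

    @Override public void startDocument() { log.add("SD"); }
    @Override public void endDocument() { log.add("ED"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        log.add("SE: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        log.add("EE: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the XML text and returns the recorded events.
    static List<String> trace(String xml) {
        try {
            EventTracer tracer = new EventTracer();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracer);
            return tracer.log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String e : trace("<department><deptname>Computer Science</deptname></department>"))
            System.out.println(e);
    }
}
```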

SAX Example: a Printer

    import java.io.FileReader;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // Prints the events it receives back out as XML text
    class Printer extends DefaultHandler {
        public Printer () { super(); }

        public void startDocument () {}

        public void endDocument () { System.out.println(); }

        public void startElement ( String uri, String name, String tag, Attributes atts ) {
            System.out.print("<" + tag + ">");
        }

        public void endElement ( String uri, String name, String tag ) {
            System.out.print("</" + tag + ">");
        }

        public void characters ( char text[], int start, int length ) {
            System.out.print(new String(text,start,length));
        }
    }

The Child Handler

    class Child extends DefaultHandler {
        DefaultHandler next;   // the next handler in the pipeline
        String ptag;           // the tagname of the child
        boolean keep;          // are we keeping or skipping events?
        short level;           // the depth level of the current element

        public Child ( String s, DefaultHandler n ) {
            super();
            next = n;
            ptag = s;
            keep = false;
            level = 0;
        }

        public void startDocument () throws SAXException {
            next.startDocument();
        }

        public void endDocument () throws SAXException {
            next.endDocument();
        }

The Child Handler (cont.)

        public void startElement ( String nm, String ln, String qn, Attributes a ) throws SAXException {
            // an element at depth 1 is a child of the root: decide whether to keep it
            if (level++ == 1)
                keep = ptag.equals(qn);
            if (keep)
                next.startElement(nm,ln,qn,a);
        }

        public void endElement ( String nm, String ln, String qn ) throws SAXException {
            if (keep)
                next.endElement(nm,ln,qn);
            // leaving a depth-1 element: stop forwarding events
            if (--level == 1)
                keep = false;
        }

        public void characters ( char[] text, int start, int length ) throws SAXException {
            if (keep)
                next.characters(text,start,length);
        }
    }

Forming the Pipeline

    class SAX {
        public static void main ( String args[] ) throws Exception {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            SAXParser parser = pf.newSAXParser();
            DefaultHandler handler
                = new Child("gradstudent",
                            new Child("name",
                                      new Printer()));
            parser.parse(new InputSource(new FileReader("cs.xml")), handler);
        }
    }

Pipeline: SAX parser → Child:gradstudent → Child:name → Printer

Input Stream

    <department>
      <deptname>Computer Science</deptname>
      <gradstudent>
        <name>
          <lastname>Smith</lastname>
          <firstname>John</firstname>
        </name>
      </gradstudent>
      ...
    </department>

SAX Events

    SD:
    SE: department
    SE: deptname
    C: Computer Science
    EE: deptname
    SE: gradstudent
    SE: name
    SE: lastname
    C: Smith
    EE: lastname
    SE: firstname
    C: John
    EE: firstname
    EE: name
    EE: gradstudent
    ...
    EE: department
    ED:

Example

    Child: gradstudent → Child: name → Printer

XmlPull

Unlike SAX, you pull events from the document instead of having them pushed to you.
• Create a pull parser:
    XmlPullParser xpp;
    xpp = factory.newPullParser();
• Pull the next event: xpp.next() advances the parser to the next event; xpp.getEventType() returns the type of the current event
• Types of events:
  – START_TAG
  – END_TAG
  – TEXT
  – START_DOCUMENT
  – END_DOCUMENT
• More information at: http://www.xmlpull.org/
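The pull style can be tried without extra libraries by using StAX (`javax.xml.stream`), the pull API bundled with the JDK. Note that this is a stand-in for XmlPull, not XmlPull itself: the method names differ slightly, and the event labels below are mapped onto the XmlPull names by hand. The class name `PullDemo` and the `events` helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Pull-based traversal using StAX as a stand-in for XmlPull (sketch only).
class PullDemo {
    static List<String> events(String xml) {
        try {
            XMLInputFactory f = XMLInputFactory.newInstance();
            // Ask the parser to deliver adjacent character data as one event.
            f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
            List<String> out = new ArrayList<>();
            int type = r.getEventType();          // the reader starts at START_DOCUMENT
            while (true) {
                switch (type) {
                    case XMLStreamConstants.START_DOCUMENT: out.add("START_DOCUMENT"); break;
                    case XMLStreamConstants.START_ELEMENT:  out.add("START_TAG " + r.getLocalName()); break;
                    case XMLStreamConstants.END_ELEMENT:    out.add("END_TAG " + r.getLocalName()); break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!r.isWhiteSpace()) out.add("TEXT " + r.getText());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:   out.add("END_DOCUMENT"); break;
                    default: break;                         // comments, PIs, etc. are ignored
                }
                if (!r.hasNext()) break;
                type = r.next();                            // the application asks for the next event
            }
            r.close();
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String e : events("<a><b>hi</b></a>")) System.out.println(e);
    }
}
```

The control flow is the reverse of SAX: the loop belongs to the application, so it can stop early, skip ahead, or hand the reader to another method.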

Better XmlPull Events

    class Attributes {
        public String[] names;
        public String[] values;
    }

    abstract class Event {
    }

    class StartTag extends Event {
        public String tag;
        public Attributes attributes;
    }

    class EndTag extends Event {
        public String tag;
    }

    class CData extends Event {
        public String text;
    }

    class EOS extends Event {}

Iterators

    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserFactory;

    abstract class Iterator {
        abstract public void open ();    // open the stream iterator
        abstract public void close ();   // close the stream iterator
        abstract public Event next ();   // get the next tuple from stream
    }

    abstract class Filter extends Iterator {
        Iterator input;
    }

Document Reader

    class Document extends Iterator {
        String path;
        int state;
        FileReader reader;
        XmlPullParser xpp;
        static XmlPullParserFactory factory;

        Event getEvent () {
            int eventType = xpp.getEventType();
            if (eventType == XmlPullParser.START_TAG) {
                int len = xpp.getAttributeCount();
                String[] names = new String[len];
                String[] values = new String[len];
                for (int i = 0; i …   [the rest of the method is missing from the source]

Document Reader (cont.)

        public void open () {
            reader = new FileReader(path);
            xpp = factory.newPullParser();
            xpp.setInput(reader);
            state = 0;
        }

        public void close () {
            reader.close();
        }

        public Event next () {
            if (state > 0) {
                state++;
                if (state == 2)
                    return new EOS();
            }
            Event e = getEvent();
            if (xpp.getEventType() != XmlPullParser.END_DOCUMENT)
                xpp.next();
            return e;
        }

The Child Iterator

    class Child extends Filter {
        String tag;
        short nest;     // the nesting level of the event
        boolean keep;   // are we in keeping mode?

        public void open () {
            keep = false;
            nest = 0;
            input.open();
        }

        public Event next () {
            while (true) {
                Event t = input.next();
                if (t instanceof EOS)
                    return t;
                else if (t instanceof StartTag) {
                    if (nest++ == 1) {
                        keep = tag.equals(((StartTag) t).tag);
                        if (!keep)
                            continue;
                    }
                } else if (t instanceof EndTag)
                    if (--nest == 1 && keep) {
                        keep = false;
                        return t;
                    }
                if (keep)
                    return t;
            }
        }
    }
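The behavior of this filter is easier to check on a canned event list than on a live parser. The sketch below repeats the same keep/skip logic over an in-memory source. Everything beyond the event class names is an assumption made for the demo: the constructors, the `EventIterator` interface (used instead of the abstract class so it does not clash with `java.util.Iterator`), `ListSource`, `ChildFilter`, and `ChildFilterDemo`.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Event {}
class StartTag extends Event { final String tag; StartTag(String t) { tag = t; } }
class EndTag extends Event { final String tag; EndTag(String t) { tag = t; } }
class CData extends Event { final String text; CData(String t) { text = t; } }
class EOS extends Event {}

// Minimal pull interface (hypothetical; close() is omitted for brevity).
interface EventIterator {
    void open();
    Event next();
}

// Replays a fixed list of events, then returns EOS forever.
class ListSource implements EventIterator {
    private final List<Event> events;
    private int pos;
    ListSource(List<Event> events) { this.events = events; }
    public void open() { pos = 0; }
    public Event next() { return pos < events.size() ? events.get(pos++) : new EOS(); }
}

// Keeps only the subtrees of depth-1 elements whose tag matches.
class ChildFilter implements EventIterator {
    private final String tag;
    private final EventIterator input;
    private int nest;        // nesting level of the current event
    private boolean keep;    // are we inside a matching child?
    ChildFilter(String tag, EventIterator input) { this.tag = tag; this.input = input; }
    public void open() { keep = false; nest = 0; input.open(); }
    public Event next() {
        while (true) {
            Event t = input.next();
            if (t instanceof EOS) return t;
            if (t instanceof StartTag) {
                if (nest++ == 1) {
                    keep = tag.equals(((StartTag) t).tag);
                    if (!keep) continue;
                }
            } else if (t instanceof EndTag) {
                if (--nest == 1 && keep) { keep = false; return t; }
            }
            if (keep) return t;
        }
    }
}

class ChildFilterDemo {
    // Serializes the events produced by an iterator back into XML text.
    static String render(EventIterator it) {
        StringBuilder sb = new StringBuilder();
        it.open();
        for (Event e = it.next(); !(e instanceof EOS); e = it.next()) {
            if (e instanceof StartTag) sb.append("<").append(((StartTag) e).tag).append(">");
            else if (e instanceof EndTag) sb.append("</").append(((EndTag) e).tag).append(">");
            else if (e instanceof CData) sb.append(((CData) e).text);
        }
        return sb.toString();
    }

    // Pipeline: source -> child gradstudent -> child name.
    static String demo() {
        List<Event> ev = new ArrayList<>();
        ev.add(new StartTag("department"));
        ev.add(new StartTag("deptname")); ev.add(new CData("Computer Science")); ev.add(new EndTag("deptname"));
        ev.add(new StartTag("gradstudent"));
        ev.add(new StartTag("name")); ev.add(new CData("Smith")); ev.add(new EndTag("name"));
        ev.add(new EndTag("gradstudent"));
        ev.add(new EndTag("department"));
        return render(new ChildFilter("name", new ChildFilter("gradstudent", new ListSource(ev))));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each filter treats the first start tag it sees as its root, which is why the inner filter's output (beginning with gradstudent) can be fed directly into the next child step.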

XSL Transformation

A stylesheet specification language for converting XML documents into various forms (XML, HTML, plain text, etc).

• Can transform each XML element into another element, add new elements into the output file, or remove elements.

• Can rearrange and sort elements, test and make decisions about which elements to display, and much more.
• Based on XPath.

XSLT Templates

• XSL uses XPath to define parts of the source document that match one or more predefined templates.

• When a match is found, XSLT will transform the matching part of the source document into the result document.

• The parts of the source document that do not match any user-defined template are processed by the default templates, which pass their text through to the result document without the surrounding tags.

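Form:

    <xsl:template match="XPath"> ... </xsl:template>

The default (implicit) templates visit all nodes and strip out all tags:

    <xsl:template match="*|/"> <xsl:apply-templates/> </xsl:template>
    <xsl:template match="text()|@*"> <xsl:value-of select="."/> </xsl:template>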
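The effect of the default templates can be observed by running a stylesheet with no templates at all, and then one with a single template. The sketch below uses the JDK's built-in XSLT 1.0 processor through `javax.xml.transform`; the class name `DefaultTemplates`, the `transform` helper, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Demonstrates the built-in XSLT template rules (sketch only).
class DefaultTemplates {
    // A stylesheet with no templates: only the built-in rules apply.
    static final String EMPTY_SHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    // One explicit template; everything else still falls through to the defaults.
    static final String ONE_TEMPLATE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='firstname'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name><lastname>Smith</lastname><firstname>John</firstname></name>";
        System.out.println(transform(EMPTY_SHEET, xml));   // tags stripped, text kept
        System.out.println(transform(ONE_TEMPLATE, xml));  // firstname handled explicitly
    }
}
```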

Other XSLT Elements

• <xsl:value-of select="XPath"/>
  – selects the value of an XML element and adds it to the output stream of the transformation.
• <xsl:copy-of select="XPath"/>
  – copies the entire XML element to the output stream of the transformation.
• <xsl:apply-templates select="XPath"/>
  – applies the template rules to the elements that match the XPath expression.
• <xsl:element name="{XPath}"> ... </xsl:element>
  – adds an element to the output with a tag name derived from the XPath.
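The four instructions described above (xsl:value-of, xsl:copy-of, xsl:apply-templates, and xsl:element) can be combined in one small stylesheet. This is an illustrative sketch, not the example from the original slide, which did not survive extraction; the class name `XsltElements`, the `transform` helper, and the sample data are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Exercises value-of, copy-of, apply-templates and element (sketch only).
class XsltElements {
    static final String SHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/department'>"
        // the output tag name is computed from the deptname child
      +   "<xsl:element name='{deptname}'>"
      +     "<xsl:apply-templates select='gradstudent'/>"
      +   "</xsl:element>"
      + "</xsl:template>"
      + "<xsl:template match='gradstudent'>"
        // value-of emits only the string value; copy-of emits the whole element
      +   "<student><xsl:value-of select='name'/></student>"
      +   "<xsl:copy-of select='name'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<department><deptname>CS</deptname>"
                   + "<gradstudent><name>Smith</name></gradstudent></department>";
        System.out.println(transform(SHEET, xml));
    }
}
```

Because only gradstudent is selected by apply-templates, the deptname text contributes the output tag name but does not appear in the body of the result.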

Copy the Entire Document

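The stylesheet for this slide did not survive extraction. A widely used way to copy an entire document is the identity template, which matches every attribute and node, copies it, and recurses. The sketch below shows that idiom as an assumption about what the slide demonstrated; the original may have used a different formulation. The class name `IdentityCopy` and the `transform` helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Copies a document unchanged using the identity template (sketch only).
class IdentityCopy {
    static final String SHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // match every attribute and node, copy it, then recurse into its content
      + "<xsl:template match='@*|node()'>"
      +   "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SHEET, "<a x=\"1\"><b>t</b></a>"));
    }
}
```

The identity template is a common starting point: adding one more specific template next to it changes just the matching elements and leaves the rest of the document intact.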

More on XSLT

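• Conflict resolution: more specific templates override more general templates. Templates are assigned default priorities, but these can be overridden using priority="n" in a template.
• Modes can be used to group together templates. A template with no mode attribute belongs to the default (empty) mode.
• Conditional and loop statements:
    <xsl:if test="XPath"> body </xsl:if>
    <xsl:for-each select="XPath"> body </xsl:for-each>
• Variables can be used to name data:
    <xsl:variable name="x"> value </xsl:variable>
  Variables are referenced as $x in XPath expressions, and as {$x} inside attribute values.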
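A loop, a conditional, and a variable can be seen working together in a few lines. The sketch below is illustrative only: the variable name `d`, the sample data, the class name `XsltControl`, and the `transform` helper are assumptions, and the stylesheet is run through the JDK's built-in XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Shows xsl:variable, xsl:for-each and xsl:if together (sketch only).
class XsltControl {
    static final String SHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      +   "<xsl:variable name='d'>cse</xsl:variable>"
        // visit every person, but print the phone only when the dept matches $d
      +   "<xsl:for-each select='//person'>"
      +     "<xsl:if test='dept = $d'><xsl:value-of select='tel'/>;</xsl:if>"
      +   "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people>"
                   + "<person><dept>cse</dept><tel>1</tel></person>"
                   + "<person><dept>math</dept><tel>2</tel></person>"
                   + "<person><dept>cse</dept><tel>3</tel></person>"
                   + "</people>";
        System.out.println(transform(SHEET, xml));
    }
}
```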


Using XSLT

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    class XSLT {
        public static void main ( String argv[] ) throws Exception {
            File stylesheet = new File("x.xsl");
            File xmlfile = new File("a.xml");

            // parse the input document into a DOM tree
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document document = db.parse(xmlfile);

            // compile the stylesheet into a transformer
            StreamSource stylesource = new StreamSource(stylesheet);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer(stylesource);

            // apply the transformation and write the result to standard output
            DOMSource source = new DOMSource(document);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source,result);
        }
    }