SNU OOPSLA Lab. DOM/SAX Applications The ubiquitous XML(9) © copyright 2001 SNU OOPSLA Lab.



DOM/SAX Applications

• DOM
• SAX
• How to make an XML application?

DOM


Contents of DOM
• What is the DOM?
• Java implementation
• Nodes
• Elements
• Attributes
• Node lists


What is the DOM?

DOM (Document Object Model)
• Developed by the W3C
• Specifies how future Web browsers and embedded scripts should access HTML and XML documents


Java implementation
• Sun provides a class for parsing XML, called XmlDocument.
• An XmlDocument method parses the XML file and builds the document tree.
• To use the Sun parser:
    import org.w3c.dom.*;
    import com.sun.xml.tree.*;
    import org.xml.sax.*;


Nodes (1/4)

Nodes describe elements, text, comments, processing instructions, CDATA sections, entity references, ...

The Node interface itself defines a number of methods, in three groups:
1. Characteristics: each node has a type, a name, and a value.
2. Navigation: each node has a contextual location in the document tree.
3. Manipulation: each node's contents can be modified.


Nodes (2/4): Node characteristics
• getNodeType: determines the node's type
• getNodeName: returns the name of the node
• setNodeValue: replaces the value of the node
• hasChildNodes: reports whether the node has children
• getAttributes: gives access to the node's attributes
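The characteristic methods above can be tried out with a minimal sketch. It assumes the JAXP parser bundled with the JDK (javax.xml.parsers) rather than the Sun XmlDocument class from the previous slide; the class name NodeCharacteristics and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeCharacteristics {
    // Assumption: parse with the JDK's JAXP parser instead of Sun's XmlDocument.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Summarizes one node as "type/name/value".
    static String describe(Node node) {
        return node.getNodeType() + "/" + node.getNodeName() + "/" + node.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<note lang=\"en\">hello</note>");
        Node root = doc.getDocumentElement();
        Node text = root.getFirstChild();

        System.out.println(describe(root));                    // element: value is null
        System.out.println(describe(text));                    // text node: name is #text
        System.out.println(root.hasChildNodes());              // the text node is a child
        System.out.println(root.getAttributes().getLength());  // one attribute: lang

        text.setNodeValue("goodbye");                          // replaces the text content
        System.out.println(describe(text));
    }
}
```

Note that getNodeValue returns null for element nodes; the text of an element lives in its child Text nodes.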


Nodes (3/4): Node navigation

When processing a document via the DOM interface, it is usual to use nodes as stepping-stones. Each node has methods that return references to the surrounding nodes:
• getParentNode()
• getPreviousSibling()
• getNextSibling()
• getFirstChild()
• getLastChild()
• getChildNodes()
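A short sketch of stepping from node to node with these methods, again assuming the JDK's JAXP parser. The class name NodeNavigation and the element names a, b, c, d are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeNavigation {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Walks the sibling chain from the first child and collects the node names.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No whitespace between tags, so the only children are elements.
        Node root = parse("<a><b/><c/><d/></a>").getDocumentElement();
        Node middle = root.getFirstChild().getNextSibling();

        System.out.println(childNames(root));
        System.out.println(middle.getPreviousSibling().getNodeName());
        System.out.println(middle.getParentNode().getNodeName());
        System.out.println(root.getLastChild().getNodeName());
        System.out.println(root.getChildNodes().getLength());
    }
}
```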


Nodes (4/4): Node manipulation
• removeChild method
• appendChild method
• insertBefore method
• replaceChild method
  Ex) replaceChild swaps an old child for a new child.
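The four manipulation methods can be sketched on a small tree as follows. This assumes the JDK's JAXP parser; the class name NodeManipulation and the element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeManipulation {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    // Applies each manipulation method once and returns the final child order.
    static List<String> demo() {
        Document doc = parse("<list><a/><b/><c/></list>");
        Element root = doc.getDocumentElement();

        root.removeChild(root.getFirstChild());                           // b, c
        root.appendChild(doc.createElement("d"));                         // b, c, d
        root.insertBefore(doc.createElement("x"), root.getFirstChild());  // x, b, c, d
        root.replaceChild(doc.createElement("y"), root.getLastChild());   // x, b, c, y
        return childNames(root);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

New nodes are created through the owning Document (createElement here), because a node can only be inserted into the document that created it.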


Documents
• An entire XML document is represented by a special type of node.
  - getDoctype
  - getImplementation
  - getDocumentElement
  - getElementsByTagName


Elements
• Element interface
  - Extends the Node interface
  - Adds element-specific functionality
• General element processing
  - getTagName method
  - getElementsByTagName method
  - normalize method
    Ex) The adjacent text nodes "Here is some " and "text" are merged into the single text node "Here is some text".
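The normalize example can be reproduced with a minimal sketch. It builds the element in memory with the JDK's JAXP DocumentBuilder (an assumption; the slides use Sun's parser); the class name NormalizeSketch and the element name para are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {
    // Builds an element that holds two adjacent Text nodes.
    static Element buildSplitParagraph() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            doc.appendChild(para);
            para.appendChild(doc.createTextNode("Here is some "));
            para.appendChild(doc.createTextNode("text"));
            return para;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element para = buildSplitParagraph();
        System.out.println(para.getTagName());
        System.out.println(para.getChildNodes().getLength());   // two text nodes

        para.normalize();                                        // merge adjacent text nodes
        System.out.println(para.getChildNodes().getLength());   // one text node
        System.out.println(para.getFirstChild().getNodeValue());
    }
}
```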


Attributes
• Attribute characteristics
  - getName
  - getValue
  - setValue
  - getSpecified
• Creating attributes
  - createAttribute


Node lists
• The NodeList interface contains two methods:
  - Node item(int index);
  - int getLength();
  Ex) For a list holding node 0, node 1, and node 2, getLength() returns 3 and item(1) returns node 1.
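A small sketch of the two NodeList methods, assuming the JDK's JAXP parser. The class name NodeListSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<lines><line>one</line><line>two</line><line>three</line></lines>");
        NodeList lines = doc.getElementsByTagName("line");

        System.out.println(lines.getLength());               // number of nodes in the list
        System.out.println(lines.item(1).getTextContent());  // indexes start at 0
        System.out.println(lines.item(3));                   // out of range: null, no exception
    }
}
```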


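Named node maps
• The NamedNodeMap interface is designed to contain nodes, in no particular order, that can be accessed by name.
  Ex) For a map holding the nodes Lang, ID, Security, and Added:
  - getLength() returns 4
  - item(1) returns one of the nodes by index
  - getNamedItem("Security") returns the Security node
  - setNamedItem(O) adds the node O
  - removeNamedItem("Added") removes the Added node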
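The attribute map from the example can be sketched as below, assuming the JDK's JAXP parser. The attribute names come from the slide; the attribute values, the element name doc, the added Owner attribute, and the class name NamedNodeMapSketch are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class NamedNodeMapSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<doc Lang=\"en\" ID=\"x1\" Security=\"secret\" Added=\"2001\"/>");
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();

        System.out.println(attrs.getLength());
        System.out.println(attrs.getNamedItem("Security").getNodeValue());

        // setNamedItem adds (or replaces) a node under its own name.
        Attr owner = doc.createAttribute("Owner");
        owner.setValue("lab");
        attrs.setNamedItem(owner);
        System.out.println(attrs.getLength());

        attrs.removeNamedItem("Added");
        System.out.println(attrs.getNamedItem("Added"));   // null once removed
    }
}
```

Because the map has no defined order, item(int) is useful for iterating over every attribute but not for finding a particular one; use getNamedItem for that.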


DOM/SAX Applications

• DOM
• SAX
• How to make an XML application?

SAX


Contents of SAX
• What is SAX?
• Call-backs and interfaces
• The Parser
• Document handlers
• Attribute lists
• Error handlers
• Locators
• Handler bases


What is SAX?

SAX (the Simple API for XML)
• Is a standard API for event-driven processing of XML data
• Allows parsers to deliver information to applications in digestible chunks


Call-backs and interfaces

The SAX interfaces are:
• Parser
• DocumentHandler
• AttributeList
• ErrorHandler
• EntityResolver
• Locator
• DTDHandler


The Parser

The work of the parser
• The parser developer creates a class that actually parses the XML document or data stream
• The parser reads the XML source data
• It stops reading when it encounters a meaningful object
• It sends the information to the main application by calling an appropriate method
• It waits for this method to return before continuing


Document handlers

In order for the application to receive basic markup events from the parser, the application developer must create a class that implements the DocumentHandler interface.

(Diagram: the application creates a DocumentHandler and gives it to the parser. While parsing markup such as <!...> and <->...</->, the parser feeds events back to the application by calling startDocument(), startElement(), characters(), endElement(), and endDocument() on the handler.)
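The call-back flow can be sketched with a handler that just records each event. One substitution to be aware of: the slides describe the SAX 1 DocumentHandler interface, whereas this sketch uses the SAX 2 DefaultHandler class with the JDK's JAXP parser, so startElement and endElement take extra namespace arguments. The class name EventLogger is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Assumption: SAX 2 DefaultHandler stands in for the SAX 1 DocumentHandler.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    // A parser may deliver one run of text in several characters() calls.
    @Override public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override public void endDocument() { events.add("endDocument"); }

    // The application creates the handler and gives it to the parser.
    static List<String> run(String xml) {
        EventLogger handler = new EventLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        run("<title>Sonnet 130</title>").forEach(System.out::println);
    }
}
```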


Attribute lists

A wrapper object for all attribute details
• int getLength(); ... to ascertain how many attributes are present
• String getName(int i); ... to discover the name of one of the attributes
• String getType(int i); String getType(String name); ... when a DTD is in use, to get the data type assigned to each attribute
• String getValue(int i); String getValue(String name); ... to get the value of an attribute
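A minimal sketch of reading attribute details inside startElement. It uses the SAX 2 Attributes interface, the successor of the SAX 1 AttributeList described above, where getQName(i) plays the role of getName(i). The class name AttributeDump is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Assumption: SAX 2 Attributes stands in for the SAX 1 AttributeList.
class AttributeDump extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            // Without a DTD declaration, the reported type is "CDATA".
            seen.add(atts.getQName(i) + "=" + atts.getValue(i)
                    + " (" + atts.getType(i) + ")");
        }
    }

    static List<String> dump(String xml) {
        AttributeDump handler = new AttributeDump();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(dump("<sonnet type=\"Shakespearean\"/>"));
    }
}
```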


Error handlers
• When the application needs to be informed of warnings and errors, it can implement the ErrorHandler interface


Locators
• Necessity: an error message is not particularly helpful when no indication is given as to where the error occurred.
• The Locator interface can tell the entity, line number, and character number of the warning or error.
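Error handling and locators can be sketched together: the handler below stores the Locator the parser supplies, notes the line of each start tag, and records where a fatal error occurred. It assumes the SAX 2 DefaultHandler class and the JDK's JAXP parser; the class name LocatingHandler and the deliberately malformed sample are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class LocatingHandler extends DefaultHandler {
    private Locator locator;                 // supplied by the parser, if it supports locators
    final List<String> elementLines = new ArrayList<>();
    int errorLine = -1;                      // line of the first fatal error, or -1 if none

    @Override public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        if (locator != null) {
            elementLines.add(qName + "@" + locator.getLineNumber());
        }
    }

    @Override public void fatalError(SAXParseException e) throws SAXException {
        errorLine = e.getLineNumber();       // the exception carries its own position
        throw e;                             // not well-formed: stop parsing
    }

    static LocatingHandler check(String xml) {
        LocatingHandler handler = new LocatingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError.
        } catch (Exception e) {
            throw new IllegalStateException("unexpected failure", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        // <b> is never closed, so the parser fails when it reaches </a>.
        LocatingHandler h = check("<a>\n<b>\n</a>");
        System.out.println(h.elementLines);
        System.out.println("fatal error on line " + h.errorLine);
    }
}
```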


Handler bases
• HandlerBase class: provides some sensible default behavior for each event, and can be subclassed to add application-specific functionality


DOM/SAX Applications

• DOM
• SAX
• How to make an XML application?

Making XML Application


Contents
• XML Application Architecture
• Parser Basics
• Kinds of Parsers
• The Document Object Model (DOM)
  - DOM Application
• The Simple API for XML (SAX)
  - SAX Application


XML Application Architecture
• An XML application is typically built around an XML parser
• It has an interface to its users, and an interface to some sort of back-end data store

(Diagram: the XML Application sits between the User Interface and the Data Store, and is built around the XML Parser.)


Parser Basics
• A parser is a piece of code that reads a document and analyzes its structure
• How to use a parser
  - Create a parser object
  - Pass your XML document to the parser
  - Process the results
• Building an XML application is obviously more involved than this


Kinds of Parsers
• Validating versus non-validating parsers
  - Validating parsers validate XML documents as they parse them
  - Non-validating parsers ignore any validation errors
• Parsers that support the Document Object Model (DOM)
• Parsers that support the Simple API for XML (SAX)


DOM Parser
• Builds a tree structure that contains all of the elements of a document
• Provides a variety of functions to examine the contents and structure of the document


SAX Parser
• Generates events at various points in the document
• It’s up to you to decide what to do with each of those events


DOM vs SAX
• Why use DOM?
  - You need to know a lot about the structure of a document
  - You need to move parts of the document around
  - You need to use the information in the document more than once
• Why use SAX?
  - You only need to extract a few elements from an XML document


DOM

DOM interfaces
• Node: The base data type of the DOM.
• Element: The vast majority of the objects you’ll deal with are Elements.
• Attr: Represents an attribute of an element.
• Text: The actual content of an Element or Attr.
• Document: Represents the entire XML document.


Common DOM methods
• Document class
  - getDocumentElement(): Returns the root element of the document.
• Node class
  - getFirstChild() and getLastChild(): Return the first or last child of a given Node.
  - getNextSibling() and getPreviousSibling(): Return the next or previous sibling of a given Node.
• Element class
  - getAttribute(attrName): For a given Element, returns the value of the attribute with the requested name.


Our first DOM Application!
• The first application simply reads an XML document and writes the document’s contents to standard output
• It parses sonnet.xml

sonnet.xml:
    <?xml version="1.0"?>
    <sonnet type="Shakespearean">
      <author>
        <last-name>Shakespeare</last-name>
        <first-name>William</first-name>
        <nationality>British</nationality>
        <year-of-birth>1564</year-of-birth>
        <year-of-death>1616</year-of-death>
      </author>
      <title>Sonnet 130</title>
      <lines>
        <line>My mistress’s eyes are …


domOne to Watch Over Me
• Create a new class called domOne
• It has two methods, parseAndPrint and printDOMTree
• The main method processes the command line, creates a domOne object, and passes the file name to the domOne object
• The domOne object creates a parser object, parses the document, then processes the DOM tree via the printDOMTree method

domOne.java:
    public class domOne
    {
      public void parseAndPrint(String uri)
      ...
      public void printDOMTree(Node node)
      ...
      public static void main(String argv[])
      ...


Create a domOne object
• Create a separate class called domOne
• To parse the file and print the results, create a new instance of the domOne class
• Use a recursive function to go through the DOM tree and print out the results

    public static void main(String argv[])
    {
      if (argv.length == 0)
      {
        System.out.println("Usage: ... ");
        ...
        System.exit(1);
      }
      domOne d1 = new domOne();
      d1.parseAndPrint(argv[0]);
    }


Create a parser object
• In the parseAndPrint method, create a new parser object using a DOMParser object
• DOMParser object: a Java class that implements the DOM interface
• Exceptions: an invalid URI, a DTD that can’t be found, or an XML document that isn’t valid or well-formed

    try
    {
      DOMParser parser = new DOMParser();
      parser.parse(uri);
      doc = parser.getDocument();
    }


Parse the XML document
• Parsing the document is done with a single line of code
• Get the Document object created by the parser
• Pass it to the printDOMTree method

    try
    {
      DOMParser parser = new DOMParser();
      parser.parse(uri);
      doc = parser.getDocument();
    }
    ...
    if (doc != null)
      printDOMTree(doc);


Process the DOM tree
• Call printDOMTree recursively for each of the node’s children

    public void printDOMTree(Node node)
    {
      // Call getNodeType on the node instance, not on the Node interface
      int nodeType = node.getNodeType();
      switch (nodeType)
      {
        case Node.DOCUMENT_NODE:
          printDOMTree(((Document)node).getDocumentElement());
          ...
        case Node.ELEMENT_NODE:
          ...
          NodeList children = node.getChildNodes();
          if (children != null)
          {
            for (int i = 0; i < children.getLength(); i++)
              printDOMTree(children.item(i));
          }


Nodes a-plenty
• Just run the domCounter program, which counts the number of nodes
• In sonnet.xml, there are twenty-four tags. Why not twenty-four nodes?
• There are actually 69 nodes in sonnet.xml: one document node, 23 element nodes, and 45 text nodes.

    Document Statistics for sonnet.xml:
    ====================================
    Document Nodes: 1
    Element Nodes: 23
    Entity Reference Nodes: 0
    CDATA Sections: 0
    Text Nodes: 45
    Processing Instructions: 0
    ----------
    Total: 69 Nodes
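The same kind of count can be reproduced on a small fragment with a recursive walk. This is only a sketch in the spirit of domCounter, not that program's source; it assumes the JDK's JAXP parser, and the class name NodeCounter is hypothetical. The sample is a three-element excerpt, so its totals differ from the 69 nodes above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeCounter {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Counts the node and all of its descendants, grouped by node type.
    static Map<String, Integer> count(Node node) {
        Map<String, Integer> counts = new TreeMap<>();
        walk(node, counts);
        return counts;
    }

    private static void walk(Node node, Map<String, Integer> counts) {
        counts.merge(typeName(node.getNodeType()), 1, Integer::sum);
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), counts);
        }
    }

    private static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE: return "document";
            case Node.ELEMENT_NODE:  return "element";
            case Node.TEXT_NODE:     return "text";
            default:                 return "other";
        }
    }

    public static void main(String[] args) {
        String xml = "<sonnet>\n  <author>\n    <last-name>Shakespeare</last-name>\n"
                + "  </author>\n</sonnet>";
        // The line breaks and indentation between tags each become a Text node.
        System.out.println(count(parse(xml)));
    }
}
```

With three elements, the fragment still yields five text nodes: four runs of whitespace between tags plus the name itself.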


Sample node listing

    <?xml version="1.0"?>
    <!DOCTYPE sonnet SYSTEM "sonnet.dtd">
    <sonnet type="Shakespearean">
      <author>
        <last-name>Shakespeare</last-name>

The nodes returned by the parser (all of the blank spaces at the start of the lines above are Text nodes):
1. The Document node
2. The Element node corresponding to the <sonnet> tag
3. A Text node containing the carriage return at the end of the <sonnet> tag and the two spaces in front of the <author> tag
4. The Element node corresponding to the <author> tag
5. A Text node containing the carriage return at the end of the <author> tag and the four spaces in front of the <last-name> tag
6. The Element node corresponding to the <last-name> tag
7. A Text node containing the characters "Shakespeare"


Brief: DOM
• Believe it or not, that’s about all you need to know to work with DOM objects.
• Our domOne code did several things:
  - Created a Parser object
  - Gave the Parser an XML document to parse
  - Took the Document object from the Parser and examined it


A wee listing of SAX events
• startDocument: Signals the start of the document.
• endDocument: Signals the end of the document.
• startElement: Signals the start of an element.
• endElement: Signals the end of an element.
• characters: Contains character data, similar to a DOM Text node.


SAX interfaces
• The SAX API actually defines four interfaces for handling events
  - EntityResolver
  - DTDHandler
  - DocumentHandler
  - ErrorHandler
• All of these interfaces are implemented by HandlerBase.


Our first SAX Application!
• This application is similar to domOne, except it uses the SAX API instead of DOM
• It parses sonnet.xml

sonnet.xml:
    <?xml version="1.0"?>
    <sonnet type="Shakespearean">
      <author>
        <last-name>Shakespeare</last-name>
        <first-name>William</first-name>
        <nationality>British</nationality>
        <year-of-birth>1564</year-of-birth>
        <year-of-death>1616</year-of-death>
      </author>
      <title>Sonnet 130</title>
      <lines>
        <line>My mistress’s eyes are …


SAX methods in saxOne.java
• SAX methods that handle SAX events

saxOne.java:
    public class saxOne extends HandlerBase
    {
      public void startDocument()
      ...
      public void startElement(String name, AttributeList attrs)
      ...
      public void characters(char ch[], int start, int length)
      ...
      public void ignorableWhitespace(char ch[], int start, int length)
      ...
      public void endElement(String name)
      ...
      public void endDocument()
      ...
      public void warning(SAXParseException ex)
      ...
      public void error(SAXParseException ex)
      ...
      public void fatalError(SAXParseException ex) throws SAXException


Create a saxOne object
• Create a separate class called saxOne
• The main procedure creates an instance of this class and uses it to parse the XML document
• Because saxOne extends the HandlerBase class, we can use saxOne as an event handler for a SAX parser

    public static void main(String argv[])
    {
      if (argv.length == 0)
      {
        System.out.println("Usage: ... ");
        ...
        System.exit(1);
      }
      saxOne s1 = new saxOne();
      s1.parseURI(argv[0]);
    }


Create a Parser object
• It first creates a new Parser object
• In this sample, we use the SAXParser class instead of DOMParser
• setDocumentHandler and setErrorHandler tell our newly created SAXParser to use saxOne to handle events

    SAXParser parser = new SAXParser();
    parser.setDocumentHandler(this);
    parser.setErrorHandler(this);
    try
    {
      parser.parse(uri);
    }


Parse the XML document
• Once our SAXParser object is set up, it takes a single line of code to process our document.

    SAXParser parser = new SAXParser();
    parser.setDocumentHandler(this);
    parser.setErrorHandler(this);
    try
    {
      parser.parse(uri);
    }


Process SAX events
• As the SAXParser object parses our document, it calls our implementations of the SAX event handlers as the various SAX events occur.
• Each event handler writes the appropriate information to System.out
  Ex) For startElement events, we write the XML syntax of the original tag out to the screen.

    public void startDocument()
    ...
    public void startElement(String name, AttributeList attrs)
    ...
    public void characters(char ch[], int start, int length)
    ...
    public void ignorableWhitespace(char ch[], int start, int length)
    ...


A cavalcade of ignorable events
• The SAX interface returns more events than you might think
• One advantage of the SAX interface is that the twenty-five ignorableWhitespace events are simply ignored
• We don’t have to write code to handle those events

    Document Statistics for sonnet.xml:
    ====================================
    DocumentHandler Events:
      startDocument 1
      endDocument 1
      startElement 23
      endElement 23
      processingInstruction 0
      character 20
      ignorableWhitespace 25
    ErrorHandler Events:
      warning 0
      error 0
      fatalError 0
    ----------
    Total: 93 events


Sample event listing

    <?xml version="1.0"?>
    <!DOCTYPE sonnet SYSTEM "sonnet.dtd">
    <sonnet type="Shakespearean">
      <author>
        <last-name>Shakespeare</last-name>

The events returned by the parser:
1. A startDocument event
2. A startElement event for the <sonnet> element
3. An ignorableWhitespace event for the line break and the two blank spaces in front of the <author> tag
4. A startElement event for the <author> element
5. An ignorableWhitespace event for the line break and the four blank spaces in front of the <last-name> tag
6. A startElement event for the <last-name> tag
7. A character event for the characters "Shakespeare"
8. An endElement event for the <last-name> tag


SAX vs DOM: part one

    <book id="1">
      <verse>
        Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans. Many a brave soul did it send hurrying down to Hades, and many a hero did it yield a prey to dogs and vultures, for so were the counsels of Jove fulfilled from the day on which the son of Atreus, king of men, and great Achilles, first fell out with one another.
      </verse>
      <verse>
        And which of the gods was it that set them on to quarrel? It was the son of Jove and Leto; for he was angry with the king and sent a pestilence upon ...

• When only a few elements are needed from a very large document like this one, the SAX API would be much more efficient
• Doing this with the DOM would take a lot of memory
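A minimal sketch of the SAX side of this comparison: the handler below pulls one fact out of the document (the number of verse elements) without ever holding the text in memory. It assumes the SAX 2 DefaultHandler class and the JDK's JAXP parser; the class name VerseCounter and the shortened sample text are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class VerseCounter extends DefaultHandler {
    int verses;

    // Only startElement is overridden; every other event is ignored
    // by DefaultHandler, so no tree and no text is retained.
    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        if ("verse".equals(qName)) {
            verses++;
        }
    }

    static int count(String xml) {
        VerseCounter handler = new VerseCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.verses;
    }

    public static void main(String[] args) {
        String xml = "<book id=\"1\">"
                + "<verse>Sing, O goddess, the anger of Achilles</verse>"
                + "<verse>And which of the gods was it</verse>"
                + "</book>";
        System.out.println(count(xml));
    }
}
```

Memory use stays roughly constant however long the book is, because the parser discards each chunk of text once the call-back returns.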


SAX vs DOM: part two

    ...
    <address>
      <name>
        <title>Mrs.</title>
        <first-name>Mary</first-name>
        <last-name>McGoon</last-name>
      </name>
      <street>1401 Main Street</street>
      <city>Anytown</city>
      <state>NC</state>
      <zip>34829</zip>
    </address>
    <address>
      <name>
    ...

• What if we were parsing an XML document containing 10,000 addresses, and we wanted to sort them by last name?
• DOM would automatically store all of the data
• We could use DOM functions to move the nodes in the DOM tree
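Sorting with the DOM can be sketched as follows: collect the address elements, sort them by last name, and re-append them in order. This assumes the JDK's JAXP parser. The root element name addresses, the names other than McGoon, and the class name AddressSorter are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AddressSorter {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static String lastName(Element address) {
        return address.getElementsByTagName("last-name").item(0).getTextContent();
    }

    // Reorders the <address> children in place and returns the resulting name order.
    static List<String> sortByLastName(Document doc) {
        Element root = doc.getDocumentElement();

        // Copy the live NodeList first, because moving nodes changes it.
        NodeList found = root.getElementsByTagName("address");
        List<Element> addresses = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            addresses.add((Element) found.item(i));
        }
        addresses.sort(Comparator.comparing(AddressSorter::lastName));

        // appendChild on a node already in the tree moves it to the end.
        for (Element address : addresses) {
            root.appendChild(address);
        }

        List<String> names = new ArrayList<>();
        NodeList after = root.getElementsByTagName("last-name");
        for (int i = 0; i < after.getLength(); i++) {
            names.add(after.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<addresses>"
                + "<address><name><last-name>McGoon</last-name></name></address>"
                + "<address><name><last-name>Adams</last-name></name></address>"
                + "<address><name><last-name>Zhang</last-name></name></address>"
                + "</addresses>");
        System.out.println(sortByLastName(doc));
    }
}
```

The whole document has to be in memory for this to work, which is exactly the trade-off the comparison is about.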


Brief: SAX
• At this point, we’ve covered the two major APIs for working with XML documents
• We’ve also discussed when you might want to use each one
• Next, think about some advanced parser functions that you might need as you build an XML application
