processing xml with java

37
1 Processing XML Processing XML with Java with Java Representation and Management of Data on the Internet A comprehensive tutorial about XML processing with Java XML tutorial of W3Schools

Upload: cameron-sellers

Post on 03-Jan-2016

47 views

Category:

Documents


2 download

DESCRIPTION

Processing XML with Java. Representation and Management of Data on the Internet. A comprehensive tutorial about XML processing with Java XML tutorial of W3Schools. Resources used for this presentation. The Hebrew University of Jerusalem – CS Faculty. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Processing XML with Java

1

Processing XML with JavaProcessing XML with Java

Representation and Management of Data on the Internet

A comprehensive tutorial about XML processing with Java

XML tutorial of W3Schools

Page 2: Processing XML with Java

2

Resources used for this presentationResources used for this presentation

• The Hebrew University of Jerusalem – CS Faculty.

• An Introduction to XML and Web Technologies – Course’s Literature.

Page 3: Processing XML with Java

3

ParsersParsers

• What is a parser?

Parser

Formal grammar

Input AnalyzedData

The structure(s) of the input, according to the atomic elements

and their relationships (as described in the grammar)

Page 4: Processing XML with Java

4

XML-Parsing StandardsXML-Parsing Standards

• We will consider two parsing methods that implement W3C standards for accessing XML

• DOM- convert XML into a tree of objects - “random access” protocol

• SAX- “serial access” protocol- event-driven parsing

Page 5: Processing XML with Java

5

XML ExamplesXML Examples

Page 6: Processing XML with Java

6

<?xml version="1.0"?>

<!DOCTYPE countries SYSTEM "world.dtd">

<countries>

<country continent="&as;">

<name>Israel</name>

<population year="2001">6,199,008</population>

<city capital="yes"><name>Jerusalem</name></city>

<city><name>Ashdod</name></city>

</country>

<country continent="&eu;">

<name>France</name>

<population year="2004">60,424,213</population>

</country>

</countries>

world.xmlworld.xmlroot element

validating DTD file

reference to an entity

Page 7: Processing XML with Java

7

countries

country

Asia

continent

Israel

name

year

2001

6,199,008

population

city

capital

yes

name

Jerusalem

country

Europe

continent

France

nameyear

2004

60,424,213

population

city

capital

no

name

Ashdod

XML Tree ModelXML Tree Modelelement

element

attribute simple content

Page 8: Processing XML with Java

8

<!ELEMENT countries (country*)> <!ELEMENT country (name,population?,city*)> <!ATTLIST country continent CDATA #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT city (name)><!ATTLIST city capital (yes|no) "no"> <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA #IMPLIED> <!ENTITY eu "Europe"> <!ENTITY as "Asia"><!ENTITY af "Africa"><!ENTITY am "America"><!ENTITY au "Australia">

world.dtdworld.dtd

default value

Open world.xml in your browser

Check world2.xml for #PCDATA exmaple

As opposed to

required

parsedNot parsed

Page 9: Processing XML with Java

9

<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:em>DBI:</xhtml:em>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xmlsales.xml

“xhtml” namespace declaration

default namespace declaration

namespace overriding

(non-parsed) character data

NamespacesNamespaces

Page 10: Processing XML with Java

10

<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:h1> DBI </xhtml:h1>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xmlsales.xml

Namespace: “http://www.w3.org/1999/xhtml”

Local name: “h1”

Qualified name: “xhtml:h1”

Page 11: Processing XML with Java

11

<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:h1> DBI </xhtml:h1>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xmlsales.xml

Namespace: “http://www.cs.huji.ac.il/~dbi/comments”

Local name: “par”

Qualified name: “par”

Page 12: Processing XML with Java

12

<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:h1>DBI</xhtml:h1>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xmlsales.xml

Namespace: “”

Local name: “title”

Qualified name: “title”

Page 13: Processing XML with Java

13

<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:h1>DBI</xhtml:h1>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xmlsales.xml

Namespace: “http://www.w3.org/1999/xhtml”

Local name: “b”

Qualified name: “xhtml:b”

Page 14: Processing XML with Java

14

DOM DOM –– D Documentocument O Objectbject MModelodel

Page 15: Processing XML with Java

15

DOM ParserDOM Parser

• DOM = Document Object Model

• Parser creates a tree object out of the document

• User accesses data by traversing the tree- The tree and its traversal conform to a W3C standard

• The API allows for constructing, accessing and manipulating the structure and content of XML documents

Page 16: Processing XML with Java

16

<?xml version="1.0"?>

<!DOCTYPE countries SYSTEM "world.dtd">

<countries>

<country continent="&as;">

<name>Israel</name>

<population year="2001">6,199,008</population>

<city capital="yes"><name>Jerusalem</name></city>

<city><name>Ashdod</name></city>

</country>

<country continent="&eu;">

<name>France</name>

<population year="2004">60,424,213</population>

</country>

</countries>

Page 17: Processing XML with Java

17

The DOM TreeThe DOM TreeDocument

countries

country

Asia

continent

Israel

name

year

2001

6,199,008

population

city

capital

yes

name

Jerusalem

country

Europe

continent

France

nameyear

2004

60,424,213

population

city

capital

no

name

Ashdod

Page 18: Processing XML with Java

18

Using a DOM TreeUsing a DOM Tree

DOM Parser DOM TreeXML File

API

Application

in memory

Page 19: Processing XML with Java

19

Page 20: Processing XML with Java

20

Creating a DOM TreeCreating a DOM Tree

• A DOM tree is generated by a DocumentBuilder

• The builder is generated by a factory, in order to be

implementation independent

• The factory is chosen according to the system

configuration

DocumentBuilderFactory factory =

DocumentBuilderFactory.newInstance();

DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.parse("world.xml");

Page 21: Processing XML with Java

21

Configuring the FactoryConfiguring the Factory

• The methods of the document-builder factory enable you to configure the properties of the document building

• For example- factory.setValidating(true) - factory.setIgnoringComments(false)

Read more about DocumentBuilderFactory Class, DocumentBuilder Class

Page 22: Processing XML with Java

22

The The NodeNode Interface Interface

• The nodes of the DOM tree include- a special root (denoted document)

• The Document interface retrieved by builder.parse(…) actually extends the Node Interface

- element nodes - text nodes and CDATA sections- attributes- comments- and more ...

• Every node in the DOM tree implements the Node interface

Page 23: Processing XML with Java

23

A light-weight fragment of the document. Can hold several sub-trees

InterfacesInterfaces in a DOM Tree in a DOM TreeDocumentFragment

Document

CharacterDataText

Comment

CDATASection

Attr

Element

DocumentType

Notation

Entity

EntityReference

ProcessingInstruction

Node

NodeList

NamedNodeMap

DocumentType

Figure as appears in : “The XML Companion” - Neil Bradley

Page 24: Processing XML with Java

24

Interfaces in the DOM TreeInterfaces in the DOM Tree

Document

Document Type Element

Attribute Element ElementAttribute Text

ElementText Entity Reference TextText

TextComment

Page 25: Processing XML with Java

25

Node NavigationNode Navigation

• Every node has a specific location in tree

• Node interface specifies methods for tree navigation- Node getFirstChild();- Node getLastChild();- Node getNextSibling();- Node getPreviousSibling();- Node getParentNode();- NodeList getChildNodes();- NamedNodeMap getAttributes()

Page 26: Processing XML with Java

26

Node Navigation (cont)Node Navigation (cont)

getFirstChild()

getPreviousSibling()

getChildNodes()

getNextSibling()

getLastChild()

getParentNode()

Page 27: Processing XML with Java

27

Node PropertiesNode Properties

• Every node has- a type- a name- a value- attributes

• The roles of these properties differ according to the node types

• Nodes of different types implement different interfaces (that extend Node)

Page 28: Processing XML with Java

28

InterfacenodeNamenodeValueattributes

Attrname of attributevalue of attributenull

CDATASection"#cdata-section"content of the Sectionnull

Comment"#comment"content of the commentnull

Document"#document"nullnull

DocumentFragment"#document-fragment"nullnull

DocumentTypedoc-type namenullnull

Elementtag namenullNodeMap

Entityentity namenullnull

EntityReferencename of entity referencednullnull

Notationnotation namenullnull

ProcessingInstructiontargetentire contentnull

Text"#text"content of the text nodenull

Names, Values and AttributesNames, Values and Attributes

Page 29: Processing XML with Java

29

Node Types - Node Types - getNodeType()getNodeType()

ELEMENT_NODE = 1

ATTRIBUTE_NODE = 2

TEXT_NODE = 3

CDATA_SECTION_NODE = 4

ENTITY_REFERENCE_NODE = 5

ENTITY_NODE = 6

PROCESSING_INSTRUCTION_NODE = 7

COMMENT_NODE = 8

DOCUMENT_NODE = 9

DOCUMENT_TYPE_NODE = 10

DOCUMENT_FRAGMENT_NODE = 11

NOTATION_NODE = 12

if (myNode.getNodeType() == Node.ELEMENT_NODE) { //process node …}

Read more about Node Interface

Page 30: Processing XML with Java

30

import org.w3c.dom.*;

import javax.xml.parsers.*;

public class EchoWithDom {

public static void main(String[] args) throws Exception {

DocumentBuilderFactory factory =

DocumentBuilderFactory.newInstance();

factory.setIgnoringElementContentWhitespace(true);

DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.parse(“world.xml");

new EchoWithDom().echo(doc);

}

Page 31: Processing XML with Java

31

private void echo(Node n) {

print(n);

if (n.getNodeType() == Node.ELEMENT_NODE) {

NamedNodeMap atts = n.getAttributes();

++depth;

for (int i = 0; i < atts.getLength(); i++) echo(atts.item(i));

--depth; }

depth++;

for (Node child = n.getFirstChild(); child != null;

child = child.getNextSibling()) echo(child);

depth--;

}

Page 32: Processing XML with Java

32

private int depth = 0;

private String[] NODE_TYPES = {

"", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",

"ENTITY_REF", "ENTITY", "PROCESSING_INST",

"COMMENT", "DOCUMENT", "DOCUMENT_TYPE",

"DOCUMENT_FRAG", "NOTATION" };

private void print(Node n) {

for (int i = 0; i < depth; i++) System.out.print(" ");

System.out.print(NODE_TYPES[n.getNodeType()] + ":");

System.out.print("Name: "+ n.getNodeName());

System.out.print(" Value: "+ n.getNodeValue()+"\n");

}}

Page 33: Processing XML with Java

33

public class WorldParser {

public static void main(String[] args) throws Exception {

DocumentBuilderFactory factory =

DocumentBuilderFactory.newInstance();

factory.setIgnoringElementContentWhitespace(true);

DocumentBuilder builder =

factory.newDocumentBuilder();

Document doc = builder.parse("world.xml");

printCities(doc);

}

Another ExampleAnother Example

Page 34: Processing XML with Java

34

public static void printCities(Document doc) {

NodeList cities = doc.getElementsByTagName("city");

for(int i=0; i<cities.getLength(); ++i) {

printCity((Element)cities.item(i));

}

}

public static void printCity(Element city) {

Node nameNode =

city.getElementsByTagName("name").item(0);

String cName = nameNode.getFirstChild().getNodeValue();

System.out.println("Found City: " + cName);

}

Another Example (cont)Another Example (cont)

Searches within descendents

Page 35: Processing XML with Java

35

Normalizing the DOM TreeNormalizing the DOM Tree

• Normalizing a DOM Tree has two effects:- Combine adjacent textual nodes- Eliminate empty textual nodes

• To normalize, apply the normalize() method to the document element

Created by node

manipulation…

Page 36: Processing XML with Java

36

Node ManipulationNode Manipulation

• Children of a node in a DOM tree can be manipulated - added, edited, deleted, moved, copied, etc.

• To constructs new nodes, use the methods of Document- createElement, createAttribute, createTextNode,

createCDATASection etc.

• To manipulate a node, use the methods of Node- appendChild, insertBefore, removeChild, replaceChild,

setNodeValue, cloneNode(boolean deep) etc.

Page 37: Processing XML with Java

37

Node Manipulation (cont)Node Manipulation (cont)

RefNew

insertBefore

Old

New

replaceChild

cloneNode

deep = 'false'

deep = 'true'

Figure as appears in “The XML Companion” - Neil Bradley