Structured-Document Processing Languages, Spring 2011


Upload: frisco

Post on 04-Jan-2016


Page 1: Structured -Document  Processing Languages  Spring 2011

Structured-Document Processing Languages

Spring 2011

Course Review

Repetitio mater studiorum est!

Page 2: Structured -Document  Processing Languages  Spring 2011

SDPL 2011 Course Review 2

Goals of the Course

Learn about central models and languages for
– manipulating
– transforming and
– querying

structured documents (or XML)

"Generic XML processing technology"

Page 3: Structured -Document  Processing Languages  Spring 2011

XML?

Extensible Markup Language is not a markup language!
– does not fix a tag set nor its semantics (as markup languages such as HTML do)

XML is
– A way to use markup to represent information
– A metalanguage
» supports definition of specific markup languages through XML DTDs or Schemas
» E.g. XHTML, a reformulation of HTML using XML

Page 4: Structured -Document  Processing Languages  Spring 2011

XML Encoding of Structure: Example

<S> <W A="1">Hi</W> <E b='2' /> <W>there!</W> </S>

[Figure: the same document as a tree. The root S has the children W (attribute A=1, text "Hi"), E (attribute b=2) and W (text "there!").]

Page 5: Structured -Document  Processing Languages  Spring 2011

Basics of XML DTDs

A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents

Syntax (in the prolog of document instance):
<!DOCTYPE rootElemType SYSTEM "ex.dtd"
  <!-- "external subset" in file ex.dtd -->
  [ <!-- "internal subset" may come here -->
  ]>

DTD = union of the external and internal subset

Page 6: Structured -Document  Processing Languages  Spring 2011

Example DTD

<!ELEMENT invoice (client, item+)>
<!ATTLIST invoice num NMTOKEN #REQUIRED>
<!ELEMENT client (name, email?)>
<!ATTLIST client num NMTOKEN #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item
    price NMTOKEN #REQUIRED
    unit (FIM | EUR) "EUR" >
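As an illustration of what such a DTD buys in practice, the following is a minimal sketch that validates small invoice documents against the declarations above, placed in an internal subset. The class name DtdValidationDemo, the method isValid and the sample invoices are assumptions made for this example; only the DTD itself comes from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdValidationDemo {

    // The example DTD from the slide, written as an internal subset.
    static final String DOCTYPE =
        "<!DOCTYPE invoice [\n"
        + "<!ELEMENT invoice (client, item+)>\n"
        + "<!ATTLIST invoice num NMTOKEN #REQUIRED>\n"
        + "<!ELEMENT client (name, email?)>\n"
        + "<!ATTLIST client num NMTOKEN #REQUIRED>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "<!ELEMENT item (#PCDATA)>\n"
        + "<!ATTLIST item price NMTOKEN #REQUIRED unit (FIM | EUR) \"EUR\">\n"
        + "]>\n";

    /** Returns true if the given root element is well-formed and valid against the DTD. */
    static boolean isValid(String invoiceElement) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                // Validity violations are reported here as recoverable errors.
                @Override public void error(SAXParseException e) {
                    problems.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + invoiceElement)));
        } catch (Exception e) {
            // Well-formedness errors (fatal) end up here.
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems.isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, not taken from the slides.
        String good = "<invoice num=\"1\"><client num=\"7\"><name>John Doe</name></client>"
                    + "<item price=\"10\">pen</item></invoice>";
        String bad = "<invoice num=\"1\"><client num=\"7\"><name>John Doe</name></client></invoice>";
        System.out.println("with item:    " + isValid(good));
        System.out.println("without item: " + isValid(bad));
    }
}
```

The second sample omits the item element that the content model (client, item+) requires, so a validating parser should report it as invalid.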

Page 7: Structured -Document  Processing Languages  Spring 2011

Review

Review table columns: Lang/Model; Purpose; Structure of (i) docs, (ii) model; Processing model

Page 8: Structured -Document  Processing Languages  Spring 2011

XML Namespaces

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/xhtml1/strict">
  <!-- XHTML is the 'default namespace' -->
  <xsl:template match="doc/title">
    <h1>
      <xsl:apply-templates />
    </h1>
  </xsl:template>
</xsl:stylesheet>
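To make the effect of the two namespace declarations concrete, here is a small sketch that parses the stylesheet above with a namespace-aware DOM parser and asks each element for its namespace URI. The class name NamespaceDemo and the helper namespaceOf are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceDemo {

    // The stylesheet from the slide, as a Java string.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns=\"http://www.w3.org/TR/xhtml1/strict\">"
        + "<xsl:template match=\"doc/title\"><h1><xsl:apply-templates/></h1></xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns the namespace URI of the first element with the given qualified name. */
    static String namespaceOf(String xml, String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(qualifiedName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The prefixed element belongs to the XSLT namespace ...
        System.out.println(namespaceOf(STYLESHEET, "xsl:template"));
        // ... while the unprefixed h1 picks up the default namespace.
        System.out.println(namespaceOf(STYLESHEET, "h1"));
    }
}
```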

Page 9: Structured -Document  Processing Languages  Spring 2011

3. XML Processor APIs

How can applications manipulate structured (XML) documents?
– An overview of XML processor interfaces

3.1 SAX: an event-based interface
3.2 DOM: an object-based interface
3.3 JAXP: Java API for XML Processing
3.4 StAX: Streaming API for XML

Page 10: Structured -Document  Processing Languages  Spring 2011

A SAX-based application

[Figure: the application main routine calls parse(); while reading the input <?xml version='1.0'?><A i="1">Hi!</A>, the parser invokes the application's callback routines startDocument(), startElement() with "A",[i="1"], characters() with "Hi!", and endElement() with "A".]

Page 11: Structured -Document  Processing Languages  Spring 2011

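The callback flow of the SAX-based application can be sketched in Java roughly as follows. The class name SaxEventDemo and the method traceEvents are assumptions for this example; the parser is obtained through JAXP and the handler just records each callback it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventDemo {

    /** Parses the XML text and returns one line per SAX callback received. */
    static String traceEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                events.add("startDocument()");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("startElement(" + qName);
                for (int k = 0; k < atts.getLength(); k++) {
                    sb.append(' ').append(atts.getQName(k)).append('=').append(atts.getValue(k));
                }
                events.add(sb.append(')').toString());
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                events.add("characters(" + new String(ch, start, length) + ")");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement(" + qName + ")");
            }
            @Override public void endDocument() {
                events.add("endDocument()");
            }
        };
        try {
            // The "main routine" only starts the parse; the parser drives the callbacks.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return String.join("\n", events);
    }

    public static void main(String[] args) {
        System.out.println(traceEvents("<?xml version='1.0'?><A i=\"1\">Hi!</A>"));
    }
}
```

Note the inversion of control: the application never asks for the next event, it only reacts to what the parser pushes at it.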

Page 12: Structured -Document  Processing Languages  Spring 2011


Page 13: Structured -Document  Processing Languages  Spring 2011

3.4 Streaming API for XML (StAX)

Event-driven streaming API, like SAX

"Pull API"
– lets the application ask for individual events

Two sets of APIs:
– "cursor" (XMLStreamReader), and "iterator" (XMLEventReader)

Bidirectional:
– XMLStreamWriter or XMLEventWriter support "marshaling" data into XML

Page 14: Structured -Document  Processing Languages  Spring 2011

A Pull-Parsing Application

[Figure: the application repeatedly calls EventReader.nextEvent() on the parser API; for the input <?xml version='1.0'?><A i="1">Hi!</A> the successive calls return StartDocument, StartElement "A",[i="1"], Characters "Hi!" and EndElement "A".]
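A pull-parsing loop of this kind might look like the following sketch, which uses the "iterator" API (XMLEventReader). The class name StaxPullDemo and the method pullEvents are hypothetical; attributes are left out to keep the example short.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StaxPullDemo {

    /** Pulls all events from the XML text and returns one line per event. */
    static String pullEvents(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            // The application is in control: it asks for each event in turn.
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    out.append("StartDocument\n");
                } else if (event.isStartElement()) {
                    out.append("StartElement ")
                       .append(event.asStartElement().getName().getLocalPart()).append('\n');
                } else if (event.isCharacters()) {
                    out.append("Characters ").append(event.asCharacters().getData()).append('\n');
                } else if (event.isEndElement()) {
                    out.append("EndElement ")
                       .append(event.asEndElement().getName().getLocalPart()).append('\n');
                } else if (event.isEndDocument()) {
                    out.append("EndDocument\n");
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(pullEvents("<?xml version='1.0'?><A i=\"1\">Hi!</A>"));
    }
}
```

Compared with SAX, the loop can stop early or hand the reader to another method, since the application decides when the next event is read.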

Page 15: Structured -Document  Processing Languages  Spring 2011

DOM: What is it?

Object-based, language-neutral API for XML and HTML documents
– Allows programs/scripts to
» build
» navigate and
» modify documents

“Directly Obtainable in Memory” vs “Serial Access XML”

Page 16: Structured -Document  Processing Languages  Spring 2011

DOM structure model

<invoice form="00" type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1</streetaddress>
      <postoffice>70460 KUOPIO</postoffice>
    </address>
  </addressdata>
  ...

[Figure: the DOM tree of this fragment. A Document node contains the Element invoice, whose attributes form="00" and type="estimated" are held in a NamedNodeMap; the Elements addressdata, name, address, streetaddress and postoffice nest below it, with Text nodes "John Doe", "Pyynpolku 1" and "70460 KUOPIO" as leaves.]
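The structure model can be explored with a few lines of Java. The sketch below parses the invoice fragment, navigates it and modifies one attribute. The class name DomInvoiceDemo and its methods are assumptions for this example, and because the slide elides the rest of the invoice, the string simply closes the root element after addressdata.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomInvoiceDemo {

    // The fragment from the slide; the elided remainder is omitted (assumption).
    static final String INVOICE =
        "<invoice form=\"00\" type=\"estimated\"><addressdata><name>John Doe</name>"
        + "<address><streetaddress>Pyynpolku 1</streetaddress>"
        + "<postoffice>70460 KUOPIO</postoffice></address></addressdata></invoice>";

    /** Builds an in-memory DOM tree from the XML text. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Navigates the tree: client name, invoice type and number of invoice attributes. */
    static String describe(Document doc) {
        Element invoice = doc.getDocumentElement();
        String name = doc.getElementsByTagName("name").item(0).getTextContent();
        String type = invoice.getAttribute("type");
        int attributeCount = invoice.getAttributes().getLength(); // a NamedNodeMap
        return name + " / " + type + " / " + attributeCount + " attributes";
    }

    public static void main(String[] args) {
        Document doc = parse(INVOICE);
        System.out.println(describe(doc));
        // Modify the tree in place, then look again.
        doc.getDocumentElement().setAttribute("type", "final");
        System.out.println(describe(doc));
    }
}
```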

Page 17: Structured -Document  Processing Languages  Spring 2011


Page 18: Structured -Document  Processing Languages  Spring 2011


Page 19: Structured -Document  Processing Languages  Spring 2011

XSLT Transformations

[Figure: the source document is read into a source tree; the transformation process applies the stylesheet to it and builds a result tree; the output process serializes the result tree as XML, text or HTML.]

Page 20: Structured -Document  Processing Languages  Spring 2011


Page 21: Structured -Document  Processing Languages  Spring 2011

JAXP (Java API for XML Processing)

Interface for “plugging-in” and using XML processors in standard Java applications
– org.xml.sax: SAX 2.0
– javax.xml.stream: StAX
– org.w3c.dom: DOM Level 2 (+ Level 3 Core)
– javax.xml.parsers: initialization and use of parsers
– javax.xml.transform: initialization and use of XSLT transformers

Page 22: Structured -Document  Processing Languages  Spring 2011

JAXP: Using a SAX parser (1)

[Figure: call chain for SAX parsing. SAXParserFactory .newSAXParser() yields a SAXParser, .getXMLReader() yields its XMLReader, and .parse("f.xml") reads the XML file f.xml.]

Page 23: Structured -Document  Processing Languages  Spring 2011

JAXP: Using a DOM parser (1)

[Figure: call chain for DOM parsing. DocumentBuilderFactory .newDocumentBuilder() yields a DocumentBuilder; .parse("f.xml") builds a DOM Document from the file f.xml, and .newDocument() creates an empty one.]

Page 24: Structured -Document  Processing Languages  Spring 2011

JAXP: Using Transformers (1)

[Figure: call chain for transformations. TransformerFactory .newTransformer(…) yields a Transformer for an XSLT stylesheet, and .transform(.,.) applies it to a source and writes a result.]
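The two calls named on this slide can be tried out with a sketch like the one below, which runs the namespace example stylesheet from earlier in the course over a tiny input. The class name XsltDemo, the method transform and the input document are assumptions for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {

    // The stylesheet from the XML Namespaces slide, as a Java string.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns=\"http://www.w3.org/TR/xhtml1/strict\">"
        + "<xsl:template match=\"doc/title\"><h1><xsl:apply-templates/></h1></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML text and returns the serialized result tree. */
    static String transform(String stylesheet, String xml) {
        try {
            // newTransformer(...) compiles the stylesheet ...
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            // ... and transform(source, result) runs it and serializes the output.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input document matching the pattern doc/title.
        System.out.println(transform(STYLESHEET, "<doc><title>Hello</title></doc>"));
    }
}
```

The Source and Result arguments are what make JAXP pluggable: the same transformer could read from a DOM tree or write SAX events instead of using streams.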

Page 25: Structured -Document  Processing Languages  Spring 2011


Page 26: Structured -Document  Processing Languages  Spring 2011

W3C XQuery

Functional expression language
– A query is a side-effect-free expression

Operates on sequences of items
– XML nodes or atomic values

Strongly-typed: (XML Schema) types may be assigned to expressions statically, and results can be validated

Extends XPath 2.0 (but not all axes required)
– common for XQuery 1.0 and XPath 2.0:
» Functions and Operators, W3C Rec. 01/2007

Roughly: XQuery ≈ XPath 2.0 + XSLT' + SQL'

Page 27: Structured -Document  Processing Languages  Spring 2011

FLWOR ("flower") Expressions

for, let, where, order by and return clauses (~SQL select-from-where)

Form: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr

binds variables to values, and uses these bindings to construct a result (an ordered sequence of items)

Page 28: Structured -Document  Processing Languages  Spring 2011

XQuery Example

for $pn in distinct-values(doc("sp.xml")//pno)
let $sp := doc("sp.xml")//sp_tuple[pno=$pn]
where count($sp) >= 3
order by $pn
return
  <well_supplied_item>
    <pno>{$pn}</pno>
    <avgprice>{avg($sp/price)}</avgprice>
  </well_supplied_item>

Page 29: Structured -Document  Processing Languages  Spring 2011


Page 30: Structured -Document  Processing Languages  Spring 2011

Course Main Message

XML is a universal way to represent information as tree-like data structures

There are specialized and powerful technologies for processing it
– hype has settled
– R&D still going on