ADT 2008 Lecture 2: XML, XPath & XQuery. Chapter 10 in Silberschatz, Korth, Sudarshan, “Database System Concepts”. Stefan Manegold, Stefan.Manegold@CWI.nl, http://www.cwi.nl/~manegold/

Post on 29-Sep-2020


Page 1: ADT 2008 Lecture 2 XML, XPath & XQueryhomepages.cwi.nl/~manegold/teaching/adt/lectures/old/ADT2008_02… · 9 Stefan.Manegold@CWI.nl Lecture 2: XML, XPath & XQuery ADT 2008 XML Introduction

ADT 2008
Lecture 2

XML, XPath & XQuery

Chapter 10 in Silberschatz, Korth, Sudarshan: “Database System Concepts”

Stefan Manegold
Stefan.Manegold@CWI.nl

http://www.cwi.nl/~manegold/

Google Hits of 3-letter combinations

UvA    28 Mio.
ABC   316 Mio.
CWI   9.4 Mio.
SQL   296 Mio.
sex   711 Mio.
XML  2100 Mio.

XML: Document Markup vs. High Data Volume

Source Code Markup

HTML-Style Presentational Markup

XML Introduction

■ XML: Extensible Markup Language
■ Defined by the WWW Consortium (W3C): http://www.w3c.org/XML/
■ Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
■ Documents have tags giving extra information about sections of the document
  ● E.g. <title> XML </title> <slide> Introduction … </slide>
■ Extensible, unlike HTML
  ● Users can add new tags, and separately specify how the tag should be handled for display

XML Introduction (Cont.)

■ The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents.
  ● Much of the use of XML has been in data exchange applications, not as a replacement for HTML
■ Tags make data (relatively) self-documenting
  ● E.g.
      <bank>
        <account>
          <account_number> A-101 </account_number>
          <branch_name> Downtown </branch_name>
          <balance> 500 </balance>
        </account>
        <depositor>
          <account_number> A-101 </account_number>
          <customer_name> Johnson </customer_name>
        </depositor>
      </bank>
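A minimal sketch of what "self-documenting" buys a receiver, assuming Python's standard xml.etree.ElementTree module (the slides do not prescribe a tool). The helper name records is made up for this illustration; it recovers field names purely from the tags, without any external schema.

```python
import xml.etree.ElementTree as ET

# The bank document from the slide.
BANK_XML = """
<bank>
  <account>
    <account_number> A-101 </account_number>
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account_number> A-101 </account_number>
    <customer_name> Johnson </customer_name>
  </depositor>
</bank>
"""

def records(xml_text):
    """Turn each child of the root into (tag, {field_tag: text}).

    Hypothetical helper: field names come only from the tags themselves.
    """
    root = ET.fromstring(xml_text)
    out = []
    for rec in root:
        fields = {field.tag: field.text.strip() for field in rec}
        out.append((rec.tag, fields))
    return out

bank_records = records(BANK_XML)
for tag, fields in bank_records:
    print(tag, fields)
```

Note that every value arrives as a string; interpreting 500 as a number is left to the receiver.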

A Little Bit Of History

• Database world
  • 1980 relational databases
  • 1990 nested relational model and object oriented databases
  • 2000 semi-structured databases
• Documents world
  • 1974 SGML (Standard Generalized Markup Language)
  • 1990 HTML (Hypertext Markup Language)
  • 1992 URL (Uniform Resource Locator)

Data + documents = information
1996 XML (Extensible Markup Language)
URI (Uniform Resource Identifier)

XML Introduction (Cont.)

XML: Motivation

■ Data interchange is critical in today’s networked world
  ● Examples:
      Banking: funds transfer
      Order processing (especially inter-company orders)
      Scientific data
        – Chemistry: ChemML, …
        – Genetics: BSML (Bio-Sequence Markup Language), …
  ● Paper flow of information between organizations is being replaced by electronic flow of information
■ Each application area has its own set of standards for representing information
■ XML has become the basis for all new generation data interchange formats

XML Motivation (Cont.)

■ Earlier generation formats were based on plain text with line headers indicating the meaning of fields
  ● Similar in concept to email headers
  ● Does not allow for nested structures, no standard “type” language
  ● Tied too closely to low level document structure (lines, spaces, etc.)
■ Each XML-based standard defines what are valid elements, using
  ● XML type specification languages to specify the syntax
      DTD (Document Type Definition)
      XML Schema
  ● Plus textual descriptions of the semantics
■ XML allows new tags to be defined as required
  ● However, this may be constrained by DTDs
■ A wide variety of tools is available for parsing, browsing and querying XML documents/data

Comparison with Relational Data

■ Inefficient: tags, which in effect represent schema information, are repeated
■ Better than relational tuples as a data-exchange format
  ● Unlike relational tuples, XML data is self-documenting due to presence of tags
  ● Non-rigid format: tags can be added
  ● Allows nested structures
  ● Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Structure of XML Data

■ Tag: label for a section of data
■ Element: section of data beginning with <tagname> and ending with matching </tagname>
■ Elements must be properly nested
  ● Proper nesting
      <account> … <balance> … </balance> </account>
  ● Improper nesting
      <account> … <balance> … </account> </balance>
  ● Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.
■ Every document must have a single top-level element
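The nesting and single-root rules can be tried out with any XML parser. A small sketch, assuming Python's standard xml.etree.ElementTree; the function name is_well_formed is chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

proper = "<account><balance>400</balance></account>"
improper = "<account><balance>400</account></balance>"  # end tags cross
two_roots = "<account/><account/>"                      # no single top-level element

print(is_well_formed(proper))
print(is_well_formed(improper))
print(is_well_formed(two_roots))
```

Only the first string parses; the other two are rejected before any application code sees the data.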

Example of Nested Elements

      <bank-1>
        <customer>
          <customer_name> Hayes </customer_name>
          <customer_street> Main </customer_street>
          <customer_city> Harrison </customer_city>
          <account>
            <account_number> A-102 </account_number>
            <branch_name> Perryridge </branch_name>
            <balance> 400 </balance>
          </account>
          <account>
            …
          </account>
        </customer>
        .
        .
      </bank-1>

Motivation for Nesting

■ Nesting of data is useful in data transfer
  ● Example: elements representing customer_id, customer_name, and address nested within an order element
■ Nesting is not supported, or discouraged, in relational databases
  ● With multiple orders, customer name and address are stored redundantly
  ● Normalization replaces nested structures in each order by foreign key into table storing customer name and address information
  ● Nesting is supported in object-relational databases
■ But nesting is appropriate when transferring data
  ● External application does not have direct access to data referenced by a foreign key
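A sketch of the export step this slide argues for, under made-up data: the customers and orders tables, their field names, and the function order_to_xml are all hypothetical. Each exported order carries the customer data inline, because the receiver cannot follow the foreign key.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized tables: orders refer to customers by foreign key.
customers = {"C1": {"customer_name": "Hayes", "address": "Main, Harrison"}}
orders = [
    {"order_id": "O1", "customer_id": "C1"},
    {"order_id": "O2", "customer_id": "C1"},
]

def order_to_xml(order):
    """Build a self-contained <order> element with the customer data nested inside."""
    elem = ET.Element("order")
    ET.SubElement(elem, "order_id").text = order["order_id"]
    ET.SubElement(elem, "customer_id").text = order["customer_id"]
    customer = customers[order["customer_id"]]  # resolve the foreign key before export
    for field, value in customer.items():
        ET.SubElement(elem, field).text = value
    return elem

exported = [ET.tostring(order_to_xml(o), encoding="unicode") for o in orders]
print(exported[0])
```

The name and address are repeated in both exported orders, which is exactly the redundancy a normalized database avoids and a transfer format accepts.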

Structure of XML Data (Cont.)

■ Mixture of text with sub-elements is legal in XML.
  ● Example:
      <account>
        This account is seldom used any more.
        <account_number> A-102</account_number>
        <branch_name> Perryridge</branch_name>
        <balance>400 </balance>
      </account>
  ● Useful for document markup, but discouraged for data representation
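How mixed content shows up in a parser, sketched with Python's standard xml.etree.ElementTree (an assumption; other APIs expose text nodes differently). The free text and the subelements have to be fetched in different ways, which hints at why mixed content is awkward for data.

```python
import xml.etree.ElementTree as ET

ACCOUNT_XML = """<account>
  This account is seldom used any more.
  <account_number> A-102</account_number>
  <branch_name> Perryridge</branch_name>
  <balance>400 </balance>
</account>"""

account = ET.fromstring(ACCOUNT_XML)

# ElementTree keeps the text that precedes the first subelement in .text
note = account.text.strip()

# The subelements are still reachable by tag name.
balance = int(account.findtext("balance"))

print(note)
print(balance)
```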

Attributes

■ Elements can have attributes
      <account acct-type="checking">
        <account_number> A-102 </account_number>
        <branch_name> Perryridge </branch_name>
        <balance> 400 </balance>
      </account>
■ Attributes are specified by name=value pairs inside the starting tag of an element
■ An element may have several attributes, but each attribute name can only occur once
      <account acct-type="checking" monthly-fee="5">
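A short check of both attribute rules, assuming Python's standard xml.etree.ElementTree. The helper parses and the value "savings" are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Attributes arrive as a dictionary of name=value pairs from the start tag.
account = ET.fromstring('<account acct-type="checking" monthly-fee="5"/>')
print(account.attrib)

def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Repeating an attribute name within one start tag is a well-formedness error.
duplicate_ok = parses('<account acct-type="checking" acct-type="savings"/>')
print(duplicate_ok)
```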

Attributes vs. Subelements

■ Distinction between subelement and attribute
  ● In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  ● In the context of data representation, the difference is unclear and may be confusing
      Same information can be represented in two ways
        – <account account_number="A-101"> … </account>
        – <account> <account_number>A-101</account_number> … </account>
  ● Suggestion: use attributes for identifiers of elements, and use subelements for contents
  ● Attributes can be used to qualify tags
      => avoid the so-called tag soup

Data on the Web

The (Future) Web: A Huge XML Database

Namespaces

■ XML data has to be exchanged between organizations
■ Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
■ Specifying a unique string as an element name avoids confusion
■ Better solution: use unique-name:element-name
■ Avoid using long unique names all over document by using XML Namespaces
      <bank xmlns:FB="http://www.FirstBank.com">
        …
        <FB:branch>
          <FB:branchname>Downtown</FB:branchname>
          <FB:branchcity>Brooklyn</FB:branchcity>
        </FB:branch>
        …
      </bank>
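What a namespace-aware parser makes of the FB: prefix, sketched with Python's standard xml.etree.ElementTree. The lookup prefix "fb" is deliberately different from the document's "FB" to show that only the URI matters.

```python
import xml.etree.ElementTree as ET

BANK_XML = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(BANK_XML)
branch = root[0]

# The parser expands the prefix: the tag becomes {namespace-URI}local-name.
print(branch.tag)

# For lookups a prefix map is passed in; the prefix chosen here is arbitrary
# and only has to map to the same URI as in the document.
ns = {"fb": "http://www.FirstBank.com"}
city = root.findtext("fb:branch/fb:branchcity", namespaces=ns)
print(city)
```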

More on XML Syntax

■ Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
  ● <account number="A-101" branch="Perryridge" balance="200" />
■ To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
  ● <![CDATA[<account> … </account>]]>
      Here, <account> and </account> are treated as just strings
      CDATA stands for “character data”
■ A comment may appear wherever a tag is allowed:
  ● <!-- This is a comment and ignored by the XML parser -->
■ Processing instructions can be used to control specific XML parsers:
  ● <?php sql ("SELECT * FROM ...") ... ?>
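The empty-element, CDATA and comment rules in one small document, parsed with Python's standard xml.etree.ElementTree as an assumed tool. The wrapper elements bank and note are added only so that the three constructs fit into a single well-formed document.

```python
import xml.etree.ElementTree as ET

DOC = """<bank>
  <!-- This is a comment and ignored by the XML parser -->
  <account number="A-101" branch="Perryridge" balance="200" />
  <note><![CDATA[<account> … </account>]]></note>
</bank>"""

root = ET.fromstring(DOC)

# With the default parser settings the comment does not become a child node.
child_tags = [child.tag for child in root]

# The abbreviated element has attributes but neither text nor children.
account = root.find("account")

# The CDATA section comes back as plain text, not as a nested <account> element.
note = root.find("note")
note_text = note.text
note_children = len(note)

print(child_tags, account.attrib["balance"], note_text, note_children)
```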

XML Document Schema

■ Database schemas constrain what information can be stored, and the data types of stored values
■ XML documents are not required to have an associated schema
  ● We also speak of semi-structured data
■ However, schemas are very important for XML data exchange
  ● Otherwise, a site cannot automatically interpret data received from another site
■ Two mechanisms for specifying XML schema
  ● Document Type Definition (DTD)
      Widely used
  ● XML Schema
      Newer, increasing use

Document Type Definition (DTD)

■ The type of an XML document can be specified using a DTD
■ DTD constrains structure of XML data
  ● What elements can occur
  ● What attributes can/must an element have
  ● What subelements can/must occur inside each element, and how many times.
■ DTD does not constrain data types
  ● All values represented as strings in XML
■ DTD syntax
  ● <!ELEMENT element (subelements-specification) >
  ● <!ATTLIST element (attributes) >

Element Specification in DTD

■ Subelements can be specified as
  ● names of elements, or
  ● #PCDATA (parsed character data), i.e., character strings
  ● EMPTY (no subelements) or ANY (anything can be a subelement)
■ Example
      <!ELEMENT depositor (customer_name, account_number)>
      <!ELEMENT customer_name (#PCDATA)>
      <!ELEMENT account_number (#PCDATA)>
■ Subelement specification may have regular expressions
      <!ELEMENT bank ( ( account | customer | depositor )+ )>
      Notation:
        – “|” - alternatives
        – “+” - 1 or more occurrences
        – “*” - 0 or more occurrences
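The regular-expression view of content models can be imitated with Python's re module. This is only an illustration of the idea, not a DTD validator: the standard library's xml.etree.ElementTree does not validate against DTDs, and the encoding of child tags as a space-terminated string is an assumption of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the slide, rewritten as regular expressions over a
# space-terminated list of child tag names (an encoding chosen for this sketch).
CONTENT_MODELS = {
    "bank": re.compile(r"((account|customer|depositor) )+"),
    "depositor": re.compile(r"customer_name account_number "),
}

def children_match(elem):
    """Check the child tags of elem against the content model for elem.tag."""
    child_tags = "".join(child.tag + " " for child in elem)
    return CONTENT_MODELS[elem.tag].fullmatch(child_tags) is not None

good = ET.fromstring(
    "<depositor><customer_name>Johnson</customer_name>"
    "<account_number>A-101</account_number></depositor>")
swapped = ET.fromstring(
    "<depositor><account_number>A-101</account_number>"
    "<customer_name>Johnson</customer_name></depositor>")
empty_bank = ET.fromstring("<bank/>")
mixed_bank = ET.fromstring("<bank><account/><depositor/><account/></bank>")

print(children_match(good), children_match(swapped))
print(children_match(empty_bank), children_match(mixed_bank))
```

The swapped depositor fails because a sequence fixes the order, and the empty bank fails because “+” demands at least one child.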

Bank DTD

      <!DOCTYPE bank [
        <!ELEMENT bank ( ( account | customer | depositor )+ )>
        <!ELEMENT account (account_number, branch_name, balance)>
        <!ELEMENT customer (customer_name, customer_street, customer_city)>
        <!ELEMENT depositor (customer_name, account_number)>
        <!ELEMENT account_number (#PCDATA)>
        <!ELEMENT branch_name (#PCDATA)>
        <!ELEMENT balance (#PCDATA)>
        <!ELEMENT customer_name (#PCDATA)>
        <!ELEMENT customer_street (#PCDATA)>
        <!ELEMENT customer_city (#PCDATA)>
      ]>

Attribute Specification in DTD

■ Attribute specification: for each attribute
  ● Name
  ● Type of attribute
      CDATA
      ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
        – more on this later
  ● Whether
      mandatory (#REQUIRED)
      has a default value (value),
      or neither (#IMPLIED)
■ Examples
  ● <!ATTLIST account acct-type CDATA "checking">
  ● <!ATTLIST customer
        customer_id ID     #REQUIRED
        accounts    IDREFS #REQUIRED >

IDs and IDREFs

■ An element can have at most one attribute of type ID
■ The ID attribute value of each element in an XML document must be distinct
  ● Thus the ID attribute value is an object identifier
■ An attribute of type IDREF must contain the ID value of an element in the same document
■ An attribute of type IDREFS contains a set of (0 or more) ID values. Each of these values must be the ID value of an element in the same document

Bank DTD with Attributes

■ Bank DTD with ID and IDREF attribute types.
      <!DOCTYPE bank-2 [
        <!ELEMENT account (branch, balance)>
        <!ATTLIST account
            account_number ID     #REQUIRED
            owners         IDREFS #REQUIRED>
        <!ELEMENT customer (customer_name, customer_street, customer_city)>
        <!ATTLIST customer
            customer_id ID     #REQUIRED
            accounts    IDREFS #REQUIRED>
        … declarations for branch, balance, customer_name, customer_street and customer_city
      ]>

XML data with ID and IDREF attributes

      <bank-2>
        <account account_number="A-401" owners="C100 C102">
          <branch_name> Downtown </branch_name>
          <balance> 500 </balance>
        </account>
        <customer customer_id="C100" accounts="A-401">
          <customer_name> Joe </customer_name>
          <customer_street> Monroe </customer_street>
          <customer_city> Madison </customer_city>
        </customer>
        <customer customer_id="C102" accounts="A-401 A-402">
          <customer_name> Mary </customer_name>
          <customer_street> Erin </customer_street>
          <customer_city> Newark </customer_city>
        </customer>
      </bank-2>
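The ID and IDREFS rules can be checked by hand on this document. A sketch in Python, assuming the standard xml.etree.ElementTree module; since that module does not read DTDs, the two dictionaries below restate which attributes are of type ID and IDREFS, and check_references is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

BANK2_XML = """<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name> Joe </customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401 A-402">
    <customer_name> Mary </customer_name>
  </customer>
</bank-2>"""

# Attribute roles restated from the bank-2 DTD (ElementTree ignores DTDs).
ID_ATTRS = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

def check_references(root):
    """Return (duplicate IDs, referenced values that match no ID)."""
    ids = [e.get(ID_ATTRS[e.tag]) for e in root.iter() if e.tag in ID_ATTRS]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    dangling = sorted({
        ref
        for e in root.iter() if e.tag in IDREFS_ATTRS
        for ref in e.get(IDREFS_ATTRS[e.tag], "").split()
        if ref not in ids
    })
    return duplicates, dangling

duplicates, dangling = check_references(ET.fromstring(BANK2_XML))
print(duplicates, dangling)
```

On the excerpt as shown, the check reports A-402 as a reference without a matching element, presumably because the slide shows only part of the document.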

Limitations of DTDs

■ No typing of text elements and attributes
  ● All values are strings, no integers, reals, etc.
■ Difficult to specify unordered sets of subelements
  ● Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  ● (A | B)* allows specification of an unordered set, but
      Cannot ensure that each of A and B occurs only once
■ IDs and IDREFs are untyped
  ● The owners attribute of an account may contain a reference to another account, which is meaningless
      owners attribute should ideally be constrained to refer to customer elements

XML Schema

■ XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  ● Typing of values
      E.g. integer, string, etc.
      Also, constraints on min/max values
  ● User-defined, complex types
  ● Many more features, including
      uniqueness and foreign key constraints, inheritance
■ XML Schema is itself specified in XML syntax, unlike DTDs
  ● More-standard representation, but verbose
■ XML Schema is integrated with namespaces
■ BUT: XML Schema is significantly more complicated than DTDs.

XML Schema Version of Bank DTD

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="bank" type="BankType"/>
        <xs:element name="account">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="account_number" type="xs:string"/>
              <xs:element name="branch_name" type="xs:string"/>
              <xs:element name="balance" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        ….. definitions of customer and depositor ….
        <xs:complexType name="BankType">
          <xs:sequence>
            <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:schema>

XML Schema Version of Bank DTD

■ Choice of “xs:” was ours; any other namespace prefix could be chosen
■ Element “bank” has type “BankType”, which is defined separately
  ● xs:complexType is used later to create the named complex type “BankType”
■ Element “account” has its type defined in-line

More features of XML Schema

■ Attributes specified by xs:attribute tag:
  ● <xs:attribute name="account_number"/>
  ● adding the attribute use="required" means value must be specified
■ Key constraint: “account numbers form a key for account elements under the root bank element”:
      <xs:key name="accountKey">
        <xs:selector xpath="/bank/account"/>
        <xs:field xpath="account_number"/>
      </xs:key>
■ Foreign key constraint from depositor to account:
      <xs:keyref name="depositorAccountKey" refer="accountKey">
        <xs:selector xpath="/bank/depositor"/>
        <xs:field xpath="account_number"/>
      </xs:keyref>
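What the key and keyref constraints assert can be spelled out by hand. A sketch in Python with the standard xml.etree.ElementTree; it mimics the selector/field idea but is not an XML Schema validator. The sample document, the account number A-999 and the function key_values are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up instance document; A-999 deliberately matches no account.
BANK_XML = """<bank>
  <account><account_number>A-101</account_number></account>
  <account><account_number>A-102</account_number></account>
  <depositor><customer_name>Johnson</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Hayes</customer_name><account_number>A-999</account_number></depositor>
</bank>"""

def key_values(root, selector, field):
    """Collect the field value of every element picked by the selector."""
    return [elem.findtext(field) for elem in root.findall(selector)]

root = ET.fromstring(BANK_XML)  # root is the <bank> element

# Key: account numbers must be unique among the account elements.
account_keys = key_values(root, "account", "account_number")
key_is_unique = len(account_keys) == len(set(account_keys))

# Foreign key: every depositor account number must occur among the keys.
unmatched = [v for v in key_values(root, "depositor", "account_number")
             if v not in account_keys]

print(key_is_unique, unmatched)
```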

Querying and Transforming XML Data

■ Translation of information from one XML schema to another
■ Querying on XML data
■ Above two are closely related, and handled by the same tools
■ Standard XML querying/translation languages
  ● XPath
      Simple language consisting of path expressions
  ● XSLT
      Simple language designed for translation from XML to XML and XML to HTML
  ● XQuery
      An XML query language with a rich set of features

Tree Model of XML Data

■ Query and transformation languages are based on a tree model of XML data
■ An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  ● Element nodes have child nodes, which can be attributes or subelements
  ● Text in an element is modeled as a text node child of the element
  ● Children of a node are ordered according to their order in the XML document
  ● Element and attribute nodes (except for the root node) have a single parent, which is an element node
  ● The root node has a single child, which is the root element of the document
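The tree model can be made visible by walking a parsed document. A sketch with Python's standard xml.etree.ElementTree; note that this library stores attributes and text as properties of an element rather than as separate nodes, so the function outline (a name chosen here) lays them out as the node list the slide describes. Text that follows a subelement (mixed content) is ignored in this sketch.

```python
import xml.etree.ElementTree as ET

DOC = ('<account acct-type="checking">'
       '<account_number>A-102</account_number>'
       '<balance>400</balance>'
       '</account>')

def outline(elem, depth=0):
    """List the nodes below elem in document order as (depth, kind, label)."""
    rows = [(depth, "element", elem.tag)]
    for name, value in elem.attrib.items():
        rows.append((depth + 1, "attribute", f"{name}={value}"))
    if elem.text and elem.text.strip():
        rows.append((depth + 1, "text", elem.text.strip()))
    for child in elem:  # children come back in document order
        rows.extend(outline(child, depth + 1))
    return rows

rows = outline(ET.fromstring(DOC))
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label}")
```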

Tree Model of XML Data

Relationships between nodes: Descendant/Ancestor

Relationships between nodes: Preceding/Following

Document Partitioning

The Sibling Relationship

XPath

• Titles of all books published by Longstreet Press
    $cat/catalog/book[publisher="Longstreet Press"]/title
    <title>No Such Thing As A Bad Day</title>
• Publications with Don Chamberlin as author or editor
    $cat//*[(author|editor) = "Don Chamberlin"]
    <book><title>XQuery from the Experts</title>…</book>,
    <spec><title>XQuery Formal Semantics</title>…</spec>
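Both queries can be approximated in Python, assuming the standard xml.etree.ElementTree module, which implements only a subset of XPath. The catalog below is invented to fit the results on the slide (which elements name Don Chamberlin as author and which as editor is a guess), and the union (author|editor) is not part of the supported subset, so the second query is written as a loop instead.

```python
import xml.etree.ElementTree as ET

# Made-up catalog, shaped to reproduce the results shown on the slide.
CATALOG_XML = """<catalog>
  <book>
    <title>No Such Thing As A Bad Day</title>
    <publisher>Longstreet Press</publisher>
  </book>
  <book>
    <title>XQuery from the Experts</title>
    <editor>Don Chamberlin</editor>
  </book>
  <spec>
    <title>XQuery Formal Semantics</title>
    <author>Don Chamberlin</author>
  </spec>
</catalog>"""

cat = ET.fromstring(CATALOG_XML)  # here cat is the <catalog> element itself

# $cat/catalog/book[publisher="Longstreet Press"]/title
titles = [t.text for t in cat.findall("book[publisher='Longstreet Press']/title")]

# $cat//*[(author|editor) = "Don Chamberlin"], emulated with a loop
by_chamberlin = [
    e for e in cat.iter()
    if any(c.tag in ("author", "editor") and c.text == "Don Chamberlin" for c in e)
]
chamberlin_titles = [e.findtext("title") for e in by_chamberlin]

print(titles)
print(chamberlin_titles)
```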

Page 49: ADT 2008 Lecture 2 XML, XPath & XQueryhomepages.cwi.nl/~manegold/teaching/adt/lectures/old/ADT2008_02… · 9 Stefan.Manegold@CWI.nl Lecture 2: XML, XPath & XQuery ADT 2008 XML Introduction

49

[email protected] Lecture 2: XML, XPath & XQuery ADT 2008

XPath: Path ExpressionsXPath: Path Expressions

Page 50: ADT 2008 Lecture 2 XML, XPath & XQueryhomepages.cwi.nl/~manegold/teaching/adt/lectures/old/ADT2008_02… · 9 Stefan.Manegold@CWI.nl Lecture 2: XML, XPath & XQuery ADT 2008 XML Introduction

50

[email protected] Lecture 2: XML, XPath & XQuery ADT 2008

XPath: Path ExpressionsXPath: Path Expressions

Page 51: ADT 2008 Lecture 2 XML, XPath & XQueryhomepages.cwi.nl/~manegold/teaching/adt/lectures/old/ADT2008_02… · 9 Stefan.Manegold@CWI.nl Lecture 2: XML, XPath & XQuery ADT 2008 XML Introduction

51

[email protected] Lecture 2: XML, XPath & XQuery ADT 2008

XPath AxesXPath Axes

Absolute Path Expressions

Absolute Path Expressions

XPath Expressions against JTR Document

/child::*

/child::JackTheRipper/child::scene/child::victim

/descendant::node()/child::suspect

/descendant::node()/attribute::??? there are no attributes (yet?)

/descendant::node()/child::scene/following-sibling::node()

/descendant::node()/child::timeline/child::*/following-sibling::node()

/descendant::node()/child::inspector/self::node()/parent::node()/child::*

XPath: Abbreviated Syntax

//timeline/event/@time

/descendant-or-self::node()/child::timeline/child::event/attribute::time

XPath: Predicates

XPath: Predicates

//panel/bubbles/bubble[1]

(//panel/bubbles/bubble)[1]

//panel[2]/scene/text()

//*[3]

/descendant::*[3]
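The first two expressions differ only in where the positional predicate binds. A minimal Python sketch of that difference, assuming a made-up comic document (only the element names panel, bubbles and bubble come from the slide) and using the limited XPath subset of the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the element names are taken from the slide.
DOC = """
<comic>
  <panel><bubbles><bubble>A1</bubble><bubble>A2</bubble></bubbles></panel>
  <panel><bubbles><bubble>B1</bubble><bubble>B2</bubble></bubbles></panel>
</comic>
"""

root = ET.fromstring(DOC)

# //panel/bubbles/bubble[1]
# The predicate belongs to the last step, so it is evaluated once per
# <bubbles> parent: the first bubble of every panel.
first_per_parent = [b.text for b in root.findall(".//panel/bubbles/bubble[1]")]

# (//panel/bubbles/bubble)[1]
# The predicate applies to the whole result sequence: the first bubble overall.
# ElementTree has no parenthesised paths, so the result list is sliced instead.
first_overall = [b.text for b in root.findall(".//panel/bubbles/bubble")[:1]]

print(first_per_parent)  # ['A1', 'B1']
print(first_overall)     # ['A1']
```

The same reasoning separates //*[3] from /descendant::*[3]: the former tests the position within each parent, the latter within the whole descendant sequence.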

XPath: Boolean Expressions

//*[./@id]

//*[@time]

//*[name and picture]

//*[not(*)]

//*[doctor/text() = "Watson" or inspector/text() = "Holmes"]
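Each predicate above is a boolean test on the context element: a path is true if it selects at least one node. The sketch below spells the tests out as plain Python conditions, since ElementTree's XPath subset has no not() or and/or. The sample document and its contents are assumptions; only the element and attribute names follow the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names follow the slide, the content is made up.
DOC = """
<case>
  <event time="23:50">murder</event>
  <witness><name>Mary</name><picture>mary.png</picture></witness>
  <witness><name>John</name></witness>
  <team><doctor>Watson</doctor></team>
</case>
"""

root = ET.fromstring(DOC)
elements = list(root.iter())  # //* : every element, in document order

# //*[@time] : true if the attribute exists, whatever its value
with_time = [e.tag for e in elements if "time" in e.attrib]

# //*[name and picture] : both child elements must exist
name_and_picture = [
    e.findtext("name") for e in elements
    if e.find("name") is not None and e.find("picture") is not None
]

# //*[not(*)] : elements without child elements (leaves)
leaves = [e.tag for e in elements if len(e) == 0]

# //*[doctor/text() = "Watson" or inspector/text() = "Holmes"]
# "=" on a node set is existential: one matching child is enough.
watson_or_holmes = [
    e.tag for e in elements
    if any(d.text == "Watson" for d in e.findall("doctor"))
    or any(i.text == "Holmes" for i in e.findall("inspector"))
]

print(with_time)         # ['event']
print(name_and_picture)  # ['Mary']
print(leaves)            # ['event', 'name', 'picture', 'name', 'doctor']
print(watson_or_holmes)  # ['team']
```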

Functions in XPath

■ XPath provides several functions
  ● The function count() at the end of a path counts the number of elements in the set generated by the path
    E.g. /bank-2/account[count(./customer) > 2]
    – Returns accounts with > 2 customers
  ● There is also a function for testing the position (1, 2, ..) of a node w.r.t. its siblings
■ The Boolean connectives and and or, and the function not(), can be used in predicates
■ IDREFs can be referenced using function id()
  ● id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
  ● E.g. /bank-2/account/id(@owner)
    returns all customers referred to from the owners attribute of account elements.
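A rough Python emulation of count() in a predicate and of id() on a blank-separated reference string. The document layout is an assumption made for illustration: ElementTree does not read DTDs, so the sketch simply treats an attribute named id as the ID attribute, and the attribute names number and owner are hypothetical apart from @owner appearing in the slide's expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the nested empty <customer/> elements exist only so that
# the count() example has something to count.
DOC = """
<bank-2>
  <account number="A-1" owner="C1 C2">
    <customer/><customer/><customer/>
  </account>
  <account number="A-2" owner="C2">
    <customer/>
  </account>
  <customer id="C1"><name>Joe</name></customer>
  <customer id="C2"><name>Mary</name></customer>
</bank-2>
"""

root = ET.fromstring(DOC)

# /bank-2/account[count(./customer) > 2]
big_accounts = [
    a.get("number") for a in root.findall("account")
    if len(a.findall("customer")) > 2
]

# Index of ID values. Assumption: the attribute called "id" plays the ID role.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}


def resolve(idrefs):
    """Emulate id() on a string of blank-separated references."""
    return [by_id[ref] for ref in idrefs.split() if ref in by_id]


# /bank-2/account/id(@owner) : the result is a node set, so each customer
# appears once even if several accounts refer to it.
owners = []
for account in root.findall("account"):
    for customer in resolve(account.get("owner", "")):
        if customer not in owners:
            owners.append(customer)

owner_names = [c.findtext("name") for c in owners]

print(big_accounts)  # ['A-1']
print(owner_names)   # ['Joe', 'Mary']
```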

XML Databases

why
• Motivation & The Big Picture

WHAT
• Crash Course XQuery

who
• XML files: Saxon, Galax, GNU Qexo
• XML DBMS: eXist, BerkeleyDB, MonetDB, X-Hive, Tamino, Xyleme
• XML EAI: BEA Liquid Data, Data Direct, Mark Logic, Mono, Ipedo
• XML RDBMS: Oracle10g, SQLserver 2005, DB2

how
• Under The Hood of MonetDB/XQuery
• Some Benchmarks

XQuery

A full-fledged XQuery implementation is available:

⊳ MonetDB/XQuery: http://monetdb-xquery.org/

XQuery

XQuery: FLWOR Expressions

for $t in //timeline
let $s := $t/ancestor::scene
where contains($t/event/text(), "murder")
return ($s/victim, $s/suspect)
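One way to read the query is clause by clause as a loop. The Python sketch below follows that reading on a made-up JTR document (names and contents are assumptions; only the element names come from the slides). ElementTree has no ancestor axis, so a parent map is built by hand, and the where test checks whether any event of the timeline mentions "murder", which is an assumption about the intent for timelines with several events.

```python
import xml.etree.ElementTree as ET

# Hypothetical JTR document, invented for illustration.
DOC = """
<JackTheRipper>
  <scene>
    <victim>
      <name>Victim A</name>
      <timeline><event time="01:00">murder discovered</event></timeline>
    </victim>
    <suspect>Suspect X</suspect>
  </scene>
  <scene>
    <victim>
      <name>Victim B</name>
      <timeline><event time="02:00">last seen alive</event></timeline>
    </victim>
    <suspect>Suspect Y</suspect>
  </scene>
</JackTheRipper>
"""

root = ET.fromstring(DOC)

# child -> parent map, needed to emulate the ancestor axis
parent = {child: p for p in root.iter() for child in p}


def ancestors(node, tag):
    """Emulate ancestor::tag by walking up the parent map."""
    found = []
    while node in parent:
        node = parent[node]
        if node.tag == tag:
            found.append(node)
    return found


result = []
for t in root.iter("timeline"):                      # for $t in //timeline
    s = ancestors(t, "scene")                        # let $s := $t/ancestor::scene
    if any("murder" in (e.text or "")                # where contains(..., "murder")
           for e in t.findall("event")):
        victims = [v for scene in s for v in scene.findall("victim")]
        suspects = [x for scene in s for x in scene.findall("suspect")]
        result.extend(victims + suspects)            # return ($s/victim, $s/suspect)

print([e.tag for e in result])  # ['victim', 'suspect']
```

Only the first scene qualifies, so the result holds its victim followed by its suspect.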

XQuery Example

XQuery Example

(1, 10) (1, 20) (2, 10) (2, 20) (3, 10) (3, 20)
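The query that produced these pairs is not preserved in the transcript. Assuming it binds two variables in one for clause, for instance $x over (1, 2, 3) and $y over (10, 20), the tuple stream can be reproduced with nested iteration in Python:

```python
from itertools import product

# Assumed input sequences; the original query is not in the transcript.
xs = (1, 2, 3)
ys = (10, 20)

# Two bindings in one for clause behave like nested loops:
# the first variable changes slowest, the second fastest.
tuples = [(x, y) for x in xs for y in ys]

# itertools.product enumerates the same pairs in the same order.
same_as_product = tuples == list(product(xs, ys))

print(tuples)
# [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 20)]
```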

XQuery: FLWOR Expressions

[Figure labels: event, //timeline/event, $c/time/text()]

let $v := /JackTheRipper/scene/victim
for $t in $v/timeline
let $s := $t/ancestor::scene
where contains($t/event/text(), "murder")
return ($s/victim, $s/suspect)

XQuery: Element Construction

//scene/suspect

Joins

■ Joins are specified in a manner very similar to SQL

    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account_number = $d/account_number
      and $c/customer_name = $d/customer_name
    return <cust_acct> { $c, $a } </cust_acct>

■ The same query can be expressed with the selections specified as XPath selections:

    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[
              account_number = $a/account_number and
              customer_name = $c/customer_name]
    return <cust_acct> { $c, $a } </cust_acct>
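The join can be read as three nested loops with a filter. A minimal Python sketch of that reading, on a small made-up bank document (the values are assumptions; the element names follow the query). It returns name and account-number pairs rather than constructing cust_acct elements:

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank document; the values are invented.
DOC = """
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Joe</customer_name></customer>
  <customer><customer_name>Mary</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Joe</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Mary</customer_name></depositor>
</bank>
"""

bank = ET.fromstring(DOC)

pairs = []
for a in bank.findall("account"):              # for $a in /bank/account,
    for c in bank.findall("customer"):         #     $c in /bank/customer,
        for d in bank.findall("depositor"):    #     $d in /bank/depositor
            # where: both join conditions must hold
            if (a.findtext("account_number") == d.findtext("account_number")
                    and c.findtext("customer_name") == d.findtext("customer_name")):
                # return: one result per qualifying ($a, $c, $d) combination
                pairs.append((c.findtext("customer_name"),
                              a.findtext("account_number")))

print(pairs)  # [('Joe', 'A-101'), ('Mary', 'A-102')]
```

A real engine would not evaluate the full cross product; the loops only show what the result must contain.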

Nested Queries

■ The following query converts data from the flat structure for bank information into the nested structure used in bank-1

    <bank-1> {
      for $c in /bank/customer
      return
        <customer>
          { $c/* }
          { for $d in /bank/depositor[customer_name = $c/customer_name],
                $a in /bank/account[account_number = $d/account_number]
            return $a }
        </customer>
    } </bank-1>

■ $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
■ $c/text() gives text content of an element without any subelements / tags

Sorting in XQuery

■ The order by clause of a FLWOR expression sorts its results; it is placed before the return clause. E.g. to return customers sorted by name

    for $c in /bank/customer
    order by $c/customer_name
    return <customer> { $c/* } </customer>

■ Use order by $c/customer_name descending to sort in descending order
■ Can sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer)

    <bank-1> {
      for $c in /bank/customer
      order by $c/customer_name
      return
        <customer>
          { $c/* }
          { for $d in /bank/depositor[customer_name = $c/customer_name],
                $a in /bank/account[account_number = $d/account_number]
            order by $a/account_number
            return <account> { $a/* } </account> }
        </customer>
    } </bank-1>
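As a rough analogy, order by corresponds to sorting the bound items by a key before the return part is evaluated. A small Python sketch on made-up customer data (the names are assumptions), showing ascending and descending order:

```python
import xml.etree.ElementTree as ET

# Hypothetical customers, deliberately out of order.
DOC = """
<bank>
  <customer><customer_name>Mary</customer_name></customer>
  <customer><customer_name>Joe</customer_name></customer>
  <customer><customer_name>Sue</customer_name></customer>
</bank>
"""

bank = ET.fromstring(DOC)
customers = bank.findall("customer")


def name(customer):
    """Sort key: the text of the customer_name child."""
    return customer.findtext("customer_name")


# order by $c/customer_name
ascending = [name(c) for c in sorted(customers, key=name)]

# order by $c/customer_name descending
descending = [name(c) for c in sorted(customers, key=name, reverse=True)]

print(ascending)   # ['Joe', 'Mary', 'Sue']
print(descending)  # ['Sue', 'Mary', 'Joe']
```

Python compares the strings by code point here, while XQuery uses a collation, so results can differ for non-ASCII names.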

Functions and Other XQuery Features

■ User defined functions

    declare function local:balances($c as xs:string) as xs:decimal* {
      for $d in /bank/depositor[customer_name = $c],
          $a in /bank/account[account_number = $d/account_number]
      return $a/balance
    }

■ Types are optional for function parameters and return values
■ The * (as in decimal*) indicates a sequence of values of that type
■ Universal and existential quantification in where clause predicates
  ● some $e in path satisfies P
  ● every $e in path satisfies P
■ XQuery also supports if-then-else expressions
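The two quantifiers correspond closely to Python's any() and all(). A minimal sketch on made-up account data; the thresholds and balances are assumptions chosen so that the two quantifiers give different answers:

```python
import xml.etree.ElementTree as ET

# Hypothetical accounts; the balances are invented.
DOC = """
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <account><account_number>A-103</account_number><balance>900</balance></account>
</bank>
"""

bank = ET.fromstring(DOC)
balances = [float(a.findtext("balance")) for a in bank.findall("account")]

# some $a in /bank/account satisfies $a/balance > 800
some_above_800 = any(b > 800 for b in balances)

# every $a in /bank/account satisfies $a/balance > 450
every_above_450 = all(b > 450 for b in balances)

# if-then-else yields a value, like Python's conditional expression
label = "all large" if every_above_450 else "some small"

print(some_above_800)   # True
print(every_above_450)  # False
print(label)            # some small
```

As with all() on an empty list, every is true when the path selects nothing.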

Further Reading Material