ernestina menasalvas facultad de informática. universidad politécnica de madrid...

138
Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid [email protected]

Upload: kristian-mason

Post on 26-Dec-2015

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Ernestina Menasalvas

Facultad de Informática.

Universidad Politécnica de Madrid

[email protected]

Page 2: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Sources

The slides are taken from:• Database System Concepts©Silberschatz, Korth and

Sudarshan. See www.db-book.com for conditions on re-use • XML tutorial at http://www.w3schools.com/xml/default.asp (the

ones with no footnote)

Page 3: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

AGENDA

• Introduction• Structure of XML Data• XML Document Schema• Querying and Transformation• Application Program Interfaces to XML• Storage of XML Data• XML Applications

Page 4: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Introduction

Page 5: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

What is XML?

• XML stands for EXtensible Markup Language • XML is a markup language much like HTML • XML was designed to describe data • XML tags are not predefined. You must define your own tags• Defined by the WWW Consortium (W3C)• Derived from SGML (Standard Generalized Markup Language),

but simpler to use than SGML • Documents have tags giving extra information about sections of

the document– E.g. <title> XML </title> <slide> Introduction …</slide>

• XML uses a Document Type Definition (DTD) or an XML Schema to describe the data

• XML with a DTD or XML Schema is designed to be self-descriptive

Page 6: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

The Main Difference Between XML and HTML

• XML was designed to carry data.• XML is not a replacement for HTML.

XML and HTML were designed with different goals:

– XML was designed to describe data and to focus on what data is.HTML was designed to display data and to focus on how data looks.

– HTML is about displaying information, while XML is about describing information.

• XML is a Complement to HTML• It is important to understand that XML is not a replacement for

HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.

• XML is a cross-platform, software and hardware independent tool for transmitting information.

Page 7: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

How can XML be used

• It is important to understand that XML was designed to store, carry, and exchange data. XML was not designed to display data.

• XML can Separate Data from HTML• With XML, your data is stored outside your HTML.• When HTML is used to display data, the data is stored inside your

HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML.

• XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using HTML only for formatting and displaying the data.

Page 8: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Why using XML

• XML is Used to Exchange Data– With XML, data can be exchanged between incompatible systems.

• XML and B2B– With XML, financial information can be exchanged over the Internet.

• XML Can be Used to Share Data– With XML, plain text files can be used to share data.

• XML Can be Used to Store Data– With XML, plain text files can be used to store data.

• XML Can Make your Data More Useful– With XML, your data is available to more users.

• XML Can be Used to Create New Languages– XML is the mother of WAP and WML.

• If Developers Have Sense– If they DO have sense, all future applications will exchange their data in

XML.

Page 9: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML: applications

• Data interchange is critical in today’s networked world– Examples:

• Banking: funds transfer

• Order processing (especially inter-company orders)

• Scientific data

– Chemistry: ChemML, …– Genetics: BSML (Bio-Sequence Markup Language),

…– Paper flow of information between organizations is being replaced by

electronic flow of information

• Each application area has its own set of standards for representing information

• XML has become the basis for all new generation data interchange formats

©Silberschatz, Korth and Sudarshan10.9Database System Concepts - 5th Edition, Aug 22, 2005.

Page 10: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML applications (Cont.)

• Earlier generation formats were based on plain text with line headers indicating the meaning of fields– Similar in concept to email headers

– Does not allow for nested structures, no standard “type” language

– Tied too closely to low level document structure (lines, spaces, etc)

• Each XML based standard defines what are valid elements, using– XML type specification languages to specify the syntax

• DTD (Document Type Descriptors)

• XML Schema

– Plus textual descriptions of the semantics

• XML allows new tags to be defined as required– However, this may be constrained by DTDs

• A wide variety of tools is available for parsing, browsing and querying XML documents/data

©Silberschatz, Korth and Sudarshan10.10Database System Concepts - 5th Edition, Aug 22, 2005.

Page 11: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML Does not DO Anything

• XML was not designed to DO anything.• Maybe it is a little hard to understand, but XML does not DO

anything. XML was created to structure, store and to send information.

• The following example is a note to Tove from Jani, stored as XML:<note> <to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

• The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags.

• Someone must write a piece of software to send, receive or display it.

Page 12: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Comparison with Relational Data

• Inefficient: tags, which in effect represent schema information, are repeated

• Better than relational tuples as a data-exchange format– Unlike relational tuples, XML data is self-documenting due to presence

of tags

– Non-rigid format: tags can be added

– Allows nested structures

– Wide acceptance, not only in database systems, but also in browsers, tools, and applications

©Silberschatz, Korth and Sudarshan10.12Database System Concepts - 5th Edition, Aug 22, 2005.

Page 13: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Syntax

Page 14: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML syntax rules

• Rules are very simple and very strict. easy to learn, and very easy to use.

• XML documents use a self-describing and simple syntax.<?xml version="1.0" encoding="ISO-8859-1"?>

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

• The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.

• The next line describes the root element of the document (like it was saying: "this document is a note"):

• <note> The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

• And finally the last line defines the end of the root element

Page 15: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML syntax rules

• All XML Elements Must Have a Closing Tag– With XML, it is illegal to omit the closing tag.

• In HTML some elements do not have to have a closing tag. <p>This is a paragraph <p>This is another paragraph

• XML declaration does not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag.<p>This is a paragraph</p>

• XML Tags are Case Sensitive Unlike HTML, – With XML, the tag <Letter> is different from the tag <letter>.

• XML Elements Must be Properly Nested<b><i>This text is bold and italic</i></b>

• Comments in XML similar to that of HTML.<!-- This is a comment -->

Page 16: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML syntax rules• All XML documents must contain a single tag pair to define a root

element.– All other elements must be within this root element.– All elements can have sub elements (child elements). Sub elements must be

correctly nested within their parent element:<root> <child> <subchild>.....</subchild> </child> </root>

• XML Attribute Values Must be Quoted• With XML, it is illegal to omit quotation marks around attribute

values. • XML elements can have attributes in name/value pairs just like in HTML.

In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

<?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Tove</to> <from>Jani</from> </note>

• With XML, the white space in your document is not truncated. This is unlike HTML.

Page 17: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML is Free and Extensible

• XML tags are not predefined. You must "invent" your own tags.

• The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).

• XML allows the author to define his own tags and his own document structure.

• The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

Page 18: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML tags

• The ability to specify new tags, and to create nested tag structures make XML a great way to exchange data, not just documents.

– Much of the use of XML has been in data exchange applications, not as a replacement for HTML

• Tags make data (relatively) self-documenting – E.g.

<bank> <account>

<account_number> A-101 </account_number> <branch_name> Downtown </branch_name> <balance> 500 </balance>

</account> <depositor>

<account_number> A-101 </account_number> <customer_name> Johnson </customer_name>

</depositor> </bank>

©Silberschatz, Korth and Sudarshan10.18Database System Concepts - 5th Edition, Aug 22, 2005.

Page 19: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Structure of XML Data

Page 20: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Structure of XML Data

• Tag: label for a section of data• Element: section of data beginning with <tagname>

and ending with matching </tagname>• Elements must be properly nested

– Proper nesting• <account> … <balance> …. </balance> </account>

– Improper nesting • <account> … <balance> …. </account> </balance>

– Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.

• Every document must have a single top-level element

©Silberschatz, Korth and Sudarshan10.20Database System Concepts - 5th Edition, Aug 22, 2005.

Page 21: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements are extensible

• XML documents can be extended to carry more information.• Look at the following XML NOTE example:

<note>

<to>Tove</to>

<from>Jani</from>

<body>Don't forget me this weekend!</body>

</note>

• Let's imagine that we created an application to produce this output:

MESSAGE To: Tove

From: Jani

Don't forget me this weekend!

Page 22: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements are extensible

• Imagine that the author of the XML document added some extra information to it:<note>

<date>2002-08-01</date>

<to>Tove</to>

<from>Jani</from>

<body>Don't forget me this weekend!</body>

</note>

• Should the application break or crash?• No. • The application should still be able to find the <to>, <from>, and

<body> elements in the XML document and produce the same output.

Page 23: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements have Content

• An XML element is everything from (including) the element's start tag to (including) the element's end tag.

• An element can have:– element content, – mixed content, – simple content, or – empty content. – An element can also have attributes.

Page 24: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements have Content

<book><title>My First XML</title><prod id="33-657" media="paper"></prod><chapter>Introduction to XML<para>What is HTML</para><para>What is XML</para></chapter><chapter>XML Syntax<para>Elements must have a closing tag</para><para>Elements must be properly nested</para></chapter></book>

In the example above only the prod element has attributes. The attribute named id has the value "33-657".

The attribute named media has the value "paper". 

book has element content, because it contains other elements.

Chapter has mixed content because it contains both text and other elements

Para has simple content (or text content)

Prod has empty content, because it carries no information

Page 25: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Element Naming

• XML elements must follow these naming rules:– Names can contain letters, numbers, and other characters – Names must not start with a number or punctuation character – Names must not start with the letters xml (or XML, or Xml, etc) – Names cannot contain spaces

• follow these simple rules:– Any name can be used, no words are reserved, but the idea is to

make names descriptive. – Avoid "-" and "." in names. (subtract. or property )– Element names can be as long as you like,– XML documents often have a corresponding database, in which fields

exist corresponding to elements in the XML document. Use the naming rules of your database for the elements in the XML documents.

– Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.

– The ":" should not be used in element names because it is reserved to be used for something called namespaces

Page 26: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML elements attributes

• XML elements can have attributes in the start tag, just like HTML.

• Attributes are used to provide additional information about elements.

• From HTML <IMG SRC="computer.gif">.

The SRC attribute provides additional information about the IMG element.

• In HTML (and in XML) attributes provide additional information about elements:

• Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:

• <file type="gif">computer.gif</file>

• Attribute values must always be enclosed in quotes, but either single or double quotes can be used.

<person sex="female">

<person sex='female'>

Page 27: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attributes

• Elements can have attributes <account acct-type = “checking” >

<account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> 400 </balance>

</account>• Attributes are specified by name=value pairs inside the starting

tag of an element• An element may have several attributes, but each attribute name

can only occur once<account acct-type = “checking” monthly-fee=“5”>

©Silberschatz, Korth and Sudarshan10.27Database System Concepts - 5th Edition, Aug 22, 2005.

Page 28: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Use of Elements vs. Attributes

• Data can be stored in child elements or in attributes.• Take a look at these examples:

<person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname></person>

<person> <sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname></person>

• In the first example sex is an attribute. In the last, sex is a child element. Both examples provide the same information.

• There are no rules about when to use attributes, and when to use child elements.

• In XML use child elements if the information feels like data.

Page 29: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attributes vs. Subelements

• Distinction between subelement and attribute– In the context of documents, attributes are part of markup, while

subelement contents are part of the basic document contents

– In the context of data representation, the difference is unclear and may be confusing

• Same information can be represented in two ways

– <account account_number = “A-101”> …. </account>– <account>

<account_number>A-101</account_number> … </account>

– Suggestion: use attributes for identifiers of elements, and use subelements for contents

©Silberschatz, Korth and Sudarshan10.29Database System Concepts - 5th Edition, Aug 22, 2005.

Page 30: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Avoid using attributes?

• Some of the problems with using attributes are: – attributes cannot contain multiple values (child

elements can) – attributes are not easily expandable (for future

changes) – attributes cannot describe structures (child elements

can) – attributes are more difficult to manipulate by program

code – attribute values are not easy to test against a

Document Type Definition (DTD) - which is used to define the legal elements of an XML document

Page 31: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Avoid using attributes?

• If you use attributes as containers for data, you end up with documents that are difficult to read and maintain.

• Try to use elements to describe data. • Use attributes only to provide information that is not relevant to the

data.• Don't end up like this (this is not how XML should be used):

<note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>

• metadata (data about data) should be stored as attributes, and that data itself should be stored as elements.

Page 32: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements can be nested

<bank-1> <customer>

<customer_name> Hayes </customer_name> <customer_street> Main </customer_street> <customer_city> Harrison </customer_city> <account>

<account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> 400 </balance>

</account> <account> … </account>

</customer> . .

</bank-1>

©Silberschatz, Korth and Sudarshan10.32Database System Concepts - 5th Edition, Aug 22, 2005.

Page 33: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Motivation for Nesting

• Nesting of data is useful in data transfer– Example: elements representing customer_id, customer_name, and

address nested within an order element

• Nesting is not supported, or discouraged, in relational databases– With multiple orders, customer name and address are stored

redundantly

– normalization replaces nested structures in each order by foreign key into table storing customer name and address information

– Nesting is supported in object-relational databases

• But nesting is appropriate when transferring data– External application does not have direct access to data referenced

by a foreign key

©Silberschatz, Korth and Sudarshan10.33Database System Concepts - 5th Edition, Aug 22, 2005.

Page 34: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements and text

• Mixture of text with sub-elements is legal in XML. – Example: <account>

This account is seldom used any more. <account_number> A-102</account_number> <branch_name> Perryridge</branch_name> <balance>400 </balance></account>

– Useful for document markup, but discouraged for data representation

©Silberschatz, Korth and Sudarshan10.34Database System Concepts - 5th Edition, Aug 22, 2005.

Page 35: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Namespaces

• XML data has to be exchanged between organizations• Same tag name may have different meaning in different

organizations, causing confusion on exchanged documents• Specifying a unique string as an element name avoids confusion• Better solution: use unique-name:element-name• Avoid using long unique names all over document by using XML

Namespaces

<bank Xmlns:FB=‘http://www.FirstBank.com’> …

<FB:branch> <FB:branchname>Downtown</FB:branchname>

<FB:branchcity> Brooklyn </FB:branchcity> </FB:branch>…

</bank>

©Silberschatz, Korth and Sudarshan10.35Database System Concepts - 5th Edition, Aug 22, 2005.

Page 36: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

More on XML Syntax

• Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag– <account number=“A-101” branch=“Perryridge” balance=“200 />

• To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below– <![CDATA[<account> … </account>]]>

Here, <account> and </account> are treated as just strings

CDATA stands for “character data”

©Silberschatz, Korth and Sudarshan10.36Database System Concepts - 5th Edition, Aug 22, 2005.

Page 37: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

DTD

Page 38: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Well Formed XML Documents

• A "Well Formed" XML document has correct XML syntax.• A "Well Formed" XML document is a document that conforms to the XML

syntax rules that were described in the previous chapters:– XML documents must have a root element – XML elements must have a closing tag – XML tags are case sensitive – XML elements must be properly nested – XML attribute values must always be quoted

<?xml version="1.0" encoding="ISO-8859-1"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>

Page 39: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Valid XML Documents

• A "Valid" XML document also conforms to a DTD.

• A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE note SYSTEM "InternalNote.dtd">

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

Page 40: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

DTD vs SCHEMA

• A DTD defines the legal elements of an XML document. • The purpose of a DTD is to define the legal building blocks of an

XML document. It defines the document structure with a list of legal elements.

• Why Use a DTD?– With a DTD, each of your XML files can carry a description of its own

format.

– With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.

– Your application can use a standard DTD to verify that the data you receive from the outside world is valid.

– You can also use a DTD to verify your own data.

• W3C supports an XML based alternative to DTD called XML Schema

Page 41: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Document Type Definition (DTD)

• The type of an XML document can be specified using a DTD• DTD constraints structure of XML data

– What elements can occur

– What attributes can/must an element have

– What subelements can/must occur inside each element, and how many times.

• DTD does not constrain data types– All values represented as strings in XML

• DTD syntax– <!ELEMENT element (subelements-specification) >– <!ATTLIST element (attributes) >

©Silberschatz, Korth and Sudarshan10.41Database System Concepts - 5th Edition, Aug 22, 2005.

Page 42: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

DTD introduction• A DTD can be declared inline inside an XML document, or as an external

reference.• If the DTD is declared inside the XML file It should be wrapped in a

DOCTYPE definition with the following syntax:<!DOCTYPE root-element [element-declarations]>

• Example XML document with an internal DTD:

<?xml version="1.0"?><!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>]>

<note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body></note>

Page 43: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

DTD introduction (cont)

• External DTD Declaration– If the DTD is declared in an external file, it should be wrapped in a DOCTYPE

definition with the following syntax:

• <!DOCTYPE root-element SYSTEM "filename">• This is the same XML document as above, but with an external DTD

<?xml version="1.0"?><!DOCTYPE note SYSTEM "note.dtd"><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>

• And this is the file "note.dtd" which contains the DTD:<!ELEMENT note (to,from,heading,body)><!ELEMENT to (#PCDATA)><!ELEMENT from (#PCDATA)><!ELEMENT heading (#PCDATA)><!ELEMENT body (#PCDATA)>

Page 44: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

DTD - XML Building Blocks

• Seen from a DTD point of view, all XML documents (and HTML documents) are made up by the following building blocks:– Elements

– Attributes

– Entities

– PCDATA

– CDATA

Page 45: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements and attributes

• Elements are the main building blocks of both XML and HTML documents.

• Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".

• Examples:<body>some text</body><message>some text</message>

• Attributes provide extra information about elements.

Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:

<img src="computer.gif" />

The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".

Page 46: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Entities

• Some characters have a special meaning in XML, like the less than sign (<) that defines the start of an XML tag.

• Most of you know the HTML entity: "&nbsp;". This "no-breaking-space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.

• The following entities are predefined in XML:Entity References Character&lt; <&gt; >&amp; &&quot; "&apos; '

Page 47: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Entities

• Entities are variables used to define shortcuts to standard text or special characters.

• Entity references are references to entities

• Entities can be declared internal or external

Page 48: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

An Internal Entity Declaration

• Syntax

• <!ENTITY entity-name "entity-value">

• Example

DTD Example:

<!ENTITY writer "Donald Duck."><!ENTITY copyright "Copyright W3Schools.">

XML example:

<author>&writer;&copyright;</author>

• Note: An entity has three parts: an ampersand (&), an entity name, and a semicolon (;).

Page 49: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

An External Entity Declaration

• Syntax

• <!ENTITY entity-name SYSTEM "URI/URL">

DTD Example:

<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">

<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">

• XML example:

<author>&writer;&copyright;</author>

Page 50: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

PCDATA and CDATA

• PCDATA means parsed character data.– Think of character data as the text found between the start tag and the

end tag of an XML element.

– PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.

– Tags inside the text will be treated as markup and entities will be expanded.

– However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp; &lt; and &gt; entities, respectively.

• CDATA means character data.– CDATA is text that will NOT be parsed by a parser. Tags inside the

text will NOT be treated as markup and entities will not be expanded.

Page 51: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Empty Elements and PCDATA

• Empty elements are declared with the category keyword EMPTY:

<!ELEMENT element-name EMPTY>• Example:

<!ELEMENT br EMPTY>

• XML example:<br />

• Elements with Parsed Character Data• Elements with only parsed character data are declared with

#PCDATA inside parentheses:

<!ELEMENT element-name (#PCDATA)>• Example:• <!ELEMENT from (#PCDATA)>

Page 52: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

More on elements

• Elements declared with the category keyword ANY, can contain any combination of parsable data:

<!ELEMENT element-name ANY>Example:<!ELEMENT note ANY>

• Elements with one or more children (sequences) are declared with the name of the children elements inside parentheses:

<!ELEMENT element-name (child1)>or<!ELEMENT element-name (child1,child2,...)>

• Example:<!ELEMENT note (to,from,heading,body)>

• When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the "note" element is:

<!ELEMENT note (to,from,heading,body)><!ELEMENT to (#PCDATA)><!ELEMENT from (#PCDATA)><!ELEMENT heading (#PCDATA)><!ELEMENT body (#PCDATA)>

Page 53: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Elements (cont)

• Declaring Only One Occurrence of an Element <!ELEMENT element-name (child-name)><!ELEMENT note (message)>

• The example above declares that the child element "message" must occur once, and only once inside the "note" element.

• Declaring Minimum One Occurrence of an Element<!ELEMENT element-name (child-name+)><!ELEMENT note (message+)>

• The + sign in the example above declares that the child element "message" must occur one or more times inside the "note" element.

• Declaring Zero or More Occurrences of an Element <!ELEMENT element-name (child-name*)><!ELEMENT note (message*)>

• The * sign in the example above declares that the child element "message" can occur zero or more times inside the "note" element.

Page 54: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

More on elements

• Declaring Zero or One Occurrences of an Element <!ELEMENT element-name (child-name?)><!ELEMENT note (message?)>

• The ? sign in the example above declares that the child element "message" can occur zero or one time inside the "note" element.

• Declaring either/or Content<!ELEMENT note (to,from,header,(message|body))>The example above declares that the "note" element must contain a "to" element,

a "from" element, a "header" element, and either a "message" or a "body" element.

• Declaring Mixed Content– <!ELEMENT note (#PCDATA|to|from|header|message)*>

• The example above declares that the "note" element can contain zero or more occurrences of parsed character data, "to", "from", "header", or "message" elements.

Page 55: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Bank DTD

<!DOCTYPE bank [<!ELEMENT bank ( ( account | customer | depositor)+)><!ELEMENT account (account_number branch_name balance)><! ELEMENT customer(customer_name customer_street customer_city)><! ELEMENT depositor (customer_name account_number)><! ELEMENT account_number (#PCDATA)><! ELEMENT branch_name (#PCDATA)><! ELEMENT balance(#PCDATA)><! ELEMENT customer_name(#PCDATA)><! ELEMENT customer_street(#PCDATA)><! ELEMENT customer_city(#PCDATA)>

]>

©Silberschatz, Korth and Sudarshan10.56Database System Concepts - 5th Edition, Aug 22, 2005.

Page 56: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Declaring Attributes

• An attribute declaration has the following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>

• DTD example:

<!ATTLIST payment type CDATA "check">

• XML example:

<payment type="check" />

Page 57: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attribute Specification in DTD

• Attribute specification : for each attribute – Name

– Type of attribute

• CDATA

• ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)

– more on this later – Whether

• mandatory (#REQUIRED)

• has a default value (value),

• or neither (#IMPLIED)

• Examples– <!ATTLIST account acct-type CDATA “checking”>

– <!ATTLIST customercustomer_id ID # REQUIREDaccounts IDREFS # REQUIRED >

©Silberschatz, Korth and Sudarshan10.58Database System Concepts - 5th Edition, Aug 22, 2005.

Page 58: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

The attribute-type can be one of the followingType Description

• CDATA The value is character data

• (en1|en2|..)The value must be one from an enumerated list

• ID The value is a unique id

• IDREF The value is the id of another element

• IDREFS The value is a list of other ids

• NMTOKEN The value is a valid XML name

• NMTOKENS The value is a list of valid XML names

• ENTITY The value is an entity

• ENTITIES The value is a list of entities

• NOTATION The value is a name of a notation

• xml: The value is a predefined xml value

Page 59: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attributes

• The default-value can be one of the following:Value Explanation

Value The default value of the attribute

#REQUIRED The attribute is required

#IMPLIED The attribute is not required

#FIXED value The attribute value is fixed

• DTD:<!ELEMENT square EMPTY><!ATTLIST square width CDATA "0">

• Valid XML:<square width="100" />

• In the example above, the "square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.

Page 60: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

attributes

• #REQUIRED• Syntax• <!ATTLIST element-name attribute_name attribute-type

#REQUIRED>• DTD:

<!ATTLIST person number CDATA #REQUIRED>

• Valid XML:• <person number="5677" />• Invalid XML:• <person />

• Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.

Page 61: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

attributes

• #IMPLIED• <!ATTLIST element-name attribute-name attribute-type #IMPLIED>• DTD:

<!ATTLIST contact fax CDATA #IMPLIED>• Valid XML:

<contact fax="555-667788" />• Valid XML:

<contact />• Use the #IMPLIED keyword if you don't want to force the author to include

an attribute, and you don't have an option for a default value.

• #FIXED• <!ATTLIST element-name attribute-name attribute-type #FIXED "value">• DTD:

<!ATTLIST sender company CDATA #FIXED "Microsoft">• Valid XML:

<sender company="Microsoft" />• Invalid XML:• <sender company="W3Schools" />• Use the #FIXED keyword when you want an attribute to have a fixed value

without allowing the author to change it. If an author includes another value, the XML parser will return an error.

Page 62: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

attributes

• Enumerated Attribute Values• <!ATTLIST element-name attribute-name (en1|en2|..) default-

value>

• Example• DTD:• <!ATTLIST payment type (check|cash) "cash">

• XML example:• <payment type="check" />• or• <payment type="cash" />

• Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.

Page 63: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

IDs and IDREFs

• An element can have at most one attribute of type ID• The ID attribute value of each element in an XML document must

be distinct– Thus the ID attribute value is an object identifier

• An attribute of type IDREF must contain the ID value of an element in the same document

• An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document

©Silberschatz, Korth and Sudarshan10.64Database System Concepts - 5th Edition, Aug 22, 2005.

Page 64: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Bank DTD with Attributes

• Bank DTD with ID and IDREF attribute types.

<!DOCTYPE bank-2[ <!ELEMENT account (branch, balance)> <!ATTLIST account

account_number ID # REQUIRED owners IDREFS # REQUIRED>

<!ELEMENT customer(customer_name, customer_street, customer_city)>

<!ATTLIST customer customer_id ID # REQUIRED accounts IDREFS # REQUIRED>

… declarations for branch, balance, customer_name, customer_street and customer_city]>

©Silberschatz, Korth and Sudarshan10.65Database System Concepts - 5th Edition, Aug 22, 2005.

Page 65: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML data with ID and IDREF attributes

<bank-2><account account_number=“A-401” owners=“C100 C102”>

<branch_name> Downtown </branch_name> <balance> 500 </balance>

</account><customer customer_id=“C100” accounts=“A-401”>

<customer_name>Joe </customer_name> <customer_street> Monroe </customer_street> <customer_city> Madison</customer_city>

</customer><customer customer_id=“C102” accounts=“A-401 A-402”>

<customer_name> Mary </customer_name> <customer_street> Erin </customer_street> <customer_city> Newark </customer_city>

</customer></bank-2>

©Silberschatz, Korth and Sudarshan10.66Database System Concepts - 5th Edition, Aug 22, 2005.

Page 66: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example: TV Schedule DTD

• By David Moisan. Copied from http://www.davidmoisan.org/

<!DOCTYPE TVSCHEDULE [

<!ELEMENT TVSCHEDULE (CHANNEL+)><!ELEMENT CHANNEL (BANNER,DAY+)><!ELEMENT BANNER (#PCDATA)><!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)><!ELEMENT HOLIDAY (#PCDATA)><!ELEMENT DATE (#PCDATA)><!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)><!ELEMENT TIME (#PCDATA)><!ELEMENT TITLE (#PCDATA)> <!ELEMENT DESCRIPTION (#PCDATA)>

<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED><!ATTLIST CHANNEL CHAN CDATA #REQUIRED><!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED><!ATTLIST TITLE RATING CDATA #IMPLIED><!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>

]>

Page 67: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Newspaper Article DTD

• Copied from http://www.vervet.com/ <!DOCTYPE NEWSPAPER [

<!ELEMENT NEWSPAPER (ARTICLE+)><!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)><!ELEMENT HEADLINE (#PCDATA)><!ELEMENT BYLINE (#PCDATA)><!ELEMENT LEAD (#PCDATA)><!ELEMENT BODY (#PCDATA)><!ELEMENT NOTES (#PCDATA)>

<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED><!ATTLIST ARTICLE EDITOR CDATA #IMPLIED><!ATTLIST ARTICLE DATE CDATA #IMPLIED><!ATTLIST ARTICLE EDITION CDATA #IMPLIED>

<!ENTITY NEWSPAPER "Vervet Logic Times"><!ENTITY PUBLISHER "Vervet Logic Press"><!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">

]>

Page 68: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Limitations of DTDs

• No typing of text elements and attributes– All values are strings, no integers, reals, etc.

• Difficult to specify unordered sets of subelements– Order is usually irrelevant in databases (unlike in the document-layout

environment from which XML evolved)

– (A | B)* allows specification of an unordered set, but

• Cannot ensure that each of A and B occurs only once

• IDs and IDREFs are untyped– The owners attribute of an account may contain a reference to another

account, which is meaningless

• owners attribute should ideally be constrained to refer to customer elements

©Silberschatz, Korth and Sudarshan10.69Database System Concepts - 5th Edition, Aug 22, 2005.

Page 69: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XML Schema

• XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports– Typing of values

• E.g. integer, string, etc

• Also, constraints on min/max values

– User-defined, complex types

– Many more features, including

• uniqueness and foreign key constraints, inheritance

• XML Schema is itself specified in XML syntax, unlike DTDs– More-standard representation, but verbose

• XML Scheme is integrated with namespaces • BUT: XML Schema is significantly more complicated than DTDs.

©Silberschatz, Korth and Sudarshan10.70Database System Concepts - 5th Edition, Aug 22, 2005.

Page 70: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Schemas

• XML Schema is an XML-based alternative to DTD.• An XML schema describes the structure of an XML document.• The XML Schema language is also referred to as XML Schema

Definition (XSD).• The purpose of an XML Schema is to define the legal building

blocks of an XML document, just like a DTD.• An XML Schema:

– defines elements that can appear in a document

– defines attributes that can appear in a document

– defines which elements are child elements

– defines the order of child elements

– defines the number of child elements

– defines whether an element is empty or can include text

– defines data types for elements and attributes

– defines default and fixed values for elements and attributes

Page 71: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

An XML Schema

The following XML Schema file called "note.xsd" that defines the elements of the XML document above ("note.xml"):

<?xml version="1.0"?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"targetNamespace="http://www.w3schools.com"xmlns="http://www.w3schools.com"elementFormDefault="qualified">

<xs:element name="note"> <xs:complexType> <xs:sequence>

<xs:element name="to" type="xs:string"/><xs:element name="from" type="xs:string"/><xs:element name="heading" type="xs:string"/><xs:element name="body" type="xs:string"/>

</xs:sequence> </xs:complexType></xs:element>

</xs:schema>

Page 72: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

A Reference to an XML Schema

• This XML document has a reference to an XML Schema:

<?xml version="1.0"?>

<note

xmlns="http://www.w3schools.com"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.w3schools.com note.xsd">

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

Page 73: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Simple elements

• XML Schemas define the elements of your XML files.• A simple element is an XML element that contains only text. It

cannot contain any other elements or attributes.

• A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes.

• However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself.

• You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern.

Page 74: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Defining a Simple Element

• The syntax for defining a simple element is:

<xs:element name="xxx" type="yyy"/>

• where xxx is the name of the element and yyy is the data type of the element.

• XML Schema has a lot of built-in data types. The most common types are:

xs:string

xs:decimal

xs:integer

xs:boolean

xs:date

xs:time

Page 75: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example

• Here are some XML elements:

<lastname>Refsnes</lastname>

<age>36</age>

<dateborn>1970-03-27</dateborn>

• And here are the corresponding simple element definitions:

<xs:element name="lastname" type="xs:string"/>

<xs:element name="age" type="xs:integer"/>

<xs:element name="dateborn" type="xs:date"/>

Page 76: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Default and Fixed Values for Simple Elements

• Simple elements may have a default value OR a fixed value specified.

• A default value is automatically assigned to the element when no other value is specified.

• In the following example the default value is "red":

<xs:element name="color" type="xs:string" default="red"/>

• A fixed value is also automatically assigned to the element, and you cannot specify another value.

• In the following example the fixed value is "red":

<xs:element name="color" type="xs:string" fixed="red"/>

Page 77: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attributes

• Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type.

• The syntax for defining an attribute is:

<xs:attribute name="xxx" type="yyy"/>

where xxx is the name of the attribute and yyy specifies the data type of the attribute.

• The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time

Page 78: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Attributes (cont.)

• Here is an XML element with an attribute:

<lastname lang="EN">Smith</lastname>

• And here is the corresponding attribute definition:

<xs:attribute name="lang" type="xs:string"/>

• Default and fixed like with elements

Page 79: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Restrictions (facets)

• Restrictions are used to define acceptable values for XML elements or attributes. Restrictions on XML elements are called facets.

• Restrictions on Values

• The following example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 120:

<xs:element name="age">

<xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> </xs:restriction></xs:simpleType>

</xs:element>

Page 80: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Restrictions on a Set of Values

• To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint.

• The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Golf, BMW:

<xs:element name="car">

<xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction></xs:simpleType>

</xs:element>

• Many Restrictions can be settled in data types (lenght, values, set of values)

Page 81: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

What is a Complex Element?

• A complex element is an XML element that contains other elements and/or attributes.

• There are four kinds of complex elements:– empty elements

– elements that contain only other elements

– elements that contain only text

– elements that contain both other elements and text

• Note: Each of these elements may contain attributes as well!

Page 82: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

How to Define a Complex Element

• Look at this complex XML element, "employee", which contains only other elements:

<employee><firstname>John</firstname><lastname>Smith</lastname></employee>

• We can define a complex element in an XML Schema two different ways:1. The "employee" element can be declared directly by naming the element, like this:

<xs:element name="employee"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType></xs:element>

Page 83: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Complex element definition

• The "employee" element can have a type attribute that refers to the name of the complex type to use:

<xs:element name="employee" type="personinfo"/>

<xs:complexType name="personinfo">

<xs:sequence>

<xs:element name="firstname" type="xs:string"/>

<xs:element name="lastname" type="xs:string"/>

</xs:sequence>

</xs:complexType>

Page 84: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Complex elements

• If you use the method described above, several elements can refer to the same complex type, like this:

<xs:element name="employee" type="personinfo"/>

<xs:element name="student" type="personinfo"/>

<xs:element name="member" type="personinfo"/>

<xs:complexType name="personinfo">

<xs:sequence>

<xs:element name="firstname" type="xs:string"/>

<xs:element name="lastname" type="xs:string"/>

</xs:sequence>

</xs:complexType>

Page 85: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

More features of XML Schema

• Attributes specified by xs:attribute tag:– <xs:attribute name = “account_number”/>

– adding the attribute use = “required” means value must be specified

• Key constraint: “account numbers form a key for account elements under the root bank element:

<xs:key name = “accountKey”><xs:selector xpath = “]bank/account”/><xs:field xpath = “account_number”/>

<\xs:key>• Foreign key constraint from depositor to account:

<xs:keyref name = “depositorAccountKey” refer=“accountKey”>

<xs:selector xpath = “]bank/account”/><xs:field xpath = “account_number”/>

<\xs:keyref>

©Silberschatz, Korth and Sudarshan10.91Database System Concepts - 5th Edition, Aug 22, 2005.

Page 86: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

An XSD example<?xml version="1.0" encoding="ISO-8859-1"?><shiporder orderid="889923"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:noNamespaceSchemaLocation="shiporder.xsd"> <orderperson>John Smith</orderperson> <shipto> <name>Ola Nordmann</name> <address>Langgt 23</address> <city>4000 Stavanger</city> <country>Norway</country> </shipto> <item> <title>Empire Burlesque</title> <note>Special Edition</note> <quantity>1</quantity> <price>10.90</price> </item> <item> <title>Hide your heart</title> <quantity>1</quantity> <price>9.90</price> </item></shiporder>

Page 87: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Create an XML Schema of the example

• Now we want to create a schema for the XML document in the slide

• We start by opening a new file that we will call "shiporder.xsd". To create the schema we could simply follow the structure in the XML document and define each element as we find it. We will start with the standard XML declaration followed by the xs:schema element that defines a schema:<?xml version="1.0" encoding="ISO-8859-1" ?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

...

...

</xs:schema>• In the schema above we use the standard namespace (xs), and the URI

associated with this namespace is the Schema language definition, which has the standard value of http://www.w3.org/2001/XMLSchema.

Page 88: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example cont.

• Next, we have to define the "shiporder" element. This element has an attribute and it contains other elements, therefore we consider it as a complex type. The child elements of the "shiporder" element is surrounded by a xs:sequence element that defines an ordered sequence of sub elements:

<xs:element name="shiporder">

<xs:complexType>

<xs:sequence>

...

...

</xs:sequence>

...

</xs:complexType>

</xs:element>

Page 89: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example (cont.)

• Then we have to define the "orderperson" element as a simple type (because it does not contain any attributes or other elements). The type (xs:string) is prefixed with the namespace prefix associated with XML Schema that indicates a predefined schema data type:

<xs:element name="orderperson" type="xs:string"/>

• Next, we have to define two elements that are of the complex type: "shipto" and "item". We start by defining the "shipto" element:

<xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType></xs:element>

Page 90: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

example

• Now we can define the "item" element. This element can appear multiple times inside a "shiporder" element. This is specified by setting the maxOccurs attribute of the "item" element to "unbounded" which means that there can be as many occurrences of the "item" element as the author wishes. Notice that the "note" element is optional. We have specified this by setting the minOccurs attribute to zero:

<xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType></xs:element>

Page 91: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

example

• We can now declare the attribute of the "shiporder" element. Since this is a required attribute we specify use="required".

• Note: The attribute declarations must always come last:

• <xs:attribute name="orderid" type="xs:string" use="required"/>

Page 92: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

example

the complete listing of the schema file called "shiporder.xsd":

<?xml version="1.0" encoding="ISO-8859-1" ?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="shiporder"> <xs:complexType> <xs:sequence> <xs:element name="orderperson" type="xs:string"/> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>

Page 93: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example (cont)

<xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute name="orderid" type="xs:string" use="required"/> </xs:complexType></xs:element>

</xs:schema>

Page 94: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Divide the Schema

• The previous design method is very simple, but can be difficult to read and maintain when documents are complex. The next design method is based on defining all elements and attributes first, and then referring to them using the ref attribute.

<?xml version="1.0" encoding="ISO-8859-1" ?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<!-- definition of simple elements --><xs:element name="orderperson" type="xs:string"/><xs:element name="name" type="xs:string"/><xs:element name="address" type="xs:string"/><xs:element name="city" type="xs:string"/><xs:element name="country" type="xs:string"/><xs:element name="title" type="xs:string"/><xs:element name="note" type="xs:string"/><xs:element name="quantity" type="xs:positiveInteger"/><xs:element name="price" type="xs:decimal"/>

<!-- definition of attributes --><xs:attribute name="orderid" type="xs:string"/>

Page 95: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Divide the Schema<!-- definition of complex elements --><xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element ref="name"/> <xs:element ref="address"/> <xs:element ref="city"/> <xs:element ref="country"/> </xs:sequence> </xs:complexType></xs:element><xs:element name="item"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="note" minOccurs="0"/> <xs:element ref="quantity"/> <xs:element ref="price"/> </xs:sequence> </xs:complexType></xs:element>……

Page 96: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Querying and transforming

Page 97: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Querying and Transforming XML Data

• Translation of information from one XML schema to another• Querying on XML data • Above two are closely related, and handled by the same tools• Standard XML querying/translation languages

– XPath

• Simple language consisting of path expressions

– XSLT

• Simple language designed for translation from XML to XML and XML to HTML

– XQuery

• An XML query language with a rich set of features

©Silberschatz, Korth and Sudarshan10.103Database System Concepts - 5th Edition, Aug 22, 2005.

Page 98: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Tree Model of XML Data

• Query and transformation languages are based on a tree model of XML data

• An XML document is modeled as a tree, with nodes corresponding to elements and attributes– Element nodes have child nodes, which can be attributes or

subelements

– Text in an element is modeled as a text node child of the element

– Children of a node are ordered according to their order in the XML document

– Element and attribute nodes (except for the root node) have a single parent, which is an element node

– The root node has a single child, which is the root element of the document

©Silberschatz, Korth and Sudarshan10.104Database System Concepts - 5th Edition, Aug 22, 2005.

Page 99: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XPath

• XPath is used to address (select) parts of documents using path expressions

• A path expression is a sequence of steps separated by “/”– Think of file names in a directory hierarchy

• Result of path expression: set of values that along with their containing elements/attributes match the specified path

• E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns <customer_name>Joe</customer_name><customer_name>Mary</customer_name>

• E.g. /bank-2/customer/customer_name/text( ) returns the same names, but without the enclosing tags

©Silberschatz, Korth and Sudarshan10.105Database System Concepts - 5th Edition, Aug 22, 2005.

Page 100: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

What is XPath?

• XPath is a syntax for defining parts of an XML document • XPath uses path expressions to navigate in XML documents • XPath contains a library of standard functions • XPath is a major element in XSLT • XPath is a W3C Standard • XPath Path Expressions

– XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the expressions you see when you work with a traditional computer file system.

• XPath Standard Functions– XPath includes over 100 built-in functions. There are functions for

string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more.

Page 101: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XPath Terminology

• Nodes: In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes.

• The root of the tree is called the document node (or root node).

• Look at the following XML document:<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price></book>

</bookstore>

Page 102: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Example of nodes in the XML document above

• <bookstore> (document node)

• <author>J K. Rowling</author> (element node)

• lang="en" (attribute node)

• Atomic values

• Atomic values are nodes with no children or parent.

• Example of atomic values:

• J K. Rowling

• "en"

Page 103: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Relationship of Nodes

• Parent• Each element and attribute has one parent.

• In the following example; the book element is the parent of the title, author, year, and price:

<book>

<title>Harry Potter</title>

<author>J K. Rowling</author>

<year>2005</year>

<price>29.99</price>

</book>

Page 104: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Relationship of Nodes

• Children

• Element nodes may have zero, one or more children.

• In the following example; the title, author, year, and price elements are all children of the book element:

<book>

<title>Harry Potter</title>

<author>J K. Rowling</author>

<year>2005</year>

<price>29.99</price>

</book>

Page 105: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Relationship of Nodes• Siblings

– Nodes that have the same parent.– In the following example; the title, author, year, and price elements are all siblings:<book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price></book>

• Ancestors– A node's parent, parent's parent, etc.– In the following example; the ancestors of the title element are the book element and the

bookstore element:

<bookstore>

<book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price></book>

</bookstore>

Page 106: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Relationship of Nodes

• Descendants

• A node's children, children's children, etc.

• In the following example; descendants of the bookstore element are the book, title, author, year, and price elements:

<bookstore>

<book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price></book>

</bookstore>

Page 107: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Xpath syntax

• The XML Example Document

• We will use the following XML document in the examples below.

<?xml version="1.0" encoding="ISO-8859-1"?>

<bookstore>

<book> <title lang="eng">Harry Potter</title> <price>29.99</price></book>

<book> <title lang="eng">Learning XML</title> <price>39.95</price></book>

</bookstore>

Page 108: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Selecting Nodes

• XPath uses path expressions to select nodes in an XML document. The node is selected by following a path or steps. The most useful path expressions are listed below:

Expression Description

nodename Selects all child nodes of the node

/ Selects from the root node

// Selects nodes in the document from the

current node that match the selection no matter where they are

. Selects the current node

.. Selects the parent of the current node

@ Selects attributes

Page 109: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Selecting nodes examples

Path Expression Result• bookstore Selects all the child nodes of the bookstore

element• /bookstore Selects the root element bookstore

• bookstore/book Selects all book elements that are children of

bookstore• //book Selects all book elements no matter where

they are in the document• bookstore//book Selects all book elements that are

descendant of the bookstore element, no

matter where they are under the bookstore element

• //@lang Selects all attributes that are named lang

Page 110: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XPath (Cont.)

• The initial “/” denotes root of the document (above the top-level tag)

• Path expressions are evaluated left to right– Each step operates on the set of instances produced by the previous

step

• Selection predicates may follow any step in a path, in [ ]– E.g. /bank-2/account[balance > 400]

• returns account elements with a balance value greater than 400

• /bank-2/account[balance] returns account elements containing a balance subelement

• Attributes are accessed using “@”– E.g. /bank-2/account[balance > 400]/@account_number

• returns the account numbers of accounts with balance > 400

– IDREF attributes are not dereferenced automatically (more on this later)

©Silberschatz, Korth and Sudarshan10.116Database System Concepts - 5th Edition, Aug 22, 2005.

Page 111: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Functions in XPath

• XPath provides several functions– The function count() at the end of a path counts the number of

elements in the set generated by the path• E.g. /bank-2/account[count(./customer) > 2]

– Returns accounts with > 2 customers– Also function for testing position (1, 2, ..) of node w.r.t. siblings

• Boolean connectives and and or and function not() can be used in predicates

• IDREFs can be referenced using function id()– id() can also be applied to sets of references such as IDREFS and

even to strings containing multiple references separated by blanks– E.g. /bank-2/account/id(@owner)

• returns all customers referred to from the owners attribute of account elements.

©Silberschatz, Korth and Sudarshan10.117Database System Concepts - 5th Edition, Aug 22, 2005.

Page 112: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

More XPath Features

• Operator “|” used to implement union – E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)

• Gives customers with either accounts or loans• However, “|” cannot be nested inside other operators.

• “//” can be used to skip multiple levels of nodes – E.g. /bank-2//customer_name

• finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.

• A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children– “//”, described above, is a short from for specifying “all

descendants”– “..” specifies the parent.

• doc(name) returns the root of a named document

©Silberschatz, Korth and Sudarshan10.118Database System Concepts - 5th Edition, Aug 22, 2005.

Page 113: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

What is XQuery?

• XQuery is the language for querying XML data • XQuery for XML is like SQL for databases • XQuery is built on XPath expressions • XQuery is supported by all the major database engines

(IBM, Oracle, Microsoft, etc.) • XQuery is a W3C Recommendation • XQuery can be used to:

– Extract information to use in a Web Service

– Generate summary reports

– Transform XML data to XHTML

– Search Web documents for relevant information

Page 114: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XQuery

• XQuery is a general purpose query language for XML data • Currently being standardized by the World Wide Web Consortium

(W3C)– The textbook description is based on a January 2005 draft of the

standard. The final version may differ, but major features likely to stay unchanged.

• XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL

• XQuery uses a for … let … where … order by …result … (FLOWR)

• syntax for SQL from where SQL where order by SQL order by result SQL select let allows temporary variables, and has no equivalent in SQL

©Silberschatz, Korth and Sudarshan10.120Database System Concepts - 5th Edition, Aug 22, 2005.

Page 115: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

FLWOR Syntax in XQuery

• For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath

• Simple FLWOR expression in XQuery – find all accounts with balance > 400, with each result enclosed in an

<account_number> .. </account_number> tag

for $x in /bank-2/account let $acctno := $x/@account_number where $x/balance > 400 return <account_number> { $acctno } </account_number>

– Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated

©Silberschatz, Korth and Sudarshan10.121Database System Concepts - 5th Edition, Aug 22, 2005.

Page 116: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Flowr expression

• Let clause not really needed in this query, and selection can be done In XPath. Query can be written as:

for $x in /bank-2/account[balance>400]return <account_number> {

$x/@account_number } </account_number>

Page 117: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Joins• Joins are specified in a manner very similar to SQL

for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor

where $a/account_number = $d/account_number and $c/customer_name = $d/customer_name

return <cust_acct> { $c $a } </cust_acct>

• The same query can be expressed with the selections specified as XPath selections: for $a in /bank/account $c in /bank/customer

$d in /bank/depositor[ account_number = $a/account_number and customer_name = $c/customer_name] return <cust_acct> { $c $a } </cust_acct>

©Silberschatz, Korth and Sudarshan10.123Database System Concepts - 5th Edition, Aug 22, 2005.

Page 118: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Nested Queries

• The following query converts data from the flat structure for bank information into the nested structure used in bank-1

<bank-1> { for $c in /bank/customer return

<customer> { $c/* } { for $d in /bank/depositor[customer_name =

$c/customer_name], $a in

/bank/account[account_number=$d/account_number] return $a }

</customer> } </bank-1>

• $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag

• $c/text() gives text content of an element without any subelements / tags

©Silberschatz, Korth and Sudarshan10.124Database System Concepts - 5th Edition, Aug 22, 2005.

Page 119: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Sorting in XQuery

• The order by clause can be used at the end of any expression. E.g. to return customers sorted by name for $c in /bank/customer order by $c/customer_name

return <customer> { $c/* } </customer>• Use order by $c/customer_name to sort in descending order• Can sort at multiple levels of nesting (sort by customer_name, and by

account_number within each customer) <bank-1> {

for $c in /bank/customer order by $c/customer_namereturn <customer> { $c/* } { for $d in

/bank/depositor[customer_name=$c/customer_name], $a in

/bank/account[account_number=$d/account_number] }order by $a/account_number

return <account> $a/* </account> </customer>

} </bank-1>

©Silberschatz, Korth and Sudarshan10.125Database System Concepts - 5th Edition, Aug 22, 2005.

Page 120: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Functions and Other XQuery Features

• User defined functions with the type system of XMLSchema function balances(xs:string $c) returns list(xs:decimal*) { for $d in /bank/depositor[customer_name = $c], $a in /bank/account[account_number = $d/account_number] return $a/balance

}• Types are optional for function parameters and return values• The * (as in decimal*) indicates a sequence of values of that type• Universal and existential quantification in where clause predicates

– some $e in path satisfies P

– every $e in path satisfies P

• XQuery also supports If-then-else clauses

©Silberschatz, Korth and Sudarshan10.126Database System Concepts - 5th Edition, Aug 22, 2005.

Page 121: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XQuery Conditional Expressions

• "If-Then-Else" expressions are allowed in XQuery.

• Look at the following example:

for $x in doc("books.xml")/bookstore/book

return if ($x/@category="CHILDREN")

then <child>{data($x/title)}</child>

else <adult>{data($x/title)}</adult>

Page 122: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

XQuery Conditional Expressions

• The result of the example above will be:

<adult>Everyday Italian</adult>

<child>Harry Potter</child>

<adult>Learning XML</adult>

<adult>XQuery Kick Start</adult>

Page 123: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Present the Result In an HTML List

• Look at the following XQuery FLWOR expression:

for $x in doc("books.xml")/bookstore/book/title

order by $x

return $x

• The expression above will select all the title elements under the book elements that are under the bookstore element, and return the title elements in alphabetical order.

Page 124: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Present the Result In an HTML List

• Now we want to list all the book-titles in our bookstore in an HTML list. We add <ul> and <li> tags to the FLWOR expression:

<ul>{for $x in doc("books.xml")/bookstore/book/titleorder by $xreturn <li>{$x}</li>}</ul>

• The result of the above will be:

<ul><li><title lang="en">Everyday Italian</title></li><li><title lang="en">Harry Potter</title></li><li><title lang="en">Learning XML</title></li><li><title lang="en">XQuery Kick Start</title></li></ul>

Page 125: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Presenting results as HTML

• Now we want to eliminate the title element, and show only the data inside the title element:

<ul>{for $x in doc("books.xml")/bookstore/book/titleorder by $xreturn <li>{data($x)}</li>}</ul>

• The result will be (an HTML list):

<ul><li>Everyday Italian</li><li>Harry Potter</li><li>Learning XML</li><li>XQuery Kick Start</li></ul>

Page 126: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Accesing and storing XML

Page 127: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Application Program Interface

• There are two standard application program interfaces to XML data:– SAX (Simple API for XML)

• Based on parser model, user provides event handlers for parsing events

– E.g. start of element, end of element– Not suitable for database applications

– DOM (Document Object Model)

• XML data is parsed into a tree representation

• Variety of functions provided for traversing the DOM tree

• E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), …

• Also provides functions for updating DOM tree

©Silberschatz, Korth and Sudarshan10.140Database System Concepts - 5th Edition, Aug 22, 2005.

Page 128: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Storage of XML Data

• XML data can be stored in – Non-relational data stores

• Flat files

– Natural for storing XML– But has all problems of files (no concurrency, no

recovery, …)• XML database

– Database built specifically for storing XML data, supporting DOM model and declarative querying

– Currently no commercial-grade systems– Relational databases

• Data must be translated into relational form

• Advantage: mature database systems

• Disadvantages: overhead of translating data and queries

©Silberschatz, Korth and Sudarshan10.141Database System Concepts - 5th Edition, Aug 22, 2005.

Page 129: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Storage of XML in Relational Databases

• Alternatives:– String Representation

– Tree Representation

– Map to relations

©Silberschatz, Korth and Sudarshan10.142Database System Concepts - 5th Edition, Aug 22, 2005.

Page 130: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

String Representation

• Store each top level element as a string field of a tuple in a relational database– Use a single relation to store all elements, or

– Use a separate relation for each top-level element type

• E.g. account, customer, depositor relations

– Each with a string-valued attribute to store the element

• Indexing:– Store values of subelements/attributes to be indexed as extra fields

of the relation, and build indices on these fields

• E.g. customer_name or account_number

– Some database systems support function indices, which use the result of a function as the key value.

• The function should return the value of the required subelement/attribute

©Silberschatz, Korth and Sudarshan10.143Database System Concepts - 5th Edition, Aug 22, 2005.

Page 131: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

String Representation (Cont.)

• Benefits: – Can store any XML data even without DTD

– As long as there are many top-level elements in a document, strings are small compared to full document

• Allows fast access to individual elements.

• Drawback: Need to parse strings to access values inside the elements– Parsing is slow.

©Silberschatz, Korth and Sudarshan10.144Database System Concepts - 5th Edition, Aug 22, 2005.

Page 132: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Tree Representation

• Tree representation: model XML data as tree and store using relations nodes(id, type, label, value) child (child_id, parent_id)

• Each element/attribute is given a unique identifier• Type indicates element/attribute• Label specifies the tag name of the element/name of attribute• Value is the text value of the element/attribute• The relation child notes the parent-child relationships in the tree

– Can add an extra attribute to child to record ordering of children

bank (id:1)

customer (id:2) account (id: 5)

customer_name(id: 3)

account_number (id: 7)

©Silberschatz, Korth and Sudarshan10.145Database System Concepts - 5th Edition, Aug 22, 2005.

Page 133: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Tree Representation (Cont.)

• Benefit: Can store any XML data, even without DTD• Drawbacks:

– Data is broken up into too many pieces, increasing space overheads

– Even simple queries require a large number of joins, which can be slow

©Silberschatz, Korth and Sudarshan10.146Database System Concepts - 5th Edition, Aug 22, 2005.

Page 134: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Mapping XML Data to Relations

• Relation created for each element type whose schema is known:– An id attribute to store a unique id for each element

– A relation attribute corresponding to each element attribute

– A parent_id attribute to keep track of parent element

• As in the tree representation

• Position information (ith child) can be store too

• All subelements that occur only once can become relation attributes– For text-valued subelements, store the text as attribute value

– For complex subelements, can store the id of the subelement

• Subelements that can occur multiple times represented in a separate table– Similar to handling of multivalued attributes when converting ER

diagrams to tables

©Silberschatz, Korth and Sudarshan10.147Database System Concepts - 5th Edition, Aug 22, 2005.

Page 135: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Storing XML Data in Relational Systems

• Publishing: process of converting relational data to an XML format• Shredding: process of converting an XML document into a set of

tuples to be inserted into one or more relations• XML-enabled database systems support automated publishing

and shredding• Some systems offer native storage of XML data using the xml data

type. Special internal data structures and indices are used for efficiency

©Silberschatz, Korth and Sudarshan10.148Database System Concepts - 5th Edition, Aug 22, 2005.

Page 136: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

SQL/XML

• New standard SQL extension that allows creation of nested XML output– Each output tuple is mapped to an XML element row

<bank>

<account>

<row>

<account_number> A-101 </account_number>

<branch_name> Downtown </branch_name>

<balance> 500 </balance>

</row>

…. more rows if there are more output tuples …

</account>

</bank>

©Silberschatz, Korth and Sudarshan10.149Database System Concepts - 5th Edition, Aug 22, 2005.

Page 137: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

SQL Extensions

• xmlelement creates XML elements• xmlattributes creates attributes

select xmlelement (name “account,

xmlattributes (account_number as account_number),

xmlelement (name “branch_name”, branch_name),

xmlelement (name “balance”, balance))

from account

©Silberschatz, Korth and Sudarshan10.150Database System Concepts - 5th Edition, Aug 22, 2005.

Page 138: Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid emenasalvas@fi.upm.es

Ernestina Menasalvas

Facultad de Informática.

Universidad Politécnica de Madrid

[email protected]