Advanced Topics: XML and Databases

Upload: oliver-shepherd

Post on 23-Dec-2015


Page 1

Advanced Topics

XML and Databases

Page 2

XML

Overview

Structure of XML Data
– XML Document Type Definition (DTD)
– Namespaces
– XML Schema

Query and Transformation
– XPath
– XSLT
– XQuery

Page 3

XML Overview

eXtensible Markup Language (XML).

Related markup languages: Hyper-Text Markup Language (HTML) for document presentation and Standard Generalized Markup Language (SGML) for document management.

XML can handle structured data typical of a DBMS.

XML is flexible and can handle semi-structured data that a relational DBMS cannot easily handle.

XML is the de facto representation for exchanging data between applications on the Web.

Page 4

XML Overview

Markup Language
– separation of content and markup;
– meaning of the markup;
– e.g., HTML shows document markup for presentation;
– Tags – <title> Database System Concepts </title>
– HTML has a specific set of tags;
– XML is extensible and applications can specify tags as needed.

Page 5

XML Overview

Comparison with DBMS
– Focus is on the exchange of data between applications.
– Storage and management of XML is more complex than for a relational DBMS since XML is semi-structured.
– Tagged XML means that the message is self-documenting, so the receiver does not need a separate catalog to interpret it.
– The format of XML is not rigid, and an application can ignore any fields it does not need.
– Versatile, since most browsers are XML-enabled and most DBMS vendors support XML data.

Page 6

Structure of XML Data

An XML document has a single root element, e.g., bank in Figure 10.1.

Element: bank is the root element, and the document also contains customer, account and depositor elements.

Elements in the XML document must be properly nested, i.e., each start tag has a matching end tag within the same parent.
– <account> <balance> </balance> </account> is properly nested.
– <account> <balance> </account> </balance> is not properly nested.

Figure 10.2 combines unstructured data (text) and semi-structured data. This is one of the strengths of XML for data exchange.
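The single-root and proper-nesting rules can be checked with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The two nesting examples from this slide.
good = "<account> <balance> </balance> </account>"
bad = "<account> <balance> </account> </balance>"

# A document with two top-level elements violates the single-root rule.
two_roots = "<bank></bank><bank></bank>"

for label, text in [("good", good), ("bad", bad), ("two_roots", two_roots)]:
    print(label, is_well_formed(text))
```

Only the first string is accepted; the other two are rejected by the parser before any application code sees the data.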

Page 7

Structure of XML Data

Nested data in XML can be considered similar to the output of a join from multiple tables or an unnormalized (nested) relational table.

Figure 10.3 shows account elements nested within customer elements.
– Advantage: there is no need to join customer and account.
– For example, a shipping address can be stored with each shipment.
– Disadvantage: if customer and account have a many-to-many relationship, the account information is replicated under every owning customer, with all the disadvantages of replicated information.
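The trade-off can be illustrated with a small nested document. This is a sketch only: the root name bank-1, the customer names and the account values are made-up sample data, not the contents of Figure 10.3.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: account A-102 is jointly owned, so it is stored twice.
NESTED = """<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <account><account-number>A-102</account-number><balance>400</balance></account>
  </customer>
  <customer>
    <customer-name>Johnson</customer-name>
    <account><account-number>A-101</account-number><balance>500</balance></account>
    <account><account-number>A-102</account-number><balance>400</balance></account>
  </customer>
</bank-1>"""

root = ET.fromstring(NESTED)

# Advantage: no join is needed, each customer's accounts are its own children.
accounts_by_customer = {
    c.findtext("customer-name"): [a.findtext("account-number") for a in c.findall("account")]
    for c in root.findall("customer")
}

# Disadvantage: count how many copies of each account the document stores.
copies = {}
for a in root.iter("account"):
    number = a.findtext("account-number")
    copies[number] = copies.get(number, 0) + 1

print(accounts_by_customer)
print(copies)
```

An update to the balance of the jointly owned account would have to be applied to every copy, which is the usual problem with replicated data.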

Page 8

Structure of XML Data

Element and subelement
– <element> </element> or <element/>

Attribute – Figure 10.4
– An attribute is of type string; it cannot be repeated within an element and cannot have subelements.
– account is an element; acct-type is an attribute; account-number, branch-name and balance are subelements of element account.
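The difference between an attribute and a subelement shows up directly in a parsed document. A minimal sketch using Python's xml.etree.ElementTree; the account values are sample data chosen for illustration.

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking">'
    "<account-number>A-102</account-number>"
    "<branch-name>Perryridge</branch-name>"
    "<balance>400</balance>"
    "</account>"
)

# Attributes are a name -> string mapping on the element itself.
print(account.attrib)

# Subelements are child nodes, kept in document order.
child_tags = [child.tag for child in account]
print(child_tags)

# An attribute cannot be repeated within one start tag: the parser rejects it.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False
print(duplicate_allowed)
```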

Page 9

XML Namespace

Namespace allows organizations to specify globally unique names for element tags.

Each tag or attribute is associated with a URI and this combination of URI and tag (attribute) is unique.

A namespace can be declared in the root element.

  <bank xmlns:FB="http://www.FirstBank.com">
    ….
    <FB:branch>
      <FB:branchname> …. </FB:branchname>
      <FB:branchaddress> … </FB:branchaddress>
    </FB:branch>
  </bank>
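A parser resolves each prefixed tag to the pair (URI, local name), so the prefix itself is only shorthand. A minimal sketch with Python's xml.etree.ElementTree; the branch name and address values are made up, and the prefix "first" used for the lookup is deliberately different from FB.

```python
import xml.etree.ElementTree as ET

DOC = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchaddress>Brooklyn</FB:branchaddress>
  </FB:branch>
</bank>"""

root = ET.fromstring(DOC)
branch = root[0]

# ElementTree stores the resolved name as {URI}localname.
print(root.tag)
print(branch.tag)

# Queries match on the URI, so any prefix bound to the same URI works.
ns = {"first": "http://www.FirstBank.com"}
name = root.findtext("first:branch/first:branchname", namespaces=ns)
print(name)
```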

Page 10

XML DTD

XML documents do not have to conform to any schema or set of pre-defined tags.

However, in most cases, applications require that data conforms to some pre-defined tags.

XML DTD
– Specifies the allowed list of elements and the subelements within each element.
– Does not identify data types and other constraints.
– Operators: | (or), + (1 or more), * (0 or more), ? (0 or 1, i.e., optional)
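DTD content models use the same operators as regular expressions: | is alternation, + is one or more, * is zero or more and ? is zero or one. The sketch below is not a DTD validator; it hand-translates three content models into Python regular expressions over a sequence of child tag names. The helper children_match and the second and third content models are assumptions made for illustration.

```python
import re

def children_match(pattern, children):
    """Check a list of child tag names against a hand-translated content model."""
    encoded = "".join(f"<{name}>" for name in children)
    return re.fullmatch(pattern, encoded) is not None

# DTD: (account | customer | depositor)+
BANK = r"(?:<account>|<customer>|<depositor>)+"

# DTD (hypothetical): (account-number, branch-name, balance?)
ACCOUNT = r"<account-number><branch-name>(?:<balance>)?"

# DTD (hypothetical): (customer-name, account*)
CUSTOMER = r"<customer-name>(?:<account>)*"

print(children_match(BANK, ["customer", "account", "account"]))
print(children_match(BANK, []))
print(children_match(ACCOUNT, ["account-number", "branch-name"]))
print(children_match(CUSTOMER, ["customer-name", "account", "account"]))
```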

Page 11

XML DTD

Figure 10.6 DTD Example
– bank element consists of one or more account, customer or depositor elements (in any order).
– account element has subelements account-number, branch-name, balance, etc.
– elements account-number, branch-name, etc. are of type #PCDATA (text or string).
– EMPTY – element has no contents.
– ANY – element can have any subelements.
– attributes must have a type declaration and a default declaration, e.g., a default value.

  <!ATTLIST account acct-type CDATA "checking">
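A default value declared in an ATTLIST is supplied by the parser when the attribute is missing from the document. The sketch below assumes Python's xml.parsers.expat module, which reads the internal DTD subset but does not validate against it. The DTD is a trimmed stand-in for Figure 10.6 (only account is declared under bank), and the account data and the helper account_types are made up for illustration.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE bank [
  <!ELEMENT bank (account+)>
  <!ELEMENT account (account-number, branch-name, balance)>
  <!ELEMENT account-number (#PCDATA)>
  <!ELEMENT branch-name (#PCDATA)>
  <!ELEMENT balance (#PCDATA)>
  <!ATTLIST account acct-type CDATA "checking">
]>
<bank>
  <account>
    <account-number>A-101</account-number>
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <account acct-type="savings">
    <account-number>A-102</account-number>
    <branch-name>Perryridge</branch-name>
    <balance>400</balance>
  </account>
</bank>"""

def account_types(document):
    """Collect the acct-type attribute reported for each account start tag."""
    types = []

    def start(name, attrs):
        if name == "account":
            types.append(attrs.get("acct-type"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return types

print(account_types(DOC))
```

The first account carries no acct-type in the document, so the value reported for it comes from the ATTLIST default.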

Page 12

XML DTD

ID and IDREF and IDREFS – Figure 10.7

ID
– An attribute of type ID for an element provides a unique (global) identifier or key for that element.
– An element can have at most one attribute of type ID.
– <!ATTLIST account account-number ID #REQUIRED>

An attribute of type IDREF is a reference to an element; its value must be the unique ID value of some element in the document.

An attribute of type IDREFS holds a list of ID values.

ID, IDREF and IDREFS capture the primary key and foreign key functionality of the relational data model.

Figure 10.8: example of an XML document with ID and IDREFS.

An IDREF must point to an ID, but there is no type checking, so it can point to the ID of an account, a customer or a branch.
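The lack of type checking can be seen by resolving IDREFS by hand. This is a sketch under stated assumptions: the document below is made-up sample data rather than Figure 10.8, the attribute names (account-number, owners, customer-id, accounts) are assumed, and the ID and IDREFS roles are listed in two Python dictionaries instead of being read from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second account lists another ACCOUNT among its owners.
DOC = """<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <balance>500</balance>
  </account>
  <account account-number="A-402" owners="C102 A-401">
    <balance>900</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401 A-402">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>"""

# Assumed declarations: which attribute is the ID and which is IDREFS.
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

root = ET.fromstring(DOC)

# Map every ID value to the element that carries it (IDs are document-wide).
by_id = {el.get(ID_ATTRS[el.tag]): el for el in root if el.tag in ID_ATTRS}

def resolve(element):
    """Follow an element's IDREFS attribute; return (reference, target tag) pairs."""
    refs = element.get(IDREFS_ATTRS[element.tag], "").split()
    return [(ref, by_id[ref].tag if ref in by_id else None) for ref in refs]

print(resolve(root[0]))
print(resolve(root[1]))
```

Every reference in the second account resolves to some ID, which is all a DTD requires, even though one of the "owners" is an account rather than a customer.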

Page 13

XML Schema – Figure 10.9

XML Schema is closer in spirit to relational schemas.

It is closely associated with namespaces, e.g., xmlns:xsd="http://www.w3.org/2001/XMLSchema"

Supports uniqueness of primary keys and constraints on foreign keys.

An element has a name and a type.

complexType (account or customer or depositor) is a sequence of subelements.

complexType BankType is a sequence of references to elements of type account or customer or depositor.
– More precisely defined than an XML DTD, where an IDREF can refer to any element, irrespective of whether it is an account or a customer.

minOccurs and maxOccurs are multiplicity constraints.
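An XML Schema is itself an XML document in the xsd namespace, so it can be read with an ordinary parser. The sketch below does not validate anything; it only extracts the minOccurs and maxOccurs bounds from a schema fragment. The fragment is an assumed approximation of BankType, not a copy of Figure 10.9, and occurrence_bounds is a made-up helper. When either attribute is omitted, XML Schema defaults it to 1.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed schema fragment for illustration.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bank" type="BankType"/>
  <xsd:complexType name="BankType">
    <xsd:sequence>
      <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

def occurrence_bounds(schema_text, type_name):
    """Return {element ref: (min, max)} for a named complexType; max None means unbounded."""
    root = ET.fromstring(schema_text)
    ns = {"xsd": XSD_NS}
    ctype = root.find(f"xsd:complexType[@name='{type_name}']", ns)
    bounds = {}
    for el in ctype.findall("xsd:sequence/xsd:element", ns):
        low = int(el.get("minOccurs", "1"))
        high = el.get("maxOccurs", "1")
        bounds[el.get("ref")] = (low, None if high == "unbounded" else int(high))
    return bounds

print(occurrence_bounds(SCHEMA, "BankType"))
```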

Page 14

Query and Transformation of XML

3 kinds of query languages
– XPath is the building block of path expressions.
– XSLT is a transformation language.
  » Originally designed to convert to HTML.
  » XSLT can transform one XML document to another so it is also a query language.
  » Most widely supported.
– XQuery is more like an object query language.

Tree model of XML data
– Root
– Nodes are either elements or attributes.
– Element nodes can have children which are subelements or attributes of that element.

Page 15

Query and Transformation of XML

Path expression
– A sequence of steps /xx/yy/zz, where the initial / refers to the root.
– Result is a set of values from the XML document.
– /bank-2/customer/customer-name on Figure 10.8 returns <customer-name>Joe</customer-name> and <customer-name>Lisa</customer-name> and <customer-name>Mary</customer-name>
– /bank-2/customer/customer-name/text() would return only the values and not the tagged elements.
– /bank-2/account/@account-number also returns the set of account numbers. @ returns attribute values as stored; it does not dereference IDREFS.

Selection
– /bank-2/account[balance > 400]
– /bank-2/account[balance > 400]/@account-number

Count
– /bank-2/account[count(customer) > 2]

Skip intermediate elements
– /bank-2//name
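Several of these path expressions can be tried with Python's xml.etree.ElementTree, which supports only a subset of XPath. The sketch below uses a made-up bank-2 document, not Figure 10.8. Paths are written relative to the root element, and the numeric selection is done in Python because ElementTree predicates cannot compare numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration.
DOC = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Lisa</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
</bank-2>"""

root = ET.fromstring(DOC)  # root is the bank-2 element

# /bank-2/customer/customer-name/text()
names = [e.text for e in root.findall("customer/customer-name")]

# /bank-2/account/@account-number
numbers = [a.get("account-number") for a in root.findall("account")]

# /bank-2/account[balance > 400]/@account-number
selected = [
    a.get("account-number")
    for a in root.findall("account")
    if float(a.findtext("balance")) > 400
]

# /bank-2//customer-name (skip intermediate elements)
anywhere = [e.text for e in root.findall(".//customer-name")]

print(names, numbers, selected, anywhere)
```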

Page 16

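BMGTG402 Namespace

  <c402s04grade xmlns:c402s04="http://www.rhsmith.umd.edu/is/aqiuol/402s04">
    <c402s04:grade>
      <c402s04:student> …. </c402s04:student>
      <c402s04:team> … </c402s04:team>
    </c402s04:grade>
  </c402s04grade>

The prefix and element names start with a letter because XML names cannot begin with a digit.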