DTD and XML Schema - Simon Fraser University


Post on 15-Jun-2020



CMPT 354: Database I -- DTD and XML Schema 2

XML
• Extensible Markup Language
  – A standard adopted in 1998 by the W3C (World Wide Web Consortium)
• Optional mechanisms for specifying document structure
  – DTD: the Document Type Definition language, part of the XML standard
  – XML Schema: a more recent specification built on top of XML
• Query languages for XML
  – XPath: lightweight
  – XSLT: a document transformation language
  – XQuery: a full-blown query language


Example

[Figure: an annotated XML document. Labels: mandatory statement, root element, XML element, element name, element content]
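A small sketch of the parts named on this slide, using Python's standard xml.etree.ElementTree. The document content is an assumption, modelled on the PersonList example used elsewhere in these slides; the Title value in particular is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the first line is the XML declaration
# (what the slide labels the "mandatory statement").
DOC = """<?xml version="1.0"?>
<PersonList>
  <Title Value="Students"/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
    </Person>
  </Contents>
</PersonList>"""

root = ET.fromstring(DOC)                  # root element
name = root.find("Contents/Person/Name")   # one XML element inside the root

print(root.tag)    # name of the root element
print(name.tag)    # element name
print(name.text)   # element content
```

Everything between a start tag and its matching end tag is the element's content; here the content of Name is plain text, while the content of Person is two nested elements.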


Hierarchical Structure

PersonList Student
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd


Document Type Definitions

• A set of rules for structuring an XML document
  – Specified as part of the document itself, or
  – Referenced by a URL where the DTD can be found
  – A document that conforms to its DTD is said to be valid
• XML does not require a document to have a DTD, but every document must be well formed
• A DTD is a grammar that specifies a legal XML document, based on the tags used in the document and their attributes
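The difference between well formed and valid can be sketched with Python's standard xml.etree.ElementTree, which is a non-validating parser: it rejects documents that are not well formed but never checks them against a DTD. The helper name is_well_formed is introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<Person><Name>John Doe</Name></Person>"))  # tags nest properly
print(is_well_formed("<Person><Name>John Doe</Person></Name>"))  # tags overlap
```

Checking validity needs a validating parser in addition; the standard library does not provide one.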


Example – DTD

<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>


DTD Components
• Name (e.g., PersonList)
  – Must coincide with the tag name of the root element of the document
• One ELEMENT statement for each allowed tag, including the root tag
• For each tag that can have attributes, an ATTLIST statement specifies the allowed attributes and their types
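The three components can be read back out of a document with the declaration handlers of Python's standard xml.parsers.expat module. This is only a sketch: expat reports the declarations but does not validate, and the instance body at the end of DOC (including the Title value) is a made-up example.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>
<PersonList><Title Value="Students"/><Contents/></PersonList>"""

def summarize_dtd(text):
    """Collect the DTD name, the root tag, and the declared tags and attributes."""
    summary = {"doctype": None, "root": None, "elements": [], "attributes": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        summary["doctype"] = name

    def on_element_decl(name, model):
        summary["elements"].append(name)

    def on_attlist_decl(element, attribute, attr_type, default, required):
        summary["attributes"].append((element, attribute))

    def on_start_element(name, attrs):
        if summary["root"] is None:   # the first start tag is the root
            summary["root"] = name

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(text, True)
    return summary

summary = summarize_dtd(DOC)
print(summary["doctype"] == summary["root"])  # DTD name coincides with the root tag
print(summary["elements"])
print(summary["attributes"])
```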


Specification

• *: a subelement can appear zero or more times
  – +: a subelement can appear one or more times
• #PCDATA (parsed character data), CDATA: character strings
• #IMPLIED: an attribute is optional
• ?: a subelement is optional
  – <!ELEMENT Person (Name, Id, Address?)>
• |: alternatives of subelements
  – <!ELEMENT Name ((First, Last)|(Last, First))>
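One way to read these operators is that a content model is a regular expression over the names of an element's children. The sketch below makes that reading concrete with Python's re module; it checks a single level of nesting only and is not a DTD validator. The helper children_match and the sample elements are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT Person (Name, Id, Address?)> as a regex over child tag names.
# Every tag is followed by a comma so that names cannot run together.
PERSON_MODEL = re.compile(r"Name,Id,(Address,)?")

# <!ELEMENT Name ((First, Last)|(Last, First))>
NAME_MODEL = re.compile(r"(First,Last,)|(Last,First,)")

def children_match(element, model):
    """True if the sequence of child tags matches the whole content model."""
    child_tags = "".join(child.tag + "," for child in element)
    return model.fullmatch(child_tags) is not None

with_address = ET.fromstring("<Person><Name/><Id/><Address/></Person>")
without_address = ET.fromstring("<Person><Name/><Id/></Person>")
wrong_order = ET.fromstring("<Person><Id/><Name/></Person>")
last_first = ET.fromstring("<Name><Last/><First/></Name>")

print(children_match(with_address, PERSON_MODEL))
print(children_match(without_address, PERSON_MODEL))
print(children_match(wrong_order, PERSON_MODEL))
print(children_match(last_first, NAME_MODEL))
```

The same translation covers the other operators: (Person*) becomes (Person,)* and a + in the DTD becomes + in the regex.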


Types for Attributes

• CDATA: character strings
• ID: values that are unique within the document
• IDREF: a reference to an ID value
• IDREFS: a list of IDREF values
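What ID and IDREF require of a document can be sketched as two checks: no ID value occurs twice, and every IDREF value matches some ID. The attribute names StudId and Advisor are hypothetical, assumed to be declared as ID and IDREF in a DTD that is not shown; a real validator applies the checks across all ID-typed attributes at once.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document.
DOC = """<People>
  <Person StudId="s1"/>
  <Person StudId="s2" Advisor="s1"/>
  <Person StudId="s3" Advisor="s9"/>
</People>"""

def check_ids(root, id_attr, idref_attr):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    dangling = sorted(
        e.get(idref_attr)
        for e in root.iter()
        if e.get(idref_attr) is not None and e.get(idref_attr) not in ids
    )
    return duplicates, dangling

duplicates, dangling = check_ids(ET.fromstring(DOC), "StudId", "Advisor")
print(duplicates)  # ID values used more than once
print(dangling)    # references that point at no ID
```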


DTD as Data Definition Language

• There are some limitations
• Namespaces are not part of the native design
• DTD syntax is quite different from XML
• Very limited set of basic types
• Limited ways to specify data consistency constraints
  – No keys, weak referential integrity, no type references
• No referential integrity for elements
• Elements are ordered; unordered content is hard to specify
• Global definition of elements


Why XML Schema?
• Uses the same syntax as that used for ordinary XML documents
  – An alternative to DTD
• Integrated with the namespace mechanism
  – Different schemas can be imported from different namespaces and integrated into one schema
• Provides a number of built-in types similar to SQL, e.g., string, integer, and time
• Defines complex types from simpler ones
• The same element name can be defined as different types depending on where the element is nested
• Supports keys and referential integrity constraints
• Easy to specify documents where elements are unordered


Schema and Instance

• Goal: describing an XML schema using XML
• An XML document D that conforms to a given schema (which is another XML document) is said to be schema valid
  – D is called an instance of the schema


XML Schema and Namespaces
• An XML schema document begins with a declaration of the namespaces to be used
• http://www.w3.org/2001/XMLSchema – the namespace identifying the names of tags and attributes used in a schema (not in the instances)
  – Describes the structural properties of documents in general, e.g., schema, attribute, element, …
• http://www.w3.org/2001/XMLSchema-instance – another namespace used in conjunction with the one above
  – Identifies a small number of special names that are defined in the XML Schema specification and are used in instance documents, e.g., schemaLocation
• The target namespace – identifies the set of names defined by a particular schema document to be used in the instances
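A sketch of where each of the three namespaces shows up, using Python's standard xml.etree.ElementTree, which writes a qualified name as {namespace}local-name. The target namespace http://example.org/adm, the file name report.xsd, and the one-element schema are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ADM = "http://example.org/adm"   # hypothetical target namespace

# Schema document: its own tags (schema, element) come from the XSD namespace.
SCHEMA = f"""<schema xmlns="{XSD}" targetNamespace="{ADM}">
  <element name="Report" type="string"/>
</schema>"""

# Instance document: Report comes from the target namespace, while
# schemaLocation comes from the XMLSchema-instance namespace.
INSTANCE = f"""<Report xmlns="{ADM}" xmlns:xsi="{XSI}"
    xsi:schemaLocation="{ADM} report.xsd">draft</Report>"""

schema_root = ET.fromstring(SCHEMA)
instance_root = ET.fromstring(INSTANCE)

print(schema_root.tag)                                   # schema, in the XSD namespace
print(schema_root.get("targetNamespace"))                # namespace of the defined names
print(instance_root.tag)                                 # Report, in the target namespace
print(instance_root.get(f"{{{XSI}}}schemaLocation"))     # namespace and schema file pair
```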


Schema and An Instance Document


Report Document


Primitive Types

• DTD has very limited primitive types
  – CDATA, ID, IDREF, IDREFS
• Many useful primitive types in XML Schema
  – decimal, integer, float, boolean, date, …
• New simple types can be derived from the basic ones
  – The mechanism is similar to the CREATE DOMAIN statement in SQL


Deriving Simple Types

• IDREFS is not one of the primitive types; it can be derived as a list of IDREF
    <simpleType name="myIdrefs">
      <list itemType="IDREF"/>
    </simpleType>
• Union of multiple types
  Suppose local phone numbers are 7 digits long and long-distance numbers are 10 digits long
    <simpleType name="phoneNumber">
      <union memberTypes="phone7digits phone10digits"/>
    </simpleType>
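The meaning of list and union can be sketched as plain predicates: a union accepts a value that any member type accepts, and a list accepts whitespace-separated items that the item type accepts. This is a rough emulation in Python, not a schema processor. The 7-digit range follows the phone7digits type defined in these slides; the 10-digit range is assumed by analogy, since phone10digits is named but never defined, and the check ignores signs and leading zeros.

```python
def is_phone7digits(text):
    # integer between 1000000 and 9999999
    return text.isascii() and text.isdigit() and 1000000 <= int(text) <= 9999999

def is_phone10digits(text):
    # assumption: defined like phone7digits, with a 10-digit range
    return text.isascii() and text.isdigit() and 1000000000 <= int(text) <= 9999999999

def is_phone_number(text):
    # union: valid if valid for at least one member type
    return is_phone7digits(text) or is_phone10digits(text)

def is_list_of(text, is_item):
    # list: every whitespace-separated item must be valid for the item type
    return all(is_item(item) for item in text.split())

print(is_phone_number("5551234"))
print(is_phone_number("6045551234"))
print(is_phone_number("12345678"))
print(is_list_of("5551234 6045551234", is_phone_number))
```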


Deriving Simple Types by Restriction

• Constrain a basic type using one or more constraints from a fixed repertoire defined by the XML Schema specification
    <simpleType name="phone7digits">
      <restriction base="integer">
        <minInclusive value="1000000"/>
        <maxInclusive value="9999999"/>
      </restriction>
    </simpleType>


More Examples

• Phone numbers in XXX-YYYY format
    <simpleType name="phone7digitsAndDash">
      <restriction base="string">
        <pattern value="[0-9]{3}-[0-9]{4}"/>
      </restriction>
    </simpleType>
• More restrictions on the basic string type
  – <length value="7"/> – strings of length 7
  – <minLength value="7"/> – strings of length >= 7
  – <maxLength value="14"/> – strings of length <= 14
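The pattern and length facets can be tried out with Python's re module. A pattern facet has to match the whole value, so the sketch uses fullmatch rather than search. The function names are illustrative, and this particular pattern happens to mean the same thing in both regex dialects, which is not true of every XML Schema pattern.

```python
import re

PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{4}")

def is_phone7digits_and_dash(text):
    # the whole value must match, not just a substring
    return PHONE_PATTERN.fullmatch(text) is not None

def length_between(text, min_length=7, max_length=14):
    # minLength and maxLength combined; length is the case min == max
    return min_length <= len(text) <= max_length

print(is_phone7digits_and_dash("555-1234"))
print(is_phone7digits_and_dash("5551234"))
print(is_phone7digits_and_dash("555-12345"))
print(length_between("555-1234"))
```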


Enumeration

• Restrict the domain to a finite set
• Can be applied to any base type
    <simpleType name="emergencyNumbers">
      <restriction base="integer">
        <enumeration value="911"/>
        <enumeration value="333"/>
        <enumeration value="5431234"/>
      </restriction>
    </simpleType>


More Examples on Simple Types


Complex Types


Basics of Complex Types

• Tag complexType
• Tag sequence: a list of elements that must occur in the given order
• Using minOccurs and maxOccurs
• Associating attributes with a type
• A complex type can be associated with an element
    <element name="Student" type="adm:studentType"/>


Element without Content

• Just associate attributes with types
• Example
    <complexType name="courseTakenType">
      <attribute name="CrsCode" type="adm:courseRef"/>
      <attribute name="Semester" type="string"/>
    </complexType>


Compositors
• Tags describing how elements can be combined into groups, e.g., sequence
  – Required when a tag has complex content
  – Required even if the type has only one child element!
• Compositor all: allows elements to appear in any order
    <complexType name="addressType">
      <all>
        <element name="StreetName" type="string"/>
        <element name="StreetNumber" type="string"/>
        <element name="city" type="string"/>
      </all>
    </complexType>
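What all accepts can be sketched as a set comparison: with the default minOccurs and maxOccurs of 1, each listed child must appear exactly once, in any order. The helper matches_all and the sample documents are illustrative, and the check looks at tag names only.

```python
import xml.etree.ElementTree as ET

# Children declared inside <all> in addressType
ADDRESS_CHILDREN = {"StreetName", "StreetNumber", "city"}

def matches_all(element, required_names):
    """True if every required child occurs exactly once, in any order."""
    child_tags = [child.tag for child in element]
    return (len(child_tags) == len(required_names)
            and set(child_tags) == set(required_names))

reordered = ET.fromstring(
    "<Address><city/><StreetNumber/><StreetName/></Address>")
repeated = ET.fromstring(
    "<Address><city/><city/><StreetName/></Address>")

print(matches_all(reordered, ADDRESS_CHILDREN))  # order does not matter
print(matches_all(repeated, ADDRESS_CHILDREN))   # city twice, StreetNumber missing
```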


Restrictions on Compositor All
• All must appear directly below complexType, so the following is not allowed
    <complexType name="studentType2">
      <sequence>
        <all>
          <element name="First" type="string"/>
          <element name="Last" type="string"/>
        </all>
        <element name="Address" type="string"/>
      </sequence>
    </complexType>
• No element within all can be repeated, so the following is not allowed
    <complexType name="studentType3">
      <all>
        <element name="First" type="string"/>
        <element name="Last" type="string"/>
        <element name="Address" type="string" minOccurs="1" maxOccurs="unbounded"/>
      </all>
    </complexType>


Compositor Choice
    <complexType name="addressType">
      <sequence>
        <choice>
          <element name="POBox" type="string"/>
          <sequence>
            <element name="Name" type="string"/>
            <element name="Number" type="string"/>
          </sequence>
        </choice>
        <element name="City" type="string"/>
      </sequence>
    </complexType>
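This addressType reads as "either a POBox, or a Name followed by a Number, and then a City". A sketch of that reading as a regular expression over child tag names, in Python; it checks tag order only, not the string content, and the helper name is illustrative.

```python
import re
import xml.etree.ElementTree as ET

# sequence( choice( POBox | sequence(Name, Number) ), City )
# Every tag is followed by a comma so that names cannot run together.
ADDRESS_MODEL = re.compile(r"(POBox,|Name,Number,)City,")

def matches_address_type(element):
    child_tags = "".join(child.tag + "," for child in element)
    return ADDRESS_MODEL.fullmatch(child_tags) is not None

po_box = ET.fromstring("<Address><POBox/><City/></Address>")
street = ET.fromstring("<Address><Name/><Number/><City/></Address>")
both = ET.fromstring("<Address><POBox/><Name/><Number/><City/></Address>")

print(matches_address_type(po_box))   # first alternative
print(matches_address_type(street))   # second alternative
print(matches_address_type(both))     # both alternatives at once
```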


Local Element Names
• Two complex types can have elements that share the same name
  – Names of students and names of courses
  – Impossible in DTD, where all element declarations are global


Anonymous Types

• Useful for types that might not be reused
    <element name="Report">
      <complexType>
        <sequence>
          <element name="Students" type=…/>
          <element name="Classes" type=…/>
          <element name="Course" type=…/>
        </sequence>
      </complexType>
    </element>


Keys


Foreign Key Constraints


Summary

• DTD: a set of rules for structuring an XML document
• XML Schema: a more sophisticated tool to specify structures of XML documents
  – XML Schema is written in XML
• Assignment 3