dtd and xml schema - computing science - simon fraser ... · cmpt 354: database i -- dtd and xml...
DTD and XML Schema
CMPT 354: Database I -- DTD and XML Schema 2
XML
• Extensible Markup Language
– A standard adopted in 1998 by the W3C (World Wide Web Consortium)
• Optional mechanisms for specifying document structure
– DTD: the Document Type Definition language, part of the XML standard
– XML Schema: a more recent specification built on top of XML
• Query languages for XML
– XPath: lightweight
– XSLT: document transformation language
– XQuery: a full-blown language
Example
[Figure: an XML document annotated with the labels root element, mandatory statement, XML element, element name, and element content]
Hierarchical Structure
PersonList Student
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
Document Type Definitions
• A set of rules for structuring an XML document
– Specified as part of the document itself, or
– Given by a URL where the DTD can be found
– A document that conforms to its DTD is said to be valid
• XML does not require a document to have a DTD, but every document must be well formed
• A DTD is a grammar that specifies a legal XML document, based on the tags used in the document and their attributes
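The distinction between well formed and valid can be sketched with Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against a DTD. The helper name `is_well_formed` and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD validation is performed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags under a single root element: well formed.
good = "<Person><Name>John Doe</Name><Id>111111111</Id></Person>"

# The Name and Id tags overlap instead of nesting: not well formed.
bad = "<Person><Name>John Doe<Id></Name>111111111</Id></Person>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid with respect to its DTD, for example when a required subelement is missing.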
Example – DTD
<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>
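As a sketch of how this DTD relates to a document, the snippet below embeds the DTD as an internal subset in front of an instance built from the person data on the Hierarchical Structure slide. The attribute values `Student` and `Student List` are assumptions, and the optional `Date` attribute is simply left out. Python's standard parser reads the DTD but does not validate against it, so this shows only that the combined document is well formed and has the expected shape.

```python
import xml.etree.ElementTree as ET

# DTD from the slide as an internal subset, followed by a hypothetical instance.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>
<PersonList Type="Student">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
      <Address><Number>123</Number><Street>Main St</Street></Address>
    </Person>
    <Person>
      <Name>Joe Public</Name>
      <Id>666666666</Id>
      <Address><Number>666</Number><Street>Hollow Rd</Street></Address>
    </Person>
  </Contents>
</PersonList>
"""

# Parsing succeeds only if the document is well formed (no DTD validation here).
root = ET.fromstring(DOCUMENT)

# The root tag matches the name given in the DOCTYPE declaration.
names = [person.findtext("Name") for person in root.iter("Person")]
print(root.tag, names)
```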
DTD Components
• Name (e.g., PersonList)
– Must coincide with the tag name of the root element of the document
• One ELEMENT statement for each allowed tag, including the root tag
• For each tag that can have attributes, an ATTLIST statement specifies the allowed attributes and their types
Specification
• *: a subelement can appear zero or more times
– +: a subelement must appear at least once
• #PCDATA (parsed character data), CDATA: character strings
• #IMPLIED: an attribute is optional
• ?: a subelement is optional
– <!ELEMENT Person (Name, Id, Address?)>
• |: alternatives of subelements
– <!ELEMENT Name ((First, Last)|(Last, First))>
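The operators above behave like regular-expression operators over the sequence of child tags. The sketch below makes that analogy concrete by translating a few content models by hand into Python regular expressions. The helper `children_match`, the encoding of child tags as a comma-terminated string, and the sample elements are all assumptions made for illustration; a validating parser does not necessarily work this way.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child tag is encoded as "Tag,".
PERSON_MODEL = r"Name,Id,(Address,)?"            # (Name, Id, Address?)
NAME_MODEL = r"(First,Last,|Last,First,)"        # ((First, Last)|(Last, First))
CONTENTS_STAR = r"(Person,)*"                    # (Person*)
CONTENTS_PLUS = r"(Person,)+"                    # (Person+)


def children_match(element: ET.Element, model: str) -> bool:
    """Check the sequence of child tags against a translated content model."""
    sequence = "".join(child.tag + "," for child in element)
    return re.fullmatch(model, sequence) is not None


with_address = ET.fromstring("<Person><Name/><Id/><Address/></Person>")
without_address = ET.fromstring("<Person><Name/><Id/></Person>")
wrong_order = ET.fromstring("<Person><Id/><Name/></Person>")
empty_contents = ET.fromstring("<Contents/>")

print(children_match(with_address, PERSON_MODEL))
print(children_match(without_address, PERSON_MODEL))
print(children_match(wrong_order, PERSON_MODEL))
print(children_match(empty_contents, CONTENTS_STAR))
print(children_match(empty_contents, CONTENTS_PLUS))
```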
Types for Attributes
• CDATA: character strings
• ID: unique values
• IDREF: a reference to an ID value
• IDREFS: a list of IDREF values
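The constraints implied by ID, IDREF, and IDREFS can be sketched as a small check: ID values must be unique within the document, and every IDREF or IDREFS entry must match some ID. Python's standard parser does not enforce this, so the check is written by hand. The document, the attribute names `Id`, `Boss`, and `Knows`, and their assumed types are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; assume a DTD declares Id as ID, Boss as IDREF,
# and Knows as IDREFS.
DOC = """
<People>
  <Person Id="p1" Knows="p2 p3"/>
  <Person Id="p2" Boss="p1"/>
  <Person Id="p3" Boss="p9"/>
</People>
"""

ID_ATTRS = {"Id"}
IDREF_ATTRS = {"Boss"}
IDREFS_ATTRS = {"Knows"}


def check_references(root: ET.Element):
    """Return (duplicate ID values, references that match no ID)."""
    ids = [el.attrib[name] for el in root.iter() for name in ID_ATTRS
           if name in el.attrib]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    known = set(ids)

    dangling = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if name in IDREF_ATTRS:
                refs = [value]
            elif name in IDREFS_ATTRS:
                refs = value.split()  # IDREFS is a whitespace-separated list
            else:
                continue
            dangling.extend(ref for ref in refs if ref not in known)
    return duplicates, dangling


print(check_references(ET.fromstring(DOC)))
```

Here `p9` is reported as dangling because no element carries that ID. Nothing in the check restricts what kind of element a reference points to, which illustrates why the referential integrity offered by DTDs is considered weak.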
DTD as Data Definition Language
• There are some limitations
• Namespaces are not part of the native design
• DTD syntax is quite different from XML
• Very limited set of basic types
• Limited ways to specify data consistency constraints
– No keys, weak referential integrity, no type references
• No referential integrity for elements
• Elements are ordered
• Element definitions are global
Why XML Schema?
• Uses the same syntax as ordinary XML documents
– An alternative to DTD
• Integrated with the namespace mechanism
– Different schemas can be imported from different namespaces and integrated into one schema
• Provides a number of built-in types similar to those of SQL, e.g., string, integer, and time
• Defines complex types from simpler ones
• The same element name can be defined as different types depending on where the element is nested
• Supports keys and referential integrity constraints
• Makes it easy to specify documents where elements are unordered
Schema and Instance
• Goal: describing XML schemas using XML
• An XML document D that conforms to a given schema (which is another XML document) is said to be schema valid
– D is called an instance of the schema
XML Schema and Namespaces
• An XML schema document begins with a declaration of the namespaces to be used
• http://www.w3.org/2001/XMLSchema – the namespace identifying the names of tags and attributes used in a schema (not in the instances)
– These names describe the structural properties of documents in general, e.g., schema, attribute, element, …
• http://www.w3.org/2001/XMLSchema-instance – another namespace used in conjunction with the one above
– Identifies a small number of special names that are defined in the XML Schema Specification and are used in the instance documents, e.g., schemaLocation
• The target namespace – identifies the set of names defined by a particular schema document to be used in the instances
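The roles of the three namespaces can be sketched by parsing a tiny schema and a matching instance with Python's standard library, which expands every name into the form `{namespace}local-name`. The target namespace URI `http://example.org/admin`, the file name `student.xsd`, and the `Student` element are hypothetical, and no schema validation takes place.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
TARGET_NS = "http://example.org/admin"  # hypothetical target namespace

# Schema document: its own tags (schema, element) come from XSD_NS.
SCHEMA = f"""<schema xmlns="{XSD_NS}" targetNamespace="{TARGET_NS}">
  <element name="Student" type="string"/>
</schema>"""

# Instance document: Student belongs to the target namespace, while
# schemaLocation comes from the XMLSchema-instance namespace.
INSTANCE = f"""<Student xmlns="{TARGET_NS}"
    xmlns:xsi="{XSI_NS}"
    xsi:schemaLocation="{TARGET_NS} student.xsd">John Doe</Student>"""

schema_root = ET.fromstring(SCHEMA)
instance_root = ET.fromstring(INSTANCE)

print(schema_root.tag)
print(instance_root.tag)
print(instance_root.get(f"{{{XSI_NS}}}schemaLocation"))
```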
Schema and An Instance Document
Report Document
Primitive Types
• DTD has very limited primitive types
– CDATA, ID, IDREF, IDREFS
• Many useful primitive types in XML Schema
– decimal, integer, float, boolean, date, …
• Derive new simple types from the basic ones
– The mechanism is similar to the CREATE DOMAIN statement in SQL
Deriving Simple Types
• IDREFS is not one of the primitive types; it can be derived as a list of IDREF
<simpleType name="myIdrefs">
  <list itemType="IDREF"/>
</simpleType>
• Union of multiple types
– Suppose local phone numbers are 7 digits long and long distance numbers are 10 digits long
<simpleType name="phoneNumber">
  <union memberTypes="phone7digits phone10digits"/>
</simpleType>
Deriving Simple Types by Restriction
• Constrain a basic type using one or more constraints from a fixed repertoire defined by the XML Schema specification
<simpleType name="phone7digits">
  <restriction base="integer">
    <minInclusive value="1000000"/>
    <maxInclusive value="9999999"/>
  </restriction>
</simpleType>
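To make the meaning of the range facets concrete, the sketch below re-implements the check by hand in Python: a value belongs to phone7digits if it is an integer between the two inclusive bounds. It also covers the phoneNumber union from the previous slide. The bounds used for phone10digits are an assumption, since that type is named but not defined in the slides, and whitespace handling is ignored.

```python
import re

# Lexical form of an integer: optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def in_integer_range(text: str, low: int, high: int) -> bool:
    """Mimic minInclusive/maxInclusive on an integer base type."""
    if INTEGER_LEXICAL.fullmatch(text) is None:
        return False
    return low <= int(text) <= high


def is_phone7digits(text: str) -> bool:
    # Bounds taken from the phone7digits definition.
    return in_integer_range(text, 1000000, 9999999)


def is_phone10digits(text: str) -> bool:
    # Hypothetical definition, by analogy with phone7digits.
    return in_integer_range(text, 1000000000, 9999999999)


def is_phone_number(text: str) -> bool:
    # A union type accepts a value if any member type accepts it.
    return is_phone7digits(text) or is_phone10digits(text)


print(is_phone7digits("5431234"))
print(is_phone7digits("911"))
print(is_phone_number("6045551234"))
print(is_phone_number("12345678"))
```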
More Examples
• Phone numbers in XXX-YYYY format
<simpleType name="phone7digitsAndDash">
  <restriction base="string">
    <pattern value="[0-9]{3}-[0-9]{4}"/>
  </restriction>
</simpleType>
• More restrictions on the basic string type
– <length value="7"/> – strings of length 7
– <minLength value="7"/> – strings of length >= 7
– <maxLength value="14"/> – strings of length <= 14
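The pattern and length facets above can be sketched in Python as follows. A pattern facet must match the whole value, so `fullmatch` is used rather than `search`. The function names are invented for illustration.

```python
import re

# Same expression as in the pattern facet of phone7digitsAndDash.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{4}")


def is_phone7digits_and_dash(text: str) -> bool:
    """The pattern must match the entire string, not just a part of it."""
    return PHONE_PATTERN.fullmatch(text) is not None


def satisfies_length(text, length=None, min_length=None, max_length=None):
    """Mimic the length, minLength, and maxLength facets on strings."""
    if length is not None and len(text) != length:
        return False
    if min_length is not None and len(text) < min_length:
        return False
    if max_length is not None and len(text) > max_length:
        return False
    return True


print(is_phone7digits_and_dash("543-1234"))
print(is_phone7digits_and_dash("5431234"))
print(satisfies_length("abcdefg", length=7))
print(satisfies_length("abc", min_length=7, max_length=14))
```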
Enumeration
• Restrict the domain to a finite set
• Can be applied to any base type
<simpleType name="emergencyNumbers">
  <restriction base="integer">
    <enumeration value="911"/>
    <enumeration value="333"/>
    <enumeration value="5431234"/>
  </restriction>
</simpleType>
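In programming terms, an enumeration is a membership test against a fixed set. A minimal Python sketch of the emergencyNumbers type, with an invented function name, might look like this:

```python
import re

# The three values listed in the enumeration facets.
EMERGENCY_NUMBERS = frozenset({911, 333, 5431234})


def is_emergency_number(text: str) -> bool:
    """Accept only integers that belong to the enumerated set."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False  # not an integer at all
    return int(text) in EMERGENCY_NUMBERS


print(is_emergency_number("911"))
print(is_emergency_number("912"))
```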
More Examples on Simple Types
Complex Types
Basics of Complex Types
• Tag complexType
• Tag sequence: a list of elements that must occur in the given order
• minOccurs and maxOccurs bound how many times an element may occur
• Attributes can be associated with a type
• A complex type can be associated with an element
<element name="Student" type="adm:studentType"/>
Element without Content
• Just associate attributes with the type
• Example
<complexType name="courseTakenType">
  <attribute name="CrsCode" type="adm:courseRef"/>
  <attribute name="Semester" type="string"/>
</complexType>
Compositors
• Tags describing how elements can be combined into groups, e.g., sequence
– Required when a tag has complex content
– Required even if the type has only one child element!
• Compositor all: allows elements to appear in any order
<complexType name="addressType">
  <all>
    <element name="StreetName" type="string"/>
    <element name="StreetNumber" type="string"/>
    <element name="city" type="string"/>
  </all>
</complexType>
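What the all compositor permits can be sketched as a hand-written check in Python: with the default occurrence bounds, each listed child must appear exactly once, in any order. The helper `matches_all` and the sample address elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Child element names declared inside <all> in addressType.
ADDRESS_CHILDREN = {"StreetName", "StreetNumber", "city"}


def matches_all(element: ET.Element, required: set) -> bool:
    """Each required child occurs exactly once; order is irrelevant."""
    tags = [child.tag for child in element]
    return len(tags) == len(set(tags)) and set(tags) == set(required)


in_order = ET.fromstring(
    "<Address><StreetName/><StreetNumber/><city/></Address>")
shuffled = ET.fromstring(
    "<Address><city/><StreetName/><StreetNumber/></Address>")
missing = ET.fromstring("<Address><StreetName/><city/></Address>")
repeated = ET.fromstring(
    "<Address><city/><city/><StreetName/><StreetNumber/></Address>")

print(matches_all(in_order, ADDRESS_CHILDREN))
print(matches_all(shuffled, ADDRESS_CHILDREN))
print(matches_all(missing, ADDRESS_CHILDREN))
print(matches_all(repeated, ADDRESS_CHILDREN))
```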
Restrictions on Compositor All
• all must appear directly below complexType, so the following type is not allowed
<complexType name="studentType2">
  <sequence>
    <all>
      <element name="First" type="string"/>
      <element name="Last" type="string"/>
    </all>
    <element name="Address" type="string"/>
  </sequence>
</complexType>
• No element within all can be repeated, so the following type is not allowed either
<complexType name="studentType3">
  <all>
    <element name="First" type="string"/>
    <element name="Last" type="string"/>
    <element name="Address" type="string" minOccurs="1" maxOccurs="unbounded"/>
  </all>
</complexType>
Compositor Choice
<complexType name="addressType">
  <sequence>
    <choice>
      <element name="POBox" type="string"/>
      <sequence>
        <element name="Name" type="string"/>
        <element name="Number" type="string"/>
      </sequence>
    </choice>
    <element name="City" type="string"/>
  </sequence>
</complexType>
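This addressType accepts either a POBox or a Name followed by a Number, and in both cases a City must come last. As a rough sketch, the same rule can be written by hand as a Python regular expression over the child tags. The helper name, the encoding of child tags as a comma-terminated string, and the sample elements are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# sequence(choice(POBox, sequence(Name, Number)), City), with each child
# tag encoded as "Tag,".
ADDRESS_MODEL = r"(POBox,|Name,Number,)City,"


def is_address(element: ET.Element) -> bool:
    """Check the order of child tags against the hand-translated model."""
    sequence = "".join(child.tag + "," for child in element)
    return re.fullmatch(ADDRESS_MODEL, sequence) is not None


po_box = ET.fromstring("<Address><POBox/><City/></Address>")
street = ET.fromstring("<Address><Name/><Number/><City/></Address>")
both = ET.fromstring("<Address><POBox/><Name/><Number/><City/></Address>")
city_only = ET.fromstring("<Address><City/></Address>")

print(is_address(po_box))
print(is_address(street))
print(is_address(both))
print(is_address(city_only))
```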
Local Element Names
• Two complex types can have elements that share the same name
– Names of students and names of courses
– Impossible in DTD, where all element declarations are global
Anonymous Types
• Useful for types that might not be reused
<element name="Report">
  <complexType>
    <sequence>
      <element name="Students" type=…/>
      <element name="Classes" type=…/>
      <element name="Course" type=…/>
    </sequence>
  </complexType>
</element>
Keys
Foreign Key Constraints
Summary
• DTD: a set of rules for structuring an XML document
• XML Schema: a more sophisticated tool for specifying the structure of XML documents
– XML Schema is written in XML
• Assignment 3