XML

Upload: kerry-walton

Post on 04-Jan-2016


XML: Some Links

• XML Tutorials – Some Links
  – http://www.zvon.org/index.php?nav_id=tutorials&mime=html
  – http://www.w3schools.com/
  – http://java.sun.com/xml/tutorial_intro.html

Introduction

• XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup language, not a database language
  – Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
  – Extensible, unlike HTML
    • Users can add new tags, and separately specify how the tag should be handled for display

XML: Application Areas

• XML can be effectively used in the following areas:
• Use XML to describe meta-content regarding documents or on-line resources.
  – Meta-content is information about a document’s content: title, author, file size, creation date, keywords and so on.
  – Meta-content can be used, for example, for searching, information filtering and document management.
  – A meta-content description that is external to a file (and not inside an HTML file) improves search performance.

XML: Application Areas

• Use XML to publish and exchange database contents
  – Many applications extract data from backend database systems, transform the results using the <table> tag of HTML and display it on the screen.
  – If data is delivered as an XML document that preserves the original information, such as column names and data types, the information can be used by the client for purposes other than displaying it on the screen.

XML: Application Areas

• Use XML as a messaging format for communication between application programs.
  – Messaging is an exchange of messages between organizations or between application systems within an organization.
  – Traditionally, messaging among companies has been done by Electronic Data Interchange (EDI). But many small companies cannot participate in EDI because of the cost of building and operating an EDI system. The other problem is agreeing on a standard message format to use. Among the skills needed to build an EDI system is a good understanding of the X12 or EDIFACT message format.
  – It is easier to learn XML syntax for messaging.

• Data interchange is critical in today’s networked world
  – Examples:
    • Banking: funds transfer
    • Order processing (especially inter-company orders)
    • Scientific data
      – Chemistry: ChemML, …
      – Genetics: BSML (Bio-Sequence Markup Language), …
  – Paper flow of information between organizations is being replaced by electronic flow of information
• Each application area has its own set of standards for representing information
• XML has become the basis for all new generation data interchange formats

Why Use XML?

• Simplicity
  – XML’s character-based format is simpler and more human-readable than binary formats.
  – XML messages can be easily created, read and modified using common text editors.
• Richness Of Data Structures
  – XML’s ability to represent tree-structured data
  – Can express complex data structures using a rooted tree structure. For many applications, a tree structure is general and powerful enough to express fairly complex data.
• International Character Handling
  – XML’s ability to handle international character sets is especially handy for Web applications and business transactions that may cross national boundaries.
  – The XML 1.0 Recommendation is defined based on the ISO-10646 (Unicode) character set.

XML

• The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents.

– Much of the use of XML has been in data exchange applications, not as a replacement for HTML


Example

Tags make data (relatively) self-documenting

Example:

<?xml version="1.0"?>
<!-- Books.xml -->
<books>
  <book>
    <title>Cosmos</title>
    <author>Carl Sagan </author>
    <isbn> 1234 </isbn>
    <price>70.00</price>
  </book>
  <book>
    <title>War And Peace</title>
    <author>Leo Tolstoy</author>
    <isbn> 4222</isbn>
    <price>50.00</price>
  </book>
  <book> </book>
</books>
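To see why the tags make the data self-documenting, here is a minimal sketch that reads the two filled-in book entries with Python's standard xml.etree.ElementTree module. The function name read_books is an assumption made for illustration; the tag names come from Books.xml above.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<?xml version="1.0"?>
<!-- Books.xml -->
<books>
  <book>
    <title>Cosmos</title>
    <author>Carl Sagan </author>
    <isbn> 1234 </isbn>
    <price>70.00</price>
  </book>
  <book>
    <title>War And Peace</title>
    <author>Leo Tolstoy</author>
    <isbn> 4222</isbn>
    <price>50.00</price>
  </book>
</books>"""


def read_books(xml_text):
    """Return one dict per <book>, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        # The tag names double as field names; strip padding whitespace.
        books.append({child.tag: (child.text or "").strip() for child in book})
    return books


books = read_books(BOOKS_XML)
for book in books:
    print(book["title"], "by", book["author"], "costs", book["price"])
```

No external description of the record layout is needed: the field names travel with the data.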

Structure Of an XML Document

• Tag: label for a section of data
• Element: section of data beginning with <tagname> and ending with matching </tagname>
• Elements must be properly nested
  – Proper nesting
    • <Order> … <LineItem> …. </LineItem> </Order>
  – Improper nesting
    • <Order> … <LineItem> …. </Order> </LineItem>
  – Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
• Every document must have a single top-level element
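The nesting and single-root rules can be tried out with any XML parser. The following sketch uses Python's standard xml.etree.ElementTree, which rejects text that is not well-formed; the helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


proper = "<Order><LineItem>pen</LineItem></Order>"
improper = "<Order><LineItem>pen</Order></LineItem>"  # end tags cross
two_roots = "<Order/><Order/>"                        # no single top-level element

for sample in (proper, improper, two_roots):
    print(sample, "->", is_well_formed(sample))
```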

Nesting Of Data

• Nesting of data is useful in data transfer
  – Example: elements representing customer-id, customer name, and address nested within an order element
• Nesting is not supported, or discouraged, in relational databases
  – With multiple orders, customer name and address are stored redundantly
  – Normalization replaces nested structures in each order by a foreign key into a table storing customer name and address information
  – Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
  – External application does not have direct access to data referenced by a foreign key

XML For Text and XML For Data

• XML For Text (document-centric): used for marking up documents. The order of the information in the text is important to its meaning, so the segments of text cannot be reordered.

• XML For Data (data-centric): data can be represented in a number of different ways and still have the same meaning.

Creating XML Documents

• Structure Of XML Documents
• Well-Formed XML Documents
  – Obey the rules defined in the XML standard.
  – XML elements must have a closing tag.
  – If there is no element content (usually the case when the element carries only attributes), an abbreviated empty-element tag can be used:
    <Item Type="Book"/>
  – Elements must nest completely (cannot be partially nested).
  – XML documents can have only one root element.

Creating XML Documents

• Structure Of XML Documents
• Valid XML Documents:
  – A valid XML document obeys the rules defined in the Document Type Definition (DTD).
  – The structure of an XML document can be pre-defined with a DTD.
  – An application can then compare the definition of a DTD with an actual XML document and check whether the document conforms to the rules defined in the DTD. If so, it is a valid XML document.

An XML-based System

[Figure: block diagram with the components XML Document, XML DTD, XML Processor and XML Application]

Some XML Processors

• msxml
• XP
• nsgmls
• saxon
• Well-formed XML Documents:
  – An XML processor can build a tree structure.
• Valid XML Documents
  – Obey the rules specified in the DTD
  – The msxml parser is a validating processor.

Some XML Parsers and Validators

• XML Validators:
• A Java-based DTD and Schema Validator:
  – http://wwws.sun.com/software/xml/developers/multischema/
  – To validate an XML file (example.xml) against its DTD (example.dtd):
    java -jar C:\pathWhereYouInstalledValidator\msv-20020414\msv.jar example.dtd example.xml
  – To validate an XML file (example.xml) against its schema (example.xsd):
    java -jar C:\pathWhereYouInstalledValidator\msv-20020414\msv.jar example.xsd example.xml
• XML Cooktop: http://www.xmlcooktop.com/
• XML Spy: http://www.xmlspy.com/

XML Document Type Definitions (DTD)

• A DTD defines the structure of an XML document.
• The type of an XML document can be specified using a DTD
• DTD constrains structure of XML data
  – What elements can occur
  – What attributes can/must an element have
  – What sub-elements can/must occur inside each element, and how many times.
• DTD does not constrain data types
  – All values represented as strings in XML

XML Document Type Definitions (DTD)

• A DTD defines the structure of an XML document.
• Internal DTD: A DTD can be placed inside an XML document, with a declaration as shown below:
  <!DOCTYPE document_name [ document_type_definition ]>
• External DTD: A DTD can be stored in an external file, and the location of the file is given in the declaration:
  <!DOCTYPE document_name SYSTEM "url_of_dtd">

Anatomy Of an XML Document And Its Definition in DTD

• Structuring Elements: Elements are the items that make up the structure of the document.

• Elements can contain either data content or a mixture of data content and other elements.


• Element-only Content: Elements may only contain other elements. This is the way elements are nested within one another.
• Example:
  A DTD entry:
    <!ELEMENT BookOrder (Customer, Order_LineItem+)>
  An XML entry:
    <BookOrder>
      <Customer />
      <Order_LineItem />
      <Order_LineItem />
    </BookOrder>
  A plus (+) means an element can occur one or more times
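A content model such as (Customer, Order_LineItem+) behaves like a regular expression over the sequence of child element names. The sketch below illustrates that idea with Python's standard library only. It is not a DTD validator: the regular expression is translated by hand, and the names BOOK_ORDER_MODEL and children_match are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the content model (Customer, Order_LineItem+).
# Each child tag name is followed by a space so names cannot run together.
BOOK_ORDER_MODEL = re.compile(r"Customer (Order_LineItem )+")


def children_match(xml_text, model):
    """Check the root's sequence of child tags against a content-model regex."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model.fullmatch(sequence) is not None


valid = "<BookOrder><Customer/><Order_LineItem/><Order_LineItem/></BookOrder>"
no_items = "<BookOrder><Customer/></BookOrder>"  # '+' needs at least one item
wrong_order = "<BookOrder><Order_LineItem/><Customer/></BookOrder>"

for sample in (valid, no_items, wrong_order):
    print(children_match(sample, BOOK_ORDER_MODEL))
```

A validating processor performs the equivalent check for every element declared in the DTD.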

• Mixed Content: Elements may contain zero or more instances of a list of elements, along with any amount of text in any position.
• Example:
  A DTD entry:
    <!ELEMENT BookOrder (#PCDATA | Customer | Order_LineItem)*>
  An XML entry:
    <BookOrder>
      This is a Book Order For <Customer> Mary Jones </Customer>
    </BookOrder>
  An asterisk (*) means an element can occur zero or more times
  A vertical bar (|) is used to separate elements in a list where any one of the listed elements can appear.

• Text-Only Content: Elements may contain text strings.
• Example:
  A DTD entry:
    <!ELEMENT Customer (#PCDATA)>
  An XML entry:
    <Customer> Mary Jones </Customer>
  #PCDATA (parsed character data) indicates that the element’s content is text and cannot contain literal markup delimiters such as < and &; these have to be written as entity references (&lt; and &amp;).

• Empty Content: Elements cannot contain anything at all. Any data can be expressed as attributes.
• Example:
  A DTD entry:
    <!ELEMENT Customer EMPTY>
  An XML entry:
    <Customer />

• ANY Content: Elements may contain any type of content - element or text.
• Example:
  A DTD entry:
    <!ELEMENT Customer ANY>
  Two XML entries:
    <Customer> Mary Jones </Customer>
  Or
    <Customer>
      <Customer> Mary Jones </Customer>
    </Customer>

Attributes

• Elements can have attributes. Attributes attached to elements are another way of representing data points.
• An element can be broken down into smaller chunks of data (example: name into firstName and lastName), and the data is attached to the element itself.
• Attributes are specified by name=value pairs inside the starting tag of an element
• An element may have several attributes, but each attribute name can only occur once.
• Example:
  <Order order_type="purchase" priority="1">

Using Attributes

• Attributes represent information about an element (meta information) and are different from the content.
• Attributes are often used with empty elements

Suggestion: use attributes for identifiers of elements, and use elements for contents

Note: More discussion on elements and attributes later.

Attributes

• Attribute specification: for each attribute
  – Name
  – Type of attribute
    • CDATA
    • ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)

Attributes

• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML document must be distinct
  – Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of (0 or more) ID values. Each value must be the ID value of an element in the same document

Using Attributes

• Example:
  A DTD entry:
    <!ELEMENT Customer EMPTY>
    <!ATTLIST Customer
      FirstName CDATA #REQUIRED
      LastName CDATA #REQUIRED>
  Here Customer is the name of the element in which the attributes appear, and #REQUIRED means the attribute must contain some value.
  An XML entry:
    <Customer FirstName="Mary" LastName="Jones" />

Using Attributes

• CDATA: as an attribute type, the value is a plain character string. A CDATA section contains content that is treated as plain text: no parsing is performed on the content, so it can contain the special characters that are not permitted in #PCDATA.
• #REQUIRED: Attribute must appear in the element.
• #FIXED "default": Attribute value is a constant and must appear as such in an XML document.
• #IMPLIED: If the attribute does not appear in the element, then an application using the XML document may supply any value.

Elements vs Attributes

• Representing BookOrder using Elements
• BookOrder1.dtd

<!ELEMENT BookOrder (CustomerFirstName, CustomerLastName, CustomerCity, CustomerZip, OrderLineItem+)>
<!ELEMENT CustomerFirstName (#PCDATA)>
<!ELEMENT CustomerLastName (#PCDATA)>
<!ELEMENT CustomerCity (#PCDATA)>
<!ELEMENT CustomerZip (#PCDATA)>
<!ELEMENT OrderLineItem (itemNo, quantity, price)>
<!ELEMENT itemNo (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>

BookOrders.xml based on BookOrder1.dtd

<?xml version="1.0" ?>
<!DOCTYPE BookOrder SYSTEM "url/BookOrder1.dtd">
<BookOrder>
  <CustomerFirstName> Mary </CustomerFirstName>
  <CustomerLastName> Jones </CustomerLastName>
  <CustomerCity> San Jose </CustomerCity>
  <CustomerZip> 95053 </CustomerZip>
  <OrderLineItem>
    <itemNo> 1234 </itemNo>
    <quantity> 10</quantity>
    <price>100.00</price>
  </OrderLineItem>
  <OrderLineItem>
    <itemNo> 1444 </itemNo>
    <quantity> 5</quantity>
    <price>50.00</price>
  </OrderLineItem>
</BookOrder>

Elements Vs Attributes

• BookOrder2.dtd using attributes

<!ELEMENT BookOrder (OrderLineItem+)>
<!ATTLIST BookOrder
  CustomerFirstName CDATA #REQUIRED
  CustomerLastName CDATA #REQUIRED
  CustomerCity CDATA #REQUIRED
  CustomerZip CDATA #REQUIRED>
<!ELEMENT OrderLineItem EMPTY>
<!ATTLIST OrderLineItem
  itemNo CDATA #REQUIRED
  quantity CDATA #REQUIRED
  price CDATA #REQUIRED>

BookOrders.xml based on BookOrder2.dtd

<?xml version="1.0" ?>
<!DOCTYPE BookOrder SYSTEM "url/BookOrder2.dtd">
<BookOrder CustomerFirstName="Mary" CustomerLastName="Jones" CustomerCity="San Jose" CustomerZip="95053">
  <OrderLineItem itemNo="1234" quantity="10" price="100.00"/>
  <OrderLineItem itemNo="1444" quantity="5" price="50.00"/>
</BookOrder>
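The two BookOrder designs carry the same information; only the access path differs for the reading application. The sketch below, using Python's standard xml.etree.ElementTree, extracts the line items from a shortened version of each document. The function names and the trimmed sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ELEMENT_STYLE = """<BookOrder>
  <CustomerFirstName> Mary </CustomerFirstName>
  <OrderLineItem>
    <itemNo> 1234 </itemNo> <quantity> 10</quantity> <price>100.00</price>
  </OrderLineItem>
  <OrderLineItem>
    <itemNo> 1444 </itemNo> <quantity> 5</quantity> <price>50.00</price>
  </OrderLineItem>
</BookOrder>"""

ATTRIBUTE_STYLE = """<BookOrder CustomerFirstName="Mary">
  <OrderLineItem itemNo="1234" quantity="10" price="100.00"/>
  <OrderLineItem itemNo="1444" quantity="5" price="50.00"/>
</BookOrder>"""


def items_from_elements(xml_text):
    """Line items stored as child elements: read the text of each child."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("itemNo").strip(),
         int(item.findtext("quantity").strip()),
         float(item.findtext("price").strip()))
        for item in root.findall("OrderLineItem")
    ]


def items_from_attributes(xml_text):
    """Line items stored as attributes: read them from the start tag."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("itemNo"), int(item.get("quantity")), float(item.get("price")))
        for item in root.findall("OrderLineItem")
    ]


print(items_from_elements(ELEMENT_STYLE) == items_from_attributes(ATTRIBUTE_STYLE))
```

Note that both readers have to convert quantity and price from strings themselves, since a DTD does not assign data types.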

Limitations of DTDs

• No typing of text elements and attributes
  – All values are strings, no integers, reals, etc.
• Difficult to specify unordered sets of subelements
  – Order is usually irrelevant in databases
  – (A | B)* allows specification of an unordered set, but
    • Cannot ensure that each of A and B occurs only once
• IDs and IDREFs are untyped
  – The owners attribute of an account may contain a reference to another account, which is meaningless
    • owners attribute should ideally be constrained to refer to customer elements

Writing DTDs

Example1.dtd

<!ELEMENT skills (skill)*>

<!ELEMENT skill (name)>

<!ATTLIST skill level CDATA #REQUIRED>

<!ELEMENT name ( #PCDATA )>

Example1.xml

<?xml version = "1.0"?>

<!-- Example1.xml -->

<!DOCTYPE skills SYSTEM "example1.dtd">

<skills>

<skill level="1">

<name > XML How to write DTD </name>

</skill>

</skills>


Validating with the Sun Validator

• Invoking the Validator:
  java -jar C:\pathName\msv.jar Example1.dtd Example1.xml

Output:
start parsing a grammar.
validating Example1.xml
the document is valid.

ID Attribute Type

• Attributes using the ID type serve as unique identifiers for a given instance of an element.

• The value of an ID attribute must be a valid XML name, be unique within a document, and use the #IMPLIED or #REQUIRED default declaration.

• #IMPLIED: No default value. The attribute is optional.

• #REQUIRED: The attribute must appear in every instance of the element type.

• #FIXED: Attributes may be optional, but when present are constrained to the given value.

• There may only be one ID attribute for one element type.

Example

<!ELEMENT PurchaseOrders (Order)*>
<!ELEMENT Order (Item+)>
<!ATTLIST Order
  OrderId ID #REQUIRED>
<!ELEMENT Item (ItemName, price)>
<!ATTLIST Item
  ItemId ID #REQUIRED>
<!ELEMENT ItemName (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<?xml version = "1.0"?>
<!-- Example3.xml -->
<!DOCTYPE PurchaseOrders SYSTEM "example3.dtd">
<PurchaseOrders>
  <Order OrderId="od123">
    <Item ItemId="I1">
      <ItemName> Item1 </ItemName>
      <price> 20.00 </price>
    </Item>
    <Item ItemId="I2">
      <ItemName> Item2 </ItemName>
      <price> 20.00 </price>
    </Item>
  </Order>
  <Order OrderId="od456">
    <Item ItemId="I3">
      <ItemName> Item3 </ItemName>
      <price> 20.00 </price>
    </Item>
    <Item ItemId="I4">
      <ItemName> Item4 </ItemName>
      <price> 20.00 </price>
    </Item>
  </Order>
</PurchaseOrders>

Example

<!ELEMENT PurchaseOrders (Order, Manufacturer)*>
<!ELEMENT Order (Item+)>
<!ATTLIST Order
  OrderId ID #REQUIRED>
<!ELEMENT Item (ItemName)>
<!ATTLIST Item
  ItemId ID #REQUIRED
  Manf IDREF #REQUIRED>
<!ELEMENT ItemName (#PCDATA)>
<!ELEMENT Manufacturer EMPTY>
<!ATTLIST Manufacturer
  ManfId ID #REQUIRED>

<?xml version = "1.0"?>
<!-- Example4.xml -->
<!DOCTYPE PurchaseOrders SYSTEM "example4.dtd">
<PurchaseOrders>
  <Order OrderId="od123">
    <Item ItemId="I1" Manf="m444">
      <ItemName> Item1 </ItemName>
    </Item>
  </Order>
  <Manufacturer ManfId="m444"/>
</PurchaseOrders>
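The ID and IDREF rules (all ID values distinct, every IDREF pointing at an existing ID) can be imitated in a few lines. This is only a sketch with Python's standard library: xml.etree.ElementTree does not read DTD declarations, so the sets ID_ATTRS and IDREF_ATTRS are copied by hand from the DTD above, and check_ids is a name invented for this example.

```python
import xml.etree.ElementTree as ET

EXAMPLE4 = """<PurchaseOrders>
  <Order OrderId="od123">
    <Item ItemId="I1" Manf="m444">
      <ItemName> Item1 </ItemName>
    </Item>
  </Order>
  <Manufacturer ManfId="m444"/>
</PurchaseOrders>"""

# Attribute names declared as ID / IDREF in the DTD, copied by hand.
ID_ATTRS = {"OrderId", "ItemId", "ManfId"}
IDREF_ATTRS = {"Manf"}


def check_ids(xml_text):
    """Return (duplicate_ids, dangling_refs) for the whole document."""
    root = ET.fromstring(xml_text)
    seen, duplicates, refs = set(), set(), []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name in ID_ATTRS:
                if value in seen:
                    duplicates.add(value)
                seen.add(value)
            elif name in IDREF_ATTRS:
                refs.append(value)
    # References are resolved after the full pass, so forward references work.
    dangling = {ref for ref in refs if ref not in seen}
    return duplicates, dangling


print(check_ids(EXAMPLE4))
```

The check also shows the limitation noted earlier: IDs share one document-wide pool, so nothing stops Manf from pointing at an Order.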

XML Schemas

Useful Links

• Schema tutorial links:
  – http://www.w3schools.com/schema/default.asp
  – http://www.xfront.com/
  – http://www.w3.org/TR/xmlschema-0/

Describing Metadata using Schemas

• XML Schema Strategy: two documents are used:
  – a schema document specifies the properties (metadata) for a class of resources (objects);
  – each instance document provides specific values for the properties.

XML Schema: Specifies the Properties for a Class of Resources

<xsd:complexType name="Book">
  <xsd:sequence>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
    <xsd:element name="Date" type="xsd:gYear"/>
    <xsd:element name="ISBN" type="xsd:string"/>
    <xsd:element name="Publisher" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

"For the class of Book resources, we identify five properties - Title, Author, Date, ISBN, and Publisher"

XML Instance Document: Specifies Values for the Properties

<Book>
  <Title>Cosmos</Title>
  <Author>Sagan</Author>
  <Date>1977</Date>
  <ISBN>0-111-34319-4</ISBN>
  <Publisher>Star Publishers Co.</Publisher>
</Book>

An XML document with a specific instance of a Book

Why Schemas

– Enhanced datatypes
  • 44+ built-in datatypes, versus 10 in DTDs
  • Can create your own datatypes
    – Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".
– Written in the same syntax as instance documents: less syntax to remember
– Can extend or restrict a type (derive new type definitions on the basis of old ones)
– Can express sets, i.e., can define the child elements to occur in any order
– Can specify element content as being unique (keys on content) and uniqueness within a region
– Can define multiple elements with the same name but different content
– Can define substitutable elements - e.g., the "Book" element is substitutable for the "Publication" element.

Schemas and DTDs

• One difference between XML Schemas and DTDs is that the XML Schema vocabulary is associated with a name (namespace). Likewise, the new vocabulary that you define must be associated with a name (namespace).

• With DTDs neither set of vocabulary is associated with a name (namespace).


XML Schema Vocabulary

• Schema element: All XML schemas have schema as the root element.

• XML Schema namespace: The elements and datatypes that are used to construct schemas, like, schema, element, complexType, sequence and string come from the http://…/XMLSchema namespace


Plants.xsd

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.MyDefs.com"
            xmlns="http://www.MyDefs.com"
            elementFormDefault="qualified">
  <xsd:element name="plants">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="plant" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="plant">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="plantName" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        <xsd:element name="category" type="xsd:string" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Plants.xml

<?xml version="1.0"?>
<plants xmlns="http://www.MyDefs.com"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.MyDefs.com Plants.xsd">
  <plant>
    <plantName>Daisy</plantName>
    <category>perennial</category>
  </plant>
</plants>
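Because the instance document puts its elements in the http://www.MyDefs.com namespace, an application has to use the namespace when looking elements up. A minimal sketch with Python's standard xml.etree.ElementTree follows; the prefix "p" is an arbitrary choice local to the script, and the sample omits the xsi attributes for brevity.

```python
import xml.etree.ElementTree as ET

PLANTS_XML = """<?xml version="1.0"?>
<plants xmlns="http://www.MyDefs.com">
  <plant>
    <plantName>Daisy</plantName>
    <category>perennial</category>
  </plant>
</plants>"""

# The prefix "p" is local to this script; only the URI has to match.
NS = {"p": "http://www.MyDefs.com"}

root = ET.fromstring(PLANTS_XML)
plants = [
    (plant.findtext("p:plantName", namespaces=NS),
     plant.findtext("p:category", namespaces=NS))
    for plant in root.findall("p:plant", NS)
]
unqualified = root.findall("plant")  # no namespace given, so nothing matches

print(root.tag)
print(plants)
print(unqualified)
```

The parser stores each name together with its namespace URI, which is why the unqualified lookup comes back empty.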

Element Definitions

• Elements are defined using the element construct.
• A complex type can be defined for an element, where a complex type usually contains a set of element and/or attribute declarations and element references.
• Example:

<xsd:element name="BookCatalog">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Book" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="BookCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        …
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  ….

• An element can be declared and we can refer to that element declaration.
• The default value for minOccurs is "1"
• The default value for maxOccurs is "1"

Element Declarations Using Anonymous Types

• An element can be declared and we can refer to that element declaration. Alternatively, an element declaration can be declared in line.

Method 1:

<xsd:element name="BookCatalog">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Method 2 - Inlining

<xsd:element name="BookCatalog">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Book" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            ….
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Element Declarations Using Named Types

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" type="BookPublication" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="BookPublication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <xsd:element name="Date" type="xsd:string"/>
      <xsd:element name="ISBN" type="xsd:string"/>
      <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Simple vs Complex Types

• When do you use the complexType element and when do you use the simpleType element?
  – Use the complexType element when you want to define child elements and/or attributes of an element
  – Use the simpleType element when you want to create a new type that is a refinement of a built-in type (string, date, gYear, etc)

Expressing Alternates

• DTD: <!ELEMENT category (annual | perennial)>

• Schema: Use choice (an exclusive-or) to contain only one element – either annual or perennial.


Plants.xsd Using sequence and choice

<xsd:element name="plant">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="plantName" type="xsd:string"/>
      <xsd:element name="category">
        <xsd:complexType>
          <xsd:choice>
            <xsd:element name="annual" type="xsd:string"/>
            <xsd:element name="perennial" type="xsd:string"/>
          </xsd:choice>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

----------------------------------------

<plant>
  <plantName> Daisy </plantName>
  <category>
    <annual> Start early </annual>
  </category>
</plant>

Terminology: Declaration vs Definition

• In a schema:
  – You declare elements and attributes. Schema components that are declared are those that have a representation in an XML instance document.
  – You define components that are used just within the schema document(s). Schema components that are defined are those that have no representation in an XML instance document.

Declarations:
- element declarations
- attribute declarations

Definitions:
- type (simple, complex) definitions
- attribute group definitions
- model group definitions

Terminology: Global versus Local

• Global element declarations, global type definitions:
  – These are element declarations/type definitions that are immediate children of <schema>
• Local element declarations, local type definitions:
  – These are element declarations/type definitions that are nested within other elements/types.

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <!-- Global type definition -->
  <xsd:complexType name="Publication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Global type definition -->
  <xsd:complexType name="BookPublication">
    <xsd:complexContent>
      <xsd:extension base="Publication">
        <xsd:sequence>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- Global element declaration -->
  <xsd:element name="BookStore">
    <!-- Local type definition -->
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element declaration (as are Title, Author, Date, ISBN and Publisher above) -->
        <xsd:element name="Book" type="BookPublication" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Data Types

• Data types are the basic building blocks.
  – Primitive Data types: Are built-in types.
    • String, Boolean, Float, Double, ID, IDREF, Entity etc.
    • Cannot contain child elements or attributes.
  – Derived Data types: Defined in terms of existing types. For example, CDATA type is derived from the String type.
    • May have attributes, elements or mixed content.
  – Built-in Derived types for XML Schema: Some of the commonly used derived data types are built into XML Schema
    • Examples: CDATA, token, name, integer, long, short, byte, date etc.

Aspects Of Data types

• All XML datatypes are comprised of three parts:
  – A value space
  – A lexical space
  – A set of facets

Problem

• Defining the Date element to be of type string is unsatisfactory (it allows any string value to be input as the content of the Date element, including non-date strings).

– We would like to constrain the allowable content that Date can have. Modify the BookCatalog schema to restrict the content of the Date element to just date values (actually, year values. See next two slides).

• Similarly, constrain the content of the ISBN element to content of this form: d-ddddd-ddd-d or d-ddd-ddddd-d or d-dd-dddddd-d, where 'd' stands for 'digit'


The date Datatype

• A built-in datatype (i.e., schema validators know about this datatype)

• This datatype is used to represent a specific day (year-month-day)

• Elements declared to be of type date must follow this form: CCYY-MM-DD

Example: 2003-05-31 represents May 31, 2003

• Elements declared to be of type gYear must follow this form: CCYY


Regular Expressions

• Using Regular Expressions and patterns, the string type can be restricted.

Example:

<xsd:simpleType name="ISBNType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{1}-\d{5}-\d{3}-\d{1}"/>
    <xsd:pattern value="\d{1}-\d{3}-\d{5}-\d{1}"/>
    <xsd:pattern value="\d{1}-\d{2}-\d{6}-\d{1}"/>
  </xsd:restriction>
</xsd:simpleType>

OR

<xsd:simpleType name="ISBNType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{1}-\d{5}-\d{3}-\d{1}|\d{1}-\d{3}-\d{5}-\d{1}|\d{1}-\d{2}-\d{6}-\d{1}"/>
  </xsd:restriction>
</xsd:simpleType>
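The effect of the ISBNType patterns can be tried out with an ordinary regular expression engine. The sketch below uses Python's re module as a stand-in for a schema validator; the names ISBN_PATTERNS, is_isbn_type and COMBINED are assumptions made for this example. Schema patterns always have to match the entire value, which corresponds to re.fullmatch.

```python
import re

# The three pattern facets of ISBNType, copied from the schema fragment.
ISBN_PATTERNS = [
    r"\d{1}-\d{5}-\d{3}-\d{1}",
    r"\d{1}-\d{3}-\d{5}-\d{1}",
    r"\d{1}-\d{2}-\d{6}-\d{1}",
]


def is_isbn_type(value):
    """Several pattern facets on one type: the value must match any one."""
    return any(re.fullmatch(pattern, value) for pattern in ISBN_PATTERNS)


# The single-pattern variant joins the alternatives with "|".
COMBINED = re.compile("|".join(ISBN_PATTERNS))

for candidate in ("0-111-34319-4", "0-12-345678-9", "1977", "0-111-34319-44"):
    print(candidate, is_isbn_type(candidate))
```

Both forms of the type accept exactly the same strings.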

Built-in Datatypes

• Primitive datatypes (atomic, built-in), each with an example value or its lexical format:
  – string: "Hello World"
  – boolean: {true, false}
  – decimal: 7.08
  – float: 12.56E3, 12, 12560, 0, -0, INF, -INF, NaN
  – double: 12.56E3, 12, 12560, 0, -0, INF, -INF, NaN
  – duration: P1Y2M3DT10H30M12.3S
  – dateTime: format: CCYY-MM-DDThh:mm:ss
  – time: format: hh:mm:ss.sss
  – date: format: CCYY-MM-DD
  – gYearMonth: format: CCYY-MM
  – gYear: format: CCYY
  – gMonthDay: format: --MM-DD

Note: 'T' is the date/time separator; INF = infinity; NaN = not-a-number

Built-in Datatypes (cont.)

• Primitive datatypes (atomic, built-in), each with an example value or its lexical format:
  – gDay: format: ---DD (note the 3 dashes)
  – gMonth: format: --MM (printed as --MM-- in the original 2001 Recommendation)
  – hexBinary: a hex string
  – base64Binary: a base64 string
  – anyURI: http://www.xfront.com
  – QName: a namespace qualified name
  – NOTATION: a NOTATION from the XML spec

Built-in Datatypes (cont.)

• Derived types (subtypes of a primitive datatype):
  – normalizedString: a string without tabs, line feeds, or carriage returns
  – token: string w/o tabs, l/f, leading/trailing spaces, consecutive spaces
  – language: any valid xml:lang value, e.g., EN, FR, ...
  – IDREFS: must be used only with attributes
  – ENTITIES: must be used only with attributes
  – NMTOKEN: must be used only with attributes
  – NMTOKENS: must be used only with attributes
  – Name
  – NCName: part (no namespace qualifier)
  – ID: must be used only with attributes
  – IDREF: must be used only with attributes
  – ENTITY: must be used only with attributes
  – integer: 456
  – nonPositiveInteger: negative infinity to 0

Built-in Datatypes (cont.)

• Derived types (subtypes of a primitive datatype), with their value ranges:
  – negativeInteger: negative infinity to -1
  – long: -9223372036854775808 to 9223372036854775807
  – int: -2147483648 to 2147483647
  – short: -32768 to 32767
  – byte: -128 to 127
  – nonNegativeInteger: 0 to infinity
  – unsignedLong: 0 to 18446744073709551615
  – unsignedInt: 0 to 4294967295
  – unsignedShort: 0 to 65535
  – unsignedByte: 0 to 255
  – positiveInteger: 1 to infinity

Note: the following types can only be used with attributes (which we will discuss later): ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, and ENTITIES.
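The bounds of the fixed-width integer types follow from their bit widths, which makes them easy to double-check. A small sketch in Python (the helper names are assumptions made for this example):

```python
def signed_range(bits):
    """Two's-complement range of a signed integer with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Range of an unsigned integer with the given bit width."""
    return 0, 2 ** bits - 1


WIDTHS = {"byte": 8, "short": 16, "int": 32, "long": 64}
SIGNED = {name: signed_range(bits) for name, bits in WIDTHS.items()}
UNSIGNED = {
    "unsigned" + name.capitalize(): unsigned_range(bits)
    for name, bits in WIDTHS.items()
}


def in_value_space(type_name, value):
    """True if the integer lies inside the named type's range."""
    low, high = {**SIGNED, **UNSIGNED}[type_name]
    return low <= value <= high


for name, (low, high) in {**SIGNED, **UNSIGNED}.items():
    print(name, low, "to", high)
```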

Creating your own Datatypes

• A new datatype can be defined from an existing datatype (called the "base" type) by specifying values for one or more of the optional facets for the base type.

• Example. The string primitive datatype has six optional facets:

– length

– minLength

– maxLength

– pattern

– enumeration

– whiteSpace (legal values: preserve, replace, collapse)

Example of Creating a New Datatype by Specifying Facet Values

<xsd:simpleType name="TelephoneNumber">
  <xsd:restriction base="xsd:string">
    <xsd:length value="8"/>
    <xsd:pattern value="\d{3}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>

1. This creates a new datatype called 'TelephoneNumber'.
2. Elements of this type can hold string values,
3. But the string length must be exactly 8 characters long and
4. The string must follow the pattern: ddd-dddd, where 'd' represents a 'digit'.
(Obviously, in this example the regular expression makes the length facet redundant.)

Another Example

<xsd:simpleType name="US-Flag-Colors">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="red"/>
    <xsd:enumeration value="white"/>
    <xsd:enumeration value="blue"/>
  </xsd:restriction>
</xsd:simpleType>

This creates a new type called US-Flag-Colors. An element declared to be of this type must have either the value red, or white, or blue.

Example

<xsd:simpleType name= "EarthSurfaceElevation"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="-1290"/> <xsd:maxInclusive value="29035"/> </xsd:restriction></xsd:simpleType>

This creates a new datatype called 'EarthSurfaceElevation'. Elements declared to be of this type can hold an integer. However, the integer is restricted to a value between -1290 and 29035, inclusive.

76

General Form of Creating a New Datatype by Specifying Facet Values

<xsd:simpleType name="name">
    <xsd:restriction base="xsd:source">
        <xsd:facet value="value"/>
        <xsd:facet value="value"/>
        …
    </xsd:restriction>
</xsd:simpleType>

Facets: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, ...

Sources: string, boolean, decimal, float, double, duration, dateTime, time, ...

See DatatypeFacets.html for a mapping of datatypes to their facets.

77

Multiple Facets - "and" them together, or "or" them together?

<xsd:simpleType name="TelephoneNumber">
    <xsd:restriction base="xsd:string">
        <xsd:length value="8"/>
        <xsd:pattern value="\d{3}-\d{4}"/>
    </xsd:restriction>
</xsd:simpleType>

An element declared to be of type TelephoneNumber must be a string of length 8 and the string must follow the pattern: 3 digits, dash, 4 digits.

<xsd:simpleType name="US-Flag-Color">
    <xsd:restriction base="xsd:string">
        <xsd:enumeration value="red"/>
        <xsd:enumeration value="white"/>
        <xsd:enumeration value="blue"/>
    </xsd:restriction>
</xsd:simpleType>

An element declared to be of type US-Flag-Color must be a string with a value of either red, or white, or blue.

Patterns, enumerations => "or" them together
All other facets => "and" them together
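
To make the "or" rule for patterns concrete, here is a minimal sketch with two pattern facets in the same restriction. The type name FlexibleTelephoneNumber, the element name phone, and the second pattern are hypothetical additions for illustration only.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: accepts ddd-dddd OR ddd-ddd-dddd -->
  <xsd:simpleType name="FlexibleTelephoneNumber">
    <xsd:restriction base="xsd:string">
      <!-- The two patterns are "or"ed: matching either one is enough -->
      <xsd:pattern value="\d{3}-\d{4}"/>
      <xsd:pattern value="\d{3}-\d{3}-\d{4}"/>
      <!-- This facet is "and"ed with the pattern result -->
      <xsd:maxLength value="12"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="phone" type="FlexibleTelephoneNumber"/>
</xsd:schema>
```

Under this sketch, both 555-1234 and 617-555-1234 would be valid phone values, while 5551234 would match neither pattern and be rejected.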

78

Creating a simpleType from another simpleType

• Thus far we have created a simpleType using one of the built-in datatypes as our base type.

• However, we can create a simpleType that uses another simpleType as the base. See next slide.

79

<xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="-1290"/>
        <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="BostonAreaSurfaceElevation">
    <xsd:restriction base="EarthSurfaceElevation">
        <xsd:minInclusive value="0"/>
        <xsd:maxInclusive value="120"/>
    </xsd:restriction>
</xsd:simpleType>

This simpleType uses EarthSurfaceElevation as its base type.
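
The effect of the two-step derivation can be sketched as a complete schema. The element name bostonElevation is a hypothetical addition, and the schema is assumed to have no target namespace so that the unprefixed type names resolve.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="-1290"/>
      <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The derived range must lie within the base type's range -->
  <xsd:simpleType name="BostonAreaSurfaceElevation">
    <xsd:restriction base="EarthSurfaceElevation">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="120"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical element using the derived type -->
  <xsd:element name="bostonElevation" type="BostonAreaSurfaceElevation"/>

  <!-- <bostonElevation>43</bostonElevation>   is valid (0 to 120) -->
  <!-- <bostonElevation>5240</bostonElevation> is invalid here, although
       5240 would be a legal EarthSurfaceElevation -->
</xsd:schema>
```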

80

Fixing a Facet Value

• Sometimes when we define a simpleType we want to require that one or more facets have an unchanging value. That is, we want to make the facet a constant.

<xsd:simpleType name="ClassSize">
    <xsd:restriction base="xsd:nonNegativeInteger">
        <xsd:minInclusive value="10" fixed="true"/>
        <xsd:maxInclusive value="60"/>
    </xsd:restriction>
</xsd:simpleType>

simpleTypes that derive from this simpleType may not change this facet.

81

<xsd:simpleType name="ClassSize">
    <xsd:restriction base="xsd:nonNegativeInteger">
        <xsd:minInclusive value="10" fixed="true"/>
        <xsd:maxInclusive value="60"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="BostonIEEEClassSize">
    <xsd:restriction base="ClassSize">
        <xsd:minInclusive value="15"/>
        <xsd:maxInclusive value="60"/>
    </xsd:restriction>
</xsd:simpleType>

Error! Cannot change the value of a fixed facet!
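
For contrast, a derivation that should be accepted leaves the fixed facet alone and only tightens a facet that is not fixed. This is a sketch; the type name SmallClassSize and the upper bound 30 are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="ClassSize">
    <xsd:restriction base="xsd:nonNegativeInteger">
      <xsd:minInclusive value="10" fixed="true"/>
      <xsd:maxInclusive value="60"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical derived type: minInclusive is inherited unchanged
       (still 10); only the non-fixed maxInclusive is narrowed -->
  <xsd:simpleType name="SmallClassSize">
    <xsd:restriction base="ClassSize">
      <xsd:maxInclusive value="30"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```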

82

Element Containing a User-Defined Simple Type

Example. Create a schema element declaration for an elevation element. Declare the elevation element to be an integer in the range -1290 to 29035.

<elevation>5240</elevation>

Here's one way of declaring the elevation element:

<xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="-1290"/>
        <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:element name="elevation" type="EarthSurfaceElevation"/>

83

Element Containing a User-Defined Simple Type (cont.)

Here's an alternative method for declaring elevation:

<xsd:element name="elevation">
    <xsd:simpleType>
        <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="-1290"/>
            <xsd:maxInclusive value="29035"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:element>

The simpleType definition is defined inline; it is an anonymous simpleType definition.

The disadvantage of this approach is that this simpleType cannot be reused by other elements.
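
By comparison, a named simpleType can be shared. The sketch below declares two elements against the same EarthSurfaceElevation type; the second element name, summitElevation, is a hypothetical addition, and the schema is assumed to have no target namespace.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named type: defined once at the top level -->
  <xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="-1290"/>
      <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Both elements reuse the same named type -->
  <xsd:element name="elevation" type="EarthSurfaceElevation"/>
  <xsd:element name="summitElevation" type="EarthSurfaceElevation"/>
</xsd:schema>
```

With the anonymous form, the restriction would have to be repeated inside each element declaration.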