Upload: martin-maurice-reed

Post on 27-Dec-2015

www.monash.edu.au

CSE4500 Information Retrieval Systems

XML Schema – Part 1

Why Schema?

• Expressed in XML
• Ability to derive new data types
• Extensible
• Self-documenting

Example – XML Doc

<?xml version="1.0" encoding="UTF-8"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="flatBook.xsd">
  <author>John Howard</author>
  <editor>George W Bush</editor>
  <title>Memoir of Saddam</title>
</book>

Example – Schema File

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="editor" type="xs:string"/>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Attaching a document to a schema

XML document entry:

<?xml version="1.0" encoding="UTF-8"?>
<bookshop xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="D:\subject\2003\IR\Examples\bookshopLocal.xsd">

XML Schema entry:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

Element Content Models - revisited

• Content models:
  – Any
  – Empty
    > neither child elements nor text nodes are expected
  – Simple (text only)
    > only a text node is expected
  – Complex (element only)
    > only child elements are expected
  – Mixed
    > both child elements and text nodes are expected
• Attributes, comments and processing instructions are ignored when determining the content model.
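
As a rough illustration of the content models listed above, the fragment below shows one instance element for each of the empty, simple, complex and mixed cases. The element names examples, para and author, and the id attribute, are made up for this sketch and do not belong to any schema in these slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element, present only to keep the fragment well-formed -->
<examples>
  <!-- Empty: no child elements and no text -->
  <br/>
  <!-- Simple (text only): a text node and nothing else -->
  <title>Professional XML</title>
  <!-- Complex (element only): child elements but no text of its own -->
  <book>
    <title>Professional XML</title>
  </book>
  <!-- Mixed: text nodes interleaved with a child element -->
  <para>Written by <author>J.K. Rowling</author> for younger readers.</para>
  <!-- An attribute does not change the classification: this is still empty content -->
  <br id="b1"/>
</examples>
```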

Data Types

• Simple Type
  – Contains simple (text-only) content without any attributes.
• Complex Type
  – May contain an any, empty, simple, complex (element only), or mixed content model.
  – Simple content with an attribute is considered a complex type.
  – All complex types are user-derived data types (the built-in xs:anyType aside).

Data Types

• Built-in data types
  – Data types that are defined in the W3C’s specification.
  – http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
    > Primitive data types
      – e.g. string, date, float, decimal
    > Derived data types
      – e.g. integer, nonNegativeInteger. These are derived from decimal.
  – Example: <xs:element name="author" type="xs:string"/>
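
To make the primitive/derived distinction a little more concrete, here is a small sketch that declares elements with several built-in types. The element names (publication, published, price, pages) are invented for illustration; only the xs: type names come from the W3C specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- publication and its children are hypothetical names -->
  <xs:element name="publication">
    <xs:complexType>
      <xs:sequence>
        <!-- Primitive built-in types -->
        <xs:element name="title" type="xs:string"/>
        <xs:element name="published" type="xs:date"/>
        <xs:element name="price" type="xs:decimal"/>
        <!-- Derived built-in type: nonNegativeInteger derives from integer, which derives from decimal -->
        <xs:element name="pages" type="xs:nonNegativeInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this sketch, <pages>-3</pages> or <published>yesterday</published> would be rejected by a validating parser, whereas any text is acceptable for title.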

Data Types

• User-derived data types
  – Data types that are defined by the XML Schema designer.
  – Example:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="publisher" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Declaration vs Definition

• Declaration
  – It is used to declare an element or an attribute with its associated name and data type.
  – <xs:element name="author" type="xs:string"/>
• Definition
  – It is used to define a user-derived data type.

<xs:complexType>
  <xs:sequence>
    <xs:element name="author" type="xs:string"/>
    <xs:element name="editor" type="xs:string"/>
    <xs:element name="title" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
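
One way to see the two ideas side by side is to give the type definition a name and then refer to it from an element declaration. The sketch below assumes a type name, BookType, that does not appear in the slides; the slides themselves use anonymous types nested inside the element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Definition: a named, user-derived complex type (BookType is a made-up name) -->
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="author" type="xs:string"/>
      <xs:element name="editor" type="xs:string"/>
      <xs:element name="title" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Declaration: an element named book whose data type is BookType -->
  <xs:element name="book" type="BookType"/>

</xs:schema>
```

Naming the type is a design choice rather than a requirement: it lets several declarations share one definition, at the cost of one more name to maintain.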

Element Declaration

• <xs:element name="elementName" type="dataType"/>
• Examples:

Simple type

<xs:element name="author" type="xs:string"/>

Complex type

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="publisher" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Attribute Declaration

• <xs:attribute name="attribute_name" type="datatype" use="…"/>
• The data type of an attribute is always a simple type.
• Possible values for the attribute use:
  > required
  > prohibited
  > optional
  – The default value is optional.
  – prohibited is mainly used to create a derived type without the attribute concerned.
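
The following sketch shows how the use attribute might look in practice. The attribute names isbn, edition and format are assumptions made for this example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <!-- Must be present on every book element -->
      <xs:attribute name="isbn" type="xs:string" use="required"/>
      <!-- May be present; stated explicitly here -->
      <xs:attribute name="edition" type="xs:positiveInteger" use="optional"/>
      <!-- No use attribute given, so the default (optional) applies -->
      <xs:attribute name="format" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <!-- Under these assumptions, <book isbn="0000000000"/> is valid and <book edition="2"/> is not -->
</xs:schema>
```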

Simple Type with Simple Content (1)

<title>Harry Potter and The Philosopher's Stone</title>

<xs:element name="title" type="xs:string"/>

The element title has a simple type.

Simple Type with Simple Content (2)

<title language="english">Harry Potter and The Philosopher's Stone</title>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="language" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

The element title is NOT a simple type (it is a complex type), because it carries an attribute.

The attribute language has a simple type.

Complex Type Definition

<book>
  <title language="english">Harry Potter and The Philosopher's Stone</title>
</book>

Both elements, book and title, have complex types.

ComplexType Example

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="language" type="xs:string" use="required"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Complex Type with Simple Content

<title language="english">Harry Potter and The Philosopher's Stone</title>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="language" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

Complex Type with Complex Content

• A complex content model contains one or more child elements.

• The structure of child elements is determined by the following keywords:
  – sequence
  – choice
  – all

Sequence

• Ordered list

<book>
  <title>Professional XML</title>
  <publisher>WROX</publisher>
</book>

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string" maxOccurs="unbounded"/>
      <xs:element name="publisher" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
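
Because a sequence is ordered, the position of each child matters as much as its presence. The instance below is a sketch of a document that should be accepted by the schema above; the second title is invented text, added only to exercise maxOccurs="unbounded".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Expected to be valid: one or more title elements first, then exactly one publisher -->
<book>
  <title>Professional XML</title>
  <title>A second, made-up title</title>
  <publisher>WROX</publisher>
</book>
<!-- Expected to be rejected by the same schema: a book whose publisher comes
     before its title, or a book with no publisher at all -->
```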

Choice – XML Schema

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="author">
        <xs:complexType>
          <xs:choice>
            <xs:sequence>
              <xs:element name="firstname" type="xs:string"/>
              <xs:element name="middlename" type="xs:string"/>
              <xs:element name="lastname" type="xs:string"/>
            </xs:sequence>
            <xs:sequence>
              <xs:element name="lastname" type="xs:string"/>
              <xs:element name="firstname" type="xs:string"/>
            </xs:sequence>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Choice – XML Document

<book>
  <author>
    <firstname>George</firstname>
    <middlename>Walker</middlename>
    <lastname>Bush</lastname>
  </author>
</book>

<book>
  <author>
    <lastname>Howard</lastname>
    <firstname>John</firstname>
  </author>
</book>

All

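• Unordered list
• Cardinality of each member of the list is 1 by default (minOccurs=1 and maxOccurs=1); in XML Schema 1.0 a member may be made optional with minOccurs=0, but maxOccurs must stay 1
• Cardinality of the list itself can be either 0 or 1
  – 0 => minOccurs=0, maxOccurs=1
  – 1 => minOccurs=1, maxOccurs=1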

All – XML Schema

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:all minOccurs="0">
        <xs:element name="author" type="xs:string"/>
        <xs:element name="editor" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>

All – XML Doc

<?xml version="1.0"?>
<book xsi:noNamespaceSchemaLocation="all.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <editor>George Bush</editor>
  <author>John Howard</author>
</book>
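
Since the members of an all group may appear in any order, the same schema should also accept the children the other way round. The sketch below reuses the all.xsd file name from the slide; the comments describe what a validating parser would be expected to report for two further variants.

```xml
<?xml version="1.0"?>
<!-- Same content as the slide's document, with author and editor swapped -->
<book xsi:noNamespaceSchemaLocation="all.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <author>John Howard</author>
  <editor>George Bush</editor>
</book>
<!-- An empty <book/> should be valid too, because the group has minOccurs="0". -->
<!-- A book holding only <editor> should NOT be valid: once the group is present,
     each member is required exactly once. -->
```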

Complex Type with Empty Content

• There are two ways to create an empty content model for a complex type:
  – Verbose
    > As a restriction of xs:anyType
  – Compact
    > Omitting the keyword that defines the content model
• Example:
  – The break element in HTML => <br/>

Verbose

<xs:element name="br">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Compact

<xs:element name="br">
  <xs:complexType>
  </xs:complexType>
</xs:element>

Complex Type with Mixed Content – XML Doc

<?xml version="1.0"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="mixedContent.xsd">
  <title>Harry Potter and The Philosopher's Stone</title> written by J.K Rowling
</book>

The book element has a mixed content model: it holds both a child element (title) and text of its own.

Complex Type with Mixed Content

<xs:element name="book">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="title" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Attaching an Attribute to an Element

• The content model of an element determines the method used to attach an attribute to the element.

Attaching an attribute to an element with a simple content

• Use an extension of a simple type

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="language" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

Attaching an attribute to an element with a complex content

• To attach an attribute to an element in this category, place the attribute declaration after the declaration of the child elements, that is, after the closing tag of the sequence, choice or all group.

<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="ID" type="xs:ID"/>
  </xs:complexType>
</xs:element>
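
A matching instance document could look like the sketch below. The value p001 is an invented identifier; note that an xs:ID value has to be a valid XML name, so it cannot begin with a digit, and it must be unique within the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ID is optional here because the declaration gives no use attribute -->
<person ID="p001">
  <firstname>John</firstname>
  <lastname>Howard</lastname>
</person>
```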

Attaching an attribute to an element with an empty content

• The attribute declaration is placed directly within the complexType definition.

<img src="whitehouse.jpg"/>

<xs:element name="img">
  <xs:complexType>
    <xs:attribute name="src" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Cardinality

• The minimum and maximum number of occurrences of an element can be specified using the attributes minOccurs and maxOccurs.
• The default value for both the minimum and the maximum is ONE.
• Example:

<xs:element name="title" type="xs:string" maxOccurs="unbounded"/>

<xs:element name="title" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
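
The common cardinalities can be collected in one place, as in the sketch below. The child elements subtitle, author and editor and their limits are assumptions chosen for illustration, not part of the bookshop examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Defaults apply (minOccurs="1", maxOccurs="1"): exactly one -->
        <xs:element name="publisher" type="xs:string"/>
        <!-- One or more -->
        <xs:element name="title" type="xs:string" maxOccurs="unbounded"/>
        <!-- Zero or one -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
        <!-- Zero or more -->
        <xs:element name="author" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- Between one and three -->
        <xs:element name="editor" type="xs:string" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```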

Week 3 Reflection

Content Model            Attribute   Data Type
Empty                    N/A         ?
Simple (text only)       Yes         ?
Simple (text only)       No          ?
Complex (element only)   N/A         ?
Mixed                    N/A         ?