Management of XML and Semistructured Data
Lecture 11: Schemas
Wednesday, May 2nd, 2001
Outline
• XML Schema
• Types in XDuce
• Regular tree languages
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element. Only complex types can carry attributes; adding attributes to a simple type takes more work (it must be wrapped in a complex type with simple content).
“Mixed” Content, “Any” Type
• Mixed content: better than in DTDs; the element types can still be enforced, but text may now appear between any elements
<xsd:complexType mixed="true"> . . . .
• Any type: anything is permitted there
<xsd:element name="anything" type="xsd:anyType"/> . . . .
“All” Group
• A restricted form of & in SGML
• Restrictions:
– Only at the top level of a content model
– Contains only elements
– Each element occurs at most once
• E.g. “comment” occurs 0 or 1 times
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
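The restrictions above can be checked mechanically. A minimal Python sketch, assuming a hypothetical helper `valid_all_group` and a dict that maps each allowed child name to its minOccurs (0 or 1): order is irrelevant, no name may repeat, and required names must appear.

```python
from collections import Counter

def valid_all_group(children: list[str], min_occurs: dict[str, int]) -> bool:
    """Check a sequence of child element names against an <xsd:all> group.

    min_occurs maps each allowed element name to 0 (optional) or 1 (required);
    inside <xsd:all> an element occurs at most once, so no name may repeat.
    """
    counts = Counter(children)
    # Only declared elements may appear, and each at most once.
    if any(name not in min_occurs or n > 1 for name, n in counts.items()):
        return False
    # Every required element must be present; the order does not matter.
    return all(counts[name] >= m for name, m in min_occurs.items())

# Content model of PurchaseOrderType from the slide.
PO = {"shipTo": 1, "billTo": 1, "comment": 0, "items": 1}
```

For example, `["items", "billTo", "shipTo"]` is accepted (any order, comment omitted), while a repeated `comment` or a missing `billTo` is rejected.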
Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
USAddress has the content of Address followed by state and zip: corresponds to inheritance
Derived Types by Restrictions
• (*): may restrict cardinalities, e.g. from (0, ∞) to (1, 1); may restrict choices; other restrictions…
<complexContent>
  <restriction base="ipo:Items">
    … [rewrite the entire content, with restrictions] …
  </restriction>
</complexContent>
Corresponds to set inclusion
Simple Types
• string
• token
• byte
• unsignedByte
• integer
• positiveInteger
• int (32-bit, a restriction of integer, which is unbounded)
• unsignedInt
• long
• short
• ...
• time
• dateTime
• duration
• date
• ID
• IDREF
• IDREFS
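To make the integer sizes concrete, the hypothetical Python table below records the value spaces of some of these built-in types (bounds as defined in XML Schema Part 2; the names `RANGES` and `in_value_space` are illustrative). It checks numeric ranges only, not lexical forms, and shows that int is a bounded subset of the unbounded integer.

```python
# Value spaces of some built-in integer types.
# None means "no bound": integer is unbounded, while int is limited to
# 32 bits, so int is a subset of integer, not a superset.
RANGES = {
    "integer": (None, None),
    "positiveInteger": (1, None),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-128, 127),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedByte": (0, 255),
}

def in_value_space(type_name: str, value: int) -> bool:
    """Return True if value lies within the numeric range of type_name."""
    lo, hi = RANGES[type_name]
    return (lo is None or value >= lo) and (hi is None or value <= hi)
```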
Facets of Simple Types
Examples
• length
• minLength
• maxLength
• pattern
• enumeration
• whiteSpace
• maxInclusive
• maxExclusive
• minInclusive
• minExclusive
• totalDigits
• fractionDigits
• Facets = additional properties restricting a simple type
• XML Schema 1.0 defines 12 constraining facets (the ones listed above)
Facets of Simple Types
• Can further restrict a simple type by changing some facets
• Restriction = subset
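A toy illustration of “restriction = subset”: a derived type keeps every facet of its base and may only tighten them, so it can only accept fewer values. The Python sketch below assumes a hand-rolled checker `satisfies` for four facets on integer-valued strings; it is not a schema validator.

```python
import re

def satisfies(value: str, facets: dict) -> bool:
    """Check a lexical integer value against a few facets (hypothetical helper).

    Only pattern, enumeration, minInclusive and maxInclusive are handled.
    """
    if "pattern" in facets and not re.fullmatch(facets["pattern"], value):
        return False
    if "enumeration" in facets and value not in facets["enumeration"]:
        return False
    n = int(value)
    if "minInclusive" in facets and n < facets["minInclusive"]:
        return False
    if "maxInclusive" in facets and n > facets["maxInclusive"]:
        return False
    return True

# A base type, and a type derived from it by tightening two facets.
base = {"minInclusive": 0, "maxInclusive": 100}
derived = {**base, "minInclusive": 10, "pattern": r"\d{2}"}

candidates = [str(i) for i in range(-5, 120)]
base_values = {v for v in candidates if satisfies(v, base)}
derived_values = {v for v in candidates if satisfies(v, derived)}
```

Here `derived_values` is the set of two-digit numbers 10 to 99, a subset of `base_values` (0 to 100).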
Not so Simple Types
• List types:
• Union types
• Restriction types
<xsd:simpleType name="listOfMyIntType">
<xsd:list itemType="myInteger"/>
</xsd:simpleType>
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
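The value of a list type is a whitespace-separated sequence of items, each of which must be valid for the item type. A minimal Python sketch, assuming `int` as a stand-in for `myInteger` (whose definition is not shown) and a hypothetical helper `parse_list`:

```python
def parse_list(text: str, parse_item) -> list:
    """Split a list-typed value on whitespace and validate each item.

    parse_item must raise ValueError on an invalid item; int stands in
    here for the slide's myInteger type.
    """
    return [parse_item(token) for token in text.split()]

values = parse_list("20003 15037 95977 95945", int)
```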
Types in XDuce
• XDuce = a functional programming language (like ML)
• Emphasis: type checking for its functions
• Data model = ordered trees
– Captures XML elements and attributes
• Types = regular expressions
– Same expressive power as XML Schema
– Simpler concept
– Closer connection to regular tree languages
Values in XDuce
<bib>
  <book>
    <title> ML for the Working Programmer </title>
    <author> Paulson </author>
    <year> 1991 </year>
  </book>
  <paper> ... </paper>
  ...
</bib>
val x = bib[book[title["ML for the Working Programmer"],
                 author["Paulson"],
                 year["1991"]],
            paper[....],
            ...]
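The XDuce term is just a notation for an ordered labeled tree. As a sketch, assuming a Python encoding where a node is a `(label, children)` pair and text is a plain string, the book entry can be rebuilt and serialized back to XML (the elided `paper[....]` part is left out, and whitespace around text is dropped):

```python
from xml.sax.saxutils import escape

def to_xml(value) -> str:
    """Serialize a tree: strings are text, (label, children) pairs are elements."""
    if isinstance(value, str):
        return escape(value)
    label, children = value
    return f"<{label}>" + "".join(to_xml(c) for c in children) + f"</{label}>"

# bib[book[title[...], author[...], year[...]]]
book = ("book", [
    ("title", ["ML for the Working Programmer"]),
    ("author", ["Paulson"]),
    ("year", ["1991"]),
])
x = ("bib", [book])
```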
Types in XDuce
<!ELEMENT bib ((book|paper)*)>
<!ELEMENT book (title, author*, year, publisher?)>
<!ELEMENT title (#PCDATA)>
...
type Bib = bib[(Book|Paper)*]
type Book = book[Title, Author*, Year, Publisher?]
type Title = title[String]
...
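Both notations describe the children of an element by a regular expression over element names. A minimal Python sketch, assuming each child name is encoded as `name ` (with a trailing space) so that the content model of `book` becomes a `re` pattern; `matches_book` is a hypothetical helper:

```python
import re

# (title, author*, year, publisher?) over child names, one "name " token each.
BOOK_MODEL = re.compile(r"(title )(author )*(year )(publisher )?")

def matches_book(child_names: list[str]) -> bool:
    """Return True if the children of a book element fit its content model."""
    encoded = "".join(name + " " for name in child_names)
    return BOOK_MODEL.fullmatch(encoded) is not None
```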
Types in XDuce
• Important idea:
– Types are first-class citizens
– Element names are second-class
• This is consistent with regular expressions and automata:
– Type = state (we will see later)
Example of Types in XDuce
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
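Read literally, T0 denotes the binary trees built only from a, and T1 the same kind of trees with exactly one b leaf. The hypothetical Python functions below check membership by following each type definition clause by clause (the `(label, children)` tree encoding is an assumption):

```python
# Trees: ("a", []) is a[], ("b", []) is b[], ("a", [l, r]) is a[l, r].

def is_t0(t) -> bool:
    # type T0 = a[] | a[T0, T0]
    label, kids = t
    if label != "a":
        return False
    return kids == [] or (len(kids) == 2 and all(is_t0(k) for k in kids))

def is_t1(t) -> bool:
    # type T1 = b[] | a[T1, T0] | a[T0, T1]
    label, kids = t
    if label == "b":
        return kids == []
    if label == "a" and len(kids) == 2:
        left, right = kids
        return (is_t1(left) and is_t0(right)) or (is_t0(left) and is_t1(right))
    return False
```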
Formal Definition of Types in XDuce
T ::= variable
::= base type
::= () /* empty sequence */
::= a[T] /* element with label a */
::= T,T /* concatenation */
::= T | T /* alternation */
Where are “*” and “?” ?
Types in XDuce
Derived types:
• Given T, the type T* is an abbreviation for X, where:
– type X = T, X | ()
• Similarly, T+ abbreviates X and T? abbreviates Y, where:
– type X = T, T*
– type Y = T | ()
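The recursive definitions can be executed directly. The Python sketch below assumes that T matches exactly one element, passed as a predicate `item`, so each recursive step consumes one element; for a general T the matcher would have to try every split of the sequence. The helper names are hypothetical.

```python
def star(item, seq) -> bool:
    # T* = X where type X = T, X | ()
    if not seq:
        return True                                  # the () branch
    return item(seq[0]) and star(item, seq[1:])      # the T, X branch

def plus(item, seq) -> bool:
    # T+ = T, T*
    return bool(seq) and item(seq[0]) and star(item, seq[1:])

def opt(item, seq) -> bool:
    # T? = T | ()
    return not seq or (len(seq) == 1 and item(seq[0]))
```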
Types in XDuce
• Danger with recursion:
– type X = a[], X, b[] | ()
– What is it? X denotes n copies of a[] followed by n copies of b[] (n ≥ 0), which is not a regular language
• Need to restrict to tail-recursive types
Subsumption in Xduce Types
• Definition. T1 <: T2 if the set defined by T1 is a subset of the set defined by T2
• Examples
– Name, Addr <: Name, Addr, Tel?
– Name, Addr, Tel <: Name, Addr, Tel?
– T, T, T <: T*
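For sequences of element names such as these, subsumption is inclusion between regular languages, and inclusion can be decided on deterministic automata by exploring pairs of states: T1 <: T2 fails exactly when some reachable pair is accepting for T1 and not for T2. A minimal Python sketch with hand-built DFAs (the encoding and the name `included` are assumptions; this is not XDuce's own algorithm):

```python
from collections import deque

# A DFA over element names: (start, accepting states, transitions).
# Missing transitions lead to an implicit dead state, represented by None.
def included(a, b, alphabet) -> bool:
    """Return True if every sequence accepted by DFA a is accepted by DFA b."""
    (sa, fa, da), (sb, fb, db) = a, b
    seen = {(sa, sb)}
    todo = deque(seen)
    while todo:
        p, q = todo.popleft()
        if p in fa and q not in fb:
            return False                  # a accepts a sequence that b rejects
        for sym in alphabet:
            nxt = (da.get((p, sym)), db.get((q, sym)))
            # Once a is dead it accepts nothing more, so that branch is pruned.
            if nxt[0] is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

ALPHABET = ["Name", "Addr", "Tel"]
NA = (0, {2}, {(0, "Name"): 1, (1, "Addr"): 2})                        # Name, Addr
NAT = (0, {2, 3}, {(0, "Name"): 1, (1, "Addr"): 2, (2, "Tel"): 3})     # Name, Addr, Tel?
```

With these definitions `included(NA, NAT, ALPHABET)` holds, while the converse fails on the sequence Name, Addr, Tel.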
XDuce
• Main goal: given a function, check that it is type correct
– Come to Benjamin Pierce’s talk on Monday
• One note:
– The type checking algorithm in XDuce is incomplete (we will see why in a couple of lectures)
• Important piece of typechecking:
– Checking if T1 <: T2
• Cannot be done for context-free languages (inclusion is undecidable)
• But can be done for regular languages (next)
Regular Tree Languages
• Given a ranked alphabet $L = L_0 \cup L_1 \cup \dots \cup L_k$
• Ranked trees are $T ::= a[T_1, \dots, T_i]$, where $a \in L_i$
Definition. A bottom-up tree automaton is $A = (L, Q, \delta, Q_F)$, where:
– $L$ = ranked alphabet
– $Q$ = set of states
– $\delta$ = transition relation, $\delta : \bigcup_{i=0}^{k} (L_i \times Q^i) \to 2^Q$
– $Q_F$ = terminal (accepting) states
Bottom-Up Tree Automata
Computation on a tree t:
• For each node $t = a[t_1, \dots, t_i]$: if the roots of $t_1, \dots, t_i$ are labeled with states $q_1, \dots, q_i$ and $q \in \delta(a, q_1, \dots, q_i)$, then label $t$ with $q$
• If the root is labeled with a state in $Q_F$, then accept
The language accepted by A consists of all trees t accepted by A
A regular tree language is a set of trees accepted by some automaton A
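The computation can be sketched for nondeterministic automata by computing, for each node, the set of all states that some run can assign to it. The Python encoding (trees as `(label, children)`, δ as a dict from `(a, q1, ..., qi)` to a set of states) and the example automaton, which guesses one b leaf and so accepts the trees containing at least one b, are hypothetical:

```python
from itertools import product

def reachable_states(tree, delta) -> set:
    """All states some run of the automaton can assign to the root of tree."""
    label, kids = tree
    kid_states = [reachable_states(k, delta) for k in kids]
    result = set()
    for qs in product(*kid_states):           # every choice of child states
        result |= delta.get((label, *qs), set())
    return result

def accepts(tree, delta, final) -> bool:
    return bool(reachable_states(tree, delta) & final)

# L0 = {a, b}, L2 = {a}; "y" marks the subtree holding the guessed b.
DELTA = {
    ("a",): {"n"}, ("b",): {"n", "y"},
    ("a", "n", "n"): {"n"}, ("a", "y", "n"): {"y"}, ("a", "n", "y"): {"y"},
}
FINAL = {"y"}
```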
Example of Tree Automaton
• L0 = {b}, L2 = {a}
• Q = {q1, q2}
• $\delta(b) = q_1$, $\delta(a, q_1, q_1) = q_2$, $\delta(a, q_2, q_2) = q_1$
• What does this accept?
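One way to explore the question is to simulate runs. In the Python sketch below (tree encoding assumed; the slide does not give $Q_F$, so `run` only reports the root state), a state is reached only if both children of every a-node carry the same state, so only perfectly balanced trees get a state, and that state alternates q1, q2, q1, ... with height.

```python
# Transitions from the slide; delta is a partial function here.
DELTA = {("b",): "q1", ("a", "q1", "q1"): "q2", ("a", "q2", "q2"): "q1"}

def run(tree):
    """Return the state labeling the root, or None if no transition applies."""
    label, kids = tree
    states = tuple(run(k) for k in kids)
    if None in states:
        return None                        # some subtree already got stuck
    return DELTA.get((label, *states))
```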
Properties of Regular Tree Languages
• If T1, T2 are regular, then so are:
– $T_1 \cup T_2$
– $T_1 - T_2$
– $T_1 \cap T_2$
• If A is a nondeterministic bottom-up tree automaton, then there exists an equivalent deterministic one
– Not true for “top-down” automata
• If T1, T2 are regular, then it is decidable whether $T_1 \subseteq T_2$
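Decidability of inclusion follows from the closure properties: T1 ⊆ T2 iff T1 − T2 is empty. For deterministic bottom-up automata this can be sketched with the product construction: run both automata in lockstep and collect every pair of states reached on some tree. The Python sketch below assumes deterministic automata given as dicts, with undefined transitions treated as a dead state `None`; both example automata are hypothetical.

```python
from itertools import product

def reachable_pairs(alphabet, delta_a, delta_b):
    """All pairs (state of A, state of B) reached by the two runs on some tree."""
    reached = set()
    changed = True
    while changed:                          # fixpoint over tree height
        changed = False
        current = list(reached)
        for sym, rank in alphabet:
            for kids in product(current, repeat=rank):
                pair = (delta_a.get((sym, *(k[0] for k in kids))),
                        delta_b.get((sym, *(k[1] for k in kids))))
                if pair not in reached:
                    reached.add(pair)
                    changed = True
    return reached

def included(alphabet, a, b) -> bool:
    """L(A) is a subset of L(B) iff no reachable pair is accepted by A only."""
    (delta_a, final_a), (delta_b, final_b) = a, b
    return all(not (p in final_a and q not in final_b)
               for p, q in reachable_pairs(alphabet, delta_a, delta_b))

ALPHABET = [("a", 0), ("b", 0), ("a", 2)]
# Hypothetical A: exactly one b leaf (state = number of b's, at most 1).
EXACTLY_ONE = ({("a",): 0, ("b",): 1,
                ("a", 0, 0): 0, ("a", 1, 0): 1, ("a", 0, 1): 1}, {1})
# Hypothetical B: at least one b leaf.
AT_LEAST_ONE = ({("a",): "no", ("b",): "yes",
                 **{("a", x, y): "yes" if "yes" in (x, y) else "no"
                    for x in ("no", "yes") for y in ("no", "yes")}}, {"yes"})
```

Here `included(ALPHABET, EXACTLY_ONE, AT_LEAST_ONE)` holds, while the converse fails, witnessed by the pair reached on a[b, b].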
Top-down Automata
• Defined similarly, just the computation differs:
– Start from the root at an initial state, move downwards
– If all leaves end in an accepting state, then accept
• Here deterministic automata are strictly weaker
– e.g. they cannot recognize the set {a[a,b], a[b,a]}
• Nondeterministic bottom-up = deterministic bottom-up = nondeterministic top-down
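Why deterministic top-down automata fail here: from the root they must send a fixed state to each child, so to accept both a[a,b] and a[b,a] each child state must accept both a and b leaves, and then a[a,a] is accepted too. A nondeterministic automaton can instead guess at the root which child is the b. A minimal Python sketch of such an automaton (state names and encoding are assumptions):

```python
# From state q at an a-node, pick one of the listed pairs of child states.
DOWN = {("q0", "a"): [("qa", "qb"), ("qb", "qa")]}
# A leaf is accepted if its label matches the state that reached it.
LEAF_OK = {("qa", "a"), ("qb", "b")}

def accepts(state, tree) -> bool:
    """True if some top-down run starting in state accepts tree."""
    label, kids = tree
    if not kids:
        return (state, label) in LEAF_OK
    return any(
        len(choice) == len(kids)
        and all(accepts(q, k) for q, k in zip(choice, kids))
        for choice in DOWN.get((state, label), [])
    )
```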
Example of a Bottom-up Automaton
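• $A = (L, Q, \delta, Q_F)$, where
– $L = L_0 \cup L_2$, $L_0 = \{a, b\}$, $L_2 = \{a\}$
– $Q = \{T_0, T_1\}$
– $\delta(a) = T_0$, $\delta(b) = T_1$, $\delta(a, T_0, T_0) = T_0$, $\delta(a, T_1, T_0) = T_1$, $\delta(a, T_0, T_1) = T_1$
– $Q_F = \{T_1\}$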
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
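The correspondence can be checked by running the automaton. The Python sketch below uses the transitions of this automaton, including δ(a, T0, T0) = T0 for the a[T0, T0] case of T0, and takes {T1} as the accepting states; the dict encoding is an assumption. Each state reached names the type of the subtree under it.

```python
# States are named after the XDuce types they stand for.
DELTA = {
    ("a",): "T0", ("b",): "T1",
    ("a", "T0", "T0"): "T0",
    ("a", "T1", "T0"): "T1", ("a", "T0", "T1"): "T1",
}
FINAL = {"T1"}

def state_of(tree):
    """Bottom-up run; None means no transition applies (the tree is rejected)."""
    label, kids = tree
    states = tuple(state_of(k) for k in kids)
    if None in states:
        return None
    return DELTA.get((label, *states))
```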
Regular Tree Languages and XDuce types
• For ranked alphabets, tail-recursive XDuce types correspond precisely to regular tree languages
• The same is true for unranked alphabets, but there the definition of regular tree languages is more complex
Conclusion for Schemas
A Theoretical View
• XML Schemas = XDuce types = regular tree languages
• DTDs = strictly weaker
A Practical View
• XML Schemas still too complex