RELAX NG, a Schema Language for XML
DESCRIPTION
Here is a presentation on RELAX NG, a schema language for XML, that I gave at OSCON 2003.
TRANSCRIPT
9 July 2003 Slide 2
Introductions
A brief, swift, technical overview of RELAX NG
Some comparisons with DTDs and W3C XML Schema
As John Cowan has said: “Once RELAX NG crosses the ‘blood-brain barrier,’ you can never go back!”
9 July 2003 Slide 3
What Is RELAX NG?
RELAX NG is a schema language for XML
A schema language for XML describes constraints for a vocabulary beyond ordinary XML syntax
RELAX NG is simple, intuitive, elegant, easy to use and learn, and has a foundation in finite tree automata
RELAX NG is being forwarded as part of ISO/IEC 19757 Document Schema Definition Languages or DSDL (see http://www.dsdl.org)
9 July 2003 Slide 4
When & Who?
Version 1.0 specs were developed by the RELAX NG technical committee at OASIS between April and December 2001
Merges Murata Makoto’s RELAX and James Clark’s TREX
James Clark is chair of the RELAX NG technical committee
9 July 2003 Slide 5
An Elegant Alternative
Murata Makoto coauthored the RELAX NG tutorial and specification with James Clark
Offers an XML as well as a compact, non-XML syntax
Committed to simplicity, modularity, composability
No side effects, no PSVI
An alternative to the dominant W3C XML Schema
9 July 2003 Slide 6
XML Schema Competition?
Is RELAX NG poised to threaten XML Schema’s dominance? No.
Will RELAX NG replace XML Schema? No.
Will RELAX NG continue to attract seasoned schema developers based on word of mouth? Probably.
9 July 2003 Slide 7
A Few Preliminaries
Patterns describe content and structure of instances
An instance of a schema is a document that complies with that schema
Most RELAX NG patterns can act as a document element for the schema
Structure namespace also denotes version: http://relaxng.org/ns/structure/1.0
9 July 2003 Slide 8
DTDs & RELAX NG
RELAX NG is an evolution of the DTD
Elements and other structures are defined, not declared
No concept of associating a schema with an instance, as with a document type declaration, as in:
<!DOCTYPE date SYSTEM "date.dtd">
9 July 2003 Slide 9
Element Definitions
The <element> element
Must have a name attribute or a <name> child element
In compact syntax, defined with an element keyword
XML Schema likewise has an <xs:element> element
9 July 2003 Slide 10
For Example, Elements & Schemas
Following is an element definition which is also a complete schema in XML syntax (date.rng):
<element name="date" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>
Compact syntax, namespace assumed by default (date.rnc):
element date { text }
9 July 2003 Slide 11
Validating with Jing
Instance (date.xml):
<date>2003-07-09</date>
Jing is a multi-platform RELAX NG validator, written by James Clark, in Java
Validating the element examples:
jing date.rng date.xml
jing -c date.rnc date.xml
9 July 2003 Slide 12
Other RELAX NG Tools
James Clark’s Trang, a schema translator (http://thaiopensource.com/relaxng/trang.html)
Sun’s Multi-schema Validator or MSV (http://wwws.sun.com/software/xml/developers/multischema/)
Asami Tomoharu’s Relaxer schema compiler (http://www.relaxer.org)
9 July 2003 Slide 13
Adding an Attribute
Content models formed by simple nesting
The <attribute> element is an example
In XML syntax, <text/> is assumed as a child of <attribute> and can be left out
Compact syntax, however, requires the text keyword
XML Schema likewise uses an <xs:attribute> element
9 July 2003 Slide 14
Attributes Examples
XML syntax (att.rng):
<element name="date" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="type"/>
  <text/>
</element>
Compact syntax (att.rnc):
element date { attribute type { text }, text }
Match (att.xml):
<date type="ISO">2003-07-09</date>
9 July 2003 Slide 15
Empty Elements
Empty elements in XML may have attributes but no text or child element content
Element definitions may not be entirely empty
If no attributes are defined, must use <empty/> or empty in the content of the element definition
XML Schema signals empty content by the absence of a content model
9 July 2003 Slide 16
Empty Elements Examples
XML syntax (empty.rng):
<element name="br" xmlns="http://relaxng.org/ns/structure/1.0">
  <empty/>
</element>
Compact (empty.rnc):
element br { empty }
XML syntax (image.rng):
<element name="image" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="source"/>
</element>
Compact (image.rnc):
element image { attribute source { text } }
Match empty.xml, image.xml
9 July 2003 Slide 17
Namespaces
In RNG, the ns attribute defines the namespace that the pattern should match
ns namespace matches a namespace defined in the instance with xmlns
ns is inherited by child elements
xmlns declares, ns defines the matching namespace
Compact syntax uses default and namespace keywords
9 July 2003 Slide 18
Namespaces Examples
XML syntax (ns.rng):
<element name="date" ns="http://www.wyeast.net/date" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>
Compact (ns.rnc):
default namespace = "http://www.wyeast.net/date"
element date { text }
Match (ns.xml):
<date xmlns="http://www.wyeast.net/date">2003-07-09</date>
9 July 2003 Slide 19
Occurrence Constraints
<optional> is equivalent to ? (zero or one) in DTDs
<oneOrMore> is equivalent to + in DTDs
<zeroOrMore> is equivalent to * in DTDs
?, +, and * work in RNC
RELAX NG does not have minOccurs and maxOccurs equivalents
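Without minOccurs and maxOccurs, a bounded count can still be approximated by writing the pattern out. The following is only a sketch under an assumed requirement (two or three date children inside a hypothetical dates element); it is not taken from the original slides.

```xml
<!-- Hypothetical schema: a <dates> element holding two or three <date> children -->
<element name="dates" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Two required occurrences, written out explicitly -->
  <element name="date"><text/></element>
  <element name="date"><text/></element>
  <!-- A third, optional occurrence caps the count at three -->
  <optional>
    <element name="date"><text/></element>
  </optional>
</element>
```

This stays readable for small bounds; for large bounds a named definition referenced several times is probably easier to maintain.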
9 July 2003 Slide 20
One or More Examples
XML syntax (dates.rng):
<element name="dates" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="date"><text/></element>
  </oneOrMore>
</element>
Compact (dates.rnc):
element dates { element date { text }+ }
Match (dates.xml):
<dates>
  <date>2003-07-07</date>
  <date>2003-07-08</date>
  <date>2003-07-09</date>
</dates>
9 July 2003 Slide 21
choice & group
<choice> matches instances with any one of its children
Compact uses | as in DTDs
<group> matches instances with all of its children, in order
Compact uses , as in DTDs, with () for grouping
XML Schema has xs:choice, and xs:sequence plays the role of <group> (xs:group names a reusable model group)
9 July 2003 Slide 22
choice & group Examples
XML syntax (instant.rng):
<element name="instant" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <group>
      <element name="date"><text/></element>
      <element name="time"><text/></element>
    </group>
    <element name="date-time"><text/></element>
  </choice>
</element>
Compact (instant.rnc):
element instant { (element date { text }, element time { text }) | element date-time { text } }
9 July 2003 Slide 23
Definitions
Create named definitions with <define>
Can refer to a named definition with <ref>
The name of a definition does not conflict with the name of an element or attribute
Similar to <complexType> in XML Schema
9 July 2003 Slide 24
Definition Examples
XML syntax (see def.rng):
<define name="date">
  <element name="date">
    <element name="year"><text/></element>
    <element name="month"><text/></element>
    <element name="day"><text/></element>
  </element>
</define>
Compact syntax (see def.rnc):
date = element date {
  element year { text },
  element month { text },
  element day { text }
}
9 July 2003 Slide 25
Grammar
If <define> elements are used, both <grammar> and <start> must be used as well
<grammar> becomes root element for schema
<start> indicates document element in the instance (similar to what DOCTYPE does)
9 July 2003 Slide 26
grammar, start & ref Example
XML syntax (def.rng/.rnc with def.xml):
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="date"/>
  </start>
  <define name="date">
    <element name="date">
      <element name="year"><text/></element>
      <element name="month"><text/></element>
      <element name="day"><text/></element>
    </element>
  </define>
</grammar>
9 July 2003 Slide 27
Datatypes
RELAX NG supports external datatype libraries, namely XML Schema datatypes
The datatypeLibrary attribute indicates the namespace for the datatype library (inherited)
The <data> element, with its type attribute, names the datatype
<param>, child element of <data>, indicates facets of the datatype per XML Schema
9 July 2003 Slide 28
Datatypes in Compact Syntax
The datatype library is automatically declared for XML Schema datatypes
The xsd prefix is required unless the XML Schema datatype library is redeclared under another prefix
Parameters are defined with literal strings
9 July 2003 Slide 29
Datatype Examples
XML syntax (year.rng with year.xml):
<element name="year" xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="gYear">
    <param name="minInclusive">2002</param>
    <param name="maxInclusive">2005</param>
  </data>
</element>
Compact syntax (year.rnc also with year.xml):
element year {
  xsd:gYear { minInclusive = "2002" maxInclusive = "2005" }
}
9 July 2003 Slide 30
Enumerations
In DTDs, enumerations are only possible in attributes, defined in a DTD like this:
<!ATTLIST week day (m|w|f) #REQUIRED>
Enumerations possible in RELAX NG in both elements and attributes
RELAX NG uses the <value> element in XML syntax, literals separated by | in compact syntax
XML Schema uses the <xs:enumeration> facet element, which is a child of <xs:restriction>, which is a child of <xs:simpleType>
9 July 2003 Slide 31
Enumeration Examples
XML syntax (day.rng with day.xml):
<element name="day" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <value>m</value>
    <value>w</value>
    <value>f</value>
  </choice>
</element>
Compact syntax (day.rnc also with day.xml):
element day { "m" | "w" | "f" }
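The DTD attribute enumeration from the Enumerations slide (a day attribute on week limited to m, w, or f) might be written in RELAX NG roughly as follows. This is a sketch only; it assumes the week element has no other content, which the DTD fragment does not state.

```xml
<!-- Sketch: RELAX NG counterpart of <!ATTLIST week day (m|w|f) #REQUIRED> -->
<!-- Assumption: <week> carries only the day attribute and no child content -->
<element name="week" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="day">
    <choice>
      <value>m</value>
      <value>w</value>
      <value>f</value>
    </choice>
  </attribute>
</element>
```

The attribute is required because it is not wrapped in <optional>.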
9 July 2003 Slide 32
Lists
A list is a whitespace-separated sequence of tokens
RELAX NG uses the <list> element to define a list, containing one or more <data> elements
Compact syntax uses the list keyword followed by a comma-separated list of types in braces
9 July 2003 Slide 33
More on Lists
Can use occurrence constraints such as <optional> or ?, <oneOrMore> or +, <zeroOrMore> or *
DTDs use NMTOKENS, IDREFS, and ENTITIES, but for attributes only
XML Schema uses <xs:list>, a child of <xs:simpleType>, which can be constrained by facets
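As an illustration of occurrence constraints inside a list, here is a sketch of a list that accepts one or more floats. The coordinates element name is hypothetical and not part of the original examples.

```xml
<!-- Hypothetical element: a whitespace-separated list of one or more floats -->
<element name="coordinates" xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <list>
    <!-- oneOrMore applies to the tokens inside the list, not to the element -->
    <oneOrMore>
      <data type="float"/>
    </oneOrMore>
  </list>
</element>
```

An instance such as <coordinates>1.0 2.5 3</coordinates> should match, while an empty element should not.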
9 July 2003 Slide 34
List Examples
XML syntax (vertex.rng with vertex.xml):
<element name="vertex" xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <list>
    <data type="float"/>
    <data type="float"/>
    <data type="float"/>
  </list>
</element>
Compact syntax (vertex.rnc also with vertex.xml):
element vertex { list { xsd:float, xsd:float, xsd:float } }
9 July 2003 Slide 35
Interleave
In RELAX NG, you can define a pattern where elements may appear in any order with <interleave>
Restores SGML’s & connector (both in either order)
Uses a non-deterministic content model, which DTDs and XML Schema forbid
Can come close with ANY in DTDs and <xs:all> or <xs:choice> in XML Schema
Can use occurrence constraints in <interleave>
9 July 2003 Slide 36
Interleave Examples
XML syntax (name.rng with name.xml):
<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="family"><text/></element>
    <oneOrMore>
      <element name="given"><text/></element>
    </oneOrMore>
  </interleave>
</element>
Compact syntax (name.rnc with name.xml):
element name { element family { text } & element given { text }+ }
9 July 2003 Slide 37
Mixed Content Model
Mixed content models allow child elements to appear in any order and to be mixed with text
In DTDs, the model is (see mixed.dtd):
<!ELEMENT name (#PCDATA | family | given)*>
RELAX NG XML syntax uses <mixed> element
9 July 2003 Slide 38
More on Mixed Content
<mixed> is syntactic sugar for <interleave> with <text/>
Compact syntax uses mixed keyword, comma separated patterns
XML Schema uses the boolean attribute mixed on either <xs:complexType> or <xs:complexContent>
9 July 2003 Slide 39
Mixed Examples
XML syntax (mixed.rng with instance mixed.xml):
<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
  <mixed>
    <element name="family"><text/></element>
    <element name="given"><text/></element>
  </mixed>
</element>
Compact syntax (mixed.rnc with instance mixed.xml):
element name { mixed { element family { text }, element given { text } } }
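Since <mixed> is shorthand for <interleave> with <text/>, the mixed.rng schema could presumably be written out in long form as sketched below. The explicit <group> is an assumption about how the two children of <mixed> are combined, following the usual RELAX NG simplification rules.

```xml
<!-- Sketch: long form of the <mixed> example, using <interleave> and <text/> -->
<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <!-- Text may appear anywhere between the child elements -->
    <text/>
    <!-- The children of <mixed> form an ordered group -->
    <group>
      <element name="family"><text/></element>
      <element name="given"><text/></element>
    </group>
  </interleave>
</element>
```

Note that in this form family still has to come before given; only the text is free to appear anywhere, which differs from the DTD mixed model.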
9 July 2003 Slide 40
Grammars
The <externalRef> element references an external pattern via a required href attribute
You can combine grammars using the combine attribute on define with a value of interleave or &= (match in any order) or choice or |= (match one of any)
The <include> element merges grammars together (has an href attribute)
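A sketch of <include> and combine working together, building on def.rng from the grammar, start & ref example. The time definition is hypothetical and added only for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Merge in def.rng, which supplies a start pattern and the "date" definition -->
  <include href="def.rng"/>
  <!-- combine="choice" adds an alternative to the included start pattern -->
  <start combine="choice">
    <ref name="time"/>
  </start>
  <!-- Hypothetical extra definition, not part of def.rng -->
  <define name="time">
    <element name="time"><text/></element>
  </define>
</grammar>
```

With this grammar, either a date or a time document element should validate.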
9 July 2003 Slide 41
More on Grammars
You can nest definitions in an <include> element; each one overrides the definition with the same name in the included grammar
You can also use <notAllowed/> when merging grammars, forcing you to redefine not-allowed grammars
Grammars can be nested; <parentRef> references or escapes to the parent grammar in a nested grammar, allowing you to redefine a child grammar using a parent, a sort of import
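Overriding through <include> might look like the sketch below, which again assumes def.rng from the earlier example and replaces its date definition with a simpler, hypothetical one.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="def.rng">
    <!-- This definition replaces the "date" definition found in def.rng -->
    <define name="date">
      <element name="date"><text/></element>
    </define>
  </include>
</grammar>
```

The start pattern still comes from def.rng, so only the content of date changes.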
9 July 2003 Slide 42
Name Classes
Name classes allow you to include or exclude whole classes of names in a pattern
<name> allows a given name
<anyName/> allows any name
<nsName> allows names from a given namespace
The <except> element removes names from a class
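A small sketch of name classes in use: an element pattern that accepts any element name except names from one namespace. Reusing the http://www.wyeast.net/date namespace here is purely illustrative.

```xml
<!-- Sketch: matches a text-only element with any name outside the date namespace -->
<element xmlns="http://relaxng.org/ns/structure/1.0">
  <anyName>
    <except>
      <!-- Names from this namespace are removed from the class -->
      <nsName ns="http://www.wyeast.net/date"/>
    </except>
  </anyName>
  <text/>
</element>
```

Here the name class takes the place of the name attribute on <element>.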
9 July 2003 Slide 43
Schematron & RELAX NG
Schematron is an assertion-based, rather than a grammar-based schema language
Uses path expressions (XSLT and XPath)
The “feather duster” that can reach corners of your instances, where grammars can’t (Rick Jelliffe)
Co-occurrence constraints possible
Can embed Schematron in RELAX NG
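Embedding might look roughly like the sketch below. It assumes the Schematron 1.5 namespace and a hypothetical count attribute on dates that must agree with the number of date children, a co-occurrence constraint the grammar alone cannot express. A RELAX NG validator treats the sch: elements as annotations, so the rules would need to be extracted and run by a Schematron processor.

```xml
<!-- Sketch: Schematron rule embedded as a foreign element in a RELAX NG schema -->
<element name="dates" xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Date count">
    <sch:rule context="dates">
      <!-- Hypothetical constraint: count attribute equals the number of date children -->
      <sch:assert test="count(date) = @count">The count attribute must equal the number of date children.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <attribute name="count"/>
  <oneOrMore>
    <element name="date"><text/></element>
  </oneOrMore>
</element>
```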
9 July 2003 Slide 44
Annotations
You can use elements and attributes from other vocabularies in RELAX NG, such as XHTML
<a:documentation> is a special documentation element defined in the RELAX NG DTD compatibility spec (http://www.oasis-open.org/committees/relax-ng/compatibility.html)
<div> groups definitions and provides a place to add foreign attributes for documentation purposes
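A sketch of an annotated schema, assuming the annotations namespace from the DTD compatibility spec bound to the a prefix. The documentation text itself is made up for illustration.

```xml
<element name="date" xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <!-- Foreign-namespace annotation; it does not affect what the pattern matches -->
  <a:documentation>A calendar date, for example 2003-07-09.</a:documentation>
  <text/>
</element>
```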
9 July 2003 Slide 45
RELAX NG Resources
RELAX NG: http://www.relaxng.org
OASIS: http://www.oasis-open.org
RELAX: http://www.xml.gr.jp/relax/
James Clark, TREX, Jing, Trang: http://www.thaiopensource.com
Design of RELAX NG: http://www.thaiopensource.com/relaxng/design.html