Internet-based Application Architectures for the 21st Century: The Role of XML
Henry S. Thompson
HCRC Language Technology Group
Division of Informatics
University of Edinburgh
Language Technology Group
Henry S. Thompson
Dagstuhl, 2000-03-20
Introduction
  It's the Web, stupid!
    E-commerce, E-business, E-finance
    Web servers in your microwave
    ADSL by Easter
  Seriously, in the Academy
    Virtual communities of effort multiply our effectiveness
      – Shared resources
      – Shared tools
  Enfranchisement is the key
    Everybody can play
      – Whether Bill lets them or not
XML is ASCII for the 21st century
  ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
    What bits encode what characters
      – (For a pretty parochial definition of 'character')
  UNICODE/ISO 10646 extends that solution to the whole world
  XML thought it was doing the same for simple tree-structured documents
    The emphasis in the XML design was on simplifying SGML to move it to the Web
    XML didn't touch SGML's architectural vision
      – flexible linearisation/transfer syntax
      – for tree-structured documents with internal links
Digression: Just what is XML?
  It's a markup language used for annotating text
  It is concerned with logical structure
    to identify sections, titles, section headers, chapters, paragraphs, …
  It is not concerned with appearance
    you say 'this is a subtitle', not 'this is in bold, 14pt, centered'
    you say 'this is an example', not 'this is in verbatim, indented by 5pts, ragged right'
Why is XML a big deal?
  It is an official W3C Recommendation
  It is vendor-independent, platform-independent, application-independent, …
    unlike Word documents, RTF documents, PDF documents, PostScript documents, …
  It is human readable
    ditto (for most values of 'human')
Unformatted text
  Internet-based Application Architectures for the 21st Century: The Role of XML Let's skip straight to an example of XML syntax for a simple bit of structure: <tip><emph>Never</emph> stand up in a canoe!</tip>
Formatted text
  Internet-based Application Architectures for the 21st Century:
  The Role of XML
  Let's skip straight to an example of XML syntax for a simple bit of structure:
    <tip><emph>Never</emph> stand up in a canoe!</tip>
XML marked up text
  <article>
    <title>Internet-based Application Architectures for the 21st Century:</title>
    <subtitle>The Role of XML</subtitle>
    <section>
      <para>Let's skip <emph>straight</emph> to an example of XML syntax for a simple bit of structure:</para>
      <example>
        <tip><emph>Never</emph> stand up in a canoe!</tip>
      </example>
    </section>
  </article>
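To make the point about logical structure concrete, here is a minimal sketch that parses a well-formed copy of the article markup with Python's standard xml.etree.ElementTree and prints only its element tree. The helper name outline is an assumption made for this illustration and is not part of the talk.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the article markup from the slide.
ARTICLE = """<article>
  <title>Internet-based Application Architectures for the 21st Century:</title>
  <subtitle>The Role of XML</subtitle>
  <section>
    <para>Let's skip <emph>straight</emph> to an example of XML syntax
      for a simple bit of structure:</para>
    <example><tip><emph>Never</emph> stand up in a canoe!</tip></example>
  </section>
</article>"""


def outline(element, depth=0):
    """Return one indented line per element: the logical structure, no text."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ARTICLE)
structure = outline(root)
print("\n".join(structure))

# The words of the tip with the markup stripped away.
tip_text = "".join(root.find("./section/example/tip").itertext())
print(tip_text)
```

Nothing in the parsed tree says how a title or a tip should look; that decision is left to whatever consumes the structure.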
Connecting structure and form
  There is a stylesheet language called XSLT which lets us write simple style rules that produce the formatted presentation from the structured version
  For example
    <template match='emph'>
      <I><apply-templates/></I>
    </template>
  will do part of the transformation job
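Python's standard library has no XSLT processor, so the following is only a rough emulation of what the emph rule does, not XSLT itself. It assumes a hypothetical RULES table and, unlike XSLT's built-in rules, copies any element that has no rule unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source element name -> presentation element name.
RULES = {"emph": "I"}


def apply_templates(node):
    """Copy node, renaming it if a rule matches, then recurse into its children."""
    out = ET.Element(RULES.get(node.tag, node.tag), dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(apply_templates(child))
    return out


tip = ET.fromstring("<tip><emph>Never</emph> stand up in a canoe!</tip>")
result = ET.tostring(apply_templates(tip), encoding="unicode")
print(result)
```

The recursion plays the role of apply-templates: each rule handles one element and hands its children back to the same rule set.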
XML vs HTML
  Isn’t XML just like HTML?
  As with XML, you can sometimes use HTML to mark up things for content rather than appearance
    – e.g. <H2>This is a subtitle</H2>
      the appearance of <H2> is defined elsewhere, usually by someone else
  but
    – a lot of HTML markup is for appearance
      e.g. <I>this is italics</I>, <B>this is in bold</B>
    – you couldn’t mark up <RECIPIENT>Some names</RECIPIENT>
  If you know HTML, basic XML syntax is easy to understand
XML vs SGML
  SGML is more complicated than it need be
    because it was designed in the old days (12 years ago!)
  XML is a simplified subset of SGML
    much less minimisation makes processing easier
    qua complexity: sits somewhere between HTML and SGML
Who is in charge of XML?
  XML is a W3C Recommendation
  The W3C is the World Wide Web Consortium, a voluntary association of companies and non-profit organisations.
    Membership costs serious money, confers voting rights.
    Complex procedures, with the Director (Tim Berners-Lee) having ultimate authority, guided by a committee of the whole called the Advisory Committee.
  The XML Recommendation was written by the W3C’s XML Working Group.
Digression, v.2: Just what is XML?
  It's a markup language used for transferring data
  It is concerned with data models
    to convert between application-appropriate and transfer-appropriate forms
  It is not concerned with human beings
    It's produced and consumed by programs
XML as UI
  A slogan of Adam Bosworth
  I interpret it in two ways:
  At the client end
    – Use XML plus XSL as the basis for what the user sees on his/her screen
    – Use XLinks from a master document to pull together disparate sources of information
  At the server end
    – Use XML as a uniform interface for any data source onto the web
    – Not just documents, but e.g. databases, process control information, stock quotes
Application data
Structured markup
  <POORDERHDR>
    <DATETIME qualifier="DOCUMENT">
      <YEAR>1996</YEAR>
      <MONTH>06</MONTH>
      <DAY>30</DAY>
      <HOUR>23</HOUR>
      <MINUTE>59</MINUTE>
      <SECOND>59</SECOND>
      <SUBSECOND>0000</SUBSECOND>
      <TIMEZONE>+0100</TIMEZONE>
    </DATETIME>
    <OPERAMT qualifier="EXTENDED" type="T">
      <VALUE>670000</VALUE>
      <NUMOFDEC>2</NUMOFDEC>
      <SIGN>+</SIGN>
      <CURRENCY>USD</CURRENCY>
  . . .
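A consuming program would typically turn markup like this back into native values. The sketch below is one way to do that in Python, under two stated assumptions: the closing tags after CURRENCY are supplied here only so that the fragment parses, and NUMOFDEC is read as the number of decimal places in VALUE (inferred from the element names). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal
import xml.etree.ElementTree as ET

# The slide's fragment. The closing </OPERAMT> and </POORDERHDR> are assumed
# here so the text parses; whatever the slide elided is simply left out.
FRAGMENT = """<POORDERHDR>
  <DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR><MONTH>06</MONTH><DAY>30</DAY>
    <HOUR>23</HOUR><MINUTE>59</MINUTE><SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND><TIMEZONE>+0100</TIMEZONE>
  </DATETIME>
  <OPERAMT qualifier="EXTENDED" type="T">
    <VALUE>670000</VALUE><NUMOFDEC>2</NUMOFDEC>
    <SIGN>+</SIGN><CURRENCY>USD</CURRENCY>
  </OPERAMT>
</POORDERHDR>"""


def read_datetime(el):
    """Build a timezone-aware datetime from a DATETIME element (SUBSECOND is ignored)."""
    def part(tag):
        return int(el.findtext(tag))

    tz = el.findtext("TIMEZONE")  # e.g. "+0100"
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    if tz[0] == "-":
        offset = -offset
    return datetime(part("YEAR"), part("MONTH"), part("DAY"),
                    part("HOUR"), part("MINUTE"), part("SECOND"),
                    tzinfo=timezone(offset))


def read_amount(el):
    """Return (amount, currency); VALUE is scaled down by NUMOFDEC decimal places."""
    amount = Decimal(el.findtext("VALUE")).scaleb(-int(el.findtext("NUMOFDEC")))
    if el.findtext("SIGN") == "-":
        amount = -amount
    return amount, el.findtext("CURRENCY")


header = ET.fromstring(FRAGMENT)
document_date = read_datetime(header.find("DATETIME"))
amount, currency = read_amount(header.find("OPERAMT"))
print(document_date.isoformat(), amount, currency)
```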
What just happened!?
  The whole transfer syntax story just went meta, that's what happened!
  XML has been a runaway success, on a much greater scale than its designers anticipated
    Not for the reason they had hoped
      – Because separation of form from content is right
    But for a reason they barely thought about
      – Data must travel the web
  Tree structured documents are a useable transfer syntax for just about anything
  So data-oriented web users think of XML as a transfer mechanism for their data
The Cambridge Communiqué
  A W3C Note resulting from a meeting in August 1999 (http://www.w3.org/TR/schema-arch)
  Signalled a widespread acceptance of layering:
    "XML has defined a transfer syntax for tree-structured documents;
    "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data;"
The Communiqué, cont'd
  Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models
  XML Schema is a W3C recommendation-in-progress for defining the structure of document families
  A grammar for markup structure
    E.g.
      article -> title, subtitle?, section+
    or
      POORDERHDR -> DATETIME, OPERAMT
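One way to see such a rule as a grammar is to treat an element's child names as a string and the content model as a regular expression over it. The sketch below does this for the article rule only; it is an illustration of the idea, not how a schema processor is implemented, and the names CONTENT_MODELS and children_match are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# The article rule as a regular expression over space-terminated child names:
# ',' is sequence, '?' is optional, '+' is one or more.
CONTENT_MODELS = {
    "article": re.compile(r"title (subtitle )?(section )+"),
}


def children_match(element):
    """True if the element's children, in order, satisfy its content model."""
    child_names = "".join(child.tag + " " for child in element)
    return CONTENT_MODELS[element.tag].fullmatch(child_names) is not None


full = ET.fromstring("<article><title/><subtitle/><section/><section/></article>")
no_subtitle = ET.fromstring("<article><title/><section/></article>")
no_section = ET.fromstring("<article><title/><subtitle/></article>")

print(children_match(full), children_match(no_subtitle), children_match(no_section))
```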
XML Schema: some details
  Fortunately, XML Schema is actually notated in XML itself
  So there are elements defined for use in schemas to define . . .
    Elements :-)
    Attributes
    Types
  A type is a collection of constraints on element content and attribute values
  A type may be either
    simple, for constraining string values
    complex, for constraining elements which contain other elements
A simple example
  <!ELEMENT text (#PCDATA|emph|name)*>
  <!ATTLIST text timestamp NMTOKEN #REQUIRED>

  <xs:element name="text">
    <xs:complexType content="mixed">
      <xs:element ref="emph"/>
      <xs:element ref="name"/>
      <xs:attribute name="timestamp" type="date" minOccurs="1"/>
    </xs:complexType>
  </xs:element>
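Both declarations state the same two constraints: a text element may mix character data with emph and name children, and it must carry a timestamp attribute. A hand-written check of just those two constraints might look like the sketch below. It is not a schema validator: it looks only at direct children, does not test that the timestamp value is a date, and the name check_text is hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOWED_CHILDREN = {"emph", "name"}   # from (#PCDATA|emph|name)*
REQUIRED_ATTRIBUTES = {"timestamp"}   # from timestamp ... #REQUIRED


def check_text(element):
    """Return a list of problems; an empty list means the element conforms."""
    problems = []
    if element.tag != "text":
        problems.append(f"expected <text>, found <{element.tag}>")
    for child in element:
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"unexpected child <{child.tag}>")
    for attr in sorted(REQUIRED_ATTRIBUTES - set(element.attrib)):
        problems.append(f"missing attribute '{attr}'")
    return problems


valid = ET.fromstring(
    '<text timestamp="2000-03-20">Talk by <name>Henry</name>, <emph>today</emph></text>'
)
invalid = ET.fromstring("<text>See <tip>this</tip></text>")

print(check_text(valid))
print(check_text(invalid))
```

Writing such checks by hand for every document family is exactly the idiosyncratic work a declarative schema is meant to replace.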
Richer type definition example
  <xs:complexType name='personName'>
    <xs:element name='title' minOccurs='0'/>
    <xs:element name='forename' minOccurs='0' maxOccurs='unbounded'/>
    <xs:element name='surname'/>
    <xs:attribute name='id' type='integer'/>
  </xs:complexType>

  <xs:element name='owner' type='personName'/>
Mapping between layers
  We can think of this in two ways
    In terms of an abstract data modelling language
      – Entity-Relation
      – UML
      – RDF
    In concrete implementation terms
      – Tables and rows
      – Class instances and instance variables
  The first is more portable
  The second more immediately useful
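As an illustration of the concrete route, the sketch below maps an owner element of the personName type from the earlier example onto a Python class instance with instance variables. The class, the function and the sample document are assumptions made for this sketch; the mapping is written by hand here rather than derived from the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class PersonName:
    """Hand-written counterpart of the personName complex type."""
    surname: str
    title: Optional[str] = None                          # minOccurs='0'
    forenames: List[str] = field(default_factory=list)   # maxOccurs='unbounded'
    id: Optional[int] = None                             # attribute of type integer


def person_name_from_element(el):
    """Map child elements and the id attribute onto instance variables."""
    raw_id = el.get("id")
    return PersonName(
        surname=el.findtext("surname"),
        title=el.findtext("title"),
        forenames=[f.text for f in el.findall("forename")],
        id=int(raw_id) if raw_id is not None else None,
    )


owner = ET.fromstring(
    "<owner id='7'><title>Dr</title><forename>Henry</forename>"
    "<forename>S.</forename><surname>Thompson</surname></owner>"
)
person = person_name_from_element(owner)
print(person)
```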
Mapping between layers 2
  Regardless of what approach we take, we need
    A vocabulary of data model components
    An attachment of that vocabulary to schema components
  Sample vocabularies
    entity, relationship, collection
    table, row, column
    instance, variable, list, dictionary
  Where should attachment be specified?
    In the schema
      – convenient
    Outside it
      – modular
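A small sketch of the "outside it" option, using the table, row, column vocabulary: the attachment lives in a separate table that names which schema components play which role, and a generic function interprets documents through it. The element names (orders, order, id, currency, note) and the ATTACHMENT structure are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical attachment, kept outside the schema:
# schema component name -> data model component.
ATTACHMENT = {
    "orders": "table",
    "order": "row",
    "id": "column",
    "currency": "column",
}


def rows(table_element):
    """Turn each 'row' child into a dict built from its 'column' children."""
    if ATTACHMENT.get(table_element.tag) != "table":
        raise ValueError(f"<{table_element.tag}> is not attached as a table")
    result = []
    for child in table_element:
        if ATTACHMENT.get(child.tag) != "row":
            continue
        result.append({
            cell.tag: cell.text
            for cell in child
            if ATTACHMENT.get(cell.tag) == "column"
        })
    return result


doc = ET.fromstring(
    "<orders>"
    "<order><id>1</id><currency>USD</currency><note>rush</note></order>"
    "<order><id>2</id><currency>EUR</currency></order>"
    "</orders>"
)
table = rows(doc)
print(table)
```

The note element is skipped because nothing is attached to it, which shows the modularity: changing the mapping means editing the attachment, not the schema or the documents.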
Specifying mapping in the schema
  Probably reasonable if done in high-level (e.g. RDF, UML, ER) terms
  See example infoset-xmpl.xml, infoset-uml.xsd
Specifying mapping outside
  Requires some duplication of structural information
  Encourages cross-language working
  XSLT is the obvious candidate
  See example infoset-xmpl.xsl
Compile the Mapping
  Perhaps we can get the benefits of both approaches
    Annotate the schema
    Compile the annotations into XSLT
      – With bindings for separate implementations
  Semi-structured data has a role to play here
    Particularly when the data model antecedes the document model
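A very rough sketch of the compile step, assuming the annotations have already been read out of the schema into a dictionary: each annotated element becomes one XSLT template that wraps its content in the data model component's name. The annotation names and the output vocabulary are hypothetical, the stylesheet is only checked for well-formedness (Python's standard library cannot run it), and bindings for separate implementations are not shown.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical annotations: schema element name -> data model component.
ANNOTATIONS = {"orders": "table", "order": "row", "id": "column"}


def compile_to_xslt(annotations):
    """Emit one template per annotated element, wrapping its content in the component."""
    templates = "\n".join(
        f'  <xsl:template match="{name}">\n'
        f'    <{component} source="{name}"><xsl:apply-templates/></{component}>\n'
        f'  </xsl:template>'
        for name, component in sorted(annotations.items())
    )
    return (f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">\n'
            f'{templates}\n'
            f'</xsl:stylesheet>')


stylesheet = compile_to_xslt(ANNOTATIONS)
parsed = ET.fromstring(stylesheet)  # raises ParseError if the output is not well-formed
template_count = len(parsed.findall(f"{{{XSL_NS}}}template"))
print(stylesheet)
```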
Take-home message
  The point at which idiosyncratic scripting takes over can be moved one layer up
  Using public consensual declarative standards is a Good Thing
  Interoperability makes things better for everyone