
Internet-based Application Architectures for the 21st Century: The Role of XML

Henry S. Thompson
HCRC Language Technology Group
Division of Informatics
University of Edinburgh

Language Technology Group

Henry S. Thompson

Dagstuhl, 2000-03-20

Introduction

It's the Web, stupid!
  E-commerce, E-business, E-finance
  Web servers in your microwave
  ADSL by Easter
Seriously, in the Academy
  Virtual communities of effort multiply our effectiveness
    – Shared resources
    – Shared tools
Enfranchisement is the key
  Everybody can play
    – Whether Bill lets them or not

XML is ASCII for the 21st century

ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
  What bits encode what characters
    – (For a pretty parochial definition of 'character')
UNICODE/ISO 10646 extends that solution to the whole world
XML thought it was doing the same for simple tree-structured documents
  The emphasis in the XML design was on simplifying SGML to move it to the Web
  XML didn't touch SGML's architectural vision
    – flexible linearisation/transfer syntax
    – for tree-structured documents with internal links

Digression: Just what is XML?

It's a markup language used for annotating text
It is concerned with logical structure
  to identify sections, titles, section headers, chapters, paragraphs,…
It is not concerned with appearance
  you say 'this is a subtitle'
    not 'this is in bold, 14pt, centered'
  you say 'this is an example'
    not 'this is in verbatim, indented by 5pts, ragged right'

Why is XML a big deal?

It is an official W3C Recommendation
It is vendor-independent, platform-independent, application-independent,…
  unlike Word documents, RTF documents, PDF documents, PostScript documents,…
It is human readable
  ditto (for most values of 'human')

Unformatted text

Internet-based Application Architectures for the 21st Century:
The Role of XML
Let's skip straight to an example of XML syntax for a simple bit of structure:
<tip><emph>Never</emph> stand up in a canoe!</tip>

Formatted text

Internet-based Application Architectures for the 21st Century:
The Role of XML

Let's skip straight to an example of XML syntax for a simple bit of structure:
  <tip><emph>Never</emph> stand up in a canoe!</tip>

XML marked up text

<article>
  <title>Internet-based Application Architectures for the 21st Century:</title>
  <subtitle>The Role of XML</subtitle>
  <section>
    <para>Let's skip <emph>straight</emph> to an example of XML syntax for a simple bit of structure:</para>
    <example>&lt;tip>&lt;emph>Never&lt;/emph> stand up in a canoe!&lt;/tip></example>
  </section>
</article>

Connecting structure and form

There is a stylesheet language called XSLT which will allow us to write simple style rules which will produce the formatted presentation from the structured version
For example
  <template match='emph'>
    <I><apply-templates/></I>
  </template>
will do part of the Transformation job
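
As a rough analogue of what that one rule does (this is not XSLT itself), the following Python sketch uses the standard library's xml.etree.ElementTree to rewrite every emph element as I and leave everything else untouched. The function name apply_emph_rule is made up for illustration; a real stylesheet would need further rules for the remaining elements.

```python
import copy
import xml.etree.ElementTree as ET


def apply_emph_rule(root):
    """Return a copy of the tree with every <emph> renamed to <I>.

    This only mimics the effect of the single template rule on the slide;
    it is a hand-written stand-in, not an XSLT processor.
    """
    result = copy.deepcopy(root)
    for node in result.iter("emph"):
        node.tag = "I"
    return result


tip = ET.fromstring("<tip><emph>Never</emph> stand up in a canoe!</tip>")
print(ET.tostring(apply_emph_rule(tip), encoding="unicode"))
```

The input tree is copied first, so the structured version stays available for other presentations.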

XML vs HTML

Isn’t XML just like HTML?
As with XML, you can sometimes use HTML to mark up things for content rather than appearance,
  – e.g. <H2>This is a subtitle</H2>
    appearance of <H2> is defined elsewhere, usually by someone else
but
  – a lot of HTML markup is for appearance
    e.g. <I>this is italics</I>, <B>this is in bold</B>
  – you couldn’t mark up <RECIPIENT>Some names</RECIPIENT>
If you know HTML, it is easy to understand basic XML syntax

XML vs SGML

SGML is more complicated than it need be
  because it was designed in the old days (12 years ago!)
XML is a simplified subset of SGML
  much less minimisation makes processing easier
  qua complexity: sits somewhere between HTML and SGML

Who is in charge of XML?

XML is a W3C Recommendation
The W3C is the World Wide Web Consortium, a voluntary association of companies and non-profit organisations. Membership costs serious money and confers voting rights. Procedures are complex, with the Director (Tim Berners-Lee) having ultimate authority, guided by a committee of the whole called the Advisory Committee.
The XML Recommendation was written by the W3C’s XML Working Group.

Digression, v.2: Just what is XML?

It's a markup language used for transferring data
It is concerned with data models
  to convert between application-appropriate and transfer-appropriate forms
It is not concerned with human beings
  It's produced and consumed by programs

XML as UI

A slogan of Adam Bosworth
I interpret it in two ways:
  At the client end
    – Use XML plus XSL as the basis for what the user sees on his/her screen
    – Use XLinks from a master document to pull together disparate sources of information
  At the server end
    – Use XML as a uniform interface for any data source onto the web
    – Not just documents, but e.g. databases, process control information, stock quotes

Application data

Structured markup

<POORDERHDR>
  <DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR>
    <MONTH>06</MONTH>
    <DAY>30</DAY>
    <HOUR>23</HOUR>
    <MINUTE>59</MINUTE>
    <SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND>
    <TIMEZONE>+0100</TIMEZONE>
  </DATETIME>
  <OPERAMT qualifier="EXTENDED" type="T">
    <VALUE>670000</VALUE>
    <NUMOFDEC>2</NUMOFDEC>
    <SIGN>+</SIGN>
    <CURRENCY>USD</CURRENCY>
    . . .
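
To make the "data, not documents" reading concrete, here is a minimal Python sketch of a program consuming that fragment and turning it into native values. Several things are assumptions rather than part of the slide: the closing tags after CURRENCY are supplied so the fragment parses, NUMOFDEC is read as the number of decimal places in VALUE, SUBSECOND is ignored because its unit is not stated, and the helper names read_datetime and read_amount are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# The slide's fragment, with the elided closing tags added (an assumption)
# so that it forms a complete, parseable document.
FRAGMENT = """
<POORDERHDR>
  <DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR><MONTH>06</MONTH><DAY>30</DAY>
    <HOUR>23</HOUR><MINUTE>59</MINUTE><SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND><TIMEZONE>+0100</TIMEZONE>
  </DATETIME>
  <OPERAMT qualifier="EXTENDED" type="T">
    <VALUE>670000</VALUE><NUMOFDEC>2</NUMOFDEC>
    <SIGN>+</SIGN><CURRENCY>USD</CURRENCY>
  </OPERAMT>
</POORDERHDR>
"""


def read_datetime(elem):
    """Build a timezone-aware datetime from the child elements (SUBSECOND is ignored)."""
    fields = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND", "TIMEZONE"]
    text = " ".join(elem.findtext(name) for name in fields)
    return datetime.strptime(text, "%Y %m %d %H %M %S %z")


def read_amount(elem):
    """Return (amount, currency), shifting VALUE by NUMOFDEC decimal places."""
    value = Decimal(elem.findtext("SIGN") + elem.findtext("VALUE"))
    amount = value.scaleb(-int(elem.findtext("NUMOFDEC")))
    return amount, elem.findtext("CURRENCY")


header = ET.fromstring(FRAGMENT)
print(read_datetime(header.find("DATETIME")))
print(read_amount(header.find("OPERAMT")))
```

The tree-structured document is only the transfer form; the application works with the datetime and Decimal values it builds from it.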

What just happened!?

The whole transfer syntax story just went meta, that's what happened!
XML has been a runaway success, on a much greater scale than its designers anticipated
  Not for the reason they had hoped
    – Because separation of form from content is right
  But for a reason they barely thought about
    – Data must travel the web
Tree structured documents are a useable transfer syntax for just about anything
So data-oriented web users think of XML as a transfer mechanism for their data

The Cambridge Communiqué

A W3C Note resulting from a meeting in August 1999 (http://www.w3.org/TR/schema-arch)
Signalled a widespread acceptance of layering:
  "XML has defined a transfer syntax for tree-structured documents;"
  "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data;"

The Communiqué, cont'd

Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models
XML Schema is a W3C recommendation-in-progress for defining the structure of document families
  A grammar for markup structure
  E.g.
    article -> title, subtitle?, section+
  or
    POORDERHDR -> DATETIME, OPERAMT
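
One way to see what such a grammar rule buys us is to check an element's children against it. The Python sketch below encodes the article rule as a regular expression over child names; that encoding, the CONTENT_MODELS table and the function children_match are illustrative choices, not how an XML Schema processor is specified to work.

```python
import re
import xml.etree.ElementTree as ET

# Content model for "article -> title, subtitle?, section+", written as a
# regular expression over space-terminated child names (illustrative encoding).
CONTENT_MODELS = {
    "article": r"title (subtitle )?(section )+",
}


def children_match(elem):
    """Check an element's direct children against its content model."""
    names = "".join(child.tag + " " for child in elem)
    return re.fullmatch(CONTENT_MODELS[elem.tag], names) is not None


good = ET.fromstring("<article><title/><subtitle/><section/><section/></article>")
bad = ET.fromstring("<article><subtitle/><title/></article>")
print(children_match(good), children_match(bad))
```

The second document is rejected because its children appear in the wrong order and it has no section.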

XML Schema: some details

Fortunately, XML Schema is actually notated in XML itself
So there are elements defined for use in schemas to define. . .
  Elements :-)
  Attributes
  Types
A type is a collection of constraints on element content and attribute values
A type may be either
  simple, for constraining string values
  complex, for constraining elements which contain other elements

A simple example

<!ELEMENT text (#PCDATA|emph|name)*>
<!ATTLIST text timestamp NMTOKEN #REQUIRED>

<xs:element name="text">
  <xs:complexType content="mixed">
    <xs:element ref="emph"/>
    <xs:element ref="name"/>
    <xs:attribute name="timestamp" type="date" minOccurs="1"/>
  </xs:complexType>
</xs:element>
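
Read as constraints, both declarations say the same two things: a text element may mix character data with emph and name children, and it must carry a timestamp attribute. A minimal Python sketch of checking just those two constraints follows; the function check_text is hypothetical and it does not check that the timestamp value is a well-formed date.

```python
import xml.etree.ElementTree as ET

ALLOWED_CHILDREN = {"emph", "name"}   # from (#PCDATA|emph|name)*
REQUIRED_ATTRIBUTES = {"timestamp"}   # from #REQUIRED


def check_text(elem):
    """Return a list of constraint violations for a <text> element."""
    problems = []
    for attr in sorted(REQUIRED_ATTRIBUTES - set(elem.attrib)):
        problems.append(f"missing attribute: {attr}")
    for child in elem:
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"unexpected child: {child.tag}")
    return problems


ok = ET.fromstring(
    '<text timestamp="2000-03-20">Talk by <name>Henry</name>, <emph>today</emph></text>'
)
broken = ET.fromstring("<text><para/></text>")
print(check_text(ok))
print(check_text(broken))
```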

Richer type definition example

<xs:complexType name='personName'>
  <xs:element name='title' minOccurs='0'/>
  <xs:element name='forename' minOccurs='0' maxOccurs='unbounded'/>
  <xs:element name='surname'/>
  <xs:attribute name='id' type='integer'/>
</xs:complexType>

<xs:element name='owner' type='personName'/>
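
A type like personName lines up naturally with a class definition. The sketch below is one possible hand-written Python counterpart, under the assumptions that the id attribute is optional and that the occurrence constraints map to Optional and list fields; the class PersonName and the function person_name_from_xml are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    surname: str                                         # exactly one
    title: Optional[str] = None                          # minOccurs='0'
    forenames: List[str] = field(default_factory=list)   # zero or more
    id: Optional[int] = None                             # attribute of type integer


def person_name_from_xml(elem):
    """Build a PersonName from an element whose type is personName."""
    raw_id = elem.get("id")
    return PersonName(
        surname=elem.findtext("surname"),
        title=elem.findtext("title"),
        forenames=[f.text for f in elem.findall("forename")],
        id=int(raw_id) if raw_id is not None else None,
    )


owner = ET.fromstring(
    "<owner id='42'><forename>Henry</forename><forename>S.</forename>"
    "<surname>Thompson</surname></owner>"
)
print(person_name_from_xml(owner))
```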

Mapping between layers

We can think of this in two ways
  In terms of an abstract data modelling language
    – Entity-Relation
    – UML
    – RDF
  In concrete implementation terms
    – Tables and rows
    – Class instances and instance variables
The first is more portable
The second more immediately useful

Mapping between layers 2

Regardless of what approach we take, we need
  A vocabulary of data model components
  An attachment of that vocabulary to schema components
Sample vocabularies
  entity, relationship, collection
  table, row, column
  instance, variable, list, dictionary
Where should attachment be specified?
  In the schema
    – convenient
  Outside it
    – modular
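
As a small illustration of an attachment specified outside the schema, the Python sketch below keeps a separate table that ties element names to terms from the "table, row, column" vocabulary and uses it to read a document into rows. The element names people and person, the ATTACHMENT table and the function to_rows are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical attachment kept outside the schema: each element name is
# tied to a term from the "table, row, column" vocabulary.
ATTACHMENT = {
    "people": "table",
    "person": "row",
    "forename": "column",
    "surname": "column",
}


def to_rows(table_elem):
    """Turn an element attached as a 'table' into a list of row dictionaries."""
    if ATTACHMENT.get(table_elem.tag) != "table":
        raise ValueError(f"{table_elem.tag} is not attached as a table")
    rows = []
    for row_elem in table_elem:
        if ATTACHMENT.get(row_elem.tag) != "row":
            continue  # children without a row attachment are skipped
        rows.append({
            cell.tag: cell.text
            for cell in row_elem
            if ATTACHMENT.get(cell.tag) == "column"
        })
    return rows


doc = ET.fromstring(
    "<people>"
    "<person><forename>Henry</forename><surname>Thompson</surname></person>"
    "<person><forename>Adam</forename><surname>Bosworth</surname></person>"
    "</people>"
)
print(to_rows(doc))
```

Swapping in a different attachment table changes the mapping without touching the schema, which is the modularity the slide points to; the cost is that element names are repeated in both places.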

Specifying mapping in the schema

Probably reasonable if done in high-level (e.g. RDF, UML, ER) terms
See example infoset-xmpl.xml, infoset-uml.xsd

Specifying mapping outside

Requires some duplication of structural information
Encourages cross-language working
XSLT is the obvious candidate
See example infoset-xmpl.xsl

Compile the Mapping

Perhaps we can get the benefits of both approaches
  Annotate the schema
  Compile the annotations into XSLT
    – With bindings for separate implementations
Semi-structured data has a role to play here
  Particularly when the data model antecedes the document model

Take-home message

The point at which idiosyncratic scripting takes over can be moved one layer up
Using public consensual declarative standards is a Good Thing
Interoperability makes things better for everyone