
Are we There Yet? The State of the XML Revolution

Henry S. Thompson
Language Technology Group

HCRC, University of Edinburgh


XML: Are we There Yet? Henry S. Thompson

ECN, Curtin, 1999.1.25


Introduction
XML is going to solve all our problems:
– The information glut
– Web/print publication coordination
– Site management
– Electronic commerce :-)
– . . .
– The common cold


Overview
Although XML the language is stable, everything else is still in flux:
– CSS/XSL for style
– RDF/XLink for metadata
– DTD/Schema for document structure definition
I'll give a selective introduction to the latter two, then summarise business strategy as I see it.


An aside: what is the W3C?
The World Wide Web Consortium: a voluntary association of companies and non-profit organisations.
Membership costs serious money and confers voting rights.
Procedures are complex, with the Chairman (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power.
How do standards get drafted and approved?
W3C Draft Recommendations come from Working Groups, with little input from W3C staff (XML) or a lot (CSS1, CSS2). They are approved by the Chairman.


Part 1: The Information Glut
We're all drowning in information: our desktop machines are bursting, our intranets are vast, and the World Wide Web is effectively infinite and significantly different from one week to the next.

Traditional approaches to storage management (hierarchical file systems) and information retrieval (indexing every word in every document) are already barely coping at best, and are unlikely to meet the challenge.


The problem
An exponential increase in the average amount of local disk space per individual.
Hyper-exponential growth in connectivity:
– Intranets
– The Internet
Hierarchical file systems, with perhaps once-meaningful names, are just not adequate to the resulting organisational task.
The idea, if not yet the reality, of metadata has been widely touted as the solution to these problems.


Where did I put that?
How can I find the information I need?
– In a document I wrote once;
– In a message I received once;
– In a message I sent once;
– In a document a colleague wrote;
– In a document somewhere on the Internet?
Traditional information-retrieval techniques are not working:
– Too many of the repositories (e.g. compressed mail archives) are resistant to indexing;
– Plain-text indexing is too blunt an instrument.


The solution(s)?
There's been a lot of talk about metadata. What is metadata?
It's just data, but it's data about other data.
What could metadata do for us?
– Give search engines something to work with that is designed for their needs.
– Give us all a place to record what a document is for or about.


Requirements for metadata
What would we need to make this work?
– A standard syntax, so metadata can be recognised as such;
– One or more standard vocabularies, so search engines, authors and users all speak the same language;
– Lots of documents with metadata attached.


What is RDF?
RDF is actually two standardisation efforts, under the aegis of the W3C. It stands for Resource Description Framework (in other words, data about data).
The two efforts are:
– Standardising a syntax and abstract semantics for metadata;
– Providing a standard way of defining standard metadata vocabularies (but not actually defining any).


RDF Model and Syntax
The model is a labelled directed graph, with nodes being loci in hyperspace and labels being atoms.
There is a notion of reification, which allows property types (= edge labels) and individual edges themselves to be described (i.e. effectively to be the sources of edges).
The syntax is XML, almost but not quite expressible in a DTD.
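The graph model can be made concrete with a small Python sketch (not the API of any RDF toolkit): each edge is a (source, label, target) triple, and reifying an edge turns it into a node that can itself be the source of further edges. The example.org URIs, the `#stmt1` identifier and the `ex:assertedBy` property are hypothetical; the `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` names follow the reification vocabulary of the RDF Model and Syntax specification.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One labelled, directed edge: source --label--> target."""
    source: str
    label: str
    target: str


def reify(edge: Edge, statement_id: str) -> set[Edge]:
    """Describe an edge by making it a node in its own right.

    The new node (statement_id) becomes the source of four edges that
    record its type and the three parts of the original edge.
    """
    return {
        Edge(statement_id, "rdf:type", "rdf:Statement"),
        Edge(statement_id, "rdf:subject", edge.source),
        Edge(statement_id, "rdf:predicate", edge.label),
        Edge(statement_id, "rdf:object", edge.target),
    }


# Hypothetical nodes, for illustration only.
page = "http://example.org/~ora/"
person = "http://example.org/staff/ora"
creator = Edge(page, "s:Creator", person)

graph = {creator}
graph |= reify(creator, "#stmt1")
# Once reified, the edge itself can be the source of further edges,
# e.g. recording who asserted it (hypothetical property).
graph.add(Edge("#stmt1", "ex:assertedBy", "http://example.org/staff/henry"))

for e in sorted(graph):
    print(e.source, e.label, e.target)
```

Note that reification adds edges rather than changing the original one: the plain `s:Creator` edge is still in the graph alongside its description.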


An RDF example
This is expressed using one of several shorthands:
– The about attribute of a Description identifies the thing described
– Each sub-element name is a property
– Each sub-element either has
  – content (the atomic value of its property), or
  – a resource attribute which points to its value
[Figure: graph in which "ora's home page" has an s:Creator edge to "ora himself", who in turn has a v:Name edge to "Ora Lassila" and a v:Email edge to an email address]
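The shorthand rules above are enough to read such a description mechanically. The following Python sketch, using only the standard library's ElementTree, applies them to a reconstruction of the example in which the example.org URIs and the namespace URIs for the `s:` and `v:` prefixes are placeholders, not those on the original slide; only the RDF namespace URI is the real one. It illustrates the three rules and is not a conformant RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: the example.org URIs and the s:/v: namespace
# URIs are placeholders, not the ones used on the original slide.
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/schema/s#"
         xmlns:v="http://example.org/schema/v#">
  <rdf:Description about="http://example.org/~ora/">
    <s:Creator rdf:resource="http://example.org/staff/ora"/>
  </rdf:Description>
  <rdf:Description about="http://example.org/staff/ora">
    <v:Name>Ora Lassila</v:Name>
  </rdf:Description>
</rdf:RDF>
"""


def attr(elem, name):
    """Read an attribute written either bare or with the rdf: prefix."""
    return elem.get(name) or elem.get(f"{{{RDF_NS}}}{name}")


def triples(root):
    """Yield (subject, property, value) for each property sub-element."""
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Rule 1: the about attribute identifies the thing described.
        subject = attr(desc, "about")
        for prop in desc:
            # Rule 2: the sub-element's (namespace-expanded) name is the property.
            resource = attr(prop, "resource")
            # Rule 3: either a pointer to another resource, or atomic content.
            value = resource if resource is not None else (prop.text or "").strip()
            yield subject, prop.tag, value


for t in triples(ET.fromstring(DOC)):
    print(t)
```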


What RDF is not
RDF obviously cannot provide the third requirement: lots of meta-described documents.
More subtly, RDF is not a knowledge representation system. There is no notion of what a conformant RDF application might do other than identify and normalise metadata.
We can easily read much more into RDF annotations than is computationally there.


An alternative answer
If RDF is just about connections between points in hyperspace, why isn't XML Link all that's needed?
Before showing how this can be done, a brief diversion into XML Link.


What is XML-Link?
Just as XML itself simplified SGML while extending HTML, XML-Link simplifies HyTime while extending HTML.
XML-Link provides mechanisms for:
– Describing links with link elements
– Identifying links and link ends by type and role
– Locating link ends with a powerful locator syntax
– Incorporating link elements in-line or out-of-line
– Specifying default behaviours


Simple XML-link example
This is a simple reconstruction of HTML's A element, specifying a two-ended link in-line, with one implicit and one explicit locator:
<refr xml:link="simple"
      href="http://www.w3.org/">The W3C</refr>
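To show how little machinery a simple link needs, here is a Python sketch that walks a document and reports every element marked `xml:link="simple"`. The surrounding `para` element is a hypothetical wrapper; the `refr` element and its attributes follow the slide. Because the `xml` prefix is bound without any declaration, the parser reports `xml:link` under its expanded name.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is pre-bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LINK = f"{{{XML_NS}}}link"

# Hypothetical wrapper around the slide's refr element.
DOC = """\
<para>See <refr xml:link="simple"
      href="http://www.w3.org/">The W3C</refr> for details.</para>
"""


def simple_links(root):
    """Yield (element name, target, link text) for every simple link.

    The element itself is the implicit (local) end of the link;
    href is the explicit locator for the remote end.
    """
    for elem in root.iter():
        if elem.get(LINK) == "simple":
            yield elem.tag, elem.get("href"), "".join(elem.itertext())


print(list(simple_links(ET.fromstring(DOC))))
```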

On the next slide is a richer example, specifying a two-ended link out-of-line with two explicit locators


More complex link example
<connect xml:link='extended'>
  <dutch xml:link='locator' href='http://www.klm.nl/About/Nederlands/default.htm'/>
  <english xml:link='locator' href='http://www.klm.nl/About/default.htm'/>
  This is a good example of hand-crafted home-page translation pairing.
</connect>
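An application can collect the ends of an out-of-line link in much the same way. In this Python sketch the child element names (`dutch`, `english`) stand in as labels for the link ends, since the example carries no role attributes; that labelling choice is an assumption of the sketch, not part of XML Link.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
LINK = f"{{{XML_NS}}}link"

DOC = """\
<connect xml:link='extended'>
  <dutch xml:link='locator' href='http://www.klm.nl/About/Nederlands/default.htm'/>
  <english xml:link='locator' href='http://www.klm.nl/About/default.htm'/>
  This is a good example of hand-crafted home-page translation pairing.
</connect>
"""


def extended_links(root):
    """Return (link element name, {end label: target}) for each extended link.

    Both ends are explicit locator children; neither end is the link
    element itself, which is what makes the link out-of-line.
    """
    links = []
    for link in root.iter():
        if link.get(LINK) == "extended":
            ends = {child.tag: child.get("href")
                    for child in link if child.get(LINK) == "locator"}
            links.append((link.tag, ends))
    return links


print(extended_links(ET.fromstring(DOC)))
```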


Using XML Link for metadata
XML Link is all about pointing from place to place in hyperspace, so we can use it to reconstruct the same kind of metadata that RDF is about.
The attached example is, of course, not complete.
I've restructured the syntax to make things simpler.
Note that declaring things in the DTD makes the instance much simpler.


New aspects of XML Link
– We use the role attribute to indicate what part is played by each endpoint of a link.
– We use the xml:attributes attribute to map attribute names from what the application wants to what XML Link expects.
– We cheat a bit to get inline string values.
So on this account, RDF is just an application of XML Link.
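Read this way, turning XML Link markup into RDF-style statements takes only a few lines. The Python sketch below is hypothetical throughout: the element names, the `subject`/`object` role values and the example.org URIs are invented for illustration, and it assumes the `xml:attributes` value is a list of pairs (XML Link's attribute name, then the name the application uses instead). It omits the inline-string "cheat" entirely. Declaring such attributes in the DTD would keep the instance simpler, but ElementTree does not apply DTD defaults, so the sketch writes them out.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
LINK = f"{{{XML_NS}}}link"
REMAP = f"{{{XML_NS}}}attributes"

# Hypothetical markup. The second link's locators use xml:attributes to
# say that this application calls "role" "part" and calls "href" "where".
DOC = """\
<metadata>
  <Creator xml:link="extended">
    <end xml:link="locator" role="subject" href="http://example.org/~ora/"/>
    <end xml:link="locator" role="object" href="http://example.org/staff/ora"/>
  </Creator>
  <Editor xml:link="extended">
    <end xml:link="locator" xml:attributes="role part href where"
         part="subject" where="http://example.org/~ora/"/>
    <end xml:link="locator" xml:attributes="role part href where"
         part="object" where="http://example.org/staff/henry"/>
  </Editor>
</metadata>
"""


def link_attr(elem, name):
    """Look up an XML Link attribute, honouring any xml:attributes remapping.

    Assumption: the remapping value is a flat list of pairs, XML Link's
    name first, then the name the application uses in its place.
    """
    words = elem.get(REMAP, "").split()
    mapping = dict(zip(words[0::2], words[1::2]))
    return elem.get(mapping.get(name, name))


def link_triples(root):
    """Turn each extended link into (subject, property, object).

    The link's element name plays the property; each locator's role
    says whether its end is the subject or the object.
    """
    for link in root.iter():
        if link.get(LINK) != "extended":
            continue
        ends = {link_attr(loc, "role"): link_attr(loc, "href")
                for loc in link if loc.get(LINK) == "locator"}
        yield ends.get("subject"), link.tag, ends.get("object")


for t in link_triples(ET.fromstring(DOC)):
    print(t)
```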


Namespaces for XML
Where did those colons come from? xsl:this, fo:that, xml:the_other
Two communities pushed for namespaces:
– Vendors, to manage the composition of document fragments, e.g. the inclusion of mathematical formulae in a document;
– Working groups, to reserve names without compromising users' freedom to name things, e.g. it wouldn't do for XML-Link to reserve LINK for simple links, or for XSL to reserve TEXT.


Namespaces, cont'd
A second, near-final working draft of the W3C Recommendation was published in August 1998.
There was a lot of vendor pressure to get something in place, which caused political tension and at least one resignation from the WG.
The example illustrates how namespaces are declared, scoped and used.


Namespaces defined
You can use qualified names, consisting of two simple names separated by a colon (:).
The namespace prefix is an abbreviation for a URI which uniquely identifies the owner/meaning/identity of the source of the name.
Using a namespace essentially cedes responsibility for the meaning of the qualified names to the owner of the URI.
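Expanding a qualified name is a purely mechanical step. This minimal Python sketch (the function name `expand_qname` is hypothetical) splits at the colon and looks the prefix up in the bindings currently in scope. It shows that two different prefixes bound to the same URI yield the same expanded name, which is why the URI, not the prefix, carries the identity.

```python
from typing import Optional


def expand_qname(qname: str, bindings: dict[str, str]) -> tuple[Optional[str], str]:
    """Expand a qualified name into (namespace URI, local part).

    `bindings` maps each in-scope prefix to its URI. A name with no
    colon is returned with no namespace (this sketch ignores defaults).
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        return None, qname
    if prefix not in bindings:
        raise ValueError(f"undeclared namespace prefix: {prefix!r}")
    return bindings[prefix], local


MATHML = "http://www.w3.org/TR/REC-MathML/"
# Two documents that happen to choose different prefixes for one URI.
print(expand_qname("mml:math", {"mml": MATHML}))
print(expand_qname("m:math", {"m": MATHML}))
```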


Declaring a namespace
The association between namespace prefixes and URIs is declared using reserved attributes:
<doc xmlns:mml='http://www.w3.org/TR/REC-MathML/'>...</doc>
Anywhere inside the above doc element, mml is a legal namespace prefix, standing for the URI given.
There is also a mechanism for defining the default (unprefixed) namespace.
Declarations are scoped.
Qualified names can be used for:
– Element type names
– Attribute names
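Python's standard ElementTree parser applies these scoping rules and reports each element under its expanded `{URI}local` name. In this sketch the MathML URI is the one from the slide, while the default-namespace URI and the `para` and `note` elements are hypothetical; `xmlns=""` on `note` shows a default declaration being undone for one subtree.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/TR/REC-MathML/"
# Hypothetical default-namespace URI, for illustration only.
DOCNS = "http://example.org/doc"

DOC = f"""\
<doc xmlns="{DOCNS}" xmlns:mml="{MATHML}">
  <para>The formula <mml:math><mml:mi>x</mml:mi></mml:math> is inline.</para>
  <note xmlns="">Here the default declaration is undone.</note>
</doc>
"""

root = ET.fromstring(DOC)
# Every element is reported under its expanded {URI}local name.
for elem in root.iter():
    print(elem.tag)

# Outside the doc element the mml prefix is not in scope at all.
try:
    ET.fromstring("<mml:math/>")
except ET.ParseError as err:
    print("outside the doc element, mml is unbound:", err)
```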


Namespace limitations
An add-on for, not a rewrite of, the XML spec:
– Validation is unchanged
– Declarations must match instances character by character
– Indeed, there's no place for associating prefixes with URIs in DTDs
– There is no provision for merging DTDs
XML Schema has responsibility for addressing these issues in the near future.
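The character-by-character problem is easy to demonstrate with the standard library's minidom, which is namespace-aware: two documents that bind different prefixes to the MathML URI contain the same element as far as namespaces are concerned, yet their literal names differ, and a DTD sees only the literal names. The prefixes here are arbitrary.

```python
from xml.dom import minidom

MATHML = "http://www.w3.org/TR/REC-MathML/"

# The same element, written with two different (arbitrary) prefixes.
a = minidom.parseString(f'<mml:math xmlns:mml="{MATHML}"/>').documentElement
b = minidom.parseString(f'<m:math xmlns:m="{MATHML}"/>').documentElement

# Namespace-aware view: identical URI and local name.
print(a.namespaceURI == b.namespaceURI and a.localName == b.localName)
# Literal view, which is all a DTD can see: different names.
print(a.tagName, b.tagName)
```

A DTD that declares `mml:math` would therefore accept the first document and reject the second.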


Conclusion
We don't need/aren't ready for high-level standards yet:
– RDF
– Database serialisation
The basic mechanisms we have are very powerful:
– XML Namespaces
– XML-Link
Let's focus on exploiting them creatively:
– Light-weight components
– Designed for composition and pipelining


Part 2: Document Structure
Moving from the old (DTD) to the new (Schemas)


Overview
– What are schemata, anyway?
– Where did this all start?
– The nature of document structure
– Taking control of structure definition
– The role of inheritance in structure definition
– Comparison with existing techniques
– Datatypes


Terminology
Documents have structure:
– Document types
– Document instances
Structure can be defined (document structure definition, D. S. D.):
– Informally
– SGML DTD
– XML DTD
– Schema using XML


Background
SGML DTDs for D. S. D.:
– Sperberg-McQueen
– Others
Considered for XML itself:
– MCF, then RDF, now DCD, by Bray et al.
– XML-Data, two versions, by Layman et al.


Document Structure
Two relations are constitutive:
– Part-of
– Kind-of
Existing D. S. D. mechanisms use content models to specify part-of relations.
But they only specify kind-of relations implicitly or informally.
Making kind-of relations explicit would make both understanding and maintenance easier.


Taking Control of D. S. D.
Erik Naggum used to talk about SGML allowing users to take control of their data.
XML allows the same move one level up, for developers:
– The starting point is much simpler
– The architecture is congenial
– The demand is there
We need to do this, to make the transition to validation easier.
We need to do it now, to stimulate experimentation.


Why validate?
A D. S. D. is a contract between producers and consumers:
– It provides a guaranteed interface
– Producers validate to ensure they are providing what they promised
– Consumers validate to check up on producers and to protect their applications
Application authors validate to simplify their task:
– Leave error detection and analysis to the validating parser


Reconstructing DTDs
The Schema DTD is expressed in vanilla XML.
Top-level element types for declaring:
– Element types :-)
– Entities
– Notations
– . . .
Subordinate element types for declaring:
– Attributes
– Content models
– . . .


Schema example
<!ELEMENT text (#PCDATA|emph|name)*>
<!ATTLIST text
          timestamp NMTOKEN #REQUIRED>

<ElementType name="text" content="mixed">
  <attribute type="timestamp"/>
  <element type="emph"/>
  <element type="name"/>
</ElementType>
<AttributeType name="timestamp"
               datatype="ISOTime" default="required"/>
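Because the schema form carries the same information as the DTD, one can be translated mechanically into the other. This Python sketch handles only the mixed-content case shown on the slide. The wrapping `schema` root element and the mapping from `ISOTime` to `NMTOKEN` (chosen to match the DTD above, since DTDs have no time datatype) are assumptions of the sketch, not part of any schema proposal.

```python
import xml.etree.ElementTree as ET

# The schema fragment above, wrapped in a hypothetical <schema> root so
# that it parses as a single well-formed document.
SCHEMA = """\
<schema>
  <ElementType name="text" content="mixed">
    <attribute type="timestamp"/>
    <element type="emph"/>
    <element type="name"/>
  </ElementType>
  <AttributeType name="timestamp" datatype="ISOTime" default="required"/>
</schema>
"""

# Assumed mappings: DTDs have no time datatype, so ISOTime falls back to
# NMTOKEN, as in the hand-written DTD above; unknown types become CDATA.
DTD_TYPE = {"ISOTime": "NMTOKEN"}
DTD_DEFAULT = {"required": "#REQUIRED", "implied": "#IMPLIED"}


def to_dtd(schema):
    """Translate mixed-content ElementType declarations into DTD syntax."""
    attr_decls = {a.get("name"): a for a in schema.iter("AttributeType")}
    lines = []
    for etype in schema.iter("ElementType"):
        name = etype.get("name")
        if etype.get("content") != "mixed":
            raise NotImplementedError("this sketch handles mixed content only")
        children = [e.get("type") for e in etype.findall("element")]
        model = "|".join(["#PCDATA"] + children)
        lines.append(f"<!ELEMENT {name} ({model})*>")
        for ref in etype.findall("attribute"):
            decl = attr_decls[ref.get("type")]
            dtype = DTD_TYPE.get(decl.get("datatype"), "CDATA")
            default = DTD_DEFAULT.get(decl.get("default"), "#IMPLIED")
            lines.append(f"<!ATTLIST {name} {decl.get('name')} {dtype} {default}>")
    return "\n".join(lines)


print(to_dtd(ET.fromstring(SCHEMA)))
```

The translation loses the `ISOTime` datatype, which is exactly the kind of information the XML form can carry and a DTD cannot.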


The Schema Architecture: Static
A document consists of:
– A schema
– A document proper
Each is well-formed XML.
The schema is valid w.r.t. the Schema DTD.
The document proper is meta-valid w.r.t. the schema.
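Meta-validation means checking the document proper against declarations that are themselves ordinary XML. The toy Python sketch below does this for mixed content and required attributes only; it is not a description of XSP or of any real schema processor. It uses the schema from the Schema example slide, with (hypothetical) declarations for `emph` and `name` added so that every element type is declared.

```python
import xml.etree.ElementTree as ET

# The Schema example's declarations plus hypothetical ones for emph/name.
SCHEMA = """<schema>
  <ElementType name="text" content="mixed">
    <attribute type="timestamp"/>
    <element type="emph"/>
    <element type="name"/>
  </ElementType>
  <ElementType name="emph" content="mixed"/>
  <ElementType name="name" content="mixed"/>
  <AttributeType name="timestamp" datatype="ISOTime" default="required"/>
</schema>"""


def meta_validate(schema, doc):
    """Check a document proper against a schema; return a list of errors.

    Toy checks only: child element types must be listed in the parent's
    ElementType, and required attributes must be present.
    """
    required = {a.get("name") for a in schema.iter("AttributeType")
                if a.get("default") == "required"}
    decls = {}
    for etype in schema.iter("ElementType"):
        decls[etype.get("name")] = (
            {e.get("type") for e in etype.findall("element")},
            {a.get("type") for a in etype.findall("attribute")} & required,
        )
    errors = []
    for elem in doc.iter():
        if elem.tag not in decls:
            errors.append(f"undeclared element type: {elem.tag}")
            continue
        allowed, must_have = decls[elem.tag]
        for child in elem:
            if child.tag not in allowed:
                errors.append(f"{child.tag} not allowed in {elem.tag}")
        for attr_name in sorted(must_have - set(elem.attrib)):
            errors.append(f"{elem.tag} lacks required attribute {attr_name}")
    return errors


schema = ET.fromstring(SCHEMA)
good = ET.fromstring('<text timestamp="12:00">A <emph>fine</emph> day</text>')
bad = ET.fromstring('<text>A <bold>bad</bold> day</text>')
print(meta_validate(schema, good))
print(meta_validate(schema, bad))
```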


The Schema Architecture: Dynamic
An XML application (XSP) which meta-validates.
In the first instance, semantics in terms of translation into vanilla XML.
‘Takes control’ because changing how schemata work means:
– changing the Schema DTD
– upgrading XSP accordingly
– not changing XML itself


What to Do with our New Power?
Get rid of parameter entities!
Two main uses of P. E.s:
– element type disjunctions for content models
– attribute declaration list fragments
Both of these are indicative of implicit kind-of hierarchies.
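As an illustration of what replacing parameter entities might look like, this Python sketch states a kind-of hierarchy once, explicitly, and derives the element-type disjunction that a parameter entity such as `%inline;` would otherwise spell out by hand. The hierarchy (`inline`, `phrase`, `cite`, `code`) is hypothetical, and the sketch treats any name with no declared sub-kinds as a concrete element type.

```python
# Hypothetical hierarchy. In a DTD the disjunction would typically be
#   <!ENTITY % inline "emph|name|cite">
# repeated by reference in every content model that allows inlines.
KIND_OF = {
    "emph": "inline",
    "name": "inline",
    "cite": "inline",
    "inline": "phrase",
    "code": "phrase",
}


def members(kind, kind_of=KIND_OF):
    """All concrete element types that are (transitively) kinds of `kind`."""
    direct = [sub for sub, sup in kind_of.items() if sup == kind]
    if not direct:
        return [kind]  # no sub-kinds: treat as a concrete element type
    result = []
    for sub in direct:
        result.extend(members(sub, kind_of))
    return result


def mixed_model(kind):
    """Derive the mixed content model a parameter entity used to spell out."""
    return "(#PCDATA|" + "|".join(members(kind)) + ")*"


print(mixed_model("inline"))
print(mixed_model("phrase"))
```

Adding a new inline element type then means adding one entry to the hierarchy, rather than editing every parameter entity that lists inline types.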


Conclusion: XML is moving fast
The language itself won't change much; things around it will change a lot:
– XML Link
– Namespaces
– XML Schema
– XSL


What do I use for a new project today?
XML with DTD, no namespaces, unless you really need SGML
Author with one of the free tools
Validate with SP (best error messages)
Render by:
– Using DSSSL and JADE (robust, powerful, flexible); upgrade path via full XSL in due course
– Using XSL transformation via XT to HTML+CSS; show with IE4 or Netscape4; upgrade via XML+CSS in due course
Build tools with SAX/DOM using Javascript, Java or Python
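Python, one of the languages named above, ships both interfaces in its standard library. This sketch runs the same small, hypothetical document through a SAX handler, which counts element types as the parser streams past them, and through minidom, which builds the whole DOM tree first and then lets the tool query it.

```python
import xml.sax
from collections import Counter
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<text timestamp='12:00'>A <emph>fine</emph> <emph>sunny</emph> day</text>"


class ElementCounter(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


handler = ElementCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.counts)

# DOM: build the whole tree in memory, then navigate it.
dom = minidom.parseString(DOC)
emphs = dom.getElementsByTagName("emph")
print([e.firstChild.data for e in emphs])
```

SAX suits large documents and one-pass tools, since nothing is kept in memory unless the handler keeps it; the DOM suits tools that need to move around the tree.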


What will I use in a year?
XML+Schema and Namespaces
Validate with schema-based parser
Render with XSL to screen or print
Build tools with the DOM