Introduction to XML

Valérie Bellynck

EFPG-INPG

France

mailto:Valerie.Bellynck@efpg.inpg.fr


What is XML ?

• means : eXtensible Markup Language (in French « langage à balises extensible », or « langage à balises extensibles » ; in Spanish ?)

• 1996 : development by the XML Working Group, under World Wide Web Consortium (W3C) supervision

• XML ~ generalisation of HTML where, instead of fixed predefined tags with fixed semantics, authors « invent » their own tags

• 1998 : becomes an official standard, the XML 1.0 specification (a W3C Recommendation)

From "XML in Micro-Application", e-Poche collection

http://www.w3c.org/XML/


HTML ? XML ? SGML

XML comes from SGML, not from HTML

From XML in Micro-Application e-Poche collection

GML (1969) Goldfarb (IBM), Mosher and Lorie

SGML (1980) ANSI

HTML (1990) CERN-W3C

XML (1998) W3C

XHTML (1998) W3C

XML standards


SGML

Standard Generalized Markup Language

• defined in 1986 by the ISO 8879 standard

• completely dissociates, in a document : content / presentation / structure description

• used in
- industry for technical documents
- electronic document management (GED)

• problems :
- not aimed at Internet use
- complex and heavy description to follow

http://www.sgmlsource.com/Goldfarb/history/index/htm


HTML

HyperText Markup Language

• is an application of SGML

• is a language of document description : section titles, bookmarks, anchors, language elements to format text, to describe tables...

• is interpreted by a browser (a client application for Internet requests)

• the display is browser-independent

• problems :
- content and presentation are mixed

http://www.w3c.org/HTML/


Goals for XML : it must be...

• usable without difficulty on the Internet
• defined quickly
• described in a formal and concise way
• self-describing
• able to extend itself
• able to deal with tree-structured (arborescent) data descriptions
• processable by any application equipped with a text parser
• able to support UNICODE and any other character encoding, for linguistic universality
• able to support a wide range of applications
• compatible with SGML
• designed to make it easier to write software for document processing
• a way of representing data as human-readable documents
• easy to use for creating documents


Markup Languages ?

• Markups are pairs of expressions (tags) which surround a block of text, to indicate some characteristics

ex : in HTML, the tag <B> starts bold display and </B> ends it

<B> Text in Bold </B>

Text in Bold

• Tags can be parametrised by attributes

ex : in HTML,
- the tag <a> defines a hypertext link
- the URL of the link is defined by the attribute href
- the clickable text is surrounded by the tags <a> and </a>

<a href="http://www.3ie.org/xml"> click here </a>

click here


HTML code

<HTML>
<HEAD>
  <TITLE>Lime Jello Marshmallow Cottage Cheese Surprise</TITLE>
</HEAD>
<BODY>
  <H3>Lime Jello Marshmallow Cottage Cheese Surprise</H3>
  My grandma's favorite (may she rest in peace).
  <H4>Ingredients</H4>
  <TABLE BORDER="1">
    <TR BGCOLOR="#308030"><TH>Qty</TH><TH>Units</TH><TH>Item</TH></TR>
    <TR><TD>1</TD><TD>box</TD><TD>lime gelatin</TD></TR>
    <TR><TD>500</TD><TD>g</TD><TD>multicolored tiny marshmallows</TD></TR>
    <TR><TD>500</TD><TD>ml</TD><TD>cottage cheese</TD></TR>
    <TR><TD></TD><TD>dash</TD><TD>Tabasco sauce (optional)</TD></TR>
  </TABLE>
  <P>
  <H4>Instructions</H4>
  <OL>
    <LI>Prepare lime gelatin according to package instructions...</LI>
    <!-- and so on -->
  </OL>
</BODY>
</HTML>


HTML example code in browser


XML example code

<?xml version="1.0"?>
<Recipe>
  <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
  <Description>My grandma's favorite (may she rest in peace).</Description>
  <Ingredients>
    <Ingredient>
      <Qty unit="box">1</Qty>
      <Item>lime gelatin</Item>
    </Ingredient>
    <Ingredient>
      <Qty unit="g">500</Qty>
      <Item>multicolored tiny marshmallows</Item>
    </Ingredient>
    <Ingredient>
      <Qty unit="ml">500</Qty>
      <Item>Cottage cheese</Item>
    </Ingredient>
    <Ingredient>
      <Qty unit="dash"/>
      <Item optional="1">Tabasco sauce</Item>
    </Ingredient>
  </Ingredients>
  <Instructions>
    <Step>Prepare lime gelatin according to package instructions</Step>
    <!-- And so on... -->
  </Instructions>
</Recipe>
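Because the tags of the recipe describe the data rather than its display, a program can read the ingredients back directly. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the shortened recipe text and the helper name list_ingredients are assumptions made for this illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe document (illustration only).
RECIPE_XML = """<?xml version="1.0"?>
<Recipe>
  <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
  <Ingredients>
    <Ingredient><Qty unit="box">1</Qty><Item>lime gelatin</Item></Ingredient>
    <Ingredient><Qty unit="g">500</Qty><Item>multicolored tiny marshmallows</Item></Ingredient>
    <Ingredient><Qty unit="dash"/><Item optional="1">Tabasco sauce</Item></Ingredient>
  </Ingredients>
</Recipe>"""


def list_ingredients(xml_text):
    """Return (quantity, unit, item, is_optional) for every <Ingredient>."""
    root = ET.fromstring(xml_text)
    result = []
    for ingredient in root.iter("Ingredient"):
        qty = ingredient.find("Qty")
        item = ingredient.find("Item")
        result.append((
            qty.text or "",               # an empty <Qty/> has no text
            qty.get("unit"),              # attribute value
            item.text,
            item.get("optional") == "1",  # attribute absent -> not optional
        ))
    return result


ingredients = list_ingredients(RECIPE_XML)
```

Nothing in this code depends on how the recipe is displayed; the same document could be styled separately.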


XML heading information

Every XML file should begin with a header (the XML declaration) stating which version of XML is used in the document

<?xml version="1.0"?>

This is done through the version attribute.
Other attributes can define global properties, such as :
- the encoding attribute, which defines the character encoding

<?xml version="1.0" encoding="ISO-8859-1"?>

An encoding that covers French (Western European) characters is ISO-8859-1
The universal encoding for all Unicode characters is UTF-8
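The effect of the encoding attribute can be sketched with Python's standard xml.etree.ElementTree module: the same element is stored as different bytes in ISO-8859-1 and in UTF-8, and the declaration tells the parser how to decode them. The element name auteur and the variable names are assumptions chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# The same content, with an accented character (é is written \u00e9 here).
body = "<auteur>Val\u00e9rie</auteur>"

# Each document announces the encoding actually used for its bytes.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body.encode("iso-8859-1")
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
name_from_latin1 = ET.fromstring(latin1_doc).text
name_from_utf8 = ET.fromstring(utf8_doc).text
```

Both parses yield the same text even though the underlying bytes differ, which is why a declaration that does not match the real encoding of the file leads to garbled characters or parse errors.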


Well-formed XML means « parsable »

• A well-formed XML document is a document that follows all the notational and structural rules for XML, otherwise it is meaningless

By analogy, the expression 2 ( + + 5 (=) 9 > 7 is meaningless even if it looks (sort of) like math

• The most important rules are :

– No unclosed tags : a block can’t be "opened" with a tag <TAG> without being "closed" afterwards with </TAG>

– Closed empty elements : they must have either a closing tag <EMPTY type="example"></EMPTY> or a single tag with a slash "/" before the closing ">" : <EMPTY type="example" />

– No overlapping tags : a tag that opens inside another tag must close before the containing tag closes : <CONTAINING-TAG> <INCLUDED-TAG> </INCLUDED-TAG> </CONTAINING-TAG>

– Quotes around attribute values : <TAG type="example">
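Since « well-formed » means « parsable », one simple way to test these rules is to hand the text to a parser and see whether it accepts it. This is a small sketch with Python's standard xml.etree.ElementTree module; the function name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


checks = {
    "closed tags": is_well_formed("<TAG><INNER>text</INNER></TAG>"),
    "closed empty element": is_well_formed('<EMPTY type="example" />'),
    "unclosed tag": is_well_formed("<TAG><INNER>text</TAG>"),
    "overlapping tags": is_well_formed("<A><B></A></B>"),
    "unquoted attribute": is_well_formed("<TAG type=example></TAG>"),
}
```

The first two entries are accepted and the last three are rejected, one for each rule listed above.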


Valid XML

A document is valid when it matches its Document Type Definition (DTD)

• A DTD is a grammar for some class of documents using a markup language, that is, a set of rules describing the authorized sequences and nestings of tags

• The language used to write DTDs is a special language, not XML, but there is a richer syntax to define document types in XML itself (schemas)

• A DTD specifies
– what elements may exist,
– which attributes the elements may have,
– what structural organisation of elements is expected : what element may or must be found inside other elements, and in what order.

Thanks to DTDs, XML is eXtensible


Power of DTD

• Writing a DTD is how you actually define a new markup language -- often called a dialect of XML.

• At present, DTDs are being written for an enormous number of different problem domains, and each DTD defines a new markup language.

• New markup languages now exist, or are being designed,
– to mark up specific domains such as the plays of Shakespeare or business data in the footwear industry (FDX) ...
– to define general data resources (RDF);
– to model information in the health care industry (HL7 SGML/XML);
– to typeset, display, and actively use mathematical equations (MathML);
– and to perform electronic data interchange (XML/EDI).


Modelling information structure in XML

Recipe document structure

Recipe <document>
  Description <comment>
  Ingredients <all ingredients>
    Ingredient <one ingredient>
      Quantity <how many/much in which unit>
      Item <name>
  Instructions <what to do>
    Step <one thing>


DTD for the example

<!-- This is the example DTD for the example XML -->
<!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Ingredients (Ingredient)*>
<!ELEMENT Ingredient (Qty, Item)>
<!ELEMENT Qty (#PCDATA)>
<!ATTLIST Qty unit CDATA #REQUIRED>
<!ELEMENT Item (#PCDATA)>
<!ATTLIST Item optional CDATA "0"
               isVegetarian CDATA "true">
<!ELEMENT Instructions (Step)+>
<!ELEMENT Step (#PCDATA)>


DTD : defining tags

<!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>

The <!ELEMENT...> statement defines a tag in the document. This statement defines a <Recipe> tag, stating that it contains, in this order,
- a <Name>,
- an optional <Description> (the question mark [?] denotes optionality),
- an optional <Ingredients> tag,
- and an optional <Instructions> tag.

<!ELEMENT Name (#PCDATA)>

This simply states that a <Name> tag can contain character data and nothing else.

<!ATTLIST Item optional CDATA "0" isVegetarian CDATA "true">

This section states that the <Item> tag has two possible attributes:
- optional, whose default value is 0; and
- isVegetarian, whose default value is true.

<!-- This is a comment -->

the text « This is a comment » won’t be interpreted.
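To make the meaning of a content model such as (Name, Description?, Ingredients?, Instructions?) concrete, the following hand-written sketch checks the sequence of child tags of an element against it. This is not a DTD validator (the XML parsers in the Python standard library do not validate against a DTD); it only handles a flat, comma-separated sequence with the ?, * and + markers, and the function names are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

RECIPE_MODEL = "(Name, Description?, Ingredients?, Instructions?)"


def content_model_to_regex(model):
    """Turn a flat DTD sequence like '(Name, Description?)' into a regex.

    Each child tag is represented as 'TagName ' (name plus one space).
    """
    names = [part.strip() for part in model.strip("() ").split(",")]
    pieces = []
    for name in names:
        suffix = name[-1] if name[-1] in "?*+" else ""
        bare = name.rstrip("?*+")
        pieces.append("(?:%s )%s" % (re.escape(bare), suffix))
    return re.compile("".join(pieces))


def children_match(element, model):
    """Check the order and presence of the direct children of element."""
    sequence = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None


ok = children_match(
    ET.fromstring("<Recipe><Name/><Instructions/></Recipe>"), RECIPE_MODEL)
wrong_order = children_match(
    ET.fromstring("<Recipe><Description/><Name/></Recipe>"), RECIPE_MODEL)
missing_name = children_match(ET.fromstring("<Recipe/>"), RECIPE_MODEL)
```

The first document is accepted because the optional children may be left out; the other two are rejected because <Name> is required and must come first.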


DTD : other definitions

<!ENTITY Utterance "example of sentence or value">

This defines an internal entity. It associates a value with a name, which can then be used as a shorthand in the document. The parser will replace the entity reference &Utterance; by the text : example of sentence or value

There are external entities too, whose content may or may not be XML; all of them are declared in the DTD.

<!ENTITY TextPresentation SYSTEM "http://foo.com/presentation/text.xml">

It allows the document to reference the content of the file stored at that URL. The parser will replace the entity reference &TextPresentation; by the content of the file placed at http://foo.com/presentation/text.xml

<!NOTATION gif SYSTEM "usr/local/bin/display">
<!ENTITY ImagePresentation SYSTEM "http://foo.com/img/lion.gif" NDATA gif>

For non-XML content, such as gif files, the notation declaration specifies the application authorized to handle it. The entity is then referenced by name from an attribute :

<imagePres src="ImagePresentation"/>

which will include the image in the document through the browser
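The replacement of an internal entity can be observed with Python's standard xml.etree.ElementTree module, which expands entities declared inside the document itself. This is a small sketch; the root tag note and the surrounding sentence are assumptions made for this illustration, and external entities are deliberately not used here.

```python
import xml.etree.ElementTree as ET

# The entity is declared in a DTD placed directly inside the document.
DOC = """<!DOCTYPE note [
  <!ENTITY Utterance "example of sentence or value">
]>
<note>Here is an &Utterance;.</note>"""

# After parsing, the reference &Utterance; has been replaced by its value.
expanded_text = ET.fromstring(DOC).text
```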


DTD file call in XML file

in the XML file, a document type declaration tells the parser

- to start looking for a <Recipe> tag as the top-level tag (root) of the document.
- that the DTD is in the system file example.dtd

<!DOCTYPE Recipe SYSTEM "example.dtd">

Another example, where the root tag is <personne> and the DTD is in the system file personne.dtd :

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE personne SYSTEM "personne.dtd">
<personne>
  <prenom>Alain</prenom>
  <nom>Connu</nom>
</personne>


DTD directly included in file

<!DOCTYPE personne [directDTDcontent]>

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!--DTD declaration and definition -->
<!DOCTYPE personne [
  <!ELEMENT personne (prenom, nom)>
  <!ELEMENT prenom (#PCDATA)>
  <!ELEMENT nom (#PCDATA)>
]>
<!--end of DTD declaration and definition -->
<personne>
  <prenom>Alain</prenom>
  <nom>Connu</nom>
</personne>


What is a « NameSpace » ?

• It allows authors of XML documents to share tags

• It lets an author choose between own-defined tags and tags defined by someone else

• It concerns DTD : used for elements and for attributes

• Some namespaces can become a W3C standard :
- XMLSchema (eXtensible Markup Language Schema)
- XLink (XML Linking Language)
- XSL (eXtensible Stylesheet Language)
- XHTML
- versions of HTML (3.0, 4.0...)


Example of HTML Namespace

<?xml version="1.0"?>
<!--All elements are in the HTML namespace-->
<html:html xmlns:html="http://www.w3.org/TR/REC-html40">
  <html:head>
    <html:title>Namespace Example use</html:title>
  </html:head>
  <html:body>
    <html:p>Text and Links <html:a href="http://foo.com">here</html:a></html:p>
  </html:body>
</html:html>

This example uses the XML namespace of HTML defined in the W3C Recommendation REC-html40 for HTML version 4.0


Example of using 2 Namespaces

This example declares 2 namespaces, using respectively lv and isbn as prefixes

<?xml version="1.0"?>
<lv:livre xmlns:lv="urn:loc.gov:livres" xmlns:isbn="urn:ISBN:0-395-36341-6">
  <lv:titre>Harry Potter et la coupe de feu</lv:titre>
  <isbn:number>0747554420</isbn:number>
</lv:livre>
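What matters for a parser is the namespace URI, not the prefix written in the document. The sketch below, using Python's standard xml.etree.ElementTree module, parses a two-namespace document and looks elements up through a different set of prefixes; the urn: URIs are treated as opaque identifiers, and the prefixes book and id are assumptions chosen for this illustration.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<lv:livre xmlns:lv="urn:loc.gov:livres"
          xmlns:isbn="urn:ISBN:0-395-36341-6">
  <lv:titre>Harry Potter et la coupe de feu</lv:titre>
  <isbn:number>0747554420</isbn:number>
</lv:livre>"""

root = ET.fromstring(BOOK_XML)

# Prefixes used for searching are local to this program; they only need
# to map to the same URIs as the ones declared in the document.
namespaces = {"book": "urn:loc.gov:livres", "id": "urn:ISBN:0-395-36341-6"}

root_tag = root.tag  # the parser stores the tag as {URI}localname
title = root.find("book:titre", namespaces).text
number = root.find("id:number", namespaces).text
```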


Case of schema structure representation in XML

XML Schema

• is an XML-based alternative to DTD

• has support for data types (more than only PCDATA)

• uses XML syntax (=> editable with an XML editor, parseable by any XML parser, manipulable with the XML DOM, transformable with XSLT)

• is extensible just like XML (=> reusability, derivation of own data types from standard types, referencing of multiple schemas from the same document)

• secures data communication (sender and receiver can both have the same « expectation » about the content by sharing its structural representation : link to interoperability)

http://www.w3schools.com/default.asp/


Schema example

<?xml version="1.0" encoding="iso-8859-1" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xsd:element name="film" type="typeFilm" />
  <xsd:complexType name="typeFilm">
    <xsd:sequence>
      <xsd:element name="titre" type="xsd:string" />
      <xsd:element name="acteurs" type="typeActeur" />
      <xsd:element name="realisateur" type="xsd:string" />
      <xsd:element name="annee" type="xsd:decimal" />
      <xsd:element name="texte" type="xsd:string" />
      <xsd:element name="note" type="xsd:string" minOccurs="0" maxOccurs="1" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="typeActeur">
    <xsd:sequence>
      <xsd:element name="personne" type="xsd:string" minOccurs="0" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
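Because a schema is itself an XML document, any XML parser can read it. The sketch below uses Python's standard xml.etree.ElementTree module to list the fields declared for a complex type; it only reads the schema and does not validate a document against it (the standard library has no schema validator). The shortened schema text and the function name declared_fields are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened copy of a film schema (illustration only).
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="film" type="typeFilm"/>
  <xsd:complexType name="typeFilm">
    <xsd:sequence>
      <xsd:element name="titre" type="xsd:string"/>
      <xsd:element name="annee" type="xsd:decimal"/>
      <xsd:element name="note" type="xsd:string" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def declared_fields(schema_text, type_name):
    """Return (name, type, minOccurs) for each element of a complex type."""
    root = ET.fromstring(schema_text)
    for complex_type in root.iter(XSD_NS + "complexType"):
        if complex_type.get("name") == type_name:
            return [
                # minOccurs defaults to 1 when it is not written
                (e.get("name"), e.get("type"), e.get("minOccurs", "1"))
                for e in complex_type.iter(XSD_NS + "element")
            ]
    return []


film_fields = declared_fields(SCHEMA, "typeFilm")
```

Unlike a DTD, each field carries a data type (xsd:string, xsd:decimal), which is what the slide means by support for data types.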


Presentation : CSS and XSL

for general control over formatting, use

• Cascading Style Sheets (CSS)

• eXtensible Stylesheet Language (XSL)

Both are declarative languages

XSL is more recent than CSS

XSL is described in XML, using the power of namespaces


CSS for HTML and XML

• exists as a current recommendation from the W3C, usable with HTML or XML

• is simpler to use and less powerful than XSL

• is supported by most current-generation browsers (to varying degrees)

http://www.W3.org/TR/html401/present/styles


Cascading Style Sheets

In the small example below, <HTML> contains <BODY>, which contains <H1>, which contains text :

<HTML>
<HEAD></HEAD>
<BODY>
  <H1>A Theory About the Brontosaurus</H1>
  My theory about the brontosaurus is...
</BODY>
</HTML>

The whole idea of a style sheet is to use these structural relationships to indicate where changes in text style, spacing, and so on should occur.

<STYLE TYPE="text/css">
<!--
H1 { color: red; font-size: 16pt; text-decoration: underline }
-->
</STYLE>


Example of CSS file

html\:body { background-color: rgb(255, 230, 230) }
article { display: block; font-family: helvetica, sans-serif; background-color: rgb(230, 230, 255) }
titre { display: block; font-size: 200%; text-align: center; border-width: medium; border-style: groove }
auteur { display: block; font-size: 80%; font-weight: bold }
date { display: inline; font-size: 80%; font-style: italic }
lieu { display: inline; font-size: 80%; font-weight: bold }
texte { display: block }
grand { display: inline; font-variant: small-caps; font-size: 120%; font-weight: bold }
image { display: block; border-width: thin; text-align: center; border-style: solid; content: url(attr(site)); }
legende { display: block; text-align: center; padding-right: 2mm; padding-top: 2mm; padding-bottom: 2mm; padding-left: 2mm }


External CSS

The CSS to use can be defined

- using a <LINK> element (in the <HEAD> for default use)

<HTML>
<HEAD>
  <LINK href="special.css" rel="stylesheet" type="text/css">
</HEAD>
<BODY>
  <H1>A Theory About the Brontosaurus</H1>
  My theory about the brontosaurus is...
</BODY>
</HTML>

- in the <META> declaration (only to declare the default style sheet language)

... <HEAD>
  <META http-equiv="Content-Style-Type" content="text/css">
</HEAD> ...


How do browsers apply CSS ?

The browser will determine which style sheet language to use by default as follows

1. if <META> declarations specify Content-Style-Type, the last one is selected

2. otherwise, if HTTP headers specify Content-Style-Type, the last one is selected

3. otherwise, the default stylesheet language is "text/css"


Why is CSS named CSS ?

• These style sheets are called cascading style sheets, because styles (like fonts, colors, and so on) for one markup element "cascade" down, and apply to all of the element's contents.

• For example, if a paragraph tag (<P>) is set to show its text in red, all text and any other element inside that paragraph will be displayed in red, unless one sub-element of the paragraph specifies a color for its contents.
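The red-paragraph example can be sketched as a lookup that walks up the element tree until it finds an element that specifies a colour. This is a deliberately simplified model written for this illustration (the Node class and effective_color function are hypothetical); real CSS also takes into account selector specificity, the order of rules, and multiple style sheets.

```python
class Node:
    """A markup element with an optional explicitly specified colour."""

    def __init__(self, tag, color=None, parent=None):
        self.tag = tag
        self.color = color    # None means "no colour specified here"
        self.parent = parent


def effective_color(node, default="black"):
    """Use the nearest colour specified on the element or its ancestors."""
    while node is not None:
        if node.color is not None:
            return node.color
        node = node.parent
    return default


body = Node("BODY")
paragraph = Node("P", color="red", parent=body)
emphasis = Node("EM", parent=paragraph)          # specifies nothing
link = Node("A", color="blue", parent=paragraph)  # overrides the paragraph

colors = {n.tag: effective_color(n) for n in (body, paragraph, emphasis, link)}
```

The <EM> element ends up red because it is inside the red paragraph, while the <A> element keeps the colour it specifies for itself.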


XSL for XML and SGML

• used exclusively to format XML or SGML

• more complex and powerful than CSS

http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html


XSL : Why Stylesheets for XML ?

because :

• XML is not a fixed tag set (like HTML) and has no (application) semantics

• XML markup does not (usually) include formatting information

• Reuse: the same content can look different in different contexts

• Multiple output formats: different media (paper, online), different sizes (manuals, reports), different classes of output devices (workstations, hand-held devices)

• Styles tailored to the reader's preference (e.g., accessibility): print size, color, simplified layout for audio readers

From Norman Walsh http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html


Options for displaying XML


What does a StyleSheet do ?

It specifies the presentation of XML information using two basic categories of techniques:

• An optional transformation of the input document into another structure
– generation of constant text
– suppression of content
– moving text (e.g., exchanging the order of the first and last name)
– duplicating text (e.g., copying titles to make a table of contents)
– executing more complex transformations that "compute" new information in terms of the existing information

• A description of how to present the transformed information
– i.e., a specification of what properties to associate to each of the various parts of the transformed information


Needs to present information

Description of how to present the (possibly transformed) data includes three levels of formatting information:

1. Specification of the general screen or page (or even audio) layout

2. Assignment of the transformed content into basic "content container types" (e.g., lists, paragraphs, inline text)

3. Specification of formatting properties (spacing, margins, alignment, fonts, etc.) for each resulting "container"


Components of XSL

The full XSL language logically consists of three component languages which are described in three W3C (World Wide Web Consortium) recommendations:

• XPath: XML Path Language a language for referencing specific parts of an XML document

• XSLT: XSL Transformations a language for describing how to transform one XML document (represented as a tree) into another

• XSL: Extensible Stylesheet Language XSLT plus a description of a set of Formatting Objects and Formatting Properties


XML to Result Tree

An XSLT "stylesheet" transforms the input (source) document tree into a structure called a result tree consisting of result objects

Transform to Another Vocabulary


What is an XSL Stylesheet ?

• XSLT Stylesheets are XML documents; namespaces are used to identify semantically significant elements.

• Most stylesheets are stand-alone documents rooted at <xsl:stylesheet> (or <xsl:transform>).

It is possible to have "single template" stylesheet/documents.

Note that it is the mapping from namespace abbreviation to URI that is important, not the literal namespace abbreviation "xsl: " that is used most commonly


Understanding a template

Most templates have the following form:

<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>

• The whole <xsl:template> element is a template

• The match pattern determines where this template applies

• Literal result elements come from non-XSL namespace(s)

• XSLT elements come from the XSL namespace


Style sheet example

A small, complete style sheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="doc">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>


Transformation is application of templates

Templates transform portions of the source tree into portions of the result tree.

The ordered accumulation of all the transformed portions forms the complete result tree.

Individual templates are free to process elements from anywhere in the source tree.
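The idea of building the result by accumulating the output of templates can be imitated in a few lines of Python. The sketch below is not an XSLT processor (the Python standard library does not include one); it is a hand-written analogy in which a dictionary maps tag names to template functions, and elements without a template simply pass their text through, much like the built-in XSLT rules. All names in it (TEMPLATES, transform, apply_templates) are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(node):
    """Process the text and child elements of node in document order."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(transform(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def transform(node):
    """Apply the template registered for the tag, or the default rule."""
    template = TEMPLATES.get(node.tag, apply_templates)
    return template(node)


# One "template" per source element, producing a piece of the result.
TEMPLATES = {
    "doc": lambda n: "<html><head><title>%s</title></head><body>%s</body></html>"
                     % (escape(n.findtext("title", "")), apply_templates(n)),
    "title": lambda n: "<h1>%s</h1>" % apply_templates(n),
    "para": lambda n: "<p>%s</p>" % apply_templates(n),
}

SOURCE = "<doc><title>Hello</title><para>First <b>bold</b> text</para></doc>"
result = transform(ET.fromstring(SOURCE))
```

Each template only knows about its own element; the complete result is the ordered concatenation of what the templates return.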


Match Patterns (locating elements)

critical capability of a stylesheet language : locate source elements to be styled

For example,

- CSS does this with "selectors".

- FOSIs do it with "e-i-c's", elements in context.

- XSLT does it with "match patterns" defined in XPath.


XPath

XPath has an extensible string-based syntax inspired, in part, by the common "path/file" file system syntax:

para
matches all <para> children in the current context

para/emphasis
matches all <emphasis> elements that have a parent of <para>

ancestor-or-self::*/@sepchar
matches the sepchar attribute on the current element or any ancestor of the current element

numberedlist/listitem[position() mod 2 = 0]
matches even-numbered list items in a numbered list.
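Simple path expressions of this kind can be tried with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. In this sketch the sample document is an assumption made for the illustration, and the position() mod 2 test, which is outside that subset, is emulated with a Python slice.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<chapter>"
    "<para>plain <emphasis>one</emphasis></para>"
    "<para><emphasis>two</emphasis><emphasis>three</emphasis></para>"
    "<numberedlist><listitem>a</listitem><listitem>b</listitem>"
    "<listitem>c</listitem><listitem>d</listitem></numberedlist>"
    "</chapter>"
)

# "para": all <para> children of the current context (here, <chapter>)
paras = DOC.findall("para")

# "para/emphasis": all <emphasis> elements whose parent is a <para>
emphases = [e.text for e in DOC.findall("para/emphasis")]

# "numberedlist/listitem[position() mod 2 = 0]" emulated in Python:
# position() is 1-based in XPath, so positions 2, 4, ... are indexes 1, 3, ...
items = DOC.findall("numberedlist/listitem")
even_items = [item.text for item in items[1::2]]
```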


Applying style recursively

The process is allowed to run recursively, driven primarily by the document. A series of templates is created, such that if there is a template to match each context, then these templates are recursively applied starting at the root of the document.

• <xsl:template match="...">

<xsl:template match="section/title">
  <h2><xsl:apply-templates/></h2>
</xsl:template>

• <xsl:apply-templates>

<xsl:apply-templates select="th|td"/>

2 obstacles appear when using the recursive model,
– how to arbitrate between multiple patterns that match and
– how to process the same nodes in different contexts.

These are solved by conflict resolution and modes, respectively.


Applying style procedurally

This process for applying style is to select each action procedurally. A series of templates is created, such that each template explicitly selects and processes the necessary elements.

<xsl:for-each>

<xsl:for-each select="row">
  <tr>
    <xsl:for-each select="entry">
      <td><xsl:value-of select="."/></td>
    </xsl:for-each>
  </tr>
</xsl:for-each>

<xsl:template name="...">

<xsl:template name="admonition">
  <xsl:param name="type">warning</xsl:param>
  ...
</xsl:template>

<xsl:call-template>

<xsl:call-template name="admonition">
  <xsl:with-param name="type">caution</xsl:with-param>
</xsl:call-template>
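The nested <xsl:for-each> loops correspond closely to ordinary nested loops in a programming language. The following Python sketch, using the standard xml.etree.ElementTree module, reproduces the row/entry example by hand; the sample table and the function name rows_to_html are assumptions made for this illustration, and the code is an analogy rather than an XSLT processor.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def rows_to_html(table):
    """Emit one <tr> per <row> and one <td> per <entry>."""
    lines = []
    for row in table.findall("row"):            # outer for-each select="row"
        cells = "".join(
            # value-of select=".": the text content of the entry
            "<td>%s</td>" % escape("".join(entry.itertext()))
            for entry in row.findall("entry")   # inner for-each select="entry"
        )
        lines.append("<tr>%s</tr>" % cells)
    return "".join(lines)


TABLE = ET.fromstring(
    "<table><row><entry>1</entry><entry>box</entry></row>"
    "<row><entry>500</entry><entry>g</entry></row></table>"
)
html_rows = rows_to_html(TABLE)
```

Here the code decides which elements to visit and in which order, instead of letting the structure of the document drive the processing.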


Conditional processing

<xsl:if>
Simple conditional (no "else")

<xsl:if test="$somecondition">
  <xsl:text>this text only gets used if $somecondition is true()</xsl:text>
</xsl:if>

<xsl:choose>
Select among alternatives with <xsl:when> and <xsl:otherwise>

<xsl:choose>
  <xsl:when test="$count > 2"><xsl:text>, and </xsl:text></xsl:when>
  <xsl:when test="$count > 1"><xsl:text> and </xsl:text></xsl:when>
  <xsl:otherwise><xsl:text> </xsl:text></xsl:otherwise>
</xsl:choose>
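The <xsl:choose> example reads like an if / elif / else chain: the first test that succeeds wins. A plain Python equivalent is sketched below; the function names final_separator and join_names, and the use of the separator to join a list of names, are assumptions made for this illustration.

```python
def final_separator(count):
    """Mirror the xsl:choose example: the first matching test wins."""
    if count > 2:          # xsl:when test="$count > 2"
        return ", and "
    elif count > 1:        # xsl:when test="$count > 1"
        return " and "
    return " "             # xsl:otherwise


def join_names(names):
    """Join names, placing the chosen separator before the last one."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + final_separator(len(names)) + names[-1]


two = join_names(["CSS", "XSL"])
three = join_names(["XPath", "XSLT", "XSL"])
```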


Variables

• Variables are created with <xsl:variable>.

• Variables are "single assignment" (no side effects)

• Variables are lexically scoped

Variables can be used to save computed values.

Once created, variables can be used to generate content:

<a href="{$file}">...</a>

And control conditional processing:

<xsl:if test="$count = 3">...</xsl:if>


Creating the resulting tree

Literal Result Elements
Any element in a template rule that is not in the XSL (or other extension) namespace is copied literally to the result tree

<p>...</p>

XSL Elements
Elements in the XSL namespace:

<xsl:text>

<xsl:value-of>

<xsl:element>

<xsl:attribute>

...


Numbering and sorting

You can

• Count source tree elements (chapters, list-items, stock quotes, etc.)

• Convert between number formats (1, B, iii, ...)

• Sort elements for presentation
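What converting between number formats and sorting involve can be sketched in plain Python. This is only an illustration of the operations themselves, not of the XSLT instructions that perform them; the function names and the sample chapter titles are assumptions made for this example.

```python
def to_alpha(n):
    """1 -> A, 2 -> B, ..., 26 -> Z, 27 -> AA."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def to_roman(n):
    """1 -> i, 3 -> iii, 14 -> xiv (lower-case Roman numerals)."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


# Sort the elements, count them, and number them in the chosen format.
chapters = ["Namespaces", "DTD", "XSL"]
numbered = [
    "%s. %s" % (to_roman(position), title)
    for position, title in enumerate(sorted(chapters), start=1)
]
```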


Overall XSL formatting capabilities

XSL FO formatting capabilities in XSL 1.0 are approximately the union of:

• HTML + CSS capabilities

• most high quality print output capabilities including internationalization features

Not included are complex page layouts (e.g., magazine and newspaper layout), complex layout-driven formatting (e.g., copy fitting and complex floats), and loose leaf pagination (change page production)


Formatting objects and properties

• XSL = XSLT + vocabulary of FOs and properties

• XSL defines a powerful set of formatting objects

• XSL uses (and extends) a set of Common Formatting Properties developed jointly with the CSS&FP (Cascading Style Sheet and Formatting Property) Working Group

• When a result tree uses this standardized set of formatting objects and properties, then an XSL-compliant formatter can process that result tree to produce the specified output


Formatting object basics

Inline versus block objects

Common formatting properties, harmonized with CSS


Common formatting objects

• page-sequence--a major part (such as front or body) in which the basic page layout may differ from other parts

• flow--a chapter- or section-like division within a page-sequence

• block--a paragraph (or title or block quote, etc.)

• inline--e.g., a font change within a paragraph

• wrapper--a "transparent" object usable as either a block or an inline object that has no effect other than to provide a place to hang inheritable properties

• list FOs--list-block, list-item, list-item-label, list-item-body

• graphic--references an external graphic object

• table FOs--mostly analogous to the standard (CALS, OASIS, HTML) table models


Basic properties

• font properties
• margin and spacing properties
• border and padding properties
• keeps/breaks
• horizontal alignment/justification
• indentation
• more formatting object specific properties


Some application domains (1)

• HR-XML (Human Resources XML) is a standard suite of XML specifications to enable e-business and the automation of human resources-related data exchanges

• XHTML (eXtensible HTML) is a standard designed to help the transition from HTML to XML. It makes it possible to use XML processing tools, in particular to modify presentation depending on the target device (PDA, cell phone...)

• SVG (Scalable Vector Graphics) describes 2-dimensional graphics in XML. Its standardization is supported by Adobe, Microsoft, & others

• SMIL (Synchronized Multimedia Integration Language) is an XML-based language for describing multimedia presentations in which audio, video, images and text are synchronized in time


Some application domains (2)

• MathML (Mathematical Markup Language) is a language for standardized scientific content. It represents complex mathematical expressions so that they can be displayed on the Internet

• DHTML (Dynamic HTML) is not a language in its own right but a combination of techniques used to create HTML that can change even after a page has been loaded into a browser

• PPML (Personalized Print Markup Language) is an XML-based language for variable-data printing. It was developed by the Digital Printing Initiative (PODi)

• 3DML, HumanML, Artificial Intelligence ML ...


In short, XML is...

…a powerful tool for
• data representation,
• storage,
• modelling,
• and interoperation


Small XML example code

<?xml version="1.0" encoding="ISO-8859-1"?>
<article>
  <titre> Un journaliste accuse, un policier dément </titre>
  <auteur> Alain Connu </auteur>
  <date> 14 juin 1972 </date>
  <lieu> banquise </lieu>
  <texte> Un journaliste de la place accuse les autorités ... </texte>
</article>


Short introduction to XML

• An XML document is well-formed if it satisfies certain constraints :
– all tags with non-empty content must be closed
– tags with no content must end with />
– attribute values must be enclosed in quotes

• An XML document is valid with respect to a DTD if it follows the rules expressed by the DTD
– DTD : a set of rules stating which sequences and nestings of tags are allowed

<!ELEMENT UL (LI)+>
<!ELEMENT LI (#PCDATA | u | it | b)*>


Modelling information structure in XML

structure of a book

book <document>
  Preface <one page>
  Introduction <chapter>
  Content <chapter>
    Part one <sub-chapter>
    Part two <sub-chapter>
    Part three <sub-chapter>
  Conclusion <chapter>
  Table of contents <contents>
  Bibliography <biblio>


Small introduction to Markup Languages (XML, HTML)

• XML : makes it possible to structure information

• XML : makes it possible to automate the processing of structured documents and formatted data

• XML ~ a generalization of HTML where, instead of using a set of predefined tags with predefined meanings, authors can "invent" their own tags

From the course of Bertrand Ibrahim, Geneva University