
Page 1: The Semantic Blessings of XSLT Diederik Gerth van Wijk dg@doxatrix.nl XML Holland 2008 Planetarium Gaasperplas, Amsterdam, 20 november DOXATRIX

The Semantic Blessings of XSLT

Diederik Gerth van Wijk

dg@doxatrix.nl

XML Holland 2008

Planetarium Gaasperplas, Amsterdam, 20 November

DOXATRIX


Intended audience

Understands English

Knows what XML is about

Cares about meaning, processing and validation

Does not need to know about XSLT

Does not need to be a programmer

But might be aware that computers need to be programmed


Semantic? Blessings? XSLT?

XML is about the structure of a document

Semantics are about “meaning”

A schema can say that a document should have a title (structure)

The documentation might add that a title is used for identification (unique within a set of documents), and gives a clue about what the document is about (semantics)

The words used in the title are really semantics

Blessings are good, helpful, you want them

What is XSLT?

How can XSLT help you in adding, verifying and using semantic markup?


Why bother marking up explicitly?


NLP is good, Explicit Markup is better

“Plein 26 Den Haag” = <street>Plein</street><nr>26</nr><city>Den Haag</city>

“Plein 1813 Den Haag” = <street>Plein 1813</street><city>Den Haag</city>

XML is about tagging structure

A schema adds semantics

<name>Quattro Stagioni</name>: Pizza by Mario or piece by Vivaldi?

I don’t care (in this presentation)
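Once the parts of an address are tagged, a stylesheet never has to guess where the street name ends and the house number begins. A minimal sketch, assuming a hypothetical wrapper element `address` around the `street`, `nr` and `city` elements shown on the slide:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "address" is an assumed wrapper element, not part of the slide -->
  <xsl:template match="address">
    <xsl:value-of select="street"/>
    <!-- The house number is optional: "Plein 1813" is a street name
         without a number -->
    <xsl:if test="nr">
      <xsl:text> </xsl:text>
      <xsl:value-of select="nr"/>
    </xsl:if>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="city"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a document whose root is one such `address` element, the first example renders as "Plein 26, Den Haag" and the second as "Plein 1813, Den Haag", without any parsing of the text itself.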


eXtensible Stylesheet Language - Transformations

XSL: the eXtensible Stylesheet Language

Family of three W3C recommendations for transformation and presentation

XML Path Language (XPath)

XSL Transformations (XSLT)

XSL Formatting Objects (XSL-FO)

[Diagram, labels only: XML source document(s); XSLT stylesheet 1; XSLT stylesheet 2; XSLT processor; HTML pages; XSL-FO document; XSL-FO processor; PDF]


XSLT characteristics

An XSLT style sheet is an XML document

Input is one or more XML documents

Output is one or more XML (XSLT!), HTML, XSL-FO or plain text (CSS!) documents

Style sheet can look like template of the result document (data pull)

Or be event driven (data push)

Elements and attributes are “events”

Functional programming language

Rule based

Declarative

No side effects

Statements can be executed in any order

Embeds XPath

XSLT 2.0 and XPath 2.0 know XML Schema types

XSLT 2.0 can compute from implicit structure
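The difference between data pull and data push is easiest to see side by side. The sketch below is only an illustration: the input vocabulary (`book`, `chapter`, `title`, `para`) is assumed, not taken from the presentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Pull: this template looks like the result document and fetches
       the data it needs with xsl:value-of -->
  <xsl:template match="/book">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <!-- Push: hand the chapters over to whichever rules match -->
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each rule below fires when its "event" (a matching element)
       comes by, in whatever order the source document has them -->
  <xsl:template match="chapter">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="chapter/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

The pull part fixes the shape of the output; the push part lets the source document decide how many chapters and paragraphs appear and in which order.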


XSLT engines

stand alone:

Saxon (open source, Michael Kay)

Altova (free, XML Spy)

MSXML

on server:

Saxon + .NET

Altova + .NET

MSXML + ASP

built in browser:

IE6 and higher

FF1 and higher

Opera9 and higher


What’s the competition?

CSS (Cascading Style Sheets)

Easier, simpler

Don’t transform

Perl, Python, Java, JavaScript, C(++), (V)Basic

Generic programming or scripting languages

No built in knowledge of XML, but lots of libraries for DOM or SAX

JSP, ASP, PHP

Server side processing

Not really XML aware

Little or no transformation

IS-10179 DSSSL: Document Style Semantics and Specification Language

SGML based

Rarely used


XSLT and semantics...

XML elements describe what the content is (semantics)

XSLT stylesheets describe what to do with it (processing)

How can a processing stylesheet be a semantic blessing?


Blessing 3: XSLT 2.0 may be schema aware

A schema defines the semantics of a document type

XSLT 2.0 is based on XPath 2.0

XSLT 2.0 may use schemas

Then, XPath 2.0 can use the types of elements and attributes

So it can know whether to treat an attribute as string or as integer ("12" < "3" if type is string, "12" > "3" if type is integer)

But will it sort correctly:

<song title="50 ways to leave your lover" performer="Paul Simon" />
<song title="1919 rag" performer="Kid Ory" />

or

<king name="Henry VIII" born="1491-06-28" died="1547-01-28" />
<king name="Henry IX" born="1725-03-11" died="1807-07-13" />

(yes, if the Roman numerals were coded as &#x2167; and &#x2168;)

With the “instance of” operator you can use information that is not in the document, but is in the schema

Therefore, XSLT 2.0 discourages stand alone processing

From a semantic point of view, that’s a blessing
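The effect of types on comparison can be tried without a schema. The sketch below assumes a basic (not schema-aware) XSLT 2.0 processor, so the explicit `xs:integer(...)` and `xs:date(...)` casts stand in for the types that a schema-aware processor would take from the schema. The template name `main` is an arbitrary choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template name="main" match="/">
    <!-- Compared as strings: "1" comes before "3" -->
    <xsl:value-of select="'12' lt '3'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Compared as integers: twelve is greater than three -->
    <xsl:value-of select="xs:integer('12') gt xs:integer('3')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "instance of" asks about the type of a value, not its content -->
    <xsl:value-of select="xs:date('1491-06-28') instance of xs:date"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

All three expressions evaluate to true. With a schema that declares `born` as a date, a schema-aware processor could make the same distinction for `@born` without any cast in the stylesheet.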


Blessing 4: Schema independent processing (1)

In a sequence group, the order contains no information:

(title, abbreviated-title?) (1)

is equivalent to

(abbreviated-title?, title) (2)

Suppose you want to print the abbreviated title if one is coded, and otherwise the full title

In stream processing, the quick and dirty solution might be as simple as:

temp = getNextElement;
if existsNextElement then write(getNextElement)
else write(temp); (1)

or

write(getNextElement); (2)

But what if you decide to change from order (1) to (2)? Or add an optional element toc-title?

(title, abbreviated-title?, toc-title?) (1)

(toc-title?, abbreviated-title?, title) (2)

The simple program breaks


Blessing 4: Schema independent processing (2)

In XSLT, you have access to the elements by name, in arbitrary order

The style sheet fragment looks like

<xsl:choose>
  <xsl:when test="./abbreviated-title">
    <xsl:value-of select="abbreviated-title"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="title"/>
  </xsl:otherwise>
</xsl:choose>

If the schema (and documents) change order, the stylesheet remains the same

If an optional toc-title is added, the stylesheet remains the same

Verbosity turns out to be simpler, in the long run

By the way, if sequence matters in the document, it shouldn’t in the schema

Reasons to prescribe sequence:

to ease input

to enforce cardinality
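In XSLT 2.0 the same order-independent choice can also be written as a single XPath 2.0 expression. This is an alternative sketch, not the fragment from the slide; the parent element name `document` is assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "document" is an assumed parent of the title elements -->
  <xsl:template match="document">
    <!-- The comma builds a sequence in the order written here, not in
         document order, so [1] picks abbreviated-title when it exists
         and falls back to title otherwise -->
    <xsl:value-of select="(abbreviated-title, title)[1]"/>
  </xsl:template>
</xsl:stylesheet>
```

The comma matters: the union `(abbreviated-title | title)[1]` would return whichever element comes first in the document, which brings back the dependency on schema order that the stylesheet is meant to avoid.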


Blessing 5: functional programming

No variables

Suppose you want to sort items alphabetically and act on each new letter

First idea:

<xsl:variable name="PrevLetter" select="' '" />
<xsl:for-each select="book">
  <xsl:sort select="title" data-type="text" order="ascending"/>
  <xsl:variable name="ThisLetter" select="substring(title, 1, 1)" />
  <xsl:if test="$PrevLetter != $ThisLetter">
    <H2><xsl:value-of select="$ThisLetter"/></H2>
  </xsl:if>
  <xsl:variable name="PrevLetter" select="$ThisLetter" />
  <H3><xsl:value-of select="title"/></H3>
</xsl:for-each>

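No good: XSLT variables cannot be reassigned. The xsl:variable inside the loop declares a new PrevLetter that goes out of scope at the end of each iteration, so every iteration starts again with the outer value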


Would this work?

<xsl:for-each select="book">
  <xsl:sort select="title" data-type="text" order="ascending"/>
  <xsl:variable name="PrevLetter" select="substring(preceding-sibling::book[1]/title, 1, 1)" />
  <xsl:variable name="ThisLetter" select="substring(title, 1, 1)" />
  <xsl:if test="$PrevLetter != $ThisLetter">
    <H2><xsl:value-of select="$ThisLetter"/></H2>
  </xsl:if>
  <H3><xsl:value-of select="title"/></H3>
</xsl:for-each>

Better, but the preceding-sibling axis operates on the original document order, not on the sorted order...

Is that a bug or a feature? It’s a blessing!


The solution

<xsl:for-each-group select="book" group-by="substring(title, 1, 1)">
  <!-- Without this sort, groups appear in order of first occurrence -->
  <xsl:sort select="current-grouping-key()"/>
  <H2><xsl:value-of select="current-grouping-key()"/></H2>
  <xsl:for-each select="current-group()">
    <xsl:sort select="title" data-type="text" order="ascending"/>
    <H3><xsl:value-of select="title"/></H3>
  </xsl:for-each>
</xsl:for-each-group>

Think XML

Think in terms of creating hierarchies: groups of titles starting with the same letter


The ultimate semantic normalisation

“PCDATA considered harmful” (Han Nonnekes, Shell Oil)

Text is the outer structure in a specific language of a deeper meaning

You should encode a text as that deeper tree

With references to abstract words (concepts)

For each language (“English, upper class, around 1850”) give dictionary and transformation rules

Then generate the text
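A toy version of "then generate the text" could look as follows. Everything in it is assumed for illustration: the `sentence` and `concept` elements, the `ref` attribute, the `lang` parameter and the four dictionary entries. It covers only the dictionary lookup, not the transformation rules for word order or inflection.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Target language; a hypothetical parameter -->
  <xsl:param name="lang" select="'en'"/>

  <!-- A tiny per-language dictionary, kept inline for the sketch -->
  <xsl:variable name="dictionary">
    <word concept="greeting" lang="en">hello</word>
    <word concept="greeting" lang="nl">hallo</word>
    <word concept="world" lang="en">world</word>
    <word concept="world" lang="nl">wereld</word>
  </xsl:variable>

  <!-- Input is assumed to look like:
       <sentence><concept ref="greeting"/><concept ref="world"/></sentence> -->
  <xsl:template match="sentence">
    <!-- Look up each concept in the dictionary for the chosen language
         and join the resulting words with spaces -->
    <xsl:value-of separator=" "
        select="for $c in concept
                return $dictionary/word[@concept = $c/@ref][@lang = $lang]"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the comment, the default `lang` of `en` yields "hello world", and running with `lang` set to `nl` yields "hallo wereld" from the same source tree.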


Questions?

Ask me now

Ask me during lunch or tea break

Ask me during buffet

Mail dg@doxatrix.nl

Presentation can be downloaded from

www.xmlholland2008.nl

www.doxatrix.nl/dg
