Inside an XSLT Processor
Michael Kay, ICL, 19 May 2000
About me:
• ICL Fellow, systems architect
• Database background
• Developer of SAXON
• Author of XSLT Programmer’s Reference, published by Wrox Press
• Recently joined the W3C XSL Working Group as an invited expert
About this talk:
• The XSLT Processing Model
• Structure of an XSLT Processor
• Performance
  » current limitations
  » possible ways forward
• Ideas on future development of the language
The XSLT Processing Model: first approximation
[Diagram: Source Document and Stylesheet → Transformation Process → Result Document]
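The XSLT Processing Model: in more detail
[Diagram: Parsing turns the Source Document into a Source Tree and the Stylesheet into a Stylesheet Tree; the Transformation Process builds a Result Tree from them; Serialization writes the Result Tree out as the Result Document]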
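The three stages in the diagram (parsing, transformation, serialization) can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions: the Python standard library has no XSLT processor, so a hand-written function stands in for the stylesheet tree, and the names parse, transform, and serialize are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse(text: str) -> ET.Element:
    """Parsing: source document (text) -> source tree."""
    return ET.fromstring(text)


def transform(source: ET.Element) -> ET.Element:
    """Transformation: source tree -> result tree.

    A hand-written Python rule stands in for the compiled stylesheet here;
    it is not XSLT. It turns each <item name="..."> into an <li>.
    """
    result = ET.Element("ul")
    for item in source.iter("item"):
        li = ET.SubElement(result, "li")
        li.text = item.get("name")
    return result


def serialize(result: ET.Element) -> str:
    """Serialization: result tree -> result document (text)."""
    return ET.tostring(result, encoding="unicode")


source_doc = '<list><item name="a"/><item name="b"/></list>'
print(serialize(transform(parse(source_doc))))  # <ul><li>a</li><li>b</li></ul>
```

Keeping the three stages as separate functions mirrors the point of the diagram: the transformation works on trees only and never sees markup text.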
An XSLT Template Rule
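<xsl:template match="appendix/para[1]">
  <h4>
    <xsl:number level="single"/>
    <xsl:value-of select="@title"/>
  </h4>
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
[Slide callouts: Pattern (match="appendix/para[1]"), XPath Expression (select="@title"), Instruction (xsl:number, xsl:value-of, xsl:apply-templates), Result Element (h4, p)]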
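To decide whether a template rule applies, a processor tests each node against the rule's pattern. One common approach is to check the pattern from right to left, starting with the node itself. The sketch below hand-compiles the pattern appendix/para[1] into such a test; it is an illustration, not how any particular processor (SAXON included) represents patterns, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def matches_appendix_para_1(node: ET.Element, parent_of: dict) -> bool:
    """Hand-compiled test for the pattern appendix/para[1].

    Checked right to left: first the node's own name, then its parent,
    then the positional predicate [1].
    """
    if node.tag != "para":
        return False
    parent = parent_of.get(node)
    if parent is None or parent.tag != "appendix":
        return False
    # [1]: the node must be the first <para> child of its parent
    return parent.find("para") is node


doc = ET.fromstring(
    "<book><appendix><para/><para/></appendix><chapter><para/></chapter></book>"
)
# ElementTree nodes have no parent pointer, so build a child -> parent map
parent_of = {child: parent for parent in doc.iter() for child in parent}
matched = [n for n in doc.iter() if matches_appendix_para_1(n, parent_of)]
print(len(matched))  # 1: only the first <para> inside <appendix>
```

Testing the cheap name check first means most nodes are rejected without looking at their ancestors at all.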
Architecture of an XSLT processor
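[Diagram: the Source and the Stylesheet each pass through an XML Parser and a Tree Builder; the XSLT compiler and XPath compiler turn the stylesheet into a Compiled Stylesheet; the XSLT interpreter and XPath interpreter run it against the Source Tree; the Output Manager passes the output to an XML, HTML, or Text serializer to produce the Result]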
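The Output Manager's job of routing the finished result tree to the right serializer can be sketched as a lookup on the output method. As an assumption for illustration, ElementTree's built-in "xml", "html", and "text" serialization methods play the three serializers; output and SERIALIZERS are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The Output Manager maps an output method to a serializer. Here the three
# serializers are ElementTree's built-in "xml", "html" and "text" methods.
SERIALIZERS = {
    "xml": lambda root: ET.tostring(root, encoding="unicode", method="xml"),
    "html": lambda root: ET.tostring(root, encoding="unicode", method="html"),
    "text": lambda root: ET.tostring(root, encoding="unicode", method="text"),
}


def output(result_tree: ET.Element, method: str = "xml") -> str:
    """Send a finished result tree to the serializer chosen by `method`."""
    try:
        serializer = SERIALIZERS[method]
    except KeyError:
        raise ValueError(f"unknown output method: {method}") from None
    return serializer(result_tree)


# Build a small result tree: <p>Hello<br/>world</p>
p = ET.Element("p")
p.text = "Hello"
br = ET.SubElement(p, "br")
br.tail = "world"

for method in ("xml", "html", "text"):
    print(method, output(p, method))
```

The same tree comes out as `<p>Hello<br />world</p>`, `<p>Hello<br>world</p>`, or plain `Helloworld`: the transformation does not need to know which serializer will be used.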
At compile time:
• Parse and validate the stylesheet
• Parse and validate all XPath expressions
  » and attribute value templates
• Build rule base for matching patterns
• Resolve references to named variables, functions, and templates
• Flatten the import tree
• Optimize XPath expressions
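Building the rule base can be sketched as indexing template rules by the element name at the right-hand end of each pattern, so that a node is only tested against rules that could possibly match, then trying the candidates best-first: higher import precedence (what flattening the import tree yields) beats higher priority. The Rule and RuleBase classes are hypothetical; each rule's remaining pattern conditions are assumed to be precompiled into a Python callable, and ties between equally ranked rules are not treated as errors here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name_test: str  # element name at the rightmost step of the pattern, or "*"
    precedence: int  # import precedence after flattening imports; higher wins
    priority: float  # default or explicit priority; compared after precedence
    condition: Callable[[ET.Element], bool]  # rest of the pattern, precompiled
    body: str  # hypothetical stand-in for the template body


class RuleBase:
    """Template rules indexed by name test, tried best-first."""

    def __init__(self, rules: List[Rule]) -> None:
        self._by_name: Dict[str, List[Rule]] = {}
        for rule in rules:
            self._by_name.setdefault(rule.name_test, []).append(rule)
        self._ranked: Dict[str, List[Rule]] = {}  # sorted candidates per tag

    def _candidates(self, tag: str) -> List[Rule]:
        if tag not in self._ranked:
            merged = self._by_name.get(tag, []) + self._by_name.get("*", [])
            # Highest import precedence first, then highest priority
            merged.sort(key=lambda r: (r.precedence, r.priority), reverse=True)
            self._ranked[tag] = merged
        return self._ranked[tag]

    def find(self, node: ET.Element) -> Optional[Rule]:
        for rule in self._candidates(node.tag):
            if rule.condition(node):
                return rule
        return None  # a real processor would fall back to built-in rules


rules = RuleBase([
    Rule("*", 1, -0.5, lambda n: True, "any element"),
    Rule("para", 1, 0.0, lambda n: True, "plain para"),
    Rule("para", 1, 0.5, lambda n: n.get("role") == "note", "note para"),
    Rule("para", 0, 0.5, lambda n: True, "imported para"),
])
print(rules.find(ET.Element("para", role="note")).body)  # note para
print(rules.find(ET.Element("para")).body)  # plain para
print(rules.find(ET.Element("table")).body)  # any element
```

The priorities used (0 for a plain name, -0.5 for "*", 0.5 for a pattern with a predicate) follow XSLT's default priorities. The "imported para" rule has the highest priority among the para rules yet never wins, because import precedence is compared first.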
Where does the time go?
[Chart: share of processing time spent in four phases: Build Source Tree, Compile Stylesheet, Process Templates, Serialize Output]
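One way to find out where the time goes for a particular job is to time each of the four phases separately. The sketch below does that with time.perf_counter; the "compiled stylesheet" and the template processing are toy stand-ins (assumptions, not real XSLT), so only the measuring pattern carries over, and the helper names are hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def timed(phases, label, fn, *args):
    """Call fn(*args) and record the elapsed wall-clock seconds under label."""
    start = time.perf_counter()
    result = fn(*args)
    phases[label] = time.perf_counter() - start
    return result


def compile_stylesheet():
    """Toy stand-in for a compiled stylesheet: element name -> action."""
    return {"item": lambda e: e.get("name", "").upper()}


def process_templates(rules, tree):
    """Toy stand-in for template processing: apply matching actions in order."""
    out = ET.Element("out")
    for e in tree.iter():
        action = rules.get(e.tag)
        if action is not None:
            ET.SubElement(out, "v").text = action(e)
    return out


def run(source_text):
    phases = {}
    tree = timed(phases, "build source tree", ET.fromstring, source_text)
    rules = timed(phases, "compile stylesheet", compile_stylesheet)
    result = timed(phases, "process templates", process_templates, rules, tree)
    text = timed(phases, "serialize output",
                 lambda r: ET.tostring(r, encoding="unicode"), result)
    return text, phases


source = "<list>" + "".join(f'<item name="n{i}"/>' for i in range(10000)) + "</list>"
text, phases = run(source)
for label, seconds in phases.items():
    print(f"{label:<20} {seconds:.4f} s")
```

The numbers this prints describe only this toy job on the machine running it; they say nothing about the proportions on the original slide.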
Is Performance a Problem?
• Client side: usually not
  » XSLT processing is generally faster than download speed
• Server side: sometimes
  » CPU usage when handling very high throughput
  » Memory problems when handling very large documents
Some performance tips
• Keep documents small: split them first
• Process once, at publishing time
  » or use caching
• Do several simple transforms in series
• Avoid complex patterns in template rules
• Use keys
• Use external functions
• Avoid "//item" (it searches the whole document)
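The tips "use keys" and "avoid //item" both come down to indexing. An expression like //item[@id=$x] may search the whole document each time it is evaluated, whereas xsl:key lets the processor build an index once and answer each key() call by lookup. The Python sketch below mimics both approaches; the function names and the key name item-by-id are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_by_scan(root: ET.Element, ident: str) -> list:
    """Like //item[@id=$ident]: walks the whole tree on every call."""
    return [e for e in root.iter("item") if e.get("id") == ident]


def build_key(root: ET.Element, use: str) -> dict:
    """Like <xsl:key name="item-by-id" match="item" use="@id"/>: one pass."""
    index = defaultdict(list)
    for e in root.iter("item"):
        value = e.get(use)
        if value is not None:
            index[value].append(e)
    return dict(index)


def key_lookup(index: dict, value: str) -> list:
    """Like key('item-by-id', $value): a dictionary lookup, no tree walk."""
    return index.get(value, [])


root = ET.fromstring(
    '<catalog><section><item id="a"/><item id="b"/></section>'
    '<item id="a"/></catalog>'
)
index = build_key(root, "id")
print(len(key_lookup(index, "a")))  # 2
```

Building the index costs one pass over the document, so it pays off as soon as the same kind of lookup is made more than a few times.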
Performance progress
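[Chart: processing speed improving from Today through Simple optimization to Advanced optimization; levels marked at 20 sec/Mb, 5 sec/Mb, and 1 sec/Mb]
Simple optimization:
• Stylesheet compilation
• Java code optimization
• Lazy evaluation
• Simple XPath optimization
• Tail recursion
Advanced optimization:
• Incremental parsing
• Pipelining
• Use of schema
• Pattern matching
• Full XPath optimization
• Compile to bytecodes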
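Tail recursion matters because XSLT 1.0 variables cannot be updated, so loops are often written as recursive templates. If the recursive call is the last thing a template does, a processor can reuse the current frame instead of growing the stack. The sketch shows the idea with a trampoline in Python, which does not eliminate tail calls by itself; TailCall, trampoline, and count_char are hypothetical names, and this is one technique among several, not a description of SAXON's implementation.

```python
class TailCall:
    """A pending call, returned instead of made, so the stack does not grow."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(result):
    """Run pending tail calls in a loop until a plain value comes back."""
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def count_char(text, ch, i=0, total=0):
    """Recursive 'template' in XSLT 1.0 style: look at one character, recurse.

    The recursive call is the last thing done, so it is returned as a
    TailCall rather than executed on a new stack frame.
    """
    if i == len(text):
        return total
    return TailCall(count_char, text, ch, i + 1, total + (text[i] == ch))


text = "a b c " * 20000  # 120,000 characters: far deeper than Python's default recursion limit
print(trampoline(count_char(text, " ")))  # 60000
```

Written as plain recursion, the same function would fail with RecursionError on this input; the trampoline runs it in constant stack depth.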
Interesting research areas
• Database integration: transforming a document without loading it into memory
• Applying regular expression theory
• Execution as a sequence of serial passes
• Using schema knowledge at compile time
• Eager node numbering
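Eager node numbering can be sketched as giving each node its document-order position when the tree is built, so that putting a node-set into document order (which XSLT needs, for example, when xsl:apply-templates processes selected nodes without a sort) becomes an integer comparison rather than a walk of the tree. The dictionary-based numbering and the helper names below are illustrative assumptions; a real tree would more likely store the number in the node itself.

```python
import xml.etree.ElementTree as ET


def number_nodes(root: ET.Element) -> dict:
    """Assign every element its position in document order, once, up front.

    root.iter() walks the tree depth-first in document order (pre-order).
    """
    return {elem: n for n, elem in enumerate(root.iter())}


def in_document_order(nodes, order: dict) -> list:
    """Sort a node-set into document order with plain integer comparisons."""
    return sorted(nodes, key=order.__getitem__)


root = ET.fromstring("<a><b><c/></b><d/></a>")
order = number_nodes(root)
b, c, d = root[0], root[0][0], root[1]
print([e.tag for e in in_document_order([d, c, b], order)])  # ['b', 'c', 'd']
```

The trade-off is that numbers must be assigned (or reassigned) whenever the tree is built or changed, which suits XSLT well because source trees are read-only during a transformation.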
Potential language features
• Serial transformation language?
• Multi-pass stylesheets
• Higher-level "relational" constructs: grouping, joins, logical quantifiers
• Richer data types
• Assignment statement ????
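Grouping is the most familiar of these relational constructs: collect the nodes that share a key value, with the groups in order of first appearance. XSLT 1.0 has no direct construct for this, so the sketch below only shows what such a construct would compute; group_by is a hypothetical helper, not an XSLT feature.

```python
import xml.etree.ElementTree as ET


def group_by(nodes, key):
    """Group nodes by a computed key, keeping groups in first-appearance order.

    Python dicts preserve insertion order (3.7+), which gives that ordering.
    """
    groups = {}
    for node in nodes:
        groups.setdefault(key(node), []).append(node)
    return groups


cities = ET.fromstring(
    "<cities>"
    '<city name="Paris" country="FR"/>'
    '<city name="Lyon" country="FR"/>'
    '<city name="Milan" country="IT"/>'
    '<city name="Nice" country="FR"/>'
    "</cities>"
)
groups = group_by(cities.iter("city"), lambda c: c.get("country"))
for country, members in groups.items():
    print(country, [c.get("name") for c in members])
# FR ['Paris', 'Lyon', 'Nice']
# IT ['Milan']
```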
Summary
• The XSLT language is now stable
• XSLT processor technology is starting to be well understood
• The first crop of products is capable of significant performance
• Research now needs to start on the next phase of optimization techniques