XML 2001, Sean McGrath http://www.propylon.com
XPipe - An XML Processing Methodology
XML 2001 Florida, USA
Sean McGrath
CTO
Propylon
What is XPipe?
• It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.
• Based on proven mechanical manufacturing techniques. Specifically:
  – The Assembly Line Principle
  – Component assembly and component re-use
What is XPipe?
• An open source project hosted on SourceForge
  – http://xpipe.sourceforge.net
• A contribution to the blossoming meme of using pipeline-based processing to tame the burgeoning complexity of XML transformations
  – (If you do not find XML transformation complicated, you are not sufficiently well informed.)
  – (And no, XSLT does not solve all your problems)
• A way of thinking about systems that focuses on information flows rather than APIs
Contents of this talk
• The XPipe philosophy
• Major functional elements
• Some examples
• Relationship to other technologies
• The XGrid
• Some anticipated objections (and answers)
• Current status
• Current problems
• Future plans
XPipe Philosophy
Henry Ford’s Model T Ford Assembly Line – 1914
Cars are complex, hierarchical structures
XPipe Philosophy
Lunch Assembly Line – 2001
Lunch is a complex, hierarchical structure
XPipe Philosophy
We are complex, hierarchical structures
XPipe philosophy
• What have these scenes got in common?
  – Complex construction of cars, tuna melts and tendons made possible and efficient through
    • assembly line manufacturing
    • re-usable component processes and component materials
• Why not apply this approach to XML “manufacturing”?
XPipe philosophy
• Why does the assembly line approach work?
  – Transformation task decomposition
  – Re-usable transformation components
• Transformation decomposition is the key to complexity management. Just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7+/-2)
  – Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  – Any electrical or chemical engineer.
XPipe philosophy
• Component re-use is the key to productivity
  – Ask any form of engineer (electrical, chemical etc.) apart from software engineers…
  – Component re-use remains a holy grail in software engineering
  – XPipe is yet another attempt…
XPipe philosophy
• A lot of data processing will consist of XML to XML transformation
• A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations
• Mantra
  – Get data into XML as quickly as possible
  – Keep it in XML until the last possible minute
  – Bring all your XML tools to bear on solving the data processing problem
XPipe philosophy
[Diagram: non-XML input passes through a top transformation to become input XML; output XML passes through a tail transformation to become non-XML output]
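A minimal sketch of the top-and-tail idea, assuming CSV as the non-XML format. The function names (top, uppercase_names, tail) and the rows/row element names are invented for illustration and are not part of XPipe.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top(csv_text):
    """Top transformation: non-XML (CSV) in, XML out."""
    root = ET.Element("rows")  # hypothetical wrapper element
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for field, value in record.items():
            # Assumes the CSV header names are usable as element names.
            ET.SubElement(row, field).text = value
    return root

def uppercase_names(root):
    """An XML-to-XML step in the middle of the flow."""
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return root

def tail(root):
    """Tail transformation: XML in, non-XML (CSV) out."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in root.findall("row"):
        writer.writerow([child.text for child in row])
    return out.getvalue()

print(tail(uppercase_names(top("name,qty\nwidget,3\n"))))
```

Everything between top and tail sees only XML, so any XML tool can be dropped into the middle.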
XPipe philosophy
• The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
[Diagram: an XPipe: Input XML → Task 1 → Task 2 → ... → Task n-1 → Task n → Output XML]
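The chaining idea can be sketched in a few lines, assuming each task is a plain function from an XML string to an XML string. run_pipe, add_status and wrap_in_envelope are hypothetical names for illustration, not the XPipe executive.

```python
from functools import reduce
import xml.etree.ElementTree as ET

def run_pipe(tasks, xml_text):
    """Feed the output of each task into the next one, left to right."""
    return reduce(lambda doc, task: task(doc), tasks, xml_text)

def add_status(xml_text):
    """Task 1: mark the root element as a draft."""
    root = ET.fromstring(xml_text)
    root.set("status", "draft")
    return ET.tostring(root, encoding="unicode")

def wrap_in_envelope(xml_text):
    """Task 2: nest the document inside an envelope element."""
    envelope = ET.Element("envelope")
    envelope.append(ET.fromstring(xml_text))
    return ET.tostring(envelope, encoding="unicode")

print(run_pipe([add_status, wrap_in_envelope], "<doc>hi</doc>"))
# <envelope><doc status="draft">hi</doc></envelope>
```

Each task can be built and tested on its own; the pipe is just the ordered list.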
XPipe philosophy
• Only so many ways to re-arrange an XML tree structure
• A finite number of fundamental transformations, from which all higher order transformations can be derived
XPipe philosophy
• Transformation decomposition leads to
  – a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”.
  – Can build, test, use and then re-use these transformation components
  – Very team development friendly
  – High cohesion, loose coupling – just like the professor advised
XPipe philosophy
• Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem
  – Lexical
  – SAX
  – DOM
  – XSLT
  – XDuce, Pyxie, Haskell…
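Sample XPipe
[Diagram: a sample XPipe. Stage labels recoverable from the figure: DB/CMS; Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Validation; SQL Replace; Generate XHTML; Stats + FTP. Implementation labels: Lexical, Schematron/RelaxNG, Rhino, Jython, Java, XSLT, Lexical, DOM.]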
XPipe philosophy
• Assertion: developers would use a component-based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves
  – “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”
XPipe philosophy
• “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth
• XPipe aims to look after the plumbing, letting developers concentrate on the interesting stuff
Major Functional Elements – XComponents
• Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.)
• All XComponents are standalone programs of the form
  – [Name] [InputXML] [OutputXML] [ErrorXML]
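A sketch of what a standalone program of that form might look like, written in Python for brevity (the slides name Jython among the supported languages). It assumes the three arguments are file paths; the transform body and the error element format are invented placeholders, not XPipe's actual conventions.

```python
import sys
import xml.etree.ElementTree as ET

def transform(root):
    """Placeholder transformation: record that this component ran."""
    root.set("processed", "true")
    return root

def main(argv):
    """argv mirrors the slide: [Name] [InputXML] [OutputXML] [ErrorXML]."""
    _name, input_path, output_path, error_path = argv
    try:
        tree = ET.parse(input_path)
        transform(tree.getroot())
        tree.write(output_path, encoding="utf-8", xml_declaration=True)
        return 0
    except (ET.ParseError, OSError) as exc:
        error = ET.Element("error")  # hypothetical error document format
        error.text = str(exc)
        ET.ElementTree(error).write(error_path, encoding="utf-8", xml_declaration=True)
        return 1

# Runs only when invoked as: rename.py in.xml out.xml err.xml
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Because every component has the same shape, an executive can run any of them without knowing what happens inside.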
Major Functional Elements - XComponents
• XComponents described in XML form. An XComponent consists of:
  – Documentation
  – Unit Tests (input, output XML stream pairs)
  – Metadata for retrieval
  – Input and Output predicates – declarative (DTD/RelaxNG/Schema) or procedural (code)
Major Functional Elements – XComponent Unit Tester
• Standalone program analogous to JUnit or PyUnit but for XML transformation component testing
• Very outsource-friendly and “inbetweenable” approach (specify everything but the code == spec+doc+test harness all in one)
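A sketch of the input/output-pair style of unit testing, assuming a component is a function from XML string to XML string and that two results count as equal when their canonical forms match. run_unit_tests and sort_children are hypothetical names; ET.canonicalize needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def run_unit_tests(component, cases):
    """Run a component over (input, expected output) pairs; return the failures."""
    failures = []
    for given, expected in cases:
        actual = component(given)
        # Compare canonical forms so <a/> and <a></a> count as the same.
        if ET.canonicalize(actual) != ET.canonicalize(expected):
            failures.append((given, expected, actual))
    return failures

def sort_children(xml_text):
    """Example component under test: order the root's children by tag name."""
    root = ET.fromstring(xml_text)
    root[:] = sorted(root, key=lambda child: child.tag)
    return ET.tostring(root, encoding="unicode")

cases = [
    ("<r><b/><a/></r>", "<r><a/><b/></r>"),
    ("<r/>", "<r></r>"),
]
print(run_unit_tests(sort_children, cases))  # [] means every pair passed
```

The pairs can be written before the component exists, which is the "specify everything but the code" point.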
Major Functional Elements – XPipes
• Described in XML
• Consist of
  – Documentation
  – Input/Output Predicates (Schemas/Code)
  – Test Suite
  – References to XComponents which are resolved when the XPipe is installed
Major Functional Elements – XPipe Executive
• Uniprocessor
  – XPipe executed on 1 machine, possibly with separate threads for each XComponent task
• Multiprocessor
  – XML based protocol to implement “Job Shop” work distribution over a P2P network
Major Functional Elements – XPipe Monitor
Some related open technologies
• | - Unix Pipes
• SAX Filters
• TrAX
• XBeans
• Cocoon
• AxKit
• JXTA
• Translets
• TupleSpaces
Simple XComponent examples
• Fundamental Operation – Rename Element
  – Rename
    • Input : <foo>baz</foo>
    • Output: <bar>baz</bar>
[Diagram: the tree foo/baz becomes the tree bar/baz]
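A sketch of Rename using Python's standard ElementTree; the function name and signature are illustrative, not taken from the XPipe sources.

```python
import xml.etree.ElementTree as ET

def rename(xml_text, old, new):
    """Rename every element called `old` to `new`, leaving content untouched."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter(old)):
        element.tag = new
    return ET.tostring(root, encoding="unicode")

print(rename("<foo>baz</foo>", "foo", "bar"))  # <bar>baz</bar>
```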
Simple XComponent examples
• Fundamental Operation - Peel
  – Input : <foo><bar>baz</bar></foo>
  – Output: <foo>baz</foo>
[Diagram: the tree foo/bar/baz becomes the tree foo/baz]
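A sketch of Peel with ElementTree, assuming "peel" means removing an element's tags while keeping its text and children in place. The helper names are invented; the root element is never peeled, since that could leave a document with no single root.

```python
import xml.etree.ElementTree as ET

def _append_text(parent, index, text):
    """Attach text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text

def peel(xml_text, tag):
    """Remove the start and end tags of `tag` elements, keeping their content."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        index = 0
        while index < len(parent):
            child = parent[index]
            if child.tag != tag:
                index += 1
                continue
            grandchildren = list(child)
            parent.remove(child)
            _append_text(parent, index, child.text)
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(index + offset, grandchild)
            _append_text(parent, index + len(grandchildren), child.tail)
            # index is not advanced, so promoted children are examined next
    return ET.tostring(root, encoding="unicode")

print(peel("<foo><bar>baz</bar></foo>", "bar"))  # <foo>baz</foo>
```

Most of the code is bookkeeping for ElementTree's text/tail model; the operation itself is the remove-and-promote step.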
Simple XComponent examples
• Compound Operation - Matryoshka
  – Input: <foo><bar>baz</bar></foo>
  – Output: <foo></foo><bar></bar>baz
[Diagram: the nested tree foo/bar/baz becomes the flat sequence foo, bar, baz]
Simple XComponent examples
• KlingonCloak
  – Input: <foo><bar>baz</bar></foo>
  – Output: <tag name="foo"><tag name="bar">baz</tag></tag>
[Diagram: the tree foo/bar/baz becomes nested tag elements; the figure labels the attribute type="foo" and type="bar"]
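A sketch of KlingonCloak with ElementTree, following the name attribute used in the slide text. The function name is illustrative; an element that already carries a name attribute would have it overwritten, which a real component would need to handle.

```python
import xml.etree.ElementTree as ET

def klingon_cloak(xml_text):
    """Hide every element name: <foo> becomes <tag name="foo">."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter()):
        element.set("name", element.tag)  # overwrites any existing name attribute
        element.tag = "tag"
    return ET.tostring(root, encoding="unicode")

print(klingon_cloak("<foo><bar>baz</bar></foo>"))
# <tag name="foo"><tag name="bar">baz</tag></tag>
```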
Sample XComponents
• Once you start thinking in terms of pipes, components appear everywhere:
  – Regular fragmentations
  – Doctype changer
  – Namespace normalizer
  – Character set transcoder
  – Hash generator
  – RelaxNG/Schematron etc
• A validator can be thought of as a component in an XPipe that mirrors its input on its output
Validation as an XComponent
[Diagram: an XComponent (RelaxNG, Schematron, Jython/Java/JACL) reads XML A on its input, writes XML A’ on its output, and writes a validation log on its error stream]
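A sketch of a validator shaped like any other component: the input is mirrored unchanged on the output and findings go to the error stream. The single check here (an expected root element name) stands in for a real RelaxNG or Schematron run, and the validation-log format is invented.

```python
import xml.etree.ElementTree as ET

def validate(xml_text, required_root):
    """Return (output, error_log); output is always the untouched input."""
    log = ET.Element("validation-log")  # hypothetical log format
    try:
        root = ET.fromstring(xml_text)
        if root.tag != required_root:
            message = f"expected root {required_root!r}, found {root.tag!r}"
            ET.SubElement(log, "error").text = message
    except ET.ParseError as exc:
        ET.SubElement(log, "error").text = f"not well formed: {exc}"
    return xml_text, ET.tostring(log, encoding="unicode")

output, errors = validate("<memo/>", "letter")
print(output)  # <memo/>
print(errors)
```

Since the output equals the input, a validator can be inserted anywhere in a pipe without disturbing the stages after it.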
The XGrid
• Grid Technologies – computational power “on tap” (http://www.gridforum.org)
• The XGrid – computational power “on tap” to execute XPipes
The XGrid
[Diagram: XML data and an XPipe are submitted through the XGrid interface (XJCL) to the XGrid, which draws on computational power sources]
Some objections (with some answers)
• It will be slow
  – No it won’t. Premature optimization is the root of all evil!
  – Speed is a three headed monster. I’m old enough to have left the X axis and am currently heading for Y through Z
[Diagram: “The 3 Axes to Speed”, with axes Speed of Development, Speed of Execution and Speed of Modification, and plotted points “Me at age 26”, “Me at age 36” and “Me at age 46 (Projected)”]
Some objections (with some answers)
• It will be slow (cont.)
  – Massive parallelism will kill all von Neumann throughput arguments
    • Documents per second, not seconds per document
  – A myriad of “compile time” optimizations on XPipes possible
  – Keep the architecture simple, and speed will sort itself out
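A sketch of the "documents per second" view: the same pipe applied to many documents at once. It uses a thread pool from Python's standard library only to show the shape; pipe and run_many are hypothetical names, and real throughput gains would depend on the executive and on spreading work across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

def pipe(xml_text):
    """Stand-in for a whole XPipe: one document in, one document out."""
    root = ET.fromstring(xml_text)
    root.set("done", "yes")
    return ET.tostring(root, encoding="unicode")

def run_many(documents, workers=4):
    """Push many documents through the same pipe concurrently, keeping order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipe, documents))

results = run_many(f"<doc id='{i}'/>" for i in range(3))
print(len(results))  # 3
```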
Some objections (with some answers)
• Pipes are not rich enough, real data flows require graphs
  – Inside every graph is a collection of straight segments
  – Do the smallest thing that can possibly work
  – XComponents can conditionally flow data in different directions, which gives you a graph
Some objections (with some answers)
• Component based software? Harrumph! We have heard that one before…
  – XPipe is data flow based not API based (COM, VBX, CORBA). The payload is what is important – not the plumbing
  – Information integration (needed on the server side), not application integration (needed on the client side)
Current Status
• Schemas for XPipes and XComponents on xpipe.sourceforge.net.
  – feedback required
• Sample components (Java/XSLT/Jython) and some documentation
• Simple, illustrative XPipe uniprocessor executives
• Draft of XJCL – XGrid Job Control Language
Current Status
• Uniprocessor XPipe used to develop
  – 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec.
  – 20-C pipe for semantic validation of legislation documents
  – XPipe and XComponent validators
Current Problems
• Everybody agrees that an XML document is a tree but:
  – The content and structure of the tree depends on the parser
  – The content and structure of re-generated XML (The round-tripping problem)
Current Problems
• Naming things
  – Taxonomy of XTLs (XML Transformation Languages)
  – Taxonomy of re-usable XComponents and XPipes
Current Problems
• Flexible transformation scheduling is hard
• Optimal transformation scheduling is very hard
• Packaging
Future Plans
• Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of:
  – A transclusion component (entity expansion)
  – A macro pre-processor (conditional marked sections)
  – An attribute decorator (implied/fixed attributes)
  – A grammar checker
  – …
Valid XML
[Diagram: Well Formed XML → Parameter Entity Expansion → Conditional Sections → General Entity Expansion → Attribute Decoration → Grammar Validation → Valid XML]
Future Plans
• XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.)
• Getting the P2P and Grid Technology communities’ input into XGrid.
• Getting help to develop the XPipe reference implementation on Sourceforge
Future Plans
• Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing)
• Use of SCADA tools to develop XPipe process control and monitoring systems
Future Plans
• Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering)
• Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations
In conclusion
• XPipe is simple
• Simplicity works!
• Plenty of evidence outside of XML engineering that this approach will work
• Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach
Thank you
– http://xpipe.sourceforge.net