diaries of a desperate (xml|xproc) hacker. james fuller lead engineer | marklogic
Diaries of a Desperate (XML|XProc) Hacker
James Fuller, Lead Engineer | MarkLogic
Background
• Engineer on MarkLogic API team (History meters, Management API, etc.)
• W3C XML Processing WG (XProc v2.0)
• 2001 started with XML tech (EXSLT), XML Prague, etc.
• Open source contrib.
• Thank you to the organisers of XProc XML London 2015
Agenda
1. XML Hacker Desperation
2. XMLCalabash & depify
3. Show & Tell
4. XProc Hacker Desperation
5. Summary
6. Goto pub
* Raise your hand to ask question
* Yes, I am going to ‘powerpoint’ you
xkcd.com - http://xkcd.com/208/ [xkcd-ref]
The D.P.H.
Email !!!
D.P.H. – a twinkling in SGML eye
• Desperate Perl Hacker
  – Paul Grosso 1997 xml-dev link
  – Google images ‘desperate perl hacker’ link
  – Etymological cousin of ‘Just Another Perl Hacker’ (JAPH) – Randal Schwartz aka Merlin
• What’s it all about?
  – GSD
  – Opaque one-liners (Perl Golf encouraged)
  – Even better if (regex|pipes|sed|awk) involved
  – Challenge: Be able to munge XML with Perl
Desperate XML Hacker
• GAD (Get it All Done) with XML Stack
• ‘clever’ (and|or) ‘clear’
• Highly productive, albeit marooned and anxious on ‘XML island’
• Working with XML means working with documents and that means working with document workflows
All programmers are desperate
emacs
xpath
xslt
xquery
marklogic
xml
emacs
json
java
gradle
ant
bash…..
• Day 1 - transform an XML doc with XSLT
• Day 2 - run transform on set of docs
• Day 3 - generate multiple output formats
• Day 4 - read docs from database
• Day 5 - put results into database
• Day 6 - notify when it's done
• Day 7 - run assertions and validate results
• Day 8 - generate png from svg for each document
• Day 9 - zip up files and upload them (w/ oauth)
• Day 10 - create EPub
• And so forth …
Technology Selection
– XSLT
– XQuery
– Bash scripts
– Makefiles
– Ant
– Java
– All of the above?
Adhoc pipelines
TRANSFORM
GENERATE
PACKAGE
zip
notify
upload
Pipelines manage complexity
• Transformation decomposition is the key to complexity management, just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7+/-2)
  – Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  – Any electrical/chemical engineer
  – Michael A. Jackson
[McGrath2004] Sean McGrath. Performing impossible feats of XML processing with pipelining, Proc XML Open 2004,
• Easy to build, test and reuse
• Segregation of business rules from grammar rules
• Enable group collaboration
Michael Kay Balisage 2009 – ‘You Pull, I’ll Push: on the Polarity of Pipelines’
• ‘the code of each step in the pipeline is kept very simple’
• ‘very easy to assemble an application from a set of components, thus maximizing the potential for component reuse’
• ‘there is no requirement that each step in a pipeline should use the same technology; it's easy to mix XSLT, XQuery, Java and so on in different stages.’
http://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html
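The point about mixing technologies in different stages can be sketched as an XProc 1.0 pipeline, the pipeline language this talk turns to later. This is only an illustrative sketch: the stylesheet name `normalize.xsl`, the `section` element and the `summary` result element are hypothetical, and `p:xquery` is an optional XProc step, so it is assumed here that the processor in use supports it.

```xml
<p:declare-step version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">

  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <!-- Stage 1: XSLT. normalize.xsl is a hypothetical stylesheet. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="normalize.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Stage 2: XQuery over the XSLT result (hypothetical query).
       A computed element constructor keeps the query plain text. -->
  <p:xquery>
    <p:input port="query">
      <p:inline>
        <c:query>element summary { attribute sections { count(//section) } }</c:query>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xquery>

</p:declare-step>
```

Each stage stays small, and either one could be swapped for another technology without touching the other.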
Use all the XML technologies …
Modern XML Tier 1 | Modern XML Tier 2
Core: XML 1.0, Namespaces, XPath 1.0/2.0/3.0 | XML Canonicalization
Transform/Query: XSLT 1.0/2.0/3.0, XQuery 1.0/3.0 | XSLT 1.0/2.0 (in browser)
Processing: SAX, DOM, XProc?, XOM
Other: XML Catalog, XForms
Schema: Schematron, XML Schema 1.0 | RELAX-NG, XML Schema 1.1
Semantics: RDF, OWL | SPARQL, SPARQL Update
Vocabularies*: SVG, ‘Office’ Doc ML…. | MathML, Docbook, DITA, XHTML
- Amended from XML Amsterdam 2012 Keynote
XML – The Good Parts
Dependency Adoption (technology selection)
Helter skelter
Dependency Adoption
Helter skelter
http://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Helter_skelter.jpg/440px-Helter_skelter.jpg
It's more like this
The right Tool
Obligatory Jedi slide
But it works!
Java and XML
xml:Father – “XML gives Java something to do.”
• XML, Java, and the future of the Web 1997, Jon Bosak - http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm
• SAX, DOM
• Unicode support
• Distributed
• Caring and feeding of Java VM
• Invoke abstraction (classpath, jar fun)
Do Java and XML work better together?
Not enough time
Desire to be Productive
10x programmers is not a myth
• Augustine, N. R. 1979. "Augustine’s Laws and Major System Development Programs." Defense Systems Management Review: 50-76.
• Boehm, Barry W., and Philip N. Papaccio. 1988. "Understanding and Controlling Software Costs." IEEE Transactions on Software Engineering SE-14, no. 10 (October): 1462-77.
• Boehm, Barry, et al, 2000. Software Cost Estimation with Cocomo II, Boston, Mass.: Addison Wesley, 2000.
• Boehm, Barry W., T. E. Gray, and T. Seewaldt. 1984. "Prototyping Versus Specifying: A Multiproject Experiment." IEEE Transactions on Software Engineering SE-10, no. 3 (May): 290-303. Also in Jones 1986b.
• Card, David N. 1987. "A Software Technology Evaluation Program." Information and Software Technology 29, no. 6 (July/August): 291-300.
• Curtis, Bill. 1981. "Substantiating Programmer Variability." Proceedings of the IEEE 69, no. 7: 846.
• Curtis, Bill, et al. 1986. "Software Psychology: The Need for an Interdisciplinary Program." Proceedings of the IEEE 74, no. 8: 1092-1106.
• DeMarco, Tom, and Timothy Lister. 1985. "Programmer Performance and the Effects of the Workplace." Proceedings of the 8th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society Press, 268-72.
• DeMarco, Tom and Timothy Lister, 1999. Peopleware: Productive Projects and Teams, 2d Ed. New York: Dorset House, 1999.
• Mills, Harlan D. 1983. Software Productivity. Boston, Mass.: Little, Brown.
• Sackman, H., W.J. Erikson, and E. E. Grant. 1968. "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Communications of the ACM 11, no. 1 (January): 3-11.
• Valett, J., and F. E. McGarry. 1989. "A Summary of Software Measurement Experiences in the Software Engineering Laboratory." Journal of Systems and Software 9, no. 2 (February): 137-48.
• Weinberg, Gerald M., and Edward L. Schulman. 1974. "Goals and Performance in Computer Programming." Human Factors 16, no. 1 (February): 70-77.
Except when it is a myth
• technical debt
  – Maintainable/Upgrade
  – Add new features
  – Enterprise requirements
• more bugs
• brittle code
Upfront design
Technology selection
Balancing trade-offs to achieve sum gain
reflection
• Desperate people do desperate things
  – Use all the XML technologies
  – Dependency adoption
  – Not the right tool
  – Not enough time
  – Being productive
avoid being a D.X.H.
• Careful technology selection
• Manage your dependencies
• Avoid distributing logic up/down/across tech stack (hint: don’t use bash, makefiles, ant, etc.)
• Simplify interaction with Java (VM)
• Model pipelines (hint: XProc)
avoid being a D.X.H.
• Use XProc (XMLCalabash)
  – XProc is designed for XML processing pipelines
  – Extensible
  – Simplify and aggregate logic
• Use XProc extension steps (depify)
  – XProc w/o extension steps is half of XProc
  – Provide façade over other technologies
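As a rough illustration of what “Day 1” and “Day 2” of the earlier list (transform one document, then a set of documents) might look like when modelled in XProc 1.0 rather than in bash or Ant, consider the sketch below. The `docs/` and `out/` directories and the stylesheet `to-html.xsl` are hypothetical names chosen for the example, and relative paths are assumed to resolve against the location of the pipeline document.

```xml
<p:declare-step version="1.0" name="main"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">

  <!-- No output port is declared: every result is written by p:store. -->

  <!-- List the XML files in the (hypothetical) docs/ directory. -->
  <p:directory-list path="docs/" include-filter=".*\.xml"/>

  <!-- Run the same steps once per c:file entry in the listing. -->
  <p:for-each>
    <p:iteration-source select="/c:directory/c:file"/>
    <p:variable name="name" select="/c:file/@name"/>

    <!-- Load the document named by the current entry. -->
    <p:load>
      <p:with-option name="href" select="concat('docs/', $name)"/>
    </p:load>

    <!-- Transform it with a hypothetical stylesheet. -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="to-html.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>

    <!-- Write the result next to the others in out/. -->
    <p:store>
      <p:with-option name="href" select="concat('out/', $name, '.html')"/>
    </p:store>
  </p:for-each>

</p:declare-step>
```

The main loop, the file handling and the transform all live in one place, which is the “simplify and aggregate logic” point made above. Later days (validation, packaging, upload) would be further steps in the same pipeline, some of them extension steps.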
We use pipelines
• John Lumley – worked with DITA OT
• Sandro Cirulli – workflow (pull scm, push db, process)
• Nic Gibson – conversion workflows
• Philip Fearon – types of workflows (seq and concurrent) with XMLFlow
• Andrew Sales – schematron on word docs (used Ant)
• ….
• most talks mentioned workflow/pipeline
  – ~100 mentions in proceedings
  – guesstimate ~6 mentions per hour during the talks
Desperate XProc Hacker
• XProc learning curve
  – v1.0 verbose in places
  – XProc generic by design
  – Some ‘Batteries not included’
• XProc v2.0 addresses this
  – Simplify connecting steps
  – Simplify parameters (maps)
  – Flow control
  – Metadata
  – Anything ‘flows’
  – avt/tvt
  – Syntactic optimisations
• depify provides a way to distribute and reuse extension steps
beats the problems that arise using the ‘hairball’ approach
XMLCalabash & depify
• XMLCalabash – XProc processor
  – Norm Walsh
  – http://xmlcalabash.com/
• depify – XProc dependency management
  – http://depify.com/
XMLCalabash extension steps
package com.example.library;

import com.xmlcalabash.library.DefaultStep;
… elided …
import com.xmlcalabash.runtime.XAtomicStep;

@XMLCalabash(
    name = "ex:hello-world",
    type = "{http://example.org/xmlcalabash/steps}hello-world")
public class HelloWorld extends DefaultStep {
    private WritablePipe result = null;

    public HelloWorld(XProcRuntime runtime, XAtomicStep step) {
        super(runtime, step);
    }

    public void setOutput(String port, WritablePipe pipe) {
        result = pipe;
    }

    public void reset() {
        result.resetWriter();
    }

    public void run() throws SaxonApiException {
        super.run();
        … elided …
        tree.addText("Hello World");
        … elided …
        result.write(tree.getResult());
    }
}
<p:library version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:ex="http://example.org/xmlcalabash/steps">
  <p:declare-step type="ex:hello-world">
    <p:output port="result"/>
  </p:declare-step>
</p:library>
Library for the step
M Filemode     Length  Date         Time      File
- ----------  -------  -----------  --------  -----------------------------------------------------
  drwxr-xr-x        0   8-Mar-2015  10:43:38  META-INF/
  -rw-r--r--      843   8-Mar-2015  10:43:38  META-INF/MANIFEST.MF
  drwxr-xr-x        0   8-Mar-2015  10:43:38  com/
  drwxr-xr-x        0   8-Mar-2015  10:43:38  com/example/
  drwxr-xr-x        0   8-Mar-2015  10:43:38  com/example/library/
  -rw-r--r--     2062   8-Mar-2015  10:43:38  com/example/library/HelloWorld.class
  drwxr-xr-x        0   8-Mar-2015  10:43:38  META-INF/annotations/
  -rw-r--r--       31   8-Mar-2015  10:43:38  META-INF/annotations/com.xmlcalabash.core.XMLCalabash
  -rw-r--r--      294  19-Feb-2015  15:41:00  example-library.xpl
- ----------  -------  -----------  --------  -----------------------------------------------------
                 3230                         9 files
library xpl included in jar
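A pipeline that calls the step might look like the following sketch. It assumes the jar is on XMLCalabash's classpath and that `example-library.xpl` can be resolved from the `href` given here; how that URI is actually resolved depends on the setup, so the relative `href` is an assumption made for the example.

```xml
<p:declare-step version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:ex="http://example.org/xmlcalabash/steps">

  <p:output port="result"/>

  <!-- Import the library that declares ex:hello-world.
       The href is an assumption; adjust it to wherever the library resolves. -->
  <p:import href="example-library.xpl"/>

  <!-- Call the extension step. Its result port feeds the pipeline output. -->
  <ex:hello-world/>

</p:declare-step>
```

From the pipeline author's point of view the Java class is invisible: the step is used like any built-in step, which is the façade idea behind extension steps.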
• depify.com
• depify client
• depify github
depify
• Usage of XMLCalabash
• Usage of depify
• Develop your own step
• Distribute with depify
depify future
• Gradle plugin
• Depify into other repos to enable day zero bootstrap (w/ yum, etc.)
• Integration (expath package management)
• More steps
• More steps
• More steps
Summary
• XProc extension steps provide reuse
• XProc v2.0 lets you work in broader context
• Pipelines manage complexity
• depify specifically built for XProc (XMLCalabash)
• Reuse with existing mechanisms (ex. Maven)
How to Become a Delighted XProc Hacker
• model pipelines with XProc (XMLCalabash)
• try out ext steps (depify)
• GSD
• reuse and distribute new steps (depify)
• goto pub
• Stop using bash, makefiles, ant or bending XML tech to control main loop
• Stop making adhoc pipelines
<pub/>
Thank you for your attention and time. Questions?