diaries of a desperate (xml|xproc) hacker. james fuller lead engineer | marklogic

Post on 30-Dec-2015

Diaries of a Desperate (XML|XProc) Hacker

James Fuller, Lead Engineer | MarkLogic

Background

• Engineer on MarkLogic API team (History meters, Management API, etc.)
• W3C XML Processing WG (XProc v2.0)
• 2001 started with XML tech (EXSLT), XML Prague, etc.
• Open source contrib.

• Thank you to the organisers of XProc XML London 2015

Agenda

1. XML Hacker Desperation
2. XMLCalabash & depify
3. Show & Tell
4. XProc Hacker Desperation
5. Summary
6. Goto pub

* Raise your hand to ask a question

* Yes, I am going to ‘powerpoint’ you

xkcd.com - http://xkcd.com/208/ [xkcd-ref]

The D.P.H.

Email !!!

D.P.H. – a twinkling in SGML eye

• Desperate Perl Hacker
– Paul Grosso 1997 xml-dev link
– Google images ‘desperate perl hacker’ link
– Etymological cousin of ‘Just Another Perl Hacker’ (JAPH) – Randal Schwartz aka Merlin
• What’s it all about?
– GSD
– Opaque one-liners (Perl Golf encouraged)
– Even better if (regex|pipes|sed|awk) involved
– Challenge: be able to munge XML with Perl

Desperate XML Hacker

• GAD (Get it All Done) with XML Stack
• ‘clever’ (and|or) ‘clear’
• Highly productive, albeit marooned and anxious on ‘XML island’
• Working with XML means working with documents, and that means working with document workflows

All programmers are desperate

emacs, xpath, xslt, xquery, marklogic, xml, json, java, gradle, ant, bash …

• Day 1 - transform an XML doc with XSLT
• Day 2 - run transform on set of docs
• Day 3 - generate multiple output formats
• Day 4 - read docs from database
• Day 5 - put results into database
• Day 6 - notify when it's done
• Day 7 - run assertions and validate results
• Day 8 - generate png from svg for each document
• Day 9 - zip up files and upload them (w/ oauth)
• Day 10 - create EPub
• And so forth …
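To make the first two days concrete, here is a minimal sketch of how Day 1 and Day 2 might look when driven from Java with the JAXP API that ships with the JDK. The class name `DayOneTransform`, the stylesheet, and the sample documents are all hypothetical and chosen only for illustration; they are not taken from the talk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; relies only on the JAXP API bundled with the JDK.
final class DayOneTransform {

    // Illustrative stylesheet (an assumption): emits the text of /doc/title.
    static final String TITLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"/doc/title\"/></xsl:template>"
            + "</xsl:stylesheet>";

    // Compile the stylesheet once so it can be reused across documents.
    static Templates compile(String xslt) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Day 1: transform a single XML document held in a string.
    static String transform(Templates stylesheet, String xml) {
        try {
            StringWriter out = new StringWriter();
            stylesheet.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    // Day 2: run the same transform over a set of documents.
    static List<String> transformAll(Templates stylesheet, List<String> docs) {
        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            results.add(transform(stylesheet, doc));
        }
        return results;
    }

    public static void main(String[] args) {
        Templates stylesheet = compile(TITLE_XSLT);
        List<String> docs = List.of(
                "<doc><title>First</title></doc>",
                "<doc><title>Second</title></doc>");
        for (String result : transformAll(stylesheet, docs)) {
            System.out.println(result);
        }
    }
}
```

Every later day (databases, notification, zipping, upload) would add another block of glue like this, which is how the logic ends up scattered across tools.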

Technology Selection
– XSLT
– XQuery
– Bash scripts
– Makefiles
– Ant
– Java
– All of the above?

Adhoc pipelines

[Diagram: TRANSFORM, GENERATE, PACKAGE; zip, notify, upload]

Pipelines manage complexity

• Transformation decomposition is the key to complexity management, just ask:
– Henry Ford
– Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
– George Miller (7+/-2)
– Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
– Any electrical/chemical engineer
– Michael A. Jackson

[McGrath2004] Sean McGrath. Performing impossible feats of XML processing with pipelining, Proc XML Open 2004.

• Easy to build, test and reuse
• Segregation of business rules from grammar rules
• Enable group collaboration

Michael Kay Balisage 2009 – ‘You Pull, I’ll Push: on the Polarity of Pipelines’

• ‘the code of each step in the pipeline is kept very simple’

• ‘very easy to assemble an application from a set of components, thus maximizing the potential for component reuse’

• ‘there is no requirement that each step in a pipeline should use the same technology; it's easy to mix XSLT, XQuery, Java and so on in different stages.’

http://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html
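The decomposition argument can be sketched in a few lines of Java: each step is a small function, and the pipeline is nothing more than their composition. This is only an illustration of the idea, not XProc and not any library's API; the class `PipelineSketch` and the three toy steps (named after the TRANSFORM, GENERATE and PACKAGE stages) are hypothetical stand-ins for real work.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration: steps are small functions, a pipeline is their composition.
final class PipelineSketch {

    // Chains steps in order: the output of each step is the input of the next.
    static <T> UnaryOperator<T> pipeline(List<UnaryOperator<T>> steps) {
        return input -> {
            T current = input;
            for (UnaryOperator<T> step : steps) {
                current = step.apply(current);
            }
            return current;
        };
    }

    // Toy steps (assumptions): each does one simple, independently testable thing.
    static final UnaryOperator<String> TRANSFORM = String::trim;
    static final UnaryOperator<String> GENERATE = doc -> "<html>" + doc + "</html>";
    static final UnaryOperator<String> PACKAGE = doc -> "zip[" + doc + "]";

    // Runs the three toy steps over one document.
    static String run(String document) {
        return pipeline(List.of(TRANSFORM, GENERATE, PACKAGE)).apply(document);
    }

    public static void main(String[] args) {
        System.out.println(run("  doc  "));
    }
}
```

Because every step has the same shape, steps can be reordered, replaced or reused without touching their neighbours, which is the property the quotes above describe.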

Use all the XML technologies …

Category | Modern XML Tier 1 | Modern XML Tier 2
Core | XML 1.0, Namespaces, XPath 1.0/2.0/3.0 | XML Canonicalization
Transform/Query | XSLT 1.0/2.0/3.0, XQuery 1.0/3.0 | XSLT 1.0/2.0 (in browser)
Processing | SAX, DOM | XProc?, XOM
Other | XML Catalog | XForms
Schema | Schematron, XML Schema 1.0 | RELAX-NG, XML Schema 1.1
Semantics | RDF, OWL | SPARQL, SPARQL Update
Vocabularies* | SVG, ‘Office’ Doc ML, … | MathML, Docbook, DITA, XHTML

- Amended from XML Amsterdam 2012 Keynote

XML – The Good Parts

Dependency Adoption (technology selection)

Helter skelter

Dependency Adoption

Helter skelter (image: http://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Helter_skelter.jpg/440px-Helter_skelter.jpg)

It's more like this

The right Tool

Obligatory Jedi slide

But it works!

Java and XML

xml:Father - “XML gives Java something to do.”

• XML, Java, and the future of the Web 1997, Jon Bosak - http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm

• SAX, DOM
• Unicode support
• Distributed
• Caring and feeding of Java VM
• Invoke abstraction (classpath, jar fun)

Do Java and XML work better together?

Not enough time

Desire to be Productive

10x programmers is not a myth
• Augustine, N. R. 1979. "Augustine’s Laws and Major System Development Programs." Defense Systems Management Review: 50-76.
• Boehm, Barry W., and Philip N. Papaccio. 1988. "Understanding and Controlling Software Costs." IEEE Transactions on Software Engineering SE-14, no. 10 (October): 1462-77.
• Boehm, Barry, et al, 2000. Software Cost Estimation with Cocomo II, Boston, Mass.: Addison Wesley, 2000.
• Boehm, Barry W., T. E. Gray, and T. Seewaldt. 1984. "Prototyping Versus Specifying: A Multiproject Experiment." IEEE Transactions on Software Engineering SE-10, no. 3 (May): 290-303. Also in Jones 1986b.
• Card, David N. 1987. "A Software Technology Evaluation Program." Information and Software Technology 29, no. 6 (July/August): 291-300.
• Curtis, Bill. 1981. "Substantiating Programmer Variability." Proceedings of the IEEE 69, no. 7: 846.
• Curtis, Bill, et al. 1986. "Software Psychology: The Need for an Interdisciplinary Program." Proceedings of the IEEE 74, no. 8: 1092-1106.
• DeMarco, Tom, and Timothy Lister. 1985. "Programmer Performance and the Effects of the Workplace." Proceedings of the 8th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society Press, 268-72.
• DeMarco, Tom and Timothy Lister, 1999. Peopleware: Productive Projects and Teams, 2d Ed. New York: Dorset House, 1999.
• Mills, Harlan D. 1983. Software Productivity. Boston, Mass.: Little, Brown.
• Sackman, H., W.J. Erikson, and E. E. Grant. 1968. "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Communications of the ACM 11, no. 1 (January): 3-11.
• Valett, J., and F. E. McGarry. 1989. "A Summary of Software Measurement Experiences in the Software Engineering Laboratory." Journal of Systems and Software 9, no. 2 (February): 137-48.
• Weinberg, Gerald M., and Edward L. Schulman. 1974. "Goals and Performance in Computer Programming." Human Factors 16, no. 1 (February): 70-77.

Except when it is a myth

• technical debt
– Maintainable/Upgrade
– Add new features
– Enterprise requirements
• more bugs
• brittle code

Upfront design
Technology selection
Balancing trade-offs to achieve sum gain

reflection

• Desperate people do desperate things
– Use all the XML technologies
– Dependency adoption
– Not the right tool
– Not enough time
– Being productive

avoid being a D.X.H.

• Careful technology selection
• Manage your dependencies
• Avoid distributing logic up/down/across tech stack (hint: don’t use bash, makefiles, ant, etc.)
• Simplify interaction with Java (VM)
• Model pipelines (hint: XProc)

avoid being a D.X.H.

• Use XProc (XMLCalabash)
– XProc is designed for XML processing pipelines
– Extensible
– Simplify and aggregate logic
• Use XProc extension steps (depify)
– XProc w/o extension steps is half of XProc
– Provide façade over other technologies

We use pipelines
• John Lumley – worked with DITA OT
• Sandro Cirulli – workflow (pull scm, push db, process)
• Nic Gibson – conversion workflows
• Philip Fearon – types of workflows (seq and concurrent) with XMLFlow
• Andrew Sales – schematron on word docs (used Ant)
• ….
• most talks mentioned workflow/pipeline
– ~100 mentions in proceedings
– guesstimate ~6 mentions per hour during the talks

Desperate XProc Hacker
• XProc learning curve
– v1.0 verbose in places
– XProc generic by design
– Some ‘Batteries not included’
• XProc v2.0 addresses this
– Simplify connecting steps
– Simplify parameters (maps)
– Flow control
– Metadata
– Anything ‘flows’
– avt/tvt
– Syntactic optimisations
• depify provides a way to distribute and reuse extension steps
– beats the problems that arise using the ‘hairball’ approach

XMLCalabash & depify

• XMLCalabash – XProc processor
– Norm Walsh
– http://xmlcalabash.com/
• depify – XProc dependency management
– http://depify.com/

XMLCalabash extension steps

package com.example.library;

import com.xmlcalabash.library.DefaultStep;
… elided …
import com.xmlcalabash.runtime.XAtomicStep;

// The annotation registers the class as the implementation of the ex:hello-world step.
@XMLCalabash(
    name = "ex:hello-world",
    type = "{http://example.org/xmlcalabash/steps}hello-world")
public class HelloWorld extends DefaultStep {
    private WritablePipe result = null;

    public HelloWorld(XProcRuntime runtime, XAtomicStep step) {
        super(runtime, step);
    }

    // Keep a reference to the pipe bound to the step's output port.
    public void setOutput(String port, WritablePipe pipe) {
        result = pipe;
    }

    public void reset() {
        result.resetWriter();
    }

    public void run() throws SaxonApiException {
        super.run();
        … elided …
        tree.addText("Hello World");
        … elided …
        // Write the built document to the output port.
        result.write(tree.getResult());
    }
}

<p:library version="1.0"
           xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:c="http://www.w3.org/ns/xproc-step"
           xmlns:ex="http://example.org/xmlcalabash/steps">
  <p:declare-step type="ex:hello-world">
    <p:output port="result"/>
  </p:declare-step>
</p:library>

Library for the step

M Filemode    Length  Date         Time      File
- ----------  ------  -----------  --------  -----------------------------------------------------
  drwxr-xr-x       0   8-Mar-2015  10:43:38  META-INF/
  -rw-r--r--     843   8-Mar-2015  10:43:38  META-INF/MANIFEST.MF
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/example/
  drwxr-xr-x       0   8-Mar-2015  10:43:38  com/example/library/
  -rw-r--r--    2062   8-Mar-2015  10:43:38  com/example/library/HelloWorld.class
  drwxr-xr-x       0   8-Mar-2015  10:43:38  META-INF/annotations/
  -rw-r--r--      31   8-Mar-2015  10:43:38  META-INF/annotations/com.xmlcalabash.core.XMLCalabash
  -rw-r--r--     294  19-Feb-2015  15:41:00  example-library.xpl
- ----------  ------  -----------  --------  -----------------------------------------------------
                3230                         9 files

library xpl included in jar

• depify.com

• depify client

• depify github

depify

• Usage of XMLCalabash
• Usage of depify
• Develop your own step
• Distribute with depify

depify future

• Gradle plugin
• Depify into other repos to enable day zero bootstrap (w/ yum, etc.)
• Integration (expath package management)
• More steps
• More steps
• More steps

Summary

• XProc extension steps provide reuse
• XProc v2.0 lets you work in broader context
• Pipelines manage complexity
• depify specifically built for XProc (XMLCalabash)
• Reuse with existing mechanisms (ex. Maven)

How to Become a Delighted XProc Hacker

• model pipelines with XProc (XMLCalabash)
• try out ext steps (depify)
• GSD
• reuse and distribute new steps (depify)
• goto pub

• Stop using bash, makefiles, ant or bending XML tech to control main loop

• Stop making adhoc pipelines

<pub/>

Thank you for your attention and time. Questions?
