Diaries of a Desperate (XML|XProc) Hacker

Post on 30-Dec-2015

Page 1: Diaries of a Desperate (XML|XProc) Hacker. James Fuller Lead Engineer | MarkLogic

Diaries of a Desperate (XML|XProc) Hacker

Page 2

Diaries of a Desperate (XML|XProc) Hacker

James Fuller, Lead Engineer | MarkLogic

Page 3

Background

• Engineer on MarkLogic API team (History meters, Management API, etc…)

• W3C XML Processing WG (XProc v2.0)

• 2001 started with XML tech (EXSLT), XML Prague, etc…

• Open source contrib.

• Thank you to the organisers of XProc XML London 2015

Page 4

Agenda

1. XML Hacker Desperation
2. XMLCalabash & depify
3. Show & Tell
4. XProc Hacker Desperation
5. Summary
6. Goto pub

* Raise your hand to ask a question

* Yes, I am going to ‘powerpoint’ you

Page 5

xkcd.com - http://xkcd.com/208/ [xkcd-ref]

The D.P.H.

Email !!!

Page 6

D.P.H. – a twinkling in SGML eye

• Desperate Perl Hacker
  – Paul Grosso 1997 xml-dev link
  – Google images ‘desperate perl hacker’ link
  – Etymological cousin of ‘Just Another Perl Hacker’ (JAPH) – Randal Schwartz aka Merlin
• What’s it all about ?
  – GSD
  – Opaque One liners (Perl Golf encouraged)
  – Even better if (regex|pipes|sed|awk) involved
  – Challenge: Be able to munge XML with Perl

Page 7

Desperate XML Hacker

• GAD (Get it All Done) with XML Stack
• ‘clever’ (and|or) ‘clear’
• Highly productive, albeit marooned and anxious on ‘XML island’
• Working with xml means working with documents and that means working with document workflows

Page 8

All programmers are desperate

emacs

xpath

xslt

xquery

marklogic

xml

emacs

json

java

gradle

ant

bash…..

Page 9

• Day 1 - transform an xml doc with XSLT
• Day 2 - run transform on set of docs
• Day 3 - generate multiple output formats
• Day 4 - read docs from database
• Day 5 - put results into database
• Day 6 - notify when it's done
• Day 7 - run assertions and validate results
• Day 8 - generate png from svg for each document
• Day 9 - zip up files and upload them (w/ oauth)
• Day 10 - create EPub
• And so forth …
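
Days 1 and 2 can be sketched in plain Java with the JDK's built-in JAXP API (javax.xml.transform). This is a minimal sketch, not the speaker's code: the stylesheet, the sample documents and the class name BatchTransformSketch are hypothetical, and the documents are held in memory instead of being read from disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BatchTransformSketch {
    // Hypothetical stylesheet: turns <doc><title>..</title></doc> into <h1>..</h1>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'><h1><xsl:value-of select='title'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    /** Day 2: compile the stylesheet once, then apply it to every document. */
    static List<String> transformAll(List<String> docs) {
        try {
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
            List<String> results = new ArrayList<>();
            for (String doc : docs) {
                // Day 1: a single transform of one document.
                Transformer transformer = compiled.newTransformer();
                StringWriter out = new StringWriter();
                transformer.transform(
                    new StreamSource(new StringReader(doc)), new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> results = transformAll(Arrays.asList(
            "<doc><title>Day 1</title></doc>",
            "<doc><title>Day 2</title></doc>"));
        results.forEach(System.out::println);
    }
}
```

Everything after Day 2 (database I/O, notification, packaging, upload) would have to be added around this loop by hand, which is where the ad hoc glue tends to accumulate.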

Page 10

Technology Selection
– XSLT
– XQuery
– Bash scripts
– Makefiles
– Ant
– Java

– All of the above ?

Page 11

Adhoc pipelines

TRANSFORM

GENERATE

PACKAGE

zip

notify

upload

Page 12

Pipelines manage complexity

• Transformation decomposition is the key to complexity management, just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7+/-2)
  – Adam Smith (An Inquiry into the Nature And Causes of the Wealth of Nations, 1776)
  – Any electrical/chemical engineer
  – Michael A. Jackson

[McGrath2004] Sean McGrath. Performing impossible feats of XML processing with pipelining, Proc XML Open 2004.

• Easy to build, test and reuse
• Segregation of business rules from grammar rules
• Enable group collaboration
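
As a rough illustration of decomposition, the sketch below chains small single-purpose steps into one pipeline using java.util.function.Function from the JDK. The step names (transform, generate, pack) echo the ad hoc pipeline slide, but their bodies are placeholders invented for this example; a real XML pipeline would pass documents, not strings, and XProc expresses the same idea declaratively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class PipelineSketch {
    /** Chains steps left to right; each step sees only the previous step's output. */
    static <T> Function<T, T> pipeline(List<Function<T, T>> steps) {
        Function<T, T> composed = Function.identity();
        for (Function<T, T> step : steps) {
            composed = composed.andThen(step);
        }
        return composed;
    }

    // Hypothetical steps: each one is trivial and testable on its own.
    static String transform(String doc) { return doc.trim().toUpperCase(Locale.ROOT); }
    static String generate(String doc)  { return "<html>" + doc + "</html>"; }
    static String pack(String doc)      { return "zip(" + doc + ")"; }

    /** Assembles the three placeholder steps into one reusable pipeline. */
    static Function<String, String> defaultPipeline() {
        List<Function<String, String>> steps = new ArrayList<>();
        steps.add(PipelineSketch::transform);
        steps.add(PipelineSketch::generate);
        steps.add(PipelineSketch::pack);
        return pipeline(steps);
    }

    public static void main(String[] args) {
        System.out.println(defaultPipeline().apply("  report  "));
        // prints: zip(<html>REPORT</html>)
    }
}
```

Because each step is an ordinary function, it can be swapped, reordered or tested in isolation without touching its neighbours.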

Page 13

Michael Kay Balisage 2009 – ‘You Pull, I’ll Push: on the Polarity of Pipelines’

• ‘the code of each step in the pipeline is kept very simple’

• ‘very easy to assemble an application from a set of components, thus maximizing the potential for component reuse’

• ‘there is no requirement that each step in a pipeline should use the same technology; it's easy to mix XSLT, XQuery, Java and so on in different stages.’

http://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html

Page 14
Page 15

Use all the XML technologies …

Page 16

Category        | Modern XML Tier 1                       | Modern XML Tier 2
Core            | XML 1.0, Namespaces, XPATH 1.0/2.0/3.0  | XML Canonicalization
Transform/Query | XSLT 1.0/2.0/3.0, XQuery 1.0/3.0        | XSLT 1.0/2.0 (in browser)
Processing      | SAX, DOM                                | XProc?, XOM
Other           | XML Catalog                             | XForms
Schema          | Schematron, XML Schema 1.0              | RELAX-NG, XML Schema 1.1
Semantics       | RDF, OWL                                | SPARQL, SPARQL Update
Vocabularies*   | SVG, ‘Office’ Doc ML….                  | MathML, Docbook, DITA, XHTML

- Amended from XML Amsterdam 2012 Keynote

XML – The Good Parts

Page 17

Dependency Adoption (technology selection)

Page 18

Helter skelter

Dependency Adoption

Page 19

Helter skelter (image: http://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Helter_skelter.jpg/440px-Helter_skelter.jpg)

It's more like this

Page 20

The right Tool

Page 21

Obligatory Jedi slide

Page 22

But it works!

Page 23

Java and XML

Page 24

xml:Father - “XML gives Java something to do.”

• XML, Java, and the future of the Web 1997, Jon Bosak - http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm

• SAX, DOM
• Unicode support
• Distributed

• Caring and feeding of Java VM
• Invoke abstraction (classpath, jar fun)

Page 25

Do Java and XML work better together?

Page 26

Not enough time

Page 27

Not enough time

Page 28

Desire to be Productive

Page 29

10x programmers is not a myth

• Augustine, N. R. 1979. "Augustine’s Laws and Major System Development Programs." Defense Systems Management Review: 50-76.
• Boehm, Barry W., and Philip N. Papaccio. 1988. "Understanding and Controlling Software Costs." IEEE Transactions on Software Engineering SE-14, no. 10 (October): 1462-77.
• Boehm, Barry, et al, 2000. Software Cost Estimation with Cocomo II, Boston, Mass.: Addison Wesley, 2000.
• Boehm, Barry W., T. E. Gray, and T. Seewaldt. 1984. "Prototyping Versus Specifying: A Multiproject Experiment." IEEE Transactions on Software Engineering SE-10, no. 3 (May): 290-303. Also in Jones 1986b.
• Card, David N. 1987. "A Software Technology Evaluation Program." Information and Software Technology 29, no. 6 (July/August): 291-300.
• Curtis, Bill. 1981. "Substantiating Programmer Variability." Proceedings of the IEEE 69, no. 7: 846.
• Curtis, Bill, et al. 1986. "Software Psychology: The Need for an Interdisciplinary Program." Proceedings of the IEEE 74, no. 8: 1092-1106.
• DeMarco, Tom, and Timothy Lister. 1985. "Programmer Performance and the Effects of the Workplace." Proceedings of the 8th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society Press, 268-72.
• DeMarco, Tom and Timothy Lister, 1999. Peopleware: Productive Projects and Teams, 2d Ed. New York: Dorset House, 1999.
• Mills, Harlan D. 1983. Software Productivity. Boston, Mass.: Little, Brown.
• Sackman, H., W.J. Erikson, and E. E. Grant. 1968. "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Communications of the ACM 11, no. 1 (January): 3-11.
• Valett, J., and F. E. McGarry. 1989. "A Summary of Software Measurement Experiences in the Software Engineering Laboratory." Journal of Systems and Software 9, no. 2 (February): 137-48.
• Weinberg, Gerald M., and Edward L. Schulman. 1974. "Goals and Performance in Computer Programming." Human Factors 16, no. 1 (February): 70-77.

Page 30

Except when it is a myth

• technical debt
  – Maintainable/Upgrade
  – Add new features
  – Enterprise requirements
• more bugs
• brittle code

Upfront design
Technology selection
Balancing trade-offs to achieve sum gain

Page 31

reflection

• Desperate people do desperate things
  – Use all the XML technologies
  – Dependency adoption
  – Not the right tool
  – Not enough time
  – Being productive

Page 32

avoid being a D.X.H.

• Careful technology selection
• Manage your dependencies
• Avoid distributing logic up/down/across tech stack (hint: don’t use bash, makefiles, ant, etc)
• Simplify interaction with Java (VM)
• Model pipelines (hint: XProc)

Page 33

avoid being a D.X.H.

• Use XProc (XMLCalabash)
  – XProc is designed for XML processing pipelines
  – Extensible
  – Simplify and aggregate logic
• Use XProc extension steps (depify)
  – XProc w/o extension steps is half of XProc
  – Provide façade over other technologies

Page 34

We use pipelines
• John Lumley – worked with DITA OT
• Sandro Cirulli - workflow (pull scm, push db, process)
• Nic Gibson – conversion workflows
• Philip Fearon - types of workflows (seq and concurrent) with XMLFlow
• Andrew Sales – schematron on word docs (used Ant)
• ….
• most talks mentioned workflow/pipeline
  – ~100 mentions in proceedings
  – guesstimate ~6 mentions per hour during the talks

Page 35

Desperate XProc Hacker
• XProc learning curve
  – v1.0 verbose in places
  – XProc generic by design
  – Some ‘Batteries not included’
• XProc v2.0 addresses this
  – Simplify connecting steps
  – Simplify parameters (maps)
  – Flow control
  – Metadata
  – Anything ‘flows’
  – avt/tvt
  – Syntactic optimisations
• depify provides a way to distribute and reuse extension steps and beats the problems that arise using the ‘hairball’ approach

Page 36

XMLCalabash & depify

• XMLCalabash – XProc processor
  – Norm Walsh
  – http://xmlcalabash.com/
• depify – XProc dependency management
  – http://depify.com/

Page 37

XMLCalabash extension steps

Page 38

package com.example.library;

import com.xmlcalabash.library.DefaultStep;
… elided …
import com.xmlcalabash.runtime.XAtomicStep;

// The annotation binds this class to the step type declared in the XProc library.
@XMLCalabash(
    name = "ex:hello-world",
    type = "{http://example.org/xmlcalabash/steps}hello-world")
public class HelloWorld extends DefaultStep {
    private WritablePipe result = null;

    public HelloWorld(XProcRuntime runtime, XAtomicStep step) {
        super(runtime, step);
    }

    // Called by the runtime to hand the step its output port.
    public void setOutput(String port, WritablePipe pipe) {
        result = pipe;
    }

    public void reset() {
        result.resetWriter();
    }

    public void run() throws SaxonApiException {
        super.run();
        … elided …
        tree.addText("Hello World");
        … elided …
        result.write(tree.getResult());
    }
}

Page 39

<p:library version="1.0"
           xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:c="http://www.w3.org/ns/xproc-step"
           xmlns:ex="http://example.org/xmlcalabash/steps">
  <p:declare-step type="ex:hello-world">
    <p:output port="result"/>
  </p:declare-step>
</p:library>

Library for the step

Page 40

M Filemode      Length  Date         Time      File
- ----------  --------  -----------  --------  -----------------------------------------------------
  drwxr-xr-x         0   8-Mar-2015  10:43:38  META-INF/
  -rw-r--r--       843   8-Mar-2015  10:43:38  META-INF/MANIFEST.MF
  drwxr-xr-x         0   8-Mar-2015  10:43:38  com/
  drwxr-xr-x         0   8-Mar-2015  10:43:38  com/example/
  drwxr-xr-x         0   8-Mar-2015  10:43:38  com/example/library/
  -rw-r--r--      2062   8-Mar-2015  10:43:38  com/example/library/HelloWorld.class
  drwxr-xr-x         0   8-Mar-2015  10:43:38  META-INF/annotations/
  -rw-r--r--        31   8-Mar-2015  10:43:38  META-INF/annotations/com.xmlcalabash.core.XMLCalabash
  -rw-r--r--       294  19-Feb-2015  15:41:00  example-library.xpl
- ----------  --------  -----------  --------  -----------------------------------------------------
                  3230                         9 files

library xpl included in jar
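
One way to sanity-check such a jar is to confirm that the step class, the annotation index and the library .xpl are all packaged. The sketch below uses only java.util.zip from the JDK. The entry names are copied from the listing above; the class name StepJarCheckSketch and the in-memory jar built in main are assumptions made for demonstration and are not part of XMLCalabash or depify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StepJarCheckSketch {
    // Entries an extension-step jar is expected to carry, per the listing.
    static final List<String> REQUIRED = Arrays.asList(
        "com/example/library/HelloWorld.class",
        "META-INF/annotations/com.xmlcalabash.core.XMLCalabash",
        "example-library.xpl");

    /** Builds a throwaway in-memory jar with empty entries (demo only). */
    static byte[] buildJar(List<String> entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the required entries that are absent from the jar. */
    static List<String> missingEntries(byte[] jar) {
        Set<String> present = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jar))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                present.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>(REQUIRED);
        missing.removeAll(present);
        return missing;
    }

    public static void main(String[] args) {
        byte[] incomplete = buildJar(Arrays.asList("com/example/library/HelloWorld.class"));
        System.out.println("missing: " + missingEntries(incomplete));
    }
}
```

For a jar on disk, the same check would read the file's bytes instead of building them in memory.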

Page 41

• depify.com

• depify client

• depify github

depify

Page 42

• Usage of XMLCalabash
• Usage of depify
• Develop your own step
• Distribute with depify

Page 43

depify future

• Gradle plugin
• Depify into other repos to enable day zero bootstrap (w/ yum, etc)
• Integration (expath package management)
• More steps
• More steps
• More steps

Page 44

Summary

• XProc extension steps provide reuse
• XProc v2.0 lets you work in broader context
• Pipelines manage complexity
• depify specifically built for XProc (XMLCalabash)
• Reuse with existing mechanisms (ex. Maven)

Page 45

How to Become a Delighted XProc Hacker

• model pipelines with XProc (XMLCalabash)
• try out ext steps (depify)
• GSD
• reuse and distribute new steps (depify)
• goto pub

• Stop using bash, makefiles, ant or bending XML tech to control main loop

• Stop making adhoc pipelines

Page 46

<pub/>

Thank you for your attention and time, questions ?