Living with spec @sbelak [email protected]

Upload: simon-belak

Post on 14-Apr-2017


Page 1: Living with-spec

Living with spec

@sbelak

[email protected]

Page 2: Living with-spec

Parsing with Derivatives: A Functional Pearl

Matthew Might, David Darais
University of Utah

[email protected], [email protected]

Daniel Spiewak
University of Wisconsin, Milwaukee

[email protected]

Abstract

We present a functional approach to parsing unrestricted context-free grammars based on Brzozowski’s derivative of regular expressions. If we consider context-free grammars as recursive regular expressions, Brzozowski’s equational theory extends without modification to context-free grammars (and it generalizes to parser combinators). The supporting actors in this story are three concepts familiar to functional programmers: laziness, memoization and fixed points; these allow Brzozowski’s original equations to be transliterated into purely functional code in about 30 lines spread over three functions.

Yet, this almost impossibly brief implementation has a drawback: its performance is sour, in both theory and practice. The culprit? Each derivative can double the size of a grammar, and with it, the cost of the next derivative.

Fortunately, much of the new structure inflicted by the derivative is either dead on arrival, or it dies after the very next derivative. To eliminate it, we once again exploit laziness and memoization to transliterate an equational theory that prunes such debris into working code. Thanks to this compaction, parsing times become reasonable in practice.

We equip the functional programmer with two equational theories that, when combined, make for an abbreviated understanding and implementation of a system for parsing context-free languages.

Categories and Subject Descriptors: F.4.3 [Formal Languages]: Operations on languages

General Terms: Algorithms, Languages, Theory

Keywords: formal languages, parsing, derivative, regular expressions, context-free grammar, parser combinator

1. Introduction

It is easy to lose sight of the essence of parsing in the minutiae of forbidden grammars, shift-reduce conflicts and opaque action tables. To the extent that understanding in computer science comes from implementation, a deeper appreciation of parsing often seems out of reach. Brzozowski’s derivative upsets this calculus of effort and understanding to make the construction of parsing systems accessible to the common functional programmer.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICFP’11, September 19–21, 2011, Tokyo, Japan. Copyright © 2011 ACM 978-1-4503-0865-6/11/09... $10.00

The derivative of regular expressions [1], if gently tempered with laziness, memoization and fixed points, acts immediately as a pure, functional technique for generating parse forests from arbitrary context-free grammars. Despite (even because of) its simplicity, the derivative transparently handles ambiguity, left-recursion, right-recursion, ill-founded recursion or any combination thereof.

1.1 Outline

• After a review of formal languages, we introduce Brzozowski’s derivative for regular languages. A brief implementation highlights its rugged elegance.

• As our implementation of the derivative engages context-free languages, non-termination emerges as a problem.

• Three small, surgical modifications to the implementation (but not the theory), namely laziness, memoization and fixed points, guarantee termination. Termination means the derivative can recognize arbitrary context-free languages.

• We generalize the derivative to parsers and parser combinators through an equational theory for generating parse forests.

• We find poor performance in both theory and practice. The root cause is vestigial structure left in the grammar by earlier derivatives; this structure is malignant: though it no longer serves a purpose, it still grows in size with each derivative.

• We develop an optimization, compaction, that collapses grammars by excising this mass. Compaction, like the derivative, comes from a clean, equational theory that exploits laziness and memoization in its transliteration to working code.

In this article, we provide code in Racket, but it should adapt readily to any Lisp. All code and test cases within or referenced from this article (plus additional implementations in Haskell and Scala) are available from:

http://www.ucombinator.org/projects/parsing/

2. Preliminary: Formal languages

A language L is a set of strings. A string w is a sequence of characters from an alphabet A. (From the parser’s perspective, a “character” might be a token/terminal.)

Two atomic languages arise often in formal languages: the empty language and the null (or empty-string) language:

• The empty language ∅ contains no strings at all:

∅ = {} .

• The null language ε contains only the length-zero “null” string:

ε = {w} where length(w) = 0.
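The excerpt above stops before the derivative itself is defined, so here is a minimal sketch of Brzozowski's derivative for regular languages only, written in Clojure rather than the paper's Racket. The data representation (`:empty`, `:eps`, characters, and `[:alt …]`, `[:cat …]`, `[:star …]` vectors) and the function names are assumptions made for illustration; the context-free extension with laziness, memoization and fixed points is not attempted.

```clojure
;; Languages as plain data (an illustrative encoding, not the paper's):
;;   :empty        the empty language ∅
;;   :eps          the null language ε
;;   \a            the language containing just the one-character string "a"
;;   [:alt l1 l2]  union
;;   [:cat l1 l2]  concatenation
;;   [:star l]     zero or more repetitions

(defn nullable?
  "True when language l contains the empty string."
  [l]
  (cond
    (= l :empty) false
    (= l :eps)   true
    (char? l)    false
    :else (let [[op a b] l]
            (case op
              :alt  (or (nullable? a) (nullable? b))
              :cat  (and (nullable? a) (nullable? b))
              :star true))))

(defn deriv
  "Derivative of l with respect to character c: the language of all
  suffixes w such that the string c followed by w is in l."
  [l c]
  (cond
    (= l :empty) :empty
    (= l :eps)   :empty
    (char? l)    (if (= l c) :eps :empty)
    :else (let [[op a b] l]
            (case op
              :alt  [:alt (deriv a c) (deriv b c)]
              :cat  (if (nullable? a)
                      [:alt [:cat (deriv a c) b] (deriv b c)]
                      [:cat (deriv a c) b])
              :star [:cat (deriv a c) l]))))

(defn matches?
  "Derive by each character in turn, then ask whether what is left
  accepts the empty string."
  [l s]
  (nullable? (reduce deriv l s)))

(matches? [:star [:cat \a \b]] "abab") ; true
(matches? [:star [:cat \a \b]] "aba")  ; false
```

Note how the `:cat` case can turn one node into three; nothing here prunes dead branches such as `[:cat :empty b]`, which is the growth the abstract attributes to the uncompacted derivative.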

Page 3: Living with-spec

blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html

Page 4: Living with-spec

Writing a spec should enable automatic:

• Validation

• Error reporting

• Destructuring

• Instrumentation

• Test-data generation

• Generative test generation

* http://clojure.org/about/spec
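A small sketch of how one definition feeds several of the items above, assuming `clojure.spec.alpha` is required under the alias `s` (the alias the later slides use). The `:acct/...` spec names and the account shape are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical domain: an account with an id and an optional e-mail.
(s/def :acct/id pos-int?)
(s/def :acct/email (s/and string? #(re-find #"@" %)))
(s/def :acct/account (s/keys :req-un [:acct/id] :opt-un [:acct/email]))

;; Validation: a plain yes/no answer.
(s/valid? :acct/account {:id 1 :email "a@b.c"}) ; true
(s/valid? :acct/account {:id -1})                ; false

;; Error reporting: problems come back as data (path, predicate, value).
(def problems
  (:clojure.spec.alpha/problems (s/explain-data :acct/account {:id -1})))

;; Destructuring: conform tags the branch of an s/or that matched.
(s/def :acct/ref (s/or :by-id :acct/id :by-email :acct/email))
(s/conform :acct/ref "a@b.c") ; [:by-email "a@b.c"]
```

Instrumentation and test-data generation build on the same definitions but additionally need `clojure.spec.test.alpha` and, for generators, the `org.clojure/test.check` library on the classpath.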

Page 5: Living with-spec

“Composition is about decomposing.”

— E. Normand

Page 6: Living with-spec

The code base

• ~15k LOC

• ETL

• Risk-hedging

• Demand-responsive pricing

• Packing & routing

• Internal BI tools

• …

Page 7: Living with-spec

Validation

• API boundaries

• Structured errors

• Fail fast = more context

• nil punning

• Multiple arities
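One way the bullets above might look in code, as a sketch rather than the speaker's actual approach. `validate!`, `price` and the `:price/...` specs are hypothetical names; only the `s/` functions are real `clojure.spec.alpha` API.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :price/amount (s/and number? pos?))
(s/def :price/currency #{:eur :usd})

;; nil punning made explicit: nil is allowed here, and only here.
(s/def :price/discount (s/nilable (s/and number? #(<= 0 % 1))))

;; Multiple arities: one regex spec covers (price 10) and (price 10 :usd).
(s/def :price/args
  (s/cat :amount :price/amount
         :currency (s/? :price/currency)))

(defn validate!
  "Hypothetical boundary check: fail fast and carry the structured
  explain-data along in the exception instead of a flat message."
  [spec x]
  (if (s/valid? spec x)
    x
    (throw (ex-info "Spec validation failed" (s/explain-data spec x)))))

(defn price
  "API boundary: arguments are checked once, on the way in."
  [& args]
  (let [{:keys [amount currency]}
        (s/conform :price/args (validate! :price/args args))]
    {:amount amount :currency (or currency :eur)}))

(price 10)      ; {:amount 10, :currency :eur}
(price 10 :usd) ; {:amount 10, :currency :usd}
```

Calling `(price -1)` throws at the boundary, and `ex-data` on the exception returns the spec problems as a map. `s/fdef` combined with `clojure.spec.test.alpha/instrument` is the library's built-in route to similar argument checks during development.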

Page 8: Living with-spec

Friendlier error messages

• Pluggable explanations via s/*explain-out*

• Capture common mistakes

• Provide hints

• Limitations

• One explainer function for all

• Don’t know which spec (only what went wrong)
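A minimal sketch of plugging in a custom explainer, assuming a `clojure.spec.alpha` version that provides the dynamic var `s/*explain-out*`. The `:cfg/...` specs, the `hints` table and both function names are hypothetical. Hints are looked up by the last spec name in each problem's `:via` path, which is an illustrative choice.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :cfg/port (s/int-in 1 65536))
(s/def :cfg/config (s/keys :req-un [:cfg/port]))

;; Hypothetical hint table for common mistakes, keyed by spec name.
(def hints
  {:cfg/port "port must be an integer between 1 and 65535"})

(defn friendly-printer
  "Receives the explain-data map (nil when the value is valid) and
  prints one line per problem."
  [explain-data]
  (if explain-data
    (doseq [{:keys [in val via]} (:clojure.spec.alpha/problems explain-data)]
      (println (str "Invalid value " (pr-str val) " at " in ": "
                    (get hints (last via) "no hint available"))))
    (println "Success!")))

(defn explain-friendly
  "Runs s/explain-str with the custom printer bound."
  [spec x]
  (binding [s/*explain-out* friendly-printer]
    (s/explain-str spec x)))

(print (explain-friendly :cfg/config {:port "8080"}))
```

Because the binding replaces the printer for every spec explained inside it, a single function has to cope with all of them, which is the first limitation listed above.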

Page 9: Living with-spec

Destructuring

• Pull apart and name

• Patterns

• case (dispatch on tag)

• core.match

• Separate data description and transformation
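A sketch of "pull apart and name" followed by `case` dispatch on the tag. The event shapes, the `:evt/...` specs and `describe-event` are hypothetical; `s/conform`, `s/or`, `s/cat` and `s/invalid?` are the real `clojure.spec.alpha` functions.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event shapes: [:deposit 100] or [:transfer 100 "acct-2"].
(s/def :evt/amount (s/and number? pos?))
(s/def :evt/event
  (s/or :deposit  (s/cat :type #{:deposit}  :amount :evt/amount)
        :transfer (s/cat :type #{:transfer} :amount :evt/amount :to string?)))

(defn describe-event
  "The spec describes the data; this function only transforms it.
  s/conform returns [tag parts], where parts is a map keyed by the
  names given in s/cat."
  [event]
  (let [conformed (s/conform :evt/event event)]
    (if (s/invalid? conformed)
      (throw (ex-info "Unknown event" (s/explain-data :evt/event event)))
      (let [[tag {:keys [amount to]}] conformed]
        (case tag
          :deposit  (str "deposit of " amount)
          :transfer (str "transfer of " amount " to " to))))))

(describe-event [:deposit 100])           ; "deposit of 100"
(describe-event [:transfer 100 "acct-2"]) ; "transfer of 100 to acct-2"
```

The same conformed value could be handed to `core.match` instead of `case` when the patterns get deeper than a single tag.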

Page 10: Living with-spec

Data macros

• Recursive transformations into canonical form

• s/conformer

• Do more without code macros

* juxt.pro/blog/posts/data-macros.html
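A minimal sketch of the idea, assuming a hypothetical column DSL in which a bare keyword is shorthand for a full map. `s/conformer` rewrites each conformed value into one canonical shape, so downstream code sees a single form. The `:col/...` names and the `:sum` default are invented for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :col/name keyword?)
(s/def :col/agg #{:sum :avg})

;; Accept either shorthand or the full map, then canonicalise.
(s/def :col/column
  (s/and (s/or :short keyword?
               :full  (s/keys :req-un [:col/name] :opt-un [:col/agg]))
         (s/conformer (fn [[tag v]]
                        (case tag
                          :short {:name v :agg :sum}
                          :full  (merge {:agg :sum} v))))))

(s/def :col/columns (s/coll-of :col/column :kind vector?))

(s/conform :col/columns [:price {:name :qty :agg :avg}])
;; [{:name :price, :agg :sum} {:name :qty, :agg :avg}]
```

Because specs can refer to each other by name, the same pattern nests: a spec whose parts are themselves canonicalising specs expands the whole structure in one `s/conform` call.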

Page 14: Living with-spec

Generative testing

• Limitations

• sequences with internal structure (time series etc.)

• generic higher-order functions (e.g. map)

• Uncovers numerical instabilities

• s/exercise for mocking
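A sketch of using `s/exercise` to produce mock data. It assumes the `org.clojure/test.check` library is on the classpath, since spec loads its generators from there at run time. The `:order/...` specs and `total-qty` are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :order/id pos-int?)
(s/def :order/sku #{"apple" "pear"})
(s/def :order/qty (s/int-in 1 10))
(s/def :order/order (s/keys :req-un [:order/id :order/sku :order/qty]))

;; s/exercise returns [generated-value conformed-value] pairs;
;; the first element of each pair is the mock.
(def samples (map first (s/exercise :order/order 5)))

(defn total-qty
  "Hypothetical function under test."
  [orders]
  (reduce + (map :qty orders)))

;; A property that must hold for any generated batch of orders:
;; every quantity is between 1 and 9, so the total is bounded.
(<= (count samples) (total-qty samples) (* 9 (count samples)))
```

Each order here is generated independently of the others, which is why sequences with internal structure, such as time series, need hand-written generators instead.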

Page 15: Living with-spec

Queryable data descriptions:

• s/registry, s/form

• Build a graph

• No inline docs :(

Interact with your type system!
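A minimal sketch of "build a graph": walk the form of every registered spec in one namespace and collect the other registered specs it mentions. `spec-deps`, `spec-graph` and the `:shop/...` specs are hypothetical; `s/registry` and `s/form` are the real functions named on the slide.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :shop/id pos-int?)
(s/def :shop/line (s/keys :req-un [:shop/id]))
(s/def :shop/lines (s/coll-of :shop/line))
(s/def :shop/order (s/keys :req-un [:shop/id :shop/lines]))

(defn spec-deps
  "Set of registered spec names that appear in the form of spec k."
  [k]
  (->> (s/form k)                 ; the spec as data
       (tree-seq coll? seq)       ; every nested element of that form
       (filter qualified-keyword?)
       (filter #(contains? (s/registry) %))
       (remove #{k})
       set))

(defn spec-graph
  "Adjacency map {spec-name #{dependencies}} for one namespace."
  [ns-str]
  (into {}
        (for [k (keys (s/registry))
              :when (and (keyword? k) (= ns-str (namespace k)))]
          [k (spec-deps k)])))

(spec-graph "shop")
;; {:shop/id #{}, :shop/line #{:shop/id},
;;  :shop/lines #{:shop/line}, :shop/order #{:shop/id :shop/lines}}
```

The registry also holds symbols for function specs, which the `keyword?` check skips here.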

Page 16: Living with-spec

Case study: autogenerating materialised views

[Diagram labels: Events, External data, Kafka, Onyx (three instances), Materialised views]

Automatic view generation

• Event & attribute ontology

• Manual (via spec)

• Inferred

• Statistical analysis (seasonality detection, outlier removal, …)

Page 17: Living with-spec

Simple first, then Easy

Page 19: Living with-spec

Going forward

Page 20: Living with-spec

My favourites

• Data macros

• Destructuring

• Queryable data descriptions

• Structured errors

Page 22: Living with-spec

clojure.org/about/spec

matt.might.net/papers/might2011derivatives.pdf

infoq.com/presentations/Mixin-based-Inheritance

realworldclojure.com/the-system-paradigm

juxt.pro/blog/posts/data-macros.html

blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html

indiegogo.com/projects/typed-clojure-clojure-spec-auto-annotations#/

github.com/arohner/spectrum