spec: a lisp-flavoured type system

clojure.spec: a lisp-flavoured type system @sbelak [email protected]

Upload: simon-belak

Post on 29-Jan-2018


Page 1: Spec: a lisp-flavoured type system

clojure.spec: a lisp-flavoured type system

@sbelak

[email protected]

Page 2: Spec: a lisp-flavoured type system

Clojure at a glance

• (lisp (running-on :JVM))

• Functional, dynamic, immutable

• Excellent concurrency and state management support

• Unparalleled data manipulation

Page 3: Spec: a lisp-flavoured type system
Page 4: Spec: a lisp-flavoured type system

Motivation

• Communication (docs are not enough)

• A lot of computation in Clojure is encoded with (naked) data

• Generative (property based) testing

• Manual parsing and error reporting is tedious and error prone

Page 5: Spec: a lisp-flavoured type system

Enter clojure.spec

Page 6: Spec: a lisp-flavoured type system

Writing a spec should enable automatic:

• Validation

• Error reporting

• Destructuring

• Instrumentation

• Test-data generation

• Generative test generation

*http://clojure.org/about/spec

Page 7: Spec: a lisp-flavoured type system

Parsing with Derivatives

A Functional Pearl

Matthew Might, David Darais

University of Utah

[email protected], [email protected]

Daniel Spiewak

University of Wisconsin, Milwaukee

[email protected]

Abstract

We present a functional approach to parsing unrestricted context-free grammars based on Brzozowski’s derivative of regular expressions. If we consider context-free grammars as recursive regular expressions, Brzozowski’s equational theory extends without modification to context-free grammars (and it generalizes to parser combinators). The supporting actors in this story are three concepts familiar to functional programmers—laziness, memoization and fixed points; these allow Brzozowski’s original equations to be transliterated into purely functional code in about 30 lines spread over three functions.

Yet, this almost impossibly brief implementation has a drawback: its performance is sour—in both theory and practice. The culprit? Each derivative can double the size of a grammar, and with it, the cost of the next derivative.

Fortunately, much of the new structure inflicted by the derivative is either dead on arrival, or it dies after the very next derivative. To eliminate it, we once again exploit laziness and memoization to transliterate an equational theory that prunes such debris into working code. Thanks to this compaction, parsing times become reasonable in practice.

We equip the functional programmer with two equational theories that, when combined, make for an abbreviated understanding and implementation of a system for parsing context-free languages.

Categories and Subject Descriptors: F.4.3 [Formal Languages]: Operations on languages

General Terms: Algorithms, Languages, Theory

Keywords: formal languages, parsing, derivative, regular expressions, context-free grammar, parser combinator

1. Introduction

It is easy to lose sight of the essence of parsing in the minutiae of forbidden grammars, shift-reduce conflicts and opaque action tables. To the extent that understanding in computer science comes from implementation, a deeper appreciation of parsing often seems out of reach. Brzozowski’s derivative upsets this calculus of effort and understanding to make the construction of parsing systems accessible to the common functional programmer.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ICFP’11, September 19–21, 2011, Tokyo, Japan.

Copyright © 2011 ACM 978-1-4503-0865-6/11/09...$10.00

The derivative of regular expressions [1], if gently tempered with laziness, memoization and fixed points, acts immediately as a pure, functional technique for generating parse forests from arbitrary context-free grammars. Despite—even because of—its simplicity, the derivative transparently handles ambiguity, left-recursion, right-recursion, ill-founded recursion or any combination thereof.

1.1 Outline

• After a review of formal languages, we introduce Brzozowski’s derivative for regular languages. A brief implementation highlights its rugged elegance.

• As our implementation of the derivative engages context-free languages, non-termination emerges as a problem.

• Three small, surgical modifications to the implementation (but not the theory)—laziness, memoization and fixed points—guarantee termination. Termination means the derivative can recognize arbitrary context-free languages.

• We generalize the derivative to parsers and parser combinators through an equational theory for generating parse forests.

• We find poor performance in both theory and practice. The root cause is vestigial structure left in the grammar by earlier derivatives; this structure is malignant: though it no longer serves a purpose, it still grows in size with each derivative.

• We develop an optimization—compaction—that collapses grammars by excising this mass. Compaction, like the derivative, comes from a clean, equational theory that exploits laziness and memoization in its transliteration to working code.

In this article, we provide code in Racket, but it should adapt readily to any Lisp. All code and test cases within or referenced from this article (plus additional implementations in Haskell and Scala) are available from:

http://www.ucombinator.org/projects/parsing/

2. Preliminary: Formal languages

A language L is a set of strings. A string w is a sequence of characters from an alphabet A. (From the parser’s perspective, a “character” might be a token/terminal.)

Two atomic languages arise often in formal languages: the empty language and the null (or empty-string) language:

• The empty language ∅ contains no strings at all:

∅ = {} .

• The null language ε contains only the length-zero “null” string:

ε = {w} where length(w) = 0.

Page 8: Spec: a lisp-flavoured type system

Parsing with derivatives

• Brzozowski derivative: u⁻¹S = { v ∈ Σ* : uv ∈ S }, the set of all rest-strings obtainable from a string in S by cutting off its prefix u

• code along: blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html
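The derivative defined above can be tried out on plain regular expressions. The following is a minimal sketch in Python, not code from the talk or from the paper (which uses Racket); all class and function names (Empty, Eps, Char, Alt, Seq, Star, nullable, derive, matches) are made up for illustration. It covers only regular languages and leaves out the laziness, memoization and fixed points the paper needs for context-free grammars.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression nodes (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):
    """The empty language: matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """The null language: matches only the empty string."""


@dataclass(frozen=True)
class Char(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r):
    """True if the language of r contains the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False  # Empty and Char


def derive(r, c):
    """Derivative of r with respect to character c: what may follow c."""
    if isinstance(r, Char):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Seq):
        head = Seq(derive(r.first, c), r.second)
        # If the first part can be empty, c may also start the second part.
        return Alt(head, derive(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(derive(r.inner, c), r)
    return Empty()  # Empty and Eps have no non-empty strings to cut from


def matches(r, s):
    """Cut s off one character at a time, then check for the empty string."""
    for c in s:
        r = derive(r, c)
    return nullable(r)


ab_star = Star(Seq(Char("a"), Char("b")))  # the regular expression (ab)*
```

Matching is repeated differentiation: after the whole input has been cut off, the string is in the language exactly when the remaining expression accepts the empty string.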

Page 9: Spec: a lisp-flavoured type system

Regular expressions (for data) +

arbitrary predicates +

data transformations (conformers)

clojure.spec
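To make the three ingredients on this slide concrete, here is a rough Python sketch of how a sequence pattern, arbitrary predicates and conformers might compose. It is not clojure.spec and does not reproduce its API; the helper names (pred, conformer, and_, cat, INVALID) are hypothetical and only loosely echo the spec vocabulary, and cat here handles a fixed-length sequence rather than a full regular expression.

```python
INVALID = object()  # sentinel returned when a value does not conform


def pred(f):
    """Lift an arbitrary predicate into a spec: value in, value or INVALID out."""
    return lambda x: x if f(x) else INVALID


def conformer(f):
    """Lift a data transformation into a spec; a failed conversion is INVALID."""
    def spec(x):
        try:
            return f(x)
        except (TypeError, ValueError):
            return INVALID
    return spec


def and_(*specs):
    """Chain specs, feeding each conformed value into the next one."""
    def spec(x):
        for s in specs:
            x = s(x)
            if x is INVALID:
                return INVALID
        return x
    return spec


def cat(**named):
    """Fixed-length sequence spec: each position gets a name and a spec."""
    def spec(xs):
        if not isinstance(xs, (list, tuple)) or len(xs) != len(named):
            return INVALID
        out = {}
        for (name, s), x in zip(named.items(), xs):
            value = s(x)
            if value is INVALID:
                return INVALID
            out[name] = value
        return out
    return spec


# A labelled point: a string followed by two non-negative numbers.
point = cat(
    label=pred(lambda x: isinstance(x, str)),
    x=and_(conformer(float), pred(lambda v: v >= 0)),
    y=and_(conformer(float), pred(lambda v: v >= 0)),
)

conformed = point(["origin", "0", 2])
rejected = point(["origin", -1, 2])
```

A single description does both validation and destructuring here: conforming a valid sequence returns a dictionary keyed by the part names, with values already converted.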

Page 10: Spec: a lisp-flavoured type system
Page 11: Spec: a lisp-flavoured type system

Two schools of thinking

Page 12: Spec: a lisp-flavoured type system

System paradigm

Language paradigm

infoq.com/presentations/Mixin-based-Inheritance

realworldclojure.com/the-system-paradigm

[Language:] a formal system of signs governed by grammatical rules of combination to communicate meaning

[System:] a set of interacting or interdependent components forming an integrated whole

Page 13: Spec: a lisp-flavoured type system

The system paradigm

1. Nibble at the problem from different directions

2. Compose partial solutions into the final solution

In lisp you build systems by altering the living, running one right in front of you — R. Gabriel

Pascal is for building pyramids—imposing, breathtaking, static structures built by armies pushing heavy blocks into place. Lisp is for building organisms.

— A. Perlis

Page 14: Spec: a lisp-flavoured type system

The Artificial Intelligence people were endeavouring to write programs that no one knew how to write. The idea that you could sit down and say: 'Well, here is my problem, here are the requirements, let me come up with a specification and now code that up' [was] completely crazy as far as the AI people were concerned. The only way to write AI programs then (as it still is now) was by taking an exploratory approach to development. The only way to do it was to experiment. 'Let me try this, let me add that. Let me try to add this fuzzy concept, let me try to add a scheduler, let me add agendas, let me add resources, let me have resource-limitations' ... you're not constructing it like making a ton of source code and compiling it periodically, you're constructing it the way you construct a city: build some of it, it's running all the time, so it's kind of like a live programming language.

— R. Gabriel

Page 15: Spec: a lisp-flavoured type system

Live programming

Page 16: Spec: a lisp-flavoured type system

No clear delineation between environment and work code

Page 17: Spec: a lisp-flavoured type system

spec: a fully interactive à la carte type system?

Page 18: Spec: a lisp-flavoured type system

“Composition is about decomposing.”

— E. Normand

“Good design is not about making grand plans, but about taking things apart.”

— R. Hickey

Page 19: Spec: a lisp-flavoured type system

Destructuring

• Pull apart and name

• Factor out data shape

Page 20: Spec: a lisp-flavoured type system

Data macros

• Recursive transformations into canonical form

• Do more without code macros

* juxt.pro/blog/posts/data-macros.html

Page 21: Spec: a lisp-flavoured type system
Page 22: Spec: a lisp-flavoured type system
Page 23: Spec: a lisp-flavoured type system

Decomplect validation and encoding

Page 24: Spec: a lisp-flavoured type system

Generative testing

• Limitations

  • sequences with internal structure (time series etc.)

  • generic higher-order functions (e.g. map)

• Uncovers numerical instabilities

• Better mocking in the REPL

Page 25: Spec: a lisp-flavoured type system

TDD vs.

REPL-driven development vs.

spec-driven development?

Page 26: Spec: a lisp-flavoured type system

Everything is data (a.k.a. the Lisp way)

• generation

• introspection

• parsable errors

Page 27: Spec: a lisp-flavoured type system

Building on top of spec

Page 28: Spec: a lisp-flavoured type system

Queryable data descriptions

Turn spec into a graph

[Figure: graph derived from specs. Nodes: order, promo code, user, account age, country. Edge labels: always, maybe.]
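One way to picture "turn spec into a graph" is to treat required keys as "always" edges and optional keys as "maybe" edges, then query the result. The Python sketch below assumes a made-up registry layout (a dict with "req" and "opt" lists); the node names come from the slide, but which node connects to which, and with which label, is a guess made purely for illustration.

```python
# Hypothetical registry: each entity lists the keys it always has ("req")
# and the keys it may have ("opt"). The edge assignments are illustrative.
registry = {
    "order": {"req": ["user"], "opt": ["promo code"]},
    "user": {"req": ["account age", "country"], "opt": []},
}


def edges(registry):
    """Yield (parent, child, label) triples describing the spec graph."""
    for node, keys in registry.items():
        for child in keys.get("req", []):
            yield (node, child, "always")
        for child in keys.get("opt", []):
            yield (node, child, "maybe")


def always_reachable(registry, start):
    """Everything guaranteed to be present when `start` is present."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in registry.get(node, {}).get("req", []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


graph = sorted(edges(registry))
guaranteed = always_reachable(registry, "order")
```

Once the description is a graph, questions such as "which attributes can I rely on for every order?" become ordinary graph queries.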

Page 29: Spec: a lisp-flavoured type system

Case study: autogenerating materialised views

[Diagram: Events, External data, Kafka, Onyx jobs, Materialised views]

Automatic view generation

• Event & attribute ontology

  • Manual (via spec)

  • Inferred

• Statistical analysis (seasonality detection, outlier removal, …)

Page 30: Spec: a lisp-flavoured type system

Automatic view generation

1. Walk spec registry

2. Apply rules

   1. Define new view (spec)

   2. Trigger Onyx job that creates the view
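The steps above can be sketched as a small loop. This is an illustrative Python outline only: the registry layout, the rule signature, the example rule and the trigger_job callback are all assumptions, and no real Onyx or Kafka API is called.

```python
def generate_views(registry, rules, trigger_job):
    """Walk the registry, apply every rule, define and launch resulting views."""
    views = {}
    for name, spec in registry.items():        # 1. walk the spec registry
        for rule in rules:                     # 2. apply rules
            result = rule(name, spec)
            if result is None:
                continue                       # rule does not apply here
            view_name, view_spec = result
            views[view_name] = view_spec       # 2.1 define the new view (spec)
            trigger_job(view_name, view_spec)  # 2.2 trigger the job (stubbed)
    return views


def daily_average_rule(name, spec):
    """Made-up rule: every numeric attribute gets a daily-average view."""
    if spec.get("type") == "number":
        return (name + "/daily-average", {"type": "number", "source": name})
    return None


# Hypothetical registry contents and a stand-in for job submission.
registry = {
    "order/total": {"type": "number"},
    "user/country": {"type": "string"},
}
submitted = []
views = generate_views(
    registry,
    [daily_average_rule],
    lambda view_name, view_spec: submitted.append(view_name),
)
```

Because each generated view is itself described by a spec, it could be added back to the registry so that later rules build on it.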

Page 31: Spec: a lisp-flavoured type system

Missing pieces

github.com/arohner/spectrum — static analysis of (spec annotated) Clojure code

github.com/stathissideris/spec-provider — infer spec from examples

github.com/typedclojure/auto-annotation — infer spec from tests

Page 32: Spec: a lisp-flavoured type system

Takeaways

Page 33: Spec: a lisp-flavoured type system

Decomplect everything

Page 34: Spec: a lisp-flavoured type system

Everything should be live and interactive

Page 35: Spec: a lisp-flavoured type system

Blurring the line between environment and work is a powerful idea

Page 36: Spec: a lisp-flavoured type system

Queryable data descriptions supercharge interactive development and are a great building block for automation

Page 38: Spec: a lisp-flavoured type system

clojure.org/about/spec

matt.might.net/papers/might2011derivatives.pdf

infoq.com/presentations/Mixin-based-Inheritance

realworldclojure.com/the-system-paradigm

juxt.pro/blog/posts/data-macros.html

blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html

indiegogo.com/projects/typed-clojure-clojure-spec-auto-annotations#/

github.com/arohner/spectrum

github.com/stathissideris/spec-provider

purelyfunctional.tv/issues/clojure-gazette-192-composition-is-about-decomposing