lecture 7 generate-test-aggregate in coq - nii.ac.jp · lecture 7 generate-test-aggregate in coq...
TRANSCRIPT
Lecture 7Generate-Test-Aggregate in Coq
NII Lectures Series
Frederic Loulergue
Universite d’Orleans – LIFO – PaMDA Team
October-November 2013
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 1 / 20
Outline
1 Introduction
2 GTA in Coq
3 Summary
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 2 / 20
Generate-Test-Aggregate
Generate-Test-Aggregate K. Emoto, S. Fischer, Z. Hu
User’s perspective:
G Generator to produce all candidates
T Tester for validity check of candidates
A Aggregator to build the final result
Optimization:
I Filter embedding: G T A ⇒ G A
I Semiring Fusion: G A ⇒ D&C
Papers: [1, 2]
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 3 / 20
GTA: Example
The Knapsack problem
Given:
I a knapsack that could contain items for at most a volume v
I a set of items each caracterised by a value and a volume
Find the most valuable selection of items
Items and knapsack
I pens .: U200, 1dl
I scissors ": U350, 1dl
I phone %: U5000, 2dl
Knapsack: 3dl
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 4 / 20
GTA Programming: Generator
Choose an adequate generator
A generator produces:
I a multi-set of candidates
I it should generate all necessary candidates
I it could generate invalid candidates
The Knapsack problem
I G generates all the subsets of the set of items
I G {.,",%} =*{}, {.}, {"}, {%}{.,"}, {",%}, {.,%}{.,",%}+
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 5 / 20
GTA Programming: Tester
Choose an adequate tester
I it removes invalid items according to a predicate
I it should be a homomorphism
The Knapsack problem
I T is based on the predicate:
p ≡ s 7→∑
i∈svolume ≤ 3
I T*{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, {.,",%}+= *{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, +
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 6 / 20
GTA Programming: Tester
For better performances
p ≡ s 7→⊕
i∈svolume ′(i) ≤ 3
where
I volume ′(i) = max volume(i) (3 + 1)
I x ⊕ y = max (x + y) (3 + 1)
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 7 / 20
GTA Programming: Aggregator
Choose an adequate aggregator
It produces a summary of the valid candidates
The Knapsack problem
Examples:
I the best candidate
I the two first best candidates
I the number of candidates
I . . .
A *{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, += ({",%}, 5350U)
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 8 / 20
GTA Programming
I The naive program G T A has complexity O(2nn)
I with GTA framework: O(n)
Two transformations:
I Filter embedding: G T A ⇒ G A
I Semiring Fusion: G A ⇒ D&C
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 9 / 20
A verified version of GTA
The Goal
I to allow the user to write GTA programs
I type class mechanism to automatically derive a efficientdivide-and-conquer program
I prove the fusion theorems in CoqI extraction towards:
I OCaml + BSMLI Scala + Spark
Joint work with
I Kento Emoto (Kyushu University of Technology)
I Julien Tesson (LACL, Universite Paris-Est Creteil)
I Frederic Dabrowski (LIFO, Universite d’Orleans)
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 10 / 20
Status
It is an ongoing work !
3 to allow the user to write GTA programs
3 type class mechanism to automatically perform semiringfusion and filter embedding fusion
7 prove the fusion theorems in Coq
7 type class mechanism to automatically derive parallel BSMLprogram from a GTA specification
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 11 / 20
Writing GTA programs and automatic fusion in Coq
DEMO
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 12 / 20
GTA in Coq
We had to define data-structures: for example Bags.
I multiset in Coq standard library is not abstract, it is a specificimplementation using functions to represent multisets
I the multiset axiomatisation in the CoLoR library is nice butdoes not use type classes
How to axiomatise:
I try to find a minimal axiomatisation so that its realisation willnot be too complicated
I then additional results are derived from this axiomatisation
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 13 / 20
Automatic parallelisation
Basically we need:
I an automatic way to derive parallel versions ofhomomorphisms
I but in a generalised setting with user-defined equivalencerelations rather than Coq = equality
⇒ ongoing work
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 14 / 20
Proof of the fusion theorem
Generate, Test, and Aggregate 267
Proof. The following calculation combines previous observations and definitionsto show the claimed identity.
aggregate (filter (ok ◦ hom) ls)= { Partition, individual aggregation }!
m∈Mok m=True
(aggregate (filter ((m ≡) ◦ hom) ls))
= { Definition of f ls , and Definition 7 }postprocessM ok f ls
= { Definition 6, homomorphic equations for f ls }postprocessM ok (aggregateM ls)
Our main result combines the theorems from Sections 4 and 5. It allows, undercertain conditions, to transform specifications in GTA form into efficient parallelalgorithms.
Main Theorem 3 (Filter-embedding Semiring Fusion). Given a set A, afinite monoid (M ,⊙), a monoid homomorphism hom from ([A ], ++) to (M ,⊙),a semiring (S ,⊕,⊗), a semiring homomorphism aggregate from (![A ]",&,×++)to (S ,⊕,⊗), a function ok : M → Bool, and a polymorphic semiring generatorgenerate, the following equation holds:
aggregate ◦ filter (ok ◦ hom) ◦ generate",×++(λx → ![x ]")
= postprocessM ok ◦ generate⊕M ,⊗M(λx → aggregateM ![x ]")
Proof. Combining Theorems 1 and 2.
Filter-embedding Semiring Fusion is not restricted to parallel algorithms. It canbe used to calculate efficient programs from specifications that use arbitrarypolymorphic semiring generators.
It is worth noting that it is possible to remove the finiteness requirementfor monoids and define a lifted semiring of finite mappings of unbounded andunknown size. We require the finiteness only in order to be able to describe thecomplexity of the resulting parallel algorithms more accurately.
If the generator happens to be a list homomorphism, like sublists, then as-sociativity of list concatenation allows the resulting program to be executed inparallel by distributing the input list evenly among available processors. Thecomplexity of a derived program using sublists as generator is linear in the sizeof the input list and quadratic in the size of the range M of the homomorphicpredicate because the semiring multiplication of the lifted semiring SM , which isused to combine all list elements, can be implemented by ranging over M ×M .
6 A More Complex Application
In this section we describe how to use our framework to derive an efficient parallelimplementation for a practical problem in statistics. We further demonstrate howto extend the derived basic program incrementally.
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 15 / 20
Proof of the fusion theorem
Currently: finiteness condition removed, so
I need to deal with a Map data-structure instead of tuples
I prove of associativity and distributivity for the semi-ring aredifficult and not yet done
I the user is may be not aware that to obtain an efficientversion, finitness optimisation is necessary
We will consider another version: M is finite
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 16 / 20
Finite and tuples with dependent types?
DEMO
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 17 / 20
Summary
GTA framework:
I usable for writing and deriving sequential functions
I soon able to derive parallel BSML versions
I not yet completely dependable (admits) ... but hopefully soon
In the future:
I scala extraction for Cloud Computing
I additional examples and library of generators/testers
I . . .
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 18 / 20
Thank you !
Coming soon
I https://traclifo.univ-orleans.fr/SYDPACC
I packages for the OPAM (OCaml) package manager
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 19 / 20
Bibliography
[1] K. Emoto, S. Fischer, and Z. Hu. Generate, Test, andAggregate – A Calculation-based Framework for SystematicParallel Programming with MapReduce. In ESOP, volume7211 of LNCS, pages 254–273. Springer, 2012.doi:10.1007/978-3-642-28869-2 13.
[2] K. Emoto, S. Fischer, and Z. Hu. Filter-embedding semiringfusion for programming with MapReduce. Formal Asp.Comput., 24(4-6):623–645, 2012.doi:10.1007/s00165-012-0241-8.
F. Loulergue SyDPaCC – Lecture 7 October-November 2013 20 / 20