lecture 7 generate-test-aggregate in coq - nii.ac.jp · lecture 7 generate-test-aggregate in coq...

Lecture 7Generate-Test-Aggregate in Coq

NII Lectures Series

Frederic Loulergue

Universite d’Orleans – LIFO – PaMDA Team

October-November 2013

F. Loulergue SyDPaCC – Lecture 7 October-November 2013 1 / 20

Outline

1 Introduction

2 GTA in Coq

3 Summary


Generate-Test-Aggregate

Generate-Test-Aggregate K. Emoto, S. Fischer, Z. Hu

User’s perspective:

G Generator to produce all candidates

T Tester for validity check of candidates

A Aggregator to build the final result

Optimization:

I Filter embedding: G T A ⇒ G A

I Semiring Fusion: G A ⇒ D&C

Papers: [1, 2]


GTA: Example

The Knapsack problem

Given:

I a knapsack that could contain items for at most a volume v

I a set of items each caracterised by a value and a volume

Find the most valuable selection of items

Items and knapsack

I pens .: U200, 1dl

I scissors ": U350, 1dl

I phone %: U5000, 2dl

Knapsack: 3dl


GTA Programming: Generator

Choose an adequate generator

A generator produces:

I a multi-set of candidates

I it should generate all necessary candidates

I it could generate invalid candidates


I G generates all the subsets of the set of items

I G {.,",%} =*{}, {.}, {"}, {%}{.,"}, {",%}, {.,%}{.,",%}+


GTA Programming: Tester

Choose an adequate tester

I it removes invalid items according to a predicate

I it should be a homomorphism


I T is based on the predicate:

p ≡ s 7→∑

i∈svolume ≤ 3

I T*{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, {.,",%}+= *{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, +


GTA Programming: Tester

For better performances

p ≡ s 7→⊕

i∈svolume ′(i) ≤ 3

where

I volume ′(i) = max volume(i) (3 + 1)

I x ⊕ y = max (x + y) (3 + 1)


GTA Programming: Aggregator

Choose an adequate aggregator

It produces a summary of the valid candidates


Examples:

I the best candidate

I the two first best candidates

I the number of candidates

I . . .

A *{}, {.}, {"}, {%}, {.,"}, {",%}, {.,%}, += ({",%}, 5350U)


GTA Programming

I The naive program G T A has complexity O(2nn)

I with GTA framework: O(n)

Two transformations:

I Filter embedding: G T A ⇒ G A

I Semiring Fusion: G A ⇒ D&C


A verified version of GTA

The Goal

I to allow the user to write GTA programs

I type class mechanism to automatically derive a efficientdivide-and-conquer program

I prove the fusion theorems in CoqI extraction towards:

I OCaml + BSMLI Scala + Spark

Joint work with

I Kento Emoto (Kyushu University of Technology)

I Julien Tesson (LACL, Universite Paris-Est Creteil)

I Frederic Dabrowski (LIFO, Universite d’Orleans)


Status

It is an ongoing work !

3 to allow the user to write GTA programs

3 type class mechanism to automatically perform semiringfusion and filter embedding fusion

7 prove the fusion theorems in Coq

7 type class mechanism to automatically derive parallel BSMLprogram from a GTA specification


Writing GTA programs and automatic fusion in Coq

DEMO


GTA in Coq

We had to define data-structures: for example Bags.

I multiset in Coq standard library is not abstract, it is a specificimplementation using functions to represent multisets

I the multiset axiomatisation in the CoLoR library is nice butdoes not use type classes

How to axiomatise:

I try to find a minimal axiomatisation so that its realisation willnot be too complicated

I then additional results are derived from this axiomatisation


Automatic parallelisation

Basically we need:

I an automatic way to derive parallel versions ofhomomorphisms

I but in a generalised setting with user-defined equivalencerelations rather than Coq = equality

⇒ ongoing work


Proof of the fusion theorem

Generate, Test, and Aggregate 267

Proof. The following calculation combines previous observations and definitionsto show the claimed identity.

aggregate (filter (ok ◦ hom) ls)= { Partition, individual aggregation }!

m∈Mok m=True

(aggregate (filter ((m ≡) ◦ hom) ls))

= { Definition of f ls , and Definition 7 }postprocessM ok f ls

= { Definition 6, homomorphic equations for f ls }postprocessM ok (aggregateM ls)

Our main result combines the theorems from Sections 4 and 5. It allows, undercertain conditions, to transform specifications in GTA form into efficient parallelalgorithms.

Main Theorem 3 (Filter-embedding Semiring Fusion). Given a set A, afinite monoid (M ,⊙), a monoid homomorphism hom from ([A ], ++) to (M ,⊙),a semiring (S ,⊕,⊗), a semiring homomorphism aggregate from (![A ]",&,×++)to (S ,⊕,⊗), a function ok : M → Bool, and a polymorphic semiring generatorgenerate, the following equation holds:

aggregate ◦ filter (ok ◦ hom) ◦ generate",×++(λx → ![x ]")

= postprocessM ok ◦ generate⊕M ,⊗M(λx → aggregateM ![x ]")

Proof. Combining Theorems 1 and 2.

Filter-embedding Semiring Fusion is not restricted to parallel algorithms. It canbe used to calculate efficient programs from specifications that use arbitrarypolymorphic semiring generators.

It is worth noting that it is possible to remove the finiteness requirementfor monoids and define a lifted semiring of finite mappings of unbounded andunknown size. We require the finiteness only in order to be able to describe thecomplexity of the resulting parallel algorithms more accurately.

If the generator happens to be a list homomorphism, like sublists, then as-sociativity of list concatenation allows the resulting program to be executed inparallel by distributing the input list evenly among available processors. Thecomplexity of a derived program using sublists as generator is linear in the sizeof the input list and quadratic in the size of the range M of the homomorphicpredicate because the semiring multiplication of the lifted semiring SM , which isused to combine all list elements, can be implemented by ranging over M ×M .

6 A More Complex Application

In this section we describe how to use our framework to derive an efficient parallelimplementation for a practical problem in statistics. We further demonstrate howto extend the derived basic program incrementally.



Currently: finiteness condition removed, so

I need to deal with a Map data-structure instead of tuples

I prove of associativity and distributivity for the semi-ring aredifficult and not yet done

I the user is may be not aware that to obtain an efficientversion, finitness optimisation is necessary

We will consider another version: M is finite


Finite and tuples with dependent types?

DEMO


Summary

GTA framework:

I usable for writing and deriving sequential functions

I soon able to derive parallel BSML versions

I not yet completely dependable (admits) ... but hopefully soon

In the future:

I scala extraction for Cloud Computing

I additional examples and library of generators/testers

I . . .


Thank you !

Coming soon

I https://traclifo.univ-orleans.fr/SYDPACC

I packages for the OPAM (OCaml) package manager


https://traclifo.univ-orleans.fr/SYDPACC

Bibliography

[1] K. Emoto, S. Fischer, and Z. Hu. Generate, Test, andAggregate – A Calculation-based Framework for SystematicParallel Programming with MapReduce. In ESOP, volume7211 of LNCS, pages 254–273. Springer, 2012.doi:10.1007/978-3-642-28869-2 13.

[2] K. Emoto, S. Fischer, and Z. Hu. Filter-embedding semiringfusion for programming with MapReduce. Formal Asp.Comput., 24(4-6):623–645, 2012.doi:10.1007/s00165-012-0241-8.


http://dx.doi.org/10.1007/978-3-642-28869-2_13

http://dx.doi.org/10.1007/s00165-012-0241-8

lecture 7 generate-test-aggregate in coq - nii.ac.jp · lecture 7 generate-test-aggregate in coq...

Documents