ETTI Colloquia, Nov. 6, 2014
Can Parallel Computing Be Liberated From Ad Hoc Solutions? A Recursive MapReduce Approach and Its Implementation
Gheorghe M. Ștefan, http://arh.pub.ro/gstefan/

  • Slide 1
  • Gheorghe M. Ștefan, http://arh.pub.ro/gstefan/
  • Slide 2
  • “The semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip, doing so without any clear notion of how such devices would in general be programmed.” David Patterson, IEEE Spectrum, July 2010
  • Slide 3
  • Outline:
      A little history
      How parallel computing could be restarted
      Kleene's mathematical model
      Recursive MapReduce abstract model
      Backus' architectural description
      Programming the MapReduce hierarchy
      Generic one-chip parallel structure
      Concluding remarks
  • Slide 4
  • History: mono-core computation
      1936, mathematical computational models: Turing, Post, Church, Kleene
      1944-45, abstract machine models: the Harvard abstract model and the von Neumann abstract model
      1953, manufacturing in quantity: IBM 701
      1964, computer architecture: the concept lets software and hardware evolve independently
      Consequently, we now have a few stable and successful sequential architectures: x86, ARM, PowerPC, …
  • Slide 5
  • History: parallel computation
      1962, manufacturing in quantity: Burroughs introduces the first MIMD engine on the computer market
      1965, architectural issues: Dijkstra formulates the first concerns about parallel programming
      1974-76, abstract machine models: the first abstract models (PRAM models) begin to appear, after almost two decades of non-systematic experiments
      ?, computation model: it is still out there, waiting for us
      Consequently, “the semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip.”
  • Slide 6
  • About PRAM-like models:
      The Parallel Random Access Machine (PRAM), with bit-vector models in [Pratt et al. 1974] and PRAM models in [Fortune and Wyllie 1978], is considered a natural generalization of the Random Access Machine (RAM) model.
      The Parallel Memory Hierarchy model [Alpern et al. 1993] is also a generalization, but this time of the Memory Hierarchy model applied to the RAM model.
      The Bulk Synchronous Parallel (BSP) model divides the program into super-steps [Valiant 1990].
      The LogP model (Latency, overhead, gap, Processors) is designed to model communication cost [Culler et al. 1991].
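The last two models are usually stated with explicit cost formulas. The sketch below uses the textbook formulations, which do not appear on the slide. A BSP superstep costs w + h·g + l. Here w is the largest local computation, h is the largest number of words any processor sends or receives, g is the per-word gap, and l is the barrier cost. In LogP, a single point-to-point message costs o + L + o = L + 2o. The names `Superstep`, `bsp_cost`, and `logp_message_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Superstep:
    w: float  # largest local computation time of any processor in the superstep
    h: int    # largest number of words sent or received by any processor (the h-relation)


def bsp_cost(steps: Iterable[Superstep], g: float, l_sync: float) -> float:
    """Textbook BSP cost: the sum over supersteps of w + h*g + l."""
    return sum(s.w + s.h * g + l_sync for s in steps)


def logp_message_time(L: float, o: float) -> float:
    """LogP: sender overhead o, network latency L, receiver overhead o."""
    return o + L + o


if __name__ == "__main__":
    program = [Superstep(w=100, h=10), Superstep(w=50, h=4)]
    print(bsp_cost(program, g=2, l_sync=20))   # (100 + 20 + 20) + (50 + 8 + 20) = 218
    print(logp_message_time(L=6, o=2))         # 10
```

In the BSP function, the barrier cost is paid once per superstep. This is why the model rewards programs with few, large supersteps.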
  • Slide 7
  • How parallel computing could be consistently restarted:
      1. Use Kleene's partial recursive functions model as the foundational mathematical framework.
      2. Define an abstract machine model using meaningful forms derived from Kleene's model.
      3. Interface the abstract machine with an architectural (low-level) description based on Backus' FP Systems.
      4. Provide the simplest generic parallel structure able to run the functions requested by the architecture.
      5. Evaluate the choices made in the previous three steps using the computational motifs highlighted by the Berkeley View, and improve them when needed.
  • Slide 8
  • Kleene's mathematical model for parallel computation: of the three rules (composition, primitive recursion, and minimalization), only the first one, composition, is independent:
      $f(x) = g(h_1(x), \ldots, h_m(x))$
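In the composition rule, the m functions h_1, ..., h_m depend only on x and not on one another. This independence is what makes composition a natural source of parallelism. Below is a minimal Python sketch that assumes nothing beyond the formula itself. `compose` is a hypothetical name. The thread pool only shows that the h_i may be evaluated concurrently; in CPython it will not speed up CPU-bound h_i.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def compose(g: Callable[..., Any], *hs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Kleene composition: returns f with f(x) = g(h_1(x), ..., h_m(x))."""
    def f(x: Any) -> Any:
        # The h_i share only their input x, so they may run concurrently;
        # pool.map returns their results in the order h_1, ..., h_m.
        with ThreadPoolExecutor(max_workers=max(1, len(hs))) as pool:
            partials = list(pool.map(lambda h: h(x), hs))
        # g starts only after every h_i has finished.
        return g(*partials)
    return f


f = compose(lambda a, b, c: a + b + c,
            lambda x: x * x, lambda x: 2 * x, lambda x: 1)
print(f(3))   # 9 + 6 + 1 = 16
```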
  • Slide 9
  • Integral parallel abstract model: data-parallel
  • Slide 10
  • Integral parallel abstract model: reduction-parallel
  • Slide 11
  • Integral parallel abstract model: speculative-parallel
  • Slide 12
  • Integral parallel abstract model: time-parallel
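Slide 12 only names the time-parallel form. Assume it denotes pipelining: q stages computing f_1, ..., f_q each work on a different element of a data stream in the same clock cycle. The cycle-level simulation below is a sketch under that assumption, and `run_pipeline` is a hypothetical name. It shows why this counts as parallelism: n items leave the pipeline after n + q - 1 cycles, instead of the n·q stage evaluations a single sequential unit would need.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

_EMPTY = object()  # marks a pipeline register that holds no data


def run_pipeline(stages: Sequence[Callable[[Any], Any]],
                 stream: Iterable[Any]) -> Tuple[List[Any], int]:
    """Simulate a q-stage pipeline (q >= 1) computing f_q(...f_1(x)...) for each x.

    Returns the outputs, in input order, and the number of clock cycles used.
    """
    if not stages:
        raise ValueError("a pipeline needs at least one stage")
    q = len(stages)
    regs: List[Any] = [_EMPTY] * q   # regs[i]: output of stage i in the last cycle
    feed = iter(stream)
    results: List[Any] = []
    cycles = 0
    while True:
        x = next(feed, _EMPTY)
        # stop when there is no new input and no item is still in flight
        if x is _EMPTY and all(r is _EMPTY for r in regs[:-1]):
            break
        # one clock cycle: every stage fires at once, each on a different item
        new: List[Any] = [_EMPTY] * q
        if x is not _EMPTY:
            new[0] = stages[0](x)
        for i in range(1, q):
            if regs[i - 1] is not _EMPTY:
                new[i] = stages[i](regs[i - 1])
        regs = new
        cycles += 1
        if regs[-1] is not _EMPTY:   # the last stage's register is the output
            results.append(regs[-1])
    return results, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2, 3, 4]))   # ([1, 3, 5, 7], 6)
```

With 4 items and 3 stages, the run takes 6 cycles. A single unit applying the three functions one after another would perform 12 stage evaluations.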
  • Slide 13
  • Integral parallel abstract model: thread-parallel
  • Slide 14
  • Putting all forms together, the integral parallel abstract model: in the MapReduce abstract model, Map covers data, speculative, and thread parallelism, and Reduce covers reduction parallelism.
  • Slide 15
  • From one-chip to cloud: MapReduce recursive abstract model
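Slide 15 gives the recursive model only as a figure. One way to read it is sketched here under that assumption. Every level of the hierarchy (cell array, chip, board, and so on up to the cloud) runs the same Map-then-Reduce program, splitting its data among sub-engines and reducing their partial results. `recursive_map_reduce` and its `fanouts` parameter are hypothetical.

```python
import operator
from functools import reduce
from typing import Any, Callable, List, Sequence


def recursive_map_reduce(f: Callable[[Any], Any],
                         op: Callable[[Any, Any], Any],
                         data: Sequence[Any],
                         fanouts: List[int]) -> Any:
    """Reduce op over f(x) for x in data on a hypothetical hierarchy.

    fanouts[0] is the number of sub-engines at the top level, fanouts[1] the
    number at the next level down, and so on; an empty list means a single
    one-chip engine. op must be associative, because the hierarchy groups the
    operands differently from a flat reduction. data must be non-empty.
    """
    if not fanouts:                          # bottom level: plain Map, then Reduce
        return reduce(op, map(f, data))
    k = fanouts[0]
    size = -(-len(data) // k)                # ceiling division: items per sub-engine
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # every sub-engine runs the same MapReduce program one level down
    partials = [recursive_map_reduce(f, op, chunk, fanouts[1:]) for chunk in chunks]
    return reduce(op, partials)              # this level's own Reduce step


data = list(range(1, 101))
print(recursive_map_reduce(lambda x: x * x, operator.add, data, [4, 8]))  # 338350
```

The result does not depend on the shape of the hierarchy, which is the point of using one programming form at every level. Only the grouping of the partial results changes.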
  • Slide 16
  • Backus' architectural description: John Backus, “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs”, Communications of the ACM, August 1978. Functional Programming (FP) Systems are built from primitive functions, functional forms, and definitions.
  • Slide 17
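  • Functional forms:
      Apply to all: $\alpha f : x \equiv (x = \langle x_1, \ldots, x_n \rangle) \rightarrow \langle f : x_1, \ldots, f : x_n \rangle$
      Construction: $[f_1, \ldots, f_p] : x \equiv \langle f_1 : x, \ldots, f_p : x \rangle$
      Threaded construction: $\theta[f_1, \ldots, f_p] : x \equiv (x = \langle x_1, \ldots, x_p \rangle) \rightarrow \langle f_1 : x_1, \ldots, f_p : x_p \rangle$
      Insert: $/f : x \equiv ((x = \langle x_1, \ldots, x_p \rangle) \,\&\, (p \geq 2)) \rightarrow f : \langle x_1, /f : \langle x_2, \ldots, x_p \rangle \rangle$
      Composition: $(f_q \circ f_{q-1} \circ \ldots \circ f_1) : x \equiv f_q : (f_{q-1} : (f_{q-2} : ( \ldots : (f_1 : x))))$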
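The functional forms can be tried out directly. The Python sketch below assumes sequences are modelled as tuples and that, as in FP, every function takes one argument (binary functions take a pair). All function names are hypothetical. It ends with Backus's own inner-product definition from the 1978 paper, IP = (/+) ∘ (α×) ∘ Trans. Read against slides 9 to 13, the five forms appear to match the parallel forms as follows:

- apply to all: data-parallel
- construction: speculative-parallel
- threaded construction: thread-parallel
- insert: reduction-parallel
- composition: time-parallel

The slides themselves do not spell this mapping out.

```python
from typing import Any, Callable, Tuple

Fn = Callable[[Any], Any]
# Sequences <x_1, ..., x_n> are modelled as Python tuples; every function
# takes a single argument, so binary functions receive a 2-tuple.


def apply_to_all(f: Fn) -> Fn:
    """αf : <x_1, ..., x_n> = <f : x_1, ..., f : x_n>"""
    return lambda xs: tuple(f(x) for x in xs)


def construction(*fs: Fn) -> Fn:
    """[f_1, ..., f_p] : x = <f_1 : x, ..., f_p : x>"""
    return lambda x: tuple(f(x) for f in fs)


def threaded_construction(*fs: Fn) -> Fn:
    """θ[f_1, ..., f_p] : <x_1, ..., x_p> = <f_1 : x_1, ..., f_p : x_p>"""
    def run(xs: Tuple[Any, ...]) -> Tuple[Any, ...]:
        if len(xs) != len(fs):
            raise ValueError("need exactly one argument per function")
        return tuple(f(x) for f, x in zip(fs, xs))
    return run


def insert(f: Fn) -> Fn:
    """/f : <x_1, ..., x_p> = f : <x_1, /f : <x_2, ..., x_p>> for p >= 2;
    /f : <x_1> = x_1, as in Backus's paper."""
    def run(xs: Tuple[Any, ...]) -> Any:
        if not xs:
            raise ValueError("insert is left undefined on the empty sequence here")
        if len(xs) == 1:
            return xs[0]
        return f((xs[0], run(xs[1:])))
    return run


def compose(*fs: Fn) -> Fn:
    """(f_q ∘ ... ∘ f_1) : x = f_q : (... (f_1 : x)); fs is given as f_q, ..., f_1."""
    def run(x: Any) -> Any:
        for f in reversed(fs):   # f_1 is applied first
            x = f(x)
        return x
    return run


def add(p: Tuple[Any, Any]) -> Any:
    return p[0] + p[1]


def mul(p: Tuple[Any, Any]) -> Any:
    return p[0] * p[1]


def trans(xs: Tuple[Tuple[Any, ...], ...]) -> Tuple[Tuple[Any, ...], ...]:
    return tuple(zip(*xs))


ip = compose(insert(add), apply_to_all(mul), trans)   # IP = (/+) ∘ (α×) ∘ Trans
print(ip(((1, 2, 3), (4, 5, 6))))   # 4 + 10 + 18 = 32
```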
  • Slide 18
  • Kleene-Backus synergy
  • Slide 19
  • MapReduce hierarchy programming: any level in the hierarchy uses the same programming forms, Map and Reduce.

      ; atom? is not standard Scheme; this definition treats '() as a list, not an atom
      (define (atom? x)
        (not (or (pair? x) (null? x))))

      (define (Map funcs args)
        (cond ((and (atom? funcs) (atom? args))   ; one function, one argument
               (funcs args))
              ((and (atom? funcs) (list? args))   ; one function, many arguments
               (if (null? args)
                   '()
                   (cons (funcs (car args)) (Map funcs (cdr args)))))
              ((and (list? funcs) (atom? args))   ; many functions, one argument
               (if (null? funcs)
                   '()
                   (cons ((car funcs) args) (Map (cdr funcs) args))))
              ((and (list? funcs) (list? args))   ; many functions, many arguments
               (if (or (null? funcs) (null? args))
                   '()
                   (cons ((car funcs) (car args)) (Map (cdr funcs) (cdr args)))))))
  • Slide 20
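  • MapReduce hierarchy programming:

      (define (Reduce binaryOp argList)
        (cond ((atom? argList) argList)                ; a single value reduces to itself
              ((null? (cdr argList)) (car argList))    ; one-element list: return its element, so the recursion never reaches '()
              (else (binaryOp (car argList)
                              (Reduce binaryOp (cdr argList))))))   ; right fold over the rest

    The 0-level functions in the hierarchy are: Add, Sub, Mult, And, Or, Xor, Inc, Dec, Not, Max, Min, …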
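To show how the same two forms build each level from the one below, here is a hedged Python port of the slide's Map and Reduce. It rests on three assumptions. A Python list plays the role of a Lisp list, and anything else, including a tuple, counts as an "atom". The level-0 functions are modelled with Python built-ins. `InnerProduct` and `MatVec` are hypothetical level-1 and level-2 functions, not taken from the talk.

```python
import operator
from typing import Any, Callable, List


def Map(funcs: Any, args: Any) -> Any:
    """Port of the slide's Map. A list of funcs means many functions, any other
    value one function; a list of args means many arguments, anything else one."""
    many_funcs = isinstance(funcs, list)
    many_args = isinstance(args, list)
    if not many_funcs and not many_args:        # one function, one argument
        return funcs(args)
    if not many_funcs:                          # one function, many arguments
        return [funcs(a) for a in args]
    if not many_args:                           # many functions, one argument
        return [f(args) for f in funcs]
    return [f(a) for f, a in zip(funcs, args)]  # many functions, many arguments (stops at the shorter list)


def Reduce(binary_op: Callable[[Any, Any], Any], arg_list: Any) -> Any:
    """Right fold, as on the slide; expects a single value or a non-empty list."""
    if not isinstance(arg_list, list):
        return arg_list
    if len(arg_list) == 1:
        return arg_list[0]
    return binary_op(arg_list[0], Reduce(binary_op, arg_list[1:]))


# Some level-0 functions, modelled with Python built-ins (illustrative only).
Add, Mult, Max = operator.add, operator.mul, max


def Inc(x: int) -> int:
    return x + 1


def Dec(x: int) -> int:
    return x - 1


# Level 1: built only from level-0 functions with Map and Reduce.
def InnerProduct(xs: List[int], ys: List[int]) -> int:
    return Reduce(Add, Map(lambda p: Mult(*p), list(zip(xs, ys))))


# Level 2: the same forms, with a level-1 function in the role of a level-0 one.
def MatVec(rows: List[List[int]], v: List[int]) -> List[int]:
    return Map(lambda row: InnerProduct(row, v), rows)


print(MatVec([[1, 2], [3, 4]], [5, 6]))   # [17, 39]
print(Map([Inc, Dec], 10))                # [11, 9]
```

The second call is the "many functions, one argument" case, which corresponds to speculative parallelism.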
  • Slide 21
  • Generic one-chip parallel structure
  • Slide 22
  • The ConnexArray™: BA1024
      Last version, March 2008: 65 nm, 99 mm² (entire chip), 1024 16-bit cells, 1 KB/cell, 400 MHz, 400 GOPS, > 120 GOPS/W, > 6.25 GOPS/mm²
      The first version: 11 × 11 mm², in 90 nm
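The BA1024 figures are related by simple arithmetic. The quick check below assumes that each cell completes one 16-bit operation per clock cycle; the slide does not say so. The density figure implies that 400 GOPS come from at most 64 mm². That is less than the 99 mm² quoted for the entire chip, so the figure presumably refers to the cell array alone, although the slide does not state this.

```python
# Back-of-the-envelope check of the BA1024 numbers quoted on the slide.
# Assumption (not on the slide): one 16-bit operation per cell per clock cycle.
CELLS = 1024
CLOCK_HZ = 400e6
QUOTED_GOPS = 400.0
MIN_GOPS_PER_W = 120.0      # slide: "> 120 GOPS/W"
MIN_GOPS_PER_MM2 = 6.25     # slide: "> 6.25 GOPS/mm^2"

peak_gops = CELLS * CLOCK_HZ / 1e9                  # 409.6, rounded to "400 GOPS"
max_power_w = QUOTED_GOPS / MIN_GOPS_PER_W          # efficiency above 120 GOPS/W means power below this
max_area_mm2 = QUOTED_GOPS / MIN_GOPS_PER_MM2       # density above 6.25 GOPS/mm^2 means area below this

print(f"peak throughput: {peak_gops:.1f} GOPS")     # 409.6
print(f"power: under {max_power_w:.2f} W")          # 3.33
print(f"area behind the density figure: under {max_area_mm2:.0f} mm^2")  # 64
```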
  • Slide 23
  • Updated version in 28 nm: 2048 32-bit cells with 8 KB/cell, 1 MHz, < 15 W, at T