CS4460 Advanced Algorithms, Batch 08, L4S2
Lecture 11: Multithreaded Algorithms – Part 1
N. H. N. D. de Silva, Dept. of Computer Science & Eng.
University of Moratuwa
Announcements
• Last topic discussed is Multithreading
– Will have two sessions
Nov 2012 N. H. N. D. de Silva 2
Outline: Multithreaded Algorithms
• Part 1: today
– Introduction, Dynamic Multithreading
– Model for Multithreaded Execution
– Performance Measures
– Analyzing Multithreaded Algorithms
• Part 2: next session
– Parallel Loops
– Race Conditions
– Examples
Introduction
• Aim: extend models to parallel algorithms
– Can run on multiprocessors that permit multiple instructions to execute concurrently
• E.g.: chip multiprocessors (multicore), clusters, supercomputers
• Multiprocessor models
– Shared memory vs. distributed memory
– Shared memory seems to be the trend
– We use the shared-memory model for this study
Introduction
• Threading models
– Static threading
• Software abstraction of “virtual processors” with shared common memory
• Programmer can manage (create/destroy) threads with OS support, but this is difficult, error-prone, etc.
• Work-load balancing and scheduling done by the programmer
• Has led to concurrency platforms that provide software layers to coordinate, schedule, and manage
Introduction
• Threading models
– Dynamic multithreading
• Allows programmers to specify parallelism without worrying about load balancing, managing, etc.
• Concurrency platform has a scheduler
• Simplifies programming
• Supports nested parallelism and parallel loops
• We use this model
Introduction
• Nested Parallelism
– Allows a subroutine (procedure) to be spawned
• This allows the caller to proceed while the called subroutine is computing
• Parallel Loop
– Like a normal (sequential) loop, but the iterations of the loop can execute concurrently
Dynamic Multithreading
• Advantages of the model
– Programmer specifies only the logical parallelism within a computation
• Threads within the concurrency platform schedule and load-balance among themselves
– Simply extends the serial programming model
• Only need 3 keywords: parallel, spawn, sync
• When these are deleted, the result is serial code for the same problem (a serialization of the algorithm)
Dynamic Multithreading
• Advantages of the model
– Clean way to quantify parallelism
• Based on the concepts of “work” and “span”
– Many multithreaded algorithms involving nested parallelism follow naturally from the divide-and-conquer paradigm
• Can also be analyzed by solving recurrences
– Close to how parallel computing is evolving
Dynamic Multithreading: Basics
• Example: Fibonacci sequence
• Recall: F0 = 0 and F1 = 1; Fi = Fi-1 + Fi-2 for i ≥ 2
• Simple recursive serial algorithm

FIB(n)
  if n ≤ 1 return n
  else x ← FIB(n-1)
       y ← FIB(n-2)
       return x + y
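The FIB pseudocode above maps directly to a language like Python; a minimal sketch (the function name is my own):

```python
def fib(n):
    """Serial recursive Fibonacci, mirroring the FIB pseudocode above."""
    if n <= 1:
        return n
    x = fib(n - 1)  # x <- FIB(n-1)
    y = fib(n - 2)  # y <- FIB(n-2)
    return x + y

print(fib(10))  # -> 55
```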
Tree of Recursive Procedures
Dynamic Multithreading: Basics
• Example: Fibonacci sequence
– What about the running time of FIB?
– Running time T(n) = T(n-1) + T(n-2) + Θ(1)
– The solution to the recurrence is T(n) = Θ(Fn)
– This can be bounded as T(n) = Θ(ϕ^n)
• Where ϕ = (1 + √5)/2 is the golden ratio
• T(n) grows exponentially in n
– Let us consider a multithreaded algorithm
Dynamic Multithreading: Basics
• Example: Fibonacci sequence
• FIB using dynamic multithreading

P-FIB(n)
  if n ≤ 1 return n
  else x ← spawn P-FIB(n-1)
       y ← P-FIB(n-2)
       sync
       return x + y

Nested parallelism occurs when the keyword spawn precedes a procedure call
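A rough way to mimic spawn/sync in Python is to map spawn to starting a thread and sync to joining it. This only illustrates the control structure (CPython's GIL prevents real speedup from threads here), and all names are my own:

```python
import threading

def p_fib(n, result, i=0):
    """Sketch of P-FIB: spawn ~ Thread.start(), sync ~ Thread.join().
    Writes F(n) into result[i]."""
    if n <= 1:
        result[i] = n
        return
    sub = [None, None]
    # spawn P-FIB(n-1): the parent may proceed while the child runs
    child = threading.Thread(target=p_fib, args=(n - 1, sub, 0))
    child.start()
    p_fib(n - 2, sub, 1)  # parent computes P-FIB(n-2) itself
    child.join()          # sync: wait for the spawned child to finish
    result[i] = sub[0] + sub[1]

out = [None]
p_fib(10, out)
print(out[0])  # -> 55
```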
Dynamic Multithreading: Basics
• Nested parallelism
– The parent procedure instance may continue instead of waiting for the child to complete
• E.g., while the child is computing P-FIB(n-1), the parent may go on to compute P-FIB(n-2)
– The two calls themselves and their children create nested parallelism (they are recursive)
– But spawn does not say the parent must execute concurrently with spawned children
• It expresses logical parallelism; the scheduler will decide
Dynamic Multithreading: Basics
• Nested parallelism
– A procedure cannot safely use the values returned by a child until it executes a sync
• sync says the parent must wait there for all spawned children to complete before proceeding
– Every procedure executes a sync implicitly before it returns
• Ensures all children terminate before it does
Model for Multithreaded Execution
• We can think of a multithreaded computation as a computation dag
– A vertex: a chain of instructions with no parallel control (a strand)
– An edge (u,v): strand u must execute before v
– If a strand has 2 successors?
• One of them must have been spawned
– A strand with multiple predecessors?
• Predecessors joined because of a sync
Model for Multithreaded Execution
If a computation dag has a directed path from strand u to strand v, then u and v are (logically) in series; else they are (logically) in parallel.
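The series/parallel test is just reachability in the dag. A small sketch, with a hypothetical dag given as an adjacency list:

```python
def reachable(dag, u, v):
    """True iff the dag has a directed path from strand u to strand v."""
    stack, seen = [u], set()
    while stack:
        w = stack.pop()
        if w == v:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(dag.get(w, ()))
    return False

def in_series(dag, u, v):
    """u and v are logically in series iff one can reach the other."""
    return reachable(dag, u, v) or reachable(dag, v, u)

# hypothetical dag: strand 1 spawns 2, continues as 3; 4 follows 2; 3 and 4 sync at 5
dag = {1: [2, 3], 2: [4], 3: [5], 4: [5]}
print(in_series(dag, 1, 5))  # -> True (path 1 -> 3 -> 5)
print(in_series(dag, 2, 3))  # -> False (logically in parallel)
```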
Model for Multithreaded Execution
• Multithreaded algorithms are executed on an ideal parallel computer with
– a set of processors of equal power
– sequentially consistent shared memory
• Due to scheduling, instruction ordering may differ from one program run to the next, but we can assume instructions execute in some order consistent with the computation dag
• We ignore the cost of scheduling
Performance Measures
• Two measures: work and span
• Work
– Total time to execute on one processor
– i.e., the sum of the times taken by each strand = the number of vertices in the dag (if each strand takes unit time)
• Span
– The longest time to execute strands along any path in the dag
• If each strand takes unit time, span = # vertices along the critical path
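Under the unit-time assumption, work and span can be computed directly from the dag: work is the vertex count and span is the number of vertices on the longest path. A sketch over a hypothetical dag:

```python
from functools import lru_cache

def work_and_span(dag, vertices):
    """Work = #vertices; span = #vertices on the critical (longest) path,
    assuming unit-time strands. dag maps each vertex to its successors."""
    @lru_cache(maxsize=None)
    def longest_from(v):
        return 1 + max((longest_from(s) for s in dag.get(v, ())), default=0)

    return len(vertices), max(longest_from(v) for v in vertices)

# hypothetical dag with 6 unit-time strands
dag = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [6]}
print(work_and_span(dag, [1, 2, 3, 4, 5, 6]))  # -> (6, 5)
```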
Performance Measures
• E.g., in Fig 27.2
– Total 17 vertices, 8 vertices on the critical path
– work = 17 time units, span = 8 time units
• Actual running time also depends on
– number of processors available
– how the scheduler allocates strands to processors
Performance Measures
• Notation
– Tp = running time on P processors
– T1 = the work = running time on one processor
– T∞ = the span = running time if each strand can run on its own processor (i.e., with infinitely many processors)
– Work and span give lower bounds on the running time Tp of a multithreaded computation on P processors
Performance Measures
• Work law
– In one step, an ideal computer with P processors can do at most P units of work, so in time Tp it can do at most P·Tp work. Thus Tp ≥ T1/P
• Span law
– A P-processor ideal parallel computer cannot run any faster than one with unlimited processors. So Tp ≥ T∞
Performance Measures
• Speedup = T1 / Tp
– Of a computation on P processors
– How many times faster the computation is on P processors than on 1 processor
– At most P, i.e., T1 / Tp ≤ P
– When T1 / Tp = Θ(P): linear speedup
– When T1 / Tp = P: perfect linear speedup
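The work law, span law, and speedup definition combine into a bound on the best achievable speedup; a small numeric sketch using the Fig 27.2 values T1 = 17, T∞ = 8 (function names are my own):

```python
def tp_lower_bound(t1, t_inf, p):
    """Work law (Tp >= T1/P) and span law (Tp >= T_inf) combined."""
    return max(t1 / p, t_inf)

def best_speedup(t1, t_inf, p):
    """Best possible speedup T1/Tp on P processors."""
    return t1 / tp_lower_bound(t1, t_inf, p)

t1, t_inf = 17, 8  # work and span from Fig 27.2
for p in (1, 2, 4):
    print(p, tp_lower_bound(t1, t_inf, p), best_speedup(t1, t_inf, p))
# P=1: Tp >= 17.0, speedup <= 1.0
# P=2: Tp >= 8.5,  speedup <= 2.0
# P=4: Tp >= 8 (span-limited), speedup <= 2.125
```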
Performance Measures
• Parallelism = T1 / T∞
– As a ratio: denotes the average amount of work that can be performed in parallel for each step along the critical path
– As an upper bound: gives the maximum possible speedup on any number of processors
– Also: provides a limit on the possibility of attaining perfect linear speedup
• If the number of processors exceeds the parallelism, we cannot possibly achieve perfect linear speedup
Performance Measures
• (Parallel) Slackness = (T1 / T∞) / P = T1 / (P·T∞)
– The factor by which the parallelism of the computation exceeds the number of processors in the machine
• E.g., if slackness < 1, we cannot hope to achieve perfect linear speedup
– As the slackness decreases from 1 toward 0, the speedup of the computation diverges further and further from perfect linear speedup
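Parallelism and slackness are simple ratios of the same three quantities; continuing with the Fig 27.2 values T1 = 17, T∞ = 8:

```python
def parallelism(t1, t_inf):
    """Average amount of work per critical-path step: T1 / T_inf."""
    return t1 / t_inf

def slackness(t1, t_inf, p):
    """Factor by which the parallelism exceeds the processor count."""
    return t1 / (p * t_inf)

t1, t_inf = 17, 8               # work and span from Fig 27.2
print(parallelism(t1, t_inf))   # -> 2.125
print(slackness(t1, t_inf, 2))  # -> 1.0625 (>= 1: perfect speedup not ruled out)
print(slackness(t1, t_inf, 4))  # -> 0.53125 (< 1: perfect speedup impossible)
```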
Scheduling
• Rely on the concurrency platform’s scheduler
• A greedy scheduler is good enough
Analyzing Multithreaded Algorithms
Analysis of P-FIB(n)
• The span of P-FIB(n)
T∞(n) = max(T∞(n-1), T∞(n-2)) + Θ(1)
      = T∞(n-1) + Θ(1) = Θ(n)
• The parallelism of P-FIB(n)
T1(n) / T∞(n) = Θ(ϕ^n / n)
– Grows dramatically as n gets large
– Even on the largest parallel computers, modest values of n give near perfect linear speedup
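The Θ(ϕ^n/n) growth can be checked numerically by taking T1(n) to be the node count of FIB's recursion tree (unit cost per node) and T∞(n) proportional to n; these proxy functions are my own:

```python
def t1(n):
    """Work proxy: number of nodes in FIB(n)'s recursion tree."""
    if n <= 1:
        return 1
    return 1 + t1(n - 1) + t1(n - 2)

def t_inf(n):
    """Span proxy: the critical path of P-FIB(n) has length Theta(n)."""
    return max(n, 1)

# the parallelism T1(n)/T_inf(n) grows roughly like phi**n / n
for n in (10, 15, 20, 25):
    print(n, t1(n) / t_inf(n))
```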
Next Session – Part 2
• Topics
– Parallel loops
– Race conditions
– Examples: Multithreaded matrix-vector and matrix-matrix multiplication
• Before next session, read if possible
– pp. 785-796 (12 pages) from CLRS 3/e
– Document “A Minicourse on Dynamic Multithreaded Algorithms” (focus on relevant topics)
References
• The lecture slides are based on the slides prepared by Prof. Sanath Jayasena for this class in previous years.
• Mainly: CLRS book, 3/e
– Part VII: Selected Topics
– Chapter 27: Multithreaded Algorithms
• Other resources (on LMS)
– Document “A Minicourse on Dynamic Multithreaded Algorithms”
– Slide sets: Cilk; Design and Analysis of Dynamic Multithreaded Algorithms; Strassen’s Matrix Multiplication Algorithm
Conclusion
• We discussed multithreaded algorithms
– Dynamic Multithreading
– Model for Multithreaded Execution
– Performance Measures
– Analyzing Multithreaded Algorithms
• Next session (last session) – Part 2
– Parallel Loops, Race Conditions, Examples