3 Parallel Algorithm Complexity
A.Broumandnia, Broumandnia@gmail.com


Review algorithm complexity and various complexity classes:

• Introduce the notions of time and time/cost optimality
• Derive tools for analysis, comparison, and fine-tuning

Topics in This Chapter

3.1 Asymptotic Complexity

3.2 Algorithm Optimality and Efficiency

3.3 Complexity Classes

3.4 Parallelizable Tasks and the NC Class

3.5 Parallel Programming Paradigms

3.6 Solving Recurrences


3.1 Asymptotic Complexity

• Algorithms can be analyzed in two ways: precise and approximate. A useful form of approximate analysis, which we will use extensively throughout this book, is asymptotic analysis.

• Suppose that a parallel sorting algorithm requires (log2 n)² compare–exchange steps, another one (log2 n)²/2 + 2 log2 n steps, and a third one 500 log2 n steps (assume these are the results of exact analyses). For large n, the first two step counts grow like (log2 n)², while the third grows like log2 n; so despite its large constant factor, the third algorithm eventually requires the fewest steps.

• Thus, for such large values of n, an algorithm with running time c log n is asymptotically better than an algorithm with running time c′ log² n for any values of the constants c and c′.
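The comparison above can be checked numerically. A minimal sketch, working directly with L = log2(n) so that astronomically large values of n need not be represented:

```python
# Step counts of the three hypothetical sorting algorithms, expressed as
# functions of L = log2(n).
def steps1(L): return L * L              # (log2 n)^2
def steps2(L): return L * L / 2 + 2 * L  # (log2 n)^2 / 2 + 2 log2 n
def steps3(L): return 500.0 * L          # 500 log2 n

for L in (10, 100, 2000):                # i.e., n = 2**10, 2**100, 2**2000
    print(L, steps1(L), steps2(L), steps3(L))
# For small n the 500 log2 n algorithm looks worst; beyond L = 1000
# (n = 2**1000) it has the smallest step count of the three.
```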


• To make our discussion of asymptotic analysis more precise, we introduce some notations that are commonly used in the study of computational complexity.

• Given two functions f(n) and g(n) of an independent variable n (usually, the problem size), we define the relationships "O" (big-oh), "Ω" (big-omega), and "Θ" (theta) between them as follows:

f(n) = O(g(n)) iff there exist constants c > 0 and n0 such that f(n) ≤ c g(n) for all n > n0
f(n) = Ω(g(n)) iff there exist constants c > 0 and n0 such that f(n) ≥ c g(n) for all n > n0
f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n))


• For example, f(n) = O(g(n)) means that f(n) grows no faster than g(n), so that for n sufficiently large (i.e., n > n0) and a suitably chosen constant c, f(n) always remains below c g(n). This relationship is represented graphically in the left panel of Fig. 3.1.

• Similarly, f(n) = Ω(g(n)) means that f(n) grows at least as fast as g(n), so that eventually f(n) will exceed c g(n) for all n beyond n0 (middle panel of Fig. 3.1).

• Finally, f(n) = Θ(g(n)) means that f(n) and g(n) grow at about the same rate, so that the value of f(n) is always bounded below by c g(n) and above by c′ g(n) for n > n0 (right panel of Fig. 3.1).
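The big-oh definition can be exercised numerically. A minimal sketch, using the example 3n log n = O(n²) with c = 3 and n0 = 2 as assumed witnesses (any valid pair would do):

```python
import math

# f(n) = 3 n log2 n grows no faster than g(n) = n^2: with witnesses
# c = 3 and n0 = 2, f(n) <= c*g(n) holds for every tested n > n0.
def f(n): return 3 * n * math.log2(n)
def g(n): return float(n * n)

c, n0 = 3, 2
assert all(f(n) <= c * g(n) for n in range(n0, 100_000))
```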


Fig. 3.1 Graphical representation of the notions of asymptotic complexity: in the left panel, f(n) eventually stays below c g(n), so f(n) = O(g(n)); in the middle panel, f(n) eventually exceeds c g(n), so f(n) = Ω(g(n)); in the right panel, f(n) stays between c g(n) and c′ g(n), so f(n) = Θ(g(n)).

Examples: 3n log n = O(n²), ½ n log2 n = Ω(n), 3n² + 200n = Θ(n²)


• We can also define ordering relationships between the growth rates of functions. In other words, in the statement "The rate of growth of f(n) is ___ that of g(n)," we can fill in the blank with a relational symbol (<, ≤, =, ≥, >) corresponding to the defined relations: < for f(n) = o(g(n)), ≤ for f(n) = O(g(n)), = for f(n) = Θ(g(n)), ≥ for f(n) = Ω(g(n)), and > for f(n) = ω(g(n)).

• Of the above, the big-oh notation will be used most extensively, because it can express an upper bound on an algorithm's time or computational complexity and thus helps us establish whether or not a given algorithm is feasible for a given architecture.


• At a very coarse level, we sometimes talk about algorithms with sublinear, linear, and superlinear running times or complexities. These coarse categories can be further subdivided or refined: for example, sublinear complexities include Θ(1), Θ(log n), and Θ(√n), while superlinear complexities include Θ(n log n), Θ(n²), and exponential growth.


• Table 3.1 helps you get an idea of the growth rates of two sublinear and two superlinear functions as the problem size n increases.


• Table 3.2 shows the growth rates of a few functions, including constant multiplicative factors, to give you a feel for the contribution of such constants.


• Table 3.3 presents the same information using larger time units and rounded figures, which make the differences easier to grasp (assuming that the original numbers of Table 3.2 showed the running time of an algorithm in seconds).


3.2 Algorithm Optimality and Efficiency

• What is the running time f(n) of the fastest algorithm for solving a given problem?

• If we are interested in asymptotic comparison, then because an algorithm with running time g(n) is already known, f(n) = O(g(n)); i.e., for large n, the running time of the best algorithm is upper bounded by c g(n) for some constant c. If, subsequently, someone develops an asymptotically faster algorithm, say one running in time h(n), we conclude that f(n) = O(h(n)). The process of constructing and improving algorithms thus contributes to the establishment of tighter upper bounds on the complexity of the best algorithm (Fig. 3.2). Theoretical lower-bound arguments work in the opposite direction, raising the bound below which no algorithm can run.

• If and when the known upper bound and lower bound for a given problem converge, we say that we have an optimal algorithm.


Fig. 3.2 Upper and lower bounds may tighten over time.

Upper bounds: derived by devising/analyzing algorithms and proving them correct.
Lower bounds: established by theoretical arguments based on bisection width, and the like.
Typical complexity classes along the axis: log n, log² n (sublinear); n/log n, n, n log log n, n log n (linear and near-linear); n² (superlinear).
The figure's hypothetical timeline of improving upper bounds: Anne's alg., O(n²), 1982; Bert's alg., O(n log n), 1988; Chin's alg., O(n log log n), 1991; Dana's alg., O(n), 1996. Shifting lower bounds: Zak's thm., Ω(log n), 1988; Ying's thm., Ω(log² n), 1994. The gap between the two bounds is where an optimal algorithm may lie.


• Now, because of the additional cost factor introduced by parallelism, different notions of optimality can be entertained. Let T(n, p) be our algorithm's running time when solving a problem of size n on a machine with p processors. The algorithm is said to exhibit:

Time optimality (optimal algorithm, for short):
T(n, p) = g(n, p), where g(n, p) is an established lower bound

Cost-time optimality (cost-optimal algorithm, for short):
p T(n, p) = T(n, 1); i.e., redundancy = utilization = 1

Cost-time efficiency (efficient algorithm, for short):
p T(n, p) = Θ(T(n, 1)); i.e., redundancy = utilization = Θ(1)
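As an illustration (a hypothetical algorithm, not one from the text), consider adding n numbers on p processors in T(n, p) = n/p + log2 p steps. Cost-time efficiency requires the ratio p T(n, p)/T(n, 1) to stay bounded by a constant:

```python
import math

# Hypothetical parallel reduction: T(n, 1) = n - 1 sequential additions;
# with p processors, n/p local additions followed by a log2(p)-step combine.
def T(n, p):
    return n - 1 if p == 1 else n / p + math.log2(p)

n = 2**20
for p in (16, 1024):
    ratio = p * T(n, p) / T(n, 1)   # redundancy; Theta(1) means cost-time efficient
    print(p, round(ratio, 4))
    assert ratio < 2                # bounded by a constant, so efficient
```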


• A speedup of 5 in terms of step counts may correspond to a speedup of only 2 or 3, say, when real time is considered (Fig. 3.3).

Fig. 3.3 Five times fewer steps does not necessarily mean five times faster.

Machine or algorithm A reaches the solution in 20 steps; machine or algorithm B in 4 steps. For example, one algorithm may need 20 GFLOP and another 4 GFLOP, but if the latter relies on float division, which is, say, a factor of 10 slower than float multiplication, the fivefold reduction in operation count does not translate into a fivefold reduction in running time.


3.5 Parallel Programming Paradigms

• Several methods are used extensively in devising efficient parallel algorithms for solving problems of interest.

• Divide and conquer: Decompose a problem of size n into smaller subproblems; solve the subproblems independently; combine the subproblem results into the final answer. The running time is

T(n) = Td(n) + Ts + Tc(n)

where Td(n) is the time to decompose, Ts the time to solve the subproblems in parallel, and Tc(n) the time to combine their results.

For example, in the case of sorting a list of n keys, we can decompose the list into two halves, sort the two sublists independently in parallel, and merge the two sorted sublists into a single sorted list. If we can perform each of the decomposition and merging operations in log2 n steps on some parallel computer, and if the solution of the two sorting problems of size n/2 can be completely overlapped in time, then the running time of the parallel algorithm is characterized by the recurrence T(n) = T(n/2) + 2 log2 n.
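Unrolling this recurrence gives T(n) = 2(log2 n + (log2 n − 1) + … + 1) = log2 n (log2 n + 1) = Θ(log² n). A minimal check, assuming n is a power of 2 and T(1) = 0:

```python
import math

# Step count of the divide-and-conquer parallel sort: T(n) = T(n/2) + 2 log2 n
def T(n):
    return 0 if n == 1 else T(n // 2) + 2 * math.log2(n)

for n in (2, 16, 1024):
    L = math.log2(n)
    assert T(n) == L * (L + 1)   # closed form: Theta(log^2 n)
```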


• Randomization: When it is impossible or difficult to decompose a large problem into subproblems with equal solution times, one might use random decisions that lead to good results with very high probability.

Again, sorting provides a good example. Suppose that each of p processors begins with a sublist of size n/p. First, each processor selects a random sample of size k from its local sublist. The kp samples from all processors form a smaller list that can be readily sorted, perhaps on a single processor or using a parallel algorithm that is known to be efficient for small lists. If this sorted list of samples is now divided into p equal segments, and the beginning values of the p segments are used as thresholds to divide the original list of n keys into p sublists, the lengths of these latter sublists will be approximately balanced with high probability. The n-input sorting problem has thus been transformed into an initial random sampling, a small sorting problem for the kp samples, broadcasting of the p threshold values to all processors, permutation of the elements among the processors according to the p threshold values, and p independent sorting problems of approximate size n/p. The average-case running time of such an algorithm can be quite good; however, there is no useful worst-case guarantee on its running time.
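The steps above can be sketched sequentially (the values of p and k here are illustrative choices, not values prescribed by the text):

```python
import random

# Sample-sort partitioning: sample, sort the samples, pick thresholds,
# and split the keys into p approximately balanced buckets.
def sample_sort_partition(keys, p=4, k=8):
    n = len(keys)
    chunks = [keys[i * n // p:(i + 1) * n // p] for i in range(p)]
    samples = sorted(s for c in chunks for s in random.sample(c, k))
    # beginning values of the p equal segments of the sorted samples
    thresholds = [samples[i * len(samples) // p] for i in range(1, p)]
    buckets = [[] for _ in range(p)]
    for x in keys:
        i = sum(x >= t for t in thresholds)   # index of x's bucket
        buckets[i].append(x)
    return [sorted(b) for b in buckets]       # p independent sorts

data = [random.randrange(10**6) for _ in range(4000)]
parts = sample_sort_partition(data)
flat = [x for b in parts for x in b]
assert flat == sorted(data)                   # concatenation is globally sorted
```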


• Approximation: Iterative numerical methods often use approximation to arrive at the solution(s). For example, to solve a system of n linear equations, one can begin with some rough estimates for the answers and then successively refine these estimates using parallel numerical calculations. Jacobi relaxation, to be covered in Section 11.4, is an example of such approximation methods. Under proper conditions, the iterations converge to the correct solutions; the larger the number of iterations, the more accurate the solutions.
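A minimal sketch of Jacobi relaxation on a small, diagonally dominant 2×2 system (the system itself is made up for illustration); each component update depends only on the previous iterate, so all n updates can proceed in parallel:

```python
# Solve A x = b by Jacobi relaxation: x_i <- (b_i - sum_{j!=i} A_ij x_j) / A_ii
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 9.0]                      # exact solution: x = (2, 1)
x = [0.0, 0.0]
for _ in range(100):                # more iterations -> more accurate solution
    x = [(b[i] - sum(A[i][j] * x[j] for j in range(2) if j != i)) / A[i][i]
         for i in range(2)]
assert abs(x[0] - 2.0) < 1e-9 and abs(x[1] - 1.0) < 1e-9
```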


3.6 Solving Recurrences

• The simplest method for solving recurrences is through unrolling. The method is best illustrated through a sequence of examples. In all examples below, f(1) = 0 is assumed.

Example 1:
f(n) = f(n – 1) + n   {rewrite f(n – 1) as f((n – 1) – 1) + n – 1}
     = f(n – 2) + n – 1 + n
     = f(n – 3) + n – 2 + n – 1 + n
     . . .
     = f(1) + 2 + 3 + . . . + n – 1 + n
     = n(n + 1)/2 – 1
     = Θ(n²)

Example 2:
f(n) = f(n/2) + 1   {rewrite f(n/2) as f((n/2)/2) + 1}
     = f(n/4) + 1 + 1
     = f(n/8) + 1 + 1 + 1
     . . .
     = f(n/n) + 1 + 1 + 1 + . . . + 1   {log2 n times}
     = log2 n
     = Θ(log n)
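Both unrolled solutions can be verified by evaluating the recurrences directly (f(1) = 0 as assumed; n a power of 2 in the second recurrence):

```python
import math

def f1(n): return 0 if n == 1 else f1(n - 1) + n    # f(n) = f(n-1) + n
def f2(n): return 0 if n == 1 else f2(n // 2) + 1   # f(n) = f(n/2) + 1

for n in (2, 8, 64, 512):
    assert f1(n) == n * (n + 1) // 2 - 1            # Theta(n^2)
    assert f2(n) == int(math.log2(n))               # Theta(log n)
```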


Example 3:
f(n) = f(n/2) + n
     = f(n/4) + n/2 + n
     = f(n/8) + n/4 + n/2 + n
     . . .
     = f(n/n) + 2 + 4 + . . . + n/4 + n/2 + n
     = 2n – 2
     = Θ(n)

Example 4:
f(n) = 2f(n/2) + 1
     = 4f(n/4) + 2 + 1
     = 8f(n/8) + 4 + 2 + 1
     . . .
     = n f(n/n) + n/2 + . . . + 4 + 2 + 1
     = n – 1
     = Θ(n)
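The same direct check works for these two recurrences (f(1) = 0, n a power of 2):

```python
def f3(n): return 0 if n == 1 else f3(n // 2) + n       # f(n) = f(n/2) + n
def f4(n): return 0 if n == 1 else 2 * f4(n // 2) + 1   # f(n) = 2f(n/2) + 1

for n in (2, 16, 1024):
    assert f3(n) == 2 * n - 2                           # Theta(n)
    assert f4(n) == n - 1                               # Theta(n)
```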


Example 5:
f(n) = f(n/2) + log2 n
     = f(n/4) + log2(n/2) + log2 n
     = f(n/8) + log2(n/4) + log2(n/2) + log2 n
     . . .
     = f(n/n) + log2 2 + log2 4 + . . . + log2(n/2) + log2 n
     = 1 + 2 + 3 + . . . + log2 n
     = log2 n (log2 n + 1)/2
     = Θ(log² n)

Example 6:
f(n) = 2f(n/2) + n
     = 4f(n/4) + n + n
     = 8f(n/8) + n + n + n
     . . .
     = n f(n/n) + n + n + n + . . . + n   {log2 n times}
     = n log2 n
     = Θ(n log n)

Alternate solution method for Example 6: divide both sides by n to get
f(n)/n = f(n/2)/(n/2) + 1
Let f(n)/n = g(n); then g(n) = g(n/2) + 1 = log2 n, so f(n) = n log2 n.
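Again a direct check (f(1) = 0, n a power of 2), including the closed form obtained by the g(n) = f(n)/n substitution:

```python
import math

def f5(n): return 0 if n == 1 else f5(n // 2) + math.log2(n)  # f(n/2) + log2 n
def f6(n): return 0 if n == 1 else 2 * f6(n // 2) + n         # 2f(n/2) + n

for n in (2, 16, 1024):
    L = math.log2(n)
    assert f5(n) == L * (L + 1) / 2      # Theta(log^2 n)
    assert f6(n) == n * L                # Theta(n log n); f(n)/n = log2 n
```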


• Another method that we will find useful, particularly for recurrences that cannot be easily unrolled, is guessing the answer to the recurrence and then verifying the guess by substitution. In fact, the method of substitution can be used to determine the constant multiplicative factors and lower-order terms once the asymptotic complexity has been established by other methods.

• As an example, let us say that we know that the solution to Example 1 above is f(n) = Θ(n²). We write f(n) = an² + g(n), where g(n) = o(n²) represents the lower-order terms. Substituting in the recurrence equation f(n) = f(n – 1) + n, we get

an² + g(n) = a(n – 1)² + g(n – 1) + n

• This equation simplifies to

g(n) = g(n – 1) + (1 – 2a)n + a

• Choose a = 1/2 in order to make g(n) = o(n²) possible. Then, the solution to the recurrence g(n) = g(n – 1) + 1/2 is g(n) = n/2 – 1, given that g(1) = f(1) – a = –1/2. The solution to the original recurrence then becomes f(n) = n²/2 + n/2 – 1, which matches our earlier result based on unrolling.
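The closed form derived by substitution can be confirmed against the original recurrence:

```python
# With a = 1/2 and g(n) = n/2 - 1, the guess f(n) = n^2/2 + n/2 - 1
# should satisfy f(n) = f(n-1) + n with f(1) = 0.
def f_closed(n): return n * n / 2 + n / 2 - 1

assert f_closed(1) == 0
for n in range(2, 200):
    assert f_closed(n) == f_closed(n - 1) + n
```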


Master Theorem for Recurrences

Theorem 3.1: Given f(n) = a f(n/b) + h(n), with a and b constants and h an arbitrary function, the asymptotic solution to the recurrence is (c = logb a):

f(n) = Θ(n^c)        if h(n) = O(n^(c – ε)) for some ε > 0
f(n) = Θ(n^c log n)  if h(n) = Θ(n^c)
f(n) = Θ(h(n))       if h(n) = Ω(n^(c + ε)) for some ε > 0

Example: f(n) = 2 f(n/2) + 1. Here a = b = 2, so c = logb a = 1, and h(n) = 1 = O(n^(1 – ε)); hence f(n) = Θ(n^c) = Θ(n).
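When h(n) is a simple power n^d, the three cases reduce to comparing d with c = logb a. A minimal sketch restricted to that power-law case (an assumption that keeps the comparison trivial; it does not cover arbitrary h):

```python
import math

def master(a, b, d):
    """Asymptotic solution of f(n) = a f(n/b) + n**d by Theorem 3.1."""
    c = math.log(a, b)
    if d < c:  return f"Theta(n^{c:g})"         # h grows slower than n^c
    if d == c: return f"Theta(n^{c:g} log n)"   # h matches n^c
    return f"Theta(n^{d:g})"                    # h dominates

print(master(2, 2, 0))   # f(n) = 2f(n/2) + 1 -> Theta(n), as unrolled above
print(master(2, 2, 1))   # f(n) = 2f(n/2) + n -> Theta(n log n)
print(master(1, 2, 1))   # f(n) = f(n/2) + n  -> Theta(n)
```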