analysis of algorithms cs 477/677 final exam review instructor: george bebis

Analysis of Algorithms CS 477/677

Final Exam Review

Instructor: George Bebis


The Heap Data Structure

• Def: A heap is a nearly complete binary tree with the following two properties:

– Structural property: all levels are full, except possibly the last one, which is filled from left to right

– Order (heap) property: for any node x, Parent(x) ≥ x

[Figure: Heap (example tree satisfying the structural and order properties)]

Array Representation of Heaps

• A heap can be stored as an array A.

– Root of tree is A[1]

– Parent of A[i] = A[⌊i/2⌋]

– Left child of A[i] = A[2i]

– Right child of A[i] = A[2i + 1]

– Heapsize[A] ≤ length[A]

• The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves

• The root is the max/min element of the heap

A heap is a binary tree that is filled in order
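The index arithmetic above can be checked with a short Python sketch. It keeps the slide's 1-based convention by leaving index 0 unused; the helper names and the sample array are illustrative assumptions, not part of the course material.

```python
def parent(i: int) -> int:
    """Index of the parent of node i (1-based), i.e. floor(i/2)."""
    return i // 2


def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 1


def is_leaf(i: int, n: int) -> bool:
    """Nodes floor(n/2)+1 .. n have no children in a heap of size n."""
    return i > n // 2


def is_max_heap(A: list, n: int) -> bool:
    """Check Parent(x) >= x for every non-root node of A[1..n]."""
    return all(A[parent(i)] >= A[i] for i in range(2, n + 1))


# Index 0 is unused padding so that the root sits at A[1], as on the slide.
A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
n = len(A) - 1
```

With n = 10, the leaves are the nodes at indices 6 through 10, and the root A[1] holds the maximum.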

Operations on Heaps (useful for sorting and priority queues)

– MAX-HEAPIFY O(lgn)

– BUILD-MAX-HEAP O(n)

– HEAP-SORT O(nlgn)

– MAX-HEAP-INSERT O(lgn)

– HEAP-EXTRACT-MAX O(lgn)

– HEAP-INCREASE-KEY O(lgn)

– HEAP-MAXIMUM O(1)

– You should be able to show how these algorithms perform on a given heap, and tell their running time
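To practice tracing these operations, a minimal Python sketch of most of them follows. It assumes 0-based indexing (children of i at 2i+1 and 2i+2) rather than the 1-based arrays of the slides, and the function names are just Python spellings of the operation names above.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap: O(lg n)."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Heapify every internal node, bottom-up: O(n) overall."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heap_sort(a):
    """Sort a in place: O(n lg n)."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final place
        max_heapify(a, 0, end)        # restore the heap on the shrunken prefix


def heap_extract_max(a):
    """Remove and return the maximum of a max-heap: O(lg n)."""
    if not a:
        raise IndexError("heap underflow")
    top, last = a[0], a.pop()
    if a:
        a[0] = last
        max_heapify(a, 0, len(a))
    return top


def max_heap_insert(a, key):
    """Append key and float it up toward the root: O(lg n)."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] < a[i]:
        a[(i - 1) // 2], a[i] = a[i], a[(i - 1) // 2]
        i = (i - 1) // 2
```

HEAP-MAXIMUM is simply `a[0]`, and the float-up loop in `max_heap_insert` is the same movement HEAP-INCREASE-KEY performs after raising a key.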

Lower Bound for Comparison Sorts

Theorem: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.

Proof: Model the algorithm as a decision tree of height h. How many leaves does the tree have?

– At least n! (each of the n! permutations of the input appears as some leaf)

– At most 2^h leaves (a binary tree of height h)

n! ≤ 2^h

h ≥ lg(n!) = Ω(n lg n)
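As a numeric illustration of the bound h ≥ lg(n!), the sketch below computes ⌈lg(n!)⌉ exactly with integer arithmetic. The function name is a hypothetical helper introduced only for this example.

```python
from math import factorial, log2


def min_worst_case_comparisons(n: int) -> int:
    """Smallest h with n! <= 2**h, i.e. ceil(lg(n!)).

    For an integer x >= 1, (x - 1).bit_length() equals ceil(lg x),
    which avoids floating-point rounding for large factorials.
    """
    return (factorial(n) - 1).bit_length()


def n_lg_n(n: int) -> float:
    """The n lg n reference curve, for comparison with the bound."""
    return n * log2(n)
```

For example, sorting 3 elements needs at least 3 comparisons in the worst case, and sorting 4 elements needs at least 5.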

Linear Time Sorting

• Any comparison sort takes Ω(n lg n) time in the worst case to sort an array of n numbers

• We can achieve a better running time for sorting if we can make certain assumptions on the input data:

– Counting sort: each of the n input elements is an integer in the range [0, r] and r = O(n)

– Radix sort: the elements in the input are integers represented as d-digit numbers in some base k, where d = Θ(1) and k = O(n)

– Bucket sort: the numbers in the input are uniformly distributed over the interval [0, 1)

Analysis of Counting Sort

Alg.: COUNTING-SORT(A, B, n, r)
for i ← 0 to r                          ► Θ(r)
    do C[i] ← 0
for j ← 1 to n                          ► Θ(n)
    do C[A[j]] ← C[A[j]] + 1
► C[i] contains the number of elements equal to i
for i ← 1 to r                          ► Θ(r)
    do C[i] ← C[i] + C[i - 1]
► C[i] contains the number of elements ≤ i
for j ← n downto 1                      ► Θ(n)
    do B[C[A[j]]] ← A[j]
       C[A[j]] ← C[A[j]] - 1

Overall time: Θ(n + r)
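The counting sort pseudocode translates almost line for line into Python. The sketch below assumes 0-based lists, so the count is decremented before placing an element instead of after; the function name and signature are illustrative.

```python
def counting_sort(a, r):
    """Stable sort of integers in the range [0, r]: Theta(n + r) time."""
    count = [0] * (r + 1)
    for x in a:                      # count[i] = number of elements equal to i
        count[x] += 1
    for i in range(1, r + 1):        # count[i] = number of elements <= i
        count[i] += count[i - 1]
    out = [None] * len(a)
    for x in reversed(a):            # right-to-left scan keeps equal keys in order
        count[x] -= 1                # 0-based output: decrement, then place
        out[count[x]] = x
    return out
```

Scanning the input from right to left is what makes the sort stable, which radix sort relies on.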

RADIX-SORT

Alg.: RADIX-SORT(A, d)
for i ← 1 to d
    do use a stable sort to sort array A on digit i

• 1 is the lowest-order digit, d is the highest-order digit

• Running time: Θ(d(n + k)), if the stable sort takes Θ(n + k) per pass
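A minimal sketch of radix sort on non-negative integers follows. As an assumption for brevity, the stable per-digit sort is done by distributing elements into k lists in input order (a simple stand-in for counting sort on that digit); the base k defaults to 10.

```python
def radix_sort(a, d, k=10):
    """Sort non-negative integers with at most d base-k digits.

    Each pass is a stable distribution on one digit and costs Theta(n + k),
    so the total is Theta(d(n + k)).
    """
    for i in range(d):               # digit 0 is the lowest-order digit
        place = k ** i
        buckets = [[] for _ in range(k)]
        for x in a:                  # appending preserves input order (stability)
            buckets[(x // place) % k].append(x)
        a = [x for bucket in buckets for x in bucket]
    return a
```

Processing the lowest-order digit first works only because each pass is stable: ties on the current digit keep the order established by the earlier passes.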

Analysis of Bucket Sort

Alg.: BUCKET-SORT(A, n)
for i ← 1 to n                                   ► O(n)
    do insert A[i] into list B[⌊n·A[i]⌋]
for i ← 0 to n - 1                               ► Θ(n) expected in total
    do sort list B[i] with quicksort
concatenate lists B[0], B[1], . . . , B[n - 1] together in order    ► O(n)
return the concatenated lists

Overall expected time: Θ(n)
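The following Python sketch mirrors the bucket sort pseudocode for inputs in [0, 1). One substitution to note: each bucket is sorted with Python's built-in `list.sort()` rather than quicksort, which does not change the expected Θ(n) behavior on uniformly distributed input.

```python
def bucket_sort(a):
    """Sort numbers drawn from [0, 1) using n buckets of width 1/n."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)   # bucket index floor(n * x)
    out = []
    for bucket in buckets:              # buckets are already in increasing order
        bucket.sort()                   # built-in sort stands in for quicksort
        out.extend(bucket)
    return out
```

If the input is not uniform, many elements can land in one bucket and the running time degrades to that of the per-bucket sort.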

Hash Tables

Direct addressing (advantages/disadvantages)

Hashing

– Use a function h to compute the slot for each key

– Store the element (or a pointer to it) in slot h(k)

Advantages of hashing

– Can reduce storage requirements to Θ(|K|)

– Can still get O(1) search time in the average case

Hashing with Chaining

• What is the main idea?

• Practical issues?

• Analysis of INSERT, DELETE

• Analysis of SEARCH

– Worst case

– Average case (both successful and unsuccessful): Θ(1 + α), where α = n/m is the load factor
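A minimal sketch of hashing with chaining, assuming Python lists as the chains and the built-in `hash` reduced mod m as the hash function. The class and method names are illustrative. Unlike the textbook O(1) insert at the head of the list, `insert` here first scans the chain so that a repeated key is overwritten.

```python
class ChainedHashTable:
    """Hash table with m slots; colliding keys share a chain (Python list)."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]
        self.n = 0                       # number of stored keys

    def _h(self, key):
        return hash(key) % self.m        # assumed hash function

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        """Scan one chain: Theta(1 + alpha) on average, Theta(n) worst case."""
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        """alpha = n / m, the average chain length."""
        return self.n / self.m
```

The worst case arises when all n keys hash to the same slot, so a search degenerates into a scan of one list of length n.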

Designing Hash Functions

• The division method

h(k) = k mod m

Advantage: fast, requires only one operation

Disadvantage: certain values of m are bad (e.g., powers of 2)

• The multiplication method

h(k) = ⌊m (k A mod 1)⌋, where 0 < A < 1

Disadvantage: slower than the division method

Advantage: the value of m is not critical; typically m = 2^p

• Universal hashing

– Select a hash function at random, from a carefully designed class of functions

Advantage: provides good results on average, independently of the keys to be stored
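The three approaches can be sketched in a few lines of Python. Two assumptions go beyond the slide: the constant A = (√5 - 1)/2 is a commonly suggested choice for the multiplication method, and the universal family shown is the ((a·k + b) mod p) mod m family with a prime p larger than every key. The function names are illustrative.

```python
import math
import random

A = (math.sqrt(5) - 1) / 2               # assumed constant, about 0.618


def h_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def h_multiplication(k: int, m: int, a: float = A) -> int:
    """Multiplication method: h(k) = floor(m * (k*a mod 1))."""
    return int(m * ((k * a) % 1.0))


def make_universal_hash(m: int, p: int = 2_147_483_647, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random (p prime, keys < p)."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m
```

With the division method, a power of two for m makes h(k) depend only on the low-order bits of k, which is why such values are listed as bad choices.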

Open Addressing

• Main idea

• Different implementations

– Linear probing

– Quadratic probing

– Double hashing

• Know how each one of them works and their main advantages/disadvantages

– How do you insert/delete?

– How do you search?

– Analysis of searching
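The sketch below shows the three probe sequences together with insert, search, and delete on an open-addressed table. Several details are assumptions made for illustration: h'(k) = k mod m, the quadratic constants c1 = 1 and c2 = 3, the auxiliary hash h2(k) = 1 + (k mod (m - 2)), and the use of a DELETED marker so that deletions do not break later searches.

```python
EMPTY = object()     # slot that has never held a key
DELETED = object()   # slot whose key was removed (keeps probe chains intact)


def linear_probe(k, m):
    """h(k, i) = (h'(k) + i) mod m."""
    return ((k % m + i) % m for i in range(m))


def quadratic_probe(k, m, c1=1, c2=3):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; may not visit every slot."""
    return ((k % m + c1 * i + c2 * i * i) % m for i in range(m))


def double_hash_probe(k, m):
    """h(k, i) = (h1(k) + i*h2(k)) mod m, with h2(k) never 0."""
    h1, h2 = k % m, 1 + k % (m - 2)
    return ((h1 + i * h2) % m for i in range(m))


def oa_insert(table, key, probe):
    """Store key in the first EMPTY or DELETED slot of its probe sequence."""
    for j in probe(key, len(table)):
        if table[j] is EMPTY or table[j] is DELETED:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def oa_search(table, key, probe):
    """Follow the probe sequence; an EMPTY slot means the key is absent."""
    for j in probe(key, len(table)):
        if table[j] is EMPTY:
            return None
        if table[j] is not DELETED and table[j] == key:
            return j
    return None


def oa_delete(table, key, probe):
    """Mark the slot DELETED instead of EMPTY so later searches keep probing."""
    j = oa_search(table, key, probe)
    if j is not None:
        table[j] = DELETED
    return j
```

Linear probing suffers from primary clustering and quadratic probing from secondary clustering; double hashing spreads the probes best because the step size depends on the key.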

Binary Search Tree

• Tree representation:

– A linked data structure in which each node is an object

• Binary search tree property:

– If y is in left subtree of x, then key[y] ≤ key[x]

– If y is in right subtree of x, then key[y] ≥ key[x]

[Figure: example binary search tree on the keys 2, 3, 5, 5, 7, 9]

Operations on Binary Search Trees

– SEARCH O(h)

– PREDECESSOR O(h)

– SUCCESSOR O(h)

– MINIMUM O(h)

– MAXIMUM O(h)

– INSERT O(h)

– DELETE O(h)

– You should be able to show how these algorithms perform on a given binary search tree, and tell their running time
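A minimal Python sketch of several of these operations follows, assuming nodes with parent pointers and sending duplicate keys to the right subtree. The names are Python spellings of the operation names above; DELETE and PREDECESSOR are omitted to keep the example short.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def tree_insert(root, key):
    """Walk down from the root to a NIL position and attach the new node: O(h)."""
    z = Node(key)
    y, x = None, root
    while x is not None:
        y = x
        x = x.left if key < x.key else x.right
    z.parent = y
    if y is None:
        return z                      # the tree was empty
    if key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def tree_search(x, key):
    """Follow one root-to-leaf path: O(h)."""
    while x is not None and key != x.key:
        x = x.left if key < x.key else x.right
    return x


def tree_minimum(x):
    """Leftmost node of the subtree rooted at x: O(h)."""
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    """Next node in sorted order, or None if x holds the maximum: O(h)."""
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y
```

Starting at `tree_minimum(root)` and calling `tree_successor` repeatedly visits the keys in sorted order.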

Red-Black-Trees Properties

• Binary search trees with additional properties:

1. Every node is either red or black

2. The root is black

3. Every leaf (NIL) is black

4. If a node is red, then both its children are black

5. For each node, all paths from the node to descendant leaves contain the same number of black nodes

Properties of Red-Black-Trees

• Any node with height h has black-height ≥ h/2

• The subtree rooted at any node x contains at least 2^bh(x) - 1 internal nodes

• No path is more than twice as long as any other path ⇒ the tree is balanced

– Longest path: h ≤ 2·bh(root)

– Shortest path: bh(root)

Upper bound on the height of Red-Black-Trees

Lemma: A red-black tree with n internal nodes has height at most 2lg(n + 1).

Proof: Let height(root) = h and bh(root) = b.

• The number n of internal nodes satisfies

n ≥ 2^b - 1 ≥ 2^(h/2) - 1, since b ≥ h/2

• Add 1 to both sides and then take logs:

n + 1 ≥ 2^b ≥ 2^(h/2)

lg(n + 1) ≥ h/2 ⇒ h ≤ 2 lg(n + 1)
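The quantities in the lemma can be checked on a concrete tree. The sketch below is only an illustration: the `RBNode` class, the use of `None` for the black NIL leaves, and the hand-built example tree are all assumptions. `black_height` raises `ValueError` when property 4 or property 5 is violated.

```python
from math import log2

RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def is_black(x):
    """NIL leaves (None) count as black."""
    return x is None or x.color == BLACK


def black_height(x):
    """Number of black nodes below x on any path to a NIL leaf."""
    if x is None:
        return 0
    if x.color == RED and not (is_black(x.left) and is_black(x.right)):
        raise ValueError("property 4 violated: red node with a red child")
    lbh = black_height(x.left) + (1 if is_black(x.left) else 0)
    rbh = black_height(x.right) + (1 if is_black(x.right) else 0)
    if lbh != rbh:
        raise ValueError("property 5 violated: unequal black counts")
    return lbh


def height(x):
    """Number of edges on the longest path from x down to a NIL leaf."""
    return 0 if x is None else 1 + max(height(x.left), height(x.right))


def internal_nodes(x):
    return 0 if x is None else 1 + internal_nodes(x.left) + internal_nodes(x.right)


def example_tree():
    """Hand-built red-black tree on the keys 3, 7, 8, 10, 11, 18, 22, 26."""
    n10 = RBNode(10, BLACK, RBNode(8, RED), RBNode(11, RED))
    n22 = RBNode(22, BLACK, None, RBNode(26, RED))
    return RBNode(7, BLACK, RBNode(3, BLACK), RBNode(18, RED, n10, n22))
```

For the example tree, the three values to compare are `black_height(root)`, `height(root)`, and `2 * log2(internal_nodes(root) + 1)`.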

Operations on Red-Black Trees

– SEARCH O(h)

– PREDECESSOR O(h)

– SUCCESSOR O(h)

– MINIMUM O(h)

– MAXIMUM O(h)

– INSERT O(h)

– DELETE O(h)

• Red-black trees guarantee that the height of the tree will be O(lgn)

• You should be able to show how these algorithms perform on a given red-black tree (except for delete), and tell their running time

Adj. List - Adj. Matrix Comparison

Graph representation: adjacency list, adjacency matrix

Comparison                          Better
Faster to test if (x, y) exists?    matrices
Faster to find vertex degree?       lists
Less memory on sparse graphs?       lists, Θ(m + n) vs. Θ(n^2)
Faster to traverse the graph?       lists, Θ(m + n) vs. Θ(n^2)

Adjacency list representation is better for most applications
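The trade-offs in the comparison can be seen by building both representations for the same undirected graph. This is a sketch under the assumption that vertices are numbered 0 to n - 1; the function names and the sample edge list are illustrative.

```python
def build_adj_list(n, edges):
    """Theta(n + m) space: one list of neighbors per vertex."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def build_adj_matrix(n, edges):
    """Theta(n^2) space: entry [u][v] is 1 exactly when (u, v) is an edge."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = mat[v][u] = 1
    return mat


EDGES = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
adj = build_adj_list(5, EDGES)
mat = build_adj_matrix(5, EDGES)

has_edge = mat[0][4] == 1            # O(1) edge test with the matrix
degree = len(adj[1])                 # O(1) degree lookup with the lists
list_cells = sum(len(nbrs) for nbrs in adj)   # 2m entries
matrix_cells = len(mat) * len(mat)            # n^2 entries
```

On this small graph the lists hold 14 entries against 25 matrix cells; the gap widens quickly on sparse graphs as n grows.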

Minimum Spanning Trees

Given:

• A connected, undirected, weighted graph G = (V, E)

A minimum spanning tree T ⊆ E:

1. T connects all vertices

2. w(T) = Σ_{(u,v)∈T} w(u, v) is minimized

[Figure: example connected, undirected, weighted graph on nine vertices]

Correctness of MST Algorithms (Prim’s and Kruskal’s)

• Let A be a subset of some MST (i.e., T), (S, V - S) be a cut that respects A, and (u, v) be a light edge crossing (S, V - S). Then (u, v) is safe for A.

Proof:

• Let T be an MST that includes A

– edges in A are shaded

• Case 1: If T includes (u, v), then it would be safe for A

• Case 2: Suppose T does not include the edge (u, v)

• Idea: construct another MST T’ that includes A ∪ {(u, v)}

[Figure: cut (S, V - S) with the light edge (u, v) crossing it]

PRIM(V, E, w, r)
Q ← ∅                                          ► initialization of Q: O(V) if Q is implemented as a min-heap
for each u ∈ V
    do key[u] ← ∞
       π[u] ← NIL
       INSERT(Q, u)
DECREASE-KEY(Q, r, 0)                          ► key[r] ← 0; O(lgV)
while Q ≠ ∅                                    ► Executed |V| times
    do u ← EXTRACT-MIN(Q)                      ► Takes O(lgV); min-heap operations: O(VlgV)
       for each v ∈ Adj[u]                     ► Executed O(E) times
           do if v ∈ Q and w(u, v) < key[v]    ► Constant
                 then π[v] ← u
                      DECREASE-KEY(Q, v, w(u, v))    ► Takes O(lgV); O(ElgV) in total

Total time: O(VlgV + ElgV) = O(ElgV)
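A runnable sketch of Prim's algorithm in Python follows. Python's `heapq` has no DECREASE-KEY, so this version substitutes a common workaround: it pushes a new (key, vertex) pair whenever a key improves and skips stale pairs when they are popped. The asymptotic bound stays O(ElgV). The graph representation, helper names, and sample edge list are assumptions.

```python
import heapq


def undirected(edges):
    """Build adjacency lists {u: [(v, w), ...]} from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def prim(adj, r):
    """Return (MST weight, parent map) for a connected graph, grown from r."""
    key = {u: float("inf") for u in adj}
    parent = {u: None for u in adj}
    key[r] = 0
    in_tree = set()
    heap = [(0, r)]
    while heap:
        _, u = heapq.heappop(heap)
        if u in in_tree:
            continue                  # stale entry: u was already extracted
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w            # plays the role of DECREASE-KEY
                parent[v] = u
                heapq.heappush(heap, (w, v))
    return sum(key.values()), parent


EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]
```

Calling `prim(undirected(EDGES), "a")` returns the total weight together with the π values, from which the tree edges (π[v], v) can be read off.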

KRUSKAL(V, E, w)
A ← ∅
for each vertex v ∈ V                          ► O(V)
    do MAKE-SET(v)
sort E into non-decreasing order by w          ► O(ElgE)
for each (u, v) taken from the sorted list     ► O(E) iterations
    do if FIND-SET(u) ≠ FIND-SET(v)            ► O(lgV)
          then A ← A ∪ {(u, v)}
               UNION(u, v)
return A

Running time: O(V + ElgE + ElgV) = O(ElgE), dependent on the implementation of the disjoint-set data structure
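Kruskal's algorithm can be sketched in Python with a small disjoint-set structure. The union-by-rank and path-halving heuristics used here are one possible implementation choice, not necessarily the one assumed in the analysis above; edges are (u, v, w) triples and all names are illustrative.

```python
def kruskal(vertices, edges):
    """Return the list of MST edges (u, v, w) of a connected graph."""
    parent = {v: v for v in vertices}     # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find(v):
        """FIND-SET with path halving."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(ru, rv):
        """UNION of two distinct roots, by rank."""
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # non-decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # (u, v) joins two different trees
            mst.append((u, v, w))
            union(ru, rv)
    return mst


EDGES = [
    ("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
    ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
    ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
    ("g", "i", 6), ("h", "i", 7),
]
VERTICES = "abcdefghi"
```

The sort dominates the running time; the disjoint-set operations only decide whether an edge would close a cycle.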

Shortest Paths Problem

• Variants of shortest paths problem

• Effect of negative weights/cycles

• Notation

– d[v]: shortest-path estimate

– δ(s, v): shortest-path weight

• Properties

– Optimal substructure theorem

– Triangle inequality

– Upper-bound property

– Convergence property

– Path relaxation property

Relaxation

• Relaxing an edge (u, v) = testing whether we can improve the shortest path to v found so far by going through u

If d[v] > d[u] + w(u, v), we can improve the shortest path to v ⇒ update d[v] and π[v]

[Figure: two RELAX(u, v, w) examples with d[u] = 5 and w(u, v) = 2: d[v] = 9 is lowered to 7; d[v] = 6 is left unchanged]

After relaxation: d[v] ≤ d[u] + w(u, v)
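Relaxation is a single comparison and update. In the sketch below, `d` and `pi` are assumed to be dictionaries holding the estimates d[v] and predecessors π[v], and the edge weight is passed in as a number; the boolean return value is an addition for convenience.

```python
def relax(u, v, w_uv, d, pi):
    """Improve d[v] through u if possible; return True when d[v] changed."""
    if d[v] > d[u] + w_uv:
        d[v] = d[u] + w_uv
        pi[v] = u
        return True
    return False
```

Either way, d[v] ≤ d[u] + w(u, v) holds immediately after the call.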

Single Source Shortest Paths

• Bellman-Ford Algorithm

– Allows negative edge weights

– Returns TRUE if no negative-weight cycles are reachable from the source s and FALSE otherwise

– Traverse all the edges |V| - 1 times, every time performing a relaxation step of each edge

• Dijkstra’s Algorithm

– No negative-weight edges

– Repeatedly select a vertex with the minimum shortest-path estimate d[v]; uses a min-priority queue, in which keys are d[v]

BELLMAN-FORD(V, E, w, s)
INITIALIZE-SINGLE-SOURCE(V, s)              ► Θ(V)
for i ← 1 to |V| - 1                        ► O(V) iterations, O(VE) in total
    do for each edge (u, v) ∈ E             ► O(E)
           do RELAX(u, v, w)
for each edge (u, v) ∈ E                    ► O(E)
    do if d[v] > d[u] + w(u, v)
          then return FALSE
return TRUE

Running time: O(V + VE + E) = O(VE)
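A direct Python transcription of Bellman-Ford follows, assuming edges are given as (u, v, w) triples and returning the estimates and predecessors along with the TRUE/FALSE result. The sample graph is an illustrative assumption.

```python
def bellman_ford(vertices, edges, s):
    """Return (ok, d, pi); ok is False if a negative cycle is reachable from s."""
    d = {v: float("inf") for v in vertices}    # INITIALIZE-SINGLE-SOURCE
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):         # |V| - 1 passes over all edges
        for u, v, w in edges:
            if d[v] > d[u] + w:                # RELAX(u, v, w)
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                      # any edge still relaxable?
        if d[v] > d[u] + w:
            return False, d, pi
    return True, d, pi


VERTICES = ["s", "t", "x", "y", "z"]
EDGES = [
    ("s", "t", 6), ("s", "y", 7), ("t", "x", 5), ("t", "y", 8),
    ("t", "z", -4), ("x", "t", -2), ("y", "x", -3), ("y", "z", 9),
    ("z", "x", 7), ("z", "s", 2),
]
```

A shortest path has at most |V| - 1 edges, which is why |V| - 1 passes suffice when no negative-weight cycle is reachable.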

DIJKSTRA(G, w, s)
INITIALIZE-SINGLE-SOURCE(V, s)              ► Θ(V)
S ← ∅
Q ← V[G]                                    ► O(V) build min-heap
while Q ≠ ∅                                 ► Executed O(V) times
    do u ← EXTRACT-MIN(Q)                   ► O(lgV); O(VlgV) in total
       S ← S ∪ {u}
       for each vertex v ∈ Adj[u]           ► O(E) times (total)
           do RELAX(u, v, w)
              Update Q (DECREASE-KEY)       ► O(lgV); O(ElgV) in total

Running time: O(VlgV + ElgV) = O(ElgV)
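Dijkstra's algorithm can be sketched in Python with `heapq`. Because `heapq` offers no DECREASE-KEY, this version pushes a new (distance, vertex) pair after each successful relaxation and ignores stale pairs on extraction, which keeps the O(ElgV) bound. The adjacency format and sample graph are assumptions, and every vertex must appear as a key of `adj`.

```python
import heapq


def dijkstra(adj, s):
    """adj maps u to a list of (v, w) with w >= 0; returns (d, pi)."""
    d = {v: float("inf") for v in adj}         # INITIALIZE-SINGLE-SOURCE
    pi = {v: None for v in adj}
    d[s] = 0
    done = set()                               # the set S of finished vertices
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)            # EXTRACT-MIN
        if u in done:
            continue                           # stale entry, already finished
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:                  # RELAX(u, v, w)
                d[v] = du + w
                pi[v] = u
                heapq.heappush(heap, (d[v], v))   # stands in for DECREASE-KEY
    return d, pi


ADJ = {
    "s": [("t", 10), ("y", 5)],
    "t": [("x", 1), ("y", 2)],
    "x": [("z", 4)],
    "y": [("t", 3), ("x", 9), ("z", 2)],
    "z": [("s", 7), ("x", 6)],
}
```

When a vertex is first popped, its estimate is already final, which is the invariant the correctness proof relies on.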

Correctness

• Bellman-Ford’s Algorithm: Show that d[v] = δ(s, v), for every v, after |V| - 1 passes.

• Dijkstra’s Algorithm: For each vertex u ∈ V, we have d[u] = δ(s, u) at the time when u is added to S.

NP-completeness

• Algorithmic vs Problem Complexity

• Class of “P” problems

• Tractable/Intractable/Unsolvable problems

• NP algorithms and NP problems

• P = NP?

• Reductions and their implication

• NP-completeness and examples of problems

• How do we prove a problem NP-complete?

• Satisfiability problem and its variations
