15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 1 March 2005 More LZW / Midterm Review


15-211 Fundamental Data Structures and Algorithms

Margaret Reid-Miller

1 March 2005

More LZW / Midterm Review


Midterm

Thursday, 12:00 noon, 3 March 2005

WeH 7500

Worth a total of 125 points

Closed book, but you may have one page of notes.

If you have a question, raise your hand and stay in your seat


Last Time…


Last Time: Lempel & Ziv


Reminder: Compressing

We scan a sequence of symbols

A = a1 a2 a3 … ak

where each prefix is in the dictionary.

We stop when we fall out of the dictionary:

A b


Reminder: Compressing

Then send the code for A = a1 a2 a3 … ak.

This is the classical algorithm.
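The scanning rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `lzw_compress`, the three-letter starting alphabet, and the choice of assigning codes in insertion order are not from the slides.

```python
def lzw_compress(text, alphabet="abc"):
    """Return the list of LZW codes for text (sketch; alphabet is assumed)."""
    table = {ch: code for code, ch in enumerate(alphabet)}  # word -> code
    codes = []
    word = ""  # longest prefix A found in the dictionary so far
    for ch in text:
        if ch not in table:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        if word + ch in table:
            word += ch                      # still inside the dictionary
        else:
            codes.append(table[word])       # send the code for A
            table[word + ch] = len(table)   # enter A b under the next free code
            word = ch                       # restart the scan at b
    if word:
        codes.append(table[word])           # flush the last word
    return codes
```

Each new entry is one symbol longer than an existing entry, which is why the dictionary can be stored as a trie.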


LZW: Compress bad case

Input: …sW…sWsWsb…

W - word (possibly empty)

[Figure: input position (^), dictionary, and output stream.]


LZW: Compress bad case (time t)

Input: …sW…sWsWsb…

[Figure: input position (^), dictionary, and output so far.]


LZW: Uncompress bad case (time t)

[Figure: input position (^), dictionary, and output so far.]


LZW: Compress bad case (time t+1)

Input: …sW…sWsWsb…

[Figure: input position (^), dictionary, and output so far.]


LZW: Uncompress bad case (time t+1)

Output: …sW

[Figure: input position (^) and dictionary.]


LZW: Compress bad case (time t+2)

Input: …sW…sWsWsb…

[Figure: input position (^), dictionary, and output so far.]


LZW: Uncompress bad case (time t+2)

Output: …sW

What is ??

[Figure: input position (^) at the unknown code ??, and the dictionary.]


LZW: Uncompress bad case (time t+2)

Output: …sW sWs

What is ??

It codes for sWs!

[Figure: input position (^) and dictionary.]


Example

Codes: 0 0 1 5 3 6 7 9 5

Text: aabbbaabbaaabaababb

Input   Output   add to D
0       a
0 +     a        3:aa
1 +     b        4:ab
5 -     bb       5:bb
3 +     aa       6:bba
6 +     bba      7:aab
7 +     aab      8:bbaa
9 -     aaba     9:aaba
5 +     bb       10:aabab

(+ : code already in D; - : code not yet in D, the bad case)

For code 9 (aaba = sWs): s = a, W = ab
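The decoding steps in the example can be reproduced with a short sketch. The function name `lzw_decompress` and the starting alphabet a, b, c (codes 0 to 2, so that the first new entry gets code 3 as in the table) are assumptions made for illustration.

```python
def lzw_decompress(codes, alphabet="abc"):
    """Rebuild the text from LZW codes (sketch; alphabet is assumed)."""
    if not codes:
        return ""
    table = list(alphabet)        # code -> word
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            word = table[code]            # ordinary case: code already known
        elif code == len(table):
            word = prev + prev[0]         # bad case: the word has the form sWs
        else:
            raise ValueError(f"invalid code {code}")
        table.append(prev + word[0])      # entry the compressor added one step earlier
        out.append(word)
        prev = word
    return "".join(out)
```

The `elif` branch is the only place where the decoder has to guess, and it fires exactly for the codes marked with a minus sign in the example (5 and 9).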


LZW Correctness

So we know that when this case occurs, decompression works.

Is this the only bad case? How do we know that decompression always works? (Note that compression is not an issue here).

Formally have two maps

comp : texts → int seq.

decomp : int seq. → texts

We need for all texts T:

decomp(comp(T)) = T


Getting Personal

Think about

Ann: compresses T, sends int sequence

Bob: decompresses int sequence, tries to reconstruct T

Question: Can Bob always succeed?

Assuming of course the int sequence is valid

(the map decomp() is not total).


How?

How do we prove that Bob can always succeed?

Think of Ann and Bob working in parallel.

Time 0: both initialize their dictionaries.

Time t: Ann determines next code number c, sends it to Bob.

Bob must be able to convert c back into the corresponding word.


Induction

We can use induction on t.

The problem is:

What property should we establish by induction?

It has to be a claim about Bob’s dictionary.

How do the two dictionaries compare over time?


The Claim

At time t = 0 both Ann and Bob have the same dictionary.

But at any time t > 0 we have

Claim: Bob’s dictionary misses exactly the last entry in Ann’s dictionary after processing the last code Ann sends.

(Ann can add Wx to the dictionary, but Bob won’t know x until the next message he receives.)
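The claim can be checked experimentally by running Ann and Bob side by side. The sketch below is one way to do that; the function name `check_dictionary_lag` and the alphabet a, b, c are assumptions. It also makes one exception that the slide does not spell out: when the input ends, Ann sends a final code without adding an entry, so after that last code the two dictionaries coincide.

```python
def check_dictionary_lag(text, alphabet="abc"):
    """Return True if Bob always lags Ann by exactly one entry (sketch)."""
    # Ann's pass: record each code sent and how many entries she holds afterwards.
    ann = list(alphabet)                         # code -> word, in insertion order
    index = {w: c for c, w in enumerate(ann)}    # word -> code
    sent, word = [], ""
    for ch in text:
        if word + ch in index:
            word += ch
        else:
            index[word + ch] = len(ann)
            ann.append(word + ch)
            sent.append((index[word], len(ann)))
            word = ch
    if word:
        sent.append((index[word], len(ann)))     # final code: no new entry

    # Bob's pass: decode each code and compare dictionaries after every step.
    bob, prev, out = list(alphabet), None, []
    for t, (code, ann_size) in enumerate(sent):
        if prev is None:
            cur = bob[code]
        else:
            cur = bob[code] if code < len(bob) else prev + prev[0]
            bob.append(prev + cur[0])            # Bob learns x one message late
        out.append(cur)
        prev = cur
        is_last = t == len(sent) - 1
        expected = ann[:ann_size] if is_last else ann[:ann_size - 1]
        if bob != expected:
            return False
    return "".join(out) == text
```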


The Easy Case

Suppose at time t Ann enters A b with code number C and sends c = code(A).

Easy case: c < C-1

By the inductive hypothesis, Bob has codes up to and including C-2 in his dictionary. That is, c is already in Bob’s dictionary. So Bob can decode and now knows A.

But then Bob can update his dictionary: all he needs is the first letter of A.


The Easy Case

Suppose at time t Ann enters A b with code number C and sends c = code(A).

Easy case: c < C-1

[Figure: the text … A b …, annotated with the codes entered (C-1, C) and the code sent (c).]


The Hard Case

Now suppose c = C-1.

Recall, at time t Ann had entered A b with code number C and sent c = code(A).

[Figure: the text … A b …, annotated with the codes entered (C-1, C) and the code sent (c).]


The Hard Case

Now suppose c = C-1.

Recall, at time t Ann had entered A b with code number C and sent c = code(A).

[Figure: the text … A’ s’ … b …, annotated with the codes entered (c, C) and the code sent (c).]

A = A’ s’

a1 = s’


The Hard Case

Now suppose c = C-1.

Recall, at time t Ann had entered A b with code number C and sent c = code(A).

[Figure: the text … s’ W s’ … b …, annotated with the codes entered (c, C) and the code sent (c).]

A’ = s’ W


The Hard Case

Now suppose c = C-1.

Recall, at time t Ann had entered A b with code number C and sent c = code(A).

[Figure: the text … s’ W s’ W s’ b …, annotated with the codes entered (c, C) and the code sent (c).]


The Hard Case

Now suppose c = C-1.

Recall, at time t Ann had entered A b with code number C and sent c = code(A).

So we have

Time t-1: entered c = code(A), sent code(A’), where A = A’ s’

Time t: entered C = code(A b), sent c = code(A), where a1 = s’

But then A’ = s’ W.


The Hard Case

In other words, the text must have looked like this:

… s’ W s’ W s’ b …

But Bob already knows A’ and thus can reconstruct A.

QED


Midterm Review


Basic Data Structures

List
- Persistence

Tree
- Height of tree, Depth of node, Level
- Perfect, Complete, Full
- Min & Max number of nodes


Recurrence Relations

- E.g., T(n) = T(n-1) + n/2
- Solve by repeated substitution
- Solve resulting series
- Prove by guessing and substitution
- Master Theorem: T(N) = aT(N/b) + f(N)


Solving recurrence equations

Repeated substitution:

t(n) = n + t(n-1)
     = n + (n-1) + t(n-2)
     = n + (n-1) + (n-2) + t(n-3)
     and so on…
     = n + (n-1) + (n-2) + (n-3) + … + 1


Incrementing series

This is an arithmetic series that comes up over and over again, because it characterizes many nested loops:

for (i=1; i<n; i++) {
    for (j=1; j<i; j++) {
        f();
    }
}
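To see the arithmetic series concretely, the sketch below counts how often the loop body runs and compares it with a closed form. The helper name `count_calls` is made up for illustration; with these exact loop bounds the inner body runs 0 + 1 + … + (n-2) times, which is (n-1)(n-2)/2.

```python
def count_calls(n):
    """Count executions of the inner body of the nested loop above."""
    calls = 0
    for i in range(1, n):        # i = 1 .. n-1
        for j in range(1, i):    # j = 1 .. i-1
            calls += 1           # stands in for f()
    return calls

def closed_form(n):
    """Sum 0 + 1 + ... + (n-2), valid for n >= 1."""
    return (n - 1) * (n - 2) // 2
```

Either way the count grows quadratically in n, which is the point of the slide.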


“Big-Oh” notation

[Figure: running time plotted against N, showing the curves T(N) and c f(N) and the threshold n0.]

T(N) = O(f(N)): “T(N) is order f(N)”


Upper And Lower Bounds

f(n) = O( g(n) ) Big-Oh: f(n) ≤ c g(n) for some constant c > 0 and all n > n0

f(n) = Ω( g(n) ) Big-Omega: f(n) ≥ c g(n) for some constant c > 0 and all n > n0

f(n) = Θ( g(n) ) Theta: f(n) = O( g(n) ) and f(n) = Ω( g(n) )


Upper And Lower Bounds

f(n) = O( g(n) ) Big-Oh: Can only be used for upper bounds.

f(n) = Ω( g(n) ) Big-Omega: Can only be used for lower bounds.

f(n) = Θ( g(n) ) Theta: Pins down the running time exactly (up to a multiplicative constant).


Big-O characteristic

Low-order terms “don’t matter”: Suppose T(N) = 20N^3 + 10N log N + 5. Then T(N) = O(N^3).

Question:What constants c and n0 can be used to show

that the above is true?

Answer: c=35, n0=1
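The answer can be sanity-checked numerically. The sketch below assumes a base-2 logarithm (the slide does not say which base is meant) and only tests a finite range, so it illustrates the bound rather than proving it; the names `T` and `bound_holds` are hypothetical.

```python
import math

def T(n):
    """T(N) = 20N^3 + 10N log N + 5, with log taken base 2 (an assumption)."""
    return 20 * n**3 + 10 * n * math.log2(n) + 5

def bound_holds(c, n0, upto=1000):
    """Check T(N) <= c * N^3 for every integer N in [n0, upto]."""
    return all(T(n) <= c * n**3 for n in range(n0, upto + 1))
```

The reasoning behind c = 35: for N ≥ 1 we have N log N ≤ N^3 and 5 ≤ 5N^3, so T(N) ≤ (20 + 10 + 5) N^3.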


Big-O characteristic

The bigger task always dominates eventually. If T1(N) = O(f(N)) and T2(N) = O(g(N)), then T1(N) + T2(N) = O( max( f(N), g(N) ) ).

Also: T1(N) · T2(N) = O( f(N) · g(N) ).


Dictionary

Operations: Insert, Delete, Find

Implementations: Binary Search Tree, AVL Tree, Splay, Trie, Hash


Binary search trees

Simple binary search trees can have bad behavior for some insertion sequences. Average case O(log N), worst case O(N).

AVL trees maintain a balance invariant to prevent this bad behavior. Accomplished via rotations during insert.

Splay trees achieve amortized running time of O(log N). Accomplished via rotations during find.


AVL trees

- Definition
- Min number of nodes of height H: F(H+3) - 1, where F(n) is the nth Fibonacci number
- Insert: single & double rotations. How many?
- Delete: lazy. How bad?
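The Fibonacci formula can be checked against the recurrence for the sparsest AVL tree: a minimal tree of height H has a root plus minimal subtrees of heights H-1 and H-2. The sketch assumes that a single node has height 0 and that F(1) = F(2) = 1; both conventions are assumptions, as are the function names.

```python
def min_avl_nodes(h):
    """Fewest nodes in an AVL tree of height h (single node has height 0)."""
    if h < 0:
        return 0                      # empty tree
    smaller, current = 0, 1           # S(-1) = 0, S(0) = 1
    for _ in range(h):
        smaller, current = current, smaller + current + 1  # S(h) = S(h-1) + S(h-2) + 1
    return current

def fib(n):
    """nth Fibonacci number with F(0) = 0, F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Because the minimum node count grows exponentially in H, the height of an AVL tree with N nodes is O(log N).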


Single rotation

For the case of insertion into left subtree of left child:

[Figure: tree with subtrees X, Y, and Z, before and after the single rotation.]

Deepest node of X has depth 2 greater than deepest node of Z.

Depth reduced by 1 (X)

Depth increased by 1 (Z)


Double rotation

For the case of insertion into the right subtree of the left child.

[Figure: tree with subtrees X, Y1, Y2, and Z, before and after the double rotation.]
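The single and double rotation cases can be sketched on a bare binary tree node. The class and function names below are illustrative, and the sketch leaves out the height bookkeeping a real AVL tree needs; it only shows how the pointers move.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(root):
    """Single rotation for insertion into the left subtree of the left child."""
    child = root.left
    root.left = child.right     # subtree Y changes parent
    child.right = root          # old root (with Z) moves down one level
    return child                # left child (with X) moves up one level

def rotate_left(root):
    """Mirror image of rotate_right."""
    child = root.right
    root.right = child.left
    child.left = root
    return child

def double_rotate_left_right(root):
    """Double rotation for insertion into the right subtree of the left child."""
    root.left = rotate_left(root.left)
    return rotate_right(root)

def inorder(node):
    """Keys in sorted order; rotations must not change this."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Writing the double rotation as two single rotations keeps the code short; an implementation could equally well relink the three nodes directly.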


Splay trees

Splay trees provide a guarantee that any sequence of M operations (starting from an empty tree) will require O(M log N) time.

Hence, each operation has amortized cost of O(log N).

It is possible that a single operation requires O(N) time.

But there are no bad sequences of operations on a splay tree.


Splaying, case 3

Case 3: Zig-zag (left). Perform an AVL double rotation.

[Figure: nodes a and b with subtrees X, Y1, Y2, and Z, before and after the rotation.]


Splaying, case 4

Case 4: Zig-zig (left). Special rotation.

[Figure: nodes a and b with subtrees W, X, Y, and Z, before and after the rotation.]


Tries

Good for unequal length keys or sequences

Find: O(m), where m is the sequence length

But: Few to many children

[Figure: trie with digit-labeled nodes storing the words I, like, love, you, and lovely.]


Hash Tables

Hash function h: h(key) = index

Desirable properties:
- Approximate random distribution
- Easy to calculate
- E.g., Division: h(k) = k mod m

Perfect hashing
- When all input keys are known in advance


Collisions

Separate chaining
- Linked list: ordered vs unordered

Open addressing
- Linear probing - clustering very bad with high load factor
- *Quadratic probing - secondary clustering, table size must be prime
- Double hashing - table size must be prime, too complex


Hash Tables

- Delete?
- Rehash when load factor high - double (amortized cost constant)
- Find & insert are near constant time!
- But: no min, max, next, … operation
- Trade space for time - load factors <75%


Priority Queues


Priority Queues

Operations: Insert, FindMin, DeleteMin

Implementations: Linked list, Search tree, Heap


Possible priority queue implementations

Linked list
    deleteMin   O(1)   or   O(N)
    insert      O(N)   or   O(1)

Search trees
    All operations O(log N)

Heaps
    deleteMin   O(log N)
    insert      O(log N)
    buildheap   O(N)   N inserts


Heaps

Properties:
1. Complete binary tree in an array
2. Heap order property

- Insert: push up
- DeleteMin: push down
- Heapify: starting at bottom, push down
- Heapsort: BuildHeap + DeleteMin
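The four operations listed above can be sketched on a plain Python list. This is a minimal illustration, assuming a min-heap stored from index 0 (children of i at 2i+1 and 2i+2); the function names are not from the slides.

```python
def push_up(heap, i):
    """Bubble heap[i] up until its parent is no larger."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent

def push_down(heap, i):
    """Bubble heap[i] down, always swapping with the smaller child."""
    size = len(heap)
    while True:
        child = 2 * i + 1
        if child >= size:
            break
        if child + 1 < size and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child

def insert(heap, key):
    heap.append(key)                 # new leaf keeps the tree complete
    push_up(heap, len(heap) - 1)

def delete_min(heap):
    smallest = heap[0]
    last = heap.pop()                # remove the last leaf
    if heap:
        heap[0] = last               # transplant it to the root
        push_down(heap, 0)
    return smallest

def heapify(items):
    """Build a heap in O(N) by pushing down from the last internal node."""
    heap = list(items)
    for i in range(len(heap) // 2 - 1, -1, -1):
        push_down(heap, i)
    return heap

def heapsort(items):
    heap = heapify(items)
    return [delete_min(heap) for _ in range(len(heap))]
```

This heapsort returns a new list for clarity; an in-place version would use a max-heap and move each deleted maximum to the end of the array.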


Insert - Push up

Insert leaf to establish complete tree property.

Bubble inserted leaf up the tree until the heap order property is satisfied.

[Figure: example heap on the keys 13, 14, 16, 19, 21, 24, 26, 31, 32, 65, 68, shown at two stages of the push-up.]


DeleteMin - Push down

Move last leaf to root to restore complete tree property.

Bubble the transplanted leaf value down the tree until the heap order property is satisfied.

[Figure: example heap on the keys 14, 16, 19, 21, 24, 26, 31, 32, 65, 68, shown in two numbered steps (1, 2) of the push-down.]


Heapify - Push down

Start at bottom subtrees.

Bubble subtree root down until the heap order property is satisfied.

[Figure: example tree on the keys 14, 16, 19, 21, 23, 24, 26, 31, 32, 65, 68 being heapified.]


Sorting


Simple sorting algorithms

Several simple, quadratic algorithms (worst case and average).

- Bubble Sort
- Insertion Sort
- Selection Sort

Only Insertion Sort is of practical interest: its running time is linear in the number of inversions of the input sequence.

Constants small. Stable?
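The link between insertion sort and inversions can be made concrete: every element shift removes exactly one inversion. The sketch below counts shifts and compares them with a brute-force inversion count; both function names are illustrative.

```python
def insertion_sort(items):
    """Return (sorted copy, number of element shifts performed)."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # strict '>' leaves equal keys in order
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts

def count_inversions(items):
    """Brute-force count of pairs i < j with items[i] > items[j]."""
    n = len(items)
    return sum(1 for i in range(n) for j in range(i + 1, n) if items[i] > items[j])
```

On an already sorted input there are no shifts, so the total work is linear; on a reversed input every pair is an inversion and the work is quadratic. Because equal keys are never moved past each other, this version is stable.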


Sorting Review

Asymptotically optimal O(n log n) algorithms (worst case and average).

- Merge Sort
- Heap Sort

Merge Sort purely sequential and stable.

But requires extra memory: 2n + O(log n).


Quick Sort

Overall fastest. In place.

BUT:

Worst case quadratic. Average case O(n log n).

Not stable.

Implementation details tricky.


Radix Sort

Used by old computer-card-sorting machines.

Linear time:
• b passes on b-bit elements
• b/m passes m bits per pass

Each pass must be stable

BUT:

Uses 2n + 2^m space.

May only beat Quick Sort for very large arrays.
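A least-significant-digit radix sort with m bits per pass might look like the sketch below. It assumes non-negative integers that fit in `bits` bits; the function name and the default parameters are illustrative choices, not taken from the slides.

```python
def radix_sort(values, bits=32, m=8):
    """Sort non-negative ints below 2**bits using passes of m bits each."""
    mask = (1 << m) - 1
    a = list(values)
    for shift in range(0, bits, m):          # about bits/m passes
        counts = [0] * (1 << m)              # one counter per digit value
        for v in a:
            counts[(v >> shift) & mask] += 1
        total = 0
        for d in range(1 << m):              # turn counts into start positions
            counts[d], total = total, total + counts[d]
        out = [0] * len(a)                   # second array of n elements
        for v in a:                          # left-to-right scan keeps the pass stable
            d = (v >> shift) & mask
            out[counts[d]] = v
            counts[d] += 1
        a = out
    return a
```

Each pass uses a second array of n elements plus a table with one counter per digit value, which is where the extra space comes from.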


Data Compression


Data Compression

Huffman
- Optimal prefix-free codes
- Priority queue on “tree” frequency

LZW
- Dictionary of codes for previously seen patterns
- When a pattern is found, increase length by one
- Trie


Huffman

Full: every node
- Is a leaf, or
- Has exactly 2 children.

Build tree bottom up:
- Use priority queue of trees; weight - sum of frequencies.
- New tree of two lowest weight trees.

[Figure: Huffman tree with leaves a, b, c, d and edges labeled 0 and 1.]

a=1, b=001, c=000, d=01
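The bottom-up construction can be sketched with Python's `heapq` as the priority queue. The frequencies in the usage note are invented for illustration (the slide gives only the resulting codes), and the exact bit patterns depend on how ties and left/right children are chosen, so only code lengths are comparable.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Map each symbol (a string) to a prefix-free bit string (sketch)."""
    if not freq:
        return {}
    tick = count()   # tie-breaker so the heap never has to compare trees
    heap = [(w, next(tick), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # two lowest-weight trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), (t1, t2)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: exactly 2 children
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

For example, with the assumed frequencies a=5, b=1, c=1, d=2 the code lengths come out as 1, 3, 3, and 2, the same lengths as in a=1, b=001, c=000, d=01.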


Summary of LZW

LZW is an adaptive, dictionary based compression method.

Incrementally builds the dictionary (trie) as it encodes the data.

Building the dictionary while decoding is slightly more complicated, but requires no special data structures.