Greedy Algorithms: Huffman Coding
Credits: Thanks to Dr. Suzan Koknar-Tezel for the slides on Huffman Coding.
Huffman Codes
• Widely used technique for compressing data
• Achieves a savings of 20%–90%
• Assigns binary codes to characters
Fixed-length code?
• Consider a 6-character alphabet {a, b, c, d, e, f}
• Fixed-length: 3 bits per character
• Encoding a 100K-character file requires 300K bits
Variable-length code
• Suppose you know the frequencies of characters in advance
• Main idea: fewer bits for frequently occurring characters, more bits for less frequent characters
Variable-length codes

An example: consider a 100,000-character file with only 6 different characters:

|                 | a        | b        | c        | d        | e        | f        | Total bits |
|-----------------|----------|----------|----------|----------|----------|----------|------------|
| Frequency       | 45K      | 13K      | 12K      | 16K      | 9K       | 5K       |            |
| ASCII           | 01000001 | 01000010 | 01000011 | 01000100 | 01000101 | 01000110 | 800,000    |
| Unicode         | 16-bit   | 16-bit   | 16-bit   | 16-bit   | 16-bit   | 16-bit   | 1,600,000  |
| Fixed-Length    | 000      | 001      | 010      | 011      | 100      | 101      | 300,000    |
| Variable-Length | 0        | 101      | 100      | 111      | 1101     | 1100     | 224,000    |

Savings of the variable-length code compared to: ASCII – 72%, Unicode – 86%, Fixed-Length – 25%
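The totals in the last column can be verified with a short script (a sketch; the dictionaries simply transcribe the frequency row and the variable-length codewords from the table):

```python
freq = {'a': 45_000, 'b': 13_000, 'c': 12_000, 'd': 16_000, 'e': 9_000, 'f': 5_000}
var_code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

n = sum(freq.values())   # 100,000 characters in the file
ascii_bits = 8 * n       # 8 bits per character  -> 800,000
unicode_bits = 16 * n    # 16 bits per character -> 1,600,000
fixed_bits = 3 * n       # 3 bits per character  -> 300,000
# variable-length: each character costs the length of its codeword
var_bits = sum(freq[c] * len(var_code[c]) for c in freq)  # -> 224,000
```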
Another way to look at this:
• Relative probability of character 'a': 45K/100K = 0.45
• Expected encoded character length:
  0.45·1 + 0.13·3 + 0.12·3 + 0.16·3 + 0.09·4 + 0.05·4 = 2.24
• For a string of n characters, the expected encoded string length is 2.24·n
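The same arithmetic, restated with probabilities (a small sketch; the variable names are illustrative):

```python
prob = {'a': 0.45, 'b': 0.13, 'c': 0.12, 'd': 0.16, 'e': 0.09, 'f': 0.05}
code_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}  # codeword lengths

# expected number of bits per encoded character
expected_len = sum(prob[c] * code_len[c] for c in prob)  # -> 2.24

# expected encoded length of a string of n characters
n = 100_000
expected_total = expected_len * n  # -> 224,000 bits
```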
How to decode?
Example: a = 0, b = 01, c = 10
Decode 0010
• Does it translate to "aac" or "aba"?
• Ambiguous!
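The ambiguity can be demonstrated by enumerating every way the bit string splits into codewords (a sketch; `all_decodings` is a hypothetical helper, not part of the slides):

```python
def all_decodings(bits, codes, prefix=''):
    """Enumerate every way to parse `bits` as a sequence of codewords."""
    if not bits:
        return [prefix]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            results += all_decodings(bits[len(code):], codes, prefix + ch)
    return results

codes = {'a': '0', 'b': '01', 'c': '10'}
parses = all_decodings('0010', codes)  # two distinct parses: ambiguous
```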
How to decode?
Example: a = 0, b = 101, c = 100
Decode 00100
• Translates to "aac"
What is the difference between the previous two codes?
The second one is a prefix-code!
Prefix Codes
In a prefix code, no code is a prefix of another code.
Why would we want this? It simplifies decoding:
• Once a string of bits matches a character code, output that character with no ambiguity
• No need to look ahead
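The prefix property is easy to test mechanically. One sketch, relying on the fact that after lexicographic sorting, a codeword that is a prefix of any other codeword is a prefix of its immediate successor:

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another.

    After sorting, any codeword that prefixes a later one also
    prefixes its immediate successor, so adjacent pairs suffice.
    """
    ordered = sorted(codewords)
    return all(not ordered[i + 1].startswith(ordered[i])
               for i in range(len(ordered) - 1))

ambiguous = is_prefix_free(['0', '01', '10'])    # False: '0' prefixes '01'
prefix_ok = is_prefix_free(['0', '101', '100'])  # True
```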
Prefix Codes (cont)
We can use binary trees for decoding:
• If 0, follow the left path
• If 1, follow the right path
• Leaves are the characters
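A minimal sketch of this decoding walk, with the tree represented as nested dicts keyed by '0'/'1' (the helper names are illustrative):

```python
def build_tree(codes):
    """Build a decoding tree: follow '0'/'1' keys; leaves are characters."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = ch
    return root

def decode(bits, root):
    """Walk the tree bit by bit; on reaching a leaf, emit it and restart."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # leaf: a character
            out.append(node)
            node = root
    return ''.join(out)

tree = build_tree({'a': '0', 'b': '101', 'c': '100'})
decoded = decode('00100', tree)  # -> 'aac'
```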
Prefix Codes (cont)
[Tree figure: left edges are labeled 0, right edges 1. The root's left child is the leaf a (45). The right subtree contains the leaves c (12), b (13), d (16), f (5), and e (9), with f and e sharing an internal parent of weight 14. The resulting codes: a = 0, c = 100, b = 101, d = 111, f = 1100, e = 1101.]
Prefix Codes (cont)
Given a tree T corresponding to a prefix code, compute the number of bits needed to encode the file:
• C = set of unique characters in the file
• f(c) = frequency of character c in the file
• d_T(c) = depth of c's leaf node in T = length of the code for character c
Prefix Codes (cont)
Then the number of bits required to encode the file, i.e. the cost of the tree T, is

B(T) = Σ_{c ∈ C} f(c) · d_T(c)
Huffman Codes (cont)
Huffman's algorithm determines an optimal variable-length code (a Huffman code): it minimizes B(T).
Greedy Algorithm for Huffman Codes
• Repeatedly merge the two lowest-frequency nodes x and y (leaf or internal) into a new node z, until every leaf has been considered
• Set f(z) = f(x) + f(y)
• You can also view this as replacing x and y with a single character z in the alphabet; after the process is completed, if the code determined for z is, say, 11, then the code for x is 110 and the code for y is 111
• Use a priority queue Q to keep nodes ordered by frequency
Example of Creating a Huffman Code
C = {a, b, c, d, e}, with frequencies f(c) = (50, 25, 15, 40, 75)

i = 1: queue ordered by frequency: c 15, b 25, d 40, a 50, e 75
i = 2: merge c (15, branch 0) and b (25, branch 1) into a node of weight 40; queue: 40, d 40, a 50, e 75
Example of Creating a Huffman Code (cont)
i = 3: merge the 40 node (branch 0) and d (40, branch 1) into a node of weight 80; queue: a 50, e 75, 80
Example of Creating a Huffman Code (cont)
i = 4: merge a (50, branch 0) and e (75, branch 1) into a node of weight 125; queue: 80, 125
Example of Creating a Huffman Code (cont)
i = 5: merge the 80 node (branch 0) and the 125 node (branch 1) into the root, of weight 205. The tree is complete; reading off one consistent 0/1 edge labeling gives c = 000, b = 001, d = 01, a = 10, e = 11.
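The merge sequence above can be traced with a heap. A handy consequence of the greedy process is that the total cost B(T) equals the sum of the merged (internal-node) weights, since each merge adds one bit to every character beneath it. A sketch:

```python
import heapq

def huffman_cost(frequencies):
    """Total encoded length B(T) = sum of all merge (internal-node) weights."""
    heap = list(frequencies)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        x = heapq.heappop(heap)  # two lowest-frequency nodes
        y = heapq.heappop(heap)
        cost += x + y            # the new internal node's weight
        heapq.heappush(heap, x + y)
    return cost

# the example above: merge weights 40, 80, 125, 205 sum to 450
example = huffman_cost([50, 25, 15, 40, 75])  # -> 450
```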
Huffman Algorithm

Huffman(C)
    n = |C|
    Q = C                      // Q is a binary min-heap; Θ(n) Build-Heap
    for i = 1 to n-1
        z = Allocate-Node()
        x = Extract-Min(Q)     // O(lg n), repeated n-1 times
        y = Extract-Min(Q)     // O(lg n), repeated n-1 times
        left(z) = x
        right(z) = y
        f(z) = f(x) + f(y)
        Insert(Q, z)           // O(lg n), repeated n-1 times
    return Extract-Min(Q)      // return the root of the tree

Total run time: Θ(n lg n)
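A runnable version of this pseudocode using Python's heapq as the priority queue (a sketch; the counter is an added tie-breaker so the heap never compares the unorderable tree nodes):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman code from {char: frequency}; returns {char: bitstring}."""
    tie = count()  # tie-breaker: avoids comparing tree nodes on equal frequency
    q = [(f, next(tie), ch) for ch, f in freqs.items()]
    heapq.heapify(q)                 # build-heap
    for _ in range(len(freqs) - 1):  # n-1 merges
        fx, _, x = heapq.heappop(q)  # two lowest-frequency nodes
        fy, _, y = heapq.heappop(q)
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))  # internal node
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                        # leaf: a character
            codes[node] = prefix or '0'  # single-character alphabet edge case
    walk(q[0][2], '')
    return codes

freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freqs)
cost = sum(f * len(codes[c]) for c, f in freqs.items())  # -> 224 (thousand bits)
```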
Correctness
Claim: Consider the two characters x and y with the lowest frequencies. Then there is an optimal tree in which x and y are siblings at the deepest level of the tree.
Proof
Let T be an arbitrary optimal prefix code tree, and let a and b be two siblings at the deepest level of T. We will show that we can convert T into another prefix code tree in which x and y are siblings at the deepest level, without increasing the cost:
• Switch a and x (giving T′)
• Switch b and y (giving T″)
[Figure: three trees. In T, leaves x and y sit higher in the tree while a and b are siblings at the deepest level. In T′, a and x have been switched. In T″, b and y have also been switched, so x and y are now siblings at the deepest level. B(T′) ≤ B(T) and B(T″) ≤ B(T′).]
Assume f(x) ≤ f(y) and f(a) ≤ f(b). We know that f(x) ≤ f(a) and f(y) ≤ f(b). Then:

B(T) − B(T′) = Σ_{c ∈ C} f(c) d_T(c) − Σ_{c ∈ C} f(c) d_{T′}(c)
            = f(x) d_T(x) + f(a) d_T(a) − f(x) d_{T′}(x) − f(a) d_{T′}(a)
            = f(x) d_T(x) + f(a) d_T(a) − f(x) d_T(a) − f(a) d_T(x)
            = (f(a) − f(x)) (d_T(a) − d_T(x))
            ≥ 0

The first factor is non-negative because x has (at least tied for) the lowest frequency; the second factor is non-negative because a is at the maximum depth.
Since B(T) − B(T′) ≥ 0, T′ is at least as good as T. But T is optimal, so T′ must be optimal too. Thus, moving x to the bottom (and, similarly, moving y to the bottom) yields an optimal solution.
The previous claim asserts that the greedy choice of Huffman's algorithm is the proper one to make.
Claim: Huffman's algorithm produces an optimal prefix code tree.

Proof (by induction on n = |C|)
Basis: n = 1
• The tree consists of a single leaf, which is optimal.
Inductive case:
• Assume that for strictly fewer than n characters, Huffman's algorithm produces an optimal tree.
• Show it does so for exactly n characters.
According to the previous claim, in the optimal tree the lowest-frequency characters x and y are siblings at the deepest level. Remove x and y, replacing them with z, where f(z) = f(x) + f(y). Thus, n−1 characters remain in the alphabet.
Let T′ be any tree representing a prefix code for this (n−1)-character alphabet. Then we can obtain a prefix code tree T for the original set of n characters by replacing the leaf node for z with an internal node having x and y as children. The cost of T is

B(T) = B(T′) − f(z)d(z) + f(x)(d(z)+1) + f(y)(d(z)+1)
     = B(T′) − (f(x)+f(y))d(z) + (f(x)+f(y))(d(z)+1)
     = B(T′) + f(x) + f(y)

To minimize B(T) we need to build T′ optimally, which (by the inductive hypothesis) Huffman's algorithm does.
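A quick numeric sanity check of the cost identity above, with arbitrary illustrative values for the frequencies and depth:

```python
# arbitrary illustrative values (any non-negative numbers work)
f_x, f_y, d_z = 9, 5, 3
B_Tprime = 100  # cost of the (n-1)-character tree, arbitrary

f_z = f_x + f_y
# B(T) = B(T') - f(z)d(z) + f(x)(d(z)+1) + f(y)(d(z)+1)
B_T = B_Tprime - f_z * d_z + f_x * (d_z + 1) + f_y * (d_z + 1)
# collapses to B(T') + f(x) + f(y), independent of d(z)
```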
[Figure: tree T′ with leaf z, and tree T obtained from T′ by replacing z with an internal node whose children are x and y.]