Huffman Codes Bahareh Sarrafzadeh 6111 Fall 2009



Overview

• What are Huffman codes?
• How can they be helpful?
• Fixed-length codes vs. variable-length codes
• Encoding vs. decoding
• Prefix codes
• How to construct the Huffman code?
• Greedy works!
• Problem definition
• Proof of correctness
• Huffman's code and entropy


Huffman Codes - Intro

• A very efficient technique for data compression
• Savings of 20% to 90% are typical, depending on the data
• Proposed by David Huffman in 1952
• A greedy algorithm which yields an optimal prefix code for characters based on their frequencies


Fixed-length vs. Variable-length

                           a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100
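
As a worked check of the table, assume the file holds 100,000 characters with the frequencies shown, and take the variable-length codeword lengths for a to f to be 1, 3, 3, 3, 4, 4 (f is taken as the 4-bit codeword 1100, which keeps the code prefix-free). Under those assumptions the two codes need:

```latex
\begin{align*}
\text{fixed-length:} \quad & 3 \cdot 100{,}000 = 300{,}000 \text{ bits} \\
\text{variable-length:} \quad & (45\cdot 1 + 13\cdot 3 + 12\cdot 3 + 16\cdot 3 + 9\cdot 4 + 5\cdot 4)\cdot 1000 = 224{,}000 \text{ bits}
\end{align*}
```

That is a saving of roughly 25% for this particular file.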


An Example

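[Figure: two code trees for the table above; edges are labeled 0 (left) and 1 (right), and every node shows the total frequency of the leaves below it.
Left, fixed-length code: root 100 with children 86 and 14; 86 splits into 58 and 28; the leaves a : 45, b : 13, c : 12, d : 16, e : 9, f : 5 are all at depth 3.
Right, optimal variable-length code: root 100 with children a : 45 and 55; 55 splits into 25 (c : 12, b : 13) and 30 (14, d : 16); 14 splits into f : 5 and e : 9.]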


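Encoding vs. Decoding

     Code 1 (prefix code)   Code 2 (not a prefix code)
E    0                      0
T    11                     10
N    100                    100
I    1010                   0111
S    1011                   1010

• Encoding with Code 1: TENNIS → 11 0 100 100 1010 1011 = 11010010010101011, which can be split back into codewords in only one way.
• Decoding with Code 2 is ambiguous: 100 can be read as N or as TE, so 100100 can be read as NN, but also as TEN.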
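
The contrast between the two code tables can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function names `encode` and `all_decodings` are illustrative, and the exhaustive search is only meant for short strings.

```python
# Code tables copied from the slide; function names are illustrative.
PREFIX_CODE = {"E": "0", "T": "11", "N": "100", "I": "1010", "S": "1011"}
NON_PREFIX_CODE = {"E": "0", "T": "10", "N": "100", "I": "0111", "S": "1010"}


def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)


def all_decodings(bits, code):
    """Return every way to split `bits` into codewords (exhaustive search)."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


tennis_bits = encode("TENNIS", PREFIX_CODE)
print(tennis_bits)                                    # 11010010010101011
print(all_decodings(tennis_bits, PREFIX_CODE))        # ['TENNIS']
print(sorted(all_decodings("100", NON_PREFIX_CODE)))  # ['N', 'TE']
```

With the prefix code there is exactly one parse; with the second table even the three-bit string 100 has two.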


Prefix Codes

• Identify the end of a codeword as soon as it arrives
• A symbol code is called a prefix code if no codeword is a prefix of any other codeword
• Prefix codes simplify decoding
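
The defining property can be tested directly. Below is a small Python sketch (the helper name `is_prefix_free` is an assumption made for illustration) that relies on the fact that, in sorted order, a word that prefixes any other word is immediately followed by a word it prefixes.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of (or equal to) another."""
    words = sorted(codewords)
    # After sorting, all words that start with w form a contiguous run
    # beginning at w, so checking adjacent pairs is sufficient.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


print(is_prefix_free(["0", "11", "100", "1010", "1011"]))   # True
print(is_prefix_free(["0", "10", "100", "0111", "1010"]))   # False
```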


A Convenient Data Structure

• The decoding process needs a convenient representation for the prefix code.

• A binary tree
  – Leaves: characters
  – Paths: codewords
• It is not a BST!

[Figure: a binary code tree with leaves A, B, C, D, E, F; each edge is labeled 0 or 1.]
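
To illustrate why a tree is convenient for decoding, here is a minimal Python sketch. The nested-tuple representation and the name `decode` are assumptions for illustration, as is the convention that 0 means "go left" and 1 means "go right". The tree used is the one for the a to f example, with f taken as 1100.

```python
# Hypothetical representation: a leaf is a one-character string,
# an internal node is a (left, right) tuple; left = bit 0, right = bit 1.
TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def decode(bits, tree):
    """Follow one edge per bit from the root; emit a character at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):      # reached a leaf: a codeword just ended
            out.append(node)
            node = tree                # restart at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


print(decode("0101100", TREE))   # abc
```

Because the code is prefix-free, the decoder never has to look ahead: reaching a leaf is the signal that a codeword is complete.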


Optimal Code

• An optimal code is always represented by a full binary tree: every internal node has exactly two children.

• If C is the alphabet and all character frequencies are positive, then the tree for an optimal prefix code has exactly |C| leaves and |C| - 1 internal nodes.


Cost of a tree

• Given a tree T, compute the number of bits required to encode a file:

\[ \mathrm{Cost}(T) = \sum_{c \in C} f(c)\, d_T(c) \]

where C is the alphabet, c is a character, f(c) is the frequency of character c, and d_T(c) is the depth of c's leaf in T.
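
The cost can be evaluated by a single traversal that tracks depth. This is a sketch under assumptions: the tree is a hypothetical nested-tuple structure (leaf = character string, internal node = `(left, right)`), and `tree_cost` is an illustrative name. The sample tree is the variable-length tree for the a to f example.

```python
def tree_cost(tree, freq, depth=0):
    """Sum f(c) * d_T(c) over all leaves c of a nested-tuple tree."""
    if isinstance(tree, str):               # leaf: its depth is the codeword length
        return freq[tree] * depth
    left, right = tree
    return tree_cost(left, freq, depth + 1) + tree_cost(right, freq, depth + 1)


FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}   # in thousands
VARIABLE_TREE = ("a", (("c", "b"), (("f", "e"), "d")))

print(tree_cost(VARIABLE_TREE, FREQ))   # 224 (thousand bits)
```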


Greedy Algorithm - Overview

1. Take the two least probable symbols in the alphabet.

2. Combine these two symbols into a single symbol whose probability is the sum of the two, and repeat until only one symbol remains.


An example

[Figure sequence: building the tree for a : 45, b : 13, c : 12, d : 16, e : 9, f : 5. In every merge the two edges are labeled 0 and 1.]

1. Merge f : 5 and e : 9 into a node of weight 14.
2. Merge c : 12 and b : 13 into a node of weight 25.
3. Merge 14 and d : 16 into a node of weight 30.
4. Merge 25 and 30 into a node of weight 55.
5. Merge a : 45 and 55 into the root, of weight 100.
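
The sequence of merges in this example can be reproduced with a deliberately simple Python sketch that re-sorts the list after every merge. It is not the efficient version and the name `merge_trace` is an assumption for illustration; merged nodes are labeled by concatenating the labels of their children.

```python
def merge_trace(freq):
    """Repeatedly merge the two lightest nodes; return the merged nodes in order."""
    # Each entry is (weight, label).
    nodes = sorted((w, ch) for ch, w in freq.items())
    trace = []
    while len(nodes) > 1:
        (w1, l1), (w2, l2) = nodes[0], nodes[1]      # the two least frequent nodes
        merged = (w1 + w2, l1 + l2)
        trace.append(merged)
        nodes = sorted(nodes[2:] + [merged])          # re-sort: simple, not fast
    return trace


print(merge_trace({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
# [(14, 'fe'), (25, 'cb'), (30, 'fed'), (55, 'cbfed'), (100, 'acbfed')]
```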


Specifications

• Preconditions: We have a set of characters C (i.e., an alphabet) and we can derive the frequency table for them.

• Postconditions: We have a full binary tree that defines a prefix code for the characters in C, such that the total cost is minimum.

• Greedy Choice: Take the two nodes with the least frequencies and merge them.
  – An adaptive decision: the merged node re-enters the pool, so later choices depend on earlier ones.


Specifications – Cont.

• Loop Invariant (LI):
  – We have built a binary tree which is consistent with an optimal solution.

• Establishing LI: ⟨Pre⟩ ⇒ ⟨LI⟩
  – Initially we haven't made any choice, so our current solution is consistent with an optimal solution.

• Maintaining LI: ⟨LI⟩ + Code ⇒ ⟨LI′⟩


Maintaining the LI

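[Figure: diagram relating the Algorithm's greedy step to OptSol_LI, the optimal solution that is consistent with the loop invariant.]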


Instructions for Fairy Godmother!

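[Figure: three trees.
T: an optimal tree in which a and b are sibling leaves of maximum depth, while x and y, the two least frequent characters, sit higher up.
T′: T with x and a exchanged.
T″: T′ with y and b exchanged, so that x and y are sibling leaves of maximum depth.]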


Proof of Correctness:

1. Validity

– We have a Tree!


2. Consistency

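[Figure: the trees T, T′ and T″. In T, a and b are sibling leaves of maximum depth and x and y are the two least frequent characters; T′ is T with x and a exchanged; T″ is T′ with y and b exchanged.]

Without loss of generality, assume

\[ f[x] \le f[y], \qquad f[a] \le f[b]. \]

Since x and y are the two least frequent characters, it follows that

\[ f[x] \le f[a], \qquad f[y] \le f[b]. \]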


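3. Optimality

• We need to prove Cost(T′) is not more than Cost(T):

\[
\begin{aligned}
\mathrm{Cost}(T) - \mathrm{Cost}(T')
  &= \sum_{c \in C} f(c)\, d_T(c) - \sum_{c \in C} f(c)\, d_{T'}(c) \\
  &= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
  &= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
  &= (f[a] - f[x])\,(d_T(a) - d_T(x)) \\
  &\ge 0
\end{aligned}
\]

• The last step holds because f[x] ≤ f[a] (x is a least frequent character) and d_T(x) ≤ d_T(a) (a is a leaf of maximum depth in T). The same argument applied to y and b shows that Cost(T″) is not more than Cost(T′).

[Figure: the tree T, with x and y above the sibling leaves a and b.]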
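
The exchange step can be sanity-checked numerically. The sketch below uses a hypothetical instance (the frequencies and leaf depths are chosen for illustration and are not from the slides) in which x and y are the two least frequent characters and a and b are the deepest sibling leaves. A tree is represented only by the depth of each leaf, which is all the cost formula needs.

```python
def cost(freq, depth):
    """Cost(T) = sum of f(c) * d_T(c); the tree is given by its leaf depths."""
    return sum(freq[c] * depth[c] for c in freq)


def swap(depth, u, v):
    """Return the leaf depths after exchanging the positions of leaves u and v."""
    new = dict(depth)
    new[u], new[v] = depth[v], depth[u]
    return new


# Hypothetical instance: x, y least frequent; a, b deepest siblings in T.
freq = {"x": 5, "y": 9, "a": 12, "b": 13, "c": 16, "d": 45}
depth_T = {"d": 1, "x": 3, "y": 3, "c": 3, "a": 4, "b": 4}

depth_T1 = swap(depth_T, "x", "a")     # T'
depth_T2 = swap(depth_T1, "y", "b")    # T''

gain = cost(freq, depth_T) - cost(freq, depth_T1)
print(cost(freq, depth_T), cost(freq, depth_T1), cost(freq, depth_T2))  # 235 228 224
print(gain == (freq["a"] - freq["x"]) * (depth_T["a"] - depth_T["x"]))  # True
```

Each exchange lowers the cost here, and the drop from T to T′ matches the product (f[a] − f[x])(d_T(a) − d_T(x)).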


Running Time

Huffman (C)
    n ← |C|
    Q ← C
    for i = 1 to n – 1
        do allocate a new node z
           left [z] ← x ← Extract-Min (Q)
           right [z] ← y ← Extract-Min (Q)
           f [z] ← f [x] + f [y]
           Insert (Q, z)
    return Extract-Min (Q)

• Q : a binary min heap; building it from C takes O (n)
• Each Extract-Min and Insert takes O (lg n), and the loop runs n – 1 times
• Running Time: O (n lg n)
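
One way to turn the pseudocode into runnable code is with Python's standard `heapq` module. This is a sketch rather than a reference implementation: the function name `huffman_codes`, the nested-tuple tree, and the counter used to break ties between equal weights are all choices made here for illustration, and the function assumes at least two symbols.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {character: codeword} for a frequency table with at least two symbols."""
    tie = count()                       # tie-breaker so the heap never compares trees
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)                 # Q <- C, O(n)
    for _ in range(len(freq) - 1):      # n - 1 merges, each O(lg n)
        fx, _, x = heapq.heappop(heap)  # x <- Extract-Min(Q)
        fy, _, y = heapq.heappop(heap)  # y <- Extract-Min(Q)
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))   # Insert(Q, z)
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):     # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                           # leaf: a character
            codes[node] = prefix

    walk(root, "")
    return codes


codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codes["a"], codes["b"], codes["c"], codes["d"], codes["e"], codes["f"])
# 0 101 100 111 1101 1100
```

Other tie-breaking rules or a different left/right convention would give different codewords with the same total cost.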


Conclusion

• Huffman Coding
  – Introduction and Application
  – Greedy Algorithm
  – Proof of Correctness

• Entropy


Huffman’s Code and Entropy

• As defined by Shannon, the information content h (in bits) of each symbol c_i with non-zero probability p_i is:

\[ h(c_i) = \log_2 \frac{1}{p_i} \]

• The entropy H (in bits) is the weighted sum, across all symbols c_i with non-zero probability p_i, of the information content of each symbol:

\[ H(C) = \sum_i p_i \cdot h(c_i) \]


Input (C, f)
  Symbol (c_i)                               a      b      c      d      e      Sum
  Probabilities (p_i)                        0.10   0.15   0.30   0.16   0.29   = 1

Huffman code
  Codewords (cw_i)                           000    001    10     01     11
  Codeword length in bits (l_i)              3      3      2      2      2
  Cost (l_i p_i)                             0.30   0.45   0.60   0.32   0.58   L(C) = 2.25

Optimality
  Probability budget (2^-l_i)                1/8    1/8    1/4    1/4    1/4    = 1.00
  Information content in bits (−log2 p_i)    3.32   2.74   1.74   2.64   1.79
  Entropy (−p_i log2 p_i)                    0.332  0.411  0.521  0.423  0.518  H(C) = 2.205


\[ L = \sum_i p(c_i) \cdot \mathrm{codelength}(c_i) \]

\[ H = -\sum_i p(c_i) \log_2 p(c_i) \]

• Huffman coding reaches the entropy limit when all probabilities are negative powers of 2, i.e., 1/2, 1/4, 1/8, 1/16, etc.

• In general, H ≤ L < H + 1, where L is the average code length
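
The bound can be checked against the five-symbol example with probabilities 0.10, 0.15, 0.30, 0.16, 0.29 and Huffman codeword lengths 3, 3, 2, 2, 2. The short Python sketch below recomputes the entropy, the average codeword length and the sum of 2^(−l_i); the variable names are illustrative.

```python
from math import log2

p = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29}
length = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 2}   # Huffman codeword lengths

H = -sum(pi * log2(pi) for pi in p.values())         # entropy in bits
L = sum(p[c] * length[c] for c in p)                 # average codeword length
budget = sum(2 ** -l for l in length.values())       # "probability budget"

print(round(H, 3), round(L, 2), budget)              # 2.205 2.25 1.0
print(H <= L < H + 1)                                # True
```

Here L exceeds H by about 0.045 bits per symbol because the probabilities are not exact negative powers of 2.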