Huffman Presentation
8/8/2019 Huffman Presantation
http://slidepdf.com/reader/full/huffman-presantation 1/24
DATA STRUCTURE
Huffman Tree: Project
Submitted to: Sir Abdul Wahab
Submitted by:
Muzmmal Hussain
Muhammad Zia Shahid
Riasat Ali
"In computer science and information theory, Huffman
coding is an entropy encoding algorithm used for
lossless data compression"
What Is a Huffman Tree?
A Huffman tree is an elegant form of data compression. It is based
on minimum-redundancy coding. We need to represent the data in a way
that makes it require less space.
Need for Huffman Coding
A Huffman tree is a form of binary tree: it gives the most efficient
(minimum-redundancy) prefix code for a given set of symbol frequencies.
Adaptive Huffman coding is derived from it.
IS THIS DATA STRUCTURE DERIVED FROM ANY DATA STRUCTURE?
ARE THERE ANY DS THAT ARE DERIVED FROM IT?
Huffman coding is used to compress files.
Huffman coding is used to minimize the length of the binary code.
Huffman coding is used in compression tools and also in fax machines.
Reduces the storage needed
Reduces the size of data, e.g. images, audio, video and text
Reduces transmission cost and bandwidth
Advantages of Huffman
Changing ensemble
If the ensemble changes, the frequencies and probabilities change, so the optimal coding changes
e.g. in text compression symbol frequencies vary with context
Re-computing the Huffman code by running through the entire file in advance?!
Saving/transmitting the code too?!
Does not consider "blocks of symbols"
"strings_of_ch": the next nine symbols, "aracters_", are predictable, but bits are used without conveying any new information
Disadvantages of Huffman
The run time complexity of Huffman compression is O(n), where n is the number of
symbols in the original data. Each pass over the data runs in O(n) time.
The time to build the Huffman tree does not affect the complexity
of Huffman compression, because the running time of that step
depends only on the number of different symbols in the data, which in
this implementation is a constant.
Cost of Huffman
On a computer: changing the representation of a file so that it takes less
space to store and/or less time to transmit.
- The original file can be reconstructed exactly from the
compressed representation.
Different from data compression in general
- Text compression has to be lossless.
- Compare with sound and images: small changes and noise are
tolerated.
Text Compression
We can construct a lossless compression with the following algorithm.
Take the word ABRACADABRA.
What is the most economical way to write this string in a binary
representation?
Generally speaking, if a text consists of N different characters, we
need ⌈log2 N⌉ bits to represent each one using a fixed-length
encoding.
Construction of Huffman
Thus, it would require 3 bits for each of 5
different letters, or 33 bits for 11 letters.
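The arithmetic above can be checked with a short sketch. The function name `fixed_length_bits` is an illustrative assumption, not part of the slides; it simply applies the ⌈log2 N⌉ rule to the distinct characters of a string.

```python
import math

def fixed_length_bits(text: str) -> tuple:
    """Return (bits per symbol, total bits) for a fixed-length binary code."""
    distinct = len(set(text))  # N different characters
    # ceil(log2 N) bits are enough to give each of N symbols its own code;
    # a one-symbol alphabet is given 1 bit here (an assumption of this sketch).
    bits_per_symbol = max(1, math.ceil(math.log2(distinct)))
    return bits_per_symbol, bits_per_symbol * len(text)

per_symbol, total = fixed_length_bits("ABRACADABRA")
print(per_symbol, total)
```

For ABRACADABRA this gives 3 bits per letter and 33 bits in total, matching the figures on the slide.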
Can we do it better?
We can do better, provided:
- Some characters are more frequent than others.
- Characters may have different bit lengths, so that, for
example, in the English alphabet the letter a may use
only one or two bits, while the letter y may use several.
- We have a unique way of decoding the bit stream.
YES!!!!
Using Variable-length Encoding (1)
Magic word: ABRACADABRA
Let A = 0
B = 100
C = 1010
D = 1011
R = 11
Thus, ABRACADABRA = 01001101010010110100110
So 11 letters demand 23 bits < 33 bits, an improvement of about
30%.
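As a rough check of the numbers above, the encoding can be reproduced with a lookup table. The names `CODE` and `encode` are assumptions made for this sketch; the code words are the ones listed on the slide.

```python
# Variable-length code table from the slide.
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text, code):
    """Concatenate the code word of every character in `text`."""
    return "".join(code[ch] for ch in text)

bits = encode("ABRACADABRA", CODE)
saving = 1 - len(bits) / 33  # compared with the 33-bit fixed-length encoding
print(bits, len(bits), round(saving, 2))
```

The result is the 23-bit string shown above, a saving of about 0.30 over the fixed-length version.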
Using Variable-length Encoding (2)
However, there is a serious danger: How to ensure unique
reconstruction?
Let A -> 01 and B -> 0101
How to decode 010101?
AB?
BA?
AAA?
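The ambiguity can be made concrete by listing every way the bit string splits into code words. This is only an illustrative brute-force sketch; `all_decodings` and `AMBIGUOUS` are hypothetical names, and the approach is not meant as a practical decoder.

```python
def all_decodings(bits, code):
    """Return every way to split `bits` into code words of `code`."""
    if not bits:
        return [""]  # one way to decode the empty string
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Try this code word first, then decode the rest recursively.
            rest = all_decodings(bits[len(word):], code)
            results += [symbol + tail for tail in rest]
    return results

AMBIGUOUS = {"A": "01", "B": "0101"}
print(sorted(all_decodings("010101", AMBIGUOUS)))
```

All three candidates from the slide (AAA, AB, BA) come out as valid decodings, so the receiver cannot tell which message was sent.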
No Problem...
if we use prefix codes: no code word is a prefix of another code
word.
Any prefix code can be represented by a full binary tree.
Each leaf stores a symbol.
Each internal node has two children - left branch means 0, right means 1.
code word = path from the root to the leaf, interpreting the left
and right branches as 0 and 1.
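The prefix property is easy to test mechanically. The sketch below, with the assumed name `is_prefix_free`, relies on the fact that after sorting the code words, a word that is a prefix of any other word is also a prefix of its immediate successor.

```python
def is_prefix_free(code):
    """True if no code word is a prefix of another code word."""
    words = sorted(code.values())
    # In sorted order, a word that prefixes any later word also prefixes
    # the very next one, so comparing neighbours is enough.
    return not any(nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

GOOD = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}
BAD = {"A": "01", "B": "0101"}
print(is_prefix_free(GOOD), is_prefix_free(BAD))
```

The ABRACADABRA table passes the check, while the code with A -> 01 and B -> 0101 fails it because 01 is a prefix of 0101.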
Prefix Codes (2)
ABRACADABRA
A = 0
B = 100
C = 1010
D = 1011
R = 11
Decoding is unique and simple!
Read the bit stream from left to right, starting from the root;
whenever a leaf is reached,
write down its symbol and return to the root.
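The decoding walk described above might look like the following sketch. The tree is represented as nested dictionaries keyed by "0" and "1", which is an assumption of this example rather than anything prescribed by the slides; `build_tree` and `decode` are hypothetical helper names.

```python
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def build_tree(code):
    """Turn a prefix-code table into a binary tree of nested dicts."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes on demand
        node[word[-1]] = symbol              # the leaf stores the symbol
    return root

def decode(bits, root):
    """Follow branches bit by bit; emit a symbol at each leaf, then restart."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root            # return to the root
    return "".join(out)

print(decode("01001101010010110100110", build_tree(CODE)))
```

Run on the 23-bit string from the earlier slide, this recovers ABRACADABRA. The sketch assumes a valid prefix code and a well-formed bit stream; it does no error handling.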
CONSTRUCTING A HUFFMAN CODE (1)
Assume that frequencies of symbols are:
- A: 40 B: 20 C: 10 D: 10 R: 20
Smallest numbers are 10 and 10 (C and D), so connect them
CONSTRUCTING A HUFFMAN CODE (2)
C and D have already been used, and the new node above them (call it C+D)
has value 20
The smallest values are B, C+D, and R, all of which have value 20
- Connect any two of these
It is clear that the algorithm does not construct a unique tree, but
even if we had chosen the other possible connection, the code would
be optimal too!
CONSTRUCTING A HUFFMAN CODE (3)
The smallest value is R, while A and B+C+D have value 40.
Connect R to either of the others.
CONSTRUCTING A HUFFMAN CODE (4)
Connect the final two nodes, then label each left branch 0 and each right
branch 1.
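The construction steps can be sketched with a priority queue. This is a minimal illustration, assuming a non-empty frequency table and single-character symbols; `huffman_code` and `FREQS` are hypothetical names, and ties are broken by insertion order, so the tree may differ from the one drawn on the slides.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code table from a non-empty {symbol: frequency} dict."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Connect the two smallest nodes under a new parent node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # left branch means 0
            walk(node[1], prefix + "1")  # right branch means 1
        else:                            # leaf: a symbol
            code[node] = prefix or "0"   # lone symbol still gets one bit
    walk(heap[0][2], "")
    return code

FREQS = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}
code = huffman_code(FREQS)
cost = sum(FREQS[s] * len(code[s]) for s in FREQS)
print(code, cost)
```

The individual code words depend on how ties are resolved, but the total weighted length is 220 for these frequencies, the same as for the table A = 0, B = 100, C = 1010, D = 1011, R = 11. This illustrates the point made earlier that different connections still give an optimal code.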