Data Compression Project
Mini Project Report
Submitted to
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
By
Samir Sheriff
Satvik N
In partial fulfilment of the requirements
for the award of the degree
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
R V College of Engineering
(Autonomous Institute, Affiliated to VTU)
BANGALORE - 560059
May 2012
DECLARATION
We, Samir Sheriff and Satvik N, bearing USN numbers 1RV09CS093 and 1RV09CS095
respectively, hereby declare that the dissertation entitled “Data Compression Project”,
completed and written by us, has not previously formed the basis for the award of
any degree, diploma, or certificate of any other University.
Bangalore Samir Sheriff
USN:1RV09CS093
Satvik N
USN:1RV09CS095
R V COLLEGE OF ENGINEERING
(Autonomous Institute Affiliated to VTU)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
This is to certify that the dissertation entitled, “Data Compression Project”, which
is being submitted herewith for the award of B.E is the result of the work completed by
Samir Sheriff and Satvik N under my supervision and guidance.
Signature of Guide
(Name of the Guide)
Signature of Head of Department Signature of Principal
(Dr. N K Srinath) (Dr. B.S Sathyanarayana)
Name of Examiner Signature of Examiner
1:
2:
ACKNOWLEDGEMENT
The euphoria and satisfaction of completing the project would be incomplete without
thanking the people responsible for this venture.
We acknowledge RVCE (Autonomous under VTU) for providing an opportunity to
create a mini-project in the 5th semester. We express our gratitude to Prof. B.S.
Satyanarayana, Principal, R.V.C.E, for the constant encouragement and facilities extended
in the completion of this project. We would like to thank Prof. N.K. Srinath, HOD, CSE
Dept., for providing excellent lab facilities for the completion of the project. We would
personally like to thank our project guides, Chaitra B.H. and Suma B., and also the
lab in-charge, for providing timely assistance and guidance.
We are indebted to the co-operation of the lab administrators and lab assistants,
who have played a major role in bringing out the mini-project in its present form.
Bangalore
Samir Sheriff
6th semester, CSE
USN:1RV09CS093
Satvik N
6th semester, CSE
USN:1RV09CS095
ABSTRACT
The project “Data Compression Techniques” is aimed at developing programs that
transform a string of characters in some representation (such as ASCII) into a new string
(of bits, for example) which contains the same information but whose length is as small as
possible. Compression is useful because it helps reduce the consumption of resources such
as storage space or transmission capacity. The design of data compression schemes involves
trade-offs among various factors, including the degree of compression, the amount of
distortion introduced (e.g., when using lossy data compression), and the computational
resources required to compress and uncompress the data.
Many data processing applications require storage of large volumes of data, and the
number of such applications is constantly increasing as the use of computers extends to
new disciplines. Compressing data to be stored or transmitted reduces storage and/or
communication costs. When the amount of data to be transmitted is reduced, the effect
is that of increasing the capacity of the communication channel. Similarly, compressing a
file to half of its original size is equivalent to doubling the capacity of the storage medium.
It may then become feasible to store the data at a higher, thus faster, level of the storage
hierarchy and reduce the load on the input/output channels of the computer system.
Contents
ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
1 INTRODUCTION 1
1.1 SCOPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 REQUIREMENT SPECIFICATION 3
3 Compression 4
3.1 A Naive Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2 The Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Building the Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4.1 An Example: ”go go gophers” . . . . . . . . . . . . . . . . . . . . 6
3.4.2 Example Encoding Table . . . . . . . . . . . . . . . . . . . . . . . 8
3.4.3 Encoded String . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4 Decompression 9
4.1 Storing the Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Creating the Huffman Table . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 Storing Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5 CONCLUSION AND FUTURE WORKS 12
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 1
INTRODUCTION
The project “Data Compression Techniques” is aimed at developing programs that trans-
form a string of characters in some representation (such as ASCII) into a new string (of
bits, for example) which contains the same information but whose length is as small as
possible. Compression is useful because it helps reduce the consumption of resources such
as storage space or transmission capacity. The design of data compression schemes involves
trade-offs among various factors, including the degree of compression, the amount of
distortion introduced (e.g., when using lossy data compression), and the computational
resources required to compress and uncompress the data.
1.1 SCOPE
Data compression techniques find applications in almost all fields. To list a few:
• Audio data compression reduces the transmission bandwidth and storage re-
quirements of audio data. Audio compression algorithms are implemented in soft-
ware as audio codecs. Lossy audio compression algorithms, which provide higher
compression at the cost of fidelity, are used in numerous audio applications. These
algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful
sounds, thereby reducing the space required to store or transmit them.
• Video compression uses modern coding techniques to reduce redundancy in video
data. Most video compression algorithms and codecs combine spatial image com-
pression and temporal motion compensation. Video compression is a practical im-
plementation of source coding in information theory. In practice most video codecs
also use audio compression techniques in parallel to compress the separate, but
combined data streams.
• Grammar-based codes can compress highly repetitive text extremely well, for
instance biological data collections of the same or related species, huge versioned
document collections, internet archives, etc. The basic task of grammar-based codes
is constructing a context-free grammar deriving a single string. Sequitur and Re-Pair
are practical grammar compression algorithms for which public implementations are
available.
Dept. of CSE, R V C E, Bangalore. Feb 2012 - May 2013 2
Chapter 2
REQUIREMENT SPECIFICATION
The Software Requirement Specification (SRS) is an important part of the software
development process. This chapter gives the overall description of the Data Compression
Project, its specific requirements, the software and hardware requirements, and the
functionality of the system.
Software Requirements
• Front End: Qt GUI Application.
• Back End: C++
• Operating System: Linux.
Hardware Requirements
• Processor: Intel Pentium 4 or higher version
• RAM: 512MB or more
• Hard disk: 5 GB or more
Chapter 3
Compression
We’ll look at how the string ”go go gophers” is encoded in ASCII, how we might save
bits using a simpler coding scheme, and how Huffman coding is used to compress the
data, resulting in still more savings.
3.1 A Naive Approach
With an ASCII encoding (8 bits per character), the 13-character string ”go go gophers”
requires 104 bits. The table below on the left shows how the coding works.
The string ”go go gophers” would be written (coded numerically) as 103 111 32 103
111 32 103 111 112 104 101 114 115. Although not easily readable by humans, it would
be written as the following stream of bits (the spaces would not be written, just the 0’s
and 1’s):

01100111 01101111 00100000 01100111 01101111 00100000 01100111 01101111 01110000
01101000 01100101 01110010 01110011

Since there are only eight different characters in ”go go gophers”, it is possible to use
only 3 bits to encode them. We might, for example, use the encoding in the table on the
right above, though other 3-bit encodings are possible. Now the string ”go go gophers”
would be encoded as 0 1 7 0 1 7 0 1 2 3 4 5 6 or, as bits:

000 001 111 000 001 111 000 001 010 011 100 101 110

By using three bits per character, the string ”go go gophers” uses a total of 39 bits
instead of 104 bits. More bits can be saved if we use fewer than three bits to encode
characters like g, o, and space that occur frequently, and more than three bits to encode
characters like e, p, h, r, and s that occur less frequently in ”go go gophers”.
3.2 The Basic Idea
This is the basic idea behind Huffman coding: to use fewer bits for more frequently
occurring characters. We’ll see how this is done using a tree that stores characters at
the leaves, and whose root-to-leaf paths provide the bit sequence used to encode the
characters.
We’ll use Huffman’s algorithm to construct a tree that is used for data compression.
We’ll assume that each character has an associated weight equal to the number of times
the character occurs in a file. In the ”go go gophers” example, the characters
’g’ and ’o’ have weight 3, the space has weight 2, and the other characters have weight 1.
When compressing a file we’ll need to calculate these weights; for now we’ll ignore this
step and assume that all character weights have been calculated.
3.3 Building the Huffman Tree
Huffman’s algorithm assumes that we’re building a single tree from a group (or forest)
of trees. Initially, all the trees have a single node with a character and the character’s
weight. Trees are combined by picking two trees, and making a new tree from the two
trees. This decreases the number of trees by one at each step since two trees are combined
into one tree. The algorithm is as follows:
• Begin with a forest of trees. All trees are one node, with the weight of the tree equal
to the weight of the character in the node. Characters that occur most frequently
have the highest weights. Characters that occur least frequently have the smallest
weights.
• Repeat the following step until there is only one tree: choose the two trees with
the smallest weights; call these trees T1 and T2. Create a new tree whose root has
a weight equal to the sum of the weights T1 + T2, whose left subtree is T1 and
whose right subtree is T2.
• The single tree left after the previous step is an optimal encoding tree.
3.4 An Example
3.4.1 An Example: ”go go gophers”
We’ll use the string ”go go gophers” as an example. Initially we have the forest shown
below. The nodes are shown with a weight/count that represents the number of times
the node’s character occurs.
3.4.2 Example Encoding Table
The character encoding induced by the last tree is shown below, where again 0 is used
for left edges and 1 for right edges.
3.4.3 Encoded String
The string ”go go gophers” would be encoded as shown (with spaces used for easier
reading; the spaces wouldn’t appear in the real encoding): 00 01 100 00 01 100 00 01
1110 1101 101 1111 1100
In total, 37 bits are used to encode ”go go gophers”, compared with 104 bits of
ASCII. There are several trees that yield an optimal 37-bit encoding of ”go go gophers”.
The tree that actually results from a programmed implementation of Huffman’s algorithm
will be the same each time the program is run for the same weights (assuming no
randomness is used in creating the tree).
Chapter 4
Decompression
Generally speaking, the process of decompression is simply a matter of translating the
stream of prefix codes to individual byte values, usually by traversing the Huffman tree
node by node as each bit is read from the input stream (reaching a leaf node necessarily
terminates the search for that particular byte value). Before this can take place, however,
the Huffman tree must be somehow reconstructed.
4.1 Storing the Huffman Tree
• In the simplest case, where character frequencies are fairly predictable, the tree can
be preconstructed (and even statistically adjusted on each compression cycle) and
thus reused every time, at the expense of at least some measure of compression
efficiency.
• Otherwise, the information to reconstruct the tree must be sent a priori.
• A naive approach might be to prepend the frequency count of each character to
the compression stream. Unfortunately, the overhead in such a case could amount
to several kilobytes, so this method has little practical use.
• Another method is to simply prepend the Huffman tree, bit by bit, to the output
stream. For example, assuming that the value of 0 represents a parent node and 1 a
leaf node, whenever the latter is encountered the tree building routine simply reads
the next 8 bits to determine the character value of that particular leaf. The process
continues recursively until the last leaf node is reached; at that point, the Huffman
tree will thus be faithfully reconstructed. The overhead using such a method ranges
from roughly 2 to 320 bytes (assuming an 8-bit alphabet).
Many other techniques are possible as well. In any case, since the compressed data
can include unused ”trailing bits”, the decompressor must be able to determine when to
stop producing output. This can be accomplished either by transmitting the length of
the decompressed data along with the compression model or by defining a special code
symbol to signify the end of input (the latter method can adversely affect code length
optimality, however).
4.2 Creating the Huffman Table
To create a table or map of coded bit values for each character you’ll need to traverse
the Huffman tree (e.g., inorder, preorder, etc.) making an entry in the table each time
you reach a leaf. For example, if you reach a leaf that stores the character ’C’, following
a path left-left-right-right-left, then an entry in the ’C’-th location of the map should be
set to 00110. You’ll need to make a decision about how to store the bit patterns in the
map. At least two methods are possible for implementing what could be a class/struct
BitPattern:
• Use a string. This makes it easy to add a character (using +) to a string during
tree traversal and makes it possible to use string as BitPattern. Your program may
be slow because appending characters to a string (in creating the bit pattern) and
accessing characters in a string (in writing 0’s or 1’s when compressing) is slower
than the next approach.
• Alternatively, you can store an integer for the bitwise coding of a character. You
need to store the length of the code too, since leading zeros are otherwise lost:
stored as integers, the patterns 01001 and 1001 are indistinguishable. However,
using an int restricts root-to-leaf paths to at most 32 edges, since an int holds 32
bits. In a pathological file, a Huffman tree could have a root-to-leaf path of over
100 edges. Because of this problem, you should use strings to store paths rather
than ints. A slow correct program is better than a fast incorrect program.
4.3 Storing Sizes
The operating system will buffer output; i.e., output to disk actually occurs when some
internal buffer is full. In particular, it is not possible to write just one single bit to a file;
all output is actually done in ”chunks”, e.g., it might be done in eight-bit chunks. In any
case, when you write 3 bits, then 2 bits, then 10 bits, all the bits are eventually written,
but you cannot be sure precisely when they’re written during the execution of your
program. Also, because of buffering, if all output is done in eight-bit chunks and your
program writes exactly 61 bits explicitly, then 3 extra bits will be written so that the
number of bits written is a multiple of eight. Because of the potential for these ”extra”
bits, when reading one bit at a time you cannot simply read bits until there are no more
left, since your program might then read the extra bits written due to buffering. This
means that when reading a compressed file, you CANNOT use code like this.
int bits;
while (input.readbits(1, bits))
{
// process bits
}
To avoid this problem, you can write the size of a data structure before writing the
data structure to the file.
Chapter 5
CONCLUSION AND FUTURE
WORKS
Summary
Limitations
1. Huffman coding is optimal only if the exact probability distribution of the source
symbols is known.
2. Each symbol is encoded with an integer number of bits.
3. Huffman coding cannot adapt efficiently to changing source statistics.
4. The code of the least probable symbol may be too long to store in a single word
or basic storage unit of a computing system.
Further enhancements
The Huffman coding we have considered is simple binary Huffman coding, but many
variations of Huffman coding exist:
1. n-ary Huffman coding: The n-ary Huffman algorithm uses the {0, 1, ..., n−1}
alphabet to encode messages and builds an n-ary tree. This approach was considered
by Huffman in his original paper. The same algorithm applies as for binary (n equals
2) codes, except that the n least probable symbols are taken together, instead of just
the 2 least probable. Note that for n greater than 2, not all sets of source words
can properly form an n-ary tree for Huffman coding. In this case, additional 0-
probability placeholders must be added. If the number of source words is congruent
to 1 modulo n−1, then the set of source words will form a proper Huffman tree.
2. Adaptive Huffman coding: A variation called adaptive Huffman coding calcu-
lates the probabilities dynamically, based on recent actual frequencies in the source
string. This is somewhat related to the LZ family of algorithms.
3. Huffman template algorithm: Most often, the weights used in implementa-
tions of Huffman coding represent numeric probabilities, but the algorithm given
above does not require this; it requires only a way to order weights and to add
them. The Huffman template algorithm enables one to use any kind of weights
(costs, frequencies, etc.).
4. Length-limited Huffman coding: Length-limited Huffman coding is a variant
where the goal is still to achieve a minimum weighted path length, but there is an
additional restriction that the length of each codeword must be less than a given
constant. The package-merge algorithm solves this problem with a simple greedy
approach very similar to that used by Huffman’s algorithm. Its time complexity
is O(nL), where L is the maximum length of a codeword. No algorithm is known
to solve this problem in linear or linearithmic time, unlike the presorted and
unsorted conventional Huffman problems, respectively.
Bibliography
Appendices
Appendix A : Source Code
Listing 5.1: The definition of the class Charnode; each node of the Huffman tree is an
object of this class.
#ifndef Charnode_h
#define Charnode_h

#define DEBUG 1
#if DEBUG
#define LOG(s) cout << s << endl;
#else
#define LOG(s) //
#endif

using namespace std;

template <class TYPE>
class Charnode
{
    TYPE ch;
    int count;
    Charnode *left;
    Charnode *right;
public:
    Charnode(TYPE ch, int count = 0);
    Charnode(const Charnode *New);
    int GetCount();
    int Value();
    void SetLeft(Charnode *left);
    void SetRight(Charnode *right);
    Charnode *GetLeft(void);
    Charnode *GetRight(void);
    TYPE GetChar(void);
    void show();
    bool operator<(Charnode &obj2);
    void setChar(TYPE ch);
};

template <class TYPE>
Charnode<TYPE>::Charnode(TYPE ch, int count)
{
    LOG("new Charnode " << count << " requested");
    this->ch = ch;
    this->count = count;
    this->left = this->right = NULL;
}

template <class TYPE>
Charnode<TYPE>::Charnode(const Charnode *New)
{
    LOG("new Charnode " << New->count << " requested");
    this->ch = New->ch;
    this->count = New->count;
    this->left = New->left;
    this->right = New->right;
}

template <class TYPE>
int Charnode<TYPE>::GetCount()
{
    return count;
}

template <class TYPE>
int Charnode<TYPE>::Value()
{
    return count;
}

template <class TYPE>
void Charnode<TYPE>::SetLeft(Charnode *left)
{
    this->left = left;
}

template <class TYPE>
void Charnode<TYPE>::SetRight(Charnode *right)
{
    this->right = right;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetLeft(void)
{
    return left;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetRight(void)
{
    return right;
}

template <class TYPE>
TYPE Charnode<TYPE>::GetChar(void)
{
    return ch;
}

template <class TYPE>
void Charnode<TYPE>::show()
{
    cout << ch << '\t' << count << endl;
}

template <class TYPE>
bool Charnode<TYPE>::operator<(Charnode &obj2)
{
    return (count < obj2.GetCount());
}

template <class TYPE>
void Charnode<TYPE>::setChar(TYPE ch)
{
    this->ch = ch;
}

#endif
Listing 5.2: The definition of the class Huffman; this class builds the Huffman tree for
an input file.
#include <iostream>
#include "Charnode.h"
#include "globals.h"
#include "bitops.h"
#include <vector>
#include <map>
#include <fstream>

#ifndef HuffmanCode_h
#define HuffmanCode_h

using namespace std;

template <class TYPE>
class Huffman
{
private:
    vector<Charnode<TYPE> *> charactermap;
    Charnode<TYPE> *huffmanTreeRoot;
    map<TYPE, string> table;
    map<TYPE, int> freqtab;
private:
    void processfile(const char *filename, map<TYPE, int> &charmap);
    vector<Charnode<TYPE> *> convertToVector(map<TYPE, int> &chamap);
    bool compare(Charnode<TYPE> *i, Charnode<TYPE> *j);
    void MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n);
    void BuildMinHeap(vector<Charnode<TYPE> *> &charactermap);
    void buildHuffmanTree();
    void delNode(Charnode<TYPE> *);
public:
    Huffman();
    Huffman(const char *filename);
    ~Huffman();
    void createHuffmanTable(Charnode<TYPE> *tree, int code, int height);
    void displayCharactermap();
    void displayHuffmanTable();
    Charnode<TYPE> *getRoot();
    map<TYPE, string> getHuffmanTable();
    map<TYPE, int> getFrequencyMap();
    int getCharVecSize();
};

template <class TYPE>
int Huffman<TYPE>::getCharVecSize()
{
    return charactermap.size();
}

template <class TYPE>
void Huffman<TYPE>::processfile(const char *filename, map<TYPE, int> &charmap)
{
    ibstream infile(filename);
    int inbits;
    while (infile.readbits(BITS_PER_WORD, inbits) != false)
    {
        // cout << (TYPE) inbits;
        charmap[(TYPE) inbits]++;
    }
    LOG("\n\n\nEND\n")
}

template <class TYPE>
vector<Charnode<TYPE> *> Huffman<TYPE>::convertToVector(map<TYPE, int> &chamap)
{
    vector<Charnode<TYPE> *> charactermap;
    for (typename map<TYPE, int>::iterator ii = chamap.begin(); ii != chamap.end(); ++ii)
    {
        // cout << (*ii).first << " : " << (*ii).second << endl;
        Charnode<TYPE> *ch = new Charnode<TYPE>((*ii).first, (*ii).second);
        charactermap.push_back(ch);
#if DEBUG
        // ch->show();
        if (ch->GetLeft() == NULL && ch->GetRight() == NULL)
            LOG("Leaf Node initialized properly");
#endif
    }
    return charactermap;
}

template <class TYPE>
bool Huffman<TYPE>::compare(Charnode<TYPE> *i, Charnode<TYPE> *j)
{
    return (*i < *j);
}

template <class TYPE>
void Huffman<TYPE>::MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n)
{
    int left = 2*i + 1;
    int right = left + 1;
    int smallest = -1;
    if (left < n && charactermap[left]->Value() < charactermap[i]->Value())
        smallest = left;
    else
        smallest = i;
    if (right < n && charactermap[right]->Value() < charactermap[smallest]->Value())
        smallest = right;
    if (smallest != i)
    {
        Charnode<TYPE> *temp = charactermap[i];
        charactermap[i] = charactermap[smallest];
        charactermap[smallest] = temp;
        MinHeapify(charactermap, smallest, n);
    }
}

template <class TYPE>
void Huffman<TYPE>::BuildMinHeap(vector<Charnode<TYPE> *> &charactermap)
{
    int n = charactermap.size();
    for (int i = n/2; i >= 0; i--)
        MinHeapify(charactermap, i, n);
}

template <class TYPE>
void Huffman<TYPE>::buildHuffmanTree()
{
    LOG(__func__);
    vector<Charnode<TYPE> *> charactermap = this->charactermap; // Duplicate and change
    /*
       HUFFMAN(C)
       Refer CLRS (non-unicode characters.)
    */
    int n = charactermap.size();
    LOG("Size of the char map = " << n);
    for (int i = 1; i < n; i++)
    {
        LOG(i << "th iteration")
        BuildMinHeap(charactermap);
        Charnode<TYPE> *left = new Charnode<TYPE>(charactermap[0]);
        LOG(left->GetCount());
        charactermap.erase(charactermap.begin() + 0);
        BuildMinHeap(charactermap);
        Charnode<TYPE> *right = new Charnode<TYPE>(charactermap[0]);
        charactermap.erase(charactermap.begin() + 0);
        LOG(right->GetCount());
        Charnode<TYPE> *z = new Charnode<TYPE>('\0', left->Value() + right->Value());
        z->SetLeft(left);
        z->SetRight(right);
        LOG(z->GetCount())
        LOG(z->GetLeft()->GetCount());
        LOG(z->GetRight()->GetCount());
        charactermap.push_back(z);
    }
    huffmanTreeRoot = charactermap[0]; // Initialize the root
}

template <class TYPE>
Huffman<TYPE>::Huffman()
{}

template <class TYPE>
Huffman<TYPE>::Huffman(const char *filename)
{
    map<TYPE, int> charmap;
    processfile(filename, charmap);
    charactermap = convertToVector(charmap);
    freqtab = charmap;
    buildHuffmanTree();
    createHuffmanTable(huffmanTreeRoot, 0, 0);
}

template <class TYPE>
void Huffman<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;
    delNode(node->GetLeft());
    delNode(node->GetRight());
    delete node;
}

template <class TYPE>
Huffman<TYPE>::~Huffman()
{
    delNode(huffmanTreeRoot);
    huffmanTreeRoot = NULL;
}

template <class TYPE>
void Huffman<TYPE>::createHuffmanTable(Charnode<TYPE> *tree, int code, int height)
{
    LOG(__func__);
    if (tree == NULL) // This condition never occurs!
        return;
    if (tree->GetLeft() == NULL && tree->GetRight() == NULL)
    {
        // Leaf node: record the char's code (code length = height of the node)
        string codeString = "";
        for (int j = height - 1; j >= 0; j--)
        {
            if (code & (1 << j))
                codeString += "1";
            else
                codeString += "0";
        }
        table[tree->GetChar()] = codeString;
        return;
    }
    code = code << 1;
    createHuffmanTable(tree->GetLeft(), code, height + 1);
    createHuffmanTable(tree->GetRight(), code | 1, height + 1);
}

template <class TYPE>
void Huffman<TYPE>::displayCharactermap()
{
    LOG(__func__);
    int n = charactermap.size();
    LOG("Size = " << n)
    for (int i = 0; i < n; i++)
        charactermap[i]->show();
    cout << endl;
}

template <class TYPE>
Charnode<TYPE> *Huffman<TYPE>::getRoot()
{
    return huffmanTreeRoot;
}

template <class TYPE>
void Huffman<TYPE>::displayHuffmanTable()
{
    LOG("HUFFMAN TABLE");
    for (typename map<TYPE, string>::iterator ii = table.begin(); ii != table.end(); ++ii)
    {
        cout << endl << (*ii).first << "\t" << (*ii).second;
    }
    cout << endl;
}

template <class TYPE>
map<TYPE, string> Huffman<TYPE>::getHuffmanTable()
{
    return table;
}

template <class TYPE>
map<TYPE, int> Huffman<TYPE>::getFrequencyMap()
{
    return freqtab;
}

#endif
Listing 5.3: The definition of the class CompressionWriting; this class writes the bits
to the compressed file.
#ifndef COMP H
#define COMP H
#include <iostream>
#include <vector>
#include <map>
#include <s t r i ng>
#include <f stream>
#include ” g l o b a l s . h”
#include ” b i tops . h”
#include ”Charnode . h”
using namespace std ;
template<class TYPE>
class CompressionWriting
{
Dept. of CSE, R V C E, Bangalore. Feb 2012 - May 2013 29
Appendix A: Source Code Data Compression Techniques
map<TYPE, s t r i ng> huffmanTable ;
Charnode<TYPE> ∗huffmanTreeRoot ;
s t r i n g outputFilename ;
s t r i n g inputFi lename ;
map<TYPE, int> freqMap ;
private :
int convertStr ingToBitPattern ( s t r i n g s t r ) ;
int totalNumOfBits ( void ) ;
public :
CompressionWriting (){}
CompressionWriting ( Charnode<TYPE> ∗ root , map<TYPE, s t r i ng> tab le , map<TYPE, int> freqMap , s t r i n g oname , s t r i n g iname ) ;
void writeCompressedDataToFile ( ) ;
void di sp layOutputFi l e ( ) ;
void writeHuffmanTreeBitPattern ( Charnode<TYPE> ∗ t ree , obstream &o u t f i l e ) ;
} ;
template<class TYPE>
CompressionWriting<TYPE> : : CompressionWriting ( Charnode<TYPE> ∗ root , map<TYPE, s t r i ng> tab le , map<TYPE, int> freMap , s t r i n g oname , s t r i n g iname )
{
huffmanTreeRoot = root ;
huffmanTable = tab l e ;
    outputFilename = oname;
    inputFilename = iname;
    freqMap = freMap;
}
template<class TYPE>
void CompressionWriting<TYPE>::writeCompressedDataToFile()
{
    LOG("\nWriting Pattern:\n");
    ibstream infile(inputFilename.c_str());
    obstream outfile(outputFilename.c_str());
    outfile.writebits(BITS_PER_INT, freqMap.size());      // Write the number of unique characters
    writeHuffmanTreeBitPattern(huffmanTreeRoot, outfile); // Write the Huffman tree
    outfile.writebits(BITS_PER_INT, totalNumOfBits());    // Write the total number of bits to be compressed
    // Write the compressed data
    int inbits;
    infile.rewind();
    while (infile.readbits(BITS_PER_WORD, inbits))
    {
        // cout << (TYPE)inbits << " = " << huffmanTable[(TYPE)inbits];
        int bitPattern = convertStringToBitPattern(huffmanTable[(TYPE)inbits]);
        // cout << " = " << bitPattern << endl;
        outfile.writebits(huffmanTable[(TYPE)inbits].length(), bitPattern);
    }
    outfile.flushbits();
    infile.close();
    outfile.close();
}
template <class TYPE>
int CompressionWriting<TYPE>::totalNumOfBits()
{
    int count = 0;
    for (typename map<TYPE, int>::iterator ii = freqMap.begin(); ii != freqMap.end(); ++ii)
    {
        // Length of each character's code * number of times the character appears = number of bits for that character
        count += huffmanTable[(*ii).first].length() * (*ii).second;
    }
    LOG("Count = " << count << endl);
    return count;
}
template<class TYPE>
int CompressionWriting<TYPE>::convertStringToBitPattern(string str)
{
    int bitPattern = 0;
    int n = str.length();
    for (int i = 0; i < n; i++)
        bitPattern += (1 << (n - i - 1)) * (str[i] - '0');
    return bitPattern;
}
template<class TYPE>
void CompressionWriting<TYPE>::displayOutputFile()
{
    ibstream infile(outputFilename.c_str());
    ofstream outfile("xxx");
    cout << "\nDisplaying Output File:" << endl;
    int inbits;
    while (infile.readbits(1, inbits) != false)
    {
        cout << inbits;
        outfile << inbits;
    }
    outfile.close();
}
template<class TYPE>
void CompressionWriting<TYPE>::writeHuffmanTreeBitPattern(Charnode<TYPE> *node, obstream &outfile)
{
    if (node == NULL)
        return;
    if (node->GetLeft() == NULL && node->GetRight() == NULL)
    {
        outfile.writebits(1, 1);
        outfile.writebits(BITS_PER_WORD, node->GetChar());
    }
    else
    {
        outfile.writebits(1, 0);
        writeHuffmanTreeBitPattern(node->GetLeft(), outfile);
        writeHuffmanTreeBitPattern(node->GetRight(), outfile);
    }
}
#endif
Listing 5.4: The main program of the Huffman compression algorithm.
#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>
#include "Charnode.h"
#include "HuffmanCode.h"
#include "globals.h"
#include "bitops.h"
#include "CompressionWriting.h"
using namespace std;
int main(int argc, char *argv[])
{
    LOG(__func__);
    if (argc != 3)
    {
        cout << "Usage " << argv[0] << " Input_file" << " Output_file\n";
        exit(0);
    }
    Huffman<char> huff(argv[1]);
    // huff.displayCharactermap();
    cout << endl << endl;
    // huff.displayHuffmanTable();
    map<char, string> huffmanTable = huff.getHuffmanTable();
    CompressionWriting<char> writingObj(huff.getRoot(), huffmanTable, huff.getFrequencyMap(), argv[2], argv[1]);
    writingObj.writeCompressedDataToFile();
    cout << "Done!" << endl;
    // writingObj.displayOutputFile();
    // test();
    // cin.get();
}
Listing 5.5: The definition of the class Decompressor. This class helps in decompressing
the compressed file using the Huffman algorithm.
#ifndef DECOMP_H
#define DECOMP_H
#include <iostream>
#include <vector>
#include <map>
#include <string>
#include "globals.h"
#include "bitops.h"
#include "Charnode.h"
using namespace std;
template <class TYPE>
class Decompressor
{
    Charnode<TYPE> *huffmanTreeRoot;
    string outputFilename;
    string compressedFilename;
    int numChars;
private:
    inline int readCount(ibstream &ibs);
    void constructTree(Charnode<TYPE> * &, int n, ibstream &ibs);
    void preorder(Charnode<TYPE> *node);
public:
    Decompressor() {}
    Decompressor(string cname, string oname);
    ~Decompressor();
    void decompress();
    void delNode(Charnode<TYPE> *);
};
template <class TYPE>
Decompressor<TYPE>::Decompressor(string cname, string oname)
{
    outputFilename = oname;
    compressedFilename = cname;
}
template <class TYPE>
void Decompressor<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;
    if (node->GetLeft() != NULL)
        delNode(node->GetLeft());
    if (node->GetRight() != NULL)
        delNode(node->GetRight());
    delete node;
}
template <class TYPE>
Decompressor<TYPE>::~Decompressor()
{
    LOG(__func__);
    // delNode(huffmanTreeRoot);
    huffmanTreeRoot = NULL;
}
template <class TYPE>
int Decompressor<TYPE>::readCount(ibstream &ibs)
{
    int count = 0;
    ibs.readbits(BITS_PER_INT, count);
    return count;
}
template <class TYPE>
void Decompressor<TYPE>::preorder(Charnode<TYPE> *node)
{
    if (node == NULL)
    {
        return;
    }
    cout << endl << node->GetChar();
    preorder(node->GetLeft());
    preorder(node->GetRight());
}
template <class TYPE>
void Decompressor<TYPE>::constructTree(Charnode<TYPE> * &node, int n, ibstream &ibs)
{
    if (n == 0)
        return;
    if (node != NULL && node->GetLeft() != NULL && node->GetRight() != NULL)
        return;
    int bitread;
    ibs.readbits(1, bitread);
    if (bitread == 1)
    {
        ibs.readbits(BITS_PER_WORD, bitread);
        node = new Charnode<TYPE>((char)bitread);
        n--;
    }
    else
    {
        node = new Charnode<TYPE>('\0');
        Charnode<TYPE> *leftnode = node->GetLeft();
        Charnode<TYPE> *rightnode = node->GetRight();
        constructTree(leftnode, n, ibs);
        constructTree(rightnode, n, ibs);
        node->SetLeft(leftnode);
        node->SetRight(rightnode);
    }
}
template <class TYPE>
void Decompressor<TYPE>::decompress()
{
    // Read and build the tree:
    /* 1) Read the first BITS_PER_INT bits, which give the count of characters in the tree.
     * 2) (Reading the tree contents) A 0 indicates an internal node; when a 1
     *    is encountered it is a leaf, and the next 8 bits represent that char.
     * 3) Thus read all the chars and reconstruct the Huffman tree in pre-order.
     * 4) Use the tree to decompress the file.
     */
    // Step 1
    vector<Charnode<TYPE> *> allchars;
    ibstream compressedFile(compressedFilename.c_str());
    obstream outputFile(outputFilename.c_str());
    int n = readCount(compressedFile);
    LOG("Huffman Tree Size read = " << n);
    // Step 2
    huffmanTreeRoot = NULL; // new Charnode<TYPE>('\0');
    constructTree(huffmanTreeRoot, n, compressedFile);
    // preorder(huffmanTreeRoot);
    // Step 4
    int i = readCount(compressedFile);
    Charnode<TYPE> *traverser = huffmanTreeRoot;
    while (i)
    {
        int bitread;
        compressedFile.readbits(1, bitread);
        // cout << "Read bit = " << bitread;
        traverser = (bitread) ? traverser->GetRight() : traverser->GetLeft();
        // cout << "--> " << traverser->GetChar() << endl;
        if (traverser->GetLeft() == NULL && traverser->GetRight() == NULL)
        {
            outputFile.writebits(BITS_PER_WORD, traverser->GetChar());
            // cout << "Leaf = " << traverser->GetChar() << endl;
            traverser = huffmanTreeRoot;
        }
        --i;
    }
    outputFile.close();
    compressedFile.close();
}
#endif
Listing 5.6: The main program of the Huffman decompression algorithm.
#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>
#include "Charnode.h"
// #include "HuffmanCode.h"
#include "globals.h"
#include "bitops.h"
#include "Decompressor.h"
using namespace std;
int main(int argc, char *argv[])
{
    LOG(__func__);
    if (argc != 3)
    {
        cout << "Usage " << argv[0] << " Input_file" << " Output_file\n";
        exit(0);
    }
    Decompressor<char> compressedfile(argv[1], argv[2]);
    compressedfile.decompress();
    // cin.get();
    // cin.get();
}
Appendix B: Screen Shots
Figure 5.1: The Data Compression Server window.
Figure 5.2: Creation of a new file from the server window.
Figure 5.3: Compressing a file (google) at the server.
Figure 5.4: Compressing a file (samir.txt) at the server.
Figure 5.5: The Data Compression Client window.
Figure 5.6: The Client after receiving a file from the server.
Figure 5.7: The Client after receiving a file from the server.