ERLZ: Compression Coding for Palettized Colored Images
Mohamed Hamdy. El-Sheikh, Shehab Gamalel-Din
Hassanein Al-Barhmtoshy, and Ahmed Kolkila
Al-Azhar University, Systems & Computers Engineering Dept.
Cairo, Egypt
Abstract. In this work, we deal with palettized colored images. These image types are widely used by most applications on personal computers and small workstations and, more importantly, in satellite imagery. In the era of the Internet and e-learning, efficient transmission of such images is critical. Unfortunately, palettized images are usually large and hence require a large transmission bandwidth. Compression is one way of managing the size of transmitted images; therefore, a key criterion of successful compression for such applications is the compaction ratio. Accuracy is just as important; hence, we focus on lossless compression techniques. Two commonly used techniques in lossless image compression are RLE and LZW. The main advantages of the RLE technique are its small memory requirement, which permits a high-speed compression process, and its easy-to-implement algorithm, while the main advantage of LZ techniques (especially LZW) is their high compression ratio. In this research we propose a new lossless compression algorithm for color palettized images, which we call ERLZ. It merges the two known algorithms, RLE and LZW, to gain the benefits of both. The design and implementation of ERLZ are also presented in this paper. The algorithm was tested using two image sets: the first contains small changes in colors, the second many color changes. Comparisons of the results of ERLZ to those of both RLE and LZW are also presented. The analysis of these results and comparisons allows us to identify the characteristics of the images that are best compressed by ERLZ.
1. INTRODUCTION.
Image compression is important for many emerging applications in the fields of
multimedia, visual communication systems and telecommunication networks [1]. The need
for image compression is increasing with an aim to minimize the costs of both the storing
memory and the transmitting time and bandwidth. Image compression is concerned with
minimization of the number of information carrying units used to represent an image. The
efficiency of a compression algorithm is measured by its data compressing ability, the
resulting distortion, and storage and time required in compressing and decompressing
processes as well as its implementation complexity [2,3].
In image compression, there is a common assumption that “small numerical errors in
decoded values lead to small visual errors that are unrecognizable”. This is not true for
some image types and applications, such as palettized images that require critical
performance criteria [4]. Examples are atmospheric observatory images that usually
contain repeated data and that require high transmission and translation speeds. Other
examples are cartoon and animated multimedia applications. For these reasons, we are
more interested in lossless compression techniques.
In this research we have comparatively studied several of the commonly used lossless
compression techniques to identify the strengths and weaknesses of each. Then we studied
the viability of these techniques in compressing the special type of palettized images. We
have been able to conclude that none of the studied techniques is powerful enough to meet
the unique characteristics of palettized images. Some are too complex, while others are
time expensive in both compression and decompression, and still others require large
storage to manipulate either of the two processes. However, palettized images require high
compression ratio and high-speed compression/decompression processors that makes
neither of these techniques ideal when used separately for the compression of such type of
images.
Therefore, in this article we propose a new image compression approach for lossless
compression of color palettized images, which we call ERLZ (Enhanced Run Length and
Lempel Ziv). This new technique is derived from two commonly known techniques,
namely Run Length Encoding (RLE) [8] and Lempel Ziv Welch (LZW) [5,6,7]. Hence,
ERLZ gains some of the benefits of them both and avoids some of their limitations and
weaknesses. For instance, it gains the high compression ratio of LZW and the high
compression speed of RLE.
In the following section we review some of the compression techniques upon which we
have built ERLZ, namely RLE, LZW and Huffman. A comparison between them all is
then given for analyzing their strengths and weaknesses trying to assess their capabilities in
supporting the criteria required by the palettized images.
2. COMMONLY USED LOSSLESS COMPRESSION TECHNIQUES – A REVIEW OF RELATED WORK.
The most commonly used techniques in lossless compression are Huffman, RLE and LZW.
The Huffman coding technique is a statistical-based technique, while RLE is simple and
effective for string data, and LZW is a substitutional (dictionary-based) encoding
technique. Each of them has its advantages and disadvantages. A brief introduction to
each of them is given in the next sections followed by a comparative analysis.
2.1 The Run Length Encoding (RLE) Compression Technique.
Run Length is a simple method for lossless compression of sequential data. The
assumption upon which this algorithm is based is that “in many data streams, the
consecutive single tokens (symbols) are identical”; an assumption that is commonly valid
for palettized images. Therefore, RLE checks the stream for these identical tokens and
whenever more than four equal input tokens (symbols) are found, it replaces them with a
two-field structured special token carrying data about the number of occurrences of this
repeated token and the symbol itself [9]. For more details on how the RLE algorithm
works, please refer to Appendix A.
The decoding (decompression) process is simple. When one of the special tokens is found
in the received compressed input stream, it is replaced in the decompressed output
stream by the original symbol repeated the appropriate number of times. All other
tokens are placed as-is in the output stream, since they represent original message
symbols.
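As an illustration, the scheme just described can be sketched in Python (the authors' implementation was written in C; the flag character `!`, the list-based token stream, and the assumption that the flag never occurs in the data are simplifications of ours):

```python
def rle_encode(data, flag="!"):
    """Replace every run longer than four symbols with (flag, count, symbol)."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the current run
        run = j - i
        if run > 4:                     # "more than four equal input tokens"
            out.extend([flag, run, data[i]])
        else:                           # short runs pass through unchanged
            out.extend(data[i:j])
        i = j
    return out

def rle_decode(tokens, flag="!"):
    """Expand every (flag, count, symbol) triple back into its run."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == flag:
            out.append(tokens[i + 2] * tokens[i + 1])  # symbol repeated count times
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return "".join(out)

encoded = rle_encode("aaaaaabbbcc")
print(encoded)               # → ['!', 6, 'a', 'b', 'b', 'b', 'c', 'c']
print(rle_decode(encoded))   # → aaaaaabbbcc
```

Note that the run of six a's collapses into one triple, while the shorter runs of b's and c's are cheaper to leave untouched than to encode.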
2.2 Lempel Ziv Welch (LZW) Compression Technique.
The two computer scientists Abraham Lempel and Jacob Ziv introduced the substitutional
compression technique in 1977 [10], and it has since been known as LZ77. This technique
depends on capturing the high-order relationships between words and phrases. Lempel and
Ziv came up with an improvement of the LZ77 scheme in 1978 [11], known as LZ78, which
was refined by Terry Welch in 1984 and hence became known as LZW [12]. LZW works
by creating a “dictionary” of phrases that occur in the input data. When an encountered
phrase is already present in the dictionary, the index number of that phrase in the dictionary
is placed in the output stream. This means that the dictionary entries are of variable length.
The first 256 entries are used to contain the values for individual bytes, so the actual first-
string index is 256. As the string is compressed, the dictionary is built up to contain every
possible string combination that can be obtained from the message/image, starting with two
characters, then increasing to three characters, and so on. The details of LZW are shown in
Appendix A.
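The dictionary build-up described above can be sketched as follows, an illustrative Python version of the classic LZW encoder (the authors' implementation was written in C):

```python
def lzw_encode(data):
    """Classic LZW: the dictionary is seeded with all single bytes (codes 0-255),
    so the first multi-character string receives code 256."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    prefix = ""
    out = []
    for ch in data:
        if prefix + ch in table:        # keep extending the matched phrase
            prefix += ch
        else:                           # emit the phrase's code, learn a new one
            out.append(table[prefix])
            table[prefix + ch] = next_code
            next_code += 1
            prefix = ch
    if prefix:                          # flush the final phrase
        out.append(table[prefix])
    return out

print(lzw_encode("abababab"))   # → [97, 98, 256, 258, 98]: 8 symbols become 5 codes
```

The dictionary grows to hold progressively longer phrases ("ab" as 256, "ba" as 257, "aba" as 258, and so on), which is exactly the variable-length-entry behavior described above.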
2.3 Huffman Encoding Technique.
Huffman encoding uses a strategy to analyze the file and assign codes on a character
frequency basis. Once the file has been analyzed, a binary tree is constructed to represent
the coded alphabet. The individual characters are leaves in the tree, and their codes are
indicated by the path of each leaf from the root of the tree. Tracing the path from the root
to a specific leaf yields the code for the character found at that leaf [13].
The Huffman coding technique achieves its optimum when all input symbol probabilities
are integral powers of 0.5; otherwise, the binary tree and the code lengths grow large.
The Huffman algorithm is slow because it requires scanning all input symbols to
calculate their probabilities [14]. Although it requires a medium-sized memory, its
overall average compression ratio is usually good.
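To make the tree construction concrete, here is a minimal illustrative sketch in Python (the function name `huffman_codes` and the heap-of-partial-codes representation are our own choices, not from the paper):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(data)
    if len(freq) == 1:                   # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, partial symbol->code map).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Probabilities 1/2, 1/4, 1/8, 1/8: integral powers of 0.5, Huffman's optimal case.
codes = huffman_codes("aaaabbcd")
print({s: len(c) for s, c in codes.items()})   # → {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

Each code length equals -log2 of the symbol's probability, which is why power-of-0.5 distributions are the optimal case mentioned above.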
2.4 Analytical Comparison.
Table 1 compares the three lossless compression techniques, namely Huffman, LZW and
RLE.
Studying this comparative data, we can conclude the following characteristics. RLE is a
simple lossless compression technique. It is an easy and fast way to compress data but the
compression ratio isn’t nearly as good as that obtained with Huffman or LZW.
Table (1): Comparison among Huffman, LZW and RLE.

Compression technique:
  Huffman: Statistical.
  LZW: Substitutional (dictionary-based).
  RLE: Simple.

Basic idea:
  Huffman: Reduces the average code length used to represent the symbols of the input alphabet.
  LZW: Enters phrases into a dictionary and, when a repeat occurrence of a particular phrase is found, outputs the dictionary index instead of the phrase.
  RLE: Replaces redundant data with a token.

Optimal case:
  Huffman: When all symbol probabilities are integral powers of 0.5.
  LZW: When there are repeated sequences of string symbols (identical or different).
  RLE: When there is high redundancy of identical (repeated) sequence data.

Examples (for the optimal case), input alphabet {a, b}:
  Huffman: abababab → 10101010; aaaabbbb → 11110000; aabbaabb → 11001100 (all optimal).
  LZW: abababab → 12352 (a:1, b:2, ab:3, ba:4, aba:5); aaaabbbb → 13262 (a:1, b:2, aa:3, aaa:4, ab:5, bb:6, bbb:7).
  RLE: abababab → not compressed; aaaabbbb → !4a!4b (! is the special token); aabbaabb → not compressed.

Applied on:
  Huffman: Sequential data.
  LZW: Sequential data.
  RLE: Sequential and identical (repeated) data.

At work (application):
  Huffman: Doesn't work alone, but with other techniques.
  LZW: Works alone effectively.
  RLE: Works well alone.

Usually used in:
  Huffman: JPEG formats.
  LZW: Graphics (GIF format).
  RLE: Graphics (BMP & PCX formats); also used in facsimile applying the CCITT (ITU-T) T.4 & T.6 recommendations.

Speed of compression:
  Huffman: Medium (statistical process; building a binary tree).
  LZW: Medium (compare & add; building a symbol table).
  RLE: High (count repeated symbols and encode them).

Compressed output (compression ratio):
  Huffman: Good (middle C.R.).
  LZW: Very good (high C.R.).
  RLE: Good (low C.R.).

Restrictions:
  Huffman: Number of symbols in the input alphabet (binary tree size).
  LZW: Dictionary length.
  RLE: Redundant sequential data only.

Example (input = abbababaac):
  Huffman: p(a)=0.5, p(b)=0.4, p(c)=0.1; output code: 100001001001101.
  LZW: dictionary built: a=1, b=2, c=3, ab=4, bb=5, ba=6, aba=7, abaa=8, ac=9; output codes: 1 2 2 4 7 1 3.
  RLE: not compressed, i.e., abbababaac.
LZW is dictionary-based and has a high compression ratio, but it consumes a long time
in both the compression and decompression processes. It also requires a relatively
large memory size (depending on the size of the created symbol table).
The Huffman technique doesn't work separately; rather, it is used to improve the
results of other techniques.
This analysis reveals that none of these techniques suits the requirements of palettized
images, which have a high repetition of identical data elements and which require fast
compression/decompression processes with a high compression ratio and minimum memory
requirements. This led us to investigate the possibility of merging these three
techniques for better and improved results. In the following section we introduce ERLZ,
the proposed compression technique inspired by an integration of RLE and LZW. In
Section 4, an improvement of ERLZ that utilizes the Huffman technique is introduced.
3. ERLZ: THE PROPOSED TECHNIQUE.
The theoretical possibility of image compression rests primarily on the redundancy in
the data representing the image. Such redundancy occurs because of the correlation
between neighboring pixels with identical physical values. In palettized images,
redundancy in identical pixels is common not only among neighboring pixels but at many
different places and areas of a single image. One can safely conclude that palettized
images are composed of spots of identical pixels in most of the image space, with
exceptions occurring only at object edges. For typical examples of palettized images,
see Appendix B.
Based on this assumption, both the RLE and LZW techniques are suitable for image
compression, with the limitations and disadvantages discussed in the last section. RLE
utilizes this assumption very well and has relatively good output, though it only
considers neighboring identical pixels in the data stream and ignores other similarly
identical pixels when scanning the area of an image. On the other hand, LZW considers
this unique feature of palettized images by using a string table, symbol table and
dictionary. However, it requires relatively large temporary storage, consumes more
time, and sends the table as part of the message.
In this paper we propose a new algorithm, which we call ERLZ, "Enhanced Run Length
and Lempel Ziv". ERLZ gains the advantages of both RLE and LZW together. Let us
explain. RLE encodes identical consecutively repeated symbols with a single token having
the symbol and its occurrence count; so at decompression, the symbol is repeatedly inserted
in the output stream with the exact number as was in the uncompressed image. On the
other hand, LZW builds a dictionary of all possible strings of pixels found in the image
including strings of identical symbols with different repetition count. Strings of identical
symbols with different lengths are considered different entries with new encoding. ERLZ
utilizes these two ideas in order to use the same encoding symbol of repeated strings in the
dictionary table. Each entry is considered a single symbol and hence, ERLZ reduces the
sizes of both the symbol table and the compressed output. This improves the execution
speed of the compression process.
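One way to realize this idea is to run the LZW scan over RLE run tokens instead of raw characters, so that a run of any length occupies a single dictionary atom. The following Python sketch is our own illustrative approximation, not the authors' exact C implementation; like ERLZ's Symbol Table, the dictionary it builds is assumed to travel with the message:

```python
def runs(data):
    """Collapse the input into (symbol, run-length) tokens, as RLE does."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((data[i], j - i))
        i = j
    return out

def erlz_encode(data):
    """LZW-style scan over run tokens: a run of any length is one atom, so
    runs of the same symbol with different lengths need no extra entries."""
    table, next_code = {}, 0
    prefix, out = (), []
    for tok in runs(data):
        if (tok,) not in table:          # seed a new atom on first sight
            table[(tok,)] = next_code
            next_code += 1
        if prefix + (tok,) in table:     # extend the matched phrase
            prefix = prefix + (tok,)
        else:                            # emit code, learn the longer phrase
            out.append(table[prefix])
            table[prefix + (tok,)] = next_code
            next_code += 1
            prefix = (tok,)
    if prefix:
        out.append(table[prefix])
    return out, table                    # the table travels with the message

codes, table = erlz_encode("aaaabbaaaabb")
print(codes)   # → [0, 1, 2]: twelve characters become three codes
```

Plain LZW over the raw characters would need several more codes for the same input, since each run length would form a distinct dictionary string.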
In the following sections, we present the compression and decompression algorithms of
ERLZ, then we review some of our experimental work to evaluate and compare ERLZ with
the other algorithms. Section 4 presents an improved version of ERLZ with the utilization
of Huffman algorithm.
3.1 The ERLZ Compression Algorithm.
The ERLZ Compression algorithm is shown in Table 2.a. The detailed algorithm and
source code written in C language can be found in [15].
This algorithm begins the compression process with an empty dictionary table that is
incrementally built along the compression process. Each time a new input symbol is read,
ERLZ compares it with previous symbols. All consecutive repeated symbols are
considered a single entry in the Symbol Table with an adjusted frequency. Other Symbol
Table entries are created while scanning the input stream: a new entry is created when
the look-ahead symbol concatenated to the scanned sub-stream is not already an entry in
the table. At that point, a new code is generated for that sub-stream and its value is
also sent to the output stream. In this process, all duplicate consecutive input
symbols are treated as a single symbol with a repetition frequency.
It should be noted that the symbol table entries are of variable length. Each symbol entry is
in itself a string like that generated by the RLE algorithm, i.e., consists of symbols and their
repetition frequencies. Those symbols are codes of other entries in the same Symbol
Table, which might be primitive symbols (of the alphabet used) or composites of
consecutive primitive or other composite symbols. So, ERLZ extends the concept of a symbol to mean
more complex streams and strings. This is a unique characteristic of ERLZ that
contributes to the efficiency of this algorithm in terms of both memory and time as well as
compression ratio.
Table 2.a. The ERLZ Compression Algorithm.
Create and initialize the Symbol Table (Frequency, Symbol, Code).
While not EOF in input stream do
    Repeat until (Sin ≠ Sin+1)
        Get next input symbol (Sin);
        Increment counter;
    End Repeat
    IF (the number of repeated Sin together with the Sin value is not an entry in the Symbol Table)
        Then create a new entry in the Symbol Table;
    End IF
    Continue until (S = String + Sin) is not in the Symbol Table
        S = S + Get next input symbol (Sin);
        IF S is not in the Symbol Table
            Then add a new entry to the Symbol Table;
        End IF
    End Continue
    Output its code to the compressed output data stream.
End While
Output the Symbol Table to the table output stream.
Concatenate the two streams (table stream and output stream) into a single compressed output file.
3.2 The ERLZ Decompression Algorithm.
The ERLZ decompression algorithm is illustrated in Table 2.b. The details of this
algorithm and the source code written in the C language can be found in [15]. As shown,
it has a simple and straightforward implementation. This is due to the aid of the
Symbol Table, each of whose entries can be viewed as an output of the simple RLE algorithm.
The decompression algorithm begins by initializing the Symbol Table with the one read
from the input compressed file. This table is the same dictionary table (Symbol Table)
generated by ERLZ at compression time of that image file, but inverted to simplify the
search for entry codes. For each scanned code in the input compressed file, the table
is searched for the appropriate entry. All input codes should be found in the Symbol
Table; otherwise an error has taken place, e.g., transmission corruption. Each scanned
code in the input stream is replaced in the output stream by the appropriate string of
symbols as per the corresponding table entry, with each symbol repeated according to
the indicated frequency count. It should be noted that this replacement process is
recursive: each table-entry code that appears inside an entry's string is recursively
replaced by the string of the entry corresponding to that sub-string code.
Table 2.b. The ERLZ Decompression Algorithm.
Initialize the dictionary.
(The Symbol Table consists of the following structure: frequency of symbol, the symbol value, and the code.)
While not EOF in input stream do
    Get next code from the compressed input stream.
    IF the code is in the Symbol Table
        Then place in the output stream the symbol value repeated frequency times
             (a recursive process when substituting codes from table entries).
    Else Error (this code was either not compressed by ERLZ or the compressed input is distorted).
    End IF
End While.
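The table-driven expansion can be sketched as follows (illustrative Python; in this flattened sketch each table entry stores its (symbol, run-length) tokens directly, whereas the paper's Symbol Table stores entry codes that are expanded recursively):

```python
def erlz_decode(codes, table):
    """Invert the received table (entry -> code becomes code -> entry) and
    expand each code: every (symbol, count) token becomes symbol * count."""
    inv = {code: entry for entry, code in table.items()}
    out = []
    for code in codes:
        for symbol, count in inv[code]:
            out.append(symbol * count)
    return "".join(out)

# A hand-built table in the sketch's format (hypothetical values, for illustration).
table = {(("a", 4),): 0, (("b", 2),): 1, (("a", 4), ("b", 2)): 2}
print(erlz_decode([0, 1, 2], table))   # → aaaabbaaaabb
```

A code absent from the table would raise a lookup error here, mirroring the error branch in Table 2.b.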
3.3 Experimental Analysis.
For evaluating and comparing ERLZ to its ancestor algorithms, each of them was
separately applied to different images of different types, e.g., engineering drawings,
photos, satellite images, cartoon pictures, and logos (see Appendix B for the tested
images). These images were selected to cover different applications of palettized
images. They can be divided into two groups. The first group contains images with a
small number of colors and a small change in color depth from one area to another in
the same image; hence, they have few details with respect to 16- and 256-color systems.
The second group contains images with sharp differences in color depth within the same
area; hence, they contain more details. Both image sets are in the BMP image file format.
The comparison criteria that we evaluated in this experimental work are: compression
ratio, temporary memory size and compression time. See Table 3 for the results of these
tested compressed images. The analysis of the resulted data reveals the following
conclusions.
In most cases, the compression ratio of ERLZ was better than that of RLE and close to,
though improved over, that of LZW. This was expected because it merges the two
approaches into a single one. For time cost, the RLE technique was the fastest, as it
is the simplest and doesn't have to build time-consuming tables. However, ERLZ had a
better compression time than LZW, especially for large images, because the String
Table of LZW contains larger strings and takes longer to search and compare the input
string against all strings in the table. For the memory required by the compression
process, RLE was again the best of all, as it uses a constant memory size to check for
repeated identical characters and doesn't require the additional storage needed by the
other two algorithms for storing and manipulating their Symbol Tables. On the other
hand, ERLZ consumed less memory than LZW, as most entries of the ERLZ table are shorter
than those of LZW, especially when there are high frequencies of repeated symbols or
patterns; also, the overall number of entries in the ERLZ table is smaller because of
the recursive nature of its entries.
Table 3. Comparison among LZW, RLE and ERLZ (First Level of Compression).
(Ratio = compression ratio %; Time in seconds.)

Original   LZW                            RLE               ERLZ
File       Ratio %   Memory    Time      Ratio %   Time     Ratio %  Memory  Time
Bub        77.68     15261     0.50      48.86     0.001    64.68    264     0.002
Bub5       87.85     127428    1.650     73.123    0.01     90.32    660     0.5
Mark       81.80     267595    13.20     75.98     0.60     88.05    2392    0.280
Map        90.314    280572    7.470     77.697    0.01     92.82    664     0.11
Setup16    83.89     360171    13.648    67.291    0.50     88.25    1304    1.648
Eclipse    91.63     429597    12.654    89.277    0.11     93.81    1588    0.22
Bal        92.253    551431    37.80     90.703    0.11     93.74    4628    0.380
Ger        97.81     541291    14.60     95.845    0.50     96.80    2248    1.648
Tree       95.645    1019911   77.654    94.270    0.11     96.03    3200    0.280
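The ratio figures above are consistent with the compression ratio being defined as the percentage reduction in file size (an interpretation on our part); e.g., for the Bub file, whose original and RLE-compressed sizes (2118 and 1083) are listed with the Figure 2 data:

```python
def compression_ratio(original_size, compressed_size):
    """Percentage reduction in size: 0% means no compression at all."""
    return (1 - compressed_size / original_size) * 100

# Bub under RLE: 2118 bytes in, 1083 bytes out (sizes from the Figure 2 data).
print(round(compression_ratio(2118, 1083), 2))   # → 48.87 (Table 3 lists 48.86)
```

The small difference from the tabulated 48.86 is a rounding artifact; the value matches Table 3's RLE entry for Bub to within 0.01.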
4. IMPROVED ERLZ.
Although the compression ratio of ERLZ was acceptable, improvement is still possible.
Therefore, we investigated whether integrating ERLZ with Huffman coding, as a
statistical compression method, would yield better results: we applied the Huffman
coding technique to the output of ERLZ as a second level of compression. This is
depicted in Figure 1.
Figure 1: Improved ERLZ. (The input is compressed at the first level by ERLZ, which combines RLE and LZW; Huffman coding is then applied to the ERLZ output as the second level of compression, producing the final output.)
4.1 Experimental Analysis.
Table 4 shows a comparative study of applying the improved ERLZ technique to the same
set of test images used in the previous experimental analysis. It shows results with
and without applying Huffman coding, and indicates slight improvements, in most cases,
when Huffman coding is applied to the results of ERLZ as a second compression level.
Table 4. Improved ERLZ (Second Level Compression).

Original   ERLZ                   Improved ERLZ
File       Compression Ratio %    Compression Ratio %
Bub 64.68 62.88
Bub5 90.32 91.38
Mark 88.05 89.00
Map 92.82 93.50
Setup16 88.25 89.19
Eclipse 93.81 95.00
Bal 93.74 94.36
Ger 96.80 97.56
These data are depicted graphically in Figures 2 and 3. Figure 2 compares the outputs
of LZW, RLE and ERLZ for the first compression level, while Figure 3 shows the
comparison for the second compression level.
4.2 Verification
In testing the correctness of our implementations of both compression and
decompression, we designed a program, called Checkit, that matches the outputs of the
decompressed image files against the original uncompressed input image files.
Figure 2. Chart of output sizes for the first order of compression. The underlying
data are the output file sizes per input file (original sizes shown in parentheses):

File:   Bub (2118), Bub5 (53206), Mark (112624), Map (126582), Setup (153718), Eclipse (197770), Bal (251078), Ger (259278), Tree (481078)
LZW:    748    5150   13458   9088   18052  12226   15698   8286   19072
RLE:    1083   14300  27046   28231  50279  21206   23343   10773  27567
ERLZ:   607    6796   20498   12261  24753  16550   19452   6804   20951

Figure 3. Chart of output sizes for the second order of compression. The underlying
data are the output file sizes for the same files, with and without Huffman coding as
a second level:

LZW:       748    5150   13458   9088   18052  12226   15698   8286   19072
LZ+HUF:    786    4584   12379   8218   16602  9872    14158   6323   15475
RLE:       1083   14300  27046   28231  50279  21206   23343   10773  27567
RL+HUF:    564    6151   14741   9688   21853  10691   13950   5516   14707
ERLZ:      607    6796   20498   12261  24753  16550   19452   6804   20951
ERLZ+HUF:  570    4257   13589   6250   15460  9741    14594   4998   14316
5. CONCLUSION.
In this paper we have introduced and discussed a newly proposed
compression/decompression technique, namely ERLZ, that suits the special features of
palettized images and assumes that "images are usually composed of spots of identical
pixels in most of the image space, with exceptions occurring only at object edges".
ERLZ is inspired by two commonly used lossless compression algorithms, namely RLE and
LZW. It integrates them both in such a way as to gain their benefits and avoid their
limitations.
ERLZ was tested on several palettized images of different types, e.g., engineering
drawings, photos, satellite images, cartoon pictures, and logos. These images were
selected to cover different applications of palettized images and are classified into
two groups: images with a small number of colors and a small change in color depth from
one area to another, and images with sharp differences in color depth within the same
area and containing more details. The implementations of both the compression and
decompression algorithms of ERLZ were verified by comparing the decompressed images to
the original uncompressed ones.
Experimental and comparative tests revealed that ERLZ improves on the algorithms that
inspired it in terms of compression ratio, time and memory requirements. Further
improvements are obtained when a statistical coding technique (Huffman encoding) is
applied to the output of ERLZ as a second level of compression.
We believe that the proposed ERLZ technique is suitable in the compression of special
types of images such as atmospheric observatory images and engineering drawings.
REFERENCES.
[1] Jerry D. Gibson et al., Digital Compression for Multimedia, Morgan Kaufmann Publishers, California,
1998.
[2] M. Rabbani, Image & Video Compression Fundamentals and the International Standards NTSC,
IEEE press, 1998.
[3] Belur V. Dasarathy, Image Data Compression – Block Truncation Coding, IEEE press, 1995.
[4] Noura A. Saleh, “A New Compression Algorithm for Color Palettized Images”, an M. Sc. Thesis,
Electronics and Communications Dep., Cairo University, 1998.
[5] Leszek Gasieniec and Wojciech Rytter, “Almost Optimal Fully LZW-Compressed Pattern Matching”,
in the Proceedings of the Data Compression Conference, IEEE, 1998.
[6] Sergei Hludov and Christoph Meinel, “DICOM - Image Compression”, in the Proceedings of the 12th
IEEE Symposium on Computer-Based Medical Systems, 1998.
[7] Kun-Jin Lin, Cheng-Wen Wu, “A Low-Power CAM Design for LZ Data Compression “, IEEE
Transactions on Computers, Vol. 49, No. 10, October 2000.
[8] Fikret Ercal, Mark Allen, Hao Feng, “A Systolic Image Difference Algorithm for RLE-Compressed
Images”, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 5, May 2000.
[9] Khalid Sayood, Introduction to Data Compression, Morgan Kaufmann Publishers, San Francisco,
USA, 1996.
[10] J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions
on Information Theory, Vol. IT-23, No. 3, May 1977.
[11] J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable Rate Coding, “ IEEE
Transactions on Information Theory, Vol. IT-24, No. 5, Sept.1978.
[12] T. A. Welch, “A Technique for High Performance Data Compression”, IEEE Computer, Vol. 17,
No. 6, June 1984.
[13] Knuth, D., The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, Reading, MA, 1973.
[14] P. Surti; L. F. Chao, and A. Tyagi, “Low Power FSM Design Using Huffman-Style Encoding”, in the
Proceedings of the 1997 European Design and Test Conference (ED&TC '97), IEEE, 1997.
[15] A. A. Kolkila, “ Multimedia Compression”, M. Sc. Thesis, Systems & Computer Engineering Dept.,
Al-Azhar University, Cairo, Egypt, 2000.
Appendix A
Compression Algorithms
A.1. RLE Compression Algorithm:
The steps of the RLE compression algorithm are as follows:
Step 1: Read the current stream from the input buffer.
Step 2: For the whole input data stream,
    check the stream for sequences of identical tokens (t).
Step 3: IF there are more than four identical tokens (t):
    Count the number of repeated tokens.
    Replace these identical tokens with:
        (A) A special token (or flag, such as !).
        (B) The number of repeated tokens.
        (C) The token that was counted.
    ELSE
        Output the token(s) as-is to the compressed data (file).
Step 4: End.
A.2. Lempel Ziv Welch (LZW) Encoding Algorithm:
The steps of the LZW compression algorithm are as follows:
Step 1: Initialize the dictionary (the string table, which consists of all possible
    roots of the selected language and a unique code for each of them).
Step 2: Reserve buffer space called prefix (P) and current (C).
    (P contains the sequence of characters (tokens) that precede one
    character; C is the current character.)
Step 3: Set the prefix (P) to empty (only at the start).
Step 4: Read the next character (token) from the input data buffer and put it in C.
Step 5: IF the prefix plus current (P+C) is present in the dictionary:
        Make the string (P+C) the prefix for the next iteration (P = P+C).
    ELSE
        Output the equivalent compressed code for the prefix (P) to the compressed file.
        Add the string P+C to the dictionary (as a new string and code).
        Make the current token the prefix for the next iteration (P = C).
Step 6: IF there are more symbols in the input data:
        Go to Step 4.
    ELSE
        Output the code that denotes the prefix (P) to the output compressed file.
END.