A Novel Binary-Image Compression Scheme

DESCRIPTION
This is an ICASSP 2010 presentation of a novel binary and discrete-color image compression scheme.

TRANSCRIPT
A Fast Lossless Compression Scheme for Digital Map Images Using Color Separation
by Saif Zahir and Arber Borici
Department of Computer Science, University of Northern British Columbia
ICASSP 2010
Huffman and Arithmetic Coding

The Huffman method assigns shorter codes to symbols with higher probabilities:
- Optimal encoder
- Low efficiency with skewed probabilities
- Higher efficiency on groups of symbols

The Arithmetic method performs better by encoding a sequence of symbols:
- Higher complexity
- Less efficient with larger alphabets
- Affected by inaccurate probabilities more often than Huffman coding
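The contrast above can be seen in a small sketch (not from the paper): a textbook Huffman construction repeatedly merges the two least probable subtrees, so more probable symbols end up with shorter codes, yet every symbol still costs at least one whole bit, which is what hurts with heavily skewed probabilities.

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman code (symbol -> bit string) from probabilities."""
    # Each heap entry: (probability, tie-breaker, partial code table)
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (p1 + p2, tick, merged))
        tick += 1
    return heap[0][2]

# More probable symbols receive shorter codes:
codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
# With a very skewed source, e.g. p("a") = 0.99, Huffman still spends
# at least 1 bit per symbol even though the entropy is far below 1 bit.
```

Grouping symbols (here, 8x8 blocks rather than single pixels) is precisely the workaround the slides exploit: the per-symbol integer-bit penalty is amortized over 64 pixels.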
Motivation and Objectives

- Propose a new lossless compression method for discrete-color images
- Construct a universal Huffman-based codebook by studying the entropy of a system of randomly chosen binary images
- Introduce an additional helper module, the RCRC algorithm
- Provide a low-complexity, high-efficiency method
- Design a coding method that works regardless of the nature of a binary image
The Proposed Method

Three components:
1. Preprocessing
   - Perform color separation into binary layers
   - Make layer dimensions divisible by 8
2. A universal Huffman codebook
   - Designed by studying a relatively large sample of randomly chosen binary images
   - Huffman coding is applied to 8x8 blocks of a binary image
3. Row-Column Reduction Coding (RCRC)
   - Attempts to compress 8x8 blocks that are not found in the codebook
   - Checks for identity between rows and columns
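The preprocessing step can be sketched as follows (a minimal illustration, not the authors' code; the function names are mine): each distinct color gets its own binary layer, and each layer is padded so both dimensions are divisible by 8.

```python
def separate_colors(image):
    """Split a discrete-color image (2-D list of color values) into one
    binary layer per color: 1 where the pixel has that color, else 0."""
    colors = sorted({p for row in image for p in row})
    return {c: [[1 if p == c else 0 for p in row] for row in image]
            for c in colors}

def pad_to_multiple_of_8(layer, fill=0):
    """Pad a layer on the right and bottom until both dimensions are
    divisible by 8, so it can be cut into whole 8x8 blocks."""
    h, w = len(layer), len(layer[0])
    pw, ph = ((w + 7) // 8) * 8, ((h + 7) // 8) * 8
    padded = [row + [fill] * (pw - w) for row in layer]
    padded += [[fill] * pw for _ in range(ph - h)]
    return padded
```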
General Diagram of the Method
The Codebook Construction

- Build a system of binary images
  - Images contain no noise
  - The system is unbiased, i.e., images are randomly selected
- Perform a frequency analysis on the 8x8 blocks of the system images
  - Identify blocks that occur more than once
  - Determine the system entropy
- Build Huffman codes for the most frequent blocks
- The resulting codebook is a fixed-to-variable dictionary containing 6,952 entries
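The frequency analysis on 8x8 blocks can be sketched like this (an illustration assuming images whose dimensions are already divisible by 8; the actual 6,952-entry codebook comes from the authors' image sample):

```python
from collections import Counter

def block_frequencies(images):
    """Count how often each 8x8 block pattern occurs across a set of
    binary images. Blocks are hashed as tuples of row tuples."""
    counts = Counter()
    for img in images:
        for i in range(0, len(img), 8):
            for j in range(0, len(img[0]), 8):
                block = tuple(tuple(img[i + r][j + c] for c in range(8))
                              for r in range(8))
                counts[block] += 1
    return counts
```

Blocks occurring more than once are the codebook candidates; their empirical probabilities drive both the system entropy and the Huffman code lengths.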
How is the codebook employed?

- Search the codebook for each 8x8 block of a given source image
  - The X-by-Y image is partitioned into XY/64 8x8 blocks
- If the block exists in the codebook, compress it using the corresponding Huffman code

An example of the codebook structure:
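For illustration only, the codebook can be thought of as a map from a 64-bit block pattern to its Huffman code. The two entries below are hypothetical, not the paper's actual entries:

```python
# Hypothetical two-entry codebook: 64-bit block pattern -> Huffman code.
# (Illustrative only; the real 6,952 entries come from the authors' study.)
codebook = {
    "0" * 64: "1",    # e.g. the all-white block, assumed most frequent
    "1" * 64: "01",   # e.g. the all-black block
}

def lookup(block_rows):
    """Return the Huffman code for an 8x8 block (rows of 0/1), or None
    if the block is not in the codebook."""
    key = "".join(str(b) for row in block_rows for b in row)
    return codebook.get(key)
```

A block that misses the codebook falls through to the RCRC module described later.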
The first three codebook entries:
The Codebook Entropy

- The entropy of our codebook is 4.08 bits per 8x8 block
- Thus, the compression limit is (64 - 4.08)/64 = 93.63%
- The average Huffman code length is 4.094 bits
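The arithmetic behind these figures, spelled out (the entropy and average code length are taken from the slide):

```python
entropy = 4.08     # measured codebook entropy, bits per 8x8 block
avg_len = 4.094    # average Huffman code length over the codebook
# In the best case each 64-bit block shrinks to `entropy` bits,
# so the fraction of bits removed is bounded by:
limit = (64 - entropy) / 64      # ~0.9363, the 93.63% on the slide
redundancy = avg_len - entropy   # Huffman overhead above the entropy
```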
Row-Column Reduction Coding (RCRC)

- Operates on 8x8 blocks
- Uses two reference vectors:
  - The Row Reference Vector (RRV) is a column vector
  - The Column Reference Vector (CRV) is a row vector
- Checks whether two consecutive row vectors are identical
  - If the rows are identical, one is eliminated and the block is reduced by one row
  - If not, the next two consecutive row vectors are compared
- The row reduction operation continues until the end of the block is reached
RCRC (cont.)

- The column reduction operation is similar, and elimination operations are stored in the CRV
- The output of RCRC is a bit stream containing the RRV, the CRV, and the reduced block (RB)
  - Concatenated as S = RRV + CRV + RB
  - Minimum length of S: 17 bits
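A runnable sketch of RCRC as described above (my reading of the slides, not the authors' code): a 1 in a reference vector marks a retained row or column, a 0 marks one eliminated as identical to the previously retained one.

```python
def rcrc_encode(block):
    """Row-Column Reduction Coding of one 8x8 binary block.
    Returns the bit stream S = RRV + CRV + RB as a string."""
    # Row reduction: a row identical to the previously retained row is
    # eliminated; the RRV marks retained rows 1, eliminated rows 0.
    rrv, rows = "", []
    for r in block:
        if rows and r == rows[-1]:
            rrv += "0"
        else:
            rrv += "1"
            rows.append(r)
    # Column reduction on the row-reduced block, recorded in the CRV.
    crv, cols = "", []
    for c in zip(*rows):
        if cols and c == cols[-1]:
            crv += "0"
        else:
            crv += "1"
            cols.append(c)
    # RB: the fully reduced block, emitted row by row.
    rb = "".join(str(b) for row in zip(*cols) for b in row)
    return rrv + crv + rb
```

On a block with two distinct rows and two distinct columns, as in the example on the next slide, this reproduces the 20-bit stream 10100000100000101011; the 17-bit minimum occurs when all rows and all columns collapse to a single bit.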
RCRC Example:

- Row 1 eliminates row 2
- Row 3 eliminates all other rows
- Column elimination is performed on the row-reduced block
- The compressed bit stream for this example is: 10100000100000101011
The Coding Process

Summarized in the following table:

Case     Block encoding bits             Description
Case 1a  '11'                            The block with the shortest Huffman code in the codebook
Case 1b  '00' + 5 bits + Huffman code    Other blocks found in the codebook
Case 2   '01' + RRV + CRV + RB           Blocks compressed by RCRC
Case 3   '10' + 64 bits                  Uncompressed blocks
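A per-block dispatcher over these four cases might look as follows. This is a hedged sketch: the slides do not say what the 5-bit field of case 1b contains (it is assumed here to hold the Huffman code length), nor the exact rule for preferring RCRC over raw output (assumed: use RCRC only when its stream beats the 64 raw bits).

```python
def encode_block(bits64, codebook, shortest_code, rcrc_stream):
    """Dispatch one 8x8 block among the four cases of the coding table.

    bits64:        the raw block as a 64-character bit string
    codebook:      block pattern -> Huffman code (hypothetical here)
    shortest_code: the Huffman code of the most frequent block (case 1a)
    rcrc_stream:   RCRC output for this block, or None if RCRC failed
    """
    code = codebook.get(bits64)
    if code == shortest_code:
        return "11"                                    # case 1a
    if code is not None:
        return "00" + format(len(code), "05b") + code  # case 1b (assumed 5-bit length field)
    if rcrc_stream is not None and len(rcrc_stream) < 64:
        return "01" + rcrc_stream                      # case 2
    return "10" + bits64                               # case 3
```

The 2-bit case prefix makes the stream self-delimiting: a decoder always knows which of the four layouts follows.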
Time Complexity

Analytical time complexity:
- The codebook contains a fixed number of entries
- RCRC is executed on fixed 8x8 blocks
- The only variable input is the source image size
- Time complexity is O(XY), where X and Y are the image dimensions

Empirical metric:
- Average run time: less than 1 s for various binary images
- Worst case: a block is neither in the dictionary nor compressible by RCRC; such blocks are rare (about 5% on average)
Preliminary Results

- Tested on several topographic maps
- Average compression of 0.036 bpp
- Very large image dimensions

Image   Original Size (KB)   Compressed Size (KB)   Compression Ratio (bpp)
1       10,960               210.77                 0.019
2       220,979              7,489.27               0.034
3       173,769              6,626.27               0.038
4       2,562                206.69                 0.081
Total   409,270              14,533.12              0.036

Results reported in the literature for a similar class of images vary from 0.18 to 0.22 bpp (Franti et al., 2002).
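The bpp column in the table is consistent with the simple ratio of compressed size to original size, which implies the originals are counted at 1 bit per pixel (my inference from the numbers, not stated on the slide). A quick check:

```python
# Each row: (original KB, compressed KB, reported bpp), from the table.
results = [
    (10960, 210.77, 0.019),
    (220979, 7489.27, 0.034),
    (173769, 6626.27, 0.038),
    (2562, 206.69, 0.081),
]
# At 1 bpp originals, bpp = compressed bits / pixels = compressed / original.
for orig_kb, comp_kb, bpp in results:
    assert round(comp_kb / orig_kb, 3) == bpp
```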
Selected Set of Test Images:
Source: UNBC GIS Lab: Maps of British Columbia
The Four Layers of Map 1 (4 different colors):

Layer 1: Contour Lines
Layer 2: Lakes
Layer 3: Rivers
Layer 4: Roads