
Project Report on

SCALABILITY IN HETEROGENEOUS ENVIRONMENT

(VIDEO COMPRESSION)

By

DHYANESH DAMANIA

SUDARSHAN GOPINATH

RAJESH TURE

NILESH KARIA

Guided by

Dr. N.S.T. SAI

(External Guide)

Mr. VIKRANT AGASKAR

(Internal guide)

Department of Computer Engineering

Vidyavardhini’s College of Engineering and Technology

K.T. Marg, Vasai Road

University of Mumbai

2003 – 2004

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

1. INTRODUCTION
1.1 INTRODUCTION TO THE PROJECT
1.2 SOFTWARE REQUIREMENTS
1.3 HARDWARE REQUIREMENTS
1.4 AIM OF THE PROJECT

2. BRIEF DESCRIPTION OF THE PROJECT
2.1 ORIGINAL SCOPE OF THE PROJECT
2.2 EXPANDED SCOPE OF THE PROJECT
2.3 FUNDAMENTALS OF MOTION ESTIMATION AND COMPENSATION
2.4 FULL SEARCH ALGORITHM
2.5 LOGARITHMIC SEARCH ALGORITHM

3. DESIGN DETAILS
3.1 STRUCTURE OF THE ENCODER AND DECODER
3.2 COMPRESSED FILE HEADER FORMAT (.mcp FILE)
3.3 PROCESS FLOWCHARTS WITH DESCRIPTION
3.4 CLASS DIAGRAMS
3.5 INTERFACE DESIGN

4. TESTING RESULTS AND CONCLUSIONS
4.1 TEST PLATFORM
4.2 TEST CRITERIA AND VARIABLES
4.3 SELECTION OF TEST CASES
4.4 TEST RESULTS
4.5 SAMPLE FRAMES
4.6 CONCLUSIONS

5. APPLICATIONS AND FUTURE SCOPE
5.1 APPLICATIONS
5.2 FUTURE SCOPE

APPENDIX-A (THE AVI FILE HEADER FORMAT)

APPENDIX-B (THE BMP FILE HEADER FORMAT)

REFERENCES


ACKNOWLEDGEMENTS

No successful endeavor is the work of a single hand; it is the contribution of many heads that makes it complete. We take this opportunity to express our gratitude to our professors, who have not only given us support and guidance but also pushed us to achieve the best.

We express our gratitude to our external guide Dr. N.S.T. Sai, Senior Manager – Mahindra British Telecom Ltd., for giving us this opportunity to work under his expert guidance.

Our sincere thanks to Professor (Mrs.) Madhavi Pradhan, Head – Department of Computer

Engineering for consistently backing us in our work and guiding us through the course of the

project.

This project would never have left the launch pad had it not been for Mr. Vikrant Agaskar, our internal project guide, who spent a great deal of his invaluable time and effort structuring our project, time and again.

We would also like to thank the entire Computer Engineering Department for allowing us to spend endless hours in the lab and for bending the rules on a few occasions.

Last but not least, we would like to thank our parents for supporting us and putting up with our late-night project sessions.

Dhyanesh Damania

Sudarshan Gopinath

Rajesh Ture

Nilesh Karia


ABSTRACT

A video clip is essentially a series of still images arranged back to back. Therefore problems

of storage size faced by still images get multiplied when applied to video clips. This makes it

necessary to use some kind of compression technique to bring the storage requirements of a

video file to more acceptable levels.

One of the best methods for compressing video files is Motion Estimation and

Compensation. This method is based on the fact that very little motion occurs between two

consecutive frames of a video. Hence, in most cases it is sufficient to transmit only the

differences between the frames and a few reference frames. However, Motion Estimation

works best for scenes with a reasonable amount of motion. In general, video with heavy

motion such as sports video is hard to compress with Motion Estimation and Compensation.

This project, “Scalability in Heterogeneous Environment (Video Compression)”, implements an AVI file compression utility using two Motion Estimation algorithms: the Full Search algorithm and the Logarithmic Search algorithm.

While the best algorithm with respect to accuracy is the Full Search method, its extremely

high complexity necessitates the design of faster Motion Estimation algorithms. The

Logarithmic Search method is a scaled down version of the Full Search algorithm and is less

computationally expensive. However, the increase in performance is obtained at the cost of

accuracy in the estimation process.

Performance evaluation carried out on both algorithms shows that the decreased accuracy of the Logarithmic Search method, if kept within limits, does not have a significant impact on the quality of the reconstructed video file. Hence, for practical purposes, the Logarithmic Search is preferred over the Full Search algorithm.

The AVI file compression utility implemented in this project allows the use of either of the two methods to achieve significant compression. The utility also has a built-in feature that allows the AVI file to be split into its component Bitmap format frames.


1. INTRODUCTION



1.1 INTRODUCTION TO THE PROJECT

We live in a world of colour. The recent surge of high-bandwidth communication has made it possible to supplement audio transmission with video. A plethora of applications and services, ranging from online video conferencing to streaming video on demand, are now possible.

The human eye has the property that when an image appears on the retina, the image is retained for a few milliseconds before decaying. If a sequence of images is presented to the human eye at 50 images per second, the eye does not notice that it is looking at discrete images. All video systems exploit this principle to produce moving pictures. Hence, a video is essentially a representation of a two-dimensional image as a function of time. The simplest representation of a digital video is a sequence of frames, each consisting of a rectangular grid of picture elements, or pixels.

Eight bits per pixel are commonly used to represent 256 gray levels. This scheme gives high quality black and white video. For colour video, we use 8 bits for each of the RGB colours. Using 24 bits per pixel we can obtain a colour range of about 16 million. To produce smooth motion, digital video must display at least 25 frames per second. Smoothness of motion is determined by the number of different images per second. These parameters become significant when transmitting digital video over a network. Common resolution configurations are 1024 × 768, 1280 × 960 and 1600 × 1200. Even the smallest of these, with 24 bits per pixel and 25 frames per second, needs to be transmitted at 472 Mbps.
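The 472 Mbps figure follows from multiplying width, height, bits per pixel and frames per second. A minimal C++ sketch of that arithmetic is given below; the function names are illustrative and are not taken from the project's source code.

```cpp
#include <cassert>
#include <cstdint>

// Raw bit rate of uncompressed video in bits per second.
// Names are hypothetical; only the arithmetic comes from the text.
std::int64_t rawBitRate(std::int64_t width, std::int64_t height,
                        std::int64_t bitsPerPixel,
                        std::int64_t framesPerSecond) {
    assert(width > 0 && height > 0 && bitsPerPixel > 0 && framesPerSecond > 0);
    return width * height * bitsPerPixel * framesPerSecond;
}

// The same rate in Mbps, taking 1 Mbps as 1,000,000 bits per second.
double rawMegabitsPerSecond(std::int64_t width, std::int64_t height,
                            std::int64_t bitsPerPixel,
                            std::int64_t framesPerSecond) {
    return static_cast<double>(
               rawBitRate(width, height, bitsPerPixel, framesPerSecond)) /
           1.0e6;
}
```

For 1024 × 768 pixels at 24 bits per pixel and 25 frames per second, `rawBitRate` returns 471,859,200 bits per second, which rounds to the 472 Mbps quoted above.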

This means that transmission of uncompressed video is out of the question. We therefore need two algorithms to bring the files down to manageable sizes: one for compressing the data at the source (encoding) and another for decompressing it at the destination (decoding). The encode/decode process need not be invertible. That is, after a file has been compressed, transmitted and decompressed, it is usually acceptable for the decoded video signal to differ slightly from the original.



The encoding process makes use of an inherent property of the adjacent frames themselves:

redundancy. There is very little difference between the positions of the object under

consideration between consecutive frames. Video compression can be achieved using motion

estimation processes, still image compression or a combination of both these techniques.

Fig: Methods of Video Compression

While still image compression (intraframe compression) applied to the individual frames of a video clip produces drastic compression, further compression is needed for video transmission. Motion Estimation and Compensation algorithms (interframe compression) improve these compression ratios even further.

This project focuses on the implementation of the Motion Estimation and Compensation

algorithms for achieving Video file compression.



1.2 SOFTWARE REQUIREMENTS

GNU COMPILER COLLECTION (GCC):

GNU Compiler Collection contains compilers for C, C++, Objective-C, FORTRAN,

Java, and Ada, as well as libraries for these languages. It compiles and links code of

these languages to produce executable files.

Our project uses the C++ compiler and the libraries supported by GCC.

STANDARD TEMPLATE LIBRARY (STL):

The Standard Template Library, or STL, is a C++ library of container classes,

algorithms, and iterators; it provides many of the basic algorithms and data structures

of computer science. The STL is a generic library, meaning that its components are

heavily parameterized: almost every component in the STL is a template.

GNU IMAGE MANIPULATION PROGRAM (GIMP):

GIMP is used for viewing the intermediate results of the Motion Estimation and Compensation process. Any other picture viewing software capable of displaying Device Independent Bitmaps and Run Length Encoded bitmaps can also be used.

THE GIMP TOOLKIT (GTK+):

GTK+ is a multi-platform toolkit for creating graphical user interfaces. It is essentially an object-oriented application programming interface (API). GTK+ is free software and part of the GNU Project. GTK+ 2.0 is used to create the graphical user interface for our project.

GLADE:

Glade is a GUI designer for the Linux Platform based on the GTK+2.0 Library. Glade

provides drag and drop tools for design of the interface.



1.3 HARDWARE REQUIREMENTS –

PROCESSOR:

Minimum: Pentium 200MHz

Recommended: Pentium III 400 MHz or better

HARD DISK SPACE:

Minimum: 1.7 GB

Recommended: 2.1 GB

MEMORY:

Minimum: 64 MB RAM

Recommended: 128 MB RAM



1.4 AIM OF THE PROJECT –

The primary focus of this project is the design and implementation of an AVI video file

compression and decompression utility using C++ and based on the Linux platform. The

utility will make use of the principle of Motion Estimation and Compensation. The utility

will provide the user with a detailed analysis of each compression operation.

The compression will be performed by using either of the following Motion Estimation and

Compensation algorithms based on the user’s choice:

Full Search Method

3-Step Logarithmic Search Method

Performance evaluation of the two algorithms will also be performed with respect to Compression Ratio, Computation Time and Perceptual Quality of the regenerated video file. The performance evaluation process aims to bring out the strengths and weaknesses of the two algorithms when applied to video files exhibiting objects with different degrees of motion.

AVI files with the following attributes will be considered as test cases:

Motion occurring only in the foreground with the background relatively unchanged.

(Example: Video of a television news reader)

Foreground relatively constant with a moving background.

(Example: Video of a bouncing ball)

A high motion video in which both the foreground and background are in motion.

(Example: any sports video like a motorbike race)



2. BRIEF EXPLANATION OF THE PROJECT



2.1 ORIGINAL SCOPE OF THE PROJECT

The project involves the design and implementation of an AVI video file compression and

decompression utility. The compression algorithms used in the utility are based on the

principle of Motion Estimation and Compensation only.

No still image compression algorithms are applied to the frames of the video sequence. Thus,

the utility exclusively uses temporal redundancy removal techniques to achieve compression.

No attempt is made to take into account the spatial redundancy present in the video sequence.

Also, the utility does not take into account audio interleaved AVI files but focuses on pure

video data AVI files.

Encoder –

The input to the encoder will be the sequence of frames that constitute a video file.

The output of the encoder will be a compressed-format (.mcp) file.

Decoder –

The input to the decoder will be the compressed .mcp format file.

The output of the decoder will be the sequence of regenerated frames of the video

file.

The user will be given a choice of using the following Motion Estimation Algorithms to

perform compression:

Full Search Algorithm

Logarithmic Search

At the end of the compression process statistical data regarding the compression ratio,

computation time etc. will be provided to the user.

The primary concern of the project is the comparison of the two algorithms based on various

performance criteria like compression ratio, computation time and perceptual quality.

Appropriate test files will be used to evaluate the algorithms.



2.2 EXPANDED SCOPE OF THE PROJECT

AVI FILE SUPPORT –

The project has been extended to support entire AVI files instead of individual video frames.

The following AVI file formats are accepted as input:

Raw AVI files (no inherent compression)

RLE encoded AVI files

Support for AVI files of the above types which use 8-bit and 24-bit colour representation has

also been designed into the utility.

Encoder –

The encoder now accepts as input AVI files, with the process of splitting the

file into frames taken up by the AVI splitter module.

Output of the encoder will be the compressed .mcp file.

Decoder –

The decoder accepts as input a .mcp file.

The output of the decoder will be a regenerated AVI file instead of individual

video frames.

AVI FILE SPLITTER –

An additional feature for splitting the source AVI file into its component frames has been

included into the utility. This module extracts the component frames from the AVI file and

stores them in the BMP format.

FRAME BROWSER –

By using the built-in Frame Browser, the user will have the option of being able to view the

frames at different stages of the compression / decompression process. The user will be able

to view the intermediary frames which include Motion Estimated and Motion Compensated

Images at the decoder end. This will eliminate the need for a third party image viewing

software.



2.3 FUNDAMENTALS OF MOTION ESTIMATION AND COMPENSATION

Motion Estimation and Compensation are the fundamental methods used to compress video

and are used in various codecs such as MPEG-1, 2, and 4, H.261, H.263, H.263+ etc. The

goal of Motion Estimation and Compensation is to exploit the temporal redundancy (i.e.

redundancy present between consecutive frames) within an image sequence for optimum

compression.

The process of computing changes among frames by establishing correspondence between

frames is referred to as temporal prediction with motion compensation. Motion compensation

is preceded by Motion Estimation, the process of finding the corresponding pixels between

the frames.

MOTION ESTIMATION

A way to exploit the motion of an object to achieve image compression is to partition the image into blocks called Macroblocks. Given a reference picture and an N × M Macroblock in a current picture, the objective of motion estimation is to determine the N × M block in the reference picture that best matches the characteristics of the Macroblock in the current picture. Matching is done according to a predefined criterion, the Mean Absolute Error.

In many cases this reference block will be the same block (no motion). In some cases this

reference block will be a different block (motion). To simplify the process, only the

translatory motion model is assumed for objects in the scene and thus a rectangular geometry

is sufficient.

In general, the coordinates (x, y) of the Macroblock are given by its left top corner.

Considering practical limitations, we restrict the search to find a match to [-p, p] search

region around the original location of our Macroblock in the current picture. This process of

finding a suitable match is illustrated in the following sequence of diagrams.


Fig 1. Motion estimation process: (a) Current picture; (b) Reference picture; (c) Reference picture with Motion Vector calculation, showing the Motion Vector (u, v); (d) Search region definition

Let (x + u, y + v) be the location of the best matching block in the reference picture. Our goal is to determine the effective displacement (u, v) of the matching block relative to the position (x, y) of the Macroblock in the current picture. This process is termed the calculation of a motion vector (MV).

Compression is achieved by sending or storing only the motion vector (and possibly a small error) instead of the pixel values for the entire block.

MOTION COMPENSATION

Motion Compensation follows Motion Estimation and is defined as the correction necessary

to compensate for errors introduced due to the Estimation Process. This is necessary as

Estimation is basically a process of approximation.


If a temporal redundancy reduction processor employs motion compensation, then we can express its output, the difference error, as:

$$ e(x, y, t) = I(x, y, t) - I(x + u, y + v, t - 1) $$

where I(x, y, t) are the pixel values at spatial location (x, y) in frame t and I(x + u, y + v, t − 1) are the corresponding pixel values at spatial location (x + u, y + v) in frame t − 1. The output of the motion estimator, the co-ordinates (u, v), defines the displacement of the best matching block in the reference frame relative to the block at (x, y) and is referred to as the “Motion Vector (MV)” for the block at (x, y).

I(x + u, y + v, t − 1) is referred to as the motion compensated prediction of I(x, y, t), and e(x, y, t) is the prediction residual for I(x, y, t).

The Motion Vectors along with the difference values are used for reconstructing the frames at the receiver end.
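The relationship between residual and reconstruction can be illustrated with a short C++ sketch. It assumes that the matching block lies at (x + u, y + v) in the reference frame, as in the Motion Estimation description above, and that the displaced block stays inside the frame. The `Frame` type and both function names are hypothetical and are not taken from the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical greyscale frame, used only for this illustration.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<int> pixels;  // row-major, one intensity value per pixel

    int at(int x, int y) const { return pixels[y * width + x]; }
    int& at(int x, int y) { return pixels[y * width + x]; }
};

// Encoder side: residual e = current block minus displaced reference block.
std::vector<int> residualBlock(const Frame& cur, const Frame& ref,
                               int x, int y, int u, int v,
                               int blockW, int blockH) {
    std::vector<int> e;
    e.reserve(static_cast<std::size_t>(blockW) * blockH);
    for (int l = 0; l < blockH; ++l)
        for (int k = 0; k < blockW; ++k)
            e.push_back(cur.at(x + k, y + l) - ref.at(x + u + k, y + v + l));
    return e;
}

// Decoder side: displaced reference block plus residual gives the block back.
void reconstructBlock(Frame& out, const Frame& ref, const std::vector<int>& e,
                      int x, int y, int u, int v, int blockW, int blockH) {
    assert(e.size() == static_cast<std::size_t>(blockW) * blockH);
    std::size_t n = 0;
    for (int l = 0; l < blockH; ++l)
        for (int k = 0; k < blockW; ++k)
            out.at(x + k, y + l) = ref.at(x + u + k, y + v + l) + e[n++];
}
```

When the residual is stored without loss, the reconstructed block equals the original block exactly; any loss in the stored residual shows up directly as reconstruction error.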

THE MATCHING CRITERION –

Let the pixels of the Macroblock in the current frame be denoted as C (x + k, y + l) and the

pixels in the reference picture be denoted as R (x + i + k, y + j + l). We define a cost

function:

$$ \mathrm{MAE}(i, j) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} \left| C(x + k, y + l) - R(x + i + k, y + j + l) \right| $$

where i and j are defined in -p ≤ i ≤ p and -p ≤ j ≤ p.

This is referred to as the Mean Absolute Error (MAE) or Mean Absolute Difference (MAD) criterion.
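The cost function can be written as a small C++ routine. This is a sketch under stated assumptions: frames are held as row-major greyscale arrays in a hypothetical `Frame` type, M is the block width, N the block height, and the caller keeps the displaced block inside the reference frame.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical greyscale frame, used only for this illustration.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<int> pixels;  // row-major, one intensity value per pixel

    int at(int x, int y) const { return pixels[y * width + x]; }
    int& at(int x, int y) { return pixels[y * width + x]; }
};

// MAE between the M x N Macroblock at (x, y) in the current frame and the
// block displaced by (i, j) in the reference frame.
double meanAbsoluteError(const Frame& cur, const Frame& ref,
                         int x, int y, int i, int j, int M, int N) {
    // The displaced block must lie inside the reference frame.
    assert(x + i >= 0 && y + j >= 0);
    assert(x + i + M <= ref.width && y + j + N <= ref.height);
    long sum = 0;
    for (int l = 0; l < N; ++l)
        for (int k = 0; k < M; ++k)
            sum += std::abs(cur.at(x + k, y + l) -
                            ref.at(x + i + k, y + j + l));
    return static_cast<double>(sum) / (M * N);
}
```

A block compared with an identical block gives an MAE of 0; a uniform brightness difference of 3 between the two blocks gives an MAE of 3.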



2.4 THE FULL SEARCH ALGORITHM

The Full Search algorithm is the most fundamental and most obvious method to perform the

process of Motion Estimation.

It is based on the principle of comparing each and every possible location within the

predefined search area for a possible match. In this algorithm to find the motion vector for

each Macroblock we have to compute MAE (i, j) at each location in the search space.

As we compute the MAE value at every location in the search area, the Estimation process

will be highly accurate. As a result the outcome of the Motion Compensation process will

consist of very few difference element values.

The main advantage of the full search method is that it guarantees finding the minimum

MAE value (it is an optimal Estimation algorithm). Hence, this algorithm is highly accurate

in finding the best possible match.

However, comparing every pixel at every candidate location for each Macroblock in the frame requires a substantial amount of computing resources (i.e. the algorithm is computationally expensive). Hence, in time-critical applications such as sports broadcasting, this method is too slow to provide compression at the required speed.

Therefore, alternative methods have been developed to achieve sub-optimum performance at

significantly reduced complexity compared to full search methods.
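The exhaustive nature of the Full Search can be seen in a short C++ sketch. It is an illustration, not the project's implementation: the `Frame` and `MotionVector` types and the function names are hypothetical, and candidates whose block would fall outside the reference frame are simply skipped, which is one possible way of handling frame borders.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical greyscale frame, used only for this illustration.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<int> pixels;  // row-major, one intensity value per pixel

    int at(int x, int y) const { return pixels[y * width + x]; }
    int& at(int x, int y) { return pixels[y * width + x]; }
};

struct MotionVector {
    int u;
    int v;
    double mae;
};

// MAE between the M x N block at (x, y) in cur and the block displaced by
// (i, j) in ref. The caller guarantees the displaced block is inside ref.
double meanAbsoluteError(const Frame& cur, const Frame& ref,
                         int x, int y, int i, int j, int M, int N) {
    long sum = 0;
    for (int l = 0; l < N; ++l)
        for (int k = 0; k < M; ++k)
            sum += std::abs(cur.at(x + k, y + l) -
                            ref.at(x + i + k, y + j + l));
    return static_cast<double>(sum) / (M * N);
}

// Evaluate every displacement in [-p, p] x [-p, p] and keep the minimum.
MotionVector fullSearch(const Frame& cur, const Frame& ref,
                        int x, int y, int M, int N, int p) {
    MotionVector best{0, 0, std::numeric_limits<double>::max()};
    for (int j = -p; j <= p; ++j) {
        for (int i = -p; i <= p; ++i) {
            // Assumption: skip candidates that leave the reference frame.
            if (x + i < 0 || y + j < 0 ||
                x + i + M > ref.width || y + j + N > ref.height)
                continue;
            const double mae = meanAbsoluteError(cur, ref, x, y, i, j, M, N);
            if (mae < best.mae) best = MotionVector{i, j, mae};
        }
    }
    return best;
}
```

Because every candidate is visited, the returned vector always has the smallest MAE in the search area; the price is up to (2p + 1)² block comparisons for each Macroblock.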

COMPLEXITY OF THE ALGORITHM –

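For a picture resolution of I × J and a picture rate of F pictures per second, the overall complexity is

$$ \frac{I J F}{M N} \times (2p + 1)^2 \times M N \times 3 $$

operations per second, where M × N is the Macroblock size and p indicates the dimensions of the search area defined for finding a match. Here IJ/(MN) is the number of Macroblocks per picture, (2p + 1)² is the number of candidate locations per Macroblock, and each MAE evaluation costs about 3 operations for each of the MN pixels.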
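Since the Macroblock size M × N appears in both the numerator and the denominator, it cancels, and the operation count depends only on the picture size, the picture rate and p. A minimal C++ sketch of the calculation follows; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Operations per second for Full Search:
// (I*J / (M*N)) Macroblocks * (2p+1)^2 candidates * M*N pixels * 3 ops,
// in which the Macroblock size M*N cancels out.
std::int64_t fullSearchOpsPerSecond(std::int64_t I, std::int64_t J,
                                    std::int64_t F, std::int64_t p) {
    assert(I > 0 && J > 0 && F > 0 && p >= 0);
    const std::int64_t candidates = (2 * p + 1) * (2 * p + 1);
    const std::int64_t opsPerPixel = 3;
    return I * J * F * candidates * opsPerPixel;
}
```

For a 321 × 321 clip at 1 frame per second, p = 7 gives 69,552,675 operations per second and p = 15 gives 297,067,203, which agree with the 69.5 and 297.1 Mega Operations per second reported for clockuc.avi in Section 4.4.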



2.5 THE LOGARITHMIC SEARCH ALGORITHM

The Logarithmic Search algorithm was proposed in order to bring the computation

requirements for Motion Estimation to more acceptable levels. The Logarithmic Search

algorithm is very similar to the binary search method. It focuses on the reduction in number

of pixel comparisons at the expense of some amount of error introduced in the reconstructed

clip.

The working of this algorithm is given below –

In the first step, the [-p, p] search rectangle is divided into two areas: one inside a [-p/2, p/2] rectangle and one outside it.

Furthermore, instead of searching the whole [-p/2, p/2] area, we only compute the MAE function at nine locations: at (0, 0) and at the eight major points on the perimeter of the [-p/2, p/2] area.

That is, if the distance between the points is d1, we compute the minimum MAE from the MAE computed at (0, 0), (0, d1), (0, -d1), (-d1, 0), (d1, 0), (d1, d1), (d1, -d1), (-d1, d1) and (-d1, -d1). The distance d1 is given by:

$$ d_1 = 2^{k-1}, \quad \text{where } k = \lceil \log_2 p \rceil $$

The same pattern is then repeated around the best location found, with the distance halved at each step, until the distance reaches 1.

COMPLEXITY OF THE ALGORITHM –

Overall, this method examines 8k + 1 search locations and computes the MAE at each. Hence, for a picture resolution of I × J and a picture rate of F pictures per second, the overall complexity of logarithmic search is

$$ I J F \times (8k + 1) \times 3 $$
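The search pattern can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the project's implementation: it follows the common form of the algorithm in which the step distance starts at 2^(k-1) and is halved after each step, it skips candidates that leave the [-p, p] range or the reference frame, and the `Frame` and `SearchResult` types and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical greyscale frame, used only for this illustration.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<int> pixels;  // row-major, one intensity value per pixel

    int at(int x, int y) const { return pixels[y * width + x]; }
    int& at(int x, int y) { return pixels[y * width + x]; }
};

struct SearchResult {
    int u;
    int v;
    double mae;
    int evaluations;  // number of MAE computations performed
};

double meanAbsoluteError(const Frame& cur, const Frame& ref,
                         int x, int y, int i, int j, int M, int N) {
    long sum = 0;
    for (int l = 0; l < N; ++l)
        for (int k = 0; k < M; ++k)
            sum += std::abs(cur.at(x + k, y + l) -
                            ref.at(x + i + k, y + j + l));
    return static_cast<double>(sum) / (M * N);
}

// Smallest k with 2^k >= p.
int ceilLog2(int p) {
    int k = 0;
    while ((1 << k) < p) ++k;
    return k;
}

SearchResult logSearch(const Frame& cur, const Frame& ref,
                       int x, int y, int M, int N, int p) {
    const int k = std::max(1, ceilLog2(p));
    int centreU = 0, centreV = 0;
    double best = meanAbsoluteError(cur, ref, x, y, 0, 0, M, N);
    int evaluations = 1;
    for (int d = 1 << (k - 1); d >= 1; d /= 2) {
        int bestU = centreU, bestV = centreV;
        for (int dj = -d; dj <= d; dj += d) {
            for (int di = -d; di <= d; di += d) {
                if (di == 0 && dj == 0) continue;  // centre already known
                const int i = centreU + di, j = centreV + dj;
                if (std::abs(i) > p || std::abs(j) > p) continue;
                if (x + i < 0 || y + j < 0 ||
                    x + i + M > ref.width || y + j + N > ref.height)
                    continue;
                ++evaluations;
                const double mae =
                    meanAbsoluteError(cur, ref, x, y, i, j, M, N);
                if (mae < best) { best = mae; bestU = i; bestV = j; }
            }
        }
        centreU = bestU;  // recentre on the best point, then halve d
        centreV = bestV;
    }
    return SearchResult{centreU, centreV, best, evaluations};
}
```

With p = 7 the sketch uses step distances 4, 2 and 1 and performs at most 8 × 3 + 1 = 25 MAE evaluations per Macroblock, compared with 225 for the Full Search. The result is not guaranteed to be the global minimum, which is the accuracy trade-off described above.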



3. DESIGN DETAILS



3.1 STRUCTURE OF ENCODER / DECODER

THE ENCODER:

[Figure: Encoder block diagram. Blocks: AVI source file; AVI splitter (key frame and successive frames); Motion Estimation; Motion Compensation; motion vectors (u, v); difference signal; compressed file creation (.mcp file); to decoder.]

The encoder takes as input an AVI file which is then passed onto the AVI splitter block. The

AVI splitter block splits the file into its component frames which are in the Bitmap (.BMP)

format.

The Key Frames and Intermediate frames are fed to the Motion Estimation block, which calculates the Motion Vectors. The Motion Compensation block then computes the difference signal. The Motion Vectors, the difference values and the Key frames are written into the compressed (.mcp) file format. The resulting .mcp file is the compressed form of the source AVI file.



THE DECODER:

[Figure: Decoder block diagram. Blocks: .mcp file; .mcp data extractor (key frames, motion vectors and difference values); motion vector addition block; regenerated frames; AVI constructor; regenerated AVI file.]

The compressed .MCP file is obtained from the Encoder block. The file is then sent to the

.MCP Data Extractor block. This block is responsible for extracting the information about the

Key frames as well as the Motion Vectors encoded in the .mcp file.

Once the relevant data has been identified we can go about the process of regenerating the

original frames. The Motion Vectors and the respective Difference Values are added to the

Key frames to regenerate the frames.

Finally, the frames are sent to the AVI Constructor which combines them to produce the

regenerated AVI file.



3.2 COMPRESSED FILE HEADER FORMAT (.mcp FILE)

The Header part of the file consists of:

The first 4 bytes of the file are the letters “MEAC”. These letters are used for file

identification.

The next byte indicates the size of the Macroblock used for Motion Estimation.

The next byte indicates whether Motion Compensation is used or not.

After this the 56-byte AVI Header is stored for recreation of AVI at the decoder end.

Next the 56-byte AVI Stream Header is stored for recreation of AVI at the decoder

end.

After this the 40-byte Device Independent Bitmap (DIB) Header is stored.

The format of each frame stored is as follows –

Each frame can be stored completely as a Key Frame or it is stored in terms of Motion

Vectors and Motion Compensation data with respect to most recent Key Frame. Thus each

frame has format:

Key Frame header has 8 bytes which identify the presence of a key frame and also

indicate whether the Key Frame is Run Length Encoded or not. The first three bytes

are “KEY”. The next byte is set if the Key Frame is Run Length Encoded else it is

reset. The remaining 4 bytes of the header give the size of the Key Frame data.

Key Frame data is stored immediately after the Key Frame header.

Motion Vectors are stored as 2 bytes for each Macroblock. The number of Motion Vectors can be obtained from the Macroblock size and from the width and height of each frame.

Motion Compensation data is Run Length Encoded. The first 4 bytes indicate the size

of the data stored. The data is stored immediately after these 4 bytes.
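The fixed part of the header described above adds up to 4 + 1 + 1 + 56 + 56 + 40 = 158 bytes. The C++ sketch below reads such a header from a byte buffer. It is a sketch, not the project's parser: the struct and function names are hypothetical, the three embedded headers are kept as opaque byte arrays, and the rounding used for partial Macroblocks at the frame edges is an assumption, since the report does not say how they are handled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory view of the fixed .mcp header fields.
struct McpHeader {
    std::uint8_t macroblockSize;
    bool motionCompensationUsed;
    std::array<std::uint8_t, 56> aviHeader;
    std::array<std::uint8_t, 56> aviStreamHeader;
    std::array<std::uint8_t, 40> dibHeader;
};

constexpr std::size_t kMcpHeaderSize = 4 + 1 + 1 + 56 + 56 + 40;  // 158 bytes

// Returns false if the buffer is too short or does not start with "MEAC".
bool parseMcpHeader(const std::vector<std::uint8_t>& bytes, McpHeader& out) {
    if (bytes.size() < kMcpHeaderSize) return false;
    if (std::memcmp(bytes.data(), "MEAC", 4) != 0) return false;
    out.macroblockSize = bytes[4];
    out.motionCompensationUsed = bytes[5] != 0;
    std::memcpy(out.aviHeader.data(), bytes.data() + 6, 56);
    std::memcpy(out.aviStreamHeader.data(), bytes.data() + 62, 56);
    std::memcpy(out.dibHeader.data(), bytes.data() + 118, 40);
    return true;
}

// Number of Motion Vectors per frame (one per Macroblock).
// Assumption: partial blocks at the right and bottom edges count as blocks.
std::size_t motionVectorsPerFrame(int width, int height, int macroblockSize) {
    assert(width > 0 && height > 0 && macroblockSize > 0);
    const int cols = (width + macroblockSize - 1) / macroblockSize;
    const int rows = (height + macroblockSize - 1) / macroblockSize;
    return static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows);
}
```

For example, a 320 × 240 frame with 8 × 8 Macroblocks has 40 × 30 = 1200 Macroblocks, so at 2 bytes per Motion Vector each non-key frame carries 2400 bytes of Motion Vector data.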


3.3 PROCESS FLOWCHARTS

FLOWCHART FOR ENCODER:

1. START
2. INPUT .avi SOURCE FILE
3. READ .avi HEADER
4. IS HEADER VALID? If No, DISPLAY "INVALID FILE FORMAT".
5. SPLIT THE FILE? If Yes, AVI SPLIT.
6. SELECT COMPRESSION METHOD: FULL SEARCH or LOG SEARCH
7. PERFORM MOTION COMPENSATION
8. WRITE .mcp FILE
9. DISPLAY COMPRESSION STATISTICS
10. STOP


PROCEDURE AVISPLIT:

1. FRAME = 1
2. READ NEXT FRAME FROM .avi FILE
3. WRITE FRAME TO .bmp FILE
4. FRAME = FRAME + 1
5. IS FRAME <= TOTAL FRAMES? If Yes, go to step 2. If No, STOP.


PROCEDURE FULLSEARCH:

1. Y = 0
2. X = 0
3. Set the search window around (X, Y):
   IS X - P < 0? If Yes, XMIN = 0; if No, XMIN = X - P
   IS X + P >= HORMAX? If Yes, XMAX = HORMAX; if No, XMAX = X + P
   IS Y - P < 0? If Yes, YMIN = 0; if No, YMIN = Y - P
   IS Y + P >= VERMAX? If Yes, YMAX = VERMAX; if No, YMAX = Y + P
4. minMAE = 999999
5. J = YMIN
6. I = XMIN
7. CALCULATE MAE AT POINT (I, J)
8. IS MAE < minMAE? If Yes, minMAE = MAE and MOTION VECTOR = (I, J).
9. I = I + 1. IS I < XMAX? If Yes, go to step 7.
10. J = J + 1. IS J < YMAX? If Yes, go to step 6.
11. X = X + 1. IS X < HORMAX? If Yes, go to step 3.
12. Y = Y + 1. IS Y < VERMAX? If Yes, go to step 2. If No, STOP.


PROCEDURE LOGSEARCH:

1. Y = 0
2. X = 0
3. dist = p / 2
4. CALCULATE MAE AT POINT (X, Y)
5. I = 0
6. CALCULATE MAE AT POSITIONS: (X-dist, Y-dist), (X, Y-dist), (X+dist, Y-dist), (X-dist, Y), (X+dist, Y), (X-dist, Y+dist), (X, Y+dist), (X+dist, Y+dist)
7. FIND MINIMUM MAE AND STORE IN MINMAE
8. STORE POSITION OF MINMAE AND SET THIS POSITION AS (X, Y)
9. I = I + 1. IS I < STEPS? If Yes, go to step 6. If No, MOTION VECTOR = (X, Y).
10. X = X + 1. IS X < HORMAX? If Yes, go to step 3.
11. Y = Y + 1. IS Y < VERMAX? If Yes, go to step 2. If No, STOP.


FLOWCHART FOR DECODER:

1. START
2. INPUT .mcp SOURCE FILE
3. READ .mcp HEADER
4. IS HEADER VALID? If No, DISPLAY "INVALID FILE FORMAT".
5. REGENERATE FRAMES
6. WRITE .avi FILE
7. STOP


PROCEDURE REGENERATE FRAMES:

1. READ KEY FRAMES, MOTION VECTORS
2. ADD MOTION VECTOR TO KEY FRAME
3. ADD COMPENSATION ERROR VALUE (when Motion Compensation data is present)
4. STOP



3.4 CLASS DIAGRAMS



3.5 INTERFACE DESIGN

SNAPSHOT OF ENCODER (COMPRESSION) WINDOW:



SNAPSHOT OF DECODER (DECOMPRESSION) WINDOW:



SNAPSHOT OF COMPRESSION STATISTICS WINDOW:

SNAPSHOT OF FRAME BROWSER WINDOW:



4. TESTING RESULTS AND CONCLUSIONS



The testing process in the case of this project aims to bring out the relative strengths and

weaknesses of the compression utility. In this case, the performance of the utility has been

tested in three different scenarios: a best case, a worst case and a general case.

4.1 TEST PLATFORM

The test platform was chosen with a view to using a system with an average configuration.

All tests were carried out on the following platform –

PROCESSOR Intel Pentium III 500 MHz

SYSTEM MEMORY 128 MB

OPERATING SYSTEM Redhat Linux 9.0

4.2 TEST CRITERIA AND VARIABLES

Conclusions have been drawn based upon the performance of the two algorithms on the basis

of the following parameters:

Complexity (Mega Operations per Second)

Compression Ratio – The ratio of the reduction in file size to the size of the original file, expressed as a percentage.

Computation Time – The amount of processor time utilized for a particular operation

Perceptual quality of the regenerated file.
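The Compression Ratio defined above can be computed as in the following minimal C++ sketch. The function name is illustrative; sizes may be given in any unit as long as both use the same one.

```cpp
#include <cassert>

// Compression ratio as defined in this report: the reduction in size
// divided by the original size, expressed as a percentage.
double compressionRatioPercent(double originalSize, double compressedSize) {
    assert(originalSize > 0.0 && compressedSize >= 0.0);
    return (originalSize - compressedSize) / originalSize * 100.0;
}
```

For a 1,236 KB input compressed to 42 KB this gives about 96.6%, in line with the figures reported for clockuc.avi in Section 4.4. Small differences in the second decimal place are expected because the file sizes in the tables are rounded to whole kilobytes.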

Each of the three test cases is provided as input to the utility. The effect of varying design parameters such as Macro Block size, Key Frame Interval and Search Region size (p) on the performance of the two algorithms is also recorded.



4.3 SELECTION OF TEST CASES

TEST CASE 1:

Filename clockuc.avi

Size 1,236KB

RLE Compressed Frames No

Bits per pixel 8

Dimensions of Frame 321 x 321

Frames per second 1

Type of Video Object undergoing motion, background is constant

This test file represents the ideal case of input that could be presented to the utility. In this

case the object in the foreground is undergoing motion while the background remains

absolutely constant. The AVI file is in the raw format (non-RLE encoded) and operates at a

very low rate of 1 frame per second.

TEST CASE 2:

Filename gyrobotuc.avi

Size 10,694KB


Bits per pixel 24


Frames per second 25

Type of Video Object is constant, background is undergoing motion

This test file represents a more practical case of AVI file the compression utility is likely to

encounter. The object in the foreground is relatively constant while the background is in

motion. The file has a frame rate of 25 with a 24-bit Bitmap representation.



TEST CASE 3:

Filename mirranew.avi

Size 63,730KB


Bits per pixel 24


Frames per second 15

Type of Video Both object and background are undergoing motion

This file represents the worst case scenario for the utility. The file depicts a series of high

speed stunts being carried out on a bicycle. As a result both the background and the

foreground are in motion simultaneously. Also the AVI runs at a rate of 15 frames per second

with a 24-bit colour representation.

In this case, the test file violates the basic assumption of the Motion Estimation algorithms, namely that objects in the scene undergo only translatory motion, for which a rectangular block geometry is sufficient.



4.4 TEST RESULTS

MOTION ESTIMATION ONLY (NO COMPENSATION)

A. COMPLEXITY (Mega Operations per second):

TEST CASE 1: clockuc.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               69.5          7.7
4             3           7               69.5          7.7
8             5           7               69.5          7.7
8             3           15              297.1         10.2

TEST CASE 2: gyrobotuc.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               324           36
4             3           7               324           36
8             5           7               324           36
8             3           15              1383.8        47.5

TEST CASE 3: mirranew.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               777.6         86.4
4             3           7               777.6         86.4
8             5           7               777.6         86.4
8             3           15              3321.2        114.0

B. COMPRESSION RATIO:

TEST CASE 1: clockuc.avi
Input File Size: 1,236KB

Macro Block   Key Frame   Search Region   Full Search                  Logarithmic Search
size          Interval    size (p)        Output Size   Ratio          Output Size   Ratio
8             3           7               42KB          96.59%         42KB          96.59%
4             3           7               117KB         90.52%         117KB         90.52%
8             5           7               42KB          96.64%         42KB          96.64%
8             3           15              42KB          96.59%         42KB          96.59%

TEST CASE 2: gyrobotuc.avi
Input File Size: 10,694KB

Macro Block   Key Frame   Search Region   Full Search                  Logarithmic Search
size          Interval    size (p)        Output Size   Ratio          Output Size   Ratio
8             3           7               3,674KB       65.64%         3,674KB       65.64%
4             3           7               3,896KB       63.57%         3,896KB       63.57%
8             5           7               2,226KB       79.18%         2,226KB       79.18%
8             3           15              3,674KB       65.64%         3,674KB       65.64%

TEST CASE 3: mirranew.avi
Input File Size: 63,730KB

Macro Block   Key Frame   Search Region   Full Search                  Logarithmic Search
size          Interval    size (p)        Output Size   Ratio          Output Size   Ratio
8             3           7               14,630KB      77.05%         14,630KB      77.05%
4             3           7               15,920KB      75.02%         15,920KB      75.02%
8             5           7               9,062KB       85.78%         9,062KB       85.78%
8             3           15              14,630KB      77.05%         14,630KB      77.05%

C. COMPUTATION TIME

TEST CASE 1: clockuc.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               1.25s         0.26s
4             3           7               0.89s         0.31s
8             5           7               1.41s         0.28s
8             3           15              4.33s         0.29s

TEST CASE 2: gyrobotuc.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               15.03s        3.06s
4             3           7               21.64s        3.76s
8             5           7               17.98s        3.48s
8             3           15              29.03s        3.47s

TEST CASE 3: mirranew.avi

Macro Block   Key Frame   Search Region   Full Search   Logarithmic Search
size          Interval    size (p)
8             3           7               104.43s       18.89s
4             3           7               121.96s       21.37s
8             5           7               123.89s       21.20s
8             3           15              316.21s       22.37s

MOTION ESTIMATION AND MOTION COMPENSATION

A. COMPLEXITY:




Macro Block

size

Key Frame

Interval

Search Region

size (p)Full search

Logarithmic

Search

8 3 7 69.5 7.7

4 3 7 69.5 7.7

8 5 7 69.5 7.7

8 3 15 297.1 10.2

TEST CASE 2:gyrobotuc.avi



Macro Block

size

Key Frame

Interval

Search Region

size (p)Full search

Logarithmic

Search

8 3 7 324 36

4 3 7 324 36

8 5 7 324 36

8 3 15 1383.8 47.5


45




Macro Block  Key Frame  Search Region     Full Search  Logarithmic Search
size         Interval   size (p)
8            3          7                 777.6        86.4
4            3          7                 777.6        86.4
8            5          7                 777.6        86.4
8            3          15                3321.2       114.0



B. COMPRESSION RATIO:




                                          Full Search                  Logarithmic Search
Macro Block  Key Frame  Search Region     Output File  Compression     Output File  Compression
size         Interval   size (p)          Size         Ratio           Size         Ratio
8            3          7                 76KB         93.85%          78KB         93.67%
4            3          7                 145KB        88.27%          149KB        87.92%
8            5          7                 82KB         93.39%          83KB         93.26%
8            3          15                75KB         93.95%          79KB         93.60%




                                          Full Search                  Logarithmic Search
Macro Block  Key Frame  Search Region     Output File  Compression     Output File  Compression
size         Interval   size (p)          Size         Ratio           Size         Ratio
8            3          7                 6,401KB      40.14%          6,289KB      41.19%
4            3          7                 6,986KB      34.67%          3,892KB      35.83%
8            5          7                 5,621KB      47.44%          5,488KB      48.68%
8            3          15                5,636KB      47.30%          5,500KB      48.57%






                                          Full Search                  Logarithmic Search
Macro Block  Key Frame  Search Region     Output File  Compression     Output File  Compression
size         Interval   size (p)          Size         Ratio           Size         Ratio
8            3          7                 35,260KB     44.67%          35,150KB     44.85%
4            3          7                 35,960KB     43.57%          36,340KB     42.98%
8            5          7                 33,990KB     46.67%          33,940KB     46.74%
8            3          15                35,220KB     44.73%          35,220KB     44.73%



C. COMPUTATION TIME



Macro Block  Key Frame  Search Region     Full Search  Logarithmic Search
size         Interval   size (p)
8            3          7                 1.81s        0.85s
4            3          7                 1.50s        0.90s
8            5          7                 2.05s        0.94s
8            3          15                4.97s        0.87s



Macro Block  Key Frame  Search Region     Full Search  Logarithmic Search
size         Interval   size (p)
8            3          7                 16.92s       4.78s
4            3          7                 23.69s       5.62s
8            5          7                 20.10s       5.60s
8            3          15                37.48s       6.08s



Macro Block  Key Frame  Search Region     Full Search  Logarithmic Search
size         Interval   size (p)
8            3          7                 116.37s      29.99s
4            3          7                 135.27s      33.06s
8            5          7                 139.31s      34.57s
8            3          15                332.58s      33.63s


4.5 SAMPLE FRAMES

Following are some sample frames at different stages of the encoding process using the Full
Search method. The sample frames are part of the mirranew.avi test file with the variables at
their default values.

ORIGINAL FRAMES (AT THE ENCODER):

Frame no. 243 (KEYFRAME)
Frame no. 244
Frame no. 245


FRAMES AFTER ADDITION OF MOTION VECTORS (AT THE DECODER):

Frames 244 and 245 shown here have been generated from the KEYFRAME (frame no. 243)
by adding the corresponding Motion Vectors from the .mcp file given as input to the decoder.

Frame 243 + Motion Vectors = Frame 244, Frame 245

Frame no. 243 (KEYFRAME)
Frame no. 244
Frame no. 245


FRAMES AFTER MOTION COMPENSATION PROCESS:

Frames 244 and 245 shown here have been regenerated from the KEYFRAME (to which
Motion Vectors have already been added). The regeneration operation has been performed by
adding the difference values from the .mcp file.

Frame no. 243 (KEYFRAME)
Frame no. 244
Frame no. 245
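The two decoder stages described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the use of nested lists for frames, and the convention that a motion vector (dy, dx) points from a block in the frame being rebuilt to its source position in the key frame are all assumptions made for the example.

```python
def predict_frame(key, vectors, block):
    """Stage 1: build a predicted frame by copying displaced blocks from the key frame.

    key     -- key frame as a list of rows of pixel values
    vectors -- {(block_top, block_left): (dy, dx)} (assumed layout, hypothetical)
    block   -- macroblock size in pixels
    """
    h, w = len(key), len(key[0])
    pred = [[0] * w for _ in range(h)]
    for (by, bx), (dy, dx) in vectors.items():
        for y in range(by, min(by + block, h)):
            for x in range(bx, min(bx + block, w)):
                # Clamp the source position so a vector never reads outside the frame.
                sy = min(max(y + dy, 0), h - 1)
                sx = min(max(x + dx, 0), w - 1)
                pred[y][x] = key[sy][sx]
    return pred


def compensate(pred, diff):
    """Stage 2: add the stored difference values to the predicted frame."""
    return [[p + d for p, d in zip(pred_row, diff_row)]
            for pred_row, diff_row in zip(pred, diff)]
```

If the difference values are stored without loss, the second stage returns exactly the frame that the encoder saw, whatever the quality of the motion vectors.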



4.6 CONCLUSIONS

Based on the test results recorded, we can derive the following conclusions about the

performance of the two algorithms:

The computational complexity of the Full Search algorithm is roughly an order of magnitude
greater than that of the Logarithmic Search algorithm, and the gap widens as the search region
grows. When the search region size (p) is raised from 7 to 15, the complexity of the Full
Search method relative to the Logarithmic Search rises from a factor of about 9 to a factor
of about 29.
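The growth in this ratio can be reproduced by counting candidate positions per macroblock. The sketch below assumes that Full Search tests every displacement in a (2p + 1) x (2p + 1) window, and that the Logarithmic Search tests the centre once and then 8 neighbours for each halving of the step size; both formulas are assumptions for illustration, although their ratios agree with the complexity tables (69.5 / 7.7 and 297.1 / 10.2).

```python
import math


def full_search_points(p):
    # Every integer displacement in [-p, +p] is tested in both directions.
    return (2 * p + 1) ** 2


def log_search_points(p):
    # Assumed count: the centre once, then 8 neighbours per halving stage.
    return 1 + 8 * math.ceil(math.log2(p))


for p in (7, 15):
    ratio = full_search_points(p) / log_search_points(p)
    print(p, full_search_points(p), log_search_points(p), round(ratio, 1))
# prints "7 225 25 9.0" and "15 961 33 29.1"
```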

Both algorithms are evenly matched in terms of compression ratio. From the test data, higher
compression was achieved by the Full Search method in the first test case, whereas the
Logarithmic Search method gave the better ratio in most configurations of test cases 2 and 3.
This difference in compression ratios depends largely on the nature of the source AVI file,
and in the given test cases it stays within roughly +/- 1%. The compression ratio also
increases with the Macroblock size, i.e. as the Macroblock size is increased the compression
ratio increases, but at the cost of perceptual quality.
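The percentages in the result tables behave like a size reduction, since a smaller output file always carries a larger ratio. A minimal sketch of that measure is given below; the definition and the function name are assumptions, because the formula is not stated in this section.

```python
def compression_ratio(original_size, compressed_size):
    """Percentage of the original size removed by compression (assumed definition)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return (1.0 - compressed_size / original_size) * 100.0


# A hypothetical 1000 KB input compressed to 250 KB.
print(compression_ratio(1000, 250))
```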

The computation time is a function of the Macroblock size and the size of the search area.
The keyframe interval determines how many frames have to be encoded as motion vectors; as its
value increases, more Motion Vectors have to be calculated. As the Macroblock size increases,
a larger number of comparisons has to be made in the evaluation of each MAE value. Similarly,
the number of computations increases with the search area, because more candidate positions
have to be evaluated.
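Both cost factors are visible in a direct implementation of the MAE measure and of the exhaustive search around it. The following is a sketch under assumed conventions (frames as nested lists, block position given by its top-left corner); it is not taken from the project's source.

```python
def mae(cur, ref, by, bx, dy, dx, n):
    """Mean absolute error between the n x n block of cur at (by, bx)
    and the block of ref displaced by (dy, dx): n * n comparisons."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total / (n * n)


def full_search(cur, ref, by, bx, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p] that stays inside ref.

    Returns (smallest_error, (dy, dx)).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            err = mae(cur, ref, by, bx, dy, dx, n)
            if best is None or err < best[0]:
                best = (err, (dy, dx))
    return best
```

Each call to `mae` costs n * n comparisons, and `full_search` makes up to (2p + 1) * (2p + 1) such calls per block.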

The major pitfall of the Full Search method is highlighted by the computation times. In all
the test cases, the computation times for the Full Search method are roughly 2 to 15 times
higher than those of the Logarithmic Search method. This makes the Full Search algorithm
impractical in time-critical applications like broadcasting of live video.



Since the Full Search method computes the MAE value for every Macroblock in the search
region, the perceptual quality of the Motion Estimated clip is far superior to that of the
Logarithmic Search method. The errors introduced in the Motion Estimation stage are almost
completely removed in the Motion Compensation process, producing the same perceptual
quality as the source file in both methods. Hence, if Motion Compensation is employed, the
relative degradation in perceptual quality in the Logarithmic Search method becomes
insignificant.

Thus, although less accurate with respect to the Motion Estimation process, the Logarithmic
Search stands out as the superior method when evaluated on practical considerations like
computational complexity and computation time.
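The trade-off can be seen in a compact sketch of a logarithmic search. This is one common formulation (start with a step of about p / 2, test the 8 neighbours, move to the best one, halve the step) and may differ in detail from the variant used in the project; `cost` is a hypothetical callback that returns the matching error, for example the MAE, for a displacement.

```python
def log_search(cost, p):
    """Return ((dy, dx), error) found by a step-halving search within [-p, p]."""
    cy = cx = 0
    best = cost(0, 0)
    step = max(1, (p + 1) // 2)
    while True:
        move = (0, 0)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                ny, nx = cy + dy, cx + dx
                if (dy, dx) == (0, 0) or max(abs(ny), abs(nx)) > p:
                    continue  # skip the centre and anything outside the search region
                err = cost(ny, nx)
                if err < best:
                    best, move = err, (dy, dx)
        cy, cx = cy + move[0], cx + move[1]
        if step == 1:
            return (cy, cx), best
        step //= 2
```

For p = 7 this evaluates at most 25 positions instead of 225. It follows the error downhill from the centre, so it can settle on a local minimum when the error surface is not smooth, which is the loss of accuracy noted above.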



5. APPLICATIONS AND FUTURE SCOPE



5.1 APPLICATIONS

Video finds use in a number of applications. With video compression, an entire gamut of

services comes alive due to the reduced transmission rate. We enumerate a few of the many

applications possible:

REDUCED STORAGE NEEDS:

Our project reduces the file size of an AVI input by converting it into the compressed video
(MCP) format. This file can then be stored on a device to save space. With reduced video
file sizes, more video data can be stored on CDs, digital cameras and a host of other
devices. When the file needs to be viewed, it can be supplied as input to the utility, which
will then convert it back into an AVI file suitable for playing on a media player.

VIDEO ON DEMAND:

Video on demand can be compared to an online electronic rental store. The user selects

any one of a large number of videos available in the electronic video library. A central

server acts as the video repository. The files here are stored in compressed video format

(MCP). The server sets up a connection with the user computer using TCP/IP. The

contents of the video are sent over the network and simultaneously played on the user’s

computer. Since the video is compressed, the bandwidth requirements will be much less.

At the user’s end the data received from the server is given as input to our utility. The

decompression can be done on the fly and displayed to the user.

This technique will benefit not only the consumer but also the electronic rental store. The
consumer can be given an account which is charged as and when a video is viewed. The video
is not stored on the user’s computer, which saves storage space. Also, since the video is not
on a device accessible to the user, the extent of video piracy will decrease. This will
monetarily benefit the electronic rental store.



ONLINE VIDEO CONFERENCING:

One of the prime applications of our video compression utility will be online video
conferencing. This would make a simultaneous real-time video conversation between a number
of individuals possible.

Web cams can be used to generate the video which will serve as the streaming input to the
utility. The compressed video lowers the bandwidth requirement and makes optimum use of the
provided resources. We do not generally associate conferences with lots of movement; hence
it is safe to presume that there is very little difference in motion between consecutive
frames transmitted. This will keep the quality of the pictures close to that of the original
images captured. Moreover, picture quality is expendable in such cases, whereas jerks in
video would certainly not be desirable. Bandwidth reduction will be the key criterion for
such applications and hence video compression will be utilized heavily.

TELEVISION BROADCASTING:

Our utility is perfectly suited to compress AVIs which have very little interframe motion.

This makes it particularly useful in the broadcast of reduced motion images such as news

bulletins.

News bulletins typically portray the news reader at the right of the picture with a small
screen behind flashing the headlines or news clips. In this scenario there is very little
interframe motion, viz. the lip movement of the news reader and the motion of the news clips
in the background. This is a near-optimal case for the video compression utility, which
offers the best utilization of the bandwidth available. Live reporting of news events can be
made possible at a fraction of the cost.

Moreover, due to the reduced bandwidth required for the compressed video, an optimal

usage of the available bandwidth is possible for incorporating other information. With

technologies like HDTV, digital video broadcasting will only increase in volume.

Compression of such video transmissions will be of utmost priority.



5.2 FUTURE SCOPE

The compression utility currently works only on audio-subdued AVI files. In addition, support
has been provided only for the Raw AVI and RLE-encoded AVI file formats. Further efforts can
be made in the following directions to enhance the features of the utility as well as
achieve better results.

ADDITIONAL CODEC SUPPORT:

There are over 350 codec formats available for the AVI file type. In addition to supporting
the current raw AVI and RLE AVI, the project can be extended to support some of the other
popular AVI codec formats.

Support for other formats can be added by including additional functions for extracting the
header information. Also, appropriate code can be written for processing the codec-specific
data stored in the file.

SUPPORT FOR SOUND:

The project can be extended to support compression and transmission of sound data

along with video for a more realistic approach to motion picture compression.

The LIST “movi” chunk in the AVI file holds the actual stream data, that is, the picture and
the sound, which are differentiated by the four-character codes indicating the start of each
type of chunk. The audio data contained in these data chunks does not contain any
information about its format (refer to Appendix-A for the header format).

The audio data can be extracted from the input file and compressed using a suitable audio
compression technique. This compressed data can then be appended to the compressed file
(the .mcp file). While decoding, the compressed sound data should be properly decompressed
and written into the recreated AVI file. Care must be taken that the audio data is placed in
its proper location as specified by the AVI file format.



USE OF NON-RLE CODING METHODS:

The project currently employs RLE for compressing the difference signal obtained after
Motion Compensation. However, the use of other coding techniques like Huffman Coding to
perform the same operation will in most cases produce higher compression levels.

The skewed probability distribution of the difference signal data is what allows the utility
to achieve higher compression ratios. Due to the accuracy of the Motion Estimation process,
the difference signal will contain relatively little data (i.e. most of the difference values
are zero). As a result, the use of the Huffman coding method is expected to substantially
increase the compression achieved.

However, the increased compression levels can be achieved only at the expense of higher
complexity and processing time than the RLE method. Thus, the adoption of advanced coding
techniques would be feasible if compression ratio is the prime necessity of the video coder.
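To make the comparison concrete, the sketch below run-length encodes a mostly-zero difference signal and also computes its entropy, the lower bound on the average code length that a symbol-by-symbol coder such as Huffman can reach. The (run, value) pair layout and the 255 run limit are assumptions for illustration and are not the .mcp format.

```python
import math
from collections import Counter


def rle_encode(values, max_run=255):
    """Encode a sequence as (run_length, value) pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][1] == v and pairs[-1][0] < max_run:
            pairs[-1] = (pairs[-1][0] + 1, v)   # extend the current run
        else:
            pairs.append((1, v))                # start a new run
    return pairs


def rle_decode(pairs):
    """Expand (run_length, value) pairs back into the original sequence."""
    out = []
    for run, v in pairs:
        out.extend([v] * run)
    return out


def entropy_bits(values):
    """Shannon entropy in bits per symbol."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


diff = [0] * 12 + [3] + [0] * 7 + [-2, -2] + [0] * 10
print(rle_encode(diff))
print(round(entropy_bits(diff), 3))
```

The more the zeros dominate, the lower the entropy, and the more a Huffman code can gain over a fixed-size representation of each pair.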

DESIGN OF A HYBRID CODER:

This project involved the design and implementation of an Interframe Coder using the

principles of Motion Estimation and Compensation only. The project can be extended

to include an Intra-frame Coding method as a part of the Encoder.

For the process of intraframe coding, either the DCT or Wavelet transform can be

adopted. The DCT / Wavelet transform module can be integrated into the utility after

Motion Estimation / Compensation modules. Similarly, on the decoder side the

module for regenerating the DCT/Wavelet compressed frames must be included

before the addition of Motion Vectors to the Key frames is done.

The design of such a hybrid coder (interframe/intraframe) is the basis for current

video coding standards. It will lead to higher levels of compression than both

interframe and intraframe encoding performed individually.



HARDWARE BASED IMPLEMENTATION:

To improve the computation times drastically, a completely hardware based system could be
constructed for the Motion Estimation and Compensation process. This would substantially
reduce the computation times for the compression process.

Modern computer graphics hardware contains extremely powerful graphics processing units
(GPUs). These GPUs are designed to perform a limited number of operations on very large
amounts of data. They typically have more than one processing pipeline working in parallel
with each other. They can in fact be thought of as highly parallel Single Instruction
Multiple Data (SIMD) type processors.

The current NVIDIA GeForce FX 5900 GPU peaks at 20 Gigaflops. This is equivalent to a 10-GHz
Pentium 4 processor. The latest generation of graphics hardware also contains much more
programmable GPUs. The increasing performance of GPUs can be harnessed to perform the Motion
Estimation / Compensation process purely in terms of Hardware Embedded Code to achieve
extremely fast computations.

A possible algorithm flow for the Full Search algorithm using the GPU as a co-processor for
the CPU can be described as follows:

First, the two frames are downloaded as textures to the graphics hardware. These are noted
as Texture0 and Texture1 respectively. The current motion vector to be checked is passed as
a parameter to the GPU. The difference between the two frames, with one frame displaced by
that motion vector, is then generated using vertex and fragment programs. In this way, if a
frame needs to be interpolated, it will be interpolated on the GPU. This results in an image
which holds the absolute value of the displaced difference between the two frames. This
image is then read back to the CPU, where a Mean Absolute Error measure for each block in
the image is generated. This is repeated for each motion vector in the candidate set. The
motion vector which yielded the smallest Mean Absolute Error for each block is chosen as the
motion vector for that block. (For further references check papers on motion estimation and
compensation using GPUs by Yang and Welch.)
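The ordering of this flow (one whole-frame pass per candidate vector, then a per-block reduction) can be emulated on the CPU as below. No GPU API is used; the "GPU" and "CPU" comments only mark which part each device would perform, and the function name, frame layout and edge clamping are assumptions for illustration.

```python
def best_vectors(frame0, frame1, candidates, n):
    """For each n x n block of frame1, pick the candidate vector with the smallest MAE."""
    h, w = len(frame1), len(frame1[0])
    best = {}
    for dy, dx in candidates:
        # "GPU" step: absolute difference image for this single candidate vector.
        diff = [[abs(frame1[y][x] -
                     frame0[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)])
                 for x in range(w)] for y in range(h)]
        # "CPU" step: reduce the read-back image to one MAE value per block.
        for by in range(0, h - n + 1, n):
            for bx in range(0, w - n + 1, n):
                err = sum(diff[y][x]
                          for y in range(by, by + n)
                          for x in range(bx, bx + n)) / (n * n)
                if (by, bx) not in best or err < best[(by, bx)][0]:
                    best[(by, bx)] = (err, (dy, dx))
    return {blk: vec for blk, (err, vec) in best.items()}
```

Unlike a per-block search, every candidate here costs one pass over the full frame, which suits hardware that processes all pixels in parallel.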



APPENDIX – A

AVI FILE HEADER FORMAT



AVI FILES:

The Microsoft Audio/Video Interleaved (AVI) file format is a RIFF file specification used

with applications that capture, edit, and playback audio/video sequences. In general, AVI

files contain multiple streams of different types of data. Most AVI sequences will use both

audio and video streams. A simple variation for an AVI sequence uses video data and does

not require an audio stream. This section describes AVI files containing only audio and video

data.

This section covers the following topics:

The required chunks of an AVI file

The optional chunks of an AVI file

AVI RIFF FORM

AVI files use the AVI RIFF form. The AVI RIFF form is identified by the four-character
code “AVI ” (with a trailing space). All AVI files include two mandatory LIST chunks. These
chunks define the format of the streams and the stream data. AVI files might also include an
index chunk. This optional chunk specifies the location of data chunks within the file. An
AVI file with these components has the following form:

RIFF ('AVI '
      LIST ('hdrl'
            .
            .
           )
      LIST ('movi'
            .
            .
           )
      ['idx1'<AVI Index>]
     )

The LIST chunks and the index chunk are subchunks of the RIFF “AVI ” chunk. The “AVI ”

chunk identifies the file as an AVI RIFF file. The LIST “hdrl” chunk defines the format of



the data and is the first required list chunk. The LIST “movi” chunk contains the data for the

AVI sequence and is the second required list chunk. The “idx1” chunk is the optional index

chunk. AVI files must keep these three components in the proper sequence. The LIST “hdrl”

and LIST “movi” chunks use subchunks for their data. The following example shows the

AVI RIFF form expanded with the chunks needed to complete the LIST “hdrl” and LIST

“movi” chunks:

RIFF ('AVI '
      LIST ('hdrl'
            'avih'(<Main AVI Header>)
            LIST ('strl'
                  'strh'(<Stream header>)
                  'strf'(<Stream format>)
                  'strd'(additional header data)
                  .
                  .
                 )
            .
            .
           )
      LIST ('movi'
            {SubChunk | LIST('rec '
                             SubChunk1
                             SubChunk2
                             .
                             .
                            )
             .
             .
            }
            .
            .
           )
      ['idx1'<AVIIndex>]
     )
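The nesting shown above can be explored with a small chunk walker. The sketch relies only on the general RIFF layout (a four-character code, a 32-bit little-endian size, then the data, padded to an even length); the function name and the tuple it yields are assumptions made for this example.

```python
import struct


def walk_chunks(data, start=0, end=None, depth=0):
    """Yield (depth, chunk_id, list_type_or_None, payload_offset, size) for each chunk."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        ckid, size = struct.unpack_from("<4sI", data, pos)
        if ckid in (b"RIFF", b"LIST"):
            # Container chunks start with a 4-byte form or list type.
            form = data[pos + 8:pos + 12]
            yield depth, ckid, form, pos + 12, size
            yield from walk_chunks(data, pos + 12, pos + 8 + size, depth + 1)
        else:
            yield depth, ckid, None, pos + 8, size
        pos += 8 + size + (size & 1)   # chunks are padded to an even length


def print_tree(path):
    """Print the chunk tree of an AVI file (path supplied by the caller)."""
    with open(path, "rb") as f:
        data = f.read()
    for depth, ckid, form, offset, size in walk_chunks(data):
        label = ckid.decode("ascii", "replace")
        if form is not None:
            label += " '" + form.decode("ascii", "replace") + "'"
        print("  " * depth + label, size)
```

Reading the whole file into memory keeps the sketch short; a real tool would seek through the file instead.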



The following sections describe the chunks contained in the LIST “hdrl” and LIST “movi”

chunks as well as the “idx1” chunk.

THE MAIN AVI HEADER LIST

The AVI file header is identified by the “avih” four-character code. The main header has the
following data structure defined for it:

typedef struct {
    DWORD dwMicroSecPerFrame;
    DWORD dwMaxBytesPerSec;
    DWORD dwReserved1;
    DWORD dwFlags;
    DWORD dwTotalFrames;
    DWORD dwInitialFrames;
    DWORD dwStreams;
    DWORD dwSuggestedBufferSize;
    DWORD dwWidth;
    DWORD dwHeight;
    DWORD dwScale;
    DWORD dwRate;
    DWORD dwStart;
    DWORD dwLength;
} MainAVIHeader;

The dwMicroSecPerFrame field specifies the period between video frames.

The dwMaxBytesPerSec field specifies the approximate maximum data rate of the file.

The dwFlags field contains any flags for the file. The following flags are defined:

AVIF_HASINDEX

Indicates the AVI file has an “idx1” chunk.

AVIF_MUSTUSEINDEX

Indicates the index should be used to determine the order of presentation of the data.



AVIF_ISINTERLEAVED

Indicates the AVI file is interleaved.

AVIF_WASCAPTUREFILE

Indicates the AVI file is a specially allocated file used for capturing real-time video.

AVIF_COPYRIGHTED

Indicates the AVI file contains copyrighted data.

The dwTotalFrames field of the main header specifies the total number of frames of data in
the file.

The dwInitialFrames is used for interleaved files.

The dwStreams field specifies the number of streams in the file.

The dwSuggestedBufferSize field specifies the suggested buffer size for reading the file.

Generally, this size should be large enough to contain the largest chunk in the file.

The dwWidth and dwHeight fields specify the width and height of the AVI file in pixels.

The dwScale and dwRate fields are used to specify the general time scale that the file will

use.

The dwStart and dwLength fields specify the starting time of the AVI file and the length of

the file. The units are defined by dwRate and dwScale. The dwStart field is usually set to

zero.
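As an illustration of how these fields are read in practice, the sketch below unpacks the 14 DWORDs of the “avih” payload in the order listed above. The helper names are hypothetical, and the numeric value of AVIF_HASINDEX comes from the Windows headers rather than from this appendix.

```python
import struct
from collections import namedtuple

MainAVIHeader = namedtuple("MainAVIHeader", [
    "dwMicroSecPerFrame", "dwMaxBytesPerSec", "dwReserved1", "dwFlags",
    "dwTotalFrames", "dwInitialFrames", "dwStreams", "dwSuggestedBufferSize",
    "dwWidth", "dwHeight", "dwScale", "dwRate", "dwStart", "dwLength"])

AVIF_HASINDEX = 0x00000010  # value as defined in the Windows headers


def parse_avih(payload):
    """Unpack the 56-byte 'avih' payload (14 little-endian DWORDs)."""
    return MainAVIHeader(*struct.unpack_from("<14I", payload))


def frames_per_second(hdr):
    """dwMicroSecPerFrame is the frame period, so the rate is its reciprocal."""
    return 1_000_000 / hdr.dwMicroSecPerFrame
```

A header with dwMicroSecPerFrame set to 40000, for instance, describes a clip that plays at 25 frames per second.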

THE STREAM HEADER (“STRL”) CHUNKS

The main header is followed by one or more “strl” chunks. Each “strl” chunk must contain a

stream header and stream format chunk. Stream header chunks are identified by the four-

character code “strh” and stream format chunks are identified with the four-character code

“strf”. In addition to the stream header and stream format chunks, the “strl” chunk might

also contain a stream data chunk. Stream data chunks are identified with the four-character

code “strd”.

The stream header has the following data structure defined for it:



typedef struct {
    FOURCC fccType;
    FOURCC fccHandler;
    DWORD  dwFlags;
    DWORD  dwReserved1;
    DWORD  dwInitialFrames;
    DWORD  dwScale;
    DWORD  dwRate;
    DWORD  dwStart;
    DWORD  dwLength;
    DWORD  dwSuggestedBufferSize;
    DWORD  dwQuality;
    DWORD  dwSampleSize;
} AVIStreamHeader;

The fccType field is set to “vids” if the stream it specifies contains video data. It is set to

“auds” if it contains audio data.

The fccHandler field contains a four-character code describing the installable compressor or

decompressor used with the data.

The dwFlags field contains any flags for the data stream. The AVISF_DISABLED flag indicates
that the stream data should be rendered only when explicitly enabled by the user. The
AVISF_VIDEO_PALCHANGES flag indicates palette changes are embedded in the file.

The dwInitialFrames is used for interleaved files. If you are creating interleaved files,

specify the number of frames in the file prior to the initial frame of the AVI sequence in this

field.

The remaining fields describe the playback characteristics of the stream. These factors
include the playback rate (dwScale and dwRate), the starting time of the sequence
(dwStart), the length of the sequence (dwLength), the size of the playback buffer
(dwSuggestedBufferSize), an indicator of the data quality (dwQuality), and sample size
(dwSampleSize).
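A short sketch of reading the stream header follows, using the 12-field layout given above (two four-character codes followed by ten DWORDs). Files written by later tools may append further fields, which `unpack_from` simply ignores; the helper names are hypothetical.

```python
import struct
from collections import namedtuple

AVIStreamHeader = namedtuple("AVIStreamHeader", [
    "fccType", "fccHandler", "dwFlags", "dwReserved1", "dwInitialFrames",
    "dwScale", "dwRate", "dwStart", "dwLength", "dwSuggestedBufferSize",
    "dwQuality", "dwSampleSize"])


def parse_strh(payload):
    """Unpack the first 48 bytes of a 'strh' payload."""
    return AVIStreamHeader(*struct.unpack_from("<4s4s10I", payload))


def samples_per_second(hdr):
    """dwRate / dwScale gives the playback rate (frames per second for video)."""
    return hdr.dwRate / hdr.dwScale


def is_video(hdr):
    return hdr.fccType == b"vids"
```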



THE LIST “MOVI” CHUNK

Following the header information is a LIST “movi” chunk that contains chunks of the actual

data in the streams; that is, the pictures and sounds themselves. The data chunks can reside

directly in the LIST “movi” chunk or they might be grouped into “rec ” chunks. Like any

RIFF chunk, the data chunks contain a four-character code to identify the chunk type. The

four-character code that identifies each chunk consists of the stream number and a two-

character code that defines the type of information encapsulated in the chunk. For example, a

waveform chunk is identified by a two-character code of “wb”. If a waveform chunk

corresponded to the second LIST “hdrl” stream description, it would have a four-character

code of “01wb”.
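The naming rule can be expressed as a pair of helpers. The sketch assumes that the stream number is written as two decimal digits and covers the two-character codes named in this section (“wb”, “db”, “dc” and “pc”); the function names are hypothetical.

```python
KNOWN_KINDS = ("wb", "db", "dc", "pc")


def stream_chunk_id(stream_index, kind):
    """Build a data-chunk code such as '01wb' from a zero-based stream index."""
    if not 0 <= stream_index <= 99:
        raise ValueError("stream index must fit in two digits")
    if kind not in KNOWN_KINDS:
        raise ValueError("unknown chunk kind: " + kind)
    return f"{stream_index:02d}{kind}"


def split_chunk_id(ckid):
    """Split a data-chunk code into (stream_index, kind)."""
    return int(ckid[:2]), ckid[2:]


# The second stream description has index 1.
print(stream_chunk_id(1, "wb"))
```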

Since all the format information is in the header, the audio data contained in these data

chunks does not contain any information about its format. An audio data chunk has the

following format (the ## in the format represents the stream identifier):

WAVE Bytes '##wb'
           BYTE abBytes[];

Video data can be compressed or uncompressed DIBs. An uncompressed DIB has BI_RGB

specified for the biCompression field in its associated BITMAPINFO structure. A

compressed DIB has a value other than BI_RGB specified in the biCompression field.

A data chunk for an uncompressed DIB contains RGB video data. These chunks are

identified with a two-character code of “db” (db is an abbreviation for DIB bits). Data chunks

for a compressed DIB are identified with a two-character code of “dc” (dc is an abbreviation

for DIB compressed). Neither data chunk will contain any header information about the

DIBs. The data chunk for an uncompressed DIB has the following form:

DIB Bits '##db'
         BYTE abBits[];

The data chunk for a compressed DIB has the following form:

Compressed DIB '##dc'
               BYTE abBits[];



Video data chunks can also define new palette entries used to update the palette during an
AVI sequence. These chunks are identified with a two-character code of “pc” (pc is an
abbreviation for palette change). The following data structure is defined for palette
information:

typedef struct {
    BYTE         bFirstEntry;
    BYTE         bNumEntries;
    WORD         wFlags;
    PALETTEENTRY peNew[];
} AVIPALCHANGE;

Fields

bFirstEntry
    Specifies the first palette entry to change.
bNumEntries
    Specifies the number of entries to change.
wFlags
    Reserved. (This should be set to 0.)
peNew
    Specifies an array of new palette entries.

The bFirstEntry field defines the first entry to change and the bNumEntries field specifies

the number of entries to change. The peNew field contains the new color entries.

If you include palette changes in a video stream, set the AVISF_VIDEO_PALCHANGES
flag in the dwFlags field of the stream header. This flag indicates that this video stream
contains palette changes and warns the playback software that it will need to animate the
palette.

THE “IDX1” CHUNK

AVI files can have an index chunk after the LIST “movi” chunk. The index chunk

essentially contains a list of the data chunks and their location in the file. Index chunks use

the four-character code “idx1”. The following data structure is defined for index entries:



typedef struct {
    DWORD ckid;
    DWORD dwFlags;
    DWORD dwChunkOffset;
    DWORD dwChunkLength;
} AVIINDEXENTRY;

The ckid, dwFlags, dwChunkOffset, and dwChunkLength entries are repeated in the AVI
file for each data chunk indexed. If the file is interleaved, the index will also have these
entries for each “rec ” chunk. The “rec ” entries should have the AVIIF_LIST flag set and the
list type in the ckid field. The ckid field identifies the data chunk. This field uses
four-character codes for identifying the chunk.

The dwFlags field specifies any flags for the data. The AVIIF_KEYFRAME flag indicates
key frames in the video sequence. Key frames do not need previous video information to be
decompressed. The AVIIF_NOTIME flag indicates a chunk does not affect the timing of a
video stream. The AVIIF_LIST flag indicates the current chunk is a LIST chunk. Use the
ckid field to identify the type of LIST chunk.

The dwChunkOffset and dwChunkLength fields specify the position of the chunk and the

length of the chunk. The dwChunkOffset field specifies the position of the chunk in the file

relative to the 'movi' list. The dwChunkLength field specifies the length of the chunk

excluding the eight bytes for the RIFF header.

If you include an index in the RIFF file, set the AVIF_HASINDEX flag in the dwFlags field of
the AVI header. (This header is identified by the “avih” chunk ID.) This flag indicates that
the file has an index.
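A sketch of reading the index follows: each entry is four little-endian DWORDs, so the “idx1” payload is a flat array of 16-byte records. The flag values are those defined in the Windows headers, not in this appendix, and the function names are hypothetical.

```python
import struct

AVIIF_LIST = 0x00000001
AVIIF_KEYFRAME = 0x00000010
AVIIF_NOTIME = 0x00000100


def parse_idx1(payload):
    """Return a list of (ckid, dwFlags, dwChunkOffset, dwChunkLength) tuples."""
    entries = []
    for off in range(0, len(payload) - 15, 16):
        entries.append(struct.unpack_from("<4sIII", payload, off))
    return entries


def key_frame_positions(entries):
    """Positions in the index of the entries flagged as key frames."""
    return [i for i, (ckid, flags, offset, length) in enumerate(entries)
            if flags & AVIIF_KEYFRAME]
```

The ckid is kept as four raw bytes here, so it can be compared directly with codes such as b"00db".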

OTHER DATA CHUNKS

If you need to align data in your AVI file you can add a “JUNK” chunk. The “JUNK” chunk

has the following form:

AVI Padding 'JUNK'
            Byte data[]



APPENDIX – B

BITMAP FILE FORMAT



This topic describes the Bitmap-File Formats originally designed for the Windows Operating

System. Windows bitmap files are stored in a device-independent bitmap (DIB) format that

allows the system to display the bitmap on any type of display device. The term "device

independent" means that the bitmap specifies pixel color in a form independent of the

method used by a display to represent color. The default filename extension of a Windows

DIB file is .BMP.

BITMAP-FILE STRUCTURES

Each bitmap file contains a bitmap-file header, a bitmap-information header, a color table,

and an array of bytes that defines the bitmap bits. The file has the following form:

BITMAPFILEHEADER bmfh;
BITMAPINFOHEADER bmih;
RGBQUAD          aColors[];
BYTE             aBitmapBits[];

The bitmap-file header contains information about the type, size, and layout of a device-

independent bitmap file. The header is defined as a BITMAPFILEHEADER structure. The

bitmap-information header, defined as a BITMAPINFOHEADER structure, specifies the

dimensions, compression type, and color format for the bitmap.

The color table, defined as an array of RGBQUAD structures, contains as many elements as

there are colors in the bitmap. The color table is not present for bitmaps with 24 color bits

because each pixel is represented by 24-bit red-green-blue (RGB) values in the actual bitmap

data area. The colors in the table should appear in order of importance. This helps a display

driver render a bitmap on a device that cannot display as many colors as there are in the

bitmap. If the DIB is in Windows version 3.0 or later format, the driver can use the

biClrImportant member of the BITMAPINFOHEADER structure to determine which

colors are important.



The BITMAPINFO structure can be used to represent a combined bitmap-information

header and color table. The bitmap bits, immediately following the color table, consist of an

array of BYTE values representing consecutive rows, or "scan lines," of the bitmap. Each

scan line consists of consecutive bytes representing the pixels in the scan line, in left-to-right

order. The number of bytes representing a scan line depends on the color format and the

width, in pixels, of the bitmap. If necessary, a scan line must be zero-padded to end on a 32-

bit boundary. However, segment boundaries can appear anywhere in the bitmap. The scan

lines in the bitmap are stored from bottom up. This means that the first byte in the array

represents the pixels in the lower-left corner of the bitmap and the last byte represents the

pixels in the upper-right corner.
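The padding and bottom-up rules translate into two small formulas, sketched below. The function names are hypothetical; the arithmetic follows from the 32-bit alignment and bottom-up storage described above.

```python
def row_stride(width, bits_per_pixel):
    """Bytes per scan line, rounded up to a 32-bit (4-byte) boundary."""
    return ((width * bits_per_pixel + 31) // 32) * 4


def row_offset(row_from_top, height, stride):
    """Byte offset of a scan line inside the bitmap bits.

    Scan lines are stored bottom-up, so the top row of the image comes last.
    """
    return (height - 1 - row_from_top) * stride


# A 5-pixel-wide 8-bit bitmap needs 5 bytes per row, padded to 8.
print(row_stride(5, 8))
```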

The biBitCount member of the BITMAPINFOHEADER structure determines the number
of bits that define each pixel and the maximum number of colors in the bitmap. This
member can take the following values:

1: Bitmap is monochrome and the color table contains two entries. Each

bit in the bitmap array represents a pixel. If the bit is clear, the pixel is

displayed with the color of the first entry in the color table. If the bit is

set, the pixel has the color of the second entry in the table.

4: Bitmap has a maximum of 16 colors. Each pixel in the bitmap is

represented by a 4-bit index into the color table. For example, if the first

byte in the bitmap is 0x1F, the byte represents two pixels. The first pixel

contains the color in the second table entry, and the second pixel contains

the color in the sixteenth table entry.

8: Bitmap has a maximum of 256 colors. Each pixel in the bitmap is

represented by a 1-byte index into the color table. For example, if the first

byte in the bitmap is 0x1F, the first pixel has the color of the

thirty-second table entry.



24: Bitmap has a maximum of 2^24 colors. The bmiColors (or bmciColors)
member is NULL, and each 3-byte sequence in the bitmap array represents the
relative intensities of blue, green, and red, respectively, for a pixel.
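For the palette-based depths (1, 4 and 8 bits), the packing described above can be sketched as a single unpacking routine. The function name is hypothetical; the most significant bits of each byte hold the leftmost pixel, which matches the 0x1F examples given for the 4-bit and 8-bit cases.

```python
def unpack_indices(row_bytes, width, bit_count):
    """Return the color-table index of each pixel in one scan line."""
    if bit_count not in (1, 4, 8):
        raise ValueError("only palette-based depths are handled here")
    per_byte = 8 // bit_count
    mask = (1 << bit_count) - 1
    indices = []
    for i in range(width):
        byte = row_bytes[i // per_byte]
        # The most significant bits of a byte belong to the leftmost pixel.
        shift = 8 - bit_count * (i % per_byte + 1)
        indices.append((byte >> shift) & mask)
    return indices


print(unpack_indices(b"\x1f", 2, 4))   # indices of the second and sixteenth entries
print(unpack_indices(b"\x1f", 1, 8))   # index of the thirty-second entry
```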

The biClrUsed member of the BITMAPINFOHEADER structure specifies the number of

color indexes in the color table actually used by the bitmap. If the biClrUsed member is set

to zero, the bitmap uses the maximum number of colors corresponding to the value of the

biBitCount member. An alternative form of bitmap file uses the BITMAPCOREINFO,

BITMAPCOREHEADER, and RGBTRIPLE structures.

BITMAP COMPRESSION

Windows versions 3.0 and later support run-length encoded (RLE) formats for compressing

bitmaps that use 4 bits per pixel and 8 bits per pixel. Compression reduces the disk and

memory storage required for a bitmap.

Compression of 8-Bits-per-Pixel Bitmaps:

When the biCompression member of the BITMAPINFOHEADER structure is set to

BI_RLE8, the DIB is compressed using a run-length encoded format for a

256-color bitmap. This format uses two modes: encoded mode and absolute mode.

Both modes can occur anywhere throughout a single bitmap.

Encoded Mode

A unit of information in encoded mode consists of two bytes. The first byte specifies the

number of consecutive pixels to be drawn using the color index contained in the second byte.

The first byte of the pair can be set to zero to indicate an escape that denotes the end of a line,

the end of the bitmap, or a delta. The interpretation of the escape depends on the value of the

second byte of the pair, which must be in the range 0x00 through 0x02. Following are the

meanings of the escape values that can be used in the second byte:



Second byte   Meaning
0             End of line.
1             End of bitmap.
2             Delta. The two bytes following the escape contain unsigned values indicating
              the horizontal and vertical offsets of the next pixel from the current position.

Absolute Mode

Absolute mode is signaled by the first byte in the pair being set to zero and

the second byte to a value between 0x03 and 0xFF. The second byte represents

the number of bytes that follow, each of which contains the color index of a

single pixel. Each run must be aligned on a word boundary, so a run with an odd number of bytes is followed by one padding byte. Following is an example of an 8-bit RLE bitmap (the two-digit hexadecimal values in the second column represent a color index for a single pixel):

Compressed data        Expanded data

03 04                  04 04 04
05 06                  06 06 06 06 06
00 03 45 56 67 00      45 56 67
02 78                  78 78
00 02 05 01            Move 5 right and 1 down
02 78                  78 78
00 00                  End of line
09 1E                  1E 1E 1E 1E 1E 1E 1E 1E 1E
00 01                  End of RLE bitmap
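
The encoded and absolute modes described above can be sketched as a small decoder. This is a minimal sketch and not the decoder used in this project: the function name rle8_decode and its parameters are hypothetical, rows are written in the order in which they appear in the stream (mapping them to the bottom-up layout of a DIB is left to the caller), and pixels skipped by a delta are assumed to keep color index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode BI_RLE8 data into a width*height buffer of color indexes.
   Returns 0 when the end-of-bitmap escape is reached, -1 on malformed
   or truncated input. */
int rle8_decode(const uint8_t *src, size_t src_len,
                uint8_t *dst, size_t width, size_t height)
{
    size_t i = 0, x = 0, y = 0;

    assert(src != NULL && dst != NULL);
    memset(dst, 0, width * height);            /* skipped pixels stay index 0 */

    while (i + 1 < src_len) {
        uint8_t count = src[i++];
        uint8_t value = src[i++];

        if (count > 0) {                       /* encoded mode: repeat value */
            if (y >= height || x + count > width) return -1;
            memset(dst + y * width + x, value, count);
            x += count;
        } else if (value == 0) {               /* escape: end of line */
            x = 0;
            y++;
        } else if (value == 1) {               /* escape: end of bitmap */
            return 0;
        } else if (value == 2) {               /* escape: delta (dx, dy) */
            if (i + 1 >= src_len) return -1;
            x += src[i++];
            y += src[i++];
        } else {                               /* absolute mode: literal run */
            size_t padded = (size_t)value + (value & 1u);  /* word alignment */
            if (i + padded > src_len) return -1;
            if (y >= height || x + value > width) return -1;
            memcpy(dst + y * width + x, src + i, value);
            x += value;
            i += padded;
        }
    }
    return -1;                                 /* data ended before end of bitmap */
}
```

Applied to the example stream above, the delta moves the position to column 18 of the second row, so the destination buffer in this sketch must be at least 20 pixels wide and 3 rows high.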


BITMAPFILEHEADER (3.0)

typedef struct tagBITMAPFILEHEADER { /* bmfh */
    WORD  bfType;
    DWORD bfSize;
    WORD  bfReserved1;
    WORD  bfReserved2;
    DWORD bfOffBits;
} BITMAPFILEHEADER;

The BITMAPFILEHEADER structure contains information about the type, size, and layout

of a device-independent bitmap (DIB) file.

The bfType specifies the type of file. This member must be BM (the ASCII characters 'B' and 'M').

The bfSize specifies the size of the file, in bytes.

The bfReserved1 is reserved; must be set to zero.

The bfReserved2 is reserved; must be set to zero.

The bfOffBits specifies the byte offset from the BITMAPFILEHEADER structure to the

actual bitmap data in the file.
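
Reading this header from a byte buffer can be sketched as follows. The names BmpFileHeader and parse_bmp_file_header are hypothetical; the sketch assumes the 14-byte packed, little-endian on-disk layout and checks only the signature and the reserved members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of BITMAPFILEHEADER (14 bytes on disk). */
typedef struct {
    uint16_t type;       /* bfType: the characters 'B', 'M' */
    uint32_t size;       /* bfSize */
    uint16_t reserved1;  /* bfReserved1 */
    uint16_t reserved2;  /* bfReserved2 */
    uint32_t off_bits;   /* bfOffBits */
} BmpFileHeader;

/* Little-endian readers, independent of the host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or not a BMP header. */
int parse_bmp_file_header(const uint8_t *buf, size_t len, BmpFileHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 14) return -1;

    out->type      = rd16(buf);
    out->size      = rd32(buf + 2);
    out->reserved1 = rd16(buf + 6);
    out->reserved2 = rd16(buf + 8);
    out->off_bits  = rd32(buf + 10);

    if (out->type != 0x4D42) return -1;        /* "BM" read as little-endian */
    if (out->reserved1 != 0 || out->reserved2 != 0) return -1;
    return 0;
}
```

Parsing byte by byte avoids depending on compiler structure padding, which would otherwise insert two bytes after the first member on most 32-bit compilers.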

BITMAPINFO (3.0)

typedef struct tagBITMAPINFO { /* bmi */
    BITMAPINFOHEADER bmiHeader;
    RGBQUAD          bmiColors[1];
} BITMAPINFO;

The BITMAPINFO structure fully defines the dimensions and color information for a

Windows 3.0 or later device-independent bitmap (DIB).

The bmiHeader specifies a BITMAPINFOHEADER structure that contains information

about the dimensions and color format of a DIB.


The bmiColors specifies an array of RGBQUAD structures that define the colors in the

bitmap.

BITMAPINFOHEADER (3.0)

typedef struct tagBITMAPINFOHEADER { /* bmih */
    DWORD biSize;
    LONG  biWidth;
    LONG  biHeight;
    WORD  biPlanes;
    WORD  biBitCount;
    DWORD biCompression;
    DWORD biSizeImage;
    LONG  biXPelsPerMeter;
    LONG  biYPelsPerMeter;
    DWORD biClrUsed;
    DWORD biClrImportant;
} BITMAPINFOHEADER;

The BITMAPINFOHEADER structure contains information about the dimensions and

color format of a Windows 3.0 or later device-independent bitmap (DIB).

The biSize specifies the number of bytes required by the BITMAPINFOHEADER

structure.

The biWidth specifies the width of the bitmap, in pixels.

The biHeight specifies the height of the bitmap, in pixels.

The biPlanes specifies the number of planes for the target device. This member must be set to 1.

The biBitCount specifies the number of bits per pixel. This value must be 1, 4, 8, or 24.

The biCompression specifies the type of compression for a compressed bitmap.

The various values and their meaning are as follows:


BI_RGB: Specifies that the bitmap is not compressed.

BI_RLE8: Specifies a run-length encoded format for bitmaps with 8 bits per pixel. The compression format is a 2-byte format consisting of a count byte followed by a byte containing a color index. For more information, see the Bitmap Compression section above.

BI_RLE4: Specifies a run-length encoded format for bitmaps with 4 bits per pixel. The compression format is a 2-byte format consisting of a count byte followed by a byte containing two 4-bit color indexes.

The biSizeImage specifies the size, in bytes, of the image. It is valid to set this member to

zero if the bitmap is in the BI_RGB format.
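
When biSizeImage is zero for a BI_RGB bitmap, the image size can be derived from the other members. The sketch below assumes the usual DIB rule, which is not stated in this section, that every scan line is padded to a multiple of 4 bytes. The function names are hypothetical, and the arithmetic is done in 32 bits, so very large dimensions would overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by one scan line, rounded up to a 32-bit boundary. */
uint32_t dib_row_bytes(uint32_t width, uint16_t bit_count)
{
    assert(bit_count == 1 || bit_count == 4 || bit_count == 8 || bit_count == 24);
    return ((width * bit_count + 31u) / 32u) * 4u;
}

/* Size of the pixel data: biSizeImage if given, otherwise computed. */
uint32_t dib_image_bytes(uint32_t width, uint32_t height,
                         uint16_t bit_count, uint32_t size_image)
{
    if (size_image != 0)
        return size_image;                     /* the header already states it */
    return dib_row_bytes(width, bit_count) * height;
}
```

For example, a 352 x 288 frame at 24 bits per pixel needs 1056 bytes per row and 304128 bytes in total, while a row of 2 pixels at 24 bits per pixel is padded from 6 to 8 bytes.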

The biXPelsPerMeter specifies the horizontal resolution, in pixels per meter, of the target

device for the bitmap. An application can use this value to select a bitmap from a resource

group that best matches the characteristics of the current device.

The biYPelsPerMeter specifies the vertical resolution, in pixels per meter, of the target

device for the bitmap.
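
Because resolutions are more often quoted in dots per inch, a conversion to and from pixels per meter may be useful. The following sketch uses the exact relation 1 inch = 0.0254 m with integer rounding; the function names are hypothetical, and unsigned values are used for simplicity although the structure members are declared as LONG.

```c
#include <assert.h>
#include <stdint.h>

/* pixels/meter = dpi / 0.0254 = dpi * 10000 / 254, rounded to nearest. */
uint32_t dpi_to_pels_per_meter(uint32_t dpi)
{
    assert(dpi <= 400000u);                    /* keeps dpi * 10000 within 32 bits */
    return (dpi * 10000u + 127u) / 254u;
}

/* dpi = pixels/meter * 0.0254 = pixels/meter * 254 / 10000, rounded to nearest. */
uint32_t pels_per_meter_to_dpi(uint32_t pels_per_meter)
{
    assert(pels_per_meter <= 16000000u);       /* keeps the product within 32 bits */
    return (pels_per_meter * 254u + 5000u) / 10000u;
}
```

With this rounding, 96 dpi corresponds to 3780 pixels per meter and 72 dpi to 2835 pixels per meter.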

The biClrUsed specifies the number of color indexes in the color table actually used by the

bitmap. If this value is zero, the bitmap uses the maximum number of colors corresponding

to the value of the biBitCount member. For more information on the maximum sizes of the

color table, see the description of the BITMAPINFO structure earlier in this topic.

If the biClrUsed member is nonzero, it specifies the actual number of colors that the

graphics engine or device driver will access if the biBitCount member is less than 24. If

biBitCount is set to 24, biClrUsed specifies the size of the reference color table used to

optimize performance of Windows color palettes. If the bitmap is a packed bitmap (that is, a

bitmap in which the bitmap array immediately follows the BITMAPINFO header and which

is referenced by a single pointer), the biClrUsed member must be set to zero or to the actual

size of the color table.


The biClrImportant specifies the number of color indexes that are considered important for

displaying the bitmap. If this value is zero, all colors are important.

RGBQUAD (3.0)

typedef struct tagRGBQUAD { /* rgbq */
    BYTE rgbBlue;
    BYTE rgbGreen;
    BYTE rgbRed;
    BYTE rgbReserved;
} RGBQUAD;

The RGBQUAD structure describes a color consisting of relative intensities of red, green,

and blue. The bmiColors member of the BITMAPINFO structure consists of an array of

RGBQUAD structures.

The rgbBlue specifies the intensity of blue in the color.

The rgbGreen specifies the intensity of green in the color.

The rgbRed specifies the intensity of red in the color.

The rgbReserved is not used; must be set to zero.

RGB (2.x)

COLORREF RGB(cRed, cGreen, cBlue)
    BYTE cRed;    /* red component of color   */
    BYTE cGreen;  /* green component of color */
    BYTE cBlue;   /* blue component of color  */

The RGB macro selects an RGB color based on the parameters supplied and the color

capabilities of the output device.

cRed Specifies the intensity of the red color field.

cGreen Specifies the intensity of the green color field.

cBlue Specifies the intensity of the blue color field.
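
The packing performed by the RGB macro, and its relation to an RGBQUAD entry, can be sketched as below. ColorRef, Quad and both function names are hypothetical stand-ins for the Windows types; the sketch assumes the usual COLORREF layout of 0x00bbggrr, with red in the lowest byte, whereas RGBQUAD stores blue first.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ColorRef;                     /* hypothetical stand-in for COLORREF */

typedef struct {                               /* hypothetical mirror of RGBQUAD */
    uint8_t blue, green, red, reserved;
} Quad;

/* Pack three intensities the way the RGB macro does: 0x00bbggrr. */
ColorRef make_colorref(uint8_t red, uint8_t green, uint8_t blue)
{
    return (ColorRef)red | ((ColorRef)green << 8) | ((ColorRef)blue << 16);
}

/* Unpack a ColorRef into palette-entry order (blue, green, red). */
Quad colorref_to_quad(ColorRef c)
{
    Quad q;
    assert((c >> 24) == 0);                    /* high byte assumed zero for an explicit RGB value */
    q.red      = (uint8_t)(c & 0xFF);
    q.green    = (uint8_t)((c >> 8) & 0xFF);
    q.blue     = (uint8_t)((c >> 16) & 0xFF);
    q.reserved = 0;
    return q;
}
```

The different byte orders are easy to confuse when a color table is built from values produced by the RGB macro, which is why the conversion is written out explicitly here.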


