motion compensation based video coder





Information Engineering and Technology Faculty

German University in Cairo

Motion Compensation Based Video Coder on a DSP Board

Bachelor Thesis

Author: Mohamed Ismail Mohamed

Supervisor: Dr. Gamal Fahmy

Submission Date: 13 July, 2009




This is to certify that:

(i) The thesis comprises only my original work towards the Bachelor Degree

(ii) Due acknowledgement has been made in the text to all other material used

Mohamed Ismail Mohamed

13 July , 2009




Abstract

The goal of video compression is to remove the redundancy in a video sequence while preserving its fidelity. A video sequence exhibits both temporal and spatial redundancy: 'temporal' due to the correlation between consecutive frames in the sequence, and 'spatial' due to the correlation between neighboring elements inside each frame. Motion estimation/compensation predicts frames to address temporal redundancy, while transform coding such as the discrete cosine transform removes spatial redundancy in the visual data. Consequently, the encoder uses fewer bits, allowing more efficient transmission and storage of the visual data.

This thesis has two major purposes: (1) to design a hybrid motion-compensated discrete cosine transform video coder based on the block matching algorithm, and (2) to investigate the effect of changing some video coding parameters and strategies on the visual quality of the reconstructed video and on the coder complexity. Three video sequences were used in the empirical part; the videos contain different scenes with various characteristics, to demonstrate the results and to highlight how the quality of each reconstructed sequence responds to changes in the coding parameters. The analysis showed that using a smaller block of the frame to search for in the reference frame always results in better quality; however, this requires dividing the frame into more blocks and is more complex. The results also showed that searching a larger region of the reference frame for a specific block gives a better chance of finding the best matching block. Furthermore, the results illustrated the exceptions, where quality does not improve for low-motion videos when the search region is increased. In addition, this thesis explores three different search strategies and compares their performance. Finally, the full coder was tested, including the discrete cosine transform applied to frames that are not predicted, to minimize their encoded bits and to observe the effect on the quality of the reconstructed sequences.




Dedication

To my Parents, Ismail Hafez and Safaa Moghazy

I am grateful to my supervisor, Dr. Gamal Fahmy, for all his support during the process.




Contents

Chapter 1 Introduction
1.1 Importance of Video Compression
1.2 Objective
1.3 Methodology
1.4 Thesis Organization

Chapter 2 Background
2.1 Digital Video
2.2 Objective Video Quality
2.3 Color Spaces
2.3.1 RGB
2.3.2 YCbCr
2.4 Chroma Sub-Sampling
2.5 Digital Video Formats and Applications

Chapter 3 Video Compression Fundamentals
3.1 Video Coding Standards
3.2 MPEG-2 Coding Standard
3.2.1 Group of Pictures
3.3 Motion Estimation and Compensation



List of Figures

Figure 1.1 A Typical Video Encoder
Figure 2.1 Example for an image along with its RGB components
Figure 2.2 Example for an image along with its YCbCr components
Figure 2.3 Chroma Subsampling different versions
Figure 3.1 MPEG Group of pictures
Figure 3.2 Video codec with prediction
Figure 3.3 Video codec with motion estimation and compensation
Figure 4.1 Block matching process
Figure 4.2 Full search 'Raster' and 'Spiral' algorithms
Figure 4.3 Fast search 'Logarithmic' algorithm
Figure 5.1 2-D DCT performed on an 8x8 block of an image
Figure 5.2 An image with the intensity map along with the compacted version
Figure 5.3 Inverse DCT of Trees; (a) DCT(100%); (b) DCT(75%); (c) DCT(50%); (d) DCT(25%)
Figure 6.1 Image for BF561 Hardware
Figure 6.2 Connector Locations
Figure 6.3 Visual DSP++ Release 5.0
Figure 6.4 Connection to Video In and Video Out devices
Figure 7.1 PSNR for {P} and {B} predicted frames using 'Logarithmic search'
Figure 7.2 PSNR for {P} and {B} predicted frames using 'Raster full search'
Figure 7.3 PSNR for {P} and {B} predicted frames using 'Spiral full search'


Figure 7.4 PSNR for predicted frames "Foreman video" using 'Raster full search' with different search window sizes
Figure 7.5 PSNR for {P} and {B} predicted "Stephan video" frames using 'Logarithmic search'
Figure 7.6 PSNR for {P} and {B} predicted "Stephan video" frames using 'Full search algorithms'
Figure 7.7 PSNR for {P} and {B} predicted "Stephan video" frames for different search windows using 'Raster full search algorithm'
Figure 7.8 PSNR for {P} and {B} predicted "Fish video" frames using 'Logarithmic fast search algorithm'
Figure 7.9 PSNR for {P} and {B} predicted "Fish video" frames using 'Raster full search algorithm'
Figure 7.10 PSNR for {P} and {B} predicted "Fish video" frames for different search windows using 'Raster full search algorithm'
Figure 7.11 Foreman video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.12 Foreman video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.13 Foreman video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.14 Stephan video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.15 Stephan video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.16 Stephan video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.17 Fish video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.18 Fish video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.19 Fish video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.20 Foreman video predicted frames macroblock size 1 "Raster search"
Figure 7.21 Foreman video predicted frames macroblock size 8 "Raster search"
Figure 7.22 Foreman video predicted frames macroblock size 16 "Raster search"
Figure 7.23 Stephan video predicted frames macroblock size 1 "Raster search"
Figure 7.24 Stephan video predicted frames macroblock size 8 "Raster search"
Figure 7.25 Stephan video predicted frames macroblock size 16 "Raster search"
Figure 7.26 Fish video predicted frames macroblock size 1 "Raster search"
Figure 7.27 Fish video predicted frames macroblock size 8 "Raster search"
Figure 7.28 Fish video predicted frames macroblock size 16 "Raster search"
Figure 7.29 PSNR values for "Foreman video" with search window 7 using 'Raster full search'
Figure 7.30 PSNR values for "Foreman video" with search window 15 using 'Raster full search'
Figure 7.31 PSNR values for "Foreman video" with search window 25 using 'Raster full search'
Figure 7.32 PSNR values for "Stephan video" with search window 7 using 'Raster full search'
Figure 7.33 PSNR values for "Stephan video" with search window 15 using 'Raster full search'
Figure 7.34 PSNR values for "Stephan video" with search window 25 using 'Raster full search'
Figure 7.35 PSNR values for "Fish video" with search window 7 using 'Raster full search'
Figure 7.36 PSNR values for "Fish video" with search window 15 using 'Raster full search'
Figure 7.37 PSNR values for "Fish video" with search window 25 using 'Raster full search'
Figure 7.38 PSNR for predicted frames using different 2D-DCT Compression Qualities
Figure 7.39 Foreman video predicted frames "NO DCT"
Figure 7.40 Foreman video predicted frames "DCT 36:64"
Figure 7.43 Foreman video predicted frames "DCT 1:64"
Figure 8.1 Block diagram for Search window size decision after motion is detected

List of Tables

Table 2.1 Video formats with each format specifications
Table 3.1 Digital video formats with no. of frames per second and bit rate




Abbreviations

ADSL Asymmetric Digital Subscriber Line

AVC Advanced Video Coding

B-frame Bi-directionally predicted frame

BDS Block Distortion Surface

BMA Block Matching Algorithm

CIF Common Intermediate Format

CMY Cyan, Magenta, Yellow

CMYK Cyan, Magenta, Yellow, Black

DCT Discrete Cosine Transform

DPCM Differential Pulse Code Modulation

DVD Digital Versatile Disk

GOP Group of Pictures

HDTV High Definition Television

I-frame Intra-coded frame

IDCT Inverse Discrete Cosine Transform

ISDN Integrated Services Digital Network

ISO International Organization for Standardization

ITU International Telecommunication Union

JPEG Joint Photographic Experts Group




MAE Mean Absolute Error

MC Motion Compensation

ME Motion Estimation

MSE Mean Squared Error

MPEG Moving Picture Experts Group

NTSC National Television System Committee

P-frame Predictive frame

PAL Phase Alternating Line

PSNR Peak Signal to Noise Ratio

QCIF Quarter Common Intermediate Format

RGB Red, Green, Blue

SAE Sum of Absolute Errors

SIF Source Intermediate Format

SDTV Standard Definition Television

UMTS Universal Mobile Telecommunications System

VDSL Very-high-bit-rate Digital Subscriber Line

YCbCr Luminance, Chrominance blue, Chrominance red



Chapter 1 Introduction


Chapter One

1. Introduction

1.1 Importance of video compression

Video communication is a rapidly evolving field with several applications, including video telephony, videoconferencing, remote surveillance, and remote working and learning. It is also a key feature of the upcoming information and communication technologies based on residential digital lines (VDSL, ADSL and ISDN) and the third generation of mobile telephony (UMTS). In this scenario, video compression plays a fundamental role in reducing the enormous bit rate required for transmission and storage. For example, a high-quality HDTV picture with a spatial resolution of 1920 x 1080 square pixels, digitized at 8 bits per color component (24 bits per pixel) and shown at 30 pictures per second, has an uncompressed bit rate of about 1.49 Gbit/s (1.3905 Gbit/s if a gigabit is counted as 2^30 bits). Consider also the Common Intermediate Format (CIF), the standard for video conferencing, which has a spatial resolution of 352x288. At 30 pictures per second and 8 bits per sample with 4:2:0 chroma sub-sampling (12 bits per pixel on average), the uncompressed bit rate is about 36.5 Mbit/s. Even for the smaller Quarter CIF (QCIF) format, the uncompressed bit rate is about 9.1 Mbit/s. An ISDN channel, for example, carries only 64 kbit/s, which means that without compression it is unrealistic to transmit such high-volume video data over a network or to store it [1] [2]. To this end, the ISO and ITU-T committees have worked on several compression standards such as JPEG, MPEG




1.3 Methodology

A typical encoder, shown in Figure 1.1, takes an input video signal as a sequence of pictures. These pictures are processed one by one, each divided into equal-sized, non-overlapping rectangular blocks ('macroblocks'), typically of 16x16 pixels. Ideally the frame dimensions are multiples of the block size; square blocks are the most common. If the frame is one that will be used as a reference for other frames (an 'intraframe'), it is coded without reference to any other frame: it passes through the transform coding and quantization blocks and is then transmitted to the receiver. Otherwise, if it is an 'interframe', it passes through the motion estimation and compensation blocks, where block matching algorithms search the reference frame for the best match and record its location as a motion vector pointing to it.

Block size affects the performance of compression techniques. The larger the block size, the fewer blocks there are in each frame, and hence the fewer motion vectors need to be transmitted. However, the borders of moving objects do not normally coincide with the borders of blocks, so larger blocks require more correction data to be transmitted. Small blocks result in a greater number of motion vectors, but each matching block is more likely to closely match its target, so less correction data is required. Block size thus represents a tradeoff between minimizing the number of motion vectors and maximizing the quality of the matching blocks. The relationship between block size, image quality and compression ratio has been the subject of much research and is well understood. Similarly, the size of the search region in the reference frame, the 'search window' (i.e. the number of candidate blocks to search), represents a tradeoff between finding the best match, and hence better quality, and the computational cost and time of an exhaustive search [5].
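The two tradeoffs above can be made concrete with a little arithmetic. The following is a minimal sketch, not the thesis implementation: the function names are hypothetical, and it assumes that the search window is expressed as a maximum displacement of ±p pixels in each direction, so that an exhaustive search examines (2p + 1)² candidate positions per block.

```python
def blocks_per_frame(width, height, block_size):
    """Number of non-overlapping square blocks in a frame.

    Each block carries one motion vector, so this is also the number
    of motion vectors per predicted frame.
    """
    if width % block_size or height % block_size:
        raise ValueError("frame dimensions must be multiples of the block size")
    return (width // block_size) * (height // block_size)


def candidates_per_block(search_range):
    """Candidate positions examined by an exhaustive search.

    Assumes displacements from -search_range to +search_range pixels
    in both the horizontal and the vertical direction.
    """
    return (2 * search_range + 1) ** 2


# CIF frame (352x288): halving the block size quadruples the motion vectors.
vectors_16 = blocks_per_frame(352, 288, 16)   # 22 * 18 = 396
vectors_8 = blocks_per_frame(352, 288, 8)     # 44 * 36 = 1584

# Roughly doubling the search range roughly quadruples the search cost.
cost_7 = candidates_per_block(7)     # 15 * 15 = 225
cost_15 = candidates_per_block(15)   # 31 * 31 = 961
```

Multiplying the two quantities gives a rough count of block comparisons per frame for a full search, which is why both parameters influence coder complexity.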



Chapter 2 Background


Chapter Two

2. Background

Video (from the Latin for “I see”) is a sequence of images referred to as “frames”; the number of still pictures per unit of time is called the frame rate. A higher frame rate generally gives a higher perceived video quality, and many standards specify 25 to 30 frames/sec. The main point is that the frame rate must exceed about 15 frames per second to achieve the illusion of a moving image.

A visual scene is continuous both spatially and temporally. To represent and process a visual scene digitally, it is necessary to sample the real scene spatially (typically on a rectangular grid, the “frame”, in the video image plane) and temporally (typically as a series of still frames sampled at regular intervals of time). Each frame element, known as a pixel, is represented digitally as one or more numbers that describe the brightness and color of the sample [6].





2.1 Digital Video

'Digital video' refers to the capture, manipulation and storage of video in digital formats. Digital video is obtained in one of two ways: (1) directly from digital cameras, or (2) by converting an analog video signal using both sampling and quantization. Video in the digital domain has several properties that make it preferable to analog video: it is less susceptible to noise, offers higher visual quality, allows advanced editing and processing, allows repeated reproduction without losses and, most importantly, allows better compression and encryption schemes. Before examining methods for compressing and transporting digital video, it is necessary to establish the concepts of video in the digital domain [7].

Digital video is visual information represented in a discrete form, suitable for digital electronic storage or transmission. This part describes concepts of digital video such as color spaces (RGB and YCbCr) and the measurement and quantification of visual quality. Video frames are formed using the tri-chromatic color mixing theory, which states that any color can be formed by mixing three primary colors (red, green and blue) in the right proportions. This is also how color monitors work: the primary color phosphors are excited by separate electron guns. The secondary colors, used by reflecting sources, are cyan, magenta and yellow (CMY). These colors are used in color printers, and black (K) is sometimes added to enhance print quality, which results in the CMYK model.




as color spaces. Two of the most common color spaces are RGB (red/green/blue) and YCbCr (luminance/blue chrominance/red chrominance).

2.3.1 RGB

In the red/green/blue color space, each pixel is represented by three numbers indicating the relative proportions of red, green and blue. Because the three components have equal importance to the final color, RGB systems usually represent each component with the same precision and therefore the same number of bits. Using 8 bits per component is quite common, so 3 × 8 = 24 bits are required to represent each pixel. Figure 2.1 shows an RGB image along with its separate R, G and B components. Note that the white snow consists of strong red, green and blue; the brown barn is composed of strong red and green with little blue; the dark green grass consists of strong green with little red or blue; and the light blue sky is composed of strong blue and moderately strong red and green [6] [8].

2.3.2 YCbCr

The human visual system is less sensitive to color than to luminance (brightness). The RGB system does not take advantage of this, since the three colors are equally important and the luminance is present in all three color components.

Figure 2.1 Example for an image along with its RGB components

It is possible to represent the color image more efficiently by separating the luminance from the color information. A popular color space of this type is Y:Cb:Cr. Y is the luminance component, a monochrome version of the color image, computed as a weighted average of the three components R, G and B:

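$$Y = k_r R + k_g G + k_b B \qquad (2.2)$$

where $k_r$, $k_g$ and $k_b$ are the weighting factors. The color information can be represented as color difference or chrominance components, where each chrominance component is the difference between one of R, G or B and Y:

$$C_r = R - Y \qquad (2.3)$$

$$C_b = B - Y \qquad (2.4)$$

$$C_g = G - Y \qquad (2.5)$$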

The complete description is given by Y and the three color differences Cr, Cb and Cg, which represent the variation in color intensity relative to the luminance of the image. Because Y is itself a fixed weighted sum of R, G and B, the three chrominance components are not independent: with weighting factors that sum to one, the correspondingly weighted sum of Cr, Cg and Cb is zero, so only two of the three chrominance components need to be transmitted. Figure 2.2 shows a color image and its Y, Cb and Cr components. Note that the Y image is essentially a greyscale copy of the main image; that the white snow is represented as a middle value in both Cr and Cb; that the brown barn is represented by weak Cb and strong Cr; that the green grass is represented by weak Cb and weak Cr; and that the blue sky is represented by strong Cb and weak Cr [6] [8].

Figure 2.2 Example for an image along with its YCbCr components
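The conversion described by equations (2.2) to (2.5) can be sketched as follows. This is an illustration under stated assumptions rather than the coder used in the thesis: the text does not give numeric weighting factors, so the widely used ITU-R BT.601 values are assumed here, the color differences are left unscaled and without offset, and the function names are hypothetical.

```python
# Assumed weighting factors (ITU-R BT.601); the thesis text does not state values.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Return (Y, Cb, Cr) with Y as a weighted average and unscaled differences."""
    y = KR * r + KG * g + KB * b   # luminance, equation (2.2)
    cb = b - y                     # blue color difference
    cr = r - y                     # red color difference
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Recover (R, G, B); G follows from Y, so Cg never has to be sent."""
    b = y + cb
    r = y + cr
    g = (y - KR * r - KB * b) / KG
    return r, g, b
```

A neutral color such as white (255, 255, 255) gives Cb and Cr values of approximately zero, which matches the observation that the snow sits at a middle value in both chrominance images. Practical systems additionally scale and offset Cb and Cr to fit an 8-bit range, which is omitted here.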




2.5 Digital Video Formats and Applications

Many digital video formats are in use today. For example, CIF (Common Intermediate Format) has a size of 352x288 and is color sampled using the 4:2:0 technique. CIF uses 30 frames per second, and its raw data rate of about 37 Mbps can be compressed to about 128-384 Kbps; CIF is used for video conferencing over ISDN/internet. QCIF is a quarter of CIF, with a size of 176x144, and also uses 4:2:0 color sampling and 30 frames per second; its raw data rate is about 9 Mbps, which can be compressed to about 64-128 Kbps. QCIF is used for video telephony over wired/wireless modems. The newer H.263 video codec standard improves on H.261 and can compress QCIF to about 20 Kbps with better quality than H.261.

The SIF (Source Intermediate Format) size is 352x240 at 30 frames per second and 352x288 at 25 frames per second. SIF also uses 4:2:0 sampling, with a raw data rate of 30 Mbps. This format is targeted at video applications that require medium quality, such as video games and CD movies. SIF is compressed using the MPEG-1 (Moving Picture Experts Group) technique to 1.1 Mbps and is used for intermediate-quality video distribution on VCD.

Table 2.1 Video formats with each format specifications

Video format   Size          Color sampling   Frame rate   Raw data (Mbps)
SIF            352x240/288   4:2:0            30/25 fps    30
CIF            352x288       4:2:0            30 fps       36.5
QCIF           176x144       4:2:0            30 fps       9.1





The last decade has seen a rapid increase in applications for digital video technology, and new, innovative applications continue to emerge, such as video conferencing, video telephony, remote learning, remote medicine, games and entertainment [6] [8] [11].




Chapter 3 Video Compression Fundamentals

Chapter Three

3. Video Compression Fundamentals

Video represented in digital form requires a large number of bits. The volume of data in this representation is too large for most storage and transmission systems, and it outpaces the continual increase in storage capacity and transmission bandwidth. Table 3.1 shows the uncompressed bit rates of several video formats. From this table it can be seen that even QCIF at 15 frames per second (low-quality video) requires 4.6 Mbps for transmission or storage.

Table 3.1 Digital video formats with no. of frames per second and bit rate

Format       Frames per second   Bit rate (uncompressed)
ITU-R 601    30 fps              216 Mbps
CIF          30 fps              36.5 Mbps
QCIF         15 fps              4.6 Mbps
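The CIF and QCIF entries can be reproduced with a short calculation. The sketch below is illustrative only; it assumes 8 bits per sample and 4:2:0 sampling, which stores on average 1.5 samples per pixel (one luminance sample plus two chrominance samples shared by every four pixels). The function name and parameters are hypothetical.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate in bits per second.

    samples_per_pixel is 1.5 for 4:2:0 sampling and 3 for full RGB
    or 4:4:4 sampling (assumed values, not taken from the thesis).
    """
    return width * height * samples_per_pixel * bits_per_sample * fps


cif_30 = raw_bit_rate(352, 288, 30)    # 36,495,360 bit/s, about 36.5 Mbps
qcif_15 = raw_bit_rate(176, 144, 15)   # 4,561,920 bit/s, about 4.6 Mbps
qcif_30 = raw_bit_rate(176, 144, 30)   # 9,123,840 bit/s, about 9.1 Mbps
```

The ITU-R 601 figure is not reproduced here because it depends on the sampling structure of that format, which the text does not detail.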





The need for video compression is now clear: there is a large gap between the high bit rate of uncompressed video data and the available capacity of transmission and storage systems. Video compression systems aim to reduce the amount of data required to store or transmit video while maintaining an acceptable level of video quality (described in Section 2.2); in lossy systems, higher compression generally results in a greater loss of quality [6].

3.1 Video Coding Standards

Most practical systems and standards for video compression are 'lossy' (the volume of data is reduced at the expense of a loss of visual quality). There are several video coding standards, such as:

• H.261:

– First video coding standard, targeted for video conferencing over ISDN

– Uses block-based hybrid coding framework with integer-pel MC

• H.263:

– Improved quality at lower bit rate, to enable video conferencing/telephony below 54 kbps

– Half-pixel MC and other improvement

• MPEG-1 video

– Video on CD and video on the Internet (good quality at 1.5 Mbps)

– Half-pixel MC and bidirectional MC

• MPEG-2 video

– SDTV/HDTV/DVD (4-15 Mbps)

– Extended from MPEG-1, considering interlaced video



• MPEG-4

– To enable object manipulation and scene composition at the decoder -> interactive TV/virtual reality

– Object-based video coding: new shape coding tools

– Coding of synthetic video and audio: animation tools

• MPEG-7

– To enable search and browsing of multimedia documents

– Defines the syntax for describing the structural and conceptual content

– To be covered later when discussing multimedia databases

These standards use several techniques such as:

DPCM (Differential Pulse Code Modulation)

Transform Coding

Predictive Coding

Model-based Coding

In predictive coding, also known as “motion-compensated prediction”, the encoder forms a model of the current frame based on the samples of a previously coded and transmitted frame. The encoder tries to compensate for the motion in a video sequence by moving and warping the samples of the previously transmitted “reference” frame. The resulting predicted frame is subtracted from the current frame to produce a residual “error” frame, and further coding, e.g. transform coding of the residual frame, always follows motion-compensated prediction [12].
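The subtraction step described above, and the matching addition at the decoder, can be sketched as follows. This is a simplified illustration with hypothetical function names: frames are plain lists of rows of integer samples, the predicted frame is taken as given, and the residual is assumed to be passed on without quantization, so reconstruction is exact.

```python
def residual_frame(current, predicted):
    """Encoder side: residual = current frame minus predicted frame."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


def reconstruct_frame(predicted, residual):
    """Decoder side: current frame = predicted frame plus residual."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]


def residual_energy(residual):
    """Sum of absolute residual values; smaller means a better prediction."""
    return sum(abs(value) for row in residual for value in row)
```

A good prediction leaves a residual with mostly small values, which is what makes the subsequent transform coding effective. In a lossy coder the residual is quantized, so the reconstructed frame only approximates the original.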





3.2 MPEG-2 Coding Standard

MPEG-2 is a video coding standard created by the Moving Picture Experts Group (MPEG). Now, it is

the standard format used for satellite TV, digital cable TV, DVD movies, and HDTV. In addition, MPEG-

2 is a commonly used format to distribute video files on the internet [12] [13].

MPEG-2 is an evolution of MPEG-1, an earlier MPEG coding standard; in fact, an MPEG-2 decoder can decode MPEG-1 video. The additions in MPEG-2, therefore, are what make it a separate standard. The major additions are:

• Support for higher resolution video
• Support for interlaced video (as used on standard definition TV (SDTV))
• Optimization for higher bit rates (typically 4 Mb/s and above, versus 1.5 Mb/s and below for MPEG-1)
• Scalability via layered encoding to support a variety of quality levels/transmission bandwidths from one coded source

MPEG-2 Compression:

Color Space: YCbCr

Chroma Sub-sampling: 4:2:0

http://www.chiariglione.org/mpeg/



Block based coding: MPEG-2 uses block-based coding for motion estimation and compensation. This means that a frame is not encoded as a whole; it is divided into many independently coded blocks. A macroblock covers 16x16 pixels and is the basic unit of MPEG-2 coding. Each macroblock is further divided into 8x8 pixel blocks; with 4:2:0 chroma sub-sampling this results in 6 blocks per macroblock (four luminance blocks, one Cb block and one Cr block).

Types of Frames:

1. I-frame: Intra-coded frame, coded independently of all the other frames in the sequence. I-frames are the most important frames in the sequence: they are used as references for other frames and are compressed using only transform coding (“DCT”), giving moderate compression performance.

2. P-frame: Predictively coded frame, coded based on a previously coded frame that precedes it. The MPEG-2 standard dictates that this past frame must be an I- or P-frame, not a B-frame. Coding is achieved using motion vectors. The basic idea is to match each macroblock in the current frame with the corresponding area in the past reference frame as closely as possible.

3. B-frame: Bi-directionally predicted frame, coded based on previously coded frames (I- or P-frames) that precede or succeed the current frame in the temporal order of the image sequence. A B-frame is simply a more general version of a P-frame: motion vectors can refer not only to a past frame but also to a future frame, or to both a past and a future frame. Using a future frame works exactly like a P-frame except that the reference lies in the future. Using past and future frames together works by averaging the predicted past macroblock with the predicted future macroblock. The main advantage of B-frames is coding efficiency: in most cases, B-frames result in fewer bits being coded overall. Backward prediction also allows the encoder to make more intelligent decisions on how to encode the video in areas that are better predicted from a future frame. In addition, since B-frames are not used to predict other frames, the errors they generate are not propagated further within the sequence. One disadvantage is that the frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the two reference frames; this is almost never an issue for the relatively expensive encoder. Another disadvantage is that there is necessarily a delay throughout the system, because the frames are delivered out of order [6] [9] [12] [13].
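The three prediction options available to a B-frame macroblock can be sketched as below. This is an illustrative example, not the MPEG-2 reference procedure: the function names are hypothetical, blocks are lists of rows of integer samples, the average is assumed to round half up, and the mode is chosen by the smallest sum of absolute errors (SAE), which is one plausible selection criterion.

```python
def average_blocks(past_block, future_block):
    """Bidirectional prediction: per-sample average, rounding half up (assumed)."""
    return [[(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
            for past_row, future_row in zip(past_block, future_block)]


def sae(block_a, block_b):
    """Sum of absolute errors between two blocks of equal size."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_b_mode(current, past_block, future_block):
    """Return the name of the prediction mode with the smallest SAE."""
    candidates = {
        "forward": past_block,
        "backward": future_block,
        "bidirectional": average_blocks(past_block, future_block),
    }
    return min(candidates, key=lambda mode: sae(current, candidates[mode]))
```

Here `past_block` and `future_block` stand for the already motion-compensated best matches from the two reference frames; how those matches are found is a separate step.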

3.2.1 Group of Pictures

An I-frame together with all the frames before the next I-frame is referred to as a group of pictures (GOP). There are various possible GOP structures: first, [IIIIII...], which uses no temporal prediction and needs a high bit rate; second, [IBIBIB...], which uses a lower bit rate than the all-I-frame structure; third, [IBBPBBPB...], shown in Figure (3.1), which uses forward and bi-directional prediction and gives the best compression but needs large decoder memory; and finally [IPPIPPIP...], which uses only forward prediction and needs less decoder memory [12] [13].
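To see why B-frames cause frames to be delivered out of order, the following Python sketch reorders a GOP from display order into a possible coding order. The function name and the simple rule (code each I- or P-frame before the B-frames that precede it in display order) are assumptions for illustration, not a description of any particular encoder.

```python
def coding_order(display_types):
    """Return display indices in the order the frames would be coded."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)   # must wait for its future reference
        else:
            order.append(index)       # I or P: code the reference first
            order.extend(pending_b)   # then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)           # trailing B-frames (would need the next GOP)
    return order


print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

The decoder has to hold both reference frames until the intermediate B-frames are reconstructed, which is the memory cost mentioned above.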

Figure 3.1 MPEG Group of pictures





Frame differencing gives good compression performance when successive frames are very similar, but does not perform well if there is a significant change between the current and previous frames. Such changes are usually due to movement in the video scene, and a significantly better prediction can be achieved by estimating this movement and compensating for it. Figure (3.3) shows a video codec with motion prediction [15]. Two new steps are required in the encoder:

1. Motion estimation: a region of the current frame is compared with neighboring regions of the previous frame; the motion estimator attempts to find the best-matching macroblock.

2. Motion compensation: the best-matching macroblock from the reference frame is subtracted from the current macroblock.

The decoder performs the same motion compensation operation to reconstruct the current frame. This means that the encoder has to transmit the coordinates of the best-matching macroblock (usually called the motion vector) to the decoder [15].
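The motion compensation operation shared by encoder and decoder can be sketched in Python as follows. This is only a minimal illustration: the function name, the (dx, dy) vector convention and the assumption that every vector points inside the reference frame are ours.

```python
def motion_compensate(reference, motion_vectors, block_size):
    """Build a predicted frame by copying displaced blocks from the reference.

    motion_vectors[r][c] is the (dx, dy) vector of the block in block-row r,
    block-column c; vectors are assumed to stay inside the reference frame.
    """
    height, width = len(reference), len(reference[0])
    compensated = [[0] * width for _ in range(height)]
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            dx, dy = motion_vectors[top // block_size][left // block_size]
            for y in range(block_size):
                for x in range(block_size):
                    compensated[top + y][left + x] = reference[top + dy + y][left + dx + x]
    return compensated


# Hypothetical 4x4 reference frame and one vector per 2x2 block.
reference = [[4 * y + x for x in range(4)] for y in range(4)]
vectors = [[(0, 0), (-2, 0)], [(0, -2), (0, 0)]]
for row in motion_compensate(reference, vectors, 2):
    print(row)
```

Subtracting this predicted frame from the current frame gives the difference frame that is actually coded.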

Figure 3.3 Video codec with motion estimation and compensation




Chapter Four

4. Block Matching Algorithm

In the popular video coding standards (H.261, H.263, MPEG-1, MPEG-2 and MPEG-4), motion estimation and compensation are carried out on small non-overlapping regions ("blocks") of the current frame. Motion estimation on a complete block is known as the block matching algorithm (BMA).

For each block of a certain size in the current frame, the motion estimation algorithm searches a neighboring area of the reference frame for a 'matching' area of the same block size. The best match is the one that minimizes the energy of the difference between the current block and the matching block. The search area may be centered on the position of the current block, because (a) it is likely to contain a good match due to the high correlation between subsequent video frames and (b) it would be computationally intensive to search the whole reference frame.





Figure (4.1) illustrates the block matching process. The current 'block' in this case is 3x3 pixels, and it is compared with the same position in the reference frame (5x5) and with the immediately neighboring positions (+/−1 pixel in each direction). The mean squared error (MSE) between the current block and the same position in the reference frame, position (0,0), is given by the equation in the figure as 2.44; the figure also shows the complete set of MSE values for each search position. Of the candidate positions available, (-1,1) gives the smallest MSE and is therefore the best match [13] [14].

A video encoder carries out this process for each block in the current frame using the following steps:

1. Calculate the energy of the difference between current block and a set of neighboring blocks

in the reference frame.

2. Select the block that gives the lowest error (for example, the MSE).

3. Subtract the matching block from the current block, producing the difference block.

4. Encode and transmit the difference block.

5. Encode and transmit a 'motion vector' that indicates the position of the matching region relative to the current block position (in the above example, the motion vector is (-1, 1)).

Steps 1 and 2 correspond to motion estimation and step 3 to motion compensation.

The video decoder reconstructs the block as follows:

1. Decode the difference block and the motion vector.

2. Add the difference block to the matching region (pointed to by the motion vector) in the reference frame.
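The encoder steps 3 to 5 and the two decoder steps above can be sketched for a single block as follows. This is a minimal Python illustration in which the function names, the (dx, dy) vector convention and the sample values are assumptions; entropy coding of the difference block and the vector is left out.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_block(current, reference, top, left, size, mv):
    """Encoder side: difference between the current block and its match."""
    dx, dy = mv
    cur = get_block(current, top, left, size)
    match = get_block(reference, top + dy, left + dx, size)
    return [[c - m for c, m in zip(cr, mr)] for cr, mr in zip(cur, match)]


def decode_block(residual, reference, top, left, size, mv):
    """Decoder side: add the difference block to the region the vector points to."""
    dx, dy = mv
    match = get_block(reference, top + dy, left + dx, size)
    return [[r + m for r, m in zip(rr, mr)] for rr, mr in zip(residual, match)]


# Hypothetical frames: only the 2x2 block at (top=1, left=1) of the current frame is used.
reference = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
current = [[0, 0, 0, 0], [0, 52, 61, 0], [0, 91, 102, 0], [0, 0, 0, 0]]
mv = (-1, 0)  # assumed to be the result of motion estimation
residual = encode_block(current, reference, 1, 1, 2, mv)
print(residual)                                        # [[2, 1], [1, 2]]
print(decode_block(residual, reference, 1, 1, 2, mv))  # [[52, 61], [91, 102]]
```

Without quantization of the difference block the reconstruction is exact; in the hybrid coder the loss comes from the transform coding stage.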

Figure 4.1 Block matching process





4.1 Block Matching Algorithm Comparison Criteria:

Mean squared error provides a measure of the energy remaining in the difference block and can

be calculated for (N x N) sample block as:

$$MSE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left(C_{ij} - R_{ij}\right)^2 \tag{4.1}$$

where $C_{ij}$ and $R_{ij}$ are the samples of the current and reference blocks, and $C_{00}$, $R_{00}$ are the top-left samples in the current and reference blocks.

Mean absolute error (MAE) provides a reasonable approximation of the remaining energy and is much easier to calculate than MSE, since it requires a magnitude calculation instead of a square calculation for each pair of samples, as shown in the equation:

$$MAE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left|C_{ij} - R_{ij}\right| \tag{4.2}$$

The comparison may be simplified further by removing the term $1/N^2$ and simply calculating the sum of absolute errors (SAE), also called the sum of absolute differences (SAD):

$$SAE = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left|C_{ij} - R_{ij}\right| \tag{4.3}$$
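The three criteria can be compared on a small example. The Python sketch below follows the definitions of MSE, MAE and SAE given in this section; the function names and the 2x2 sample blocks are hypothetical.

```python
def sae(current, reference):
    """Sum of absolute errors between two N x N blocks."""
    return sum(abs(c - r) for cr, rr in zip(current, reference) for c, r in zip(cr, rr))


def mae(current, reference):
    """Mean absolute error: SAE divided by the number of samples N^2."""
    n = len(current)
    return sae(current, reference) / (n * n)


def mse(current, reference):
    """Mean squared error between two N x N blocks."""
    n = len(current)
    total = sum((c - r) ** 2 for cr, rr in zip(current, reference) for c, r in zip(cr, rr))
    return total / (n * n)


current = [[1, 2], [3, 4]]
reference = [[1, 4], [2, 4]]
print(sae(current, reference), mae(current, reference), mse(current, reference))  # 3 0.75 1.25
```

Since SAE differs from MAE only by the constant factor 1/N^2, both always select the same candidate block; MSE weights large errors more heavily and may occasionally prefer a different one.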





Figure 4.2 Full search ‘Raster’ and ‘Spiral’ algorithms

4.2 Search Algorithm for Motion Estimation

In theory, finding the best matching region in the reference frame means comparing the current block with every possible candidate in the reference frame, which is impractical because of the large number of comparisons required. In practice, a good match for the current block can usually be found in the immediate neighborhood of the block position in the reference frame. Hence, the search for a matching region is limited to a "search window" centered on the current block position. The optimum search window size depends on several factors: (1) the resolution of each frame (a larger window for higher resolution), (2) the type of scene (high-motion scenes benefit from a larger search window) and (3) the available processing resources, as a larger window requires more comparisons and therefore more processing.

4.2.1 Full Search Block Matching Algorithm



This type of search calculates the comparison criterion at every available position in the search window, which is computationally intensive, especially for large search windows. Full search motion estimation processes the locations either in raster order, starting from the top-left location, or in a spiral order starting from position (0, 0); both orders are shown in Figure (4.2). The spiral search order has an advantage over the raster order when early termination algorithms are used, because the best match is most likely to be near the center of the search region. Due to the intensive computations required by the full search, various fast algorithms have been developed, which trade off estimation accuracy for reduced computation [6] [12] [13] [14].
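A minimal Python sketch of the full search with both visiting orders is given below. All names are hypothetical, the "spiral" is approximated by visiting rings of increasing distance from (0, 0), and early termination is simplified to stopping at a perfect match (SAE of zero); a practical coder would use a more elaborate stopping rule.

```python
def raster_offsets(radius):
    """Candidate (dx, dy) offsets row by row, starting at the top-left corner."""
    return [(dx, dy) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def spiral_offsets(radius):
    """Approximate spiral: rings of growing distance, starting at (0, 0)."""
    return sorted(raster_offsets(radius), key=lambda o: max(abs(o[0]), abs(o[1])))


def sae_at(current, reference, top, left, size, dx, dy):
    """SAE between the current block and the reference block displaced by (dx, dy)."""
    return sum(abs(current[top + y][left + x] - reference[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def full_search(current, reference, top, left, size, offsets):
    """Test the candidates in the given order; return (best vector, best SAE)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dx, dy in offsets:
        if not (0 <= top + dy <= height - size and 0 <= left + dx <= width - size):
            continue  # candidate block would fall outside the reference frame
        cost = sae_at(current, reference, top, left, size, dx, dy)
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (dx, dy), cost
            if cost == 0:
                break  # simplified early termination on a perfect match
    return best_mv, best_cost


# Hypothetical 6x6 reference; the current 2x2 block at (2, 2) equals the
# reference block one pixel to the left and one pixel down.
reference = [[6 * y + x for x in range(6)] for y in range(6)]
current = [[0] * 6 for _ in range(6)]
current[2][2], current[2][3], current[3][2], current[3][3] = 19, 20, 25, 26
print(full_search(current, reference, 2, 2, 2, raster_offsets(1)))  # ((-1, 1), 0)
print(full_search(current, reference, 2, 2, 2, spiral_offsets(1)))  # ((-1, 1), 0)
```

Both orders test the same candidate set, so without early termination they return the same minimum error; only the order of evaluation, and hence the point at which a search can stop early, differs.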

4.2.2 Fast Search Block Matching Algorithm

Fast search algorithms aim to reduce the number of comparison operations compared with the full search algorithm; examples include the logarithmic search, three-step search, cross search, one-at-a-time search, nearest neighbors search and hierarchical search. A fast search samples only some of the possible locations in the search region. As a result, the difference block typically contains more energy than that found by the full search, so the number of coded bits generated by the video encoder increases, giving poorer compression performance than the full search.

4.2.2.1 Logarithmic Search Strategy

The logarithmic search is a popular technique that starts from the position corresponding to zero displacement; each step tests five points in a diamond arrangement. In the next step, the diamond search is repeated with its center shifted to the best matching point from the previous step, skipping any candidate position that lies outside the search window. The step size of the search (the radius of the diamond) is reduced if the best matching point is the center itself or if it lies on the border of the maximum search range; otherwise the step size stays the same. The logarithmic search is typically accurate for large search windows and quickly returns a result of reasonable quality [6] [12] [13] [14].
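The search strategy described above can be sketched in Python as follows. The matching error is passed in as a hypothetical callback `cost(dx, dy)` (for example the SAE at that displacement), and halving the step size as well as stopping once the step drops below one pixel are assumptions of this sketch; some variants finish with an extra search of all neighbors at step one.

```python
def logarithmic_search(cost, search_range, step):
    """Diamond-pattern logarithmic search; returns (best vector, best cost)."""
    cx, cy = 0, 0                      # start at zero displacement
    best_cost = cost(cx, cy)
    while step >= 1:
        bx, by = cx, cy
        for dx, dy in ((cx + step, cy), (cx - step, cy), (cx, cy + step), (cx, cy - step)):
            if abs(dx) > search_range or abs(dy) > search_range:
                continue               # outside the search window: not tested
            candidate_cost = cost(dx, dy)
            if candidate_cost < best_cost:
                bx, by, best_cost = dx, dy, candidate_cost
        on_border = max(abs(bx), abs(by)) == search_range
        if (bx, by) == (cx, cy) or on_border:
            step //= 2                 # reduce the radius of the diamond
        cx, cy = bx, by                # shift the center to the best point
    return (cx, cy), best_cost


# Hypothetical smooth error surface with its minimum at displacement (3, -2).
def example_cost(dx, dy):
    return (dx - 3) ** 2 + (dy + 2) ** 2


print(logarithmic_search(example_cost, search_range=7, step=4))  # ((3, -2), 0)
```

On a smooth error surface such as this one the search reaches the minimum after testing only a small fraction of the 225 positions of a ±7 window; on real video the error surface can have local minima, which is where the accuracy loss relative to the full search comes from.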

Figure 4.3 Fast search ‘Logarithmic’ algorithm




Chapter Five

5. Transform Coding

Transform coding is a central component of most video coding systems and standards. Spatial image data (image samples or motion-compensated residual samples) are transformed into a different representation. The reason is that spatial image data is difficult to compress: neighboring samples are highly correlated and the energy is distributed across the image, which makes it difficult to discard data or even reduce the precision of data without degrading the image quality. The transform should compact the image energy (concentrate the energy into a small number of significant values), decorrelate the data (so that discarding insignificant data has minimal effect on the image quality) and be suitable for practical implementation in software and hardware.





5.1 Two Dimensional Discrete Cosine Transform (2-D DCT)

The 2-D DCT transforms a 2-D block of samples into a block of coefficients. Figure (5.1) shows a 720x572 pixel image from which an 8x8 block is taken; the next step shows the sample values of the block, and finally the block is transformed with the 2-D DCT to produce the coefficients shown in the last part.

The compaction and decorrelation performance of the DCT increases with the block size, but so does the computational complexity. A block size of 8x8 is commonly used in image and video coding applications, as it gives a good compromise between compression efficiency and computational efficiency. Equation (5.1) is used to calculate the forward DCT for an 8x8 block of image samples [16].

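$$F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16} \tag{5.1}$$

The inverse DCT reconstructs a block of image samples from an array of DCT coefficients. The IDCT takes as its input a block of 8x8 DCT coefficients $F(u,v)$ and reconstructs a block of 8x8 image samples $f(i,j)$, Equation (5.2).

$$f(i,j) = \sum_{u=0}^{7} \sum_{v=0}^{7} \frac{C(u)\,C(v)}{4} F(u,v) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16} \tag{5.2}$$

where $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.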
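As a check on the forward and inverse transforms, the following Python sketch implements the standard 8x8 DCT and IDCT directly from their definitions. The names `dct2_8x8`, `idct2_8x8`, `f` and `F` are chosen here for illustration; the direct double sum is slow and is only meant to make the formulas concrete, not to replace a built-in routine.

```python
import math

N = 8


def c(k):
    """Normalization factor: 1/sqrt(2) for the zero-frequency index, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, u):
    """One-dimensional cosine basis value for sample i and frequency u."""
    return math.cos((2 * i + 1) * u * math.pi / 16)


def dct2_8x8(f):
    """Forward 2-D DCT of an 8x8 block f[i][j] of samples."""
    return [[c(u) * c(v) / 4 * sum(f[i][j] * basis(i, u) * basis(j, v)
                                   for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]


def idct2_8x8(F):
    """Inverse 2-D DCT of an 8x8 block F[u][v] of coefficients."""
    return [[sum(c(u) * c(v) / 4 * F[u][v] * basis(i, u) * basis(j, v)
                 for u in range(N) for v in range(N))
             for j in range(N)] for i in range(N)]


# A flat block has all of its energy in the top-left (DC) coefficient.
flat = [[100] * N for _ in range(N)]
coefficients = dct2_8x8(flat)
print(round(coefficients[0][0], 6))  # 800.0
```

For the flat block the DC coefficient is (1/2)/4 times the sum of the 64 samples, i.e. 0.125 x 6400 = 800, and all other coefficients vanish, which is an extreme case of energy compaction.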

Figure 5.1 2-D DCT performed on an 8x8 block of an image





Figure (5.2) shows the intensity map for a block of image samples followed by its 2-D DCT coefficients. It shows that the energy in the transformed coefficients is concentrated around the top-left corner of the coefficient array ("compaction"). The top-left coefficients correspond to low frequencies: the energy peaks in this area and the values decrease towards the bottom right of the array (higher-frequency coefficients) [17].

5.2 Quantization

The function of the coder is to transmit the DCT block to the decoder in a bit-rate-efficient manner, so that the decoder can perform the inverse transform to reconstruct the image. It has been observed that the numerical precision of the DCT coefficients may be reduced while still maintaining good image quality at the decoder. Quantization is used to reduce the number of possible values to be transmitted, reducing the required number of bits. In practice, the high-frequency coefficients are quantized more coarsely than the low-frequency coefficients. Note that the quantization noise introduced by the coder is not reversible in the decoder, making the coding and decoding process 'lossy'. At quality 50 (i.e. 84% zeros) there is almost no visible loss in the image, yet the compression is high. At lower quality levels, the quality drops considerably while the compression increases only slightly [16] [17].

Figure 5.2 An image with the intensity map along with the compacted vers (panels: Intensity map, DCT coefficients)





This part shows that the DCT exploits interpixel

redundancies to render excellent decorrelation for most

natural images. Thus, all (uncorrelated) transform

coefficients can be encoded independently without

compromising coding efficiency. In addition, the DCT

packs energy in the low frequency regions. Therefore,

some of the high frequency content can be discarded

without significant quality degradation. Such a

quantization scheme causes further reduction in the

average number of bits per pixel. Lastly, it is concluded

that successive frames in a video transmission exhibit

high temporal correlation. This correlation can be

employed to improve coding efficiency [16] [17].

Figure 5.3 Inverse DCT of Trees; (a) DCT(100%); (b) DCT(75%); (c) DCT(50%); (d) DCT(25%).




Chapter Six

6. Analog Devices Hardware & Software Experience

The EZ-KIT Lite includes an ADSP-BF561 Processor desktop

evaluation board along with an evaluation suite of the

VisualDSP++® development and debugging environment with

the C/C++ compiler, assembler, and linker. It also includes

sample processor application programs.

Figure 6.1 Image for BF561 Hardware





6.1 ADZS-BF561-EZLITE®

ADSP-BF561 Blackfin processor (600 MHz)

SDRAM: 64 MB

Flash memory: 8 MB

AD1836A – Analog Devices 96 kHz audio codec

4 input RCA phono jacks (2 channels)

6 output RCA phono jacks (3 channels)

ADV7183A video decoder w/ 3 input RCA phono jacks

ADV7179 video encoder w/ 3 output RCA phono jacks

Universal asynchronous receiver/transmitter (UART)

20 LEDs: 1 power (green), 1 board reset (red), 1 USB (red), 16 general purpose (amber), and 1 USB monitor (amber)

5 push buttons with debounce logic: 1 reset, 4 programmable flags

Expansion interface

JTAG ICE 14-pin header

Figure 6.2 Connector Locations





6.2 VisualDSP++® Release 5.0

The ADSP-BF561 is supported with a complete set of CROSSCORE® software and hardware

development tools, including Analog Devices emulators and the VisualDSP++® development

environment. The same emulator hardware that supports other Analog Devices processors also fully

emulates the ADSP-BF561. The VisualDSP++ project management environment lets programmers

develop and debug an application. This environment includes an easy to use assembler that is based on an

algebraic syntax, an archiver (librarian/library builder), a linker, a loader, a cycle-accurate instruction-

level simulator, a C/C++ compiler, and a C/C++ runtime library that includes DSP and mathematical

functions. A key point for these tools is C/C++ code efficiency. The compiler has been developed for

efficient translation of C/C++ code to Blackfin assembly.

VisualDSP++ Features:

The Blackfin processor has architectural features that improve the efficiency of compiled C/C++ code.

The VisualDSP++ debugger has a number of important features. Data visualization is enhanced by a

plotting package that offers a significant level of flexibility. This graphical representation of user data

enables the programmer to quickly determine the performance of an algorithm. As algorithms grow in

complexity, this capability can have increasing significance on the designer‟s development schedule,

increasing productivity. Statistical profiling enables the programmer to nonintrusively poll the processor

as it is running the program. This feature, unique to VisualDSP++, enables the software developer to

passively gather important code execution metrics without interrupting the real-time characteristics of the

program. Essentially, the developer can identify bottlenecks in software quickly and efficiently. By using

the profiler, the programmer can focus on those areas in the program that impact performance and take

corrective action.





Debugging both C/C++ and assembly programs with the VisualDSP++ debugger, programmers can:

• View mixed C/C++ and assembly code (interleaved source and object information).

• Insert breakpoints.

• Set conditional breakpoints on registers, memory, and stacks.

• Trace instruction execution.

• Perform linear or statistical profiling of program execution.

• Fill, dump, and graphically plot the contents of memory.

• Perform source level debugging.

• Create custom debugger windows

Figure 6.3 Visual DSP++ Release 5.0





6.3 Implementation and Testing

The ADSP-BF561 EZ-KIT Lite evaluation board was tested together with the VisualDSP++ software using a video pass-through application: the board is supplied with a PAL or NTSC video signal and buffers the data in SDRAM, and the buffered video frame is then sent out to the video monitor. In this application, no processing is done on the buffered video frames. Connect the board to the power supply and to the PC with the USB cable provided, then follow these steps to test the application:

1. ADSP-BF561 EZ-KIT LITE SETTINGS

SW2: 1-OFF 2-OFF 3-OFF 4-OFF 5-OFF 6-ON

SW3: 1-OFF 2-ON 3-ON 4-OFF

SW4: 1-ON 2-ON 3-ON 4-ON 5-OFF 6-OFF

SW5: 1-OFF 2-ON 3-ON 4-ON

SW10: 1-OFF 2-OFF 3-OFF 4-OFF 5-OFF 6-OFF

SW11: 1-OFF 2-OFF 3-OFF 4-OFF

SW12: 1-ON 2-ON 3-ON 4-ON

SW13: 1-ON 2-ON





2. External connections

Connect a monitor to the EZ-Kit video-out connector and a video source to the EZ-Kit video-in.

The video connectors are the bank of 6 RCA-style jacks nearest the serial cable connector on the

EZ-Kit labeled as J6.

3. Operational Description

Open the "VideoInVideoOut.dpj" project in the VisualDSP++ Integrated Development

Environment.

Under the "Project" tab, select "Build Project" (program is then loaded automatically into

DSP).

Run the executables by pressing "multiprocessor run" (CTRL-F5) on the toolbar.

Halt the processor ("multiprocessor halt" button). If you open a memory window and go to

the addresses of sFrame0, 1, 2, 3, you see the video data of the four frames.

Figure 6.4 Connection to Video In and Video Out devices




Chapter Seven

7. Experiments & Analysis

In this part, some of the properties and enhancements of the MPEG-2 video compression standard are tested using MATLAB®, a high-level language and interactive environment intended for performing computationally intensive tasks faster than with traditional programming languages. The first step is to load a video into MATLAB and divide it into frames, as mentioned before. The important next step is to divide each of these frames into equal parts, "macroblocks", which are the small elements that undergo each operation until the end.

7.1 Exact Procedure

[1] Use the 'fopen' command to open the video file, then set up the frame components according to the 4:2:0 luminance-to-chrominance sampling ratio. Calculate the resulting frame size from the luminance and chrominance plane sizes, and obtain the number of frames by dividing the file size by this frame size.





[2] The GOP used for this test is [IBBPBBPBBI]; the next step is therefore to classify each frame by type so that each frame is easy to call throughout the process. The 'fread' command reads the binary data of the opened video file into matrices, and 'fseek' is then used to move between video frames and classify them.

[3] In this step the P- and B-frames pass through motion estimation and compensation. The main task, as mentioned before, is to obtain the motion vectors for each frame along with the difference frame. The motion estimation function takes as input the current frame to be coded, the reference frame, the type of search, the macroblock size and the search window size. The search is divided into three branches: raster, spiral and logarithmic search.

Raster search function: This function first calculates the number of macroblocks in each frame and then, for each macroblock, moves through the fixed search window in raster order, position by position from the beginning of the search window to the end, to find the minimum-difference macroblock and obtain the motion vector.

Spiral search function: Operates in the same way as the raster search function; the only difference is that it visits the candidate positions in a spiral order starting from the current macroblock's own location, which is the center of the search window. This has a computational advantage because the best match is likely to occur near the center of the search window.

Logarithmic search function: Searches for the best-matching block in the logarithmic way described in part (4.2.2.1). Unlike the raster and spiral full searches, the logarithmic search is a fast search technique. It does not take the search window into account and may search the whole frame for the best match; instead of the search window it takes another input parameter, the number of steps in each move, "N".

[4] The second main part of the experiment is motion compensation, which takes as input the calculated motion vectors, the macroblock size and the reference frame. It creates a new frame of the same size as the reference frame, initially filled with zeros, divides it into non-overlapping macroblocks and finally fills each one with the matching macroblock from the reference frame.

[5] The peak signal-to-noise ratio (PSNR) function is the basis for all analysis and measurements. It takes two frames, the current frame and the compensated frame, and calculates the PSNR of the compensated frame using Equation (2.1), giving the final value in dB.

[6] The discrete cosine transform function is very useful for our analysis. It works on an individual frame, calculating the 2-D DCT coefficients of each 8x8-pixel block and constructing the DCT image with the chosen quantization weight. First it reads the input image using the 'imread' command and converts it with 'double' to get better precision for the loaded image values. Next it divides the image into non-overlapping 8x8 blocks and applies the built-in 'dct2' function to each block. Each resulting block is then multiplied by a block of values that specifies the quantization weight (i.e. "1:64", "10:64", "32:64"). The inverse DCT is done just as easily: the frame is again divided into blocks and the 'idct2' function is applied to each block to get the reconstructed image.

[7] Finally, figures and graphs are plotted that show the real and the predicted frames, the PSNR for different frames and different search types, the complexity of each operation, and the DCT and IDCT results for different compression ratios.
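The PSNR measurement of step [5] can be sketched in Python as follows. Equation (2.1) is not repeated here, so the sketch assumes the usual definition for 8-bit samples, PSNR = 10 log10(255^2 / MSE); the function name and the sample frames are hypothetical.

```python
import math


def psnr(original, compensated, peak=255):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    count = sum(len(row) for row in original)
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(original, compensated)
              for a, b in zip(row_a, row_b)) / count
    if mse == 0:
        return math.inf  # identical frames: no error
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical frames that differ by one grey level at every sample (MSE = 1).
original = [[100, 101], [102, 103]]
compensated = [[101, 102], [103, 104]]
print(round(psnr(original, compensated), 2))  # 48.13
```

A higher PSNR means that the compensated frame is closer to the current frame; identical frames give an unbounded value, which is reported here as infinity.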

7.2 Results for Different Schemes

In this part of the thesis, some parameters are varied and their effect on quality is observed. As mentioned before, reducing the macroblock size gives better performance for this compression technique, which can easily be observed from the quality of the outputs. The PSNR is calculated for the first GOP of the video for different macroblock sizes [1, 8, 16] and for different search algorithms [logarithmic, raster and spiral search]. Before going through the test steps, it is important to describe the uncompressed video file used: Foreman.yuv, which is very popular in video processing and testing. The video was converted from AVI format to YUV format using the Windows Command Processor; it is a QCIF sequence at 30 frames per second with a resolution of 176x144 and 4:2:0 sub-sampling. The macroblock size is now varied while the search window is kept constant at 25x25 macroblocks for the full search algorithms, and the variation in quality is observed for the different types of search.
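For a raw 4:2:0 file such as the one described above, the frame size and frame count follow directly from the resolution. The Python sketch below assumes 8-bit samples stored as one full-resolution luminance plane followed by two chrominance planes of half the width and half the height; the function names and the example file size are hypothetical.

```python
def yuv420_frame_size(width, height):
    """Bytes per frame: one luminance plane plus two quarter-size chrominance planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


def frame_count(file_size_bytes, width, height):
    """Number of complete frames in a raw 4:2:0 file of the given size."""
    return file_size_bytes // yuv420_frame_size(width, height)


print(yuv420_frame_size(176, 144))   # QCIF: 38016 bytes per frame
print(yuv420_frame_size(352, 288))   # CIF: 152064 bytes per frame
print(frame_count(90 * 38016, 176, 144))  # a hypothetical 90-frame QCIF file: 90
```

The byte offset of frame k in the file is then k times the frame size, which is the kind of value a seek to a given frame requires.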



Figure 7.1 PSNR for {P} and {B} predicted frames using 'Logarithmic search'

In the above graphs, the predicted frames one to eight between two I-frames are [B{1} B{2} P{1} B{3} B{4} P{2} B{5} B{6}]. It is clear that the very small macroblock of 1x1 results in high quality, especially with the full search, which tries to find the best match among all candidate macroblocks. Note also that the search window is constant at ±7. Looking at the PSNR values, B{1} and B{2} have the best PSNR. This was expected because they use the first I-frame and the first P-frame for motion estimation and compensation: I-frames are not compressed at all and are sent with full detail, and the first P-frame is always reasonably good because it is predicted from an I-frame. The second P-frame has the worst PSNR value because it was predicted from a predicted frame, P{1}. After that, the last two B-frames rise to high quality again as they are close to an I-frame.



Figure 7.2 PSNR for {P} and {B} predicted frames using 'Raster full search'

Figure 7.3 PSNR for {P} and {B} predicted frames using 'Spiral full search'


It is now clear that the performance of the two full search algorithms is higher than that of the fast "logarithmic search" algorithm. It is also important to mention that the raster and spiral full search algorithms resulted in the same PSNR values, as both search the whole search window and differ only in the order in which the positions are visited.

Figure 7.4 PSNR for predicted frames of the "Foreman" video using 'Raster full search' with different search window sizes

Figure (7.4) shows the PSNR values for the predicted frames as the search window changes, keeping a constant macroblock size of 8x8 pixels. Using the raster full search, increasing the search region results in more precision and accuracy in finding the best match. Figure (7.4) also shows that the PSNR of all predicted frames changes together, as the macroblock size is constant. Looking closely at the figure, it can be concluded that the difference between search windows ±7 and ±15 is larger than the difference between ±15 and ±25: increasing the search window further does not give much better quality, because the macroblock being searched for is likely to be near its original position in most video scenes.





Another video file, "Stephan", is tested next. It is important for our test because its scenes, taken from a tennis match, contain a lot of motion and variation between frames. This video is a CIF sequence at 30 frames per second with a resolution of 352x288 and 4:2:0 sub-sampling. Different macroblock sizes are tested, keeping a constant search window of ±25 for the full search algorithms, and the variation in quality is observed for the different types of search. Different search window sizes are also tested, keeping the macroblock size constant at 8x8.

Figure 7.5 PSNR for {P} and {B} predicted "Stephan video" frames using 'Logarithmic search'

Figure 7.6 PSNR for {P} and {B} predicted "Stephan video" frames using 'Full search algorithms'


A noteworthy point from the previous two graphs is that for a macroblock size of 1x1 there is a huge difference in the resulting PSNR between the fast and full search algorithms, but for macroblock sizes 8x8 and 16x16 the PSNR values are almost the same. This means that in a rapid-motion scene (the tennis match), using a different search technique with a large block size does not improve the quality, and the only way to increase the PSNR values is to reduce the macroblock size.

As further support for our compression technique, another video file, "Fish", is tested; it is also well known in video processing and testing. This video is a CIF sequence at 30 frames per second with a resolution of 352x288 and 4:2:0 sub-sampling. It is tested with different macroblock sizes and different search algorithms, keeping a constant search window of ±25 for the full search algorithms. Different search window sizes are also tested, keeping the macroblock size constant at 8x8.

Figure 7.7 PSNR for {P} and {B} predicted "Stephan video" frames for different search windows using 'Raster full search algorithm'

Figure 7.8 PSNR for {P} and {B} predicted "Fish video" frames using 'Logarithmic fast search algorithm'

Figure 7.9 PSNR for {P} and {B} predicted "Fish video" frames using 'Raster full search algorithm'

Figure 7.10 PSNR for {P} and {B} predicted "Fish video" frames for different search windows using 'Raster full search algorithm'

Looking closely at the above figure, it can be observed that the PSNR values for the two search windows [±15, ±25] are the same. Since the "Fish" video shows a yellow fish moving along with the capturing device, there is no need for a large window size and exhaustive searching, as the matching region is likely to be found very near the block's original position in the current frame. Therefore, increasing the search window beyond ±15 adds nothing to the quality.



Figure 7.11 Foreman video predicted frames macroblock size 1 "Logarithmic search"

Figure 7.14 Stephan video predicted frames macroblock size 1 "Logarithmic search"

Figure 7.17 Fish video predicted frames macroblock size 1 "Logarithmic search"

Figure 7.20 Foreman video predicted frames macroblock size 1 "Raster search"

Figure 7.24 Stephan video predicted frames macroblock size 8 "Raster search"

Figure 7.26 Fish video predicted frames macroblock size 1 "Raster search"

Figure 7.27 Fish video predicted frames macroblock size 8 "Raster search"

Figure 7.28 Fish video predicted frames macroblock size 16 "Raster search"

Figure 7.29 PSNR values for "Foreman video" with search window 7 using 'Raster full search'

Figure 7.32 PSNR values for "Stephan video" with search window 7 using 'Raster full search'


Transform coding, in the form of the two-dimensional discrete cosine transform, is now introduced into the coder. The I-frames are passed through the 2-D DCT for more compression, since I-frames are coded without any prediction and, so far, without any compression. Hence, the 2-D DCT effectively improves the compression technique. The performance and the quality of the predicted frames are tested using several DCT compression qualities (i.e. different quantization compression ratios 1:64, 10:64, 21:64 and 36:64) on the video sequence "foreman", with a macroblock size of 8x8 and a search window of ±7 for simplicity, observing the variation in the predicted frames.
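One way to read a ratio such as 10:64 is that only 10 of the 64 coefficients of each 8x8 block are kept and the rest are set to zero. The Python sketch below illustrates this reading under the assumption that the kept coefficients are the first k in zigzag order (lowest frequencies first); the thesis does not state which coefficients its weight blocks keep, so the selection rule and all names are hypothetical.

```python
def zigzag_positions(n=8):
    """(row, column) positions of an n x n block in zigzag scan order."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def keep_coefficients(coeffs, k):
    """Zero every coefficient except the first k in zigzag order."""
    keep = set(zigzag_positions(len(coeffs))[:k])
    return [[value if (u, v) in keep else 0 for v, value in enumerate(row)]
            for u, row in enumerate(coeffs)]


# Hypothetical block of coefficients, all equal to 1, masked at ratio 10:64.
block = [[1] * 8 for _ in range(8)]
masked = keep_coefficients(block, 10)
print(sum(map(sum, masked)))    # 10 coefficients survive
print(zigzag_positions()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

With k = 64 the block is left unchanged, and with k = 1 only the top-left (DC) coefficient survives, so each block is reconstructed as a flat area.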

Figure 7.38 PSNR for predicted frames using different 2D-DCT Compression Qualities



Figure 7.39 foreman video predicted frames "NO DCT"

Figure 7.40 foreman video predicted frames "DCT 36:64"

Chapter Eight

8. Conclusion & Future Work

8.1 Conclusion

In this thesis, various block matching techniques for motion estimation were implemented and
tested, and the complete hybrid motion estimator and compensator with discrete cosine transform was
then tested as well. The results of the previous chapter showed that: (1) Full search algorithms always
lead to better visual quality than fast search algorithms, but at the cost of a significant increase in
execution time due to the complexity of the search strategy. (2) A smaller macroblock size improves
block matching, because it increases the probability of finding the best matching block in the reference
frame; this gave better quality for all video sequences tested. (3) Increasing the search window also gives
the algorithm more freedom to find the best matching block; across the search window sizes and videos
tried, the PSNR improved significantly. One important point is that the Fish video gave the same results
for both search windows (15 and 25). This means that, for a video with low motion, increasing the search
window beyond a certain threshold adds nothing to the quality and only increases the execution time. For
the Stephan video, on the other hand, the difference between consecutive frames is large, which means
that blocks (where the same blocks still appear) change their locations rapidly; increasing the search
window and searching a larger region therefore gives better quality.

For the part covering the DCT of the I-frames, we can conclude that the DCT increases the compression:
I-frames are coded without any prediction, so transform coding these frames is an effective step for
reducing the amount of data. Because the DCT-based coding used here is lossy, it degrades the visual
quality. Different quantizer compression matrices lead to different visual qualities, as shown in part
(7.2): as the compression increases from 64:64 (i.e. no compression) to 1:64 (i.e. highest compression),
the quality decreases.
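The trade-off between macroblock size, search window and execution time described above can be made concrete with a minimal full search sketch. It assumes the sum of absolute differences (SAD) as the matching criterion and frames stored as lists of rows; the function names and default parameters are illustrative and are not taken from the coder itself.

```python
def sad(current, reference, cy, cx, ry, rx, block):
    """Sum of absolute differences between the block of `current` at (cy, cx)
    and the candidate block of `reference` at (ry, rx)."""
    return sum(abs(current[cy + i][cx + j] - reference[ry + i][rx + j])
               for i in range(block) for j in range(block))


def full_search(current, reference, cy, cx, block=8, search=7):
    """Test every displacement within +/-search pixels of (cy, cx).

    Returns (dy, dx, cost) for the candidate with the lowest SAD.
    The number of candidates grows as (2 * search + 1) ** 2, which is
    why a larger search window costs more execution time.
    """
    height, width = len(reference), len(reference[0])
    best = (0, 0, sad(current, reference, cy, cx, cy, cx, block))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall partly outside the reference frame.
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue
            cost = sad(current, reference, cy, cx, ry, rx, block)
            if cost < best[2]:
                best = (dy, dx, cost)
    return best
```

If the true displacement of a block lies outside the window, the search can only return the best candidate inside it, which matches the observation that high-motion sequences benefit from a larger window while low-motion sequences do not.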

8.2 Future Work

Techniques enhancing the performance of the motion estimation block matching coder

Much empirical research is currently under way to:

1. Improve the predicted frames (decoded frames) quality.

2. Decrease the algorithm computational complexity.

3. Modify the coder to achieve efficient execution time.

4. Increase coding efficiency.

Among the most obvious and novel possible improvements to this thesis, we concluded that changing the
search window yields different results for different videos, and hence different quality. Many
experiments have tried to take advantage of this by adapting the search window to the type of video.
“Search Window Size Decision” is an innovative topic for improving the efficiency of this hybrid coder;
its main idea is to apply a motion detection algorithm to decide the search window used in motion
estimation, which reduces the coder’s complexity [20]. Another innovative technique for
H.264/MPEG-4 AVC is the Block Size Selection Algorithm for inter-frame coding, which increases the
encoder efficiency at the cost of an insignificant degradation in picture quality. Results of the algorithm
demonstrate a speed-up in encoding time of up to 73% compared with the H.264 benchmark. The block
size is no longer fixed, but ranges from 4x4 to 16x16 for inter-frame coding [21].

Figure 8.1 Block diagram for search window size decision after motion is detected
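The idea of choosing the search window from a motion detection step can be illustrated with a small sketch. It is not the algorithm of [20]: the activity measure (mean absolute frame difference), the threshold values and the window sizes below are all hypothetical choices made only to show the principle.

```python
def mean_abs_difference(current, previous):
    """Average absolute pixel difference between two equally sized frames."""
    total = sum(abs(c - p)
                for cur_row, prev_row in zip(current, previous)
                for c, p in zip(cur_row, prev_row))
    return total / (len(current) * len(current[0]))


def choose_search_window(current, previous,
                         thresholds=((2.0, 3), (8.0, 7)), largest=15):
    """Pick a search range from the detected motion activity.

    `thresholds` is a hypothetical list of (activity limit, search range)
    pairs in increasing order; `largest` is used when the activity
    exceeds every limit.
    """
    activity = mean_abs_difference(current, previous)
    for limit, window in thresholds:
        if activity < limit:
            return window
    return largest
```

A nearly static sequence would then be searched with a small window, saving execution time, while a sequence with large frame differences would get the largest window.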

Since the performance of any of the algorithms mentioned depends heavily on the characteristics of the
video content, no single algorithm adapts to all kinds of video content. A multistage motion estimation
scheme for video compression, called the Content Adaptive Search Technique (CAST), was proposed to
tackle this issue; it adapts to the video content in order to maximize the overall performance. The CAST
scheme consists of four stages: motion vector field prediction, block-based segmentation, motion
parameter extraction, and adaptive search strategy. By pre-processing the motion vector field of the
previous reference frame in the first three stages, CAST extracts the motion parameters for each region.
The fourth stage combines various techniques, including motion vector prediction, search area decision
and an adaptive fast search algorithm that is adjusted by a mathematical model of the block distortion
surface (BDS). The CAST scheme improves the visual quality while running faster than the other
predictive ME algorithms [22].
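One ingredient named above, motion vector prediction, is commonly done by taking the component-wise median of the motion vectors of neighbouring blocks and starting the search from that point. The sketch below assumes this common choice with three neighbours (left, above, above-right); it is an illustration only and is not taken from the CAST scheme in [22].

```python
def median_predictor(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors (dy, dx)."""
    neighbours = (left, above, above_right)
    dys = sorted(v[0] for v in neighbours)
    dxs = sorted(v[1] for v in neighbours)
    return dys[1], dxs[1]
```

A fast search can then examine a small area centred on the predicted vector instead of the full window, on the assumption that neighbouring blocks tend to move together.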




References

[1] S. Dhahri, A. Zitouni, H. Chaouch, and R. Tourki, “Adaptive Motion Estimator Based on Variable Block Size Scheme”, Proceedings of World Academy of Science, Engineering and Technology, Volume 38, February 2009.

[2] U.-V. Koc, “Low Complexity and High Throughput Fully DCT-Based Motion Compensated Video Coders”, National Science Foundation Engineering Research Center Program, University of Maryland, Harvard University, 1996.

[3] Fayez M. Idris, “An Algorithm and Architecture for Video Compression”, School of Graduate Studies and Research, University of Ottawa, 1993.

[4] Lai Kam Cheong, “Enhancing Techniques for a Standard Conforming Real-Time Video Codec”, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, September 2002.

[5] Colin E. Manning, “Motion Compensated Video Compression Overview”, http://www.newmediarepublic.com/dvideo/compression/adv08.html#blockmatching

[6] Iain E. G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Chichester: Wiley, 2002.

[7] John G. Proakis, Dimitris G. Manolakis, “Digital Signal Processing”, 4th edition, Prentice Hall, 2007.

[8] Yao Wang, Jorn Ostermann, Ya-Qin Zhang, “Video Processing and Communications”, Prentice Hall, Upper Saddle River, NJ 07458, 2002.

[9] Dave Marshall, http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html, April 2001.

[10] Dr. Leonardo Chiariglione, http://www.chiariglione.org/mpeg/index.asp, I-10040 Villar Dora, Italy.

[11] A. Zakhor, “EECS 290T: Multimedia Signal Processing, Communications and Networking”, University of California at Berkeley, Department of Electrical Engineering & Computer Sciences, Spring 2004, http://inst.eecs.berkeley.edu/~ee290t/sp04/

[12] K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, “Multimedia Communication Systems: Techniques, Standards and Networks”, Prentice Hall PTR, 2002.


[13] J. G. Apostolopoulos and S. J. Wee, “Video Compression Standards”, Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999.

[14] V. Bhaskaran and K. Konstantinides, “Image and Video Compression Standards: Algorithms and Architectures”, Boston, Massachusetts: Kluwer Academic Publishers, 1997.

[15] Yu-Nan Pan, “A Fast Search Algorithm for Motion Estimation on H.264/AVC”, Department of Electrical Engineering, National Central University, Jhongli 320, Taiwan, July 2004.

[16] Syed Ali Khayam, “The Discrete Cosine Transform (DCT): Theory and Application”, Department of Electrical & Computer Engineering, Michigan State University, March 2003.

[17] Ken Kabeen, Peter Gent, “Image Compression and the Discrete Cosine Transform”, College of the Redwoods.

[18] Processor Development Tools, http://www.autex.ru/dspa/dspa2008/04.pdf

[19] ADSP-BF561 EZ-KIT Lite® Evaluation System Manual, Analog Devices, Inc., 2008, http://www.analog.com/static/imported-files/eval_kit_manuals/ADSP-BF561%20EZ-KIT%20LIte%20Manual%20Rev%203-2%20March%202008.pdf

[20] Gianluca Bailo, Massimo Bariani, Ivano Barbieri, Marco Raggio, “Search Window Size Decision for Motion Estimation Algorithm in H.264 Video Coder”, Department of Biophysical and Electronic Engineering, University of Genova, Italy, 2004.

[21] Hyungjoon Kim and Yucel Altunbasak, “Low-Complexity Macroblock Mode Selection for H.264/AVC Encoders”, Center of Signal and Image Processing, Georgia Institute of Technology, Atlanta, 2004.

[22] Jiancong Luo, Ishfaq Ahmad, Yu Sun and Yongfang Liang, “A Multistage Fast Motion Estimation Scheme for Video Compression”, Department of Computer Science and Engineering, University of Texas, Arlington, 2004.
