Meandering Based Parallel 3DRS Algorithm for the Multicore Era

Ghiath Al-kadi‡, Jan Hoogerbrugge‡, Surendra Guntur‡, Andrei Terechko*, Marc Duranton‡ and Onno Eerenberg‡

‡NXP Semiconductors, Eindhoven, the Netherlands. *Vector Fabrics, Eindhoven, the Netherlands

This paper appears in: Consumer Electronics (ICCE), 2010 Digest of Technical Papers International Conference on



I. INTRODUCTION (true motion estimation, 3DRS)

II. THE SCALABLE MEANDERING BASED 3DRS

III. EVALUATION, RESULTS AND CONCLUSION

INTRODUCTION

True motion estimation is a method for finding the motion of objects. The motion vectors should represent the true motion of the objects in the video sequence.

For video compression applications, it is enough to find the motion vector corresponding to the best match, which in turn results in lower residual energy and better compression.

With traditional ME, true motion vectors can only be estimated for blocks containing enough texture.

Difficulty of true motion estimation

When the video sequence is complex, especially when it contains small objects and fast-moving objects, the motion vectors are not easy to estimate.

Two typical problems are blocking artifacts and the covering/uncovering of regions that results from object movement.

Main application of true motion estimation

Frame rate up-conversion (FRC): raising the frame rate, for example to 120 frames per second, is becoming increasingly necessary with the advent of advanced high-resolution display technologies such as LCD and plasma.

Motion estimation is an integral part of FRC.

The quality of the interpolated frames, which are built from motion vectors, depends on having true motion vectors.

How to find true motion vectors

The 3-Dimensional Recursive Search (3DRS) algorithm is one of the most widely used methods to find true motion.

The 3DRS algorithm is based on block matching. In order to find true motion, the algorithm makes two assumptions: (i) objects are larger than a block of pixels; (ii) objects have inertia.

3DRS

For all other blocks, we have to rely on motion vectors that have already been estimated: a small set of candidate vectors is constructed based on spatial relations.

Motion vectors can be refined gradually, pass by pass, according to the motion of neighboring blocks. True motion can then be found using the spatial correlation of motion vectors.

However, since the picture is processed in a block-based fashion according to a specified scanning order, motion information from the current field is only available for the blocks that have already been processed in that scan order. For the remaining blocks, the vectors estimated in a previous field are used; these are called temporal candidates.

For each block, the candidate motion vectors are:

<1> Spatial prediction candidate set, defined by positions relative to the current block x in the current frame n.

<2> Temporal candidate set (estimated from the previous frame).

<3> Update candidate set: generated by adding small random vectors (u) to the spatial candidate set. The update vectors are relatively small. Theoretically, an update vector can be a random variable, e.g. with a Gaussian or uniform probability distribution.

These (random) update vectors are essential for the convergence of the motion field and for correctly tracking variable object motion.
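The three candidate sets and the block-matching choice among them can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the SAD matching criterion, the border clamping, and the uniform update range are all assumptions made for the example.

```python
import random

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref` (coordinates clamped at borders)."""
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(by, by + bs):
        for x in range(bx, bx + bs):
            ry = min(max(y + dy, 0), h - 1)
            rx = min(max(x + dx, 0), w - 1)
            total += abs(cur[y][x] - ref[ry][rx])
    return total

def candidate_set(spatial, temporal, rng, max_update=1):
    """Spatial and temporal predictions plus one randomly updated copy of each
    spatial prediction (uniform update in [-max_update, max_update], an assumption)."""
    updates = [(vx + rng.randint(-max_update, max_update),
                vy + rng.randint(-max_update, max_update))
               for vx, vy in spatial]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(spatial + temporal + updates))

def best_vector(cur, ref, bx, by, bs, candidates):
    """Pick the candidate with the lowest matching error."""
    return min(candidates, key=lambda v: sad(cur, ref, bx, by, v[0], v[1], bs))
```

Only a handful of candidates is evaluated per block, which is what keeps the search cheap compared with a full search over all displacements.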

General recursive process

"True-Motion Estimation with 3-D Recursive Search Block Matching", Gerard de Haan, Paul W. A. C. Biezen, Henk Huijgen, and Olukayode A. Ojo

Relative position of spatial and temporal predictor

r = 2 has been experimentally found to be best for a block size of 8×8 pixels.

Each pass in 3DRS motion estimation is presented as follows:

Each candidate vector comes from the candidate vector set of pass i-1.

Update vectors are randomly selected from the update set, US.

Convergence
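The pass-by-pass refinement can be illustrated with a small sketch. Everything here is an assumption made for the example: the neighbor positions, the contents of `UPDATE_SET` (standing in for US), and the abstract `cost` callback that replaces real block matching.

```python
import random

# Hypothetical update set; the actual contents of US are not given in the slides.
UPDATE_SET = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def one_pass(prev_field, cost, rng):
    """Run one raster-order pass over a vector field.

    prev_field: 2D list of (vx, vy) vectors from pass i-1.
    cost(r, c, v): matching error of vector v for block (r, c).
    """
    rows, cols = len(prev_field), len(prev_field[0])
    # Blocks not yet visited in this pass still hold their pass i-1 vectors.
    field = [row[:] for row in prev_field]
    for r in range(rows):
        for c in range(cols):
            cands = [field[r][c]]              # own vector from pass i-1
            if c > 0:
                cands.append(field[r][c - 1])  # left: already refined (spatial)
            if r > 0:
                cands.append(field[r - 1][c])  # above: already refined (spatial)
            if r + 1 < rows:
                cands.append(field[r + 1][c])  # below: from pass i-1 (temporal)
            ux, uy = rng.choice(UPDATE_SET)
            updated = [(vx + ux, vy + uy) for vx, vy in cands]
            field[r][c] = min(cands + updated, key=lambda v: cost(r, c, v))
    return field
```

Because each block keeps its own previous vector among the candidates, the matching error of a block never increases from one pass to the next in this sketch; the random updates are what allow it to decrease.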


THE SCALABLE MEANDERING BASED 3DRS

The scan order of 3DRS algorithms can follow either a "raster" or a "meandering" pattern, as shown in Fig. 1. One possible method involves processing Macro Blocks (MBs) in scan order.

While the raster scanning pattern is easily parallelizable, it has inferior convergence properties compared to the meandering scan pattern.

The meandering scan is quite challenging to parallelize due to the frequently changing scan direction, as shown in Fig. 1(A).

This paper addresses the above problem and presents a scalable multi-(co)processor friendly method to parallelize the meandering based 3DRS motion estimation algorithm without compromising picture quality.
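The difference between the two scan patterns can be made concrete with a short sketch. The function names and the (row, column) coordinate convention are assumptions for illustration only.

```python
def raster_scan(rows, cols):
    """Visit every row left to right, top to bottom."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def meandering_scan(rows, cols):
    """Visit rows top to bottom, reversing the direction on every other row."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order

print(raster_scan(2, 3))      # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(meandering_scan(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

In the meandering order the first block of a row is directly below the last block of the previous row, so a new row cannot start before the previous one has finished. That dependency is what makes straightforward row-level parallelism difficult.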

An analysis of this algorithm leads to the following observations:

(i) each meandering scan is composed of two raster scans operating on odd rows or even rows, as depicted in Fig. 1(B);

(ii) the two raster scans depend on each other;

(iii) the relative position and temporal (spatial) nature of the candidates constantly change based on the current direction of the scan in progress.
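Observation (i) can be sketched in a few lines. The function name is hypothetical, and the choice of 0-based row numbering, with even rows scanned left to right and odd rows right to left, is an assumption.

```python
def split_meander(rows, cols):
    """Split a meandering scan into two raster scans with opposite directions.

    Returns (left_to_right, right_to_left): the first covers the even rows,
    the second covers the odd rows (0-based numbering, an assumption).
    """
    left_to_right = [(r, c) for r in range(0, rows, 2) for c in range(cols)]
    right_to_left = [(r, c) for r in range(1, rows, 2)
                     for c in range(cols - 1, -1, -1)]
    return left_to_right, right_to_left

ltr, rtl = split_meander(4, 3)
print(ltr)  # [(0, 0), (0, 1), (0, 2), (2, 0), (2, 1), (2, 2)]
print(rtl)  # [(1, 2), (1, 1), (1, 0), (3, 2), (3, 1), (3, 0)]
```

Each of the two lists is an ordinary raster scan over half of the rows, which is the form that parallelizes well. Together they visit every block exactly once.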

If MB(i,j) is the current MB under consideration, then the spatial and temporal MBs available for candidate selection in the traditional 3DRS algorithm are S1ij and T1ij respectively.

[Figure: neighborhoods of the current block (CB) for the spatial candidate sets S1, S2, S3 and the temporal candidate sets T1, T2, T3, with neighbors labeled α, β and 1 to 4.]

The variables α and β are presented for a left-to-right scan order. Changing the scan order implies swapping the content of these variables.

The motion information in the neighboring blocks that are processed in the same iteration (i.e. spatial candidates) is more accurate than that available from the previous scan iteration.

With reference to the two raster scans shown in Fig. 1(B), the currently processed block MB(i,j), denoted as B, has only the MB denoted as A as a direct neighboring spatial candidate. All other direct neighboring candidates are temporal.

In order to maintain motion detection accuracy, the selection of spatial candidates is changed to include MBs from the set S2ij instead of S1ij.

The temporal candidates are unchanged.

Thus, the parallel 3DRS algorithm constructs its candidate set from S2ij and T2ij.

Parallelization of Raster Scan: "The 2D Wave"

In Fig. 2, MB(i,j) can be processed as soon as MB(i-2,j+1) completes. This results in processing MBs in a diagonal wavefront manner, which is referred to as the "2D-Wave".
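The wavefront that follows from this dependency can be sketched by computing the earliest step at which each MB may start. The sketch makes several assumptions that are not stated in the slides: i is read as the picture row and j as the column, only the left-to-right raster scan over even rows is modeled, every MB takes one time step, one processor works on each row, and the last column of a row waits for the last column two rows above.

```python
def wave_schedule(rows, cols):
    """Earliest start step of each MB(i, j) in one raster scan (even rows only).

    MB(i, j) waits for MB(i, j-1), its predecessor on the same row, and for
    MB(i-2, j+1), the upper-right MB in the previous row of the same scan.
    """
    start = {}
    for i in range(0, rows, 2):
        for j in range(cols):
            deps = []
            if j > 0:
                deps.append(start[(i, j - 1)])
            if i >= 2:
                # Clamp j+1 at the row end (assumption for the last column).
                deps.append(start[(i - 2, min(j + 1, cols - 1))])
            start[(i, j)] = max(deps) + 1 if deps else 0
    return start

schedule = wave_schedule(6, 4)
for i in range(0, 6, 2):
    print([schedule[(i, j)] for j in range(4)])
# [0, 1, 2, 3]
# [2, 3, 4, 5]
# [4, 5, 6, 7]
```

MBs with the same step number lie on a diagonal and can be processed at the same time, which is the wavefront the name refers to.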

The runtime execution of the parallel 3DRS algorithm

However, the quality of the motion detection can be compromised because neighboring MBs other than α or β are not used as spatial candidates.

To prevent this quality loss while still being able to find small objects, both raster scans can be executed simultaneously, as shown in Fig. 1(B). This is done by assigning a Motion Estimator (ME) (co)processor to each row.

Fig. 3, for example, depicts a system in which the parallel 3DRS algorithm is mapped to four cores.

The simultaneous execution of the 2D-wave processing of the two raster scans can be viewed as two distinct phases:

Phase One: the execution from the start position of each row to around the middle of the row.

Each raster scan executes the 2D wave, with ME1 using the (S1ij, T1ij) candidate set for block matching while the other processors use the (S2ij, T2ij) set (α and β are swapped according to the scan direction).

Phase Two: the execution from around the middle of the row to the end of the row (see Fig. 3-right).

By this point the processors executing the two raster scans have overlapped, so the eight neighboring MBs are spatial.

Thus, ME1, ME2 and ME3 use the (S3ij, T3ij) candidate set while ME4 uses the (S1ij, T1ij) set for block matching.
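The choice of candidate set per processor and phase in the four-core example can be summarized as a small lookup. The function name is hypothetical, and the generalization to a processor count other than four (the last ME keeping the S1/T1 set in Phase Two) is an assumption that the slides do not confirm.

```python
def candidate_sets_for(me, phase, n_me=4):
    """Return the (spatial, temporal) set names used by processor `me` (1-based).

    Phase 1: start of the row to around the middle.
    Phase 2: around the middle to the end of the row.
    """
    if not 1 <= me <= n_me:
        raise ValueError("processor index out of range")
    if phase == 1:
        return ("S1", "T1") if me == 1 else ("S2", "T2")
    if phase == 2:
        return ("S1", "T1") if me == n_me else ("S3", "T3")
    raise ValueError("phase must be 1 or 2")

for phase in (1, 2):
    print(phase, [candidate_sets_for(me, phase) for me in range(1, 5)])
# 1 [('S1', 'T1'), ('S2', 'T2'), ('S2', 'T2'), ('S2', 'T2')]
# 2 [('S3', 'T3'), ('S3', 'T3'), ('S3', 'T3'), ('S1', 'T1')]
```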


EVALUATION

The proposed parallel 3DRS algorithm is evaluated for various video streams by performing simulations on the NeXVP architecture.

The underlying architecture consists of 2 homogeneous 4-issue-slot TriMedia cores with subset static interleaved multithreading (two foreground and two background threads).

3DRS motion estimation performs 125 scans/second for a Full HD (1920x1080) stream, compared to 29 scans/second on a single core running the parallel 3DRS code.

For Quad HD (4096x2160) video, a rate of 100 scans/second was obtained on a similar architecture with 3 additional cores.
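The reported rates can be put side by side with a little arithmetic. The scan rates and resolutions are the figures quoted above; the pixel-throughput comparison assumes that one scan covers one full frame, which the slides do not state explicitly.

```python
# Figures reported in the evaluation.
FULL_HD_PARALLEL = 125  # scans/second, 1920x1080, parallel configuration
FULL_HD_SINGLE = 29     # scans/second, 1920x1080, single core
QUAD_HD_PARALLEL = 100  # scans/second, 4096x2160, with 3 additional cores

def pixel_rate(width, height, scans_per_second):
    """Pixels covered per second, assuming one scan covers one full frame."""
    return width * height * scans_per_second

speedup = FULL_HD_PARALLEL / FULL_HD_SINGLE
full_hd_rate = pixel_rate(1920, 1080, FULL_HD_PARALLEL)
quad_hd_rate = pixel_rate(4096, 2160, QUAD_HD_PARALLEL)

print(f"Full HD speedup over single core: {speedup:.2f}x")  # 4.31x
print(f"Full HD pixel rate: {full_hd_rate}")                # 259200000
print(f"Quad HD pixel rate: {quad_hd_rate}")                # 884736000
```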

RESULT

Qualitative evaluation of the picture quality indicates that the parallel implementation of the algorithm performs as well as the traditional 3DRS algorithm, with no visible degradation in picture quality.

CONCLUSION

This paper presents a method to parallelize the meandering based 3D recursive search (3DRS) motion estimation algorithm used in scan-rate up-conversion.

The proposed algorithm is scalable and can easily be mapped to multiple processing units such as multithreaded processors, multicores and/or co-processors in order to cope with the increasingly hard-to-meet real-time requirements of next-generation video devices.

Experiments show that the picture quality of the proposed parallel 3DRS algorithm is as good as that of the original non-parallelized algorithm for most video sequences.