Meandering Based Parallel 3DRS Algorithm for the Multicore Era
Ghiath Al-kadi‡, Jan Hoogerbrugge‡, Surendra Guntur‡, Andrei Terechko*, Marc Duranton‡ and Onno Eerenberg‡
‡NXP Semiconductors, Eindhoven, the Netherlands. *Vector Fabrics, Eindhoven, the Netherlands
This paper appears in: Consumer Electronics (ICCE), 2010 Digest of Technical Papers International Conference on
I. INTRODUCTION True motion estimation 3DRS
II. THE SCALABLE MEANDERING BASED 3DRS
III. EVALUATION, RESULTS AND CONCLUSION
INTRODUCTION
True motion estimation: a method for finding the motion of objects.
The motion vectors should represent the true motion of the objects in the video sequence.
For video compression applications it is enough to get a motion vector corresponding to the best match. This in turn results in lower residual energy and better compression.
With traditional ME, true motion vectors can only be estimated for blocks containing enough texture.
Difficulty of true motion estimation
When the video sequence is complex, especially when it contains small or fast-moving objects, the motion vectors are not easy to estimate.
Blocking artifacts
Object movement results in cover/uncover situations
Main application of true motion estimation
Frame rate up-conversion (FRC): raising the frame rate, e.g. to 120 frames per second, is becoming increasingly necessary with the advent of advanced high-resolution display technologies such as LCD and plasma.
Motion estimation is an integral part of FRC.
The quality of the motion-vector-based interpolated pictures depends on true motion vectors.
How to find true motion vectors
The 3-Dimensional Recursive Search (3DRS) algorithm is one of the most widely used methods to find true motion.
The 3DRS algorithm is based on block matching. In order to find true motion, the algorithm makes two assumptions: (i) objects are larger than a block of pixels; (ii) objects have inertia.
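Block matching scores a candidate vector by comparing the current block with a displaced block in the previous frame. The sketch below uses the sum of absolute differences (SAD) as the match criterion. The SAD criterion, the function name `sad`, the 8x8 block size and the toy frames are illustrative assumptions; the slides do not state which criterion the authors use.

```python
import random

def sad(cur, prev, y, x, dy, dx, block=8):
    """Sum of absolute differences between the block at (y, x) in `cur`
    and the block displaced by the candidate vector (dy, dx) in `prev`.
    Returns None when the displaced block falls outside the frame."""
    h, w = len(prev), len(prev[0])
    py, px = y + dy, x + dx
    if py < 0 or px < 0 or py + block > h or px + block > w:
        return None
    return sum(
        abs(cur[y + r][x + c] - prev[py + r][px + c])
        for r in range(block)
        for c in range(block)
    )

# Toy frames (assumption): the current frame is the previous frame
# shifted 2 pixels to the right, with wraparound at the border.
random.seed(0)
H, W = 32, 32
prev = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [row[-2:] + row[:-2] for row in prev]

zero_cost = sad(cur, prev, 8, 8, 0, 0)    # candidate "no motion"
true_cost = sad(cur, prev, 8, 8, 0, -2)   # content came from 2 pixels to the left
outside = sad(cur, prev, 0, 0, -1, 0)     # displaced block leaves the frame
print(zero_cost, true_cost, outside)
```

The candidate that matches the real displacement gives a zero error on this synthetic data, while the zero vector does not.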
3DRS
For all other blocks, we have to rely on motion vectors that have already been estimated.
Construct a small set of candidate vectors based on spatial relations.
Motion vectors can be refined gradually, pass by pass, according to the motion of neighboring blocks; true motion can then be found through the spatial correlation of the motion vectors.
However, since the picture is processed block by block according to a specified scanning order, motion information for the current field is only available for the blocks that have already been processed in that scan order. Candidates taken from blocks processed in a previous field are called temporal candidates.
For one block, the candidate motion vectors are:
<1> Spatial prediction candidate set:
: relative position of current block x and current frame n
<2> Temporal candidate set (estimated from the previous frame):
<3> Update candidate set: generated by adding small random vectors (u) to the spatial candidate set. The update vectors are relatively small; theoretically, an update vector can be a random variable, e.g. with a Gaussian or uniform probability distribution.
These (random) update vectors are essential for the convergence of the motion field and for correctly tracking variable object motion.
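The three candidate sets can be assembled along the following lines. This is a minimal sketch: the neighbor offsets, the contents of `UPDATE_SET` and the function name `build_candidates` are hypothetical, because the formulas on the slides did not survive extraction.

```python
import random

# Hypothetical small update set (in pixels); the slides do not list its contents.
UPDATE_SET = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 2), (0, -2)]

def build_candidates(cur_field, prev_field, by, bx, rng,
                     spatial_offsets=((-1, -1), (-1, 1)),
                     temporal_offsets=((2, 0),)):
    """Collect candidate vectors for the block at (by, bx).

    cur_field / prev_field map (block_row, block_col) -> (dy, dx).
    cur_field holds only blocks already processed in the current pass.
    The default offsets are assumptions chosen for illustration."""
    spatial = [cur_field[(by + oy, bx + ox)]
               for oy, ox in spatial_offsets
               if (by + oy, bx + ox) in cur_field]
    temporal = [prev_field[(by + oy, bx + ox)]
                for oy, ox in temporal_offsets
                if (by + oy, bx + ox) in prev_field]
    # Update candidates: each spatial candidate plus one small random vector.
    updates = []
    for vy, vx in spatial:
        uy, ux = rng.choice(UPDATE_SET)
        updates.append((vy + uy, vx + ux))
    return spatial, temporal, updates

rng = random.Random(1)
cur_field = {(0, 0): (0, 3), (0, 2): (1, 3)}   # already estimated in this pass
prev_field = {(3, 1): (0, 4)}                  # estimated in the previous frame
spatial, temporal, updates = build_candidates(cur_field, prev_field, 1, 1, rng)
print(spatial, temporal, updates)
```

Keeping the update vectors small means each pass only nudges an existing prediction, which fits the inertia assumption stated earlier.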
General recursive process
"True-Motion Estimation with 3-D Recursive Search Block Matching", Gerard de Haan, Paul W. A. C. Biezen, Henk Huijgen, and Olukayode A. Ojo
Relative position of spatial and temporal predictors
r = 2 has been experimentally found to be best for a block size of 8×8 pixels.
Each pass in 3DRS motion estimation is presented as follows:
: candidate vector in the i-1 pass candidate vector set
Update vectors are randomly selected from the update set, US.
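A single pass can be sketched as follows, assuming a plain raster scan and a stand-in cost function instead of a real block-matching error. The neighbor positions, the update set and the fallback of deriving updates from temporal candidates when no spatial candidate exists are all assumptions made for illustration.

```python
import random

UPDATE_SET = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # hypothetical update set US

def run_pass(prev_field, rows, cols, cost, rng):
    """One raster-scan pass; returns the new field (dict (r, c) -> vector)."""
    new_field = {}
    for r in range(rows):
        for c in range(cols):
            # Spatial candidates: blocks already processed in this pass.
            spatial = [new_field[p] for p in ((r, c - 1), (r - 1, c))
                       if p in new_field]
            # Temporal candidates: vectors from the previous pass.
            temporal = [prev_field[p] for p in ((r, c), (r + 1, c))
                        if p in prev_field]
            # Update candidates: a base candidate plus a random update vector.
            updates = []
            for vy, vx in (spatial or temporal):
                uy, ux = rng.choice(UPDATE_SET)
                updates.append((vy + uy, vx + ux))
            candidates = spatial + temporal + updates
            new_field[(r, c)] = min(candidates, key=lambda v: cost(r, c, v))
    return new_field

# Stand-in cost (assumption): distance to a known uniform motion.
TRUE_MOTION = (2, -3)

def cost(r, c, v):
    return abs(v[0] - TRUE_MOTION[0]) + abs(v[1] - TRUE_MOTION[1])

rows, cols = 4, 6
field = {(r, c): (0, 0) for r in range(rows) for c in range(cols)}
rng = random.Random(0)
totals = []
for _ in range(10):
    field = run_pass(field, rows, cols, cost, rng)
    totals.append(sum(cost(r, c, v) for (r, c), v in field.items()))
print(totals)
```

Because each block's own vector from the previous pass is always among the candidates, the total cost in this sketch cannot increase from one pass to the next.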
THE SCALABLE MEANDERING BASED 3DRS
The scan order of 3DRS algorithms can follow either a "raster" or a "meandering" pattern, as shown in Fig. 1. One possible method involves processing Macro Blocks (MB) in scan order.
While the raster scanning pattern is easily parallelizable, it has inferior convergence properties compared to the meandering scan pattern.
The meandering scan is quite challenging to parallelize because of the frequently changing scan direction, as shown in Fig. 1(A).
This paper addresses the above problem and presents a scalable, multi-(co)processor friendly method to parallelize the meandering based 3DRS motion estimation algorithm without compromising picture quality.
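The two scan patterns can be written down as block visiting orders. The sketch assumes the meandering scan reverses direction on every row (a serpentine order); the function names are illustrative.

```python
def raster_order(rows, cols):
    """Every row is scanned left to right."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def meandering_order(rows, cols):
    """Scan direction flips on every row (assumed serpentine order)."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order

print(raster_order(2, 3))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(meandering_order(2, 3))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

In the meandering order, the block processed just before the first block of a row is directly above it, so the most recent estimate is always a direct neighbor. In the raster order, each new row starts far away from the last processed block.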
An analysis of this algorithm leads to the following observations:
(i) each meandering scan is composed of two raster scans operating on odd rows or even rows, as depicted in Fig. 1(B);
(ii) the two raster scans depend on each other;
(iii) the relative position and temporal (spatial) nature of the candidates constantly change based on the current direction of the scan in progress.
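Observation (i) can be checked with a short sketch: splitting a serpentine order by row parity gives one left-to-right raster scan and one right-to-left raster scan, and alternating their rows restores the meandering order. The 0-based row numbering and the function names are assumptions.

```python
def meandering_order(rows, cols):
    """Serpentine order: even rows left to right, odd rows right to left."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order

def split_meander(rows, cols):
    """Split the meandering scan into its two directional raster scans."""
    left_to_right = [(r, c) for r in range(0, rows, 2) for c in range(cols)]
    right_to_left = [(r, c) for r in range(1, rows, 2)
                     for c in range(cols - 1, -1, -1)]
    return left_to_right, right_to_left

def interleave_rows(scan_a, scan_b, cols):
    """Alternate whole rows taken from the two raster scans."""
    out = []
    for k in range(0, max(len(scan_a), len(scan_b)), cols):
        out.extend(scan_a[k:k + cols])
        out.extend(scan_b[k:k + cols])
    return out

even_scan, odd_scan = split_meander(5, 4)
rebuilt = interleave_rows(even_scan, odd_scan, 4)
print(rebuilt == meandering_order(5, 4))  # True
```

Each of the two scans keeps a fixed direction, which is what makes it a candidate for raster-style parallelization.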
If MB(i,j) is the current MB under consideration, then the spatial and temporal MBs available for candidate selection in the traditional 3DRS algorithm are S1ij and T1ij respectively.
The variables α and β are presented for a left-to-right scan order; changing the scan order implies swapping the contents of these variables.
The motion information in the neighboring blocks that are processed in the same iteration (i.e. spatial candidates) is more accurate than that available from the previous scan iteration.
With reference to the two raster scans shown in Fig. 1(B), the currently processed block MB(i,j), denoted as B, has only the MB denoted as A as a direct neighboring spatial candidate. All other direct neighboring candidates are temporal.
In order to maintain motion detection accuracy, the selection of spatial candidates is changed to include MBs from the set S2ij instead of S1ij.
The temporal candidates are unchanged.
Thus, the parallel 3DRS algorithm constructs its candidate set from S2ij and T2ij.
Parallelization of Raster Scan: "The 2D Wave"
As shown in Fig. 2, MB(i,j) can be processed as soon as MB(i-2,j+1) completes. This results in processing MBs in a diagonal wavefront manner, which is referred to as the "2D-Wave".
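The dependency rule above fixes the earliest time at which each MB of one raster scan can start. The sketch below assumes that i is the row index and j the column index (the slides do not say which is which), that one raster scan visits every second row from left to right, that every MB takes one time unit, and that at the end of a row the dependency falls back to the last MB of the row two above.

```python
def wave_schedule(rows, cols):
    """Earliest start step of each MB(i, j) for one left-to-right raster scan
    over every second row, under the assumptions named in the lead-in."""
    start = {}
    for i in range(0, rows, 2):
        for j in range(cols):
            deps = []
            if j > 0:
                deps.append(start[(i, j - 1)])  # left neighbor in the same row
            if i >= 2:
                # Rule from the slides: MB(i-2, j+1) must have completed.
                deps.append(start[(i - 2, min(j + 1, cols - 1))])
            start[(i, j)] = 1 + max(deps) if deps else 0
    return start

schedule = wave_schedule(rows=6, cols=5)
for i in range(0, 6, 2):
    print(i, [schedule[(i, j)] for j in range(5)])
# 0 [0, 1, 2, 3, 4]
# 2 [2, 3, 4, 5, 6]
# 4 [4, 5, 6, 7, 8]
```

MBs that share a start step form one diagonal of the wavefront and can run on different processors at the same time.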
The runtime execution of the parallel 3DRS algorithm
However, the quality of the motion detection can be compromised because the neighboring MBs (other than α or β) are not used as spatial candidates.
To prevent this quality loss while still being able to find small objects, both raster scans can be executed simultaneously, as shown in Fig. 1(B). This is done by assigning a Motion Estimator (ME) (co)processor to each row.
Fig. 3, for example, depicts a system in which the parallel 3DRS algorithm is mapped to four cores.
The simultaneous execution of the 2D-wave processing of the two raster scans can be viewed as two distinct phases:
Phase One: The execution from the start position of each row to around the middle of the row.
Each raster scan executes the 2D wave, with ME1 using the (S1ij, T1ij) candidate set for block matching while the other processors use the (S2ij, T2ij) set (α and β are swapped according to the scan direction).
Phase Two: The execution from around the middle of the row to the end of the row (see Fig. 3, right).
By this point the processors executing the two raster scans have overlapped, so the eight neighboring MBs are spatial.
Thus, ME1, ME2 and ME3 use the (S3ij, T3ij) candidate set while ME4 uses the (S1ij, T1ij) set for block matching.
EVALUATION
The proposed parallel 3DRS algorithm is evaluated for various video streams by performing simulations on the NeXVP architecture.
The underlying architecture consists of 2 homogeneous 4-issue-slot TriMedia cores with subset static interleaved multithreading (two foreground and two background threads).
3DRS motion estimation performs 125 scans/second for a Full HD 1920x1080 stream, compared to 29 scans/second on a single core running the parallel 3DRS code.
For Quad HD 4096x2160 video, a rate of 100 scans/second was obtained on a similar architecture with 3 additional cores.
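The reported rates can be put side by side with a little arithmetic. The calculation below uses only the figures quoted above; treating scans/second times pixels per frame as a throughput measure is an assumption made for comparison, not a metric from the paper.

```python
full_hd_pixels = 1920 * 1080   # 2,073,600 pixels per frame
quad_hd_pixels = 4096 * 2160   # 8,847,360 pixels per frame

speedup = 125 / 29                       # multicore vs. single core, Full HD
full_hd_rate = full_hd_pixels * 125      # pixels scanned per second
quad_hd_rate = quad_hd_pixels * 100      # pixels scanned per second
rate_ratio = quad_hd_rate / full_hd_rate

print(f"speedup over a single core: {speedup:.2f}x")           # 4.31x
print(f"Full HD throughput: {full_hd_rate / 1e6:.1f} Mpixel/s")  # 259.2
print(f"Quad HD throughput: {quad_hd_rate / 1e6:.1f} Mpixel/s")  # 884.7
print(f"Quad HD / Full HD throughput: {rate_ratio:.2f}x")       # 3.41x
```

Under this measure, going from 2 to 5 cores (a factor of 2.5) is accompanied by roughly 3.4 times the pixel throughput.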
RESULT
Qualitative evaluation of the picture quality indicates that the parallel implementation of the algorithm performs as well as the traditional 3DRS algorithm with no visible degradation in picture quality.
CONCLUSION
This paper presents a method to parallelize the meandering based 3D recursive search (3DRS) motion estimation algorithm used in scan-rate up-conversion.
The proposed algorithm is scalable and can easily be mapped to multiple processing units such as multithreaded processors, multicores and/or co-processors, in order to cope with the increasingly hard-to-meet real-time requirements of next-generation video devices.