Motion Estimation
ECE 569 – Spring 2010
Toan Nguyen
Shikhar Upadhaya
Outline
• What is new with motion estimation
• Four Step Search and Hexagon Search algorithms
• Parallelization strategies
• Results and discussion
What is new with motion estimation?
• The familiar way – Full search
• Full search is not so efficient
• Some of the most popular fast search algorithms:
  - Diamond search
  - Hexagon search
  - Three-step search
  - Four-step search
  - Orthogonal search
  - And many more
So what is the best?
• There is a trade-off between run time and accuracy.
• Full search is the most accurate because it checks every candidate position exhaustively, but it requires the most time.
• Fast search algorithms are faster, but accuracy is reduced because they check only a subset of the candidate positions.
• We implemented two of the most popular fast search algorithms for comparison:
  - Four Step Search
  - Hexagon Search
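As a baseline for the comparison, full search can be sketched as follows. This is a minimal pure-Python illustration, not the code used in the project: the function names, the list-of-lists image layout, and the default search range of ±7 are assumptions made for the example. It uses the sum of absolute differences (SAD) as the matching cost.

```python
def sad(block, frame, top, left):
    """Sum of absolute differences between block and the same-sized
    patch of frame whose top-left corner is (top, left)."""
    return sum(
        abs(block[r][c] - frame[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def full_search(block, frame, top, left, search_range=7):
    """Check every displacement within +/-search_range of (top, left).

    Returns (min_sad, (dy, dx)), or None if no candidate fits in the frame.
    """
    rows, cols = len(frame), len(frame[0])
    n, m = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates whose patch would fall outside the frame.
            if 0 <= y <= rows - n and 0 <= x <= cols - m:
                cost = sad(block, frame, y, x)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best
```

With a ±7 range this evaluates up to 225 candidate positions per block, which is the cost the fast search algorithms try to avoid.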
Four Step Search Algorithm
• Step 1: A minimum block distortion measure (BDM) point is found from a nine-point checking pattern on a 5 x 5 window located at the center of the 15 x 15 search area. If the minimum BDM point is found at the center of the search window, go to Step 4; otherwise go to Step 2.
• Step 2: The search window size is kept at 5 x 5. However, the search pattern depends on the position of the previous minimum BDM point.
  - If the previous minimum BDM point is located at a corner of the previous search window, five additional checking points are used, as shown in Fig. 2(b).
  - If the previous minimum BDM point is located at the middle of the horizontal or vertical axis of the previous search window, three additional checking points are used, as shown in Fig. 2(c).
  If the minimum BDM point is found at the center of the search window, go to Step 4; otherwise go to Step 3.
• Step 3: The search pattern strategy is the same as in Step 2, but afterwards it always goes to Step 4.
• Step 4: The search window is reduced to 3 x 3, as shown in Fig. 2(d), and the overall motion vector points to the minimum BDM point among these nine search points.
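The four steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: SAD is used as the BDM, the names are hypothetical, and instead of listing the three or five new checking points explicitly, the sketch re-evaluates all nine points of each 5 x 5 pattern and relies on a cache so that only the new points actually cost a SAD computation.

```python
def sad(block, frame, top, left):
    """Sum of absolute differences, used here as the BDM."""
    return sum(
        abs(block[r][c] - frame[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def four_step_search(block, frame, top, left, search_range=7):
    """Return (min_sad, (dy, dx)) found by a four step search from (top, left)."""
    rows, cols = len(frame), len(frame[0])
    n, m = len(block), len(block[0])
    cache = {}  # displacement -> SAD, so no point is computed twice

    def cost(dy, dx):
        # Points outside the search area or the frame can never win.
        if abs(dy) > search_range or abs(dx) > search_range:
            return float("inf")
        y, x = top + dy, left + dx
        if not (0 <= y <= rows - n and 0 <= x <= cols - m):
            return float("inf")
        if (dy, dx) not in cache:
            cache[(dy, dx)] = sad(block, frame, y, x)
        return cache[(dy, dx)]

    def best_around(center, step):
        cy, cx = center
        points = [(cy + i * step, cx + j * step)
                  for i in (-1, 0, 1) for j in (-1, 0, 1)]
        # List the center first so that ties are resolved in its favor.
        points.sort(key=lambda p: p != center)
        return min(points, key=lambda p: cost(*p))

    center = (0, 0)
    for _ in range(3):  # Steps 1 to 3: nine points on a 5 x 5 window
        best = best_around(center, 2)
        if best == center:  # minimum at the center: jump to Step 4
            break
        center = best
    final = best_around(center, 1)  # Step 4: 3 x 3 window
    return cost(*final), final
```

Three moves of at most two pixels plus the final one-pixel refinement give a maximum displacement of ±7, which matches the 15 x 15 search area.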
Four Step Search Example
Hexagon Search Algorithm
• Step 1: The large hexagon with seven checking points is centered at the center of a predefined search window in the motion field. If the minimum block distortion (MBD) point is found to be at the center of the hexagon, proceed to Step 3; otherwise, proceed to Step 2.
• Step 2: With the MBD point of the previous search step as the center, a new large hexagon is formed. Three new candidate points are checked, and the MBD point is again identified. If the MBD point is still the center point of the newly formed hexagon, go to Step 3; otherwise, repeat this step.
• Step 3: Switch the search pattern from the large hexagon to the small one. The four points covered by the small hexagon are evaluated and compared with the current MBD point. The new MBD point is the final solution of the motion vector.
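A minimal Python sketch of these three steps follows. The hexagon offsets are an assumption based on the commonly used horizontal hexagon pattern (two points at distance 2 on the horizontal axis, four points at ±1 horizontally and ±2 vertically); the function names and the ±7 search limit are likewise chosen for illustration. SAD is used as the block distortion.

```python
# Offsets are (dy, dx). Assumed horizontal large hexagon and small pattern.
LARGE_HEXAGON = [(0, -2), (0, 2), (-2, -1), (-2, 1), (2, -1), (2, 1)]
SMALL_HEXAGON = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def sad(block, frame, top, left):
    """Sum of absolute differences, used here as the block distortion."""
    return sum(
        abs(block[r][c] - frame[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def hexagon_search(block, frame, top, left, search_range=7):
    """Return (min_sad, (dy, dx)) found by a hexagon search from (top, left)."""
    rows, cols = len(frame), len(frame[0])
    n, m = len(block), len(block[0])
    cache = {}  # displacement -> SAD; only new points are computed

    def cost(dy, dx):
        if abs(dy) > search_range or abs(dx) > search_range:
            return float("inf")
        y, x = top + dy, left + dx
        if not (0 <= y <= rows - n and 0 <= x <= cols - m):
            return float("inf")
        if (dy, dx) not in cache:
            cache[(dy, dx)] = sad(block, frame, y, x)
        return cache[(dy, dx)]

    def best_of(center, offsets):
        # The center comes first, so it wins ties and the loop terminates.
        points = [center] + [(center[0] + dy, center[1] + dx)
                             for dy, dx in offsets]
        return min(points, key=lambda p: cost(*p))

    center = (0, 0)
    while True:  # Steps 1 and 2: move the large hexagon
        best = best_of(center, LARGE_HEXAGON)
        if best == center:
            break
        center = best
    final = best_of(center, SMALL_HEXAGON)  # Step 3: small hexagon
    return cost(*final), final
```

Because the center is listed first and a move happens only on a strictly lower SAD, the loop in Step 2 cannot cycle.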
Hexagon Search Example
Design Implementation
• Parallelization is possible by dividing the image into small sub-image partitions.
• Each thread works on one sub-image independently using the chosen algorithm (i.e., Four Step Search or Hexagon Search).
• At the end, the minimum SAD (sum of absolute differences) values of all sub-images are compared to obtain the final minimum SAD and avoid settling on a local minimum.
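The partition-and-reduce idea can be illustrated on the CPU with Python threads standing in for GPU threads. This is only a sketch under assumptions: the names are hypothetical, each tile is searched exhaustively to keep the example short (the project runs Four Step Search or Hexagon Search per sub-image), and a tile is taken to own the window positions whose top-left corner lies inside it, so a window may read pixels from a neighboring tile.

```python
from concurrent.futures import ThreadPoolExecutor


def sad(window, image, top, left):
    """Sum of absolute differences between window and an image patch."""
    return sum(
        abs(window[r][c] - image[top + r][left + c])
        for r in range(len(window))
        for c in range(len(window[0]))
    )


def search_tile(window, image, tile):
    """Search every window position whose top-left corner lies in tile.

    tile is (row_start, row_end, col_start, col_end), end exclusive.
    Returns (min_sad, (row, col)); (inf, None) if no position fits.
    """
    r0, r1, c0, c1 = tile
    n, m = len(window), len(window[0])
    best = (float("inf"), None)
    for y in range(r0, min(r1, len(image) - n + 1)):
        for x in range(c0, min(c1, len(image[0]) - m + 1)):
            cost = sad(window, image, y, x)
            if cost < best[0]:
                best = (cost, (y, x))
    return best


def parallel_min_sad(window, image, tile_size, workers=4):
    """Split the image into tiles, search each in its own task, then
    compare the per-tile minima to get the global minimum SAD."""
    rows, cols = len(image), len(image[0])
    tiles = [(r, r + tile_size, c, c + tile_size)
             for r in range(0, rows, tile_size)
             for c in range(0, cols, tile_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda t: search_tile(window, image, t), tiles))
    return min(results, key=lambda res: res[0])
```

The final `min` over the per-tile results corresponds to the comparison step described in the last bullet.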
Implementation Notes
• Since the number of threads we use is a multiple of 2, if the number of sub-images is not a multiple of 2, we need to pad the image with additional rows and columns, and we ignore the results from those extra sub-images.
• When comparing run times for the performance analysis, we excluded the time it takes to read a text file and store the data into the window and image arrays.
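One way to read the padding note is sketched below. The details are assumptions, since the slides do not give them: the sketch pads with a constant fill value, rounds the number of sub-images in each direction up to an even count, and reports which sub-images lie entirely in the padding so their results can be ignored. All names are hypothetical.

```python
def round_up_even(n):
    """Smallest even number greater than or equal to n."""
    return n + (n % 2)


def pad_for_even_tiles(image, tile_size, fill=0):
    """Pad image so it splits into an even number of tiles per direction.

    Returns (padded_image, extra_tiles), where extra_tiles is the set of
    (tile_row, tile_col) indices that contain padding only.
    """
    rows, cols = len(image), len(image[0])
    tiles_down = round_up_even(-(-rows // tile_size))    # ceil, then even
    tiles_across = round_up_even(-(-cols // tile_size))
    new_rows = tiles_down * tile_size
    new_cols = tiles_across * tile_size
    # Extend each existing row to the right, then append full fill rows.
    padded = [list(row) + [fill] * (new_cols - cols) for row in image]
    padded += [[fill] * new_cols for _ in range(new_rows - rows)]
    extra_tiles = {
        (i, j)
        for i in range(tiles_down)
        for j in range(tiles_across)
        if i * tile_size >= rows or j * tile_size >= cols
    }
    return padded, extra_tiles
```

Tiles that only partly overlap the original image are kept, because they still contain real pixels.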
Simulation Results
• First, we varied the number of threads per block to find the configuration that gives the best run time.
• 256 threads per block gives the best performance.
[Figure: Runtime of full search on various threads/block (32, 64, 128, 256, 512). X-axis: image size, 16x16 to 4096x4096. Y-axis: runtime (seconds).]
Simulation Results (cont.)
• The run times of the serial and parallel versions of the different algorithms were collected and compared to see what performance improvement we achieved.
• We only see a performance improvement when the image size is 256x256 or bigger. For smaller images, the parallel version is actually slower than the serial one.
[Figure: Hexagon Search serial vs. parallel (Hexagon_Serial, Hexagon_parallel). X-axis: image size, 16x16 to 4096x4096. Y-axis: runtime (seconds).]
[Figure: Full Search serial vs. parallel (FSS_Serial, FSS_Parallel). X-axis: image size, 16x16 to 4096x4096. Y-axis: runtime (seconds).]
[Figure: Four Step Search serial vs. parallel (4SS_Serial, 4SS_Parallel). X-axis: image size, 16x16 to 4096x4096. Y-axis: runtime (seconds).]
Simulation Results (cont.)
• So how much speedup do we get, and which algorithm is better: Full Search, Four Step Search, or Hexagon Search?
[Figure: Parallel vs. serial versions speedup (Speed_UP_FS, Speed_UP_4SS, Speed_UP_Hexagon). X-axis: image size, 16x16 to 4096x4096. Y-axis: speedup.]
Simulation Results (cont.)
• Overall performance
[Figure: Overall performance of FSS_Serial, FSS_Parallel, 4SS_Serial, 4SS_Parallel, Hexagon_Serial, Hexagon_parallel. X-axis: image size, 16x16 to 4096x4096.]
Runtime (seconds):
Image size   Full_Serial  Full_Parallel  4SS_Serial  4SS_Parallel  Hexagon_Serial  Hexagon_parallel
16X16        0            0              0           0.016         0               0.078
32X32        0            0.016          0           0.015         0               0.047
64X64        0.01         0.016          0.01        0.015         0.01            0.062
128X128      0.02         0.016          0.01        0.015         0.01            0.062
256X256      0.09         0.031          0.02        0.016         0.02            0.047
512X512      0.41         0.078          0.06        0.016         0.06            0.063
1024X1024    1.64         0.265          0.236       0.032         0.22            0.062
2048X2048    6.56         0.922          0.87        0.047         0.85            0.078
4096X4096    26.29        3.719          3.38        0.11          3.3             0.157
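The speedup of each parallel version can be recomputed from the table above as serial run time divided by parallel run time. The short Python sketch below does that; the variable and function names are chosen for illustration, and entries where the parallel time was recorded as 0 are reported as `None` rather than dividing by zero.

```python
sizes = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096]

# (serial, parallel) run times in seconds, copied from the table.
runtime = {
    "Full": ([0, 0, 0.01, 0.02, 0.09, 0.41, 1.64, 6.56, 26.29],
             [0, 0.016, 0.016, 0.016, 0.031, 0.078, 0.265, 0.922, 3.719]),
    "4SS": ([0, 0, 0.01, 0.01, 0.02, 0.06, 0.236, 0.87, 3.38],
            [0.016, 0.015, 0.015, 0.015, 0.016, 0.016, 0.032, 0.047, 0.11]),
    "Hexagon": ([0, 0, 0.01, 0.01, 0.02, 0.06, 0.22, 0.85, 3.3],
                [0.078, 0.047, 0.062, 0.062, 0.047, 0.063, 0.062, 0.078,
                 0.157]),
}


def speedup(serial, parallel):
    """Serial time over parallel time; None if the parallel time is 0."""
    return serial / parallel if parallel > 0 else None


speedups = {
    name: {size: speedup(s, p)
           for size, s, p in zip(sizes, serial, parallel)}
    for name, (serial, parallel) in runtime.items()
}

for name in speedups:
    print(name, speedups[name][4096])
```

At 4096x4096 the ratios come to roughly 7.1 for Full Search, 30.7 for Four Step Search, and 21.0 for Hexagon Search. Values below 1 at the small sizes mean the parallel version is slower than the serial one.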
Simulation Results (cont.)
• Performance comparison between NVIDIA 8400 GS and 9800 GT GPUs.
[Figure: NVIDIA 8400 GS vs. 9800 GT performance (Speed-up_FSS, Speed-up_4SS, Speed-up_Hexagon). X-axis: image size, 16x16 to 4096x4096. Y-axis: speedup.]
Simulation Results (cont.)
• Distortion measurement (motion estimation quality).
[Figure: Fast Search Distortion (4SS distortion, Hexagon distortion). X-axis: image size, 16x16 to 4096x4096. Y-axis: distortion.]
[Figure: Min. SAD returned by different algorithms (Full-Step, 4SS, Hexagon). X-axis: image size, 16x16 to 4096x4096. Y-axis: min. SAD.]
Result Analysis Summary
1. The parallel versions of motion estimation only improve performance when the image is large (256x256 or bigger). For smaller images, parallelization reduces performance. The larger the image, the greater the speedup.
2. Fast search algorithms outperform the full search algorithm in run time, hence "fast".
3. Parallelization gives Four Step Search a slight edge over Hexagon Search.
4. The distortion of the two fast search algorithms is similar.
Result Conclusions
• Based on the data collected from the different algorithms, Four Step Search gives slightly better performance than Hexagon Search, while the distortion is very similar.
• Hence, Four Step Search is a better fast search algorithm than Hexagon Search.
• Only run motion estimation algorithms on the GPU if the image size is 256x256 or larger. Smaller images should be run serially on the CPU.
Limitations
• The image and window files contain random data.
• Shared memory is not used.
Other parallelization strategy
• After each step, the SAD of the new checking points is computed. We can parallelize by having threads compute the SADs of all the points in the sub-image.
• Then, when a step completes and the SADs of the new checking points are needed, they have already been computed by the threads.
• Drawbacks of this strategy:
  - No considerable amount of speedup
  - Lots of data transfer between host and device
  - More complicated implementation
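A rough CPU-side sketch of this alternative strategy is shown below, with Python threads standing in for GPU threads. The names and the ±7 search range are assumptions made for the example. Every candidate SAD is computed up front, and a fast search would then only look values up in the resulting table.

```python
from concurrent.futures import ThreadPoolExecutor


def sad(block, frame, top, left):
    """Sum of absolute differences between block and a frame patch."""
    return sum(
        abs(block[r][c] - frame[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def precompute_sad_table(block, frame, top, left, search_range=7,
                         workers=4):
    """Compute the SAD of every in-frame displacement in parallel.

    Returns a dict mapping (dy, dx) to its SAD.
    """
    rows, cols = len(frame), len(frame[0])
    n, m = len(block), len(block[0])
    points = [
        (dy, dx)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
        if 0 <= top + dy <= rows - n and 0 <= left + dx <= cols - m
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = pool.map(
            lambda p: sad(block, frame, top + p[0], left + p[1]), points)
        # Consume the lazy iterator while the pool is still open.
        return dict(zip(points, costs))
```

Filling the whole table does the same amount of SAD work as a full search, which is consistent with the limited speedup listed as a drawback above.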
References
• Deepak Turaga, Mohamed Alkanhal. "Search Algorithms for Block-Matching in Motion Estimation". ECE, CMU. March 06, 2010 <http://www.ece.cmu.edu/~ee899/project/deepak_mid.htm>.
• Lai-Man Po, Wing-Chung Ma. "A Novel Four-Step Search Algorithm for Fast Block Motion Estimation". June 1996.
• Xuan Jing, Lap-Pui Chau. "An Efficient Three-Step Search Algorithm for Block Motion Estimation". IEEE Transactions on Multimedia, June 2004: 435-437.
• Chen Lu, Wang. "Diamond Search Algorithm". ECE, U of Texas. March 06, 2010 <http://users.ece.utexas.edu/~bevans/courses/ee381k/projects/fall98/chen-lu-wang/presentation/sld012.htm>.
Questions?