HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM

Data Compression Conference 2013

Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li

Page 1:

HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM

Data Compression Conference 2013

Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li

Page 2:

Outline

Introduction

Related Work

Proposed Method

Experimental Results

Conclusion

Page 3:

Introduction(1/2)

HEVC coding tree unit (CTU)

Page 4:

Introduction(2/2)

Local parallel method (LPM)

The maximum parallelism of LPM is less than or equal to 8.

Independent PUs (IPUs)

Directed acyclic graph (DAG)

Page 5:

Related Work(1/2)

Local parallel method (LPM) [16]

Motion estimation region (MER)

[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012

Page 6:

Related Work(2/2)

Local parallel method (LPM)

M = 16 or 8

Page 7:

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 8:

Proposed Method.A(1/3)

Independent PUs (IPUs)

The IPU's left boundary and the MER's left boundary do not overlap.

The IPU's upper boundary and the MER's upper boundary do not overlap.
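The two boundary conditions above can be sketched as a simple test. This is only an illustration under stated assumptions: `is_independent_pu` is a hypothetical helper, `(x, y)` is taken to be the top-left luma sample of the PU, MERs are assumed to be square M x M blocks aligned to multiples of M, and both conditions are assumed to be required together.

```python
def is_independent_pu(x, y, mer_size):
    """Hypothetical check of the two IPU conditions on the slide.

    Assumes (x, y) is the PU's top-left sample and that MERs form a
    regular grid of mer_size x mer_size blocks starting at (0, 0).
    """
    on_mer_left_boundary = (x % mer_size == 0)
    on_mer_upper_boundary = (y % mer_size == 0)
    # Assumption: a PU is an IPU only when it touches neither boundary.
    return not on_mer_left_boundary and not on_mer_upper_boundary


print(is_independent_pu(8, 8, 16))   # True: strictly inside the MER
print(is_independent_pu(0, 8, 16))   # False: left boundaries overlap
print(is_independent_pu(8, 16, 16))  # False: upper boundaries overlap
```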

Page 9:

Proposed Method.A(2/3)

Page 10:

Proposed Method.A(3/3)

Neighboring CTUs: left, upper, upper-left, upper-right
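As a small illustration of this neighbor set, the sketch below lists the CTUs that a given CTU depends on. The function name `dependency_neighbors` is hypothetical, and the `(i, j)` = (row, column) convention is an assumption chosen to agree with the coordinates used in Step 4 of the framework.

```python
def dependency_neighbors(i, j, rows, cols):
    """Return the in-frame CTUs that CTU (i, j) depends on.

    Assumption: i is the CTU row and j is the CTU column.
    """
    candidates = [
        (i, j - 1),      # left
        (i - 1, j),      # upper
        (i - 1, j - 1),  # upper-left
        (i - 1, j + 1),  # upper-right
    ]
    # Neighbors outside the frame do not exist, so they are dropped.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


print(dependency_neighbors(0, 0, 3, 4))  # []
print(dependency_neighbors(1, 1, 3, 4))  # [(1, 0), (0, 1), (0, 0), (0, 2)]
print(dependency_neighbors(1, 3, 3, 4))  # [(1, 2), (0, 3), (0, 2)]
```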

Page 11:

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 12:

Proposed Method.B(1/4)

Generate a DAG to capture the dependency relationships of CTUs.

Page 13:

Proposed Method.B(2/4)

A DAG consists of a set of vertices V and a set of edges E.

A data dependency <=> an edge.

A processed CTU <=> a removed vertex.

Page 14:

Proposed Method.B(3/4)

Condition matrix (CM)
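A minimal sketch of how such a matrix might be initialized, assuming each entry counts the in-frame CTUs (left, upper, upper-left, upper-right) that the CTU still waits for. The name `init_condition_matrix` and the row/column indexing are illustrative assumptions, not taken from the slides.

```python
def init_condition_matrix(rows, cols):
    """Build a CM whose entry (i, j) counts the CTUs that CTU (i, j) depends on.

    Assumption: i is the CTU row, j is the CTU column, and the
    dependencies are the left, upper, upper-left and upper-right CTUs.
    """
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            deps = [(i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)]
            cm[i][j] = sum(1 for r, c in deps if 0 <= r < rows and 0 <= c < cols)
    return cm


for row in init_condition_matrix(3, 4):
    print(row)
# [0, 1, 1, 1]
# [2, 4, 4, 3]
# [2, 4, 4, 3]
```

Only the top-left CTU starts at zero, so it is the only CTU that can be processed immediately.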

Page 15:

Proposed Method.B(4/4)

Page 16:

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 17:

Proposed Method.C(1/5)

Page 18:

Proposed Method.C(2/5)

Step 1: Initialize DQ and CM. DQ is a waiting queue. CM is designed to record the number of related CTUs for each CTU.

Step 2: When some values in the CM become zero, get the corresponding coordinates and push them into DQ.

Page 19:

Proposed Method.C(3/5)

Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.

Step 4: Update CM. When the CTU at coordinate (i, j) in CM has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one.

Step 5: Repeat steps 2 to 4 until every CTU in the frame has been processed.
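The five steps can be sketched as a small scheduler. This is a simplified illustration, not the authors' implementation: `schedule_ctus` and `process_ctu` are hypothetical names, `(i, j)` is assumed to mean (row, column), a thread pool stands in for the many-core platform, and the CTUs in DQ are processed in synchronized waves for clarity.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def schedule_ctus(rows, cols, process_ctu, max_workers=4):
    """Process all CTUs of a frame in dependency order; return the waves."""
    def in_frame(r, c):
        return 0 <= r < rows and 0 <= c < cols

    # Step 1: initialize CM (count of unprocessed dependencies) and DQ.
    cm = [[sum(in_frame(r, c) for r, c in
               ((i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)))
           for j in range(cols)] for i in range(rows)]
    dq = deque()

    # Step 2 (initial pass): queue every CTU whose CM value is already zero.
    for i in range(rows):
        for j in range(cols):
            if cm[i][j] == 0:
                dq.append((i, j))

    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while dq:  # Step 5: repeat until no CTU is left to process.
            # Step 3: take all ready CTUs and process them in parallel.
            wave = list(dq)
            dq.clear()
            list(pool.map(lambda ij: process_ctu(*ij), wave))
            waves.append(wave)

            # Step 4: decrement the CM entries of the dependent CTUs.
            for i, j in wave:
                for r, c in ((i, j + 1), (i + 1, j), (i + 1, j + 1), (i + 1, j - 1)):
                    if in_frame(r, c):
                        cm[r][c] -= 1
                        if cm[r][c] == 0:
                            dq.append((r, c))  # Step 2: newly ready CTU.
    return waves


waves = schedule_ctus(3, 4, lambda i, j: None)
print([len(w) for w in waves])  # [1, 1, 2, 2, 2, 2, 1, 1]
```

In this 3 x 4 example at most two CTUs are ready at the same time; larger frames expose more ready CTUs per wave.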

Page 20:

Proposed Method.C(4/5)

Maximum parallelism of CTU

Maximum parallelism of highly parallel framework

Average parallelism of highly parallel framework

Page 21:

Proposed Method.C(5/5)

Page 22:

Experimental Results(1/5)

Page 23:

Experimental Results(2/5)

Page 24:

Experimental Results(3/5)

Page 25:

Experimental Results(4/5)

Page 26:

Experimental Results(5/5)

Page 27:

Conclusion(1/1)

The highly parallel framework provides sufficient parallelism for many-core platforms.

It uses the DAG-based order to parallelize CTUs.