
RATE-DISTORTION BASED VIDEO COMPRESSION

Optimal Video Frame Compression and Object Boundary Encoding


Guido M. SCHUSTER
U.S. Robotics
Skokie, Illinois, USA

and

Aggelos K. KATSAGGELOS
Northwestern University
Evanston, Illinois, USA

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-5172-4 ISBN 978-1-4757-2566-7 (eBook) DOI 10.1007/978-1-4757-2566-7

Printed on acid-free paper

All Rights Reserved. © 1997 Springer Science+Business Media Dordrecht. Originally published by Kluwer Academic Publishers in 1997. Softcover reprint of the hardcover 1st edition 1997. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

To Dawn

CONTENTS

LIST OF FIGURES
LIST OF TABLES
PREFACE

1 INTRODUCTION
1.1 Motivation for video compression
1.2 Problem Statement
1.3 Contributions
1.4 Overview
1.5 Summary

2 REVIEW OF LOSSY VIDEO COMPRESSION
2.1 Lossless versus lossy compression
2.2 Motion compensated waveform coding
2.3 Three dimensional waveform coding
2.4 Model-based video coding
2.5 Summary

3 BACKGROUND
3.1 Rate distortion theory
3.2 Operational rate distortion theory
3.3 Lagrangian multiplier method
3.4 Dynamic programming
3.5 Shortest path algorithm
3.6 Summary

4 GENERAL CONTRIBUTIONS
4.1 Optimal bit allocation for dependent quantizers using the minimum total distortion criterion
4.2 Very fast convex search based on a Bezier curve
4.3 Optimal bit allocation for dependent quantizers using the minimum maximum distortion criterion
4.4 Optimal scanning path for a quad-tree decomposition
4.5 Optimal quad-tree decomposition with leaf dependencies
4.6 Summary

5 OPTIMAL MOTION ESTIMATION AND MOTION COMPENSATED INTERPOLATION FOR VIDEO COMPRESSION
5.1 Optimal region matching
5.2 Optimal QT-based motion estimator
5.3 Motion compensated interpolation
5.4 Summary

6 A VIDEO COMPRESSION SCHEME WITH OPTIMAL BIT ALLOCATION BETWEEN DISPLACEMENT VECTOR FIELD AND DISPLACED FRAME DIFFERENCE
6.1 Introduction
6.2 Notation and assumptions
6.3 Lossless MCVC
6.4 Lossy MCVC
6.5 The minimum maximum distortion approach
6.6 A video compression scheme with optimal bit allocation between DVF and DFD
6.7 Implementation Issues
6.8 Experiments
6.9 Summary

7 A VIDEO COMPRESSION SCHEME WITH OPTIMAL BIT ALLOCATION AMONG SEGMENTATION, MOTION AND RESIDUAL ERROR
7.1 Introduction
7.2 Notation and assumptions
7.3 Lossless VBSMCVC
7.4 Lossy VBSMCVC
7.5 Implementation
7.6 Experimental Results
7.7 Summary

8 AN OPTIMAL POLYGONAL BOUNDARY ENCODING SCHEME
8.1 Introduction
8.2 Problem Formulation
8.3 Distortion measures based on the maximum operator
8.4 Distortion measures based on the summation operator
8.5 Including secondary objectives
8.6 Extension of the admissible vertex set to off-boundary pixels
8.7 Multiple boundary encoding
8.8 Vertex encoding scheme
8.9 Experimental Results
8.10 Summary

REFERENCES
INDEX

LIST OF FIGURES

Chapter 1

1.1 Block diagram of a motion compensated video coder
1.2 Decomposition of the original sequence
1.3 Operational rate distortion curve

Chapter 2

2.1 Tradeoff space of video compression
2.2 Generic motion compensated video coder
2.3 Block matching for estimating the DVF
2.4 Zig-zag scan for the DCT coefficients
2.5 Block diagram of a generic MC-DCT coder
2.6 A three level pyramid video coding scheme

Chapter 3

3.1 Communication system
3.2 Operational rate distortion function
3.3 Bisection method
3.4 Rate distortion plane
3.5 Example trellis
3.6 Topologically sorted trellis
3.7 Different stages of the shortest path algorithm

Chapter 4

4.1 Trellis for the image compression example
4.2 Continuous rate distortion curve
4.3 The R*(Dmax) function
4.4 Macro block MSE comparison
4.5 Macro block quantizer step size comparison
4.6 Comparison between Y channels of the three approaches
4.7 Frame segmented by a quad-tree
4.8 Quad-tree representation of the frame
4.9 Quad-tree notation
4.10 Recursive raster scan
4.11 Different scanning paths
4.12 Completely decomposed quad-tree
4.13 Recursive definition of the scanning path
4.14 Corrected raster scan
4.15 Optimal scanning path
4.16 Hilbert curve definition
4.17 Recursive Hilbert curve generation
4.18 The multilevel trellis for N = 5 and n0 = 3
4.19 Recursive rule for generating the "from" and "to" sets
4.20 Recursive distribution of the quad-tree encoding cost
4.21 Optimal path
4.22 Optimal quad-tree decomposition
4.23 Modified Hilbert scan for a QCIF image
4.24 First frame encoded by H.263
4.25 First frame encoded by optimal mean value QT decomposition
4.26 Segmentation of the first frame encoded by QT decomposition

Chapter 5

5.1 The original 176th and 180th frames
5.2 The predicted frame and the DVF for TMN4 block matching
5.3 The predicted frame and the DVF when the rate is matched
5.4 The predicted frame and the DVF when the distortion is matched
5.5 Modified Hilbert scan for level n0 = 3 of a QCIF frame
5.6 The predicted frame and the DVF when the rate is matched for the QT-based scheme
5.7 The overall scanning path
5.8 The predicted frame and the DVF when the distortion is matched for the QT-based scheme
5.9 Motion compensated interpolation of the first frame
5.10 The 84th, 86th and 88th reconstructed frames of the "Miss America" sequence
5.11 The interpolated frame for the "Miss America" sequence
5.12 The QT segmentation and the DVF of the 86th interpolated frame of the "Miss America" sequence
5.13 The 176th, 178th and 180th reconstructed frames of the "Mother and Daughter" sequence
5.14 The motion compensated interpolation result for the "Mother and Daughter" sequence
5.15 The QT segmentation and the DVF of the 178th interpolated frame of the "Mother and Daughter" sequence

Chapter 6

6.1 The trellis of the lossless MCVC example
6.2 The neighborhood needed for TMN4
6.3 Rate comparison between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder
6.4 Rate difference between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder
6.5 Distortion comparison between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder
6.6 Distortion difference between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder
6.7 The 12th reconstructed frame of the "Mother and Daughter" sequence
6.8 The optimal mode selection for the 16th frame of the "Mother and Daughter" sequence
6.9 The optimal quantizer selection for the 16th frame of the "Mother and Daughter" sequence
6.10 The optimal motion vector field for the 16th frame of the "Mother and Daughter" sequence
6.11 Rate comparison between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder
6.12 Rate difference between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder
6.13 Distortion comparison between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder
6.14 Distortion difference between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder
6.15 Rate comparison between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed
6.16 Rate difference between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed
6.17 Distortion comparison between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed
6.18 Distortion difference between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed

Chapter 7

7.1 The multilevel trellis for N = 5 and n0 = 3
7.2 Rate comparison between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder
7.3 Rate difference between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder
7.4 Distortion comparison between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder
7.5 Distortion difference between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder
7.6 The 12th reconstructed frame of the "Mother and Daughter" sequence, which is used to predict the 16th frame
7.7 The optimal mode selection for the 16th frame of the "Mother and Daughter" sequence
7.8 The optimal motion vector field for the 16th frame of the "Mother and Daughter" sequence
7.9 Rate comparison between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder
7.10 Rate difference between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder
7.11 Distortion comparison between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder
7.12 Distortion difference between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder
7.13 Rate comparison between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed
7.14 Rate difference between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed
7.15 Distortion comparison between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed
7.16 Distortion difference between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed
7.17 Optimal quad-tree segmentation and encoding modes for the 80th frame of the "Miss America" sequence
7.18 Optimal inhomogeneous motion vector field for the 80th frame of the "Miss America" sequence

Chapter 8

8.1 Interpretation of the boundary and the polygon approximation as a fully connected weighted directed graph
8.2 Examples of polygons with rapid changes in direction
8.3 Interpretation of the boundary and the polygon approximation as a weighted directed graph
8.4 The R*(Dmax) function, which is a non-increasing function exhibiting a staircase characteristic
8.5 Pruned decision tree for the encoding of a boundary
8.6 The "band" concept
8.7 Vector increments for the ordering of the admissible vertices
8.8 Distance as a function of the index
8.9 Result of the ordering algorithm
8.10 Pruned decision tree for the optimal encoding of three boundaries
8.11 Improved orientation encoding
8.12 Original segmentation
8.13 Optimal segmentation for Dmax = 1 pixel
8.14 Optimal segmentation for Rmax = 280 bits
8.15 Closeup of the lower boundary
8.16 Lagrangian multiplier approach
8.17 Pruning approach
8.18 Comparison between the Lagrangian approach and the pruning approach
8.19 Operational rate distortion function
8.20 Optimal extended segmentation for Dmax = 1 pixel

LIST OF TABLES

Chapter 4

4.1 Statistics of the two paradigms
4.2 Code word length for DC prediction error
4.3 Comparison between fixed block sizes

Chapter 5

5.1 Comparison between optimal motion estimators

Chapter 6

6.1 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN4 and the proposed coder for different modes of operation
6.2 Average rate comparison for the "Mother and Daughter" sequence between TMN4 and the distortion matched proposed coder with differently constrained search spaces
6.3 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN6 and the proposed coder for different modes of operation

Chapter 7

7.1 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN4 and the proposed optimal coder for different modes of operation
7.2 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN6 and the proposed optimal coder for different modes of operation

Chapter 8

8.1 Code word assignment for the runs

PREFACE

One of the most intriguing problems in video processing is the removal of redundancy from a video signal, that is, its compression. A large number of applications depend on video compression, and data compression is the enabling technology behind the multimedia and digital television revolution.

In motion compensated lossy video compression, the original video sequence is first split into three new sources of information: segmentation, motion and residual error. These three information sources are then quantized, which reduces the rate needed for their representation but also distorts the reconstructed video sequence. Once the decomposition of the original source into segmentation, motion and residual error information has been decided, the key remaining problem is the allocation of the available bits among these three sources of information. In this monograph a theory is developed which provides a solution to this fundamental bit allocation problem. It can be applied to all quad-tree-based motion compensated video coders which use a first order differential pulse code modulation (DPCM) scheme for the encoding of the displacement vector field (DVF) and a block-based transform scheme for the encoding of the displaced frame difference (DFD). The theory also yields an optimal motion estimator, which results in the smallest DFD energy for a given bit rate for the encoding of the DVF. This motion estimator is then used to formulate a motion compensated interpolation scheme which incorporates a global smoothness constraint for the DVF.
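Because the DVF is coded with first order DPCM, the number of bits spent on a block's vector depends on the vector chosen for the previous block, so the Lagrangian cost J = D + λR cannot be minimized block by block. The small sketch below illustrates this with dynamic programming over a trellis whose stages are blocks and whose states are candidate vectors. It is a toy under stated assumptions, not the coder developed in this monograph: the rate model `dpcm_rate`, the function `allocate`, the zero predictor for the first block and the DFD energies are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[int, int]


def dpcm_rate(mv: Vector, prev: Vector) -> int:
    """Hypothetical rate model for first order DPCM: each component of the
    residual mv - prev costs 1 bit if zero, otherwise 2*|d| + 1 bits."""
    bits = 0
    for a, b in zip(mv, prev):
        d = abs(a - b)
        bits += 1 if d == 0 else 2 * d + 1
    return bits


def allocate(candidates: Sequence[Sequence[Vector]],
             dfd: Sequence[Dict[Vector, float]],
             lam: float) -> Tuple[List[Vector], float]:
    """Pick one vector per block minimising J = sum(D) + lam * sum(R).

    Stage i of the trellis holds the candidate vectors of block i; the
    rate of a vector depends on the vector chosen for block i - 1, so the
    minimum is found by dynamic programming (Viterbi) instead of greedily.
    """
    start: Vector = (0, 0)  # assumed DPCM predictor for the first block
    # cost[v]: smallest J of any path that ends in vector v at this stage
    cost = {v: dfd[0][v] + lam * dpcm_rate(v, start) for v in candidates[0]}
    back: List[Dict[Vector, Vector]] = []
    for i in range(1, len(candidates)):
        new_cost: Dict[Vector, float] = {}
        ptr: Dict[Vector, Vector] = {}
        for v in candidates[i]:
            # best predecessor u, accounting for the DPCM rate of v given u
            u = min(cost, key=lambda p: cost[p] + lam * dpcm_rate(v, p))
            new_cost[v] = cost[u] + lam * dpcm_rate(v, u) + dfd[i][v]
            ptr[v] = u
        cost = new_cost
        back.append(ptr)
    # backtrack from the cheapest final state
    v = min(cost, key=cost.get)
    best = cost[v]
    path = [v]
    for ptr in reversed(back):
        v = ptr[v]
        path.append(v)
    path.reverse()
    return path, best


# Toy example: two blocks, two candidate vectors each (made-up DFD energies).
cands = [[(0, 0), (1, 0)], [(0, 0), (1, 0)]]
dfd = [{(0, 0): 10.0, (1, 0): 2.0}, {(0, 0): 9.0, (1, 0): 3.0}]
print(allocate(cands, dfd, 1.0))   # -> ([(1, 0), (1, 0)], 11.0)
print(allocate(cands, dfd, 10.0))  # -> ([(0, 0), (0, 0)], 59.0)
```

With a small λ the sketch spends bits on the vector that lowers the DFD energy; with a large λ it falls back to cheap zero vectors. Sweeping λ in this way traces out the operational rate distortion trade-off between DVF and DFD.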

Several algorithms of a general nature pertaining to the problem mentioned in the previous paragraph are also presented in this monograph. Among them are an optimal bit allocation scheme for dependent quantizers with arbitrary dependencies, a very fast convex search based on a Bezier curve, an optimal bit allocation scheme for dependent quantizers using the minimum-maximum distortion criterion, an optimal scanning path for quad-trees, and an optimal quad-tree decomposition with leaf dependencies.
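The minimum-maximum distortion criterion for dependent quantizers can be sketched under a simple toy model: each block picks one of a few quantizers, and the rate of a block depends on the quantizer of the previous block (here a hypothetical 1-bit penalty whenever the quantizer changes). For a fixed Dmax, dynamic programming over only the admissible choices gives the minimum rate R*(Dmax). Since R*(Dmax) is non-increasing and only changes at the distortion values that actually occur, a bisection over those values finds the smallest Dmax that fits a bit budget. The names `min_rate`, `min_max_distortion` and `toy_rate`, and all numbers, are illustrative assumptions rather than the monograph's algorithm or data.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# rate(i, q, q_prev): bits for block i coded with quantizer q when the
# previous block used q_prev (None for the first block).
RateFn = Callable[[int, int, Optional[int]], float]


def min_rate(dist: Sequence[Sequence[float]], rate: RateFn,
             d_max: float) -> float:
    """R*(d_max): minimum total rate with every block distortion <= d_max.

    Dynamic programming over the trellis of quantizer choices; choices
    whose distortion exceeds d_max are simply left out of the trellis.
    Returns math.inf when some block has no admissible choice.
    """
    best: Dict[Optional[int], float] = {None: 0.0}
    for i, row in enumerate(dist):
        new: Dict[Optional[int], float] = {}
        for q, d in enumerate(row):
            if d > d_max:
                continue  # inadmissible under the maximum-distortion constraint
            new[q] = min(best[p] + rate(i, q, p) for p in best)
        if not new:
            return math.inf
        best = new
    return min(best.values())


def min_max_distortion(dist: Sequence[Sequence[float]], rate: RateFn,
                       budget: float) -> Optional[float]:
    """Smallest d_max whose minimum rate fits within the bit budget.

    The optimal d_max is always one of the block distortions, and R*(d_max)
    is non-increasing, so bisection over the sorted levels finds it.
    """
    levels: List[float] = sorted({d for row in dist for d in row})
    if min_rate(dist, rate, levels[-1]) > budget:
        return None  # the budget cannot be met for any d_max
    lo, hi = 0, len(levels) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if min_rate(dist, rate, levels[mid]) <= budget:
            hi = mid
        else:
            lo = mid + 1
    return levels[lo]


# Toy data: 3 blocks, quantizer 0 = fine, 1 = coarse (made-up numbers).
dist = [[1.0, 4.0], [2.0, 5.0], [1.0, 3.0]]
base = [[8, 3], [9, 4], [7, 2]]


def toy_rate(i: int, q: int, q_prev: Optional[int]) -> float:
    # hypothetical 1-bit side information whenever the quantizer changes
    return base[i][q] + (1 if q_prev is not None and q_prev != q else 0)


print(min_max_distortion(dist, toy_rate, 16))  # -> 4.0
```

Each bisection step runs one full dynamic program, so the number of trellis passes grows only logarithmically with the number of distinct distortion levels.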

The optimal bit allocation problem is a rather old one in the information theory community. As part of the design of a coder and decoder (codec), that is, the design of a codec in the rate-distortion (R-D) sense, it has become quite a popular problem in the last few years. Such a problem becomes extremely "critical" when codecs are designed to operate at low bit rates, as represented by one of the functionalities of the ongoing MPEG-4 standardization effort. Various other related encoding problems in the R-D sense have also become quite popular.

One such problem is the encoding of a boundary, either by minimizing the bit rate for an acceptable level of distortion or by minimizing the distortion for a given bit budget. Such a problem plays an important role in object-oriented video coding. In this work a polygonal approximation of the original boundary is used and two solutions are developed. In the first, the vertices of the polygon coincide with boundary points, while in the second they are allowed to be located inside a band formed around the original boundary. Several classes of distortion measures are investigated, and they lead to different algorithms. Most of these algorithms are based on a weighted directed acyclic graph formulation of the problem.
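The graph formulation can be illustrated for the first case (vertices on the boundary) with a distortion based on the maximum operator. The sketch assumes an open boundary with fixed end points, measures distortion as the Euclidean distance of each skipped boundary point to its polygon segment, and charges a hypothetical constant `bits_per_vertex` per vertex, so minimizing rate reduces to minimizing the number of vertices. The DAG nodes are the boundary points in order, an edge i → j exists when the segment i-j stays within Dmax of every point between them, and a shortest path through the DAG gives the minimum-rate polygon. The functions `seg_dist` and `min_rate_polygon` are illustrative names, not the monograph's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def seg_dist(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def min_rate_polygon(boundary: Sequence[Point], d_max: float,
                     bits_per_vertex: float = 1.0) -> Tuple[List[int], float]:
    """Minimum-rate polygon whose vertices are boundary points.

    Nodes of the DAG are the boundary points in their natural order. An
    edge i -> j exists if every boundary point strictly between i and j
    lies within d_max of segment i-j (a maximum-operator distortion).
    Each edge costs bits_per_vertex, a hypothetical constant code length.
    """
    n = len(boundary)
    cost = [math.inf] * n
    prev = [-1] * n
    cost[0] = bits_per_vertex  # the first vertex is always coded
    for j in range(1, n):  # index order is a topological order of the DAG
        for i in range(j):
            if all(seg_dist(boundary[k], boundary[i], boundary[j]) <= d_max
                   for k in range(i + 1, j)):
                if cost[i] + bits_per_vertex < cost[j]:
                    cost[j] = cost[i] + bits_per_vertex
                    prev[j] = i
    path, j = [], n - 1
    while j != -1:  # backtrack from the last boundary point
        path.append(j)
        j = prev[j]
    return path[::-1], cost[-1]


# Toy open boundary: a straight run followed by a right-angle turn.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(min_rate_polygon(pts, 0.1))  # -> ([0, 3, 5], 3.0)
print(min_rate_polygon(pts, 2.0))  # -> ([0, 5], 2.0)
```

Because the edge from each point to its successor is always admissible, a path always exists. The brute-force edge test makes this sketch cubic in the number of boundary points, and a realistic vertex code would make each edge weight depend on the relative position of the two vertices rather than being constant.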

This monograph was originally written by Guido M. Schuster as a doctoral thesis under the supervision of Prof. Aggelos K. Katsaggelos at Northwestern University. We gratefully acknowledge the financial support from Motorola Inc. and Northwestern University. Discussions with the members of the Motorola Visual Communications Group, especially Steve N. Levine, James C. Brailean, Mark R. Banham, Chueng Auyeung and Kevin O'Connell, have provided many important inputs to this research effort. Finally, Guido Schuster wants to thank his wonderful wife, Prof. Dawn Barnes-Schuster. She not only supported him morally but also spent many nighttime hours with him in the lab completing this work.

Guido M. Schuster and Aggelos K. Katsaggelos