Post on 21-Dec-2015
Object Based Video Coding - A Multimedia Communication Perspective
Muhammad Hassan Khan
2004-03-0020
Recap …
Motivation for Video Coding
Today’s Video Coding
Problems with today’s video coding
Desirable Features
Solution to get desirable features
  Object Based Video Coding
  MPEG-4 Support
  Model Based Coding
Major Problem: Segmentation
  Segmentation by Graph Cuts
Overview of Today’s Presentation
Details of the Segmentation Process
  Segmentation using Graph Cuts
  Results
What can we do once we have the segmented regions?
Block-based vs. Parametric Motion Representation
Compatibility with MPEG-4
Segmentation using Graph Cuts
How does it relate to segmentation of images?
  Segmentation is primarily a pixel labeling problem
  Consider that we want to label a pixel:
    D = distance function (depends on the current pixel)
    S = smoothness function (depends on the neighborhood)
    To be minimized: α D + (1 − α) S
    α serves as a prior!
Hence graph-based segmentation answers the question: what is the best segmentation, given this function?
We still haven’t answered how the two relate…
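The weighted function above can be made concrete with a small sketch. The slides do not specify D or S, so the choices here are assumptions for illustration only: D is the squared difference between a pixel value and the mean of its assigned label, and S is a constant penalty (`smooth`) for each pair of neighboring pixels with different labels. The function name `energy` and all parameter names are hypothetical.

```python
def energy(values, labels, means, alpha=0.5, smooth=1.0):
    """Evaluate alpha * D + (1 - alpha) * S for a 1-D row of pixels.

    values: pixel intensities
    labels: one label index per pixel
    means:  assumed representative intensity for each label
    """
    # D (assumed): squared distance of each pixel to its label's mean.
    data = sum((v - means[l]) ** 2 for v, l in zip(values, labels))
    # S (assumed): fixed penalty for every neighboring pair that disagrees.
    smoothness = sum(smooth for a, b in zip(labels, labels[1:]) if a != b)
    return alpha * data + (1 - alpha) * smoothness


values = [0.0, 0.6, 0.0]
means = [0.0, 1.0]
for labels in ([0, 0, 0], [0, 1, 0]):
    print(labels, round(energy(values, labels, means), 3))
```

With α = 0.5 the smooth labeling [0, 0, 0] has the lower value; with α = 1 the smoothness term vanishes and [0, 1, 0] wins, which is one way to read the remark that α serves as a prior.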
Segmentation using Graph Cuts
Let us construct a simple graph to see how the two (graph cuts and segmentation of images) relate
[Figure: simple example graph, with labels α and β and edge weights D(α), D(β), and S]
Segmentation using Graph Cuts
Start with an initial labeling
Find the min-cut
Adjust the labels
Iterate until a good minimization of the function is reached
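As a rough illustration of the "find the min-cut" step, the sketch below labels a 1-D row of pixels with two labels using a single cut. For two labels one cut already gives the minimum of the function; with more labels, a cut of this kind would be repeated for pairs of labels inside the iteration listed above. Everything concrete here is an assumption rather than the presentation's method: D is a squared difference to an assumed label mean, S is a constant penalty between neighbors, and the max-flow routine is a plain breadth-first augmenting-path search. The names `min_cut_labels`, `mean0`, `mean1`, and `smooth` are hypothetical.

```python
from collections import deque


def min_cut_labels(values, mean0, mean1, alpha=0.5, smooth=1.0):
    """Two-label segmentation of a 1-D signal by one s-t minimum cut."""
    n = len(values)
    s, t = n, n + 1                      # source and sink node ids
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)         # reverse edge for residual flow

    for i, v in enumerate(values):
        # Pixels left on the source side get label 0, so the edge to the
        # sink is cut for them and carries the label-0 data cost.
        add_edge(s, i, alpha * (v - mean1) ** 2)  # paid if pixel i gets label 1
        add_edge(i, t, alpha * (v - mean0) ** 2)  # paid if pixel i gets label 0
        if i + 1 < n:
            w = (1 - alpha) * smooth              # paid if neighbors differ
            add_edge(i, i + 1, w)
            add_edge(i + 1, i, w)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            break                          # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source lie on the label-0 side of the cut.
    return [0 if i in parent else 1 for i in range(n)]


print(min_cut_labels([0.1, 0.0, 0.2, 0.9, 1.0, 0.8], mean0=0.0, mean1=1.0))
```

The weight of each cut edge is exactly the cost that the corresponding labeling choice contributes to α D + (1 − α) S, which is why the minimum cut corresponds to the minimizing labeling in this two-label setting.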
What can we do once we have the segmented regions?
Shape Description
  Generalized Hough Transform
  R-Table based representation
  We need to know a few things
    Centroid of a shape
Texture Model
  Not explored in detail yet!
Example for Centroid
(rows r and columns c are numbered from 0; the shape has 12 boundary pixels)
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 1 0 0 1 0 0
0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
r̄ = (1/12)(2+2+3+3+4+4+5+5+6+6+7+7) = 4.5
c̄ = (1/12)(1+1+2+2+3+3+4+4+5+5+6+6) = 3.5
(r̄, c̄) = (4.5, 3.5)
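The centroid of the example shape can be checked with a short computation: average the row indices and the column indices of all pixels set to 1. This is a minimal sketch; the function name `centroid` is hypothetical and indices are assumed to start at 0, as in the example.

```python
def centroid(grid):
    """Return (mean row, mean column) of all non-zero cells in a binary grid."""
    points = [(r, c)
              for r, row in enumerate(grid)
              for c, value in enumerate(row)
              if value]
    if not points:
        raise ValueError("grid contains no shape pixels")
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


grid = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(centroid(grid))  # (4.5, 3.5)
```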
Shape Description - Finding Centroid
For each boundary point (x, y):
  Find r = (x’, y’)
  xc = x + x’
  yc = y + y’
φ is the angle which the tangent at (x, y) makes with the x-axis
[Figure: boundary point (x, y) with tangent angle φ, and the vector r = (x’, y’) to the centroid (xc, yc)]
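The relations xc = x + x’ and yc = y + y’ can be sketched as an R-table that stores, for each quantized tangent angle φ, the offsets r = (x’, y’) from boundary points to the centroid. At lookup time every boundary point votes for candidate centroids, and the most-voted position wins. This is only an illustrative sketch: the tangent angles are assumed to be given, the angle is quantized into a hypothetical number of `bins`, and rotation and scale are ignored. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def angle_bin(phi, bins):
    """Quantize a tangent angle; tangents are undirected, so work modulo pi."""
    return int((phi % math.pi) / math.pi * bins) % bins


def build_r_table(boundary, centroid, bins=8):
    """boundary: iterable of (x, y, phi); returns {angle bin: [(x', y'), ...]}."""
    xc, yc = centroid
    table = defaultdict(list)
    for x, y, phi in boundary:
        table[angle_bin(phi, bins)].append((xc - x, yc - y))
    return table


def vote_centroid(boundary, table, bins=8):
    """Each boundary point votes for (x + x', y + y'); return the top candidate."""
    votes = Counter()
    for x, y, phi in boundary:
        for dx, dy in table.get(angle_bin(phi, bins), []):
            votes[(x + dx, y + dy)] += 1
    return votes.most_common(1)[0][0]


# Four boundary points of a small shape centered at (1, 1).
shape = [(0, 1, math.pi / 2), (2, 1, math.pi / 2), (1, 0, 0.0), (1, 2, 0.0)]
table = build_r_table(shape, centroid=(1, 1))

# The same shape translated by (10, 5): the votes recover the moved centroid.
moved = [(x + 10, y + 5, phi) for x, y, phi in shape]
print(vote_centroid(moved, table))  # (11, 6)
```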
Encoding the R-Table
This can heavily exploit the redundancy between the magnitudes and directions of R-Table entries
We might as well go for DPCM (differential pulse-code modulation). Heaven knows!
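To show why DPCM suits slowly varying R-table entries, here is a minimal lossless sketch that predicts each sample from the previous one and stores only the difference. The slides do not fix a predictor or a quantizer, so the previous-sample predictor and the integer magnitudes below are assumptions, and the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Return the first sample followed by successive differences."""
    residuals = []
    previous = 0
    for sample in samples:
        residuals.append(sample - previous)  # prediction error
        previous = sample
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


magnitudes = [10, 11, 11, 12, 14, 13]   # made-up, slowly varying |r| values
encoded = dpcm_encode(magnitudes)
print(encoded)                           # [10, 1, 0, 1, 2, -1]
print(dpcm_decode(encoded) == magnitudes)
```

After the first value, the residuals are small numbers clustered around zero, which an entropy coder can represent with fewer bits than the raw magnitudes.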
Benefits
Objects are encoded independently and can hence be manipulated independently in the transform domain
Block-based vs. Parametric Motion Representation
Block based
  Use variable block sizes within the segmented object, based on the texture model
  Use smaller blocks around the boundary pixels
Parametric Motion
  Given that the world is planar, two images of the same scene taken with a perspective camera are related by a projective transformation
  We can assume each object lies in a plane, similar to the concept of a VOP (Video Object Plane), and compute the projective transformation parameters to estimate motion
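A projective transformation between two views of a plane can be written as a 3×3 matrix acting on homogeneous pixel coordinates. The sketch below only applies a given matrix to a point; estimating the parameters from two frames is not shown. The function name `apply_homography` and the example matrices are made up for illustration.

```python
def apply_homography(h, point):
    """Map (x, y) through a 3x3 projective transformation h (list of rows)."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)   # divide out the homogeneous coordinate


# Pure translation by (3, -2): a special case with last row (0, 0, 1).
translation = [[1, 0, 3], [0, 1, -2], [0, 0, 1]]
print(apply_homography(translation, (4, 5)))   # (7.0, 3.0)

# A non-trivial last row makes the displacement depend on position.
perspective = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
print(apply_homography(perspective, (2, 4)))   # (1.0, 2.0)
```

The matrix is defined only up to scale, so it carries eight free parameters per object, compared with one motion vector per block in the block-based representation.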
Compatibility with MPEG-4
[Figure: MPEG-4 audiovisual scene. Labels: hierarchically multiplexed downstream control/data; hierarchically multiplexed upstream control/data; audiovisual presentation; audiovisual objects (3D objects, 2D background, voice, sprite); scene coordinate system (x, y, z); video compositor; audio compositor; projection plane; hypothetical viewer; display; speaker; user input; user events]
Hierarchical Description
The scene divided into objects
Our Shape/Texture Representation Goes Here