

Golnaz Abdollahian, Cuneyt M. Taskiran, Zygmunt Pizlo, and Edward J. Delp

CAMERA MOTION-BASED ANALYSIS OF USER GENERATED VIDEO

IEEE Transactions on Multimedia, Vol. 12, No. 1, January 2010


Introduction

UGV generally has a rich camera motion structure, generated by the person taking the video, and it is typically unedited and unstructured.

The main application of our system is mobile devices, which have become popular for recording, sharing, downloading, and watching UGV; this motivates the use of computationally efficient methods.

We propose a new location-based saliency map that uses camera motion information to determine the saliency values of pixels with respect to their spatial location in the frame.


Motion-Based Frame Labeling

Global Motion Estimation

• In the majority of UGV, camera motion is limited to a few operations, e.g., pan, tilt, and zoom; more complex camera movements, such as rotation, rarely occur in UGV.
• Our goal here is to be computationally efficient, in order to target devices with low processing power such as mobile devices.
• We use a simplified three-parameter global camera motion model covering the three major directions: H (horizontal), V (vertical), and R (radial).


Template

The parameters are found by minimizing the L1 distance between the 2-D template in the current frame and the previous template. The iteration stops when a local minimum is found.
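As a rough illustration, here is a minimal sketch in Python of this kind of iterative L1 template matching, assuming grayscale NumPy frames. Only the H and V parameters are estimated, by greedy descent over integer shifts; the radial parameter is omitted, and the template size and 4-neighborhood search are our assumptions, not details given in the slides.

```python
import numpy as np

def estimate_pan_tilt(prev_frame, curr_frame, tpl_size=64, max_iter=50):
    """Greedily search for the (H, V) shift minimizing the L1 distance between
    a central template from the previous frame and the current frame; stop when
    no neighboring shift lowers the cost (a local minimum)."""
    fh, fw = prev_frame.shape
    cy, cx = fh // 2, fw // 2
    half = tpl_size // 2
    template = prev_frame[cy - half:cy + half, cx - half:cx + half].astype(float)

    def cost(dx, dy):
        y0, x0 = cy - half + dy, cx - half + dx
        if y0 < 0 or x0 < 0 or y0 + tpl_size > fh or x0 + tpl_size > fw:
            return np.inf
        patch = curr_frame[y0:y0 + tpl_size, x0:x0 + tpl_size].astype(float)
        return np.abs(template - patch).mean()

    dx = dy = 0
    best = cost(dx, dy)
    for _ in range(max_iter):
        # examine the 4-neighborhood of the current shift
        moves = [(dx + 1, dy), (dx - 1, dy), (dx, dy + 1), (dx, dy - 1)]
        costs = [cost(mx, my) for mx, my in moves]
        if min(costs) >= best:  # local minimum reached
            break
        i = int(np.argmin(costs))
        dx, dy = moves[i]
        best = costs[i]
    return dx, dy  # estimated horizontal (H) and vertical (V) motion in pixels
```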


Motion Classification

A support vector machine (SVM) is used. We first classify each frame as having a zoom or not, using the 3-D motion vector as the feature vector.

SVM classifiers for blurry and shaky frames are trained on an eight-dimensional feature vector derived from the parameters H and V over a temporal sliding window. The size of the sliding window differs for blurry (N = 7) and shaky (N = 31) detection.

Frames that are not labeled as zoom, blurry, or shaky are identified as having stable motion with no zooms.
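A minimal sketch of the second stage, using scikit-learn's SVC. The slides do not specify the eight features, so the mean, standard deviation, maximum magnitude, and sign-change count of H and V over the window are our guess; the training data here is random placeholder data.

```python
import numpy as np
from sklearn.svm import SVC

def window_features(H, V, center, N):
    """Eight-dimensional feature vector from H and V over a sliding window of
    size N centered on `center` (the exact features are an assumption)."""
    half = N // 2
    h = H[max(0, center - half):center + half + 1]
    v = V[max(0, center - half):center + half + 1]
    def stats(x):
        return [x.mean(), x.std(), np.abs(x).max(),
                float((np.diff(np.sign(x)) != 0).sum())]
    return np.array(stats(h) + stats(v))

# Placeholder per-frame H, V series and 0/1 "shaky" labels for illustration.
rng = np.random.default_rng(0)
H, V = rng.normal(size=500), rng.normal(size=500)
labels = rng.integers(0, 2, size=500)

N = 31  # sliding-window size for shaky detection (N = 7 for blurry)
X = np.array([window_features(H, V, i, N) for i in range(len(H))])
clf = SVC(kernel="rbf").fit(X, labels)
shaky = clf.predict(X)
```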



Temporal Video Segmentation Based on the Use of Camera Views

Two frames are considered to be correlated if they overlap with each other.

Camera view: a temporal concept, defined as a set of consecutive frames that are all correlated with each other.

View boundaries occur when the camera is displaced or there is a change of viewing angle.

To detect view boundaries for temporally segmenting the video, we define the displacement vector between frames i and j as the accumulated global motion between them, d(i, j) = Σ_{k=i}^{j-1} (H_k, V_k).


A boundary frame is flagged whenever the magnitude of the displacement vector between the current frame and the previously detected boundary frame is larger than a threshold.

As a constraint, boundary frames cannot be chosen inside intervals labeled as blurry.
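The boundary rule above reduces to a short loop; here is a sketch assuming per-frame global motion parameters H and V, a boolean blurry mask from the frame-labeling step, and a threshold tau (its value is not given in the slides).

```python
import numpy as np

def detect_view_boundaries(H, V, blurry, tau):
    """Flag a boundary frame when the displacement accumulated since the last
    boundary exceeds tau; boundaries are never placed in blurry intervals."""
    boundaries = [0]
    disp = np.zeros(2)
    for k in range(len(H)):
        disp += (H[k], V[k])  # accumulate the global motion
        if np.linalg.norm(disp) > tau and not blurry[k]:
            boundaries.append(k)
            disp[:] = 0  # measure displacement from the new boundary frame
    return boundaries
```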


Keyframe Selection

A keyframe should be the frame with the highest subjective importance in the segment, so that it represents the segment it is extracted from.

Since our intention was to avoid the complex tasks of object and action recognition, our keyframe selection strategy is based only on camera motion.

The following frames are selected as keyframes:
• the frame after a zoom-in;
• the frame after a large zoom-out;
• the frame where the camera is at pause.

For segments during which the camera has constant motion, all frames are considered to be of relatively equal importance. In this case, the frame closest to the middle of the segment and having the least amount of motion is chosen as the keyframe, in order to minimize blurriness.
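The constant-motion rule can be written directly; motion_mag would be the per-frame global motion magnitude. How the "closest to the middle" and "least motion" criteria are traded off is not specified, so the lexicographic order below is an assumption.

```python
def constant_motion_keyframe(motion_mag, a, b):
    """For a constant-motion segment [a, b), pick the frame with the least
    motion, breaking ties by closeness to the middle of the segment."""
    mid = (a + b) / 2
    return min(range(a, b), key=lambda k: (motion_mag[k], abs(k - mid)))
```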


Keyframe Saliency Maps and ROI Extraction

Several saliency maps are combined to generate the keyframe saliency map:

• color contrast saliency map
• moving object saliency map
• highlighted faces
• location-based saliency map


Color Contrast Saliency Map

The RGB color space is used to generate the contrast-based saliency map.

The three-dimensional pixel vectors in RGB space are clustered into a small number of color vectors using the generalized Lloyd algorithm (GLA) for vector quantization.

The contrast saliency of a pixel is the sum of the distances between its color and the colors in its neighborhood, S(i, j) = Σ_{q ∈ Θ} d(p_ij, q), where p_ij and q are RGB pixel values, Θ is the 5×5 neighborhood of pixel (i, j), and d is a Gaussian distance.
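A sketch of the map, with scikit-learn's k-means standing in for the generalized Lloyd algorithm (the two are essentially the same iteration) and a plain Euclidean color distance in place of the Gaussian distance d:

```python
import numpy as np
from sklearn.cluster import KMeans

def color_contrast_saliency(frame, n_colors=16, radius=2):
    """Quantize RGB pixels to a small palette, then score each pixel by the
    summed color distance to its (2*radius+1)^2 neighborhood (5x5 here)."""
    h, w, _ = frame.shape
    pixels = frame.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    q = km.cluster_centers_[km.labels_].reshape(h, w, 3)

    sal = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(q, dy, axis=0), dx, axis=1)
            sal += np.linalg.norm(q - shifted, axis=2)  # distance to one neighbor
    return sal / (sal.max() + 1e-12)  # normalize to [0, 1]
```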


Moving Object Saliency Map

To determine the moving object saliency map, we examine the magnitude and phase of the macroblock relative motion vectors.

The relative motion vector for the macroblock at location (m, n) is its motion vector with the global camera motion removed.

If the relative motion is below a threshold value, it is set to zero.

The motion intensity I and the motion phase φ are defined as the magnitude and angle of the relative motion vector.


The phase entropy map, H_p, indicates regions with inconsistent motion, which usually belong to the boundary of a moving object: H_p = -Σ_k p_k log p_k, where p_k is the probability of the kth phase bin, estimated from the histogram of φ.
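A sketch combining the two cues, assuming relative motion vectors (global motion already removed) on the macroblock grid. How I and H_p are combined into one map is not stated in the slides; the product below is an assumption.

```python
import numpy as np

def phase_entropy(phi, n_bins=8, win=2):
    """Local entropy of motion phases: H_p = -sum_k p_k log p_k over a small
    spatial window; high values mark inconsistent motion at object boundaries."""
    H, W = phi.shape
    ent = np.zeros((H, W))
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    for y in range(H):
        for x in range(W):
            patch = phi[max(0, y - win):y + win + 1, max(0, x - win):x + win + 1]
            p, _ = np.histogram(patch, bins=bins)
            p = p[p > 0] / p.sum()
            ent[y, x] = -(p * np.log2(p)).sum()
    return ent

def moving_object_saliency(mvx, mvy, thresh=0.5):
    """Zero out small relative motion, then weight motion intensity by the
    local phase entropy."""
    mag = np.hypot(mvx, mvy)
    mvx = np.where(mag < thresh, 0.0, mvx)
    mvy = np.where(mag < thresh, 0.0, mvy)
    I = np.hypot(mvx, mvy)       # motion intensity
    phi = np.arctan2(mvy, mvx)   # motion phase
    return I * phase_entropy(phi)
```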


Location-Based Saliency Map

The direction of the camera motion also has a major effect on the regions where a viewer "looks" in the sequence.

The global motion parameters are used to generate the location saliency maps for the extracted keyframes, with constants k_H = 10, k_V = 5, k_R = 0.5, where r is the distance of a pixel from the center and r_max is the maximum r in the frame.


After combining the H and V maps, the peak of the map function is shifted from the frame center in the direction of the camera motion.

The radial map, S_R, either decreases or increases as we move from the center to the borders, depending on whether the camera has a zoom-in/no-zoom or a zoom-out operation.
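Since the slides give only the constants and this qualitative behavior, the following is a loose sketch: a peak displaced from the frame center along the motion direction, times a radial term that falls off toward the borders for zoom-in/no-zoom and rises for zoom-out. The exponential form and the way the terms combine are assumptions.

```python
import numpy as np

def location_saliency(h, w, H, V, zoom_out=False, kH=10.0, kV=5.0, kR=0.5):
    """Location-based map: peak shifted by (kH*H, kV*V) from the center,
    modulated by a radial component controlled by kR."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cx, cy = w / 2 + kH * H, h / 2 + kV * V   # peak follows the camera motion
    S_hv = np.exp(-(((xs - cx) / w) ** 2 + ((ys - cy) / h) ** 2) / 0.1)
    r = np.hypot(xs - w / 2, ys - h / 2)
    r = r / r.max()                           # r / r_max, in [0, 1]
    S_r = kR * (r if zoom_out else 1.0 - r)   # radial term per zoom direction
    return S_hv * (1.0 + S_r)
```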


Combined Saliency Map

First, the color contrast and moving object saliency maps are superimposed, since they represent two independent factors in attracting visual attention.

Faces are detected and highlighted after the low-level saliency maps are combined.

The location-based saliency map is then multiplied pixel-wise with this map to yield the combined saliency map.
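The combination itself is a one-liner per step; here is a sketch where face_mask is a boolean array from any face detector, and the boost factor is our assumption (the slides say only that faces are "highlighted").

```python
def combined_saliency(color_map, motion_map, face_mask, location_map,
                      face_boost=1.5):
    """Superimpose the low-level maps, boost face regions, then modulate
    pixel-wise by the location-based map."""
    low_level = color_map + motion_map   # two independent attention factors
    low_level[face_mask] *= face_boost   # highlight detected faces
    return low_level * location_map      # pixel-wise multiplication
```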


Identification of ROIs

A region growing algorithm is used to extract ROIs from the saliency map.

Fuzzy partitioning is employed to classify the pixels into R1 (ROI) and R0 (insignificant regions).

Seed selection:
1) the seeds must have maximum local contrast;
2) the seeds should belong to the attended areas.
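A sketch of the growing step, with a fixed saliency threshold standing in for the fuzzy R1/R0 partitioning; the seed would come from the two selection criteria above.

```python
import numpy as np
from collections import deque

def grow_roi(saliency, seed, low):
    """Grow an ROI from a seed pixel, adding 4-neighbors whose saliency stays
    at or above `low` (a simple stand-in for fuzzy partitioning)."""
    h, w = saliency.shape
    roi = np.zeros((h, w), dtype=bool)
    roi[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and not roi[ny, nx] \
                    and saliency[ny, nx] >= low:
                roi[ny, nx] = True
                queue.append((ny, nx))
    return roi
```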

Experimental Results

(Result figures: extracted keyframes and segmentation examples, with detected camera motion labels including "left", "zoom-out", and "zoom-in".)


Conclusion

UGV contains a rich camera motion structure that can be an indicator of "importance" in the scene.

Since camera motion in UGV may be both intentional and unintentional, we used motion classification as a preprocessing step.

A temporal segmentation algorithm was proposed based on the concept of camera views, which relates each subshot to a different view.

We use a simple keyframe selection strategy based on camera motion patterns to represent each view.

We employed camera motion, in addition to several other factors, to generate saliency maps for keyframes and identify ROIs based on visual attention.