Study of Video Stylization for Digital Ambient Displays of Home Movies
Post on 18-May-2015
STUDY OF
VIDEO STYLIZATION FOR DIGITAL AMBIENT DISPLAYS OF HOME MOVIES
NPAR 2010
Tinghuai Wang and John Collomosse, University of Surrey, UK
David Slatter, Phil Cheatle and Darryl Greig, Hewlett-Packard Labs, Bristol, UK
Video clips are stylized into cartoons or paintings, and sequenced according to semantic and visual similarity
Abstract
System: a “Digital Ambient Display” that shows home movies as video cartoons
Algorithm: video segmentation based on multi-label graph cut, producing temporally coherent region maps (tracked regions) that enhance cartoon and painterly rendering
Outline
Introduction
System Overview
Video Stylization: multi-label graph cut, region propagation, refining region labels, smoothing and filtering, stroke placement and shading
Video Sequencing: stochastic composition, rendering transitions
Results and Discussion
Conclusion
Introduction
Digital Ambient Display
A genre of content consumption that we call the ambient experience
Displaying still images in an ambient way: digital picture frame
Displaying video content in an ambient way: Digital Ambient Displays (DADs)
Digital Ambient Displays (DADs)
“Mid-level” scene abstraction of video using color region segmentation
Temporally coherent video regions via region propagation and multi-label graph cut
Home movie sequencing: video selection, composition and transition
Related work
Stochastic selection of video clips
Stochastic transitions between video frames [Schodl et al. 00]: single video, based on visual similarity
Composition of photos for abstraction [Collomosse and Hall 03]
Video artwork [Slatter et al. 10; Bizzocchi 08]
Little work on the use of artistic video stylization in ambient displays
Related work
Image segmentation
Mid-level models of scene structure [Wang et al. 04; Collomosse 04] for rendering in artistic styles
Mean-shift based stylization [Wang et al. 04]: small, short-lived segments
Spatio-temporal volumes from video [Collomosse et al. 05]: 3D (x, y, t) volumes
Abstracting video using a bilateral filter [Winnemoller et al. 06]
Lack of temporal coherence
Our approach of DADs
A novel video segmentation: segmentation is guided by motion flow from the regions of past frames
Video selection and composition: video selection, composition and transition are guided by similarity
Video Segmentation
[Figure: regions labeled 1 to 5 tracked across successive frames along time t]
Labeling regions and tracking them over time
System overview
Video Stylization: multi-label graph cut, region propagation, refining region labels, smoothing and filtering, stroke placement and shading
Video Segmentation
A novel coherent video segmentation: multi-label graph cut on successive video frames
Multi-label graph cut
[Figure: labels from past frames (…, fn-3, fn-2, fn-1) are propagated by motion to the current frame fn; a Gaussian Mixture Model (GMM) of the color distribution is built for each region]
Video Segmentation
Assign the region labels existing in frame I_{t-1} to each pixel p in frame I_t
Find the best mapping l : P → L
where L = { l(1), …, l(p), …, l(|P|) } and P is an 8-connected lattice of pixels
Minimize a global energy function that encourages:
spatial homogeneity of contrast within each frame
temporal consistency of color distribution between frames
Minimize the global energy E
U: temporal consistency of color distribution between frames
V: spatial homogeneity of contrast within each frame
where (1) L is the label set of the previous frame, (2) P is the lattice of connected pixels that receive labels, and (3) Θ is the color history model
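The energy equation itself did not survive in the transcript. Given that U and V are introduced as the two terms of E, with L, P and Θ as the quantities involved, a plausible reading is a simple sum. The argument lists below are an assumption, not taken from the slide:

```latex
E(L, \Theta, P) = U(L, \Theta, P) + V(L, P)
```

Under this reading, U is the data term (how well each pixel's color fits the color history of its label) and V is the pairwise smoothness term over neighboring pixels.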
Minimize the energy V
V: spatial homogeneity of contrast within each frame
Penalize pairs of 8-connected neighboring pixels that have different labels but highly similar colors
[Figure: neighboring pixels with candidate labels 1, 2 and 3]
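A minimal sketch of such a pairwise term, assuming grayscale intensities and a GrabCut-style penalty exp(-beta * difference²) for each 8-connected pair with different labels. The function name, `beta` and `gamma` are illustrative choices, not values from the slides:

```python
import math

def pairwise_energy(image, labels, beta=0.01, gamma=1.0):
    """Sum a contrast-sensitive penalty over 8-connected pairs with different labels."""
    h, w = len(image), len(image[0])
    # These four offsets visit every 8-connected pair exactly once.
    offsets = [(0, 1), (1, -1), (1, 0), (1, 1)]
    total = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[y][x] != labels[ny][nx]:
                    diff = image[y][x] - image[ny][nx]
                    # Similar colors (small diff) give a penalty close to gamma.
                    total += gamma * math.exp(-beta * diff * diff)
    return total

# Tiny image with a strong edge between columns 1 and 2.
image = [[10, 10, 200],
         [10, 10, 200]]
cut_on_edge = [[1, 1, 2], [1, 1, 2]]   # label boundary follows the color edge
cut_in_flat = [[1, 2, 2], [1, 2, 2]]   # label boundary splits a uniform area

energy_on_edge = pairwise_energy(image, cut_on_edge)
energy_in_flat = pairwise_energy(image, cut_in_flat)
```

Placing the label boundary on the color edge costs almost nothing, while splitting the uniform area is penalized, which is the behavior the slide describes.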
Minimize the energy U
U: temporal consistency of color distribution between frames
Label/color history at pixel p:
Frame        f(n-k)  ...  f(n-4)  f(n-3)  f(n-2)  f(n-1)  f(n)
Label/color  L1/255  …    L4/245  L4/248  L1/250  L1/255  ?/255
[Figure: color histograms at pixel p built from the label/color at each frame: the color distribution at pixel p with label L1 and with label L4, with the current color 255 marked on both]
Color distributions of different label assignments
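The label/color history at pixel p can be worked through numerically. The sketch below uses only the values visible on the slide (the elided frames are ignored) and fits a single 1D Gaussian per label instead of a full GMM, which is a simplification made for illustration:

```python
import math

def gaussian_nll(x, samples, min_var=1.0):
    """Negative log-likelihood of x under a Gaussian fitted to the samples."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    var = max(var, min_var)  # floor the variance to avoid degenerate fits
    return 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)

# Colors previously observed at pixel p, grouped by the label they carried.
history = {"L1": [255, 250, 255], "L4": [245, 248]}
current_color = 255

costs = {label: gaussian_nll(current_color, colors)
         for label, colors in history.items()}
best_label = min(costs, key=costs.get)
```

With these numbers the current color 255 is far better explained by the history of L1 than by that of L4, so a data term of this kind favors assigning L1 to the pixel in f(n).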
Minimize the energy U
U: temporal consistency of color distribution between frames
[Figure: the color distribution at pixel n with label l(pn) and the color distribution at pixel m with label l(pm); color distributions of different pixels]
[Equation: U is defined from log P_g, the likelihood of each pixel's color under the color model of its assigned label]
Minimize the energy U
N: Normal distribution with parameters (μ, Σ)
Σ N: Mixture of Gaussians (GMM)
Θ: parameters of all GMMs, Θ = {ω_ik, μ_ik, Σ_ik; i = 1, …, L; k = 1, …, K_i}
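The formula for U is only partly recoverable from the transcript (the fragments are U, log and P_g). A common form for a data term of this kind, given here as an assumption, is the negative log-likelihood of each pixel's color under the GMM of its assigned label. Any normalization constant used in the original is not known:

```latex
U(L, \Theta, P) = -\sum_{p \in P} \log P_g\bigl(I_t(p) \mid l(p), \Theta\bigr),
\qquad
P_g\bigl(I_t(p) \mid l(p) = i, \Theta\bigr)
  = \sum_{k=1}^{K_i} \omega_{ik}\, \mathcal{N}\bigl(I_t(p);\, \mu_{ik}, \Sigma_{ik}\bigr)
```

Here ω_ik, μ_ik and Σ_ik are the weight, mean and covariance of the kth Gaussian of label i, matching the definition of Θ.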
Multi-label Graph Cut
Minimizing E is an NP-hard problem
Multi-label graph cut: iterate α-expansion for each label until E cannot decrease [Boykov and Kolmogorov 2004]
Graph cut
Min-cut/max-flow
Maximum flow ≡ minimum cut
Hong Chen, “Introduction to Min-Cut/Max-Flow Algorithms”
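The max-flow/min-cut equivalence can be checked on a toy graph. The sketch below uses the Edmonds-Karp algorithm, chosen for brevity; it is not the Boykov-Kolmogorov algorithm cited in these slides, and the graph capacities are made up for illustration:

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow; also returns the source side and capacity of a min cut."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    # Nodes still reachable from the source form the source side of a minimum cut.
    source_side = {i for i in range(n) if parent[i] != -1}
    cut = sum(capacity[u][v] for u in source_side
              for v in range(n) if v not in source_side)
    return flow, source_side, cut

# Nodes: 0 = source, 1 and 2 = pixels, 3 = sink.
capacity = [[0, 3, 2, 0],
            [0, 0, 1, 2],
            [0, 0, 0, 3],
            [0, 0, 0, 0]]
flow, source_side, cut = max_flow_min_cut(capacity, 0, 3)
```

The returned flow and cut capacity coincide, which is the property that lets a binary labeling problem be solved as a flow problem.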
Multi-label Graph Cut
Hong Chen, “Introduction to Min-Cut/Max-Flow Algorithms”
Multi-label graph cut on binary labels
[PAMI04] Boykov and Kolmogorov, “An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision”
Multi-label graph cut on binary labels
Min-cut problem and max-flow problem for each pixel
Multi-label graph cut on binary labels
Min-cut problem on the boundary
Max-flow problem
Region propagation
Estimate the motion of I_{t-1} using a RANSAC search based on SIFT features [Lowe 04]: rigid motion + deformation gives I'_{t-1}
Propagate labels per pixel from I'_{t-1} to I_t
What if the motion estimate is incorrect?
Use a thinned skeleton to mitigate imprecise motion estimation
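The RANSAC idea can be illustrated on a much simpler motion model. The sketch below assumes feature matches are already available (SIFT extraction and matching are not shown) and estimates only a 2D translation, not the rigid motion plus deformation described on the slide. All names and parameters are illustrative:

```python
import random

def ransac_translation(matches, tolerance=1.5, iterations=50, seed=0):
    """Estimate a 2D translation from point matches that contain outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample for a translation: a single match.
        (x0, y0), (x1, y1) = rng.choice(matches)
        dx, dy = x1 - x0, y1 - y0
        inliers = [(a, b) for a, b in matches
                   if abs(b[0] - a[0] - dx) <= tolerance
                   and abs(b[1] - a[1] - dy) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the largest consensus set.
    n = len(best_inliers)
    mean_dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    mean_dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (mean_dx, mean_dy), n

# Eight matches consistent with a shift of (5, -3), plus two bad matches.
points = [(0, 0), (10, 4), (3, 7), (8, 8), (15, 2), (6, 12), (9, 1), (2, 14)]
matches = [((x, y), (x + 5, y - 3)) for x, y in points]
matches += [((4, 4), (30, 20)), ((7, 9), (0, 40))]

shift, inlier_count = ransac_translation(matches)
```

The two bad matches are voted out by the consensus step, so the recovered shift depends only on the consistent matches.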
Region/Skeleton
Each region is represented by its skeleton
What if the motion estimate is wrong?
[Figure: region propagation with motion, followed by skeleton pruning, gives robust regions]
Skeletons for robust motion estimation
Use only the skeleton pixels whose distance to the boundary exceeds a pre-set confidence
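The distance test can be sketched as follows. Thinning a region into a skeleton is not shown here; the sketch applies the distance-to-boundary test to every pixel of a binary region mask, whereas the slide applies it to skeleton pixels only. The city-block distance and the function names are assumptions:

```python
from collections import deque

def distance_to_boundary(mask):
    """City-block distance from each region pixel to the nearest non-region pixel."""
    h, w = len(mask), len(mask[0])
    # Pad with background so that the image border counts as a region boundary.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in mask] + [[0] * (w + 2)]
    dist = [[None if v else 0 for v in row] for row in padded]
    queue = deque((y, x) for y in range(h + 2) for x in range(w + 2)
                  if dist[y][x] == 0)
    # Breadth-first search outward from all background pixels at once.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h + 2 and 0 <= nx < w + 2 and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return [row[1:-1] for row in dist[1:-1]]

def confident_pixels(mask, confidence):
    """Keep only pixels deeper inside the region than the confidence distance."""
    return [[1 if d > confidence else 0 for d in row]
            for row in distance_to_boundary(mask)]

region = [[1] * 5 for _ in range(5)]       # a 5x5 square region
core = confident_pixels(region, confidence=2)
```

Pixels near the region boundary are the ones most likely to land in the wrong region after an imprecise warp, so discarding them leaves a conservative core to propagate.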
Region propagation
[Figure: I_{t-1} → labeled I_{t-1} → skeleton → I'_{t-1} → I_t]
Region labels are warped according to the per-pixel motion estimate
Regions are replaced with their skeletons to make propagation robust to motion estimation errors
GMM
Build a GMM color model for each region l_i
Sample the historical colors of labeled pixels over recent frames
How to sample historical colors? By contribution weight:
more recent colors contribute more
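One way to make recent colors count more is an exponentially decaying weight per frame. The decay form and the value 0.5 are assumptions chosen for illustration; the slides do not state the weighting function, and fitting the GMM itself is not shown:

```python
def recency_weights(num_frames, decay=0.5):
    """Normalized weights for past frames, oldest first; newer frames weigh more."""
    raw = [decay ** age for age in range(num_frames - 1, -1, -1)]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_mean_color(colors, decay=0.5):
    """Recency-weighted mean of one color channel, oldest sample first."""
    weights = recency_weights(len(colors), decay)
    return sum(w * c for w, c in zip(weights, colors))

weights = recency_weights(3)
# A region that was at intensity 100 and has just changed to 200.
recent_biased = weighted_mean_color([100, 100, 200])
plain_mean = sum([100, 100, 200]) / 3
```

The weighted mean sits closer to the newest observation than the plain mean does, so the color model follows gradual appearance changes.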
Refining region labels
What about new objects that appear in I_t?
Refining region labels
Keep two color models for each label l in frame I_t:
(1) a historical color model Mh
(2) an updated color model Mu
If |Mh – Mu| > threshold, new objects are deemed present
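A minimal sketch of the test, assuming each model is summarized by its mean RGB color and |Mh – Mu| is the Euclidean distance between the two means. The slides do not say how the difference between the models is measured, and the threshold value here is arbitrary:

```python
import math

def mean_color(pixels):
    """Mean RGB color of a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def new_object_present(historical_pixels, updated_pixels, threshold=40.0):
    """Flag a label whose current colors drift far from its color history."""
    m_h = mean_color(historical_pixels)   # historical color model (assumed: mean)
    m_u = mean_color(updated_pixels)      # updated color model (assumed: mean)
    return math.dist(m_h, m_u) > threshold

red_region = [(200, 30, 30)] * 4
# Half of the pixels under this label now belong to a blue object.
mixed_region = [(200, 30, 30)] * 2 + [(20, 20, 220)] * 2

unchanged = new_object_present(red_region, red_region)
changed = new_object_present(red_region, mixed_region)
```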
Smoothing and filtering
Spatio-temporal smoothing: 3x3x3 (x-y-t) Gaussian filter
Filtering: remove false segmentations and short-lived objects
[Figure: results after smoothing and after filtering]
Filtering - remove short-lived objects
K disconnected objects (e.g. c1, c2, c3, …, cK) with the same label
d_{l,k}: duration of the kth object with label l
τ_r: threshold; in this paper, 6 frames
D_l: duration of label l
If the duration of any of these disconnected video objects within this time window is shorter than the threshold, that video object is removed
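The duration test can be sketched as below. Each video object is represented by a hypothetical (label, first_frame, last_frame) tuple; how objects are extracted from the region maps is not shown, and the time window mentioned on the slide is not modeled:

```python
TAU_R = 6  # duration threshold in frames, as stated on the slide

def remove_short_lived(objects, tau_r=TAU_R):
    """Keep only video objects that last at least tau_r frames."""
    kept = []
    for label, first_frame, last_frame in objects:
        duration = last_frame - first_frame + 1
        if duration >= tau_r:
            kept.append((label, first_frame, last_frame))
    return kept

# Two disconnected objects share label 3; one of them flickers for 3 frames.
objects = [(3, 0, 19), (3, 22, 24), (5, 4, 9)]
kept = remove_short_lived(objects)
```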
Stroke Placement and Shading
β-spline strokes; face detection
Painterly rendering: “Painterly Rendering with Curved Brush Strokes of Multiple Sizes” [Hertzmann 1998]
In this paper, an orientation field is interpolated from the shape of the region
Result of painterly rendering
Video Sequencing: stochastic composition, rendering transitions
Stochastic composition
[Figure: video clips V1, V2, V3, V4 and Vi]
Video sequencing depends on
1. ds(Va, Vb): semantic distance between the tags of videos A and B
2. dv(i, j): the visual similarity of videos i and j
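A sketch of how the two measures could drive a stochastic choice of the next clip. The blend weight `alpha`, the exponential scoring, the `temperature` and the distance values are all assumptions made for illustration; dv is treated here as a distance (smaller means more similar), and the slides do not give the actual selection rule:

```python
import math
import random

def transition_probabilities(current, clips, d_s, d_v, alpha=0.5, temperature=0.2):
    """Turn blended semantic/visual distances into probabilities for the next clip."""
    candidates = [c for c in clips if c != current]
    combined = [alpha * d_s[current][c] + (1 - alpha) * d_v[current][c]
                for c in candidates]
    scores = [math.exp(-d / temperature) for d in combined]  # closer is likelier
    total = sum(scores)
    return {c: s / total for c, s in zip(candidates, scores)}

def pick_next(current, clips, d_s, d_v, rng):
    """Sample the next clip according to the transition probabilities."""
    probs = transition_probabilities(current, clips, d_s, d_v)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

clips = ["beach", "pool", "birthday"]
d_s = {"beach": {"pool": 0.2, "birthday": 0.9}}   # hypothetical tag distances
d_v = {"beach": {"pool": 0.3, "birthday": 0.7}}   # hypothetical visual distances

probs = transition_probabilities("beach", clips, d_s, d_v)
next_clip = pick_next("beach", clips, d_s, d_v, random.Random(0))
```

Sampling instead of always taking the nearest clip keeps the sequence varied while still favoring semantically and visually related clips.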
Rendering Transitions
Smooth video transitions
[Figure: similar regions matched between clips]
Region Morphing
Rendering Transitions
C(:) indicates mean color similarity;
A(:) indicates relative area;
S(:) indicates shape similarity in terms of region compactness
Smooth video transitions
Values shown on the slide: 0.5, 0.4, 0.1
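A sketch of a region correspondence score built from the three measures C, A and S. Reading the values 0.5, 0.4 and 0.1 as weights for color, area and shape, in that order, is an assumption; the slide does not say what the numbers refer to. The candidate scores are made up:

```python
def region_similarity(color_sim, area_sim, shape_sim, weights=(0.5, 0.4, 0.1)):
    """Weighted sum of mean color (C), relative area (A) and shape (S) similarity."""
    w_c, w_a, w_s = weights
    return w_c * color_sim + w_a * area_sim + w_s * shape_sim

def best_match(candidates):
    """Pick the candidate region with the highest combined similarity."""
    scores = {name: region_similarity(*sims) for name, sims in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical (C, A, S) similarities of two target regions to one source region.
candidates = {"sky": (0.9, 0.8, 0.5), "grass": (0.4, 0.9, 0.9)}
match, scores = best_match(candidates)
```

With color weighted most heavily, a region with a close mean color wins over one that matches better in shape only, which suits morphing between regions of similar appearance.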
Results
A manually tagged collection of 23 videos
Comparison - BOY
• BOY sequence
• This paper
• ‘Synergistic’ mean-shift + edge (Comaniciu, 2002)
• Spatio-temporal method (Paris, 2008)
Result - BEAR
fine scale features (e.g. the bear’s eyes and nose) are retained
Result - KITE
background detail may (optionally) be abstracted by modifying the initial frame segmentation to merge unwanted detailed regions
Result - DRAMA
correct handling of regions that disappear and appear within sequences
Conclusion
Digital Ambient Display (DAD): selects, stylizes and transitions between clips automatically
A novel algorithm for coherent video segmentation based on multi-label graph cut
Parse scene structures to enable shading and painterly effects
Create interesting transition effects between clips using region correspondence
Future work
Backward propagation of region labels to improve coherence of segmentations
Improve painterly rendering by distinguishing region motion caused by occlusion from motion caused by object deformation
Graph optimization algorithm similar to [Kovar et al. 02] to plan routes through a subset of clips, e.g. to encompass a theme such as “family vacations” rather than traversing the whole database
Automatic meta-data annotation of user video collections, e.g. photo categorization [Ruiz et al. 03]
END