© SRI International Sarnoff© SRI International
Alignment & Its Applications in Computer Vision
PU COS 429, October 22, 2015
Harpreet S. Sawhney
CTO, Vision Technologies (Technical Director, Vision & Learning)
SRI International, Princeton, NJ
SRI Information and Computing Sciences
• 250 researchers
• $100M revenue
• >75% is USG business
Artificial Intelligence Center
Virtual Personal Assistance; Large Scale Text Understanding; Multi-INT, Large Scale, Data Organization; Knowledge Representation; Automated Reasoning
Cyber Security; Information Security; Smart Grid Technologies; High Assurance Systems
Speech Recognition and Translation; Natural Language Understanding; SIGINT Exploitation; Speech Analytics/Information Extraction; Social Media Sentiment Analysis
Computer Science Laboratory
Security and Surveillance; Real-Time Video Processing; Low Power Embedded Systems; Adaptive & Cognitive Training; Human-Machine Interaction; Large-scale Image and Video Search; UAS ISR and Geo-registration; GPS-denied Navigation; Mixed & Augmented Reality
Vision Technologies
Speech Technology and Research Lab
Vision & Robotics
Vision & Learning
Vision Systems
Technology for K-12 and higher education
Technology Spin-off Ventures: Growth opportunities that bring SRI innovations to market
Panoramic image editing software*
Drug dispensing system*
Anti-counterfeiting systems
Customer service tools*
Surgical robotics
Portable power systems**
(formerly Rosedale Medical)
Glucose monitoring system
LCD technology*
Iris biometric identification*
Drug discovery
Disposable hearing aid*
Video-on-demand services**
Wireless mesh networks*
Publicly Traded
Information Technology
Materials
Biomedical
Speech recognition for customer service
Electronic signature solutions
Digital color printing applications*
Super-bright LED light engines*
Video enhancement systems
Electroactive polymers*
Optical network components
Virtual personal assistant for the iPhone*
Enterprise social media technology
Metal “print and plate” manufacturing process
DNA testing services*
* Acquired or merged; ** Dissolved
Stray voltage detection services
Environmentally friendly light products*
Real-time web video streaming and sharing
Robotics
Next-generation personalized web search tool
Travel search and planning
Educational gaming platform
Innovative robots for manufacturing/service
Electroadhesion for materials handling
Digital imaging system
Smart calendar for iPhone
SRI Center for Vision Technologies
• 90 staff members
• 30-year history in real-time computer vision
• 150+ patents
Some Accomplishments
• First real-time AR broadcast on live TV, 1994: ads in baseball games, then the 10-yard line in football
• Live traffic monitoring, deployed all over the country
• VideoBrush: first-ever live video mosaicing (now part of all Android phones)
• IED detection: currently saving lives in theatre
• Breast cancer: MRI-based tumor
SRI Center for Vision Technologies
• Computational Sensing
– Embedded Vision
• 2D/3D reasoning
– GPS-denied navigation
– 3D mapping
– Augmented reality
– Aerial surveillance
• Vision analytics
– Video understanding
– Image search
• Human behavior modeling
Search based on image/video content
First ever Augmented Reality binoculars
Human Behavior Modeling: Social interaction and communication with computers
GPS Denied Navigation (Dismount, Robots, Vehicles, Aerial, Naval etc.)
Leading Platforms
SRI Vision Technology Algorithm Portfolio
• Non-uniformity correction (sensor defects)
• AGC / color correction
• Extreme low light
• High frame rate capture
• Motion-adaptive processing
• Multi-spectral / VNIR / SWIR
• Pan-Tilt control loop
• Image enhancement
• Stabilization
• Motion tracking
• Image fusion
• Mosaics
• Depth of field extension
• Dynamic range extension
• Vision guided prefiltering
• Super-resolution
• Dense stereo
• Face and Body detection
• Head/Face/Gaze tracking
• Landmark matching
• Moving Target Indication
• Multi Target Tracking
• Image based geo-location
• 3D LiDAR for SLAM
• Visual Odometry
• Robotic Navigation
• Geo-registration
• Visual Search / fast indexing
• Image Geo-Location
• Image and Video data mining
• Object Recognition
• Activity detection
• Wide area surveillance
• Occlusion reasoning
• Gesture recognition
• Human State Estimation
(not a complete set)
Alignment for Change Detection: Tampering
Alignment for Change Detection
Alignment for Change Detection: Tampering
Alignment for Change Detection
Alignment for Mosaicing
Alignment for Moving Object Detection
Alignment for GeoSpatial Information
Alignment for Augmented Reality
Alignment in 3D
Alignment for Augmented Reality
Guide User through Emergency Response Procedures
User can ask questions and interactively diagnose problems
Display overlaid animations with directions
Automatically observe user actions and state of equipment and provide warnings and feedback
Alignment for Special Effects: MatchMove
Pin-hole Camera Model
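(Figure: pinhole camera with focal length f; a scene point at depth Z and height Y projects to image coordinate y.)
$$\frac{y}{f} = \frac{Y}{Z}, \qquad \mathbf{p} \approx \mathbf{P}$$
where $\mathbf{p} = (x, y, f)^\top$, $\mathbf{P} = (X, Y, Z)^\top$, and $\approx$ denotes equality up to scale.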
Camera Rotation (Pan)
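(Figure: the camera is rotated (panned); the same scene point now has coordinates Y′, Z′ and projects to y′.)
$$\frac{y'}{f} = \frac{Y'}{Z'}, \qquad \mathbf{p}' \approx \mathbf{P}', \qquad \mathbf{R}'\mathbf{P} = \mathbf{P}'$$
$$\Rightarrow\quad \mathbf{R}'\mathbf{p} \approx \mathbf{p}'$$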
Camera Rotation (Pan)
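(Figure: the camera is rotated further; the scene point now has coordinates Y″, Z″ and projects to y″.)
$$\frac{y''}{f} = \frac{Y''}{Z''}, \qquad \mathbf{p}'' \approx \mathbf{P}'', \qquad \mathbf{R}''\mathbf{P} = \mathbf{P}''$$
$$\Rightarrow\quad \mathbf{R}''\mathbf{p} \approx \mathbf{p}''$$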
Image motion due to rotation does not depend on the depth/structure of the scene: $\mathbf{p}' \approx \mathbf{R}'\mathbf{p}$ involves only the image point, never $Z$.
Verify the same for a 3D scene and a 2D camera.
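As a quick numerical check of this claim, here is a minimal Python/NumPy sketch. It assumes the slides' convention that the rotated point is $\mathbf{R}\mathbf{P}$, a focal length f, and a principal point at the origin. It places points at very different depths along one viewing ray and confirms that after a pure rotation they all land on the same image point, which can be predicted from the image point alone. The helper names (`rot_x`, `project`, `rotate_image_point`) are hypothetical.

```python
import numpy as np

def rot_x(theta):
    """Rotation about the camera X axis; it moves points within the Y-Z plane, as in the slides."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def project(P, f):
    """Pinhole projection of P = (X, Y, Z): returns (x, y) = f * (X, Y) / Z."""
    return f * P[:2] / P[2]

def rotate_image_point(p, R, f):
    """Predict the image point after rotation R from p = (x, y) and f only: p' ~ R (x, y, f)."""
    q = R @ np.array([p[0], p[1], f])
    return f * q[:2] / q[2]

f = 500.0
R = rot_x(np.deg2rad(5.0))
p = np.array([20.0, -35.0])                 # an image point
ray = np.array([p[0] / f, p[1] / f, 1.0])   # its viewing ray, scaled so that Z = 1

predicted = rotate_image_point(p, R, f)     # uses no depth information at all
for Z in (1.0, 10.0, 100.0):                # same ray, very different depths
    assert np.allclose(project(R @ (Z * ray), f), predicted)
print(predicted)
```

Replacing `R` with a translation of the points breaks the assertion, because the projected displacement then depends on Z.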
Pin-hole Camera Model
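(Figure: pinhole camera with focal length f; a scene point at depth Z and height Y projects to image coordinate y.)
$$\frac{y}{f} = \frac{Y}{Z}, \qquad \mathbf{p} \approx \mathbf{P}$$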
Camera Translation (Ty)
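(Figure: the camera is translated along Y; the scene point now has coordinates Y′, Z′ and projects to y′.)
$$\frac{y'}{f} = \frac{Y'}{Z'}, \qquad \mathbf{p}' \approx \mathbf{P}', \qquad \mathbf{P} + \mathbf{T}' = \mathbf{P}'$$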
Translational Displacement
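For a translation $T_y$ along Y (so $Z' = Z$):
$$\frac{y'}{f} = \frac{Y'}{Z'} = \frac{Y + T_y}{Z} \quad\Rightarrow\quad y' - y = \frac{f\,T_y}{Z}$$
For a translation $T_z$ along the optical axis (so $Y' = Y$):
$$\frac{y'}{f} = \frac{Y'}{Z'} = \frac{Y}{Z + T_z} \quad\Rightarrow\quad y' - y = -\frac{y'\,T_z}{Z}$$
Image motion due to translation is a function of the depth of the scene.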
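A small numerical check of the two displacement formulas, written as a Python sketch under the slides' convention $\mathbf{P}' = \mathbf{P} + \mathbf{T}$ in the Y-Z plane. The helper names `image_y` and `translate` are hypothetical. Two scene points that project to the same image point (y = 100) but sit at depths 10 and 20 receive different displacements: under $T_y = 0.5$ the nearer one moves 25 pixels and the farther one 12.5 pixels.

```python
import numpy as np

def image_y(Y, Z, f):
    """Pinhole projection in the Y-Z plane: y = f Y / Z."""
    return f * Y / Z

def translate(Y, Z, Ty=0.0, Tz=0.0):
    """Slide convention P' = P + T: the point's coordinates after a translation T = (Ty, Tz)."""
    return Y + Ty, Z + Tz

f = 500.0
Ty, Tz = 0.5, 2.0

# Two scene points on the same viewing ray (both project to y = 100), at depths 10 and 20.
for Y, Z in ((2.0, 10.0), (4.0, 20.0)):
    y = image_y(Y, Z, f)
    y1 = image_y(*translate(Y, Z, Ty=Ty), f)   # translation along Y
    assert np.isclose(y1 - y, f * Ty / Z)      # y' - y = f Ty / Z
    y2 = image_y(*translate(Y, Z, Tz=Tz), f)   # translation along Z
    assert np.isclose(y2 - y, -y2 * Tz / Z)    # y' - y = -y' Tz / Z
    print(f"Z={Z:5.1f}  y={y:.1f}  Ty shift={y1 - y:.3f}  Tz shift={y2 - y:.3f}")
```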
Alignment Accounts for Motions…
• Motion Models – 2D
• The homography is the most general 2D model: it includes the translation, rigid (Euclidean), similarity, and affine transformations as special cases.
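To make the "special cases" point concrete, here is a short Python sketch of the standard 3×3 matrix forms of the 2D motion models. All of them are applied by the same homogeneous-coordinate mapping, and only the general homography uses a non-trivial last row. The helper names and example numbers are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 points through a 3x3 matrix H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]     # projective division by the third coordinate

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def similarity(s, theta, tx, ty):
    """Uniform scale s and rotation theta, then translation; s = 1 gives a rigid (Euclidean) motion."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx], [s * sn, s * c, ty], [0.0, 0.0, 1.0]])

def affine(a, b, c, d, tx, ty):
    return np.array([[a, b, tx], [c, d, ty], [0.0, 0.0, 1.0]])

# A general homography: a non-zero entry in the last row makes the division non-trivial.
H_proj = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]])

pts = [[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]]
for name, H in [("translation", translation(3, 4)),
                ("similarity", similarity(2.0, np.pi / 2, 0, 0)),
                ("affine", affine(1, 0.5, 0, 1, 0, 0)),
                ("homography", H_proj)]:
    print(name, apply_homography(H, pts).round(3).tolist())
```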
Parameterization
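(Figure: two images sharing a common center of projection (COP), both viewing a scene point whose coordinates in the two camera frames are P and P′.)
Rotations/Homographies: Plane Projective Transformations
In calibrated (ray) coordinates:
$$\mathbf{P}'_c = \mathbf{R}\,\mathbf{P}_c \quad\Rightarrow\quad \mathbf{p}' \approx \mathbf{R}\,\mathbf{p}$$
With intrinsic matrices $\mathbf{K}$, $\mathbf{K}'$ mapping rays to pixels:
$$\mathbf{K}'^{-1}\mathbf{p}' \approx \mathbf{R}\,\mathbf{K}^{-1}\mathbf{p}$$
$$\mathbf{p}' \approx \mathbf{K}'\mathbf{R}\,\mathbf{K}^{-1}\mathbf{p} = \mathbf{H}_\infty\,\mathbf{p}$$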
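A minimal numerical check of $\mathbf{H}_\infty = \mathbf{K}'\mathbf{R}\mathbf{K}^{-1}$, sketched in Python. It assumes simple intrinsics (focal length and principal point, no skew) and lets the second image use a longer focal length, so the camera both rotates and zooms. The intrinsic values, rotation angle, and helper names are illustrative assumptions.

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Intrinsic matrix with focal length f (pixels), principal point (cx, cy), no skew."""
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])

def rot_y(theta):
    """Rotation about the camera Y axis (a pan)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def infinite_homography(K, R, K_prime):
    """H_inf = K' R K^{-1}: maps image-1 pixels to image-2 pixels for a purely rotating (and zooming) camera."""
    return K_prime @ R @ np.linalg.inv(K)

def dehomogenize(v):
    return v[:2] / v[2]

K = intrinsics(800.0, 320.0, 240.0)
K_prime = intrinsics(1000.0, 320.0, 240.0)   # second image taken with a longer focal length
R = rot_y(np.deg2rad(3.0))
H = infinite_homography(K, R, K_prime)

P = np.array([0.5, -0.2, 4.0])               # scene point in camera-1 coordinates
p = K @ P                                     # its pixel in image 1 (homogeneous)
p_prime = K_prime @ (R @ P)                   # its pixel in image 2, using P'_c = R P_c
print(dehomogenize(H @ p), dehomogenize(p_prime))
```

Because $\mathbf{H}_\infty$ never uses the depth of $\mathbf{P}$, scaling $\mathbf{P}$ along its ray leaves both results unchanged.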
3D Motion…
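$$\mathbf{p} \approx \mathbf{K}\begin{bmatrix}\mathbf{R} & \mathbf{T} \\ \mathbf{0}^\top & 1\end{bmatrix}\mathbf{P}$$
with $\mathbf{P}$ in homogeneous coordinates and $\mathbf{K}$ read as the $3\times4$ matrix $[\mathbf{K}\;|\;\mathbf{0}]$.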
Alignment Methods
Acknowledgement: Adapted slides from http://slazebni.cs.illinois.edu/
Direct Methods for Visual Motion Estimation
Employ models of motion and estimate visual motion through image alignment
Direct Methods: The How
Alignment of spatio-temporal images is a means of obtaining:
• Dense representations
• Parametric models
Direct Method based Alignment
Formulation of Direct Model-based Image Alignment
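(Figure: images $I_1(\mathbf{p}')$ and $I_2(\mathbf{p})$, with $\mathbf{p}' = \mathbf{p} - \mathbf{u}(\mathbf{p})$.)
Model the image transformation as:
$$I_2(\mathbf{p}) = I_1(\mathbf{p} - \mathbf{u}(\mathbf{p}; \Theta))$$
where the images may be separated by time, space, or sensor type; $\mathbf{p}$ is expressed in the reference coordinate system, $\mathbf{u}(\mathbf{p}; \Theta)$ is the generalized pixel displacement, and $\Theta$ are the model parameters.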
Formulation of Direct Model-based Image Alignment
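(Figure: images $I_1(\mathbf{p}')$ and $I_2(\mathbf{p})$, with $\mathbf{p}' = \mathbf{p} - \mathbf{u}(\mathbf{p})$.)
Compute the unknown parameters and correspondences while aligning the images, using optimization:
$$\min_{\Theta} \sum_i \rho(r_i; \sigma), \qquad r_i = I_2(\mathbf{p}_i) - I_1(\mathbf{p}_i - \mathbf{u}(\mathbf{p}_i; \Theta))$$
What can be varied?
• Filtered image representations (to account for illumination changes and multiple modalities)
• Model parameters
• Measures of mismatch (SSD, correlation)
• Optimization function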
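How do we solve for the motion?
$$I_2(\mathbf{p}) = I_1(\mathbf{p} - \mathbf{u}(\mathbf{p})) = I_1(\mathbf{p}')$$
Use a Taylor series expansion:
$$I_2(\mathbf{p}) = I_1(\mathbf{p}) - \nabla I_1^\top \mathbf{u}(\mathbf{p}) + O(2)$$
where $\nabla I_1$ is the image gradient. Convert the constraint into an objective function:
$$E_{SSD}(\mathbf{u}) = \sum_{\mathbf{p} \in R} \left(\nabla I_1^\top \mathbf{u}(\mathbf{p}) + \delta I(\mathbf{p})\right)^2, \qquad \delta I(\mathbf{p}) = I_2(\mathbf{p}) - I_1(\mathbf{p})$$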
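A minimal sketch of minimizing the linearized SSD objective, assuming the simplest motion model: one constant displacement $\mathbf{u}$ for every pixel (pure translation). Setting $\partial E_{SSD}/\partial\mathbf{u} = 0$ gives the 2×2 normal equations $\left(\sum \nabla I_1 \nabla I_1^\top\right)\mathbf{u} = -\sum \nabla I_1\,\delta I$. Gradients here come from NumPy central differences, which is an implementation choice rather than something the slide prescribes. Because the Taylor expansion only holds for small motion, this one-shot solve is meant for sub-pixel shifts. Helper names are hypothetical.

```python
import numpy as np

def gaussian_blob(h, w, cx, cy, sigma):
    """Smooth synthetic test image: a Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def linearized_ssd_translation(I1, I2):
    """Minimise E_SSD(u) = sum_p (grad I1(p)^T u + dI(p))^2 for one constant u = (ux, uy).

    Setting dE/du = 0 gives the 2x2 normal equations (sum g g^T) u = -sum g dI.
    """
    I1 = I1.astype(float)
    gy, gx = np.gradient(I1)            # np.gradient returns d/d(row) first, then d/d(col)
    dI = I2.astype(float) - I1          # dI(p) = I2(p) - I1(p)
    G = np.stack([gx.ravel(), gy.ravel()], axis=1)
    A = G.T @ G                         # sum of grad * grad^T
    b = -G.T @ dI.ravel()               # minus the sum of grad * dI
    return np.linalg.solve(A, b)

# I2(p) = I1(p - u): the blob in I2 is moved by u = (0.3, -0.2) pixels.
I1 = gaussian_blob(64, 64, 32.0, 32.0, 6.0)
I2 = gaussian_blob(64, 64, 32.3, 31.8, 6.0)
print(linearized_ssd_translation(I1, I2))   # approximately [0.3, -0.2]
```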
Optical Flow Constraint Equation
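$$I_2(\mathbf{p}) = I_1(\mathbf{p}) - \nabla I_1^\top \mathbf{u}(\mathbf{p}) + O(2)$$
At a single pixel:
$$\nabla I_1^\top \mathbf{u}(\mathbf{p}) + \delta I(\mathbf{p}) \approx 0$$
which, writing $I_t = \delta I$, leads to
$$I_x u_x + I_y u_y + I_t = 0$$
A single pixel constrains only the flow component along the gradient, the normal flow:
$$-\frac{I_t}{\|\nabla I\|}$$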
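A minimal Python sketch of per-pixel normal flow, the vector $-I_t\,\nabla I / \|\nabla I\|^2$, whose magnitude along the gradient direction is $-I_t/\|\nabla I\|$. The test pattern is a ramp in x only, a stand-in for vertical stripes, that truly moves by $\mathbf{u} = (2, 3)$. Only the x component is recoverable, which is the aperture problem. The eps threshold and function name are assumptions.

```python
import numpy as np

def normal_flow(I1, I2, eps=1e-8):
    """Per-pixel normal flow -I_t * grad(I) / |grad(I)|^2 (the flow component along the gradient).

    Returns (nx, ny, valid); pixels with (near-)zero gradient carry no constraint and are marked invalid.
    """
    I1 = I1.astype(float)
    gy, gx = np.gradient(I1)
    It = I2.astype(float) - I1            # temporal derivative I_t = dI
    mag2 = gx ** 2 + gy ** 2
    valid = mag2 > eps
    scale = np.zeros_like(It)
    scale[valid] = -It[valid] / mag2[valid]
    return scale * gx, scale * gy, valid

# Aperture problem: a pattern that varies only in x, truly moving by u = (2, 3).
ys, xs = np.mgrid[0:32, 0:32].astype(float)
I1 = xs
I2 = xs - 2.0          # I2(p) = I1(p - u); the y part of u leaves an x-only pattern unchanged
nx, ny, valid = normal_flow(I1, I2)
print(nx[16, 16], ny[16, 16])   # only the x component (2) is recovered; the y motion is invisible
```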
Generalized M-Estimation
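$$\min_{\Theta} \sum_i \rho(r_i; \sigma), \qquad r_i = I_2(\mathbf{p}_i) - I_1(\mathbf{p}_i - \mathbf{u}(\mathbf{p}_i; \Theta))$$
• Given a solution $\Theta^{(m)}$ at the $m$-th iteration, find $\delta\Theta$ by solving:
$$\sum_l \sum_i w_i \frac{\partial r_i}{\partial \theta_k}\frac{\partial r_i}{\partial \theta_l}\,\delta\theta_l = -\sum_i w_i\, r_i \frac{\partial r_i}{\partial \theta_k} \quad \forall k, \qquad w_i = \frac{1}{r_i}\frac{\partial \rho(r_i)}{\partial r_i}$$
• $w_i$ is a weight associated with each measurement. It can be varied to provide robustness to outliers.
Choices of the $\rho(r_i; \sigma)$ function and the corresponding weights (SS: sum of squares; GM: Geman-McClure):
$$\rho_{SS}(r) = \frac{r^2}{2\sigma^2}, \qquad \frac{1}{r}\frac{\partial \rho_{SS}(r)}{\partial r} = \frac{1}{\sigma^2}$$
$$\rho_{GM}(r) = \frac{r^2/\sigma^2}{1 + r^2/\sigma^2}, \qquad \frac{1}{r}\frac{\partial \rho_{GM}(r)}{\partial r} = \frac{2\sigma^2}{(\sigma^2 + r^2)^2}$$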
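The update above is iteratively reweighted least squares (IRLS). Below is a minimal Python sketch of it. As a simplifying assumption, the residuals are linear in the parameters ($r_i = J_i \cdot \theta - b_i$, so $\partial r_i/\partial\theta = J_i$), and a toy line fit stands in for image residuals. The weights come from the sum-of-squares and Geman-McClure functions. Because Geman-McClure is non-convex, the robust iterations start from the plain least-squares fit, which is a common choice but is an assumption here. All names are hypothetical.

```python
import numpy as np

def ss_weight(r, sigma):
    """Sum of squares: rho = r^2 / (2 sigma^2)  ->  w = rho'(r) / r = 1 / sigma^2 (constant)."""
    return np.full_like(r, 1.0 / sigma ** 2)

def gm_weight(r, sigma):
    """Geman-McClure: rho = r^2 / (sigma^2 + r^2)  ->  w = 2 sigma^2 / (sigma^2 + r^2)^2."""
    return 2.0 * sigma ** 2 / (sigma ** 2 + r ** 2) ** 2

def irls(J, b, weight_fn, sigma, theta0, iters=50, tol=1e-10):
    """IRLS for residuals r_i = J_i . theta - b_i.

    Each step solves  sum_i w_i J_i J_i^T dtheta = -sum_i w_i r_i J_i,
    i.e. the slide's update equation with dr_i/dtheta = J_i and the weights held fixed.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iters):
        r = J @ theta - b
        w = weight_fn(r, sigma)
        H = J.T @ (w[:, None] * J)
        g = J.T @ (w * r)
        dtheta = np.linalg.solve(H, -g)
        theta += dtheta
        if np.linalg.norm(dtheta) < tol:
            break
    return theta

# Toy problem: fit y = 2x + 1 from 20 samples, three of which are gross outliers.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[3, 8, 15]] += [40.0, -35.0, 50.0]
J = np.stack([x, np.ones_like(x)], axis=1)       # theta = (slope, intercept)

theta_ss = irls(J, y, ss_weight, 1.0, [0.0, 0.0])   # plain least squares
theta_gm = irls(J, y, gm_weight, 1.0, theta_ss)     # robust, started from the LS fit
print(theta_ss, theta_gm)
```

Here plain least squares is pulled to a slope of about 2.10 and an intercept of about 2.79, while the Geman-McClure weights drive the outliers' influence to nearly zero and recover the inlier line.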
Optimization Functions & Their Corresponding Weight Plots
(Panels: Geman-McClure; Sum-of-squares)
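Model-based Coarse-to-fine Image Alignment: Pyramid Processing and Alignment
$$\min_{\Theta} \sum_{\mathbf{p}} \left(I_2(\mathbf{p}) - I_1(\mathbf{p} - \mathbf{u}(\mathbf{p}; \Theta))\right)^2$$
Example parameter sets $\Theta$:
• { R, T, d(p) }: rotation, translation, and per-pixel depth
• { H, e, k(p) }: homography, epipole, and per-pixel projective structure
• { dx(p), dy(p) }: dense per-pixel displacement
(Diagram: image pyramids, a warper, and a difference (−) stage.)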
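A minimal coarse-to-fine sketch in Python, restricted for simplicity to the pure-translation case of the objective above. It builds Gaussian pyramids with a 5-tap binomial filter, an assumed filter choice. At each level it runs Gauss-Newton iterations that re-warp $I_1$ with bilinear interpolation, and the estimate is doubled when moving to the next finer level. The parameter sets listed on the slide (3D, plane-plus-parallax, dense flow) would replace the 2-vector `u` with a richer model; all helper names here are hypothetical.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial blur filter (assumed choice)

def blur(img):
    """Separable 5-tap binomial blur with reflected borders."""
    h, w = img.shape
    pad = np.pad(img, 2, mode="reflect")
    tmp = sum(KERNEL[k] * pad[:, k:k + w] for k in range(5))
    return sum(KERNEL[k] * tmp[k:k + h, :] for k in range(5))

def gaussian_pyramid(img, levels):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(blur(pyr[-1])[::2, ::2])   # blur, then keep every other pixel
    return pyr

def bilinear_sample(img, xs, ys):
    """Sample img at real-valued (xs, ys) by bilinear interpolation, clamping at the border."""
    h, w = img.shape
    xs, ys = np.clip(xs, 0.0, w - 1.0), np.clip(ys, 0.0, h - 1.0)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = xs - x0, ys - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def refine_translation(I1, I2, u, iters=20, tol=1e-4):
    """Gauss-Newton on sum_p (I2(p) - I1(p - u))^2, re-warping I1 at every iteration."""
    h, w = I1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    u = np.array(u, dtype=float)
    for _ in range(iters):
        warped = bilinear_sample(I1, xs - u[0], ys - u[1])   # I1(p - u)
        gy, gx = np.gradient(warped)
        G = np.stack([gx.ravel(), gy.ravel()], axis=1)
        du = np.linalg.lstsq(G.T @ G, -G.T @ (I2 - warped).ravel(), rcond=None)[0]
        u += du
        if np.linalg.norm(du) < tol:
            break
    return u

def coarse_to_fine_translation(I1, I2, levels=3):
    """Estimate at the coarsest level, then double the estimate and refine at each finer level."""
    p1, p2 = gaussian_pyramid(I1, levels), gaussian_pyramid(I2, levels)
    u = np.zeros(2)
    for lev in reversed(range(levels)):
        u = refine_translation(p1[lev], p2[lev], u)
        if lev > 0:
            u = 2.0 * u        # u pixels at level L correspond to 2u pixels at level L - 1
    return u

ys, xs = np.mgrid[0:128, 0:128].astype(float)
I1 = np.exp(-((xs - 64) ** 2 + (ys - 64) ** 2) / (2 * 10.0 ** 2))
I2 = np.exp(-((xs - 71) ** 2 + (ys - 59) ** 2) / (2 * 10.0 ** 2))   # moved by u = (7, -5)
print(coarse_to_fine_translation(I1, I2))
```

Starting at the coarsest level keeps each Gauss-Newton problem close to the small-motion regime in which the Taylor expansion holds. For example, the 7-pixel shift above is only 1.75 pixels at level 2.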
Application: Object Insertion/Deletion with Layers
(Panels: Original Video; Dynamic Mosaic Video; Video Stream with Deleted Moving Object)
1D vs. 2D SCANNING
• 1D: The topology of frames is a ribbon or a string. Frames overlap only with their temporal neighbors.
• 2D: The topology of frames is a 2D graph. Frames overlap with neighbors on many sides.
(A 300x332 mosaic captured by mosaicing a 1D sequence of 6 frames)
1D vs. 2D SCANNING
(Panels: the 1D scan scaled by 2 to 600x692; a 2D-scanned mosaic of size 600x692)
FRAME-TO-FRAME VS. LOCAL-TO-GLOBAL ALIGNMENT
Frame-to-frame:
• Uses limited 2D spatial context
• Causal commitment to parameters cannot be corrected
• Demands large overlap between frames
Local-to-global:
• Uses all the available frame-to-frame constraints
• Global solution is optimal subject to local frame-to-frame constraints
• Works even with small overlap between frames
CHOICE OF 2D MANIFOLD
(Panels: Plane; Cylinder; Cone; Sphere; Arbitrary)
PROBLEM FORMULATION
Given an arbitrary scan of a scene, create a globally aligned mosaic by minimizing
$$\min_{\{P_i\}} E = \sum_{ij \in G} E_{ij} + \sum_i E_i + \sigma^2\,(\text{Area of the mosaic})$$
Create a compact appearance while being geometrically consistent.
Loss Function to be Minimized
$$\min_{\{P_i\}} E = \sum_{ij \in G} E_{ij} + \sum_i E_i + \sigma^2\,(\text{Area of the mosaic})$$
where
• $P_i$: reference-to-image mapping, $\mathbf{u}_i = P_i(\mathbf{X})$
• $E_{ij}$: any measure of alignment error between neighbors $i$ and $j$; $G$: graph that represents the neighborhood relations
• $E_i$: frame-to-reference error term, to allow for a priori criteria such as a least-distortion transformation
GLOBALLY CONSISTENT ALIGNMENT: Bundle Adjustment
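• Given: arcs $ij$ in the graph $G$ of neighbors
• The local alignment parameters, $\Theta_{ij}$, help establish feature correspondences between $i$ and $j$
• If $\mathbf{u}_{il}$ and $\mathbf{u}_{jl}$ are corresponding points in frames $i$ and $j$, then
$$E_{ij} = \sum_l \left| P_i^{-1}(\mathbf{u}_{il}) - P_j^{-1}(\mathbf{u}_{jl}) \right|^2$$
• Incrementally adjust the poses $P_i$ to minimize
$$\min_{\{P_i\}} E = \sum_{ij \in G} E_{ij} + \sum_i E_i$$
(Figure: frames $i$ and $j$ with corresponding points $\mathbf{u}_i$, $\mathbf{u}_j$ linked by $E_{ij}$.)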
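A minimal Python sketch of this global step, under a strong simplifying assumption: every pose $P_i$ is a pure 2D translation, $P_i(\mathbf{X}) = \mathbf{X} + \mathbf{t}_i$. Then $P_i^{-1}(\mathbf{u}) = \mathbf{u} - \mathbf{t}_i$, every $E_{ij}$ term is linear in the unknowns, and the minimization becomes one linear least-squares solve. The sketch ignores $E_i$ and the area term, and it fixes frame 0 at the origin to remove the global-translation ambiguity. For contrast it also includes a causal frame-to-frame chaining baseline. Function names and data are hypothetical.

```python
import numpy as np

def global_translations(n_frames, arcs):
    """Least-squares translations t_i for poses P_i(X) = X + t_i (a simplifying assumption).

    arcs: list of (i, j, ui, uj), with ui, uj the L x 2 corresponding points in frames i and j.
    Each correspondence contributes |(ui_l - t_i) - (uj_l - t_j)|^2, i.e. the linear
    equation t_j - t_i = uj_l - ui_l.  Frame 0 is fixed at t_0 = 0 (gauge choice).
    """
    rows, rhs = [], []
    for i, j, ui, uj in arcs:
        for a, b in zip(np.atleast_2d(np.asarray(ui, float)), np.atleast_2d(np.asarray(uj, float))):
            row = np.zeros(n_frames)
            row[j], row[i] = 1.0, -1.0
            rows.append(row)
            rhs.append(b - a)
    A = np.array(rows)[:, 1:]          # drop frame 0's column: t_0 is held at the origin
    B = np.array(rhs)
    t_rest = np.linalg.lstsq(A, B, rcond=None)[0]
    return np.vstack([np.zeros((1, B.shape[1])), t_rest])

def chained_translations(n_frames, arcs):
    """Frame-to-frame baseline: accumulate temporal arcs (i, i + 1) only, committing causally."""
    step = {}
    for i, j, ui, uj in arcs:
        if j == i + 1:
            d = np.atleast_2d(np.asarray(uj, float)) - np.atleast_2d(np.asarray(ui, float))
            step[i] = d.mean(axis=0)
    t = [np.zeros(2)]
    for k in range(1, n_frames):
        t.append(t[-1] + step[k - 1])
    return np.array(t)

# Five frames truly 10 px apart; every temporal measurement is biased by +0.5 px,
# while one non-temporal arc (frame 0 to frame 4) is measured accurately.
arcs = [(k, k + 1, [[0.0, 0.0]], [[10.5, 0.0]]) for k in range(4)]
arcs.append((0, 4, [[0.0, 0.0]], [[40.0, 0.0]]))
t_chain = chained_translations(5, arcs)
t_global = global_translations(5, arcs)
print(t_chain[4], t_global[4])
```

Chaining accumulates the bias and places frame 4 at 42, whereas the global solution uses the extra arc to spread the error and places it at 40.4. This mirrors the frame-to-frame versus local-to-global comparison made earlier.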
LOCAL TO GLOBAL MOSAIC ALGORITHM
Input: Images or Video
• Topology Determination
• Temporal Coarse Registration
• Local Coarse & Fine Registration
• Global Consistency
• Color Matching/Blending
Output: Mosaic Representation, used for Panoramic Visualization, Virtual Reality, and Other Applications
PLANAR TOPOLOGY EVOLUTION
Whiteboard Video Sequence, 75 frames
PLANAR TOPOLOGY EVOLUTION
FINAL MOSAIC
SPHERICAL MOSAICS
Sarnoff Library Video: captures almost the complete sphere with 380 frames
SPHERICAL TOPOLOGY EVOLUTION
SPHERICAL MOSAIC: Sarnoff Library
SPHERICAL MOSAIC: Sarnoff Library
NEW SYNTHESIZED VIEWS
FINAL MOSAIC: Princeton University Courtyard
VIDEO MOSAIC EXAMPLE
Princeton Chapel Video Sequence, 54 frames
UNBLENDED CHAPEL MOSAIC
Image Merging with Laplacian Pyramids
(Panels: Image 1; Image 2; Combined Seamless Image)
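A minimal Python sketch of Laplacian-pyramid merging. Each band-pass level of the two images is combined using a Gaussian-pyramid-smoothed version of a selection mask, and the blended pyramid is then collapsed. Coarse bands therefore transition over wide regions and fine bands over narrow ones, which hides the seam. The 5-tap binomial filter, the number of levels, and the helper names are assumptions, not details taken from the slide.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial filter (assumed choice)

def blur(img):
    """Separable 5-tap binomial blur with reflected borders."""
    h, w = img.shape
    pad = np.pad(img, 2, mode="reflect")
    tmp = sum(KERNEL[k] * pad[:, k:k + w] for k in range(5))
    return sum(KERNEL[k] * tmp[k:k + h, :] for k in range(5))

def pyr_down(img):
    return blur(img)[::2, ::2]

def pyr_up(img):
    """Insert zeros between samples, then blur; the factor 4 restores the mean intensity."""
    h, w = img.shape
    up = np.zeros((2 * h, 2 * w))
    up[::2, ::2] = img
    return 4.0 * blur(up)

def gaussian_pyramid(img, levels):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(pyr_down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    return [g[k] - pyr_up(g[k + 1]) for k in range(levels - 1)] + [g[-1]]

def blend(A, B, mask, levels=4):
    """Merge A and B band by band; mask = 1 selects A, and its Gaussian pyramid softens the seam."""
    if any(s % 2 ** (levels - 1) for s in A.shape):
        raise ValueError("image sides must be divisible by 2**(levels - 1)")
    LA, LB = laplacian_pyramid(A, levels), laplacian_pyramid(B, levels)
    GM = gaussian_pyramid(mask, levels)
    bands = [m * a + (1.0 - m) * b for a, b, m in zip(LA, LB, GM)]
    out = bands[-1]
    for band in reversed(bands[:-1]):
        out = band + pyr_up(out)          # collapse the pyramid from coarse to fine
    return out

A = np.zeros((64, 64))
B = np.ones((64, 64))
mask = np.zeros((64, 64))
mask[:, :32] = 1.0                        # left half from A, right half from B
result = blend(A, B, mask)
print(result[32, ::8].round(3))           # ramps from 0 to 1 across the seam
```

Because each Laplacian pyramid collapses exactly back to its image, blending an image with itself returns that image unchanged.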
VORONOI TESSELLATIONS WITH L1 NORM
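One plausible reading of this slide, which is an assumption because the slide itself is only a figure: each mosaic pixel takes its color from the frame whose center is nearest under the L1 (city-block) norm. The result is a Voronoi tessellation of the mosaic into per-frame regions. The Python sketch below computes such a label map and, optionally, restricts each frame to the pixels it actually covers. The function name and the coverage representation are hypothetical.

```python
import numpy as np

def voronoi_l1_labels(shape, centers, coverage=None):
    """Assign each mosaic pixel to the frame whose centre is nearest in the L1 norm.

    shape: (h, w) of the mosaic; centers: n x 2 frame centres (x, y) in mosaic coordinates;
    coverage: optional n x h x w boolean array, True where frame k covers the pixel.
    Returns an h x w array of frame indices, with -1 where no frame covers the pixel.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    c = np.asarray(centers, dtype=float)
    dist = (np.abs(xs[None] - c[:, 0, None, None]) +
            np.abs(ys[None] - c[:, 1, None, None]))      # n x h x w city-block distances
    if coverage is not None:
        dist = np.where(coverage, dist, np.inf)         # a frame cannot claim pixels it does not see
    labels = np.argmin(dist, axis=0)
    if coverage is not None:
        labels[~np.any(coverage, axis=0)] = -1
    return labels

labels = voronoi_l1_labels((20, 40), [(10, 10), (30, 10)])
print(labels[5, 15:25])   # the boundary between the two frames falls midway between their centres
```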
BLENDED CHAPEL MOSAIC
Applications of 2D/3D Alignment
High Dynamic Range Management
Improve the overall driving experience and see under adverse conditions
Today’s imagers can’t capture the full dynamic range of an outdoor scene
Real-time, low-latency high dynamic range sensor processing:
• Robust motion-adaptive frame-to-frame alignment
• Local contrast enhancement for deep pixel range
• Tight sensor exposure management
Extreme blur reduction examples
(Panels: Eye Chart; Aerial)
SRI’s image processing (MASI) for extreme camera motion
Temporal Image Enhancement and Haze Reduction
Challenges:
- Robust under low-SNR conditions
- Difficult temporal registration
- Moving platforms
- Low feature content
(Panels: Raw low-SNR video; Multi-frame temporal alignment and fusion)
(Panels: Original imagery; Dehaze; Dehaze and CN (contrast normalization))
Dehaze and Enhancement for Submarine Periscope Video
Contrast Normalization for Wide Dynamic Range
Three Band Fusion with Contrast Normalization
(Panels: VIS; SWIR; LWIR; Fused)
High Quality Stereo Sequence Synthesis (IMAX 3D Content Creation)
Live Action Content
• Camera is very large.
• Requires two strips of large-format film.
• Size of camera and cost of film limit production.
(Figure: 70 mm film frame, 15 perforations)
Live Action Sequence
Live Action: Hybrid Input
(Panels: Left; Right)
Synthesized Output
(Panels: Left; Right)
Hybrid Stereo Camera: pure upsampling is not an option
Input: Left Eye (1.5K), Right Eye (6K)
Output: Left Eye (6K), Right Eye (6K)
Low-res to high-res pixel ratio: 1:16 (4× per dimension)
How can the Hybrid Camera be Realized?
Render the high-res content into the coordinate system of the low-res frame!
(Figure: left-eye and right-eye frame sequences at times t−2 through t+2, with an unknown frame marked "??")
Approach: Convergence of Computer Vision & IBR (Image-Based Rendering)
• Compute stereo disparities at lo-res.
• Compute motion (Optical Flow) at lo-res.
• Compute quality map at lo-res.
• Synthesize hi-res frame.
• Fill in and color-correct mismatched pixels.
• Temporal de-scintillation.
Correspondences by Coarse-to-fine Model-based Image Alignment: A Primer
Synthesis vs. Up-resing : Live Action
Synthesis vs. Up-resing : CG Animation