Manifold Learning in the Wild
A New Manifold Modeling and Learning Framework for Image Ensembles
Aswin C. Sankaranarayanan, Rice University
Richard G. Baraniuk, Chinmay Hegde
Sensor Data Deluge
Internet Scale Databases
• Tremendous size of the corpus of available data
– A Google Image Search for “Notre Dame Cathedral” yields 3 million results, roughly 3 TB of data
Concise Models
• Efficient processing / compression requires a concise representation
• Our interest in this talk: collections of images
Concise Models
• Our interest in this talk: collections of images parameterized by q ∈ Q
– translations of an object; q: x-offset and y-offset
– rotations of a 3D object; q: pitch, roll, yaw
– wedgelets; q: orientation and offset
• Image articulation manifold
Image Articulation Manifold
• N-pixel images: x_q ∈ R^N
• K-dimensional articulation space: q ∈ Q ⊂ R^K
• Then M = { x_q : q ∈ Q } is a K-dimensional manifold in the ambient space R^N
• Very concise model
– Can be learned using nonlinear dimensionality reduction
[Figure: articulation parameter space]
Ex: Manifold Learning
• LLE, ISOMAP, LE, HE, Diff. Geo., …
• K = 1: rotation
Ex: Manifold Learning
• K = 2: rotation and scale
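Embeddings like the ones above can be sketched with a minimal ISOMAP: a k-nearest-neighbor graph, geodesic distances, then classical MDS. The circularly shifted bump below is an illustrative stand-in for a K = 1 articulation; all names and parameter values are assumptions for the sketch, not the talk's actual pipeline.

```python
import numpy as np

# Toy K = 1 image ensemble: a smooth bump, circularly shifted by q
# (an illustrative stand-in for the rotation example).
n_pix, n_img = 64, 40
t = np.arange(n_pix)
q_true = np.linspace(0, 32, n_img)
images = np.stack([np.exp(-((t - q - 16) % n_pix - n_pix / 2) ** 2 / 18.0)
                   for q in q_true])

def isomap_1d(X, k=6):
    """Minimal ISOMAP sketch: k-NN graph -> geodesics -> classical MDS."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]        # k nearest neighbors
    rows = np.arange(n)[:, None]
    G[rows, nbrs] = D[rows, nbrs]
    G = np.minimum(G, G.T)                          # symmetrize the graph
    np.fill_diagonal(G, 0.0)
    for m in range(n):                              # Floyd-Warshall geodesics
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    J = np.eye(n) - 1.0 / n                         # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                     # classical MDS (1-D)
    w, V = np.linalg.eigh(B)
    return V[:, -1] * np.sqrt(max(w[-1], 0.0))

emb = isomap_1d(images)   # 1-D coordinates, monotone in the articulation q
```

Because the sampling is dense and the bump is smooth, the recovered 1-D coordinate tracks the true articulation parameter up to sign and scale.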
Smooth IAMs
• N-pixel images: x_q ∈ R^N
• Local isometry: image distance ≈ articulation parameter distance
• Linear tangent spaces are close approximations locally
• Low-dimensional articulation space
[Figure: articulation parameter space]
Smooth IAMs
• Ex: translation manifold: all blue images are equidistant from the red image
• Local isometry is satisfied only when the sampling is dense
[Plot: Euclidean distance vs. translation q in px]
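The plot's behavior can be reproduced in a few lines. The 1-D box image below is an illustrative stand-in for a sharp-edged translation manifold; its size and the translation range are assumptions for the sketch.

```python
import numpy as np

# 1-D "image" with sharp edges: a 20-pixel box translated by q pixels.
n = 100
def box(q):
    x = np.zeros(n)
    x[q:q + 20] = 1.0
    return x

# Euclidean distance from the untranslated image as q grows.
d = np.array([np.linalg.norm(box(q) - box(0)) for q in range(40)])
# Local isometry would require d proportional to q; the sharp edges
# instead give d = sqrt(2q) while the boxes overlap, then a constant
# sqrt(40) once they do not: every far-away image is equidistant
# from the reference, exactly the "all blue images" effect.
```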
Theory/Practice Disconnect: Isometry
Theory/Practice Disconnect: Nuisance Articulations
• Unsupervised data invariably has additional, undesired articulations
– Illumination
– Background clutter, occlusions, …
• The image ensemble is no longer low-dimensional
Image Representations
• Conventional representation for an image
– A vector of pixels
– Inadequate!
[Figure: pixel image]
Image Representations
• Replace the vector of pixels with an abstract bag of features
– Ex: SIFT (Scale-Invariant Feature Transform) selects keypoint locations in an image and computes a keypoint descriptor for each keypoint
– Very popular in many vision problems
– Keypoint descriptors are local; it is very easy to make them robust to nuisance imaging parameters
Loss of Geometrical Info
• Bag-of-features representations hide potentially useful image geometry
• Goal: make salient geometrical image info more explicit for exploitation
[Figure: image space vs. keypoint space]
Keypoint Kernel
• The keypoint space can be endowed with a rich low-dimensional structure in many situations
• Mechanism: define kernels K_L between keypoint locations and K_D between keypoint descriptors
• The joint keypoint kernel between two images combines K_L and K_D over all pairs of keypoints
Many Possible Kernels
• Euclidean kernel
• Gaussian kernel
• Polynomial kernel
• Pyramid match kernel [Grauman et al. ’07]
• Many others
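As a hedged sketch, the first three entries of the list might look like this in code; taking the “Euclidean kernel” to be the plain inner product is an assumption, the parameter values are illustrative, and the pyramid match kernel is omitted.

```python
import numpy as np

# Common kernels on feature vectors x, y (illustrative parameter values).
def euclidean_kernel(x, y):
    return float(np.dot(x, y))                 # assumed: plain inner product

def gaussian_kernel(x, y, sigma=1.0):
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))

def polynomial_kernel(x, y, c=1.0, p=2):
    return float((np.dot(x, y) + c) ** p)
```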
Keypoint Kernel
• The joint keypoint kernel between two images combines the location and descriptor kernels over all pairs of keypoints
• Using a Euclidean/Gaussian (E/G) combination yields a closed-form kernel
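One plausible reading of the joint kernel, sketched under stated assumptions: an image is a set of (location, descriptor) pairs, and the joint kernel sums a location kernel times a descriptor kernel over all cross-image keypoint pairs. The Gaussian-on-both choice and the bandwidths below are illustrative, not the exact E/G construction from the talk.

```python
import numpy as np

# Hypothetical joint keypoint kernel: sum over all cross-image keypoint
# pairs of (location kernel) * (descriptor kernel).  Gaussian kernels and
# the bandwidths sigma_l, sigma_d are illustrative assumptions.
def joint_keypoint_kernel(locs1, descs1, locs2, descs2,
                          sigma_l=10.0, sigma_d=0.5):
    dl = np.linalg.norm(locs1[:, None, :] - locs2[None, :, :], axis=-1)
    dd = np.linalg.norm(descs1[:, None, :] - descs2[None, :, :], axis=-1)
    KL = np.exp(-dl ** 2 / (2 * sigma_l ** 2))   # location kernel
    KD = np.exp(-dd ** 2 / (2 * sigma_d ** 2))   # descriptor kernel
    return float(np.sum(KL * KD))
```

By construction the kernel is symmetric, and an image scores higher against itself than against a version whose keypoints have been moved far away.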
From Kernel to Metric
Lemma: The E/G keypoint kernel is a Mercer kernel
– enables algorithms such as SVM
Lemma: The E/G keypoint kernel induces a metric on the space of images
– an alternative to the conventional L2 distance between images
– the keypoint metric is robust to nuisance imaging parameters, occlusion, clutter, etc.
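The second lemma follows the standard route: any Mercer kernel K embeds images into a feature space, where the induced distance is

```latex
d(I_1, I_2) = \sqrt{K(I_1, I_1) + K(I_2, I_2) - 2\,K(I_1, I_2)}
```

so the kernel's robustness to occlusion and clutter carries over directly to the metric.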
Keypoint Geometry
Theorem: Under the metric induced by the kernel, certain ensembles of articulating images form smooth, isometric manifolds
• The keypoint representation is compact, efficient, and robust to illumination variations, non-stationary backgrounds, clutter, and occlusions
• In contrast: the conventional approach to image fusion via image articulation manifolds (IAMs) is fraught with non-differentiability (due to sharp image edges)
– not smooth
– not isometric
Application: Manifold Learning
• 2D translation
[Figure: IAM vs. KAM embeddings]
Manifold Learning in the Wild
• Rice University’s Duncan Hall Lobby
– 158 images
– 360° panorama captured with a handheld camera
– Varying brightness, clutter
• Ground truth obtained using state-of-the-art structure-from-motion software
Manifold Learning in the Wild
[Figure: ground truth vs. IAM vs. KAM embeddings]
Manifold Learning in the Wild
• Rice University’s Brochstein Pavilion
– 400 outdoor images of a building
– Occlusions, movement in the foreground, varying background
[Figure: IAM vs. KAM embeddings]
Internet Scale Imagery
• Notre Dame Cathedral
– 738 images collected from Flickr
– Large variations in illumination (night/day, saturation), clutter (people, decorations), and camera parameters (focal length, field of view, …)
– Non-uniform sampling of the articulation space
Organization
• k-nearest neighbors
Organization
• “Geodesics”
– 3D rotation
– “Walk closer”
– “Zoom out”
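The organization step needs only pairwise distances: k-nearest-neighbor lists, plus shortest paths through the k-NN graph standing in for “geodesics”. A minimal sketch, with illustrative function names and a directed k-NN graph for simplicity (it assumes the destination is reachable):

```python
import heapq
import numpy as np

# Organize an image collection from its pairwise distance matrix D alone:
# per-image k-nearest-neighbor lists, and geodesic paths through the
# (directed) k-NN graph via Dijkstra.  No complex optimization required.
def knn_lists(D, k=3):
    return [list(np.argsort(row)[1:k + 1]) for row in D]

def geodesic_path(D, k, src, dst):
    nbrs = knn_lists(D, k)
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, np.inf):
            continue                      # stale heap entry
        for v in nbrs[u]:
            nd = d + D[u, v]
            if nd < dist.get(v, np.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]                          # walk back along predecessors
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

On a densely sampled 1-D articulation, such a path steps monotonically through the collection, which is exactly the “walk closer” / “zoom out” behavior shown on the slide.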
Summary
• Challenges for manifold learning in the wild are both theoretical and practical
• Need for novel image representations
– Sparse features
– Robustness to outliers, nuisance articulations, etc.
– Learning in the wild: unsupervised imagery
• Promise lies in fast methods that exploit only neighborhood properties
– No complex optimization required