Post on 21-Dec-2015
1
Ecological Statistics and Perceptual Organization
Charless Fowlkes
work with David Martin and Jitendra Malik
at University of California at Berkeley
2
“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of color. Do I have 327? No. I have sky, house, and trees.”
3
“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of color. Do I have 327? No. I have sky, house, and trees.”
010011010....
4
“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of color. Do I have 327? No. I have sky, house, and trees.”
Max Wertheimer, Laws of Organization in Perceptual Forms (1923)
14
• How do these cues apply to real world images?
• How are different cues combined?
• Why does the visual system use these cues?
15
Ecological Validity
• Brunswik & Kamiya 1953: Gestalt rules reflect the structure of the natural world
• Attempted to validate the grouping rule of proximity of similars
• Brunswik was ahead of his time… we now have the tools.
Egon Brunswik (1903-1955)
16
Strategy
1. Collect high-level ground-truth annotations for a large collection of images
2. Develop computational models of cues for perceptual organization calibrated to ground-truth training data
3. Measure cue statistics and evaluate the relative “power” of different cues
18
• 30 subjects, age 19-23
• 1,458 person hours over 8 months
• 1,020 Corel images
• 11,595 segmentations
– color, gray, inverted/negated
“You will be presented a photographic image. Divide the image into some number of segments, where the segments represent ‘things’ or ‘parts of things’ in the scene. The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate. It is important that all of the segments have approximately equal importance.”
20
[Figure: panels (a), (b), (c) each show a region hierarchy for the same scene, with node labels Scene, Background, Foreground, Sky, Trees, Shore, Water, Land, Rocks, Base, Mermaid, Small, Top, L, R]
21
Overview
• Grouping
– Local Boundary Detection
– Local Human Performance
• Figure/Ground
– Local Figure/Ground Cues
– Local Human Performance
• Discussion
23
Gradient Features
• Brightness Gradient (BG)
– Difference of brightness distributions
• Color Gradient (CG)
– Difference of color distributions
• Texture Gradient (TG)
– Difference of distributions of V1-like filter responses
1976 CIE L*a*b* color space
Distributions are represented by smoothed histograms
[Figure label: r, (x,y)]
χ² difference between histograms g and h:
$$\chi^2(g,h) = \frac{1}{2}\sum_i \frac{(g_i - h_i)^2}{g_i + h_i}$$
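The χ² histogram difference, 0.5 · Σ_i (g_i − h_i)² / (g_i + h_i), can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the bin count, the Gaussian smoothing width, and the toy sample values are all assumptions made for the example.

```python
from math import exp

def smoothed_histogram(values, n_bins=8, lo=0.0, hi=1.0, sigma=1.0):
    """Bin values from [lo, hi], blur the counts with a Gaussian kernel
    across bins, and normalize so the histogram sums to 1.
    (n_bins and sigma are illustrative choices, not values from the talk.)"""
    counts = [0.0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        idx = max(0, min(idx, n_bins - 1))  # clamp to a valid bin
        counts[idx] += 1.0
    smoothed = [
        sum(counts[j] * exp(-0.5 * ((i - j) / sigma) ** 2) for j in range(n_bins))
        for i in range(n_bins)
    ]
    total = sum(smoothed)
    return [s / total for s in smoothed] if total > 0 else smoothed

def chi_squared(g, h):
    """Chi-squared difference: 0.5 * sum_i (g_i - h_i)^2 / (g_i + h_i).
    Bins that are empty in both histograms are skipped."""
    return 0.5 * sum(
        (gi - hi) ** 2 / (gi + hi) for gi, hi in zip(g, h) if gi + hi > 0
    )

# Hypothetical brightness samples from the two halves of a disc.
dark_half = [0.10, 0.15, 0.20, 0.12]
bright_half = [0.80, 0.85, 0.90, 0.88]

gradient_across_edge = chi_squared(
    smoothed_histogram(dark_half), smoothed_histogram(bright_half)
)
gradient_within_region = chi_squared(
    smoothed_histogram(dark_half), smoothed_histogram(dark_half)
)
```

With normalized histograms the value ranges from 0 (identical distributions) to 1 (no overlapping bins), so two halves drawn from the same region score near 0 and two halves straddling an edge score higher.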
24
Local Boundary Detection
[Figure: Image → Boundary Cues (Brightness, Color, Texture) → Cue Combination → Model → Pb]
• Using training data to learn the posterior probability of a boundary P(b=1|x,y,θ) from local gradient information
• Logistic regression to combine cues
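Combining cues with logistic regression can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the feature vectors (BG, CG, TG), the labels, the learning rate, and the function names `fit_logistic` and `pb` are all hypothetical, and plain batch gradient descent stands in for whatever fitting procedure was actually used.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log loss.
    Returns [bias, w_1, ..., w_k]."""
    n_feat = len(features[0])
    n = len(features)
    w = [0.0] * (n_feat + 1)  # w[0] is the bias term
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, y in zip(features, labels):
            p = sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for k, xk in enumerate(x):
                grad[k + 1] += err * xk
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
    return w

def pb(w, x):
    """Posterior probability of a boundary given one cue vector x."""
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))

# Made-up (BG, CG, TG) gradient values; 1 = on-boundary, 0 = off-boundary.
toy_cues = [
    [0.8, 0.7, 0.9], [0.6, 0.9, 0.7], [0.9, 0.5, 0.8],
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2],
]
toy_labels = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(toy_cues, toy_labels)
strong_edge = pb(weights, [0.8, 0.7, 0.9])
flat_region = pb(weights, [0.1, 0.2, 0.1])
```

The model is linear in the cues before the sigmoid, so each learned weight indicates how much the corresponding gradient contributes to the boundary posterior.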
30
How good are humans locally?
[Figure panels: Off-Boundary, On-Boundary]
• Algorithm: r = 9, Humans: r = {5, 9, 18}
• Fixation (2 s) -> Patch (200 ms) -> Mask (1 s)
32
Findings
• Texture gradient information is important for natural scenes
• Optimal local cue combination is achievable with a simple linear model
• The resulting local boundary detection algorithm performs nearly as well as humans restricted to local patches (and better than traditional edge detectors).
33
Overview
• Grouping
– Local Boundary Detection
– Local Human Performance
• Figure/Ground
– Local Figure/Ground Cues
– Local Human Performance
• Discussion
34
Local Cues for Figure/Ground
• Assume we have a perfect segmentation
• Can we predict which region a contour belongs to based on its local shape?
– Size
– Convexity
– Lower Region
35
Figure-Ground Labeling
• Start with 200 segmented images of natural scenes
• Boundaries labeled by at least 2 different human subjects
• Subjects agree on 88% of contours labeled
37
Convexity [Metzger 1953, Kanizsa and Gerbino 1976]
Convexity(p) = log(ConvF / ConvG)
ConvG = percentage of straight lines that lie completely within region G
[Figure: point p with regions G and F]
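The convexity cue, Convexity(p) = log(ConvF / ConvG), can be illustrated on small pixel sets. The sketch below makes several assumptions not stated on the slide: each region is given as a set of (row, col) pixels already restricted to a local window around p, "straight lines" are taken to be segments between every pair of pixels in the region, and ConvF is computed the same way as ConvG. The function names are hypothetical.

```python
from itertools import combinations
from math import log

def segment_inside(region, a, b):
    """True if every sampled pixel on the segment from a to b is in region."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for s in range(steps + 1):
        t = s / steps if steps else 0.0
        pixel = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if pixel not in region:
            return False
    return True

def conv(region):
    """Fraction of pixel pairs whose connecting segment stays inside region."""
    pairs = list(combinations(sorted(region), 2))
    if not pairs:
        return 1.0
    inside = sum(segment_inside(region, a, b) for a, b in pairs)
    return inside / len(pairs)

def convexity(region_f, region_g):
    """log(ConvF / ConvG): positive when F is the more convex side."""
    return log(conv(region_f) / conv(region_g))

# Toy regions: a filled 3x3 square (convex) and an L-shaped strip (not convex).
square = {(r, c) for r in range(3) for c in range(3)}
l_shape = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}

cue = convexity(square, l_shape)
```

Enumerating all pixel pairs is quadratic in region size, so a practical version would likely sample pairs at random; the exhaustive form is kept here only because it is easy to check by hand.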
43
“Upper Bounding” Local Performance
• Present human subjects with local shapes, seen through an aperture.
[Figure panels: Configuration; Configuration + Content]
47
Findings
• Convexity, size and lower-region are ecologically valid.
• Boundary configuration is relatively weak compared to luminance content.
• Local judgments based on luminance content can be quite accurate.
48
• How do these cues apply to real world images?
• How are different cues combined?
• Why does the visual system use these cues?
Perceptual organization as a computational theory of vision