1
Statistics of natural images
May 30, 2010
Ofer Bartal, Alon Faktor
2
Outline
• Motivation
• Classical statistical models
• New MRF model approach
• Learning the models
• Applications and results
3
Motivation
• Big variance in appearance
• Can we even dream of modeling this?
4
Motivation
• Main questions:
– Do all natural images obey some common “rules”?
– How can one find these “rules”?
– How to use “rules” for computer vision tasks?
5
Motivation
• Why bother to model at all?
• “Noise”, uncertainty
• Model helps choose the “best” possible answer
• Let's see some examples
Natural image model
6
Noise-blur removal
• Consider the classical De-convolution problem
• Can be formulated as linear set of equations:
$$Y = h * X + N, \qquad X = ?$$
$$y_{cs} = H x_{cs} + n_{cs}$$
[Diagram: X passes through H, N is added, giving Y]
7
Noise-blur removal
[Figure: X convolved with h, plus N, gives Y; the estimate $\hat{X}$ is unknown]
8
Inpainting
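$$Y = AX + n, \qquad X = ?$$
$$A = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & & \ddots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}$$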
Missing lines of identity matrix = missing pixels (under-determined system)
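A minimal sketch of how such an A can be built, assuming a column-stacked image and a list of missing pixel indices. The function name `inpainting_operator` and the 8-pixel toy image are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def inpainting_operator(n_pixels, missing):
    """Identity matrix with the rows of the missing pixels removed."""
    keep = np.setdiff1d(np.arange(n_pixels), missing)
    return np.eye(n_pixels)[keep]

x = np.arange(8, dtype=float)                   # toy column-stacked "image"
A = inpainting_operator(8, missing=[2, 3, 4])   # pixels 2, 3, 4 are lost
y = A @ x                                       # observed pixels (noise-free here)
```

A has fewer rows than columns, so infinitely many x agree with the same y; this is the under-determined system the slide refers to.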
9
Motivation
• Problems:
– Unknown noise
– H may be singular (Deconvolution)
– H may be under-determined (Inpainting)
• So there can be many solutions.
• How can we find the “right” one?
10
Motivation
• Goal: Estimate x
– Assume:
• Prior model of natural image: $P_x(x)$
• Prior model of noise: $P_n(n)$
– Use MAP estimator to find x:
$$x^* = \arg\max_x P(x \mid y) = \arg\max_x P(y \mid x)\, P_x(x)$$
$$x^* = \arg\max_x P_n(y - Hx)\, P_x(x)$$
11
Energy Minimization problem
• The MAP problem can be reformulated as:
$$\hat{x} = \arg\min_x E(x), \qquad E(x) = E_{\text{data term}}(y \mid x) + E_{\text{prior term}}(x)$$
[Figure: energy E plotted against x]
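The reformulation is a negative logarithm applied to the MAP objective. A short derivation, using only the MAP estimator with noise model $P_n$ and image prior $P_x$ from the previous slide:

```latex
\begin{aligned}
\hat{x} &= \arg\max_x \; P_n(y - Hx)\, P_x(x) \\
        &= \arg\min_x \; \big[ -\log P_n(y - Hx) - \log P_x(x) \big] \\
        &= \arg\min_x \; \big[ E_{\text{data}}(y \mid x) + E_{\text{prior}}(x) \big],
\qquad
E_{\text{data}}(y \mid x) = -\log P_n(y - Hx), \quad
E_{\text{prior}}(x) = -\log P_x(x).
\end{aligned}
```

The logarithm is monotone, so the maximizer of the posterior and the minimizer of the energy coincide.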
13
Classical models
• Smoothness prior (model of image gradients)
– Gaussian prior (LS problem)
– L1 Prior and sparse prior (IRLS problem)
Image gradient
14
Gaussian Priors
• Assume:
– Gaussian priors on gradients of x: $p(x) \propto e^{-\frac{1}{2\sigma_x^2} \|\nabla x\|^2}$
– Gaussian noise: $n \sim N(0, \sigma_n^2)$
• Using this assumption:
$$x^* = \arg\max_x P_n(y - Hx)\, P_x(x)$$
$$x^* = \arg\min_x \; x^T T x - 2 x^T b$$
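A minimal 1-D sketch of the resulting least-squares problem, under stated assumptions: circular boundaries, a 3-tap blur, and a weight `lam` standing for the ratio of noise variance to prior variance. With a gradient operator G, the quadratic form is $T = H^T H + \lambda G^T G$ and $b = H^T y$, so the minimizer of $x^T T x - 2 x^T b$ solves $Tx = b$. All function names here are illustrative.

```python
import numpy as np

def circulant_blur(n):
    """Circular blur with the symmetric kernel [0.25, 0.5, 0.25]."""
    eye = np.eye(n)
    return 0.25 * np.roll(eye, -1, axis=1) + 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1)

def gradient_matrix(n):
    """Circular forward difference: (G @ x)[i] = x[i+1] - x[i]."""
    return np.roll(np.eye(n), 1, axis=1) - np.eye(n)

def deconvolve_gaussian_prior(y, H, G, lam):
    """Minimize ||y - Hx||^2 + lam * ||Gx||^2 by solving T x = b."""
    T = H.T @ H + lam * (G.T @ G)
    b = H.T @ y
    return np.linalg.solve(T, b)

n = 64
H, G = circulant_blur(n), gradient_matrix(n)
x_true = np.where(np.arange(n) < n // 2, 0.0, 1.0)       # a single step edge
rng = np.random.default_rng(0)
y = H @ x_true + 0.01 * rng.standard_normal(n)
lam = 0.01
x_hat = deconvolve_gaussian_prior(y, H, G, lam)
```

The solve is a single linear system, which is what makes the Gaussian prior cheap; the price is that edges are smoothed.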
15
Non-Gaussian Priors
• Empirical results: image gradients have a non-Gaussian, heavy-tailed distribution
• We assume L1 or sparse prior
• We solve it by IRLS – iterative re-weighted LS
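A minimal IRLS sketch, assuming the smoothed objective $\|y - Hx\|^2 + \lambda \sum_i ((Gx)_i^2 + \epsilon^2)^{p/2}$ with $p \le 1$. The parameters `lam`, `p`, `eps` and the toy 1-D signal are illustrative assumptions. Each iteration replaces the sparse penalty by a weighted quadratic and solves a least-squares problem.

```python
import numpy as np

def irls(y, H, G, lam=0.05, p=1.0, eps=1e-3, n_iter=30):
    """Iteratively re-weighted least squares for a sparse gradient prior."""
    def objective(x):
        return np.sum((y - H @ x) ** 2) + lam * np.sum(((G @ x) ** 2 + eps ** 2) ** (p / 2))

    # Start from the Gaussian-prior (plain least-squares) solution.
    x = np.linalg.solve(H.T @ H + lam * G.T @ G, H.T @ y)
    history = [objective(x)]
    for _ in range(n_iter):
        # Weights of the quadratic that upper-bounds the penalty at the current x.
        w = ((G @ x) ** 2 + eps ** 2) ** (p / 2 - 1)
        T = H.T @ H + lam * (p / 2) * (G.T * w) @ G
        x = np.linalg.solve(T, H.T @ y)
        history.append(objective(x))
    return x, history

n = 64
eye = np.eye(n)
H = 0.25 * np.roll(eye, -1, axis=1) + 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1)
G = np.roll(eye, 1, axis=1) - eye
x_true = np.where(np.arange(n) < n // 2, 0.0, 1.0)
rng = np.random.default_rng(0)
y = H @ x_true + 0.01 * rng.standard_normal(n)
x_hat, history = irls(y, H, G)
```

Small gradients receive large weights and are pushed towards zero, while large gradients (edges) are penalized only weakly.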
16
De-convolution Results
[Figure panels: Blurred image | Gaussian prior | Sparse prior]
Good results on simple images
17
De-noising Results
[Figure panels: Noisy image | De-noising result]
Poor results on real natural images
18
Classical models – Pros and Cons
• Advantages:
– Simple and easy to implement
• Disadvantages:
– Too heuristic
– Only one property - smoothness
– Bias towards totally smooth images: $P(\text{smooth image}) > P(\text{natural image})$
19
Going Beyond Classical Models
[Figure: three histograms of probability against number of similar patches (log10 scale, 0 to 4)]
20
Modern Approach
• Model is based on image properties
• Choose properties using image dataset
• Questions:
1. What types of properties? Responses to linear filters.
2. How to find good properties? Either pre-determined bank or learn from data.
3. How should we combine properties into one distribution? We will see how.
21
Mathematical framework
• Want: A model p(I) of real distribution f(I).
• Computationally hard:
– A 100x100 pixel image has 10,000 variables
• Can explicitly model only a few dimensions at a time
Arrow = viewpoint of few dimensions
22
Mathematical framework
• A viewpoint is a response to a linear filter
• A distribution over these responses is a marginal of real distribution f(I)
• (Marginal = Distribution over a subset of variables)
Arrow = marginal of f(I)
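A minimal sketch of how one such marginal can be estimated from an image: apply a linear filter at every patch position and histogram the responses. The helper names, the bin count and the horizontal-gradient filter are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_response(image, filt):
    """Inner product of filt with every patch of the same size (valid positions only)."""
    windows = sliding_window_view(image, filt.shape)   # (rows, cols, h, w)
    return np.einsum("ijkl,kl->ij", windows, filt)

def marginal_histogram(image, filt, bins, value_range):
    """Normalized histogram of the filter responses: an empirical marginal."""
    resp = filter_response(image, filt)
    counts, edges = np.histogram(resp, bins=bins, range=value_range)
    return counts / counts.sum(), edges

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(float)
grad_x = np.array([[-1.0, 1.0]])                        # horizontal gradient filter
hist, edges = marginal_histogram(image, grad_x, bins=51, value_range=(-255, 255))
```

Each filter gives a one-dimensional view of the 1024-dimensional image distribution, which is what the arrows on the slide depict.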
23
Mathematical framework
• If p(I) and f(I) have the same marginal distributions of linear filters then p(I)=f(I) (proposition by Zhu and Mumford)
• “Hope”: If we choose K “good” filters then p(I) and f(I) will be “close”.
How do we measure “close?”
24
Distance between distributions
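• Kullback-Leibler divergence:
$$KL\big(f(I), p(I; \Lambda, S)\big) = E_f[\log f(I)] - E_f[\log p(I; \Lambda, S)]$$
• Problem - f(I) unknown
• Proposition - use instead:
$$\widetilde{KL}\big(f(I), p(I; \Lambda, S)\big) = \langle \log p(I; \Lambda, S) \rangle_P - \langle \log p(I; \Lambda, S) \rangle_X$$
(P: images sampled from $p(I; \Lambda, S)$; X: observed images)
• Measures fit of model to observations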
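For intuition, the Kullback-Leibler divergence can be computed directly when both distributions are known and discrete. A minimal sketch, assuming both are given as histograms over the same bins; the function name is illustrative:

```python
import numpy as np

def kl_divergence(f, p):
    """KL(f, p) = E_f[log f] - E_f[log p] for discrete distributions."""
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    support = f > 0                      # terms with f = 0 contribute nothing
    return np.sum(f[support] * (np.log(f[support]) - np.log(p[support])))

f = np.array([0.5, 0.5])
p = np.array([0.25, 0.75])
d = kl_divergence(f, p)
```

The divergence is zero only when the two distributions agree, and it is not symmetric in its arguments.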
25
Illustration
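[Figure: $\langle \log p(I; \Lambda, S) \rangle_P$ and $\langle \log p(I; \Lambda, S) \rangle_X$ on one axis, with $\widetilde{KL}$ the gap between them]
$$\widetilde{KL}\big(f(I), p(I; \Lambda, S)\big) = \langle \log p(I; \Lambda, S) \rangle_P - \langle \log p(I; \Lambda, S) \rangle_X$$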
26
Getting synthesized images
• Get synthesized images by sampling the learned model
• Sample using Markov Chain Monte Carlo (MCMC).
• Drawback: Learning process is slow
27
Our model P(I) – A MRF
• MRF = Markov Random Field
• An MRF is based on a graph G=(V,E).
V – pixels
E – edges between pixels that affect each other
• Our distribution is the MRF:
$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c \in \text{Cliques}} U_c(I_{(c)})\Big)$$
28
Simple grid MRF
• Here, cliques are edges
• Every pixel belongs to 4 cliques
29
MRF
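• We limit ourselves to:
– Cliques of fixed size (over-lapping patches)
– Same U for all cliques
• We get:
$$U(I_{(c)}) = \sum_{\alpha=1}^{K} \lambda^{(\alpha)}\big(F^{(\alpha)T} I_{(c)}\big)$$
$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c=1}^{C} \sum_{\alpha=1}^{K} \lambda^{(\alpha)}\big(F^{(\alpha)T} I_{(c)}\big)\Big)$$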
30
MRF simulation
$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c=1}^{C} \sum_{\alpha=1}^{K} \lambda^{(\alpha)}\big(F^{(\alpha)T} I_{(c)}\big)\Big)$$
31
Histogram simulation
$$H^{(\alpha)}(I_n^{obs})$$
Histogram of a marginal
32
MRF
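• In terms of convolutions:
$$p(I; \Lambda, S) = \frac{1}{Z} e^{-\sum_{\alpha=1}^{K} \sum_{(x,y)} \lambda^{(\alpha)}\left(F^{(\alpha)} * I(x, y)\right)}$$
• Denote: Set of potential functions: $\Lambda = \{\lambda^{(\alpha)}\}_{\alpha = 1 \ldots K}$
• Denote: Set of filters: $S = \{F^{(\alpha)}\}_{\alpha = 1 \ldots K}$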
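A minimal sketch of the exponent of this model: convolve the image with each filter, apply that filter's potential at every position, and sum. The model probability is then proportional to exp(-energy). The quadratic potentials and gradient filters below are placeholder assumptions, and the partition function Z is not computed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve_valid(image, filt):
    """2-D convolution restricted to positions where the filter fits inside the image."""
    windows = sliding_window_view(image, filt.shape)
    return np.einsum("ijkl,kl->ij", windows, filt[::-1, ::-1])

def mrf_energy(image, filters, potentials):
    """Sum over filters and positions of potential(filter * image)."""
    return sum(lam(convolve_valid(image, F)).sum()
               for F, lam in zip(filters, potentials))

filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
potentials = [np.square, np.square]          # placeholder potentials
ramp = np.tile(np.arange(3.0), (3, 1))       # intensity rises by 1 per column
energy = mrf_energy(ramp, filters, potentials)
```

With these choices the sketch reduces to the classical smoothness prior; other potentials and filters plug into the same two functions.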
33
MRF - A simple example
• Cliques of size 1
• Pixels are i.i.d. and distributed by grayscale histogram
grayscale histogram
Drawback: cliques are too small
34
MRF - Another simple example
• Clique = whole image
• Result: Uniform distribution on images in dataset
[Figure: P plotted over x]
Drawback: cliques are too big
37
Revisiting classical models
• Actually, the classical model is a pairwise MRF:
$$p(I) = \frac{1}{Z} e^{-\sum_{(x,y)} \psi(\nabla_x I(x, y)) + \psi(\nabla_y I(x, y))}$$
• Has cliques of size 2
• Has only 2 linear filters => 2 marginals
• No guarantee that p(I) will be close to f(I)
39
Zhu and Mumford’s approach (1997)
• We want to find K “good” filters
• Strategy:
– Start off with a bank B of possible filters
– Choose subset $S \subset B$, $|S| = K$, that minimizes the distance between p(I) and f(I)
– For computational reasons, choose filters one by one using a greedy method
41
Choosing the next filter
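• AIG = the difference between the model p(I) and the data from the viewpoint of marginal $\beta$
• AIF = the difference between different images in the dataset from the viewpoint of marginal $\beta$
$$IC(\beta) = AIG(\beta) - AIF(\beta)$$
$$AIG(\beta) = \frac{1}{M} \sum_{n=1}^{M} \frac{1}{2} \big| H^{(\beta)}(I_n^{obs}) - E_{p(I; \Lambda, S)}[H^{(\beta)}(I)] \big|$$
$$AIF(\beta) = \frac{1}{M} \sum_{n=1}^{M} \frac{1}{2} \big| H^{(\beta)}(I_n^{obs}) - \mu^{(\beta)}_{obs} \big|$$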
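A minimal sketch of scoring one candidate filter, under assumptions that go beyond what the slide states: the distance between histograms is taken as half the L1 distance, and the reference histogram in AIF is taken to be the mean histogram over the observed images. The function name and inputs are illustrative.

```python
import numpy as np

def information_criterion(obs_hists, model_hist):
    """Return (IC, AIG, AIF) for one filter.

    obs_hists:  array (M, bins), the filter's response histogram on each observed image
    model_hist: array (bins,), the expected histogram under the current model
    """
    obs_hists = np.asarray(obs_hists, dtype=float)
    model_hist = np.asarray(model_hist, dtype=float)
    aig = np.mean(0.5 * np.abs(obs_hists - model_hist).sum(axis=1))
    mean_obs = obs_hists.mean(axis=0)            # assumed reference for AIF
    aif = np.mean(0.5 * np.abs(obs_hists - mean_obs).sum(axis=1))
    return aig - aif, aig, aif

ic, aig, aif = information_criterion([[1.0, 0.0], [1.0, 0.0]], [0.0, 1.0])
```

A filter scores high when the model disagrees with the data (large AIG) on a statistic that is stable across images (small AIF); the greedy step picks the filter in the bank with the largest IC.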
42
Algorithm – Filter selection
[Diagram: every filter in the bank is scored by IC against the current model; the arg max filter is added to the model and its potential is learned]
44
Learning the potentials
[Diagram: Init the potential; then loop: sample the model, calculate IC, update the potential (using maximum entropy on P)]
45
The bank of filters
• Filter types:
– Intensity filter (1X1)
– Isotropic filters - Laplacian of Gaussian (LG, )
– Directional filters - Gabor (Gcos, Gsin)
• Computation in different scales - image pyramid
[Figure panels: Laplacian of Gaussian | Gabor]
46
Running example of algorithm
Experiment I
Use only small filters
47
Results
All learned potentials have a diffusive nature
$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c=1}^{C} \sum_{\alpha=1}^{K} \lambda^{(\alpha)}\big(F^{(\alpha)T} I_{(c)}\big)\Big)$$
48
Running example of algorithm
Experiment II
• Only gradient filters, in different scales
• Small filters -> diffusive potential (as expected)
• Surprisingly: Large filters -> reactive potentials
Diffusive Reactive
50
Examples of the synthesized images
Experiment I Experiment II
This image is more “natural” because it has some regions with sharp boundaries
51
Outline
• We have seen:
– MRF models
– Selection of filters from a bank
– Learning potentials
• Now:
– Data-driven filters
– Analytic results for simple potentials
– Making sense of results
– Applications
52
Roth and Black’s approach
filters | potentials
Chosen from bank (crossed out) | Learned non-parametrically (crossed out)
Learned from data | Learned parametrically
Filters and potentials are learned together
53
Motivation – model of natural patches
• Why learn filters from data?
• Inspiration from models of natural patches:
– Sparse coding
– Component analysis
– Product of experts
54
Motivation – Sparse Coding of patches
• Goal: find a set $\{F_i\}$ s.t.
$$\text{patch} = \sum_{i=1}^{N} a_i F_i, \quad a_i \text{ are sparse}$$
• $a_i = \langle \text{patch}, F_i \rangle$
• Learn $F_i$ from database of natural patches
• Only few filters should fire on a given patch
[Figure: example filters 1 to 5]
55
Motivation – Component analysis
• Learn $F_i$ by component analysis:
– PCA
– ICA
• Results in “filter-like” components
– PCA - first components look like contrast filters
– ICA - components look like Gabor filters
56
PCA results
high
low
57
ICA results
• Independent filters
• Can derive model for patches:
$$P(x) = \prod_{i=1}^{n} p_i(F_i^T x)$$
[Figure: marginal $p_i$ of the filter response $F_i^T x$]
58
Motivation – Product of experts
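• More sophisticated model for natural patches:
$$p_{POE}(X; F_i, \alpha_i) = \frac{1}{Z(F_i, \alpha_i)} \prod_{i=1}^{K} \phi_{st}(F_i^T X; \alpha_i), \qquad \phi_{st}(z) = \Big(1 + \frac{z^2}{2}\Big)^{-\alpha_i}$$
• Training by MLE => “intuitive” filters:
[Figure: learned filters, labeled “texture” and “contrast”]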
59
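Field of experts (FOE)
• Extension of POE to FOE:
$$p_{FOE}(I; F_i, \alpha_i) = \frac{1}{Z(F_i, \alpha_i)} \exp\Big(-\sum_{c=1}^{C} \sum_{i=1}^{K} \psi\big(F_i^T I_{(c)}; \alpha_i\big)\Big), \qquad \psi = -\log(\phi_{st})$$
$$E_{FOE} = \sum_{c=1}^{C} \sum_{i=1}^{K} \psi\big(F_i^T I_{(c)}; \alpha_i\big)$$
Roth S., Black M. J., Fields of experts, IJCV, 2009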
60
The experts
• Student-t experts
$$\phi_{st}(z) = \Big(1 + \frac{z^2}{2}\Big)^{-\alpha_i}$$
[Figure: plot of $\phi_{st}(z)$]
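A minimal sketch of a Student-t expert, its energy (the negative logarithm) and the derivative of that energy, which is the quantity a gradient-based method needs. It assumes the form $\phi_{st}(z) = (1 + z^2/2)^{-\alpha}$; the function names are illustrative.

```python
import numpy as np

def student_t(z, alpha):
    """Student-t expert: (1 + z^2 / 2) ** (-alpha)."""
    return (1.0 + 0.5 * z ** 2) ** (-alpha)

def expert_energy(z, alpha):
    """Negative log of the expert: alpha * log(1 + z^2 / 2)."""
    return alpha * np.log1p(0.5 * z ** 2)

def expert_energy_grad(z, alpha):
    """Derivative of the energy with respect to the filter response z."""
    return alpha * z / (1.0 + 0.5 * z ** 2)

z = np.linspace(-3.0, 3.0, 7)
values = student_t(z, alpha=1.5)
```

The energy grows only logarithmically in |z|, so large filter responses such as edges are penalized far less than under a Gaussian potential.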
61
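Meaning of $\alpha_i$
• Higher $\alpha_i$ means:
– Punishes high responses more severely
– A filter with higher weight
[Figure: $\phi_{st}(z)$ plotted for different values of $\alpha_i$]
$$p_{FOE}(I; F_i, \alpha_i) = \frac{1}{Z(F_i, \alpha_i)} \exp\Big(-\sum_{i=1}^{K} \alpha_i g_i\Big), \qquad g_i = \sum_{c=1}^{C} \log\Big(1 + \frac{1}{2}\big(F_i^T I_{(c)}\big)^2\Big)$$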
Learning the model
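[Diagram: from a random init, MCMC samples of the model give $\langle \log p(I; \Lambda, S) \rangle_P$, which is compared with $\langle \log p(I; \Lambda, S) \rangle_X$ on the data; the gap $\widetilde{KL}$ drives the update of the model parameters]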
65
Results of learning FOE
Filters aren’t “intuitive”
[Figure: learned filters F]
67
So far…
filters | potentials
Chosen from bank | Learned non-parametrically
Small filters -> diffusive | Large filters -> reactive
non-intuitive?
68
So far…
filters potentials
Learned from database Learned parametrically
non-intuitive?
69
What now?
• Revisiting POE and FOE with Gaussian potentials
• Relation to non-Gaussian potentials
• Making sense of previous results
Weiss Y., Freeman W. T., What makes a good model of natural images?, CVPR, 2007
70
Gaussian POE
$$\phi(z) = e^{-z^2}$$
$$p_{GPOE}(x; F) = \frac{1}{Z(F)} \exp\Big(-\sum_{i=1}^{K} (F_i^T x)^2\Big)$$
$$-\ln p_{GPOE}(x; F) = \sum_{i=1}^{K} (F_i^T x)^2 + \ln Z(F)$$
$$F^*_{ML} = \arg\min_{F_i} \sum_{i=1}^{K} (F_i^T x)^2 + \ln Z(F)$$
71
Gaussian POE
• Claim: Z is constant for any set of K orthonormal vectors
• $$F^* = \arg\min_{F_i \text{ orthonormal}} \sum_{i=1}^{K} (F_i^T x)^2$$
• This has an analytic solution – the K minor components of the data
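A minimal sketch of the analytic solution, assuming the objective is summed over a dataset of vectorized patches: the orthonormal directions with the smallest total squared response are the eigenvectors of the second-moment matrix with the smallest eigenvalues. The random-walk "patches" are a synthetic stand-in for natural patches, and the function name is illustrative.

```python
import numpy as np

def minor_components(patches, k):
    """Return k orthonormal filters (as rows) with the smallest mean squared response."""
    second_moment = patches.T @ patches / len(patches)
    eigvals, eigvecs = np.linalg.eigh(second_moment)   # eigenvalues in ascending order
    return eigvecs[:, :k].T

rng = np.random.default_rng(0)
patches = np.cumsum(rng.standard_normal((2000, 8)), axis=1)   # smooth 1-D "patches"
F = minor_components(patches, k=3)
cost = np.sum((patches @ F.T) ** 2)
```

This is PCA read from the other end: instead of the directions of largest variance, the model keeps those of smallest variance.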
72
Results
Example of learned filters
• Non-intuitive high-frequency filters
• Reminder - PCA
high
low
73
Gaussian FOE
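$$p_{GFOE}(I; \{F_i\}) = \frac{1}{Z(\{F_i\})} \exp\Big(-\sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_{(c)}\big)^2\Big)$$
$$\sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_{(c)}\big)^2 = \sum_{i=1}^{K} \sum_{x,y} (F_i * I)^2(x, y)$$
$$= \sum_{i=1}^{K} \sum_{\omega} |\mathcal{F}\{F_i\}(\omega)|^2\, |\mathcal{F}\{I\}(\omega)|^2 = \sum_{\omega} G(\omega)\, |\mathcal{F}\{I\}(\omega)|^2, \qquad G(\omega) = \sum_{i=1}^{K} |\mathcal{F}\{F_i\}(\omega)|^2$$
$$\ln Z(\{F_i\}) = -\frac{1}{2} \sum_{\omega} \ln \sum_{i=1}^{K} |\mathcal{F}\{F_i\}(\omega)|^2 + \text{const} = -\frac{1}{2} \sum_{\omega} \ln G(\omega) + \text{const}$$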
74
Gaussian FOE
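$$F^*_{ML} = \arg\min_{F_i} \sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_{(c)}\big)^2 + \ln Z(\{F_i\})$$
$$G^*_{ML}(\omega) = \arg\min_{G} \sum_{\omega} G(\omega)\, |\mathcal{F}\{I\}(\omega)|^2 - \frac{1}{2} \ln G(\omega), \qquad G(\omega) = \sum_{i=1}^{K} |\mathcal{F}\{F_i\}(\omega)|^2$$
$$G^*_{ML}(\omega) \propto \frac{1}{|\mathcal{F}\{I\}(\omega)|^2}$$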
75
Gaussian FOE
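• $F_i^*$ satisfies:
$$\sum_{i=1}^{K} |\mathcal{F}\{F_i^*\}(\omega)|^2 \propto \frac{1}{|\mathcal{F}\{I\}(\omega)|^2}$$
=> Optimal filters have high frequencies
[Figure: $|\mathcal{F}\{I\}(\omega)|^2$ and $\sum_{i=1}^{K} |\mathcal{F}\{F_i^*\}(\omega)|^2$ plotted against frequency]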
76
Gaussian Scale Mixture (GSM)
• Non-Gaussian potentials -> modeled by GSM
• Properties of GFOE hold for GSM
77
Revisiting FOE
• Student t expert – fit GSM
• Filters have the property of high-frequency filters:
$$\sum_{i=1}^{K} |\mathcal{F}\{F_i\}(\omega)|^2 \propto \frac{1}{|\mathcal{F}\{I\}(\omega)|^2}$$
[Figure panels: Natural image | Roth and Black filters]
78
Learning FOE with fixed filters
Algorithm prefers high-frequency filters
79
Conclusion
• For Gaussian potentials and GSMs: learning => high-frequency filters
• Experimental evidence for this phenomenon
• Maybe there is a “logic” behind this non-intuitive result?
80
Making Sense of results
• Criterion for “good” filters for patches
– Rarely fire on natural images and fire frequently on all other images
Patches from Natural images
Histogram of filter responses
White noise
81
Making Sense of results
• An image is modeled by what you don’t expect to find in it
• This is satisfied by the classical prior of smooth gradients
• But why limit ourselves to intuitive filters?
• Maybe non-intuitive filters can do better…
82
Revisiting diffusive and reactive potentials
[Figure: filter-response histograms for patches from natural images and for white noise, shown once for a diffusive and once for a reactive potential]
83
Inference
• We learned a model
• We can use it for inference problems
– Corrupted information
– Missing information
• Exact inference (on tree MRFs) – belief propagation; Loopy BP on general graphs
• Approximate inference - gradient based optimization
84
Belief Propagation
• Observed data $y_i$ is incorporated into the model through its link to $x_i$
[Figure: graph with observed nodes $y_i$ attached to hidden nodes $x_i$]
85
Belief Propagation
Message passing Algorithm
• Exact only on tree MRFs
• Efficient only on pairwise MRFs
86
Alternative by Roth and Black
• Reminder:
$$I_{MAP} = \arg\min_I\; -\text{Log}(P(\tilde{I} \mid I)) - \text{Log}(P(I))$$
($\tilde{I}$ is the observed image; the first term is the uncertainty / noise model, the second is the learned model)
• Approximate inference by gradient-based optimization:
$$I^{(t+1)} = I^{(t)} - \eta \nabla_I E(I, \tilde{I})$$
• Advantage: Low computational cost
• Drawback: only local minimum if not convex
87
Partition function
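$$\arg\min_I\; -\text{Log}(P(\tilde{I} \mid I)) + \log(Z(F_i, \alpha_i)) + \sum_{(x,y)} \sum_{i=1}^{N} \psi\big(F_i * I(x, y); \alpha_i\big)$$
($\log Z$ doesn’t depend on $I$, so it is dropped)
• We get:
$$\arg\min_I\; -\text{Log}(P(\tilde{I} \mid I)) + \sum_{(x,y)} \sum_{i=1}^{N} \psi\big(F_i * I(x, y); \alpha_i\big)$$
=> No need to estimate partition function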
88
The gradient step
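• How to differentiate the second term?
• By a mathematical “trick” we get:
$$\nabla_I \sum_{(x,y)} \sum_{i=1}^{N} \psi\big(F^{(i)} * I(x, y); \alpha_i\big) = \sum_{i=1}^{N} F_-^{(i)} * \psi'\big(F^{(i)} * I; \alpha_i\big)$$
where $F_-^{(i)}$ is $F^{(i)}$ mirrored about its center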
• Assume Gaussian noise
• So the Gradient step is:
De-noising
2
2
( | ) ( )
1( ( | )) ( )
2
nP I I P I I
Log P I I I I
( ) ( )2
1
1( ) * '( * ; )
Ni i
ii
I I I F F I
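A minimal sketch of gradient-descent de-noising with Student-t potentials, under several assumptions that are not from the slides: two hand-picked derivative filters stand in for learned filters, boundaries are circular (so convolution with the mirrored filter is an exact adjoint), and the step size `eta`, the weights `alphas` and the toy image are illustrative. The energy is $\frac{1}{2\sigma^2}\|\tilde{I} - I\|^2 + \sum_i \alpha_i \sum \log(1 + \frac{1}{2}(F_i * I)^2)$.

```python
import numpy as np

def circ_conv(img, kernel):
    """Circular 2-D convolution of img with a small kernel, computed with the FFT."""
    k_hat = np.fft.fft2(kernel, s=img.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * k_hat))

def circ_conv_mirrored(img, kernel):
    """Circular convolution with the mirrored kernel (the adjoint of circ_conv)."""
    k_hat = np.fft.fft2(kernel, s=img.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(k_hat)))

def energy(I, noisy, filters, alphas, sigma):
    data = np.sum((noisy - I) ** 2) / (2 * sigma ** 2)
    prior = sum(a * np.sum(np.log1p(0.5 * circ_conv(I, F) ** 2))
                for F, a in zip(filters, alphas))
    return data + prior

def energy_grad(I, noisy, filters, alphas, sigma):
    grad = (I - noisy) / sigma ** 2                      # data term
    for F, a in zip(filters, alphas):
        r = circ_conv(I, F)                              # filter response
        grad += circ_conv_mirrored(a * r / (1 + 0.5 * r ** 2), F)
    return grad

def denoise(noisy, filters, alphas, sigma, eta=0.1, n_steps=50):
    I = noisy.copy()
    for _ in range(n_steps):
        I = I - eta * energy_grad(I, noisy, filters, alphas, sigma)
    return I

rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 10.0                                      # one vertical edge
noisy = clean + rng.standard_normal(clean.shape)
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]   # assumed, not learned
alphas = [1.0, 1.0]
denoised = denoise(noisy, filters, alphas, sigma=1.0)
```

No partition function appears anywhere in the loop, which is the point made on the previous slides; the cost per step is a handful of convolutions.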
90
Results
91
Results
92
Results
Original
Noisy (20.29dB)
FOE (28.72dB)
Portilla (Wavelets) (28.9dB)
Non-local means (28.21dB)
Standard non-linear diffusion (27.18dB)
Figure annotations: “State of the art”, “General prior”
93
Results on Berkeley database
[Figure: output PSNR against input PSNR, from high noise to low noise, for Wiener filter, non-linear diffusion, FOE, Portilla 1 and Portilla 2]
94
How many 3x3 filters to take?
[Figure: performance against number of filters]
Size of filter – 3x3
Performance starts saturating when we reach 8 filters
95
Dependence on size and shape of clique
What is the best filter?
97
Inpainting - Reminder
$$Y = AX + n$$
[Figure panels: Y | X]
Problem: pixels outside mask can change
Solution: constrain them
Inpainting
• Assume pixels outside mask M don’t change
• So the gradient step is:
$$M \cdot \nabla_I E = M \cdot \sum_{i=1}^{N} F_-^{(i)} * \psi'\big(F^{(i)} * I; \alpha_i\big)$$
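A minimal sketch of the masked update, assuming a 0-1 mask that is 1 on the pixels to fill, circular boundaries, and Student-t potentials on horizontal and vertical differences in place of learned filters. The step size, the number of steps and the toy image are illustrative assumptions.

```python
import numpy as np

def smoothness_energy(I, alpha=1.0):
    dx = np.roll(I, -1, axis=1) - I
    dy = np.roll(I, -1, axis=0) - I
    return alpha * (np.log1p(0.5 * dx ** 2).sum() + np.log1p(0.5 * dy ** 2).sum())

def smoothness_grad(I, alpha=1.0):
    """Gradient of smoothness_energy with respect to every pixel."""
    grad = np.zeros_like(I)
    for axis in (0, 1):
        r = np.roll(I, -1, axis=axis) - I          # forward difference
        g = alpha * r / (1 + 0.5 * r ** 2)         # derivative of the potential
        grad += np.roll(g, 1, axis=axis) - g       # adjoint of the difference
    return grad

def inpaint(image, mask, eta=0.1, n_steps=200):
    I = image.copy()
    for _ in range(n_steps):
        I = I - eta * mask * smoothness_grad(I)    # only masked pixels move
    return I

image = np.tile(np.linspace(0.0, 1.0, 12), (12, 1))
mask = np.zeros_like(image)
mask[4:8, 4:8] = 1.0                               # hole to fill
damaged = image.copy()
damaged[mask == 1] = 0.0
restored = inpaint(damaged, mask)
```

Multiplying the gradient by the mask is what keeps the known pixels fixed; there is no data term, so the hole is filled purely by the prior.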
Advanced Topics In Computer Vision Course, Spring 2010
[Figure panels: 0-1 mask | image we want to inpaint]
98
99
Results
100
Results
101
Results
[Figure panels: FOE | Bertalmio]
        FOE        Bertalmio
PSNR    29.06dB    27.56dB
SSIM    0.9371     0.9167
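For reference, PSNR figures like those above follow the standard definition $10 \log_{10}(\text{peak}^2 / \text{MSE})$. A minimal sketch, assuming 8-bit images with peak value 255; the function name is illustrative:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

reference = np.zeros((4, 4))
estimate = np.full((4, 4), 25.5)       # constant error of one tenth of the peak
value = psnr(reference, estimate)
```

Because the scale is logarithmic, the 1.5dB gap in the table corresponds to roughly a 30% lower mean squared error for FOE.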
102
Pros and Cons
• Performs well on narrow straws or small holes (even if they cover most of the image)
• Isn’t able to fill large holes
• Isn’t designed to handle textures
103
Thank you for Listening…