
Lecture 7: More Math + Image Filtering
Justin Johnson, EECS 442 WI 2020, January 30, 2020


Administrative


HW0 was due yesterday!

HW1 due a week from yesterday


Cool Talk Today:


https://cse.engin.umich.edu/event/numpy-a-look-at-the-past-present-and-future-of-array-computation

Last Time: Matrices, Vectorization, Linear Algebra

Eigensystems

• An eigenvector v_i and eigenvalue λ_i of a matrix A satisfy A v_i = λ_i v_i (A v_i is scaled by λ_i)

• Vectors and values are always paired, and typically you assume ‖v_i‖ = 1

• The biggest eigenvalue of A gives bounds on how much f(x) = Ax stretches a vector x

• Hints of what people really mean:
  • "Largest eigenvector" = the eigenvector with the largest eigenvalue
  • "Spectral" just means there are eigenvectors somewhere
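As a quick sanity check (my own snippet, not from the slides; the matrix A is an arbitrary example), NumPy's eigendecomposition illustrates both bullets:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # example symmetric matrix

# np.linalg.eig returns eigenvalues and unit-norm eigenvectors (as columns)
vals, vecs = np.linalg.eig(A)

for i in range(len(vals)):
    v = vecs[:, i]
    # A v_i equals lambda_i v_i for each eigenpair
    assert np.allclose(A @ v, vals[i] * v)
    # eigenvectors come back with unit norm
    assert np.isclose(np.linalg.norm(v), 1.0)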


Suppose I have points in a grid

Now I apply f(x) = Ax to these points. Pointy end: Ax. Non-pointy end: x.

Red box: unit square. Blue box: after f(x) = Ax. What are the yellow lines and why?

A = [1.1 0; 0 1.1]

A = [0.8 0; 0 1.25]

Now I apply f(x) = Ax to these points. Pointy end: Ax. Non-pointy end: x.

Red box: unit square. Blue box: after f(x) = Ax. What are the yellow lines and why?

A = [0.8 0; 0 1.25]

Red box: unit square. Blue box: after f(x) = Ax. Can we draw any yellow lines?

A = [cos(t) −sin(t); sin(t) cos(t)]

Eigenvectors of Symmetric Matrices

• A symmetric matrix always has n mutually orthogonal eigenvectors, with n (not necessarily distinct) eigenvalues

• For symmetric A, the eigenvector with the largest eigenvalue maximizes the ratio x^T A x / x^T x (the eigenvector with the smallest eigenvalue minimizes it)

• So for unit vectors (where x^T x = 1), that eigenvector maximizes x^T A x

• A surprisingly large number of optimization problems rely on (max/min)imizing this
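A minimal numerical check of this claim (my own example, not from the slides): for a symmetric A, no random unit vector beats the top eigenvector on x^T A x.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A + A.T                          # make it symmetric

vals, vecs = np.linalg.eigh(A)       # eigh: for symmetric matrices, eigenvalues sorted ascending
v_max = vecs[:, -1]                  # eigenvector with the largest eigenvalue

best = v_max @ A @ v_max             # equals vals[-1]
for _ in range(1000):
    x = rng.normal(size=4)
    x /= np.linalg.norm(x)           # restrict to unit vectors
    assert x @ A @ x <= best + 1e-9  # no unit vector exceeds the top eigenvalue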

Singular Value Decomposition

Can always write an m×n matrix A as: A = U Σ V^T

U: rotation — the eigenvectors of A A^T
Σ: scale — diagonal matrix of singular values σ1, σ2, σ3 (the square roots of the eigenvalues of A^T A)
V^T: rotation — the eigenvectors of A^T A

Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank / number of linearly independent vectors
• Zeroing out the smallest singular values (e.g., keeping σ1 and σ2 and setting σ3 = 0) gives Â, the "closest" matrix to A with a lower rank
• Secretly behind many of the things you do with matrices
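A hedged sketch of both facts with NumPy (the matrix sizes and values are my own example): the singular values of a low-rank matrix reveal its rank, and truncating them gives the best lower-rank approximation.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 6))   # a 5x6 matrix of rank 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10))          # number of non-zero singular values -> 3 (the rank)

# Rank-2 approximation: keep the two largest singular values, zero the rest
k = 2
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(A - A_hat))  # smallest possible error among all rank-2 matrices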

Solving Least-Squares

Start with two points (x1, y1), (x2, y2):

[y1]   [x1 1] [m]           [y1]   [m·x1 + b]
[y2] = [x2 1] [b]    i.e.   [y2] = [m·x2 + b]

y = Av

We know how to solve this: invert A and find v (i.e., the (m, b) that fits the points).

Solving Least-Squares

Start with two points (x1, y1), (x2, y2):

[y1]   [x1 1] [m]
[y2] = [x2 1] [b]        y = Av

‖y − Av‖² = (y1 − (m·x1 + b))² + (y2 − (m·x2 + b))²

This is the sum of squared differences between the actual value of y and what the model says y should be.

Solving Least-Squares

Suppose there are n > 2 points:

[y1]   [x1 1]
[ ⋮ ] = [ ⋮ ⋮ ] [m]
[yn]   [xn 1] [b]        y = Av

Compute ‖y − Av‖² again:

‖y − Av‖² = Σ from i = 1 to n of (yi − (m·xi + b))²

Solving Least-Squares

Given y, A, and v with y = Av overdetermined (A tall / more equations than unknowns), we want to minimize ‖y − Av‖², or find:

arg min_v ‖y − Av‖²

(The value of v that makes the expression smallest.)

The solution satisfies (A^T A) v* = A^T y, or

v* = (A^T A)^(−1) A^T y

(Don't actually compute the inverse!)
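A minimal sketch of the "don't compute the inverse" advice with NumPy (the points are made up): np.linalg.lstsq solves the least-squares problem directly and stably, without forming (A^T A)^(−1).

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])        # roughly y = 2x + 1 plus noise

A = np.stack([x, np.ones_like(x)], axis=1)     # columns: [x, 1]
v, residuals, rank, svals = np.linalg.lstsq(A, y, rcond=None)
m, b = v
print(m, b)                                    # slope and intercept minimizing ||y - Av||^2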

When is Least-Squares Possible?

Given y, A, and v. Want y = Av.

Square A: want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.

Tall A: rows (outputs) > columns (knobs). Thus you can't get the precise output you want (not enough knobs), so settle for the "closest" knob setting.

When is Least-Squares Possible?

Given y, A, and v. Want y = Av.

Wide A: columns (knobs) > rows (outputs). Thus any output can be expressed in infinitely many ways.

Homogeneous Least-Squares

Given a set of unit vectors (aka directions) x1, …, xn, I want the vector v that is as orthogonal to all the xi as possible (for some definition of orthogonal).

Stack the xi (as rows) into A and compute Av:

       [— x1^T —]        [x1^T v]
Av  =  [    ⋮    ] v  =  [   ⋮  ]
       [— xn^T —]        [xn^T v]

Compute ‖Av‖² = Σ over i of (xi^T v)²

Each term is 0 if v is orthogonal to that xi, so the sum measures how orthogonal v is to each x.

Homogeneous Least-Squares

• A lot of the time, given a matrix A, we want to find the v that minimizes ‖Av‖²
• I.e., we want v* = arg min_v ‖Av‖²
• What's a trivial solution?
  • Set v = 0 → Av = 0
  • Exclude this by forcing v to have unit norm

Homogeneous Least-Squares

Let's look at ‖Av‖²:

‖Av‖² = (Av)^T (Av)               (rewrite as a dot product)
      = v^T A^T A v = v^T (A^T A) v   (distribute the transpose)

We want the vector minimizing this quadratic form. Where have we seen this?

Homogeneous Least-Squares

Ubiquitous tool in vision:

arg min over v with ‖v‖ = 1 of ‖Av‖²

The solution is:
(1) the "smallest"* eigenvector of A^T A
(2) the "smallest" right singular vector of A

*Note: A^T A is positive semi-definite, so all its eigenvalues are non-negative.

For min → max, switch smallest → largest.
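A small sketch of solution (2) with NumPy (the stacked directions are a random example of mine): the last row of V^T from the SVD is the unit v minimizing ‖Av‖².

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))       # rows are the stacked x_i directions

U, s, Vt = np.linalg.svd(A)
v = Vt[-1]                         # right singular vector with the smallest singular value

print(np.linalg.norm(v))           # 1.0: unit norm by construction
print(np.linalg.norm(A @ v)**2)    # ||Av||^2, as small as any unit vector can make it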

Derivatives

Remember derivatives?

Derivative: the rate at which a function f(x) changes at a point, as well as the direction that increases the function.

Given a quadratic function f(x):

f(x) = (x − 2)² + 5

g(x) = f′(x), aka g(x) = (d/dx) f(x)

Given the quadratic function f(x) = (x − 2)² + 5:

What's special about x = 2?

f(x) is minimized at 2, and g(x) = 0 at 2.

a = minimum of f → g(a) = 0
The reverse is not true.

Rates of change, for f(x) = (x − 2)² + 5:

Suppose I want to increase f(x) by changing x.
Blue area: move left. Red area: move right.

The derivative tells you the direction of ascent and the rate.

Calculus to Know

• Really need intuition
• Need the chain rule
• The rest you should look up / use a computer algebra system / use a cookbook
• Partial derivatives (and that's it from multivariable calculus)

Partial Derivatives

• Pretend the other variables are constant and take a derivative. That's it.

f(x) = (x − 2)² + 5
∂/∂x f(x) = 2(x − 2) · 1 = 2(x − 2)

• Make our function a function of two variables:

f2(x, y) = (x − 2)² + 5 + (y + 1)²
∂/∂x f2(x, y) = 2(x − 2)      (pretend (y + 1)² is constant → its derivative = 0)
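A quick numerical sanity check (my own snippet, not from the slides) that this partial derivative matches a finite-difference estimate:

def f2(x, y):
    return (x - 2)**2 + 5 + (y + 1)**2

def df2_dx(x, y):
    return 2 * (x - 2)          # the y term is treated as a constant, so it contributes 0

x, y, h = 0.5, 3.0, 1e-6
numeric = (f2(x + h, y) - f2(x - h, y)) / (2 * h)   # central finite difference in x
print(numeric, df2_dx(x, y))                        # both are approximately -3.0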

Zooming Out: f2(x, y) = (x − 2)² + 5 + (y + 1)²

Dark = f(x, y) low. Bright = f(x, y) high.

Taking a slice of f2(x, y) = (x − 2)² + 5 + (y + 1)²

The slice at y = 0 is the function from before:
f(x) = (x − 2)² + 5,  f′(x) = 2(x − 2)

Taking a slice of f2(x, y) = (x − 2)² + 5 + (y + 1)²

∂/∂x f2(x, y) is the rate of change & direction in the x dimension.

Zooming Out: f2(x, y) = (x − 2)² + 5 + (y + 1)²

Gradient/Jacobian: making a vector of the partials,

∇f = [∂f/∂x, ∂f/∂y],

gives the rate and direction of change. Arrows point OUT of the minimum / basin.

What Should I Know?

• Gradients are simply partial derivatives per dimension: if x in f(x) has n dimensions, ∇f(x) has n dimensions
• Gradients point in the direction of ascent and tell the rate of ascent
• If a is a minimum of f(x) → ∇f(a) = 0
• The reverse is not true, especially in high-dimensional spaces
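A minimal sketch tying these bullets together (again my own example, using the f2 from earlier): the gradient has one entry per input dimension and vanishes at the minimum (2, −1).

import numpy as np

def f2(p):
    x, y = p
    return (x - 2)**2 + 5 + (y + 1)**2

def grad_f2(p):
    x, y = p
    return np.array([2 * (x - 2), 2 * (y + 1)])   # one partial derivative per dimension

print(grad_f2(np.array([0.0, 0.0])))   # [-4, 2]: points away from the minimum (direction of ascent)
print(grad_f2(np.array([2.0, -1.0])))  # [0, 0]: zero at the minimum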

Image Filtering

A Noisy Image

Cleaning it up

• We have noise in our image
• Let's replace each pixel with a weighted average of its neighborhood
• The weights are the filter kernel:

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Slide Credit: D. Lowe

1D Case

Filter: 1/3 1/3 1/3

Signal: 10    12    9     11    10    11    12
Output: 10.33 10.66 10    10.66 11

(The filter slides along the signal; each output value is the average of three neighboring samples.)
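A small check of this example with NumPy (signal values taken from the slide): a length-3 box filter in 'valid' mode reproduces the output row.

import numpy as np

signal = np.array([10, 12, 9, 11, 10, 11, 12], dtype=float)
box = np.ones(3) / 3.0

out = np.convolve(signal, box, mode='valid')   # only positions where the filter fully overlaps
print(np.round(out, 2))                        # [10.33 10.67 10. 10.67 11.] (the slide truncates 10.67 to 10.66)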

Applying a 2D Filter

Input (5×6):
I11 I12 I13 I14 I15 I16
I21 I22 I23 I24 I25 I26
I31 I32 I33 I34 I35 I36
I41 I42 I43 I44 I45 I46
I51 I52 I53 I54 I55 I56

Filter (3×3):
F11 F12 F13
F21 F22 F23
F31 F32 F33

Output (3×4):
O11 O12 O13 O14
O21 O22 O23 O24
O31 O32 O33 O34

Applying a 2D Filter

Slide the filter over the input; each placement produces one output value:

O11 = I11*F11 + I12*F12 + … + I33*F33
O12 = I12*F11 + I13*F12 + … + I34*F33

How many times can we apply a 3x3 filter to a 5x6 image?

In general:

Oij = Iij*F11 + Ii(j+1)*F12 + … + I(i+2)(j+2)*F33
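A sketch of this operation as explicit loops (filter2d is a hypothetical helper of mine, producing the 'valid'-style output with no padding):

import numpy as np

def filter2d(image, kernel):
    """Cross-correlation: slide the kernel over the image, no padding ('valid')."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # O[i, j] = sum of the elementwise product of the kernel and the patch under it
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(30, dtype=float).reshape(5, 6)   # a 5x6 example "image"
box = np.ones((3, 3)) / 9.0
print(filter2d(image, box).shape)                  # (3, 4): a 3x3 filter fits 3*4 = 12 times in a 5x6 image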

Edge Cases

Convolution doesn't keep the whole image. Suppose f is the image and g the filter.

Full: any part of g touches f.
Same: the output is the same size as f.
Valid: the filter doesn't fall off the edge.

f/g Diagram Credit: D. Lowe
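A small sketch of the three modes using SciPy (assuming scipy is available; the array sizes are my own example):

import numpy as np
from scipy.signal import convolve2d

f = np.ones((5, 6))            # "image"
g = np.ones((3, 3)) / 9.0      # "filter"

print(convolve2d(f, g, mode='full').shape)    # (7, 8): any overlap between g and f counts
print(convolve2d(f, g, mode='same').shape)    # (5, 6): output matches f
print(convolve2d(f, g, mode='valid').shape)   # (3, 4): the filter never falls off the edge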

Edge Cases

What to do about the "?" region outside the image?

Symm: fold the sides over
Pad/fill: add a value, often 0
Circular/wrap: wrap around

f/g Diagram Credit: D. Lowe
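These padding choices map onto np.pad modes; a quick 1D illustration (the values are my own example):

import numpy as np

x = np.array([1, 2, 3, 4])

print(np.pad(x, 2, mode='constant', constant_values=0))  # pad/fill:      [0 0 1 2 3 4 0 0]
print(np.pad(x, 2, mode='symmetric'))                    # symm (fold):   [2 1 1 2 3 4 4 3]
print(np.pad(x, 2, mode='wrap'))                         # circular/wrap: [3 4 1 2 3 4 1 2]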

Edge Cases: Does It Matter?

[Figure: input image alongside two box-filtered versions, padding method hidden.]

(I've applied the filter per color channel.) Which padding did I use and why?
Note: this is a zoom of the filtered image, not a filter of the zoomed image.

Edge Cases: Does It Matter?

[Figure: input image | box filtered, symm pad | box filtered, zero pad.]

(I've applied the filter per color channel.)
Note: this is a zoom of the filtered image, not a filter of the zoomed image.

Practice with Linear Filters

Filter:
0 0 0
0 1 0
0 0 0

Result: the same image!

Slide Credit: D. Lowe

Practice with Linear Filters

Filter:
0 0 0
0 0 1
0 0 0

Result: shifted LEFT by 1 pixel.

Slide Credit: D. Lowe

Practice with Linear Filters

Filter:
0 1 0
0 0 0
0 0 0

Result: shifted DOWN by 1 pixel.

Slide Credit: D. Lowe

Practice with Linear Filters

63

?

Slide Credit: D. Lowe

Original

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

Justin Johnson January 30, 2020EECS 442 WI 2020: Lecture 7 -

Practice with Linear Filters

64

Slide Credit: D. Lowe

Original

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

Blur(Box Filter)

Justin Johnson January 30, 2020EECS 442 WI 2020: Lecture 7 -

Practice with Linear Filters

65

?

Slide Credit: D. Lowe

Original 1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

0 0 0

0 2 0

0 0 0

-

Justin Johnson January 30, 2020EECS 442 WI 2020: Lecture 7 -

Practice with Linear Filters

66

Slide Credit: D. Lowe

Original 1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

0 0 0

0 2 0

0 0 0

-

Sharpened(Acccentuates

difference from local average)
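A sketch of this sharpening filter in NumPy + SciPy (grayscale, on a tiny random example of mine):

import numpy as np
from scipy.signal import convolve2d

impulse = np.zeros((3, 3)); impulse[1, 1] = 2.0    # the "0 0 0 / 0 2 0 / 0 0 0" filter
box = np.ones((3, 3)) / 9.0
sharpen = impulse - box                            # 2*original minus the local average

image = np.random.default_rng(0).random((8, 8))
sharpened = convolve2d(image, sharpen, mode='same', boundary='symm')
# Each output pixel is 2*pixel minus its neighborhood mean, accentuating local differences.
print(sharpened.shape)                             # (8, 8)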

Sharpening

Slide Credit: D. Lowe

Properties: Linear

Assume: I is an image; f1, f2 are filters.
Linear: apply(I, f1 + f2) = apply(I, f1) + apply(I, f2)

(In the demo, I is a white box on black, and f1, f2 are rectangles.)
Note: I am showing the filters un-normalized and blown up; they're smaller box filters (i.e., each entry is 1/(size²)).

Properties: Shift-Invariant

Assume: I is an image, f is a filter.
Shift-invariant: shift(apply(I, f)) = apply(shift(I), f)
Intuitively: the output only depends on the filter neighborhood.

Note: "shift-invariant" is the standard terminology, but I think "shift-equivariant" is more correct.

Annoying Terminology

Often called "convolution." Actually cross-correlation.

Cross-correlation: slide the filter over the image.
Convolution: flip the filter, then slide it.
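A minimal NumPy/SciPy illustration of the difference (my own small example): convolution equals cross-correlation with a flipped filter.

import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.default_rng(0).random((5, 5))
kernel = np.array([[0., 1., 0.],
                   [0., 0., 0.],
                   [2., 0., 0.]])        # deliberately asymmetric, so flipping matters

conv = convolve2d(image, kernel, mode='same')
corr = correlate2d(image, kernel[::-1, ::-1], mode='same')   # flip both axes, then correlate

print(np.allclose(conv, corr))           # True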

Properties of Convolution

• Any shift-invariant, linear operation is a convolution (⁎)
• Commutative: f ⁎ g = g ⁎ f
• Associative: (f ⁎ g) ⁎ h = f ⁎ (g ⁎ h)
• Distributes over +: f ⁎ (g + h) = f ⁎ g + f ⁎ h
• Scalars factor out: kf ⁎ g = f ⁎ kg = k (f ⁎ g)
• Identity: convolving with a filter that is a single one surrounded by zeros returns the image unchanged

Property List: K. Grauman
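A quick numerical check of a few of these properties with 1D np.convolve (the arrays are arbitrary examples):

import numpy as np

f = np.array([1., 2., 3., 4.])
g = np.array([0.5, 0.5])
h = np.array([1., -1., 2.])

# Commutative: f * g == g * f
print(np.allclose(np.convolve(f, g), np.convolve(g, f)))                                   # True
# Associative: (f * g) * h == f * (g * h)
print(np.allclose(np.convolve(np.convolve(f, g), h), np.convolve(f, np.convolve(g, h))))   # True
# Identity: a single 1 leaves the signal unchanged
print(np.allclose(np.convolve(f, np.array([1.]), mode='same'), f))                         # True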

Next Time: More Image Filtering, Edge Detection
