
C18 Computer Vision

David Murray

[email protected]
www.robots.ox.ac.uk/~dwm/Courses/4CV

Michaelmas 2013


Computer Vision: This time ...

1. Introduction; imaging geometry; camera calibration
2. Salient feature detection – edges, lines and corners
3. Recovering 3D from two images I: epipolar geometry
4. Recovering 3D from two images II: stereo correspondence algorithms; triangulation


Lecture 2

2.1 Cameras as photometric devices (just a note)

2.2 Image convolution

2.3 Edge detection

2.4 Edges to strings, strings to lines

2.5 Corner detection


2.1 Cameras as photometric devices

In Lecture 1, we considered the camera as a geometric abstraction grounded on the rectilinear propagation of light.

But they are also photometric devices.

It is important to consider the way image formation depends on:
• the nature of the scene surface (reflecting, absorbing ...)
• the relative orientations of surface, light source and camera
• the power and spectral properties of the source
• the spectral properties of the imaging system.

The important overall outcome (eg, Forsyth & Ponce, p62) is that

image irradiance is proportional to the scene radiance

A relief! This means the image really can tell us about the scene.


Cameras as photometric devices /ctd

But the study of photometry (often called physics-based vision) requires detailed models of the reflectance properties of the scene and of the imaging process itself — the sort of models that underpin photo-realistic graphics too.

Eg, understanding how light scatters on water droplets allowed this image to be de-fogged.

[Figure: original image (left) and de-fogged result (right)]

Can we avoid going into such detail? ... Yes — by considering aspects of scene geometry that are revealed in step changes in image irradiance.


Step irradiance changes are due to ...

1. Changes in scene radiance. Natural (eg shadows) or deliberately introduced via artificial illumination: light stripes are generated either by a mask or from a laser.

2. Changes in the scene reflectance at sudden changes in surface orientation. These arise at the intersection of two surfaces, and thus represent geometrical entities fixed on the object in the scene.

3. Changes in reflectance properties due to changes in surface albedo. The reflectance properties are scaled by a changing albedo arising from surface markings. Again these are fixed to the object.


Feature detection

We are after step spatial changes in image irradiance because (i) they are likely to be tied to scene geometry; and (ii) they are likely to be salient (have high info content).

A simple classification of changes in image irradiance I(x, y) is into areas that, locally, have

1D structure ⇒ Edge Detectors
2D structure ⇒ Corner Detectors


2.2 Image operations for Feature detection

Feature detection should be a local operation, working without knowledge of higher geometrical entities or objects ... We should use pixel values I(x, y) and derivatives ∂I/∂x and ∂I/∂y, and so on.

It would be useful to have a non-directional combination of these, so that the feature map of a rotated image is identical to the rotated feature map of the original image: F(RI) = RF(I).

Considering edge detection, two possibilities are:

• Search for maxima in the gradient magnitude √((∂I/∂x)² + (∂I/∂y)²) — 1st order, but non-linear

• Search for zeroes in the Laplacian ∇²I = ∂²I/∂x² + ∂²I/∂y² — linear, but 2nd order


Which to choose?

The gradient magnitude is attractive because it is first order in the derivatives. Differentiation enhances noise, and the 2nd derivatives in the Laplacian operator introduce even more.

The Laplacian is attractive because it is linear, which means it can be implemented by a succession of fast linear operations — effectively matrix operations, as we are dealing with a pixelated image.

Both approaches have been used.

For both methods we need to consider
* how to compute gradients, and
* how to suppress noise
so that insignificant variations in pixel intensity are not flagged up as edges.

This will involve spatial convolution of the image with an impulse response function that smooths the image and computes the gradients.


Preamble: spatial convolution

You are familiar with the 1D convolution integral in the time domain between an input signal i(t) and impulse response function h(t):

o(t) = i(t) ∗ h(t) = ∫_{−∞}^{+∞} i(t − τ) h(τ) dτ = ∫_{−∞}^{+∞} i(τ) h(t − τ) dτ.

The second equality reminds us that convolution commutes: i(t) ∗ h(t) = h(t) ∗ i(t). It also associates.

In the frequency domain we would write O(s) = H(s)I(s).

Now in a continuous 2D domain, the spatial convolution integral is

o(x, y) = i(x, y) ∗ h(x, y) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} i(x − a, y − b) h(a, b) da db

In the spatial domain you’ll often see h(x, y) referred to as the point spread function or the convolution mask.


Spatial convolution /ctd

For pixelated images I(x, y) we need a discrete convolution

O(x, y) = I(x, y) ∗ h(x, y) = Σ_i Σ_j I(x − i, y − j) h(i, j)

for x, y ranging over the image width and height respectively, and i, j ensuring that access is made to any and all non-zero entries in h.

Many authors rewrite the convolution by replacing h(i, j) with h(−i, −j):

O(x, y) = Σ_i Σ_j I(x − i, y − j) h(i, j) = Σ_i Σ_j I(x + i, y + j) h(−i, −j)
        = Σ_i Σ_j I(x + i, y + j) h(i, j)

This looks more like the expression for a cross-correlation but, confusingly, it is still called a convolution.
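To make the discrete formula concrete, here is a minimal direct implementation in Python/NumPy (a sketch of my own, not from the lecture); it flips the mask once so the correlation-like form can be applied directly:

```python
import numpy as np

def convolve2d(I, h):
    """Direct discrete convolution O(x,y) = sum_ij I(x-i, y-j) h(i,j).
    I is a 2D float image, h a small mask with odd width and height.
    Border pixels where the mask overhangs the image are left at zero."""
    ch, cw = h.shape[0] // 2, h.shape[1] // 2
    O = np.zeros_like(I, dtype=float)
    hf = h[::-1, ::-1]  # flipped mask: turns the sum into a correlation
    for y in range(ch, I.shape[0] - ch):
        for x in range(cw, I.shape[1] - cw):
            O[y, x] = np.sum(I[y - ch:y + ch + 1, x - cw:x + cw + 1] * hf)
    return O
```

In practice one would call an optimized routine such as scipy.signal.convolve2d; the explicit loop above is only to expose the indexing.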


Computing partial derivatives using convolution

We can approximate ∂I/∂x at image pixel (x, y) using a central difference:

∂I/∂x ≈ (1/2){[I(x + 1, y) − I(x, y)] + [I(x, y) − I(x − 1, y)]} = (1/2)[I(x + 1, y) − I(x − 1, y)]

Writing this as a “proper” convolution we would set

h(−1) = +1/2, h(0) = 0, h(1) = −1/2

D(x, y) = I(x, y) ∗ h(x) = Σ_{i=−1}^{1} I(x − i, y) h(i)

Notice how the “proper” mask is reversed from what we might naïvely expect from the expression.


Computing partial derivatives using convolution /ctd

Now, as ever,

∂I/∂x ≈ (1/2)[I(x + 1, y) − I(x − 1, y)]

Writing this as a “sort of correlation”

h(−1) = −1/2, h(0) = 0, h(1) = +1/2

D(x, y) = I(x, y) ∗ h(x) = Σ_{i=−1}^{1} I(x + i, y) h(i)

Note how we can just lay this mask directly on the pixels to be multiplied and summed ...
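As a concrete sketch (mine, not from the notes), the central-difference x-gradient can be written either with the convolution routine above and the “proper” mask, or directly with array slicing:

```python
import numpy as np

def x_gradient(I):
    """dI/dx ~ (1/2)[I(x+1, y) - I(x-1, y)] on interior pixels;
    the one-pixel border is left at zero."""
    D = np.zeros_like(I, dtype=float)
    D[:, 1:-1] = 0.5 * (I[:, 2:] - I[:, :-2])
    return D

# Equivalently: convolve2d(I, np.array([[+0.5, 0.0, -0.5]])) with the
# "proper" reversed mask, or lay [-0.5, 0, +0.5] down correlation-style.
```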


Example Results

[Figure: the actual image used is grey-level, not colour; x-gradient “image” (left) and y-gradient “image” (right)]


In 2 dimensions ...

As before, one imagines the flipped “correlation-like” mask centred on a pixel, and the sum of products computed.

Often a 2D mask is “separable” in that it can be broken up into two separate 1D convolutions in x and y:

O = h_2d ∗ I = f_y ∗ g_x ∗ I

The computational complexity is lower — but intermediate storage is required, and so for a small mask it may be cheaper to use it directly.
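For example (an illustrative sketch assuming scipy.ndimage), a k×k separable mask applied as two 1D passes costs O(2k) multiplies per pixel instead of O(k²):

```python
import numpy as np
from scipy.ndimage import convolve1d

def separable_convolve(I, fy, gx):
    """Apply h2d = outer(fy, gx) as two 1D convolutions: O = fy * (gx * I)."""
    tmp = convolve1d(I.astype(float), gx, axis=1)  # 1D convolution along x
    return convolve1d(tmp, fy, axis=0)             # then along y

# e.g. a 5-tap binomial mask, a crude Gaussian approximation:
g5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
# smoothed = separable_convolve(I, g5, g5)
```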


Example result — Laplacian (non-directional)

[Figure: the actual image used is grey-level, not colour; its Laplacian is shown alongside]


Noise and smoothing

Differentiation enhances noise — the edge appears clear enough in the image, but less so in the gradient map.

If we knew the noise spectrum, we might find an optimal brickwall filter g(x, y) ⇔ G(s) to suppress noise edges outside the signal edge band. We would apply this g to the image (g ∗ I) before finding derivatives.

But the sharper the cut-off in spatial-frequency, the wider the spatial support of g(x, y) has to be (it tends to an Infinite Impulse Response (IIR) filter). This is expensive to compute.

Conversely, if g(x, y) has finite size (an FIR filter) it will not be band-limited, and will spatially blur the “signal” edges.

Can we compromise spread in space and spatial-frequency in some optimal way?


Compromise in space and spatial-frequency

Suppose our IR function is h(x), and h ⇔ H is a Fourier transform pair.

Define the spreads in space and spatial-frequency as X and Ω where

X² = ∫(x − x_m)² h²(x) dx / ∫ h²(x) dx, with x_m = ∫ x h²(x) dx / ∫ h²(x) dx

Ω² = ∫(ω − ω_m)² H²(ω) dω / ∫ H²(ω) dω, with ω_m = ∫ ω H²(ω) dω / ∫ H²(ω) dω

Now vary h to minimize the product of the spreads U = XΩ. An uncertainty principle indicates that U_min = 1/2 when

h(x) = a Gaussian function = (1/(√(2π) σ)) exp(−x²/(2σ²))


2.3 The Canny Edge Detector (JF Canny 1986)

2D Gaussian smoothing is applied to the image:

S(x, y) = G(x, y) ∗ I = (1/(2πσ²)) exp[−(x² + y²)/(2σ²)] ∗ I(x, y)

But this is separated into two separable masks:

S(x, y) = (1/(√(2π) σ)) exp[−x²/(2σ²)] ∗ (1/(√(2π) σ)) exp[−y²/(2σ²)] ∗ I(x, y)

Then 1st derivatives in x and y are found by two further convolutions:

D_x(x, y) = ∂S/∂x = h_x ∗ S
D_y(x, y) = ∂S/∂y = h_y ∗ S.

Interestingly, Canny’s decision to use a Gaussian is made from a separate study of how to maximize robustness, localization, and uniqueness in edge detection.


Canny/ Implementation

Step 1: Gaussian smooth the image, S = G ∗ I. This is separable: S = G_x ∗ G_y ∗ I. Often 5×1 or 7×1 masks.

Step 2: Compute gradients ∂S/∂x, ∂S/∂y using derivative masks.
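A sketch of Steps 1 and 2 (my own illustration, leaning on scipy.ndimage rather than hand-rolled masks; the derivative weights are the “proper” convolution mask from Section 2.2):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, convolve1d

def canny_smooth_and_gradient(I, sigma=1.0):
    """Step 1: separable Gaussian smoothing, S = Gx * Gy * I.
    Step 2: central-difference gradients of the smoothed image."""
    S = gaussian_filter1d(I.astype(float), sigma, axis=0)
    S = gaussian_filter1d(S, sigma, axis=1)
    h = np.array([+0.5, 0.0, -0.5])        # h(-1)=+1/2, h(0)=0, h(+1)=-1/2
    Dx = convolve1d(S, h, axis=1)          # dS/dx
    Dy = convolve1d(S, h, axis=0)          # dS/dy
    return S, Dx, Dy
```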


Canny/ Implementation

Step 3: Non-maximal suppression
a. Use ∂S/∂x and ∂S/∂y to find the gradient magnitude and direction at a pixel.
b. Track along the direction to marked positions on the neighbourhood perimeter.
c. Linearly interpolate the gradient magnitude at those positions using magnitudes at the arrow pixels.
d. If the gradient magnitude at the central pixel is greater than both interpolated values, declare an edgel, and fit a parabola to find (x, y) to sub-pixel acuity along the edge direction.

Step 4: Store the edgel’s sub-pixel position (x, y); gradient strength; and orientation (full 360° as light/dark are distinguished).
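A simplified sketch of Step 3 (illustrative only: the gradient direction is quantized to the four neighbour axes instead of interpolating on the neighbourhood perimeter, and the sub-pixel parabola fit is omitted):

```python
import numpy as np

def non_max_suppress(Dx, Dy):
    """Mark pixels whose gradient magnitude is >= both neighbours along
    the (quantized) gradient direction."""
    mag = np.hypot(Dx, Dy)
    ang = np.rad2deg(np.arctan2(Dy, Dx)) % 180.0          # direction mod 180
    q = (np.round(ang / 45.0).astype(int) % 4) * 45       # 0, 45, 90 or 135
    edgels = np.zeros(mag.shape, dtype=bool)
    for d, (dy, dx) in {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}.items():
        ahead  = np.roll(mag, (-dy, -dx), axis=(0, 1))
        behind = np.roll(mag, (dy, dx), axis=(0, 1))
        edgels |= (q == d) & (mag >= ahead) & (mag >= behind) & (mag > 0)
    return edgels, mag, ang
```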


Canny/ Implementation: Hysterical Thresholding

Step 5: Link edges together into strings, using orientation to aid the search for neighbouring edgels.

Step 6: Perform thresholding with hysteresis, by running along the linked strings.
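All six steps are packaged in standard libraries; for example, OpenCV’s implementation, where the two thresholds are the hysteresis low and high limits (the filename here is just a placeholder):

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
edges = cv2.Canny(img, 50, 150)  # low/high hysteresis thresholds on gradient strength
```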


Example of Edgel and String Outputs


2.4 From Strings to Straight Lines

Split algorithm for lines:
1. Fit a straight line to all edgels in the string.
2. If the RMS error is less than a threshold, accept and stop.
3. Otherwise find the point of highest curvature on the edge string and split into two. Repeat from 1 for each substring.

Merge algorithm for lines:
1. Fit straight lines to each pair of consecutive edgels in a string.
2. Compute the RMS error for each potential merger of an adjacent pair of lines into a single line, and find the pair for which the RMS error is minimum.
3. If the RMS error is less than a threshold, merge and repeat from 2.
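A sketch of the split algorithm (my own transcription; it reuses the orthogonal-regression fit_line developed two slides below, and splits at the edgel furthest from the fitted line as a practical stand-in for the point of highest curvature):

```python
import numpy as np

def split_into_lines(edgels, rms_thresh=1.0):
    """Recursively fit-and-split one edge string.
    edgels: (n, 2) array of (x, y) positions in string order.
    Returns a list of accepted line parameters (a, b, c)."""
    (a, b, c), rms = fit_line(edgels)             # see fit_line below
    if rms < rms_thresh or len(edgels) < 4:
        return [(a, b, c)]
    dist = np.abs(edgels @ np.array([a, b]) + c)  # orthogonal distances
    k = int(np.clip(np.argmax(dist), 1, len(edgels) - 2))
    return (split_into_lines(edgels[:k + 1], rms_thresh) +
            split_into_lines(edgels[k:], rms_thresh))
```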


Actually fitting lines and finding that RMS error

Consider n “edgels” that have been linked into an edge (sub-)string. Fitting a line requires orthogonal regression. See B14, or the variant here ...

Find

min E(a, b, c) = Σ_{i=1}^{n} (a x_i + b y_i + c)²  subject to a² + b² = 1

Hence a 2-dof problem.


From edge elements to lines /ctd

Now min E(a, b, c) = min Σ_{i=1}^{n} (a x_i + b y_i + c)² requires ∂E/∂c = 0
⇒ 2 Σ_{i=1}^{n} (a x_i + b y_i + c) = 0 ⇒ c = −(a x̄ + b ȳ).

Therefore E becomes

E = Σ_{i=1}^{n} [a(x_i − x̄) + b(y_i − ȳ)]² = [a b] UᵀU [a; b]

where the 2nd-moment matrix is

UᵀU = [ Σ x_i² − n x̄²      Σ x_i y_i − n x̄ ȳ ]
      [ Σ x_i y_i − n x̄ ȳ   Σ y_i² − n ȳ²     ]

The solution (a, b) is given by the unit eigenvector of UᵀU corresponding to the smaller eigenvalue (via eigen-decomposition or SVD). The minimum of E is that eigenvalue itself, so the RMS error is √(λ_min/n).
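A direct NumPy transcription of this fit (a sketch; np.linalg.eigh returns eigenvalues in ascending order, so column 0 is the eigenvector we want):

```python
import numpy as np

def fit_line(pts):
    """Orthogonal regression of ax + by + c = 0 with a^2 + b^2 = 1.
    pts: (n, 2) array of edgel positions (x_i, y_i)."""
    n = len(pts)
    mean = pts.mean(axis=0)
    U = pts - mean                       # centred coordinates
    lam, V = np.linalg.eigh(U.T @ U)     # eigen-decompose the 2x2 moment matrix
    a, b = V[:, 0]                       # unit eigenvector, smaller eigenvalue
    c = -(a * mean[0] + b * mean[1])     # c = -(a*xbar + b*ybar)
    rms = np.sqrt(max(lam[0], 0.0) / n)  # min E = lambda_min => RMS = sqrt(lam/n)
    return (a, b, c), rms
```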


Example of String and Line Outputs


Problems using 1D image structure for Geometry

Computing edges makes the feature map sparse but interpretable. Much of the salient information is retained.

If the camera motion is known, feature matching is a 1D problem, for which edges are very well suited (see Lecture 4).

However, matching is much harder when the camera motion is unknown: this is known as the aperture problem.

End points are unstable, hence line matching is largely uncertain. (Indeed only line orientation is useful for detailed geometrical work.)

In general, matching requires the unambiguity of 2D image features or “corners”.

Corners are defined as sharp peaks in the 2D autocorrelation of local patches in the image.


2.5 Corner detection: preamble on auto-correlation

Suppose that we are interested in correlating a (2n + 1)² pixel patch at (x, y) in image I with a similar patch displaced from it by (u, v).

We would write the correlation between the patches as

C_uv(x, y) = Σ_{i=−n}^{n} Σ_{j=−n}^{n} I(x + i, y + j) I(x + u + i, y + v + j)

As we keep (x, y) fixed, but change (u, v), we build up the auto-correlation surface around (x, y).
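A brute-force sketch of building the surface (mine; it assumes the patch and all its displaced copies lie inside the image):

```python
import numpy as np

def autocorr_surface(I, x, y, n=3, m=5):
    """C_uv around (x, y) for displacements -m <= u, v <= m,
    using a (2n+1)^2 patch. I is indexed as I[row, col] = I[y, x]."""
    I = I.astype(float)
    patch = I[y - n:y + n + 1, x - n:x + n + 1]
    C = np.empty((2 * m + 1, 2 * m + 1))
    for v in range(-m, m + 1):
        for u in range(-m, m + 1):
            shifted = I[y + v - n:y + v + n + 1, x + u - n:x + u + n + 1]
            C[v + m, u + m] = np.sum(patch * shifted)
    return C
```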


Preamble: Auto-correlation

Reminder of the correlation:

C_uv(x, y) = Σ_{i=−n}^{n} Σ_{j=−n}^{n} I(x + i, y + j) I(x + u + i, y + v + j)

Here are the surfaces around 3 different patches:

[Figure: auto-correlation surfaces for a plain corner, a straight edge, and a 3-way corner]

A pixel in a uniform region will have a flat autocorrelation

A pixel on an edge will have a ridge-like autocorrelation, but

A pixel at a corner has a peak.


Results ...

So a simple corner detector might have the following steps at each pixel:
a. Determine the auto-correlation C_uv(x, y) around the pixel position (x, y).
b. Find positions (x, y) where C_uv(x, y) is maximum in two directions.
c. If C_uv(x, y) > threshold, mark as a corner.

There is an expression which can be computed more cheaply than C_uv and which gives comparable qualitative results. This is the sum of squared differences

E_uv(x, y) = Σ_{i=−n}^{+n} Σ_{j=−n}^{+n} [I(x + u + i, y + v + j) − I(x + i, y + j)]²
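The corresponding SSD surface needs only the inner expression changed relative to the auto-correlation sketch above (a corner now shows up as E rising steeply in every displacement direction):

```python
import numpy as np

def ssd_surface(I, x, y, n=3, m=5):
    """E_uv around (x, y): sum of squared differences between the patch
    at (x, y) and its copy displaced by (u, v)."""
    I = I.astype(float)
    patch = I[y - n:y + n + 1, x - n:x + n + 1]
    E = np.empty((2 * m + 1, 2 * m + 1))
    for v in range(-m, m + 1):
        for u in range(-m, m + 1):
            shifted = I[y + v - n:y + v + n + 1, x + u - n:x + u + n + 1]
            E[v + m, u + m] = np.sum((shifted - patch) ** 2)
    return E
```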


Harris Corner Detector (Chris Harris, 1987)

Earlier we estimated the gradient from the pixel values. Now assume we know the gradients, and estimate a pixel difference using a 1st order Taylor expansion ...

I(x + u, y + v) − I(x, y) ≈ u ∂I(x, y)/∂x + v ∂I(x, y)/∂y

So the sum of squared differences can be approximated as

E_uv(x, y) = Σ_{i=−n}^{+n} Σ_{j=−n}^{+n} (I(x + u + i, y + v + j) − I(x + i, y + j))²
           ≈ Σ_i Σ_j (u ∂I/∂x + v ∂I/∂y)²
           = Σ_i Σ_j (u² (∂I/∂x)² + 2uv (∂I/∂x)(∂I/∂y) + v² (∂I/∂y)²)

Note, ∂I/∂x etc are computed at the relevant (x + i, y + j).


Harris Corner Detector /ctd

The double sum over i, j is replaced by a convolution with

W = [1 1 1 1 1]
    [1 1 1 1 1]
    [1 1 1 1 1]
    [1 1 1 1 1]
    [1 1 1 1 1]

E_uv(x, y) = Σ_i Σ_j (u² (∂I/∂x)² + 2uv (∂I/∂x)(∂I/∂y) + v² (∂I/∂y)²)
           = u² (W ∗ (∂I/∂x)²) + 2uv (W ∗ (∂I/∂x)(∂I/∂y)) + v² (W ∗ (∂I/∂y)²)
           = (u v) [p r; r q] (u; v)

where
p(x, y) = W ∗ (∂I/∂x)²
q(x, y) = W ∗ (∂I/∂y)²
r(x, y) = W ∗ (∂I/∂x)(∂I/∂y)


Harris Corner Detector /ctd

We can introduce smoothing by replacing W with a Gaussian G, so that

E_uv(x, y) = (u v) [p r; r q] (u; v)

where
p(x, y) = G ∗ (∂I/∂x)²
q(x, y) = G ∗ (∂I/∂y)²
r(x, y) = G ∗ (∂I/∂x)(∂I/∂y)

The quantities p, q and r computed at each (x, y) define the shape of the auto-correlation function E_uv(x, y) at (x, y).


Harris Corner Detector /ctd

Recall that

E_uv(x, y) = (u v) [p r; r q] (u; v)

Now (u, v) defines a direction, so an estimate of E_uv a unit distance away from (x, y) along a vector direction n is then

E ≈ nᵀSn / nᵀn, where S = [p r; r q]

Now recall (eg, from A1 Eng Comp notes) that if λ₁ and λ₂ are the larger and smaller eigenvalues of S, respectively, then

λ₂ ≤ nᵀSn / nᵀn ≤ λ₁ ⇒ λ₂ ≤ E ≤ λ₁


/ctd Harris Corner Detector

This allows a classification of image structure:
• Both λ₁, λ₂ ≈ 0 — the autocorrelation is small in all directions: the image must be flat.
• λ₁ ≫ 0, λ₂ ≈ 0 — the autocorrelation is high in just one direction ⇒ a 1D edge.
• λ₁ ≫ 0, λ₂ ≫ 0 — the autocorrelation is high in all directions ⇒ a 2D corner.

Harris’ original “interest” score was later modified by Harris and Stephens:

S_Harris = λ₁λ₂ / (λ₁ + λ₂)
S_HS = λ₁λ₂ − α (λ₁ + λ₂)² / 4

α is a positive constant 0 ≤ α ≤ 1 which decreases the response to edges, sometimes called the “edge-phobia”. With α > 0 edges give negative scores, and corners positive. Scores close to zero indicate flat surfaces or “T” features which are intermediary between edges and corners. The size of α determines how much edges are penalised.
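Putting the pieces together, a compact sketch of the whole detector (my own; note that λ₁λ₂ = det S = pq − r² and λ₁ + λ₂ = trace S = p + q, so the score needs no explicit eigen-decomposition):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_stephens(I, sigma_d=1.0, sigma_i=2.0, alpha=0.1):
    """Return the Harris-Stephens score S_HS = det(S) - alpha*(trace S)^2/4
    at every pixel: >0 for corners, <0 for edges, ~0 for flat regions."""
    S = gaussian_filter(I.astype(float), sigma_d)  # smooth before differentiating
    Ix = np.gradient(S, axis=1)                    # dI/dx (central differences)
    Iy = np.gradient(S, axis=0)                    # dI/dy
    p = gaussian_filter(Ix * Ix, sigma_i)          # p = G * (dI/dx)^2
    q = gaussian_filter(Iy * Iy, sigma_i)          # q = G * (dI/dy)^2
    r = gaussian_filter(Ix * Iy, sigma_i)          # r = G * (dI/dx)(dI/dy)
    return p * q - r * r - alpha * (p + q) ** 2 / 4.0

# Corners are then local maxima of the score above a positive threshold.
```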


Examples


Summary

In this lecture we have considered:
• As enabling techniques, convolution and correlation.
• How to smooth imagery and detect gradients.
• How to recover 1D structure (edges) in an image; in particular, we developed the Canny edge detector.
• How to join edgels into strings, and to fit lines.
• How to recover 2D structure (corners) in an image, and developed the Harris corner detector.