Pattern Recognition Letters 6 (1987) 51-59 June 1987

North-Holland

Localising discontinuities using weak continuity constraints

A. BLAKE and A. ZISSERMAN, University of Edinburgh, Dept. of Computer Science, Edinburgh, United Kingdom

Received September 1985

Revised 20 March 1986

Abstract: Many processes in computer vision can be formulated concisely as optimisation problems. In particular, the localisation of discontinuities can be regarded as optimisation under weak continuity constraints. A weak constraint is a constraint which can be broken - but at a cost.

In this paper we illustrate the use of weak constraints by considering a simple example - edge detection in 1D. Finite elements are used to discretise the problem. The cost function is minimised using a 'graduated non-convexity' algorithm. This gives a local relaxation scheme which could be implemented in parallel.

Results are given for a serial computer implementation of the method. They show that the algorithm does perform as theoretically predicted, and that it is robust in the presence of noise. Results are also given for a 2D version of the method, applied to real images.

Key words: Weak constraints, convexity, low-level vision, optimisation.

1. Introduction

Solutions for various computer vision problems, including shape-from-shading (Ikeuchi and Horn, 1981), surface reconstruction (Grimson, 1981; Terzopoulos, 1983), optic flow recovery (Horn and Schunck, 1981), and motion correspondence (Ullman, 1979), have used optimisation algorithms that are local and parallel in nature. In all but the last case, the data being processed is a field, represented as a 2D digital array. It is difficult, however, to detect and preserve discontinuities in such fields. One approach is to build 'weak continuity constraints' (Blake, 1983a, b) into the optimisation. That work described a method for fitting a piecewise constant function to the field. More recently, statistical algorithms for fitting more general surfaces (e.g. membranes) have been developed (Geman and Geman, 1984).

Here we extend the original, deterministic (non-statistical) algorithm in (Blake, 1983a) to fit a piecewise continuous membrane. Such a membrane can be visualised as a tearable elastic skin, which is stretched over the data. The discussion in this paper deals mostly with 1D data, for simplicity of presentation. But it is easily extended to 2D. An example is given of 2D membrane fitting used as a very effective edge-detector. The method in (Blake, 1983a) was a special case - in the limit of very high membrane tension - of the method presented here. Interestingly enough, this edge detector proves both more effective and more efficient at lower tension. Moreover, it may be more efficient than statistical methods (Marroquin, 1984), requiring an order of magnitude fewer iterations.

Weak constraints are constraints that are usually obeyed but may be broken on occasion - when there are pressing reasons to do so. This is expressed mathematically by charging a penalty $\alpha$ for each broken constraint. This penalty is then 'weighed' against certain 'other costs' (detailed below). If breaking a constraint leads (somehow) to a total


saving in 'other costs' that exceeds the penalty $\alpha$, then breaking that constraint is deemed 'worthwhile'. Clearly $\alpha$ is a measure of strength of each constraint. If $\alpha$ is chosen to be very large, the constraints become virtually unbreakable; in that case they revert to being conventional 'always obeyed' constraints.

Knowing now what weak constraints are, it may be apparent that weak continuity constraints would enforce continuity 'almost everywhere'. Edge detection, for instance, can be regarded as enforcing continuity almost everywhere in a 2D scalar field: given a noisy intensity image $d(x, y)$, construct a piecewise smooth scalar field $u(x, y)$ such that:
1. $u(x, y)$ is a good approximation to $d(x, y)$.
2. $u(x, y)$ has as few discontinuities as possible, consistent with requirement 1 above.
Edges are those (relatively few) places where continuity constraints are broken.

In this paper we describe how to use weak constraints for edge detection, and then how to carry out the optimisation using a 'graduated non-convexity' algorithm.

Figure 1. Calculating energy for data consisting of a single step: (a) the data $d(x)$, a step from $-h$ to $+h$ on the interval $[-1, 1]$; (b) a flat fit, with $D = 2h^2$, $P = 0$, $S = 0$, so $E = 2h^2$; (c) the broken fit $u = d$, with $D = 0$, $P = \alpha$, $S = 0$, so $E = \alpha$; (d) the continuous fit, with $P = 0$ and $E = D + S = 2h^2\lambda\tanh(1/\lambda)$ $(\le 2h^2\lambda)$.

2. Edge detection in 1D

For illustrative purposes, we consider a simple problem using weak continuity:
• one-dimensional: reconstructing a 1D function $u(x)$ from data $d(x)$, for $x \in [0, N]$. This is done by fitting an elastic string to $d(x)$;
• one kind of discontinuity only: a step discontinuity, or break in the elastic string - a place where $C^0$ continuity is broken. A penalty of $\alpha$ is incurred for each discontinuity, adding up to a total penalty of $P$;
• the 'other costs' mentioned earlier will consist of 2 components: (a) a measure of faithfulness to data, $D = \int_0^N (u - d)^2\,dx$; (b) stretching energy, $S = \lambda^2 \int_0^N u'^2\,dx$ - a measure of how severely the function $u(x)$ is deformed. The constant $\lambda^2$ is a measure of elasticity or 'stretchability' or willingness to deform.¹

Although this is a relatively simple problem, the methods outlined in this paper can be generalised to 2 dimensions without significant additional problems. The problem is to minimise the total energy:

$$E = P + D + S, \tag{1}$$

that is, for a given $d(x)$, to find that function $u(x)$ for which the total energy $E$ is smallest. Without the term $P$ (if the energy were simply $E = D + S$) this problem could be solved simply, using the calculus of variations. For example, Figure 1(d) shows the function $u$ that minimises $D + S$, given the data $d(x)$ in Figure 1(a).

It is clearly a compromise between minimising $D$ and minimising $S$ - a trade-off between sticking close to the data and avoiding very steep gradients. The precise balance of these 2 elements is controlled by $\lambda$. If $\lambda$ is small, $D$ (faithfulness to data) dominates, and the resulting $u(x)$ is a close fit to the data $d(x)$: for example, for a square step $d(x)$ (Figure 1(a)), $u(x)$ fits it closely, as in Figure 1(d).

¹ Really, interpreting $S$ as a stretching energy is only valid when the string is approximately aligned with the $x$ axis. Another way to think of $S$ is that it tries to keep the function $u(x)$ as flat as possible.

When the $P$ term is included in $E$, the minimisation problem becomes more interesting. No longer is the minimisation of $E$ straightforward mathematically (except in a few special cases, such as the isolated step edge). $E$ may have many local minima. For example, for the problem of Figure 1, (c) and (d) are both local minima. Only one is a global minimum; which one is lower depends on the values of $\alpha$, $\lambda$, $h$. If the step (a) is high ($h > h_0$), (c) is the global optimum (with discontinuity). Otherwise the global optimum is (d) (no discontinuity). So, at an isolated step, a discontinuity is 'marked' if the step height exceeds the 'contrast threshold' $h_0 = \sqrt{\alpha/2\lambda}$. This behaviour can be shown to degrade gracefully
• in the presence of noise,
• when 2 steps are moved closer together, so that they are within the 'interaction range' $\lambda$.
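The threshold comes from comparing the energies of the two candidate fits in Figure 1; a compact worked version of that comparison (the long-interval approximation is ours):

```latex
% Energies of the two candidate fits in Figure 1:
%   (c) broken:     E = P = \alpha
%   (d) continuous: E = D + S = 2h^2 \lambda \tanh(1/\lambda)
% Breaking is 'worthwhile' when \alpha < 2h^2\lambda\tanh(1/\lambda); for a
% long interval \tanh(1/\lambda) \approx 1, giving the contrast threshold
\[
  h > h_0 = \sqrt{\alpha / 2\lambda}.
\]
```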

To solve the problem of minimising $E$ for real data, it is first expressed in discrete terms, replacing $u(x)$ by $u_i$, $i = 0, \ldots, N$. A method will be described for finding a $\{u_i\}$ for which the energy $E$ is close to minimal.

3. Obtaining a solution

3.1. Discrete problem

The 'finite element method' is a good means of converting continuous problems, like the one just described, into discrete problems. In our example, this is relatively easy to do. The continuous interval $[0, N]$ is divided into $N$ unit sub-intervals ('elements') $[0, 1], \ldots, [N-1, N]$, and nodal values are defined:

$$u_i = u(i), \quad i = 0, \ldots, N.$$

Then $u(x)$ is represented by a linear piece in each sub-interval (Figure 2).

Figure 2. Dividing a line into sub-intervals or 'elements'.

The energies defined earlier now become:

$$D = \sum_{i=0}^{N} (u_i - d_i)^2, \tag{2}$$

$$S = \lambda^2 \sum_{i=1}^{N} (u_i - u_{i-1})^2 (1 - l_i), \tag{3}$$

$$P = \alpha \sum_{i=1}^{N} l_i, \tag{4}$$

where $l_i$ is a 'line-process'. Either:
• $l_i = 1$, indicating that there is a discontinuity between $x_{i-1}$, $x_i$,
• or $l_i = 0$, indicating continuity in that sub-interval.
Note that when $l_i = 1$ the elastic string is 'broken' and the relevant energy term in (3) is disabled.

3.2. Eliminating the line process

Our problem, now in discrete form, is:

$$\min_{\{u_i, l_i\}} E.$$

It transpires that the minimisation over the $\{l_i\}$ can be done 'in advance'; the problem reduces simply to a minimisation over the $\{u_i\}$. This is an advantage for two reasons:
1. The computation is simpler, as it involves just one set of real variables $\{u_i\}$, without the boolean variables $\{l_i\}$.
2. The absence of boolean variables enables the 'graduated non-convexity' method, as described below, to be applied.

To eliminate the line-process $\{l_i\}$, $S + P$ must first be expressed in a modified form:

$$S + P = \sum_{i=1}^{N} h(u_i - u_{i-1}, l_i)$$

where

$$h(\Delta u, l) = \lambda^2 (\Delta u)^2 (1 - l) + \alpha l. \tag{5}$$

Figure 3. (a) The energy function for local interaction between adjacent nodes. (b) The line process $l$ can be eliminated from (a) by minimisation over $l \in \{0, 1\}$.


This is derived directly from (3) and (4). All dependence of $E$ on the line-process $\{l_i\}$ is now contained in the $N$ copies of $h$ that appear in the formula (5). The function $h$ (plotted in Figure 3) governs local interactions between the $\{u_i\}$.

The problem is now (from (5)):

$$\min_{\{u_i, l_i\}} \left( D + \sum_{i=1}^{N} h(u_i - u_{i-1}, l_i) \right),$$

or

$$\min_{\{u_i\}} \left( D + \min_{\{l_i\}} \sum_{i=1}^{N} h(u_i - u_{i-1}, l_i) \right),$$

since $D$ does not involve the $\{l_i\}$. Now, eliminating the minimisation over $\{l_i\}$, the problem becomes

$$\min_{\{u_i\}} F, \quad \text{where} \quad F = D + \sum_{i=1}^{N} g(u_i - u_{i-1}), \tag{6}$$

and

$$g(\Delta u) = \min_{l \in \{0,1\}} h(\Delta u, l).$$

The function $g$ is shown in Figure 3b and is simply the minimum of the 2 graphs in Figure 3a. Explicitly, $g$ is

$$g(\Delta u) = \begin{cases} \lambda^2 (\Delta u)^2 & \text{if } |\Delta u| < \sqrt{\alpha}/\lambda, \\ \alpha & \text{otherwise.} \end{cases}$$

What Figure 3 shows graphically is that minimisation over each element of the line-process can be done in advance: $h(\Delta u, l)$ is replaced by $g(\Delta u)$. Nothing of value has been thrown away, however. The line process can be explicitly recovered from the optimal $\{u_i\}$, once they have been found:

$$l_i = \begin{cases} 1 & \text{if } |u_i - u_{i-1}| > \sqrt{\alpha}/\lambda, \\ 0 & \text{otherwise.} \end{cases}$$
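In code, the eliminated form is equally direct; a small Python sketch (with names of our choosing) of $g$ and of the recovery rule:

```python
import numpy as np

def g(du, lam, alpha):
    """g(Du) = lambda^2 Du^2 where |Du| < sqrt(alpha)/lambda, else alpha."""
    return np.where(np.abs(du) < np.sqrt(alpha) / lam,
                    lam ** 2 * du ** 2,
                    alpha)

def recover_line_process(u, lam, alpha):
    """Read the line process off the optimal {u_i}:
    l_i = 1 wherever |u_i - u_{i-1}| exceeds sqrt(alpha)/lambda."""
    return (np.abs(np.diff(u)) > np.sqrt(alpha) / lam).astype(int)
```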

3.3. Graduated non-convexity

The discrete problem (6) has now been fully set up. A method of finding a solution is needed. It must avoid the pitfall of sticking in local minima. And the cost function $F$ has many local minima. In fact there is (in general) one local minimum of $F$ corresponding to each state of the line process - $2^N$ local minima! Stochastic methods avoid them by random fluctuations, spasmodic injections of energy to shake free of them (Figure 4a).

Figure 4. (a) Stochastic methods avoid local minima by using random motions to jump out of them. (b) The 'graduated non-convexity' method constructs an approximating convex function, free of spurious local minima.

It would appear, however, to be in the interests

of computational efficiency to use a nonrandom method. Graduated non-convexity, rather than injecting energy randomly, modifies the cost function (Figure 4b).

In the graduated non-convexity method, the cost function $F$ is first approximated by a new function $F^*$ which is convex and hence can only have one local minimum, which must also be a global minimum.² Descent on $F^*$ (descending, that is, in the $N$-dimensional space of variables $\{u_i\}$) must land up at the minimum. Now, for some data $d$ this minimum may also be a global minimum of $F$ - which is what we were after. There is a simple test to detect when this happens, provided $F^*$ is constructed as described in the next section. The condition is that

$$F^*(u_1, \ldots, u_N) = F(u_1, \ldots, u_N) \tag{7}$$

where $\{u_i\}$ here is the global optimum of $F^*$. This is because $F^*$ is constructed so that, for all $\{u_i\}$, $F^*(u_1, \ldots, u_N) \le F(u_1, \ldots, u_N)$. Consequently, if $\{u_i\}$ is the global minimum of $F^*$, and also lies on $F$ as in (7), then it must also be the global minimum of $F$ - as in Figure 4b.

Occasionally (for a given data set) the test (7) will succeed. More often it will not, indicating that $\{u_i\}$ is not at a global minimum of $F$ - but usually $F(u_1, \ldots, u_N)$ will be close to minimal. To squeeze out the last ounce, one might start from that $\{u_i\}$ (that minimises $F^*$) and proceed, downhill on $F$, to a local minimum of $F$. Certainly, that strategy can only improve things, inasmuch as the cost $F(u_1, \ldots, u_N)$ cannot increase and will most probably decrease. But it is more effective to proceed gradually from $F^*$ to $F$, rather than in one jump. To do this, a family of cost functions $F^{(p)}$ is defined. When $p = 1$, $F^{(p)} = F^*$; when $p = 0$, $F^{(p)} = F$; and as $p$ varies from 1 to 0, $F^{(p)}$ varies smoothly from $F^*$ to $F$. Descent on $F^{(p)}$, in $\{u_i\}$ space, is (ideally) performed continuously while $p$ changes slowly, maintaining 'equilibrium' all the way. In practice, perhaps 11 successive values $p \in \{1, 0.9, \ldots, 0.1, 0\}$ are used. At each of these values of $p$, a complete descent is performed on $F^{(p)}$, and the $\{u_i\}$ reached is used as a starting point for descent on the next $F^{(p)}$ in the sequence. Note that any starting point will do for the first of the sequence, $F^{(1)}$. That is because $F^{(1)} = F^*$ has only one minimum, which will be attained by descent from any starting point.

² Actually there are some details to take care of here, distinguishing between convexity and strict convexity.

3.4. Constructing the convex function F*

It remains to explain how $F^*$ is constructed. The requirement is that it be convex and as close an approximation to $F$ as possible. Finding such an $F^*$ is a 'custom' job - tailored to fit a particular cost function $F$ - so that a very close convex approximation is obtained. General methods for constructing convex approximations would probably work, but could not be expected to construct such close ones. For the simple 1D problem under discussion, we define

$$F^* = D + \sum_{i=1}^{N} g^*(u_i - u_{i-1}) \tag{8}$$

(compare this with $F$ in (6)) with $g^*$ as in Figure 5. Note that $g^*$ is not itself convex, but is chosen in such a way that, when the $g^*$ terms are added to $D$ in (8), $F^*$ is convex. Subject to that condition, $g^*$ is chosen as close as possible to $g$. It is $g^* = g^{(1)}$, where

$$g^{(p)}(\Delta u) = \begin{cases} \lambda^2 (\Delta u)^2 & \text{if } |\Delta u| < q, \\ \alpha - c(|\Delta u| - r)^2/2 & \text{if } q \le |\Delta u| < r, \\ \alpha & \text{if } |\Delta u| \ge r, \end{cases} \tag{9}$$

where $c = 1/2p$, $r^2 = \alpha(2/c + 1/\lambda^2)$, and $q = \alpha/\lambda^2 r$. It is in the mid-range $q \le |\Delta u| < r$ that $g^*(\Delta u) \ne g(\Delta u)$. There, the degree of negative second derivative of the function $g^*$ is carefully chosen exactly to balance the positive second derivatives in the $D$ term in (8). This yields an $F^*$ which is only just convex, and otherwise is as close as possible to $F$. The appendix gives further details.

Figure 5. The local interaction energy of Figure 3b is modified to produce the convex cost function.

The family of functions $F^{(p)}$ can now be written down:

$$F^{(p)} = D + \sum_{i=1}^{N} g^{(p)}(u_i - u_{i-1}). \tag{10}$$
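For reference, equation (9) in executable form; a sketch (ours) valid for $0 < p \le 1$ - at $p = 0$ one simply uses $g$ itself:

```python
import numpy as np

def g_p(du, p, lam, alpha):
    """The family g^(p) of equation (9): g^(1) = g*, and g^(p) -> g as p -> 0.
    The constants follow the text: c = 1/2p, r^2 = alpha(2/c + 1/lambda^2),
    q = alpha/(lambda^2 r)."""
    c = 1.0 / (2.0 * p)
    r = np.sqrt(alpha * (2.0 / c + 1.0 / lam ** 2))
    q = alpha / (lam ** 2 * r)
    t = np.abs(du)
    return np.where(t < q, lam ** 2 * du ** 2,
                    np.where(t < r, alpha - c * (t - r) ** 2 / 2.0, alpha))
```

By construction, $g^{(p)}$ and its first derivative are continuous at $|\Delta u| = q$ and $|\Delta u| = r$ (see the appendix).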

3.5. Optimisation algorithms

There are numerous ways to minimise $F^{(p)}$, including direct descent and gradient methods. Direct descent is particularly straightforward to implement and runs like this: propose a change in one of the nodal values $u_i$; see if that leads to a reduction in $F^{(p)}$ (this only involves a local computation); if it does, then make the change. Schematically, a simple program using integer arithmetic which implements direct descent and the graduated non-convexity method would be:

for p := 1; p >= 0; p := p - 0.1 do
    for δ := 1; δ >= 2^{-B}; δ := δ/2 do
        changed := true
        while changed = true do
            changed := false
            for i = 0, ..., N do
                if F^{(p)}(u_1, ..., u_i + δ, ..., u_N)
                        < F^{(p)}(u_1, ..., u_i, ..., u_N) then    [1]
                    u_i := u_i + δ
                    changed := true
                else if F^{(p)}(u_1, ..., u_i - δ, ..., u_N)
                        < F^{(p)}(u_1, ..., u_i, ..., u_N) then    [2]
                    u_i := u_i - δ
                    changed := true

This algorithm obtains $\{u_i\}$ to a precision of $B$ bits. It can be made to run using entirely integer arithmetic - mostly 16 bit. Although $F^{(p)}$ is expressed as a sum, $i = 0, \ldots, N$, of local terms like $g^{(p)}(u_i - u_{i-1})$, the effect of altering a particular $u_i$ (as in [1] and [2] in the algorithm above) can be computed from just a few of those terms. For example, $u_i$ appears only in $g^{(p)}(u_i - u_{i-1})$, $g^{(p)}(u_{i+1} - u_i)$ and, in $D$, in the term $(u_i - d_i)^2$. Not only does this simplify the computation of the effect on $F^{(p)}$ of changing $u_i$, but it is also possible to perform such computation on many $u_i$ in parallel. To do direct descent in parallel, the $u_i$ cannot all be updated simultaneously, as that could lead to non-convergent (oscillatory) behaviour. Rather, the $\{u_i, i \text{ odd}\}$ are updated simultaneously, then the $\{u_i, i \text{ even}\}$. In the 2D problem, fitting a membrane rather than a stretched string, a chequer-board pattern is used. First the 'black squares' are updated simultaneously, then the 'white' ones. For higher order problems (e.g. the thin plate) more complex patterns are required.

Alternatively, gradient descent methods make use of the local gradient of the function to guide the choice of direction and step size. In general they converge faster (i.e. get closer to the minimum in fewer steps) than the direct descent method, but require floating point arithmetic. Again, such algorithms are local and parallel.

We have used a successive over-relaxation (SOR) iteration scheme. For each $p$ the minimum of $F^{(p)}$ is found by using:

$$u_i^{n+1} = u_i^n + w \left\{ -u_i^n + \frac{d_i + h_-^p u_{i-1}^{n+1} + h_+^p u_{i+1}^n}{1 + h_-^p + h_+^p} \right\}$$

where

$$h_-^p = h^p(u_i^n - u_{i-1}^{n+1}), \qquad h_+^p = h^p(u_{i+1}^n - u_i^n), \qquad h^p(x) = \frac{1}{2x} \frac{dg^{(p)}(x)}{dx},$$

$w$ is the SOR parameter and $u_i^n$ are the nodal values after $n$ iterations.
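One sweep of this scheme, written out as a Python sketch (our own naming; boundary nodes are held fixed for brevity, and $w = 1.5$ is an arbitrary choice of the SOR parameter):

```python
import numpy as np

def h_p(x, p, lam, alpha):
    """h^p(x) = (1/2x) dg^(p)/dx, written out piecewise; h^p(0) = lambda^2."""
    c = 1.0 / (2.0 * p)
    r = np.sqrt(alpha * (2.0 / c + 1.0 / lam ** 2))
    q = alpha / (lam ** 2 * r)
    t = abs(x)
    if t < q:
        return lam ** 2                 # from g' = 2 lambda^2 x
    if t < r:
        return c * (r - t) / (2 * t)    # from g' = -c (t - r) sign(x)
    return 0.0                          # g is constant (= alpha) here

def sor_sweep(u, d, p, lam, alpha, w=1.5):
    """One SOR iteration for F^(p). Updating in place, left to right, means
    u[i-1] already holds its new value (u_{i-1}^{n+1}) when u[i] is visited."""
    for i in range(1, len(u) - 1):
        hm = h_p(u[i] - u[i - 1], p, lam, alpha)
        hp = h_p(u[i + 1] - u[i], p, lam, alpha)
        u[i] += w * (-u[i] + (d[i] + hm * u[i - 1] + hp * u[i + 1])
                     / (1.0 + hm + hp))
```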

This works well provided $\lambda$ is not too large. To speed up convergence, multigrid techniques could be used (Terzopoulos, 1983).

We have found a simple and useful method for accelerating the process - using activity flags. The basis for this is that, after a few iterations, the nodal values do not change substantially over much of the image - most of the work occurs at sites near edges. To capitalise on this we set a limit; if a nodal value does not change by more than this limit during an iteration then its activity flag is turned off and it is not included in the next iteration (saving the computation time for that node); if a nodal value changes by more than the limit then its activity flag is switched on, and so are its neighbours'. A sketch of this bookkeeping follows below.

The use of activity flags speeds up the minimisation by as much as a factor of 10 with no appreciable loss in accuracy. They would be of no benefit, however, in an implementation that used Single Instruction Multiple Data parallelism.
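A minimal sketch of that bookkeeping (ours; `update_node` stands for whichever single-node relaxation step is in use, e.g. the SOR update above, and the value of `limit` is arbitrary):

```python
import numpy as np

def sweep_with_flags(u, d, active, update_node, limit=1e-3):
    """One relaxation sweep that visits only flagged nodes. `update_node`
    returns the new value of u[i]; `active` is a boolean array of flags."""
    for i in np.flatnonzero(active):
        new = update_node(u, d, i)
        if abs(new - u[i]) > limit:
            # significant change: keep this node and its neighbours active
            active[max(i - 1, 0):i + 2] = True
        else:
            active[i] = False   # quiescent node: skip it in the next sweep
        u[i] = new
```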

4. Results

4.1. 1D data

Figures 6 and 7 show examples of the graduated non-convexity method in action on the elastic string problem discussed earlier.

The minimum for each $p$ was determined by using a gradient method. No more than 80 iterations in total were needed for each example.

The data is shown in Figure 6a. It has 3 steps, one of which is smaller than the other two. The parameters $\alpha$ and $\lambda$ were chosen so that the global optimum will have discontinuities at the larger two steps, but will have no discontinuity at the smaller one. (For simple cases such as this, analytical methods can determine the global optimum. This serves as a check on algorithm performance.)

The results of applying graduated non-convexity to this data are shown in Figures 6b-d. Figure 6b is the global optimum for the convex function $F^*$ - already the middle step has started to close up. The minima of $F^{(p)}$ are then found as $p$ is decreased in steps from 1 to 0. Figure 6c shows the minimum $u(x)$ of $F^{(0.7)}$. Finally, Figure 6d shows the end of the process - the minimum of $F$. As can be seen, the global optimum has been found, with two step edges (marked) and no discontinuity in the middle.


Figure 6. Snapshots of the graduated non-convexity method: (a) initial data; (b) optimum for convex function $F^*$; (c) minimum for $F^{(0.7)}$; (d) minimum for $F$.

Figure 7. Snapshots of the graduated non-convexity method: (a) initial data; (b) optimum for convex function $F^*$; (c) minimum for $F^{(0.7)}$; (d) minimum for $F$.

Figure 7 shows the data of Figure 6a with added white Gaussian noise ($\sigma = 10$). As the scheme proceeds, the noise is smoothed, but underlying discontinuities in the data are preserved. This contrasts with linear filters, in which smoothing of noise tends to be accompanied by smoothing of underlying discontinuities. The discontinuities labelled in Figure 7d are the same as in Figure 6d, despite the added noise.

Theoretical analysis predicted that when steps are isolated (far apart compared to the 'interaction length' $\lambda$) the scheme treats them independently. A given step is labelled as a discontinuity if its height exceeds $2h_0$, where $h_0 = \sqrt{\alpha/2\lambda}$. When the steps are close - separation $s < \lambda$ - it can be shown that the effective threshold is raised, by a factor of order $\sqrt{\lambda/s}$. Figure 8 shows that the algorithm indeed exhibits the expected behaviour for isolated steps. In Figure 8a the penalty $\alpha$ is large and the threshold $h_0$ is high. Consequently the optimal solution for $u(x)$ is continuous - all steps are below threshold. In Figures 8b, c and d, $\alpha$ is reduced and more edges are marked. In each case the graduated non-convexity method finds the theoretically predicted global minimum.

4.2. 2D images

Figure 9 compares the output of a 2D version of the algorithm with the output of a thresholded directional derivative of a Gaussian, using non-maximum suppression and hysteresis (Canny, 1983). The results are very similar.

The main problem with the weak continuity approach is that steep slopes are broken into several steps. This happens when the gradient of a slope in the data exceeds the 'gradient limit' $h_0/\lambda$. This can be avoided by minimising the curvature instead of the gradient (a thin plate model); this will be the subject of a later paper.


Figure 8. Edge detection using weak continuity for the data of Figure 6.

Acknowledgements

This work was supported by SERC grant GR/D 1439.6 and by the University of Edinburgh. The Royal Society of London's IBM Research Fellowship supported A. Blake. We are very grateful to Bernard Buxton for many helpful comments.

Appendix. Ensuring convexity of F*

As described above, we require an $F^*$ that is as close as possible to $F$ but also convex. The function $F^*$ is defined in terms of $g^*$ as in (8). This appendix describes how $g^*$ is chosen. We start by considering $g^*$ to have the following general form:

$$g^*(\Delta u) = \begin{cases} \lambda^2 (\Delta u)^2 & \text{if } |\Delta u| < q, \\ \alpha - c(|\Delta u| - r)^2/2 & \text{if } q \le |\Delta u| < r, \\ \alpha & \text{if } |\Delta u| \ge r, \end{cases} \tag{A1}$$

Figure 9. 2D edge detection: (a) image of an industrial part; (b) its edges using a directional Gaussian; (c) its edges using weak constraints, with a comparable contrast threshold setting to that in (b).

where $r^2 = \alpha(2/c + 1/\lambda^2)$, and $q = \alpha/\lambda^2 r$. The mid-range quadratic has been chosen so that the function and its first derivative are continuous at $|\Delta u| = q$ and $|\Delta u| = r$. In the limit of large negative second derivative ($-c$) the function $g^*$ approaches the original $g$. So the problem is to choose

the second derivative as negative as possible (then $g^*$ is close to $g$) but also to ensure that $F^*$ is just convex (i.e. not strictly convex).

The convexity of $F^*$ can be determined from the Hessian matrix $H$, where

$$H_{ij} = \frac{\partial^2 F^*}{\partial u_i \partial u_j}.$$

It can be shown that provided $H$ is positive semi-definite (it has eigenvalues $\lambda \ge 0$) then $F^*$ is convex. $H$ is an $(N+1) \times (N+1)$ tri-diagonal matrix. The $i$th row is given by³:

$$H_{ij} = \begin{cases} -{g^*}''(\Delta u_i) & \text{if } j = i - 1, \\ 2 + {g^*}''(\Delta u_i) + {g^*}''(\Delta u_{i+1}) & \text{if } j = i, \\ -{g^*}''(\Delta u_{i+1}) & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}$$

If all the nodal values lie in the mid range (where ${g^*}'' = -c$), then $H$ is the symmetric tri-diagonal matrix with $2 - 2c$ on the diagonal and $c$ on both off-diagonals. There is a simple formula for its eigenvalues, namely:

$$\lambda_s = 2 - 2c + 2c \cos\left(\frac{s\pi}{N+2}\right), \quad s = 1, \ldots, N+1. \tag{A2}$$

This gives the worst possible case; any other situation (where not all nodal values are in the mid range) will be 'more convex' than this.

³ The first and last rows do not have this form but this does not invalidate the worst-case convexity argument outlined above.

Equation (A2) shows that provided $c = 1/2$ then all the eigenvalues will be non-negative and hence $F^*$ will be convex.

For other problems, where there may be no formula like (A2) for the eigenvalues, a convex function can still be formed provided a non-negative greatest lower bound is found for the eigenvalues.
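A quick numerical confirmation of (A2), and of the choice $c = 1/2$, can be run with NumPy (our own check, not part of the paper):

```python
import numpy as np

# Worst-case Hessian: all nodes in the mid range, so H is tri-diagonal with
# rows (c, 2 - 2c, c). With c = 1/2 the eigenvalues of (A2) are all positive,
# so F* is (only just) convex.
N, c = 20, 0.5
H = (np.diag(np.full(N + 1, 2 - 2 * c))
     + np.diag(np.full(N, c), 1)
     + np.diag(np.full(N, c), -1))
eigs = np.linalg.eigvalsh(H)                    # ascending order

s = np.arange(1, N + 2)
predicted = 2 - 2 * c + 2 * c * np.cos(s * np.pi / (N + 2))
assert np.allclose(eigs, np.sort(predicted))    # matches formula (A2)
assert eigs.min() > 0                           # hence convex
print(f"smallest eigenvalue: {eigs.min():.4f}")
```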

References

Blake, A. (1983a). The least disturbance principle and weak constraints. Pattern Recognition Letters 1, 393-399.

Blake, A. (1983b). Parallel computation in low-level vision. Ph.D. Thesis, University of Edinburgh.

Canny, J.F. (1983). Finding edges and lines in images. S.M. thesis, M.I.T., Cambridge, MA.

Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence 6, 721-741.

Grimson, W.E.L. (1981). From Images to Surfaces. MIT Press, Cambridge, MA.

Horn, B.K.P. and B.G. Schunck (1981). Determining optical flow. Artificial Intelligence 17, 185-203.

Ikeuchi, K. and B.K.P. Horn (1981). Numerical shape from shading and occluding boundaries. Artificial Intelligence 17, 141-184.

Marroquin, J. (1984). Surface reconstruction preserving discontinuities. Memo 792, AI Laboratory, MIT, Cambridge, MA.

Terzopoulos, D. (1983). Multilevel computational processes for visual surface reconstruction. Computer Vision, Graphics and Image Processing 24, 52-96.

Ullman, S. (1979). Relaxation and constrained optimization by local processes. Computer Graphics and Image Processing 10, 115-125.
