
Machine vision
Summary # 9: Classification and pattern recognition

Fig. 1. A classification problem

PATTERN RECOGNITION AND CLASSIFICATION

One of the main tasks and applications of machine vision is to perform object recognition and classification. In a typical classification problem, we have different objects that we want to assign to different classes. The basic idea is to extract features that can be used to distinguish between the members of different classes. Members of different classes tend to form separate clusters. Figure 1 shows a simple application of classification using machine vision. The fruit is loaded on the conveyor belt. The camera measures the properties of the fruit. Based on the feature set, the classification algorithm decides where to put the fruit. Three important aspects are considered in any classification problem, as shown in figure 2: the classes, the feature set, and the discrimination function.

Example 1

Class membership:
• Class 1: Dogs.
• Class 2: Cats.

How to distinguish between dogs and cats?
• Weight less than 10 kg ⇒ it's a cat.
• Weight more than 10 kg ⇒ it's a dog.

There are cats that weigh more than 10 kg and dogs that weigh less than 10 kg, so we have two kinds of errors: cats classified as dogs and dogs classified as cats. How can the classification process be improved?

• Select a better threshold for the weight.

Fig. 2. A classification problem

Fig. 3. Block diagrams for classification

• Use more features: we cannot get good results with only one feature. The solution is to use multiple features.

Feature vector

The use of multiple features helps improve the classification results. When we combine features, we can use logic operators such as AND/OR.


Fig. 4. Three different classes

In the previous example, we could take weight and height as features; however, these two features are dependent. Dependency between features creates some redundancy; independent features give better classification results. A toy sketch of combining two feature tests is given below.
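As a toy sketch of such a combination (the thresholds and the height value below are made up for illustration, not taken from the notes):

% Toy sketch: combining two feature tests with a logic operator.
% Thresholds are invented for illustration only.
weight = 12;  height = 45;               % measured features of one animal
isDog = (weight > 10) & (height > 35);   % AND of two single-feature tests
isCat = ~isDog;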

Examples of simple features

Figure 4 shows a simple classification problem where the goal is to classify letters belonging to three different classes. Shape signatures, area, perimeter, color, and Euler number are possible features. In order to decide on the features, we need statistical data. We need to establish criteria so that members of the same class tend to cluster and members of different classes tend to separate. The results of the feature set chosen for the letters of figure 4 are shown in figure 5.

DECISION THEORETICAL APPROACH

Classification is done based on a decision function. This function characterizes the boundary between classes. We define the following:

• Pattern vector or feature set:

x = [x1, x2, ..., xn]^T (1)

• Classes:

K = {K1, K2, ..., Kl} (2)

The objective of classification is to assign objects to their classes. We define a discrimination function g(x) for each class.

Fig. 5. Feature set representation (feature clusters for the letters A, B, and C)

The decision rule is derived from the following equation:

g1(x) − g2(x) = 0 (3)

There are two cases:

• linearly separable sets
• nonlinearly separable sets

Linear discrimination functions are the simplest. In this case, we talk about classes that are linearly separable. In general we have:

• 1 feature ⇒ the discrimination function is a point.
• 2 features ⇒ the discrimination function is a line or a curve.
• 3 features ⇒ the discrimination function is a plane or a surface.
• n (n > 3) features ⇒ the discrimination function is a hyperplane (linearly separable case) or a hypersurface.

Three traditional classification methods are discussed:

• Minimum centroid distance (also called the minimum distance classifier)
• Neural networks (NN)
• Support vector machines (SVM)

Minimum centroid distance classifier

Consider two classes with centroids X̄1 and X̄2, where X̄1 and X̄2 are vectors. A point X on the decision boundary is equidistant from the two centroids:

|X̄1 − X| = |X̄2 − X| (4)
(X̄1 − X)² = (X̄2 − X)² (5)
X̄1² − 2X̄1·X + X² = X² − 2X̄2·X + X̄2² (6)

Example 2

Assuming the average dog weight is 10 kg and the average cat weight is 5 kg, solve using the minimum distance classifier.


The equation for the minimum distance classifier gives us:

(weight − 5)² = (weight − 10)²
weight² − 10·weight + 25 = weight² − 20·weight + 100
10·weight = 75
weight = 7.5

Thus, based on the minimum distance classifier:
• Weight > 7.5 kg ⇒ it's a dog.
• Weight < 7.5 kg ⇒ it's a cat.
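A minimal Matlab sketch of this rule (the measured weight is an assumed input):

% Minimum distance classifier with one feature (weight).
meanDog = 10;  meanCat = 5;       % class centroids in kg
weight  = 8.2;                    % assumed new measurement
if abs(weight - meanDog) < abs(weight - meanCat)
    label = 'dog';                % closer to the dog centroid
else
    label = 'cat';                % closer to the cat centroid
end                               % the boundary sits at weight = 7.5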

Example 3

Different fonts are used for the letters B and C. A shape signature method is used to derive the feature set, which consists of the normalized angle and the distance to the centroid of the letter. The numerical values for the features are shown in figure 6. We have for the mean values:

• Letter B:

d1 = 0.4725 (7)
a1 = 0.4230 (8)

• Letter C:

d2 = 0.4535 (9)
a2 = 0.1865 (10)

Find and draw the boundary line. We use the minimum distance approach:

(X − X̄1)² = (X − X̄2)² (11)

This is expressed as follows:

(X − X̄1)^T(X − X̄1) = [x1 − d1  x2 − a1]·[x1 − d1; x2 − a1] (12)

By putting g1(x) = g2(x), we get:

x1² − 2d1x1 + d1² + x2² − 2a1x2 + a1² = x1² − 2d2x1 + d2² + x2² − 2a2x2 + a2²

2x2(a1 − a2) = 2x1(d2 − d1) + d1² + a1² − d2² − a2²

x2 = −0.0794·x1 + 0.343

that is, y = −0.0794x + 0.343

The boundary line and the code are shown in figures 7 and 8, respectively.
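The code of figure 8 is not reproduced in this transcript; a minimal Matlab sketch along the same lines, using the mean values (7)-(10), is:

% Minimum distance boundary between the centroids of letters B and C.
c1 = [0.4725; 0.4230];    % letter B: (d1, a1)
c2 = [0.4535; 0.1865];    % letter C: (d2, a2)
% From 2*x2*(a1-a2) = 2*x1*(d2-d1) + d1^2 + a1^2 - d2^2 - a2^2:
slope     = (c2(1) - c1(1)) / (c1(2) - c2(2));
intercept = (sum(c1.^2) - sum(c2.^2)) / (2*(c1(2) - c2(2)));
x1 = linspace(0.3, 0.6, 50);
plot(x1, slope*x1 + intercept); hold on;        % boundary line
plot(c1(1), c1(2), 'bo', c2(1), c2(2), 'rs');   % the two centroids
xlabel('distance d'); ylabel('normalized angle a');

Up to rounding of the mean values, slope and intercept reproduce the line x2 = −0.0794x1 + 0.343 found above.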

In a classification problem, the machine needs to automatically assign objects to their respective classes. In the learning phase, the goal is to find a boundary function that separates the classes. There are two cases:

• linearly separable classes
• nonlinearly separable classes.

There are many applications of classification, such as fingerprint recognition, character and handwritten word recognition, DNA sequence analysis, and automatic target recognition.

ARTIFICIAL NEURAL NETWORKS

Inspired by biological neural networks, artificial neural networks have a high level of parallelism, distributed representation and computation, and memory integrated with the processor.

Fig. 6. Feature sets based on shape signature

Fig. 7. Decision line for the example 3

Artificial neural networks have the ability to learn. Learning can be used to solve various problems in pattern recognition and classification, control, and prediction. Figure 9 illustrates the principle of a neural network. The decision function is a linear combination of weighted inputs, with

• g(x) ≥ 0 ⇒ output = 1 ⇒ class K1
• g(x) < 0 ⇒ output = −1 ⇒ class K2

The question is how to find the decision function g(x), i.e.,how to determine the weights and the bias.


Fig. 8. Code for minimum distance classifier

Fig. 9. Principle of neural nets

ALGORITHM

The neural network learning algorithm can be summarized as follows:

• Initialize the weights wi and the bias b to small random numbers.
• Present a feature set x = (x1, x2, ..., xn)^T and evaluate the output of the neuron.
• Update the weights and bias according to

w(k + 1) = w(k) + (d − y)x (13)
b(k + 1) = b(k) + (d − y) (14)

where d is the desired output and y is the actual output. This rule is called the perceptron learning rule.

Learning occurs only when the neural network makes a mistake. It is proven that the method converges when the classes are linearly separable.
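As a minimal sketch in plain Matlab (no toolbox assumed), the rule can be coded directly; the two points used here are those of the example that follows:

% Plain-Matlab sketch of the perceptron learning rule.
X = [3 1; 1 1];              % one feature vector per row
d = [1; 0];                  % desired outputs
w = [0; 0];  b = 0;          % initial weights and bias (zero here)
for epoch = 1:20             % repeated passes over the data
    mistakes = 0;
    for k = 1:size(X, 1)
        y = double(X(k,:)*w + b > 0);   % neuron output
        w = w + (d(k) - y)*X(k,:)';     % weight update, eq. (13)
        b = b + (d(k) - y);             % bias update, eq. (14)
        mistakes = mistakes + (y ~= d(k));
    end
    if mistakes == 0, break; end        % converged: an error-free pass
end
% For this data the loop ends with w = [1; -1], b = -1.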

EXAMPLE

We have two classes:

• Class 1: point (3, 1) with d = 1
• Class 2: point (1, 1) with d = 0

This is shown in figure 10. The output is defined as follows:

output = 1 if g(x) > 0, and 0 if g(x) ≤ 0 (15)

• Initialization: w = [0, 0]^T, b = 0, g(x) = 0 ⇒ y = 0.
• At point (3, 1):

d = 1, g(x) = 0, y = 0
w = [0; 0] + (1 − 0)·[3; 1] = [3; 1]
b = 0 + (1 − 0) = 1

• At point (1, 1):

d = 0, g(x) = 3·1 + 1·1 + 1 = 5, y = 1
w = [3; 1] + (0 − 1)·[1; 1] = [2; 0]
b = 1 + (0 − 1) = 0

• At point (3, 1):

d = 1, g(x) = [2 0]·[3; 1] + 0 = 6, y = 1
w = [2; 0] + 0·[3; 1] = [2; 0]
b = 0

• At point (1, 1):

d = 0, g(x) = [2 0]·[1; 1] + 0 = 2, y = 1
w = [2; 0] + (0 − 1)·[1; 1] = [1; −1]
b = 0 − 1 = −1


Fig. 10. Decision line for the example (data points (3, 1) and (1, 1))

• At point (3, 1):

d = 1, g(x) = [1 −1]·[3; 1] − 1 = 3 − 1 − 1 = 1, y = 1
w = [1; −1] + 0·[3; 1] = [1; −1]
b = −1 + 0 = −1

• At point (1, 1):

d = 0, g(x) = [1 −1]·[1; 1] − 1 = −1, y = 0
w = [1; −1] + 0·[1; 1] = [1; −1]
b = −1

The method converges to

w = [1; −1] (16)
b = −1 (17)

The equation for the decision function is then

g(x) = Σ wi·xi + b (18)
g(x) = 1·x1 − 1·x2 − 1 (19)

The separation line is then

g(x) = 1·x1 − 1·x2 − 1 = 0 (21)
y = x − 1 (22)

Fig. 11. There are many possible solutions for the decision line

Matlab code applying the perceptron learning rule (via the Neural Network Toolbox) is below:

class1 = [3 1];              % point (3,1), desired output 1
class2 = [1 1];              % point (1,1), desired output 0
x = [class1; class2]';       % one sample per column
t = [1 0];                   % targets, matching the example above
figure(2)
plotpv(x, t);                % plot the labeled points
net = perceptron;
net = train(net, x, t);      % train with the perceptron learning rule
view(net);
figure(2)
plotpc(net.IW{1}, net.b{1}); % draw the learned decision line

SUPPORT VECTOR MACHINE

Recall that the equation of a hyperplane can be written as

g(x) = Σ wi·xi + b = 0 (23)

This equation can be written in matrix form as follows:

g(x) = w^T x + b (24)

where w and x are vectors.
If the data is linearly separable, there is an infinite number of separating hyperplanes, as shown in figure 11. The support vector machine finds the optimal one by solving an optimization problem. For now, we assume linear separability: with 2 features, the classes are separated by a line. With reference to figure 12, the following lines

L1: w·x + b = 1 (25)
L2: w·x + b = −1 (26)


Fig. 12. Illustration of the support vectors

pass through the limit points. These points are called support vectors. The decision line is

L: w^T x + b = 0 (27)

The margin width is

margin = 2/||w|| (28)

The support vectors as well as the decision line are shown in figure 12.

The SVM problem is formulated as an optimization problem as follows:

maximize 2/||w|| (29)

subject to:

for yi = 1:  w^T xi + b ≥ 1
for yi = −1: w^T xi + b ≤ −1

The problem can be rewritten as

minimize (1/2)||w||² (30)

subject to:

yi(w^T xi + b) ≥ 1

In the second formulation, ||w||² is used instead of ||w||; the two approaches are equivalent.

Dual form

The problem is formulated as a Lagrangian function:

Maximize Q = Σ_{j=1..N} αj − (1/2) Σ_{j=1..N} Σ_{i=1..N} αi αj yi yj (xj · xi) (31)

subject to:

αk ≥ 0 (32)
Σ_{j=1..N} αj yj = 0 (33)

The weights and bias are then recovered from

w = Σ_{i∈SV} αi yi xi (34)
b = yi − Σ_{j=1..N} αj yj (xj · xi) (35)

It is possible to solve this problem using the Matlab function quadprog. Note that α ≠ 0 corresponds to the support vectors. In the particular case where N = 2, the double sum in (31) can be written as

Σ_{j=1..2} [αj α1 yj y1 (xj · x1) + αj α2 yj y2 (xj · x2)] =
α1² y1² (x1 · x1) + 2α1α2 y1 y2 (x1 · x2) + α2² y2² (x2 · x2) (36)

Thus, with y1 = 1 and y2 = −1,

Σ_{j=1..2} [αj α1 yj y1 (xj · x1) + αj α2 yj y2 (xj · x2)] =
α1²(x1 · x1) + α2²(x2 · x2) − 2α1α2(x1 · x2) (37)
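As a sketch of how (31)-(35) could be handed to quadprog (which minimizes, so the sign of the objective is flipped; the data is that of the example below):

% Sketch: SVM dual (31)-(33) solved with quadprog (Optimization Toolbox).
X = [3 1; 1 1];                  % one sample per row: x1 = (3,1), x2 = (1,1)
y = [1; -1];                     % class labels
H = (y*y') .* (X*X');            % H(i,j) = yi*yj*(xi . xj)
f = -ones(2, 1);                 % quadprog minimizes (1/2)a'*H*a + f'*a
Aeq = y';  beq = 0;              % constraint (33): sum(alpha_i*y_i) = 0
lb  = zeros(2, 1);               % constraint (32): alpha >= 0
alpha = quadprog(H, f, [], [], Aeq, beq, lb, []);
w  = X'*(alpha.*y);              % equation (34)
sv = find(alpha > 1e-6);         % nonzero alphas mark the support vectors
b  = y(sv(1)) - X(sv(1),:)*w;    % equation (35) at one support vector
% Expected here: alpha = [0.5; 0.5], w = [1; 0], b = -2, as derived next.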

Example 1

We consider the example from earlier, where

x1 = [3; 1] (38)
x2 = [1; 1] (39)

The dot products give

x1 · x1 = 3 × 3 + 1 × 1 = 10 (40)
x2 · x2 = 2 (41)
x1 · x2 = 4 (42)

The cost function is given by

Q = α1 + α2 − [5α1² + α2² − 4α1α2] (43)

subject to

α1, α2 ≥ 0 (44)

Constraint (33) gives α1y1 + α2y2 = α1 − α2 = 0, hence

α1 = α2 (45)

Substituting into Q:

Q = 2α1 − 2α1² (46)

and

∂Q/∂α1 = 2 − 4α1 = 0 (47)
α1 = 0.5 = α2 (48)

w = α1y1x1 + α2y2x2 (49)
  = 0.5·[3; 1] + 0.5·(−1)·[1; 1] = [1; 0] (50)


Fig. 13. Decision lines for the example, using SVM and using NN (data points (3, 1) and (1, 1))

b = 1 − [α1y1(x1 · x1) + α2y2(x2 · x1)] (51)
  = 1 − [0.5(1)(10) + 0.5(−1)(4)] (52)
  = −2 (53)

The discrimination function is

w1x1 + w2x2 + b = 0 (54)
x1 + 0 − 2 = 0 (55)
x1 = 2 (56)

which is a vertical line. The result is shown in figure 13. The margin width can be easily verified:

margin = 2/||w|| = 2/1 = 2 (57)

The Kernel trick

Figure 14 shows a particular case where the data is not linearly separable. One possibility is to map the data through a nonlinear function φ into a space where it becomes linearly separable; evaluating the needed dot products in that space directly through a kernel function is called the kernel trick.
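As an illustrative sketch (the data and the quadratic map below are assumptions, not taken from figure 14), a nonlinear map φ(x) = (x, x²) can make 1-D data separable by a line in the lifted space:

% Sketch: lifting non-separable 1-D data with phi(x) = (x, x^2).
x = [-3 -2 2 3 -0.5 0 0.5];      % no single threshold on x separates these
y = [-1 -1 -1 -1 1 1 1];         % class labels
Z = [x; x.^2]';                  % lifted 2-D feature vectors
plot(Z(y==1,1),  Z(y==1,2),  'bo'); hold on;
plot(Z(y==-1,1), Z(y==-1,2), 'rs');
plot([-4 4], [2 2], 'k--');      % a separating line x2 = 2 in lifted space
xlabel('x'); ylabel('x^2');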

Fig. 14. The Kernel trick
