Pattern Recognition Linear Classifier by Zaheer Ahmad
Pattern Recognition: Linear Classifiers
Zaheer Ahmad, PhD Scholar
[email protected]
Department of Computer Science, University of Peshawar
Agenda
• Pattern Recognition
– Features and Patterns
– Classifiers
– Approaches
– Design Cycle
• Linear Classification
– Linear Discriminant Functions
– Linear Separability
– Fisher Discriminant Functions
– Support Vector Machines (SVMs)
What is pattern recognition?
• “The assignment of a physical object or event to one of several pre-specified categories” –Duda and Hart
• “The science that concerns the description or classification (recognition) of measurements” –Schalkoff
• “The process of giving names to observations x” –Schürmann
• Pattern Recognition is concerned with answering the question “What is this?” –Morse
Applications of PR
• Image processing
• Computer vision
• Speech recognition
• Data mining
• Automated target recognition
• Optical character recognition
• Seismic analysis
• Man and machine diagnostics
• Fingerprint identification
• Industrial inspection
• Financial forecasting
• Medical diagnosis
• ECG signal analysis
Terminology
• Recognition: during recognition (or classification), given objects are assigned to prescribed classes.
• Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
• An algorithm that implements classification, especially in a concrete implementation, is known as a classifier: a machine which performs classification.
Features
• A feature is any distinctive aspect, quality, or characteristic of an object.
• Features may be symbolic (e.g., color) or numeric (e.g., height).
• The combination of 𝑑 features is a 𝑑-dimensional column vector called a feature vector.
• The 𝑑-dimensional space defined by the feature vector is called the feature space.
– Objects are represented as points in feature space; the result is a scatter plot.
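The feature-vector idea above can be sketched in code. A minimal Python sketch; the feature names (height, weight) and their values are illustrative, not from the slides:

```python
# Sketch: objects represented as points in a d-dimensional feature space.
# The features (height, weight) and values are made-up illustrations.

def to_feature_vector(obj):
    """Map one measured object to a d-dimensional feature vector (a list)."""
    return [obj["height"], obj["weight"]]

samples = [
    {"height": 1.7, "weight": 65.0},
    {"height": 1.6, "weight": 52.0},
]
vectors = [to_feature_vector(s) for s in samples]
d = len(vectors[0])  # dimensionality of the feature space
```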
Features
What makes a “good” feature vector?
• The quality of a feature vector is related to its ability to discriminate examples from different classes:
– Examples from the same class should have similar feature values.
– Examples from different classes should have different feature values.
More feature properties
Pattern and Pattern Class
• A pattern is an object, process, or event that can be given a name.
• A pattern is a composite of traits or features characteristic of an individual.
• In classification tasks, a pattern is a pair of variables {𝑥, 𝜔} where
– 𝑥 is a collection of observations or features (feature vector)
– 𝜔 is the concept behind the observation (label/category)
• A pattern class (or category) is a set of patterns sharing common attributes, usually originating from the same source; that is, a set of objects having some important properties in common.
Decision Boundary/Surface
• A line or curve separating the classes is a decision boundary.
• The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2.
• When g(x) is linear, the decision surface is a hyperplane.
• If x1 and x2 are both on the hyperplane, then wᵀx1 + w0 = wᵀx2 + w0 = 0, so wᵀ(x1 − x2) = 0: the weight vector w is normal to any vector lying in the hyperplane.
Decision Boundary
Slope-intercept form of a line (straight line): the equation of a line with a defined slope m can also be written as y = mx + b.
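To connect the two views: in two dimensions, a linear decision boundary w1·x1 + w2·x2 + w0 = 0 can be rewritten in the slope-intercept form above. A minimal sketch, with made-up weight values, assuming w2 ≠ 0:

```python
# Sketch: a 2-D linear decision boundary w1*x1 + w2*x2 + w0 = 0 rewritten in
# slope-intercept form x2 = m*x1 + b (valid only when w2 != 0).
# The weight values below are illustrative.

def boundary_to_slope_intercept(w1, w2, w0):
    m = -w1 / w2  # slope of the boundary line
    b = -w0 / w2  # intercept on the x2 axis
    return m, b

# Boundary 2*x1 + 4*x2 - 8 = 0 becomes x2 = -0.5*x1 + 2
m, b = boundary_to_slope_intercept(2.0, 4.0, -8.0)
```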
Classifiers
• The task of a classifier is to partition feature space into class-labeled decision regions.
• Borders between decision regions are called decision boundaries.
• The classification of a feature vector 𝑥 consists of determining which decision region it belongs to, and assigning 𝑥 to this class.
Pattern recognition approaches
Statistical
• Patterns are classified based on an underlying statistical model of the features.
– The statistical model is defined by a family of class-conditional probability density functions p(𝑥|𝜔), the probability of feature vector 𝑥 given class 𝜔.
Neural
• Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern).
– “Knowledge” is stored in the connectivity and strength of the synaptic weights.
– Trainable, non-algorithmic, black-box strategy.
• Very attractive since
– it requires minimal a priori knowledge
– with enough layers and neurons, ANNs can create any complex decision region
Syntactic
• Patterns are classified based on measures of structural similarity.
• “Knowledge” is represented by means of formal grammars or relational descriptions (graphs).
• Used not only for classification, but also for description.
• Typically, syntactic approaches formulate hierarchical descriptions of complex patterns built up from simpler subpatterns.
The pattern recognition design cycle
Data collection
• Probably the most time-intensive component of a PR project
• How many examples are enough?
Feature choice
• Critical to the success of the PR problem
– “Garbage in, garbage out”
• Requires basic prior knowledge
Model choice
• Statistical, neural, and structural approaches
• Parameter settings
![Page 18: Pattern Recognition Linear Classifier by Zaheer Ahmad](https://reader033.vdocuments.site/reader033/viewer/2022052504/55364b564a7959e81d8b4918/html5/thumbnails/18.jpg)
Training
• Given a feature set and a “blank” model, adapt the model to explain the data
• Supervised, unsupervised, and reinforcement learning
Evaluation
• How well does the trained model do?
• Overfitting vs. generalization
![Page 19: Pattern Recognition Linear Classifier by Zaheer Ahmad](https://reader033.vdocuments.site/reader033/viewer/2022052504/55364b564a7959e81d8b4918/html5/thumbnails/19.jpg)
Linear Classification
• Classification in which the decision boundary in the feature (input) space is linear.
• In linear classification the input space is split by hyperplanes into regions, each with an assigned class.
![Page 20: Pattern Recognition Linear Classifier by Zaheer Ahmad](https://reader033.vdocuments.site/reader033/viewer/2022052504/55364b564a7959e81d8b4918/html5/thumbnails/20.jpg)
Linearly Separable
• If a hyperplanar decision boundary exists that correctly classifies all the training samples for a c = 2 class problem, the samples are said to be linearly separable.
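One way to illustrate linear separability in code is the classic perceptron rule (not on this slide): on linearly separable data it reaches zero training errors, since a separating hyperplane exists. A minimal sketch with illustrative data and learning rate:

```python
# Sketch: perceptron training on a tiny 2-class dataset; for linearly
# separable data the loop stops with zero errors. Data are illustrative.

def perceptron(X, y, epochs=100, lr=1.0):
    w = [0.0] * len(X[0])  # weight vector
    w0 = 0.0               # bias
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):  # t is the label, +1 or -1
            g = sum(wi * xi for wi, xi in zip(w, x)) + w0
            if t * g <= 0:      # misclassified: move boundary toward x
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                w0 += lr * t
                errors += 1
        if errors == 0:
            return w, w0, True  # all training samples separated
    return w, w0, False

X = [[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, w0, separable = perceptron(X, y)
```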
Linear Discriminant Function
• A discriminant function that is a linear combination of the components of x is called a linear discriminant function and can be written as
g(x) = wᵀx + w0
where w is the weight vector and w0 is the bias (or threshold weight).
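A minimal sketch of such a discriminant, computing the weighted sum of the components of x plus the bias and assigning class ω1 when it is positive; the weight values and sample point are illustrative:

```python
# Sketch of a linear discriminant g(x) = w^T x + w0; assign class omega1 when
# g(x) > 0 and omega2 otherwise. Weights and the sample point are made up.

def g(x, w, w0):
    """Linear discriminant: weighted sum of features plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(x, w, w0):
    return "omega1" if g(x, w, w0) > 0 else "omega2"

w, w0 = [1.0, -2.0], 0.5
label = classify([3.0, 1.0], w, w0)  # g = 3.0 - 2.0 + 0.5 = 1.5 > 0
```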
Linear Classifiers
• A linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane).
• It is one of the simplest classifiers we can imagine: “separate the two classes using a straight line in feature space”.
• In 2 dimensions the decision boundary is a straight line.
2-Class Data with a Linear Decision Boundary
[Figure: scatter of two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), with a linear decision boundary separating Decision Region 1 from Decision Region 2.]
Data that is Not “Linearly Separable”
[Figure: scatter of two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2) that is not linearly separable; the linear decision boundary between Decision Region 1 and Decision Region 2 misclassifies some samples.]
Fisher’s linear discriminant
• A simple linear discriminant function is a projection of the data down to 1-D.
– So choose the projection that gives the best separation of the classes.
• An obvious direction to choose is the direction of the line joining the class means.
– But if the main direction of variance in each class is not orthogonal to this line, this will not give good separation (see the next figure).
• Fisher’s method chooses the direction that maximizes the ratio of between-class variance to within-class variance.
– This is the direction in which the projected points contain the most information about class membership (under Gaussian assumptions).
• Classes well-separated in D-space may strongly overlap in 1 dimension.
– Adjust the components of the weight vector w.
– Select the projection that maximizes class separation.
• Can be generalized to multiple classes.
A picture showing the advantage of Fisher’s linear discriminant.
When projected onto the line joining the class means, the classes are not well separated.
Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.
Math of Fisher’s linear discriminants
• What linear transformation is best for discrimination?
• The projection onto the vector separating the class means seems sensible:
y = wᵀx,  with  w ∝ m2 − m1
• But we also want small variance within each class:
s1² = Σ_{n∈C1} (y_n − m1)²,  s2² = Σ_{n∈C2} (y_n − m2)²
• Fisher’s objective function is the ratio of between-class to within-class variance:
J(w) = (m2 − m1)² / (s1² + s2²)
(in the projected formulas, m1, m2 and s1², s2² denote the means and variances of the projected classes)
More math of Fisher’s linear discriminants
J(w) = (m2 − m1)² / (s1² + s2²) = (wᵀ S_B w) / (wᵀ S_W w)
with the between-class and within-class scatter matrices
S_B = (m2 − m1)(m2 − m1)ᵀ
S_W = Σ_{n∈C1} (x_n − m1)(x_n − m1)ᵀ + Σ_{n∈C2} (x_n − m2)(x_n − m2)ᵀ
Optimal solution: w ∝ S_W⁻¹ (m2 − m1)
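The closed-form direction w ∝ S_W⁻¹(m2 − m1) can be computed directly for two small 2-D classes. A sketch with illustrative data points and a hand-coded 2×2 solve (no linear-algebra library assumed):

```python
# Sketch: Fisher's direction w = S_W^{-1} (m2 - m1) for two tiny 2-D classes.
# The data points are made up for illustration.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    # within-class scatter: sum of outer products (x - m)(x - m)^T
    S = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

def solve2x2(S, b):
    # solve S w = b using the explicit 2x2 inverse
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [( S[1][1] * b[0] - S[0][1] * b[1]) / det,
            (-S[1][0] * b[0] + S[0][0] * b[1]) / det]

C1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
C2 = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
m1, m2 = mean(C1), mean(C2)
S1, S2 = scatter(C1, m1), scatter(C2, m2)
SW = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]
w = solve2x2(SW, [m2[0] - m1[0], m2[1] - m1[1]])
```

Projecting the data onto w then gives the 1-D representation that Fisher's criterion favors.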
Support Vector Machines (SVMs)
A support vector machine (SVM) is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis.
• An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.
• A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class.
• The larger the margin, the lower the generalization error of the classifier.
A separating hyperplane
[Figure: two classes (yi = +1 and yi = −1) in the (x1, x2) plane, separated by the hyperplane w·x + b = 0.]
But there are many possibilities for such hyperplanes!
Separating Hyperplanes
[Figure: the same two classes (yi = +1 and yi = −1) with several candidate separating hyperplanes.]
Yes, there are many possible separating hyperplanes: it could be this one, or this, or this, or maybe...! Which one should we choose?
Choosing a separating hyperplane:
• The hyperplane should be as far as possible from any sample point.
• This way, new data x′ that is close to the old samples xi will be classified correctly.
Good generalization!
Choosing a separating hyperplane. The SVM approach: linearly separable case
• The SVM idea is to maximize the distance between the hyperplane and the closest sample point.
• In the optimal hyperplane, the distance to the closest negative point equals the distance to the closest positive point.
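The quantity being maximized can be made concrete: the geometric distance from a sample x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖, and the margin is the smallest such distance over the training set. A minimal sketch with an illustrative hyperplane and points (not an SVM solver):

```python
# Sketch: geometric distance from points to a hyperplane w.x + b = 0, and the
# margin as the smallest distance. Hyperplane and points are illustrative.
import math

def distance(x, w, b):
    num = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return num / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -5.0  # hyperplane 3*x1 + 4*x2 - 5 = 0, with ||w|| = 5
points = [[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]
margin = min(distance(x, w, b) for x in points)
```

An SVM chooses w and b to maximize this margin over all hyperplanes that separate the training classes.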
Choosing a separating hyperplane. The SVM approach: linearly separable case
[Figure: the maximum-margin hyperplane, with the closest samples xi of each class at equal distance d on either side; the band between them is the margin.]
These are support vectors: support vectors are the samples closest to the separating hyperplane.
Thank You