03/03/2016
EE-002
Computational Learning &
Pattern Recognition
Turgay IBRIKCI
Çukurova University
Electrical-Electronics Engineering Department
Where or how to find me?
Associate Prof. Dr. Turgay IBRIKCI
Room #305, Thursdays 9:30-12:00
(322) 338 6868 / 139
turgayibrikci@hotmail.com
Course Outline
The course is divided into two parts: theory and practice.
1. Theory covers basic topics in pattern recognition theory and applications with computational learning.
2. Practice deals with the basics of MATLAB and the implementation of pattern recognition algorithms. We assume you already know MATLAB or will learn it on your own.
Course Grading
Grading the Class:
Project 40% (report + presentation; Week 14, 20 mins)
Final Exam 20% (Week 15 -- we decide together)
Homework 40% (at least 4 assignments)
Full attendance 10% (bonus)
In This Course
How should objects to be classified be represented?
What algorithms can be used for recognition (or matching)?
How should learning (training) be done?
Many of the topics concern statistical classification methods.
They include generative methods such as those based on Bayes decision theory and related techniques of parameter estimation and density estimation.
Apply the algorithms with MATLAB
What is pattern recognition?
A pattern is an object, process or event that can be given a name.
A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
During recognition (or classification) given objects are assigned to prescribed classes.
A classifier is a machine which performs classification.
“The assignment of a physical object or event to one of several prespecified categories” -- Duda & Hart
Examples of applications
• Optical Character Recognition (OCR)
  - Handwritten: sorting letters by postal code, input device for PDAs.
  - Printed texts: reading machines for blind people, digitization of text documents.
• Biometrics
  - Face recognition, verification, retrieval.
  - Fingerprint recognition.
  - Speech recognition.
• Diagnostic systems
  - Medical diagnosis: X-ray, EKG analysis.
  - Machine diagnostics, waste detection.
• Military applications
  - Automated Target Recognition (ATR).
  - Image segmentation and analysis (recognition from aerial or satellite photographs).
What are Patterns?
Laws of Physics & Chemistry generate patterns.
Patterns in Astronomy
Humans tend to see patterns everywhere.
Patterns in Biology
Applications: Biometrics, Computational Anatomy, Brain Mapping.
Patterns of Brain Activity
Relations between brain activity, emotion, cognition, and behaviour.
Variations of Patterns
Patterns vary with expression, lighting, occlusions.
Speech Patterns
Acoustic signals.
Goal of Pattern Recognition
Recognize Patterns. Make decisions about patterns.
Visual Example – is this person happy or sad?
Speech Example – did the speaker say “Yes” or “No”?
Physics Example – is this an atom or a molecule?
Approaches
Statistical PR: based on underlying statistical model of patterns and pattern classes.
Structural (or syntactic) PR: pattern classes are represented by means of formal structures such as grammars, automata, strings, etc.
Neural networks: classifier is represented as a network of cells modeling neurons of the human brain (connectionist approach).
Basic concepts
Feature vector x = (x1, …, xn), x ∈ X:
- a vector of observations (measurements);
- x is a point in the feature space X.

Hidden state y ∈ Y:
- cannot be directly measured;
- patterns with the same hidden state belong to the same class.

Task: to design a classifier (decision rule) q: X → Y which decides about the hidden state based on an observation.
Pattern Example

Task: jockey-hoopster recognition.
The feature vector is x = (x1, x2) = (height, weight), so the feature space is X = R².
The set of hidden states is Y = {H, J}.
Training examples: {(x1, y1), …, (xl, yl)}.

Linear classifier:
q(x) = H if w·x + b ≥ 0
q(x) = J if w·x + b < 0
The decision boundary is the line w·x + b = 0.
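The linear rule above can be sketched in code (Python for illustration, although the course uses MATLAB; the weights w and b below are made-up values, not learned from data):

```python
import numpy as np

# Hypothetical weights and bias for the (height, weight) example;
# in practice they would be learned from the training examples.
w = np.array([0.8, -0.5])
b = -0.2

def q(x):
    """Linear decision rule: H (hoopster) if w.x + b >= 0, else J (jockey)."""
    return "H" if np.dot(w, x) + b >= 0 else "J"

print(q(np.array([2.0, 0.9])))   # tall and light -> "H"
print(q(np.array([0.5, 2.0])))   # short and heavy -> "J"
```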
Example: Salmon versus Sea Bass
Generative methods attempt to model the full appearance of Salmon and Sea Bass.
Discriminative methods extract features sufficient to make the decision (e.g. length and brightness).
Fish Features. Length.
Salmon are usually shorter than Sea Bass.
Fish Features. Lightness.
Sea Bass are usually brighter than Salmon.
Components of PR system

Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
(During training, a teacher and a learning algorithm act on the classifier.)

• Sensors and preprocessing.
• Feature extraction aims to create discriminative features that are good for classification.
• A classifier.
• A teacher provides information about the hidden state -- supervised learning.
• A learning algorithm sets up the PR system from training examples.
Feature extraction
Task: to extract features which are good for classification.
Good features: • Objects from the same class have similar feature values.
• Objects from different classes have different values.
(Illustration: “good” features separate the classes; “bad” features overlap.)
Feature extraction methods
Feature extraction: each new feature is computed from all measurements, (x1, …, xn) ↦ (m1, …, mk) with mi = φi(x).
Feature selection: a subset of the original measurements is kept, (x1, …, xn) ↦ (m1, …, mk) with each mi ∈ {x1, …, xn}.

The problem can be expressed as optimization of the parameters θ of the feature extractor φ(θ).
Supervised methods: the objective function is a criterion of separability (discriminability) of the labeled examples, e.g., linear discriminant analysis (LDA).
Unsupervised methods: a lower-dimensional representation which preserves important characteristics of the input data is sought, e.g., principal component analysis (PCA).
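As a small unsupervised illustration, PCA can be computed directly from the eigendecomposition of the sample covariance matrix (a sketch on synthetic data; all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 samples in 3-D whose variance lies mostly along one direction.
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(100, 3))

# PCA: project centered data onto the top eigenvector of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
components = eigvecs[:, ::-1][:, :1]     # top-1 principal direction
Z = Xc @ components                      # 1-D representation of the data
print(Z.shape)                           # (100, 1)
```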
Classifier
A classifier partitions the feature space X into class-labeled regions X1, X2, …, X|Y| such that
X = X1 ∪ X2 ∪ … ∪ X|Y| and Xi ∩ Xj = ∅ for i ≠ j.

Classification consists of determining to which region a feature vector x belongs. The borders between decision regions are called decision boundaries.
Representation of classifier
A classifier is typically represented as a set of discriminant functions
fi: X → R, i = 1, …, |Y|.
The classifier assigns a feature vector x to the i-th class if
fi(x) > fj(x) for all j ≠ i,
i.e., it outputs the class identifier y = argmax_i fi(x).
(Pipeline: feature vector x → discriminant functions f1(x), …, f|Y|(x) → max → class identifier y.)
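In code, the max-discriminant rule is a simple argmax (the three linear discriminants below are invented for illustration):

```python
import numpy as np

def classify(x, fs):
    """Assign x to the class whose discriminant function is largest."""
    scores = [f(x) for f in fs]
    return int(np.argmax(scores))

# Hypothetical linear discriminants f_i(x) = w_i . x + b_i for 3 classes.
fs = [
    lambda x: np.dot([1.0, 0.0], x) + 0.0,
    lambda x: np.dot([0.0, 1.0], x) + 0.0,
    lambda x: np.dot([-1.0, -1.0], x) + 0.5,
]
print(classify(np.array([2.0, 1.0]), fs))   # class 0 wins with score 2.0
```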
An Introduction
Bayesian Decision Theory came long before Version Spaces, Decision Tree Learning and Neural Networks. It was studied in the field of Statistical Theory and more specifically, in the field of
Pattern Recognition.
Bayesian Decision Theory is at the basis of important learning schemes such as the Naïve Bayes Classifier, Learning Bayesian Belief Networks and the EM Algorithm.
Bayesian decision making
• Bayesian decision making is a fundamental statistical approach which allows one to design the optimal classifier when the complete statistical model is known.

Definition:
- Observations x ∈ X
- Hidden states y ∈ Y
- Decisions d ∈ D
- A loss function W: Y × D → R
- A decision rule q: X → D
- A joint probability p(x, y)

Task: to design a decision rule q which minimizes the Bayesian risk
R(q) = Σ_{x∈X} Σ_{y∈Y} p(x, y) W(y, q(x))
Bayes Theorem
Goal: To determine the most probable hypothesis, given
the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H.
Prior probability of h, P(h): it reflects any
background knowledge we have about the chance that h is a correct hypothesis (before having observed the data).
Prior probability of D, P(D): it reflects the
probability that training data D will be observed given no knowledge about which hypothesis h holds.
Conditional Probability of observation D, P(D|h): it denotes the probability of observing data D
given some world in which hypothesis h holds.
Bayes Theorem (Cont’d)
Posterior probability of h, P(h|D): it represents the probability that h holds given the observed training data D. It reflects our confidence that h holds after we have seen the training data D and it is the quantity that Machine Learning researchers are interested in.
Bayes Theorem allows us to compute P(h|D):
P(h|D) = P(D|h) P(h) / P(D)
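A quick numeric check of the theorem (the probabilities are invented for illustration, not taken from the slides):

```python
# Hypothetical numbers: prior P(h) = 0.01, likelihood P(D|h) = 0.9,
# and marginal probability of the data P(D) = 0.1.
P_h, P_D_given_h, P_D = 0.01, 0.9, 0.1

# Bayes theorem: P(h|D) = P(D|h) P(h) / P(D)
P_h_given_D = P_D_given_h * P_h / P_D
print(round(P_h_given_D, 4))   # 0.9 * 0.01 / 0.1 = 0.09
```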
Bayesian Belief Networks
The Bayes Optimal Classifier is often too costly to apply.
The Naïve Bayes Classifier uses the conditional independence assumption to defray these costs. However, in many cases, such an assumption is overly restrictive.
Bayesian belief networks provide an intermediate approach which allows stating conditional independence assumptions that apply to subsets of the variables.
Representation in Bayesian Belief Networks
(Example network with nodes: Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire.)
Each node is asserted to be conditionally independent of its non-descendants, given its immediate parents
Associated with each node is a conditional probability table, which specifies the conditional distribution for the variable given its immediate parents in the graph
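The joint distribution then factors as a product of each node's CPT entry given its parents. A minimal sketch for three of the variables, assuming (as in Mitchell's textbook example) that Campfire's parents are Storm and BusTourGroup; all probability values are made up:

```python
# Priors for the root nodes (invented numbers).
P_storm = {True: 0.2, False: 0.8}
P_bus = {True: 0.5, False: 0.5}
# Conditional probability table: P(Campfire=True | Storm, BusTourGroup)
P_campfire_true = {
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def joint(storm, bus, campfire):
    """P(Storm, BusTourGroup, Campfire) = P(S) * P(B) * P(C | S, B)."""
    p_c = P_campfire_true[(storm, bus)]
    return P_storm[storm] * P_bus[bus] * (p_c if campfire else 1 - p_c)

print(round(joint(True, True, True), 4))   # 0.2 * 0.5 * 0.4 = 0.04
```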
Inference in Bayesian Belief Networks
A Bayesian Network can be used to compute the probability distribution for any subset of network variables given the values or distributions for any subset of the remaining variables.
Unfortunately, exact inference of probabilities in general for an arbitrary Bayesian Network is known to be NP-hard.
In theory, approximate techniques (such as Monte Carlo Methods) can also be NP-hard, though in practice, many such methods were shown to be useful.
Example of Bayesian task
Task: minimization of the classification error.
The set of decisions D is the same as the set of hidden states Y.
The 0/1 loss function is used:
W(y, q(x)) = 0 if q(x) = y, and W(y, q(x)) = 1 if q(x) ≠ y.
The Bayesian risk R(q) then corresponds to the probability of misclassification.
The solution of the Bayesian task is
q* = argmin_q R(q),   q*(x) = argmax_y p(y|x) = argmax_y p(x|y) p(y) / p(x).
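Once p(x|y) and p(y) are known, the rule argmax_y p(x|y) p(y) can be coded directly (a toy sketch: two classes with invented 1-D Gaussian class-conditionals):

```python
import math

# Invented model: priors p(y) and Gaussian class-conditionals p(x|y).
priors = {"A": 0.3, "B": 0.7}
params = {"A": (0.0, 1.0), "B": (3.0, 1.0)}   # (mean, std) per class

def gauss(x, mu, sigma):
    """1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def q_star(x):
    """argmax_y p(x|y) p(y) -- the minimum-error Bayes rule (p(x) cancels)."""
    return max(priors, key=lambda y: gauss(x, *params[y]) * priors[y])

print(q_star(0.2), q_star(2.9))   # "A" near 0, "B" near 3
```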
Limitations of Bayesian approach
• The statistical model p(x, y) is mostly not known; therefore learning must be employed to estimate p(x, y) from the training examples {(x1, y1), …, (xl, yl)} -- plug-in Bayes.
• Non-Bayesian methods offer further task formulations:
  • Only a partial statistical model is available:
    • p(y) is not known or does not exist;
    • p(x|y) is influenced by a non-random intervention;
    • the loss function is not defined.
  • Examples: Neyman-Pearson's task, Minimax task, etc.
Discriminative approaches
Given a class of classification rules q(x; θ) parametrized by θ, the task is to find the “best” parameter θ* based on a set of training examples {(x1, y1), …, (xl, yl)} -- supervised learning.
The task of learning is to recognize which classification rule should be used.
How the learning is performed is determined by a selected inductive principle.
Learning Theory
Both Generative and Discriminative methods require training data to learn the models/features/decision rules.
Machine Learning concentrates on learning discrimination rules.
Key Issue: do we have enough training data to learn?
Empirical risk minimization principle

The true expected risk R(q) is approximated by the empirical risk
R_emp(q(x; θ)) = (1/l) Σ_{i=1}^{l} W(y_i, q(x_i; θ))
with respect to a given labeled training set {(x1, y1), …, (xl, yl)}.

Learning based on the empirical risk minimization principle is defined as
θ* = argmin_θ R_emp(q(x; θ))

Examples of algorithms: Perceptron, back-propagation, etc.
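With the 0/1 loss, the empirical risk is simply the fraction of misclassified training examples (a sketch with an invented threshold rule and data):

```python
def emp_risk(q, samples):
    """R_emp = (1/l) * sum_i W(y_i, q(x_i)) under the 0/1 loss."""
    return sum(1 for x, y in samples if q(x) != y) / len(samples)

# Hypothetical threshold rule and labeled training set.
q = lambda x: 1 if x >= 0.5 else 0
samples = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.45, 1)]
print(emp_risk(q, samples))   # one mistake out of five -> 0.2
```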
Overfitting and underfitting
Problem: how rich a class of classification rules q(x; θ) to use.
(Illustration: underfitting vs. good fit vs. overfitting.)
The problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
Structural risk minimization principle

An upper bound on the expected risk of a classification rule q ∈ Q:
R(q) ≤ R_emp(q) + R_str(l, h, log(1/δ)),
where l is the number of training examples, h is the VC-dimension of the class of functions Q, and 1 − δ is the confidence of the upper bound.

SRM principle: from given nested function classes Q1 ⊂ Q2 ⊂ … ⊂ Qm, such that h1 ≤ h2 ≤ … ≤ hm, select a rule q* which minimizes the upper bound on the expected risk.

Statistical learning theory -- Vapnik & Chervonenkis
Machine Learning is…
Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Machine Learning is…
Machine learning is programming computers to optimize a performance criterion using example data or past experience. -- Ethem Alpaydin
The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest. -- Kevin P. Murphy
The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions. -- Christopher M. Bishop
Machine Learning is…
Machine learning is about predicting the future based on the past. -- Hal Daume III
(Diagram: training data (the past) is used to build a model/predictor, which is then applied to testing data (the future).)
Supervised learning

Supervised learning: given labeled examples, train a model/predictor; the model is then used to predict the label of a new example.
(Figures: a set of labeled examples; the model/predictor trained from them; the predicted label produced for a new example.)
Supervised learning: classification

Supervised learning: given labeled examples (e.g., images labeled “apple” or “banana”).
Classification: the label comes from a finite set of labels.
Classification Example
Differentiate between low-risk and high-risk customers from their income and savings
Supervised learning: regression

Supervised learning: given labeled examples (e.g., labels -4.5, 10.1, 3.2, 4.3).
Regression: the label is real-valued.
Regression Example
Price of a used car:
x: car attributes (e.g., mileage)
y: price
y = w x + w0
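The weights w and w0 are typically fit by least squares; a sketch with invented car data (x = mileage in 10k km, y = price in $1000):

```python
import numpy as np

# Hypothetical used-car data: higher mileage, lower price.
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([18.0, 15.0, 11.0, 8.0, 4.0])

# Least-squares fit of y = w*x + w0 via the normal equations.
A = np.column_stack([x, np.ones_like(x)])
w, w0 = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(w, 2), round(w0, 2))   # w ≈ -1.75, w0 ≈ 19.95
```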
Regression Applications
• Economics/Finance: predict the value of a stock
• Epidemiology
• Car/plane navigation: angle of the steering wheel, acceleration, …
• Temporal trends: weather over time, …
Supervised learning: ranking

Supervised learning: given labeled examples (e.g., labels 1, 4, 2, 3).
Ranking: the label is a ranking.
Ranking example
Given a query and a set of web pages, rank them according to relevance.
Unsupervised learning
Unsupervised learning: given data, i.e. examples, but no labels.
Unsupervised learning applications

Learn clusters/groups without any labels:
• customer segmentation (i.e., grouping)
• image compression
• bioinformatics: learn motifs
• …
Unsupervised learning
Input: training examples {x1, …, xl} without information about the hidden state.
Clustering: the goal is to find clusters of data sharing similar properties.

(Diagram: a supervised learning algorithm L: (X × Y)^l → Θ maps labeled examples {x1, …, xl}, {y1, …, yl} to parameters θ of a classifier q: X × Θ → Y; an unsupervised algorithm must produce θ from {x1, …, xl} alone.)
A broad class of unsupervised learning algorithms:
Example of unsupervised learning algorithm

k-means clustering:

Classifier:
q(x) = argmin_{i=1,…,k} ||x − m_i||

The goal is to minimize
Σ_{i=1}^{l} ||x_i − m_{q(x_i)}||²

Learning algorithm (mean update):
m_i = (1/|I_i|) Σ_{j∈I_i} x_j,   where I_i = {j : q(x_j) = i}

The algorithm alternates between assigning the examples {x1, …, xl} to the nearest of the current means θ = {m1, …, mk} and re-estimating each mean from its assigned examples.
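The alternating classify/update steps can be sketched as (Python for illustration, although the course uses MATLAB; the toy data are invented):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate q(x) = argmin_i ||x - m_i|| and mean updates."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]      # initial means
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        labels = d.argmin(axis=1)                         # assignment step
        for i in range(k):
            if np.any(labels == i):
                m[i] = X[labels == i].mean(axis=0)        # mean update step
    return m, labels

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
m, labels = kmeans(X, k=2)
print(labels)
```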
References
Books
Theodoridis, Koutroumbas: Pattern Recognition. 4th edition, 2004.
Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982 (2nd edition, 2000).
Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990.
Bishop: Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1997.
Schlesinger, Hlaváč: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002.
Journals
Journal of the Pattern Recognition Society.
IEEE Transactions on Neural Networks.
Pattern Recognition and Machine Learning.
Slides: Vojtěch Franc