Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University of Pittsburgh Computer Science

Upload: gregory-shields

Post on 11-Jan-2016


TRANSCRIPT

Page 1: Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University

Methods in Medical Image Analysis

Statistics of Pattern Recognition: Classification and Clustering

Some content provided by Milos Hauskrecht, University of Pittsburgh Computer Science

Page 2:

ITK Questions?

Page 3:

Classification

Page 4:

Classification

Page 5:

Classification

Page 6:

Features

• Loosely stated, a feature is a value describing something about your data points (e.g. for pixels: intensity, local gradient, distance from landmark, etc.)

• Multiple (n) features are put together to form a feature vector, which defines a data point’s location in n-dimensional feature space
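As a minimal sketch of the idea above, assembling per-pixel values into a feature vector (the helper name and the crude one-neighbor gradient are illustrative, not from the slides):

```python
# Build a 3-D feature vector for each pixel: intensity, local gradient
# magnitude, and distance from a landmark. Names here are illustrative.
import math

def feature_vector(intensity, neighbor_intensity, pixel_xy, landmark_xy):
    """Return (intensity, gradient, landmark distance) for one pixel."""
    gradient = abs(intensity - neighbor_intensity)  # crude local gradient
    dist = math.dist(pixel_xy, landmark_xy)         # Euclidean distance
    return (intensity, gradient, dist)

v = feature_vector(120, 110, (3, 4), (0, 0))
# v is a point in 3-dimensional feature space
```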

Page 7:

Feature Space

• Feature Space

– The theoretical n-dimensional space occupied by n input raster objects (features).

– Each feature represents one dimension, and its values represent positions along one of the orthogonal coordinate axes in feature space.

– The set of feature values belonging to a data point defines a vector in feature space.

Page 8:

Statistical Notation

• Class probability distribution:

p(x,y) = p(x | y) p(y)

x: feature vector {x1, x2, x3, …, xn}

y: class

p(x | y): probability of x given y

p(x, y): probability of both x and y
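A tiny discrete illustration of the identity p(x, y) = p(x | y) p(y), with made-up numbers (a two-class problem over a binary feature):

```python
# p(x, y) = p(x | y) p(y), checked on a toy discrete distribution.
p_y = {0: 0.7, 1: 0.3}                        # priors
p_x_given_y = {0: {"low": 0.9, "high": 0.1},  # class-conditional distributions
               1: {"low": 0.2, "high": 0.8}}

def joint(x, y):
    return p_x_given_y[y][x] * p_y[y]

# Marginalizing the joint over y recovers p(x):
p_x_high = joint("high", 0) + joint("high", 1)
```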

Page 9:

Example: Binary Classification

Page 10:

Example: Binary Classification

• Two class-conditional distributions:

p(x | y = 0) p(x | y = 1)

• Priors:

p(y = 0) + p(y = 1) = 1

Page 11:

Modeling Class Densities

• In the text, they choose to concentrate on methods that use Gaussians to model class densities

Page 12:

Modeling Class Densities

Page 13:

Generative Approach to Classification

1. Represent and learn the distribution:

p(x,y)

2. Use it to define probabilistic discriminant functions, e.g.

g0(x) = p(y = 0 | x)

g1(x) = p(y = 1 | x)

Page 14:

Generative Approach to Classification

Typical model:

p(x,y) = p(x | y) p(y)

p(x | y) = Class-conditional distributions (densities)

p(y) = Priors of classes (probability of class y)

We Want:

p(y | x) = Posteriors of classes

Page 15:

Class Modeling

• We model the class distributions as multivariate Gaussians

x ~ N(μ0, Σ0) for y = 0

x ~ N(μ1, Σ1) for y = 1

• Priors are based on training data, or a distribution can be chosen that is expected to fit the data well (e.g. Bernoulli distribution for a coin flip)
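As a sketch of estimating the class models and priors from training data, here is a 1-D version (maximum-likelihood estimates; the data values are made up for illustration):

```python
# Fit a 1-D Gaussian per class and class priors from labeled training data.
import math

def fit_class_models(data):
    """data: list of (x, y) pairs; returns {y: (mu, sigma)} and priors {y: p(y)}."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    params, priors = {}, {}
    for y, xs in by_class.items():
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[y] = (mu, math.sqrt(var))
        priors[y] = len(xs) / len(data)
    return params, priors

params, priors = fit_class_models([(0.9, 0), (1.1, 0), (2.9, 1), (3.1, 1)])
# params[0] is roughly (1.0, 0.1); each prior is 0.5
```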

Page 16:

Making a class decision

• We need to define discriminant functions ( gn(x) )

• We have two basic choices:

– Likelihood of data – choose the class (Gaussian) that best explains the input data (x)

– Posterior of class – choose the class with the better posterior probability

Page 17:

Calculating Posteriors

• Use Bayes’ Rule:

P(A | B) = P(B | A) P(A) / P(B)

• In this case:

p(y | x) = p(x | y) p(y) / p(x)
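The posterior computation can be sketched in a few lines for 1-D Gaussian class-conditionals (the parameter values are made up for illustration):

```python
# Posterior of each class from 1-D Gaussian class-conditionals and priors,
# via Bayes' rule.
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, params, priors):
    """params[y] = (mu, sigma); returns p(y | x) for each class y."""
    weighted = {y: gauss(x, *params[y]) * priors[y] for y in priors}
    evidence = sum(weighted.values())  # p(x), the normalizer
    return {y: weighted[y] / evidence for y in priors}

post = posterior(1.0, {0: (0.0, 1.0), 1: (3.0, 1.0)}, {0: 0.5, 1: 0.5})
# the posteriors sum to 1; class 0 is more probable at x = 1.0
```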

Page 18:

Linear Decision Boundary

• When covariances are the same

Page 19:

Linear Decision Boundary

Page 20:

Linear Decision Boundary

Page 21:

Quadratic Decision Boundary

• When covariances are different
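A 1-D numeric sketch of why the boundary shape depends on the covariances: with equal variances the quadratic terms in the log-odds cancel, leaving a function linear in x (so its slope is constant); with unequal variances an x² term survives, giving a quadratic boundary.

```python
# With equal variances, g1(x) - g0(x) (the log-odds) is linear in x:
# its increment per unit step in x is constant.
import math

def log_gauss(x, mu, sigma):
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma)

def log_odds(x, mu0, s0, mu1, s1):
    return log_gauss(x, mu1, s1) - log_gauss(x, mu0, s0)

# Equal variances (s0 = s1 = 1): increments of the log-odds are constant.
d1 = log_odds(1.0, 0, 1, 2, 1) - log_odds(0.0, 0, 1, 2, 1)
d2 = log_odds(2.0, 0, 1, 2, 1) - log_odds(1.0, 0, 1, 2, 1)
# d1 == d2, because the x**2 terms cancel

# Unequal variances (s0 = 1, s1 = 2): increments differ, so the
# log-odds is quadratic in x.
q1 = log_odds(1.0, 0, 1, 2, 2) - log_odds(0.0, 0, 1, 2, 2)
q2 = log_odds(2.0, 0, 1, 2, 2) - log_odds(1.0, 0, 1, 2, 2)
```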

Page 22:

Quadratic Decision Boundary

Page 23:

Quadratic Decision Boundary

Page 24:

Clustering

• Basic Clustering Problem:

– Distribute data into k different groups such that data points similar to each other are in the same group

– Similarity between points is defined in terms of some distance metric

• Clustering is useful for:

– Similarity/Dissimilarity analysis
• Analyze which data points in the sample are close to each other

– Dimensionality Reduction
• High-dimensional data replaced with a group (cluster) label

Page 25:

Clustering

Page 26:

Clustering

Page 27:

Distance Metrics

• Euclidean Distance, in some space (for our purposes, probably a feature space)

• Must fulfill three properties:

Page 28:

Distance Metrics

• Common simple metrics:

– Euclidean:

– Manhattan:

• Both work for an arbitrary k-dimensional space
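Both metrics can be written for arbitrary k-dimensional points in a few lines:

```python
# Euclidean and Manhattan distances for arbitrary k-dimensional points.
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# euclidean((0, 0), (3, 4)) -> 5.0 ; manhattan((0, 0), (3, 4)) -> 7
```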

Page 29:

Clustering Algorithms

• k-Nearest Neighbor

• k-Means

• Parzen Windows

Page 30:

k-Nearest Neighbor

• In essence, a classifier

• Requires input parameter k
– In this algorithm, k indicates the number of neighboring points to take into account when classifying a data point

• Requires training data

Page 31:

k-Nearest Neighbor Algorithm

• For each data point xn, choose its class by finding the most prominent class among the k nearest data points in the training set

• Use any distance measure (usually a Euclidean distance measure)
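The algorithm above can be sketched as follows (Euclidean distance and a small made-up training set for illustration):

```python
# Minimal k-nearest-neighbor classifier over a labeled training set,
# using Euclidean distance and a majority vote among the k closest points.
import math
from collections import Counter

def knn_classify(x, training, k):
    """training: list of (point, label) pairs; x: query point."""
    nearest = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # most prominent class

train = [((0, 0), "neg"), ((0, 1), "neg"), ((1, 0), "neg"),
         ((5, 5), "pos"), ((5, 6), "pos")]
# knn_classify((4, 5), train, 3) classifies by the 3 nearest neighbors
```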

Page 32:

k-Nearest Neighbor Algorithm

[Figure: positive (+) and negative (−) training points around a query point q1. 1-nearest neighbor: the concept represented by e1. 5-nearest neighbors: q1 is classified as negative.]

Page 33:

k-Nearest Neighbor

• Advantages:
– Simple
– General (can work for any distance measure you want)

• Disadvantages:
– Requires well-classified training data
– Can be sensitive to the k value chosen
– All attributes are used in classification, even ones that may be irrelevant
– Inductive bias: we assume that a data point should be classified the same as points near it

Page 34:

k-Means

• Suitable only when data points have continuous values

• Groups are defined in terms of cluster centers (means)

• Requires input parameter k
– In this algorithm, k indicates the number of clusters to be created

• Guaranteed to converge to at least a local optimum

Page 35:

k-Means Algorithm

• Algorithm:
1. Randomly initialize k mean values
2. Repeat the next two steps until there is no change in the means:
   1. Partition the data using a similarity measure according to the current means
   2. Move each mean to the center of the data in its current partition
3. Stop when there is no change in the means
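The steps above can be sketched for 1-D data (the data values and starting means are made up for illustration):

```python
# Minimal 1-D k-means: partition points by the nearest mean, move each mean
# to the center of its partition, repeat until the means stop changing.
def kmeans(points, means):
    while True:
        # 1. Partition the data according to the current means
        clusters = [[] for _ in means]
        for p in points:
            i = min(range(len(means)), key=lambda j: abs(p - means[j]))
            clusters[i].append(p)
        # 2. Move each mean to the center of its partition
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new == means:  # 3. Stop when no change in the means
            return new, clusters
        means = new

means, clusters = kmeans([1.0, 2.0, 9.0, 10.0], [0.0, 5.0])
# converges to means 1.5 and 9.5
```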

Page 36:

k-Means

Page 37:

k-Means

• Advantages:
– Simple
– General (can work for any distance measure you want)
– Requires no training phase

• Disadvantages:
– Result is very sensitive to initial mean placement
– Can perform poorly on overlapping regions
– Doesn't work on features with non-continuous values (can't compute cluster means)
– Inductive bias: we assume that a data point should be classified the same as points near it

Page 38:

Parzen Windows

• Similar to k-Nearest Neighbor, but instead of using the k closest training data points, it uses all points within a kernel (window), weighting their contribution to the classification based on the kernel

• As with our classification algorithms, we will consider a Gaussian kernel as the window

Page 39:

Parzen Windows

• Assume a region defined by a d-dimensional Gaussian of scale σ

• We can define a window density function:

• Note that we consider all points in the training set, but if a point is far outside of the kernel, its weight will be effectively 0, negating its influence

p(x, σ) = (1/|S|) Σⱼ G(x − xⱼ, σ²)

where S is the training set, the sum runs over all training points xⱼ in S, and G(·, σ²) is a Gaussian of scale σ.
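A 1-D sketch of this window density function (the sample values and σ are made up for illustration):

```python
# Parzen-window density estimate with a Gaussian kernel of scale sigma:
# every training point contributes, weighted by its distance to x.
import math

def parzen_density(x, samples, sigma):
    """1-D kernel density estimate at x from a list of training samples."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    k = sum(math.exp(-((x - s) ** 2) / (2 * sigma ** 2)) for s in samples)
    return norm * k / len(samples)

samples = [0.0, 0.1, 0.2, 5.0]
# the estimated density is much higher near the cluster at 0
# than in the empty region around x = 2.5
```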

Page 40:

Parzen Windows

Page 41:

Parzen Windows

• Advantages:
– More robust than k-nearest neighbor
– Excellent accuracy and consistency

• Disadvantages:
– How to choose the size of the window?
– Alone, kernel density estimation techniques provide little insight into data or problems