CZ5225: Modeling and Simulation in Biology
Lecture 7: Microarray Class Classification by Machine Learning Methods
Prof. Chen Yu Zong
Tel: 6874-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 07-24, level 8, S16
National University of Singapore
Machine Learning Method

Inductive learning: example-based learning
• Descriptor
• Positive examples
• Negative examples
Machine Learning Method

Feature vectors (descriptors):
A = (1, 1, 1), B = (0, 1, 1), C = (1, 1, 1), D = (0, 1, 1), E = (0, 0, 0), F = (1, 0, 1)

[Figure: positive and negative examples, each encoded as a feature vector.]
Machine Learning Method

Feature vectors in input space:
A = (1, 1, 1), B = (0, 1, 1), C = (1, 1, 1), D = (0, 1, 1), E = (0, 0, 0), F = (1, 0, 1)

[Figure: the feature vectors plotted as points in a 3-D input space with axes X, Y, and Z.]
Machine Learning Method

Vector A = (a1, a2, a3, …, aN)

The task of machine learning is transformed into finding a borderline that optimally separates the known positive and negative samples in a training set.
Classifying Cancer Patients vs. Healthy Patients from Microarray

Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

N (the number of dimensions) is normally larger than 2, so we can't visualize the data.

[Figure: cancerous and healthy samples as points in gene-expression space.]
Classifying Cancer Patients vs. Healthy Patients from Microarray

For simplicity, pretend that we are only looking at the expression levels of 2 genes.

[Figure: scatter plot of Gene_2 expression level against Gene_1 expression level, each axis running from -5 to 5; up-regulated and down-regulated regions are marked, with cancerous and healthy samples forming separate clusters.]
Classifying Cancer Patients vs. Healthy Patients from Microarray

Question: How can we build a classifier for this data?

[Figure: the same Gene_1 vs. Gene_2 scatter plot of cancerous and healthy samples.]
Classifying Cancer Patients vs. Healthy Patients from Microarray

Simple classification rule:
IF gene_1 < 0 AND gene_2 < 0 THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 THEN person = cancerous

[Figure: the Gene_1 vs. Gene_2 scatter plot with the rule's thresholds at 0 on each axis.]
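A minimal sketch of this two-gene rule in Python (the expression values in the example calls are hypothetical, for illustration only):

```python
def classify(gene_1, gene_2):
    """Two-gene threshold rule from the slide above."""
    if gene_1 < 0 and gene_2 < 0:
        return "healthy"
    if gene_1 > 0 and gene_2 > 0:
        return "cancerous"
    return "undecided"  # the rule says nothing about mixed-sign points

# Hypothetical expression levels for two patients
print(classify(-2.1, -3.4))  # healthy
print(classify(1.7, 2.9))    # cancerous
```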
Classifying Cancer Patients vs. Healthy Patients from Microarray

Simple classification rule:
IF gene_1 < 0 AND gene_2 < 0 AND … AND gene_5000 < Y THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 AND … AND gene_5000 > W THEN person = cancerous

If we move away from our simple example with 2 genes to a realistic case with, say, 5000 genes, then:
1. What will these rules look like?
2. How will we find them?
It gets a little complicated and unwieldy…
Classifying Cancer Patients vs. Healthy Patients from Microarray

Reformulate the previous rule:

SIMPLE RULE:
• If a data point lies to the 'left' of the line, then 'healthy'.
• If a data point lies to the 'right' of the line, then 'cancerous'.

It is easier to generalize this line to 5000 genes than it is a list of rules. It is also easier to solve mathematically.

[Figure: the Gene_1 vs. Gene_2 scatter plot with a separating line between the healthy and cancerous clusters.]
Extension to More Than 2 Genes (Dimensions)

• Line in 2D: x1C1 + x2C2 = T
• If we had 3 genes and needed to build a 'line' in 3-dimensional space, we would be seeking a plane.
  Plane in 3D: x1C1 + x2C2 + x3C3 = T
• If we were looking in more than 3 dimensions, the 'plane' is called a hyperplane. A hyperplane is simply a generalization of a plane to dimensions higher than 3.
  Hyperplane in N dimensions: x1C1 + x2C2 + x3C3 + … + xNCN = T

[Figure: cancerous and healthy clusters separated by a line in 2D.]
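A minimal sketch of the N-dimensional hyperplane rule (the coefficient vector C and threshold T below are made-up illustrative values; in practice they would be learned from training data):

```python
import numpy as np

def hyperplane_side(x, C, T):
    """Return which side of the hyperplane x.C = T the point x lies on."""
    return "cancerous" if np.dot(x, C) > T else "healthy"

# Hypothetical 5-gene example: C and T are illustrative, not learned values
C = np.array([0.8, -0.3, 1.2, 0.5, -0.7])
T = 0.0
patient = np.array([1.1, -0.4, 0.9, 0.2, -1.0])
print(hyperplane_side(patient, C, T))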
Classification Methods (1)

[Figure-only slides; content not captured in the transcript.]

Classification Methods (2)

[Figure-only slides; content not captured in the transcript.]

Classification Methods (3)
K Nearest Neighbor Method
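The slides for this method are figures only; as a concrete reference, here is a minimal k-nearest-neighbor sketch in Python (the toy two-gene data and the choice k = 3 are illustrative assumptions):

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]               # indices of k closest
    votes = y_train[nearest]
    return int(np.sign(votes.sum()))                  # majority label (+1 / -1)

# Toy 2-gene training set: +1 = cancerous, -1 = healthy (hypothetical values)
X_train = np.array([[2.0, 3.0], [1.5, 2.2], [-2.0, -1.0], [-3.0, -2.5]])
y_train = np.array([1, 1, -1, -1])
print(knn_classify(np.array([1.8, 2.5]), X_train, y_train))  # -> 1
```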
Classification Methods (4)

[Figure-only slides; content not captured in the transcript.]
Classification Methods (5): SVM

What is SVM?
• Support vector machines: a machine learning method that learns from examples (statistical learning) to classify objects into one of two classes.

Advantages of SVM:
• Diversity of class members (no racial discrimination).
• Low over-fitting risk.
• Easier to find "optimal" parameters for better class differentiation performance.
Classification Methods (5): SVM Method

Project the data to a higher dimensional space.

[Figure: protein family members vs. nonmembers; the nonlinear border in the original space becomes a new linear border after projection.]
Classification Methods (5): SVM Method

[Figure: protein family members vs. nonmembers separated by the new border, whose position is determined by the support vectors on either side.]
What is a Good Decision Boundary?

• Consider a two-class, linearly separable classification problem.
• Many decision boundaries!
  – The perceptron algorithm can be used to find such a boundary.
  – Different algorithms have been proposed.
• Are all decision boundaries equally good?

[Figure: Class 1 and Class 2 points with several candidate decision boundaries.]
Examples of Bad Decision Boundaries

[Figure: two panels of Class 1 and Class 2 points, each with a decision boundary that passes very close to the training data.]
Large-margin Decision Boundary

• The decision boundary should be as far away from the data of both classes as possible.
  – We should maximize the margin, m.
  – The distance between the origin and the line w^T x = k is k/||w||.

[Figure: Class 1 and Class 2 separated by a decision boundary with margin m.]
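A quick numeric check of the distance claim above (the vector w and offset k are arbitrary illustrative values): the point on the line w·x = k closest to the origin is x* = k·w/||w||², and its norm equals k/||w||.

```python
import numpy as np

w = np.array([3.0, 4.0])          # arbitrary normal vector, ||w|| = 5
k = 10.0
x_star = k * w / np.dot(w, w)     # closest point on the line w.x = k
print(np.dot(w, x_star))          # 10.0 -> x_star lies on the line
print(np.linalg.norm(x_star))     # 2.0  == k / ||w||
```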
SVM Method

[Figure: protein family members vs. nonmembers separated by the new border, with the support vectors marked on each side.]
SVM Method

[Figure: a case where the border line is nonlinear.]
SVM Method

Non-linear transformation: use of a kernel function.
SVM Method

Non-linear transformation
[Figure-only slide; content not captured in the transcript.]
Mathematical Algorithm of SVM

A hyperplane is defined by its normal vector $w$, as the set of points $x$ with

$$w \cdot x = 0.$$

The Euclidean distance between the parallel hyperplanes $w \cdot x = 0$ and $w \cdot x = k$ (attained between the closest points $x_0^*$ and $x^*$ on the two hyperplanes) is

$$\frac{k}{\|w\|}.$$

Maximize the distance between the hyperplane and the closest points, subject to the classes being separated:

$$\max_{w}\,\min_{i}\; \frac{y_i\,(w \cdot x_i)}{\|w\|} \qquad \text{subject to}\quad y_i\,(w \cdot x_i) \ge 1 \;\text{ for all } i.$$

The margin, or separation between the classes, is then $2/\|w\|$.
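A one-step derivation of that margin value, added for completeness (standard, not from the slide): the constraint can be scaled so the closest points of the two classes lie on $w \cdot x = +1$ and $w \cdot x = -1$, and applying the distance formula above to each of these hyperplanes gives

$$m \;=\; \frac{1}{\|w\|} + \frac{1}{\|w\|} \;=\; \frac{2}{\|w\|}.$$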
Mathematical Algorithm of SVM

An equivalent formulation is

$$\min_{w,b}\; \tfrac{1}{2}\,\|w\|^2 \qquad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 \;\text{ for all } i,$$

where $b$ is an offset term for the hyperplane.

When the data are not separable:

$$\min_{w,b,\xi}\; \tfrac{1}{2}\,\|w\|^2 + C \sum_i \xi_i \qquad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i \;\text{ and }\; \xi_i \ge 0 \;\text{ for all } i,$$

with decision function

$$f(x) = w \cdot x + b, \qquad y = \operatorname{sign}(f(x)).$$
Mathematical Algorithm of SVM

Since the optimal slack variables satisfy $\xi_i = \bigl(1 - y_i f(x_i)\bigr)_+$, the soft-margin problem

$$\min_{w,b,\xi}\; \tfrac{1}{2}\,\|w\|^2 + C \sum_i \xi_i \qquad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i \;\text{ and }\; \xi_i \ge 0 \;\text{ for all } i$$

can be rewritten as

$$\min_{f \in F}\; \sum_i \bigl(1 - y_i f(x_i)\bigr)_+ + \frac{1}{C}\,P(f),$$

an empirical error / complexity tradeoff: the first term is the empirical error, the second a complexity penalty.
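A minimal numpy sketch of the empirical-error term above, the hinge loss $(1 - y_i f(x_i))_+$ (the labels and classifier scores are made-up values for illustration):

```python
import numpy as np

def hinge_loss(y, f_x):
    """Sum of (1 - y_i * f(x_i))_+ over all samples."""
    return np.maximum(0.0, 1.0 - y * f_x).sum()

y   = np.array([1, -1, 1, -1])           # hypothetical labels
f_x = np.array([2.0, -0.5, 0.3, 1.2])    # hypothetical classifier scores
# Correct with margin (loss 0), inside margin (0.5, 0.7), wrong side (2.2)
print(hinge_loss(y, f_x))  # 0 + 0.5 + 0.7 + 2.2 = 3.4
```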
Mathematical Algorithm of SVM: Nonlinear Decision Boundaries

Map the data to a higher dimensional space, the feature space:

$$x \rightarrow \varphi(x), \qquad K_{nm} = \varphi(x_n) \cdot \varphi(x_m),$$

and construct a linear classifier in this space:

$$\min_{w,b,\xi}\; \tfrac{1}{2}\,\|w\|^2 + C \sum_i \xi_i \qquad \text{subject to}\quad y_i\,\bigl(w \cdot \varphi(x_i) + b\bigr) \ge 1 - \xi_i \;\text{ and }\; \xi_i \ge 0 \;\text{ for all } i,$$

$$f(x) = w \cdot \varphi(x) + b.$$

This can be written as

$$f(x) = \sum_i \alpha_i\, y_i\, K(x_i, x) + b, \qquad \text{where}\quad K(x, y) = \varphi(x) \cdot \varphi(y).$$
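One concrete way to train such a kernel classifier is scikit-learn's SVC; a minimal sketch (the toy data, the RBF kernel choice, and the parameter values are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-gene data: +1 = cancerous, -1 = healthy (hypothetical values)
X = np.array([[2.0, 3.0], [1.5, 2.2], [2.5, 1.8],
              [-2.0, -1.0], [-3.0, -2.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# RBF kernel K(x, y) = exp(-gamma * ||x - y||^2); C controls the
# empirical-error / complexity tradeoff from the soft-margin objective
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print(clf.support_vectors_)        # the support vectors found
print(clf.predict([[1.8, 2.5]]))   # -> [1]
```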
Mathematical Algorithm of SVM

[Figure-only slide; content not captured in the transcript.]
SVM Performance Measure

[Figure-only slides; content not captured in the transcript.]

SVM Performance Measure
• Sensitivity $P^{+} = TP/(TP+FN)$: accuracy for positive samples.
• Specificity $P^{-} = TN/(TN+FP)$: accuracy for negative samples.
• Overall prediction accuracy:

$$Q = \frac{TP + TN}{TP + FN + TN + FP}$$

• Matthews correlation coefficient:

$$C = \frac{TP \cdot TN - FN \cdot FP}{\sqrt{(TP+FN)(TP+FP)(TN+FN)(TN+FP)}}$$
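A minimal sketch computing these four measures from a confusion matrix (the counts in the example call are made up for illustration):

```python
import math

def performance(tp, tn, fp, fn):
    """Sensitivity, specificity, overall accuracy Q, and Matthews CC."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    q = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fn * fp) / math.sqrt(
        (tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    return sensitivity, specificity, q, mcc

# Hypothetical confusion-matrix counts
print(performance(tp=40, tn=45, fp=5, fn=10))
```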
Why SVM Works?

• The feature space is often very high dimensional. Why don't we have the curse of dimensionality?
• A classifier in a high-dimensional space has many parameters and is hard to estimate.
• Vapnik argues that the fundamental problem is not the number of parameters to be estimated. Rather, the problem is the flexibility of a classifier.
• Typically, a classifier with many parameters is very flexible, but there are also exceptions:
  – Let x_i = 10^-i, where i ranges from 1 to n. The classifier y = sign(sin(αx)) can classify all x_i correctly for all possible combinations of class labels on the x_i (a numerical check follows this slide).
  – This 1-parameter classifier is very flexible.
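A sketch verifying this flexibility claim numerically, using the standard construction of α from Burges' SVM tutorial (an addition, not from the slides): α = π(1 + Σ_i ((1 − y_i)/2)·10^i) realizes any labeling of the points x_i = 10^-i.

```python
import itertools
import numpy as np

n = 5
x = np.array([10.0 ** -i for i in range(1, n + 1)])

for labels in itertools.product([-1, 1], repeat=n):
    y = np.array(labels)
    # Burges' construction: a single alpha realizing this labeling exactly
    alpha = np.pi * (1 + sum((1 - y[i - 1]) / 2 * 10.0 ** i
                             for i in range(1, n + 1)))
    pred = np.sign(np.sin(alpha * x))
    assert np.array_equal(pred, y), (labels, pred)

print("sign(sin(alpha*x)) shattered all", 2 ** n, "labelings")
```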
Why SVM Works?

• Vapnik argues that the flexibility of a classifier should not be characterized by the number of parameters, but by the capacity of the classifier.
  – This is formalized by the "VC-dimension" of a classifier.
• Consider a linear classifier in two-dimensional space.
• If we have three training data points, no matter how those points are labeled, we can classify them perfectly.
VC-dimension

• However, if we have four points, we can find a labeling such that the linear classifier fails to be perfect.
• We can see that 3 is the critical number.
• The VC-dimension of a linear classifier in a 2D space is 3: if we have 3 points in the training set, perfect classification is always possible irrespective of the labeling, whereas for 4 points perfect classification can be impossible (see the sketch below).
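A sketch that checks this by brute force (scikit-learn's SVC with a large C stands in for "any linear classifier"; the point coordinates, including the XOR-style square for the 4-point case, are illustrative choices):

```python
import itertools
import numpy as np
from sklearn.svm import SVC

def shatters(points):
    """True if a linear classifier fits every non-trivial labeling exactly."""
    for labels in itertools.product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue  # SVC needs two classes; one-class labelings are trivial
        clf = SVC(kernel="linear", C=1e6).fit(points, labels)
        if clf.score(points, labels) < 1.0:
            return False
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])          # non-collinear triple
four  = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])  # XOR-style square
print(shatters(three))  # True  -> 3 points can always be separated
print(shatters(four))   # False -> e.g. the XOR labeling fails
```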
VC-dimension

• The VC-dimension of the nearest neighbor classifier is infinity, because no matter how many points you have, you get perfect classification on the training data.
• The higher the VC-dimension, the more flexible a classifier is.
• VC-dimension, however, is a theoretical concept; in practice the VC-dimension of most classifiers is difficult to compute exactly.
  – Qualitatively, if we think a classifier is flexible, it probably has a high VC-dimension.