Post on 21-Dec-2015

The Chicken Project

Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data

By L. Shen and E.C. Tan

Name of student: Kung-Hua Chang

Date: July 8, 2005

SoCalBSI

California State University at Los Angeles

Background

Microarray data are characterized by a number of samples that is much smaller than the number of variables.

This causes the “curse of dimensionality” problem.

To address this problem, dimension reduction methods such as Singular Value Decomposition and Partial Least Squares are used.

Background (cont’d)

Singular Value Decomposition and Partial Least Squares.

Given an m x n matrix X that stores all of the gene expression data, X can be approximated as:
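One common form of such an approximation is the rank-k truncated SVD, which keeps only the k largest singular values of X. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name truncated_svd, the choice k = 3, the random data, and the convention that rows are samples and columns are genes are all hypothetical.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the rank-k approximation of X together with its factors."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    X_k = U_k @ np.diag(s_k) @ Vt_k
    return X_k, U_k, s_k, Vt_k

rng = np.random.default_rng(0)
# Hypothetical data: 6 samples (rows) by 50 genes (columns)
X = rng.normal(size=(6, 50))

X_k, U_k, s_k, Vt_k = truncated_svd(X, k=3)

# Each sample is now described by 3 numbers instead of 50
scores = U_k * s_k
print(X.shape, "->", scores.shape)
```

The reduced matrix scores, rather than the full X, would then be passed to the regression step, so the number of variables no longer exceeds the number of samples.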

Background (cont’d)

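Logistic regression and least squares regression.

Both fit a model to a set of points: least squares regression fits a line to numeric responses, while logistic regression fits an S-shaped curve to class probabilities.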

Background (cont’d)

The difference is that logistic regression equations are solved iteratively. A trial equation is fitted and tweaked over and over in order to improve the fit. Iterations stop when the improvement from one step to the next is suitably small.

Least squares regression can be solved explicitly, in a single step.
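The contrast between the two fitting procedures can be sketched as follows. This is a minimal illustration, assuming Newton's method for the logistic fit and the normal equations for the least squares fit; the function names, the tolerance, and the tiny data set are hypothetical and not taken from the paper.

```python
import numpy as np

def least_squares(X, y):
    """Explicit fit: a single linear solve of the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def logistic_regression(X, y, tol=1e-8, max_iter=100):
    """Iterative fit: Newton steps until the log-likelihood gain is below tol."""
    beta = np.zeros(X.shape[1])
    prev_ll = -np.inf
    for n_iter in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # current fitted probabilities
        grad = X.T @ (y - p)                     # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)  # one linear solve per step
        eta = X @ beta
        ll = np.sum(y * eta - np.logaddexp(0.0, eta))
        if ll - prev_ll < tol:                   # improvement is suitably small
            break
        prev_ll = ll
    return beta, n_iter

# Hypothetical data: an intercept column plus one predictor, two overlapping classes
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

beta_ls = least_squares(X, y)
beta_lr, n_iter = logistic_regression(X, y)
print("least squares solves: 1, logistic regression solves:", n_iter)
```

The classes are deliberately made to overlap, because on perfectly separable data the unpenalized logistic fit has no finite solution and the iterations would not settle.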

Background (cont’d)

Penalized logistic regression is logistic regression with a penalty term added to its cost function; the penalty discourages large coefficients.
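As an illustration, one widely used form of the penalized cost is shown below. It assumes a quadratic (ridge) penalty, which the slides do not state explicitly; the symbols N (number of samples), x_i, y_i, beta, and lambda are notation introduced here for the example.

```latex
J_\lambda(\beta)
  = -\sum_{i=1}^{N} \Bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Bigr]
    + \frac{\lambda}{2}\,\lVert \beta \rVert_2^2,
\qquad
p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}
```

Setting lambda = 0 recovers ordinary logistic regression, and larger values of lambda shrink the coefficients more strongly toward zero.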

Background (cont’d)

Support Vector Machine (SVM): an SVM tries to find a hyperplane that can separate different sets of data. With a nonlinear kernel, it is not a linear model.

Hypothesis

Combining dimension reduction with penalized logistic regression gives the best performance compared with support vector machine and least squares regression.

Data Analysis

The above table shows the number of training/testing cases in the seven publicly available cancer data sets.

Data Analysis (cont’d)

Generally, the partial least squares-based classifier uses less time than the singular value decomposition-based classifier.

Data Analysis (cont’d)

Penalized logistic regression training requires solving a set of linear equations iteratively until convergence, while least squares regression training requires solving a set of linear equations only once. It is therefore reasonable that penalized logistic regression uses more time than least squares regression.

Data Analysis (cont’d)

The overall time required by the partial least squares-based and SVD-based regression methods is much less than that of the support vector machine.

Conclusion

Combining dimension reduction with penalized logistic regression gives the best performance compared with support vector machine and least squares regression.

References

[1] L. Shen and E.C. Tan (to appear in June, 2005) "Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data", IEEE/ACM Trans. Computational Biology and Bioinformatics

[2] SoCalBSI: http://instructional1.calstatela.edu/jmomand2/

[3] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.
