

The Chicken Project

Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data

By L. Shen and E.C. Tan

Name of student: Kung-Hua Chang

Date: July 8, 2005

SoCalBSI

California State University at Los Angeles


Background

Microarray data typically have far fewer samples than variables (genes).

This causes the “curse of dimensionality” problem.

To address this problem, dimension reduction methods such as Singular Value Decomposition (SVD) and Partial Least Squares (PLS) are commonly used.


Background (cont’d)

Singular Value Decomposition and Partial Least Squares.

Given an m x n matrix X that stores all of the gene expression data, X can be approximated as:
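The formula from this slide did not survive in the transcript. One common form, assumed here rather than confirmed by the slide, is the truncated SVD, X ≈ U_k S_k V_k^T, which keeps only the k largest singular values. The sketch below illustrates the k = 1 case in plain Python using power iteration. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
import math

def rank1_approximation(X, iterations=100):
    """Return (sigma, u, v) such that X is approximately sigma * u * v^T.

    Power iteration on X^T X. Assumes X is not all zeros and that its
    largest singular value is strictly larger than the second one.
    """
    m, n = len(X), len(X[0])
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # u = X v
        u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        # w = X^T u, then renormalize to get the next v
        w = [sum(X[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))  # leading singular value
    u = [x / sigma for x in u]
    return sigma, u, v

def reconstruct(sigma, u, v):
    """Build the rank-1 matrix sigma * u * v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]

# Toy 3 x 4 matrix whose rows are nearly proportional to each other.
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.1],
     [0.5, 1.0, 1.5, 2.0]]

sigma, u, v = rank1_approximation(X)
X1 = reconstruct(sigma, u, v)
max_error = max(abs(X[i][j] - X1[i][j]) for i in range(3) for j in range(4))
print("largest entry-wise error of the rank-1 approximation:", max_error)
```

Because the toy rows are almost multiples of one another, a single singular triplet already reproduces the matrix closely. Real expression data would usually need more than one component.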


Background (cont’d)

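Logistic regression and least squares regression.

Both fit a linear model to a set of data points: least squares regression fits a line to the observed values, while logistic regression fits a linear function to the log-odds of class membership.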


Background (cont’d)

The difference is that logistic regression equations are solved iteratively. A trial equation is fitted and tweaked over and over in order to improve the fit. Iterations stop when the improvement from one step to the next is suitably small.

Least squares regression can be solved explicitly (in closed form).
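The contrast between the two solves can be sketched in plain Python for a single predictor plus an intercept. This is a minimal illustration, not the method used in the paper: the function names and the toy data are hypothetical, Newton's method stands in for the iterative fit, and the stopping rule (a small parameter update) is one possible reading of "suitably small" improvement.

```python
import math

def fit_least_squares(xs, ys):
    """Closed-form fit of y = a + b*x: one explicit solve, no iteration."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def fit_logistic(xs, ys, tol=1e-8, max_iter=50):
    """Newton's method for P(y=1|x) = sigmoid(a + b*x).

    Returns (a, b, number_of_iterations_used).
    """
    a, b = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Negative Hessian, built from the weights p * (1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        # Solve the 2x2 linear system H * step = g for this iteration.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:  # update is "suitably small": stop
            break
    return a, b, iteration

# Toy data: labels overlap, so the logistic fit has a finite solution.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]

a_ls, b_ls = fit_least_squares(xs, ys)
a_lr, b_lr, n_iter = fit_logistic(xs, ys)
print("least squares:", a_ls, b_ls)
print("logistic:", a_lr, b_lr, "after", n_iter, "iterations")
```

Each Newton iteration solves a small linear system, so the logistic fit costs several solves where least squares needs one.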


Background (cont’d)

Penalized logistic regression is logistic regression with a penalty term added to the cost function, which typically discourages large coefficients.
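As a rough illustration, the sketch below adds a ridge (squared-coefficient) penalty to a one-predictor logistic fit. The slide does not state which penalty the paper uses, so the ridge form, the parameter name lam, the function name, and the toy data are all assumptions made for this example.

```python
import math

def fit_penalized_logistic(xs, ys, lam, tol=1e-8, max_iter=100):
    """Newton's method for logistic regression with a ridge penalty.

    Maximizes log-likelihood - (lam / 2) * b**2 for the model
    P(y=1|x) = sigmoid(a + b*x). The intercept a is not penalized.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the penalized objective.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps)) - lam * b
        # Negative Hessian; the penalty adds lam to the slope entry.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs)) + lam
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Toy data that is perfectly separable: without a penalty the slope
# would grow without bound, but the penalty keeps it finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

a_weak, b_weak = fit_penalized_logistic(xs, ys, lam=1.0)
a_strong, b_strong = fit_penalized_logistic(xs, ys, lam=10.0)
print("lam = 1:  slope", b_weak)
print("lam = 10: slope", b_strong)
```

A larger lam shrinks the slope toward zero, which is the sense in which the penalty controls model complexity.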


Background (cont’d)

Support Vector Machine (SVM)

An SVM tries to find a hyperplane that separates different sets of data.

With a nonlinear kernel, it is not a linear model.


Hypothesis

Dimension reduction-based penalized logistic regression has the best performance compared to support vector machines and least squares regression.


Data Analysis

The table on this slide shows the number of training and testing cases in each of the seven publicly available cancer data sets.


Data Analysis

Generally, the partial least squares (PLS)-based classifier takes less time than the singular value decomposition (SVD)-based classifier.


Data Analysis (cont’d)

Training penalized logistic regression requires solving a set of linear equations iteratively until convergence, while training least squares regression requires solving a set of linear equations only once. It is therefore expected that penalized logistic regression takes more time than least squares regression.


Data Analysis (cont’d)

The overall time required by the PLS-based and SVD-based regression methods is much less than that required by the support vector machine.


Conclusion

Dimension reduction-based penalized logistic regression has the best performance compared to support vector machines and least squares regression.


References

[1] L. Shen and E.C. Tan (to appear in June, 2005), "Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data", IEEE/ACM Trans. Computational Biology and Bioinformatics.

[2] SoCalBSI: http://instructional1.calstatela.edu/jmomand2/

[3] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.