Post on 20-Dec-2015

Genomic Signal Processing: Ensemble Dependence Model for Classification and Prediction of Cancer Based on Gene Expression Data

Joseph DePasquale

Engineering Frontiers

26 Apr 07

Overview

• Motivation

• Background
  – Genes, Cancer, DNA Microarrays

• Ensemble Dependence Model
  – Basic structure
  – Inclusion in a classification system

• Results

• Conclusions

Motivation

• Estimated 1.4 million new cases of cancer
  – Roughly 550,000 will die from their disease

• In New Jersey, 43,910 new cases
  – 17,720 deaths

• For 2005, the NIH estimated the overall cost of cancer at 210 billion dollars

Background

• What is cancer?
  – Uncontrolled division of damaged cells
    • Apoptosis
  – Risk increases with age
    • Cause of unregulated cell growth

Background

• What is a gene?
  – Components
  – Functionality

• What is the importance of protein?
  – Essential to all living things
  – Participate in all functions within cells

• What is the significance of gene products?

DNA Microarrays

• Expression profiling
  – Represents the simultaneous activity of thousands of individual genes

• Publicly available data
  – Complexity has led to a need for the standardization of experimental setup
    • MIAME
    • MAQC

Taken from: http://en.wikipedia.org/wiki/DNA_microarray

Ensemble Dependence Model

• Genes with similar expression profiles are combined together into clusters
  – Expression profile of each cluster is the average profile of all genes in that cluster

Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
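The cluster-averaging step on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `cluster_profiles`, the array layout (genes in rows, samples in columns), and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def cluster_profiles(expr, labels, n_clusters):
    """Average the expression profiles of the genes assigned to each cluster.

    expr:   (n_genes, n_samples) array of expression values
    labels: (n_genes,) integer cluster index per gene, in [0, n_clusters)
    Returns an (n_clusters, n_samples) array; row c is the mean profile
    of all genes in cluster c.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    profiles = np.zeros((n_clusters, expr.shape[1]))
    for c in range(n_clusters):
        members = expr[labels == c]
        if members.size == 0:
            raise ValueError(f"cluster {c} has no genes")
        profiles[c] = members.mean(axis=0)
    return profiles

# Toy data (made up): 4 genes, 3 samples, 2 clusters
expr = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0],
                 [10.0, 10.0, 10.0],
                 [20.0, 30.0, 40.0]])
labels = np.array([0, 0, 1, 1])
profiles = cluster_profiles(expr, labels, 2)
```

Each row of `profiles` then plays the role of one cluster expression profile in the model.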

Ensemble Dependence Model

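$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
0 & a_{12} & a_{13} & a_{14} \\
a_{21} & 0 & a_{23} & a_{24} \\
a_{31} & a_{32} & 0 & a_{34} \\
a_{41} & a_{42} & a_{43} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix}
$$

$$
X = AX + N
$$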
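The model X = AX + N expresses each cluster as a weighted sum of the other clusters plus a noise term, with a zero diagonal in A. The slides do not say how A is estimated, so the sketch below assumes ordinary least squares, fitting one row of A at a time. The function name `estimate_dependence_matrix` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_dependence_matrix(X):
    """Least-squares estimate of A in X = A X + N with a zero diagonal.

    X: (k, n_samples) array, one row per cluster profile.
    Row i of A is fit by regressing cluster i on the other k-1 clusters.
    (Assumed estimator; the source does not specify one.)
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[0]
    A = np.zeros((k, k))
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Solve X[others].T @ coef ~= X[i] in the least-squares sense
        coef, *_ = np.linalg.lstsq(X[others].T, X[i], rcond=None)
        A[i, others] = coef
    return A

# Synthetic check: cluster 0 is exactly 2*cluster 1 - cluster 2
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 50))
X = np.vstack([2 * base[0] - base[1], base])
A = estimate_dependence_matrix(X)
N = X - A @ X   # residual, the noise-like term of the model
```

On this synthetic input the first row of `A` recovers the coefficients used to build cluster 0, and the diagonal stays at zero by construction.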

Ensemble Dependence Model

• Model-driven method
  – Feature selection
    • Not all genes are relevant
    • T-test
  – Gene clustering
    • Number of clusters
    • Gaussian mixture model
  – Model learning/classification
    • Dependence matrices generated for two cases
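The feature-selection step can be illustrated with a per-gene t-statistic that ranks genes by how strongly they separate the two classes. The slide only says "T-test", so the Welch (unequal variance) form used here is an assumption, as are the function names and the synthetic data. The Gaussian mixture clustering step is not shown.

```python
import numpy as np

def t_scores(normal, cancer):
    """Welch t-statistic per gene. Inputs are (n_genes, n_samples) arrays."""
    normal = np.asarray(normal, dtype=float)
    cancer = np.asarray(cancer, dtype=float)
    mean_diff = cancer.mean(axis=1) - normal.mean(axis=1)
    # Standard error of the difference of means, unequal variances
    se = np.sqrt(cancer.var(axis=1, ddof=1) / cancer.shape[1]
                 + normal.var(axis=1, ddof=1) / normal.shape[1])
    return mean_diff / se

def select_genes(normal, cancer, n_keep):
    """Indices of the n_keep genes with the largest |t|, best first."""
    scores = np.abs(t_scores(normal, cancer))
    return np.argsort(scores)[::-1][:n_keep]

# Synthetic data: 5 genes, 20 samples per class; only gene 2 is shifted
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(5, 20))
cancer = rng.normal(0.0, 1.0, size=(5, 20))
cancer[2] += 5.0
selected = select_genes(normal, cancer, n_keep=1)
```

Only the genes returned by `select_genes` would be passed on to clustering, which keeps irrelevant genes out of the cluster averages.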

Classification

• Maximum likelihood rule
  – Binary hypothesis-testing problem
  – Tests fit of unknown samples to each model

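Normal Case:

$$
\log \Pr(X \mid H_0) = -\tfrac{1}{2}\log\left((2\pi)^k \lvert V_N \rvert\right) - \tfrac{1}{2}\,(X - A_N X - M_N)^T \, V_N^{-1} \, (X - A_N X - M_N)
$$

Cancer Case:

$$
\log \Pr(X \mid H_1) = -\tfrac{1}{2}\log\left((2\pi)^k \lvert V_C \rvert\right) - \tfrac{1}{2}\,(X - A_C X - M_C)^T \, V_C^{-1} \, (X - A_C X - M_C)
$$

Here $A_N, M_N, V_N$ and $A_C, M_C, V_C$ are the dependence matrix and the noise mean and covariance learned for the normal and cancer models, and $k$ is the number of clusters.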
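The maximum likelihood rule can be sketched as follows: compute the Gaussian log-likelihood of the residual X - AX under each model and pick the larger one. This is a minimal sketch assuming Gaussian noise with mean M and covariance V per model. The function names, the tuple layout `(A, M, V)`, and the two-cluster toy models are hypothetical.

```python
import numpy as np

def log_likelihood(x, A, M, V):
    """Gaussian log-likelihood of x under the model x = A x + N, N ~ Normal(M, V)."""
    k = x.shape[0]
    r = x - A @ x - M                     # residual of the dependence model
    _, logdet = np.linalg.slogdet(V)      # log |V|, V assumed positive definite
    quad = r @ np.linalg.solve(V, r)      # r^T V^{-1} r
    return -0.5 * (k * np.log(2 * np.pi) + logdet) - 0.5 * quad

def classify(x, normal_model, cancer_model):
    """Return the label of the model under which x is more likely."""
    ll_normal = log_likelihood(x, *normal_model)
    ll_cancer = log_likelihood(x, *cancer_model)
    return "cancer" if ll_cancer > ll_normal else "normal"

# Toy models (made up): clusters move together when normal, oppositely in cancer
zero_mean, unit_cov = np.zeros(2), np.eye(2)
normal_model = (np.array([[0.0, 1.0], [1.0, 0.0]]), zero_mean, unit_cov)
cancer_model = (np.array([[0.0, -1.0], [-1.0, 0.0]]), zero_mean, unit_cov)

label_a = classify(np.array([1.0, 1.1]), normal_model, cancer_model)
label_b = classify(np.array([1.0, -1.0]), normal_model, cancer_model)
```

Using `slogdet` and `solve` avoids forming the determinant and the inverse of V explicitly, which is the numerically safer choice when V is close to singular.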

EDM-Based Cancer Classification

Taken from: [1]

Results

Taken from: [1]

Results

Here, 200 different subsets of the gastric data are used to calculate 200 different dependence matrices; the eigenvalues of these matrices are plotted.

Taken from: [1]

$X = AX + N$
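The subset experiment described on this slide can be mimicked on synthetic data: draw many random subsets of samples, fit a dependence matrix on each, and collect the eigenvalues. How the subsets were drawn and how A was fit are not stated in the slides, so random sampling without replacement and a least-squares fit are assumptions here, as are all names and the random stand-in data.

```python
import numpy as np

def fit_A(X):
    """Assumed estimator: least-squares fit of each cluster on the others."""
    k = X.shape[0]
    A = np.zeros((k, k))
    for i in range(k):
        others = [j for j in range(k) if j != i]
        coef, *_ = np.linalg.lstsq(X[others].T, X[i], rcond=None)
        A[i, others] = coef
    return A

def eigenvalues_over_subsets(X, n_subsets=200, subset_size=None, seed=0):
    """Eigenvalues of the dependence matrix fit on random sample subsets.

    X: (k, n_samples) cluster profiles. Returns a complex (n_subsets, k)
    array, since A is not symmetric and may have complex eigenvalues.
    """
    rng = np.random.default_rng(seed)
    k, n = X.shape
    subset_size = subset_size or max(k + 1, n // 2)
    eigs = np.empty((n_subsets, k), dtype=complex)
    for s in range(n_subsets):
        cols = rng.choice(n, size=subset_size, replace=False)
        eigs[s] = np.linalg.eigvals(fit_A(X[:, cols]))
    return eigs

# Stand-in data: 4 cluster profiles over 60 samples (random, not gastric data)
X = np.random.default_rng(1).normal(size=(4, 60))
eigs = eigenvalues_over_subsets(X)
```

Because A has a zero diagonal, the eigenvalues of every fitted matrix sum to zero, which is a quick sanity check before plotting them.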

Results

Eigenvalues = {1, 1, 1, -3}

$X = AX + N$

$A_{\text{ideal}}$: [matrix entries not recoverable from the transcript]
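As a quick numerical check of how a zero-diagonal 4 × 4 matrix can have the eigenvalues {1, 1, 1, -3}, consider the matrix with -1 in every off-diagonal position. This matrix is chosen purely for illustration and is not claimed to be the ideal dependence matrix shown on the slide.

```python
import numpy as np

k = 4
# Illustrative matrix only: zero diagonal, -1 everywhere else (identity minus all-ones)
A = np.eye(k) - np.ones((k, k))

# A is symmetric, so eigvalsh applies and returns real eigenvalues in ascending order
eigs = np.linalg.eigvalsh(A)
```

The all-ones matrix has eigenvalues 4 (once) and 0 (three times), so subtracting it from the identity gives -3 once and 1 three times, and the sum is zero as required by the zero diagonal.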

Results

Taken from: [1]

In Summary

Taken from: [1]

Conclusions

• EDM is a model-based system for cancer classification and prediction that uses publicly available gene expression data
  – Models the dependence of each cluster on the other clusters

• Classification results are comparable with those of a widely accepted ML algorithm

• Eigenvalues of dependence matrix could be a valuable cancer prediction tool

References

[1] P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.

[2] P. Qiu, Z. J. Wang, and K. J. R. Liu. “Ensemble dependence model for classification and prediction of cancer and normal gene expression data,” Bioinformatics, vol. 21, no. 14, pp. 3114-3121, May 2005.

[3] D. Anastassiou. “Genomic Signal Processing,” IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 8-20, July 2001.

[4] J. Astola, I. Tabus, I. Shmulevich, and E. Dougherty. “Genomic Signal Processing,” Signal Processing (Elsevier), vol. 83, pp. 691-694, 2003.

[5] American Cancer Society. “Cancer Facts and Figures 2006,” ACS :: Statistics for 2006 [Online]. Available: http://www.cancer.org/downloads/STT/CAFF2006PWSecured.pdf

[6] http://en.wikipedia.org/wiki/Gene

[7] http://en.wikipedia.org/wiki/Gene_expression

[8] http://en.wikipedia.org/wiki/Protein

[9] http://en.wikipedia.org/wiki/DNA_microarray

[10] M. Karnick. “Genomic Signal Processing,” Engineering Frontiers, the presentation directly preceding this one, Apr. 2007.
