
Upload: domenic-sanders

Post on 28-Dec-2015


Multiple Examples of tumor tissue (public data from Whitehead/MIT)

SVM Classification of Multiple Tumor Types

DNA Microarray Data

Oracle Data Mining

Actual\Predicted BR PR LU CO LY BL ML UT LE RE PA OV MS BR
BREAST-BR 1 1
PROSTATE-PR 1 1
LUNG-LU 1 2
COLON-CO 3
LYMPHOMA-LY 6
BLADDER-BL 1 2
MELANOMA-ML 1 1
UTERUS-UT 2
LEUKEMIA-LE 1 5
RENAL-RE 3
PANCREAS-PA 1 2
OVARY-OV 1 2
MESOTHELIOMA-MS 3
BRAIN-BR 4

78.25% accuracy

Green=Correct Red=Errors

We load data for multiple cancer types into the Oracle database: 16,063 genes for 144 cancer patients.

We then mine the data with Support Vector Machines and build the confusion matrix.
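As a rough illustration of how a confusion matrix and its accuracy figure relate, the sketch below counts (actual, predicted) pairs and divides the diagonal total by the number of samples. It is plain Python rather than Oracle Data Mining, and the labels are hypothetical toy values, not the Whitehead/MIT results.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of samples whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


# Hypothetical toy labels (assumption): not taken from the slides.
actual = ["LY", "LY", "LE", "LE", "CO", "BR"]
predicted = ["LY", "LY", "LE", "LY", "CO", "PR"]

matrix = confusion_matrix(actual, predicted)
toy_accuracy = accuracy(matrix)
print(f"{toy_accuracy:.2%}")
```

Off-diagonal entries such as `matrix[("LE", "LY")]` are the misclassifications that the slide marks in red.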

SVM Classification of Multiple Tumor Types

Oracle Data Mining's SVM models classify the multi-class tumor data with 78.25% accuracy.

Identify Biomarkers for DLBC Lymphoma Treatment Outcome

Attribute Importance identifies genes correlated with lymphoma.
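The idea of ranking genes by how well they separate two groups of patients can be sketched as follows. This does not reproduce Oracle's Attribute Importance algorithm. It swaps in a simple signal-to-noise score (difference of group means divided by the sum of group standard deviations). The gene names, the expression values, and the `rank_genes` helper are all hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Difference of group means scaled by the sum of group standard deviations."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes(expression, labels, group_a="cured", group_b="fatal"):
    """Return gene names sorted from most to least discriminating."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = abs(signal_to_noise(a, b))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression levels (assumption): one value per patient.
labels = ["cured", "cured", "cured", "fatal", "fatal", "fatal"]
expression = {
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],
    "GENE_B": [2.0, 3.0, 2.5, 2.4, 2.1, 2.9],
}

ranking = rank_genes(expression, labels)
print(ranking)
```

A gene whose values are constant within both groups would give a zero denominator, so real data would need a guard for that case.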

Find a Cure for Lymphoma

Literature search on Lymphoma
Set up a project workspace
Set up a meeting
Check lab protocols
Store cell histology images
Analyze gene expression results
Study the markers
Find a lead

Study the Markers

Statistical analysis
Protein sequence analysis (Swissprot)
BLAST Search
Protein secondary structure study
Search of genes and genetic disorders (OMIM)
Pathway modeling

Data Analysis with JDeveloper

PKC Distribution Difference

Statistical Analysis

Create an External Table to read data from lymphoma.txt.

Calculate Mean and Standard Deviation

The t-test shows that the PKC expression levels in cured and fatal patients are significantly different.
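The calculation behind this kind of comparison can be sketched in a few lines: compute each group's mean and sample variance, then form the t statistic. The slides do not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean.
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical PKC expression levels (assumption): not from lymphoma.txt.
cured = [2.1, 2.4, 1.9, 2.2, 2.0]
fatal = [3.0, 3.4, 2.9, 3.1, 3.3]

t_stat, dof = welch_t(cured, fatal)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom is what supports calling the two groups significantly different.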

Protein sequence analysis

Load SwissProt into Oracle XML DB

Load SwissProt into XML DB to learn more about expressed genes of interest
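To give a sense of what a protein entry stored as XML can offer, the sketch below parses a small fragment and pulls out a few fields. The element names are a simplified assumption loosely modeled on protein-entry XML, not the actual SwissProt schema, and the accession and sequence are made up. It uses Python's standard XML parser rather than Oracle XML DB.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry (assumption): not a real SwissProt record.
SAMPLE_ENTRY = """<entry>
  <accession>X00000</accession>
  <name>EXAMPLE_HUMAN</name>
  <sequence>MKTAYIAKQR</sequence>
</entry>"""


def summarize_entry(xml_text):
    """Extract the accession, name, and sequence length from one entry."""
    entry = ET.fromstring(xml_text)
    sequence = entry.findtext("sequence", default="").strip()
    return {
        "accession": entry.findtext("accession"),
        "name": entry.findtext("name"),
        "length": len(sequence),
    }


summary = summarize_entry(SAMPLE_ENTRY)
print(summary)
```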

Load SwissProt into XML DB

FTP SwissProt data and schema into Oracle XML DB

Access the XML schema using XML Spy (an XML editor), which connects to the database over WebDAV

Register the XML Schema

Once the schema is registered, XML DB automatically generates the tables

Describe the Table Generated