SVM based prioritization of cancer causing mutations in centromere protein family


Uploaded by: ambuj-kumar

Posted on 24-Jul-2015


Page 1: SVM based prioritization of cancer causing mutations in centromere protein family

A short introduction to Centrosomal Variants

SVM based prioritization of cancer causing mutations in centromere protein family

Page 2

Centromere

The centromere is the part of a chromosome that links sister chromatids.

During anaphase of mitosis, the paired centromeres of each chromosome begin to move apart as the daughter chromosomes migrate, centromere first, toward opposite ends of the cell.

It is the most condensed and constricted region of a chromosome.

It serves as the point of attachment for spindle fibers.

Deregulation of centromere activity leads to several checkpoint disorders and pathological conditions.

Page 3

Mutation-induced centromere dysfunction is linked with several human diseases:

Bardet-Biedl syndrome

Polycystic kidney disease

Lissencephaly

Primordial dwarfism

Autosomal recessive primary microcephaly

Cancer

Page 4

Some important centromere protein families:

CEP family proteins

CENP family proteins

MAD family proteins

hSAS family proteins

CEPTIN family proteins

Page 5

CENP-E recruitment and activity are mediated by several other protein complexes

Page 6

Proteins selected for evaluation

CENPA, CENPB, CENPC, CENPE, CENPF, CENPH, CENPI, CENPJ, CENPK, CENPL, CENPM, CENPN, CENPO, CENPP, CENPQ, CENPR, CENPS, CENPT, CENPU, CENPV, CENPW, CENPX, CENPY, CENPZ

In total, 823 structural variants from the CENP protein family were collected for this study.

Page 7

Machine Learning: What is it all about

1. Computers have great computational capacity and can process large amounts of data quickly.

2. A learning algorithm learns from whatever data it is given, correct or not.

3. Therefore, the training data must not contain erroneous values.

4. To prevent the use of spurious data, the entire dataset must be validated and scaled before training starts.

5. There are three different paradigms in machine learning:

a. Supervised learning methods

b. Unsupervised learning methods

c. Reinforcement learning methods
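Point 4 above can be made concrete. The sketch below is a minimal, hypothetical Python illustration (the names `validate_rows`, `fit_minmax` and `apply_minmax` are ours, not from this deck): it rejects rows with missing or non-finite values or labels outside {1, -1}, then linearly scales each feature to [-1, 1]. One assumption differs from the slide's wording: the scaling bounds are fitted on the training rows and then reused for any new rows, which keeps test information out of training.

```python
import math

def validate_rows(rows, labels):
    """Reject missing/non-finite feature values and labels outside {1, -1}."""
    for row, label in zip(rows, labels):
        if label not in (1, -1):
            raise ValueError(f"bad label: {label!r}")
        if any(v is None or not math.isfinite(v) for v in row):
            raise ValueError(f"bad feature row: {row!r}")

def fit_minmax(rows):
    """Per-feature (min, max) bounds computed from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_minmax(rows, bounds, lower=-1.0, upper=1.0):
    """Map each feature linearly into [lower, upper] using the fitted bounds."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, bounds):
            if hi == lo:  # constant feature carries no information
                new_row.append(lower)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

# Toy feature rows (hypothetical values, not variant data).
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
labels = [1, -1, 1]
validate_rows(train, labels)
bounds = fit_minmax(train)
scaled = apply_minmax(train, bounds)
print(scaled)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

New rows are scaled with the same `bounds`, so values outside the training range may fall slightly outside [-1, 1].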

Page 8

Supervised learning is the machine learning task of inferring a function from supervised (labeled) training data.

A supervised learning algorithm analyzes the training data and produces an inferred function.
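To make "inferred function" concrete, here is a minimal Python sketch. It uses a nearest-centroid classifier, chosen only because it is short; it is not one of the algorithms listed on this slide and not the method used in this study. Training summarizes the labeled examples as one mean vector per class, and the returned `predict` function is the inferred function.

```python
def train_nearest_centroid(examples, labels):
    """Summarize labeled training data as one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(examples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        """The inferred function: return the label of the nearest centroid."""
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict

# Toy labeled data: two points per class (hypothetical values).
f = train_nearest_centroid([[0, 0], [0, 1], [5, 5], [6, 5]], [-1, -1, 1, 1])
print(f([0.5, 0.5]), f([5.5, 4.0]))  # -1 1
```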

The parallel task in human and animal psychology is often referred to as concept learning.

Some widely used supervised learning algorithms are:

1. Support vector machines

2. Bayesian statistics

3. Artificial neural network

4. Random Forests

5. Regression analysis

Page 9

Support Vector Machines

A support vector machine (SVM) is a set of related supervised learning methods, studied in statistics and computer science, that analyze data and recognize patterns; SVMs are used for classification and regression analysis.

Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification. Among the hyperplanes that separate the two classes, the SVM chooses the one with the largest margin, that is, the largest distance to the nearest training points of either class.
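A minimal Python sketch of how a single separating hyperplane w·x + b = 0 classifies points, and how the margin that an SVM maximizes is measured. The weights and points are hypothetical; the code only evaluates a given hyperplane and does not train one.

```python
import math

def decision(w, b, x):
    """Signed score w . x + b; points with score >= 0 fall on the positive side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

def geometric_margin(w, b, points, labels):
    """Smallest y_i * (w . x_i + b) / ||w|| over the points (negative if any is misclassified)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) / norm for x, y in zip(points, labels))

# Hypothetical hyperplane x1 + x2 - 1 = 0 and two labeled points.
w, b = [1.0, 1.0], -1.0
points, labels = [[2.0, 1.0], [0.0, 0.0]], [1, -1]
print([classify(w, b, p) for p in points])     # [1, -1]
print(geometric_margin(w, b, points, labels))  # about 0.7071, i.e. 1 / sqrt(2)
```

Training an SVM amounts to searching for the w and b that make this minimum margin as large as possible.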

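Consider $\mathcal{D}$, a training set of $n$ labeled examples:

$$\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\ y_i \in \{1, -1\}\}_{i=1}^{n}$$

For training, we used the radial basis function (RBF) kernel for greater accuracy:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right), \quad \gamma > 0.$$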
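The RBF kernel can be computed directly. This hypothetical Python sketch builds the kernel (Gram) matrix for a few points and evaluates a kernel SVM decision function f(x) = Σ cᵢ K(xᵢ, x) + b, where cᵢ stands for αᵢyᵢ. The points, coefficients and γ in the usage lines are made-up values, not a trained model from this study.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2), defined for gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def gram_matrix(X, gamma):
    """Symmetric n x n matrix of pairwise kernel values; the diagonal is 1."""
    return [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]

def kernel_decision(support, coef, b, x, gamma):
    """f(x) = sum_i coef_i * K(x_i, x) + b; the predicted class is the sign of f(x)."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support, coef)) + b

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(X, gamma=0.5)
score = kernel_decision(X[:2], [1.0, -1.0], 0.0, [0.1, 0.0], gamma=0.5)
print(K[0][1], score > 0)
```

A larger γ makes the kernel more local: each training point then influences only its immediate neighbourhood.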

Page 10

Objective

To identify cancer-associated nsSNPs in the CENP protein family using a support vector machine approach

Page 11

Methodology

1. Examination of the protocol

2. Application of the protocol to collect datasets for training the machine

3. Design of a support vector machine classifier using a machine learning algorithm

4. Application of the designed classifier to identify cancer-associated mutations in CENP family proteins

5. Study of the dynamic behaviour of cancer-associated structural variants

Page 12

Examination of the protocol was carried out on the CENPE protein

➔ Centromere-associated protein E (CENPE), a protein of 2701 amino acids with a molecular mass of 312 kDa, is highly expressed in mitosis and accumulates in the cell just prior to mitosis.

➔ It is required for efficient, stable capture of microtubules at kinetochores.

➔ It plays an essential role in integrating the mechanics of microtubule-chromosome interactions with mitotic checkpoint signaling, and has emerged as a novel target for cancer therapy.

➔ It contains an ATP-sensitive motor-like domain at its N-terminus that hydrolyzes ATP to produce directed mechanical force along microtubules.

➔ Absence of CENPE reduces tension at bi-oriented chromosomes, resulting in chromosomes misaligned at the metaphase plate and leading to metaphase arrest.

➔ CENPE expression was also found to be reduced in human hepatocellular carcinoma (HCC) tissue, and lower CENPE expression was found to induce aneuploidy in LO2 cells.

Page 13

Prediction of oncogenic mutants in CENPE using SNP prediction tools

We first collected 100 nsSNPs reported in the CENPE gene from the NCBI dbSNP database.

SIFT, PolyPhen, PhD-SNP, PMut, CanPredict and Dr. Cancer were used to identify cancer-associated SNPs in this dataset.

Using these tools, we identified Y63H as highly deleterious and cancer-associated.
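The deck does not say how the six tools' outputs were combined. One common approach, assumed here, is a majority vote: a variant is flagged when more than half of the tools call it deleterious. The per-tool calls below are hypothetical placeholders for two made-up variants (`VAR_A`, `VAR_B`), not results from this study, and `consensus_call` is an illustrative name.

```python
# Hypothetical per-tool calls for two made-up variants (not data from this study).
predictions = {
    "VAR_A": {"SIFT": "deleterious", "PolyPhen": "deleterious", "PhD-SNP": "deleterious",
              "PMut": "neutral", "CanPredict": "deleterious", "Dr. Cancer": "deleterious"},
    "VAR_B": {"SIFT": "neutral", "PolyPhen": "deleterious", "PhD-SNP": "neutral",
              "PMut": "neutral", "CanPredict": "neutral", "Dr. Cancer": "neutral"},
}

def consensus_call(calls, min_fraction=0.5):
    """Return (label, fraction of tools calling 'deleterious')."""
    n_deleterious = sum(1 for label in calls.values() if label == "deleterious")
    fraction = n_deleterious / len(calls)
    label = "deleterious" if fraction > min_fraction else "neutral"
    return label, fraction

# Rank variants by how many tools agree that they are deleterious.
ranked = sorted(predictions, key=lambda v: consensus_call(predictions[v])[1], reverse=True)
for variant in ranked:
    label, fraction = consensus_call(predictions[variant])
    print(f"{variant}: {label} ({fraction:.2f} of tools)")
```

A stricter threshold (for example requiring all tools to agree) trades sensitivity for fewer false positives.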

To analyse the structural consequences of this mutation, we then carried out molecular dynamics simulations of the native and mutant CENPE motor domain on a 5 ns timescale.

In silico X-ray scattering was carried out throughout the simulation to observe changes in ionic density in the native and mutant structures.

The root mean square deviation (RMSD) was then plotted to analyze the relative fluctuation of the structures.
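The slides do not show how RMSD was computed; MD packages normally do this (GROMACS, for example, provides `gmx rms` and `gmx rmsf`). As a plain-Python illustration of the definitions only, the sketch below computes the RMSD of one frame against a reference and the per-atom root mean square fluctuation (RMSF) over a trajectory. It assumes the frames are already superposed (no rotational fitting) and uses made-up coordinates.

```python
import math

def rmsd(frame, ref):
    """Root mean square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(frame) != len(ref):
        raise ValueError("frames must contain the same number of atoms")
    total = sum((a - b) ** 2 for p, q in zip(frame, ref) for a, b in zip(p, q))
    return math.sqrt(total / len(ref))

def rmsf(trajectory):
    """Per-atom RMSF: RMS distance of each atom from its time-averaged position."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

ref = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
frame = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]  # both atoms shifted 1 unit along z
print(rmsd(frame, ref))    # 1.0
print(rmsf([ref, frame]))  # [0.5, 0.5]
```

RMSD tracks the whole structure over time, while RMSF shows which residues or atoms move most around their average positions.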

Page 14

Molecular blueprint of structural variation in CENPE motor domain: Inside body environment

Figure panels: Native, Mutant

Page 15

Figure: Root Mean Square Fluctuation; panels: Native, Mutant; x-axis: Time (ps)

Page 16

Calculation of R208K CENPE-ATP association constant

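According to Debye-Hückel theory, the reaction rate constant $\kappa$ is proportional to the negative of the electrostatic interaction energy $U$:

$$\kappa \propto -U \quad\Rightarrow\quad \frac{\kappa_{\text{native}}}{\kappa_{\text{mutant}}} = \frac{U_{\text{native}}}{U_{\text{mutant}}}$$

With $\kappa_{\text{native}} = 134.6$, $U_{\text{native}} = -13.42$ and $U_{\text{mutant}} = -3.06$:

$$\frac{134.6}{\kappa_{\text{mutant}}} = \frac{-13.42}{-3.06} \quad\Rightarrow\quad \kappa_{\text{mutant}} = \frac{134.6 \times 3.06}{13.42} = 30.69$$

$\text{CENPE}_{\text{native}} + \text{ATP} \rightarrow \text{CENPE}_{\text{native}}\text{-ATP complex}$; $\kappa = 134.6$

$\text{CENPE}_{\text{mutant}} + \text{ATP} \rightarrow \text{CENPE}_{\text{mutant}}\text{-ATP complex}$; $\kappa = 30.69$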
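The arithmetic can be checked in a few lines of Python. This sketch encodes only the proportionality stated on the slide (κ ∝ −U, so κ_mutant = κ_native × U_mutant / U_native); the function name `mutant_rate_constant` is hypothetical.

```python
def mutant_rate_constant(k_native, u_native, u_mutant):
    """Scale the native rate constant by U_mutant / U_native (assumes k is proportional to -U)."""
    return k_native * (u_mutant / u_native)

k_mutant = mutant_rate_constant(134.6, -13.42, -3.06)
print(f"{k_mutant:.2f}")                     # 30.69
print(f"{134.6 / k_mutant:.2f}-fold lower")  # 4.39-fold lower
```

Because both energies are negative, their ratio is positive, so the sign convention in κ ∝ −U does not change the computed ratio.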

Page 17

Page 18

Figure: CENPE-ATP and CENPE-ADP over time (seconds); panels: Native, Mutant

Page 19

Tools used to collect SNP training data


1. SIFT, PolyPhen, PhD-SNP, PMut, CanPredict and Dr. Cancer were used to collect the SNP datasets.

2. Cancer variant data were obtained from the SwissVar database.

3. Neutral variants were randomly picked from the Swiss-Prot database.

4. Scaling, training and model generation were carried out using the support vector machine algorithm.

5. An RBF kernel was used to generate the classifier model.

6. Rescaling and cross-validation were repeated with different C and γ values until the maximum accuracy was obtained.
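The deck does not show its tooling for tuning C and γ. As a self-contained sketch of the idea, the block below swaps in kernelized Pegasos, a simple stochastic subgradient SVM solver with an RBF kernel and no bias term, in place of a production solver such as LIBSVM. The mapping λ = 1/(nC), the grid values, the 5-fold interleaved split, the iteration count and the synthetic two-cluster data are all assumptions for illustration, and the function names are hypothetical.

```python
import math
import random

def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def train_pegasos(X, y, C, gamma, iters=300, seed=0):
    """Kernelized Pegasos SVM solver (hinge loss, no bias term); lambda = 1 / (n * C)."""
    n = len(X)
    lam = 1.0 / (n * C)
    alpha = [0] * n  # how often each example triggered a margin-violation update
    rng = random.Random(seed)
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        s = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in range(n) if alpha[j])
        if y[i] * s / (lam * t) < 1:  # example i lies inside the margin
            alpha[i] += 1
    return alpha

def predict(X_train, y_train, alpha, gamma, x):
    """Sign of the kernel expansion; the positive 1 / (lambda * T) factor is omitted."""
    s = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y_train, X_train) if a)
    return 1 if s >= 0 else -1

def cv_accuracy(X, y, C, gamma, k=5):
    """Pooled k-fold cross-validation accuracy using interleaved folds."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [v for i, v in enumerate(y) if i not in test_idx]
        alpha = train_pegasos(X_tr, y_tr, C, gamma)
        correct += sum(predict(X_tr, y_tr, alpha, gamma, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def grid_search(X, y, Cs, gammas):
    """Try every (C, gamma) pair and keep the one with the best CV accuracy."""
    best = None
    for C in Cs:
        for gamma in gammas:
            acc = cv_accuracy(X, y, C, gamma)
            if best is None or acc > best[0]:
                best = (acc, C, gamma)
    return best

# Synthetic two-cluster data standing in for (already scaled) variant features.
rng = random.Random(42)
X, y = [], []
for _ in range(20):
    X.append([rng.gauss(-2, 0.7), rng.gauss(-2, 0.7)])
    y.append(-1)
    X.append([rng.gauss(2, 0.7), rng.gauss(2, 0.7)])
    y.append(1)

best_acc, best_C, best_gamma = grid_search(X, y, Cs=[0.1, 1, 10], gammas=[0.01, 0.1, 1])
print(f"best accuracy {best_acc:.2f} with C={best_C}, gamma={best_gamma}")
```

With real variant features, the grid would usually be wider (for example powers of two for both C and γ), and a held-out test set would be kept aside from the cross-validation.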

Page 20

Page 21

Model designed for neutral variants

Model designed for 100 neutral and cancer variants

Page 22


Page 23

Acknowledgement

J. Febin Prabhudass, Asst. Prof. (Senior)

School of Biosciences and Technology, VIT University
