predicting cancer phenotypes from somatic genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf ·...

19
Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer Yifeng Tao 1 , Chunhui Cai 2 , William W. Cohen 1,* , Xinghua Lu 2,3,* 1 School of Computer Science, Carnegie Mellon University 2 Department of Biomedical Informatics, University of Pittsburgh 3 Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh 1

Upload: others

Post on 19-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact TransformerYifeng Tao1, Chunhui Cai2, William W. Cohen1,*, Xinghua Lu2,3,*

1School of Computer Science, Carnegie Mellon University2Department of Biomedical Informatics, University of Pittsburgh

3Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh

1

Page 2: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Tumor origin and progression

• Cancers are mainly caused by somatic genomic alterations (SGAs)• Driver SGAs (~10s/tumor): Promote tumor progression• Passenger SGAs (~100s/tumor): Neutral mutations• How to distinguish drivers from passengers?

2

S Nik-Zainal et al. 2017

Page 3: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Cancer drivers

• How to distinguish drivers from passengers?• Frequency: recurrent mutations more likely to be drivers

• Conserved domain: protein function significantly disturbed

• All unsupervised. But drivers are defined as mutations that promote to tumor development…

3

B Vogelstein et al. 2013ND Dees et al. 2012MS Lawrence et al. 2013

B Reva et al. 2011 B Niu et al. 2016

Page 4: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Cancer drivers

• Identify driver SGAs with supervision of downstream phenotypes

• Change of RNA expression• Differentially expressed

genes (DEGs)

• Candidate models• Bayesian model (C Cai et al. 2019)

• Lasso/Elastic net (R Tibshirani1994)

• Multi-layer perceptrons(MLPs) (F Rosenblatt 1958)

• Models do prediction & driver detection?

4

Model (?) thatpredicts DEGs accurately& identifies driver SGAs

Page 5: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Self-attention mechanism

• Models do prediction & driver detection?

• Attention mechanism• Initially in CV (K Xu et al.

2015)/NLP (A Vaswani et al. 2017)

• Better interpretability• Improves performance

• Self-attention mechanism (Z Yang et al. 2016)

• Contextual deep learning framework: weights determined by all the input mutations

5

Model with self-attention thatpredicts DEGs accurately& identifies driver SGAs

𝛼" 𝛼# 𝛼$ 𝛼% 𝛼& 𝛼'𝛼( = 1

Page 6: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Genomic impact transformer (GIT)

• Transformer: encoder-decoder architecture• Encoder: self-attention mechanism; Decoder: MLP

6

Page 7: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Encoder: Multi-head self-attention

• Tumor embedding is the weighted sum of gene embeddings:

• Weights determined by input gene embeddings:

7

Page 8: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Pre-training gene embedding: Gene2Vec

• Co-occurrence pattern (e.g., mutually exclusive alterations)

8

g

c

Pathway 1

Pathway 2

Pathway 3

MD Leiserson et al. 2015 T Mikolov et al. 2013

Page 9: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Improved performance in predicting DEGs

• Predicting DEGs from SGAs• Conventional models• Ablation studies

9

51

53

55

57

59

61

63

F1 s

core

73

74

75

76

77

78

79

Accu

racy

Page 10: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Candidate drivers via attention mechanism

10

Page 11: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Gene embedding space

• Functionally similar genes are close in gene embedding space• Qualitatively and quantitatively (i.e., GO enrichment, NN accuracy)

11

Page 12: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Tumor embedding: Survival analysis

• Tumor embeddings reveal distinct survival profiles

12

Page 13: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Tumor embedding: Drug response

• Tumor embeddings are predictive of drug response

13

Page 14: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Conclusions and future work

• Biologically inspired neural network framework• Identifying cancer drivers with supervision of DEGs• Accurate prediction of DEGs from mutations

• Side products• Gene embedding: informative of gene functions• Tumor embedding: transferable to other phenotype prediction tasks

• Code and pretrained gene embedding:https://github.com/yifengtao/genome-transformer

• Future work• Fine-grained embedding representation in codon level• Tumor evolutionary features, e.g., hypermutability, intra-tumor

heterogeneity

14

Page 15: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Acknowledgments

• Dr. Xinghua Lu• Dr. William W. Cohen• Dr. Chunhui Cai• Michael Q. Ding• Yifan Xue

15

Page 16: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Quantitative measurement of gene embeddings

• Functional similar genes à closer in embedding space• Go enrichment:

• NN accuracy:

16

4

5

6

7

8

9

10

11

Random pairs Gene2Vec Gene2Vec+GIT

NN

acc

uray

(%)

Page 17: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Tumor embedding space

17

Page 18: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Gene2Vec algorithm

18

Page 19: Predicting Cancer Phenotypes from Somatic Genomic ...yifengt/paper/tao2020git-slides-v1.0.pdf · Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Gene2Vec: Co-occurrence patterns

• Co-occurrence does not necessarily mean similar embeddings• Ex 1: two cats sit there .• Ex 2: two cats stand there .• Ex 3: two dogs sit there .

19

Pathway 1:number

Pathway 2:noun

Pathway 3:verb

MD Leiserson et al. 2015 T Mikolov et al. 2013

onetwo

several

cat

dog

stand

sit

lie