gene expression and regulation machine learning tools to

33
Avantika Lal, 07/15/2021 Machine Learning Tools to Analyze Gene Expression and Regulation

Upload: others

Post on 15-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gene Expression and Regulation Machine Learning Tools to

Avantika Lal, 07/15/2021

Machine Learning Tools to Analyze Gene Expression and Regulation

Page 2: Gene Expression and Regulation Machine Learning Tools to

2

Outline• GPU-accelerated genomic analysis at NVIDIA

• Gene expression and multimodal data integration

• Single-cell sequencing

• Variational autoencoders

• Predicting gene expression from sequence

• Deep learning to improve data quality

Page 3: Gene Expression and Regulation Machine Learning Tools to

3

GPU-accelerated genomics at NVIDIA

Page 4: Gene Expression and Regulation Machine Learning Tools to

4

Parabricks v3.6 Release- July 2021

WGS Pipeline for 30x Human Genome in 22 Minutes on an DGX A100

Update referenced mapped data / re-analysis of legacy data▪ > 10x acceleration of BAM2FastQ

More Comprehensive Somatic Calling▪ LoFreq addition, expanding to 4 callers▪ Vote-based merge tool for fast variant filtering (e.g. allele frequency)▪ VCF Annotation Tool for better quality variants (+ BAM QC)

De novo germline mutation pipeline▪ Supports trio sequencing and uses Google’s DeepVariant1.0

Two Structural Variant Callers▪ Manta and Smoove (Lumpy)

Free 90 day trial license https://www.nvidia.com/en-us/docs/nvidia-parabricks-general/

Page 5: Gene Expression and Regulation Machine Learning Tools to

5

Gene expression profiling and multi-omic integration

Gen

es

ConditionsGene programs

Gen

e pr

ogra

ms

x

Gen

es

Conditions

Page 6: Gene Expression and Regulation Machine Learning Tools to

6

Gene expression profiling and multi-omic integration

Meyer, C., Liu, X. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet 15, 709–721 (2014).

Page 7: Gene Expression and Regulation Machine Learning Tools to

7

Multi-omic data integration

M. R. Corces et al., Science 362, eaav1898 (2018). DOI: 10.1126/science.aav1898

Page 8: Gene Expression and Regulation Machine Learning Tools to

8

Single-cell sequencing

RNA

DNA

Methy

lation ATAC

ChIP

Protein

DR-seqscTrio-seqTARGET-seq

CITE-seqREAP-seqRAIDECCITE-seq

scMT-seqscM&T-seqscTrio-seqscNMT-seq

sci-CARSNARE-seqscNMT-seqSHARE-seq

Paired-Tag

Page 9: Gene Expression and Regulation Machine Learning Tools to

9

Single-cell data analysis

Count matrix

Gene 1 ……. Gene k

Barc

ode

1…Ba

rcod

e n

Dimension reduction,

visualization and

clustering

Cell type A

Cell type B

Single-cell separation Sequencing Read mappingBarcoded reads

Page 10: Gene Expression and Regulation Machine Learning Tools to

10

Analytical challenges in single-cell data

Doublets & multiplets

Library size

Batch effects

Noise

Dropout

Dimension reduction, clustering and differential expression

A

B

A+B

CountFr

eque

ncy

0

1

2

3

Batch

Page 11: Gene Expression and Regulation Machine Learning Tools to

11

Variational Autoencoder (VAE)

μ1

μ2

σ1

σ2

z1

z2

Encoder Decoder

Mean

Variance

Define distributions

Sample from distributions

Inpu

t

Reconstructed Input

Page 12: Gene Expression and Regulation Machine Learning Tools to

12

scVI

Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nature methods. 2018 Dec;15(12):1053-8.

Page 13: Gene Expression and Regulation Machine Learning Tools to

13

Cell type annotation

Xu C, Lopez R, Mehlman E, Regier J, Jordan MI, Yosef N. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Molecular systems biology. 2021 Jan;17(1):e9620.

Page 14: Gene Expression and Regulation Machine Learning Tools to

14

Doublet identification

Bernstein NJ, Fong NL, Lam I, Roy MA, Hendrickson DG, Kelley DR. Solo: doublet identification in single-cell RNA-Seq via semi-supervised deep learning. Cell Systems. 2020 Jul 22;11(1):95-101.

Page 15: Gene Expression and Regulation Machine Learning Tools to

15

Bulk deconvolution

Menden K, Marouf M, Oller S, Dalmia A, Magruder DS, Kloiber K, Heutink P, Bonn S. Deep learning–based cell composition analysis from tissue expression profiles. Science advances. 2020 Jul 1;6(30):eaba2619.

Page 16: Gene Expression and Regulation Machine Learning Tools to

16

VAEs for other single-cell modalities

Ashuach T, Reidenbach DA, Gayoso A, Yosef N. PeakVI: A Deep Generative Model for Single Cell Chromatin Accessibility Analysis. bioRxiv. 2021 Jan 1.

Page 17: Gene Expression and Regulation Machine Learning Tools to

17

Modeling cell-type specific profiles

Given DNA sequence, can we predict gene expression and/or functional profiles in different cell types?

Based on these learned rules, can we predict the cell-type specific effect of sequence variation and mutations?

Page 18: Gene Expression and Regulation Machine Learning Tools to

18

Convolutional neural networks

https://commons.wikimedia.org/wiki/File:ConvolutionAndPooling.svg

Page 19: Gene Expression and Regulation Machine Learning Tools to

19

CNNs for genomics

Sequence input

Coverage track input

Convolutional/pooling layers Other layers

Genomic positionACGT

Convolutional filters

Prediction

Page 20: Gene Expression and Regulation Machine Learning Tools to

20

Basenji

Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome research. 2018 May 1;28(5):739-50.

Page 21: Gene Expression and Regulation Machine Learning Tools to

21

Predicting the impact of genetic variation

Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome research. 2018 May 1;28(5):739-50.

Page 22: Gene Expression and Regulation Machine Learning Tools to

22

Predicting somatic mutation effects

Atak ZK, Taskiran II, Demeulemeester J, Flerin C, Mauduit D, Minnoye L, Hulselmans G, Christiaens V, Ghanem GE, Wouters J, Aerts S. Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning. Genome research. 2021 Jun 1;31(6):1082-96.

Minnoye L, Taskiran II, Mauduit D, Fazio M, Van Aerschot L, Hulselmans G, Christiaens V, Makhzami S, Seltenhammer M, Karras P, Primot A. Cross-species analysis of enhancer logic using deep learning. Genome research. 2020 Dec 1;30(12):1815-34.

Page 23: Gene Expression and Regulation Machine Learning Tools to

23

Epigenomic data quality

Low sequencing depth Sample/experimental factors Low aggregate cell count

Fresh tissue

Flash-frozen

50 million reads

1 million reads

23

Page 24: Gene Expression and Regulation Machine Learning Tools to

24

AtacWorks

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Page 25: Gene Expression and Regulation Machine Learning Tools to

25

Training models to enhance low-coverage ATAC-seq

Clean signal + peak calls(50 million reads)

Noisy signal(1 million reads)

Page 26: Gene Expression and Regulation Machine Learning Tools to

26

Denoising and peak calling from low-coverage ATAC-seq

Page 27: Gene Expression and Regulation Machine Learning Tools to

27

Rescuing low-quality data

High quality

Low quality

Low quality + AtacWorks

Bulk ATAC-seq data from human Erythroblasts

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Page 28: Gene Expression and Regulation Machine Learning Tools to

28

Applying AtacWorks to scATAC-seq

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Page 29: Gene Expression and Regulation Machine Learning Tools to

29

Increasing single-cell resolution

AtacWorks can obtain the same quality results from ~10x fewer cells, increasing the resolution of single-cell chromatin accessibility profiling by an order of magnitude.

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Page 30: Gene Expression and Regulation Machine Learning Tools to

30 Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Identifying regulatory DNA associated with lineage priming

Page 31: Gene Expression and Regulation Machine Learning Tools to

31

Identifying regulatory DNA associated with lineage priming

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).

Page 32: Gene Expression and Regulation Machine Learning Tools to

32

Regulatory modeling with deep learning

Cell / tissue-specific

data generation

Cell type - specific profiles

Data enhancement /

imputation

Train cell-type specific regulatory

models

Predict cell-type specific variant

effects…..AACGCTCCGTAGG….

Cell type A

Cell type B

Cell type A

Cell type B

…..AACGCTTCGTAGG…

Page 33: Gene Expression and Regulation Machine Learning Tools to