gene expression and regulation machine learning tools to
TRANSCRIPT
![Page 1: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/1.jpg)
Avantika Lal, 07/15/2021
Machine Learning Tools to Analyze Gene Expression and Regulation
![Page 2: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/2.jpg)
2
Outline• GPU-accelerated genomic analysis at NVIDIA
• Gene expression and multimodal data integration
• Single-cell sequencing
• Variational autoencoders
• Predicting gene expression from sequence
• Deep learning to improve data quality
![Page 3: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/3.jpg)
3
GPU-accelerated genomics at NVIDIA
![Page 4: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/4.jpg)
4
Parabricks v3.6 Release- July 2021
WGS Pipeline for 30x Human Genome in 22 Minutes on an DGX A100
Update referenced mapped data / re-analysis of legacy data▪ > 10x acceleration of BAM2FastQ
More Comprehensive Somatic Calling▪ LoFreq addition, expanding to 4 callers▪ Vote-based merge tool for fast variant filtering (e.g. allele frequency)▪ VCF Annotation Tool for better quality variants (+ BAM QC)
De novo germline mutation pipeline▪ Supports trio sequencing and uses Google’s DeepVariant1.0
Two Structural Variant Callers▪ Manta and Smoove (Lumpy)
Free 90 day trial license https://www.nvidia.com/en-us/docs/nvidia-parabricks-general/
![Page 5: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/5.jpg)
5
Gene expression profiling and multi-omic integration
Gen
es
ConditionsGene programs
Gen
e pr
ogra
ms
≅
x
Gen
es
Conditions
![Page 6: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/6.jpg)
6
Gene expression profiling and multi-omic integration
Meyer, C., Liu, X. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet 15, 709–721 (2014).
![Page 7: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/7.jpg)
7
Multi-omic data integration
M. R. Corces et al., Science 362, eaav1898 (2018). DOI: 10.1126/science.aav1898
![Page 8: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/8.jpg)
8
Single-cell sequencing
RNA
DNA
Methy
lation ATAC
ChIP
Protein
DR-seqscTrio-seqTARGET-seq
CITE-seqREAP-seqRAIDECCITE-seq
scMT-seqscM&T-seqscTrio-seqscNMT-seq
sci-CARSNARE-seqscNMT-seqSHARE-seq
Paired-Tag
![Page 9: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/9.jpg)
9
Single-cell data analysis
Count matrix
Gene 1 ……. Gene k
Barc
ode
1…Ba
rcod
e n
Dimension reduction,
visualization and
clustering
Cell type A
Cell type B
Single-cell separation Sequencing Read mappingBarcoded reads
![Page 10: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/10.jpg)
10
Analytical challenges in single-cell data
Doublets & multiplets
Library size
Batch effects
Noise
Dropout
Dimension reduction, clustering and differential expression
A
B
A+B
CountFr
eque
ncy
0
1
2
3
Batch
![Page 11: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/11.jpg)
11
Variational Autoencoder (VAE)
μ1
μ2
σ1
σ2
z1
z2
Encoder Decoder
Mean
Variance
Define distributions
Sample from distributions
Inpu
t
Reconstructed Input
![Page 12: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/12.jpg)
12
scVI
Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nature methods. 2018 Dec;15(12):1053-8.
![Page 13: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/13.jpg)
13
Cell type annotation
Xu C, Lopez R, Mehlman E, Regier J, Jordan MI, Yosef N. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Molecular systems biology. 2021 Jan;17(1):e9620.
![Page 14: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/14.jpg)
14
Doublet identification
Bernstein NJ, Fong NL, Lam I, Roy MA, Hendrickson DG, Kelley DR. Solo: doublet identification in single-cell RNA-Seq via semi-supervised deep learning. Cell Systems. 2020 Jul 22;11(1):95-101.
![Page 15: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/15.jpg)
15
Bulk deconvolution
Menden K, Marouf M, Oller S, Dalmia A, Magruder DS, Kloiber K, Heutink P, Bonn S. Deep learning–based cell composition analysis from tissue expression profiles. Science advances. 2020 Jul 1;6(30):eaba2619.
![Page 16: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/16.jpg)
16
VAEs for other single-cell modalities
Ashuach T, Reidenbach DA, Gayoso A, Yosef N. PeakVI: A Deep Generative Model for Single Cell Chromatin Accessibility Analysis. bioRxiv. 2021 Jan 1.
![Page 17: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/17.jpg)
17
Modeling cell-type specific profiles
Given DNA sequence, can we predict gene expression and/or functional profiles in different cell types?
Based on these learned rules, can we predict the cell-type specific effect of sequence variation and mutations?
![Page 18: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/18.jpg)
18
Convolutional neural networks
https://commons.wikimedia.org/wiki/File:ConvolutionAndPooling.svg
![Page 19: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/19.jpg)
19
CNNs for genomics
Sequence input
Coverage track input
Convolutional/pooling layers Other layers
Genomic positionACGT
Convolutional filters
Prediction
![Page 20: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/20.jpg)
20
Basenji
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome research. 2018 May 1;28(5):739-50.
![Page 21: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/21.jpg)
21
Predicting the impact of genetic variation
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome research. 2018 May 1;28(5):739-50.
![Page 22: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/22.jpg)
22
Predicting somatic mutation effects
Atak ZK, Taskiran II, Demeulemeester J, Flerin C, Mauduit D, Minnoye L, Hulselmans G, Christiaens V, Ghanem GE, Wouters J, Aerts S. Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning. Genome research. 2021 Jun 1;31(6):1082-96.
Minnoye L, Taskiran II, Mauduit D, Fazio M, Van Aerschot L, Hulselmans G, Christiaens V, Makhzami S, Seltenhammer M, Karras P, Primot A. Cross-species analysis of enhancer logic using deep learning. Genome research. 2020 Dec 1;30(12):1815-34.
![Page 23: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/23.jpg)
23
Epigenomic data quality
Low sequencing depth Sample/experimental factors Low aggregate cell count
Fresh tissue
Flash-frozen
50 million reads
1 million reads
23
![Page 24: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/24.jpg)
24
AtacWorks
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
![Page 25: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/25.jpg)
25
Training models to enhance low-coverage ATAC-seq
Clean signal + peak calls(50 million reads)
Noisy signal(1 million reads)
![Page 26: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/26.jpg)
26
Denoising and peak calling from low-coverage ATAC-seq
![Page 27: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/27.jpg)
27
Rescuing low-quality data
High quality
Low quality
Low quality + AtacWorks
Bulk ATAC-seq data from human Erythroblasts
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
![Page 28: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/28.jpg)
28
Applying AtacWorks to scATAC-seq
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
![Page 29: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/29.jpg)
29
Increasing single-cell resolution
AtacWorks can obtain the same quality results from ~10x fewer cells, increasing the resolution of single-cell chromatin accessibility profiling by an order of magnitude.
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
![Page 30: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/30.jpg)
30 Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
Identifying regulatory DNA associated with lineage priming
![Page 31: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/31.jpg)
31
Identifying regulatory DNA associated with lineage priming
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021).
![Page 32: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/32.jpg)
32
Regulatory modeling with deep learning
Cell / tissue-specific
data generation
Cell type - specific profiles
Data enhancement /
imputation
Train cell-type specific regulatory
models
Predict cell-type specific variant
effects…..AACGCTCCGTAGG….
Cell type A
Cell type B
Cell type A
Cell type B
…..AACGCTTCGTAGG…
![Page 33: Gene Expression and Regulation Machine Learning Tools to](https://reader031.vdocuments.site/reader031/viewer/2022012020/61689749d394e9041f70e90a/html5/thumbnails/33.jpg)