Size matters: accurate detection and phasing of Structural Variations
Fritz Sedlazeck
June 14, 2018




Identification of SVs with long reads

Sedlazeck et al. Nature Methods (2018)


Short-read validation / False Discovery

[Figure: example events in ONT, PacBio, and Illumina data. Panels: insertion in a repetitive region, inversion, translocation, truncated reads.]

Sedlazeck et al. Nature Methods (2018)


How can we leverage these technologies in large cohorts?

CCDG 44k+

Short-Read WGS

Long-range sequencing

Comprehensive Genomes

● Informed Sample Selection
● Disease Context

● Validate complex variation
● Novel SVs
● Phasing information
● Ethnicity Variant Catalogs

● Technology Strategies
● Data Merging Pipelines


How to select samples: SVCollector

SVCollector for

[Figure: cumulative fraction of SVs (0.0 to 1.0) against number of samples (0 to 100) for the greedy, topN, and random selection strategies.]

1000 Genomes: selecting 4% of the samples

Sedlazeck et al. (bioRxiv)

Population      TopN     Greedy
AFR             99       60
SAS             0        16
EAS             0        14
EUR             0        6
AMR             1        4
Subpopulation   30.77%   96.15%
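The difference between the topN and greedy strategies in the table can be illustrated with a small sketch. This is not SVCollector's implementation. It assumes each sample is represented as a set of SV identifiers, and the function names and toy data are hypothetical. topN ranks samples by their own SV count, while greedy repeatedly picks the sample that adds the most SVs not yet covered.

```python
def top_n(sample_svs, n):
    """Pick the n samples carrying the most SVs, ignoring overlap between samples."""
    ranked = sorted(sample_svs, key=lambda s: (-len(sample_svs[s]), s))
    return ranked[:n]


def greedy(sample_svs, n):
    """Pick n samples, each time taking the one that adds the most uncovered SVs."""
    covered = set()
    chosen = []
    remaining = dict(sample_svs)
    for _ in range(min(n, len(remaining))):
        # sorted() makes ties deterministic: the alphabetically first name wins
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def covered_fraction(sample_svs, chosen):
    """Fraction of all SVs in the cohort that appear in at least one chosen sample."""
    all_svs = set().union(*sample_svs.values())
    got = set().union(*(sample_svs[s] for s in chosen))
    return len(got) / len(all_svs)


# Hypothetical toy cohort: A and B share most of their SVs.
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}, "D": {7}}
print(top_n(toy, 2), covered_fraction(toy, top_n(toy, 2)))
print(greedy(toy, 2), covered_fraction(toy, greedy(toy, 2)))
```

In this toy cohort topN selects two samples that largely overlap, whereas greedy selects a second sample with different SVs. That is the same kind of effect as the table shows, where topN concentrates on AFR and greedy spreads across populations.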

[Figure axes: % SVs in population against number of samples.]


SVCollector: CCDG Sample Selection

• Quantitative sample selection
• Avoids dependency on self-reported phenotypes
• Assumes observed variation (Freeze 1) is a prior for real variation
• Uses common SVs (AF > 0.001)

• ~100 Baylor CCDG F1 samples
• Multiple ethnicities

# Samples selected

Ethnicity           Male   Female
African American    22     40
Hispanic American   9      8
Caucasian           6      10
Unknown (5 samples)

Sedlazeck et al. (bioRxiv)


Comprehensive Genomes Pilot: Preliminary Data

Legend: @HGSC, External

Family Study            Illumina, PCR-Free   PacBio   10X Genomics   RNA-Seq (Read-pairs, M)
Ashkenazi Jewish trio   38x                  18x      30x            50
HGSC Control Trio       >100x                19x      56x            68
HapMap CEU Trio         >100x                40x      30x            35

Sedlazeck et al. (in preparation)


HLA-F deletion: Long + RNA-Seq

[Figure: PacBio and RNA-Seq tracks for HS-1011, NA12878, and NA24385. FPKM values shown: 54.2613, 19.3986, 16.9305.]
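For readers unfamiliar with the unit, FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a gene's fragment count by gene length and sequencing depth. A minimal sketch of the standard formula follows. The input numbers are hypothetical and are not taken from these samples.

```python
def fpkm(gene_fragments, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total fragments must be positive")
    return gene_fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2 kb transcript, 50 M mapped fragments.
print(fpkm(500, 2_000, 50_000_000))
```

Because the value is normalized for depth, FPKM figures from samples with different numbers of read pairs can be placed side by side, as in the figure.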

Sedlazeck et al. (in preparation)


Phasing: chr6

Technology       N50 Phasing (Mbp)
PacBio           0.243
10x-Longranger   0.901
10x-Hapcut2      0.978
PacBio+10x       1.039

Technology       N50 Phasing (Mbp)
PacBio           0.276
10x-Longranger   8.523
10x-Hapcut2      67.576
PacBio+10x       67.576
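The N50 values in these tables summarize phase block lengths. As a reminder of how such a number is usually derived, here is a minimal sketch: sort the blocks from longest to shortest and report the length at which the running total first reaches half of all phased bases. The block lengths below are made up for illustration.

```python
def n50(lengths):
    """Length L such that blocks of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one block length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical phase block lengths in Mbp.
blocks = [2, 3, 4, 5, 6]
print(n50(blocks))
```

A single very long block can dominate the statistic, which may be why two rows of the second table report the same value.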

[Figure: MHC (HLA) and LPA regions, with tracks for PacBio, 10x Genomics, and both.]

HS-1011: DNA Mol. Length 27.6 kb

NA24385: DNA Mol. Length 99.9 kb

Sedlazeck et al. (in preparation)


Overview of long read projects

Comprehensive Genomics
• Comprehensive Genomes: Sedlazeck et al. (in preparation)
• Interaction of SVs+SNV with methylation and RNA-Seq

Diploid genomes
• Regions (Parkinson, Gaucher): Leija-Salazar (bioRxiv)
• Entex consortium (in preparation)

Detection of Variants
• NGMLR + Sniffles: Sedlazeck et al. (2018)
• SURVIVOR: Jeffares et al. (2017)
• Clairvoyante: Luo et al. (bioRxiv)
• GiaB (in preparation)

SVs in Genomes
• Cancer (SKBR3): Nattestad et al. (bioRxiv)
• 44,000 Population (CCDG): Sedlazeck et al. (in prep)

[Figure: CHR6: Average SV Allele Frequency per 100kb. x-axis: position (0.0e+00 to 1.5e+08); y-axis: allele frequency (0.00 to 0.20).]
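A per-window average like the one plotted can be computed by binning SV positions into fixed 100 kb windows. The sketch below assumes that the plotted value is the mean allele frequency of the SVs starting in each window. That is an interpretation of the figure title, not a description of the actual pipeline, and the input tuples are hypothetical.

```python
from collections import defaultdict


def mean_af_per_window(variants, window=100_000):
    """Map each window start to the mean allele frequency of SVs inside it.

    variants: iterable of (position, allele_frequency) pairs on one chromosome.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, af in variants:
        start = (pos // window) * window  # left edge of the window holding pos
        sums[start] += af
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical SVs: two in the first 100 kb window, one in the second.
svs = [(50_000, 0.1), (99_999, 0.3), (100_000, 0.5)]
print(mean_af_per_window(svs))
```

Windows without any SV are absent from the result. A plotting step would need to decide whether to show them as zero or as missing.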


Methods

SURVIVOR:
• Tool kit for SVs
• Published: Nature Communications (2017)
• Available: github.com/fritzsedlazeck/SURVIVOR

Sniffles:
• SV detection for long reads
• Published: Nature Methods (2018)
• Also nested SVs
• Available: github.com/fritzsedlazeck/Sniffles

NextGenMap-LR:
• Long read mapper
• Published: Nature Methods (2018)
• Available: github.com/philres/nextgenmap-lr

SVCollector:
• Automated sample selection
• bioRxiv
• Available: github.com/fritzsedlazeck/SVCollector

Clairvoyante:
• SNV caller
• bioRxiv
• Available: github.com/aquaskyline/Clairvoyante

Crossstitch:
• Localized assembly + phasing of SVs+SNV
• Available: github.com/schatzlab/crossstitch


Acknowledgments

William Salerno

Stephen Richards

Richard Gibbs

Philipp Rescheneder

Moritz Smolka

Arndt von Haeseler

Michael Schatz

Schatz lab


3.4 How much coverage do we need?

NA12878 (55x original)
SKBR3 (69x original)
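A coverage titration of this kind is commonly done by randomly subsampling reads from the original data set. The sketch below shows the arithmetic under the assumption that reads are dropped uniformly at random, so expected coverage scales with the fraction kept. The target coverages are hypothetical and do not come from the slide.

```python
import random


def subsample_fraction(original_cov, target_cov):
    """Fraction of reads to keep so original_cov drops to roughly target_cov."""
    if original_cov <= 0 or not 0 < target_cov <= original_cov:
        raise ValueError("need 0 < target_cov <= original_cov")
    return target_cov / original_cov


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (seeded)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Original coverages from the slide; the targets are illustrative assumptions.
for sample, original in (("NA12878", 55), ("SKBR3", 69)):
    for target in (10, 30):
        frac = subsample_fraction(original, target)
        print(f"{sample}: keep {frac:.3f} of reads for ~{target}x")
```

Fixing the seed keeps a subsample reproducible, which helps when comparing SV calls across coverage levels.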


Short indels 8-100bp

SV    Scalpel   Sniffles found (%)   Sniffles additional
DEL   30,988    90.5%                871
INS   191,817   71.70%               13,503


Minimap2: PacBio