High Throughput Sequencing Technologies: What We Can Know

Brian Krueger, PhD, Duke University Center for Human Genome Variation

Upload: brian-krueger

Post on 29-May-2015


DESCRIPTION

Presentation on the pitfalls of short read sequencing and some solutions. Detailed slide notes loosely follow what I said.

TRANSCRIPT

Page 1: High Throughput Sequencing Technologies: What We Can Know

High Throughput Sequencing Technologies: What We Can Know

Brian Krueger, PhD

Duke University Center for Human Genome Variation

Page 2: High Throughput Sequencing Technologies: What We Can Know

2nd Generation Sequencing Overview

Workflow (from the slide diagram):

• Genomic DNA
• Fragmented DNA
• Ligate Adaptors
• Bind Library and create clusters
• Sequencing Cycle: Add Bases, Wash, Image, Cleave, Wash
– Repeat hundreds of times on billions of clusters
• Align reads to a reference genome

Page 3: High Throughput Sequencing Technologies: What We Can Know

2nd Generation Sequencing Advances

• V3 System Chemistry
– 300GB per Flowcell
– 11 Days to Data
– Genome: $4700, Exome: $790

• V4 System Chemistry
– 600GB per Flowcell
– 6 Days to Data
– Genome: $3000, Exome: $640

• X System Chemistry
– 1TB per Patterned Flowcell
– 3 Days to Data
– Genome: $1500, Exome: $500
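To compare the three chemistries on a common footing, the per-flowcell output and turnaround time can be converted into output per day. This is a rough sketch, not a vendor specification: it reads the X system figure as about 1 TB (1000 GB) per flowcell, which is an assumption about the intended number, and the names `Chemistry`, `gb_per_day`, and `summarize` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemistry:
    name: str
    gb_per_flowcell: float
    days_to_data: float
    genome_cost_usd: int
    exome_cost_usd: int

    def gb_per_day(self) -> float:
        # Output divided by turnaround time.
        return self.gb_per_flowcell / self.days_to_data


CHEMISTRIES = [
    Chemistry("V3", 300, 11, 4700, 790),
    Chemistry("V4", 600, 6, 3000, 640),
    # Assumption: the X system figure is taken as ~1 TB = 1000 GB.
    Chemistry("X", 1000, 3, 1500, 500),
]


def summarize(chemistries):
    """Map each chemistry name to its output per day, rounded to 0.1 GB."""
    return {c.name: round(c.gb_per_day(), 1) for c in chemistries}


if __name__ == "__main__":
    for name, rate in summarize(CHEMISTRIES).items():
        print(f"{name}: about {rate} GB per day")
```

Under these assumptions the daily output rises by roughly an order of magnitude from V3 to X while the quoted genome cost falls by about two thirds.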

Page 4: High Throughput Sequencing Technologies: What We Can Know

Techniques for Acquiring Data

• Whole Genome Sequencing
– Obtain whole blood or tissue sample
– Create sequencing libraries of all DNA fragments

• Whole Exome Sequencing
– Utilizes a selection protocol to fish out ONLY coding DNA sequences
– Create sequencing libraries from enriched DNA
– Reduces cost and analysis time

• Custom Capture
– Same protocol as Exome sequencing
– Only target desired DNA sequences

• Amplicon Sequencing
– Use PCR to amplify target DNA
– Sequence amplified DNA (Amplicon)

• RNA-Seq
– Extract RNA, capture mRNA, convert to cDNA
– Used for differential gene expression analyses, RNA isoform detection

Page 5: High Throughput Sequencing Technologies: What We Can Know

Common DNA Mutations

[Figure: common DNA mutations on a chromosome, each shown against a reference]

Sequence variants
• Reference: ATCGGGTCATGTCA
• Single nucleotide variant: ATCGGGTCATATCA
• Small insertion: ATCGGGTCATGACGTCA
• Small deletion: ATCGGGTCAT

Structural variants
• Reference: A B C D
• Deletion: A C D
• Translocation: A B G E
• Duplication: A B C C D
• Inversion: A B D C

Credit: Elizabeth Ruzzo, PhD, CHGV
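The sequence-variant examples on this slide can be told apart mechanically by trimming the bases a variant shares with the reference at each end and looking at what is left. The sketch below is a toy illustration, not a variant caller: `classify_small_variant` is a hypothetical helper, and the small-deletion string is used exactly as it survives in this transcript, so the deleted bases it reports may not match the original figure.

```python
def classify_small_variant(ref: str, alt: str) -> tuple[str, int, str, str]:
    """Return (kind, 0-based position, reference allele, alternate allele)."""
    shortest = min(len(ref), len(alt))

    # Count matching bases from the left.
    prefix = 0
    while prefix < shortest and ref[prefix] == alt[prefix]:
        prefix += 1

    # Count matching bases from the right, never overlapping the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and ref[len(ref) - 1 - suffix] == alt[len(alt) - 1 - suffix]):
        suffix += 1

    ref_allele = ref[prefix:len(ref) - suffix]
    alt_allele = alt[prefix:len(alt) - suffix]

    if not ref_allele and not alt_allele:
        kind = "identical"
    elif len(ref_allele) == 1 and len(alt_allele) == 1:
        kind = "SNV"
    elif not ref_allele:
        kind = "insertion"
    elif not alt_allele:
        kind = "deletion"
    else:
        kind = "complex"
    return kind, prefix, ref_allele, alt_allele


REFERENCE = "ATCGGGTCATGTCA"

if __name__ == "__main__":
    for alt in ("ATCGGGTCATATCA", "ATCGGGTCATGACGTCA", "ATCGGGTCAT"):
        print(alt, classify_small_variant(REFERENCE, alt))
```

Real pipelines work from aligned reads rather than whole strings, but the same idea of comparing against the reference underlies each of the sequence-variant classes listed here.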

Page 6: High Throughput Sequencing Technologies: What We Can Know

Disadvantages of Current Techniques

• Amplification errors
– All polymerases have an inherent error rate (10^-6 to 10^-7)

• GC bias
– PCR bias against GC rich sequences
– Exome capture bias against GC rich sequences

• Trouble detecting small insertions and deletions
– Capture baits may not hybridize well
– Capture cannot be used to reliably detect large CNVs

• Cannot be used for De novo assembly
– Read length too short to span long repeat regions
– Not good for detecting trinucleotide repeat expansions

• Miss large structural variations
– Translocations and inversions likely will be missed
– Require significant read depth at break points for these variations to be detected

• Trouble with RNA-seq isoform detection
– Like large structural variations, hard to accurately detect all splice isoforms using short read technology
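The read-length limitation can be made concrete with a little arithmetic: to size a repeat expansion from a single read, the read has to cover the whole repeat tract plus some unique sequence on both sides. The sketch below is only an illustration; the function names, the 300 bp read length, and the 10 bp flank requirement are assumptions chosen for the example, not values from the slides.

```python
def read_can_size_repeat(read_len: int, unit_len: int, copies: int,
                         flank: int = 10) -> bool:
    """True if one read can cover the repeat tract plus `flank` unique
    bases on each side (the flank size is an arbitrary assumption)."""
    tract_len = unit_len * copies
    return read_len >= tract_len + 2 * flank


def max_sizable_copies(read_len: int, unit_len: int, flank: int = 10) -> int:
    """Largest repeat copy number a single read of this length can span."""
    usable = read_len - 2 * flank
    return max(usable // unit_len, 0)


if __name__ == "__main__":
    # Hypothetical 300 bp short read versus a 10 kb long read,
    # both against a trinucleotide repeat with 200 copies (600 bp).
    print(read_can_size_repeat(300, 3, 200))     # False
    print(read_can_size_repeat(10_000, 3, 200))  # True
    print(max_sizable_copies(300, 3))            # 93
```

Once the tract is longer than the read, every read inside it looks the same, which is why the copy number cannot be recovered from short reads alone.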


Page 7: High Throughput Sequencing Technologies: What We Can Know

Solutions!

• Solutions for many of these problems exist
– As always, come at a cost

• Whole Genome Sequencing - $1500
– Reduce Exome Artifacts
  • Better Indel Detection and higher coverage in high GC regions
  • Can be used to detect large copy number variations

• PCR Free Whole Genome Sequencing
– Reduces amplification bias and polymerase error artifacts

• WGS will miss large structural variations (Inversions, Translocations, microsatellites)
– Combine with long read technologies
– Added cost of $1000-$10,000
– Higher cost = better detection

Page 8: High Throughput Sequencing Technologies: What We Can Know

Long-ish Read Sequencing Technologies

• Mate-Pair Sequencing
– Insert size increased from 300bp to 3-8KB
– Sequence ends of mate-pairs to pair reads over much longer distances
– Use short reads to fill gaps
– Adds $1000 to Genome cost
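One way the longer insert size helps is that the two ends of a mate-pair should map a known distance apart, so pairs that map too far apart, too close together, or on different chromosomes hint at a structural change. The sketch below is a simplified illustration under stated assumptions: it uses the 3-8 KB range from this slide, ignores read orientation entirely, and the `MatePair` type and `classify_pair` labels are hypothetical rather than taken from any real tool.

```python
from typing import NamedTuple


class MatePair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: MatePair, min_insert: int = 3000,
                  max_insert: int = 8000) -> str:
    """Label a mapped mate-pair by how its ends sit on the reference."""
    if pair.chrom1 != pair.chrom2:
        # Ends on different chromosomes: candidate translocation.
        return "interchromosomal"
    span = abs(pair.pos2 - pair.pos1)
    if span > max_insert:
        # Ends further apart on the reference than the library allows:
        # the sample may be missing sequence between them (deletion).
        return "too_far"
    if span < min_insert:
        # Ends closer than expected: the sample may carry extra sequence.
        return "too_close"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        MatePair("chr1", 10_000, "chr1", 15_000),
        MatePair("chr1", 10_000, "chr1", 60_000),
        MatePair("chr1", 10_000, "chr7", 12_000),
    ]
    for p in pairs:
        print(p, classify_pair(p))
```

A single discordant pair means little on its own; in practice many pairs supporting the same break point are needed before a call is made.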

Page 9: High Throughput Sequencing Technologies: What We Can Know

Long-ish Read Sequencing Technologies

• Illumina Synthetic Long Reads
– Fragment Genomic DNA to 10KB
– Dilute across a 384 well plate
– Fragment clonal 10KB fragments into 300bp fragments and barcode
– Sequence fragments and use barcodes to re-create the long reads synthetically
– Use as a short read scaffold to perform De Novo sequencing
– Has been used in HLA sequencing and De Novo assembly of the Drosophila genome including accurate mapping of 80% of the transposable elements
– Adds $1800 to Genome cost

[Figure labels: 10kb fragmentation, Barcoding and clonal amp, Nextera prep, Sequencing]
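The barcode step is what makes the synthetic long read possible: after sequencing, short reads are pooled by barcode so that each pool can be assembled on its own. The sketch below shows only that pooling step and leaves assembly out; the function name and the barcode labels such as `well_A01` are hypothetical placeholders, not identifiers from the Illumina protocol.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Pool (barcode, sequence) pairs into a dict of barcode -> sequences.

    Each pool is the set of short reads that would later be assembled
    into one or more synthetic long reads (assembly is not shown here).
    """
    pools = defaultdict(list)
    for barcode, sequence in reads:
        pools[barcode].append(sequence)
    return dict(pools)


if __name__ == "__main__":
    # Made-up reads tagged with made-up well barcodes.
    reads = [
        ("well_A01", "ATCGGGTC"),
        ("well_B07", "GGTCATGT"),
        ("well_A01", "TCATGTCA"),
    ]
    for barcode, pool in group_reads_by_barcode(reads).items():
        print(barcode, pool)
```

Because each pool holds only a small number of 10KB molecules, repeats that would be ambiguous genome-wide are far less likely to collide within a pool.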

Page 10: High Throughput Sequencing Technologies: What We Can Know

True Long Read Sequencing Technologies

• Defined as single molecule sequencing
• Less complex sample prep and much longer read length (1-100kb) compared to 200-400bp for 2nd Gen
• Two categories
– Sequencing by synthesis
  • Pioneered by Pacific Biosciences
  • Sequencer uses super microscopes and polymerase bound nanowells to WATCH DNA as it is sequenced in real time
  • Nanowells filled with DNA bases
  • Fluorescence of base only detected at the polymerase
– Direct sequencing by passing DNA through a nanopore
  • Bases fed through a membrane bound nanopore
  • Ionic difference between both sides of the membrane
  • Detect how ion flow changes at the pore as each base passes through
  • Oxford Nanopore, Base4, Stratos Genomics, Genia
• Bleeding edge technology
– Many technical hurdles with very high error rates (10-40%)
– Current best use is to create scaffolds for De Novo assembly
– Very expensive technology
  • Costs 3-10x as much as Illumina to do whole genome sequencing

[Figure labels: PacBio, Oxford Nanopore]
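The error-rate and cost figures on this slide are easier to weigh as concrete numbers. The sketch below is back-of-the-envelope arithmetic only: it uses the 10-40% error range and the 3-10x multiplier from this slide, takes $1500 as the Illumina whole genome price quoted earlier in the deck, and the function names are made up for illustration.

```python
def expected_errors(read_len_bp: int, error_rate: float) -> int:
    """Expected number of miscalled bases in one read, rounded."""
    return round(read_len_bp * error_rate)


def long_read_cost_range(illumina_cost_usd: int = 1500,
                         low_multiple: int = 3,
                         high_multiple: int = 10) -> tuple[int, int]:
    """Cost range implied by a multiple of the short-read genome price."""
    return (illumina_cost_usd * low_multiple,
            illumina_cost_usd * high_multiple)


if __name__ == "__main__":
    for rate in (0.10, 0.40):
        errors = expected_errors(10_000, rate)
        print(f"10 kb read at {rate:.0%} error: about {errors} wrong bases")
    low, high = long_read_cost_range()
    print(f"Implied long-read genome cost: ${low} to ${high}")
```

With a thousand or more errors expected in a single 10 kb read, using these reads as scaffolds alongside more accurate short reads is a natural fit.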

Page 11: High Throughput Sequencing Technologies: What We Can Know

Questions??

• Reading/Viewing Material:
– Sequencing Methods Ecosystem - http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf
– Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements - http://biorxiv.org/content/early/2014/01/19/001834
– Characterization of the human ESC transcriptome by hybrid sequencing - http://www.pnas.org/content/110/50/E4821.short
– Nanopore Sequencing Web Conference - http://www.youtube.com/watch?v=UtXlr19xTh8