Post on 18-Jun-2020

NGS: the basics

Human genome sequence

June 26th, 2000: official announcement of the completion of the draft human genome sequence (declared complete in 2003; finished sequence published in 2004)

Costs:

HGP (Francis Collins): $3 billion, 15 years

Celera (Craig Venter): $200 million, 2 years

2004: two NIH Requests for Applications (RFAs)

Current technologies are able to produce the sequence of a mammalian-sized genome of the desired data quality for $10 to $50 million; the goal of this initiative is to reduce costs by at least two orders of magnitude. It is anticipated that emerging technologies are sufficiently advanced that, with additional investment, it may be possible to achieve proof of principle or even early stage commercialization for genome-scale sequencing within five years.

A parallel RFA solicits grant applications to develop technologies to meet the longer-term goal of achieving four-orders of magnitude cost reduction in about ten years, so that a mammalian-sized genome could be sequenced for approximately $1000.
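
The two RFA targets can be checked with simple arithmetic. The sketch below only restates the figures quoted above ($10 to $50 million baseline, two and four orders of magnitude); the function name is illustrative and not taken from the RFAs.

```python
def reduced_cost(baseline_usd: float, orders_of_magnitude: int) -> float:
    """Cost after dividing the baseline by 10**orders_of_magnitude."""
    return baseline_usd / 10 ** orders_of_magnitude


# Baseline quoted in the RFA text: $10 to $50 million per mammalian-sized genome.
baseline_low, baseline_high = 10e6, 50e6

# Near-term goal: at least two orders of magnitude cheaper.
near_term = (reduced_cost(baseline_low, 2), reduced_cost(baseline_high, 2))
# Longer-term goal: four orders of magnitude cheaper.
long_term = (reduced_cost(baseline_low, 4), reduced_cost(baseline_high, 4))

print(near_term)  # (100000.0, 500000.0)
print(long_term)  # (1000.0, 5000.0)
```

The lower bound of the four-orders target is the "approximately $1000" genome named in the parallel RFA.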

Increased efficiency: decreased costs

Exponential cost decrease

Efficient integration of each individual step to slash down the costs

Massively parallel sequencing: next-generation sequencing

Key: direct sequencing of DNA without the bacterial cloning step

From colonies to polonies

454

Roche GS FLX

454: Library preparation

Clonal amplification of single molecules

Emulsion PCR

454: Sequencing by pyrosequencing

GS FLX throughput (2011-2013)

Up to a million sequences, 700 bp long (up to 1 kb), in 23 hours

454: Game over!

Jonathan Rothberg: "In the sequencing business, one needs to innovate or die. At 454 we were always first; first non-bacterial cloning, first commercialization, first next-gen individual human genome, Neanderthal, mammoth, deep sequencing, cancer sequencing, drug response studies, HIV, metagenomics, first drug target by whole genome sequencing, and many more firsts. Always innovating, always first."

In 2007, Roche acquired 454 for $155 million in cash and stock. Rothberg said that when Roche bought 454, the company was "two years ahead of everyone else," but after the purchase, "they lost that lead, no more firsts, no more innovation."

Rothberg strikes back!

Rothberg: "When I woke up and found Roche had bought 454 without me, I had to restart. It cost three years. We had to invent a new scalable way to sequence — ion semiconductor sequencing — and establish a clear path towards both truly low-cost and mobile sequencing." He went on to found Ion Torrent, which was bought by Life Technologies in 2010 for $375 million in cash and stock, and another $350 million based on milestones.

Ion Torrent

Simple Natural Chemistry

Fast Direct Detection

Nucleotides flow sequentially over the Ion semiconductor chip

Direct detection of natural DNA extension

A few seconds per incorporation
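
Because the nucleotides are flowed one at a time, a read can be reconstructed from the per-flow signals. The following is a minimal sketch, not Ion Torrent's actual base caller: it assumes the signal of each flow is roughly proportional to the number of bases incorporated (so a homopolymer gives a proportionally larger signal), and the flow order `TACG` and the signal values are made up for illustration.

```python
def call_bases(flow_order: str, signals: list[float]) -> str:
    """Convert per-flow signals into a base sequence.

    Each flow offers a single nucleotide. The signal is assumed to be
    proportional to the number of bases incorporated in that flow:
    about 0 means no incorporation, about 2 means a homopolymer of length 2.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through the order
        count = max(0, round(signal))                 # nearest integer, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals for two cycles of the assumed flow order.
signals = [1.04, 0.02, 1.96, 0.10, 0.05, 1.10, 0.03, 2.89]
read = call_bases("TACG", signals)
print(read)  # TCCAGGG
```

Rounding works well for short homopolymers; for long ones the gap between, say, 7 and 8 incorporations is small relative to the noise, which is why flow-based chemistries are prone to homopolymer length errors.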

[Figure: sensor cross-section. Labels: dNTP, H+, ∆pH, sensing layer, ∆Q, sensor plate, ∆V, silicon substrate (drain, source, bulk), to column receiver]

Rothberg J.M. et al Nature doi:10.1038/nature10242

Scalable Semiconductor Technology

Wafer Semiconductor Manufacturing

Chip Semiconductor Packaging

Chip Cross Section

Semiconductor Design

The Chip is the Machine™

Scalability Simplicity Speed

Two machines, 5 chips

PGM: 314, 316, 318

Proton: P1, P2?

Ion Torrent Specs

314 Chip: 0.4 to 0.5 million reads, 30 to 100 Mb data

316 Chip: 2 to 3 million reads, 300 to 1000 Mb data

318 Chip: 4 to 5.5 million reads, 0.6 to 2 Gb data

200 bp or 400 bp reads, 2 to 7 hours

Proton P1: 60 to 80 million reads, up to 10 Gb data, 200 bp reads, 2 to 4 hours

Proton P2: always announced, never seen!

Barcode read just before insert with Ion Torrent

[Figure: read layout. Labels: sequencing primer, barcode (in the barcoded adapter), insert, biotin adapter]
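
Since the barcode is read just before the insert, each read starts with its barcode, and samples can be separated by looking at the read prefix. The sketch below assumes exact matching with no error tolerance; the barcode sequences and sample names are hypothetical, not real Ion Torrent barcodes.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by exact barcode prefix and trim the barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence to sample name
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    by_sample["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the insert: drop the barcode at the start of the read.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched the start of this read.
            by_sample["unassigned"].append(read)
    return by_sample


# Hypothetical 4-base barcodes, for illustration only.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGATTC", "TGCACCTTAA", "GGGGAAAACC"]
result = demultiplex(reads, barcodes)
print(result)
```

A production demultiplexer would usually tolerate one or more mismatches in the barcode, which is why barcode sets are designed so that any two barcodes differ at several positions.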

Ion Torrent paired-end sequencing

Illumina Genome Analyzer, HiSeq, MiSeq

(formerly Solexa)

Solexa amplification step

Amplification and sequencing on a solid support

Sequencing by synthesis

CRT: cyclic reversible termination

Illumina: Primary data analysis

120 tiles per lane

480 images per lane and cycle:

36 nt run = 138,240 images = 945 Gb

2x50 nt run = 384,000 images = 1.3 Tb

2x100 nt run = 768,000 images = 5.3 Tb
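
The image counts follow from the tile layout. In the sketch below, 120 tiles per lane comes from the slide; 4 images per tile and cycle (one per base) and 8 lanes per flow cell are assumptions, chosen because they reproduce the 480 images per lane and the run totals quoted above.

```python
TILES_PER_LANE = 120      # from the slide
IMAGES_PER_TILE = 4       # assumption: one image per base (A, C, G, T)
LANES_PER_FLOW_CELL = 8   # assumption: inferred from the quoted totals


def images_per_lane_and_cycle() -> int:
    """Images taken in one lane during one sequencing cycle."""
    return TILES_PER_LANE * IMAGES_PER_TILE


def images_per_run(cycles: int) -> int:
    """Total images for a full flow cell over the given number of cycles."""
    return images_per_lane_and_cycle() * LANES_PER_FLOW_CELL * cycles


print(images_per_lane_and_cycle())  # 480
# A paired-end run has twice as many cycles as its read length.
for label, cycles in [("36 nt", 36), ("2x50 nt", 100), ("2x100 nt", 200)]:
    print(label, images_per_run(cycles))
# 36 nt 138240
# 2x50 nt 384000
# 2x100 nt 768000
```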

Image analysis (Illumina)

Image registration:

Get image coordinates congruent

Register images between cycles

Cluster identification

Template of cluster positions created from the first five cycles (one image per base: A, C, G, T)

If neighboring clusters have identical sequences during first 5 cycles: one cluster

If neighboring clusters have different sequences during first 5 cycles: two clusters

As a consequence: barcodes should not be placed in the first bases; otherwise the probability of fusing two different clusters would be too high
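
The merging rule can be illustrated with a small sketch. This is not Illumina's actual image-analysis algorithm: the greedy strategy, the distance threshold, and the spot coordinates are all assumptions made for the example; only the rule itself (neighbors with identical first five bases count as one cluster) comes from the slide.

```python
from math import dist


def merge_clusters(spots, max_distance=1.5, template_cycles=5):
    """Greedy sketch of cluster identification.

    spots -- list of ((x, y), sequence) tuples.
    A spot is absorbed into an already kept cluster when it lies within
    max_distance of it AND shares the same first template_cycles bases.
    Returns the list of kept (position, prefix) clusters.
    """
    kept = []
    for position, sequence in spots:
        prefix = sequence[:template_cycles]
        for kept_position, kept_prefix in kept:
            if dist(position, kept_position) <= max_distance and prefix == kept_prefix:
                break  # same cluster detected twice: do not keep it again
        else:
            kept.append((position, prefix))
    return kept


spots = [
    ((10.0, 10.0), "ACGTAGG"),
    ((10.8, 10.3), "ACGTACC"),  # neighbor, same first 5 bases: merged
    ((11.0, 9.5), "TTGCAGG"),   # neighbor, different first 5 bases: kept
    ((40.0, 40.0), "ACGTAGG"),  # same bases but far away: kept
]
clusters = merge_clusters(spots)
print(len(clusters))  # 3
```

The second spot shows the risk with barcodes: two genuinely different neighboring clusters that start with the same barcode look identical over the first five cycles and are fused.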

Illumina paired-end sequencing

Barcoding with a single index (Illumina)

Barcoding with dual indexing (Illumina)

Illumina-Solexa throughput (End 2013)

Up to 3 billion sequences, up to 2*100 bp long, in 11 days (HiSeq 2000)

Or 0.6 billion, 2*150 bp, in 40 hours (HiSeq 2500)

Or 12-15 million, 2*250 bp, in 39 hours (MiSeq V2)

Or 22-25 million, 2*300 bp, in 65 hours (MiSeq V3)
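
These figures translate into total yield per run. The sketch below uses the upper bounds quoted above and assumes that each counted "sequence" is read from both ends in a paired-end run, so it contributes twice the read length; the dictionary layout and function names are illustrative.

```python
# (number of sequences, bases per sequence incl. both ends, run time in hours)
RUNS = {
    "HiSeq 2000": (3_000_000_000, 2 * 100, 11 * 24),
    "HiSeq 2500": (600_000_000, 2 * 150, 40),
    "MiSeq V3": (25_000_000, 2 * 300, 65),
}


def yield_gb(sequences: int, bases_per_sequence: int) -> float:
    """Total yield in gigabases (1 Gb = 1e9 bases)."""
    return sequences * bases_per_sequence / 1e9


def gb_per_hour(platform: str) -> float:
    """Average yield per hour of run time."""
    sequences, bases, hours = RUNS[platform]
    return yield_gb(sequences, bases) / hours


for platform, (sequences, bases, hours) in RUNS.items():
    total = yield_gb(sequences, bases)
    print(f"{platform}: {total:.0f} Gb, {gb_per_hour(platform):.2f} Gb/hour")
```

Under these assumptions the HiSeq 2000 gives the largest yield per run, while the HiSeq 2500 delivers more bases per hour.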

SOLiD sequencing

Applied Biosystems

SOLiD: library preparation

Emulsion PCR

SOLiD: sequencing

SOLiD throughput (early 2009)

Up to 0.2 billion sequences, up to 2*60 bp long, in 7 days

Complete Genomics

A human genome for $5,000

Step 1: fragment tagging

Step 2: clonal DNA amplification

Step 3: distribution over patterned substrate, 1 billion spots per slide

Step 4: sequencing by ligation

Step 5: assembly

Cost slashing: small volumes, "simple" equipment

Third Generation sequencing

Single molecule sequencing

No PCR amplification

Helicos BioSciences

Single molecule fluorescent sequencing on a flow cell

Cyclic reversible termination: a single DNA molecule is extended one base at a time; the blocking fluorescent label is then removed and washed away, and the cycle repeats

Improved cyclic reversible termination and single DNA molecule detection

Helicos throughput

Up to 1 billion sequences, on average 32 bp long, in 7 days

Pacific Biosciences

Long single molecule sequencing

The label is on the phosphate and is captured transiently by a DNA polymerase tethered at the bottom of a zero-mode waveguide (ZMW)

Thousands of ZMWs concentrate light

The ZMW nanostructure provides excitation confinement in the zeptoliter (10⁻²¹ liter) regime

Label on the phosphate, not on the base

Real-time detection of incorporation of each base on thousands of molecules

Pacific Biosciences throughput

Each ZMW: 10 bases/sec

Claim: in 2013, a high-quality human genome in 15 minutes
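
A back-of-the-envelope calculation shows how much parallelism the 15-minute claim implies. The rate (10 bases/sec) and run time (15 minutes) come from the slide; the genome size of about 3 Gb and the coverage values are assumptions, and the sketch ignores read length limits, idle wells, and errors.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: roughly 3 Gb haploid human genome
BASES_PER_SECOND = 10           # per well, from the slide
RUN_SECONDS = 15 * 60           # 15 minutes, from the slide


def wells_needed(coverage: float) -> float:
    """Number of wells that must sequence in parallel to reach the coverage."""
    bases_per_well = BASES_PER_SECOND * RUN_SECONDS  # 9,000 bases per well
    return GENOME_SIZE_BP * coverage / bases_per_well


print(round(wells_needed(1)))   # 333333
print(round(wells_needed(30)))  # 10000000
```

Under these assumptions, even a single pass over the genome needs hundreds of thousands of wells working at once, and a 30x "high-quality" genome needs about ten million, far more than the "thousands of molecules" observed in parallel.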

Third or Fourth generation sequencing

Single molecules, no fluorophore

Oxford Nanopore Technologies

[Figure labels: nanopore array chip; pore across lipid bilayer; exonuclease]

Bases passing through the pore change the electrical conductance across the membrane, which is measured electrically. A, T, G, C and MeC (methylcytosine) can be distinguished.

Several more technologies are in the pipeline:

BioNanomatrix

VisiGen

Dover Systems

Intelligent Bio-Systems

ZS Genetics

Reveo

LightSpeed Genomics
