1 dna sequencing achim tresch uoc / mpipz cologne treschgroup.de/omicsmodule1415.html...


DNA Sequencing

Achim Tresch

UoC / MPIPZ Cologne

treschgroup.de/[email protected]

Topics

- DNA sequencing in the last century

- Current technologies (Illumina, Ion Torrent)

- New developments (PacBio, Nanopore)

Sanger sequencing

- Blocked nucleotides (ddNTPs) are incorporated at random positions; at each position the reaction stops in a small fraction of the molecules

TTGCACTTGAGTCGTAACGTGAACTCAGCATAGGCTCAGATAGAT

A-Reaction: add dATP (elongation) and ddATP (block)

Analogous: C-, G-, T-Reaction

ddATP

- Developed by Fred Sanger in the 1970s (1918-2013, two-time Nobel laureate: 1958 – protein structure of insulin, 1980 – sequencing of nucleic acids)

- Sequencing by synthesis: DNA polymerase synthesizes a complementary strand by adding single nucleotides

TTGCACTGAGTCGAACGTGACTCAGCATAGGCTCAGATAGAT

TTGCACTTGAGTCGAACGTGAACTCAGCATAGGCTCAGATAGAT

A-Reaction: TTGCA, TTGCACTTGA

C-Reaction: TTGC, TTGCAC, TTGCACTTGAGTC

G-Reaction: TTG, TTGCACTTG, TTGCACTTGAG, TTGCACTTGAGTCG

T-Reaction: T, TT, TTGCACT, TTGCACTT, TTGCACTTGAGT
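The four reactions above can be mimicked with a small simulation. This is only a sketch under stated assumptions: the names `sanger_reaction` and `read_ladder`, the termination probability `p_stop` and the number of molecules are illustrative choices, not values from the slides. Each molecule is elongated along the new strand and stops with a small probability whenever the reaction's base is incorporated; sorting all fragments by length and reading their last base recovers the sequence.

```python
import random

def sanger_reaction(strand, base, n_molecules=2000, p_stop=0.1, seed=0):
    """Simulate one termination reaction (e.g. the A-reaction).

    `strand` is the newly synthesized strand. At every position carrying
    `base`, a blocked nucleotide is incorporated with probability `p_stop`
    (an assumed value), which ends elongation of that molecule.
    Returns the set of distinct terminated fragments.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for i, nt in enumerate(strand):
            if nt == base and rng.random() < p_stop:
                fragments.add(strand[: i + 1])
                break
    return fragments

def read_ladder(reactions):
    """Order all fragments by length (as electrophoresis does) and read
    the terminal base of each fragment from the reaction it came from."""
    ladder = sorted(
        (len(fragment), base)
        for base, fragments in reactions.items()
        for fragment in fragments
    )
    return "".join(base for _, base in ladder)

strand = "TTGCACTTGAGTCG"
reactions = {
    base: sanger_reaction(strand, base, seed=i) for i, base in enumerate("ACGT")
}
for base in "ACGT":
    print(f"{base}-Reaction:", sorted(reactions[base], key=len))
print("read from ladder:", read_ladder(reactions))
```

With a few thousand molecules per reaction, every fragment length is sampled, so the ladder reads back the full strand. A much larger `p_stop` would bias the ladder towards short fragments.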

TTGCACTTGAGT

ddNTP

Sanger sequencing

ladder of DNA fragments -> electrophoresis -> sequence

gel lanes: T, G, C, A

GATTGATAGTTGCCTAACTATCAACGTATAGGCTCAGATAGAT

fragment ladder: G, GA, GAT, GATT, GATTG, GATTGA, GATTGAT, GATTGATA, GATTGATAG, GATTGATAGT, GATTGATAGTT, GATTGATAGTTG, GATTGATAGTTGC

- labeled ddNTPs, capillary sequencing

Sanger sequencing

Pyrosequencing

- immobilize DNA on beads, pyrosequencing in microreactors

dTTP

TTGCACTGAGTCGTAACGTGACTCAGCATAGGCTCAGATAGAT

PPi -> ATP

Oxyluciferin + light

454 technology

DNA-loaded beads + primer + polymerase + sulfurylase + luciferase

flowgram

TTGCACTGAGTCGTAACGTGACTCAGCAAGTCTATTCACCCAC...

454 technology

Problem: homopolymers difficult to detect
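A flowgram can be sketched in a few lines. This is a minimal illustration, assuming an idealised, noise-free signal that is exactly proportional to the number of nucleotides incorporated per flow; the function name `flowgram` and the cyclic flow order `TACG` are assumptions made for this example.

```python
from itertools import cycle

def flowgram(read, flow_order="TACG", max_flows=200):
    """Return a list of (nucleotide, signal) pairs, one per flow.

    In each flow a single nucleotide species is offered. The polymerase
    incorporates it as often as the next bases of `read` match, so the
    idealised signal equals the homopolymer length (0 if no match).
    `max_flows` guards against reads containing other characters.
    """
    flows = []
    pos = 0
    for nt in cycle(flow_order):
        if pos >= len(read) or len(flows) >= max_flows:
            break
        run = 0
        while pos < len(read) and read[pos] == nt:
            run += 1
            pos += 1
        flows.append((nt, run))
    return flows

for nt, signal in flowgram("TTGCACTGAG"):
    print(nt, signal)
```

The homopolymer problem follows from this picture: a run of n identical bases gives a single peak of height n, and the relative difference between n and n+1 is only 1/n. Telling 7 from 8 bases therefore requires resolving a signal difference of about 14 %.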

increase throughput:

- DNA gel electrophoresis, single genes in few days

- capillary electrophoresis, 96 capillaries per machine, human genome in a few years

- sequencing on microbeads: 454 technology

Parallelisation & Miniaturisation

Illumina sequencing:

- sequencing by synthesis

- massive parallelisation and miniaturisation by self-organising DNA microarrays on a glass surface

- several hundred Gb, >10^9 reads per run

Illumina technology

- generate libraries

- grow clusters on a flowcell

- sequence by addition and imaging of blocked & fluorescence-labeled nucleotides

Illumina technology

library preparation:

DNA fragments

Blunting by Fill-in and exonuclease

Phosphorylation

Addition of A-overhang

Ligation to adapters

Illumina technology

cluster generation:

1. flowcell
2. hybridize template
3. immobilize template
4. bridge amplification
5. linearisation
6. cleave reverse strand
7. block 3'-ends
8. hybridize primer

figure labels: P5, P7 (5' ends), S.P. # 1, Insert, P5', P7', S.P. # 2, TAG

Illumina technology

Imaging & Sequencing:

Illumina technology

reversible terminators: nucleotide + fluorescent dye + terminator

Illumina technology

fluorescently labelled clusters:

Illumina technology

data output:

HiSeq:
- ca. 250 million reads * 8 lanes
- 2*100 bp paired end -> 400 Gb / 8 days

HiSeq rapid run:
- ca. 200 million reads * 2 lanes
- 2*150 bp paired end -> 120 Gb / 40 hours
- (2*250 bp paired end -> 200 Gb / 60 hours)

MiSeq:
- ca. 25 million reads * 1 lane
- 2*300 bp paired end -> 15 Gb / 65 hours
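The output figures above follow from simple arithmetic: reads per lane times lanes times sequenced bases per read pair. The sketch below reproduces them; it assumes that "reads" on the slide counts read pairs (clusters), and the helper name `yield_gb` is made up for this example.

```python
def yield_gb(read_pairs_per_lane, lanes, read_length_bp, paired=True):
    """Total sequenced bases of a run in Gb (10^9 bases)."""
    bases_per_cluster = read_length_bp * (2 if paired else 1)
    return read_pairs_per_lane * lanes * bases_per_cluster / 1e9

runs = {
    "HiSeq, 2*100 bp": (250_000_000, 8, 100),
    "HiSeq rapid run, 2*150 bp": (200_000_000, 2, 150),
    "HiSeq rapid run, 2*250 bp": (200_000_000, 2, 250),
    "MiSeq, 2*300 bp": (25_000_000, 1, 300),
}
for name, (pairs, lanes, length) in runs.items():
    print(f"{name}: {yield_gb(pairs, lanes, length):.0f} Gb")
```

Under this reading the four runs come out at 400, 120, 200 and 15 Gb, matching the slide.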

Illumina technology

Fastq quality scores

good quality

quality drops towards the end

0.1 % error, 1 % error
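The error rates marked above correspond to Phred quality scores, defined as Q = -10 * log10(p) for an error probability p. The sketch below converts between the two and decodes a FASTQ quality string. It assumes the common Phred+33 ASCII encoding; older Illumina files used an offset of 64. The function names are illustrative.

```python
import math

def phred_to_error(q):
    """Error probability for a Phred quality score q."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score for an error probability p."""
    return -10 * math.log10(p)

def decode_quality(quality_line, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores.

    `offset=33` assumes Phred+33 encoding.
    """
    return [ord(char) - offset for char in quality_line]

for q in (20, 30):
    print(f"Q{q}: {phred_to_error(q):.1%} error")
print(decode_quality("II5?"))
```

Q30 corresponds to 0.1 % error and Q20 to 1 % error, the two levels marked in the quality plot.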

Data quality of short reads

Amplification Artifacts

Duplicate reads

Ion torrent:

semiconductor sequencing

- detect H+ release upon nucleotide incorporation by DNA polymerase

Ion torrent

work flow:

Ion torrent

data output:

Ion Proton:

- up to 80 million reads
- up to 10 Gb (200 base read length)
- 4 hours runtime

Ion Torrent PGM:

- up to 5 million reads
- up to 2 Gb (400 base read length)
- 8 hours runtime

Ion torrent

homopolymer problem?

Ion torrent

- nonlinear increase of signal

what can we do with short reads?

RNA-seq: identify transcripts, count reads per transcript -> assessment of differential expression

problem: reads are too short to establish connectivity of all exons, difficult/impossible to quantify multiple isoforms of a gene

Sequencing Applications

Stefan Krebs, 30.09.2013

Single end: ambiguous mapping

Paired end sequencing: read fragment from both ends -> resolve ambiguities
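A toy example shows how the second read resolves an ambiguous mapping. This is a sketch under simplifying assumptions: exact string matching instead of a real aligner, both reads given in forward-strand orientation (a real mate comes from the opposite strand), and an invented reference sequence and fragment-length limit.

```python
def find_all(reference, read):
    """Return all start positions where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits

def resolve_pair(reference, read1, read2, max_fragment=30):
    """Keep only placements where read2 lies downstream of read1 and the
    implied fragment (start of read1 to end of read2) is short enough."""
    placements = []
    for p1 in find_all(reference, read1):
        for p2 in find_all(reference, read2):
            fragment_length = p2 + len(read2) - p1
            if p2 >= p1 and fragment_length <= max_fragment:
                placements.append((p1, p2))
    return placements

# Hypothetical reference containing the repeat ACGTACGTAC twice.
reference = (
    "GGATCCTA" "ACGTACGTAC" "TTGACCGATT"
    "CCCCGGGGAA" "ACGTACGTAC" "GATTACAGGC"
)
read1 = "ACGTACGTAC"   # falls into the repeat
read2 = "GATTACAGGC"   # mate from a unique region

print("single end:", find_all(reference, read1))
print("paired end:", resolve_pair(reference, read1, read2))
```

On its own, `read1` matches both copies of the repeat. Only one of the two placements is compatible with the position of its mate and the expected fragment length.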

Improvements: Paired end Reads

further improvements

long jumping mate-pair libraries: circularize large fragments and read the junctions (2-10 kb)

resolve large repeats in genome assembly

Improvements: Circularization

Third generation Sequencing

- single molecule detection
- several kilobases read length
- moderate output (150,000 wells)
- expensive instrument and high cost per base

Pacific Biosciences


Read length distribution

Pacific Biosciences

Read quality

Pacific Biosciences

Oxford Nanopore

- DNA polymerase coupled to the pore releases tags when incorporating labeled nucleotides

- tags passing through the nanopore change the ion current

- read length = length of DNA fragment

everything that can be converted to a DNA strand can be sequenced

- even long-term data storage by encoding in synthetic DNA is possible
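The data-storage idea can be illustrated with a deliberately naive encoding that maps two bits to one base. This is only a sketch: the mapping A=00, C=01, G=10, T=11 and the function names are assumptions made for this example, and it ignores what practical schemes have to deal with, such as avoiding long homopolymers and adding redundancy against sequencing errors.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def encode(data: bytes) -> str:
    """Encode bytes as DNA, four bases per byte, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )

def decode(dna: str) -> bytes:
    """Invert `encode`: read four bases at a time back into one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for nt in dna[i : i + 4]:
            byte = (byte << 2) | BASES.index(nt)
        out.append(byte)
    return bytes(out)

message = b"DNA"
strand = encode(message)
print(strand)
print(decode(strand))
```

Decoding the sequenced strand returns the original bytes, which is the sense in which anything convertible to DNA can be read out by a sequencer.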

BIOLOGICAL APPLICATIONS: sequencing of genomes, transcriptomes, population diversity, composition of microbial communities, ChIPseq, methyl-Seq, translating RNA from ribosomes, ...

MEDICAL APPLICATIONS: whole genome sequencing, exome sequencing, tumor diagnostics, sequencing of T-cell receptor diversity, identification of pathogens, ...

FORENSICS, FOOD SAFETY, ARCHEOLOGY, …

Applications

Other Approaches

Summary third generation Sequencing

Acknowledgements

Stefan Krebs, Gene Center, LMU Munich
