DNA sequencing. Achim Tresch, UoC / MPIPZ Cologne. treschgroup.de/omicsmodule1415.html...
- DNA sequencing in the last century
- Current technologies (Illumina, Ion Torrent)
- New developments (PacBio, Nanopore)
Topics
Sanger sequencing
- Random incorporation of blocked nucleotides (ddNTPs) at any position; elongation stops in a small fraction of the molecules at each position
TTGCACTTGAGTCGTAACGTGAACTCAGCATAGGCTCAGATAGAT
A-Reaction: add dATP (elongation) and ddATP (block). Analogous: C-, G-, T-Reaction
ddATP
- Developed by Fred Sanger in the 1970s (1918-2013, two-time Nobel laureate: 1958 for the protein structure of insulin, 1980 for sequencing of nucleic acids)
- Sequencing by synthesis: DNA polymerase synthesizes a complementary strand by adding single nucleotides
TTGCACTGAGTCGAACGTGACTCAGCATAGGCTCAGATAGAT
TTGCACTTGAGTCGAACGTGAACTCAGCATAGGCTCAGATAGAT
A-Reaction: TTGCA, TTGCACTTGA
C-Reaction: TTGC, TTGCAC, TTGCACTTGAGTC
G-Reaction: TTG, TTGCACTTG, TTGCACTTGAG, TTGCACTTGAGTCG
T-Reaction: T, TT, TTGCACT, TTGCACTT, TTGCACTTGAGT
TTGCACTTGAGT
ddNTP
Sanger sequencing
ladder of DNA fragments -> electrophoresis -> sequence
gel lanes: T G C A
sequence: GATTGATAGTTGCCTAACTATCAACGTATAGGCTCAGATAGAT
fragment ladder: G, GA, GAT, GATT, GATTG, GATTGA, GATTGAT, GATTGATA, GATTGATAG, GATTGATAGT, GATTGATAGTT, GATTGATAGTTG, GATTGATAGTTGC
- labeled ddNTPs, capillary sequencing
Sanger sequencing
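The four termination reactions and the readout of the fragment ladder can be mimicked in a few lines. This is only a sketch, not a model of the chemistry: the function names `termination_fragments` and `read_ladder` are made up for illustration, and it assumes that every possible terminated fragment is present exactly once.

```python
def termination_fragments(strand, base):
    """All prefixes of the synthesized strand that end in `base`,
    i.e. the fragments produced when a ddNTP of that base blocks elongation."""
    return [strand[:i + 1] for i, b in enumerate(strand) if b == base]


def read_ladder(reactions):
    """Recover the sequence from the four reactions.

    `reactions` maps a base to its list of fragments. Sorting all fragments
    by length plays the role of electrophoresis; the reaction (lane) a
    fragment comes from gives the base at that position.
    """
    lanes = [(len(frag), base) for base, frags in reactions.items() for frag in frags]
    return "".join(base for _, base in sorted(lanes))


strand = "TTGCACTTGAGTCG"  # first bases of the example strand from the slide
reactions = {base: termination_fragments(strand, base) for base in "ACGT"}
print(reactions["A"])          # ['TTGCA', 'TTGCACTTGA']
print(read_ladder(reactions))  # TTGCACTTGAGTCG
```

The A-reaction output matches the fragments listed for the A-Reaction on the slide.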
Pyrosequencing
- immobilize DNA on beads, pyrosequencing in microreactors
dTTP
TTGCACTGAGTCGTAACGTGACTCAGCATAGGCTCAGATAGAT
PPi -> ATP (sulfurylase)
Luciferin + ATP -> oxyluciferin + light (luciferase)
454 technology
DNA-loaded beads + primer + polymerase + sulfurylase + luciferase
flowgram
TTGCACTGAGTCGTAACGTGACTCAGCAAGTCTATTCACCCAC...
454 technology
Problem: the length of homopolymers is difficult to determine
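A flowgram and the homopolymer problem can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the flow order `TACG`, the noise-free signal (peak height equals the number of incorporated bases) and the function names are chosen for illustration and are not taken from the 454 software.

```python
def flowgram(synthesized, flow_order="TACG", max_flows=200):
    """Ideal flowgram: one (nucleotide, signal) pair per nucleotide flow.

    The signal is the number of bases incorporated in that flow, so a
    homopolymer of length n yields a single peak of height n.
    """
    flows, pos = [], 0
    for i in range(max_flows):
        if pos >= len(synthesized):
            break
        nuc = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nuc:
            run += 1
            pos += 1
        flows.append((nuc, run))
    return flows


def call_bases(flows):
    """Base calling: round each peak height to the nearest integer run length."""
    return "".join(nuc * round(signal) for nuc, signal in flows)


ideal = flowgram("TTGCACTGAG")
print(ideal[:4])          # [('T', 2), ('A', 0), ('C', 0), ('G', 1)]
print(call_bases(ideal))  # TTGCACTGAG

# With measurement noise, long runs become ambiguous: peak heights of
# 5.4 and 5.6 are close, but are called as 5 and 6 bases respectively.
print(call_bases([("A", 5.4)]), call_bases([("A", 5.6)]))  # AAAAA AAAAAA
```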
increase throughput:
- DNA gel electrophoresis, single genes in few days
- capillary electrophoresis, 96 capillaries per machine, human genome in a few years
- sequencing on microbeads: 454 technology
Parallelisation & Miniaturisation
Illumina sequencing:
- sequencing by synthesis
- massive parallelisation and miniaturisation by self-organising DNA microarrays on a glass surface
- several hundred Gb, >10^9 reads per run
Illumina technology
- generate libraries
- grow clusters on a flowcell
- sequence by addition and imaging of blocked & fluorescence-labeled nucleotides
Illumina technology
library preparation:
DNA fragments
Blunting by Fill-in and exonuclease
Phosphorylation
Addition of A-overhang
Ligation to adapters
Illumina technology
cluster generation:
1. flowcell
2. hybridize template
3. immobilize template
4. bridge amplification
5. linearisation
6. cleave reverse strand
7. block 3'-ends
8. hybridize primer
[Figure labels: flowcell oligos P5 and P7 (5' ends); template with P5', P7', S.P. #1, S.P. #2, Insert, TAG]
Illumina technology
Imaging & Sequencing:
Illumina technology
reversible terminators: nucleotide + fluorescent dye + terminator
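The cycle of incorporation, imaging and cleavage can be sketched as a loop in which every cluster is extended by exactly one base per cycle. This is a toy illustration with assumed names (`sequence_by_synthesis`, the cluster ids); it ignores signal intensities, phasing and errors, and writes each template in the order in which the polymerase copies it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates, cycles):
    """Read `cycles` bases from every cluster, one base per cycle.

    `templates` maps a cluster id to its template, written in the order in
    which the polymerase copies it (a simplification).
    """
    reads = {cluster: "" for cluster in templates}
    for cycle in range(cycles):
        # 1. Add blocked, labeled nucleotides: the terminator ensures that
        #    each cluster incorporates only one base in this cycle.
        image = {
            cluster: COMPLEMENT[template[cycle]]
            for cluster, template in templates.items()
            if cycle < len(template)
        }
        # 2. Imaging: the dye colour of each cluster identifies the base.
        for cluster, base in image.items():
            reads[cluster] += base
        # 3. Cleave dye and terminator so the next cycle can start
        #    (implicit here: the loop simply moves to the next position).
    return reads


print(sequence_by_synthesis({"cluster_1": "AACGT", "cluster_2": "TGCAT"}, cycles=4))
# {'cluster_1': 'TTGC', 'cluster_2': 'ACGT'}
```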
Illumina technology
fluorescently labelled clusters:
Illumina technology
data output:
HiSeq:
- ca. 250 million reads * 8 lanes
- 2*100 bp paired end -> 400 Gb / 8 days
HiSeq rapid run:
- ca. 200 million reads * 2 lanes
- 2*150 bp paired end -> 120 Gb / 40 hours
- (2*250 bp paired end -> 200 Gb / 60 hours)
MiSeq:
- ca. 25 million reads * 1 lane
- 2*300 bp paired end -> 15 Gb / 65 hours
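The yields quoted above follow from reads per lane, number of lanes and read length. A quick check, assuming that the read numbers count read pairs (clusters), which is the interpretation under which the slide's totals come out; the helper name `yield_gb` is made up for this sketch.

```python
def yield_gb(reads_per_lane, lanes, read_length, paired=True):
    """Total sequenced bases per run in Gb (1 Gb = 1e9 bases)."""
    ends = 2 if paired else 1
    return reads_per_lane * lanes * ends * read_length / 1e9


print(yield_gb(250e6, 8, 100))  # 400.0  (HiSeq, 2*100 bp)
print(yield_gb(200e6, 2, 150))  # 120.0  (HiSeq rapid run, 2*150 bp)
print(yield_gb(200e6, 2, 250))  # 200.0  (HiSeq rapid run, 2*250 bp)
print(yield_gb(25e6, 1, 300))   # 15.0   (MiSeq, 2*300 bp)
```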
Illumina technology
Fastq quality scores
good quality; quality drops towards the end of the read
0.1 % error (Q30), 1 % error (Q20)
Data quality of short reads
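FASTQ quality scores are Phred scores, Q = -10 * log10(p), where p is the error probability of a base call. The sketch below converts between the two and decodes a quality line; it assumes the common Phred+33 encoding, and the function names and the example quality string are made up for illustration.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred score Q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability p."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality line (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality_string]


print(decode_quality("II?5+"))  # [40, 40, 30, 20, 10]
print(phred_to_error(30))       # about 0.001, i.e. 0.1 % error
print(phred_to_error(20))       # about 0.01, i.e. 1 % error
```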
Amplification Artifacts
Duplicate reads
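Duplicate reads from amplification are often handled by collapsing reads that map to the same position. A minimal sketch, assuming that identical chromosome, start and strand mark a duplicate; the tuple layout and the function name are hypothetical, and established tools use more refined criteria.

```python
def remove_duplicates(alignments):
    """Keep the first read seen for each (chromosome, start, strand).

    `alignments` is a list of (read_id, chromosome, start, strand) tuples
    (a hypothetical layout chosen for this sketch).
    """
    seen = set()
    unique = []
    for read_id, chrom, start, strand in alignments:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            unique.append(read_id)
    return unique


alignments = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 100, "+"),  # same position and strand as r1: duplicate
    ("r3", "chr1", 100, "-"),
    ("r4", "chr2", 100, "+"),
]
print(remove_duplicates(alignments))  # ['r1', 'r3', 'r4']
```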
Ion torrent:
- semiconductor sequencing
- detect H+ release upon nucleotide incorporation by DNA polymerase
Ion torrent
work flow:
Ion torrent
data output:
Ion Proton:
- up to 80 million reads
- up to 10 Gb (200 base read length)
- 4 hours runtime
Ion Torrent PGM:
- up to 5 million reads
- up to 2 Gb (400 base read length)
- 8 hours runtime
Ion torrent
homopolymer problem?
Ion torrent
- nonlinear increase of signal with homopolymer length
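Why a nonlinear signal makes homopolymers hard to call can be seen with a toy response curve. The power law and the exponent `alpha=0.8` below are pure assumptions for illustration, not measured Ion Torrent characteristics: if the signal grows less than linearly, the step between run lengths n and n+1 shrinks as n grows.

```python
def signal(n, alpha=0.8):
    """Hypothetical sub-linear signal for a homopolymer of length n."""
    return n ** alpha


# Signal difference between consecutive run lengths n and n + 1.
steps = [signal(n + 1) - signal(n) for n in range(1, 7)]
for n, step in enumerate(steps, start=1):
    print(f"{n} -> {n + 1}: signal step {step:.2f}")
```

The printed steps decrease with n, so with a fixed noise level longer runs are harder to distinguish.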
what can we do with short reads?
RNA-seq: identify transcripts, count reads per transcript, assess differential expression
problem: reads are too short to establish connectivity of all exons, difficult/impossible to quantify multiple isoforms of a gene
Sequencing Applications
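Counting reads per transcript and comparing two conditions can be sketched as follows. The function names, the pseudocount and the normalization by total read count are assumptions of this sketch; it only yields a ratio, whereas a real assessment of differential expression needs replicates and a statistical test.

```python
import math
from collections import Counter


def count_reads(assignments):
    """Count reads per transcript; `assignments` holds one transcript id per mapped read."""
    return Counter(assignments)


def log2_fold_change(counts_a, counts_b, pseudocount=1):
    """log2 ratio (B over A) of counts normalized by the total read count."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    transcripts = set(counts_a) | set(counts_b)
    return {
        t: math.log2(
            ((counts_b[t] + pseudocount) / total_b)
            / ((counts_a[t] + pseudocount) / total_a)
        )
        for t in transcripts
    }


control = count_reads(["tx1"] * 3 + ["tx2"] * 7)
treated = count_reads(["tx1"] * 15 + ["tx2"] * 5)
lfc = log2_fold_change(control, treated)
print(round(lfc["tx1"], 2))  # 1.0, i.e. tx1 is about twofold up
print(round(lfc["tx2"], 2))  # -1.42
```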
Stefan Krebs, 30.09.2013
Single end: ambiguous mapping
Paired end sequencing: read fragment from both ends -> resolve ambiguities
Improvements: Paired end Reads
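How the second read resolves an ambiguous mapping can be shown on a toy genome with a repeated element. This is a sketch with simplifying assumptions: exact matching only, both reads given on the forward strand (real mates come from opposite strands), and a made-up genome, fragment size limit and function names.

```python
def find_all(genome, read):
    """All start positions of exact matches of `read` in `genome`."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def resolve_pair(genome, read1, read2, max_fragment=30):
    """Keep placements where read2 lies downstream of read1 and the
    fragment spanned by both reads is at most `max_fragment` bases long."""
    return [
        (p1, p2)
        for p1 in find_all(genome, read1)
        for p2 in find_all(genome, read2)
        if p1 <= p2 and p2 + len(read2) - p1 <= max_fragment
    ]


repeat = "ACGTTGCA"
genome = "GGCATTCA" + repeat + "CCTAGGAT" + "TTTTCCCCAAAAGGGG" + repeat + "GATCGATC"

print(find_all(genome, repeat))                    # [8, 40]   single end: ambiguous
print(resolve_pair(genome, repeat, "GATCGATC"))    # [(40, 48)] paired end: unique
```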
further improvements
long jumping mate-pair libraries: circularize large fragments (2-10 kb) and read the junctions
resolve large repeats in genome assembly
Improvements: Circularization
Third generation Sequencing
- single molecule detection
- several kilobases read length
- moderate output (150,000 wells)
- expensive instrument and high cost per base
Pacific Biosciences
Read length distribution
Pacific Biosciences
Read quality
Pacific Biosciences
- DNA polymerase coupled to the pore releases tags when incorporating labeled nucleotides
- tags passing through nanopore change ion current
- read length = length of DNA fragment
Oxford Nanopore
- everything that can be converted to a DNA strand can be sequenced
- even long-term data storage by encoding in synthetic DNA is possible
BIOLOGICAL APPLICATIONS: sequencing of genomes, transcriptomes, population diversity, composition of microbial communities, ChIP-seq, methyl-Seq, translating RNA from ribosomes, ...
MEDICAL APPLICATIONS: whole genome sequencing, exome sequencing, tumor diagnostics, sequencing of T-cell receptor diversity, identification of pathogens, ...
FORENSICS, FOOD SAFETY, ARCHEOLOGY, …
Applications
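The idea of data storage in synthetic DNA can be illustrated by mapping two bits to each base. The mapping A=00, C=01, G=10, T=11 and the function names are assumptions of this sketch; practical schemes typically add error correction and avoid long homopolymers, which this naive encoding does not.

```python
BASES = "ACGT"  # index of a base = its 2-bit value (assumed mapping)


def encode(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(dna: str) -> bytes:
    """Inverse of `encode`: four bases give one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


dna = encode(b"Hi")
print(dna)          # CAGACGGC
print(decode(dna))  # b'Hi'
```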
Other Approaches
Summary third generation Sequencing
Acknowledgements
Stefan Krebs, Gene Center, LMU Munich