hamas 1
TRANSCRIPT
CONTENTS:
• DNA
• DNA SEQUENCING
• PURPOSE
• DAN SEQUENCING METHODS
• HISTORICAL METHOD
• NEXT GENERATION METHODS
• APPLICATIONS
• Deoxyribonucleic acid (DNA) is a nucleic acid that functions include
Storage of genetic information
Self-duplication & inheritance
Expression of the genetic message
• DNA’s major function is to code for proteins. Information is encoded in the order of the nitrogenous bases.
Deoxyribonucleic Acid
DNA SEQUENCING
Determining the order of bases in a
section of DNA
To analyze gene structure and its
relation to gene expression as well as
protein conformation
PURPOSE
Deciphering “code of life”
Detecting mutations
Typing microorganisms
Identifying human halotypes
Designating polymorphisms
DNA SEQUENCING METHODS
•Historically there are two main methods of DNA
sequencing
1. Maxam and Gilbert method
2. Sanger method
Modern sequencing equipment uses the principles of
the Sanger technique.
•A. M. Maxam and W.Gilbert-1975
•Chemical Sequencing
•Treatment of DNA with certain
Chemicals DNA cuts into
Fragments Monitoring of
sequences
MAXAM & GILBERT METHOD
Maxam-Gilbert 1975
Chemical Sequencing
Issues:
• Need perfectly pure single species of DNA
• Nasty Chemicals
• Radioactive End-labeling
• 4-lanes/read
• Sequence only what you can purify
Advantages:
- 1st DNA sequencing available
- 2-300 bp/read
Fragment
population
distribution
corresponds to
appearance of
base within
sequence
• Most common approach used for DNA
sequencing .
• Invented by Frederick Sanger - 1977
• Nobel prize - 1980
• Also termed as Chain Termination or
Dideoxy method
SANGER METHOD
Sanger “Sequencing-by-Synthesis” 1977
Issues:
- Radioactive End-labeling
- 4-lanes/read
- Sequencing gels
Advantages:
- 4-500 bp/reads
- Radioactive Incorporation
- Primer gives you control
dNTP ddNTP
PCR Dye-Terminator 1990’s
Issues:
- Sequencing gels
- 1 run/day
Advantages:
- 600-700 bp/reads
- 96 reads/run
- Each terminator dye has a different color.
Lets you combine all 4 reactions in one
lane.
- Single lane/read
- Primer gives you control
A breakthrough: fluorescent chain-terminating inhibitors
Automated DNA sequencer
• Capillary electrophoresis
• Costs reduced by 90%
• Human operation 15 min/day/machine
• 1 million bp/day
3730x/ DNA analyzer
First generation DNA sequencer
• Manual preparation of acrylamide gels
• Manual loading of samples
• Contigs of 500-600 bp
• 2.4 millions bp/year(1000 years needed to sequence the human genome)
ABI PRISM 377
Human Genome Project (15 years) Hierarchical
Shotgun Sequencing [start1990]
- Randomly insert Human DNA into BAC clones (~150kbp each)
- Combine these BAC clones to create a scaffold of the human
genome. Each BAC clone will be mapped to a region on a Human
Chromosome
- Pass BAC clones to different Genome Centers throughout US
- At each center, each vector is sequenced using shotgun sequencing
- Wait 15 years for results.
Issues with Shotgun Sequencing
• Reads-> contigs -> scaffolds -> genome reconstruction
• Repeat regions can confuse Contig assemblers.
• It was hoped that by focusing each shotgun run to a single 40-150kb region, these issues
would be minimized.
• According to Venter, it simply multiplied the number of times one encountered the same
problem
Shotgun Sequencing: Venter 1997
Same approach is used throughout NGS
Paired-end sequencing:
1. Randomly cut genomic DNA.
2. Use Gel-purification to make three
libraries of random DNA fragments:
2kb, 10kb, 50kb
2. Sequence from both ends.
3. Use distance information to assemble
contigs into scaffolds.
Distance information allows you to ‘jump’
over repeat regions.
This approach allowed Venter to ‘jump’ over
the federal sequencing project
NEXT GENERATION
SEQUENCING
Common Pipeline. Library Prep.
Sequencing – Massively Parallel Sequencing.
Bioinformatics - Data Analysis .
Popular Platforms (commercially available): Roche 454
AB SOLiD
Illumina(HiSeq,MiSeq).
Newer Platforms (Experemental ): Ion Torrent
PacBio RS
Oxford Nanopore.
Roche 454 – Library Prep.Emulsion PCR (emPCR)
1. DNA fragmentations and
adaptor ligation.
2. DNA fragments are added
to an oil mixture containing
millions of beads.
3. Emulsion PCR results in
multiple copies of the
fragment.
4. Beads are deposited on plate
wells ready for sequencing.
Illumina – Sequence by Synthesis
1. Add dye-labeled
nucleotides.
2. Scan and detect nucleotide
specific fluorescence.
3. Remove 3’ – blocking group
(Reversible termination).
4. Cleave fluorescent group.
5. Rinse and Repeat.
http://hmg.oxfordjournals.org
Illumina – Sequencing by Synthesis
Platform Run
Time
Yield
(GB)
Error Type Error Rate
GAIIx 14 days 96 Sub 0.1
HiSeq
2000/2500
10/2
days
600/120 Sub 0.03*
MiSeq 1 day 2 Sub 0.03*
IlluminaAdvantages
High throughput /low cost.
Suitable for a wide rage of applications most notably whole genome sequencing.
Disadvantages
Substitution error rates (recently improved).
Lagging strand dephasing causes sequence quality deterioration towards the end of read.
SOLiD (Life/AB)
SOLiD: Sequencing by Oligonucleotide Ligation and Detection.
Library Prep: emPCR
“2-base encoding” – Instead of the typical single dNTP addition, two base matching probes are used. (possible 16 probes).
Color Space – four color sequencing encoding further increases accuracy.
SOLiD (Life/AB)
1. Anneal primer and hybridize
probe(8-mer).
2. Ligation and Detection.
3. Cleave fluorescent tail(3-mer).
4. Repeat ligation cycle.
Repeat steps 1-4 with (n-1) primer.
SOLiD – Color Space
• 16 possible base combination are
represented by 4 colors.
• All possible sequencing
combination need to be decoded.
SOLiD – SNP Detection
Summary –
Due to the 2-base encoding system you get very low error rate. (Error rate of around .01)
It is still however limited by short reads. (35bp)
Ion Torrent (Life Technologies)
J. M. Rothberg, et al. Nature (2011) 475:348-352
• Similar to pyrosequencing but uses
semiconducting chip to detect dNTP
incorporation.
• The chip measure differences in pH.
• Shown to have problems with
homopolymer reads and coverage bias
with GC-rich regions.
• Ion Proton™ promises higher output and
longer reads.
PacBio RS
Single Molecule Sequencing – instead of
sequencing clonally amplified templates from beads
(Pyro) or clusters (Illumina) DNA synthesis is
detected on a single DNA strand. Zero-mode waveguide (ZMW)
• DNA polymerase is affixed to the bottom of a
tiny hole (~70nm).
• Only the bottom portion of the hole is
illuminated allowing for detection of
incorporation of dye-labeled nucleotide.
PacBio RS
Advantages
No amplification required.
Extremely long read lengths.
Average 2500 bp. Longest 15,000bp.
Disadvantages
High error rates.
Oxford Nanopore
Announced Feb. 2012 at
ABGT conference.
Measure changes in ion
flow through nanopore.
Potential for long read
lengths and short
sequencing times.
http://www2.technologyreview.com
Nanopore sequencing
The system uses the Staphylococcus auereus toxin α-hemolysin, a robust heptameric
protein which normally forms holes in membranes.
DNA and RNA can be electrophoretically driven through a nanopore of suitable
diameter (Kasianowicz J.J. et al 1996 PNAS)
Nanopore sequencing – how does it work?
When a small voltage (~100 mV) is imposed across a
nanopore in a membrane separating two chambers
containing acqueous electrolytes, the ionic current
through the pore can be measured
Molecules going through the nanopore cause
disruption in the ionic current, and by measuring the
disruption molecules can be identified.
Lipid bilayer with high electronic resistant
Ionic current
Hemolysin
Nanopore – strand sequencing
The DNA polymer passes through the
nanopore itself
The nanopore is engineered to allow
single-base resolution within the strand
A DNA polymerase, coupled with a α-
hemolysin, synthesizes a new strand of
DNA using as a template the polymer
coming out of the pore
DNA Polymer
Nanopore sequencing
Advantages
minimal sample preparation
no requirement for ploymerase and ligase
potential of very long read-lengths ( > 10,000 – 50,000 nt )
it might well achieve the $1,000 per mammalian genome goal
the instrument is inexpensive
EPIGENOME
Transcriptional active site
Protein -DNA interaction
Methylation analysis
METAGENOME
Microbial diversity
FORENSICS
MEDICINES
AGRICULTURE
GENOME DE NOVO
RESEQUENCING
EXOME SEQUENCING
ANCIENT DNA
RNA-SEQ/WHOLE TRANCRIPTOME
mRNA Expression and Discovery
Alternative Splicing
microRNA Expression and discovery
APPLICATIONS
References
Mardis, E. R. A decade’s perspective on DNA sequencing technology. Nature (2011) 470:198 - 203
Metzker, M.L. Sequencing technologies – the next generation. Nature Review Genetics(2010) 11:31 – 46
N. J. Loman, C. Constantinidou, Jacqueline Z. M. Chan, M. Halachev, M. Sergeant, C. W. Penn, E. R. Robinson & M. J. Pallen. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Review Microbiology 10, 599-606.