hamas 1

Ahmad Ali

IBGE

TRENDS IN DNA SEQUENCING, METHODS AND APPLICATIONS

CONTENTS:

• DNA

• DNA SEQUENCING

• PURPOSE

• DNA SEQUENCING METHODS

• HISTORICAL METHOD

• NEXT GENERATION METHODS

• APPLICATIONS

• Deoxyribonucleic acid (DNA) is a nucleic acid whose functions include:

Storage of genetic information

Self-duplication & inheritance

Expression of the genetic message

• DNA’s major function is to code for proteins. Information is encoded in the order of the nitrogenous bases.
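To make "information is encoded in the order of the bases" concrete, here is a minimal sketch that reads a coding-strand DNA string three bases at a time using the standard genetic code. The function name `translate` and the example sequences are illustrative assumptions, not part of the original slides.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = translate("ATGGCCTAA")  # Met, Ala, stop
mutated = translate("ATGGACTAA")   # one base changed: Ala becomes Asp
```

Changing a single base changes the protein, which is why knowing the exact base order matters.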

Deoxyribonucleic Acid

DNA SEQUENCING

Determining the order of bases in a section of DNA

To analyze gene structure and its relation to gene expression as well as protein conformation

PURPOSE

Deciphering “code of life”

Detecting mutations

Typing microorganisms

Identifying human haplotypes

Designating polymorphisms

DNA SEQUENCING METHODS

• Historically, there are two main methods of DNA sequencing:

1. Maxam and Gilbert method

2. Sanger method

Modern sequencing equipment uses the principles of the Sanger technique.

• A. M. Maxam and W. Gilbert, 1977

• Chemical sequencing

• Treatment of DNA with certain chemicals → DNA is cut into fragments → the fragments are analyzed to read the sequence

MAXAM & GILBERT METHOD

A graphical demonstration

Principle

Maxam-Gilbert 1977

Chemical Sequencing

Issues:

• Needs a perfectly pure, single species of DNA

• Hazardous chemicals

• Radioactive end-labeling

• 4 lanes/read

• Can sequence only what you can purify

Advantages:

- First DNA sequencing method available

- 200-300 bp/read

Fragment population distribution corresponds to appearance of base within sequence

• Most common approach used for DNA sequencing.

• Invented by Frederick Sanger - 1977

• Nobel Prize - 1980

• Also termed the chain termination or dideoxy method

SANGER METHOD

Sanger sequencing: chain-terminating inhibitors

Sanger “Sequencing-by-Synthesis” 1977

Issues:

- Radioactive end-labeling

- 4 lanes/read

- Sequencing gels

Advantages:

- 400-500 bp/read

- Radioactive incorporation

- Primer gives you control
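The four-lane readout of chain-termination sequencing can be sketched in a few lines. This is an idealized toy model, assuming every position of the newly synthesized strand yields exactly one terminated fragment in the lane of its ddNTP. The function names are illustrative.

```python
def sanger_lanes(synthesized_strand):
    """Return, for each ddNTP lane, the lengths of the terminated fragments.

    A fragment of length n ends up in lane X when the n-th base of the
    newly synthesized strand is X (the ddNTP that stopped extension).
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest band to the longest across all lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")
sequence = read_gel(lanes)
```

Because electrophoresis sorts fragments by length, reading the bands from bottom to top across the four lanes recovers the sequence. With a different dye on each terminator, the same ordering can be read from a single lane.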

dNTP ddNTP

http://upload.wikimedia.org/wikipedia/commons/d/df/DNA_Sequencin_3_labeling_methods.jpg


PCR Dye-Terminator, 1990s

Issues:

- Sequencing gels

- 1 run/day

Advantages:

- 600-700 bp/read

- 96 reads/run

- Each terminator dye has a different color, which lets you combine all 4 reactions in one lane.

- Single lane/read

- Primer gives you control

A breakthrough: fluorescent chain-terminating inhibitors

Automated DNA sequencer

• Capillary electrophoresis

• Costs reduced by 90%

• Human operation 15 min/day/machine

• 1 million bp/day

3730xl DNA Analyzer

First generation DNA sequencer

• Manual preparation of acrylamide gels

• Manual loading of samples

• Contigs of 500-600 bp

• 2.4 million bp/year (more than 1,000 years needed to sequence the human genome)
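The throughput figures quoted on these two slides can be checked with simple arithmetic. The sketch below assumes a human genome of roughly 3 billion bp and counts a single pass only (no redundant coverage). Both are simplifying assumptions, not figures from the slides.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: ~3 billion bp, single pass


def years_for_one_pass(bp_per_year, genome_size_bp=GENOME_SIZE_BP):
    """Years one instrument needs to read every base of the genome once."""
    return genome_size_bp / bp_per_year


# Slab-gel instrument: 2.4 million bp/year
slab_gel_years = years_for_one_pass(2_400_000)

# Capillary instrument: 1 million bp/day, assumed to run every day of the year
capillary_years = years_for_one_pass(1_000_000 * 365)
```

Under these assumptions the slab-gel instrument needs 1,250 years and the capillary instrument a little over 8 years. Real projects need several-fold coverage, so the actual figures would be larger.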

ABI PRISM 377

Human Genome Project (15 years): Hierarchical Shotgun Sequencing (started 1990)

- Randomly insert human DNA into BAC clones (~150 kbp each)

- Combine these BAC clones to create a scaffold of the human genome. Each BAC clone is mapped to a region on a human chromosome

- Pass BAC clones to different genome centers throughout the US

- At each center, each vector is sequenced using shotgun sequencing

- Wait 15 years for results.

Issues with Shotgun Sequencing

• Reads -> contigs -> scaffolds -> genome reconstruction

• Repeat regions can confuse contig assemblers.

• It was hoped that by focusing each shotgun run on a single 40-150 kb region, these issues would be minimized.

• According to Venter, it simply multiplied the number of times one encountered the same problem.
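The "reads -> contigs" step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch, assuming error-free reads from one strand. It is not the algorithm used by any particular genome project, and the function names and reads are made up for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


contigs = greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])
```

When a repeat is longer than the reads, several merges look equally good and the assembler cannot tell which copy a read came from. This is the confusion described in the bullets above.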

Shotgun Sequencing: Venter 1997

The same approach is used throughout NGS

Paired-end sequencing:

1. Randomly cut genomic DNA.

2. Use gel purification to make three libraries of random DNA fragments: 2 kb, 10 kb, 50 kb.

3. Sequence from both ends.

4. Use distance information to assemble contigs into scaffolds.

Distance information allows you to ‘jump’ over repeat regions.

This approach allowed Venter to ‘jump’ over the federal sequencing project.
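How the distance information turns contigs into scaffolds can be sketched as a gap estimate. The example assumes each read pair comes from a fragment of known size, with one end in contig A and the other in contig B. The function names and numbers are hypothetical.

```python
from statistics import mean


def gap_from_pair(insert_size, bases_in_contig_a, bases_in_contig_b):
    """Gap implied by one read pair spanning two contigs.

    bases_in_contig_a: bases from the start of the first read to the end of contig A
    bases_in_contig_b: bases from the start of contig B to the far end of the mate
    """
    return insert_size - bases_in_contig_a - bases_in_contig_b


def estimate_gap(pairs, insert_size):
    """Average the per-pair estimates; a negative value suggests the contigs overlap."""
    return mean(gap_from_pair(insert_size, a, b) for a, b in pairs)


# Three hypothetical pairs from the 2 kb library linking the same two contigs
gap = estimate_gap([(500, 600), (700, 400), (300, 800)], insert_size=2000)
```

If the gap holds a repeat that could not be assembled, the pair still fixes the order and spacing of the two flanking contigs. That is the "jump" described above.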

NEXT GENERATION SEQUENCING

Common pipeline:

• Library prep

• Sequencing – massively parallel sequencing

• Bioinformatics – data analysis

Popular platforms (commercially available):

• Roche 454

• AB SOLiD

• Illumina (HiSeq, MiSeq)

Newer platforms (experimental):

• Ion Torrent

• PacBio RS

• Oxford Nanopore

Roche 454 – Library Prep: Emulsion PCR (emPCR)

1. DNA fragmentation and adaptor ligation.

2. DNA fragments are added to an oil mixture containing millions of beads.

3. Emulsion PCR results in multiple copies of the fragment.

4. Beads are deposited on plate wells ready for sequencing.

Roche 454 – Pyrosequencing

M. L. Metzker, Nature Reviews Genetics (2010)

Illumina – Library Prep.

Illumina – Sequence by Synthesis

1. Add dye-labeled nucleotides.

2. Scan and detect nucleotide-specific fluorescence.

3. Remove 3’-blocking group (reversible termination).

4. Cleave fluorescent group.

5. Rinse and repeat.

http://hmg.oxfordjournals.org

Illumina – Sequencing by Synthesis

Platform         Run time    Yield (GB)   Error type   Error rate
GAIIx            14 days     96           Sub          0.1
HiSeq 2000/2500  10/2 days   600/120      Sub          0.03*
MiSeq            1 day       2            Sub          0.03*

Illumina

Advantages

High throughput / low cost.

Suitable for a wide range of applications, most notably whole genome sequencing.

Disadvantages

Substitution error rates (recently improved).

Lagging strand dephasing causes sequence quality to deteriorate towards the end of the read.
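Why dephasing hurts the end of a read more than the start can be seen with a simple model. The sketch assumes that in every cycle a fixed fraction of the strands in a cluster falls out of step, independently of earlier cycles. The rate of 0.5% per cycle is an illustrative assumption, not a measured value.

```python
def in_phase_fraction(cycle, phasing_rate):
    """Fraction of strands in a cluster still in step after `cycle` cycles."""
    return (1.0 - phasing_rate) ** cycle


# Assumed 0.5% of strands lose step in each cycle
signal_by_cycle = {n: in_phase_fraction(n, 0.005) for n in (0, 25, 50, 100)}
```

Under this assumption about 61% of the strands are still in phase at cycle 100. The rest contribute signal from neighboring positions, so base calls become progressively noisier along the read.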

SOLiD (Life/AB)

SOLiD: Sequencing by Oligonucleotide Ligation and Detection.

Library Prep: emPCR

“2-base encoding” – Instead of the typical single dNTP addition, probes that match two bases are used (16 possible probes).

Color space – encoding the sequence in four colors further increases accuracy.

SOLiD (Life/AB)

1. Anneal primer and hybridize probe (8-mer).

2. Ligation and detection.

3. Cleave fluorescent tail (3-mer).

4. Repeat ligation cycle.

Repeat steps 1-4 with (n-1) primer.

SOLiD – Color Space

• The 16 possible two-base combinations are represented by 4 colors.

• All possible sequence combinations need to be decoded.
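Two-base encoding can be sketched by giving each base a 2-bit code and taking the XOR of neighboring bases as the color, which maps the 16 dinucleotides onto 4 colors, four per color. Numbering the colors 0-3 is an assumption made for illustration. The instrument reports dye colors.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    codes = [BASE_TO_CODE[base] for base in sequence]
    colors = [left ^ right for left, right in zip(codes, codes[1:])]
    return sequence[0], colors


def decode_colors(first_base, colors):
    """Rebuild the bases; each color is applied to the previously decoded base."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


first, reference_colors = encode_colors("ATGGC")
_, variant_colors = encode_colors("ATCGC")  # single-base change G -> C
changed = [
    i for i, (r, v) in enumerate(zip(reference_colors, variant_colors)) if r != v
]
```

A real single-base change alters two adjacent colors, here positions 1 and 2. A lone measurement error alters only one color, which is how the scheme separates SNPs from errors. The cost is that one wrong color corrupts every base decoded after it, so reads are usually compared in color space.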

SOLiD – SNP Detection

Summary –

Due to the 2-base encoding system, you get a very low error rate (around 0.01).

It is, however, still limited by short reads (35 bp).

Ion Torrent (Life Technologies)

J. M. Rothberg, et al. Nature (2011) 475:348-352

• Similar to pyrosequencing but uses a semiconductor chip to detect dNTP incorporation.

• The chip measures differences in pH.

• Shown to have problems with homopolymer reads and coverage bias in GC-rich regions.

• Ion Proton™ promises higher output and longer reads.
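The homopolymer problem follows from how flow-based chemistries (pyrosequencing and pH detection alike) read DNA: one nucleotide type is flowed at a time, and the signal is proportional to how many identical bases are incorporated. The sketch below is an idealized model. The flow order "TACG" and the noisy signal values are assumptions for illustration.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Ideal flow signals: each flow reports the homopolymer length incorporated."""
    signals = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(read) and read[position] == base:
            run += 1
            position += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Round each flow signal to the nearest whole number of bases."""
    return "".join(base * round(signal) for base, signal in signals)


ideal = flowgram("TTACGG")
called = call_bases(ideal)
undercalled = call_bases([("A", 4.4)])  # noisy signal from a true run of 5
```

Telling a signal of 1 from 2 is easy, but telling 5 from 4 requires the same absolute precision on a much larger signal. Small noise in a long run therefore produces an insertion or deletion, the typical error of these platforms.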

PacBio RS

Single-molecule sequencing – instead of sequencing clonally amplified templates from beads (pyrosequencing) or clusters (Illumina), DNA synthesis is detected on a single DNA strand.

Zero-mode waveguide (ZMW)

• DNA polymerase is affixed to the bottom of a tiny hole (~70 nm).

• Only the bottom portion of the hole is illuminated, allowing for detection of the incorporation of dye-labeled nucleotides.

PacBio RS

Advantages

No amplification required.

Extremely long read lengths.

Average 2500 bp. Longest 15,000bp.

Disadvantages

High error rates.

Oxford Nanopore

Announced Feb. 2012 at the AGBT conference.

Measures changes in ion flow through a nanopore.

Potential for long read lengths and short sequencing times.

http://www2.technologyreview.com

Nanopore sequencing

The system uses the Staphylococcus aureus toxin α-hemolysin, a robust heptameric protein which normally forms holes in membranes.

DNA and RNA can be electrophoretically driven through a nanopore of suitable diameter (Kasianowicz J. J. et al., 1996, PNAS)

Nanopore sequencing – how does it work?

When a small voltage (~100 mV) is imposed across a nanopore in a membrane separating two chambers containing aqueous electrolytes, the ionic current through the pore can be measured.

Molecules going through the nanopore disrupt the ionic current, and by measuring the disruption the molecules can be identified.
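Detecting the current disruptions can be sketched as simple thresholding of a sampled current trace. The trace values (in pA), the open-pore level, and the 70% cutoff are hypothetical numbers chosen for illustration. Real base calling from such signals is considerably more involved.

```python
def detect_blockades(current_pa, open_pore_pa, threshold_fraction=0.7):
    """Return (start, end) sample index pairs where the current drops below
    a fraction of the open-pore level (end index is exclusive)."""
    cutoff = open_pore_pa * threshold_fraction
    events = []
    start = None
    for i, value in enumerate(current_pa):
        if value < cutoff and start is None:
            start = i  # blockade begins
        elif value >= cutoff and start is not None:
            events.append((start, i))  # blockade ends
            start = None
    if start is not None:
        events.append((start, len(current_pa)))  # trace ended mid-blockade
    return events


# Hypothetical trace: open pore near 100 pA with two blockade events
trace = [100, 101, 40, 42, 41, 99, 100, 30, 31, 100]
events = detect_blockades(trace, open_pore_pa=100)
```

Each event's depth and duration are the raw features from which a molecule, or ideally each base, would be identified.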

Lipid bilayer with high electrical resistance

Ionic current

Hemolysin

Nanopore – strand sequencing

The DNA polymer passes through the nanopore itself

The nanopore is engineered to allow single-base resolution within the strand

A DNA polymerase, coupled with an α-hemolysin, synthesizes a new strand of DNA, using as a template the polymer coming out of the pore

DNA Polymer

Nanopore sequencing

Advantages

minimal sample preparation

no requirement for polymerase or ligase

potential for very long read lengths (> 10,000 – 50,000 nt)

it might well achieve the $1,000 per mammalian genome goal

the instrument is inexpensive

EPIGENOME

Transcriptionally active sites

Protein-DNA interaction

Methylation analysis

METAGENOME

Microbial diversity

FORENSICS

MEDICINE

AGRICULTURE

GENOME DE NOVO

RESEQUENCING

EXOME SEQUENCING

ANCIENT DNA

RNA-SEQ/WHOLE TRANSCRIPTOME

mRNA Expression and Discovery

Alternative Splicing

microRNA Expression and discovery

APPLICATIONS

References

Mardis, E. R. A decade’s perspective on DNA sequencing technology. Nature (2011) 470:198-203

Metzker, M. L. Sequencing technologies – the next generation. Nature Reviews Genetics (2010) 11:31-46

Loman, N. J., Constantinidou, C., Chan, J. Z. M., Halachev, M., Sergeant, M., Penn, C. W., Robinson, E. R. & Pallen, M. J. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology (2012) 10:599-606
