an overview of dna sequencing - michigan state university lecture_sequencingassembly.pdfbacs are...

70
An Overview of DNA Sequencing

Upload: others

Post on 03-Feb-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

An Overview ofDNA Sequencing

Page 2: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Prokaryotic DNA

http://en.wikipedia.org/wiki/Image:Prokaryote_cell_diagram.svg

Plasmid

Page 3: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Eukaryotic DNA

http://en.wikipedia.org/wiki/Image:Plant_cell_structure_svg.svg

Page 4: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

The two strands of a DNA molecule are held together by weak bonds (hydrogen bonds) between the nitrogenous bases, which are paired in the interior of the double helix.

The two strands of DNA are antiparallel; they run in opposite directions. The carbon atoms of the deoxyribose sugars are numbered for orientation.

DNA Structure

http://en.wikipedia.org/wiki/Image:DNA_chemical_structure.png

Page 5: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

The goal of sequencing DNA is to tell the order of the bases, or nucleotides, that form the inside of the double-helix molecule.

We can do this in one of two ways:

• Directed Sequencing• Shotgun Sequencing

Sequencing DNA

Page 6: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Directed Sequencing=Primer Walking

• Start with genome, gene, clone, PCR product

• Design a primer and sequence a certain segment of the genome, usually the beginning.

• From that sequence, design the next primer and sequence the next segment of the genome.

• Continue designing primers and sequencing until the genome is completed.

.

Page 7: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Shotgun Sequencing

• Start with a whole genome or a large piece of the DNA (a BAC).

• Shear the DNA into many different, random segments.

• Sequence each of the random segments.

• Then, put the pieces back together again in their original order.

Page 8: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Theory Behind Shotgun Sequencing

Haemophilus influenzae 1.83 Mb baseCoverage unsequenced (%)

1X 37%2X 13%5X 0.67%6X 0.257X 0.09%

For 1.83 Mb genome, 6X coverage is 10.98 Mb of sequence, or 22,000 sequencing reactions, 11000 clones (1.5-2.0 kb insert), 500 bp average read.

0

500

1000

1500

2000

2500

3000

0 20000 40000 60000 80000

Sequences

Gap

s

Page 9: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

BAC based Projects

BACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces of DNA that are 50 to 300 kilobases. To make copies of the DNA we insert the BAC into an E. coli cell.

chromosome

break into large pieces

clone into BAC vector

transform in E. coli and process each BAC as individual

projects

Page 10: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Shotgun vs. Directed: Which is the better, more efficient method?

• With directed, sequences are done in order and there’s no puzzle to put back together. Minimal computing power is required.

• Primers must be continually designed and purchased.

• If an area is difficult to sequence, you could get stuck.

• Shotgun sequencing takes less time; all the sequences can be done at about the same time.

• Minimal cost

• But, the pieces have to be assembled, a time-consuming process requiring extensive computational power.

Page 11: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

2. Random Sequencing Phase

a. sequence DNA (15,000 sequences/ Mb)

GGG ACTGTTC ...

a. isolate DNA

b. fragment DNA

c. clone DNA

3. Closure Phase

a. assemble sequences

b. close gaps

d. annotation

c. edit

237 239

2384. COMPLETE GENOME SEQUENCE

1.Library construction

Whole Genome Shotgun Sequencing

Page 12: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Library Construction

• Construct shotgun libraries of the genome or target DNA; sizes of inserts are varied; could be small (2-3 kb), medium (8-12 kb), or fosmid (30-40 kb)

• Start with multiple copies of purified DNA

• Shear the DNA using mechanical force, breaking it into smaller pieces

• DNA fragments are cloned into a plasmid vector to replicate the DNA; these fragments are called “inserts”

Page 13: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Library Construction

• Insert fragment DNA into a vector such as pBR322

• Transform into E. coli cells and, using an antibiotic, select for cells that have a plasmid. The plasmids carry antibiotic resistant genes. When plated in the presence of an antibiotic, the cells without a plasmid die.

Page 14: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Template Preparation• Isolation of the plasmid DNA

• Transformed E.coli is plated onto an agar plate. Every E. coli colony will contain plasmids with the same insert.

• Colonies are “picked” & transferred to liquid media where they multiply; use 384 well high throughput plates

• Plasmids are isolated and suspended in a buffer.

Page 15: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Template Production LaboratoryCurrent Capacity: 22,000,000 plasmids/year

Page 16: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sanger Sequencing

• Utilize dideoxy sequencing method of chain termination (Sanger)

• Each plasmid is reacted with a forward and reverse primer (2 reactions for each piece of DNA).

• Done in high throughput manner in 384 well plates

Page 17: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

-Initial dideoxy sequencing involved use of radioactive dATP and 4 separate reactions (ddATP, ddTTP, ddCTP, ddGTP) & separation on 4 separate lanes on an acrylamide gel with detection through autoradiogram

-New techologies use 4 fluorescently labeled bases and separation on capillaries and detection through a CCD camera

Sequencing reactions

Page 18: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

DNA sequencing

Page 19: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Plasmid Structure

Antibioticresistance

F primer binds, synthesis

R primer binds,

synthesis

Page 20: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequencing Machines

• The DNA fragments are loaded into capillaries in the sequencing machines.

• Polymer in the capillaries provides a matrix for separating the DNA fragments based on size.

• Separation of the fragments through a matrix is called electrophoresis.

Page 21: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequence Production LaboratoryCurrent Capacity: 40,000,000 sequences/year

Page 22: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Data Collection• A laser excites the fluorescent dyes.• A camera detects the fluorescence.• Data collection software collects the data.

Capillary array view

Page 23: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequencing Machine Output

AACTCATCGAATCCGTACGGGAACTCATCGAATCCGTACGGAACTCATCGAATCCGTACGAACTCATCGAATCCGTACAACTCATCGAATCCGTAAACTCATCGAATCCGTAACTCATCGAATCCGAACTCATCGAATCCAACTCATCGAATCAACTCATCGAATAACTCATCGAAAACTCATCGAAACTCATCGAACTCATCAACTCATAACTCAAACTCAACTAACAAA

Fluorescent Sequencing Gel

Four colors, one lane per sample

Fluorescent Sequencing GelEach fragment differs by one

nucleotide

This is a diagram of just one lane. Reading from the bottom, where the fragment is only one base long, the fluorescent dye is an A. This is the first base in the sequence.

Page 24: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Data Analysis

• An chromatogram is produced and the bases are called

• Software assign a quality value to each base • Phred & TraceTuner

• Read DNA sequencer traces• Call bases• Assign base quality values• Write basecalls and quality values to output files.

Page 25: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Assemble Fragments

SEQUENCER OUTPUT

ASSEMBLE FRAGMENTS

CLOSURE & ANNOTATION

TAGCTAGCAGCTAGC

AGCTAGGCTC

AGCTCGCTAGCTAGCTAGCTAGCTAGGCTC

AGCTCGCTATAGCTAGCTA

CTAGCTAGCTAGGCTCGCTAGCTAGCT

CTCGCTAGCTAG

AGCTCGCTAGCTAGCTAGCTAGC

AGCTAGGCTC AGCTCGCTACTAGCTAGCTAGGCTC

GCTAGCTAGCT

AGCTCGCTAGCTATAGCTAGCTA

AGCTAGC

CTCGCTAGCTAGTAGCTAGC

GCTAGCTAGC

Page 26: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

• CONSENSUS SEQUENCE

• FRAGMENTS FROM

SEQUENCING PROCESS

Page 27: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Closure

• Assemble the sequence files, relate them to each other, and close gaps

• Involves many computational programs to identify overlapping sequences, linkages between sequences

• Back to the lab for hard to close gaps

Page 28: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Complicating Factors

• A procedure that works well in one species may not produce the same results in even a closely related species.

• Each genome is uniquely different in its size and how it sequences and assembles

• New technologies must be developed to tackle the unique characteristics and properties of difficult genomes

Page 29: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Whole Genome Shotgun Sequencing:Modifying for Eukaryotes

Not restricted to bacterial organisms

Sequence eukaryotes:

whole genome draft sequence; same approach as

with bacteria

chromosome by chromosome; sequence genome

using large insert bacterial artificial chromosome

(BAC) clones anchored to the chromosomes

combination of whole genome and chromosome

by chromosome

Page 30: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Whole Genome Draft Sequencing

2. Random Sequencing Phase

a. sequence DNA (15,000 sequences/ Mb)

GGG ACTGTTC ...

a. isolate DNA

b. fragment DNA

c. clone DNA

3. Closure Phase

a. assemble sequences

b. close gaps

d. annotation

c. edit

237 239

2384. COMPLETE GENOME SEQUENCE

1.Library construction

Page 31: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Whole Genome Draft Sequencing

2. Random Sequencing Phase

a. sequence DNA (15,000 sequences/ Mb)

GGG ACTGTTC ...

a. isolate DNA

b. fragment DNA

c. clone DNA

3. Closure Phase

a. assemble sequences

b. close gaps

d. annotation

c. edit

237 239

2384. COMPLETE GENOME SEQUENCE

1.Library construction

Advantages:Saves time and money

(~50 %)

Disadvantages: Incomplete sequence, contains errors

Page 32: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

454 Genome Sequencing System• Library prep, amplification and sequencing: 2-4 days• Single sample preparation from bacterial to human genomic DNA• Single amplification per genome with no cloning or cloning artifacts• Picoliter volume molecular biology• 100 Mb per run (4-5 hr); less than $ 20,000 per run• Read lengths 200-230 bases• Massively parallel imaging, fluidics and data analysis • Requires high genome coverage for good assembly• Error rate of 1-2%

Page 33: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

454-Pyrosequencing

Perform emulsion PCR

Depositing DNA Beads into the PicoTiter™Plate

Construct Single stranded

adaptor liagated DNA

Sequencing by Synthesis:Simultaneous sequencing of the entire genome in

hundreds of thousands of picoliter-size wells

Pyrophosphate signal generation

Page 34: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

What is an EST? single pass sequence from cDNAspecific tissue, stage, environment, etc.

Multiple tissues, states..with enough sequences, can ask quantitative

questions

cDNAlibrary in E.coli

pick individual clones

template prep

pBluescript

T7 T3

Insert in

Expressed Sequence Tags (ESTs): Sampling the Transcriptome and Genic Regions

Page 35: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Uses of EST sequencing:-Gene discovery-Digital northerns/insights into transcriptome-Genome analyses, especially annotation of genomic DNA

Issues with EST sequencing:-Inherent low quality due to single pass nature-Not 100 % full length cDNA clones -Redundant sequencing of abundant transcripts

Address throughclustering/

assembly to buildconsensus sequences

= Gene Index,Unigene Set,

Transcript Assembly

Page 36: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

EST Clustering

All ESTs and mRNAs from an organismCluster and Assemble

Set of clustered, assembled sequences=

contigs, Transcript Assembly, Tentative Consensus, Unigene

Longer, more accurate sequence of the transcript

Sequences which do not cluster or

assemble=Singletons, singlets

Single pass transcript

Page 37: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

http://www.jgi.doe.gov/education/how/how30minflash.html

http://www.illumina.com/media.ilmn?Title=Sequencing-By-Synthesis%20Demo&Cap=&PageName=solexa%20technology&PageURL=203&Media=1

Web Links for Animation on Genome Sequencing

Page 38: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

From Fragments to Finished Genome

Overview of Sequencing, Assembly, and Closure Processes

Page 39: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces
Page 40: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Genome Sequencing Process

Library Construction

Clone Picking

Template Preparation

Sequencing Reactions

Electrophoresis andBase Calling

Genome Assembly

Genome ClosureOrder ContigsClose Gaps

Identify RepeatsFinish the Genome

Annotation

rDNAMolecules

Page 41: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequence Requirements

1. Free vector should be at low or undetectable level.

2. No chimeric clones. Chimeras occur two or more random fragments from separate parts of the genome recombine and end up next to each other.

3. The majority of the inserts should be of relatively uniform size.

4. Libraries need to be random and cover the whole genome.

Page 42: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Basecalling & Quality AssignmentsPhred & TraceTuner

•Read DNA sequencer traces

•Call bases

•Assign base quality values

•Write basecalls and quality values to output files.

Warner Brothers, Inc.

Page 43: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

What are phred quality values?

The quality value q assigned to a base call is defined as:

q q = = -- 10 x log10 x log1010((pp))

where where pp is the estimated error probability is the estimated error probability for that basefor that base--call.call.

Page 44: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

OR

A base-call having a probability of 1/1000 of being incorrect is assigned a quality value of

30.

ProbabilityProbability Quality ValueQuality Value1/1001/100 20201/101/10 1010

Page 45: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces
Page 46: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Assembling the fragments

Page 47: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Merging two sequences

…AGCCTAGACCTACAGGATGCGCGGACACGTAGCCAGGACCAGTACTTGGATGCGCTGACACGTAGCTTATCCGGT…

overlap (19 bases)overhang (6 bases)

overhang

overlap - region of similarity between regionsoverhang - un-aligned ends of the sequences

The assembler screens merges based on: • length of overlap• % identity in overlap region• maximum overhang size.

% identity = 18/19 % = 94.7%

Page 48: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

TIGR Assembler

Greedy

• Build a rough map of fragment overlaps

• Pick the largest scoring overlap• Merge the two fragments• Repeat until no more merges can be

done

Page 49: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Forward-reverse constraints

• The sequenced ends are facing towards each other • The distance between the two fragments is known

(within certain experimental error)

clone length

sequenced ends

F R

Page 50: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Scaffolding

• Given a set of non-overlapping contigsorder and orient them along a chromosome

III III IV

I

IIIII

IV

Page 51: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Clone-mates

Vector

Insert

F R

FR

I II

R

I

F

II

F

II

R

I

Page 52: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Linking information

• Overlaps

• Mate-pair links

• Similarity links

• Physical markers

• Gene synteny

reference genome

physical map

Page 53: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Grouping the contigs

Page 54: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Assembly gaps

group A group B

physical gap

sequencing gaps

sequencing gap - we know the order and orientation of the contigs and have at least one clone spanning the gap

physical gap - no information known about the adjacent contigs, nor about the DNA spanning the gap

Page 55: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Unifying view of assembly

Assembly

Scaffolding

Page 56: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Why Completeness is Important

• Improves characterization of genome features• Gene order, replication origins

• Better comparative genomics• Genome duplications, inversions

• Determination of presence and absence of particular genes and features is less subjective

• Missing sequence might be important (e.g., centromere)• Allows researchers to focus on biology not sequencing• Facilitates large scale correlation studies• Controls for contamination

Page 57: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

What Is Closure?

• Obtaining sequence that was not obtained during random sequencing which resulted in:

• Sequencing Gaps• Physical Ends

• Confirming the integrity of assemblies

• Repeats and misassemblies• Verification of Clone Coverage

• Confirming the base sequence of the consensus

• Editing• Verification of Sequence Coverage

Page 58: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequence Validation: Sequence coverage

1X 2X 3X

Sequence coverage rule:Every base in an assembly must be covered by at least two sequences of high quality.

Why?Validating sequence coverage provides a high degree of confidence in the consensus base calls.

Page 59: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Sequence editor

• In this example there is an obvious discrepancy between the base calls of several of the underlying clones in this region.

Page 60: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Causes for gaps• Non-random shotgun library

• Toxicity of genes or promoters in E. coli• Genomic DNA difficult to clone (capsular polysaccharides)• Unstable regions (low complexity)

• Sequencing problems• Hard stops

• Secondary structures• Very high or low GC content• Small unit tandem repeats

• Loss of signal• Homopolymeric tracts• Very high or low GC content

Page 61: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Closure Challenges: Sequencing Through Secondary Structures

Page 62: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Hairpin structure

Page 63: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Homopolymeric tracts

Page 64: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces
Page 65: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Solutions• Apply different sequencing chemistries

• Big-Dye terminator (default)• Dye-primer• dGTP mix (GC rich regions)

• Denature structures - Additives• Betaine• DMSO

• Break structure• Restriction digest• Transposon insertion• Micro-libraries

Page 66: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Repetitive Areas

• Repetitive areas are regions of high similarity within the genome/BAC.

• Sequences in these areas may be misassembled by the Assembler.

• Verification of the sequence of repetitive areas:

• A. Identify potential repetitive areas, using repeatFinder and other tools.

• B. Classify repeats based on length, copy number, % similarity, structure and complexity.

• C. If repeats are misassembled, transpose spanning clones or obtain PCR products and sequence to verify assembly.

Page 67: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Mis-assembled repeatClones link different repeat flanks

Page 68: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

Resolved Repeat• Unique flank order is correct

• Use linking information across the repeat (large insert clones or PCR)

• Consensus sequence is correct• Use linked clones that have one mate in the

repeat and the other anchored in unique sequence

• Transposon mediated libraries

Page 69: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

> GDRFE25TFCATTGAACACTAGGAGCCATAGAC………(up to 60 bases per line)GTTCAACCGTTTAAGGCAAAACTTA………AATTTTGGGCAGACTCTAGATCATG………GGTAATACATACTCTGGGATTACGA………

Fasta Format

Page 70: An Overview of DNA Sequencing - Michigan State University Lecture_SequencingAssembly.pdfBACs are Bacterial Artificial Chromosomes. They are large transport systems that can hold pieces

> GDRFE25TFCATTGAACACTAGGAGCCATAGACT………(up to 60 bases per line)GTTCAACCGTTTAAGGCAAAACTTA………AATTTTGGGCAGACTCTAGATCATG………GGTAATACATACTCTGGGATTACGA………

> GDRFE45TFACTGGTTCACATGGAGGGATAGTAC………(up to 60 bases per line)GACACTCCGTAGCTGGCAATCCTTA………GGCTCTCAATCGAGACTCTAGTTAC………TCCAATATGGGCTCATGGAACAAGA………

> GDRFE67TFCATTGAACACTAGGAGCCATAGATC………(up to 60 bases per line)AATGTGGCGTAGCTGCCACTTGGTA………TACCGTCAATCGTATTGTCTAGTTAC………GGGAGATAATATGGGCTCATATGGT………

>

Multifasta Format