DNA sequencing - Caldwell-West Caldwell Public Schools


Post on 03-Feb-2021


  • 1

    DNA sequencing

  • 2

    Cloning Wolffia a. cDNA fragments into pTriplEX2

    Determine the size of the insert by PCR and digests

  • 3

    Sequencing DNA:

    • Rapid DNA sequencing methods were first developed in the mid-1970s.

    • DNA sequencing has developed rapidly; many genomes are completely sequenced.

    • 1995: bacterium H. influenzae, 1.8 × 10⁶ bp, ~1,700 genes

    • 1996: yeast Saccharomyces cerevisiae, 12 × 10⁶ bp, ~6,000 genes

    • 1998: nematode Caenorhabditis elegans, 97 × 10⁶ bp, ~20,000 genes

    • 2003: Human genome! 3 × 10⁹ bp, ~25,000 genes

    Handling the Explosion of Sequence Data

  • 4

    GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

    December 2008


    Currently >1.0 × 10¹² bases in 1.08 × 10⁶ sequence records in the traditional GenBank divisions.

    1.5 × 10¹² bases in 4.8 × 10⁷ sequence records in the Whole Genome Sequencing (WGS) division.

  • 5

    Why are we sequencing these genomes?

    • The information generated from these projects will serve as a blueprint for investigating the structure, function, and expression patterns of genes that are involved in various cellular processes (this was controversial at the time).

    • A goal of this project is to make you familiar with genes and with searching nucleotide and protein databases.

    • The number and location of all restriction sites can be determined without restriction mapping.

    Once a gene is sequenced, a lot of information can be determined about the gene
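    One way to see how the sequence alone reveals restriction sites: the Python sketch below scans a sequence for recognition sites. The three enzymes and their sites (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are standard, but the choice of enzymes, the function names, and the example sequence are illustrative assumptions. The scan reads only the strand given, which is enough for palindromic sites like these.

```python
# Recognition sequences of a few common restriction enzymes (illustrative choice).
# All three are palindromic, so searching one strand finds every site.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one so overlapping matches are kept
    return positions


def restriction_map(seq: str) -> dict[str, list[int]]:
    """Map each enzyme name to the positions of its sites in seq."""
    return {name: find_sites(seq, site) for name, site in ENZYMES.items()}


if __name__ == "__main__":
    example = "AAGAATTCTTGGATCCAAGAATTCA"  # hypothetical example sequence
    print(restriction_map(example))
```

    Running it prints {'EcoRI': [2, 18], 'BamHI': [10], 'HindIII': []}, with positions counted from 0.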

  • 6

    • After conceptual translation of the DNA sequence into a protein sequence, possible similarities to other proteins can be identified.

    AATTCGAGTTTGTG

    Frame 1: AAT TCG AGT TTG → ASN-SER-SER-LEU

    Frame 2: ATT CGA GTT TGT → ILE-ARG-VAL-CYS

    Frame 3: TTC GAG TTT GTG → PHE-GLU-PHE-VAL

    • Structure predictions of the encoded protein can be made based on the protein sequence.
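    Conceptual translation in three reading frames can be sketched in Python as below. The codon table is the standard genetic code; the function names are hypothetical. Run on the slide's sequence, it gives NSSL (Asn-Ser-Ser-Leu), IRVC (Ile-Arg-Val-Cys) and FEFV (Phe-Glu-Phe-Val) in one-letter code.

```python
from itertools import product

# Standard genetic code: codons enumerated with T, C, A, G at each position
# ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate seq codon by codon; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def three_frames(seq: str) -> list[str]:
    """Conceptual translation in reading frames 1, 2 and 3 of the given strand."""
    return [translate(seq[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frames("AATTCGAGTTTGTG"), start=1):
        print(f"Frame {frame}: {protein}")
```

    Characters other than A, C, G and T raise a KeyError here; a fuller version would need a policy for ambiguous bases.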

  • 7

    1980 Nobel Prize: Fred Sanger and Walter Gilbert each developed methods for DNA sequencing in the 1970s.

    Gilbert (Chemical Method)

    Sanger (Enzymatic Method)

    Almost everyone uses Sanger's method (or variants thereof) today. New methods are being developed.

    How does it work? The fundamental idea behind both methods is the same.

    One needs a known or fixed starting point on one end of the DNA to be sequenced.

    DNA fragments are then generated that are random in length but end with a defined type of base: A, G, C, or T.

    The random populations of DNA fragments are then separated using high-resolution gel or capillary electrophoresis.

    This gel system can separate fragments that differ in length by as little as one base.
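    The principle can be illustrated with a toy model in Python. It assumes an idealized reaction in which every possible terminated fragment is produced and the gel resolves single-base differences; the function names are hypothetical. Each fragment is represented only by its length and the base at its 3' end, and sorting the fragments by length, as the gel does, reads out the synthesized strand.

```python
def make_lanes(new_strand: str) -> dict[str, list[int]]:
    """Four 'lanes' (A, C, G, T): lengths of the fragments ending in each base.

    Idealized: every prefix of the synthesized strand is produced once,
    and a fragment of length n ends with the base at position n.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by length (shortest runs farthest) and read their end bases."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = make_lanes("GATTACA")  # hypothetical synthesized strand
    print(lanes)
    print(read_gel(lanes))
```

    The string read off the gel is the newly synthesized strand, 5' to 3'; the template is its complement.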

  • 8

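    DNA polymerase cannot start a new strand from scratch; it can only add nucleotides onto the 3' end of a strand that is already there (a primer). No one really knows why DNA synthesis works this way.

    We will take advantage of this requirement.

    The newly synthesized strand runs in the opposite direction to the template strand (the two strands are antiparallel)!

    [Figure: newly synthesized strand paired with the template strand, each labeled 5' and 3'.]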
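    The antiparallel rule and the primer requirement can be sketched in Python. Both strands are written 5' to 3', as is conventional, so the new strand is the reverse complement of the template. The function names and the simple annealing check (the primer must match exactly where the new strand begins) are simplifying assumptions.

```python
# Watson-Crick pairing: A-T, C-G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Partner strand of seq, written 5'->3' (complement each base, then reverse)."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> str:
    """Idealized primer extension (hypothetical helper).

    template and primer are both written 5'->3'. The primer must pair with the
    3' end of the template; polymerase then extends it 5'->3' until it reaches
    the template's 5' end.
    """
    new_strand = reverse_complement(template)
    if not new_strand.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    return new_strand


if __name__ == "__main__":
    print(extend_primer("GATTACAGGC", "GCCT"))
```

    Without a matching primer there is no 3' end to extend, which is exactly the requirement that sequencing exploits: the primer fixes the starting point.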


  • 10

    • There are four deoxyribonucleotides: dA, dC, dG, and dT.

    • dNTP = deoxynucleoside triphosphate.

    • The phosphates provide the energy necessary for DNA synthesis.

    • A name for an elephant without a tail is ddNTP = dideoxynucleoside triphosphate (ddA, ddC, ddG, ddT).

    • This nucleotide is missing a 3'-OH.

    • Once it is incorporated into a DNA molecule, DNA synthesis stops.

    p. 4-3

  • 11

    Dideoxy (Sanger) Sequencing

    • DNA polymerase is used to synthesize a complementary single-stranded DNA from a template.

    • Elongation occurs at the 3' end of a primer DNA that is annealed to "template" DNA.

    • Overall chain growth is in the 5' → 3' direction.

    • dNTPs are added to the growing DNA chain until a ddNTP is added.

    • When a ddNTP is incorporated at the 3' end of the growing primer chain, chain elongation is terminated at G, A, T, or C because the primer chain now lacks a 3' hydroxyl group.

    • Necessary reagents for sequencing:
      – DNA polymerase
      – Primers
      – dNTPs
      – Buffer
      – Labeling reagent
      – Dideoxynucleotides (ddNTPs)
      – Template DNA

  • 12

    Dideoxy (Sanger) Sequencing

    • Many individual strands will be replicated in each reaction tube using a thermal cycler.

    • Across all of these strands, ddNTPs should be incorporated at every possible site within the newly synthesized strand, so that fragments ending at each position are produced.

    • Newly synthesized strands will then be separated by size and analyzed by gel electrophoresis.
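    The chain-termination logic can be simulated in Python. This is a toy model under stated assumptions: a single tube with all four dye-labeled ddNTPs, and a fixed probability (ddntp_fraction, a hypothetical parameter) that the polymerase adds the dideoxy form at any given position. With enough strands every position ends up represented, which is why sorting the labeled fragments by length recovers the sequence.

```python
import random
from collections import Counter


def simulate_reaction(new_strand: str, ddntp_fraction: float,
                      n_strands: int, seed: int = 0) -> Counter:
    """Toy single-tube dye-terminator reaction (hypothetical model).

    At every position the polymerase incorporates the dideoxy form of the
    required base with probability ddntp_fraction, which ends that chain.
    Returns a Counter of (fragment length, 3' base) -> number of strands.
    Strands that never pick up a ddNTP carry no dye label and are not counted.
    """
    rng = random.Random(seed)
    fragments = Counter()
    for _ in range(n_strands):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < ddntp_fraction:  # ddNTP added: no 3'-OH, chain stops
                fragments[(length, base)] += 1
                break
    return fragments


if __name__ == "__main__":
    result = simulate_reaction("GATTACA", ddntp_fraction=0.2, n_strands=5000)
    for (length, base), count in sorted(result.items()):
        print(length, base, count)
```

    Lowering ddntp_fraction shifts the fragments toward longer lengths, which is why the ddNTP:dNTP ratio in a real reaction matters.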

  • 13


    (not in chapter)

  • 14

    Protocol

    • PCR reaction to amplify the DNA.

    • DNA purification
      – Removes ddNTPs and other impurities

    • Sequencing

    (not in chapter)

    Reading sequence the old way…

  • 15

    The figure on the right shows the emission spectra of the four dyes that are normally linked to ddNTPs for automated DNA sequencing. Each dye fluoresces a different color when illuminated by a laser beam.

    BASE   DYE      WAVELENGTH (nm)
    A      dR6G     570
    G      dROX     620
    C      dR110    540
    T      dTAMRA   600

    Spectra in the figure are labeled ddATP, ddTTP, ddCTP, and ddGTP. (p. 4-3)

    Fluorescent dye terminators andautomated DNA sequencing

    • Since four different dyes are used, all the reactions can be done in a single tube, thus increasing throughput.

    • Some of the new sequencing machines use a small column (capillary), which can be reused.

    • A laser excites the dye on each successive fragment as it migrates off the column, and the color of the emitted light identifies that fragment's 3' nucleotide.
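    As a deliberately simplified sketch of the detection step, the Python below assigns each fragment the base whose dye emission maximum (using the wavelengths in this slide's dye table) is closest to the detected peak. It assumes, hypothetically, that the detector reports one peak wavelength per fragment; real instruments record several color channels at once and separate the overlapping dye spectra mathematically, which this ignores.

```python
# Emission maxima (nm) per base, taken from the dye table on this slide.
# Treat this mapping as an assumption; dye sets differ between kits.
DYE_EMISSION_NM = {"A": 570, "G": 620, "C": 540, "T": 600}


def call_base(peak_nm: float) -> str:
    """Call the base whose dye emission maximum is closest to the detected peak."""
    return min(DYE_EMISSION_NM, key=lambda base: abs(DYE_EMISSION_NM[base] - peak_nm))


def call_bases(peaks_nm: list[float]) -> str:
    """Call one base per fragment, in the order the fragments leave the capillary."""
    return "".join(call_base(peak) for peak in peaks_nm)


if __name__ == "__main__":
    print(call_bases([572, 538, 618, 601]))  # hypothetical detector readings
```

    Because the shortest fragments reach the laser first, the called bases come out in 5' → 3' order of the synthesized strand.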

  • 16

    Dideoxy (Sanger) Sequencing

    • Sequencing machines analyze fluorescently labeled ddNTPs.
      – Fluorescently labeled (red, green, blue, yellow)
      – All reactions can be done in a single tube.

    • A computer program analyzes and interprets the results.

  • 17


    Dideoxy (Sanger) Sequencing

  • 18


    atttaccgtg ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt
    tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc
    acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc
    tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt
    ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac attatcttgc
    atgagccagc tgatgagtat gatacagttt

    LOCUS       AB231879    1383 bp    mRNA    linear   INV 07-JUN-2006
    DEFINITION  Artemia franciscana mRNA for zinc finger protein Af-Zic, complete cds.
    ACCESSION   AB231879
    VERSION     AB231879.1  GI:94966317
    KEYWORDS    .
    SOURCE      Artemia franciscana
      ORGANISM  Artemia franciscana
                Eukaryota; Metazoa; Arthropoda; Crustacea; Branchiopoda; Anostraca;
                Artemiidae; Artemia.
    REFERENCE   1
      AUTHORS   Aruga,J., Kamiya,A., Takahashi,H., Fujimi,T.J., Shimizu,Y., Ohkawa,K.,
                Yazawa,S., Umesono,Y., Noguchi,H., Shimizu,T., Saitou,N., Mikoshiba,K.,
                Sakaki,Y., Agata,K. and Toyoda,A.
      TITLE     A wide-range phylogenetic analysis of Zic proteins: Implications for
                correlations between protein structure conservation and body plan
                complexity
      JOURNAL   Genomics 87 (6), 783-792 (2006)
       PUBMED   16574373
    REFERENCE   2  (bases 1 to 1383)
      AUTHORS   Aruga,J. and Toyoda,A.
      TITLE     Direct Submission
      JOURNAL   Submitted (10-AUG-2005) Jun Aruga, RIKEN Brain Science Institute,
                Laboratory for Comparative Neurogenesis; 2-1 Hirosawa, Wako-shi,
                Saitama 351-0198, Japan (E-mail:[email protected],
                URL:http://www.brain.riken.go.jp/labs/lcn/, Tel:81-48-467-9791,
                Fax:81-48-467-9792)
    FEATURES             Location/Qualifiers
         source          1..1383
                         /organism="Artemia franciscana"
                         /mol_type="mRNA"
                         /db_xref="taxon:6661"
         gene            1..1383
                         /gene="Af-Zic"
         CDS             1..1383
                         /gene="Af-Zic"
                         /codon_start=1
                         /product="zinc finger protein Af-Zic"
                         /protein_id="BAE94140.1"
                         /db_xref="GI:94966318"
                         /translation="MTASLSASVMNPSFIKRESPASATALFVPNQFSAVPNFGFHHVP
                         SACATEQSSEMLNPFVDNHLRLNDQSNFQGYHHPHHGQIQQHHLGSYAARDFLFRRDM
                         GLGMGLEAHHTHAAQHHHMFDPSHAAAAAHHAMFTGFDHNTMRLPTEMYTRDASGYAA
                         QQFHQMGSMAPMAHPASAGAFLRYMRTPIKQELHCLWVDPEQPSPKKTCGKTFGSMHE
                         GKVFARSENLKIHKRTHTGEKPFKCEFEGCDRRFANSSDRKKHSHVHTSDKPYNCKVR
                         GCDKSYTHPSSLRKHMKVHGKSPPPASSGCDSDENESIADTNSDSAASPSPSSHDSSQ
                         VQVNHNRPPNHHNLGLGFTNPGHIGDWYVHQSAPDMPVPPATEHSPIGPPMHHPPNSL
                         NYFKTELVQN"
    ORIGIN
            1 atgactgcta gtttaagtgc aagcgtgatg aatccaagtt ttataaagag ggaaagtcct
           61 gcatcggcta cagccctgtt cgtaccaaac caatttagtg cagtgcctaa ttttggattt
          121 caccatgttc ctagtgcttg tgcaactgag caaagtagtg aaatgctgaa cccttttgtg

    (Note: the rest of the DNA sequence was deleted to save space)
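    For working with records like this one programmatically, a parser such as Biopython's `Bio.SeqIO.read(handle, "genbank")` is the usual choice. The dependency-free Python sketch below (hypothetical function names) shows just two steps: splitting the LOCUS line into fields, and turning the numbered ORIGIN lines into a plain sequence that could be pasted into a search program. It assumes the LOCUS field order shown in this record.

```python
def parse_locus(line: str) -> dict[str, str]:
    """Split a LOCUS line into named fields.

    Simplification: assumes the field order of this record
    (name, length, 'bp', molecule type, topology, division, date).
    """
    keys = ["name", "length", "unit", "mol_type", "topology", "division", "date"]
    return dict(zip(keys, line.split()[1:]))


def origin_to_sequence(origin_lines: list[str]) -> str:
    """Join ORIGIN lines into one sequence, dropping each line's leading position number."""
    return "".join("".join(line.split()[1:]) for line in origin_lines).upper()


# Copied from the AB231879 record (only the first 180 bases were kept on the slide).
LOCUS = "LOCUS       AB231879    1383 bp    mRNA    linear   INV 07-JUN-2006"
ORIGIN = [
    "        1 atgactgcta gtttaagtgc aagcgtgatg aatccaagtt ttataaagag ggaaagtcct",
    "       61 gcatcggcta cagccctgtt cgtaccaaac caatttagtg cagtgcctaa ttttggattt",
    "      121 caccatgttc ctagtgcttg tgcaactgag caaagtagtg aaatgctgaa cccttttgtg",
]

if __name__ == "__main__":
    print(parse_locus(LOCUS))
    sequence = origin_to_sequence(ORIGIN)
    print(len(sequence), sequence[:3])  # length and start codon
```

    The recovered sequence begins with ATG, matching /codon_start=1 and the M at the start of the /translation.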

    GenBank DNA sequence report

  • 19

    [Figure: the AB231879 record from the previous slide, repeated with two callouts: "Clone and contact information" (LOCUS through REFERENCE 2) and "Annotations" (FEATURES).]

  • 20

    General Databases:

    NCBI: DNA and protein sequences (USA database)
    EMBL: DNA sequences (European Molecular Biology Laboratory)
    GenEMBL: GenBank and EMBL sequences combined
    DDBJ: DNA sequences (Japan's equivalent of GenBank)
    PIR: Protein Information Resource (protein sequences)
    SwissProt: Protein sequences (Switzerland and EMBL)
    GenPept: Translations of DNA based on authors' information
    PDB: Coordinates for protein 3D structures (now maintained at Rutgers)

    Organism Specific Databases:

    Sanger: Worm sequence and genomic database
    SGD: Saccharomyces Genome Database
    YPD: Yeast Protein Database
    WPD: Worm Protein Database
    WormBase: C. elegans
    FlyBase: Drosophila sequence and genetic database
    Human: Many

    DNA search programs

    BLAST--basic local alignment search tool

    BLASTn--you provide a nucleotide sequence; the program compares it and reports nucleotide similarity alignments

    BLASTp--you provide a protein sequence; the program compares it and reports protein similarity alignments

    BLASTx--you provide a nucleotide sequence; the program translates it in all six reading frames, then compares it and reports protein similarity alignments

    All three of these programs will be used in this project.
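    BLASTx's first step, translating the query in all six reading frames, can be sketched in Python as follows. It uses the standard genetic code; the function names are hypothetical, and the alignment and database search that BLAST then performs are not shown. Frames +1 to +3 start at bases 1 to 3 of the given strand and frames -1 to -3 at bases 1 to 3 of its reverse complement; other tools may number the reverse frames differently.

```python
from itertools import product

# Standard genetic code: codons enumerated with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate codon by codon ('*' = stop); a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations in frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("AATTCGAGTTTGTG").items():
        print(frame, protein)
```

    Each of the six protein strings would then be compared against a protein database; a frame that gives a long stretch without stop codons and a strong match is the likely coding frame.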

  • 21

    Next Generation DNA Sequencing

    • Traditional Sanger sequencing
      – 700-1000 bp
      – 96 samples/run

    • Roche 454
      – 200-400 bp
      – 1 million reads/run

    • NextGen: SOLiD/Illumina short-read sequencing
      – 25-50 bp
      – >300 million reads/run
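    The figures on this slide imply a rough per-run yield of read length × number of reads. The small Python calculation below uses the upper end of each read-length range and, as an assumption, treats Sanger's 96 samples as 96 reads; the names are illustrative.

```python
# (read length in bp, reads per run), upper ends of the ranges on this slide.
# Assumption: each of Sanger's 96 samples yields one read.
PLATFORMS = {
    "Sanger": (1000, 96),
    "Roche 454": (400, 1_000_000),
    "SOLiD/Illumina": (50, 300_000_000),
}


def bases_per_run(read_length_bp: int, reads_per_run: int) -> int:
    """Rough yield: read length multiplied by the number of reads in one run."""
    return read_length_bp * reads_per_run


if __name__ == "__main__":
    for name, (length, reads) in PLATFORMS.items():
        print(f"{name}: {bases_per_run(length, reads):,} bp per run")
```

    This gives about 9.6 × 10⁴ bp for Sanger, 4 × 10⁸ bp for 454 and 1.5 × 10¹⁰ bp for short-read platforms; the last is roughly five times the 3 × 10⁹ bp human genome in a single run.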

    Genomic scaffold

    SOLiD System Overview

    © 2008 Applied Biosystems