Upload: chloe-mccoy

Post on 28-Dec-2015


WSSP Chapter 7 BLASTN: DNA vs DNA searches

atttaccgtg ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgctga ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgttgg attgaaggta attatcttgc atgagccagc tgatgagtat gatacagttt


DSAP: BLASTn Page

p. 7-1

p. 7-1

NCBI BLAST Home Page

p. 7-2

NCBI BLASTN search page

p. 7-2

Copy sequence from DSAP or wave form program

p. 7-3

Choose a database (nr/nt or est)

p. 7-4

Search options (Use defaults)

p. 7-5

BLASTN progress report (search may take a few minutes)

p. 7-5

Format options (use defaults)

p. 7-6

EX1.11 BLASTN nr/nt database

Graphic report of EX2.09

p. 7-7

p. 7-7

BLASTN list of matches for EX1.10

EX2.09 BLASTN

p. 7-9

Best match to EX1.12

p. 7-9

>gi|226493893|ref|NM_001157047.1| Zea mays dynein light chain LC6, flagellar outer arm (LOC100284150), mRNA
Length=606
Score = 221 bits (244), Expect = 5e-54
Identities = 218/282 (77%), Gaps = 0/282 (0%)
Strand=Plus/Plus

Query 11  ATGTTGGAAGGGAGGGCGAGAGTAGAAGACACCGACATGCCGAGGAAGATGCAGGCGGAG 70
          ||||||||||| | ||||   || || |||||||||||||||  ||||||||| ||  ||
Sbjct 104 ATGTTGGAAGGAAAGGCGGTGGTGGAGGACACCGACATGCCGGCGAAGATGCAAGCCCAG 163

Query 71  GCCATGAACGCCGCCTCTCACGCGCTCGATCTGTTCGACGTCGCGGACTGCAAGAGCCTC 130
          || |||   || || ||    || || ||||  |||||||||   ||||||  |||| ||
Sbjct 164 GCGATGTCGGCGGCGTCCAGGGCCCTGGATCGCTTCGACGTCCTCGACTGCCGGAGCATC 223

Query 131 GCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGTGCGTCGTC 190
          ||  | || ||||||||||| |||||   |||| | || || |||||||| ||||| ||
Sbjct 224 GCGTCCCACATCAAGAAGGAGTTTGACGCGATCCATGGCCCCGGATGGCAATGCGTGGTT 283

Query 191 GGCTCCAGCTTCGGCTGTTTCTTCACTCACAAGAAAGGCAGCTTCATCTACTTCCGCCTG 250
          |||||| |||||||||| | | |||| ||||  || || |||||||||||||||||||||
Sbjct 284 GGCTCCGGCTTCGGCTGCTACATCACGCACAGCAAGGGGAGCTTCATCTACTTCCGCCTG 343

Query 251 GAGACGCTCCACTTCCTCATCTTCAAAGGCGCGGCCGCTTGA 292
          ||| |||||   |||||| |||||||||| ||||| || |||
Sbjct 344 GAGTCGCTCAGGTTCCTCGTCTTCAAAGGGGCGGCAGCATGA 385

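Our Seq. (the Query rows)

Database Seq. (the Sbjct rows)

Length of sequence (the Length= field of the database entry)

Mismatch (a blank in the middle row)

Match (a | in the middle row)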
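The match and mismatch marks, and the "Identities" figure in the report header, can be recomputed from the two aligned rows. The sketch below does that for the last block of the Zea mays alignment; the function names are made up for illustration, and it assumes the two rows are already aligned and equal in length.

```python
def match_line(query, sbjct):
    """Return the BLAST-style middle row: '|' for a match, ' ' otherwise."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query.upper(), sbjct.upper())
    )


def identity(query, sbjct):
    """Return (matches, alignment length, rounded percent identity)."""
    line = match_line(query, sbjct)
    matches = line.count("|")
    return matches, len(line), round(100 * matches / len(line))


# Last block of the Zea mays alignment (Query 251-292, Sbjct 344-385).
query = "GAGACGCTCCACTTCCTCATCTTCAAAGGCGCGGCCGCTTGA"
sbjct = "GAGTCGCTCAGGTTCCTCGTCTTCAAAGGGGCGGCAGCATGA"

print(query)
print(match_line(query, sbjct))
print(sbjct)
print(identity(query, sbjct))
```

Running the same count over all five blocks of that alignment gives the 218/282 reported in the header.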

Perfect, but short, matches are not usually meaningful

>gi|14250883|emb|AL583809.3|CNS07EFY Human chromosome 14 DNA sequence BAC R-736L22 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
Score = 40.1 bits (20), Expect = 4.6
Identities = 20/20 (100%)

Query: 189   ttttctgaatattcataata 208
             ||||||||||||||||||||
Sbjct: 60645 ttttctgaatattcataata 60626
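Why a 100% match can still be meaningless: the Expect value estimates how many hits this good would turn up by chance. For bit scores, the usual relation is E = (search space) x 2^(-bit score). The sketch below applies it; the search-space size is a hypothetical round number chosen for illustration, not the value NCBI used for either search.

```python
def expect_value(bit_score, search_space):
    """Expected number of chance hits: E = search_space * 2**(-bit_score)."""
    return search_space * 2.0 ** (-bit_score)


# Hypothetical effective search space (query length x database length).
SEARCH_SPACE = 5e12

examples = [
    ("20-base perfect match", 40.1),
    ("Zea mays best match", 221),
]
for label, bits in examples:
    print(f"{label}: {bits} bits -> E ~ {expect_value(bits, SEARCH_SPACE):.1e}")
```

With a search space of this order, a 40-bit hit is expected several times by chance alone, while a 221-bit hit essentially never is.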

7-11

Examine the best alignments: Are they significant?

7-9

Mismatches

i) Bad sequence on our part

ii) Bad sequence on their part

iii) Differences in the sequence of the two organisms

         C   R   E   L   L   I   L   D   A
Query   TGT CGT GAA CTC CTA ATT CTC GAC GCC
        ||| ||| ||| ||  ||  ||  ||  ||  ||
Sbjct   TGT CGT GAA CTT CTG ATC CTT GAT GCA
         C   R   E   L   L   I   L   D   A

Query: 383  AGCGTTGCCGTTCGTCAGCTTGATGTTAAGCTGGGCAGCGCGCTCGACGATTCCTTTGCG 324
            |||||| |||||||||||||||||||| | ||| || ||||||||||||||||| |||||
Sbjct: 6152 AGCGTTTCCGTTCGTCAGCTTGATGTTCAACTGAGCGGCGCGCTCGACGATTCCCTTGCG 6211

Wobble position: same amino acid, but a different codon (the genetic code is degenerate)
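The wobble effect can be checked by translating both rows of the codon alignment and counting how many codon differences leave the amino acid unchanged. This is a minimal sketch using the standard genetic code; the helper names are made up for illustration, and it assumes both sequences start in frame.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(dna):
    """Split DNA into complete codons, dropping any trailing 1-2 bases."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    return "".join(CODON_TABLE[c] for c in codons(dna))


def codon_differences(query, sbjct):
    """Return (silent, amino_acid_changing) counts of differing codons."""
    silent = changing = 0
    for q, s in zip(codons(query), codons(sbjct)):
        if q == s:
            continue
        if CODON_TABLE[q] == CODON_TABLE[s]:
            silent += 1
        else:
            changing += 1
    return silent, changing


query = "TGTCGTGAACTCCTAATTCTCGACGCC"
sbjct = "TGTCGTGAACTTCTGATCCTTGATGCA"
print(translate(query), translate(sbjct))
print(codon_differences(query, sbjct))
```

Six of the nine codons differ, every difference is in the third position, and both rows still translate to the same peptide.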

       C  R  R  T  P  D  P  *
Query  TGTCGT-CGAACTCCTGATCCTTGA
       |||||| ||||||||||||||||||
Sbjct  TGTCGTCCGAACTCCTGATCCTTGA
       C  R  E  L  L  I  L  D

p. 7-13

Small gaps alter the reading frame of the protein
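A one-base gap shifts every codon downstream of it. The sketch below shows this with the example peptides from the slide: the in-frame sequence is an assumption (the reading implied by the C R E L L I L D translation), and the extra base is inserted after the second codon. Function names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons only; trailing 1-2 bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Assumed in-frame sequence (gives the C R E L L I L ... reading).
in_frame = "TGTCGTGAACTCCTGATCCTTGA"

# One extra base after the second codon, as in the gapped slide example.
shifted = in_frame[:6] + "C" + in_frame[6:]

print(translate(in_frame))
print(translate(shifted))
```

The first two amino acids survive, then every residue changes and a stop codon (*) appears early. This is why a gap in a coding region deserves a second look at the trace.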

Query: 179  TTCGAGCTACCAGATGATC-GATTGGAACAT-T-C--TGTCATTG-AC-CTTC-AGGTAA 230
            |||||||  || | | ||  ||||  || || | |  | | |||  |  |||| |||| |
Sbjct: 4684 TTCGAGCG-CC-GTTAATATGATTACAATATCTACAATATTATTATATGCTTCCAGGTGA 4741

Query: 231  TCAACCATGACCGTGTCAACCGAAACGACGTTATCGGCCGTGCACTATTGAACATGGAGG 290
            |||| ||||||||||| ||||| || || || || |||||||| ||  | || ||||| |
Sbjct: 4742 TCAATCATGACCGTGTTAACCGTAATGATGTAATTGGCCGTGCCCTTCTTAATATGGAAG 4801

An example of a match with and without gaps.

p. 7-13

>gi|241990611|dbj|AK330768.1| Triticum aestivum cDNA, clone: SET5_E05, cultivar: Chinese Spring
Length=650
Score = 219 bits (242), Expect = 2e-53
Identities = 211/271 (77%), Gaps = 0/271 (0%)

Query 10  GATGTTGGAAGGGAGGGCGAGAGTAGAAGACACCGACATGCCGAGGAAGATGCAGGCGGA 69
          |||| ||||||||| |||||  || || |||||||||||||||   |||||||||  | |
Sbjct 78  GATGCTGGAAGGGAAGGCGACGGTGGAGGACACCGACATGCCGGCCAAGATGCAGCTGCA 137

Query 70  GGCCATGAACGCCGCCTCTCACGCGCTCGATCTGTTCGACGTCGCGGACTGCAAGAGCCT 129
          |||||     || || ||    |||||||| |  |||||||||   ||||||  |||| |
Sbjct 138 GGCCACCTCGGCGGCGTCCAGGGCGCTCGAACGCTTCGACGTCCTCGACTGCCGGAGCAT 197

Query 130 CGCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGTGCGTCGT 189
          ||| ||||| ||||||||||| || || | |||| |||| ||||| ||||||||||| ||
Sbjct 198 CGCGGCGCACATCAAGAAGGAGTTCGACACGATCCACGGCCCGGGGTGGCAGTGCGTGGT 257

Query 190 CGGCTCCAGCTTCGGCTGTTTCTTCACTCACAAGAAAGGCAGCTTCATCTACTTCCGCCT 249
           |||| |||||||||||| | |||||| ||||  || || |||||||| ||||||   ||
Sbjct 258 GGGCTGCAGCTTCGGCTGCTACTTCACGCACAGCAAGGGGAGCTTCATATACTTCAAGCT 317

Query 250 GGAGACGCTCCACTTCCTCATCTTCAAAGGC 280
           ||| ||||||  |||||| |||||||||||
Sbjct 318 CGAGTCGCTCCGGTTCCTCGTCTTCAAAGGC 348

Alignment of the second best match to EX1.12

p. 7-14

p. 7-14

Alignments near the end of the EX1.12

>gi|254826767|ref|NG_012498.1| Homo sapiens glypican 4 (GPC4), RefSeqGene on chromosome X
Length=121142
Score = 71.6 bits (78), Expect = 6e-09
Identities = 42/44 (95%), Gaps = 0/44 (0%)

Query 665   CTAGCTTTTCTTAACaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 708
            || ||||||||||| |||||||||||||||||||||||||||||
Sbjct 72886 CTTGCTTTTCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 72929
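This human hit is driven almost entirely by the run of A's at the 3' end of the cDNA (the poly-A tail), so it says little about what the gene is. One way to spot such hits is to measure the trailing A-run of the query. The sketch below is illustrative only: the function names and the 10-base threshold are arbitrary choices, not part of DSAP or BLAST.

```python
def trailing_run(seq, base="A"):
    """Length of the run of `base` at the 3' end of seq (case-insensitive)."""
    upper = seq.upper()
    return len(upper) - len(upper.rstrip(base.upper()))


def trim_poly_a(seq, min_run=10):
    """Remove a trailing poly-A run if it is at least min_run bases long."""
    run = trailing_run(seq)
    return seq[:len(seq) - run] if run >= min_run else seq


# Query 665-708 from the alignment: 15 bases followed by 29 a's.
query_end = "CTAGCTTTTCTTAAC" + "a" * 29

print(trailing_run(query_end))
print(trim_poly_a(query_end))
```

Of the 42 matching positions in the alignment, 29 fall inside this tail.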

p. 7-15

Fill in the table listing the best matches from three different organisms.

List Wolffia if there is a match

Use the clone report to obtain more information about the gene
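Picking the best match per organism amounts to keeping the lowest Expect value for each species and ranking the results. The sketch below uses the three EX1.12 hits quoted in this chapter; the function name and the tuple layout are made up for illustration.

```python
# (organism, accession, Expect) for the EX1.12 hits shown in this chapter.
hits = [
    ("Homo sapiens", "NG_012498.1", 6e-09),
    ("Triticum aestivum", "AK330768.1", 2e-53),
    ("Zea mays", "NM_001157047.1", 5e-54),
]


def best_per_organism(hits, limit=3):
    """Keep the lowest-Expect hit for each organism, best first."""
    best = {}
    for organism, accession, expect in hits:
        if organism not in best or expect < best[organism][1]:
            best[organism] = (accession, expect)
    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [(org, acc, e) for org, (acc, e) in ranked[:limit]]


for organism, accession, expect in best_per_organism(hits):
    print(f"{organism:20s} {accession:16s} {expect:.0e}")
```

A low Expect value alone is not enough for the table: the Homo sapiens entry here is the poly-A hit, so the alignment itself still has to be examined.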

p. 7-15

3) Perform a BLASTn of the est database

Change the database

p. 7-17

p. 7-17

BLASTn report of the EX1.11 search of the est database

>gi|198335694|gb|GD004539.1| CCHY28888.g1 CCHY Panicum virgatum callus (N) Panicum virgatum cDNA clone CCHY28888 3', mRNA sequence.
Length=624
Score = 246 bits (272), Expect = 1e-61
Identities = 226/286 (79%), Gaps = 0/286 (0%)
Strand=Plus/Minus

Query 3   GAGAGAAGATGTTGGAAGGGAGGGCGAGAGTAGAAGACACCGACATGCCGAGGAAGATGC 62
          |||| |  ||| ||||||||| |||||  || || ||||| |||||||||  ||||||||
Sbjct 527 GAGACACCATGCTGGAAGGGAAGGCGATGGTGGAGGACACGGACATGCCGGCGAAGATGC 468

Query 63  AGGCGGAGGCCATGAACGCCGCCTCTCACGCGCTCGATCTGTTCGACGTCGCGGACTGCA 122
          ||||| |||| |||   || || ||    || ||||| |  |||||||||   ||||||
Sbjct 467 AGGCGCAGGCGATGGCGGCGGCGTCCAGGGCCCTCGACCGCTTCGACGTCCTCGACTGCC 408

Query 123 AGAGCCTCGCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGT 182
           |||| |||| ||||| ||||||||||| ||||| | |||| |||| || || ||||| |
Sbjct 407 GGAGCATCGCGGCGCACATCAAGAAGGAGTTTGACACGATCCACGGCCCCGGGTGGCAAT 348

Query 183 GCGTCGTCGGCTCCAGCTTCGGCTGTTTCTTCACTCACAAGAAAGGCAGCTTCATCTACT 242
          |||| || ||||||||||||||||| | |||||| ||||  || || |||||||||||||
Sbjct 347 GCGTGGTGGGCTCCAGCTTCGGCTGCTACTTCACGCACAGCAAGGGGAGCTTCATCTACT 288

Query 243 TCCGCCTGGAGACGCTCCACTTCCTCATCTTCAAAGGCGCGGCCGC 288
          |||| || ||| |||||   ||||||||||||||||| ||||| ||
Sbjct 287 TCCGGCTCGAGTCGCTCAGGTTCCTCATCTTCAAAGGGGCGGCAGC 242

Alignment of the best match to EX1.12 from the est search
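In this est hit the report says Strand=Plus/Minus and the Sbjct coordinates count down (527 to 242). That means the query matches the reverse complement of the database entry: BLAST shows the Sbjct row already flipped so the two rows can be read together. A small sketch, with illustrative helper names, of how to recover what the database record itself contains:

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def sbjct_strand(start, end):
    """Infer the subject strand from the direction of its coordinates."""
    return "Plus" if start <= end else "Minus"


# First 12 bases of the Sbjct row as displayed (positions 527 down to 516).
sbjct_row = "GAGACACCATGC"

print(sbjct_strand(527, 468))
print(reverse_complement(sbjct_row))
```

The second line printed is the same stretch as it appears in the database record, read from position 516 up to 527.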

p. 7-17

Fill out the DSAP table of the BLASTn search of the est database

p. 7-18

Query 61    CAAGGTCTAAGTACTGAAAAGGAAAGTCTACTAATTACAAAGAAGTTATTGTTTGTACCT 120
            |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
Sbjct 13166 CAAGGTCTAAGTACTGAAAAGGAAAGTCCACTAATTACAAAGAAGTTATTGTTTGTACCT 13107

Query 121   TTTGTATCAGGGTTTATTAAATTTCAATCTTTATTGCTGAATCCCGAAACAAGGTGATCT 180
            |||||||||||||||||||||||| |||||| ||||||||||||||||||||||||||||
Sbjct 13106 TTTGTATCAGGGTTTATTAAATTTTAATCTTCATTGCTGAATCCCGAAACAAGGTGATCT 13047

Open Question: Why are there differences in the sequences?

Q5. BLASTn Analysis: Is your cDNA similar to genes in other organisms?

p. 6-23

Q6. BLASTn Analysis: Is your cDNA similar to genes in many different organisms?

p. 6-23


Is the sequence found in many other organisms?