Welcome to Introduction to Bioinformatics
Monday, 21 March 2005
Genome Comparison
• Coming attractions
• How to compare genomes
• Chi-squared analysis
E. coli: What makes it kill?
Escherichia coli . . .
. . . very small lab rats
Courtesy of Kent State University Microbiology
E. coli: What makes it kill?
Escherichia coli . . .
haemorrhagic colitis
E. coli: What makes it kill?
E. coli K12 E. coli O157:H7
TCTACTTATA TTCAATCCAC AGGGCTACACAAGAGTCTGT TGAATGAACA CATACATGGTTTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA
TCTACTTATA TTCAATCCAC AGGGCTACACAAGAGTCTGT TGAATGAACA CATACATGGTTTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA
How to compare genomes
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA...
E. coli K12 genome
GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
How to compare genomes
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA...
GATAGATCCCC
E. coli K12 genome
GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
How to compare genomes
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA...
CCCACGCCTAT
E. coli K12 genome
GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
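The sliding-window comparison pictured in these slides can be sketched in Python. This is only an illustration under stated assumptions, not the tool used in the course: the function name `shared_windows` is hypothetical, the window length of 11 is taken from the length of the highlighted window GATAGATCCCC, only exact matches are counted, and the two short sequences are made up for the example.

```python
def shared_windows(query, target, window=11):
    """Return start positions in `query` whose window also occurs in `target`.

    Hypothetical helper for illustration; exact matching only.
    """
    # Collect every window of the target once so each lookup is fast.
    target_windows = {target[i:i + window]
                      for i in range(len(target) - window + 1)}
    # Slide the window along the query one nucleotide at a time.
    return [i for i in range(len(query) - window + 1)
            if query[i:i + window] in target_windows]


# Toy sequences (invented for this example, not real genome data).
o157 = "GATAGATCCCCACGCCTATAATGG"
k12 = "TTTTCCCACGCCTATAATTTT"
hits = shared_windows(o157, k12)
print(hits)
```

Plotting each matching position of one genome against its position in the other would give a picture of where the two genomes agree and where one has stretches the other lacks.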
How to compare genomes
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA...
[Figure: comparison plot with E. coli O157:H7 on one axis and E. coli K12 on the other; O-Islands marked]
[Figure: comparison plot of Prochlorococcus SS120 against Prochlorococcus MED4 (100 nuc)]
[Figure: comparison plot of Prochlorococcus SS120 against Prochlorococcus MED4 (25 nuc)]
Nature of Pathogenicity Islands
Horizontal transfer of foreign DNA
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA...
E. coli K12 genome
GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
How do differences arise between genomes?
[Diagram: phage infection of a bacterium. Labels: Infection, Phage, Bacterial chromosome, Phage genome, Lysogenic pathway, Lytic pathway]
Lytic pathway: Death, General transduction
Lysogenic pathway: Life!, Special transduction
How to compare genomes
E. coli O157:H7 genome
GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA...
E. coli K12 genome
GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
• Differences in genome sequence: useful only if the genomes are closely related
• Differences in protein content: useful even for distant comparisons
How to find orthologous proteins?
How to find corresponding protein?
[Diagram: protein X marked for Yeast, E. coli, Anabaena, and Methanobacter]
All similar proteins? Orthologs and paralogs
Most related by common descent? Orthologs
How to find corresponding protein?
Most related by common descent?
All similar proteins? Orthologs and paralogs
Blast
E-value threshold
Organism X
Organism Y
How to find corresponding protein?
Most related by common descent?
Orthologs
Blast
E-value threshold
Organism Y
Organism X
Organism Y
Organism Y
Defined by bidirectional Blast hit
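A bidirectional Blast hit can be sketched as follows. This is a minimal illustration, not the implementation behind the course's tools: the function name `bidirectional_best_hits`, the shape of the input tables, the protein names, and the default E-value threshold are all assumptions made for the example, and the tables hold invented values rather than real Blast output.

```python
def bidirectional_best_hits(x_to_y, y_to_x, e_threshold=1e-10):
    """Pair proteins that are each other's best Blast hit.

    x_to_y maps a protein of organism X to (best hit in Y, E-value);
    y_to_x is the same in the other direction. Both tables and the
    default threshold are hypothetical inputs.
    """
    orthologs = []
    for x_protein, (y_protein, e_value) in x_to_y.items():
        if e_value > e_threshold:
            continue  # forward hit is too weak to count
        back = y_to_x.get(y_protein)
        if back is None:
            continue  # no hit at all in the reverse direction
        back_protein, back_e = back
        # Keep the pair only if the reverse search points back to the start.
        if back_protein == x_protein and back_e <= e_threshold:
            orthologs.append((x_protein, y_protein))
    return orthologs


# Invented best-hit tables for two organisms.
x_to_y = {"xA": ("yA", 1e-50), "xB": ("yA", 1e-20), "xC": ("yC", 0.5)}
y_to_x = {"yA": ("xA", 1e-48), "yC": ("xC", 0.4)}
pairs = bidirectional_best_hits(x_to_y, y_to_x)
print(pairs)
```

In this toy data xB also hits yA, but yA's best hit is xA, so xB is treated as a similar protein (possibly a paralog) rather than as the ortholog.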
How to find corresponding protein?
PROTEINS-SIMILAR-TO
ORTHOLOG-OF
COMMON-ORTHOLOGS-OF
Nature of Pathogenicity Islands
Nucleotide frequency comparisons
Nucleotide Count
Base     Sequence1   Sequence2   Total
A        1,000       600         1,600
C        1,000       800         1,800
G        1,000       700         1,700
T        1,000       900         1,900
Total    4,000       3,000       7,000
Nucleotide frequencies to detect foreign genes
1. Find nucleotide frequencies of native genes
2. Find nucleotide frequencies of test gene
3. Compare frequencies
4. How likely differences arose by chance?
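The four steps above can be sketched in Python using the counts from the nucleotide table. This is one possible reading, not necessarily the procedure used in the course: it assumes that Sequence1 stands for the native genes and Sequence2 for the test gene, and it compares them with a chi-squared statistic in which each expected count is (row total × column total) / grand total. The function names are hypothetical.

```python
from collections import Counter


def nucleotide_counts(sequence):
    """Count A, C, G, T in a sequence (steps 1 and 2)."""
    counts = Counter(sequence.upper())
    return [counts[base] for base in "ACGT"]


def chi_squared_two_samples(counts1, counts2):
    """Chi-squared statistic comparing two sets of counts (step 3).

    Assumes every base occurs at least once overall, so that no
    expected count is zero.
    """
    total1, total2 = sum(counts1), sum(counts2)
    grand = total1 + total2
    chi2 = 0.0
    for c1, c2 in zip(counts1, counts2):
        row = c1 + c2
        for observed, column_total in ((c1, total1), (c2, total2)):
            expected = row * column_total / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Counts for A, C, G, T taken from the table.
native = [1000, 1000, 1000, 1000]   # assumed: Sequence1 = native genes
test_gene = [600, 800, 700, 900]    # assumed: Sequence2 = test gene
chi2 = chi_squared_two_samples(native, test_gene)
degrees_of_freedom = len(native) - 1
print(round(chi2, 2), "with", degrees_of_freedom, "degrees of freedom")
```

Step 4 is then a matter of looking up the probability of a value this large for the given degrees of freedom.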
Chi-squared analysis
Result: 705 purple + 224 white = 929 plants
Result: 698 purple + 231 white = 929 plants
Result: 688 purple + 241 white = 929 plants
Result: 710 purple + 219 white = 929 plants
Result: 695 purple + 234 white = 929 plants
Result: 702 purple + 227 white = 929 plants
Where does χ² come from?
A million repetitions of Mendel’s experiment
Create a million universes -- purple:white on average = 3:1
[Figure: distribution of results after 200,000, 500,000, and 1,000,000 repetitions]
Why is it that the two dotted lines are on opposite sides of the mean?
What’s the most likely result? How often does it occur?
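The repeated experiment can be imitated with a small simulation. This is a sketch under stated assumptions: each of the 929 plants is taken to be purple independently with probability 0.75, only 2,000 repetitions are run (rather than a million) to keep it quick, and the function name `simulate_purple_counts` and the fixed seed are choices made for the example.

```python
import random


def simulate_purple_counts(repetitions, plants=929, p_purple=0.75, seed=0):
    """Repeat the flower count and return the purple total of each run."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p_purple for _ in range(plants))
            for _ in range(repetitions)]


counts = simulate_purple_counts(2000)
expected = 929 * 0.75                      # 696.75 purple plants
observed_deviation = abs(705 - expected)   # deviation of the 705:224 result

# How often is a simulated result at least as far from expectation,
# on either side of the mean?
as_extreme = sum(abs(c - expected) >= observed_deviation for c in counts)
fraction = as_extreme / len(counts)
mean = sum(counts) / len(counts)

print(f"mean purple count: {mean:.1f}")
print(f"fraction at least as far from {expected} as 705: {fraction:.2f}")
```

Counting deviations on both sides of the mean is what corresponds to shading both tails of the curve.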
Deviation from Expectation
Two example experiments
Why is there shading on both sides of the curve?
The farther away O from E, the smaller/larger the shaded area?
Steps in Performing a Chi² Test
Determine the expected values for the experiment
Model: 3 purple : 1 white flower
Total counted: 929
Purple = 75% of 929 = 696.75
White = 25% of 929 = 232.25
Calculate the squares of the deviations
Chi² = Sum of (O - E)² / E
Chi² = (705 - 696.75)² / 696.75 + (224 - 232.25)² / 232.25
     ≈ 8² / 700 + 8² / 230
     ≈ 0.09 + 0.3
Chi² ≈ 0.39 (exact value: 0.391)
Steps in Performing a Chi² Test
Determine the degrees of freedom
What was the experiment? Count 929 flowers a million times
Ask: purple? (if not, then white)
Therefore ONE degree of freedom
Look up probability for χ² value
χ² = 0.30
80% > P > 50%. Call it ~60%
Steps in Performing a Chi2 Test
P ~60%
Draw a conclusion
The result has a 50% chance of being correct
The hypothesis has a 50% chance of being correct
60% of the time, Mendel’s result or worse would have arisen by chance if purple:white truly occurs in a 3:1 ratio.
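The steps of the test can be collected into a short Python sketch for the 705 purple : 224 white result. It is an illustration, not the course's own code: the function names are hypothetical, and instead of a printed table it uses the fact that, for exactly one degree of freedom, the probability of a chi-squared value at least this large equals erfc(sqrt(Chi² / 2)). That shortcut does not apply to other degrees of freedom.

```python
import math


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2):
    """P(chi-squared >= chi2) for ONE degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [705, 224]                    # purple, white
total = sum(observed)                    # 929 plants
expected = [0.75 * total, 0.25 * total]  # 3:1 model
chi2 = chi_squared(observed, expected)
p = p_value_one_df(chi2)
print(f"Chi2 = {chi2:.2f}, P = {p:.2f}")
```

The computed probability should fall inside the 80% > P > 50% bracket read from the table.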
Deviation from Expectation
Two example experiments
Study Question 20: What if Mendel had counted not 929 but 929,000 plants -- what do the curve and shading look like then? (d still = 29)
P = .50 P = ???
Interpretation of Chi-Square
Does a high P value indicate the hypothesis is correct?
Does a low P value indicate the hypothesis is incorrect?
Bag of Marbles
1000’s of marbles!
50% red, 50% blue
Guaranteed!
Test Claim of 50%:50%
41 marbles
59 marbles
100 marbles TOTAL
Is their claim correct?
How to tell how close is close enough?
χ² Test of Claim
Chi² = Sum of (O - E)² / E
Chi² = (53 - 50)² / 50 + (47 - 50)² / 50
     = 9 / 50 + 9 / 50
     = 18 / 50
     = 0.36
P = ?
P = ~60%