Robots and Automatic Genome Annotation
Ross D. King Department of Computer Science University of Wales, Aberystwyth
Talk Plan
Data Mining based gene function prediction
The Robot Scientist
Automating annotation and experimentation
Data Mining Prediction
We have developed a method for predicting the functional class of gene products based on data mining.
The idea is to learn a reliable predictive function from examples of genes whose products have a known function.
This function is then applied to genes whose functional class is unknown.
Applied to: E. coli, M. tuberculosis, S. cerevisiae, A. thaliana.
We call this approach: Data Mining Prediction (DMP).
Classification schemes (MIPS/GO)
1,0,0,0 "METABOLISM"
1,1,0,0 "amino acid metabolism"
1,1,1,0 "amino acid biosynthesis"
1,1,4,0 "regulation of amino acid metabolism"
1,1,7,0 "amino acid transport"
1,1,10,0 "amino acid degradation (catabolism)"
1,1,99,0 "other amino acid metabolism activities"
1,2,0,0 "nitrogen and sulfur metabolism"
1,3,0,0 "nucleotide metabolism"
1,4,0,0 "phosphate metabolism"
1,5,0,0 "C-compound and carbohydrate metabolism"
1,6,0,0 "lipid, fatty-acid and isoprenoid metabolism"
1,7,0,0 "metabolism of vitamins, cofactors, and prosthetic groups"
1,20,0,0 "secondary metabolism"
... and ORFs may have multiple functions too!
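Because each MIPS code is positional (a zero field means "no deeper level"), a class at one level implies all of its ancestors, and an ORF may carry several codes at once. A minimal sketch of expanding codes into the full set of hierarchical labels, assuming the four-field comma-separated format shown above; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

def ancestors(code: str) -> Set[Tuple[int, ...]]:
    """Expand a MIPS-style code such as "1,1,4,0" into all of its class prefixes.

    Zero fields are treated as "no deeper level", so "1,1,4,0" yields
    (1,), (1, 1) and (1, 1, 4).
    """
    prefixes: Set[Tuple[int, ...]] = set()
    path: List[int] = []
    for field in (int(f) for f in code.split(",")):
        if field == 0:
            break
        path.append(field)
        prefixes.add(tuple(path))
    return prefixes

def expand_labels(codes: Iterable[str]) -> Set[Tuple[int, ...]]:
    """Union of the ancestor sets for an ORF annotated with several classes."""
    labels: Set[Tuple[int, ...]] = set()
    for code in codes:
        labels |= ancestors(code)
    return labels
```

Representing labels as prefix tuples keeps the multi-label, multi-level structure explicit, so a learner can be trained and scored separately at each level.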
Hierarchy of classes
Sequence Data
478 attributes in total
field | description | type
aa_rat_X | % of amino acid X in the protein | real
seq_len | length of the protein sequence | int
aa_rat_pair_X_Y | % of the amino acids X and Y consecutively | real
mol_wt | molecular weight of the protein | int
theo_pI | theoretical pI (isoelectric point) | real
atomic_comp_X | atomic composition of X (C, H, N, O, S) | real
aliphatic_index | aliphatic index | real
hydro | grand average of hydropathy | real
strand | the DNA strand | 'w' or 'c'
position | the number of exons (no. of start positions) | int
cai | codon adaptation index | real
motifs | number of PROSITE motifs | int
tmSpans | number of transmembrane spans | int
chromosome | chromosome number | 1..16, mit
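The composition attributes (seq_len, aa_rat_X, aa_rat_pair_X_Y) are simple counts over the sequence. A sketch of how such fixed-width columns could be computed, assuming the 20 standard one-letter amino-acid codes and assuming that pair percentages are taken over the len - 1 adjacent positions; the table does not specify either detail.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes (assumption)

def composition_features(seq: str) -> Dict[str, float]:
    """Return seq_len plus % composition of single residues and adjacent pairs.

    Every amino acid and every ordered pair gets a column, even when its value
    is zero, so each protein maps onto the same fixed set of attributes.
    """
    seq = seq.upper()
    n = len(seq)
    singles = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    features: Dict[str, float] = {"seq_len": n}
    for aa in AMINO_ACIDS:
        features[f"aa_rat_{aa}"] = 100.0 * singles[aa] / n if n else 0.0
    for x, y in product(AMINO_ACIDS, repeat=2):
        # Percentage of the n - 1 adjacent positions occupied by the pair x, y.
        features[f"aa_rat_pair_{x}_{y}"] = 100.0 * pairs[x + y] / (n - 1) if n > 1 else 0.0
    return features
```

For example, composition_features("mvlt") gives aa_rat_M = 25.0 and one third of the adjacent pairs as M followed by V.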
Homology Data
YAL001C: mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdk....
PSI-BLAST against the sequence database NRDB
sfc3: keyword(membrane) length(358) dbref(prosite) dbref(embl)
gene | organism | score
tfc | baker's yeast | 0.0
sfc3 | fission yeast | 1.0e-18
wsv442 | white spot virus | 2.1
cg9463 | fruit fly | 2.9
f1l3 | Arabidopsis | 3.0
We look up the associated information from SwissProt
Predicted Secondary Structure Data
mvltiypdelvqivsdkiasnkgkitlnqlwdisgkyfdlsdkkvk...
cbbbbccaaaaaaaaaaaacccccbbbbaaaaaacccbbccccccb...
We record the length and relative positions of the secondary structure elements.
This is relational data.
Expression Data
Spellman et al. (1998), Roth et al. (1998), DeRisi et al. (1997), Eisen et al. (1998), Gasch et al. (2000, 2001), Chu et al. (1998)
• Microarray experiments measuring expression changes in yeast under a variety of conditions, including the cell cycle, heat shock, and the diauxic shift.
• Short time series data, numerical-valued
ORF | 0 | 7 | 14 | 21
YBR166C | 0.33 | -0.17 | 0.04 | -0.07
YOR357C | -0.64 | -0.38 | -0.32 | -0.29
YLR292C | -0.23 | 0.19 | -0.36 | 0.14
YGL112C | -0.69 | -0.89 | -0.74 | -0.56
...
Phenotype Data
• Data from knockout gene growth experiments
• Much missing data
• Data taken from 3 sources (TRIPLES, MIPS, EUROFAN)
s = sensitive (less growth), w = wild-type (no observable effect), r = resistant (more growth), n = no data

growth medium / deleted ORF | YAL001C | YAL019W | YAL021C | YAL029C
calcofluor white | w | n | n | n
sorbitol | n | s | n | w
benomyl | n | w | n | w
...
H2O2 | w | w | n | r
What are the Machine Learning Issues?
• Large volume of data
• Missing data
• Accurate results required
• Intelligible results required
• Class hierarchy
• Multiple labels
• Relational data
Data Mining Prediction (DMP)
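[Figure: the DMP pipeline. The entire database is split into data for rule creation (2/3) and test data (1/3). The rule-creation data is split again into training data (2/3) and validation data (1/3). PolyFARM and C4.5Rule generate all rules from the training data; rule accuracy is measured on the validation data to select the best rules, which are then evaluated on the test data to give the results.]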
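The data flow in the DMP pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: rules are modelled as hypothetical (condition, predicted class) pairs, learn_rules is a caller-supplied stand-in for PolyFARM or C4.5Rule, and the min_accuracy selection threshold is an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Tuple[Dict[str, float], str]                 # (attributes, known class)
Rule = Tuple[Callable[[Dict[str, float]], bool], str]  # (condition, predicted class)

def split(data: List[Example], frac: float, seed: int) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the data and cut it into (frac, 1 - frac) parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

def rule_accuracy(rule: Rule, data: List[Example]) -> Tuple[int, int]:
    """Return (correct, covered) over the examples on which the rule fires."""
    condition, predicted = rule
    covered = [cls for attrs, cls in data if condition(attrs)]
    return sum(cls == predicted for cls in covered), len(covered)

def dmp(data: List[Example],
        learn_rules: Callable[[List[Example]], List[Rule]],
        min_accuracy: float = 0.5,
        seed: int = 0) -> Tuple[List[Rule], List[Tuple[int, int]]]:
    """Learn rules, keep those that validate well, then score them on held-out test data."""
    rule_data, test = split(data, 2 / 3, seed)                # 2/3 rule creation, 1/3 test
    training, validation = split(rule_data, 2 / 3, seed + 1)  # 2/3 training, 1/3 validation
    best = []
    for rule in learn_rules(training):                        # stand-in for the rule learners
        correct, covered = rule_accuracy(rule, validation)
        if covered and correct / covered >= min_accuracy:
            best.append(rule)
    return best, [rule_accuracy(rule, test) for rule in best]
```

Keeping the test third untouched until the very end is what lets the reported accuracies serve as estimates for predictions on ORFs of unknown function.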
Results
Application to Bacterial Genomes
Successful for both M. tuberculosis and E. coli.
Of the ORFs with no assigned function, >40% were predicted to have a function at one or more levels of the class hierarchy.
It was found that many of the predictive rules were more general than is possible using sequence homology.
References
King et al. (2000) KDD 2000
King et al. (2000) Yeast (Comparative and Functional Genomics)
King et al. (2001) Bioinformatics
Summary Results (Bacteria)
Using voting (2 or more rules agree on a prediction):
– Level 2: 128 ORFs predicted, 87.5% accuracy
– Level 3: 23 ORFs predicted, 91.3% accuracy
All predictions:
– Level 2: 335 ORFs predicted, 64.5% accuracy
– Level 3: 204 ORFs predicted, 44.6% accuracy
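Voting restricts predictions to those on which at least two rules agree. A minimal sketch, assuming that for each ORF we already have the list of classes predicted by the rules that fire on it; returning a set of classes (rather than one winner) is a design choice that allows for ORFs with multiple functions, and the ORF and class names in any usage are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

def voted_classes(rule_predictions: List[str], min_votes: int = 2) -> Set[str]:
    """Classes predicted by at least `min_votes` of the rules that fire on one ORF."""
    counts = Counter(rule_predictions)
    return {cls for cls, n in counts.items() if n >= min_votes}

def predict_by_voting(fired: Dict[str, List[str]], min_votes: int = 2) -> Dict[str, Set[str]]:
    """Keep only ORFs for which at least one class reaches the vote threshold."""
    predictions: Dict[str, Set[str]] = {}
    for orf, rule_predictions in fired.items():
        classes = voted_classes(rule_predictions, min_votes)
        if classes:
            predictions[orf] = classes
    return predictions
```

Raising min_votes trades coverage for accuracy, which is the pattern in the figures above: fewer ORFs are predicted with voting, but those predictions are more accurate.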
Example Rule (level 2, E. coli)
If the ORF is not predicted to have a β-strand of length 3, and a homologous protein from class Chytridiomycetes was found,
then its functional class is “Cell processes, Transport/binding proteins”.
12/13 (86%) correct on the test set; the probability of this result occurring by chance is estimated at 4 × 10^-7. 24 ORFs of unknown function are predicted by the rule.
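The talk does not say how the chance probability was estimated. One common approach is a one-sided binomial tail: if a rule guessing at random would be right with probability p0 per ORF, the chance of k or more correct out of n test predictions is P(X ≥ k). The sketch below assumes that model; p0 is a hypothetical input (for example, the prior frequency of the predicted class), so it is not expected to reproduce the 4 × 10^-7 figure.

```python
from math import comb

def binomial_tail(n: int, k: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0): chance of k or more correct out of n."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```

For instance, binomial_tail(13, 12, p0) gives the chance of at least 12 of 13 correct under a chosen p0; the smaller the class's base rate, the smaller this probability becomes.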
16 ORFs now have a putative or confirmed function: 93.8% accurate predictions.
Experimental Confirmation
The original bacterial ORF predictions were made over three years ago.
In the intervening time many more ORFs have been sequenced, making traditional homologous prediction methods more accurate and sensitive, and the function of some ORFs has been determined by wet biology.
The E. coli genome has recently been re-annotated by Monica Riley’s group.
“Wet” Biology Confirmation
A number of predictions have been confirmed or falsified by new “wet” experimental data.
This new data is biased towards hard classes. Despite this, the results are still good:
– Level 2: 23 predictions, 47.8% accuracy
– Level 3: 23 predictions, 43.4% accuracy
This is much better than random, as there are many classes.
Confirmation of “Wet” Predictions
ORF | Rule | Predicted Class | Confirmed Function | Result
b0805 | 8 | Cell envelope | Outer membrane protein | C
b1519 | 15 | Degradation of small molecules | Trans-aconitate methyltransferase | C
b1533 | 43 | Transport/binding proteins | Cysteine pathway metabolite transport | C
b1981 | 42 | Transport/binding proteins | Shikimate and dehydroshikimate transport protein | C
b1981 | 56 | Transport/binding proteins | Shikimate and dehydroshikimate transport protein | C
b2210 | 15 | Degradation of small molecules | Malate:quinone oxidoreductase | C
b2392 | 43a | Transport/binding proteins | High-affinity manganese transporter | C
b2392 | 43b | Transport/binding proteins | High-affinity manganese transporter | C
b2392 | 54 | Transport/binding proteins | High-affinity manganese transporter | C
b2924 | 45 | Transport/binding proteins | Component of the MscS mechanosensitive channel, “new gene family” | C
b3839 | 43 | Transport/binding proteins | Essential component of translocase | C
b0103 | 42 | Transport/binding proteins | dephospho-CoA kinase | W
b0103 | 41 | Transport/binding proteins | dephospho-CoA kinase | W
b0103 | 43 | Transport/binding proteins | dephospho-CoA kinase | W
b1822 | 15 | Degradation of small molecules | 23S rRNA m1G745 methyltransferase | W
b2530 | 35 | Global regulatory functions | cysteine desulfurase | W
b2392 | 14 | Degradation of small molecules | High-affinity manganese transporter | W
b2889 | 50 | Energy metabolism, carbon | Isopentenyl diphosphate isomerase | W
b3222 | 54 | Transport/binding proteins | ManNAc kinase | W
b3223 | 39 | Ribosome constituents | ManNAc epimerase | W
b3337 | 28 | Laterally acquired elements | regulatory or redox component | W
b3338 | 39 | Ribosome constituents | Periplasmic endochitinase | W
b3569 | 32 | Laterally acquired elements | transcriptional regulator of xylose utilization | W
b3955 | 8 | Cell envelope | Required for invasion of brain microvascular endothelial cells | EF
b3955 | 18 | Energy metabolism, carbon | Required for invasion of brain microvascular endothelial cells | EA
b3955 | 20 | Energy metabolism, carbon | Required for invasion of brain microvascular endothelial cells | EA
Results (Yeast)
Many rules from each data type
Rules at each level of hierarchy
Some classes are much easier to predict than others (for example "protein synthesis" at 71-93%, "energy" at 20-47%)
Good levels of accuracy on held-out test data
Many predictions for ORFs of unknown function (some function at some level is predicted for 96% of the ORFs of unknown function)
Some rules explainable by biology -> scientific knowledge discovery
Clare & King (2003) Bioinformatics suppl. 2., 42-49
Accuracy Table
Datatype | Level 1 | Level 2 | Level 3 | Level 4 | All
Seq | 55 | 55 | 33 | 0 | 71
Struc | 49 | 43 | 0 | 0 | 58
Hom | 65 | 38 | 69 | 20 | 55
Expr | 42 | 37 | 35 | 0 | 75
Phen | 75 | 40 | 7 | 0 | 68
Extension to Arabidopsis Genome
Collaborative project with the Institute of Grassland and Environmental Research and the University of Nottingham.
Large increase in data: 6,000 -> 25,000 ORFs. Large amount of microarray data from the Nottingham Arabidopsis Stock Centre.
250 million Prolog facts, 200,000 attributes, file sizes of almost 2 GB.
7,964 gene function predictions with an expected accuracy >70%, and 2,974 with an expected accuracy >90%.
We are currently growing 14 knockout varieties of Arabidopsis to test a sample of these predictions.
Availability
All rules and data available at http://www.aber.ac.uk/compsci/Research/bio/dss/
All predictions available at http://www.genepredictions.org
The Robot Scientist
The Robot Scientist Concept
[Figure: the Robot Scientist cycle, linking Background Knowledge, Machine Learning, Consistent Hypotheses, Experiment(s) Selection, the Robot, Experiment(s), Results, Analysis, and the Final Theory.]
The robot scientist project aims to develop a computer system that is capable of originating its own experiments, physically doing them, interpreting the results, and then repeating the cycle.
Motivation: Technological
In many areas of science our ability to generate data is outstripping our ability to analyse the data.
One scientific area where this is true is functional genomics, where data is now being generated on an industrial scale.
The analysis of scientific data needs to become as industrialised as its generation.
The Application Domain
Functional genomics
In yeast (S. cerevisiae), ~30% of the 6,000 genes still have no known function.
EUROFAN 2 has knocked out each of the 6,000 genes in mutant strains.
The task is to determine the “function” of each gene by auxotrophic growth experiments comparing mutants and the wild type.
Logical Cell Model
We have built a logical model of the known metabolic pathways (coded in Prolog), taken from KEGG and other bioinformatic sources. This is essentially a directed graph, with metabolites as nodes and enzymes as arcs.
If a path can be found from cell inputs (metabolites in the growth medium) to all the cell outputs (essential compounds), then the cell can grow.
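The growth criterion above is a reachability question on a directed graph. A minimal sketch in Python rather than the Prolog the model is written in, assuming each arc is a single substrate-to-product step labelled with the gene encoding its enzyme (multi-substrate reactions are not modelled). The metabolite names come from the aromatic amino-acid pathway, but the gene labels are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Arc = Tuple[str, str, str]  # (substrate, product, gene encoding the enzyme)

# Hypothetical toy fragment: real metabolite names, made-up gene labels.
ARCS: List[Arc] = [
    ("chorismate", "prephenate", "gene_a"),
    ("prephenate", "phenylpyruvate", "gene_b"),
    ("phenylpyruvate", "phenylalanine", "gene_c"),
]

def reachable(arcs: Iterable[Arc], medium: Iterable[str],
              knocked_out: FrozenSet[str] = frozenset()) -> Set[str]:
    """Metabolites reachable from the medium by breadth-first search,
    ignoring arcs whose gene has been knocked out."""
    graph: Dict[str, List[str]] = {}
    for substrate, product, gene in arcs:
        if gene not in knocked_out:
            graph.setdefault(substrate, []).append(product)
    seen = set(medium)
    queue = deque(seen)
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def can_grow(arcs: Iterable[Arc], medium: Iterable[str], essentials: Iterable[str],
             knocked_out: FrozenSet[str] = frozenset()) -> bool:
    """Growth iff every essential compound is reachable from the growth medium."""
    return set(essentials) <= reachable(arcs, medium, knocked_out)
```

With these toy arcs, knocking out gene_b blocks growth on chorismate alone, while adding phenylpyruvate to the medium restores it; this rescue pattern is what the auxotrophic experiments exploit.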
AAA Model System
We started using the aromatic amino-acid (AAA) pathway in yeast as a model system to prove the principle of the Robot Scientist.
9 metabolites can be used off the shelf; 15 knockout mutants from EUROFAN.
The mutant can grow iff all three aromatic amino acids (tyrosine, phenylalanine, tryptophan) can be synthesised, based on a pathway from glycerate-2-phosphate.
[Figure: Phenylalanine, Tyrosine, and Tryptophan Pathways for S. cerevisiae. The pathway runs from Glycerate-2-Phosphate via Phosphoenolpyruvate, which together with D-Erythrose-4-Phosphate forms 3-deoxy-D-arabino-heptulosonate-7-phosphate, then 3-Dehydroquinate, 3-Dehydroshikimate, Shikimate, Shikimate-3-phosphate and 5-O-(1-carboxyvinyl)-3-phosphoshikimate to Chorismate. From Chorismate, Prephenate leads via p-Hydroxyphenylpyruvate to TYROSINE and via Phenylpyruvate to PHENYLALANINE; Anthranilate leads via N-(5'-Phospho-D-ribosyl)anthranilate, 1-(2-Carboxyphenylamino)-1'-deoxy-D-ribulose-5'-phosphate, (3-Indolyl)-glycerol phosphate and Indole to TRYPTOPHAN. Arcs are labelled with the yeast ORFs encoding each enzyme and nodes with KEGG compound identifiers; the diagram also marks the Growth Medium and Metabolite import.]
Experimental Methodology
Experiments consist of making particular growth media and testing if the mutants can grow (add metabolites to a basic defined medium).
A mutant is auxotrophic if it cannot grow on a defined medium that the wild type can grow on.
By observing the pattern of chemicals that recover growth, the function of the knocked-out gene can be inferred.
Inferring Hypotheses
In the philosophy of science, it has often been argued that only humans can make the “leaps of imagination” necessary to form hypotheses.
We use Abductive Logic Programming to infer missing arcs/labels in our metabolic graph. With these inferred arcs we can explain (deductively) all the experimental results.
Reiser et al. (2001) ETAI 5, 233-244.
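Abductive Logic Programming itself is not reproduced here. As a much simpler stand-in, the sketch below enumerates candidate gene-to-reaction bindings and keeps those under which a path-based growth model explains every observed knockout result. It assumes the deleted gene is the only catalyst of its reaction; the toy pathway (with the tryptophan branch collapsed to one step), the media and the observations are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, Set, Tuple

Reaction = Tuple[str, str]  # (substrate, product)

# Toy fragment of the aromatic amino-acid pathway; real routes have more steps.
REACTIONS: List[Reaction] = [
    ("chorismate", "prephenate"),
    ("chorismate", "anthranilate"),
    ("prephenate", "phenylpyruvate"),
    ("phenylpyruvate", "phenylalanine"),
    ("prephenate", "p-hydroxyphenylpyruvate"),
    ("p-hydroxyphenylpyruvate", "tyrosine"),
    ("anthranilate", "tryptophan"),  # several steps collapsed into one
]
ESSENTIALS = {"phenylalanine", "tyrosine", "tryptophan"}

def grows(reactions: List[Reaction], medium: Set[str]) -> bool:
    """True iff all essential compounds are reachable from the medium."""
    seen, queue = set(medium), deque(medium)
    while queue:
        node = queue.popleft()
        for substrate, product in reactions:
            if substrate == node and product not in seen:
                seen.add(product)
                queue.append(product)
    return ESSENTIALS <= seen

def abduce(observations: List[Tuple[FrozenSet[str], bool]]) -> List[Reaction]:
    """Reactions which, if catalysed only by the deleted gene, explain every
    (medium, grew) observation made on the knockout."""
    consistent = []
    for hypothesis in REACTIONS:
        knockout_model = [r for r in REACTIONS if r != hypothesis]
        if all(grows(knockout_model, set(medium)) == grew for medium, grew in observations):
            consistent.append(hypothesis)
    return consistent
```

With only "no growth on the base medium" observed, all seven bindings remain consistent; adding "grows with prephenate" and "no growth with anthranilate" leaves only chorismate → prephenate. Real abduction searches a far larger, logically specified space, but the consistency test is the same idea.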
The Form of the Hypotheses
The form of the hypotheses we can infer is currently quite simple. Each hypothesis binds a particular gene to an enzyme that catalyses a reaction.
– A correct hypothesis would be that YDR060C codes for the enzyme for the reaction chorismate → prephenate.
– An incorrect hypothesis would be that it codes for the reaction chorismate → anthranilate.
We have also demonstrated how more complex abductive hypotheses could be formed.
A Discriminating Experiment
Hypothesis 1: YDR060C codes for the enzyme catalysing the reaction chorismate → prephenate.
Hypothesis 2: YDR060C codes for the enzyme catalysing the reaction chorismate → anthranilate.
These can be distinguished by growing the YDR060C knockout on prephenate or on anthranilate.
Note that these two experiments will have differing monetary cost.
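Choosing between the two hypotheses can be sketched as: each hypothesis predicts growth or no growth for each candidate medium, an experiment is useful only if the hypotheses disagree on it, and among the useful experiments the cheapest is run. The predictions and costs below are illustrative assumptions, not measured values, and every hypothesis is assumed to give a prediction for the same set of experiments.

```python
from typing import Dict, List, Optional

def discriminating(predictions: Dict[str, Dict[str, bool]]) -> List[str]:
    """Experiments on which not all hypotheses predict the same outcome."""
    experiments = next(iter(predictions.values())).keys()
    return [e for e in experiments
            if len({outcomes[e] for outcomes in predictions.values()}) > 1]

def cheapest_discriminating(predictions: Dict[str, Dict[str, bool]],
                            cost: Dict[str, float]) -> Optional[str]:
    """Cheapest experiment that separates at least two hypotheses, or None."""
    candidates = discriminating(predictions)
    return min(candidates, key=cost.__getitem__) if candidates else None

# Predicted growth of the YDR060C knockout under each hypothesis (illustrative).
PREDICTIONS = {
    "chorismate -> prephenate":   {"+prephenate": True,  "+anthranilate": False, "base": False},
    "chorismate -> anthranilate": {"+prephenate": False, "+anthranilate": True,  "base": False},
}
COST = {"+prephenate": 20.0, "+anthranilate": 5.0, "base": 1.0}  # hypothetical costs
```

The base medium is cheapest but tells the hypotheses apart not at all, so it is skipped; with these made-up costs the anthranilate experiment is chosen.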
Inferring Experiments
Given a set of hypotheses, we wish to infer an experiment that will efficiently discriminate between them.
Assume: every experiment has an associated cost, and each hypothesis has a probability of being correct.
The task: to choose a series of experiments which minimises the expected cost of eliminating all but one hypothesis.
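The objective above can be written as a recursion: the expected cost of a set of surviving hypotheses is the cost of the next experiment plus, for each possible outcome, the probability of that outcome times the expected cost of the hypotheses that survive it. The sketch below evaluates this exactly by exhaustive search with memoisation. It is not the ASE implementation from the paper; it assumes noise-free, deterministic outcomes and is only practical for a handful of hypotheses. The hypotheses, priors and costs in any usage are hypothetical.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Hashable, Optional, Set, Tuple

def plan(hypotheses: Dict[str, float],
         predictions: Dict[str, Dict[str, Hashable]],
         cost: Dict[str, float]) -> Tuple[float, Optional[str]]:
    """Exact minimum expected cost of eliminating all but one hypothesis.

    hypotheses:  prior probability that each hypothesis is correct
    predictions: predictions[h][e] is the outcome hypothesis h predicts for experiment e
    cost:        cost of running each experiment
    Returns (expected cost, best first experiment).
    """
    experiments = list(cost)

    @lru_cache(maxsize=None)
    def solve(remaining: FrozenSet[str]) -> Tuple[float, Optional[str]]:
        if len(remaining) <= 1:
            return 0.0, None
        total = sum(hypotheses[h] for h in remaining)
        best: Tuple[float, Optional[str]] = (float("inf"), None)
        for e in experiments:
            # Group the surviving hypotheses by the outcome they predict for e.
            groups: Dict[Hashable, Set[str]] = {}
            for h in remaining:
                groups.setdefault(predictions[h][e], set()).add(h)
            if len(groups) < 2:
                continue  # e cannot eliminate anything at this point
            expected = cost[e] + sum(
                sum(hypotheses[h] for h in g) / total * solve(frozenset(g))[0]
                for g in groups.values())
            if expected < best[0]:
                best = (expected, e)
        return best

    return solve(frozenset(hypotheses))
```

Unlike a "cheapest experiment first" strategy, this weighs how likely each outcome is, so an expensive experiment can win when it usually settles the question in one step.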
Comparison of different experimental strategies
ASE - Expected cost minimization.
Naïve - Choose cheapest experiment.
Random - Randomly choose experiments.
The cost of a series of experiments is a function of the time taken and money spent. “Time is Money”.
The Robot
Biomek 200
Closing the Loop
We have physically implemented all aspects of the Robot Scientist system.
To the best of our knowledge this is the first active learning system that both explicitly forms hypotheses and experiments, and physically performs real experiments.
Accuracy v Time
[Figure: accuracy (%, 50 to 100) against the number of iterations (0 to 5) for the ase, random and naive strategies.]
At the end of the 5th iteration: ASE 80.1%, Naïve 74.0%, Random 72.2%. ASE was significantly more accurate than either Naïve (p < 0.05) or Random (p < 0.07) using a paired t-test.
Accuracy v Money
[Figure: Classification Accuracy (%) against Log 10 Cost (£), from 0 to 4.5, for the ase, random and naive strategies.]
Given a spend of ≤ £10^2.26, ASE 79.5%, Naïve 73.9%, Random 57.4%. ASE was significantly more accurate than either Naïve (p < 0.05) or Random (p < 0.001).
Time and Money
“Cost” is a positive function of time & money.
ASE dominates for both, therefore ASE dominates for any reasonable cost function.
For example: to achieve an accuracy of ~70%, ASE requires fewer trial iterations, and a hundredth of the price, of Random; and almost half the number of iterations, and a third of the price, of Naïve.
King et al. (2004) Nature. 427, 247-252.
Human Comparisons
We were interested to compare the performance of the Robot Scientist with that of humans.
We adapted the simulator to allow humans to choose experiments and interpret the results of cycles of experimentation.
We compared nine graduate computer scientists and biologists.
No significant difference was found between the best humans and the Robot.
Robotic Annotation
New Biological Knowledge
So far with the Robot Scientist we have only shown that we can automatically rediscover known biological knowledge.
We wish to extend this result to the discovery of new biological knowledge.
To do this we need to combine the robot scientist with conventional genome annotation bioinformatics, and DMP.
Robotic Annotation
One way of thinking about genome annotation is as a hypothesis formation process.
Hypothesis formation is perhaps the hardest part of automating science.
Our idea is to integrate bioinformatic annotation methods with the Robot Scientist.
The bioinformatic methods will generate the hypotheses which the robot scientist will experimentally test.
Genome Scale Model of Yeast Metabolism
We have extended our model of aromatic amino acid metabolism to cover most of what is known about yeast metabolism.
Includes 1,166 ORFs (940 known, 226 inferred).
Growth if there is a path from the growth medium to the defined end-points.
83% accuracy (based on 914 strain/medium predictions).
The Model is Incomplete
It is not possible to find a path from the inputs (growth medium) to all the end-point metabolites using only reactions encoded by known genes.
This suggests automated strategies for determining the identity of the missing genes - new biological knowledge.
One strategy: take the EC enzyme class of each missing reaction, identify genes that code for this EC class in other organisms, then find homologous genes in yeast.
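The EC-class strategy is essentially a chain of lookups. A minimal sketch, assuming the EC-to-gene mapping for other organisms and the homology search results (as e-values) have already been computed; every identifier and the max_evalue cut-off are hypothetical placeholders, not data from the talk.

```python
from typing import Dict, List, Set, Tuple

def candidate_genes(missing_ec: List[str],
                    ec_to_foreign_genes: Dict[str, Set[str]],
                    homologs: Dict[str, List[Tuple[str, float]]],
                    max_evalue: float = 1e-10) -> Dict[str, Set[str]]:
    """For each missing EC class, the yeast genes homologous (e-value <= max_evalue)
    to some gene of that EC class in another organism."""
    result: Dict[str, Set[str]] = {}
    for ec in missing_ec:
        hits: Set[str] = set()
        for foreign in ec_to_foreign_genes.get(ec, set()):
            for yeast_gene, evalue in homologs.get(foreign, []):
                if evalue <= max_evalue:
                    hits.add(yeast_gene)
        result[ec] = hits
    return result
```

Each candidate gene for a missing reaction is then a hypothesis of exactly the form the Robot Scientist can test with knockout growth experiments.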
The predictions can be tested automatically by robot.
Confirmation of DMP Yeast Predictions
The yeast gene YBR147W is currently of “unknown” function.
It is predicted to have a function in “metabolism” by 2 DMP rules with expected accuracies of >80%.
It is predicted to have a function in “amino-acid metabolism” by two rules with expected accuracies of 50% and 60% respectively.
Using our robot scientist auxotrophic methodology we have recovered growth of the knockout with: aspartic acid, tyrosine, leucine, valine, phenylalanine, cystine, arginine.
Conclusions
Machine learning can be used to accurately predict gene function.
Simple forms of scientific reasoning and experimentation can be fully automated.
To develop robotic systems capable of generating new biological knowledge will require a synthesis of traditional genome annotation techniques, machine learning, and a Robot Scientist like methodology.
The Three Objects of the Intellect
• The True
• The Beautiful
• The Beneficial
Acknowledgements
DMP: Andreas Karwath, Aberystwyth; Amanda Clare, Aberystwyth; Paul Wise, Aberystwyth; Luc Dehaspe, Leuven
Robot Scientist: Ken Whelan, Aberystwyth; Philip Reiser, Aberystwyth; Ffion Jones, Aberystwyth; Ugis Sarkans, Aberystwyth (EBI); Douglas Kell, Manchester (Aberystwyth); Steve Oliver, Manchester; Stephen Muggleton, Imperial College (York); Chris Bryant, Robert Gordons (York)
David Page, Wisconsin
BBSRC, EPSRC
PharmDM: commercial support
Relational vs Propositional
orf | time0 | time7 | time14
yal001c | 0.34 | 0.52 | 0.48
yal002w | 0.76 | 0.82 | 0.89
yal003w | 0.77 | 0.46 | 0.78
yal004c | 0.38 | 0.50 | 0.49

orf | SwissProtID | e-val
yal001c | p03415 | 2e-4
yal001c | p08640 | 8e-58
yal002w | p32583 | 6e-52
yal002w | p08775 | 3e-42

SwissProtID | keyword
p03415 | apoptosis
p03415 | repeat
p03415 | zinc
p08640 | membrane
Propositional: single table, fixed number of columns/attributes
Relational: multiple tables, multiple values
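To see what is lost when relational data is forced into a single table, here is a sketch of one possible propositionalisation of the homology and keyword tables: one row per ORF and one boolean column per keyword, true if any sufficiently good homolog carries that keyword. This is an illustrative alternative, not what PolyFARM does (it works on the relational data directly); the "any homolog" aggregation and the e-value cut-off are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Rows copied from the example tables above (the keyword table is only a fragment).
HITS: List[Tuple[str, str, float]] = [   # (orf, SwissProtID, e-value)
    ("yal001c", "p03415", 2e-4),
    ("yal001c", "p08640", 8e-58),
    ("yal002w", "p32583", 6e-52),
    ("yal002w", "p08775", 3e-42),
]
KEYWORDS: List[Tuple[str, str]] = [      # (SwissProtID, keyword)
    ("p03415", "apoptosis"),
    ("p03415", "repeat"),
    ("p03415", "zinc"),
    ("p08640", "membrane"),
]

def propositionalise(hits: Iterable[Tuple[str, str, float]],
                     keywords: Iterable[Tuple[str, str]],
                     max_evalue: float) -> Dict[str, Dict[str, bool]]:
    """Flatten the relational tables into one fixed-width boolean row per ORF."""
    hits, keywords = list(hits), list(keywords)
    vocabulary = sorted({kw for _, kw in keywords})
    by_protein: Dict[str, Set[str]] = {}
    for protein, kw in keywords:
        by_protein.setdefault(protein, set()).add(kw)
    table = {orf: dict.fromkeys(vocabulary, False) for orf, _, _ in hits}
    for orf, protein, evalue in hits:
        if evalue <= max_evalue:
            for kw in by_protein.get(protein, ()):
                table[orf][kw] = True
    return table
```

The flattened row no longer records which homolog carried a keyword, how many did, or how strong each match was; keeping that information is the motivation for learning directly from the relational form.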
Expression Data Rule
If in the microarray experiment (sorbitol incubation) the ORF expression is > -0.25
and in the microarray experiment (nitrogen depletion) the ORF expression is <= -1.29
and in the microarray experiment (YPD stationary phase) the ORF expression is > -1.06
then the function of this ORF is "pheromone response, mating type determination, sex-specific proteins"
Accuracy on training data: 11/12 (92%)
Accuracy on test data: 3/4 (75%)
21 predictions made
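Written as code, the expression rule is a conjunction of thresholds, and the mix of strict and non-strict comparisons matters at the boundaries. A minimal sketch; the dictionary keys are hypothetical identifiers for the three microarray experiments.

```python
from typing import Dict

def pheromone_rule(expr: Dict[str, float]) -> bool:
    """The expression-data rule above as a predicate over an ORF's expression values.

    Keys are hypothetical experiment identifiers; thresholds are copied from the rule.
    """
    return (expr["sorbitol_incubation"] > -0.25
            and expr["nitrogen_depletion"] <= -1.29
            and expr["ypd_stationary_phase"] > -1.06)
```

Applying such a predicate to every ORF of unknown function is how a rule like this yields its list of new predictions.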
Structure Rule
• 80% accurate on test data
• Most matching ORFs belong to the Mitochondrial Carrier Family
• These have 6 long transmembrane alpha-helices of about 20-30 amino acids
• Why do we notice alpha-helices of length 10-14?
If true: coil (of length 3) followed by alpha (10 <= length < 14)
and true: coil (of length 1 or 2) followed by alpha (10 <= length < 14)
and true: coil (of length 3) followed by alpha (3 <= length < 6)
and false: coil followed by beta followed by coil (c-b-c)
and false: coil (6 <= length < 10) followed by alpha (of length 1 or 2)
then the function of this ORF is "mitochondrial transport"
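The structure rule operates on runs of predicted secondary structure rather than on single residues. A sketch of evaluating it on a prediction string ('a' alpha, 'b' beta, 'c' coil, as in the predicted-structure data), assuming "followed by" means the immediately adjacent element; the original relational rule may allow a looser reading.

```python
from itertools import groupby
from typing import Container, List, Tuple

Segment = Tuple[str, int]  # (element type: 'a' alpha, 'b' beta, 'c' coil; length)

def segments(ss: str) -> List[Segment]:
    """Run-length encode a predicted secondary-structure string such as 'cccaaaa'."""
    return [(kind, len(list(run))) for kind, run in groupby(ss)]

def has_pair(segs: List[Segment], first: str, first_len: Container[int],
             second: str, second_len: Container[int]) -> bool:
    """Is an element of type `first` (length in first_len) immediately
    followed by an element of type `second` (length in second_len)?"""
    return any(a == first and la in first_len and b == second and lb in second_len
               for (a, la), (b, lb) in zip(segs, segs[1:]))

def mitochondrial_transport_rule(ss: str) -> bool:
    """The structure rule above, under the adjacency assumption."""
    segs = segments(ss)
    c_b_c = any(x[0] == "c" and y[0] == "b" and z[0] == "c"
                for x, y, z in zip(segs, segs[1:], segs[2:]))
    return (has_pair(segs, "c", {3}, "a", range(10, 14))
            and has_pair(segs, "c", {1, 2}, "a", range(10, 14))
            and has_pair(segs, "c", {3}, "a", range(3, 6))
            and not c_b_c
            and not has_pair(segs, "c", range(6, 10), "a", {1, 2}))
```

Using range objects for the length conditions keeps the half-open bounds of the rule (for example 10 <= length < 14) exact.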
Alignment
YJL133W -------NEYNPLIHCLC----GSISGSTCAAITTPLDCIKTVLQIRG------------ 251
YKR052C -------NSYNPLIHCLC----GGISGATCAALTTPLDCIKTVLQVRG------------ 241
YIL006W ----NNTNSINLQRLIMA----SSVSKMIASAVTYPHEILRTRMQLKS------------ 310
YBR104W ----LTRNEIPPWKLCLF----GAFSGTMLWLTVYPLDVVKSIIQNDD------------ 271
YGR096W ----KTTAAHKKWELATLNHSAGTIGGVIAKIITFPLETIRRRMQFMNSKHLEK------ 250
YJR095W -----QMDVLPSWETSCI----GLISGAIGPFSNAPLDTIKTRLQKDK------------ 246
YKL120W -----LMKDGPALHLTAS-----TISGLGVAVVMNPWDVILTRIYNQK------------ 261
YLR348C -----FDASKNYTHLTAS-----LLAGLVATTVCSPADVMKTRIMNGS------------ 239
YMR166C ----DGRDGELSIPNEILT---GACAGGLAGIITTPMDVVKTRVQTQQPPSQSNKSYSVT 300
YDL198C ------DYSQATWSQNFIS---SIVGACSSLIVSAPLDVIKTRIQNRN------------ 242
YGR257C ----RFASKDANWVHFINSFASGCISGMIAAICTHPFDVGKTRWQISMMN---------- 302
YDL119C FIHYNPEGGFTTYTSTTVNTTSAVLSASLATTVTAPFDTIKTRMQLEP------------ 255

YJL133W -SQTVSLEIMRKADTFSKAASAIYQVYGWKGFWRGWKPRIVANMPATAISWTAYECAKHF 310
YKR052C -SETVSIEIMKDANTFGRASRAILEVHGWKGFWRGLKPRIVANIPATAISWTAYECAKHF 300
YIL006W -DIPDSIQRR-----LFPLIKATYAQEGLKGFYSGFTTNLVRTIPASAITLVSFEYFRNR 364
YBR104W -LRKPKYKNS-----ISYVAKTIYAKEGIRAFFKGFGPTMVRSAPVNGATFLTFELVMRF 325
YGR096W FSRHSSVYGSYKGYGFARIGLQILKQEGVSSLYRGILVALSKTIPTTFVSFWGYETAIHY 310
YJR095W ---SISLEKQSGMKKIITIGAQLLKEEGFRALYKGITPRVMRVAPGQAVTFTVYEYVREH 303
YKL120W ----GDLYKG-----PIDCLVKTVRIEGVTALYKGFAAQVFRIAPHTIMCLTFMEQTMKL 312
YLR348C ----GDHQP------ALKILADAVRKEGPSFMFRGWLPSFTRLGPFTMLIFFAIEQLKKH 289
YMR166C HPHVTNGRPAALSNSISLSLRTVYQSEGVLGFFSGVGPRFVWTSVQSSIMLLLYQMTLRG 360
YDL198C ---FDNPESG------LRIVKNTLKNEGVTAFFKGLTPKLLTTGPKLVFSFALAQSLIPR 293
YGR257C ---NSDPKGGNRSRNMFKFLETIWRTEGLAALYTGLAARVIKIRPSCAIMISSYEISKKV 359
YDL119C ----SKFTNS------FNTFTSIVKNENVLKLFSGLSMRLARKAFSAGIAWGIYEELVKR 305

Alignment
YJL133W -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 251
YKR052C -------cccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 241
YIL006W ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 310
YBR104W ----ccccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaacc------------ 271
YGR096W ----cccccccccccccbaaaaaaaaaaaaaaacccaaaaaaaaaacccccccc------ 250
YJR095W -----cccccccaaaaaa----aaaaaaaaaaacccaaaaaaaaaccc------------ 246
YKL120W -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------ 261
YLR348C -----ccccccaaaaaaa-----aaaaaaaaaacccaaaaaaaaaacc------------ 239
YMR166C ----cccccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacccccccccccccc 300
YDL198C ------cccccccaaaaaa---aaaaaaaaaaacccaaaaaaaaaacc------------ 242
YGR257C ----ccccccccccccaaaaaaaaaaaaaaaaacccaaaaaaaaaacccc---------- 302
YDL119C ccccccccccccccaaaaaaaaaaaaaaaaaaacccaaaaaaaaaacc------------ 255

YJL133W -ccccccccccccccaaaaaaaaaaaccccaaaaccaaaaaaacaaaaaaaaaaaaaaaa 310
YKR052C -ccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 300
YIL006W -ccccccccc-----aaaaaaaaaaaccccaaacccaaaaaaaccaaaaaaaaaaaaaaa 364
YBR104W -ccccccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 325
YGR096W cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 310
YJR095W ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 303
YKL120W ----cccccc-----aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 312
YLR348C ----ccccc------aaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 289
YMR166C cccccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 360
YDL198C ---cccccca------aaaaaaaaaacccaaaaacccaaaaaaaaaaaaaaaaaaaaaaa 293
YGR257C ---ccccccccccccaaaaaaaaaaacccaaaaaccaaaaaaaccaaaaaaaaaaaaaaa 359
YDL119C ----ccccca------aaaaaaaaaacccaaaaacccaaaaaaaccaaaaaaaaaaaaaaa 305
Types of Logic
Deduction
Rule: If a cell grows, then it can synthesise tryptophan.
Fact: Cell cannot synthesise tryptophan.
Conclusion: Cell cannot grow.
Given the rule P → Q and the fact ¬Q, infer the fact ¬P (modus tollens).
Abduction
Rule: If a cell grows, then it can synthesise tryptophan.
Fact: Cell cannot grow.
Conclusion: Cell cannot synthesise tryptophan.
Given the rule P → Q and the fact ¬P, infer the fact ¬Q.