Function-process links
The theory
Why bother?
• To improve the ontology
• To fill in annotation gaps
• As an aid to annotation
– Suggest new annotations
– Avoid redundant annotation effort
– Annotation cross-products
• Better integration with pathway databases
• To present annotations to users in more useful ways
– e.g. more informative AmiGO displays
GO in 2008
Filling in annotation gaps
[Venn diagram, July 2008: F = GO:0016301 kinase activity, P = GO:0016310 phosphorylation]
|P| = 3640
|F| = 6053
|F ∩ P| = 2230
|F ∩ not P| = 3823
|P ∩ not F| = 1410
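The gap on this slide is plain set arithmetic: everything annotated to the function but not to the process. The sketch below illustrates it with hypothetical gene product identifiers (they are not data from the slide), then checks that the counts quoted on the slide are mutually consistent.

```python
def annotation_gap(function_annotated, process_annotated):
    """Return gene products annotated to the function but not to the process."""
    return set(function_annotated) - set(process_annotated)


# Hypothetical gene product IDs, for illustration only.
kinase_activity = {"gp1", "gp2", "gp3", "gp4"}  # annotated to GO:0016301
phosphorylation = {"gp3", "gp4", "gp5"}         # annotated to GO:0016310

gap = annotation_gap(kinase_activity, phosphorylation)
print(sorted(gap))  # ['gp1', 'gp2']

# Consistency check on the counts quoted on the slide.
F, P, both = 6053, 3640, 2230
f_not_p = F - both  # function annotation with no process annotation
p_not_f = P - both  # process annotation with no function annotation
```

With the slide's numbers, `f_not_p` is the set that a kinase activity to phosphorylation link could fill in automatically.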
Filling in annotation gaps
[Diagram, Future - 2009: GO:0016301 kinase activity and GO:0016310 phosphorylation]
Improved presentation to users
[Diagram, shown in several builds: terms linked by part_of; annotations propagate over part_of. Example annotations shown: KIC1 (IDA), NDK1 (IDA)]
A quick review of part_of
• Means “always part of some”
– Example:
• nucleus part_of cell
• EVERY nucleus is part_of SOME cell
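The "always part of some" reading can be checked mechanically on instance data. This is a minimal sketch with hypothetical instances (the names and the pair set are made up); it shows that the relation can hold in one direction without its inverse holding.

```python
def always_part_of_some(parts, wholes, part_of_pairs):
    """True if EVERY instance in `parts` is part_of SOME instance in `wholes`."""
    return all(
        any((p, w) in part_of_pairs for w in wholes)
        for p in parts
    )


# Hypothetical instances, for illustration only.
nuclei = {"nucleus_a", "nucleus_b"}
cells = {"cell_1", "cell_2", "cell_3"}  # cell_3 has no nucleus
instance_part_of = {("nucleus_a", "cell_1"), ("nucleus_b", "cell_2")}

nucleus_part_of_cell = always_part_of_some(nuclei, cells, instance_part_of)

# The reverse direction asks whether every cell has some nucleus as a part.
cell_has_part_nucleus = all(
    any((n, c) in instance_part_of for n in nuclei) for c in cells
)
```

Here every nucleus sits in some cell, so part_of holds, while one cell lacks a nucleus, so the inverse has_part statement does not.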
Mining pathway DBs for links
[Diagram: Reactome and GO side by side.
Reactome: glycolysis; "fructose bisphosphatase activity of fructose 1 6 bisphosphatase 2 _cytosol"; "glucose 6 phosphate isomerase activity of glucose 6 phosphate isomerase dimer_cytosol".
GO BP: glycolysis. GO MF: fructose-bisphosphate aldolase activity; glucose-6-phosphate isomerase activity.]
Mining pathway DBs for links
[Same Reactome and GO diagram as the previous slide, with xref links added between the Reactome entries and the GO terms]
Mining pathway DBs for links
[Same Reactome and GO diagram with xref links, now with has_part links added]
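The mining idea on these slides can be sketched as follows: take the pathway-to-step structure from the pathway database, translate both ends through the xrefs, and emit candidate has_part links on the GO side. The function name, the data layout and the pathway-database identifiers below are assumptions made for illustration; only the GO term names come from the slide.

```python
def mine_has_part_links(pathway_steps, xref_to_go):
    """Translate pathway -> step links into candidate GO process -> function links.

    pathway_steps: dict mapping a pathway ID to the IDs of its reactions.
    xref_to_go: dict mapping pathway-database IDs to GO term names.
    Entries without an xref are skipped rather than guessed.
    """
    links = set()
    for pathway, steps in pathway_steps.items():
        process = xref_to_go.get(pathway)
        if process is None:
            continue
        for step in steps:
            function = xref_to_go.get(step)
            if function is not None:
                links.add((process, "has_part", function))
    return links


# Hypothetical pathway-database IDs; the GO term names are those on the slide.
pathway_steps = {"pathway:glycolysis": ["reaction:1", "reaction:2", "reaction:3"]}
xref_to_go = {
    "pathway:glycolysis": "glycolysis",
    "reaction:1": "glucose-6-phosphate isomerase activity",
    "reaction:2": "fructose-bisphosphate aldolase activity",
    # reaction:3 has no xref, so it yields no link
}
links = mine_has_part_links(pathway_steps, xref_to_go)
```

The output is only as good as the xrefs: if an xref does not denote true equivalence, the derived link inherits that looseness.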
xrefs: not necessarily equivalent
[Diagram: Reactome "glycolysis [human]" and its two activities beside GO glycolysis and its two function terms. Labels shown: three new GO terms (GO:new), three "equivalent" links, three is_a links, and two links marked "has_part?"]
xrefs: not necessarily equivalent
[Same diagram with the new GO terms (GO:new), "equivalent" links and is_a links; the links are now labelled some_has_part (two) and has_part (one)]
xrefs: not necessarily equivalent
[Same Reactome "glycolysis [human]" and GO diagram without the new GO terms; labels shown: xref links, some_has_part (two) and has_part (one)]
Specifics
• Low Hanging Fruit
– Function to process links
• Mostly part_of links
• Some regulates links
• Pathways
– Process to function
• has_part
– Mining from pathways databases & curation
Function-process links
Conclusions of the electron transport working group.
[Diagram: one Function linked by hp to seven Processes]
Function: UTP:glucose-1-phosphate uridylyltransferase activity (α-D-glucose 1-phosphate + UTP -> UDP-D-glucose + diphosphate)
Process: glucose metabolic process; UDP-glucose metabolic process; galactose metabolic process; biosynthetic process; colanic acid biosynthetic process; response to desiccation; carbohydrate catabolic process
[Diagram: one Function linked by hp to three Processes]
Function: argininosuccinate synthase activity (Catalysis of the reaction: ATP + L-citrulline + L-aspartate = AMP + diphosphate + (N(omega)-L-arginino)succinate)
Process: urea cycle; arginine biosynthetic process; polyamine biosynthesis
[Diagram: one Function linked by hp to four Processes]
Function: carbamoyl-phosphate synthase activity (Catalysis of a reaction that results in the formation of carbamoyl phosphate.)
Process: Urea cycle and metabolism of amino groups; Glutamate metabolism; Arginine and proline metabolism; Nitrogen metabolism
Lysine biosynthesis pathways
[Diagram: Process term "lysine biosynthesis" with is_a children "lysine biosynthesis 1" to "lysine biosynthesis 6" and a possible "lysine biosynthesis 7?"; Function terms shown on a separate level]
[Diagram: Lysine Biosynthesis (Process) and its Functions. Legend: Shared function?; Non-shared function; new GO term; existing GO term; links = has_part]
[Same diagram and legend, with Process B added beside Lysine Biosynthesis]
[Same diagram and legend, with Process B and Process C added beside Lysine Biosynthesis]
[Diagram: Lysine Biosynthesis, Process B and Process C (Process) with their Functions]
Relationship explosion
(or Editorial office explosion)
Where do pathways start and end?
[Diagram: steps A, B, C, D, with groupings labelled process 1, process 2 and process 3]
Use cases
• Can we slim from function up to process?
• Can we infer annotations to process from those to function?
[Diagram with three levels: Process, Function, Gene products.
Process: urea cycle; polyamine biosynthesis. Both are linked by has_part to the Function argininosuccinate synthase activity.
Gene products: Gene product x; Gene product y. One has_function, but only as part_of polyamine biosynthesis; the other has_function, but only as part_of urea cycle.]
[Diagram: urea cycle and polyamine biosynthesis (Process), a shared Function, Gene product x and Gene product y (Gene products), with has_part and has_function links; marked "?"]
[Diagram: urea cycle and polyamine biosynthesis (Process), a shared Function, Gene product x and Gene product y (Gene products), with has_part and has_function links; marked "No"]
has_part cannot be used for slimming.
Can we infer annotations to process from those to function?
• No. There is too much variation in process details, and too many functions are shared.
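A small sketch of why shared functions block this inference: inverting the process has_part function links gives several candidate processes for one function, so an annotation to the function does not determine the process. The three processes are those drawn for this function on the earlier slide; the helper name is hypothetical.

```python
from collections import defaultdict

# Process -> functions it has_part, as drawn on the earlier slide.
has_part = {
    "urea cycle": ["argininosuccinate synthase activity"],
    "arginine biosynthetic process": ["argininosuccinate synthase activity"],
    "polyamine biosynthesis": ["argininosuccinate synthase activity"],
}


def candidate_processes(function, has_part):
    """Invert has_part: collect every process that could contain this function."""
    by_function = defaultdict(set)
    for process, functions in has_part.items():
        for f in functions:
            by_function[f].add(process)
    return by_function[function]


candidates = candidate_processes("argininosuccinate synthase activity", has_part)

# More than one candidate means the process cannot be inferred from the function alone.
inferable = len(candidates) == 1
```

A gene product annotated only to the function could act in any one of the three processes, which is the ambiguity the slide points to.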
So what can we do?
[Diagram: kinase activity (Function) part_of phosphorylation (Process)]
We can make relationships between single step processes and their respective functions.
[Diagram: glucose transporter activity (Function) part_of glucose transport (Process)]
We can make any obvious relationship where part_of holds, and this will allow useful slimming.
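Slimming over these links can be sketched as a walk up the graph from the annotated function term until slim terms are reached. The two part_of links are those on the slides; the `transport` parent, the slim set and the gene names are assumptions added for illustration.

```python
def slim_up(term, parents, slim_terms):
    """Return the slim terms reachable from `term` by following parent links."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)
        stack.extend(parents.get(current, []))
    return found


parents = {
    "kinase activity": ["phosphorylation"],                 # part_of, from the slide
    "glucose transporter activity": ["glucose transport"],  # part_of, from the slide
    "glucose transport": ["transport"],                     # assumed is_a parent
}
slim = {"phosphorylation", "transport"}  # hypothetical process slim

# Hypothetical function annotations.
annotations = {"gene_x": "kinase activity", "gene_y": "glucose transporter activity"}
slimmed = {gene: slim_up(term, parents, slim) for gene, term in annotations.items()}
```

Because part_of means "always part of some", every step up the graph is safe, which is what makes this kind of slimming valid where has_part is not.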
We can mine the other links from pathway databases and make non-curated sometimes_part_of links.
sometimes_part_of
What does this buy us?
• Very full coverage of function-process links.
• No manual link curation.
What work does it involve?
• We maintain the mapping files e.g. reactome2go.
• We write the mining scripts.
• Work with pathway dbs to unify exchange formats and make data interoperable
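As a rough idea of what a mining script might start from, here is a sketch of a mapping-file reader. It assumes one mapping per line in the form `EXTERNAL_ID name > GO:term name ; GO:0000000` with `!` marking comment lines; that layout, the function name and the `EXT:` identifiers are assumptions for illustration, not a description of the actual reactome2go file.

```python
def parse_mapping(lines):
    """Parse lines of the assumed form 'EXTERNAL_ID name > GO:term name ; GO:0000000'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        external, _, go_part = line.partition(" > ")
        if not go_part:  # no ' > ' separator: not a mapping line
            continue
        go_id = go_part.rpartition(" ; ")[2].strip()
        external_id = external.split()[0]
        mapping.setdefault(external_id, []).append(go_id)
    return mapping


# Hypothetical example lines; the external IDs are made up.
example = [
    "! comment line",
    "EXT:0001 example kinase reaction > GO:kinase activity ; GO:0016301",
    "EXT:0002 example pathway > GO:phosphorylation ; GO:0016310",
]
mapping = parse_mapping(example)
```

Keeping the mapping as a list per external ID allows one pathway-database entry to map to several GO terms.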
Acknowledgements
Michelle Gwinn-Giglio
Debbie Siegele
Ingrid Keseler
Harold Drabkin
Jennifer Deegan
Chris Mungall
Peifen Zhang