Function-process links The theory

Upload: linette-marshall

Post on 04-Jan-2016


Page 1: Function-process links The theory. Why bother? To improve the ontology To fill in annotation gaps As an aid to annotation –Suggest new annotations –Avoid

Function-process links

The theory

Page 2:

Why bother?

• To improve the ontology
• To fill in annotation gaps
• As an aid to annotation
  – Suggest new annotations
  – Avoid redundant annotation effort
  – Annotation cross-products
• Better integration with pathway databases
• To present annotations to users in more useful ways
  – e.g. more informative AmiGO displays

Page 3:

GO in 2008

Page 4:

Filling in annotation gaps

[Venn diagram: F = GO:0016301 kinase activity, P = GO:0016310 phosphorylation]

|P| = 3640
|F| = 6053
|F ∩ P| = 2230
|F ∩ not P| = 3823

Region labels: 3823 (F only), 2230 (F and P), 1410 (P only)

July 2008
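The region labels follow from the three totals by subtraction. The sketch below reproduces that arithmetic; the function name and dictionary keys are illustrative, and the slide does not say whether the counts are gene products or annotations, so the code treats them as plain counts.

```python
def venn_regions(n_function: int, n_process: int, n_both: int) -> dict:
    """Split two overlapping counts into the three regions of a Venn diagram."""
    if n_both > min(n_function, n_process):
        raise ValueError("the overlap cannot exceed either total")
    return {
        "function_only": n_function - n_both,  # |F ∩ not P|
        "both": n_both,                        # |F ∩ P|
        "process_only": n_process - n_both,    # |P ∩ not F|
    }


# Totals as given on the slide (July 2008).
regions = venn_regions(n_function=6053, n_process=3640, n_both=2230)
print(regions)
# {'function_only': 3823, 'both': 2230, 'process_only': 1410}
```

The `function_only` region is the gap the slide title refers to: items with a kinase activity annotation but no phosphorylation annotation.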

Page 5:

Filling in annotation gaps

[Diagram: GO:0016301 kinase activity and GO:0016310 phosphorylation]

Future - 2009

Page 6:

Improved presentation to users

Page 7:

part_of

Page 8:

part_of

annotations propagate over part_of

KIC1 IDA

Page 9:

part_of

annotations propagate over part_of

KIC1 IDA

Page 10:

part_of

annotations propagate over part_of

NDK1 IDA

Page 11:

part_of

annotations propagate over part_of

NDK1 IDA
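Propagation over part_of can be sketched as a walk up the parent links from each directly annotated term. This is a minimal illustration, not the GO tooling itself: the function names are made up, and pairing KIC1 with kinase activity is an assumption, since the slide shows only the gene name and evidence code.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Add every ancestor term to each gene product's direct annotations."""
    result = defaultdict(set)
    for gene, term in annotations:
        result[gene].add(term)
        result[gene] |= ancestors(term, parents)
    return dict(result)


# One function-to-process part_of link, as proposed in these slides.
parents = {"kinase activity": ["phosphorylation"]}
# Assumed direct annotation; only the gene name comes from the slide.
direct = [("KIC1", "kinase activity")]

inferred = propagate(direct, parents)
print(sorted(inferred["KIC1"]))
```

With the part_of link in place, the process annotation is obtained without a curator having to make it separately.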

Page 12:

A quick review of part_of

• Means “always part of some”
  – Example:
    • nucleus part_of cell
    • EVERY nucleus is part_of SOME cell
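The "every ... some" reading can be checked against instance data. The sketch below uses invented instance names and a hypothetical helper, `holds_all_some`; it also shows that the class-level statement does not automatically hold in the reverse direction.

```python
def holds_all_some(subjects, objects, pairs):
    """True if EVERY subject is related to SOME object through `pairs`."""
    return all(any((s, o) in pairs for o in objects) for s in subjects)


# Invented instances: cell_3 stands for a cell without a nucleus.
nuclei = {"nucleus_1", "nucleus_2"}
cells = {"cell_1", "cell_2", "cell_3"}
part_of_pairs = {("nucleus_1", "cell_1"), ("nucleus_2", "cell_2")}

# nucleus part_of cell: every nucleus sits in some cell.
nucleus_part_of_cell = holds_all_some(nuclei, cells, part_of_pairs)

# Reversing each pair gives the instance-level has_part relation.
has_part_pairs = {(whole, part) for part, whole in part_of_pairs}
# cell has_part nucleus fails, because cell_3 has no nucleus.
cell_has_part_nucleus = holds_all_some(cells, nuclei, has_part_pairs)

print(nucleus_part_of_cell, cell_has_part_nucleus)
```

This asymmetry is why part_of and has_part have to be asserted separately at the class level.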

Page 13:

Mining pathway DBs for links

[Diagram with two columns, reactome and GO.
reactome: glycolysis; fructose bisphosphatase activity of fructose 1 6 bisphosphatase 2 _cytosol; glucose 6 phosphate isomerase activity of glucose 6 phosphate isomerase dimer_cytosol.
GO: glycolysis (BP); fructose-bisphosphate aldolase activity (MF); glucose-6-phosphate isomerase activity (MF).]

Page 14:

Mining pathway DBs for links

[Diagram: the same reactome and GO entries as on page 13, now joined across the two columns by links labelled xref.]

Page 15:

Mining pathway DBs for links

[Diagram: the same reactome and GO entries and xref links as on page 14, with links labelled has_part added.]
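The mining step suggested by these diagrams can be sketched as a translation through the xrefs: wherever a pathway and one of its steps both carry a GO cross-reference, propose a has_part link between the two GO terms. The function name, the data layout and the unmapped step are hypothetical; the term names are the ones shown on the slides.

```python
def mine_has_part(pathway_steps, xref_to_go):
    """Turn pathway -> step pairs into candidate GO process has_part function links."""
    links = set()
    for pathway, steps in pathway_steps.items():
        process = xref_to_go.get(pathway)
        if process is None:
            continue  # the pathway has no GO cross-reference
        for step in steps:
            function = xref_to_go.get(step)
            if function is not None:
                links.add((process, "has_part", function))
    return links


# Pathway database side: one pathway and its steps (layout is an assumption).
pathway_steps = {
    "reactome: glycolysis": [
        "reactome: glucose 6 phosphate isomerase activity of glucose 6 phosphate isomerase dimer",
        "reactome: hypothetical step without an xref",
    ],
}
# Cross-references from pathway database entries to GO term names.
xref_to_go = {
    "reactome: glycolysis": "glycolysis",
    "reactome: glucose 6 phosphate isomerase activity of glucose 6 phosphate isomerase dimer":
        "glucose-6-phosphate isomerase activity",
}

candidates = mine_has_part(pathway_steps, xref_to_go)
print(candidates)
```

Steps without a cross-reference are skipped rather than guessed, so coverage depends directly on how complete the xref mapping is.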

Page 16:

xrefs: not necessarily equivalent

[Diagram with two columns, reactome and GO.
reactome: glycolysis [human]; fructose bisphosphatase activity of fructose 1 6 bisphosphatase 2 _cytosol; glucose 6 phosphate isomerase activity of glucose 6 phosphate isomerase dimer_cytosol.
GO: glycolysis; fructose-bisphosphate aldolase activity; glucose-6-phosphate isomerase activity; several nodes labelled GO:new.
Link labels: equivalent, has_part?, is_a.]

Page 17:

xrefs: not necessarily equivalent

[Diagram: the same reactome and GO entries as on page 16, including the nodes labelled GO:new.
Link labels: equivalent, some_has_part, has_part, is_a.]

Page 18:

xrefs: not necessarily equivalent

[Diagram: the same reactome and GO entries as on page 16, without the GO:new nodes.
Link labels: xref, some_has_part, has_part.]

Page 19:

Specifics

• Low Hanging Fruit
  – Function to process links
    • Mostly part_of links
    • Some regulates links
• Pathways
  – Process to function
    • has_part
  – Mining from pathways databases & curation

Page 20:

Function-process links

Conclusions of the electron transport working group.

Page 21:

[Diagram: one function joined to seven processes by links labelled hp.
Function: UTP:glucose-1-phosphate uridylyltransferase activity (α-D-glucose 1-phosphate + UTP -> UDP-D-glucose + diphosphate)
Process: glucose metabolic process; UDP-glucose metabolic process; galactose metabolic process; biosynthetic process; colanic acid biosynthetic process; response to desiccation; carbohydrate catabolic process]

Page 22:

[Diagram: one function joined to three processes by links labelled hp.
Function: argininosuccinate synthase activity (Catalysis of the reaction: ATP + L-citrulline + L-aspartate = AMP + diphosphate + (N(omega)-L-arginino)succinate)
Process: urea cycle; arginine biosynthetic process; polyamine biosynthesis]

Page 23:

[Diagram: one function joined to four processes by links labelled hp.
Function: carbamoyl-phosphate synthase activity (Catalysis of a reaction that results in the formation of carbamoyl phosphate.)
Process: Urea cycle and metabolism of amino groups; Glutamate metabolism; Arginine and proline metabolism; Nitrogen metabolism]

Page 24:

Lysine biosynthesis pathways

Page 25:

[Diagram with rows labelled Function and Process.
lysine biosynthesis, with is_a links from lysine biosynthesis 1, lysine biosynthesis 2, lysine biosynthesis 3, lysine biosynthesis 4, lysine biosynthesis 5 and lysine biosynthesis 6, plus a tentative "lysine biosynthesis 7?".]

Page 26:

[Diagram with rows labelled Process and Function.
Process: Lysine Biosynthesis
Legend: Shared function?; = has_part; Non-shared function; new GO term; existing GO term]

Page 27:

[Diagram with rows labelled Process and Function.
Process: Lysine Biosynthesis; Process B
Legend: Shared function?; = has_part; Non-shared function; new GO term; existing GO term]

Page 28:

[Diagram with rows labelled Process and Function.
Process: Lysine Biosynthesis; Process B; Process C
Legend: Shared function?; = has_part; Non-shared function; new GO term; existing GO term]

Page 29:

[Diagram with rows labelled Process and Function.
Process: Lysine Biosynthesis; Process B; Process C]

Relationship explosion

(or Editorial office explosion)

Page 30:

Where do pathways start and end?

A B C D

process 1

process 2

process 3

Page 31:

Use cases

• Can we slim from function up to process?

• Can we infer annotations to process from those to function?

Page 32:

[Diagram in three rows.
Process: urea cycle; polyamine biosynthesis
Function: argininosuccinate synthase activity
Gene products: Gene product x; Gene product y
Link labels: has_part; has_function; "has_function but only as part_of polyamine biosynthesis"; "has_function but only as part_of urea cycle"]

Page 33:

[Diagram in three rows.
Process: urea cycle; polyamine biosynthesis
Function
Gene products: Gene product x; Gene product y
Link labels: has_part; has_function]

?

Page 34:

[Diagram in three rows.
Process: urea cycle; polyamine biosynthesis
Function
Gene products: Gene product x; Gene product y
Link labels: has_part; has_function]

No

has_part cannot be used for slimming.

Page 35:

Can we infer annotations to process from those to function?

• No. There is too much variation in process details, and too many functions are shared.
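The shared-function problem can be made concrete with the argininosuccinate synthase example from page 22. In this sketch the helper name and data layout are illustrative; the three has_part links are the ones drawn on that slide.

```python
def candidate_processes(function, has_part):
    """List every process that has the given function as a part."""
    return sorted(
        process for process, parts in has_part.items() if function in parts
    )


# Process has_part function links as drawn on page 22.
has_part = {
    "urea cycle": {"argininosuccinate synthase activity"},
    "arginine biosynthetic process": {"argininosuccinate synthase activity"},
    "polyamine biosynthesis": {"argininosuccinate synthase activity"},
}

candidates = candidate_processes("argininosuccinate synthase activity", has_part)
# A process annotation could be inferred safely only if exactly one candidate existed.
can_infer = len(candidates) == 1
print(candidates, can_infer)
```

A gene product annotated to this function could be acting in any of the three processes, so a has_part link alone cannot decide which process annotation to create.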

Page 36:

So what can we do?

Page 37:

[Diagram: kinase activity (Function) part_of phosphorylation (Process)]

We can make relationships between single-step processes and their respective functions.

Page 38:

[Diagram: glucose transporter activity (Function) part_of glucose transport (Process)]

We can make any obvious relationship where part_of holds, and this will allow useful slimming.
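Slimming from function up to process might look like the sketch below, which buckets gene products under slim process terms by following one part_of link from each annotated function. The gene names, the helper name and the single-parent layout are assumptions made for illustration; the two part_of links are the ones shown on pages 37 and 38.

```python
from collections import defaultdict


def slim_to_process(annotations, part_of, slim_terms):
    """Group gene products under the slim process terms reached via part_of."""
    buckets = defaultdict(set)
    for gene, function in annotations:
        process = part_of.get(function)  # None if the function has no link
        if process in slim_terms:
            buckets[process].add(gene)
    return {term: sorted(genes) for term, genes in buckets.items()}


# Function part_of process links from pages 37 and 38.
part_of = {
    "kinase activity": "phosphorylation",
    "glucose transporter activity": "glucose transport",
}
# Hypothetical function annotations.
annotations = [
    ("geneA", "glucose transporter activity"),
    ("geneB", "kinase activity"),
    ("geneC", "kinase activity"),
]
slim = {"glucose transport", "phosphorylation"}

result = slim_to_process(annotations, part_of, slim)
print(result)
```

Because part_of holds for every instance of the function, each bucket assignment here is safe, which is not the case for the has_part links discussed earlier.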

Page 39:

We can mine the other links from pathway databases and make non-curated sometimes_part_of links.

Page 40:

sometimes_part_of

What does this buy us?

• Very full coverage of function-process links.
• No manual link curation.

What work does it involve?

• We maintain the mapping files e.g. reactome2go.
• We write the mining scripts.
• Work with pathway dbs to unify exchange formats and make data interoperable
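A mining script would first need to read the mapping file. The sketch below assumes a line layout of the form `DB:id label > GO:term name ; GO:id`, with `!` starting a comment line; that layout, the sample lines and the GO identifiers in them are placeholders for illustration and should be checked against the real reactome2go file.

```python
import re

# Assumed layout: "<external id> <label> > GO:<term name> ; GO:<7 digits>"
LINE = re.compile(
    r"^(?P<ext>\S+)\s*(?P<label>.*?)\s*>\s*GO:(?P<name>.+?)\s*;\s*(?P<go_id>GO:\d{7})\s*$"
)


def parse_mapping(lines):
    """Map each external identifier to a list of (GO id, GO term name) pairs."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        entry = (match.group("go_id"), match.group("name"))
        mapping.setdefault(match.group("ext"), []).append(entry)
    return mapping


# Placeholder sample; identifiers are invented, not real database entries.
sample = [
    "! hypothetical sample in the assumed layout",
    "Reactome:EXAMPLE_1 glycolysis > GO:glycolysis ; GO:9999901",
    "Reactome:EXAMPLE_2 glucose 6 phosphate isomerase activity > "
    "GO:glucose-6-phosphate isomerase activity ; GO:9999902",
]

mapping = parse_mapping(sample)
print(mapping)
```

The resulting dictionary is the kind of xref table a link-mining step could consume to propose sometimes_part_of links.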

Page 41:

Acknowledgements

Michelle Gwinn-Giglio
Debbie Siegele
Ingrid Keseler
Harold Drabkin
Jennifer Deegan
Chris Mungall
Peifen Zhang