Developing a Protein-Interactions Ontology
Esther Ratsch, European Media Laboratory
Post on 18-Dec-2015
PIOG
• Protein Interactions Ontology Group
• Computer scientists:
– Philipp Cimiano Lavin (IMS Stuttgart, EML Heidelberg)
– Isabel Rojas (EML Heidelberg)
• Computational linguists:
– Uwe Reyle (IMS Stuttgart)
– Jasmin Saric (EML Heidelberg)
• Biologists:
– Esther Ratsch (EML Heidelberg)
– Jörg Schultz (MPI for Molecular Genetics Berlin)
– Ulrike Wittig (EML Heidelberg)
Motivation
• Why protein interactions?
– protein function analysis
– larger datasets
• Why an ontology?
– clear domain model
– storage and understanding of data
– information retrieval from text
– retrieve hidden information, inferencing
What is a signal transduction pathway?
• signal from outside is transduced to the nucleus
• often phosphorylation cascade
[Figure labels: signal, change, transcription]
Why are they important?
• control of cellular processes
• communication between cells
• response to environmental changes
• regulatory network
• stable system: single mutations may be overridden by other pathways
• complex network enables complex behaviour
Jak-Stat pathway
[Figure: Jak-Stat pathway. Labels: ligand, cytokine receptors, JAKs, STAT monomers, tyrosine residues, phosphate groups (P), nucleus, target genes]
General approach
• Identify scope of the ontology
• Identify concepts involved and their properties
• How to represent them?
• Define rules and constraints
• Formalisation
Scope Concepts Representation Rules/Constraints Formalisation
The scope
• Ontology that represents interactions between proteins and other cellular compounds
• Restriction on molecular detail: amino acids
• Concentration on signal transduction pathways in initial phase
• no quantitative properties are modeled
Identify concepts: Interacting compounds
• Different kinds of compounds: proteins, genes/DNA, ions, ...
• Composition of compounds, e.g. amino acids, domains
[Figure: Jak, Stat and a DNA region. Domain organisation of Stat proteins, with labels TAD, LZ, DBD, SH3, SH2, Y]
Properties of compounds
• Characteristics: molecular weight, sequence, isoelectric point...
• Interaction potential: modifications, location, binding partners
[Figure labels: nucleus, P, X]
Identify concepts: Interactions I
• Different types of interactions: phosphorylation, binding, translocation ...
• Other classification: grouping of > 100 verbs (Swiss-Prot) into 11 classes that are not disjoint:
– Control/Regulation
– Biochemical Interactions
– Logical Interactions
– Bind/Dissociate
– Formation
– Integrity
– Availability
– Change of Location
– Modification of Structure
– Special Processes/Reactions
– Order
Representation of proteins
• General characteristics: sequence, molecular weight, ...
• Protein state:
– location
– list of modifications
– list of binding partners
StatState3(cytoplasm, phosphorylatedAtResidue701, Stat)
JakState1(cytoplasm, none, cytokine-receptor)
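The state triple above (location, modifications, binding partners) can be sketched as a small record type. This is only an illustration: the class name `ProteinState` and its field names are assumptions, not part of the ontology itself.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ProteinState:
    """One state of a protein: where it is, how it is modified, what it binds.

    Hypothetical record type; the field names are chosen for illustration.
    """
    protein: str
    location: str
    modifications: FrozenSet[str] = frozenset()
    binding_partners: FrozenSet[str] = frozenset()


# The two example states from the slide, rewritten as records.
stat_state3 = ProteinState(
    protein="Stat",
    location="cytoplasm",
    modifications=frozenset({"phosphorylatedAtResidue701"}),
    binding_partners=frozenset({"Stat"}),
)
jak_state1 = ProteinState(
    protein="Jak",
    location="cytoplasm",
    modifications=frozenset(),  # "none" on the slide
    binding_partners=frozenset({"cytokine-receptor"}),
)
```

Frozen records with frozen sets are hashable, so states can be compared or used as dictionary keys when tracking which state an event starts from and ends in.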
Representation of interactions
• Event with pre- and postconditions
[Figure: state "not p", event e, state "p". Example: not phosphorylated, phosphorylation, phosphorylated]
$$s\colon \neg p, \qquad s'\colon p, \qquad \mathrm{Res}(e, s')$$
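A minimal sketch of an event with a precondition and a postcondition, assuming a state is simply the set of facts that hold. The names `Event`, `precondition`, `effect` and `apply` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Assumption: a state is modelled as the set of facts that currently hold.
State = FrozenSet[str]


@dataclass(frozen=True)
class Event:
    """An event that may only occur in states satisfying its precondition."""
    name: str
    precondition: Callable[[State], bool]
    effect: Callable[[State], State]

    def apply(self, state: State) -> State:
        # Refuse to run the event if the precondition does not hold.
        if not self.precondition(state):
            raise ValueError(f"precondition of {self.name} not met")
        return self.effect(state)


# Phosphorylation: "not phosphorylated" before, "phosphorylated" afterwards.
phosphorylation = Event(
    name="phosphorylation",
    precondition=lambda s: "phosphorylated" not in s,
    effect=lambda s: s | {"phosphorylated"},
)

before: State = frozenset()
after = phosphorylation.apply(before)
```

Applying the event a second time to `after` raises a `ValueError`, because the precondition "not phosphorylated" no longer holds.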
Rules and constraints
• Simple hierarchies: nucleolus inside nucleus, Stat1 is a Stat is a protein
• Rules for the definition of interactions
• Consistency checking
• Knowledge retrieval
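The two hierarchies named above can be sketched as parent tables that are followed upwards. The table names `IS_A` and `INSIDE` and the helper functions are assumptions for illustration, and each term is assumed to have at most one parent.

```python
# Hypothetical parent tables holding only the examples from the slide.
IS_A = {"Stat1": "Stat", "Stat": "protein"}
INSIDE = {"nucleolus": "nucleus"}


def ancestors(term, relation):
    """Follow a single-parent relation upwards and collect every ancestor."""
    found = []
    # The second condition stops the walk if the table contains a cycle.
    while term in relation and relation[term] not in found:
        term = relation[term]
        found.append(term)
    return found


def is_a(term, category):
    """True if `category` is reachable from `term` through IS_A links."""
    return category in ancestors(term, IS_A)
```

With these tables, `is_a("Stat1", "protein")` holds through the intermediate step `Stat`, which is the kind of inherited knowledge a simple hierarchy makes retrievable.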
Rules and constraints: example
• "Protein A is phosphorylated by B at position X."
– A and B are located in the same compartment
– A was not modified at X before
– A is phosphorylated at X afterwards
– B is a protein kinase, which is a protein
– depending on X, B is either an S/T-kinase or a Y-kinase
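The constraints for "A is phosphorylated by B at X" can be turned into a consistency check. The sketch below is hedged: the dictionary keys (`location`, `modified_positions`, `kinase_type`, `position`, `amino_acid`) are hypothetical, and rejecting residues other than serine, threonine or tyrosine is an added assumption. The postcondition (A is phosphorylated at X afterwards) is a result of the event and is therefore not checked here.

```python
# Expected kinase class per residue type, following the last constraint.
KINASE_FOR_RESIDUE = {
    "serine": "S/T-kinase",
    "threonine": "S/T-kinase",
    "tyrosine": "Y-kinase",
}


def check_phosphorylation(a, b, x):
    """Return the violated constraints; an empty list means consistent."""
    problems = []
    if a["location"] != b["location"]:
        problems.append("A and B are not in the same compartment")
    if x["position"] in a["modified_positions"]:
        problems.append("A is already modified at X")
    if b["kinase_type"] is None:
        problems.append("B is not a protein kinase")
    expected = KINASE_FOR_RESIDUE.get(x["amino_acid"])
    if expected is None:
        # Assumption: only S, T and Y residues can be phosphorylated here.
        problems.append("X is not serine, threonine or tyrosine")
    elif b["kinase_type"] is not None and b["kinase_type"] != expected:
        problems.append(f"B should be a {expected}")
    return problems


# Example: Jak phosphorylating Stat at tyrosine 701 in the cytoplasm.
stat = {"location": "cytoplasm", "modified_positions": set()}
jak = {"location": "cytoplasm", "kinase_type": "Y-kinase"}
tyr701 = {"position": 701, "amino_acid": "tyrosine"}
```

Calling `check_phosphorylation(stat, jak, tyr701)` returns an empty list; moving `stat` to the nucleus or marking position 701 as already modified yields the corresponding violation.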
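Formalisation: phosphorylation
• Phosphorylation of a protein by a kinase at a distinct residue
$$\forall s, s', e, P, Q, R\; \big( e\colon \mathit{phosphorylation}(P, Q, R) \rightarrow \mathit{protein}(P) \wedge \mathit{proteinKinase}(Q) \wedge \mathit{isResidueOf}(R, P) \wedge s\colon \neg\mathit{modified}(R) \wedge s'\colon \mathit{phosphorylated}(R) \wedge \mathit{Result}(e, s') \big)$$
$$\forall e, P, Q, R\; \big( e\colon \mathit{phosphorylation}(P, Q, R) \rightarrow e\colon \mathit{interaction}(P, Q) \big)$$
$$\forall e, P, Q\; \big( e\colon \mathit{interaction}(P, Q) \rightarrow ( \mathit{compoundLocation}(P) = \mathit{compoundLocation}(Q) ) \big)$$
$$\forall P\; \big( \mathit{proteinKinase}(P) \rightarrow \mathit{protein}(P) \big)$$
$$\forall R, s\; \big( s\colon \mathit{modified}(R) \leftrightarrow ( \mathit{phosphorylated}(R) \vee \mathit{glycosylated}(R) \vee \dots ) \big)$$
• S/T-kinase phosphorylation
$$\forall e, P, Q, R\; \big( e\colon \mathit{S/T\text{-}phosphorylation}(P, Q, R) \leftrightarrow e\colon \mathit{phosphorylation}(P, Q, R) \wedge ( \mathit{serine}(R) \vee \mathit{threonine}(R) ) \big)$$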
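Reading the rules on this slide as "a phosphorylation is an interaction", "a protein kinase is a protein" and "an S/T-phosphorylation is a phosphorylation at a serine or threonine", a single inference pass over a set of facts can be sketched as follows. The tuple encoding of facts and the function name `derive` are assumptions, and the placeholder names A, B and X are taken from the rule example rather than from real data.

```python
def derive(facts):
    """Apply the three rules once; return the facts plus everything derived."""
    derived = set(facts)
    for fact in facts:
        if fact[0] == "phosphorylation":
            _, p, q, r = fact
            # Every phosphorylation is also an interaction between P and Q.
            derived.add(("interaction", p, q))
            # It is an S/T-phosphorylation if R is a serine or a threonine.
            if ("serine", r) in facts or ("threonine", r) in facts:
                derived.add(("S/T-phosphorylation", p, q, r))
        elif fact[0] == "proteinKinase":
            # Every protein kinase is a protein.
            derived.add(("protein", fact[1]))
    return derived


facts = {
    ("phosphorylation", "A", "B", "X"),
    ("proteinKinase", "B"),
    ("serine", "X"),
}
closure = derive(facts)
```

One pass is enough here because none of the derived facts triggers another of these three rules; a larger rule set would need to repeat the pass until nothing new is added.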
Challenges met
• Multidisciplinarity of the group
– Different vocabularies → clear expression, fewer ambiguities
– Different goals, different needs → not restricted to one goal
– Different experiences → mutual benefit
• Domain
Complexity of the domain
• Granularity of information
– detail of compound part
  • protein: Stat
  • domain: SH2-domain
  • amino acid: tyrosine701
– detail of protein identity
  • protein family: Jak, Stat
  • protein type: Jak2, Stat5
  • organism specific protein: Jak2_human, Jak2_rat
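The three levels of protein identity (family, type, organism-specific protein) can be sketched as a record whose more specific fields are optional. The class `ProteinIdentity` and its `covers` method are hypothetical names used only to illustrate how a general description can match a more detailed one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProteinIdentity:
    """Protein identity at family, type or organism-specific level."""
    family: str
    protein_type: Optional[str] = None  # None: not specified at this level
    organism: Optional[str] = None

    def covers(self, other: "ProteinIdentity") -> bool:
        """True if every level specified here agrees with `other`."""
        return (
            self.family == other.family
            and self.protein_type in (None, other.protein_type)
            and self.organism in (None, other.organism)
        )


jak_family = ProteinIdentity("Jak")
jak2 = ProteinIdentity("Jak", "Jak2")
jak2_human = ProteinIdentity("Jak", "Jak2", "human")
jak2_rat = ProteinIdentity("Jak", "Jak2", "rat")
```

Here `jak_family.covers(jak2_human)` holds, while `jak2_rat.covers(jak2_human)` does not, so a statement made about the Jak family applies to Jak2_human but a rat-specific statement does not.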
Complexity of the domain II
• description detail:
– not known: no data available
– doesn't have: no binding partners
– don't care: not important for a certain interaction
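The three cases need distinct encodings, since an empty list of binding partners ("doesn't have") is different from missing data ("not known") and from an irrelevant slot ("don't care"). A minimal sketch, in which the `Detail` enum and the function `binding_matches` are assumed names:

```python
from enum import Enum


class Detail(Enum):
    """Markers for slots that carry no concrete value."""
    NOT_KNOWN = "not known"    # no data available
    DONT_CARE = "don't care"   # not important for this interaction


def binding_matches(required, actual):
    """Compare required binding partners with those of an actual state.

    Returns True or False, or None when the data needed to decide is missing.
    "Doesn't have" is encoded as an empty frozenset.
    """
    if required is Detail.DONT_CARE:
        return True
    if actual is Detail.NOT_KNOWN:
        return None
    return required == actual


no_partners = frozenset()  # "doesn't have": no binding partners
```

Returning `None` for unknown data keeps "cannot decide" apart from a definite mismatch, which matters for consistency checking.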
What comes next?
• Go on with development of ontology
• Projects using the ontology:
– integration in larger ontology on metabolic pathways
– application to TIGERSearch (see poster)
Acknowledgements
• Protein Interactions Ontology Group
• Computer scientists:
– Philipp Cimiano Lavin (IMS Stuttgart, EML Heidelberg)
– Isabel Rojas (EML Heidelberg)
• Computational linguists:
– Uwe Reyle (IMS Stuttgart)
– Jasmin Saric (EML Heidelberg)
• Biologists:
– Esther Ratsch (EML Heidelberg)
– Jörg Schultz (MPI for Molecular Genetics Berlin)
– Ulrike Wittig (EML Heidelberg)