An Ontology for Protein-Protein Interaction Data Karen Jantz CIS Honors Project December 7, 2006

An Ontology for Protein-Protein Interaction Data

Karen Jantz
CIS Honors Project
December 7, 2006

Overview

- Problem Statement
- Objectives
- Approach
- Background
- Methodology
- Evaluation
- Demonstration
- Conclusion

Problem Statement

- Several sources for protein-protein interaction data
  - Different schemata
  - Different purposes
  - Different strengths/weaknesses

Objectives

- Unify the data
- Enable data mining
- Evaluate reliability of data across data sources
- Gain new information about the entire data set
- Enable others to easily add other data sources to the set

Approach: ontology

- ontology – n.
  1. that which exists (philosophy)
  2. that which is represented (artificial intelligence)
- A descriptive data model
- Defines the entities and relationships within a domain
- Based upon data
- Human-readable
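
As a rough illustration of "entities and relationships within a domain", the sketch below models a tiny protein-interaction domain in Python. The class and field names (`Protein`, `Experiment`, `Interaction`, `participants`, `evidence`) and the sample values are hypothetical; they are not taken from the project's actual ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protein:
    """Entity: a protein, identified by a hypothetical accession string."""
    accession: str
    name: str = ""


@dataclass(frozen=True)
class Experiment:
    """Entity: an experiment that may support an interaction."""
    method: str
    source_db: str


@dataclass
class Interaction:
    """Relationship: links participating proteins to supporting experiments."""
    participants: frozenset
    evidence: list = field(default_factory=list)

    def involves(self, protein: Protein) -> bool:
        return protein in self.participants


# Illustrative values only.
a = Protein("P-A", "protein A")
b = Protein("P-B", "protein B")
binding = Interaction(frozenset({a, b}), [Experiment("two-hybrid", "DIP")])
```

An OWL ontology expresses the same idea with classes and properties rather than Python types; the dataclasses here only make the entity/relationship split concrete.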

Approach: ontology

- Data integration
  - Enables simultaneous querying across multiple databases
- Data transformation
  - Enables interchange between database formats
- Data mining
  - Enables reasoning and learning over the entire data set

Background: Data Sources

- DIP (Jing Xia)
  - Database of Interacting Proteins
  - Most reliable data set
- BIND (Abhijit Erande, Aaron Schoenhofer)
  - Biomolecular Interaction Network Database
  - Very large data set
  - Contains interactions, molecular complexes, and pathways

Background: Data Sources

- MINT
  - Molecular INTeraction database
  - Experimentally verified protein interactions
  - Evaluates confidence level
- IntAct
  - Not limited to binary interactions
  - Allows user submissions
- MIPS CYGD
  - Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database
  - Limited to yeast
  - Focuses on sequencing

Background: Tools

- Protégé
  - Open-source project
  - Graphical ontology editor
  - Interacts with OWL reasoner
  - Detailed API for modifying ontologies programmatically

Background: Tools

- Prompt
  - A Protégé plugin
  - Enables ontology mapping
  - Enables ontology comparison

Background: Related Work

- PSI-MI
  - Controlled vocabulary for PPI data
  - Not a proposed database structure
  - Decreases the strength of information
  - Helpful in defining relationships and keys

Methodology: Overview

[Diagram labels: DIP, BIND, MIPS, MINT, IntAct; Unified Ontology; Unified Data Set; Web Interface]

Example queries:
- Q: What interactions have been observed with protein A?
- Q: What experiments give evidence for a given interaction?

Methodology: Design

- Review the singular database schemata and determine strengths/weaknesses
  - View data files
    - Native formats
    - PSI-MI formats
- Create a unified schema of the data sources
  - Create the unified ontology in Protégé
  - Create each singular database as a subset of the unified ontology

Protégé Screenshot

Methodology: Data Import

- DOMParser
  - Load data from XML
- Protégé-OWL API
  - Insert entities into singular databases
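
The loading step can be sketched with a DOM parser. The slide names DOMParser; the example below uses Python's `xml.dom.minidom` as a stand-in. The XML layout (`interactionList`, `interaction`, `participant`, `ref`) is a simplified, hypothetical format rather than the real PSI-MI schema, and the later step of inserting entities through the Protégé-OWL API is not shown.

```python
from xml.dom.minidom import parseString

# Hypothetical, simplified XML; real PSI-MI files are considerably richer.
SAMPLE_XML = """
<interactionList>
  <interaction id="i1">
    <participant ref="P-A"/>
    <participant ref="P-B"/>
  </interaction>
  <interaction id="i2">
    <participant ref="P-A"/>
    <participant ref="P-C"/>
  </interaction>
</interactionList>
"""


def load_interactions(xml_text):
    """Parse the XML with a DOM parser and return {interaction id: [protein refs]}."""
    document = parseString(xml_text.strip())
    interactions = {}
    for node in document.getElementsByTagName("interaction"):
        refs = [p.getAttribute("ref") for p in node.getElementsByTagName("participant")]
        interactions[node.getAttribute("id")] = refs
    return interactions


loaded = load_interactions(SAMPLE_XML)
```

Each entry in `loaded` would then become one individual in the corresponding singular ontology.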

Methodology: Transformation

- Use Prompt to create a mapping for each specific data source to the unified ontology
- Use Prompt mappings to insert individuals from each singular ontology into the unified model
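
The idea of a per-source mapping can be illustrated outside Prompt with plain dictionaries. This is only a sketch: every field name below (`interactor_a`, `partnerA`, `protein_1`, `detection_method`, and so on) is hypothetical, and the actual Prompt mappings operate on ontology classes and properties rather than flat records.

```python
# Hypothetical field names; the real source schemata and Prompt mappings differ.
MAPPINGS = {
    "DIP": {"interactor_a": "protein_1", "interactor_b": "protein_2",
            "method": "detection_method"},
    "MINT": {"partnerA": "protein_1", "partnerB": "protein_2",
             "experiment": "detection_method"},
}


def to_unified(source, record):
    """Rename a source record's fields to unified names and tag its origin."""
    mapping = MAPPINGS[source]
    unified = {mapping[key]: value for key, value in record.items() if key in mapping}
    unified["source"] = source
    return unified


dip_row = {"interactor_a": "P-A", "interactor_b": "P-B", "method": "two-hybrid"}
mint_row = {"partnerA": "P-A", "partnerB": "P-B", "experiment": "pull-down",
            "note": "unmapped"}
unified_rows = [to_unified("DIP", dip_row), to_unified("MINT", mint_row)]
```

Fields without a mapping entry (such as `note` above) are dropped in this sketch; a real transformation would need an explicit policy for them.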

Methodology: Transformation

- Duplicate data
  - Need to fill in attributes on existing records
  - Write an ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
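
A minimal sketch of the duplicate-handling idea follows. It assumes, purely for illustration, that two interaction records describe the same individual when they name the same unordered pair of proteins; the slides do not state the rule the planned plugin would use. The record keys (`proteins`, `method`, `sources`) are likewise hypothetical.

```python
def same_interaction(left, right):
    """Assumed identity rule: same unordered pair of protein identifiers."""
    return frozenset(left["proteins"]) == frozenset(right["proteins"])


def merge_into(unified, incoming):
    """Add a record, or fill missing attributes on a matching existing record."""
    for existing in unified:
        if same_interaction(existing, incoming):
            for key, value in incoming.items():
                if existing.get(key) is None:
                    existing[key] = value
            existing["sources"] = sorted(set(existing["sources"]) | set(incoming["sources"]))
            return existing
    unified.append(dict(incoming))
    return unified[-1]


unified = []
merge_into(unified, {"proteins": ("P-A", "P-B"), "method": None, "sources": ["DIP"]})
merge_into(unified, {"proteins": ("P-B", "P-A"), "method": "two-hybrid", "sources": ["MINT"]})
```

After both calls the unified list holds a single record whose missing `method` has been filled in and whose `sources` name both databases.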

Prompt Screenshot - Mapping

Methodology: Query Interface

- Export Protégé data into MySQL
- Web interface for collecting data
- Working with domain experts to determine useful views, queries
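
To make the query side concrete, the sketch below answers the two questions from the Methodology: Overview slide against a small relational store. Python's built-in `sqlite3` stands in for MySQL so the example is self-contained; the table names, column names, and rows are hypothetical and do not reflect the project's exported schema.

```python
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE interaction (id INTEGER PRIMARY KEY, protein_1 TEXT, protein_2 TEXT);
CREATE TABLE evidence (interaction_id INTEGER REFERENCES interaction(id),
                       method TEXT, source_db TEXT);
""")
# Illustrative rows only.
conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)",
                 [(1, "P-A", "P-B"), (2, "P-C", "P-A"), (3, "P-B", "P-C")])
conn.executemany("INSERT INTO evidence VALUES (?, ?, ?)",
                 [(1, "two-hybrid", "DIP"), (1, "pull-down", "MINT"),
                  (2, "two-hybrid", "BIND")])


def interactions_with(protein):
    """Return ids of interactions in which the protein takes part."""
    rows = conn.execute(
        "SELECT id FROM interaction WHERE protein_1 = ? OR protein_2 = ? ORDER BY id",
        (protein, protein))
    return [row[0] for row in rows]


def evidence_for(interaction_id):
    """Return (method, source database) pairs recorded for an interaction."""
    rows = conn.execute(
        "SELECT method, source_db FROM evidence WHERE interaction_id = ? "
        "ORDER BY source_db",
        (interaction_id,))
    return rows.fetchall()
```

A web interface would wrap functions like these behind forms or views chosen with the domain experts.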

Evaluation

- Performance
  - Transformation time in Protégé
  - Query time for web interface
- Size
  - Minimize redundancy in data model
  - Minimize duplicate data

Evaluation

- Correctness
  - Domain experts: Dr. Brown, Dr. Wang
  - Maintain proper data relationships
- Utility
  - Enrich data

Evaluation

[Bar chart: Data Model Enrichment. X-axis: Database (IntAct, MINT, MIPS); y-axis: Number of Classes (0 to 30); series: New, Changed, Existing]

Demonstration

Future Work

- Complete transformations
- Import data
- Evaluate ontology
- Add other databases to model

Conclusions

- Adequate start
- Needs improvement, evolution, more data sources
- As the project matures, the ontology will be ready for use in the biological domain
  - Will be able to more easily gain information about protein-protein interactions

References

- AAAI.org - AITopics: “Ontology”: http://www.aaai.org/AITopics/html/ontol.html
- Protégé: http://protege.stanford.edu/overview/protege-owl.html
- Prompt: http://protege.cim3.net/cgi-bin/wiki.pl?Prompt
- PSI-MI: http://psidev.sourceforge.net/mi/xml/doc/user

References

- BIND: http://www.bind.ca
- DIP: http://www.dip.doe-mbi.ucla.edu
- IntAct: http://www.ebi.ac.uk/intact/site/
- MINT: http://mint.bio.uniroma2.it/mint/Welcome.do
- MIPS: http://mips.gsf.de/genre/proj/yeast

Q & A