Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets Paea LePendu University of Oregon Talk: National Center for Biomedical Ontology ◦ Stanford University ◦ September, 2009

Page 1: Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets

Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets

Paea LePendu
University of Oregon

Talk: National Center for Biomedical Ontology ◦ Stanford University ◦ September, 2009

Page 2

General Interests

Programming Languages

Automated Reasoning

Databases

Logic

Page 3

Outline

• Ontology-based Data Management
  – Background, Motivation
  – Theory
  – Benchmarking
  – Application Domain, Query Answering

• Inconsistency Detection
  – Theory
  – The serotonin example
  – GO plus ZFIN, MGI annotations

Page 4

Ontology-based Database Integration: reducing database integration to ontology translation

Page 5

Ontology-based Database Integration: reducing database integration to ontology translation

Page 6

Ontology-based Data Management

[Diagram labels, database side: Relation, Attribute, Datatype, Keys, Constraint, View, Trigger, Tuple]

[Diagram labels, ontology side: Class, Property, Datatype, SubClass, Restriction, Individual]

Page 7

Ontology-based Data Management

[Diagram labels: Ontology, User, RDBMS (several instances), Data Annotation, Data Management, Data Access Layer]

Page 8

Example: sisters-siblings

This is what we know:

All sisters are siblings.

Hilary and Lynn are sisters.

This is what we want to know:

Who are siblings?

{ <x,y> | siblingOf(x,y) }

Obviously, the answer should be:

Hilary and Lynn are siblings.

{ <Hilary, Lynn> }

Page 9

Example: sisters-siblings

Page 10

Example: sisters-siblings

Page 11

Example: The Gene Ontology

[Diagram: GO terms, each shown with its sample instances]

GO_0003674: z01, z02, z03
GO_0005488: e01, e02, e03
GO_0030528: y01, y02, y03
GO_0003677: x01, x02, x03
GO_0003700: w01, w02, w03
GO_0003676: c01, c02, c03
GO_0003723: d01, d02, d03
GO_0008135: a01, a02, a03
GO_0045182: b01, b02, b03

Page 12

Example: The Gene Ontology

Page 13

Ontology Databases: General Models for Database Designs

• Generality is important
  – Avoid rewriting

• Scalability of KB is important
  – Persistence, caching and indexing

• Major generic models
  – Horizontal Models
  – Vertical Models
  – Decomposition Storage Models

Page 14

Ontology Databases: View-based Approach

CREATE VIEW v_Person(id) AS
  SELECT id FROM Person
  UNION SELECT id FROM v_Male
  UNION SELECT id FROM v_Female

[Diagram: class Person with subclasses Female and Male. Tables: Person (P-0004), Female (P-0002), Male (P-0001, P-0003). Views: v_Person, v_Female, v_Male.]

[Pan & Heflin. DLDB: Extending Relational Databases to Support Semantic Web Queries. ISWC, 2003.]
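A minimal sketch of the view-based approach on this slide, run through Python's built-in sqlite3 module so that it is self-contained. The table contents follow the slide's diagram. The definitions of v_Male and v_Female are assumptions (the slide only spells out v_Person), as is the choice of SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id TEXT PRIMARY KEY);
CREATE TABLE Male   (id TEXT PRIMARY KEY);
CREATE TABLE Female (id TEXT PRIMARY KEY);

-- Assumed: leaf classes get a view over their own table only.
CREATE VIEW v_Male(id)   AS SELECT id FROM Male;
CREATE VIEW v_Female(id) AS SELECT id FROM Female;

-- From the slide: a class view unions its own table with its subclass views.
CREATE VIEW v_Person(id) AS
    SELECT id FROM Person
    UNION SELECT id FROM v_Male
    UNION SELECT id FROM v_Female;

-- Instance data as drawn in the slide's diagram.
INSERT INTO Male   VALUES ('P-0001'), ('P-0003');
INSERT INTO Female VALUES ('P-0002');
INSERT INTO Person VALUES ('P-0004');
""")

# The subsumption "every Male and every Female is a Person" is computed
# at query time, each time the view is read.
people = [row[0] for row in conn.execute("SELECT id FROM v_Person ORDER BY id")]
print(people)
```

In this design nothing is copied when data is loaded; the cost of the inference is paid by every query against v_Person.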

Page 15

Ontology Databases: Active Database Approach

[Diagram: class Person with subclasses Female and Male, and one table per class: Person, Female, Male]

ON INSERT into Male INSERT into Person

ON INSERT into Female INSERT into Person

[LePendu, et al. Ontology Database: a New Method for Semantic Modeling and an Application to Brainwave Data. SSDBM, 2008.]

Page 16

Ontology Databases: Active Database Approach

Male: P-0001
Female: (empty)
Person: (empty)

ON INSERT into Male INSERT into Person

ON INSERT into Female INSERT into Person

Page 17

Ontology Databases: Active Database Approach

Male: P-0001
Female: (empty)
Person: P-0001

Page 18

Ontology Databases: Active Database Approach

Male: P-0001
Female: P-0002
Person: P-0001

Page 19

Ontology Databases: Active Database Approach

Male: P-0001
Female: P-0002
Person: P-0001, P-0002

Page 20

Ontology Databases: Active Database Approach

Male: P-0001, P-0003
Female: P-0002
Person: P-0001, P-0002

Page 21

Ontology Databases: Active Database Approach

Male: P-0001, P-0003
Female: P-0002
Person: P-0001, P-0002, P-0003

Page 22

Ontology Databases: Active Database Approach

Male: P-0001, P-0003
Female: P-0002
Person: P-0001, P-0002, P-0003, P-0004
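The "ON INSERT into Male INSERT into Person" rules are written in an informal notation. One way they might look as concrete triggers is sketched below in SQLite, again via Python's sqlite3. The trigger names and the use of INSERT OR IGNORE to skip duplicates are assumptions, not details from the talk. The inserts replay the sequence shown on the preceding slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id TEXT PRIMARY KEY);
CREATE TABLE Male   (id TEXT PRIMARY KEY);
CREATE TABLE Female (id TEXT PRIMARY KEY);

-- Hypothetical trigger names; each one forward-propagates a subclass
-- insert into the superclass table at load time.
CREATE TRIGGER male_is_person AFTER INSERT ON Male
BEGIN
    INSERT OR IGNORE INTO Person (id) VALUES (NEW.id);
END;

CREATE TRIGGER female_is_person AFTER INSERT ON Female
BEGIN
    INSERT OR IGNORE INTO Person (id) VALUES (NEW.id);
END;
""")

# Same order as the slide build: two males, one female, one plain person.
conn.execute("INSERT INTO Male   VALUES ('P-0001')")
conn.execute("INSERT INTO Female VALUES ('P-0002')")
conn.execute("INSERT INTO Male   VALUES ('P-0003')")
conn.execute("INSERT INTO Person VALUES ('P-0004')")

# No view and no reasoning at query time: Person already holds the closure.
person_ids = [row[0] for row in conn.execute("SELECT id FROM Person ORDER BY id")]
print(person_ids)
```

Compared with the view-based sketch, the inference cost moves from query time to load time, which is the trade-off the load-time and query-time benchmark later in the deck is about.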

Page 23

Example: sisters-siblings (revisited)

This is what we know:

All sisters are siblings.

Hilary and Lynn are sisters.

This is what we want to know:

Who are siblings?

Obviously, the answer should be:

Hilary and Lynn are siblings.

Page 24

Example: sisters-siblings (revisited)

SisterOf: (Hilary, Lynn)
SiblingOf: (empty)

Page 25

Example: sisters-siblings (revisited)

SisterOf: (Hilary, Lynn)
SiblingOf: (Hilary, Lynn)

Page 26

Example: sisters-siblings (revisited)

SisterOf: (Hilary, Lynn)
SiblingOf: (Hilary, Lynn)

{ <x,y> | siblingOf(x,y) }

Page 27

Example: sisters-siblings (revisited)

{ <x,y> | siblingOf(x,y) }

Just look it up!

Page 28

Example: sisters-siblings (revisited)

{ <x,y> | siblingOf(x,y) }

Just look it up!

{ <Hilary, Lynn> }

Page 29

Lehigh University Benchmark (LUBM)

Load Time and Query Time

(1.5 million facts)
(10 Universities, 20 Departments)

[Guo, et al. LUBM: A Benchmark for OWL Knowledge Base Systems. J Web Semantics, 2005.]

Page 30

Ontology-based Data Management

[Frishkoff, et al. Development of Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials. ICBO, 2009]

Page 31

Ontology-based Query Answering

Return all data instances that belong to ERP pattern classes which have a surface positivity over frontal regions of interest and are earlier than the N400.

Which patterns have a region of interest that is left-occipital and manifests between 220 and 300ms?

What is the range of intensity mean for the region of interest for N100?

Show the region of interest for all ERP patterns that occur between 0 and 300ms.

Which PCA factor do P100 patterns most often appear in?

What is the range of intensity mean for the region of interest for N100 patterns?

Show the patterns whose region of interest is left occipital and occurs between 220 and 300ms.

Page 32

Inconsistency Detection

• Background and Motivation
  – Expressiveness
  – From disjunctions to negations

• Theory
  – Not-gadgets

• Motivation
  – Serotonin example
  – ATP-gated cation channel activity

• Results from ZFIN and MGI Annotations

Page 33

Not-gadgets

¬ →

¬

Page 34

Example: inconsistency detection

"Annotations in this way sometimes point to errors in the type-type relationships described in the ontology. An example is the recent removal of the type serotonin secretion as an is_a child of neurotransmitter secretion from the GO Biological Process ontology. This modification was made as a result of an annotation from a paper showing that serotonin can be secreted by cells of the immune system where it does not act as a neurotransmitter."

[Hill, et al. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics, 2008]

Page 35

Example: serotonin secretion

[Diagram labels: gene-x, gene-x, not-gadget: fail!]
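The failure on this slide can be sketched by combining an is_a propagation trigger with a not-gadget. Everything concrete below is an assumption made for illustration: the table and trigger names, the reading of the scenario (gene-x is annotated to serotonin secretion and also annotated NOT neurotransmitter secretion), and the modeling of a not-gadget as a pair of SQLite triggers that call RAISE(ABORT, ...). The underscore prefix for the negation table follows the convention the deck uses for _GO_0004931.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE serotonin_secretion         (id TEXT PRIMARY KEY);
CREATE TABLE neurotransmitter_secretion  (id TEXT PRIMARY KEY);
CREATE TABLE _neurotransmitter_secretion (id TEXT PRIMARY KEY);  -- negation table

-- The is_a edge that GO originally had: serotonin secretion
-- is_a neurotransmitter secretion.
CREATE TRIGGER serotonin_is_a_neurotransmitter
AFTER INSERT ON serotonin_secretion
BEGIN
    INSERT INTO neurotransmitter_secretion (id)
    SELECT NEW.id
    WHERE NOT EXISTS (SELECT 1 FROM neurotransmitter_secretion WHERE id = NEW.id);
END;

-- Assumed not-gadget: an individual may not sit in both a class table
-- and its negation table, whichever side is inserted second.
CREATE TRIGGER not_gadget_pos
BEFORE INSERT ON neurotransmitter_secretion
WHEN EXISTS (SELECT 1 FROM _neurotransmitter_secretion WHERE id = NEW.id)
BEGIN
    SELECT RAISE(ABORT, 'not-gadget fail');
END;

CREATE TRIGGER not_gadget_neg
BEFORE INSERT ON _neurotransmitter_secretion
WHEN EXISTS (SELECT 1 FROM neurotransmitter_secretion WHERE id = NEW.id)
BEGIN
    SELECT RAISE(ABORT, 'not-gadget fail');
END;
""")

# gene-x is asserted NOT to be involved in neurotransmitter secretion ...
conn.execute("INSERT INTO _neurotransmitter_secretion VALUES ('gene-x')")

# ... and is then annotated to serotonin secretion. The is_a trigger tries
# to propagate it upward, where the not-gadget rejects it.
try:
    conn.execute("INSERT INTO serotonin_secretion VALUES ('gene-x')")
    inconsistency = None
except sqlite3.DatabaseError as exc:
    inconsistency = str(exc)

rows_kept = conn.execute("SELECT COUNT(*) FROM serotonin_secretion").fetchone()[0]
print(inconsistency, rows_kept)
```

Neither annotation is contradictory on its own; the conflict only appears once the is_a edge is applied, which is why such a failure can point at the ontology relationship rather than at the data.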

Page 36

Example: GO:0004931

ATP-gated cation channel activity (as of 3/09):

[Term]
id: GO:0004931
name: ATP-gated cation channel activity
namespace: molecular_function
def: "Catalysis of the transmembrane transfer of an ion by a channel that opens when extracellular ATP has been bound by the channel complex or one of its constituent parts." [GOC:mah, PMID:9755289]
comment: Note that this term refers to an activity and not a gene product. Consider also annotating to the molecular function term 'purinergic nucleotide receptor activity ; GO:0001614'.
synonym: "P2X activity" RELATED []
synonym: "purinoceptor" BROAD []
synonym: "purinoreceptor" BROAD []
is_a: GO:0005231 ! excitatory extracellular ligand-gated ion channel activity
is_a: GO:0005261 ! cation channel activity

Page 37

Example: GO:0004931

GO:0004931 sub-graph (using Jambalaya):

Page 38

Example: GO:0004931

What is so interesting about GO:0004931?

ZFIN ZDB-GENE-030319-2 p2rx2 NOT GO:0004931 ZFIN:ZDB-PUB-031031-8|PMID:14580944 IDA F purinergic receptor P2X, ligand-gated ion channel, 2 gene taxon:7955 20071005 ZFIN

ZFIN ZDB-GENE-030319-2 p2rx2 GO:0004931 ZFIN:ZDB-PUB-031031-8|PMID:14580944 IGI ZFIN:ZDB-GENE-000427-3 F purinergic receptor P2X, ligand-gated ion channel, 2 gene taxon:7955 20071005 ZFIN

Source: [1/13/2009] http://www.geneontology.org/gene-associations/
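What makes the two rows above interesting is that they annotate the same gene to the same term, once with the NOT qualifier and once without. A plain-Python check for such pairs is sketched below. It is a data-level pre-check for illustration, not the database mechanism the talk describes; the tuple layout keeps only six of the annotation columns, and the function name is hypothetical.

```python
from collections import defaultdict

# (db, object_id, symbol, qualifier, go_id, evidence_code), copied from
# the two ZFIN rows on the slide; the other columns are omitted.
annotations = [
    ("ZFIN", "ZDB-GENE-030319-2", "p2rx2", "NOT", "GO:0004931", "IDA"),
    ("ZFIN", "ZDB-GENE-030319-2", "p2rx2", "",    "GO:0004931", "IGI"),
]

def contradictory_pairs(rows):
    """Return (object_id, symbol, go_id) keys annotated both with and without NOT."""
    polarities = defaultdict(set)
    for _db, object_id, symbol, qualifier, go_id, _evidence in rows:
        # Qualifiers may be pipe-separated; NOT marks a negative annotation.
        polarity = "neg" if "NOT" in qualifier.split("|") else "pos"
        polarities[(object_id, symbol, go_id)].add(polarity)
    return sorted(key for key, seen in polarities.items() if seen == {"pos", "neg"})

conflicts = contradictory_pairs(annotations)
print(conflicts)
```

This only catches direct contradictions on the same term. Contradictions that arise through is_a relationships need the ontology structure as well.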

Page 39

Example: GO:0004931

The not-gadget will raise a logical inconsistency.

p2rx2 NOT GO:0004931
p2rx2 GO:0004931

[Diagram: p2rx2 appears in both table GO_0004931 and table _GO_0004931; not-gadget: fail!]

* Tables starting with an '_' are negations.

Page 40

Example: GO:0004931

GO:0004931 sub-graph (using Jambalaya):

Page 41

Example: GO:0004931

GO:0004931 sub-graph (using Jambalaya):

Page 42

Example: GO:0004931

GO:0004931 sub-graph (using Jambalaya):

Page 43

ZFIN

Page 44

ZFIN

Page 45

MGI

Page 46

MGI

Page 47

MGI

Page 48

ZFIN - MGI

Page 49

ZFIN

Page 50

Outcome: suspect IEA annotations

Page 51

GO Online SQL Environment (GOOSE)

Source: [1/13/2009] http://www.geneontology.org/GO.database.shtml#diagram

pos,IEA(graph_path x association) x neg(graph_path x association)
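One plausible reading of the join expression above is: take positive IEA annotations, follow graph_path to each annotated term's ancestors, and join against negative annotations of the same gene product on any of those ancestors. The sketch below runs that reading on a toy SQLite database. The table and column names are simplified stand-ins modeled on the GO database schema and should be treated as assumptions; gp-a and gp-b are made-up gene products, and the direction of graph_path (term1_id as ancestor, term2_id as descendant, with reflexive rows) is also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for GO database tables (assumed columns only).
CREATE TABLE term        (id INTEGER PRIMARY KEY, acc TEXT);
CREATE TABLE graph_path  (term1_id INTEGER, term2_id INTEGER);
CREATE TABLE association (id INTEGER PRIMARY KEY, term_id INTEGER,
                          gene_product TEXT, is_not INTEGER);
CREATE TABLE evidence    (association_id INTEGER, code TEXT);

-- GO:0004931 is_a GO:0005261, as in the term definition shown earlier.
INSERT INTO term VALUES (1, 'GO:0005261'), (2, 'GO:0004931');
INSERT INTO graph_path VALUES (1, 1), (2, 2), (1, 2);

-- Toy annotations: gp-a is positive (IEA) on the child and negative on
-- the parent; gp-b is only positive.
INSERT INTO association VALUES
    (10, 2, 'gp-a', 0),
    (11, 1, 'gp-a', 1),
    (12, 2, 'gp-b', 0);
INSERT INTO evidence VALUES (10, 'IEA'), (11, 'IDA'), (12, 'IEA');
""")

SUSPECT_IEA = """
SELECT pos.gene_product, pos_term.acc, neg_term.acc
FROM association AS pos
JOIN evidence    AS ev  ON ev.association_id = pos.id AND ev.code = 'IEA'
JOIN graph_path  AS gp  ON gp.term2_id = pos.term_id
JOIN association AS neg ON neg.term_id = gp.term1_id
                       AND neg.gene_product = pos.gene_product
                       AND neg.is_not = 1
JOIN term AS pos_term ON pos_term.id = pos.term_id
JOIN term AS neg_term ON neg_term.id = neg.term_id
WHERE pos.is_not = 0
ORDER BY pos.gene_product
"""

# Each row: gene product, positively annotated term, negated ancestor term.
suspects = conn.execute(SUSPECT_IEA).fetchall()
print(suspects)
```

Restricting the positive side to IEA evidence matches the outcome named in the talk, where the suspect annotations are the electronically inferred ones.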

Page 52

What do logical inconsistencies mean?

• Several possibilities:
  – Incorrect annotation (e.g., suspect IEA annotations)
  – Incorrect relationship (e.g., serotonin secretion)
  – Incomplete model:

Recall:
ZFIN ZDB-GENE-030319-2 p2rx2 GO:0004931 ZFIN:ZDB-PUB-031031-8|PMID:14580944 IGI ZFIN:ZDB-GENE-000427-3 F purinergic receptor P2X, ligand-gated ion channel, 2 gene taxon:7955 20071005 ZFIN

  – Perfectly admissible!

Page 53

Next Directions

• Explanation and proof-reconstruction ✓
• Deep (data) annotation tools
• Distributed network of Ontology Databases

Page 54

Data Annotation: Neural ElectroMagnetic Ontologies

LFRON

RFRON

frontocentral

[Frishkoff, et al. ERP measures of partial semantic knowledge: Left temporal indices of skill differences and lexical quality. Biological Psychology, 2009.]

Page 55

Network of Ontology Databases

[Thorisson, Muilu and Brookes. Genotype–phenotype databases: challenges and solutions for the post-genomic era. Nature Reviews, 2009.]

Page 56

Thank you

Questions?

Page 57

Page 58

Andrea’s Example

Is John supervised by a TopManager who is a friend of an AreaManager?

{ {Mary/y, Andrea/z}, {Andrea/y, Paul/z} }

[Franconi. Ontologies and databases: myths and challenges. VLDB, 2008.]

Page 59

Raymond Reiter

[Reiter. Deductive Question-Answering on Relational Data Bases. Logic and Data Bases, 1977]

Page 60

Raymond Reiter

Page 61

Raymond Reiter

Page 62

Raymond Reiter

Page 63

Page 64

Benchmarking Suite

Page 65

Origins

Page 66

CIS @ UO

Page 67

CIS @ UO

• Research Areas in Computer Science:
  – software engineering
  – programming languages
  – human-computer interaction
  – parallel and distributed computing
  – networking and graph theory
  – scientific computation/visualization
  – information integration and mining

• Affiliates:
  – Neurosciences Institute
  – Computational Science Institute
  – Zebrafish Information Network

Page 68

Ontology-based Data Access

[Rodriguez-Muro, et al. Realizing Ontology Based Data Access: A plug-in for Protégé. ICDEW, 2008.]