semantic concept-based queries in onix cagrid · pdf fileoutline problem statement high-level...
TRANSCRIPT
Semantic concept-based queries in ONIXcaGrid case
Alejandra González BeltránJoint work with Anthony Finkelstein (UCL), J Max Wi lkinson (NCRI) and Jeff Kramer (ICL)
caBIG Arch/VCDE F2FMeeting
11th May 2009
Outline
� Problem statement� High-level concept-based queries for caGrid resources
implemented in ONIX� Motivating Example
� Approach� Exploiting the combination of Semantic Web technologies and
caGrid� UML vs OWL
� System Architecture� OWL Generation� Query Rewriting and Translation
� Concluding remarks
Problem Statement
� Support for high-level descriptive queries based on domain concepts� High-level query: the user may be unaware of the structure of the
particular target resources� Descriptive query: the query is not given in terms of how to find
data, but gives the criteria for the desired data
� caGrid case: exploit the rich metadata infrastructure to support high-level queries based on NCIt concepts and its automatic translation to CQL
Motivating example
� A biomedical scientist is interested in the gene Transforming Growth Factor Beta 1 (TGFB1), which has been implicated in many diseases, including cancer� Interest in examining the effect of all or certain SNPs present in the
gene TGFB1.� Need to retrieve SNPs associated with the gene TGFB1 from a
data source
� caBIO is an appropriate target resource [Komatsoulis’08]
� caBIO model 4.2 is the latest version, but 4.0 is the latest version
Motivating Example in CQL
<ns1:CQLQuery xmlns:ns1=" http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"><ns1:Target name=" gov.nih.nci.cabio.domain.SNP"><ns1:Association
name=" gov.nih.nci.cabio.domain.GeneRelativeLocation" roleName=" relativeLocationCollection">
<ns1:Association name=" gov.nih.nci.cabio.domain.Gene" roleName=" gene">
<ns1:Attribute name=" symbol" predicate= " EQUAL_TO" value=" TGFB1"/></ns1:Association></ns1:Association>
</ns1:Target></ns1:CQLQuery>
Motivating example – Annotated UML model
Single_Nucleotide_Polymorphism
Gene_Relative_Location
Gene
Gene_Symbol
Relative_Value
Chromosome
Location
Description Logic query(DL-query) in Manchester OWL Syntax
Motivating example – Query in terms of NCItconcepts
� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”
� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)
CQL query
Description Logic query(DL-query) in Manchester OWL Syntax
Motivating example – Query in terms of NCItconcepts
� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”
� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)
CQL query
Description Logic query(DL-query) in Manchester OWL Syntax
Motivating example – Query in terms of NCItconcepts
� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”
� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)
CQL query
Description Logic query(DL-query) in Manchester OWL Syntax
Motivating example – Query in terms of NCItconcepts
� Find objects that have concept Single_Nucleotide_Polymorphism and have an association with objects whose concept is Gene, which in turn have an attribute with concept Gene_Symbol, whose value is “TGFB1”
� hasConcept some Single_Nucleotide_Polymorphism and hasAssociation some (hasConcept some Gene and hasAttribute some(hasConcept some Gene_Symbol and hasValue value “TGFB1”)
CQL query
Outline
� Problem statement� High-level concept-based queries for caGrid resources
implemented in ONIX� Motivating Example
� Approach� Exploiting the combination of Semantic Web technologies and
caGrid� UML vs OWL
� System Architecture� OWL Generation� Query Rewriting and Translation
� Concluding remarks
ApproachR
ules
Syntax
Structure
Reference
Domain EVS
caDSR
Ontologicalrepresentations of
data services’ metadata
Exploit caGrid rich metadata
infrastructure
InformationModels/ GME
Instance Data
Metadata Hierarchy [Pollock’04]
caGridData Service
ApproachR
ules
Syntax
Structure
Reference
Domain EVS
caDSR
Ontologicalrepresentations of
data services’ metadata
Exploit caGrid rich metadata
infrastructure
InformationModels/ GME
Instance Data
Metadata Hierarchy [Pollock’04]
caGridData Service
Annotated UML Web Ontology Language (OWL)
Software design and analysis
Closed World Assumption
Main constructs:� UML class: a type for a set of objects with common characteristics (UML attributes)� UML Associations� Generalisations: classes and associations
No formal semantics
Visual modelling language
Open World Assumption
Reasoning (or inference) capabilities
Main constructs:� OWL class: set of individuals� Object properties: relationships between individuals� Datatype properties: relationships between individuals and data values
Formal semantics: DLs
Textual modelling language
Semantic web, knowledge representation
UML OWL
Outline
� Problem statement� High-level concept-based queries for caGrid resources
implemented in ONIX� Motivating Example
� Approach� Exploiting the combination of Semantic Web technologies and
caGrid� UML vs OWL
� System Architecture� OWL Generation� Query Rewriting and Translation
� Concluding remarks
Architecture
FQP servicecaDSR service
EVS service
SemanticFQP service
OWL generationservice
OWL repositoryservice
caGriddata service
CQL
DCQLCQL
ONIX Portal
DCQL
concept-basedquery
� Services for data integration developed within the Ontology-based Query WG can extend this architecture
OWL generation from caGrid information models
� Analysis of existing UML-to-OWL transformation
� Custom UML-to-OWL transformation considering semantic annotations
� Two generation methods� Semantic data integration:
e.g. multiplicities [McCusker’09]
� Semantic queries: e.g. associations are transitive, only existential restrictions
NCIt
Data Service MetadataOntology
Annotated UMLontology
NCIt Module forData Service
Metadata
NCIt Module forData Service
Metadata
<<imports>> <<imports>>
<<depends>>
UMLontology
<<imports>>
Annotated UML model
Single_Nucleotide_Polymorphism
Gene_Relative_Location
Gene
Gene_Symbol
Relative_Value
Chromosome
Location
Part of caBIO 4.0 ontology
cb:GeneRelativeLocationcb:SNP
nci:Single_Nucleotide_Polymorphism
cb:relativeLocation
nci:Gene
cb:Gene
nci:Gene_Relative_Location
sq:hasConcept sq:hasConcept sq:hasConcept
sq:hasAssociation
rdfs:subPropertyOf
cb:gene
rdfs:subPropertyOf
Q ≅ hasConcept some Single_Nucleotide_Polymorphismand hasAssociation some (hasConcept some GeneConcept and
hasAttribute some (hasConcept some Gene_Symbol andhasValue value “TGFB1”))
<ns1:CQLQuery xmlns:ns1=" http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"><ns1:Target name=" gov.nih.nci.cabio.domain.SNP"><ns1:Association
name=" gov.nih.nci.cabio.domain.GeneRelativeLocation" roleName=" relativeLocationCollection">
<ns1:Association name=" gov.nih.nci.cabio.domain.Gene" roleName=" gene">
<ns1:Attribute name=" symbol" predicate=“ EQUAL_TO" value=" TGFB1"/></ns1:Association>
</ns1:Association></ns1:Target>
</ns1:CQLQuery>
Semantic Query ServiceQuery rewriting and translation
DL- Query Parsing
DL-query to OQOL
OQOL to CQL
DL- Query Validation
DL- Query RewritingData Service
Ontology
Syntactic analysis of query
Semantic analysis of query: it must entail a UML class or UML attributes to be valid
Reformulate query in terms of data service constructs using explanations [Kalyanpur’07]
Translate DL-query into the Object-Query Optimisation Language(monoid comprehensions) & Normalization [Fegaras’00].
Translate OQOL expression to CQL
DL-query
CQL
Conclusions
� Support for high-level description queries based on domain concepts for caGrid data services
� Approach generates ontologies from caGrid metadata and uses ontologies+reasoning to translate concept-based queries to CQL� No need for the user to be aware of the structure of the target
resource
� Approach is applicable to other resources exposing metadata and semantic annotations
Future Work� Extend approach to support DCQL queries� Incorporate caDSR semantics to OWL generator
References
[Komatsoulis’08] G Komatsoulis, D Warzel, F Hartel, K Shanbag, et al. caCORE version 3: Implementation of a model-driven, service-oriented architecture for semantic interoperability. Journal of Biomedical Informatics, 41(1):106-123.
[McCusker’09] J McCusker, J Phillips, A González Beltrán, A Finkelstein, Semantic web datawarehousing for caGrid. To appear in BMC Bioinformatics, 2009.
[Kalyanpur’07] A Kalyanpur, B Parsia, M Horridge, E Sirin. Finding all justifications of OWL DL entailments. ISWC/ASWC, 267-280.
[Fegaras’00] L Fegaras, D Maier. Optimizing Object Queries Using an Effective Calculus. ACM Trans. Database Syst. 25(4):457-516,2000