Requirements for Semantic Biobanks
André Q ANDRADE a,b, Markus KREUZTHALER b, Janna HASTINGS d,e, Maria KRESTYANINOVA f,g, Stefan SCHULZ b,c
a School of Information Science, Federal University of Minas Gerais, Brazil
b Medical University of Graz, Austria
c University Medical Center Freiburg, Germany
d European Bioinformatics Institute, Hinxton, UK
e University of Geneva, Switzerland
f Helsinki University, Finland
g Uniquer, Lausanne, Switzerland
• Semantic interoperability: systems exchange data + meaning
• Formal ontologies provide unambiguous descriptions of what is universally true for all objects of a certain type
• An increasing number of biomedical vocabularies are ontology based (OBO Foundry, SNOMED CT…)
• Blood, tissue sampling for research
• Samples from several biobanks are needed to retrieve data for a specific research question
• Comprehensive annotations with lab data and clinical data
[Figure labels: Semantic, Biobanks, Model of Meaning, Data]
(Generalized) Biomedical Retrieval Scenario
• Retrieval:
– Distribution of heterogeneous resources of interest
– Most retrieval scenarios recall-oriented
• Resources used by multiple researchers over the world for multiple purposes
• Effective retrieval depends on querying resource metadata
– Provenance information
– Content-based semantic annotations (structured vocabulary)
– Access regulations
Does this sound familiar?
Analogy
• Global bibliographic database
• Resources: publications from different publishers
• Annotations:
– Bibliographic data
– Abstract
– Semantic representation (MeSH) of paper content
• Local access conditions to the full resource apply
Biobank “Broker”
• Global biobank sample database
• Resources: biological specimens (blood, tissue,…)
• Annotations:
– Sample information (staining etc…)
– Semantic representation of both lab and selected patient related information (Information models / ontologies)
• Local access conditions to the full resource apply
• Sample related information:
– Type of sample
– Preparation of sample
– Time
– Storage information
– Physical location
– Associated information, lab data, genotype,…
• Donor related information:
– Demographic data
– Phenotype data
– Time indexed clinical data (EHR extracts)
• Relevant donor related information keeps accumulating after samples are taken
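The sample related and donor related items listed above can be pictured as two linked record types. The following is a minimal sketch only: every class and field name is a hypothetical illustration and is not taken from any actual biobank information model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ClinicalEvent:
    """One time indexed entry from an EHR extract (hypothetical structure)."""
    code: str          # placeholder for a terminology code, e.g. a SNOMED CT concept
    recorded_on: date


@dataclass
class Donor:
    """Donor related information (hypothetical fields)."""
    donor_id: str
    birth_year: int                                     # demographic data
    phenotype: List[str] = field(default_factory=list)  # phenotype data
    events: List[ClinicalEvent] = field(default_factory=list)

    def add_event(self, event: ClinicalEvent) -> None:
        # Donor information can still grow after the sample was taken.
        self.events.append(event)
        self.events.sort(key=lambda e: e.recorded_on)


@dataclass
class Sample:
    """Sample related information (hypothetical fields)."""
    sample_id: str
    sample_type: str
    preparation: str
    collected_on: date
    storage: str
    location: str
    donor: Donor
```

Because a `Sample` holds a reference to its `Donor`, clinical events recorded years after collection remain reachable from the sample record without touching the sample itself.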
Data resources for biobanking
[Figure: timeline axis, 1960 to 2010]
Centralized broker for biobanking information
[Figure: four Biobank + EHR pairs]
Language for semantic annotations of biobank data
• Formal ontologies
– Precise, logical descriptions of annotations and queries
– High expressiveness through compositionality
– OWL DL: Semantic Web standard for description logics; allows formulating axioms about what is universally true of all instances of a kind
• Specific components
– Ground axioms provided by an upper level ontology (BioTop)
– Set of disjoint upper level categories and relations, together with related constraints
– Ontological description of domain: SNOMED CT, OBO Foundry…
Description logics representation and retrieval
• Example query: “retrieve all gastric mucosa samples from before 2003 of patients who had cancer of stomach after 2008”
• Representation language: OWL DL
• Editor: Protégé 4.2
• Reasoner: HermiT
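To make the logic of the example query concrete, here is a small sketch in plain Python. It does not use OWL DL or HermiT: a hand-written walk over a toy is-a table stands in for the subsumption reasoning a DL reasoner would provide. All class labels, sample records and diagnoses are invented for illustration, and "before 2003" / "after 2008" are read as calendar-year boundaries, which is an assumption.

```python
from datetime import date

# Toy is-a hierarchy; labels are made up and stand in for ontology classes.
IS_A = {
    "gastric_mucosa": "stomach_tissue",
    "stomach_tissue": "tissue",
    "gastric_adenocarcinoma": "cancer_of_stomach",
    "cancer_of_stomach": "neoplasm",
}


def is_subsumed_by(code, ancestor):
    """Walk up the toy hierarchy; True if `code` is `ancestor` or a descendant."""
    while code is not None:
        if code == ancestor:
            return True
        code = IS_A.get(code)
    return False


# Invented sample annotations and donor diagnoses.
SAMPLES = [
    {"id": "S1", "type": "gastric_mucosa", "collected": date(2001, 5, 2), "donor": "D1"},
    {"id": "S2", "type": "gastric_mucosa", "collected": date(2005, 1, 10), "donor": "D1"},
    {"id": "S3", "type": "gastric_mucosa", "collected": date(2002, 3, 3), "donor": "D2"},
]
DIAGNOSES = {
    "D1": [("gastric_adenocarcinoma", date(2009, 7, 1))],
    "D2": [("cancer_of_stomach", date(2006, 2, 2))],
}


def retrieve(samples, diagnoses):
    """Gastric mucosa samples from before 2003 whose donor had stomach cancer after 2008."""
    hits = []
    for s in samples:
        if not is_subsumed_by(s["type"], "gastric_mucosa"):
            continue
        if s["collected"] >= date(2003, 1, 1):
            continue
        donor_dx = diagnoses.get(s["donor"], [])
        if any(is_subsumed_by(code, "cancer_of_stomach") and when > date(2008, 12, 31)
               for code, when in donor_dx):
            hits.append(s["id"])
    return hits


print(retrieve(SAMPLES, DIAGNOSES))
```

The donor of S1 is annotated with a more specific diagnosis than the query asks for; the match relies on subsumption, which is the part of the retrieval that ontology based annotation is meant to support.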
Requirements
• Formal representations
– Ontological representation of information models and terminologies
– Ontological representation of data about specimens
– Joint, universally used clinical terminology
– Expressive and stable upper level ontologies (+ ontological relations)
• Scope and granularity of EHR extract of interest for biobank related queries
• Specification of structure and function of central repository
• Steps for information translation from legacy systems
– Mappings
– Interfaces
– Update policies
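The mapping step for legacy systems could, in its simplest form, be a lookup from each biobank's local codes to a shared vocabulary. The sketch below is illustrative only: the biobank names, local codes and shared labels are all hypothetical, and a real mapping would target an agreed terminology rather than free-text labels.

```python
# Hypothetical per-biobank mapping tables: local code -> shared label.
LOCAL_TO_SHARED = {
    "biobank_a": {"MAG": "gastric_mucosa", "BLD": "blood"},
    "biobank_b": {"stomach-muc": "gastric_mucosa"},
}


def translate(source, local_code):
    """Return the shared label, or None when no mapping is recorded."""
    return LOCAL_TO_SHARED.get(source, {}).get(local_code)


def translate_batch(source, records):
    """Split records into translated ones and ones that still need a mapping."""
    mapped, unmapped = [], []
    for rec in records:
        shared = translate(source, rec["type"])
        if shared is None:
            unmapped.append(rec)
        else:
            mapped.append({**rec, "type": shared})
    return mapped, unmapped
```

Returning the unmapped records separately, instead of dropping them, keeps the gaps in a local mapping visible so they can be handled under whatever update policy the repository adopts.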
Challenges
• Prototypical status of DL reasoners and editors
• Performance problems with expressive ontologies
• Modularization of large clinical terminologies in response to data and query under scrutiny
• Organization of
– Central repository
– Local mappings / translations
• Logistics (samples)
• Privacy and IP issues
• Business model