Barcelona
Annual Conference
Monday, 10th October 2016
Semantics 101 for Pharma
Tim Williams, UCB Biosciences Inc., USA
Marc Andersen, StatGroup ApS, Denmark
37
Everything has a unique, linkable reference.
38
Resource Description Framework (RDF)
• Semantic Web
• Clinical Trials Context
• Querying
• Creating
• Data Cubes Use Case
39
Explore a Study: https://www.clinicaltrials.gov/
“Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”
Without knowing anything about triples!
40
Find the NCTID
41
Explore NCTID Linked Data: http://lod.openlinksw.com/describe/?uri=http://bio2rdf.org/clinicaltrials:NCT00799760
42
Subject Predicate Object
Subject: http://bio2rdf.org/clinicaltrials:NCT00799760, labelled NCT00799760 “Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”
• type: http://www.w3.org/1999/02/22-rdf-syntax-ns#type → http://bio2rdf.org/clinicaltrials_vocabulary:Clinical-Study (Clinical Study)
• phase: http://bio2rdf.org/clinicaltrials_vocabulary:phase → Phase 3 code
• condition: http://bio2rdf.org/clinicaltrials_vocabulary:condition → Gastric Influenza code
The Phase 3 and Gastric Influenza codes are themselves resources: http://bio2rdf.org/clinicaltrials_resource:f773736eaf3a1da739bc23f48dae6954 and http://bio2rdf.org/clinicaltrials_resource:8357418e2694434468870b487644532d
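Every arrow in the graph is one triple. As a minimal Python sketch (not part of the original material), the rdf:type triple from this slide can be written out in N-Triples, the line-based syntax in which each full URI sits in angle brackets and each triple ends with a full stop. The function name to_ntriples is illustrative, and it handles URIs only, not literals or blank nodes.

```python
# A triple is a (subject, predicate, object) tuple of full URIs.
# Sketch only: literals and blank nodes are not handled here.
Triple = tuple[str, str, str]


def to_ntriples(triples: list[Triple]) -> str:
    """Write URI-only triples in N-Triples syntax, one '<s> <p> <o> .' per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


graph = [
    (
        "http://bio2rdf.org/clinicaltrials:NCT00799760",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://bio2rdf.org/clinicaltrials_vocabulary:Clinical-Study",
    ),
]

print(to_ntriples(graph))
```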
43
Terse RDF Triple Language (Turtle)
@prefix ns1: <http://bio2rdf.org/clinicaltrials:> .
@prefix ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:> .
@prefix ns3: <http://bio2rdf.org/clinicaltrials_resource:> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
The same graph with prefixed names (Subject Predicate Object):
ns1:NCT00799760 rdf:type ns2:Clinical-Study .
ns1:NCT00799760 ns2:phase → Phase 3 code
ns1:NCT00799760 ns2:condition → Gastric Influenza code
Code resources: ns3:f773736eaf3a1da739bc23f48dae6954 and ns3:8357418e2694434468870b487644532d
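Prefixes are only shorthand: a prefixed name expands to the namespace URI followed by the local part. A minimal Python sketch, assuming the slide's four prefix declarations; the helper name expand is hypothetical, and a real Turtle parser also applies escaping and local-name rules that this ignores.

```python
# The slide's prefix declarations as a prefix -> namespace map.
PREFIXES = {
    "ns1": "http://bio2rdf.org/clinicaltrials:",
    "ns2": "http://bio2rdf.org/clinicaltrials_vocabulary:",
    "ns3": "http://bio2rdf.org/clinicaltrials_resource:",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as ns2:phase into a full URI."""
    # Split at the first colon only; the namespace itself may end in ':'.
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("ns1:NCT00799760"))
print(expand("rdf:type"))
```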
44
Storing RDF: Triple Stores
Native
• 4Store http://www.4store.org/
• AllegroGraph http://franz.com/agraph/allegrograph/
• Apache Jena TDB http://jena.apache.org/
• GraphDB http://ontotext.com/products/graphdb/
• MarkLogic http://www.marklogic.com
DBMS-backed
• Apache Jena SDB http://jena.apache.org/
• Oracle Spatial and Graph http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html
Hybrid
• Sesame http://rdf4j.org/
• Virtuoso http://virtuoso.openlinksw.com/
List at the W3C: https://www.w3.org/2001/sw/wiki/Category:Triple_Store
Adapted from Dr. Harald Sack, Knowledge Engineering with Semantic Web Technologies, 2015
45
Resource Description Framework (RDF)
• Semantic Web
• Clinical Trials Context
• Querying
• Creating
• Data Cubes Use Case
46
Query RDF with SPARQL
• SPARQL: SPARQL Protocol And RDF Query Language
• Not limited to RDF
– Utilities let SPARQL query relational databases, spreadsheets, XML and JSON
• Protocol
– Rules for sending queries to a service and exchanging the results
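The protocol part of SPARQL is plain HTTP. Per the SPARQL 1.1 Protocol, a query can be sent with GET and a URL-encoded query parameter, and the Accept header selects the result format. A sketch using Python's standard library that builds such a request without sending it; the function name is illustrative, and the endpoint is the public one named in these slides, which may no longer be online.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL 1.1 Query Results JSON Format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(
    "http://lod.openlinksw.com/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
print(req.get_method(), req.full_url)
# To execute: urllib.request.urlopen(req).read() would return the JSON result document.
```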
47
Graph Path for Primary Outcome
Query: SELECT ?outcome
ns1:NCT00799760 ns2:primary-outcome ?outURI .
?outURI ns3:title ?outcome .
Data: ?outcome matches "RT-PCR for influenza A virus…"@en
48
Graph Path for Primary Outcome
ns1:NCT00799760 ns2:primary-outcome ns3:d821848f0fb8dc44f390a40e066e9224
ns3:d821848f0fb8dc44f390a40e066e9224 ns4:title “RT-PCR for influenza A virus in nasal secretion 2 days”@en
• ns3:d821848f0fb8dc44f390a40e066e9224: primary outcome URI
• ns4:title value: primary outcome title as a human-readable code list value
49
SPARQL Query for Primary Outcome
PREFIX ns1: <http://bio2rdf.org/clinicaltrials:>
PREFIX ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:>
PREFIX ns3: <http://purl.org/dc/terms/>
SELECT ?outcome
WHERE
{
  ns1:NCT00799760 ns2:primary-outcome ?outURI .
  ?outURI ns3:title ?outcome .
}
Retrieve data that matches the graph pattern: NCTID → primary-outcome → ?outURI → title → ?outcome
Try it at: http://lod.openlinksw.com/sparql
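Matching a graph pattern means finding variable bindings that make every triple pattern true at once. A minimal Python sketch of the idea (naive nested-loop evaluation, not how a triple store optimises it), using the two triples from the primary-outcome graph path. Assumptions: terms are kept as prefixed-name strings, dcterms:title stands in for the title property (the query slide calls it ns3:title), and the @en language tag is dropped.

```python
# Triples from the primary-outcome graph path, as prefixed-name strings.
DATA = [
    ("ns1:NCT00799760", "ns2:primary-outcome", "ns3:d821848f0fb8dc44f390a40e066e9224"),
    ("ns3:d821848f0fb8dc44f390a40e066e9224", "dcterms:title",
     "RT-PCR for influenza A virus in nasal secretion 2 days"),
]


def match(pattern, triples, binding):
    """Yield extended bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield new


def evaluate(patterns, triples):
    """Evaluate a basic graph pattern by joining its triple patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return bindings


query = [
    ("ns1:NCT00799760", "ns2:primary-outcome", "?outURI"),
    ("?outURI", "dcterms:title", "?outcome"),
]
print([b["?outcome"] for b in evaluate(query, DATA)])
```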
50
Query using:
• SPARQL Endpoint
– Example: lod.openlinksw.com/sparql
• R with package rrdf (see exercises)
• SAS macro, PROC GROOVY (see exercises)
See Exercises
51
Query with R
R packages:
• rrdf
• rrdflibs
http://github.com/egonw/rrdf
Requires Java 7 or higher
Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3. See https://dx.doi.org/10.7287/peerj.preprints.185v3
52
Query an Endpoint with R
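# rrdf needs Java 7 or higher
library(rrdf)
# Local SPARQL service, for example an Apache Jena Fuseki server with a dataset named "test"
endpoint <- "http://localhost:3030/test/query"
# Fetch the first 10 triples in the store
query <- "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"
queryResult <- sparql.remote(endpoint, query)
queryResult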
See Exercises
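Whichever client sends the query, the endpoint answers in a standard result format; the SPARQL 1.1 Query Results JSON Format is one of them (XML and CSV/TSV are others). A Python sketch of flattening such a document into rows, roughly the table a data frame would hold. The result values are hypothetical, and to_rows is an illustrative name, not part of rrdf.

```python
import json

# A small result document in the SPARQL 1.1 Query Results JSON Format
# (hypothetical values, shaped like the output of SELECT * { ?s ?p ?o }).
RESULT = """
{"head": {"vars": ["s", "p", "o"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/s1"},
    "p": {"type": "uri", "value": "http://example.org/p1"},
    "o": {"type": "literal", "value": "42"}}
 ]}}
"""


def to_rows(document: str) -> list[dict]:
    """Flatten SPARQL JSON results into one dict per solution, like a data frame row."""
    parsed = json.loads(document)
    variables = parsed["head"]["vars"]
    rows = []
    for binding in parsed["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


print(to_rows(RESULT))
```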
53
Query with SAS
SAS macros:
• %sparqlquery - SPARQL query
• %sparqlupdate - SPARQL update
https://github.com/MarcJAndersen/SAS-SPARQLwrapper
Implementation:
• SAS PROC HTTP to access the service
• Send the query/update as a text file
• Read the result using the SAS XML LIBNAME engine
Other approaches:
• PROC GROOVY to execute Java code from Apache Jena (see directory show-res-sas in https://github.com/MarcJAndersen/poc-analysis-results-metadata)
• SAS Java objects to interface to Apache Jena
Requires a running SPARQL service, for example Apache Jena
See Exercises
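A SPARQL update travels over the same protocol: per the SPARQL 1.1 Protocol it can be sent as an HTTP POST with a URL-encoded update parameter. The Python sketch below builds such a request without sending it. It is not the code of the SAS macros, and the endpoint http://localhost:3030/test/update and the inserted triple are hypothetical, chosen to mirror the local service in the R example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_update_request(endpoint: str, update: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update-via-URL-encoded-POST request."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical local update endpoint and example triple.
req = sparql_update_request(
    "http://localhost:3030/test/update",
    'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }',
)
print(req.get_method(), req.full_url)
```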
54
Query a Remote Source
At: http://lod.openlinksw.com/sparql
55
Which variables are used?
Given: CSR appendix 14.1 as RDF data cubes
SPARQL query using property paths:
select distinct ?column
where {
  ?ds a qb:DataSet ;
      (<>|!<>)*/rrdfqbcrnd0:D2RQ-PropertyBridge/^d2rq:property/d2rq:column ?column .
}
order by ?ds
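The path (<>|!<>)* matches zero or more steps over any predicate, because every predicate either is <> or is not <>; ^d2rq:property walks that predicate backwards. A Python sketch of evaluating the path step by step over a small hypothetical graph; all node names and the column value are invented for illustration.

```python
from collections import deque

# Hypothetical toy graph: dataset -> structure -> component -> reference -> bridge,
# plus a D2RQ mapping node that points at the bridge via d2rq:property.
TRIPLES = [
    ("ds:table1", "qb:structure", "ex:dsd"),
    ("ex:dsd", "qb:component", "ex:comp"),
    ("ex:comp", "rrdfqbcrnd0:DataSetRefD2RQ", "ex:ref"),
    ("ex:ref", "rrdfqbcrnd0:D2RQ-PropertyBridge", "ex:bridge"),
    ("map:trt", "d2rq:property", "ex:bridge"),
    ("map:trt", "d2rq:column", "ADSL.TRT01A"),
]


def any_star(start):
    """Nodes reachable by (<>|!<>)*: zero or more forward steps over any predicate."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, _, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def step(nodes, predicate, inverse=False):
    """One path step: follow predicate forwards, or backwards for ^predicate."""
    if inverse:
        return {s for s, p, o in TRIPLES if p == predicate and o in nodes}
    return {o for s, p, o in TRIPLES if p == predicate and s in nodes}


nodes = any_star("ds:table1")
nodes = step(nodes, "rrdfqbcrnd0:D2RQ-PropertyBridge")
nodes = step(nodes, "d2rq:property", inverse=True)
columns = step(nodes, "d2rq:column")
print(columns)
```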
56
Details (1/2)
select distinct ?column
where {
  {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }
(continued on next slide)
57
(continued from previous slide)
  union {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList skos:hasTopConcept ?codeValue .
    ?codeValue rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }
}
58
Which data are used for a result?
select ?s ?obs
where {
  ?s ?variable ?value .
  {
    select
      (iri(concat('http://www.example.org/datasets/vocab/',
        replace(str(?vnop), 'http://www.example.org/rrdfqbcrnd0/([A-Z0-9_]+)$', '$1', 'i'))) as ?variable)
      ?matchvalue
      ?obs
    where {
      ?obs ?dim ?codevalue .
      ?dim a qb:DimensionProperty .
      ?codelist skos:hasTopConcept ?codevalue .
      ?codelist rrdfqbcrnd0:DataSetRefD2RQ ?vnop .
      ?codelist rrdfqbcrnd0:R-columnname ?vn .
      ?codelist rrdfqbcrnd0:codeType ?vct .
      ?codevalue skos:prefLabel ?clprefLabel .
      ?codevalue rrdfqbcrnd0:R-selectionoperator '==' .
      ?codevalue rrdfqbcrnd0:R-selectionvalue ?matchvalue .
      values (?obs) { (ds:obs223) }
    }
  }
  BIND(IF(?value != ?matchvalue, 1, 0) AS ?notequal)
}
group by ?s ?obs
having (SUM(?notequal) = 0)
order by ?s
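The GROUP BY / HAVING pattern keeps a subject only when none of its values differs from the selection values of the observation. A Python sketch of the same logic on hypothetical rows and conditions; as in the query, only variables that appear in the conditions take part, and a subject with none of those variables drops out entirely.

```python
from collections import defaultdict

# Hypothetical source rows (subject, variable, value).
ROWS = [
    ("subj:1", "TRT01A", "Placebo"), ("subj:1", "SEX", "F"),
    ("subj:2", "TRT01A", "Placebo"), ("subj:2", "SEX", "M"),
    ("subj:3", "TRT01A", "Drug A"), ("subj:3", "SEX", "F"),
]
# Hypothetical selection conditions for one observation: variable == value.
CONDITIONS = {"TRT01A": "Placebo", "SEX": "F"}


def contributing_subjects(rows, conditions):
    """Mirror GROUP BY ?s HAVING(SUM(?notequal)=0): keep subjects with no mismatches."""
    mismatches = defaultdict(int)
    for s, variable, value in rows:
        if variable in conditions:  # the join on ?variable
            # BIND(IF(?value != ?matchvalue, 1, 0) AS ?notequal), summed per subject
            mismatches[s] += int(value != conditions[variable])
    return sorted(s for s, n in mismatches.items() if n == 0)


print(contributing_subjects(ROWS, CONDITIONS))
```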
59
Federated Query: Join data across sources
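A federated query sends part of the query to another endpoint (in SPARQL 1.1 with SERVICE <endpoint> { … }) and joins the returned solutions with the local ones on their shared variables. A Python sketch of just that join step, with hypothetical solutions; it is a naive nested-loop join, whereas real engines use smarter strategies.

```python
# Hypothetical solutions from two sources that share the variable "study".
LOCAL = [{"study": "ns1:NCT00799760", "result": "ex:res1"}]
REMOTE = [
    {"study": "ns1:NCT00799760", "phase": "Phase 3"},
    {"study": "ex:study2", "phase": "Phase 2"},
]


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """SPARQL-style join: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


print(join(LOCAL, REMOTE))
```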
60
61
More SPARQL
SPARQL Query Language for RDF https://www.w3.org/TR/rdf-sparql-query/
SPARQL 1.1 Query Language https://www.w3.org/TR/sparql11-query/
“Learning SPARQL” - Bob DuCharme
http://www.learningsparql.com/index.html - examples for download