sadi swsip '09 'cause you can't always get what you want!

104
’cause you can’t always GET what you want!

Upload: mark-wilkinson

Post on 16-Dec-2014

2.805 views

Category:

Technology


3 download

DESCRIPTION

My presentation to the IEEE Asia Pacific Services Computing Conference 2009 - Semantic Web Services In Practice (SWSIP 09) track. This show introduces the SADI (Semantic Automated Discovery and Integration) Framework - the replacement for our earlier explorations with the BioMoby project. In this slideshow I explore what SADI is, and why it is able to generate such interesting (and useful!) Semantic-Webby behaviours from Web Services. I also discuss our current research activities around how we are trying to exploit the SADI system to create much more natural query interfaces for Cardiovascular researchers.

TRANSCRIPT

Page 1: SADI SWSIP '09  'cause you can't always GET what you want!

’cause you can’t always GET what you want!

Page 2: SADI SWSIP '09  'cause you can't always GET what you want!

Web Services

vs.

Semantic Web

Page 3: SADI SWSIP '09  'cause you can't always GET what you want!

Web Servicesare not “connected” to

the Semantic Web

Why?

Page 4: SADI SWSIP '09  'cause you can't always GET what you want!

Web ServicesXML + XML Schema

Semantic WebRDF + OWL

Page 5: SADI SWSIP '09  'cause you can't always GET what you want!

Web ServicesPOST of SOAP-XML

Semantic WebGET of RDF-XML

Page 6: SADI SWSIP '09  'cause you can't always GET what you want!

Web ServicesNo (rigorous) semantics

Semantic WebRich, flexible semantics

Page 7: SADI SWSIP '09  'cause you can't always GET what you want!

Web Services&

Semantic Web

Fundamentally different technologies!

Page 8: SADI SWSIP '09  'cause you can't always GET what you want!
Page 9: SADI SWSIP '09  'cause you can't always GET what you want!

>1000 X more data in the Deep Web than in Web pages

In bioinformatics this is primarily databases and

analytical algorithms

Page 10: SADI SWSIP '09  'cause you can't always GET what you want!

Web Service output is

critical to success

for the Semantic Web!!

Page 11: SADI SWSIP '09  'cause you can't always GET what you want!

RESEARCH QUESTION

Page 12: SADI SWSIP '09  'cause you can't always GET what you want!

WITHOUT INVENTING A NEW

TECHNOLOGY OR STANDARD

Page 13: SADI SWSIP '09  'cause you can't always GET what you want!

Can we define

frameworks and/or best practices

that make Web Services

“transparent” to the Semantic Web?

Page 14: SADI SWSIP '09  'cause you can't always GET what you want!

Semantic Web?

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had

not anticipated.

Page 15: SADI SWSIP '09  'cause you can't always GET what you want!

Semantic Web?

If we cannot achieve those two things, then IMO we don’t have a “semantic web”, we only have a distributed (??), linked database… and that isn’t

particularly exciting…

Page 16: SADI SWSIP '09  'cause you can't always GET what you want!

What do we need to do?

Make Web Services exhibit the same “behavior” as GET

Add rich Semantics to Web Services at every level of the WS “stack”

Page 17: SADI SWSIP '09  'cause you can't always GET what you want!

What do we have to work with?(Without inventing new technologies or standards…)

Data

Interface annotations

Page 18: SADI SWSIP '09  'cause you can't always GET what you want!

Founding partner

Semantic Automated Discovery and Integrationhttp://sadiframework.org

Page 19: SADI SWSIP '09  'cause you can't always GET what you want!

History of SADI

Based on observations of the usage and behaviour of BioMoby Semantic Web Services

since 2002

Page 20: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Observation #1:

What does ‘GET’ really do?

What “behaviour” of GET do we need to mimic

in a Web Service?

Page 21: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Observation #1:

GET guarantees the response relates to the input URI in a very precise and predictable way

POST does not…

Web Services have a fundamentally different semantic behaviour than the Semantic Web

Page 22: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Observation #1:

We can fix that!

(without breaking any existing rules or standards!)

Page 23: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Observation #2:

All Web Services in Bioinformatics create implicit biological relationships

between their input and output

Page 24: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Observation #2:

Page 25: SADI SWSIP '09  'cause you can't always GET what you want!

SADI: Core Proposition

Make the implicit explicit

Model all Web Services this way

Page 26: SADI SWSIP '09  'cause you can't always GET what you want!

SADI: Core Proposition

Web Service output must be Triples

SUBJECT URI of the output graph is the same URI

as the SUBJECT URI of the input graphthat was POSTed to the Web Service

Define OWL classes that describe your input and output triples

Page 27: SADI SWSIP '09  'cause you can't always GET what you want!

Consequence(i.e. why SADI works!)

The Semantics of the triples from a Web Service are now

explicit and identical to the Semantics of GET

Page 28: SADI SWSIP '09  'cause you can't always GET what you want!

How do we exploit this?

Current Web Service definition systems such as WSDL, and OWL-S

are missing a crucial annotation element…

…the semantic relationship between input and output

Page 29: SADI SWSIP '09  'cause you can't always GET what you want!

How do we exploit this?

With SADI services, this relationship can be automatically derived by “diff-ing” the input and output OWL classes

…because the SUBJECT node of the input and output graphs is guaranteed to be the same!

Page 30: SADI SWSIP '09  'cause you can't always GET what you want!

How do we exploit this?

We have constructed a simple prototype

Web Services registry that

provides discovery based on

annotation of these biological relationships

Page 31: SADI SWSIP '09  'cause you can't always GET what you want!

Demo #1a plug-in to the

IO-Informatics Knowledge Explorer

Page 32: SADI SWSIP '09  'cause you can't always GET what you want!

The Sentient Knowledge Explorerfrom IO Informatics

A Semantic Integration, Visualization and Search Tool

Page 33: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Plug-in to the Sentient Knowledge Explorer

• The Knowledge Explorer has a plug-ins API that we have utilized to pass data directly from SADI to the KE visualization and exploration system

• Data elements in KE are queried for their “type” (rdf:type); this is used to discover additional data properties by querying the SADI Web Service registry

Page 34: SADI SWSIP '09  'cause you can't always GET what you want!

BACKGROUND of DATASET

NIST toxicity compendium study

8 toxicantsLiver

SerumUrine

Genomic [Affymetrix MA]Metabolomic [LC/MS] data

• conducted under NIST Advanced Technology Program (ATP), Award # 70NANB2H3009 as a Joint Venture between Icoria / Cogenics (Division of CLDA) and IO Informatics.

• Microarray studies were conducted under NIEHS contract # N01-ES-65406. • The Alcohol study was conducted under NIAAA contract # HHSN281200510008C in

collaboration with Bowles Center for Alcohol Studies (CAS) at UNC. • This work was also made possible through contributions from members of IO

Informatics’ Working Group on “Semantic Applications for Translational Research”.

Page 35: SADI SWSIP '09  'cause you can't always GET what you want!

DEMOKnowledge Explorer

Plug-in

For more information about the Knowledge Explorer surf to:http://io-informatics.com

Page 36: SADI SWSIP '09  'cause you can't always GET what you want!
Page 37: SADI SWSIP '09  'cause you can't always GET what you want!
Page 38: SADI SWSIP '09  'cause you can't always GET what you want!
Page 39: SADI SWSIP '09  'cause you can't always GET what you want!
Page 40: SADI SWSIP '09  'cause you can't always GET what you want!

Call SADI to provide discovery of predicates based on data-type

Page 41: SADI SWSIP '09  'cause you can't always GET what you want!
Page 42: SADI SWSIP '09  'cause you can't always GET what you want!

SADI fills-in the “Encodes” property from the discovered Web Service(s)

Page 43: SADI SWSIP '09  'cause you can't always GET what you want!
Page 44: SADI SWSIP '09  'cause you can't always GET what you want!

Discover services that provide the hasGOTerm property for Protein Sequence datatype

Page 45: SADI SWSIP '09  'cause you can't always GET what you want!
Page 46: SADI SWSIP '09  'cause you can't always GET what you want!

DEMOKnowledge Explorer

Plug-in

For more information about the Knowledge Explorer surf to:http://io-informatics.com

Page 47: SADI SWSIP '09  'cause you can't always GET what you want!

SADI Demo #2

Imagine there is a “virtual graph” connecting every conceivable input for every conceivable Web Service

to their respective outputs

How do we query that graph?

Page 48: SADI SWSIP '09  'cause you can't always GET what you want!

“SHARE”

Semantic Health And Research EnvironmentSADI client application

Page 49: SADI SWSIP '09  'cause you can't always GET what you want!
Page 50: SADI SWSIP '09  'cause you can't always GET what you want!

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE {

uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .

}

Page 51: SADI SWSIP '09  'cause you can't always GET what you want!
Page 52: SADI SWSIP '09  'cause you can't always GET what you want!
Page 53: SADI SWSIP '09  'cause you can't always GET what you want!
Page 54: SADI SWSIP '09  'cause you can't always GET what you want!

Recapwhat we just saw

A standard SPARQL query was entered into SHARE – a SADI-based query engine

The query was interpreted to extract the properties being queried and these were passed to SADI for Web

Service discovery

SADI searched-for, found, and accessed the databases and/or analytical tools capable of

generating those properties

Page 55: SADI SWSIP '09  'cause you can't always GET what you want!

Recapwhat we just saw

We posed, and answered a complex database query

WITHOUT A DATABASE

(in fact, the data didn’t even have to exist...)

Page 56: SADI SWSIP '09  'cause you can't always GET what you want!

RDF graph browsing and querying is useful…

…but where’s the semantics??

Let’s bring in some ontologies!

Page 57: SADI SWSIP '09  'cause you can't always GET what you want!

My Definition of Ontology (for this talk)

Ontologies explicitly define the things that exist in “the world”

based on what properties each kind of thing must have

Page 58: SADI SWSIP '09  'cause you can't always GET what you want!

Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

Page 59: SADI SWSIP '09  'cause you can't always GET what you want!

Semantic Web?

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had

not anticipated.

Page 60: SADI SWSIP '09  'cause you can't always GET what you want!

Dynamic

Discovery

Distributed

Interpretation

Re-interpretation

Page 61: SADI SWSIP '09  'cause you can't always GET what you want!

Powered by SADI

Prototype SADI-enabled SWS Client

Page 62: SADI SWSIP '09  'cause you can't always GET what you want!

Data exhibits “late binding”

Page 63: SADI SWSIP '09  'cause you can't always GET what you want!

Late binding:

“purpose and meaning” of the data is

not determined untilthe moment it is required

Page 64: SADI SWSIP '09  'cause you can't always GET what you want!

Benefitof late binding

Data is amenable toconstant re-interpretation

Page 65: SADI SWSIP '09  'cause you can't always GET what you want!

How do we achieve this?

DO NOT PRE-CLASSIFY DATA!

just hang properties on it

Page 66: SADI SWSIP '09  'cause you can't always GET what you want!

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

These are indexing/annotation systems

These are axioms that enable classification

Page 67: SADI SWSIP '09  'cause you can't always GET what you want!

Design OWL-DL ontologies describing cardiovascular concepts

Ontologies are in the “Frames+” area of the Ontology spectrum, and therefore can leverage SADI and be

“executed” as workflows

Page 68: SADI SWSIP '09  'cause you can't always GET what you want!

What the…???

Did you just say “execute ontologies as workflows?!”

Page 69: SADI SWSIP '09  'cause you can't always GET what you want!

Another interesting consequence of the SADI architecture

Ontology = Query = Workflow

Page 70: SADI SWSIP '09  'cause you can't always GET what you want!
Page 71: SADI SWSIP '09  'cause you can't always GET what you want!

WORKFLOW

Page 72: SADI SWSIP '09  'cause you can't always GET what you want!

QUERY:

SELECT images of mutations from genes in organism XXX that share homology to this gene in

organism YYY

Page 73: SADI SWSIP '09  'cause you can't always GET what you want!

Concept:

“Homologous Mutant Image”

Page 74: SADI SWSIP '09  'cause you can't always GET what you want!

As OWL Axioms

HomologousMutantImage is owl:equivalentTo {

Gene Q hasImage image P

Gene Q hasSequence Sequence Q

Gene R hasSequence Sequence R

Sequence Q similarTo Sequence R

Gene R = “my gene of interest” }

Page 75: SADI SWSIP '09  'cause you can't always GET what you want!

Those axioms combine to create an OWL Class:

homologous mutant images

Page 76: SADI SWSIP '09  'cause you can't always GET what you want!

QUERY:

Retrieve owl:homologous

mutant images for gene XXX

Page 77: SADI SWSIP '09  'cause you can't always GET what you want!

Demo #3Discover instances of OWL classes

from data that doesn’t exist…

Page 78: SADI SWSIP '09  'cause you can't always GET what you want!

Show me patients whose creatinine level is increasing over time, along with their latest BUN and creatinine levels.

PREFIX regress: <http://sadiframework.org/examples/regression.owl#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {

?patient patients:creatinineLevels ?collection . ?collection regress:hasRegressionModel ?model . ?model regress:slope ?slope

FILTER (?slope > 0) .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .

}

Page 79: SADI SWSIP '09  'cause you can't always GET what you want!
Page 80: SADI SWSIP '09  'cause you can't always GET what you want!
Page 81: SADI SWSIP '09  'cause you can't always GET what you want!
Page 82: SADI SWSIP '09  'cause you can't always GET what you want!

Show me patients whose creatinine level is increasing over time, along with their latest BUN and creatinine levels (now as an OWL class ElevatedCreatininePatient)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {

?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .

}

Page 83: SADI SWSIP '09  'cause you can't always GET what you want!

Start burrowing through the OWL class find that we need aregression model OWL class

Page 84: SADI SWSIP '09  'cause you can't always GET what you want!

Regression models have features like slopes and intercepts… and so onThe class is completely decomposed until a set of required Services are discoveredcapable of creating all these necessary properties

Page 85: SADI SWSIP '09  'cause you can't always GET what you want!

Successful decomposition of the OWL class to discover the need for a LinearRegression Web Service, and so on

Page 86: SADI SWSIP '09  'cause you can't always GET what you want!

VOILA!

Page 87: SADI SWSIP '09  'cause you can't always GET what you want!

Demo #3Discover instances of OWL classes

from data that doesn’t exist…

Page 88: SADI SWSIP '09  'cause you can't always GET what you want!

SADI and CardioSHARE

OWL Class restrictions converted into workflows

SPARQL queries converted into workflows

Reasoning happening at the same time as queries are being executed

Page 89: SADI SWSIP '09  'cause you can't always GET what you want!

(AFAIK)

This is a completely novel behaviour!

Page 90: SADI SWSIP '09  'cause you can't always GET what you want!

Ontologies are reasoned

Workflows are executed

Queries are resolved

so what do we call what this client does??

Page 91: SADI SWSIP '09  'cause you can't always GET what you want!

“Reckoning”Dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service

workflow capable of generating data described by the OWL Class restrictions, followed by reasoning to classify the data into that ontology

Page 92: SADI SWSIP '09  'cause you can't always GET what you want!

But OWL Classes can be (and often are) completely

“made-up”

What does that mean in the context of SADI and CardioSHARE?

Page 93: SADI SWSIP '09  'cause you can't always GET what you want!

Current Research

We believe that ontologies and hypotheses are, in some ways, the same “thing”…

…simply assertions about individuals that may or may not exist

If an instance of a class exists, then the hypothesis is “validated”

Page 94: SADI SWSIP '09  'cause you can't always GET what you want!

Current Research Constructing OWL classes that represent patient categories mimicking clinical outcomes research experiments done on the BC Cardiac Registry

The original experiments have already been published, and therefore act as a gold-standard

We attempt to use “Reckoning” to automatically discover instances of these patient Classes and compare our results with those of the clinical researchers

Page 95: SADI SWSIP '09  'cause you can't always GET what you want!
Page 96: SADI SWSIP '09  'cause you can't always GET what you want!

CardioSHARE architecture: Increasingly complex ontological layers organize data into richer concepts, even hypotheses

Blood Pressure

Hypertension

Ischemia

Hypothesis

Database 1 Database 2 AnalysisAlgorithmXX

SADIWeb

“agents”

Page 97: SADI SWSIP '09  'cause you can't always GET what you want!

Recap

SADI Semantic Web Services generate triples; the predicates of those triples are indexed... Period!!

For a given query, determine which properties are available, and which need to be

discovered/generated

Discovery of services through index queries + on-the-fly “classification” of local data against the

service interface OWL Classes

Page 98: SADI SWSIP '09  'cause you can't always GET what you want!

Semantic Web

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had

not anticipated.

Page 99: SADI SWSIP '09  'cause you can't always GET what you want!

What SADI + SHARE supports

Re-interpretation :

The SADI data-store simply collects properties, and matches them up with OWL Classes in a SPARQL query and/or

from individual service provider’s Web Service interface

Page 100: SADI SWSIP '09  'cause you can't always GET what you want!

Novel re-use:

Because we don’t pre-classify, there is no way for the provider to dictate how their

data should be used. They simply add their properties into the “cloud” and

those properties are used in whatever way is appropriate for me.

What SADI + SHARE supports

Page 101: SADI SWSIP '09  'cause you can't always GET what you want!

Data remains distributed – no warehouse!

Data is not “exposed” as a SPARQL endpoint greater provider-control over

computational resources

Yet data appears to be a SPARQL endpoint… no modification of SPARQL or

reasoner required.

Important “wins”

Page 102: SADI SWSIP '09  'cause you can't always GET what you want!

And all this because we simply require

that the input URI

is the same as the output URI

Page 103: SADI SWSIP '09  'cause you can't always GET what you want!

Join us!

SADI and CardioSHARE are Open-Source projects

Come join us – we’re having a lot of fun!!

http://sadiframework.org

Page 104: SADI SWSIP '09  'cause you can't always GET what you want!

Fin