Post on 10-Feb-2016
Page 1: Introduction to the Semantic Web and Bio2RDF, the “semantic web atlas of postgenomic knowledge ”

1

Introduction to the Semantic Web and Bio2RDF, the “semantic web atlas of postgenomic knowledge”

Michael Grobe

Biomedical Applications Group
Research Technologies
University Information Technology Services
Indiana University


This presentation in perspective

This is actually one of a series of presentations on Linked Data Web and graph database technologies:

- Introduction to ontologies
- RDF, Jena, SparQL, and the “Semantic Web”
- This presentation on Bio2RDF
- OWL and inference over ontologies

In general, these Semantic technology topics seem “deceptively simple,” but are fraught with complications, limitations, and qualifications…especially when the casual user attempts to compare them with relational data approaches to the same or similar problems.


Topics

Simple introduction to the semantic approach
- sentences as triples and graphs
- sentences encoded using URIs
- transcending the data/metadata dichotomy with “sentence stores”

Introduction to SparQL

Free-standing query clients: Twinkle, RDF-gravity, Explorator

Bio2RDF atlas (warehouse) contents

Bio2RDF queries using Virtuoso SparQL and iSparQL endpoints

The Bio2RDF proxy relay service, and the tabulator

Discussion of the semantic approach


Sentences

Here is some information in sentence form:

Smith has age 21.
Jones has age 45.
Blake has age 12.
George has age 21.
Smith has favorite friend Jones.
Jones has favorite friend Smith.
Blake has favorite friend Blake.
George has favorite friend Smith.

Note that each sentence has the form:

Subject Predicate Object

also known as

Entity Property Value


A “Sentence base”

If someone hadn’t already done it, we could invent a “sentence base” to hold these sentences, but W3C has already done it.

To help with manipulation and searching, each grammatical component is stored separately, so that each sentence has a “triple” form like:

Subject   Predicate             Object
Smith     has age               21
Jones     has age               45
Blake     has age               12
George    has age               21
Smith     has favorite friend   Jones
Jones     has favorite friend   Smith
Blake     has favorite friend   Blake
George    has favorite friend   Smith


Sentences

We can query such information with queries like:

“Someone has friend Smith”

where “Someone” acts like a “variable” and “resolves” as the list:

Jones
George

because the pattern “Someone has friend Smith” matches both triples:

Jones has_favorite_friend Smith
George has_favorite_friend Smith

and we can interpret a more complicated query like:

“Someone has friend Smith and has age 21”

as a pair of requirements:

“Someone has friend Smith” and “Someone has age 21”

where we mean that the same someone has both characteristics . . . in which case Someone will resolve as “George”, since George is the only “Someone” who satisfies both requirements via the following triples:

George has age 21
George has_favorite_friend Smith
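The pattern matching described on this slide can be sketched in a few lines of Python. This is only an illustration: the predicate spellings `has_age` and `has_favorite_friend` and the helper `subjects_matching` are assumptions made for the sketch, not part of any triplestore API.

```python
# The eight sentences from the earlier slide as (subject, predicate, object) triples.
TRIPLES = [
    ("Smith", "has_age", "21"),
    ("Jones", "has_age", "45"),
    ("Blake", "has_age", "12"),
    ("George", "has_age", "21"),
    ("Smith", "has_favorite_friend", "Jones"),
    ("Jones", "has_favorite_friend", "Smith"),
    ("Blake", "has_favorite_friend", "Blake"),
    ("George", "has_favorite_friend", "Smith"),
]


def subjects_matching(predicate, obj, triples=TRIPLES):
    """Return every subject s for which the triple (s, predicate, obj) is stored."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# "Someone has friend Smith"
friends_of_smith = subjects_matching("has_favorite_friend", "Smith")

# "Someone has friend Smith" and "Someone has age 21": the same Someone must
# satisfy both requirements, so we intersect the two sets of candidates.
both = friends_of_smith & subjects_matching("has_age", "21")

print(sorted(friends_of_smith))
print(sorted(both))
```

The set intersection plays the role of requiring that the same “Someone” appear in both patterns.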


Using graphs to represent sentences

If we want to complicate things, we can also represent the same information in “graph form” as with these 2 graphs that represent the 2 kinds of information in the collection of sentences:

Graph #1: Person ages
Graph #2: Favorite Friends

Typically we don’t really want to complicate these issues, but the semantic web literature often “thinks” in graph terms, so it’s a good idea to cover the basic idea.


Using graphs to represent sentences

Here the 2 graphs are combined using named edges to represent 2 kinds of information associated with the same 4 persons.

Graph #3: Person ages (:age) and favorite friends (:fav)

Each arc represents the “predicate” of a sentence, connecting a “subject” with an “object”. (Note that a subject may have >= 0 arcs of each type.)


Using URIs and URLs to represent predicates: Metadata!

Now, if it hadn’t already happened, someone would come up with the idea of using URLs to point to Web documents that describe the “exact” meaning of each predicate. Such descriptions are “metadata”.

For example, “http://CelebrityMagazine.com/fav” could contain a definition of “favorite friend”, and other documents would define “BFF”, “long-time-friend”, “family-friend”, “friends with benefits”, etc.

And, in fact, these definitions could themselves refer to other definitions like some “superset” of relationships such as:

http://CelebrityMagazine.com/personal_relationships

or the personal_relationships file could include a collection of subset definitions that we might refer to like:

http://CelebrityMagazine.com/personal_relationships#fav

using the # convention for targeting a specific location within the document that a URL identifies.

Note that this form of metadata is not the only useful form of metadata, but it is clearly integrated with the data in a unique fashion. (The basic triplet structure of each sentence represents another (implicit) form of metadata.)


The sentences as a set of 8 triples (2 for each person)

|-------------------------------------|
| Subject  | Predicate    | Object    |
|=====================================|
| “Blake”  | example:fav  | “Blake”   |
| “Blake”  | info:has_age | "12"      |
| “Jones”  | example:fav  | “Smith”   |
| “Jones”  | info:has_age | "35"      |
| “George” | example:fav  | “Smith”   |
| “George” | info:has_age | "21"      |
| “Smith”  | example:fav  | “Jones”   |
| “Smith”  | info:has_age | "21"      |
|-------------------------------------|

Here the abbreviation “example:” stands for

http://CelebrityMagazine.com/personal_relationships#

and the abbreviation “info:” stands for some imaginary web page that defines age, let’s say

http://demographicstats.org/characteristics#
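As a rough illustration of how such abbreviations work, the following Python sketch expands a prefixed name into the full URI it stands for. The two namespace URIs come from this slide; the `PREFIXES` table and the helper `expand` are hypothetical names introduced only for this example.

```python
# Prefix table taken from the slide; both namespaces are imaginary.
PREFIXES = {
    "example": "http://CelebrityMagazine.com/personal_relationships#",
    "info": "http://demographicstats.org/characteristics#",
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; return other strings unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


print(expand("example:fav"))
print(expand("info:has_age"))
print(expand("http://fake.host.edu/smith"))  # not a known prefix, left as-is
```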


Representing sentence components using URIs

To specify exactly which person named “Blake”, “Smith”, etc. we are referring to, we can again use URIs.

|-------------------------------------------------------------------------------|
| Subject                       | Predicate    | Object                         |
|===============================================================================|
| <http://fake.host.edu/blake>  | example:fav  | <http://fake.host.edu/blake>   |
| <http://fake.host.edu/blake>  | info:has_age | "12"                           |
| <http://fake.host.edu/jones>  | example:fav  | <http://fake.host.edu/smith>   |
| <http://fake.host.edu/jones>  | info:has_age | "35"                           |
| <http://fake.host.edu/george> | example:fav  | <http://fake.host.edu/smith>   |
| <http://fake.host.edu/george> | info:has_age | "21"                           |
| <http://fake.host.edu/smith>  | example:fav  | <http://fake.host.edu/jones>   |
| <http://fake.host.edu/smith>  | info:has_age | "21"                           |
|-------------------------------------------------------------------------------|

Here the abbreviation “example:” stands for

http://CelebrityMagazine.com/personal_relationships#

and the abbreviation “info:” stands for some imaginary web page that defines age, let’s say

http://demographicstats.org/characteristics#


Triplestore summary and outrageous claims

Sentences may be represented as a collection of triples.

Sentences in triple form are stored in “triplestores” or “quad stores” (when many are stored together).

Triples will contain URIs that:

- serve to identify and/or reference predicate definitions, and object data types, and

- identify and/or name “resources”: subjects and/or objects.

IMHO, triplestores do NOT contain “data”. They contain “sentences”, “information”, or “assertions” (not necessarily true or correct assertions).

One might even say that the semantic approach transcends the data/meta-data dichotomy because the triple format provides implicit metadata, because the predicate in every triple links (or at least has the option to link) to metadata, and because subjects and objects often link to external resources.


Triples may be serialized in various forms:

- using N3 or Turtle (a subset of N3) to create files that look like the previous example, with each line holding the 3 components of one triple (and ending with a “.”)

- using RDF/XML, the XML serialization of the Resource Description Framework (RDF), as in this encoding of the Smith information (with non-dereferenceable URIs):

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:example="http://fake.host.edu/example-schema#">

  <example:Person rdf:about="http://fake.host.edu/smith">
    <example:name>Smith</example:name>
    <example:age>21</example:age>
    <example:fav rdf:resource="http://fake.host.edu/jones"/>
  </example:Person>

</rdf:RDF>


Dereferenceable URI version of the Smith RDF triple

- using RDF/XML, the XML serialization of the Resource Description Framework (RDF), as in this encoding of the Smith information including “dereferenceable” URIs:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:example="http://fake.host.edu/example-schema#">

  <example:Person rdf:about="http://discern.uits.iu.edu:8421/smith">
    <example:name>Smith</example:name>
    <example:age>21</example:age>
    <example:fav rdf:resource="http://discern.uits.iu.edu:8421/jones"/>
  </example:Person>

</rdf:RDF>


Browse RDF documents

Here is a view of the Smith RDF file from within Firefox using the Tabulator plug-in:

You can click on the jones.rdf link to see the Jones record, and browse from there, or choose the Person link to examine its definition (if it’s dereferenceable).


The “Semantic Web”

In general, if URIs are dereferenceable they can link into a “Gigantic Global Graph”, usually known as the Linked Data Web or the “Semantic Web,” with RDF as one of W3C’s Semantic Web architectural levels.

“If HTML and the Web make all online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database.” --TimBL


Documents in RDF and N3 format may be interrogated:

- by physical inspection (for anyone willing to read XML)

- by writing programs (in Jena, for example) that read RDF files, construct the represented graphs internally, and then
  - access graph triples in sequential order,
  - select triples according to specified content, and/or
  - apply SparQL queries and access results in sequential order

- using command-line tools that apply SparQL queries, and/or

- using GUI interfaces accepting SparQL queries
  - written in text, or
  - represented graphically


A SparQL example

If

http://discern.uits.iu.edu:8421/all-persons.rdf

contains all the triples listed earlier, then this SparQL query should find all the triples related to “smith”:

select $p $o
from <http://discern.uits.iu.edu:8421/all-persons.rdf>
where {
  <http://discern.uits.iu.edu:8421/smith.rdf> $p $o .
}

Intuitively, this query asks “Smith has what relationship(s) to whom/what?” and should identify these 2 value pairs:

<http://fake.host.edu/example-schema#fav>  <http://discern.uits.iu.edu:8421/jones.rdf>
<http://fake.host.edu/example-schema#age>  "21"

$p and $o are variable names that were each assigned a value as the query was “satisfied.” Variable names may also start with “?”.


Another SparQL example

If

http://discern.uits.iu.edu:8421/all-persons.rdf

contains all the triples listed earlier, then this SparQL query simply asks for a list of all those triple values:

select *
from <http://discern.uits.iu.edu:8421/all-persons.rdf>
where {
  $sub $pred $obj .
}

Intuitively, this query asks “Who has what relationship to whom?”

$sub, $pred, and $obj will each be assigned one or more values as the query is satisfied and all three will be printed (*).

(Note that “$sub $pred $obj .” is a triple pattern in the Turtle/N3 format.)


Results of the single file SparQL query

|----------------------------------------------------------------------------|
| sub                       | pred            | obj                          |
|============================================================================|
| http://...8421/blake.rdf  | example:fav     | http://...8421/blake.rdf     |
| http://...8421/blake.rdf  | example:has_age | "12"                         |
| http://...8421/jones.rdf  | example:fav     | http://...8421/smith.rdf     |
| http://...8421/jones.rdf  | example:has_age | "35"                         |
| http://...8421/george.rdf | example:fav     | http://...8421/smith.rdf     |
| http://...8421/george.rdf | example:has_age | "21"                         |
| http://...8421/smith.rdf  | example:fav     | http://...8421/jones.rdf     |
| http://...8421/smith.rdf  | example:has_age | "21"                         |
|----------------------------------------------------------------------------|

where “…” indicates “discern.uits.iu.edu:”.


A “distributed” SparQL query against 4 separate RDF files

The next query searches 4 dereferenceable files holding the same data broken into 4 files, one for each subject:

select *
from <http://discern.uits.iu.edu:8421/smith.rdf>
from <http://discern.uits.iu.edu:8421/jones.rdf>
from <http://discern.uits.iu.edu:8421/george.rdf>
from <http://discern.uits.iu.edu:8421/blake.rdf>
where {
  $sub $pred $obj .
}

The results of this query will be the same as the results for the single file query (though order may vary due to remote URL access latency).
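One way to picture why the two queries agree: multiple FROM sources are combined into a single set of triples before the pattern is matched. The Python sketch below assumes short stand-in names for the URIs and predicates, and the helpers `split_by_subject` and `merge_sources` are hypothetical; no network access is involved.

```python
from collections import defaultdict

# Stand-in for all-persons.rdf: the eight triples, with shortened names.
ALL_PERSONS = {
    ("blake", "fav", "blake"), ("blake", "has_age", "12"),
    ("jones", "fav", "smith"), ("jones", "has_age", "35"),
    ("george", "fav", "smith"), ("george", "has_age", "21"),
    ("smith", "fav", "jones"), ("smith", "has_age", "21"),
}


def split_by_subject(triples):
    """Mimic the four per-person files: group triples by their subject."""
    files = defaultdict(set)
    for triple in triples:
        files[triple[0]].add(triple)
    return dict(files)


def merge_sources(sources):
    """Mimic multiple FROM clauses: treat all sources as a single store."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


per_person = split_by_subject(ALL_PERSONS)
merged = merge_sources(per_person.values())
print(len(per_person), merged == ALL_PERSONS)  # prints: 4 True
```

Because the merged store is a set, the order in which the sources arrive does not affect which triples match, only (possibly) the order in which results are listed.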


Use SparQL to find the predicates

This SparQL example query simply asks for a list of all the unique predicates that occur in all the triples:

select distinct $p
from <http://discern...8421/friend-network.rdf>
where {
  $s $p $o .
}

If you don’t use “distinct” you will get multiple occurrences of the same predicate.

This can be very useful when you are trying to figure out what predicates are available to interrogate a triplestore that you don’t know much about.


SparQL (incomplete) basic syntax:

SELECT some_variable_list
FROM <some_RDF_source_URI>
WHERE {
  some_n3_triple_pattern .
  another_n3_triple_pattern .
}

Notes:
- the “<” and “>” characters are required; and “[” and “]” surround optional content,
- other commands in place of SELECT are: CONSTRUCT, ASK and DESCRIBE,
- * is a valid variable list, specifying every variable bound by the query, and the variable list may be preceded by DISTINCT, which will prevent duplicate result rows,
- there may be multiple FROM clauses, whose targets will be combined and treated as a single store,
- a “.” separating multiple triple patterns is intuitively similar to an “and” operator (but actually behaves like an SQL natural join), and
- the term WHERE is optional, and may be omitted.

SparQL reference: http://www.dajobe.org/2005/04-sparql/SPARQLreference-1.8.pdf
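The join-like behavior of “.” can be illustrated with a toy pattern matcher in Python. This is a sketch of the idea only, not how any real SparQL engine is implemented; the functions `match_pattern` and `solve`, the shortened predicate names, and the convention that variables start with “?” are assumptions of the example.

```python
TRIPLES = [
    ("blake", "fav", "blake"), ("blake", "has_age", "12"),
    ("jones", "fav", "smith"), ("jones", "has_age", "35"),
    ("george", "fav", "smith"), ("george", "has_age", "21"),
    ("smith", "fav", "jones"), ("smith", "has_age", "21"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep any value it was given by an earlier pattern.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples=TRIPLES):
    """Return all bindings satisfying every pattern (a join on shared variables)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# ?who fav smith .  ?who has_age "21" .
print(solve([("?who", "fav", "smith"), ("?who", "has_age", "21")]))
```

Each additional pattern can only keep or narrow the earlier bindings of shared variables, which is why a sequence of patterns reads like “and” but behaves like a natural join.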


Optional clauses in SparQL queries

Permitted within “where” clauses:

optional { triple_pattern }: identifies a triple that need not appear in an RDF target; its absence will not prohibit a pattern match.

filter: restricts variable matches in the preceding triple to specified filter patterns, as in:

{ $s $p $date FILTER ( $date > "2005-01-01T00:00:00Z"^^xsd:dateTime ) }

or

{ $s $p $d FILTER ( xsd:dateTime( $d ) < xsd:dateTime( "2005-01-01T00:00:00Z" ) ) }

or

{ ?s ?p ?name FILTER regex( ?name, "^smi", "some_flag" ) }

union: “where” clauses may be constructed as

{ triple_pattern_1 } UNION { triple_pattern_2 }

and any RDF element matching either of these triples will be included in the resulting output.

Permitted following the “where” clause:

order by [DESC|ASC| ] ( variable_list )
limit n: print up to n return values.
offset n: skip the first n return values.
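To make the effect of these clauses concrete, here is a small Python sketch that applies a filter, an ordering, and limit/offset to a list of result rows. It imitates the behavior on in-memory data only; the `ROWS` data and the `select` helper are hypothetical, and the case-insensitive flag stands in for a regex flag such as "i".

```python
import re

# Hypothetical result rows, one per person from the friend example.
ROWS = [
    {"name": "Smith", "age": 21},
    {"name": "Jones", "age": 35},
    {"name": "Blake", "age": 12},
    {"name": "George", "age": 21},
]


def select(rows, keep=lambda row: True, order_key=None,
           descending=False, limit=None, offset=0):
    """Filter rows, optionally sort them, then skip `offset` and return up to `limit`."""
    out = [row for row in rows if keep(row)]
    if order_key is not None:
        out.sort(key=order_key, reverse=descending)  # stable sort
    end = None if limit is None else offset + limit
    return out[offset:end]


# Roughly: FILTER regex( ?name, "^smi", "i" )
smi = select(ROWS, keep=lambda r: re.search("^smi", r["name"], re.IGNORECASE))

# Roughly: ORDER BY DESC( ?age ) LIMIT 2 OFFSET 1
page = select(ROWS, order_key=lambda r: r["age"], descending=True, limit=2, offset=1)

print([r["name"] for r in smi])
print([r["name"] for r in page])
```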


Some useful SparQL pattern patterns

Display two property values of some entity (<some_URI>) on the same line:

select *
where {
  <some_URI> <some_predicate> ?o .
  <the_same_URI> <some_other_predicate> ?o1 .
}

Example using the friend information and PREFIX statements:

PREFIX example: <http://CelebrityMagazine.com/personal_relationships#>
PREFIX info: <http://demographicstats.org/characteristics#>

select *
where {
  <http://fake.host.edu/smith> example:fav ?favorite .
  <http://fake.host.edu/smith> info:has_age ?age .
}


Some useful SparQL pattern patterns

Merge results of 2 pattern matches into a single output column:

select *
where {
  { <some_URI> <some_predicate> ?o . }
  UNION
  { <some_other_URI> <some_other_predicate> ?o . }
}

Example:

PREFIX example: <http://CelebrityMagazine.com/personal_relationships#>
PREFIX info: <http://demographicstats.org/characteristics#>

select *
where {
  { <http://fake.host.edu/smith> example:fav ?values . }
  UNION
  { <http://fake.host.edu/smith> info:has_age ?values . }
}


Some useful SparQL pattern patterns

Slowly find all triples whose object component mentions hexokinase:

select *
where {
  ?s ?p ?o .
  FILTER regex( ?o, "hexokinase" ) .
}

Quickly find all entries with object components mentioning hexokinase; this works only within a Virtuoso triplestore when applied to indexed graphs (and will return nothing when applied to a non-indexed graph):

select *
where {
  ?s1 ?p1 ?o1 .
  ?o1 bif:contains "hexokinase" .
}


SparQL desktop client: Twinkle (version of the upward paths query)


SparQL desktop client: RDF-gravity (the friend data)


SparQL desktop client: Explorator RDF explorer

The Explorator can download (extracts from) multiple RDF resources, and manipulate them in combination. Here with the Russian lakes example.

This approach provides an interface using a set algebra model of data manipulation. (See Araujo, et al. and http://139.82.71.60:3000/explorator)


SparQL endpoints

Triplestores like the Virtuoso Universal Database System and the D2R gateway will take SparQL queries through several interfaces:

- encoded in URLs addressed to the triplestore servers, like

http://dbpedia.org/sparql?query=SELECT distinct * WHERE { $s $p $o . $o bif:contains "Goethe_Johann_Wolfgang" . }

- entered into Web forms that present text areas into which one can enter queries, as on the next page


SparQL endpoints


Using SparQL endpoints to get RDF documents

Documents returned by SparQL SELECT queries are not RDF documents. They may not contain triples, and they are structured for display or storage in HTML, Excel or some other format.

However, you can use the CONSTRUCT command (in place of SELECT) within a SparQL query to build an RDF formatted response.

construct { ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q }
where {
  <http://bio2rdf.org/go:0004003> <http://bio2rdf.org/ns/go#is_a> ?o .
  ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q .
}

The structure of the triple to be created is defined in the “construct” clause, and the returned document is shown on the next page.

You can also send a CONSTRUCT query to a SparQL endpoint embedded within a URL, as in (here shown without the required URL encodings):

http://discern.uits.iu.edu:8890/sparql?query=construct { ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q } where { <http://bio2rdf.org/go:0004003> <http://bio2rdf.org/ns/go#is_a> ?o . ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q . }

Do this while using clients like the Explorator to get extracts from very large triplestores for local manipulation. Otherwise such triplestores may not be locally manageable.
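The URL encoding mentioned above can be produced with Python's standard library. The sketch below only builds the URL and does not contact the endpoint (which, as a test server, may no longer exist); the `query` parameter name follows the URL shown on this slide, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ENDPOINT = "http://discern.uits.iu.edu:8890/sparql"

QUERY = """construct { ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q }
where {
  <http://bio2rdf.org/go:0004003> <http://bio2rdf.org/ns/go#is_a> ?o .
  ?o <http://www.w3.org/2000/01/rdf-schema#comment> ?q .
}"""


def build_query_url(endpoint, query):
    """Attach the query as a percent-encoded 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Encoding matters here because characters such as “#”, “{”, “<” and spaces would otherwise be misread or rejected as part of the URL itself.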


Using SparQL endpoints to get RDF documents

Here’s what the previous CONSTRUCT query will return (edited for readability). These are the parents of go:0004003:

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <rdf:Description rdf:about="http://bio2rdf.org/go:0008094">
    <rdfs:comment>Catalysis of the reaction: ATP + H2O = ADP + phosphate in the presence of single- or double-stranded DNA; drives another reaction.</rdfs:comment>
  </rdf:Description>

  <rdf:Description rdf:about="http://bio2rdf.org/go:0003678">
    <rdfs:comment>Catalysis of the reaction: NTP + H2O = NDP + phosphate to drive the unwinding of a DNA helix.</rdfs:comment>
  </rdf:Description>

  <rdf:Description rdf:about="http://bio2rdf.org/go:0008026">
    <rdfs:comment>Catalysis of the reaction: ATP + H2O = ADP + phosphate to drive the unwinding of a DNA or RNA helix.</rdfs:comment>
  </rdf:Description>

</rdf:RDF>
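A returned document like this one can be read with an ordinary XML parser when its layout is as simple as shown here. The Python sketch below uses the standard library's ElementTree on a two-entry excerpt of the response; it is not a general RDF/XML parser (a dedicated RDF library would be the safer choice for arbitrary input), and `comments_by_subject` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Excerpt of the CONSTRUCT response shown on the slide.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://bio2rdf.org/go:0003678">
    <rdfs:comment>Catalysis of the reaction: NTP + H2O = NDP + phosphate
      to drive the unwinding of a DNA helix.</rdfs:comment>
  </rdf:Description>
  <rdf:Description rdf:about="http://bio2rdf.org/go:0008026">
    <rdfs:comment>Catalysis of the reaction: ATP + H2O = ADP + phosphate
      to drive the unwinding of a DNA or RNA helix.</rdfs:comment>
  </rdf:Description>
</rdf:RDF>"""


def comments_by_subject(rdf_xml):
    """Map each rdf:about URI to its rdfs:comment text (whitespace normalized)."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        comment = desc.findtext(f"{{{RDFS_NS}}}comment", default="")
        result[about] = " ".join(comment.split())
    return result


for subject, comment in comments_by_subject(SAMPLE).items():
    print(subject, "->", comment)
```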


Bio2RDF: Atlas of postgenomic knowledge

Bio2RDF integrates some 40 biomedical information resources (such as GO, Uniprot, etc.) or extracts recoded in RDF:

- currently runs over the Virtuoso Universal Database server at http://atlas.bio2rdf.org

but each resource has its own SparQL endpoint, in addition to the endpoint accessing the unified triplestore: http://atlas.bio2rdf.org/sparql

- a list of included resources is at (http://www.freebase.com/view/user/bio2rdf/public/sparql)

and includes links to the SparQL endpoint for each resource, as well as descriptions of the resource contents and triple counts.

- there is also a Bio2RDF proxy service that takes queries and relays them to multiple distributed servers (examples later).


Resources included in Bio2RDF

(downloadable from http://quebec.bio2rdf.org/download/n3/)

GO
KEGG
OMIM
HGNC
PubMed
INOH
GeneID
IProClass
UniProt
MGI
UniRef
CellMap
UniParc
BioPAX
Kegg Pathway
InterPro
CPATH
Pfam
Reactome
PROSITE
Biocyc
Protein
MeSH
SID
PDB
CID
CPD: Kegg Ligand for chemical compound
PubChem
GL: Kegg Ligand for carbohydrate structure
UniSTS
EC
Homologene
RN: Kegg Ligand for chemical reaction
DBpedia
DR: Kegg Ligand for drugs
OBO
CheBI
Taxonomy: NEWT
Affymetrix
PID
Biocarta


Bio2RDF resources

(Edge width is proportional to link density.)


Local Bio2RDF (partial) mirror

Research Technologies has installed a TEST version of a Virtuoso server hosting a PART of Bio2RDF (bind, GO, and IPROCLASS only) running locally on discern.uits.iu.edu.

You can reach its SparQL endpoint at

   http://discern.uits.iu.edu:8890/sparql  (Firefox and IE) 

The isparql endpoint is only usable via Firefox, and is accessible at http://discern.uits.iu.edu:8890/isparql  (Firefox only)       (Choose "Cancel" in the Preferences pop-up to use it.)

Note that these endpoints are only available in TEST mode; they could go away at any time.


Find parents of GO:0004003 in the local Bio2RDF GO graph using the SparQL endpoint

select *
where {
  <http://bio2rdf.org/go:0004003> <http://bio2rdf.org/ns/go#is_a> $parent .
}

Result:

|---------------------------------|
| parent                          |
|=================================|
| <http://bio2rdf.org/go:0008094> |
| <http://bio2rdf.org/go:0008026> |
| <http://bio2rdf.org/go:0003678> |
|---------------------------------|


Find all 3-element paths up from GO:0004003

PREFIX go: <http://bio2rdf.org/ns/go#>
select *
where {
  <http://bio2rdf.org/go:0004003> go:is_a $a .
  $a go:is_a $b .
  $b go:is_a $c .
}

Note the use of PREFIX to define the abbreviation “go:”, which stands for the namespace URI <http://bio2rdf.org/ns/go#>.

Also, you can speed up this search by specifying http://bio2rdf.org/go

as the “Default Graph URI” (so the other graphs will be ignored).


Find all 3-element paths up from GO:0004003 using Bio2RDF

| a                             | b                             | c                             |
| http://bio2rdf.org/go:0008026 | http://bio2rdf.org/go:0070035 | http://bio2rdf.org/go:0004386 |
| http://bio2rdf.org/go:0008026 | http://bio2rdf.org/go:0042623 | http://bio2rdf.org/go:0016887 |
| http://bio2rdf.org/go:0003678 | http://bio2rdf.org/go:0004386 | http://bio2rdf.org/go:0017111 |
| http://bio2rdf.org/go:0008094 | http://bio2rdf.org/go:0042623 | http://bio2rdf.org/go:0016887 |


Find all 3-element paths up from GO:0004003 using SQL within CLSD

select a.parent_id, b.parent_id, c.parent_id
from GO.molecular_function_DAG a
join GO.molecular_function_DAG b on a.parent_id = b.child_id
join GO.molecular_function_DAG c on b.parent_id = c.child_id
where a.child_id like 'GO:0004003'

This query is posed as a series of joins on the GO.molecular_function_DAG just as the SparQL version uses structures like:

$a go:is_a $b .
$b go:is_a $c .

where go:is_a is analogous to the DAG table, the “.” specifies a “join”, and $b, appearing on two separate lines, implicitly specifies an equality requirement.
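The SQL side of this comparison can be tried out with an in-memory SQLite database. This is only a sketch: the table name `dag` is a hypothetical stand-in for GO.molecular_function_DAG (the real CLSD schema is not shown in these slides), and the rows are just the is_a edges implied by the results on the earlier slides.

```python
import sqlite3

# (child_id, parent_id) edges reconstructed from the query results shown earlier.
EDGES = [
    ("GO:0004003", "GO:0008094"),
    ("GO:0004003", "GO:0008026"),
    ("GO:0004003", "GO:0003678"),
    ("GO:0008026", "GO:0070035"),
    ("GO:0008026", "GO:0042623"),
    ("GO:0008094", "GO:0042623"),
    ("GO:0003678", "GO:0004386"),
    ("GO:0070035", "GO:0004386"),
    ("GO:0042623", "GO:0016887"),
    ("GO:0004386", "GO:0017111"),
]


def three_step_paths(start, edges=EDGES):
    """Return (a, b, c) ancestor paths of length 3 above `start` via two self-joins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (child_id TEXT, parent_id TEXT)")
    conn.executemany("INSERT INTO dag VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        SELECT a.parent_id, b.parent_id, c.parent_id
        FROM dag a
        JOIN dag b ON a.parent_id = b.child_id
        JOIN dag c ON b.parent_id = c.child_id
        WHERE a.child_id = ?
        """,
        (start,),
    ).fetchall()
    conn.close()
    return rows


for path in three_step_paths("GO:0004003"):
    print(path)
```

Each JOIN ... ON condition spells out explicitly the equality that the SparQL version expresses simply by reusing a variable name.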


Auer and Lehmann asked: “What DO Innsbruck and Leipzig have in common?”

. . . or to be more exact:

What query will reveal what properties 2 entities have in common?

select *
where {
  < . . . Innsbruck> ?p ?o .
  < . . . Leipzig> ?p ?o .
}

This query directs the resolver to find every property of each city and to report the property/value pairs that the two cities share.

This doesn't have a direct equivalent in SQL over a conventional schema because you can't treat table and column names as variables in SQL.

(You can of course get around this by storing all your data de-normalized as a single table containing 3 columns, which might not be a bad idea in some circumstances.)
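The single three-column table mentioned here can be sketched with SQLite. The example below reuses the friend data from earlier slides rather than the city data; the table name `triples`, its column names, and the helper `shared_properties` are assumptions made for illustration.

```python
import sqlite3

TRIPLES = [
    ("blake", "fav", "blake"), ("blake", "has_age", "12"),
    ("jones", "fav", "smith"), ("jones", "has_age", "35"),
    ("george", "fav", "smith"), ("george", "has_age", "21"),
    ("smith", "fav", "jones"), ("smith", "has_age", "21"),
]


def shared_properties(first, second, triples=TRIPLES):
    """Return the (predicate, object) pairs that both subjects have in common."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    rows = conn.execute(
        """
        SELECT x.predicate, x.object
        FROM triples x
        JOIN triples y
          ON x.predicate = y.predicate AND x.object = y.object
        WHERE x.subject = ? AND y.subject = ?
        """,
        (first, second),
    ).fetchall()
    conn.close()
    return rows


print(shared_properties("smith", "george"))
print(shared_properties("jones", "george"))
```

Once predicates are stored as ordinary column values, they can be compared and returned like any other data, which is what the SparQL version gets by allowing a variable in the predicate position.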


What do go:0004145 and go:0004059 have in common?

select *
where {
  <http://bio2rdf.org/go:0004145> $predicate ?object .
  <http://bio2rdf.org/go:0004059> $predicate ?object .
}

| predicate                                       | object                                      |
|-------------------------------------------------|---------------------------------------------|
| http://bio2rdf.org/ns/go#is_a                   | http://bio2rdf.org/go:0008080               |
| http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://bio2rdf.org/ns/go#Term               |
| http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://bio2rdf.org/ns/go#molecular_function |

So, this query reveals that both classes are subclasses of go:0008080.


Some queries for Bio2RDF (atlas.bio2rdf.org/sparql)

Find every triple whose subject is http://bio2rdf.org/iproclass:P04637: (or is it P31946?)

select *
where {
  <http://bio2rdf.org/iproclass:P04637> ?p1 ?o1 .
}

Find all the subjects that cross reference TO http://bio2rdf.org/geneid:3098:

select *
where {
  ?s ?p <http://bio2rdf.org/geneid:3098> .
}

Get all the pubmed predicates:

select *
where {
  <http://bio2rdf.org/pubmed:10978502> $p $o .
}

Get all the PubMed titles and abstracts about geneid 3098:

select distinct $title $abstract
where {
  <http://bio2rdf.org/geneid:3098> <http://bio2rdf.org/ns/bio2rdf#xArticle> ?o .
  $o <http://purl.org/dc/elements/1.1/title> $title .
  $o <http://www.w3.org/2000/01/rdf-schema#comment> $abstract .
}


Some queries for discern.uits.iu.edu:8890/sparql

GO categories and descriptions for TP53 aka P04637:

select *
where {
  <http://bio2rdf.org/iproclass:P04637> <http://bio2rdf.org/ns/iproclass#xGo> $go .
  $go <http://www.w3.org/2000/01/rdf-schema#comment> $description .
}

Same but for molecular function namespace only and using PREFIXes:

PREFIX iproclass: <http://bio2rdf.org/iproclass:>
PREFIX iproclass-ns: <http://bio2rdf.org/ns/iproclass#>
PREFIX rdf-schema: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf-syntax: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX go-ns: <http://bio2rdf.org/ns/go#>
select *
where {
  iproclass:P04637 iproclass-ns:xGo $go_cat .
  $go_cat rdf-schema:comment $go_description .
  $go_cat rdf-syntax:type go-ns:molecular_function
}


Query dbpedia for entries about “Goethe”

using the Virtuoso iSparql text interface

Note that the predicate “bif:contains” is a Virtuoso “Built-In Function” that searches back-end text indexes. It might be possible to search using a standard SparQL regex FILTER, but it would be much slower.
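The query on this slide was shown as a screenshot and is not reproduced in the text, so the following is an assumed reconstruction rather than the original. It sketches the two alternatives described above: a pattern using Virtuoso's `bif:contains`, and a standard SparQL `regex` FILTER. The choice of `rdfs:label` as the searched property and the helper name `label_search_query` are assumptions made for illustration.

```python
# Assumption: labels are the text being searched; the slide's query may differ.
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def label_search_query(word: str, use_text_index: bool) -> str:
    """Return a query that finds subjects whose label mentions `word`.

    use_text_index=True  -> Virtuoso-specific bif:contains (uses the text index)
    use_text_index=False -> portable regex FILTER (tests every candidate label)
    """
    # Restrict input to one plain word so it can be quoted safely.
    if not word.isalnum():
        raise ValueError("expected a single alphanumeric word")
    if use_text_index:
        condition = f'?label bif:contains "{word}" .'
    else:
        condition = f'filter regex(?label, "{word}")'
    return (
        "select ?s ?label where {\n"
        f"  ?s {LABEL} ?label .\n"
        f"  {condition}\n"
        "}"
    )


print(label_search_query("Goethe", use_text_index=True))
print(label_search_query("Goethe", use_text_index=False))
```

The two forms are only roughly equivalent: the text index matches indexed words, while the regex matches any substring of the label. The first form runs only on Virtuoso; the second should run on any SparQL endpoint.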


The same query using the iSparql “graphical” QBE (sic) interface

Here is the same query in graphical form as constructed using the iSparql QBE interface:

Components can be dragged-and-dropped from the menu at the top of the window. The whole interactive window is shown on the next page.


The same query within the whole iSparql QBE (sic) window


Results from the iSparql text and/or QBE queries


Bio2RDF proxy service

The proxy service is a Java servlet that relays queries to federated versions of Bio2RDF resources.

One instance is currently available at http://atlas.bio2rdf.org/

It will let you run various demo queries, which are much more tractable if you have the Tabulator plug-in installed. 

The "Demonstration set of Bio2RDF URIs" is a particularly interesting browse.

The next 2 slides show results from the GO demo example.


Bio2RDF proxy results for GO:0032283

If you select the GO query example and are not running the Tabulator, you will get a document to download whose contents look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- bio2rdf sourceforge package version (0.6.1) -->
<!-- bio2rdf sourceforge subversion copy Id ($Id: atlas2rdf.jsp 592 2009-06-29 03:09:31Z p_ansell $) -->
<!-- bio2rdf sourceforge properties file subversion copy Id ($Id: bio2rdf.properties 590 2009-06-29 01:38:38Z p_ansell $) -->
<!-- Query successful on endpoint=http://obo.bio2rdf.org/sparql query=CONSTRUCT { &lt;http://bio2rdf.org/go:0032283&gt; ?p ?o . } WHERE { &lt;http://bio2rdf.org/go:0032283&gt; ?p ?o . } LIMIT 2000 OFFSET 0-->

<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:n0pred="http://bio2rdf.org/ns/go:"
  xmlns:ns0pred="http://www.w3.org/2002/07/owl#">

<rdf:Description rdf:about="http://bio2rdf.org/go:0032283">
  <rdf:type rdf:resource="http://bio2rdf.org/ns/go:Term"/>
  <n0pred:accession>GO:0032283</n0pred:accession>
  <rdfs:label>plastid acetate CoA-transferase complex [go:0032283]</rdfs:label>
  <n0pred:definition>An acetate CoA-transferase complex located in the stroma of a plastid.</n0pred:definition>
  <rdf:type rdf:resource="http://bio2rdf.org/ns/go:term"/>
  <n0pred:name>plastid acetate CoA-transferase complex</n0pred:name>
  <n0pred:is_a rdf:resource="http://bio2rdf.org/go:0009329"/>
</rdf:Description>

</rdf:RDF>
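Each child element of rdf:Description in a document like this is one sentence (triple) about the subject named in rdf:about. The sketch below shows how those triples could be read back out using only Python's standard XML parser. It embeds a shortened copy of the document above and handles only this flat shape of RDF/XML (one rdf:Description with literal or rdf:resource children); it is not a general RDF/XML parser. The function name `triples` is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the proxy result for GO:0032283.
RDF_XML = """<rdf:RDF
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:n0pred="http://bio2rdf.org/ns/go:">
  <rdf:Description rdf:about="http://bio2rdf.org/go:0032283">
    <rdf:type rdf:resource="http://bio2rdf.org/ns/go:Term"/>
    <n0pred:accession>GO:0032283</n0pred:accession>
    <rdfs:label>plastid acetate CoA-transferase complex [go:0032283]</rdfs:label>
    <n0pred:is_a rdf:resource="http://bio2rdf.org/go:0009329"/>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml: str):
    """Return (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as '{namespace}local'; joining the two
            # parts gives the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a resource URI or the literal element text.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            result.append((subject, predicate, obj))
    return result


for s, p, o in triples(RDF_XML):
    print(s, p, o)
```

For anything beyond a quick inspection, a full RDF library would be the safer choice, since RDF/XML permits many equivalent ways of writing the same triples.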


Bio2RDF proxy results for GO:0032283

If you select the GO example and are running the Tabulator in Firefox, you can end up with a browsable page with “tabulated” and N3 results like:


Evaluate the semantic approach?

The semantic approach is complicated, often produces ugly-looking and slow results, and new tools emerge like Topsy . . .

. . . but it does some things really well, things that cannot be so easily done within the relational approach:

- It handles some kinds of distributed information well; users can access multiple RDF documents in a single SparQL query, and even browse distributed RDF sources as part of the LDW or GGG.

- It simplifies the integration of (parts of) resources; since it doesn’t require establishing a unified storage schema, multiple RDF versions of multiple resources can be dumped into the same triplestore.

- It merges data with metadata in a unique fashion, making metadata easy to find.

- Since it stores information based on sentences, it’s easy for users to understand the storage format and make extracts.

- Its sentence-based query language, SparQL, is more intuitive than SQL (and is more declarative than SQL?).

- It can handle some types of queries much more easily than SQL (see the Innsbruck and Leipzig example in Auer and Lehmann).
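The integration point in the list above can be illustrated with a toy in-memory “sentence store”. This is a sketch under stated assumptions, not a real triplestore: the GO triples are taken from the proxy result shown earlier (with URIs shortened), while the protein source, the `example:` names, and the helper functions `match` and `go_labels` are hypothetical. It shows two sources being combined with no shared schema and then queried across the join.

```python
# Source 1: triples from the GO:0032283 proxy result (URIs abbreviated).
GO_SOURCE = {
    ("go:0032283", "rdfs:label", "plastid acetate CoA-transferase complex"),
    ("go:0032283", "go:is_a", "go:0009329"),
}

# Source 2: a hypothetical protein annotation, invented for illustration.
PROTEIN_SOURCE = {
    ("example:protein1", "example:xGo", "go:0032283"),
}

# "Dumping both into the same triplestore" is just a set union:
# no table definitions or schema mapping are needed.
store = GO_SOURCE | PROTEIN_SOURCE


def match(store, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a variable."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def go_labels(store, protein):
    """Follow protein -> GO term -> label, a join across the two sources."""
    labels = []
    for _, _, go_term in match(store, s=protein, p="example:xGo"):
        for _, _, label in match(store, s=go_term, p="rdfs:label"):
            labels.append(label)
    return labels


print(go_labels(store, "example:protein1"))
```

The join works only because both sources use the same identifier for the GO term, which is the role the normalized Bio2RDF URIs play across datasets.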


For more information, see:

• Auer, Sören and Jens Lehmann, “What do Innsbruck and Leipzig have in common? Extracting Semantics from Wiki Content,” European Semantic Web Conference (ESWC), 2007.

• Bizer, Christian, Tom Heath, and Tim Berners-Lee, “Linked Data - The story so far.” http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

• Grobe, Michael, “RDF, Jena, SparQL, and the ‘Semantic Web’,” SIGUCCS, 2009. http://mypage.iu.edu/~dgrobe/SIGUCCS/fp0518-grobe.pdf

• Araujo, S., D. Schwabe, and S. Barbosa, “Experimenting with Explorator: a Direct Manipulation Generic RDF Browser and Querying Tool,” Visual Interfaces to the Social and the Semantic Web (VISSW 2009), Sanibel Island, Florida, February 2009. http://smart-ui.org/events/vissw2009/papers/VISSW2009-Araujo.pdf