1236369. the need “most of the web's content today is designed for humans to read, not for...

78
1 236369

Upload: stuart-knight

Post on 12-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

1236369

Page 2: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

The Need

“Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

2236369

Page 3: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Semantic ProcessingOur goal: to be able to pose complex search

tasks that using the semantics of pieces of information, e.g.,

I want to purchase a DVDof “Dore the Explorer” at a price lower than 10$.Is such a CD available atamazon.com?

3

Page 4: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Current search agents are not suitable for such task

4236369

Page 5: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

236369 5

In order to get the feeling, try to read and understand a page written in a foreign

language

Page 6: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Current Solution

Use “intelligent” agents

The Semantic-Web Approach

Content is machine-understandable by being bound to some formal description

of itself (i.e. metadata)

6236369

Page 7: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

So what is the Semantic Web?Humans can easily “connect the dots” when

browsing the Web… you disregard advertisements you “know” (from the context) that one

phone number in my homepage is my office phone and the other one is my fax number

etc.… but machines can’t!The goal is to have a Web of Data to that

machines can exploite

7236369

Page 8: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Some NeedsBetter searchSemantics in Web ServicesData integration

8236369

Page 9: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Need to Improve Web Search

9236369

Page 10: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Need for Semantics in Web Services

If the services are ubiquitous, searching issue comes up, for example: “find me the most elegant Schrödinger equation solver”

but what does it mean to be “elegant”? “most elegant”?

mathematicians ask these questions all the time… It is necessary to characterize the service not

only in terms of input and output parameters… …but also in terms of its semantics

10236369

Page 11: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Need for Data IntegrationDatabases are very different in structure & in

contentLots of applications require managing several

databases after company mergers combination of administrative data for e-

Government biochemical, genetic, pharmaceutical research etc.

Most of these data are accessible on the Web (though not necessarily public yet)

11236369

Page 12: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Example: Data Integration in Life Sciences

12236369

Page 13: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

And the Problem is Real

13236369

Page 14: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

GoalsWeb of data - provides common data

representation framework to facilitate integrating multiple sources to draw new conclusions

Increase the utility of information by connecting it to its definitions and to its context

More efficient information access and analysis

236369 14

Page 15: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

What Is Needed (Technically)?To make data machine process-able, we need:

unambiguous names for resources that may also bind data to real world objects: URI-s

a common data model to access, connect, describe the resources: RDF

access to that data: SPARQL common vocabularies: RDFS, OWL, SKOS reasoning logics: OWL, Rules

The “Semantic Web” is an infrastructure for the interchanging and integrating data on the Web

It extends the current Web (and does not replace it)

15236369

Page 16: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Required ApplicationsAgents that search the Web and retrieve

valuable information to the end userWeb services that publish their informationPrograms that try to integrate data of

different web services and to produce new results or draw new conclusions from the integrated data

236369 16

Page 17: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Ontologies & Inference Engines

“For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 17

Page 18: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

The Four Building Blocks

1. XML2. RDF3. Ontologies4. Agents

236369 18

Page 19: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

XML

“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean”

236369 19

Page 20: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

RDF –Resource Description FrameworkMeaning encoded in sets of ‘triples’: entities

have properties which have valuesEntities, properties and values all have

distinct URIs

236369 20

“imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.” Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001a

Page 21: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

OntologiesDatabase A and Database B may use

different fields to contain ‘zip code’Ontologies sort this outOntology = ‘a document or file that

formally defines the relations among terms’

Ontologies for the web normally haveA taxonomyA set of inference rules

236369 21

Page 22: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Agents

“Agent based computing appears to be the appropriate paradigm to work in a complex world with multiple ontologies, fragments and multiple inferencing engines.”

Stork, Hans-Georg and Mastroddi, Franco, Semantic Web Technologies - a New Action Line in the European Commission’s IST Programme, 2001

236369 22

Page 23: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

The Power of Agents - Integration“The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 23

Page 24: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

‘Ambient Intelligence’

“In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs.”Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 24

Page 25: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Resource Description Framework

236369 25

Page 26: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

26

What is RDF?A part of the semantic-web activityRDF is a general-purpose language for

representing information on the webSpecifically, objects and relationships

Designed to allow computer applications to process data based on its semanticsRather than displaying data to humans (as

opposed to RSS)

An RDF document is actually a labeled graph that is represented in XMLThe specific language is called RDF/XMLW3C recommendation (Feb. 2004)

236369

Page 27: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Resource Description Framework (RDF)

The data model of the Semantic WebThe data model of the Semantic Web

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources

A labeled, directed graph of relations between resources and literal valuesA labeled, directed graph of relations between resources and literal values

27236369

Page 28: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

RDF Data Consists of TripletsRDF data is a set of statementsEach statement is a triplet (Resource,

Property, Value)Sometimes we refer to a triplet using the

terminology of (Subject, Predicate, Object)

236369 28

The author of http://www.cs.technion.ac.il/kanza/myPage.html

is Yaron KanzaResource (Subject): myPage.html

Property (Predicate): author Value (Object): Yaron Kanza

Page 29: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

29

RDF Data

Subject Objectpredicate

The basic element: Triple (labeled edge)

Person#845

#1002

address

postalCode

6941Haifa

city

Herzel

street

RDF document: edge-labeled graph

Statement

236369

Page 30: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

URI-s Play a Fundamental RoleAnybody can create (meta)data on any resource

on the Web e.g., the same Person could be annotated through

other terms semantics is added to existing Web resources via

URI-sData exist on the Web, because it is accessible

through standard Web meansURI-s ground RDF into the Web

information can be retrieved using existing tools this makes the “Semantic Web”, well… “Semantic

Web”

30236369

Page 31: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Example: RDF Triples

<rdf:Description rdf:about="http://.../membership.svg#FullSlide"> <axsvg:graphicsType>Chart</axsvg:graphicsType> <axsvg:labelledBy rdf:resource="http://...#BottomLegend"/> <axsvg:chartType>Line</axsvg:chartType></rdf:Description>

31236369

Page 32: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

URI-s: Merging (data integration)

It becomes easy to merge data Merge can be done because statements refer

to the same URI-s nodes with identical URI-s are considered

identicalMerging is a very powerful feature of RDF

metadata may be defined by several (independent) parties…

…and combined by an application one of the areas where RDF is much handier

than pure XML

32236369

Page 33: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Data integration example

33236369

Page 34: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

34

page.html

John Smith

John’s Home Page

DC:Creator

DC:Title

<?xml version=“1.0”?><rdf:RDF

xmlns:rdf=“http://www.w3.org/TR/WD-rdf-syntax#”xmlns:dc=“http://purl.org/metadata/dublin_core#”>

<rdf:Description about=“page.html”> <dc:Creator>John Smith</dc:Creator> <dc:Title>John’s Home Page</dc:Title> </rdf:Description>

</rdf:RDF>

236369

Page 35: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

35

page.html

John SmithJohn’s Home Page [email protected]

dc:Title

dc:Creator

Name Email

. . .<Description about=“page.html”> <dc:Creator> <Description> <corp:Name>John Smith</corp:Name> <corp:Email>[email protected]</corp:Email> </Description> </dc:Creator> <dc:Title>John’s Home Page</dc:Title></Description>

</RDF>236369

Page 36: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

ContainersGroups of things: <bag> <seq>

<alt><bag>: unordered list; duplicates

allowed<seq>: ordered list; duplicates

allowed<alt>: list of alternatives; one will

be selected236369 36

Page 37: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

DC – Dublin CoreThe mission: To make it easier to find

resources using the Internet through the following activities

Developing metadata standards for discovery across domains

Defining frameworks for the interoperation of metadata sets

Facilitating the development of community- or discipline-specific metadata sets that are consistent with the above items

37236369

Page 38: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

DC - Core Metadata Element SetoTitle (dc:title)oSubject (dc:subject)oDescription

(dc:description)oCreatoroPublisher

(dc:publisher)oContributoroDate

o Typeo Formato Identifier

(dc:identifier)o Sourceo Languageo Relation

(dcterms:isPartOf)o Coverageo Rights

38236369

Page 39: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

39

A set of fifteen basic properties for describing generalized Web resources“Title”: the name given to the resource“Creator”: the person or organization primarily

responsible for the resource“Subject”: what the resource is about“Description”: a description of the content“Publisher”: the person or organization responsible

for making the resource available“Contributor”: someone who has provided content to

the resource other than the creator“Date”: date of creation or publication

Dublin Core

236369

Page 40: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

40

“Type”: type of resource, such as home page, technical report, novel, photograph…

“Format”: data format of the resource“Identifier”: URL, ISBN number, …“Source”: another resource that this resource is derived

from“Language”: the language of the content“Relation”: another resource and its relationship to this

one“Coverage”: the portion of time or space described by

this resource (atlases, histories, etc.)“Rights”: the intellectual property rights adhering to

this resource, or a pointer to them

Dublin Core

236369

Page 41: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

41

Manually from HTML or “user domain XML”With special assisting tools – like Protégé,

Reggie, DC-dot, RDF for XMLIdeally – with some automated procedure

from HTML/XML documentsCan we use XSLT for that?

Creating RDF documents

236369

Page 42: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

236369 42

Page 43: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

RDF SchemaRDF Schema (RDFS) enriches the data model

of RDF, adding vocabulary and associated semantics forClasses and subclassesProperties and sub-propertiesTyping of properties

Support for describing simple ontologiesAdds an object-oriented flavorBut with a logic-oriented approach and using

“open world” semantics236369 43

Page 44: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

44

Not an XML Schema!A “companion” specification for RDF specClass, Type, subClassOf,Properties: domain, rangeMisc: label, comment, isDefinedBy,etc.

RDF Schema

236369

Page 45: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Classes, Resources, … Basically, traditional ontologies:

use the term “mammal” “every dolphin is a mammal” “Flipper is a dolphin” etc.

RDFS defines resources and classes: everything in RDF is a “resource” “classes” are also resources, but… they are also a collection of possible resources (i.e.,

“individuals”) “mammal”, “dolphin”, …

Relationships are defined among classes/resources: “typing”: an individual belongs to a specific class (“Flipper is a

dolphin”) “subclassing”: instance of one is also the instance of the other

(“every dolphin is a mammal”)

RDFS formalizes these notions in RDF45236369

Page 46: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Classes, Resources in RDF(S)

RDFS defines rdfs:Resource, rdfs:Class as nodes; rdf:type, rdfs:subClassOf as properties

46236369

Page 47: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Typed NodesA resource may belong to several classes

rdf:type is just a property…“Flipper is a mammal, but Flipper is also a TV

star…”i.e., it is not like a data type in this sense!The type information may be very important

for applications e.g., it may be used for a categorization of

possible nodes

47236369

Page 48: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Inferred Properties

(#Flipper rdf:type #Mammal) is not in the original RDF data… …but can be inferred from the RDFS rules Better RDF environments will return that triplet, too

48236369

Page 49: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Inference: FormalThe RDF Semantics document has a list of

(44) entailment rules :“if such and such triplets are in the graph, add this

and this triplet”do that recursively until the graph does not changethis can be done in polynomial time for a specific

graph

Whether those extra triplets are physically added to the graph, or deduced when needed is an implementation issue

49236369

Page 50: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Properties (Predicates)Property is a special class (rdf:Property)Properties are constrained by their range

and domainProperties are also resources… (have URI’s)For example, (P rdfs:range C) means:

1. P is a property

2.C is a class instance3.when using P, the “object” must be an individual in C

* this is an RDF statement with subject P, object C, and property rdfs:range

50236369

Page 51: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Property Specification Example

51236369

Page 52: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

LiteralsLiterals may have a data type (floats, ints, etc)

defined in XML Schemas, including full XML Fragments

(Natural) language can also be specified (via xml:lang)

52236369

Page 53: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

236369 53

<rdf:RDFxmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"xml:base= "http://www.animals.fake/animals#">

<rdf:Description rdf:ID="animal"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/></rdf:Description>

<rdf:Description rdf:ID="horse"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/> <rdfs:subClassOf rdf:resource="#animal"/></rdf:Description>

</rdf:RDF>Horse is defined as subclass of animal

Example

Page 54: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Example<?xml version="1.0"?><rdf:RDF

xmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base= "http://www.animals.fake/animals#"><rdfs:Class rdf:ID="animal" /><rdfs:Class rdf:ID="horse">

<rdfs:subClassOf rdf:resource="#animal"/> </rdfs:Class>

</rdf:RDF>

236369 54

Abbreviated version. Works because an RDFS class is an RDF resource.

Use rdfs:Class instead of rdfDescription and drop the rdf:type information

Page 55: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

RDF and RDF Schema

<rdf:RDF xmlns:g=“http://schema.org/gen” xmlns:u=“http://schema.org/univ”> <u:Chair rdf:ID=“john”> <g:name>John Smith</g:name> </u:Chair></rdf:RDF>

<rdfs:Property rdf:ID=“name”> <rdfs:domain rdf:resource=“Person”></rdfs:Property>

<rdfs:Class rdf:ID=“Chair”> <rdfs:subclassOf rdf:resource= “http://schema.org/gen#Person”></rdfs:Class>

u:Chair

John Smith

rdf:typeg:name

g:Person

g:name

rdfs:Class rdfs:Property

rdf:typerdf:type

rdf:type

rdfs:subclassOf

rdfs:domain

55236369

Page 56: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

SPARQL Protocol and RDF Query Language

236369 56

Page 57: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

SPARQLSPARQL =

Query Language + Protocol + XML Results Format

• Access and query RDF graphs

• Product of the RDF Data Access Working Group

236369 57

Page 58: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

58

SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?title2WHERE{ ?doc dc:title "SPARQL at speed" . ?doc dc:creator ?c . ?docOther dc:creator ?c . ?docOther dc:title ?title2

}

• On an abstracts/papers database:“Find other papers by the authors of a given paper.”

236369

Page 59: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

59

SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX foaf: <http://xmlns.com/foaf/0.1/>PREFIX shop: <http://example/shop#>

SELECT ?titleWHERE{ ?doc dc:title ?title . FILTER regex(?title, "SPARQL") . ?doc dc:creator ?c . ?c foaf:name ?name . OPTIONAL { ?doc shop:price ?price }}

• “Find books with ‘SPARQL’ in the title. Get the authors’ name and the price (if available).”

• Multiple vocabularies

236369

Page 60: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

60

Inference• An RDF graph may be backed by inference

− OWL, RDFS, application, rules

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>SELECT ?typeWHERE{ ?x rdf:type ?type .}

:x rdf:type :C .:C rdfs:subClassOf :D .

--------| type |========| :C || :D |--------

236369

Page 61: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

61

Another ExamplePREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX ldap: <http://ldap.hp.com/people#>PREFIX foaf:

SELECT ?name ?email{ ?doc dc:title ?title . FILTER regex(?title, “SPARQL”) . ?doc dc:creator ?reseacher . ?researcher ldap:email ?email . ?researcher ldap:name ?name}

• “Find the name and email addresses of authors of a paper about SPARQL”

236369

Page 62: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Other SPARQL FeaturesLimit the number of returned results;

remove duplicates, sort them, … Return the full subgraph (instead of a

list of bound variables) Construct a graph combining a

separate pattern and the query results Use datatypes and/or language tags

when matching a pattern

62236369

Page 63: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Programming Practice - JenaRDF toolkit in Java from HP’s Bristol lab; it has

a large number of classes/methods listing, removing associated properties, objects, comparing

full RDF graphs manage typed literals, mapping Seq, Alt, etc. to Java

constructs

an “RDFS Reasoner” a full SPARQL implementation a layer (Joseki) for remote access of triples

(essentially, and triple database) and more… Probably the most widely used RDF environment in

Java today

63236369

Page 64: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Programming Practice - Jena RDF toolkit in Java from HP’s Bristol lab; it has

a large number of classes/methods listing, removing associated properties, objects,

comparing full RDF graphs manage typed literals, mapping Seq, Alt, etc. to Java

constructs an “RDFS Reasoner” a full SPARQL implementation a layer (Joseki) for remote access of triples

(essentially, and triple database) and more… Probably the most widely used RDF environment in

Java today

64236369

Page 65: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

236369 65

Page 66: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

RDF Schema is LimitedWe cannot express facts such as

Two classes are disjointBuild a class that is the union of two classesCardinality restrictionScope of propertiesProvide relationships between properties, such

as transitive, unique, inverse

236369 66

Page 67: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Ontologies (OWL)RDFS is useful, but does not solve all the

issuesCan a program reason about some terms?

E.g.: “if «A» is left of «B» and «B» is left of «C», is «A» left of «C»?”

Programs should be able to deduce such statements

If somebody else defines a set of terms: are they the same?

67236369

Page 68: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

How to speak Ontology?We need a Web Ontologies Language to

define: more on the terminology used in a specific context more constraints on properties, logical

characterization of properties etc.

W3C’s Ontology Language (OWL) - A layer on top of RDFS with additional possibilities

“OWL is now the most used KR language in the history of AI…” (Jim Hendler)

68236369

Page 69: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

OWLA Web ontology language that is more

expressive than RDF and RDF SchemaWritten in XML on top of RDFUsing OWL we want to provide exact

descriptions of items and the relationships between them

236369 69

Page 70: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Why “OWL” and not “WOL”?Some urban legends…

e.g., reference to Owl from Winie the Pooh, who misspelled his name as “WOL”

A reference to an AI project at MIT of the mid 70’s by Bill Martin, called “One World Language”…

an early attempt for a KR language and associated ontology, intended to be a universal language for encoding meaning for computers

70236369

Page 71: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Property Restrictions in OWLRestriction may be by: value constraints (i.e.,

further restrictions on the range) all values must be from a class at least one value must be from a class

cardinality constraints (i.e., how many times the property can be

used on an instance?) minimum cardinality maximum cardinality exact cardinality

71236369

Page 72: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Property Restriction Example“A dolphin is a mammal living in the sea or in

the Amazonas”

72236369

Page 73: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

OWL (cont.)Property Characterization

In OWL, one can characterize the behavior of properties (symmetric, transitive, …)

Term Equivalence/Relations For classes (owl:equivalentClass, owl:disjointWith) For properties (owl:equivalentProperty, owl:inverseOf) For individuals (owl:sameAs, owl:differentFrom)

Special class owl:Ontology with special properties:

owl:imports, owl:versionInfo, owl:priorVersion owl:backwardCompatibleWith, owl:incompatibleWith rdfs:label, rdfs:comment can also be used

73236369

Page 74: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

OWL and LogicOWL expresses a small subset of

First Order Logic it has a “structure” (class hierarchies,

properties, datatypes…), and “axioms” can be stated within that structure only.

Inference based on OWL is within this framework only

74236369

Page 75: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

However: Ontologies are Hard!A full ontology-based application is

Complex systemHard to implementHeavy to runAnd not all application may need it

Three layers of OWL are defined: Lite, DL(Description Logic) Full

decreasing level of complexity and expressiveness

75236369

Page 76: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

OWL Example: FOAF – The way to describe yourselfStands for "Friend Of A Friend"Provides structured linksInformation distributed & extensible

Name (foaf:name) E-mail (Foaf:mbox) Representing picture

(Foaf:img) Your publications

(Foaf:publications) Your online account

(Foaf:holdsAccount)

76236369

Page 77: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

Linkshttp://www.w3.org/RDF/s

236369 77

Page 78: 1236369. The Need “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee,

SPARQL LinksJena: Java and .Net Semantic Web

Frameworkhttp://jena.sourceforge.net/

SPARQL Queryhttp://jena.sourceforge.net/ARQ

SPARQL Protocolhttp://www.joseki.org

SquirrelRDF: Access legacy SQL:http://jena.sourceforge.net/SquirrelRDF

236369 78