
The Semantic Web

Stefan Decker

Information Sciences Institute

University of Southern California

2

Outline

• Semantic Web Overview
  – Vision, Challenges, Rationale

• Semantic Web in SCEC

3

Semantic Web

• coined by Tim Berners-Lee (1997)

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

– T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, May 2001

4

Doctor’s appointment
(scenario from “The Semantic Web”, Scientific American, May 2001)

[Figure: agents cooperating to schedule a doctor’s appointment. Labels: Mom, Physician’s Agent, Lucy’s Agent, Pete’s Agent, Insurance Co., Provider sites; required treatment, Schedule appointment, Rating, in-plan?, close-by?, Specialist?, Driving schedule]

5

Means to Achieve the Vision

• Explicit Ontologies
  – Needed to understand each other’s data (e.g., a joint notion of what a schedule is)
• Web Services
  – Required to actively interconnect systems (e.g., to make an appointment automatically)

6

Technical challenges

• Interoperability
  – Inaccurate, incomplete, heterogeneous data
  – Unreliable, ill-defined, evolving services
• Natural language processing, data mining
  – make information explicit
• Human-computer interaction
  – querying interfaces, visualization
• Scalability
  – Subsecond performance

7

Social challenges

• Standardization is hard
  – Dublin Core
• Bogus or inaccurate metadata
  – Physician rating, profile
• Competition and commoditization
• Economic incentive
  – Chicken and egg
• Complexity: developers and users

8

Jump Starters

• Machine Readable Data:

– .org (human-edited directory)

– .org (Music encyclopedia)

– RSS (RDF Site Summary)

– (embedded metadata)

– CC/PP (Composite Capability/Preference Profiles)

– P3P (Platform for Privacy Preferences)

9

Jump Starters

• B2B Vocabulary Projects
  – PapiNet.org: Vocabulary for Paper Industry
  – BPMI.org: Vocabulary for exchanging Business Process Models
  – XML-HR: Vocabularies for human resources (HR)
  – DMTF (Distributed Management Task Force): Vocabularies for managing enterprises
  – …
• Research Vocabulary Projects
  – Gene Ontology Working Group
  – Earth Sciences
  – MathNet
  – …

10

How do we get there?

• Research communities: DL, AI, DB, …
• Industry: IBM, Nokia, HP, Microsoft(?), ...
• Standards bodies: W3C, OMG, …
• Non-profit: US, EC, Japan

Business.semanticweb.org

11

Non-profit

• DARPA
  – “DARPA Agent Markup Language”
  – since Aug 2000
• NSF
  – Co-sponsored events (e.g., SWWS)
  – Further support in the loop
• European Commission
  – “Semantic Web Technologies”, Framework Programme 6
• Japan
  – Interoperability Technology Association for Information Processing, Japan (INTAP)

www.daml.org

www.ontoweb.org

www.semanticweb.org/SWWS

www.net.intap.or.jp/INTAP/

12

AI: “Add logic to the Web”

• Assertions, rules
• Agents
• Interoperability

– First-order logics
– Ontologies, description logics
– Logic programming, datalog
– Problem-solving methods
– …

Distributed knowledge base

13

DB: “Everything is syntax”

• Semistructured data
• Web services
• Interoperability

– Data integration
– Mediation, query rewriting
– Model management
– Conceptual modeling

Conglomerate of distributed heterogeneous (semistructured) databases

14

Many Previously Unknown Communication Partners

15

Heterogeneous Data

• Too many data formats/languages

16

Step 1

• Define uniform, underlying syntax
  – Lowest common denominator: labeled graphs (semi-structured data) -> RDF

Relational Database

ID   F-name   L-name
1    Stefan   Decker
2    Birgit   Decker

[Figure: the same table as a labeled graph, with node and arc labels row, Person, ID, F-name, L-name and the values 1, Stefan, Decker and 2, Birgit, Decker]

Structured Text (e.g., vCard)

begin: vcard
fn: Stefan
n: Decker;Stefan
end: vcard

[Figure: the vCard as a labeled graph: node vcard1 with arc fn to Stefan and arc n to Decker;Stefan]
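The mapping from a table and a vCard to one labeled graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node naming scheme (`Person/1`), the `type` arc, and the function names are invented here and are not taken from the slides.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (node, arc label, value)


def rows_to_triples(table: str, rows: List[Dict[str, str]], key: str) -> List[Triple]:
    """Turn each row into a node and each column value into a labeled arc."""
    triples: List[Triple] = []
    for row in rows:
        node = f"{table}/{row[key]}"  # hypothetical node naming scheme
        triples.append((node, "type", table))
        for column, value in row.items():
            triples.append((node, column, value))
    return triples


def vcard_to_triples(node: str, text: str) -> List[Triple]:
    """Turn 'name: value' lines into arcs, skipping the begin/end markers."""
    triples: List[Triple] = []
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name in ("begin", "end"):
            continue
        triples.append((node, name, value))
    return triples


PERSONS = [
    {"ID": "1", "F-name": "Stefan", "L-name": "Decker"},
    {"ID": "2", "F-name": "Birgit", "L-name": "Decker"},
]
VCARD = """
begin: vcard
fn: Stefan
n: Decker;Stefan
end: vcard
"""

person_triples = rows_to_triples("Person", PERSONS, "ID")
vcard_triples = vcard_to_triples("vcard1", VCARD)
```

Both sources end up as lists of the same three-part statements, so they can be stored and queried together.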

17

XML

• Containment, hierarchy

• Adjacency (A followed by B)

• Attributes (atomic values)

• Opaque reference (IDREF)

Good for serialization, poor for modeling relational semantics

18

Encoding of Information

“The Creator of the Resource http://www.w3.org/Home/Lassila is Ora Lassila”

[Figure: graph with an arc labeled Creator from http://www.w3.org/Home/Lassila to Ora Lassila]

Endless encoding possibilities in XML:

<Creator>
  <uri>http://www.w3.org/Home/Lassila</uri>
  <name>Ora Lassila</name>
</Creator>

<Document uri="http://www.w3.org/Home/Lassila">
  <Creator>Ora Lassila</Creator>
</Document>

<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>
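To make the cost of these encodings concrete, here is a small Python sketch that reads the three XML variants and reduces each to the same statement. The function name `to_triple` and the branching logic are assumptions made for illustration; the point is that every encoding needs its own hand-written case.

```python
import xml.etree.ElementTree as ET

ENCODINGS = [
    "<Creator><uri>http://www.w3.org/Home/Lassila</uri>"
    "<name>Ora Lassila</name></Creator>",
    '<Document uri="http://www.w3.org/Home/Lassila">'
    "<Creator>Ora Lassila</Creator></Document>",
    '<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>',
]


def to_triple(xml_text):
    """Map one of the three encodings to (resource, 'Creator', name)."""
    root = ET.fromstring(xml_text)
    if root.tag == "Creator":
        # Encoding 1: resource and name are child elements of <Creator>.
        return (root.findtext("uri"), "Creator", root.findtext("name"))
    if root.get("Creator") is not None:
        # Encoding 3: the creator is an attribute of <Document>.
        return (root.get("uri"), "Creator", root.get("Creator"))
    # Encoding 2: the creator is a child element of <Document>.
    return (root.get("uri"), "Creator", root.findtext("Creator"))


triples = {to_triple(text) for text in ENCODINGS}
```

All three documents collapse to a single triple, which is the information the graph states directly.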

19

Introduction to RDF

• RDF (Resource Description Framework)
  – Beyond machine readable to machine understandable
• RDF unites a wide variety of stakeholders:
  – Digital librarians, content-raters, privacy advocates, B2B industries, AI...
  – Significant (but less than XML) industrial momentum, led by W3C
• RDF consists of two parts
  – RDF Model (a set of triples)
  – RDF Syntax (different XML serialization syntaxes)
• RDF Schema for definition of Vocabularies (simple Ontologies) for RDF (and in RDF)

20

A Simple Example

• Describing Resources
  – URIs: global OIDs, literals
  – Binary relationships between objects
  – Arcs (relationships) are first-class objects
  – Blank (anonymous) nodes
• “Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila”
• Structure
  – Resource (subject): http://www.w3.org/Home/Lassila
  – Property (predicate): http://www.schema.org/#Creator
  – Value (object): “Ora Lassila”

http://www.w3.org/Home/Lassila s:Creator Ora Lassila
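The subject/predicate/object structure can be sketched as a plain Python data structure with a pattern-matching lookup. This is a minimal illustration, not an RDF library: the `match` function and its wildcard convention (`None` matches anything) are assumptions introduced here.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

CREATOR = "http://www.schema.org/#Creator"

# The single statement from the slide.
GRAPH: Set[Triple] = {
    ("http://www.w3.org/Home/Lassila", CREATOR, "Ora Lassila"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that agree with every position that is not None."""
    for triple in sorted(graph):
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


creators = [o for (_s, _p, o)
            in match(GRAPH, s="http://www.w3.org/Home/Lassila", p=CREATOR)]
```

Because the property is itself named by a URI, the same lookup works for any vocabulary without changing the data structure.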

21

RDF

• Graph-based universal syntax

[Figure: layered architecture with data sources (Scheduling Service, Insurance, Ratings, Calendar), an RDF layer (single data format, query and storage system), and (agent) applications]

Semantics in a global, open environment?

22

Step2: Ontologies

• What is an Ontology?
  “An ontology is a specification of a conceptualization.” (Tom Gruber, 1993)
• Ontologies are social contracts
  – Agreed, explicit semantics
  – Understandable to outsiders
  – (Often) derived in a community process
• Ontologies require Knowledge Representation
  – Is_a hierarchy, part of, attributes, axioms

23

RDF and Ontologies

Idea: Define an Ontology Language by defining

predefined nodes and arcs

The Ontology Language itself is just an Ontology

Ontologies are used to tag data from sources

24

Step 2: Layers on Top of RDF

Tim Berners-Lee: “Axioms, Architecture and Aspirations”, W3C all-working group plenary meeting, 28 February 2001

[Figure: the labeled graph of the Person table (arcs row, ID, F-name, L-name; values 1, Stefan, Decker and 2, Birgit, Decker), extended with an arc subClassOf from Person to LivingThing, marked “From an Ontology”]
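What the ontology arc adds to the data can be sketched in Python: once Person is declared a subclass of LivingThing, every Person node can also be treated as a LivingThing. The node names (`row1`, `row2`), the `type` arc, and the function `infer_types` are assumptions for illustration only.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Data tagged with a class from the ontology (node names are hypothetical).
DATA: Set[Triple] = {
    ("row1", "type", "Person"), ("row1", "F-name", "Stefan"),
    ("row2", "type", "Person"), ("row2", "F-name", "Birgit"),
}

# The one arc that comes from the ontology on the slide.
ONTOLOGY: Set[Triple] = {("Person", "subClassOf", "LivingThing")}


def infer_types(data: Set[Triple], ontology: Set[Triple]) -> Set[Triple]:
    """Add a type arc for every superclass of a node's declared class."""
    superclasses: Dict[str, Set[str]] = {}
    for sub, pred, sup in ontology:
        if pred == "subClassOf":
            superclasses.setdefault(sub, set()).add(sup)

    inferred = set(data)
    frontier = [t for t in data if t[1] == "type"]
    while frontier:
        node, _, cls = frontier.pop()
        for sup in superclasses.get(cls, ()):
            new = (node, "type", sup)
            if new not in inferred:
                inferred.add(new)
                frontier.append(new)  # follow longer subclass chains too
    return inferred


enriched = infer_types(DATA, ONTOLOGY)
```

The data itself is unchanged; the extra statements follow only from the shared vocabulary.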

25

W3C Semantic Web Activity

• Annotation (Annotea)

• Access control

• Calendaring

• Collaboration

• Logic

• Rules

• Workflows

[Figure labels: Advanced development (the areas listed above); Working Groups (RDF Core, Web Ontology)]

26

RDF Core Working Group

• Resource Description Framework (RDF)

• Goals
  – Improve RDF abstract model and XML syntax according to implementors’ feedback
  – Define precise semantics for RDF and RDF Schema
  – Clarify ties with XML family

27

Web Ontology Working Group

• Standard definition language for ontologies (conceptual models)
• Derived from Description Logics
  – But partial mapping to Database and Datalog possible (see Horrocks, Volz, Decker, Grosof: WWW2003)
• Extension of RDF Schema and DAML+OIL
  – Class Expressions (Intersection, Union, Complement)
  – XML Schema Datatypes
  – Enumerations
  – Property Restrictions
    • Cardinality Constraints
    • Value Restrictions

28

The Layer Cake

Tim Berners-Lee: “Axioms, Architecture and Aspirations”, W3C all-working group plenary meeting, 28 February 2001

[Figure: the Semantic Web layer cake, with layers marked as Recommendation Phase, Standardization Phase, or Research Phase]

29

SCEC/IT Architecture for a Community Modeling Environment

30

Tasks within SCEC - CME

• Towards an Earth Sciences Ontology:
  – Cataloging and Unification of Existing Databases
    • E.g., Fissures and Fault Activity Database
• Building a Mediation Environment
• Organizing a Community Process
• Enriching Web Services and Grid Infrastructure with Semantics
  – Service Discovery and Match Making

31

Fault Activity Database

• Hand-Maintained within SCEC (Sue Perry)

• Re-engineering of the Database Schemata

<rdfs:Class rdf:about="&FAD_v1;AVG_RECURRENCE_INTERVAL"
    rdfs:label="AVG_RECURRENCE_INTERVAL">
  <a:_slot_constraints rdf:resource="&FAD_v1;SCFADsep_02_00106"/>
  <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
</rdfs:Class>
<rdfs:Class rdf:about="&FAD_v1;AVG_SLIP_PER_EVENT"
    rdfs:label="AVG_SLIP_PER_EVENT">
  <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
</rdfs:Class>
<rdfs:Class rdf:about="&FAD_v1;AVG_SLIP_PER_EVENT_METHOD"
    rdfs:label="AVG_SLIP_PER_EVENT_METHOD">
  <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
</rdfs:Class>
<rdf:Property rdf:about="&FAD_v1;CFM-A_coord_file_URL"
    a:maxCardinality="1"
    rdfs:label="CFM-A_coord_file_URL">
  <rdfs:domain rdf:resource="&FAD_v1;FAULT"/>
  <rdfs:range rdf:resource="&rdfs;Literal"/>
</rdf:Property>
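One use of such a schema is checking data against it. The sketch below, in Python, tests the `maxCardinality="1"` declaration on `CFM-A_coord_file_URL`. Everything except that property name and its limit is an assumption: the fault identifiers, the example URLs, and the function `cardinality_violations` are invented for illustration and do not come from the Fault Activity Database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# From the schema fragment: CFM-A_coord_file_URL has a:maxCardinality="1".
MAX_CARDINALITY: Dict[str, int] = {"CFM-A_coord_file_URL": 1}


def cardinality_violations(triples: Iterable[Triple],
                           max_cardinality: Dict[str, int]
                           ) -> List[Tuple[str, str, int]]:
    """Return (subject, property, count) where count exceeds the limit."""
    counts = Counter(
        (s, p) for (s, p, _o) in set(triples) if p in max_cardinality
    )
    return sorted(
        (s, p, n) for (s, p), n in counts.items() if n > max_cardinality[p]
    )


# Hypothetical instance data: fault_2 carries two coordinate file URLs.
DATA = [
    ("fault_1", "CFM-A_coord_file_URL", "http://example.org/fault_1.txt"),
    ("fault_2", "CFM-A_coord_file_URL", "http://example.org/fault_2a.txt"),
    ("fault_2", "CFM-A_coord_file_URL", "http://example.org/fault_2b.txt"),
]

violations = cardinality_violations(DATA, MAX_CARDINALITY)
```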

32

Planned: Mediation Environment with RDF-based Rule Language

[Figure: data sources (Fault Activity Database, Fissures, Grid Services) connected to Applications through mediation with an RDF-based rule language]

33

Motivation: Why Rule Languages for the Web

• Plethora of data available
  – Data needs to be adapted and combined
  – “Time to Market”: Faster to write rules than code
  – Data Transformation and Integration
• Logic specification, not programming
  – Tabled evaluation/bottom-up evaluation
  – Semi-structured data
  – Multiple semantics (Relational Data, UML, ER, TopicMaps, DAML+OIL, XML-Schema, special purpose data models)
  – Distributed, heterogeneous sources

34

What’s Wrong With Existing Approaches?

• Built-in semantics (e.g. SiLRI, RQL, DQL)
  – but: many RDF-based languages with different semantics (DAML+OIL, RDF Schema, UML/RDF, TopicMaps/RDF, DMTF, …)
  – A specialized query language for each language?

35

TRIPLE: Language Overview

• Native support for
  – Resources & namespaces
  – Abbreviations
  – Models (sets of RDF statements)
  – Reification
• Rules with expressive bodies (full FOL syntax)
• Inspired by F-Logic:
  – subject[predicate -> object] (“molecule”)

36

Language Description I

• Namespace and resource abbreviations:
  – rdf := "http://www.w3.org/1999/02/22-rdf-syntax-ns#".
  – isa := rdf:subClassOf.
• Statements, triples, molecules:
  – subject[predicate -> object]
  – subject[p1 -> o1; p2 -> o2; ...]
  – s1[p1 -> s2[p2 -> o]]
• Models, model expressions, parameterized models:
  – s[p -> o]@m: “triple <s,p,o> in model m”
  – s[p -> o]@(m1 op m2): op stands for model intersection, union, or difference
  – s[p -> o]@sf(m1, X, Y): Skolem function
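Since a model is a set of RDF statements, the model expressions map naturally onto set operations. The Python sketch below illustrates that reading; it is not TRIPLE. The models `m1` and `m2`, the `holds` helper, and the `renamed` function (standing in for a parameterized model built by a Skolem function) are all assumptions made for this example.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Model = FrozenSet[Triple]

# Two small hypothetical models.
m1: Model = frozenset({("a", "p", "b"), ("b", "p", "c")})
m2: Model = frozenset({("b", "p", "c"), ("c", "p", "d")})


def holds(triple: Triple, model: Model) -> bool:
    """s[p -> o]@m: is the triple <s,p,o> in model m?"""
    return triple in model


def renamed(model: Model, old: str, new: str) -> Model:
    """A parameterized model: a new model computed from a model and arguments."""
    return frozenset((s, new if p == old else p, o) for (s, p, o) in model)


in_both = holds(("b", "p", "c"), m1 & m2)        # model intersection
in_either = holds(("c", "p", "d"), m1 | m2)      # model union
only_in_m1 = holds(("a", "p", "b"), m1 - m2)     # model difference
in_derived = holds(("a", "q", "b"), renamed(m1, "p", "q"))
```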

37

Language Description II

• Reification:
  – stefan[believes -> <Ora[isAuthorOf -> homepage]>]
• Logical formulae:
  – usual logical connectives and quantifiers
  – all variables introduced via FORALL (or EXISTS)
• Clauses:
  – facts: s[p1 -> o1; p2 -> o2; ...].
  – rules: FORALL X s1[p1 -> X] <- s2[p2 -> X] ... .
• Model blocks:
  – @model { clauses }
  – FORALL Mdl @model(Mdl) { clauses }

38

Example: Dublin Core

namespace abbreviations:

dc := "http://purl.org/dc/elements/1.0/".
db := "http://www-db.stanford.edu/".

model block with a fact:

@db:documents {
  db:d_01_01 [
    dc:title -> TRIPLE;
    dc:creator -> "Stefan Decker";
    dc:subject -> RDF;
    dc:subject -> triples;
    ... ].
}

[Figure: the fact as a graph: db:d_01_01 with arcs dc:title to TRIPLE, dc:creator to Stefan Decker, and dc:subject to RDF and to triples; next to it a Person node with arcs rdf:type and name, the name being Stefan Decker]

rule:

FORALL N p(N)[rdf:type -> xyz:Person; xyz:name -> N] <- EXISTS D D[dc:creator -> N].

query: “find all names”

FORALL N <- EXISTS P P[rdf:type -> xyz:Person; xyz:name -> N]@db:documents.

N = "Stefan Decker"
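The effect of the rule and the query can be approximated in Python. This is a sketch of the idea only, not of how a TRIPLE engine evaluates rules: the function names are assumptions, and the Skolem term p(N) is represented as a plain string.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The fact from the model block (the elided "..." part is left out).
DOCUMENTS: Set[Triple] = {
    ("db:d_01_01", "dc:title", "TRIPLE"),
    ("db:d_01_01", "dc:creator", "Stefan Decker"),
    ("db:d_01_01", "dc:subject", "RDF"),
    ("db:d_01_01", "dc:subject", "triples"),
}


def person_rule(model: Set[Triple]) -> Set[Triple]:
    """For every dc:creator value N, derive a Person node p(N) named N."""
    derived = set(model)
    for _doc, predicate, name in model:
        if predicate == "dc:creator":
            person = f"p({name})"  # stands in for the Skolem term p(N)
            derived.add((person, "rdf:type", "xyz:Person"))
            derived.add((person, "xyz:name", name))
    return derived


def find_all_names(model: Set[Triple]) -> List[str]:
    """Names N such that some P has rdf:type xyz:Person and xyz:name N."""
    persons = {s for (s, p, o) in model
               if p == "rdf:type" and o == "xyz:Person"}
    return sorted(o for (s, p, o) in model
                  if p == "xyz:name" and s in persons)


names = find_all_names(person_rule(DOCUMENTS))
```

Without the rule, the query finds nothing, because the Dublin Core fact never mentions a Person; the rule is what translates one vocabulary into the other.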

39

Example: Specification of RDF Schema Semantics

namespace abbreviations:

rdf := 'http://www.w3.org/...rdf-syntax-ns#'.
rdfs := 'http://www.w3.org/.../PR-rdf-schema-...#'.

resource abbreviations:

type := rdf:type.
subPropertyOf := rdfs:subPropertyOf.
subClassOf := rdfs:subClassOf.

model block:

FORALL Mdl @rdfschema(Mdl) {
  FORALL O,P,V O[P->V] <- O[P->V]@Mdl.
  FORALL O,V O[subClassOf->V] <-
    EXISTS W (O[subClassOf->W] AND W[subClassOf->V]).
  …
}

The first rule “copies” the triples from Mdl; the second rule states the transitivity of subClassOf.

40

Example: Cars Ontology with RDF Schema Semantics

@cars {

xyz:MotorVehicle[rdfs:subClassOf -> rdfs:Resource].

xyz:PassengerVehicle[rdfs:subClassOf -> xyz:MotorVehicle].

xyz:Truck[rdfs:subClassOf -> xyz:MotorVehicle].

xyz:Van[rdfs:subClassOf -> xyz:MotorVehicle].

xyz:MiniVan[

rdfs:subClassOf -> xyz:Van;

rdfs:subClassOf -> xyz:PassengerVehicle].

}

[Figure: class hierarchy with xyz:PassengerVehicle, xyz:Truck, and xyz:Van below xyz:MotorVehicle, and xyz:MiniVan below both xyz:Van and xyz:PassengerVehicle]

FORALL X <- X[rdfs:subClassOf -> xyz:MotorVehicle]@cars.

X = xyz:Van
X = xyz:Truck
X = xyz:PassengerVehicle

FORALL X <- X[rdfs:subClassOf -> xyz:MotorVehicle]@rdfschema(cars).

X = xyz:Van
X = xyz:Truck
X = xyz:PassengerVehicle
X = xyz:MiniVan
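The difference between the two queries can be reproduced with a short Python sketch. It mirrors the two rdfschema rules shown earlier (copy the triples, then close subClassOf under transitivity) using a simple fixpoint loop; the function names are assumptions, and a real rule engine would evaluate this differently.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SUB = "rdfs:subClassOf"

# The facts of the @cars model block.
CARS: Set[Triple] = {
    ("xyz:MotorVehicle", SUB, "rdfs:Resource"),
    ("xyz:PassengerVehicle", SUB, "xyz:MotorVehicle"),
    ("xyz:Truck", SUB, "xyz:MotorVehicle"),
    ("xyz:Van", SUB, "xyz:MotorVehicle"),
    ("xyz:MiniVan", SUB, "xyz:Van"),
    ("xyz:MiniVan", SUB, "xyz:PassengerVehicle"),
}


def rdfschema(model: Set[Triple]) -> Set[Triple]:
    """Copy the model, then add subClassOf triples until nothing new appears."""
    closed = set(model)  # rule 1: "copy" triples from the model
    while True:
        # rule 2: O subClassOf W and W subClassOf V imply O subClassOf V
        new = {
            (o, SUB, v)
            for (o, p, w) in closed if p == SUB
            for (w2, q, v) in closed if q == SUB and w2 == w
        } - closed
        if not new:
            return closed
        closed |= new


def subclasses_of(model: Set[Triple], cls: str) -> List[str]:
    """All X with X[rdfs:subClassOf -> cls] in the given model."""
    return sorted(s for (s, p, o) in model if p == SUB and o == cls)


direct = subclasses_of(CARS, "xyz:MotorVehicle")
with_semantics = subclasses_of(rdfschema(CARS), "xyz:MotorVehicle")
```

Here `direct` holds the three classes stated explicitly, and `with_semantics` additionally contains xyz:MiniVan, matching the two answer sets on the slide.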

41

Grid Computing and Web Services (ongoing)

• Matchmaking between Jobs and Resources

• Hard-Coded in Globus Toolkit
  – Reengineering using an ontology- and rule-based solution
  – RDF and DMTF Vocabulary (www.dmtf.org)

<rdfs:Class rdf:ID="CIM_ComputerSystem">
  <rdfs:subClassOf rdf:resource="#CIM_System"/>
  <version><![CDATA["2.6.0"]]></version>
  <rdfs:comment parseType="Literal"><![CDATA["A class derived from System that is a special collection of ManagedSystemElements. This collection provides compute capabilities and serves as aggregation point to associate one or more of the following elements: FileSystem, OperatingSystem, Processor and Memory (Volatile and/or NonVolatile Storage)."]]></rdfs:comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:toClass rdf:resource="#string"/>
      <daml:onProperty>
        <daml:DatatypeProperty rdf:ID="NameFormat">
          <daml:toClass rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
        </daml:DatatypeProperty>
      </daml:onProperty>
    </daml:Restriction>
  </rdfs:subClassOf>
</rdfs:Class>
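As a rough illustration of what ontology-based matchmaking between jobs and resources could look like, here is a Python sketch. Only the relationship CIM_ComputerSystem subClassOf CIM_System comes from the fragment above. The host names, the property names (`os`, `memory_mb`), the job description format, and all functions are hypothetical, and nothing here reflects the Globus Toolkit's actual interfaces.

```python
from typing import Any, Dict, List, Optional

# Class hierarchy: this single entry is taken from the vocabulary fragment.
SUBCLASS_OF: Dict[str, str] = {"CIM_ComputerSystem": "CIM_System"}

# Hypothetical resource descriptions; names and properties are invented.
RESOURCES: Dict[str, Dict[str, Any]] = {
    "hostA": {"type": "CIM_ComputerSystem", "os": "Linux", "memory_mb": 2048},
    "hostB": {"type": "CIM_ComputerSystem", "os": "Solaris", "memory_mb": 8192},
}


def is_a(cls: Optional[str], target: str) -> bool:
    """Walk up the subclass chain until target is found or the chain ends."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def matches(job: Dict[str, Any], resource: Dict[str, Any]) -> bool:
    """A declarative matching condition instead of hard-coded matching."""
    return (is_a(resource["type"], job["needs_type"])
            and resource["os"] == job["os"]
            and resource["memory_mb"] >= job["min_memory_mb"])


def matchmake(job: Dict[str, Any],
              resources: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of all resources that satisfy the job."""
    return sorted(name for name, res in resources.items() if matches(job, res))


# The job asks for any CIM_System; the subclass relation makes hostA qualify.
JOB = {"needs_type": "CIM_System", "os": "Linux", "min_memory_mb": 1024}
candidates = matchmake(JOB, RESOURCES)
```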

42

Semantic Web and Earth Sciences

• The Semantic Web field provides technologies for making vocabularies explicit and for mediating data

• Standards-based, many resources available
  – Editors, Rule Engines, APIs

• Effort feeds back into other domains