modeling with rdf rdf and tabular data consider the product database relation idmodel...

75
Modeling with RDF RDF and Tabular Data Consider the Product database relation ID Model No. Division Product Line SKU 1 ZX-3 Manufacturing support Paper machine FB3524 2 ZX-3P Manufacturing support Paper machine KD5243 3 B-1430 Control Engineering Feedback line KS4520 4 B-1430X Control Engineering Feedback line CL5934 Each row describes a single entity The type of each entity is the table name (Product) Give each row a distinct URIref The table gives us a locally unique identifier: the primary key (ID)

Upload: cory-fields

Post on 12-Jan-2016

249 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Modeling with RDF RDF and Tabular Data Consider the Product database relation

ID Model No. Division Product Line SKU

1 ZX-3 Manufacturing support Paper machine FB3524

2 ZX-3P Manufacturing support Paper machine KD5243

3 B-1430 Control Engineering Feedback line KS4520

4 B-1430X Control Engineering Feedback line CL5934

Each row describes a single entity

The type of each entity is the table name (Product)

Give each row a distinct URIref

The table gives us a locally unique identifier: the primary key (ID)

Page 2: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

To get a globally unique identifier, use the URI for the database itself

Say, http://www.acme.mfg.com

Use the prefix mfg:

For an identifier for a row, concatenate the table name (“Product”) with the unique key

Express it in the mfg: namespace

mfg:Product1, mfg:Product2, …

The column labels correspond to properties

For globally unique identifiers, use same namespace as for individuals

For local names, concatenate table name and column label

mfg:Product_ModelNo, mfg:Product_Division, …

There’s 1 triple per table cell

Plus 1 triple per row for type info

Page 3: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

In N3 (1st and 3rd rows only)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix mfg: <http://www.acme.mfg.com/> .

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mfg:Product1 rdf:type mfg:Product .

mfg:Product1 mfg:Product_ID "1"^^xsd:integer .

mfg:Product1 mfg:Product_ModelNo "ZX-3" .

mfg:Product1 mfg:Product_Division "Manufacturing support" .

mfg:Product1 mfg:ProductLine "Paper machine" .

mfg:Product1 mfg:Product_SKU "FB3524" .

mfg:Product3 rdf:type mfg:Product .

mfg:Product3 mfg:Product_ID "3"^^xsd:integer .

mfg:Product3 mfg:Product_ModelNo "B-1430" .

mfg:Product3 mfg:Product_Division "Control engineering" .

mfg:Product3 mfg:ProductLine "Feedback line" .

mfg:Product3 mfg:Product_SKU "KS4520" .

Page 4: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

In RDF/XML (1st and 3rd rows)<?xml version="1.0"?>

<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:mfg="http://www.acme.mfg.com/"

xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

<mfg:Product rdf:about="http://www.acme.mfg.com/Product1">

<mfg:Product_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</mfg:Product_ID>

<mfg:Product_ModelNo>ZX-3</mfg:Product_ModelNo>

<mfg:Product_Division>Manufacturing support</mfg:Product_Division>

<mfg:Product_ProductLine>Paper machine</mfg:ProductLine>

<mfg:Product_SKU>FB3524</mfg:Product_SKU>

</mfg:Product>

<mfg:Product rdf:about="http://www.acme.mfg.com/Product3">

<mfg:Product_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">3</mfg:Product_ID>

<mfg:Product_ModelNo>B-1430</mfg:Product_ModelNo>

<mfg:Product_Division>Control engineering</mfg:Product_Division>

<mfg:Product_ProductLine>Feedback line</mfg:Product_ProductLine>

<mfg:Product_SKU>KS4520</mfg:Product_SKU>

</mfg:Product>

</rdf:RDF>

Page 5: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The RDF Graph (1st and 3rd rows)

Page 6: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The triples could be spread across the Web

Use a common base

Some of the property values (objects) could be resources, identified by URIrefs, instead of literals

Especially Product_Division and Product_ProductLine

Foreign keys (unique identifiers in other relations unique URIrefs)

Page 7: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

RDF and Inferencing In the Semantic Web, consistency constraints on the data can be

expressed in the data itself

Lets us model smart data: can describe how they should be used

The Semantic Web stack includes a series of layers on top the RDF layer to describe consistency constraints on the data

The key is the notion of inferencing

Given some stated info, we can determine related info that can be considered as if stated

E.g., given X is married to Y, conclude that Y is married to X

Or conclude that the following is an inconsistent set of statements:

X is disjoint from Y, z X, z Y

This may seem trivial, but it’s exactly this that’s missing in dumb data

Page 8: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Type Propagation Rule (an Inference Pattern)

If A rdfs:subClassOf B and x rdf:type A then x rdf:type B

This statement is the entire definition of subClassOf in RDFS

Consistent with informal notion of broader term in a thesaurus or taxonomy

Our data lives in the Web

To make them more useful, they should behave in a consistent way when combined with data from multiple sources

Basing the meaning of constraint terms on inferencing provides a robust solution to understanding the meaning of novel combinations of terms

E.g., to determine how multiple inheritance works in RDFS, apply the Type Propagation Rule twice:

If C is a subclass of A and a subclass of B, then,

if x is of type C, then x is of type A and x is of type B

Page 9: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Asserted triples are those asserted in the original RDF store (possibly merged from multiple sources)

Inferred triples are the additional triples inferred by one of the inference rules governing a given inference engine

An inference engine could infer a triple already asserted

We consider it asserted

No mater how a triple is available, an inference engine treats it in the same way

Page 10: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example Given the asserted triples

ex:PassengerVehicle rdfs:subClassOf ex:MotorVehicle .

ex:Van rdfs:subClassOf ex:MotorVehicle .

ex:MiniVan rdfs:subClassOf ex:Van .

ex:MiniVan rdfs:subClassOf ex:PassengerVehicle .

ex:myVehicle rdf:type exMiniVan

ex:yourVehicle rdf:type ex:Van

The following are some of the inferred triples

ex:myVehicle rdf:type ex:Van

ex:myVehicle rdf:type ex:PassengerVehicle

ex:yourVehicle rdf:type ex:MotorVehicle

But, e.g., the following triple can’t be inferred

ex:yourVehicle rdf:type ex:PassengerVehicle

Page 11: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The situation is more subtle when info is merged from multiple sources each with its own inference engine

Most RDF implementations allow new triples to be asserted into the triple store

So an application can assert inferred triples

If those triples are serialized and shared on the Web,

another application could merge them with other sources and draw further inferences

Here the distinction between asserted and inferred might be too course to be useful

Page 12: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Suppose 1 info source provides a list of members of the class PassengerVehicle and another provides a list of members of class Van To find a list of all MotorVehicles , we use the inference rule for

subClassOf

We have a merged graph

The inferencing determines that both myVehicle and yourVehicle are members of the class MotorVehicle

Page 13: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Two fundamental components in this data integration example

1. A model that expresses the relationship between the 2 data sources Here the model consists of a single class having both the

classes to be integrated as subclasses This is represented by the single concept of subClassOf

2. The notion of inference Inferencing applies the model to the 2 data sources to

produce a single, integrated answer

When data seem disconnected, it’s often because some apparently simple consistency is conspicuously absent

Page 14: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

RDFS and Meaning Modeling in RDF is about graphs—it creates a graph structure to

represent data

Modeling in RDFS is about sets

RDFS provides a way to talk about the vocabulary used in an RDF graph

It lets an info modeler express, relative to particular data modeling and integration needs, how individuals are related how properties used to define individuals are related to sets of

individuals etc.

Page 15: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

RDFS is like other schema languages in providing info about how we describe data

But it differs from other schema languages in important ways

Consider document modeling systems

A schema language here (XML Schema) lets us express the set of allowed formats for a document

Can then determine (automatically) whether a given document confirms to the schema

A database schema provides type and key info for relations

A relation itself has nothing indicating the meaning of info in a given column or which column is used to index the relation

Page 16: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

For OO programming systems, the class structure again helps organize the data

It not only describes the data but also determines, according to the inheritance policy of the language, what methods are available for a given instance and how they are implemented

This contrasts with databases and XML:

It doesn’t interpret the data but instead gives a systematic way to describe info and transformations for that info

In all cases, the schema is info about the data

Page 17: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The schema in RDF should help provide meaning to the data

The mechanism by which it does this is inference

Inference lets us know more about a set of data than what’s explicitly expressed in the data

It thus explicates the meaning of the original data

The additional info is based in a systematic way on patterns in the original data

RDFS provides detailed axioms that express exactly what inferences can be drawn from given data patterns

Page 18: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Most modeling systems have a clear distinction between the data and its schema

But many modern systems model the schema in the same forma as the data—e.g.,

The introspection API of Java represents the object models as objects

An XML Schema document is a well-formed XML document

For RDF, the schema language was defined in RDF from the start

All schema info in RDFS is defined with RDF triples

It’s thus easy to give a formal description of the semantics of RDFS:

Give inferences rules that work over patterns of triples

Page 19: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

In RDF, everything is expressed in triples

The meaning of asserted triples is expressed in inferred triples

The structures that drive these inferences are also triples

So this process can continue as far as it needs to:

The schema info that provides context for info on the Semantic Web can itself be distributed on the Semantic Web

Page 20: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Can explicate the meaning of each RDFS resource by indicating what triples can be inferred in certain circumstances (defined by some pattern of triples)

Illustrate this with one of the most fundamental RDFS terms: rdfs:subClassOf

In RDFS, class membership is given meaning only when there are several classes and a known relation between them

That something’s a member of a class means it’s a member of any superclass Cf. the Type Propagation Rule

The various rules taken together give a way to express a variety of relations between classes and properties Such a collection of classes and properties gives a rudimentary

definition of a vocabulary for RDF data I.e., it defines a set of elements and the relationships of those sets to

the properties describing the elements

Page 21: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Note that we distinguish two senses of “is” with

:Kaneda rdf:type :AllStarPlayer .

:AllStarPlayer rdfs:subClassOf :MajorLeaguePlayer .

RDFS provides a meaning for rdfs:subClassOf here:

The type info for Kaneda propagates in the expected way, giving

:Kaneda rdf:type :MajorLeaguePlayer .

Page 22: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

This very simple interpretation of the subclass relationship makes it a workhorse of RDFS modeling

It’s like IF/THEN of programming languages:

IF something is a member of the subclass THEN it’s a member of the superclass

But nothing in RDFS corresponds to the ELSE clause

You can’t infer things from the lack of asserted membership

Page 23: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The RDFS Inference Patterns RDFS consists of a small number of inference patterns

Each provides type info on individuals in a variety of circumstances

Relationship Propagation through rdfs:subPropertyOf

The mechanism RDFS provides for talking about the properties that link individuals is based on an inference pattern defined using resource (keyword) rdfs:subPropertyOf

Terminology includes verbs as well as nouns

Many of the same requirements for mapping nouns from one source to another apply to relationships

Relationship brother is more specific than sibling

If someone is my brother, then he is also my sibling

The rule: For properties P and R, we have

If P rdfs:subPropertyOf R, then, if A P B, then infer A R B

Page 24: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example A large firm engages a number of people in various capacities

and has a variety of ways to administer these relationships

Some are directly employed by the firm while others are contractors

Among the contractors, some are directly contracted to the company on a freelance basis, others on a long-term retainer, and still others contract through an intermediate firm

All these people can be said to work for the firm

To see how to model this in RDFS, first consider the inferences we should be able to draw and under what circumstances

Name the relationships between a person and the firm :contractsTo, :freeLancesTo, :indirectlyContractsTo, :isEmployedBy, :WorksFor

Page 25: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

If we assert any of these properties about a person, we’d like to infer that he :worksFor the firm

And there are intermediate conclusions we can draw—e.g., Both a freelancer and an indirect contractor contract to the firm and

indeed work for it

All these relationships are expressed using refs:subPropertyOf

First introduce the properties:

:freeLancesTo a rdf:Property .

:contractsTo a rdf:Property .

:indirectlyContractsTo a rdf:Property .

:isEmployedBy a rdf:Property .

:worksFor a rdf:Property .

Now the subproperty relationships:

:freeLancesTo rdfs:subPropertyOf :contractsTo . (1)

:indirectlyContractsTo rdfs:subPropertyOf :contractsTo .(2)

:isEmployedBy rdfs:subPropertyOf :worksFor . (3)

:contractsTo rdfs:subPropertyOf :worksFor . (4)

Page 26: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

To see what inferences can be drawn, we need some instance data

First, some classes:

:Firm a rdfs:Class .

:Employee a rdfs:Class .

Then some individuals:

:TheFirm a :Firm .

:Goldman a :Employee .

:Spence a :Employee .

:Long a :Employee .

Page 27: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Suppose we have the following triples:

:Goldman :isEmployedBy :TheFirm . (a)

:Spence :freeLancesTo :TheFirm . (b)

:Long :indirectlyContractsTo :TheFirm . (c)

(1)

:worksFor

:contractsTo :isEmployedBy

:freeLancesTo :indirectlyContractsTo

(2)

(4) (3)

We can draw the following inferences:

:Goldman :worksFor :TheFirm . From (a) and (3)

:Spence :contractsTo :TheFirm . From (b) and (1)

:Long :contractsTo :TheFirm . From (c) and (2)

:Spence :worksFor :TheFirm . From (b) and (1) & (4)

:Long :worksFor :TheFirm . From (c) and (2) & (4)

Page 28: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

rdfs:subPropertyOf lets us describe a hierarchy of properties

Whenever a property in the tree holds between 2 entities, so does every property above it

Page 29: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Typing Data by Usage—rdfs:domain and rdfs:range Besides saying how 2 properties relate to each other, we’d also like

to represent how a property is used relative to the defined classes

Recall the forms (where P is a property and D and R classes)

P rdfs:domain D .

P rdfs:range R .

Informally: Relation P relates values from class D to values from class R

D is the class of the subject of a triple using P, R is the class of the object

The rules: For property P and classes D and R, we have

If P rdfs:domain D and x P y, then x rdf:type D .

If P rdfs:range R and x P y, then y rdf:type R .

Page 30: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Using P in a way apparently inconsistent with this declaration isn’t an error

Rather, RDFS infers the type info needed to bring P into compliance with its domain and range declarations

In RDFS, can’t assert that a given individual isn’t a member of a given class

(Contrast with OWL)

In fact, no notion of an incorrect or inconsistent inference

So, unlike with XML Schema, an RDF Schema never proclaims an input invalid

It just infers appropriate type info

In this way, RDFS behaves more like a database schema:

Declares what joins are possible but makes no statement about the validity of the joined data

Page 31: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Combination of Domain and Range with

rdfs:subClassOf The inference patterns can interact in interesting ways

Can see this with the 3 patterns seen so far

Show the interaction between rdfs:subClassOf and rdfs:domain, starting with an example

Page 32: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Given the following class info

:Woman a rdfs:Class .

:MarriedWoman a rdfs:Class .

:MarriedWoman rdfs:subClassOf :Woman . (1)

And the property info

:maidenName a rdf:Property .

:maidenName rdfs:domain :MarriedWoman . (2)

And the triple

:Karen :maidenName "Stephens" .

We infer by (2)

:Karen a :MarriedWoman .

But also, by (1) and (2),

:Karen a :Woman .

Page 33: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

In general, for any resource X,

Given X :maidenName Y, infer X a :Woman

But this is exactly the definition of domain, i.e., we just saw that

:maidenName rdfs:domain :Woman .

Until now, we’ve applied the inference pattern when a triple using rdfs:domain was asserted or inferred

Now we’re inferring an rdfs:domain triple when we prove that the inference pattern holds

I.e., we view the inference patter as the definition of what it means for rdfs:domain to hold

Generalizing, we have the new inference pattern: Where P is a property and D and C classes,

If P rdfs:domain D and D rdfs:subClassOf C, then P rdfs:domain C

Page 34: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The same conclusion holds for rdfs:range, by the same argument

So, whenever we have domain or range info about a property and a

triple involving that property,

we can draw conclusions about the type of any element based just on that triple

The definitions of rdfs:domain and rdfs:range are the most common problem areas in RDFS for modelers experienced in another data modeling paradigm

Page 35: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Relation to OOP

There’s an “obvious” correspondence

between an rdfs:Class and a class in OOP and

between an RDFS property and an OOP variable,

whence the assertion

P rdfs:domain C .

corresponds to the definition of the variable corresponding to P being defined at class C

From this “obvious” mapping comes an expectation that these definitions inherit in the same way as variable definitions do in OOP

But in RDFS there’s no notion of inheritance per se

The only mechanism is inference

Page 36: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The RDFS inference rule closest to the OO notion of inheritance is the Subclass Propagation Rule

But, using the “obvious” interpretation, we assert that the variable maidenName is defined in MariedWoman

Then infer that it is in the class higher in the tree, Woman

From the OOP point of view, this is like inheritance up the tree

The fallacy in the conclusion comes from the “obvious” mapping of rdfs:domain as defining a variable relative to a class

It’s never accurate in the Semantic Web to say that a property is “defined for a class”

A property is defined independently of any class

The RDFS relations specify which inferences can be correctly made about it in particular contexts

Page 37: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

RDFS Modeling Combinations and Patterns

The effect of the few, simple RDFS inference rules can be quite subtle in the context of shared info in the Semantic Web

Set Intersection No explicit RDFS modeling construct for set intersection (or union)

But, when we want to model intersection (or union), we needn’t always have to model it explicitly

Often need only certain inferences supported by these notions

Sometimes these inferences are available through design patterns that combine the RDFS primitives in specific ways

Page 38: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Regarding intersection, an inference we might want to draw is that

If x is in C, then it is in both A and B

The relationship being expressed is C A B

This inference is supported by making C a common subclass of A & B:

C rdfs:subClassOf A .

C rdfs:subClassOf B .

Given the triple

x a C .

by the Type Propagation Rule, we infer (as desired) the triples

x a A .

x a B .

Can’t draw the inference in the other direction

From membership in A and B, we can’t infer membership in C

We can’t express A B C

Page 39: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example: Hospital Skills Suppose we’re describing hospital staff

A surgeon is a member of the hospital staff and a qualified physician Surgeon Staff Physician

But not every staff physician is a surgeon—the inclusion goes one way

From our assertions, we want to be able to infer, given Kildare is a surgeon, he’s also a staff member and a physician

Assert

:Surgeon rdfs:subClassOf :Staff .

:Surgeon rdfs:subClassOf :Physician .

:Kildare a :Surgeon .

Infer by Type Propagation Rule

:Kildare a :Staff .

:Kildare a :Physician .

Can’t make the inference the other way

A psychiatrist is also both a staff member and a physician

Page 40: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Property Intersection Even though it’s unfamiliar to think of a property as a set, we can

still use intersection and union to describe the functionality supported for properties

RDFS again can’t support these exactly

But we can approximate these notions with proper use of refs:subPropertyOf

One inference is based on one property being an intersection of 2 others

P R S

Given x P y, we want to infer both x R y and x S y

Page 41: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example: Patients in Hospital Rooms When a patient is logged into a room, we know that

He is on the duty roster for that room

His insurance is billed for that room

Assert

:loggedIn rdfs:subPropertyOf :billedFor .

:loggedIn rdfs:subPropertyOf :assignedTo .

:Marcus :loggedIn :Room101 .

Infer

:Marcus :billedFor :Room101 .

:Marcus :assignedTo :Room101 .

Can’t make the inference in the other direction

If Marcus is billed for Room 101 and assigned to Room 101, no RDFS rules apply

Page 42: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Set Union Express A B C by making C a common superclass of A and

B:A rdfs:subClassOf C .

B rdf:subClassOf C .

Then x a A or x a B implies x a C

Summary C A B by making C a subclass of A and of B

C A B by making both A and B subclasses of C

Page 43: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example: All-Stars In determining candidates for the All Stars, a league’s rules could state that

MVPs are candidates

Top scorers are candidates

Assert

:MVP rdfs:subClassOf :AllStarCandidate .

:TopScorer rdfs:subClassOf :AllStarCandidate .

Given

:Reilly a :MVP .

:Kaneda a :TopScorer .

we may infer

:Reilly a :AllStarCandidate .

:Kaneda a :AllStarCandidate .

We can’t draw the inference the other way

Given someone is a candidate, can’t infer he’s an MVP or top scorer

Page 44: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Property Union Use rdfs:subPropertyOf to combine properties from diverse

sources as we use rdfs:subClassOf to combine classes as a union

If 2 sources use properties P and Q similarly, a single amalgamated property R can be defined as

P rdfs:subPropertyOf R .

Q rdfs:subPropertyOf R .

Then (for resources x and y), from x P y or x Q y infer x R y

Page 45: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example: Merging Library Records One library uses property borrows to indicate that a patron has

borrowed a book

Another library uses checkedOut for the same relationship

If we’re sure the 2 mean exactly the same, we make them equivalent:

Library1:borrows rdfs:subPropertyOf Library2:checkedOut .

Library2:checkedOut rdfs:subPropertyOf Library1:borrows .

Any relationship expressed by one library is inferred to hold for the other

Page 46: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

If we are not sure they’re exactly the same but have an application that wants to treat them the same,

use the Union pattern to create a common super-property:

Library1:borrows rdfs:subPropertyOf :hasPossession .

Library2:checkedOut rdfs:subPropertyOf :hasPossession .

All patrons & books from both libraries are related by :hasProperty Info from both sources is merged

Page 47: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Property Transfer In modeling info from multiple sources, a common requirement is:

If 2 entities are related by some relationship in one source,

the same entities should be related by a corresponding relationship in the other source

Given property P in one source and property Q in another, say that all uses of P are to be considered uses of Q with

P rdfs:subPropertyOf Q .

Now, from x P y infer x Q y

Perhaps strange to have a design pattern consisting of a single triple

But this use of rdfs:subPropertyOf is so pervasive it merits being distinguished

Page 48: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Example: Terminology Reconciliation A growing number of standard info representation schemes are being

published in RDFS

Info developed in advance of a standard must be retargeted to be compliant

E.g., suppose a given legacy bibliography system uses the term author for the creator of a book

This is dc:creator

Make the system’s data conform to Dublin Core with a single triple

:author rdfs:subPropertyOf dc:creator .

Now anyone for whom the author property has been defined has the same value defined for dc:creator

Page 49: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

The work is done by the RDFS inference engine

Not by an off-line editing process

So legacy applications using the author property can carry on as before

Newer, DC-compliant applications can use the inferred data in a standard way

Page 50: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenges Look at modeling scenarios that can be addressed with the patterns

we’ve seen

Term Reconciliation Resolve terms used by different agents who want to use their

descriptions together in a federated application

Challenge 2 Enforce the assertion that any member of one class is automatically

treated as a member of another

Page 51: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Solution Take the case where a given term in one vocabulary is fully

subsumed by a term in another

E.g., a researcher is a special case of an analyst

We’d like RDFS to infer from the fact that x is a researcher to that x is an analyst

Researcher rdfs:subClassOf :Analyst .

Now any resource that’s a Researcher

:Wenger a :Researcher .

is inferred to be an Analyst as well

:Wenger a :Analyst

Page 52: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 3 Suppose there’s considerable semantic overlap between the

concepts analyst and researcher

But neither concept is defined sharply: some analysts might not be researchers and vice versa

Still, for the federated application we want to treat these concepts as the same

Page 53: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Solution Use the Union pattern

Define a new term, investigator, for the federated domain that’s not defined in either of the sources

Effectively define investigator as the union of researcher and analyst using the super-property idiom:

:Analyst rdfs:subClassOf :Investigator .

:Researcher rdfs:subClassOf :Investigator .

No commitment to a direct relationship between analyst and researcher

But have a federated handle for speaking of the general class

Page 54: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 4 We’ve determined that the 2 classes really are identical

In terms of inference, we want any member of the one class to be a member of the other and vice versa

Page 55: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Solution RDFS doesn’t provide a primitive statement of class equivalence

But achieve it with rdfs: subClassOf

:Analyst rdfs:subClassOf :Researcher .

:Researcher rdfs:subClassOf :Analyst .

If we have

:Reilly a :Researcher .

:Kaneda a :Analyst .

we can infer

:Reilly a :Analyst .

:Kaneda a :Researcher .

The 2 rdfs:subClassOf triples (or any cycle of rdfs:subClassOf triples) assert the equivalence of the 2 classes

Page 56: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Instance-Level Data Integration Suppose we have answers to a single question coming from

multiple sources

E.g., a command-and-control mission planner wants to know where ordinance can’t be targeted

Several sources of info contribute One source provides a list of facilities and their types, some of

which—civilian facilities—must never be targeted Another provides descriptions of airspaces, some of which are

off-limits

A target is off-limits if excluded by either of these sources

Page 57: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 5 Define a single class whose contents include all individuals from all

these sources (and any ones later added)

Solution Use the Union construction to join the 2 sources into a single,

federated class

fc:CivilianFacility rdfs:subClassOf cc:OffLimits .

space:NoFlyZone rdfs:subClassOf cc:OffLimits .

Any instance of the facility or airspace descriptions identified as restricted are inferred to be cc:OffLimits

Page 58: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Readable Labels with rdfs:label Resources on the Semantic Web are specified with URIs

Not particularly meaningful to people

rdfs:label gives a standard way for presentation engines (e.g., web browsers) to display a printable name of a resource

There might be another source for human-readable names for any resource

Use a combination of the property union and property transfer patterns

E.g., we’ve imported RDF info from an external form (e.g., database or spreadsheet) defining 2 classes of individuals: Person and Movie

Person has property personName

Movie has property movieTitle

E.g.,

:Person1 :personName “James Dean” .

:Movie1 :movieTitle “Rebel without a Cause” .

Page 59: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 6 Want to use a generic display mechanism, using rdfs:label, to

display info about these people and movies

Solution Define each property as a subproperty of rdfs:label

:personName rdfs:subPropertyOf rdfs:label .

:movieTitle rdfs:subPropertyOf rdfs:label .

When presentation engine queries for rdfs:label of any resource,

inference provides the value of personName or movieTitle as appropriate

No need for code that understands the (domain-specific) distinction between Person and Movie

Page 60: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Data Typing Based on Use A shipping company manages a fleet of vessels, including vessels

under construction being repaired currently in service retired from service

Table 1 shows some info the company might keep

In RDF, each row corresponds to a resource of type ship:Vessel

Page 61: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Some triples:

ship:Berengaria ship:maidenVoyage “Dec. 16, 1946” .

ship:QEII ship:nextDeparture “Mar. 4, 2010” .

Subclasses of Vessel

DeployedVessel—has been deployed sometime in its lifetime

InServiceVessel—currently in service

OutOfServiceVessel—currently out of service (for any reason, possibly retired or never deployed)

RDFS triples:

ship:DeployedVessel rdfs:subClassOf ship:Vessel .

ship:InServiceVessel rdfs:subClassOf ship:Vessel .

ship:OutOfServiceVessel rdfs:subClassOf ship:Vessel .

Page 62: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 7 Automatically classify a vessel into a specific subclass from info in

Table 1:

Has had a maiden voyage ship:DeployedVessel

Next departure set ship:InServiceVessel

Has a decommission or destruction date

ship:OutOfServiceVessel

Solution Enforce these inferences using rdfs:domain

ship:maidenVoyage rdfs:domain ship:DeployedVessel .

ship:nextDeparture rdfs:domain ship:InServiceVessel .

ship:decommissionedDate rdfs:domain ship:OutOfServiceVessel .

ship:destructionDate rdfs:domain ship:OutOfServiceVessel .

Page 63: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Each of the 3 subclasses is in the domain of 1 or more of the 4 properties

All 4 instances have ship:maidenVoyage ship:DeployedVessel ship:QEII & ship:Constitution have ship:nextDeparture ship:InServiceVessel

ship:Titanic has ship:destructionDate and ship:Berengia has ship:decommissionedDate ship:OutofServiceVessel

Page 64: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Challenge 8 All these inferences concern the subject of the rows—the vessels

Can also draw inferences about what’s in the other table cells

Express that the commander of a ship has the rank of captain

Solution Express ranks as classes:

ship:Captain rdfs:subClassOf ship:Officer .

ship:Commander rdfs:subClassOf ship:Officer .

ship:LieutenantCommander rdfs:subClassOf ship:Officer .

ship:Lieutenant rdfs:subClassOf ship:Officer .

ship:Ensign rdfs:subClassOf ship:Officer .

Express that a ship’s commander has rank captain:

ship:hasCommander rdfs:range ship:Captain .

Page 65: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

From Table 1, can infer that all of Johnson, Warwick, Black, Montgomery are captains

Page 66: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Filtering Undefined Data Select out for further processing individuals based on info defined

for them

Challenge 9 Those vessels with nextDeparture dates are selected as input to,

e.g., a system scheduling group tours

Page 67: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Solution Define the class that’s the domain of nextDeparture:

ship:DepartingVessel a rdfs:Class .

ship:nextDeparture rdfs:domain ship:DepartingVessel .

Only ship:Constitution and ship:QEII are members of ship:DepartingVessel—see Table 1

Page 68: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

RDFS and Knowledge Discovery Because of the inference-based semantics of RDFS, domains and

ranges aren’t used to validate info

They’re used to determine new info based on old

domain and range are best understood as tools for knowledge discovery rather than knowledge description

On the Semantic Web, we don’t know in advance how info from somewhere else should be interpreted in a new context

domain and range let us things discover about our data based on its use

Use domain and range in one of the above (or similar) patterns,

where a particular knowledge filtering or discovery pattern is intended

Page 69: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Modeling with Domains and Ranges In the shipping example, 2 definitions for the nextDeparture domain:

ship:nextDeparture rdfs:domain ship:InServiceVessel . (1)

ship:nextDeparture rdfs:domain ship:DepartingVessel . (2)

Consider the case of the QEII, for which we asserted

ship:QEII ship:maidenVoyage “May 2, 1969” .

ship:QEII ship:nextDeparture “Mar. 4, 2010” .

ship:QEII ship:hasCommander ship:Warwick .

The rules of inference for rdfs:domain let us infer

ship:QEII a ship:InServiceVessle . by (1)

ship:QEII a ship:DepartingVessel . by (2)

Each is from the definition of rdfs:domain as applied to the indicated domain declaration

Any vessel for which a nextDeparture is specified is inferred to be in the intersection of the 2 classes in the domain statements

Page 70: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

This is counterintuitive from the point of view of OOP

If a “property” is associated with 2 classes, the intuition is that we can use it with members of either class

I.e., it may be used with their union

To see the ramifications of this intersection behavior,

suppose a company manages a team of traveling salespersons, each with schedule of trips

sales:SalesPerson rdfs:subClassOf foaf:Person .

sales:sells rdfs:domain sales:SalesPerson .

sales:sells rdfs:range sales:ProductLine .

sales:nextDeparture rdfs:domain sales:SalesPerson .

Page 71: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Now merge the info for the sales force management with the schedules of the ocean liners

A simple connection to make between the 2 models is to link their nextDeparture properties:

sales:nextDeparture rdfs:subPropertyOf ship:nextDeparture .

ship:nextDeparture rdfs:subPropertyOf sales:nextDeparture .

The intuition is that the 2 uses of nextDeparture are in fact the same (Both refer to dates)

To see what inferences are supported, suppose we assert

sales:Johannes sales:nextDeparture “May 31, 2008” .

We already have

ship:QEII ship:nextDeparture “Mar. 4, 2010” .

Page 72: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

We can infer the following

sales:Johannes a ship:DepartingVessel .

by rdfs:subPropertyOf then rdfs:domain

ship:QEII a sales:SalesPerson .

by rdfs:subPropertyOf then rdfs:domain

ship:QEII a foaf:Person .

by rdfs:subClassOf

Page 73: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

Might want to blame rdfs:domain, but the real issue is a modeling error

Jumped to the conclusion that 2 properties should be mapped so closely

The domain statements are quite strong statements

It would be surprising if 2 properties defined so specifically didn’t have extreme ramifications when merged

Lesson: Avoid merging things whose meanings have important differences

Page 74: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

If we just want to merge the 2 notions of nextDeparture for a calendar application,

map both properties to a 3rd, domain-neutral property:

ship:nextDeparture rdfs:subPropertyOf cal:nextDeparture .

sales:nextDeparture rdfs:subPropertyOf cal:nextDeparture .

The amalgamating property, cal:nextDeparture, doesn’t need any domain info

We can (as desired) infer

sales:Johannes cal:nextDeparture “May 31, 2008” .

ship:QEII cal:nextDeparture Mar. 4, 2010” .

Page 75: Modeling with RDF RDF and Tabular Data Consider the Product database relation IDModel No.DivisionProduct LineSKU 1ZX-3Manufacturing supportPaper machineFB3524

With just a few primitives, RDFS provides considerable subtlety for modeling relationships between data sources

But also the chance for subtle errors

Understand the modeling connections by tracing the inferences