ISA
/department of mathematics and computer science
TU/e eindhoven university of technology
April 17, 2003
RDF Query Languages
Flavius Frasincar [email protected]
Contents
• Why RDF Query Languages?
• RDF Features (Recap)
• RDF Query Language Requirements
• RDF Query Languages
• RQL (RDF Query Language):
  – Select: variables
  – From: path expressions
  – Where: condition
• Summary
Why RDF QLs?
• RDF is the standard representation language for Web metadata (foundation of the Semantic Web)
• RDF is already used in:
  – Large description schemas: ODP (Open Directory Project), a web site classification with 385,965 topics; UNSPSC (United Nations Standard Products and Services Code), a product classification with 16,506 classes
  – Large description bases: ODP classifies 3,339,355 sites
• RDF QLs are needed in order to access data from (large) RDF representations
RDF
Primitive semantics: Subject Predicate Object (one statement)
Three alternative notations:
• Graph
http://example.com/sb.jpg --painted_by--> "Rembrandt"
• Triple
(http://example.com/sb.jpg, painted_by, "Rembrandt")
• RDF/XML
<rdf:Description rdf:about="http://example.com/sb.jpg">
  <painted_by>Rembrandt</painted_by>
</rdf:Description>
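As a rough illustration of the triple notation, one statement can be held as a (subject, predicate, object) tuple. The `Statement` type and the `to_triple_notation` helper are hypothetical names chosen for this sketch; they do not come from the slides.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str
    predicate: str
    obj: str


def to_triple_notation(s: Statement) -> str:
    # Render the statement in the triple notation used on the slide.
    return f'({s.subject}, {s.predicate}, "{s.obj}")'


stmt = Statement("http://example.com/sb.jpg", "painted_by", "Rembrandt")
print(to_triple_notation(stmt))
```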
RDF Features
• RDF:
  – Data model: directed labeled graph
    • Nodes: resources (with or without URIs) or literals
    • Edges: properties (attributes or relationships)
    • Labels: on nodes (URI) or edges (property URI)
• RDF Schema:
  – Multiple classification of resources
  – Specialization of both classes and properties (simple and multiple)
  – Unordered, optional, and multivalued properties
  – Domain and range polymorphism of properties
RDF vs. XML
• Different data models:
  – RDF data model: a directed graph with labels on both edges and nodes
  – XML data model: a tree with labels on edges or nodes
• Different semantics:
  – RDF is able to model complex semantic relations (e.g. class/property hierarchies based on specialization)
  – XML has only one type of semantics, inclusion semantics (an element contains another element)
• RDF has an XML syntax (RDF/XML), but XML QLs do not support RDF semantics: we need an RDF QL
Requirements for an RDF QL
• Understand RDF Data Model (RDF graph or RDF triples)
• Path expressions can use labels from both nodes and edges
• Compose queries: the output of one query can be used as input for the next query
• Declarative: not bound to any implementation (closer to human language!)
• Support RDF Schema
RDF Query Languages
• Triple-based: querying the structure
  – RDQL
  – Triple [successor of SiLRI] (Horn logic)
  e.g. Find statements whose subject is … and object is …
• XML-based: querying the syntax
  – RDF Query
  – RQuery (XQuery)
  e.g. Find description elements whose attribute value contains …
• Graph-based (but not graphical): querying the semantics
  – RQL (OQL)
  e.g. Find resources classified under … whose property value is …
RDF Query Language (RQL)
• Declarative query language for RDF
• Language proposal (not yet a standard)
• Based on the RDF graph representation
• Supports RDF Schema (only a few of the existing RDF QLs do)
• References (with small differences between them):
  – RQL from ICS-FORTH (Greece) (http://139.91.183.30:9090/RDF/RQL/)
  – Sesame from Aidministrator (Holland) (http://sesame.aidministrator.nl/)
• The rest of the presentation refers to the Sesame implementation
RQL Input
• The input to an RQL query is a complete RDF model, i.e. a model that contains its RDFS-closure (defined in RDF Semantics).
• Note that the RDFS-closure includes the RDF-closure
• [RDF-closure] e.g. rdf1: if (xxx aaa yyy) then add (aaa rdf:type rdf:Property)
• [RDFS-closure] e.g. rdfs9: if (xxx rdfs:subClassOf yyy) and (aaa rdf:type xxx) then add (aaa rdf:type yyy)
• There are operator variants (marked by an appended ^) that discard this inferred data (intensional data) and consider only the explicitly given statements (extensional data) of an RDF model
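The two closure rules quoted above can be sketched as a small fixpoint computation. This is only an illustration of rdf1 and rdfs9, not the full RDFS-closure (other rules, such as subclass transitivity, are left out), and the sample statements, including the assumption that cult:Painter is a subclass of cult:Artist, are made up for the sketch.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def closure(triples):
    """Apply rules rdf1 and rdfs9 until no new statement is produced."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            # rdf1: every predicate in use is an rdf:Property.
            new.add((p, RDF_TYPE, RDF_PROPERTY))
            # rdfs9: instances of a subclass are instances of the superclass.
            if p == RDFS_SUBCLASSOF:
                for (s2, p2, o2) in closed:
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
        if new <= closed:
            return closed
        closed |= new


# Extensional data: the statements given explicitly (assumed sample).
extensional = {
    ("cult:Painter", RDFS_SUBCLASSOF, "cult:Artist"),
    ("&r1", RDF_TYPE, "cult:Painter"),
}
# Intensional data: what the closure adds on top of the given statements.
intensional = closure(extensional) - extensional
```

Under this reading, a ^ variant of an operator would look only at `extensional`, while the plain operator would see `closure(extensional)`.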
Example: RDF Input
[Figure: RDF graph of the museum example, split into a schema part and an instance part.
Schema labels: classes Artist, Sculptor, Painter, Cubist, Flemish, Artifact, Sculpture, Painting, ExtResource, Literal; properties creates, sculpts, paints, first_name, last_name, technique, title, mime_type, file_size, last_modified; relations rdfs:subClassOf and rdfs:subPropertyOf.
Instance labels: resources &r1 and &r2, connected to the schema by rdf:type; properties paints, first_name, last_name, technique, title, file_size; literal values "Rembrandt", "van Rijn", "oil on canvas", "Abraham and Isaac", 17.]
cult=http://www.icom.com/schema.rdf#
adm=http://www.oclc.org/schema.rdf#
Example:Web Resources
• &r1: http://www.european-history.com/rembrandt.html
• &r2: http://www.artchive.com/rembrandt/abraham.jpg
Select-From-Where
select X, Y
from {X}cult:paints{Y},{X}cult:first_name{Xfname}
where Xfname like "Rembrandt"
using namespace cult=http://www.icom.com/schema.rdf#
• select: list of variables
• from: list of path expressions
• where: condition (optional)
• Variables on graph labels
• Path expressions and conditions use variables and constants
• The RQL result is a table of tuples (a relation): each column is a variable and each row assigns a value to every variable
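To make the roles of the three clauses concrete, the query above can be mimicked over a plain list of triples. This is a minimal sketch and not how Sesame evaluates RQL; the `match` and `query` helpers are hypothetical, and the statements about &r3, &r4 and &r5 are invented sample data.

```python
triples = [
    ("&r1", "cult:first_name", "Rembrandt"),
    ("&r1", "cult:paints", "&r2"),
    ("&r1", "cult:paints", "&r3"),          # hypothetical second painting
    ("&r4", "cult:first_name", "Pablo"),    # hypothetical second painter
    ("&r4", "cult:paints", "&r5"),
]


def match(prop):
    """Bindings (subject, object) for one path expression {S}prop{O}."""
    return [(s, o) for (s, p, o) in triples if p == prop]


def query():
    rows = []
    # from: two path expressions that share the variable X
    for x, y in match("cult:paints"):
        for x2, xfname in match("cult:first_name"):
            # shared variable X, plus the where condition on Xfname
            if x == x2 and xfname == "Rembrandt":
                # select: keep only the columns X and Y
                rows.append((x, y))
    return rows


result = query()
```

Each row of `result` corresponds to one row of the result table, with one value per selected variable.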
RQL Result
X | Y
http://www.european-history.com/rembrandt.html | http://www.artchive.com/rembrandt/artist_at_his_easel.jpg
http://www.european-history.com/rembrandt.html | http://www.artchive.com/rembrandt/abraham.jpg
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Bag rdf:ID="query_result">
    <rdf:li>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.european-history.com/rembrandt.html"/>
        <rdf:li rdf:resource="http://www.artchive.com/rembrandt/artist_at_his_easel.jpg"/>
      </rdf:Seq>
    </rdf:li>
    <rdf:li>…abraham.jpg …</rdf:li>
  </rdf:Bag>
</rdf:RDF>
Why Is the RQL Result a Bag?
select X
from {X}cult:paints{Y},{X}cult:first_name{Xfname}
where Xfname like "Rembrandt"
using namespace cult=http://www.icom.com/schema.rdf#
X
http://www.european-history.com/rembrandt.html
http://www.european-history.com/rembrandt.html
• If only one variable is returned, several bindings of that variable can carry the same value, so the result must allow duplicates (a Bag)
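The effect of projecting onto a single variable can be sketched with Python's `Counter` as a stand-in for a bag. The short names &r1, &r2 and &r3 are placeholders for the painter and the two painting URLs; the row data is assumed for the illustration.

```python
from collections import Counter

# (X, Y) bindings: one painter (&r1) with two paintings.
rows = [("&r1", "&r2"), ("&r1", "&r3")]

# "select X" keeps only the first column of every binding.
projected = [x for (x, _y) in rows]

as_bag = Counter(projected)   # keeps multiplicities
as_set = set(projected)       # would collapse the duplicates
```

Here `projected` still has two entries for the same painter, which is why the result is described as a bag and not as a set.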
Namespaces
• All the labels for nodes and edges are associated with a certain namespace
using namespace cult=http://www.icom.com/schema.rdf# , adm=http://www.oclc.org/schema.rdf#
• cult contains information intended for museum specialists (e.g. descriptions of artists, artifacts, and museums)
• adm contains information for portal administrators (e.g. title, file_size, and mime_type of a certain external resource)
• (Web) resources are orthogonally classified using the two schemas above
Select: Variables
• There are three kinds of variables:
  – Instance: e.g. X
  – Class: e.g. $C
  – Property: e.g. @P
• “Find all resources together with their associated classes, properties, and property values”:
select X, $C, @P, Y
from {X : $C}@P{Y}
is equivalent to
select *
from {X : $C}@P{Y}
(* = all variables)
• “A resource X has type C” can be written in two ways:
X : C (not standalone) or C{X} (a path expression that limits a node)
From: Path Expressions
• Path expressions specify a linear path through the RDF data model
• Each variable used in a path expression is bound to labels from the model
• “Find all painters and their associated paintings”
select Painter, Painting
from {Painter}cult:paints{Painting}
using namespace cult=http://www.icom.com/schema.rdf#
The ‘.’ in Path Expressions
• Path expressions can be arbitrarily long
• The ‘.’ is used to specify a join condition between the object and the subject of two consecutive properties
select Painter, Painting, Technique
from {Painter}cult:paints{Painting}. cult:technique{Technique}
using namespace cult=http://www.icom.com/schema.rdf#
• In the above example Painting is the object of cult:paints and the subject of cult:technique
• If Painting is not of interest, it can be omitted:
from {Painter}cult:paints. cult:technique{Technique}
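The join expressed by ‘.’ can be sketched as a nested loop in which the object of the first property must equal the subject of the second. The `path` helper is a hypothetical name, and the painting &r3 without a technique is invented to show that unmatched paths drop out.

```python
triples = [
    ("&r1", "cult:paints", "&r2"),
    ("&r2", "cult:technique", "oil on canvas"),
    ("&r1", "cult:paints", "&r3"),   # hypothetical painting with no technique
]


def path(first, second):
    """{A}first{B}.second{C}: object of `first` must be subject of `second`."""
    return [
        (a, b, c)
        for (a, p1, b) in triples if p1 == first
        for (b2, p2, c) in triples if p2 == second and b2 == b
    ]


with_painting = path("cult:paints", "cult:technique")
# Omitting the intermediate node, as in {Painter}cult:paints.cult:technique{Technique}
without_painting = [(a, c) for (a, _b, c) in with_painting]
```

The intermediate node still takes part in the join when it is omitted from the path expression; it is simply not returned.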
List of Path Expressions
• Since path expressions are linear it is not possible to express two paths with the same origin in one path expression
• List of path expressions sharing variables
select Painter, Painting, Painter_lname
from {Painter}cult:paints{Painting},
{Painter}cult:last_name{Painter_lname}
using namespace
cult=http://www.icom.com/schema.rdf#
Class of a Resource
Q1 (better):
select Painter, $Painter, Painting
from {Painter : $Painter}cult:paints{Painting}
using namespace cult=http://www.icom.com/schema.rdf#
Q2:
select Painter, Painter_type, Painting
from {Painter}rdf:type{Painter_type}, {Painter}cult:paints{Painting}
using namespace rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns# , cult=http://www.icom.com/schema.rdf#
• Q1 returns the most specific type (class) of a resource, while Q2 returns all types of this resource
Class Restriction for Resources
Q1:
select Painter
from {Painter :cult:Flemish}cult:paints{Painting}
using namespace cult=http://www.icom.com/schema.rdf#
• Note that cult:Flemish must be part of the domain of cult:paints, otherwise the query returns 0 results.
Q2 (better):
select Painter
from cult:Flemish{Painter}
using namespace cult=http://www.icom.com/schema.rdf#
• Q1 returns a Flemish painter once per painting, so a painter with more than one painting appears multiple times; Q2 does not.
Domain and Range
Q1 (better):
select $Domain, $Range
from {:$Domain}cult:has_style{:$Range}
using namespace cult=http://www.icom.com/schema.rdf#
Q2:
select domain(@P), @P, range(@P)
from {}@P{}
where @P = cult:has_style
using namespace cult=http://www.icom.com/schema.rdf#
• Q1 returns data from the schema with its RDFS-closure, while Q2 returns data present in the schema without the RDFS-closure (both are independent of the model instance)
Where: Condition
• The where clause is optional
• The condition constrains the values of variables bound in the from clause. It uses two kinds of operators:
  – Comparison: <, <=, =, >, >=, !=, like (with *) [lexical], in [set]
  – Logical: and, or, not
• The first 5 comparison operators are overloaded: they apply to sets (set comparison) or to single values, where classes are compared by subClassOf, properties by subPropertyOf, reals and integers numerically, and literals/resources lexically
Comparison Operators
• “Select all artists that have a painting resource containing the string ‘abraham’, together with their type and first name”
select Artist, $Artist, ArtistFName
from {Artist : $Artist} cult:first_name {ArtistFName}
where Artist in select Painter
                from {Painter} cult:paints {Painting}
                where Painting like "*abraham*"
using namespace cult = http://www.icom.com/schema.rdf#
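The combination of like and in can be sketched with Python's `fnmatch` module standing in for the wildcard match. This rests on assumptions: `fnmatchcase` is case-sensitive and also gives special meaning to `?` and `[...]`, which may differ from RQL's like; the class variable $Artist is left out; and the &r4 statements are invented sample data.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    # Assumption: '*' matches any run of characters, case-sensitively.
    return fnmatchcase(value, pattern)


triples = [
    ("&r1", "cult:first_name", "Rembrandt"),
    ("&r1", "cult:paints", "http://www.artchive.com/rembrandt/abraham.jpg"),
    ("&r4", "cult:first_name", "Pablo"),                        # hypothetical
    ("&r4", "cult:paints", "http://example.com/guernica.jpg"),  # hypothetical
]

# Inner query: painters with a painting whose URI matches "*abraham*".
painters = {s for (s, p, o) in triples
            if p == "cult:paints" and like(o, "*abraham*")}

# Outer query: artists with a first name, restricted by "Artist in (inner query)".
artists = [(s, o) for (s, p, o) in triples
           if p == "cult:first_name" and s in painters]
```

The inner result is used as a set for the membership test, which mirrors how the nested select feeds the in operator.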
Logical Operators
• “Select all painters with a first name that starts with R and all sculptors with a first name that does not start with M”
select Artist, ArtistFName
from {Artist :$Artist} cult:first_name {ArtistFName}
where ($Artist <= cult:Painter and ArtistFName like "R*")
or
($Artist <= cult:Sculptor and not (ArtistFName like "M*"))
using namespace
cult = http://www.icom.com/schema.rdf#
Standard Functions
• Standard functions are used to retrieve standard RDFS relationships
• We have already seen: domain() and range()
• Other examples: Class, Property, subClassOf(), subPropertyOf(), typeOf() etc.
• The standard functions can also be used as standalone queries:
Class
subClassOf ( http://www.icom.com/schema.rdf#Artist )
typeOf ( http://www.european-history.com/rembrandt.html )
etc.
Strict Interpretation with ‘^’
• “Retrieve the direct subclasses of Artist”
subClassOf^ ( http://www.icom.com/schema.rdf#Artist )
• “Retrieve all subclasses of Artist”
subClassOf ( http://www.icom.com/schema.rdf#Artist )
• “Retrieve the most specific classes to which the resource http://www.european-history.com/rembrandt.html belongs”
typeOf^ ( http://www.european-history.com/rembrandt.html )
• “Retrieve the classes to which the resource http://www.european-history.com/rembrandt.html belongs”
typeOf ( http://www.european-history.com/rembrandt.html )
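The difference between subClassOf^ and subClassOf can be sketched as direct children versus all descendants in a class hierarchy. The hierarchy below is an assumption based on the class names of the museum example, and both function names are hypothetical.

```python
# Direct (explicitly stated) subclass links: parent -> children. Assumed data.
direct = {
    "cult:Artist": {"cult:Painter", "cult:Sculptor"},
    "cult:Painter": {"cult:Cubist", "cult:Flemish"},
}


def sub_class_of_strict(cls):
    """Analogue of subClassOf^: only the direct subclasses."""
    return set(direct.get(cls, set()))


def sub_class_of(cls):
    """Analogue of subClassOf: all subclasses, direct and inherited."""
    result = set()
    stack = [cls]
    while stack:
        for child in direct.get(stack.pop(), set()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


strict = sub_class_of_strict("cult:Artist")
everything = sub_class_of("cult:Artist")
```

The typeOf and typeOf^ pair would follow the same pattern in the opposite direction, walking from a resource's most specific classes up to their superclasses.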
Standalone Queries
• The standard functions: Class, subClassOf, Property, subPropertyOf etc.
• Any class (resource of type rdf:Class): returns the extension (resources) of this class
http://www.icom.com/schema.rdf#Artist
• Any property (resource of type rdf:Property): returns the extension (pairs subject-object) of this property
http://www.icom.com/schema.rdf#creates
Set Operations
• The query results can be combined using the following operators: union, intersect, and minus
• “Retrieve the first name and the last name of all painters”
(select PainterR, PainterLName, PainterFName
 from cult:Painter{PainterR}. cult:last_name{PainterLName},
      {PainterR}cult:first_name{PainterFName})
union
(select PainterR, PainterLName, NULL
 from cult:Painter{PainterR}. cult:last_name{PainterLName}
 where not (PainterR in select PainterR
                        from {PainterR}cult:first_name))
using namespace cult = http://www.icom.com/schema.rdf#
• Note that not all painters have a first name in the input model, so the union emulates an outer join
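The outer-join pattern above (matching rows, plus unmatched rows padded with NULL) can be sketched with two dictionaries. This simplification assumes single-valued properties, and the painter &r9 with last name "Picasso" is invented to provide a row without a first name.

```python
# Painter resource -> last name / first name (assumed sample data).
last_names = {"&r1": "van Rijn", "&r9": "Picasso"}
first_names = {"&r1": "Rembrandt"}

# First branch: painters that have both a last name and a first name.
with_first = [(r, ln, first_names[r])
              for r, ln in last_names.items() if r in first_names]

# Second branch: painters with a last name but no first name, padded with None.
without_first = [(r, ln, None)
                 for r, ln in last_names.items() if r not in first_names]

# union of the two branches; they are disjoint, so no duplicates can arise.
result = with_first + without_first
```

`None` plays the role of NULL in the second select list.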
Summary
• There is a need for RDF query languages (XML query languages cannot handle RDF semantics)
• RQL: declarative query language for uniformly querying RDF schemas and RDF descriptions
  – Select: list of variables (variables to be returned)
  – From: list of path expressions (variables are bound)
  – Where: condition (constrains the value of variables)
  – Compositional (in and set operations)
  – Very expressive
  – Well-defined semantics; the syntax can be improved …
  – … but not yet a standard!
Appendix
• Try your own queries at: http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
• The result of the query:
  – HTML Table
  – RDF-Bag
  – XML
• Explore the Museum example (with or without inferred statements):
  – Schema (ontology)
  – Instance (data statements)
Exercise 1
• “Find the first name of painters that have paintings using the ‘oil on canvas’ technique and also return these paintings”
select Painter_fname, Painting
from {Painter}cult:paints{Painting}. cult:technique{Painting_technique},
     {Painter}cult:first_name{Painter_fname}
where Painting_technique like "oil on canvas"
using namespace cult=http://www.icom.com/schema.rdf# , adm=http://www.oclc.org/schema.rdf#
Exercise 2
• “Find the first name of the painters that have a painting stored in a file with size greater than 5”
select Painter_fname
from {Painter}cult:paints{Painting}. adm:file_size{Painting_fsize},
     {Painter}cult:first_name{Painter_fname}
where Painting_fsize > 15
using namespace cult=http://www.icom.com/schema.rdf# , adm=http://www.oclc.org/schema.rdf#
Exercise 3
• “Find the resources which are not of type ExtResource”
• First Solution:
select R
from rdfs:Resource{R}
where not (R in select R
from adm:ExtResource{R})
using namespace
rdfs=http://www.w3.org/2000/01/rdf-schema# ,
adm=http://www.oclc.org/schema.rdf#
Exercise 3 (cont’d)
• Second solution:
(select R
from rdfs:Resource{R})
minus
(select R
from adm:ExtResource{R})
using namespace
rdfs=http://www.w3.org/2000/01/rdf-schema# ,
adm=http://www.oclc.org/schema.rdf#