© 2007 OpenLink Software, All rights reserved
OpenLink Virtuoso – Linked Data
Deploying Linked Data

Linked Data
- Term coined by Tim Berners-Lee
- Describes recommended best practice for exposing & connecting data on the Semantic Web
    - Use the RDF data model
    - Identify real or abstract things (resources) in your ‘universe of discourse’ (Data Spaces), using URIs as unique IDs
    - Make URIs accessible via HTTP so people can discover and explore these Data Spaces
    - Allow these URIs to be dereferenced and return information
    - Include links to provide ‘discovery paths’ to entities in other Data Spaces

Deployment Challenges
- Semantic ‘Data Web’ vs Traditional ‘Document Web’
- These are two dimensions of the Web separated by a common element – the URI
- Document Web
    - URIs always point to physical resources
- Data Web
    - URIs point to physical or abstract resources
- URIs for the Document and Data Webs must be interpreted differently

Web Resources
- What do we really mean by the term ‘resource’?
- The ‘Traditional’ and Semantic Webs require subtly different interpretations

Document Web Resources
In the traditional Document Web:
- All resources are document-orientated
- URI dereferencing returns a document
- Rendered representation is nearly always a document
- No real distinction between a resource and its representation
- Such resources have been referred to as ‘information resources’
- ‘Document resource’ is arguably a preferable term

Semantic Web Resources
In the Semantic Web:
- A URI need not identify a document-type resource
- The identity of a resource is distinct from its representation
- The resource may have several possible representations
- The most desirable representation may change, depending on the consumer (human or software-agent)
- Such resources are sometimes referred to as ‘non-information resources’
- ‘Data resource’ is a preferable term

Access vs Reference
- The Semantic and Document Webs interpret the term ‘resource’ differently
- A corollary of this difference in interpretation is:
    - The Semantic and Document Webs interpret URIs differently
    - Document Web: assumes that the resource a URI refers to is the same as the thing accessed (dereferenced)
    - Semantic Web: the resource a URI refers to is often not the same as the thing accessed – access returns a description, not the entity itself (e.g. the entity may be Paris)

Access vs Reference – Another View
- Paraphrasing Pat Hayes’ paper “In Defense of Ambiguity”
- Names (URIs) are used to both refer to (reference) and access things
- Access should be unambiguous
    - A name (URI) should provide an unambiguous access path
- Reference to abstract (physically inaccessible) entities is inherently ambiguous
    - Referring to an abstract entity relies on describing the entity
    - As there are many possible descriptions (facets), reference is ambiguous

Deployment Challenges
- We’ve established that the Semantic Web and Linked Data require:
    - Data access with unambiguous naming
    - Data (de)reference with ambiguous association
- Or put another way, we need mechanisms for an HTTP server to:
    - Answer the question “Does this URI identify a (physical) document resource or an (RDF) data resource?”
    - Provide alternative representations of a resource

Deployment Challenge Resolution
Two solutions proposed by the SemWeb Community:
- Distinguish resource type through URL formats
    - ‘Hash’ vs ‘slash’ URLs
- Content negotiation with URL rewriting

‘Hash’ vs ‘Slash’ URLs
- A solution using the syntax of the URL to differentiate ‘abstract’ resources from ‘information’ resources
- Slash URIs
    - Don’t contain a fragment identifier (#)
    - Identify document resources in traditional Web
    - E.g. http://demo.openlinksw.com/Northwind/Customer/ALFKI
        - Identifies a physical (X)HTML document
- Hash URIs
    - Contain a fragment identifier
    - Identify data resources (entities) in Semantic Web
    - E.g. http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
        - Identifies the entity ALFKI, distinct from its representation
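
The hash/slash convention on this slide can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function names classify_uri and document_url are hypothetical and are not part of Virtuoso. It relies on the fact that an HTTP client strips the fragment before sending a request, so the server only ever sees the slash form.

```python
from urllib.parse import urldefrag


def classify_uri(uri):
    """Classify a URI by the hash/slash convention (hypothetical helper)."""
    _, fragment = urldefrag(uri)
    # A non-empty fragment marks an entity (data resource) under this convention.
    return "data resource" if fragment else "document resource"


def document_url(uri):
    """Return the URL an HTTP client would actually request (fragment removed)."""
    return urldefrag(uri).url


print(classify_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI"))
print(classify_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"))
print(document_url("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"))
```

Both example URIs lead to the same HTTP request, which is why the server needs a further mechanism (content negotiation) to decide what to return.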

Content Negotiation
- Mechanism defined in HTTP specification
- Makes it possible to serve different versions of a document (or, more generally, a resource) at the same URL
- Software agents can choose which version they want
    - HTML Web browsers prefer HTML/XHTML
    - Semantic Web browsers prefer RDF/XML

Content Negotiation - Example
HTTP Request:
- HTML browser requests an HTML/XHTML document in English or French

    GET /whitepapers/data_mngmnt HTTP/1.1
    Host: www.openlinksw.com
    Accept: text/html, application/xhtml+xml
    Accept-Language: en, fr

- Accept header indicates preferred MIME types
- RDF browser might instead stipulate a MIME type of application/rdf+xml or text/rdf+n3
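
To make the role of the Accept header concrete, here is a minimal Python sketch of how a server might rank the MIME types a client asks for. The helpers parse_accept and choose are hypothetical names; the sketch handles q-values but, as a simplifying assumption, ignores wildcards such as */*.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header, available):
    """Return the most preferred media type the server can produce, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None


print(choose("text/html, application/xhtml+xml", {"text/html", "application/rdf+xml"}))
print(choose("application/rdf+xml;q=0.9, text/html;q=0.5", {"text/html", "application/rdf+xml"}))
```

A result of None corresponds to the case where the server has no acceptable representation to offer.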

Content Negotiation - Example
HTTP Response:
- Server redirects to a URL where the appropriate version can be found

    HTTP/1.1 302 Found
    Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html

- Redirect is indicated by HTTP status code 302 (Found)
- Client then sends another HTTP request to the new URL
- HTTP defines several 3xx status codes for redirection

HttpRange-14 Recommendations
W3C TAG guidelines for indicating resource type through HTTP response code (aka the HttpRange-14 issue)

HTTP Response Code: 200 (success)
    Material Returned: A representation
    Inference: Requested resource is an information resource. A representation has been returned.
HTTP Response Code: 303 (see other)
    Material Returned: A URI
    Inference: The resource may be an information or non-information resource. The client is being redirected to an associated representation of the resource in the desired format. The URI of the associated resource has been returned.
HTTP Response Code: 4xx or 5xx (error)
    Material Returned: Nothing
    Inference: The specified resource or representation format does not exist.

Content Negotiation Decision Table

URI Type: Document resource
    URI: http://demo.openlinksw.com/Northwind/Customer/ALFKI
    (X)HTML Representation: 200 OK
    RDF Representation: 303 (Redirect to URL that DESCRIBEs the entity http://demo.openlinksw.com/Northwind/Customer/ALFKI#this in a given Data Space)
URI Type: Entity ID (Data resource)
    URI: http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
    (X)HTML Representation: 406 (Not available in this format) or 303 (Redirect to associated resource in requested representation format)
    RDF Representation: 200 OK
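
The decision table can be read as a small lookup function. The Python sketch below models the table only, not wire behaviour (a real server never receives the fragment). The name decide and the two MIME type sets are assumptions made for illustration, and where the table allows either 406 or 303 the sketch arbitrarily picks 406.

```python
RDF_TYPES = {"application/rdf+xml", "text/rdf+n3"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def decide(uri, media_type):
    """Return the status code the decision table suggests (hypothetical helper)."""
    wants_rdf = media_type in RDF_TYPES
    wants_html = media_type in HTML_TYPES
    if not (wants_rdf or wants_html):
        raise ValueError("media type not covered by the table: " + media_type)
    if "#" in uri:
        # Entity ID (data resource): RDF is served directly;
        # for (X)HTML the table allows 406 or 303, and 406 is chosen here.
        return 200 if wants_rdf else 406
    # Document resource: (X)HTML is served directly, RDF requests are
    # redirected to a URL that describes the entity.
    return 200 if wants_html else 303


base = "http://demo.openlinksw.com/Northwind/Customer/ALFKI"
print(decide(base, "text/html"), decide(base, "application/rdf+xml"))
print(decide(base + "#this", "text/html"), decide(base + "#this", "application/rdf+xml"))
```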

URL Rewriting
- Is the act of modifying a URL prior to final processing by a Web server
- Provides a means to build a URL ‘on the fly’ identifying the resource in the required representation format referred to by a 303 redirection
- Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions

URL Rewriting – Example Pipeline

Source URI (Regex): /Northwind/Customer/([^#]*)
    HTTP Accept Header (Regex): None (i.e. default)
    HTTP Response Code: 200 or 303 redirect to a resource with default representation
    HTTP Response Headers: None
    Rule Processing Order: Normal (order irrelevant)
Source URI (Regex): /Northwind/Customer/([^#]*)
    HTTP Accept Header (Regex): (text/rdf.n3) | (application/rdf.xml)
    HTTP Response Code: 303 redirect to an associated description of the resource
    HTTP Response Headers: None
    Rule Processing Order: Normal (order irrelevant)
Source URI (Regex): /Northwind/Customer/([^#]*)
    HTTP Accept Header (Regex): (text/html) | (application/xhtml.xml)
    HTTP Response Code: 406 (Not acceptable) or 303 redirect to an associated description of the resource
    HTTP Response Headers: For 406: Vary: negotiate, accept Alternates: {“ALFKI” 0.9 {type application/rdf+xml}}
    Rule Processing Order: Last (must be last in processing chain)

Deploying Linked Data Using Virtuoso
- Virtuoso’s approach is to implement the generic solution outlined so far, using
    - Content negotiation
    - URL rewriting
- Virtuoso includes a Rules-based URL Rewriter
    - Can be used to inject Semantic Web data into the Document Web

URL Rewriting Example – The Aim
URI dereferenced by RDF browser client

    <http://demo.openlinksw.com/Northwind/Customer/ALFKI> or
    <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>

becomes after rewriting (omitting URL encoding)

    /sparql?query=CONSTRUCT { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?p ?o }
        FROM <http://demo.openlinksw.com/Northwind/>
        WHERE { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?p ?o }
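
The rewritten URL above is shown without URL encoding. As a hedged illustration of what the encoded form involves, the following Python sketch composes such a URL and decodes it again. The helpers describe_url and decode_query are hypothetical; Python's quote_plus escapes more characters (for example { and /) than the encoded URLs shown elsewhere in these slides, but both forms decode to the same query.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit


def describe_url(entity_iri, graph_iri):
    """Compose a /sparql URL carrying a CONSTRUCT query for one entity."""
    query = "CONSTRUCT { <%s> ?p ?o } FROM <%s> WHERE { <%s> ?p ?o }" % (
        entity_iri, graph_iri, entity_iri)
    return "/sparql?query=" + quote_plus(query)


def decode_query(url):
    """Recover the SPARQL text from a URL produced by describe_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = describe_url(
    "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this",
    "http://demo.openlinksw.com/Northwind/")
print(url)
print(decode_query(url))
```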

URL Rewriting for RDF Browser

URL Rewriting for iSparql
- iSparql Query Builder
    - e.g. Browsing RDF View: <http://demo.openlinksw.com/Northwind>
    - Dereferencing: <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> or <http://demo.openlinksw.com/Northwind/Customer/ALFKI>
- UI supports two commands for dereferencing a URI:
    - ‘Explore’ (i.e. Get all links to & from)

        SELECT ?property ?hasValue ?isValueOf
        WHERE {
          { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?property ?hasValue }
          UNION
          { ?isValueOf ?property <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> }
        }

    - ‘Get Dataset’ (i.e. Treat URI as a subgraph)

        SELECT * FROM <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
        WHERE { ?s ?p ?o }

URL Rewriting for iSparql: Issues
- ‘Get Dataset’ Option – Issues with URI being dereferenced: <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
    - Assumes URI is a named graph – It isn’t!
    - It’s a unique node ID (object ID / entity instance ID)
    - The only graph defined by our RDF View is: <http://demo.openlinksw.com/Northwind>
    - It’s not directly dereferenceable
- The cure? Construct a subgraph using URL rewriting!

Northwind URL Rewriting: The Aim
Aim of URL rewriting for the Northwind RDF view:
Create a rule for RDF browsers which will map an IRI
<http://demo.openlinksw.com/Northwind/Customer/something>
to a SPARQL query
CONSTRUCT { <iri> ?p ?o } FROM <http://demo.openlinksw.com/Northwind/>
WHERE { <iri> ?p ?o }
and rewrite the request as /sparql?query=CONSTRUCT ...

Virtuoso - URL Rewriter Key Elements
- Rewriting Rule
    - Describes how to parse a ‘nice’ URL and compose the actual ‘long’ URL of the resource to be returned
    - Two types: sprintf-based and regex-based
- Rewriting Rule List
    - Named, ordered list of rewriting rules or rule lists
    - Tried from top to bottom, first matching rule is applied
- Conductor UI for rewriting rule configuration
- Configuration API – alternative to Conductor UI, for scripts
    - Functions for creating, dropping, enumerating rules & rule lists
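
The "tried from top to bottom, first matching rule is applied" behaviour of a rule list, including nested rule lists, can be sketched in Python as follows. This is a simplified model and not Virtuoso's implementation: first_match, the rule names and the tuple layout are all hypothetical, and per-rule options such as do_not_continue are left out.

```python
import re


def first_match(rule_list, path):
    """Search a rule list top to bottom, descending into nested lists.

    Each entry is either a (name, regex) rule or a nested list of entries.
    Returns (rule_name, match_object) for the first rule that matches, else None.
    """
    for entry in rule_list:
        if isinstance(entry, list):
            found = first_match(entry, path)  # nested rule list
            if found:
                return found
        else:
            name, pattern = entry
            match = re.match(pattern, path)
            if match:
                return name, match
    return None


rules = [
    [("customer", r"/Northwind/Customer/([^#]*)")],  # nested rule list
    ("catch_all", r"(.*)"),
]

name, match = first_match(rules, "/Northwind/Customer/ALFKI")
print(name, match.group(1))
```

Because the search stops at the first hit, more specific rules need to appear before general ones such as the catch-all.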

Conductor UI for URL Rewriter

URL Rewriter API: Enabling Rewriting
- Enabled through vhost_define( ) function
    - vhost_define( ) defines a virtual host or virtual path
    - opts parameter is a vector of field-value pairs
    - Field url_rewrite controls / enables URL rewriting
    - Field value is the IRI of the rule list to apply
- e.g.

    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>'192.168.11.2:80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter API: Summary
Functions in DB.DBA schema:
- URLREWRITE_CREATE_SPRINTF_RULE
- URLREWRITE_CREATE_REGEX_RULE
- URLREWRITE_CREATE_RULELIST
- URLREWRITE_DROP_RULE
- URLREWRITE_DROP_RULELIST
- URLREWRITE_ENUMERATE_RULES
- URLREWRITE_ENUMERATE_RULELISTS

‘Nice’ URLs vs ‘Long’ URLs
- Rewriter developed with broader objectives than Linked Data – this influenced its terminology
- Rewriter takes a ‘nice’ URL and rewrites it as a ‘long’ URL
- ‘Nice’ URL
    - Free from parameters, typically short
- ‘Long’ URL
    - Typically contains query string with named parameters
    - Often ignored by web crawlers (viewed as highly dynamic) => low page ranking

Sprintf Rules vs Regex Rules
- For ‘nice’ to ‘long’ URL conversion
    - Functionally equivalent
    - Only difference is syntax of match pattern definition
- For ‘long’ to ‘nice’ URL conversion
    - Only works for sprintf-based rules
    - Regex-based rules are unidirectional
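
One way to see why sprintf-style rules can work in both directions is that a format template can be used both to compose a URL and, once turned into a pattern, to parse one. The Python sketch below illustrates that idea only; it is not how Virtuoso implements it. The convert function and the ‘long’ URL template /customer.vsp?id=%s are hypothetical.

```python
import re

NICE = "/Northwind/Customer/%s"
LONG = "/customer.vsp?id=%s"  # hypothetical 'long' URL, for illustration only


def convert(url, source, target):
    """Parse url with the source template and re-compose it with the target template."""
    # Turn the template into a regex: literal parts are escaped,
    # and each %s becomes a capture group.
    pattern = "([^/?&]+)".join(re.escape(part) for part in source.split("%s"))
    match = re.fullmatch(pattern, url)
    if match is None:
        return None
    return target % match.groups()


print(convert("/Northwind/Customer/ALFKI", NICE, LONG))  # nice -> long
print(convert("/customer.vsp?id=ALFKI", LONG, NICE))     # long -> nice
```

The same pair of templates serves both directions. A regex rule, by contrast, pairs a match pattern with a separate compose expression, and the compose expression cannot be run backwards to parse a ‘long’ URL.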

URLREWRITE_CREATE_REGEX_RULE

    URLREWRITE_CREATE_REGEX_RULE (
        rule_iri, allow_update, nice_match, nice_params, nice_min_params,
        target_compose, target_params, target_expn := null,
        accept_pattern := null, do_not_continue := 0,
        http_redirect_code := null );

rule_iri: rule’s name / identifier
nice_match: regex to parse URL into a vector of ‘occurrences’
nice_params: vector of names of the parsed parameters. Length of vector equals # of ‘(…)’ specifiers in the regex
target_compose: ‘compose’ regex for the destination URL
target_params: vector of names of parameters to pass to the ‘compose’ expression as $1, $2 etc
target_expn: optional SQL text to execute instead of a regex compose
accept_pattern: regex expression to match the HTTP Accept header
do_not_continue: on a match, try / don’t try next rule in rule list
http_redirect_code: null, 301, 302 or 303. 30x => HTTP redirect

Rewriting Process
- If current virtual directory has ‘url_rewrite’ option set, server traverses any associated rule list recursively.
- For each rule in rule list:
    - Input for rule is normalised URL from first ‘/’ after host:port
    - If rule’s regex matches, result is a vector of values
    - Names & values of parameters in any query string or the request body are decoded
    - Destination URL is composed

Destination URL - Parameter Handling
- Value of each parameter is taken from (in order of priority):
    - Value of a parameter in the match result
    - Value of a named parameter in the input query string
    - If POST request, value of a named parameter in request body
- If parameter value cannot be derived from above sources, next rule is applied
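
The priority order on this slide can be expressed as a short lookup routine. The Python sketch below is an assumption-laden model: resolve_params and its dictionary arguments are hypothetical names, and a return value of None stands for "this rule cannot be applied, try the next one".

```python
def resolve_params(names, match_values, query_params, body_params=None):
    """Resolve each named parameter using the slide's priority order.

    match_values: parameters captured by the rule's match pattern
    query_params: named parameters from the input query string
    body_params:  named parameters from the request body (POST only), else None
    """
    sources = [match_values, query_params]
    if body_params is not None:
        sources.append(body_params)
    resolved = []
    for name in names:
        for source in sources:
            if name in source:
                resolved.append(source[name])
                break
        else:
            # No source supplied this parameter: the rule does not apply.
            return None
    return resolved


print(resolve_params(["path", "format"],
                     {"path": "/Northwind/Customer/ALFKI"},
                     {"format": "rdf", "path": "/ignored"}))
```

In the example the match result wins over the query string for path, while format falls through to the query string.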

URL Rewriter API – Northwind Example
Rewriting rule:

    DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
      'oplweb_rule1', 1,
      '([^#]*)',
      vector('path'), 1,
      '/sparql?query=CONSTRUCT+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E+%%3Fp+%%3Fo+}+FROM+%%3Chttp%%3A//demo.openlinksw.com/Northwind/%%3E+WHERE+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E+%%3Fp+%%3Fo+}&format=%U',
      vector('path', 'path', '*accept*'),
      null,
      '(text/rdf.n3)|(application/rdf.xml)',
      0,
      303);

In effect (omitting URL encoding):

    /sparql?query=CONSTRUCT { %U ?p ?o } FROM <http://demo.openlinksw.com/Northwind/> WHERE { %U ?p ?o }

where %U is a placeholder for the original URI

URL Rewriter API – Northwind Example
Arguments in previous rule defined by URLREWRITE_CREATE_REGEX_RULE:
- nice_match arg: ([^#]*)
    - regex matches input IRI up to fragment delimiter
- nice_params arg: vector('path')
    - ‘path’ is name of first match group in nice_match regex
- accept_pattern arg: (text/rdf.n3)|(application/rdf.xml)
    - regex to match HTTP Accept header
- target_params arg: vector('path', 'path', '*accept*')
    - names of params whose values will replace %U placeholders in the target URL pattern
    - *accept* passes matched part of Accept header for substitution into &format=%U portion of query string, e.g. application/rdf+xml
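
To show how these arguments fit together, here is a Python simulation of the rule. It is a sketch, not Virtuoso code: the rewrite function is hypothetical, and it assumes that %% in the target pattern stands for a literal % and that each %U is replaced by the URL-encoded value of the next target parameter (an assumption consistent with the format=application/rdf%2Bxml seen in server output).

```python
import re
from urllib.parse import quote

# Target pattern copied from the rule definition.
TARGET = (
    "/sparql?query=CONSTRUCT+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E"
    "+%%3Fp+%%3Fo+}+FROM+%%3Chttp%%3A//demo.openlinksw.com/Northwind/%%3E"
    "+WHERE+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E+%%3Fp+%%3Fo+}"
    "&format=%U"
)
NICE_MATCH = r"([^#]*)"
ACCEPT_PATTERN = r"(text/rdf.n3)|(application/rdf.xml)"


def rewrite(path, accept_header):
    """Return the rewritten URL, or None if the Accept header does not match."""
    path_match = re.match(NICE_MATCH, path)
    accept_match = re.search(ACCEPT_PATTERN, accept_header)
    if accept_match is None:
        return None
    # target_params: vector('path', 'path', '*accept*')
    values = iter([path_match.group(1), path_match.group(1), accept_match.group(0)])

    def fill(token):
        if token.group(0) == "%%":
            return "%"
        return quote(next(values), safe="/")

    return re.sub(r"%%|%U", fill, TARGET)


print(rewrite("/Northwind/Customer/ALFKI", "application/rdf+xml"))
```

Note that the dot in the Accept regex matches the + of application/rdf+xml, so the value passed on to the format parameter is the literal MIME type from the header.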

URL Rewriter API – Northwind Example
Enabling Rewriting:

    DB.DBA.URLREWRITE_CREATE_RULELIST ('oplweb_rule_list1', 1, vector ('oplweb_rule1'));

    -- ensure a Virtual Directory /Northwind exists
    VHOST_REMOVE (lpath=>'/Northwind', vhost=>'demo.openlinksw.com',
                  lhost=>'192.168.11.2:80');
    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>'192.168.11.2:80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter - Verification with curl
curl utility provides a useful tool for verifying HTTP server responses and rewriting rules

    $ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
    HTTP/1.1 303 See Other
    Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
    Connection: close
    Content-Type: text/html; charset=ISO-8859-1
    Date: Tue, 14 Aug 2007 13:30:22 GMT
    Accept-Ranges: bytes
    Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
    Content-Length: 0
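
The Location header in the curl output is hard to read in its encoded form. As a small aid, the following Python sketch decodes it with the standard library so the generated SPARQL query and format parameter can be checked by eye. The decode_location helper is hypothetical; the LOCATION value is copied from the response above.

```python
from urllib.parse import parse_qs, urlsplit

# Location header value taken from the curl output above.
LOCATION = (
    "/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E"
    "+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E"
    "+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}"
    "&format=application/rdf%2Bxml"
)


def decode_location(location):
    """Split a redirect target into (path, SPARQL query text, format)."""
    parts = urlsplit(location)
    params = parse_qs(parts.query)
    return parts.path, params["query"][0], params["format"][0]


path, query, fmt = decode_location(LOCATION)
print(path)
print(query)
print(fmt)
```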

URL Rewriter – URIQADefaultHost Macro
- URIQADefaultHost Macro
    - Makes rewriting rules (& RDF View definitions) more portable
    - Each occurrence is substituted with the value of the DefaultHost parameter in URIQA section of virtuoso.ini configuration file
    - DefaultHost ::= server name. e.g. www.example.com:8890

    '/sparql?query=CONSTRUCT+{+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Fp+%%3Fo+}+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind/%%3E+WHERE+{+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Fp+%%3Fo+}&format=%U'
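
The effect of the macro can be pictured as a plain text substitution. Virtuoso performs this substitution itself; the Python sketch below only illustrates the outcome. The expand_macro helper and the two-line INI fragment are hypothetical stand-ins for the server's behaviour and for a real virtuoso.ini.

```python
import configparser

# Minimal stand-in for the URIQA section of virtuoso.ini (illustrative only).
SAMPLE_INI = """
[URIQA]
DefaultHost = www.example.com:8890
"""

MACRO = "^{URIQADefaultHost}^"


def expand_macro(pattern, ini_text):
    """Replace every occurrence of the macro with the configured DefaultHost."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    return pattern.replace(MACRO, config["URIQA"]["DefaultHost"])


pattern = "FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind/%%3E"
print(expand_macro(pattern, SAMPLE_INI))
```

Moving a rule to another server then only requires a different DefaultHost value, not an edit to the rule itself.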