

Page 1: Weaving the Pedantic Web (LD

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

0:39:00 1

Weaving the Pedantic Web

LDOW 2010

Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, Axel Polleres

Page 2

Linked Data…

Page 3

Purpose of talk: Application developers… how to not sink…

Page 4

Purpose of talk: RDF Publishers… how to avoid common mistakes…

Page 5

Talking about errors in Linked Data…

We’ll try not to ruin the party

…statistics based on a crawl from April 2009:

5k domain limit

150k URIs, 55k RDF docs

12.5m triples (quads) mentioning 1.6m URIs

5,850 classes / 9,507 properties

Accept: application/rdf+xml

…okay… so no RDFa

Statistics are *illustrative* not exhaustive!
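The crawl dereferenced URIs with an Accept: application/rdf+xml header. A minimal sketch of that kind of request using only the Python standard library; the function names are hypothetical and this is not the crawler that produced these statistics:

```python
import urllib.request
from typing import Tuple

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request asking the server for RDF/XML via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML}, method="GET")


def fetch_rdf(uri: str, timeout: float = 10.0) -> Tuple[str, int, str, bytes]:
    """Dereference a URI and return (final URL, status, media type, body).

    urlopen follows redirects such as 303 See Other, so the final URL may
    differ from the requested one; 4xx/5xx responses raise urllib.error.HTTPError.
    """
    request = build_rdf_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return (
            response.geturl(),
            response.status,
            response.headers.get_content_type(),
            response.read(),
        )
```

The media type returned by `fetch_rdf` is what the server *reported*, which is exactly the value whose honesty is questioned later in the talk.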

Page 6

Chapter 1: HTTP-level issues… …a good RDF description these days is hard to find

Page 7

Waldo URIs: URIs with no dereferenceable RDF

Not a crawler’s idea of fun…

Page 8

Hmm not *so* many…

5.3% of HTTP URIs return 4xx/5xx errors

Excluding redirects… 92.8% return 200 OK

Of those 200 OK responses, only 45.4% report application/rdf+xml

34.8% return HTML… probably just HTML docs… okay… maybe a *few* contain RDFa
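Percentages like these can be tallied from the status code and Content-Type of each dereferenced URI. A hedged sketch: the bucket names are hypothetical, and treating redirects as their own outcome (as "excluding redirects" suggests) is an assumption about how the numbers were computed, not a description of the authors' method:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def classify(status: int, content_type: Optional[str]) -> str:
    """Bucket one HTTP response by status code and reported media type."""
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 600:
        return "error"
    if status == 200:
        # Ignore parameters such as "; charset=utf-8".
        media_type = (content_type or "").split(";")[0].strip().lower()
        if media_type == "application/rdf+xml":
            return "ok-rdfxml"
        if media_type == "text/html":
            return "ok-html"
        return "ok-other"
    return "other"


def summarise(responses: Iterable[Tuple[int, Optional[str]]]) -> Dict[str, float]:
    """Percentages in the style of the slide, rounded to one decimal place."""
    counts = Counter(classify(status, ctype) for status, ctype in responses)
    total = sum(counts.values())
    non_redirect = total - counts["redirect"]
    ok = counts["ok-rdfxml"] + counts["ok-html"] + counts["ok-other"]

    def pct(part: int, whole: int) -> float:
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "error_pct_of_all": pct(counts["error"], total),
        "ok_pct_excluding_redirects": pct(ok, non_redirect),
        "rdfxml_pct_of_ok": pct(counts["ok-rdfxml"], ok),
        "html_pct_of_ok": pct(counts["ok-html"], ok),
    }
```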

Page 9

Lies… Damned Lies… & Content-Type Reporting

“Trust me, it’s RDF/XML”

Page 10

Okay… So he’s actually pretty honest

16.9% of valid RDF/XML documents were returned with an invalid or more generic Content-type:

text/xml (9.5%), application/xml (5.9%), text/plain (1%), text/html (0.4%)

Of those returning Content-type: application/rdf+xml, 98.8% were valid RDF/XML
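One way to spot mislabelled documents is to compare the reported Content-Type with a quick look at the body. This sketch only checks for well-formed XML whose root element is rdf:RDF. That is a heuristic: RDF/XML also permits a single node element as the root, and an rdf:RDF root does not guarantee valid RDF/XML, so a real validator would run a full RDF/XML parser. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_XML = "application/rdf+xml"


def looks_like_rdfxml(body: bytes) -> bool:
    """Heuristic: well-formed XML whose root element is rdf:RDF."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags in {namespace}local form.
    return root.tag == f"{{{RDF_NS}}}RDF"


def mislabelled_rdfxml(content_type: str, body: bytes) -> bool:
    """True if the body looks like RDF/XML but was not served as application/rdf+xml."""
    media_type = content_type.split(";")[0].strip().lower()
    return looks_like_rdfxml(body) and media_type != RDF_XML
```

On the publishing side the fix is usually a one-line server configuration change so that .rdf files are served as application/rdf+xml.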

Page 11

I wish they’d used a redirect…

Same triples, different document

Page 12

E.g., the Miracle at Calais: turning 1,778 triples into ~∞ quads

http://d.opencalais.com/1/type/em/r/SameTriplesDifferentDocument

(apologies to OpenCalais guys – it’s just a convenient example)
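On the crawler side, a cheap defence is to fingerprint each document's triples and notice when many URLs return the same graph. A minimal sketch, assuming triples are already parsed into N-Triples-style strings and contain no blank nodes (blank-node labels can differ between parses, so a real implementation would need proper graph canonicalisation); the URLs used in the test are placeholders:

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def graph_fingerprint(triples: Iterable[Triple]) -> str:
    """Order-independent SHA-256 digest of a set of ground (blank-node-free) triples."""
    digest = hashlib.sha256()
    for s, p, o in sorted(set(triples)):
        digest.update(f"{s} {p} {o} .\n".encode("utf-8"))
    return digest.hexdigest()


def duplicate_documents(docs: Dict[str, Iterable[Triple]]) -> List[Set[str]]:
    """Group document URLs that serve exactly the same set of triples."""
    by_fingerprint: Dict[str, Set[str]] = {}
    for url, triples in docs.items():
        by_fingerprint.setdefault(graph_fingerprint(triples), set()).add(url)
    return [urls for urls in by_fingerprint.values() if len(urls) > 1]
```

A crawler could then stop queueing links from documents whose fingerprint it has already seen, which caps the "~∞ quads" at one copy of the triples.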

Page 13

Chapter 2: Reasoning issues… …or, how I learned to start worrying and stop loving OWL

Page 14

It looks important, but I’m afraid I don’t fully follow

Undefined classes and properties…

Page 15

Quite common…

14.3% of triples use an undeclared property; 8.1% of triples use an undeclared class

Three cases:

Case 1: Namespace has no vocabulary or is not dereferenceable (e.g., rss:item)

Case 2: Term invented in a related namespace (e.g., foaf:tagLine invented by LiveJournal)

Case 3: Term is a misspelt version of a term defined in the namespace (e.g., foaf:image vs. foaf:img)
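Undeclared terms, and likely misspellings (Case 3), can be flagged mechanically once a vocabulary's declared terms are known. A minimal sketch: terms are written as prefixed names for brevity, the sets of "declared" FOAF terms in the test are an illustrative excerpt, and difflib's similarity ratio stands in for whatever spelling heuristic a real validator might use:

```python
import difflib
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def undeclared_terms(
    triples: Iterable[Triple], declared_props: Set[str], declared_classes: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Return (triples using an undeclared property, triples using an undeclared class)."""
    bad_props: List[Triple] = []
    bad_classes: List[Triple] = []
    for s, p, o in triples:
        if p not in declared_props:
            bad_props.append((s, p, o))
        if p == RDF_TYPE and o not in declared_classes:
            bad_classes.append((s, p, o))
    return bad_props, bad_classes


def suggest_spelling(term: str, declared: Set[str], cutoff: float = 0.8) -> List[str]:
    """Suggest the closest declared term for a possibly misspelt one."""
    return difflib.get_close_matches(term, sorted(declared), n=1, cutoff=cutoff)
```

Case 2 (terms invented in someone else's namespace) looks identical to Case 3 from the data alone, so a suggestion is best reported as a warning rather than an error.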

Page 16

Despite what you claim, not all of you can *actually be* Spartacus

Not-so-unique values for Inverse-Functional Properties

Page 17

Spartacus relived…

08445a31a78661b5c746feff39a9db6e4e2cc5cf

SHA-1 sum of ‘mailto:’: a common value for foaf:mbox_sha1sum

An inverse-functional (uniquely identifying) property!!!

Any person who shares the same value will be considered the same

*I’m Spartacus!*…and so’s my wife
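FOAF defines foaf:mbox_sha1sum as the SHA-1 hex digest of a mailto: URI, so an exporter that hashes an empty address produces the hash of the bare string "mailto:", identical for every such person. A minimal sketch of a guarded exporter next to a careless one; both function names are hypothetical:

```python
import hashlib
from typing import Optional


def mbox_sha1sum(email: str) -> Optional[str]:
    """SHA-1 hex digest of "mailto:" + address, or None if there is no address.

    Returning None (and emitting no triple) avoids publishing the hash of the
    bare string "mailto:", which every empty-address profile would share.
    """
    email = email.strip()
    if not email:
        return None
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def naive_mbox_sha1sum(email: str) -> str:
    # A careless exporter: hashes even when the address is empty.
    return hashlib.sha1(f"mailto:{email.strip()}".encode("utf-8")).hexdigest()
```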

Page 18

…unattended, can be pretty serious…

foaf:mbox_sha1sum a owl:InverseFunctionalProperty .

?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf .

OWL 2 RL rule prp-ifp: ?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z .

⇒ ?x1 owl:sameAs ?x2 .

10⁶ ?x1/?x2 bindings in body, yielding 10¹² inferred pairwise and reflexive owl:sameAs statements

…or in simpler terms: pow!
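Two mitigations are easy to sketch: skip IFP values known to be bogus, and represent the owl:sameAs results as equivalence classes (union-find) instead of materialising all n² pairwise statements. A minimal Python sketch under those assumptions; the blacklist is hypothetical apart from the value shown on the previous slide, and terms are prefixed-name strings:

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical blacklist of IFP values shared by unrelated people;
# the entry is the foaf:mbox_sha1sum value from the previous slide.
IFP_VALUE_BLACKLIST = {"08445a31a78661b5c746feff39a9db6e4e2cc5cf"}


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def apply_prp_ifp(
    triples: Iterable[Triple], ifps: Set[str], blacklist: Set[str] = IFP_VALUE_BLACKLIST
) -> List[Set[str]]:
    """Apply prp-ifp, returning owl:sameAs equivalence classes with more than one member."""
    uf = UnionFind()
    first_subject: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p not in ifps or o in blacklist:
            continue  # not an IFP, or a value we refuse to trust
        key = (p, o)
        if key in first_subject:
            uf.union(first_subject[key], s)
        else:
            first_subject[key] = s
    classes: Dict[str, Set[str]] = {}
    for x in list(uf.parent):
        classes.setdefault(uf.find(x), set()).add(x)
    return [members for members in classes.values() if len(members) > 1]
```

For n subjects sharing one value, the union-find holds n entries where the materialised rule output would hold n² owl:sameAs triples.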

Page 19

As he would undoubtedly be able to tell you, “true” is not a valid xsd:int

Malformed/incompatible datatypes

Page 20

Not *too* bad…

4.7% of typed literals were “ill-typed” (lexically invalid)… mostly xsd:dateTimes (26.4% of all date-time literals were invalid; e.g., they omitted the seconds field)

Also, literals are sometimes incompatible with the datatype range of a property: e.g., 21.8% of ical:description triples used language tags, incompatible with the defined range of xsd:string

E.g., 100% of sl:creationDate triples use plain literal values, incompatible with the defined range of xsd:date
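A lexical-form check catches the commonest cases above. This sketch covers only three datatypes and only their lexical forms (it does not, for example, reject month 13); the regular expressions are simplified readings of the XML Schema definitions, not a complete implementation, and the function name is hypothetical:

```python
import re

# Simplified lexical patterns; XML Schema Part 2 has the full rules.
_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)
_INTEGER = re.compile(r"[+-]?\d+")
XSD_INT_MIN, XSD_INT_MAX = -(2**31), 2**31 - 1


def is_well_typed(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form against a small set of XSD datatypes."""
    if datatype == "xsd:dateTime":
        # Seconds are mandatory; omitting them is a common cause of ill-typed literals.
        return _DATETIME.fullmatch(lexical) is not None
    if datatype == "xsd:int":
        if _INTEGER.fullmatch(lexical) is None:
            return False
        return XSD_INT_MIN <= int(lexical) <= XSD_INT_MAX
    if datatype == "xsd:boolean":
        return lexical in {"true", "false", "1", "0"}
    return True  # datatypes this sketch does not know are not checked
```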

Page 21

Despite what FOAF says, it seems that Persons can also be Documents

Mystical beings… Members of disjoint classes

Page 22

Again, not *too* bad…

1,329 members of disjoint classes found

Generally caused by naïve URI naming: use of information resource URIs to name entities (particularly foaf:Persons), e.g., <me> foaf:knows <jim/foaf.rdf> .
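Detecting such clashes needs only rdf:type triples, rdfs:range inference and the disjointness axioms. In the example above, foaf:knows has range foaf:Person, so <jim/foaf.rdf> is inferred to be a Person. A minimal sketch; the prefixed names, the one-pair disjointness list and the single range axiom are an illustrative excerpt, and only rdfs:range inference is modelled (a real reasoner would also apply domains, subclasses and so on):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative excerpt of FOAF: Person/Document disjointness and the range of foaf:knows.
DISJOINT_PAIRS = [("foaf:Person", "foaf:Document")]
RANGES = {"foaf:knows": "foaf:Person"}


def infer_types(triples: Iterable[Triple], ranges: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collect asserted rdf:type values plus types implied by rdfs:range."""
    types: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p in ranges:
            types[o].add(ranges[p])
    return types


def disjointness_clashes(
    triples: Iterable[Triple],
    disjoint_pairs: List[Tuple[str, str]] = DISJOINT_PAIRS,
    ranges: Dict[str, str] = RANGES,
) -> List[Tuple[str, str, str]]:
    """List (resource, class A, class B) where the resource is typed with both disjoint classes."""
    types = infer_types(triples, ranges)
    return [
        (resource, a, b)
        for resource in sorted(types)
        for a, b in disjoint_pairs
        if a in types[resource] and b in types[resource]
    ]
```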

Page 23

Anybody can say anything, anywhere, and unfortunately for everyone else, have a good chance of being taken seriously

Ontology hijacking…

Page 24

From http://www.eiao.net/rdf/1.0:

<owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
  <rdfs:label xml:lang="en">type</rdfs:label>
  <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
</owl:Property>

Ontology hijacking!! (apologies to EIAO guys – it’s just a convenient example)

Redefining Everything… …and home in time for tea
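One consumer-side defence is to accept schema axioms about a term only from the document where that term is defined, a simplified form of what is often called authoritative reasoning. In this sketch a term's defining document is approximated as its URI minus the fragment, which suits hash namespaces like rdf: but needs redirect information for slash namespaces (passed in as an optional, hypothetical `redirects` map); the set of "schema" predicates and the example.org class in the test are assumptions:

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urldefrag

Triple = Tuple[str, str, str]
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Predicates treated as schema axioms in this sketch (an assumption).
SCHEMA_PREDICATES = {RDFS + "domain", RDFS + "range", RDFS + "subClassOf", RDFS + "subPropertyOf"}


def defining_document(term: str, redirects: Optional[Dict[str, str]] = None) -> str:
    """Approximate the document that defines a term: its URI without the fragment."""
    doc = urldefrag(term).url
    if redirects:
        doc = redirects.get(doc, doc)
    return doc


def is_authoritative(source_doc: str, term: str, redirects: Optional[Dict[str, str]] = None) -> bool:
    return urldefrag(source_doc).url == defining_document(term, redirects)


def filter_hijacking(
    source_doc: str, triples: Iterable[Triple], redirects: Optional[Dict[str, str]] = None
) -> List[Triple]:
    """Drop schema axioms whose subject term is not defined by source_doc."""
    kept: List[Triple] = []
    for s, p, o in triples:
        if p in SCHEMA_PREDICATES and not is_authoritative(source_doc, s, redirects):
            continue  # source_doc is redefining a term it does not own
        kept.append((s, p, o))
    return kept
```

Applied to the EIAO document, this keeps axioms about EIAO's own terms but ignores its rdfs:domain statements about rdf:type.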

Page 25

Solutions?

Page 26

All presented issues have a suitable antidote, once you know about them

See paper for discussion…

Application side: workarounds

Page 27

Syntax errors are quite rare, partly due to the popularity of the W3C RDF/XML syntax validator

Need an all-in-one validation service: it should not only validate strict errors, but also give feedback on suspected issues

We offer a prototypical service at:

http://swse.deri.org/RDFAlerts/

Publishing side: Validators!

Page 28

Get the community to contact publishers about errors/issues as they arise

Get involved: http://pedantic-web.org/ 137 members!

Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet

Publishing side: Pedantic Web Group