Labels in the Web of Data

Uploaded by basil-ell on 10 May 2015

DESCRIPTION

This presentation was given at the 10th International Semantic Web Conference, Bonn, and is related to the publication of the same title. Abstract of the publication: Entities on the Web of Data need to have labels in order to be exposable to humans in a meaningful way. These labels can then be used for exploring the data, i.e., for displaying the entities in a linked data browser or other front-end applications, but also to support keyword-based or natural-language based search over the Web of Data. Far too many applications fall back to exposing the URIs of the entities to the user in the absence of more easily understandable representations such as human-readable labels. In this work we introduce a number of label-related metrics: completeness of the labeling, efficient accessibility of the labels, unambiguity of the labeling, and multilinguality of the labeling. We report our findings from measuring the Web of Data using these metrics. We also investigate which properties are used for labeling purposes, since many vocabularies define further labeling properties beyond the standard property from RDFS. The publication is available at http://www.aifb.kit.edu/images/c/c0/LabelsInTheWebOfData.pdf

TRANSCRIPT

Page 1: Labels in the web of data

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

INSTITUTE FOR APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS

www.kit.edu

Labels in the Web of Data

Basil Ell, Denny Vrandečić, and Elena Simperl

10th International Semantic Web Conference, Bonn

26 October 2011

Page 2: Labels in the web of data

KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods

2 31.03.2014

Main message in a nutshell

Labels are necessary but often missing (62% of non-information resources are unlabeled) or problematic

Findings relevant for linked data publishers & consumers

Relevant for front-end tools (linked data browsers, semantic web search engines)

Basil Ell – Labels in the Web of Data

Page 3: Labels in the web of data

Outline

Motivation

Related Work

Challenges

Labeling properties

Metrics

Results

Guidelines

Conclusions

Page 4: Labels in the web of data

MOTIVATION

Page 5: Labels in the web of data

Motivation

Where labels are necessary

Displaying labels instead of URIs to end-users

Searching over the Web of Data

Document annotation

Page 6: Labels in the web of data

Motivation scenario: linked data browsing

[SIGMA]

Is this meaningful to users?

Page 7: Labels in the web of data

RELATED WORK

Page 8: Labels in the web of data

Related work (1/3)

Linked data browsers – how they deal with the problem of missing labels

Display URI

Display last part of URI

Let user select labeling properties

Linked data summarization & verbalization
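The browser fallback strategies listed above can be combined into a single display function. This is a minimal sketch, not the behavior of any particular browser: it assumes the known labels of an entity arrive as a dict from labeling-property URI to label string, and the names `display_name` and `last_uri_part` are hypothetical. User-selected properties are tried first, then rdfs:label, then the last part of the URI, and finally the full URI.

```python
from typing import Dict, List, Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def last_uri_part(uri: str) -> str:
    """Return the fragment after '#', else the last non-empty path segment."""
    if "#" in uri:
        fragment = uri.rsplit("#", 1)[1]
        if fragment:
            return fragment
    segments = [s for s in uri.rstrip("/").split("/") if s]
    # Fall back to the full URI if no segment can be extracted.
    return segments[-1] if segments else uri


def display_name(uri: str,
                 labels: Dict[str, str],
                 selected_props: Optional[List[str]] = None) -> str:
    """Pick a human-readable name for `uri` (hypothetical helper).

    `labels` maps a labeling property URI to a label of this entity.
    Properties chosen by the user are tried first, then rdfs:label;
    without any label, the last part of the URI is shown.
    """
    for prop in (selected_props or []) + [RDFS_LABEL]:
        if prop in labels:
            return labels[prop]
    return last_uri_part(uri)
```

For an unlabeled `http://example.org/resource/Bonn` this shows `Bonn`; the fallback only works well when URIs happen to be meaningful, which is exactly the limitation the slides point out.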

Page 9: Labels in the web of data

Related work (2/3)

Semantic search engines such as Falcons, Sindice, MicroSearch, Watson, SWSE, and Swoogle provide keyword-based search

They rely on the existence of labeled nodes or on meaningful URIs

Page 10: Labels in the web of data

Related work (3/3)

[Azlinayati et al.] analyzed identifiers and labels in 219 ontologies

Their study covers terminological data, but the Web of Data mainly consists of instance data

Page 11: Labels in the web of data

CHALLENGES

Page 12: Labels in the web of data

Challenges

Multitude of labeling properties

Missing labels

Label selection & ambiguity

Multilinguality

Page 13: Labels in the web of data

LABELING PROPERTIES

Page 14: Labels in the web of data

Labeling properties

BTC 2010 data

36 labeling properties identified in 3,167,799,445 triples, including:

http://www.w3.org/2000/01/rdf-schema#label

http://xmlns.com/foaf/0.1/nick

http://purl.org/dc/elements/1.1/title

http://purl.org/rss/1.0/title

http://xmlns.com/foaf/0.1/name

http://purl.org/dc/terms/title

http://www.geonames.org/ontology#name

http://xmlns.com/foaf/0.1/nickname

http://swrc.ontoware.org/ontology#name

http://sw.cyc.com/CycAnnotations_v1#label

http://rdf.opiumfield.com/lastfm/spec#title

http://www.proteinontology.info/po.owl#ResidueName

http://www.proteinontology.info/po.owl#Atom

http://www.proteinontology.info/po.owl#Element

http://www.proteinontology.info/po.owl#AtomName

http://www.proteinontology.info/po.owl#ChainName

http://purl.uniprot.org/core/fullName

http://purl.uniprot.org/core/title
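Given a set of labeling properties such as the ones above, measuring how often each one is used is a simple tally over the triples. The sketch below is a hypothetical illustration, not the authors' measurement pipeline: it assumes the triples are already parsed into (subject, predicate, object) string tuples and uses only a subset of the 36 properties.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# A subset of the labeling properties listed on this slide.
LABELING_PROPERTIES: Set[str] = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/nick",
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/terms/title",
    "http://www.geonames.org/ontology#name",
}


def count_label_triples(triples: Iterable[Triple],
                        properties: Set[str] = LABELING_PROPERTIES) -> Counter:
    """Count the triples whose predicate is one of the labeling properties."""
    return Counter(p for _, p, _ in triples if p in properties)
```

Triples with other predicates are ignored, so the resulting counter only contains labeling properties that actually occur in the data.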

Page 15: Labels in the web of data

METRICS

Page 16: Labels in the web of data

Metrics

1. Completeness

2. Efficient accessibility

3. Unambiguity

4. Multilinguality

Page 17: Labels in the web of data

Completeness

All non-information resources should have labels

Labeling completeness metric LC

Ratio of regarded entities with at least one label

Notation:

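$LC_{L}(E_D)$, where $L$ is the set of labeling properties, $D$ the dataset, and $E_D$ the regarded entities of $D$:

$$LC_{L}(E_D) = \frac{|\{\, e \in E_D \mid \exists p \in L,\ \exists o : (e, p, o) \in D \,\}|}{|E_D|}$$

LC(D) should be 1

Examples ($rdfs$: only rdfs:label; $lp$: all labeling properties; $NIR_D$: the non-information resources of $D$):

1) $LC_{rdfs}(NIR_D)$

2) $LC_{lp}(NIR_D)$

3) $LC_{rdfs}(NIR_D)$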
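The labeling completeness ratio can be computed directly from its definition. This is a minimal sketch, assuming the dataset is a list of (subject, predicate, object) string tuples with compact `prefix:name` URIs, and that the caller supplies the regarded entities (e.g. the non-information resources of D); deciding which URIs are non-information resources is outside this sketch, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def labeling_completeness(dataset: Iterable[Triple],
                          labeling_props: Set[str],
                          regarded: Set[str]) -> float:
    """LC: share of regarded entities with at least one label in the dataset."""
    if not regarded:
        raise ValueError("the set of regarded entities must not be empty")
    # Subjects that carry at least one triple with a labeling property.
    labeled = {s for s, p, _ in dataset if p in labeling_props}
    return len(regarded & labeled) / len(regarded)
```

Passing `{"rdfs:label"}` as `labeling_props` corresponds to the $rdfs$ variant; passing the full set of labeling properties corresponds to the $lp$ variant, which can only be equal or higher.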

Page 18: Labels in the web of data

Efficient accessibility (1/2)

URIs without labels can be dereferenced to look up their labels, but each lookup costs an extra HTTP request

Example

ex:Bonn ex:location ex:Germany .

ex:Bonn rdfs:label "Bonn" .

Need to dereference ex:location and ex:Germany before displaying the first triple

LE: ratio of all URIs mentioned in a graph that have at least one label in that graph
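For the Bonn example, LE can be computed by collecting every URI mentioned in the graph and checking whether the graph itself labels it. This is a minimal sketch under stated assumptions: compact `prefix:name` strings stand in for URIs, literals are wrapped in a small `Literal` class, labeling properties themselves are not counted as mentioned URIs, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain literal value, kept distinct from URI strings."""
    value: str


Term = Union[str, Literal]        # a URI ('prefix:name') or a literal
Triple = Tuple[str, str, Term]


def efficient_accessibility(graph: List[Triple], labeling_props: Set[str]) -> float:
    """LE: share of URIs mentioned in the graph that the graph itself labels.

    Assumption: labeling properties are not counted as mentioned URIs.
    """
    mentioned: Set[str] = set()
    labeled: Set[str] = set()
    for s, p, o in graph:
        mentioned.add(s)
        if p in labeling_props:
            labeled.add(s)
        else:
            mentioned.add(p)
        if isinstance(o, str):    # URI in object position
            mentioned.add(o)
    return len(mentioned & labeled) / len(mentioned) if mentioned else 1.0
```

On the two Bonn triples, the mentioned URIs are ex:Bonn, ex:location, and ex:Germany, of which only ex:Bonn is labeled, so LE is 1/3.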

Page 19: Labels in the web of data

Efficient accessibility (2/2)

Metric parameter: a set of entities with known labels (FOAF, GoodRelations, ...)

Example

ex:Basil foaf:img ex:basil.jpg .

ex:Basil rdfs:label "Basil" .

LE should be 1

$LE_{rdfs}(D) = 0.5$ and $LE_{rdfs}^{foaf}(D) = 1$
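The known-labels parameter can be added to the LE computation as a set of namespace prefixes whose terms count as labeled even when the graph does not label them. This is a minimal sketch, assuming compact `prefix:name` strings; which URIs the original metric counts is an assumption here: labeling properties are skipped, objects of non-labeling triples are treated as URIs, and the image ex:basil.jpg is passed as an ignored information resource, which reproduces the slide's values of 0.5 and 1.

```python
from typing import AbstractSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def efficient_accessibility(graph: Iterable[Triple],
                            labeling_props: AbstractSet[str],
                            known_prefixes: AbstractSet[str] = frozenset(),
                            ignored: AbstractSet[str] = frozenset()) -> float:
    """LE parameterized by vocabularies whose terms have known labels.

    URIs starting with one of `known_prefixes` count as labeled even when the
    graph does not label them. URIs in `ignored` and the labeling properties
    are not counted. Objects of labeling triples are treated as literals.
    """
    mentioned: Set[str] = set()
    labeled: Set[str] = set()
    for s, p, o in graph:
        if p in labeling_props:
            labeled.add(s)
            mentioned.add(s)
            continue
        mentioned.update((s, p, o))
    mentioned -= set(ignored)
    if not mentioned:
        return 1.0
    ok = {u for u in mentioned
          if u in labeled or any(u.startswith(k) for k in known_prefixes)}
    return len(ok) / len(mentioned)
```

Without known vocabularies, foaf:img counts as unlabeled and LE is 0.5; passing `known_prefixes={"foaf:"}` makes it 1.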

Page 20: Labels in the web of data

Unambiguity

An entity can have multiple labels (e.g. synonyms) that are not differentiated (e.g. by language)

$LU_f$ is the ratio of all entities that have exactly one preferred label according to a selection procedure $f$

Example

ex:loc rdfs:label "place" .

ex:loc rdfs:label "location" .

$LU_f$ should be 1
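The selection procedure $f$ can be modeled as a function that receives all (property, label) pairs of one entity and returns the labels it considers preferred. This is a minimal sketch, assuming the ratio is taken over entities that have at least one label (unlabeled entities are already covered by LC); both selection procedures below are hypothetical examples, not ones prescribed by the slides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
# A selection procedure f: (property, label) pairs of one entity -> preferred labels.
Selector = Callable[[List[Tuple[str, str]]], List[str]]


def unambiguity(dataset: Iterable[Triple], labeling_props: Set[str],
                select: Selector) -> float:
    """LU_f: share of labeled entities for which `select` yields exactly one label."""
    labels: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in dataset:
        if p in labeling_props:
            labels[s].append((p, o))
    if not labels:
        return 1.0
    unambiguous = sum(1 for pairs in labels.values() if len(select(pairs)) == 1)
    return unambiguous / len(labels)


def all_distinct_labels(pairs: List[Tuple[str, str]]) -> List[str]:
    """Hypothetical f: every distinct label value counts as preferred."""
    return sorted({label for _, label in pairs})


def prefer_property(prop: str) -> Selector:
    """Hypothetical f: use labels of `prop` if present, else all labels."""
    def select(pairs: List[Tuple[str, str]]) -> List[str]:
        chosen = sorted({label for p, label in pairs if p == prop})
        return chosen or all_distinct_labels(pairs)
    return select
```

With `all_distinct_labels`, ex:loc from the example is ambiguous ("place" and "location"); a procedure that prefers one dedicated property, such as skos:prefLabel, resolves it if the data provides that property.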

Page 21: Labels in the web of data

Multilinguality

Language tags are used on literals to state their natural language

Labels can then be displayed according to the user's language preferences

Example

"Bonn"@en or "본"@ko or "Bonna"@la

$LLN$: number of label languages

$LLC_{lang}$: labeling completeness for language $lang$
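Both multilinguality measures can be computed from language-tagged labels. This is a minimal sketch, assuming each label is given as an (entity, label text, language tag or None) tuple and that $LLC_{lang}$ mirrors LC restricted to labels tagged with `lang`; the function names are hypothetical. Language tags are compared case-insensitively, as BCP 47 tags are.

```python
from typing import Iterable, Optional, Set, Tuple

# (entity, label text, language tag or None), e.g. ("ex:Bonn", "Bonn", "en")
LabelTriple = Tuple[str, str, Optional[str]]


def label_languages(labels: Iterable[LabelTriple]) -> Set[str]:
    """Distinct language tags used on labels, normalized to lower case."""
    return {lang.lower() for _, _, lang in labels if lang}


def lln(labels: Iterable[LabelTriple]) -> int:
    """LLN: number of label languages."""
    return len(label_languages(labels))


def llc(labels: Iterable[LabelTriple], entities: Set[str], lang: str) -> float:
    """LLC_lang: share of `entities` with at least one label tagged `lang`."""
    if not entities:
        raise ValueError("entities must not be empty")
    tagged = {e for e, _, t in labels if t and t.lower() == lang.lower()}
    return len(entities & tagged) / len(entities)
```

For ex:Bonn labeled in en, ko, and la plus ex:Germany labeled only in en, LLN is 3, $LLC_{en}$ is 1.0, and $LLC_{la}$ is 0.5.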

Page 22: Labels in the web of data

RESULTS: Measurements on the BTC 2010 corpus

Page 23: Labels in the web of data

Results: Completeness

Non-information resources: labeled 38%, unlabeled 62%

Page 24: Labels in the web of data

Results: Efficient accessibility

[Chart: $LE^{top}(BTC)$, where top denotes the 10 most frequently occurring vocabulary namespaces]

741 data sets, about 5 data sets per PLD (pay-level domain)

Page 25: Labels in the web of data

Results: Unambiguity

Unambiguous 98%

Ambiguous 2%

Unambiguity rate of 0.98

Page 26: Labels in the web of data

Results: Multilinguality

Most frequent label languages (language tag, percentage):
en 44.72%
de 5.22%
fr 5.11%
it 3.96%
pl 3.69%
ja 3.64%
nl 3.36%
es 3.34%
ru 3.28%
pt 2.98%
zh 2.71%
no 2.41%
sv 1.83%
fi 1.80%

4.78% of NIR labels have a language tag

2.2% of data sources contained labels in one language

0.7% of data sources contained labels in several languages

Page 27: Labels in the web of data

GUIDELINES

Page 28: Labels in the web of data

Guidelines

Provide labels for all URIs mentioned in a given RDF graph

Provide a complete set of labels in all supported languages

Declare your labeling properties as subproperties of rdfs:label

Do not provide more than one preferred label for each URI
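The subproperty guideline pays off on the consumer side: an application that only knows rdfs:label can still find labels stated with custom properties by following rdfs:subPropertyOf, as RDFS entailment would. This is a minimal sketch without an RDFS reasoner, assuming compact `prefix:name` strings; `ex:title` and `ex:shortTitle` are hypothetical properties.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDFS_LABEL = "rdfs:label"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def label_properties(triples: List[Triple]) -> Set[str]:
    """rdfs:label plus every property declared, directly or transitively,
    as an rdfs:subPropertyOf it."""
    parents: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == SUBPROPERTY_OF:
            parents.setdefault(s, set()).add(o)
    result = {RDFS_LABEL}
    changed = True
    while changed:  # iterate to a fixpoint over the subproperty hierarchy
        changed = False
        for prop, sups in parents.items():
            if prop not in result and sups & result:
                result.add(prop)
                changed = True
    return result


def labels_of(entity: str, triples: List[Triple]) -> List[str]:
    """All label values of `entity` via rdfs:label or any of its subproperties."""
    props = label_properties(triples)
    return [o for s, p, o in triples if s == entity and p in props]
```

Without the two rdfs:subPropertyOf declarations, a consumer looking only for rdfs:label would find no label for ex:doc1 at all.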

Page 29: Labels in the web of data

Conclusions

Defined four parameterizable metrics

Suggested guidelines for labeling

Many problems are due to the LOD principles

Solution: serving data via SPARQL?

Applications can specify exactly what they need

Labeling is essential for the Web of Data to become widely used

Page 30: Labels in the web of data

QUESTIONS? Thank you for your attention

Page 31: Labels in the web of data

Conclusions

Summary charts:

Unambiguity: unambiguous 98%, ambiguous 2%

Completeness (NIR): labeled 38%, unlabeled 62%

Multilinguality (top 10 label languages): en 44.72%, de 5.22%, fr 5.11%, it 3.96%, pl 3.69%, ja 3.64%, nl 3.36%, es 3.34%, ru 3.28%, pt 2.98%

Efficient accessibility [chart]