Labels in the Web of Data
DESCRIPTION
This presentation was given at the 10th International Semantic Web Conference, Bonn, and is related to the publication of the same title. Abstract of the publication: Entities on the Web of Data need to have labels in order to be exposable to humans in a meaningful way. These labels can then be used for exploring the data, i.e., for displaying the entities in a linked data browser or other front-end applications, but also to support keyword-based or natural-language based search over the Web of Data. Far too many applications fall back to exposing the URIs of the entities to the user in the absence of more easily understandable representations such as human-readable labels. In this work we introduce a number of label-related metrics: completeness of the labeling, the efficient accessibility of the labels, unambiguity of labeling, and the multilinguality of the labeling. We report our findings from measuring the Web of Data using these metrics. We also investigate which properties are used for labeling purposes, since many vocabularies define further labeling properties beyond the standard property from RDFS. The publication is available at http://www.aifb.kit.edu/images/c/c0/LabelsInTheWebOfData.pdf
TRANSCRIPT
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
INSTITUTE FOR APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS
www.kit.edu
Labels in the Web of Data
Basil Ell, Denny Vrandečić, and Elena Simperl
10th International Semantic Web Conference, Bonn
26 October 2011
KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods
2 31.03.2014
Main message in a nutshell
Labels are necessary but often missing (62% of non-information resources) or problematic
Findings are relevant for linked data publishers & consumers
Relevant for front-end tools (linked data browsers, semantic web search engines)
Basil Ell – Labels in the Web of Data
Outline
Motivation
Related Work
Challenges
Labeling properties
Metrics
Results
Guidelines
Conclusions
MOTIVATION
Motivation
Where labels are necessary
Displaying labels instead of URIs to end-users
Searching over the Web of Data
Document annotation
Motivation scenario: linked data browsing
[SIGMA]
Is this meaningful to users?
RELATED WORK
Related work (1/3)
Linked data browsers: how they deal with the problem of missing labels
Display URI
Display last part of URI
Let user select labeling properties
Linked data summarization & verbalization
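The three browser fallbacks listed above can be chained into one display function. A minimal sketch, assuming the known labels are already collected in a dict keyed by entity URI; the names `display_name` and `last_uri_part` and the fallback order are illustrative, not taken from any particular browser.

```python
from urllib.parse import urlsplit


def last_uri_part(uri: str) -> str:
    """Return the fragment or the last path segment of a URI ("last part of URI").
    If neither exists, fall back to the full URI."""
    parts = urlsplit(uri)
    if parts.fragment:
        return parts.fragment
    segments = [s for s in parts.path.split("/") if s]
    return segments[-1] if segments else uri


def display_name(uri: str, labels: dict[str, str]) -> str:
    """Prefer a real label; otherwise show the last part of the URI
    (which itself falls back to the full URI)."""
    if uri in labels:
        return labels[uri]
    return last_uri_part(uri)
```

For `http://www.w3.org/2000/01/rdf-schema#label` this yields `label`, which reads well. For opaque identifiers such as numeric IDs, the last URI part is no more meaningful than the URI itself.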
Related work (2/3)
Semantic search engines such as Falcons, Sindice, MicroSearch, Watson, SWSE, and Swoogle provide keyword-based search
They rely on the existence of labeled nodes or on meaningful URIs
Related work (3/3)
[Azlinayati et al.] analyzed identifiers and labels in 219 ontologies
Terminological data
The Web of Data, however, mainly consists of instance data
CHALLENGES
Challenges
Multitude of labeling properties
Missing labels
Label selection & ambiguity
Multilinguality
LABELING PROPERTIES
Labeling properties
BTC 2010 data
36 labeling properties identified in 3,167,799,445 N-Triples statements.
http://www.w3.org/2000/01/rdf-schema#label
http://xmlns.com/foaf/0.1/nick
http://purl.org/dc/elements/1.1/title
http://purl.org/rss/1.0/title
http://xmlns.com/foaf/0.1/name
http://purl.org/dc/terms/title
http://www.geonames.org/ontology#name
http://xmlns.com/foaf/0.1/nickname
http://swrc.ontoware.org/ontology#name
http://sw.cyc.com/CycAnnotations_v1#label
http://rdf.opiumfield.com/lastfm/spec#title
http://www.proteinontology.info/po.owl#ResidueName
http://www.proteinontology.info/po.owl#Atom
http://www.proteinontology.info/po.owl#Element
http://www.proteinontology.info/po.owl#AtomName
http://www.proteinontology.info/po.owl#ChainName
http://purl.uniprot.org/core/fullName
http://purl.uniprot.org/core/title
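Measuring how often each labeling property is used takes a single pass over an N-Triples file. A minimal sketch, assuming one well-formed triple per line and using only six of the properties listed above (not the full set of 36); the regular expression extracts just the predicate and is not a full N-Triples parser.

```python
import re
from collections import Counter
from typing import Iterable

# A subset of the labeling properties listed above (not the full set of 36).
LABELING_PROPERTIES = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/nick",
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/terms/title",
    "http://www.geonames.org/ontology#name",
}

# Subject (IRI or blank node) followed by the predicate IRI, captured as group 1.
TRIPLE_HEAD = re.compile(r"^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>")


def count_labeling_properties(lines: Iterable[str]) -> Counter:
    """Count triples per labeling property in N-Triples lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_HEAD.match(line)
        if match and match.group(1) in LABELING_PROPERTIES:
            counts[match.group(1)] += 1
    return counts
```

Because the input is consumed line by line, the same function works on an open file handle, which matters at the scale of billions of triples.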
METRICS
Metrics
1. Completeness
2. Efficient accessibility
3. Unambiguity
4. Multilinguality
Completeness
All non-information resources should have labels
Labeling completeness metric LC
Ratio of regarded entities with at least one label
Notation: $LC_{P}^{E}(D)$ with labeling properties $P$, regarded entities $E$, and dataset $D$
LC(D) should be 1
1) $LC_{rdfs}^{NIR}(D)$
2) $LC_{lp}^{NIR}(D)$
3) $LC_{rdfs}^{NIR}(D)$
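Following the definition above (ratio of regarded entities with at least one label), $LC_{P}^{E}(D)$ can be computed directly from the triples. A minimal sketch, assuming triples are plain `(subject, predicate, object)` string tuples and that the caller supplies the regarded entities $E$ (e.g. the non-information resources) and the labeling properties $P$; the `ex:` names are hypothetical.

```python
from typing import AbstractSet, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def labeling_completeness(
    triples: Iterable[Triple],
    regarded: AbstractSet[str],
    labeling_properties: AbstractSet[str] = frozenset({"rdfs:label"}),
) -> float:
    """LC_P^E(D): share of the regarded entities E that have at least one
    label in D through one of the labeling properties P."""
    if not regarded:
        raise ValueError("the set of regarded entities is empty")
    labeled = {s for s, p, _ in triples if p in labeling_properties and s in regarded}
    return len(labeled) / len(regarded)
```

Widening $P$ beyond `rdfs:label` can only raise the value, which is presumably the kind of difference the $LC_{rdfs}$ and $LC_{lp}$ variants capture, assuming $lp$ denotes the wider set of labeling properties.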
Efficient accessibility (1/2)
Labels for URIs without an in-graph label can be fetched by dereferencing those URIs, at the cost of extra requests
Example
ex:Bonn ex:location ex:Germany .
ex:Bonn rdfs:label "Bonn" .
ex:location and ex:Germany must be dereferenced before the first triple can be displayed
LE: ratio of all mentioned URIs with at least one label
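In the Bonn example, the URIs a front end has to dereference are exactly those mentioned in the graph without an in-graph label. A minimal sketch, assuming string triples in which literals start with a double quote, and treating `rdfs:label` itself as exempt because it is the labeling property rather than displayed content; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
LABELING_PROPERTIES = {"rdfs:label"}


def is_literal(term: str) -> bool:
    # Assumption: literals are written with a leading double quote.
    return term.startswith('"')


def uris_to_dereference(triples: Iterable[Triple]) -> List[str]:
    """Return, in order of first mention, the URIs without a label in the
    graph itself; these must be dereferenced before display."""
    triples = list(triples)
    labeled = {s for s, p, _ in triples if p in LABELING_PROPERTIES}
    pending: List[str] = []
    for s, p, o in triples:
        if p in LABELING_PROPERTIES:
            continue  # the label triple itself needs no further lookup
        for term in (s, p, o):
            if not is_literal(term) and term not in labeled and term not in pending:
                pending.append(term)
    return pending
```

Each returned URI costs one HTTP round trip before the first triple can be rendered, which is what the accessibility metric penalizes.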
Efficient accessibility (2/2)
Metric parameter: a set of entities whose labels are already known (FOAF, GoodRelations, ...)
Example
ex:Basil foaf:img ex:basil.jpg .
ex:Basil rdfs:label "Basil" .
$LE_{rdfs}(D) = 0.5$, $LE_{rdfs}^{foaf}(D) = 1$
LE should be 1
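The parameterized metric can be sketched as follows. All of these are assumptions, not the paper's exact counting rules: terms are prefixed names (CURIEs) such as `foaf:img`, URIs whose prefix is in `known_vocabularies` count as labeled, labeling properties are not counted as mentioned URIs, and the caller may exclude URIs that need no label, such as the image `ex:basil.jpg` (an information resource). Under these assumptions the sketch reproduces the 0.5 and 1 of the example.

```python
from typing import AbstractSet, Iterable, Tuple

Triple = Tuple[str, str, str]


def labeling_accessibility(
    triples: Iterable[Triple],
    labeling_properties: AbstractSet[str] = frozenset({"rdfs:label"}),
    known_vocabularies: AbstractSet[str] = frozenset(),
    excluded: AbstractSet[str] = frozenset(),
) -> float:
    """Share of mentioned URIs that have a label in the same graph or
    belong to a vocabulary whose labels are already known."""
    triples = list(triples)
    labeled = {s for s, p, _ in triples if p in labeling_properties}
    mentioned = set()
    for s, p, o in triples:
        for term in (s, p, o):
            if term.startswith('"') or term in labeling_properties or term in excluded:
                continue  # skip literals, labeling properties, and excluded URIs
            mentioned.add(term)
    if not mentioned:
        raise ValueError("no URIs to evaluate")

    def has_label(uri: str) -> bool:
        # Assumption: CURIE input, so the vocabulary prefix is the part before ':'.
        prefix = uri.split(":", 1)[0]
        return uri in labeled or prefix in known_vocabularies

    return sum(1 for u in mentioned if has_label(u)) / len(mentioned)
```

Passing `known_vocabularies={"foaf"}` lets an application that ships FOAF labels stop penalizing `foaf:img`, which raises the value from 0.5 to 1.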
Unambiguity
An entity can have multiple labels (e.g. synonyms) that are not differentiated (e.g. by language)
$LU_f$ is the ratio of all entities that have exactly one preferred label according to a selection procedure $f$
Example
ex:loc rdfs:label "place" .
ex:loc rdfs:label "location" .
$LU_f$ should be 1
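A selection procedure $f$ maps an entity's labels to its preferred-label candidates, and the entity counts as unambiguous when exactly one candidate remains. A minimal sketch, assuming labels are `(text, language tag)` pairs and using a hypothetical $f$ that keeps labels in the user's language and falls back to untagged labels. The denominator is simply the set of entities passed in.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Label = Tuple[str, Optional[str]]  # (text, language tag or None)
Selection = Callable[[List[Label]], Set[str]]


def prefer_language(lang: str) -> Selection:
    """Hypothetical f: labels tagged with `lang`, else the untagged labels."""
    def select(labels: List[Label]) -> Set[str]:
        tagged = {text for text, tag in labels if tag == lang}
        return tagged or {text for text, tag in labels if tag is None}
    return select


def labeling_unambiguity(labels_by_entity: Dict[str, List[Label]], f: Selection) -> float:
    """LU_f: share of the given entities with exactly one preferred label under f."""
    if not labels_by_entity:
        raise ValueError("no entities to evaluate")
    unambiguous = sum(1 for labels in labels_by_entity.values() if len(f(labels)) == 1)
    return unambiguous / len(labels_by_entity)
```

With the two untagged labels of `ex:loc`, this $f$ cannot decide between "place" and "location", so `ex:loc` counts as ambiguous.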
Multilinguality
Language tags on literals state their natural language
Labels can then be displayed according to the user's language preferences
Example
"Bonn"@en or "본"@ko or "Bonna"@la
LLN: number of label languages
$LLC_{lang}$: labeling completeness for language lang
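Both multilinguality measures can be read off the language tags of the labels. A minimal sketch, assuming labels are given as `(entity, text, language tag)` tuples, and reading $LLC_{lang}$ by analogy with LC as the share of the given entities with at least one label tagged `lang`. The function names are hypothetical.

```python
from typing import AbstractSet, Iterable, Optional, Tuple

LabelTriple = Tuple[str, str, Optional[str]]  # (entity, label text, language tag)


def label_language_number(labels: Iterable[LabelTriple]) -> int:
    """LLN: number of distinct language tags used on labels.
    Tags are compared case-insensitively, as language tags are."""
    return len({lang.lower() for _, _, lang in labels if lang})


def language_completeness(
    labels: Iterable[LabelTriple], entities: AbstractSet[str], lang: str
) -> float:
    """LLC_lang: share of the given entities with at least one label tagged `lang`."""
    if not entities:
        raise ValueError("no entities to evaluate")
    covered = {
        e for e, _, tag in labels
        if tag and tag.lower() == lang.lower() and e in entities
    }
    return len(covered) / len(entities)
```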
RESULTS
Measurements on the BTC 2010 corpus
Results: Completeness
Non-information resources: 38% labeled, 62% unlabeled
Results: Efficient accessibility
[Chart: $LE^{top}(BTC)$ per data set, where top denotes the 10 most frequently occurring vocabulary namespaces; about 5 data sets per pay-level domain (PLD), 741 data sets]
Results: Unambiguity
Unambiguous 98%, ambiguous 2%
Unambiguity rate of 0.98
Results: Multilinguality
Label languages (top 14):
en: 44.72%
de: 5.22%
fr: 5.11%
it: 3.96%
pl: 3.69%
ja: 3.64%
nl: 3.36%
es: 3.34%
ru: 3.28%
pt: 2.98%
zh: 2.71%
no: 2.41%
sv: 1.83%
fi: 1.80%
4.78% of NIR labels have a language tag
2.2% of data sources contained labels in one language
0.7% of data sources contained labels in several languages
GUIDELINES
Guidelines
Provide labels for all URIs mentioned in a given RDF graph
Provide a complete set of labels in all supported languages
Define your labeling properties as subproperties of rdfs:label
Do not provide more than one preferred label per URI
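A publisher could check a graph against these guidelines before serving it. A minimal sketch under several assumptions: triples are string tuples of prefixed names, literals are encoded as `"text"@lang` strings, every property declared (directly or transitively) as `rdfs:subPropertyOf rdfs:label` counts as a labeling property, and "one preferred label" is checked as at most one label per entity and language. The property `ex:shortName` and the report keys are hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SUBPROPERTY = "rdfs:subPropertyOf"
RDFS_LABEL = "rdfs:label"


def labeling_properties(triples: List[Triple]) -> Set[str]:
    """rdfs:label plus every property declared (transitively) as its subproperty."""
    props = {RDFS_LABEL}
    changed = True
    while changed:  # iterate to a fixpoint to follow subproperty chains
        changed = False
        for s, p, o in triples:
            if p == SUBPROPERTY and o in props and s not in props:
                props.add(s)
                changed = True
    return props


def split_literal(term: str) -> Tuple[str, Optional[str]]:
    """Split '"text"@lang' into (text, lang); untagged literals get None."""
    text, _, lang = term.rpartition('"')
    return text.lstrip('"'), (lang[1:] if lang.startswith("@") else None)


def check_guidelines(triples: Iterable[Triple], languages: AbstractSet[str]) -> Dict[str, List[str]]:
    triples = list(triples)
    props = labeling_properties(triples)
    labels: Dict[str, Dict[Optional[str], List[str]]] = defaultdict(lambda: defaultdict(list))
    mentioned: Set[str] = set()
    for s, p, o in triples:
        if p in props and o.startswith('"'):
            text, lang = split_literal(o)
            labels[s][lang].append(text)
        elif p != SUBPROPERTY:
            mentioned.update(t for t in (s, p, o) if not t.startswith('"'))
    report: Dict[str, List[str]] = {"unlabeled": [], "missing_language": [], "multiple_labels": []}
    for uri in sorted(mentioned):
        if uri not in labels:  # guideline 1: every mentioned URI has a label
            report["unlabeled"].append(uri)
    for uri, by_lang in sorted(labels.items()):
        if not set(languages) <= set(by_lang):  # guideline 2: all supported languages
            report["missing_language"].append(uri)
        if any(len(texts) > 1 for texts in by_lang.values()):  # guideline 4
            report["multiple_labels"].append(uri)
    return report
```

Because `ex:shortName` is declared a subproperty of `rdfs:label`, its value counts as the English label of `ex:Bonn`. A consumer that only understood `rdfs:label` would have found nothing but the Latin label, which is the point of the subproperty guideline.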
Conclusions
Defined four parameterizable metrics
Suggested guidelines for labeling
Many problems stem from the LOD principles
Solution: serving data via SPARQL?
Applications can specify exactly what they need
Labeling is essential for the Web of Data to become widely used
QUESTIONS? Thank you for your attention
Conclusions
Unambiguity: 98% unambiguous, 2% ambiguous
Completeness (NIR): 38% labeled, 62% unlabeled
Multilinguality (top 10): en 44.72%, de 5.22%, fr 5.11%, it 3.96%, pl 3.69%, ja 3.64%, nl 3.36%, es 3.34%, ru 3.28%, pt 2.98%
Efficient accessibility