The RDF Report Card: Beyond the Triple Count

Posted on 21-Aug-2015


Leigh Dodds (@ldodds)

http://kasabi.com
http://slideshare.net/ldodds

The RDF Report Card

Beyond the Triple Count

26th September 2011

SemTechBiz 2011

Triple counts tell us nothing

Triple counts are not a quality indicator

http://dbpedia.org/resource/London

6 triples for Population Density

Property Count Value

http://dbpedia.org/ontology/PopulatedPlace/populationDensity 2 4807.0, 4806.971873853451

http://dbpedia.org/ontology/populationDensity 2 4806.971874, 4807.000000

http://dbpedia.org/property/populationDensityKm 1 4807

http://dbpedia.org/property/populationDensitySqMi 1 12450
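To make the redundancy concrete, here is a minimal sketch that normalises the six values from the table to a single unit and counts how many distinct facts remain. The unit labels are an assumption (the slide does not state them; the values are taken to be per square kilometre unless the property name says otherwise), and `per_sq_km` is a hypothetical helper, not part of any DBpedia tooling.

```python
# Values copied from the table above. Units are ASSUMED: per km2 unless the
# property name indicates square miles.
SQ_KM_PER_SQ_MI = 2.589988  # one square mile expressed in square kilometres

density_triples = [
    ("http://dbpedia.org/ontology/PopulatedPlace/populationDensity", 4807.0, "km2"),
    ("http://dbpedia.org/ontology/PopulatedPlace/populationDensity", 4806.971873853451, "km2"),
    ("http://dbpedia.org/ontology/populationDensity", 4806.971874, "km2"),
    ("http://dbpedia.org/ontology/populationDensity", 4807.000000, "km2"),
    ("http://dbpedia.org/property/populationDensityKm", 4807.0, "km2"),
    ("http://dbpedia.org/property/populationDensitySqMi", 12450.0, "mi2"),
]


def per_sq_km(value, unit):
    """Convert a density to people per square kilometre (hypothetical helper)."""
    return value / SQ_KM_PER_SQ_MI if unit == "mi2" else value


# Round to whole people per km2 so that formatting differences disappear.
distinct_facts = {round(per_sq_km(value, unit)) for _, value, unit in density_triples}
redundant = len(density_triples) - len(distinct_facts)

print(f"{len(density_triples)} triples, {len(distinct_facts)} distinct fact(s), "
      f"{redundant} redundant")
```

Under these assumptions all six triples collapse to the same figure, which is the point of the slide: the count of triples overstates the amount of information.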

12 triples for Location (1)

Property Count Value

georss:point 1 51.507222222222225 -0.1275

geo:geometry 1 POINT(-0.1275 51.5072)

geo:lat 1 51.507221

geo:long 1 -0.127500

12 triples for Location (2)

Property Count Value

dbpprop:latd 1 51

dbpprop:latm 1 30

dbpprop:lats 1 26

dbpprop:latns 1 N

dbpprop:longd 1 0

dbpprop:longm 1 7

dbpprop:longs 1 39

dbpprop:longew 1 W
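The two location tables describe the same point. As a rough check, the degrees/minutes/seconds values can be converted to decimal degrees and compared with the geo:lat and geo:long values. The function name `dms_to_decimal` is illustrative only.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west are negative, following the usual WGS84 convention.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# dbpprop:latd / latm / lats / latns and longd / longm / longs / longew
lat = dms_to_decimal(51, 30, 26, "N")
lon = dms_to_decimal(0, 7, 39, "W")

# geo:lat and geo:long from the first location table
GEO_LAT, GEO_LONG = 51.507221, -0.127500

print(abs(lat - GEO_LAT) < 1e-4, abs(lon - GEO_LONG) < 1e-4)
```

Both comparisons agree to well within the precision of the published values, so the eight dbpprop triples add no information beyond the geo:lat and geo:long pair.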

~4.6m redundant triples

Triple counts don't indicate utility

http://bbc.co.uk/programmes

2.5 million unique users per week, 60 req/s*

*http://www.guardian.co.uk/media/pda/2011/apr/06/bbc-yves-raimond


Dataset is less than 50 million triples

Beyond the Triple Count

Dataset Information Spectrum

Low Detail → High Detail

Summary and overview of dataset content

Detailed data model documentation & guides


More Information

Metadata

● Title, Description
● Provenance
● Publication dates
● Licensing
● Usage cues
● Related datasets

Scope

● What types of entity?
● How many of each type?
● Coverage
    ● Geographic
    ● Events (time)

Structure

● URI Scheme
● Vocabulary meshing
● How is a person described?

Internals

● List of Schemas & RDF terms
● Class/property usage counts
● Triple counts
● Named graph structure
● Source files
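Class and property usage counts are straightforward to derive once the triples are available. The sketch below assumes triples held as plain (subject, predicate, object) tuples; the sample data and the `usage_counts` function are made up for illustration, and a real dataset would more likely be queried through a SPARQL endpoint or an RDF library.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data, not taken from any dataset in the talk.
triples = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
    ("http://example.org/london", RDF_TYPE, "http://example.org/schema/Place"),
]


def usage_counts(triples):
    """Return (class usage counts, property usage counts) for a list of triples."""
    property_counts = Counter(p for _, p, _ in triples)
    # A class is "used" each time it appears as the object of rdf:type.
    class_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return class_counts, property_counts


class_counts, property_counts = usage_counts(triples)
for cls, count in class_counts.most_common():
    print(count, cls)
```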

RDF Report Card Example

Summarising Content of a Dataset

● Find all classes in all datasets in Kasabi

● Tag each class against a pre-defined set of categories
● Customized version of top-level schema.org classes

● Generate a report card for each dataset listing types of entity
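The steps above can be sketched roughly as follows. This is not the Kasabi implementation: the class-to-category mapping, the category names, and the counts are all hypothetical, chosen only to show the shape of the computation (class usage counts in, entities per category out).

```python
from collections import Counter

# HYPOTHETICAL mapping from RDF classes to report card categories, loosely
# modelled on top-level schema.org classes. The talk's actual category list
# is not given in the transcript.
CLASS_TO_CATEGORY = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://purl.org/ontology/mo/MusicArtist": "Person",
    "http://purl.org/ontology/mo/Record": "CreativeWork",
    "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing": "Place",
}


def report_card(class_counts, class_to_category=CLASS_TO_CATEGORY):
    """Roll per-class instance counts up into per-category totals.

    Classes without a tag fall into "Other". An entity typed with two classes
    in the same category is counted twice in this simple version.
    """
    card = Counter()
    for cls, count in class_counts.items():
        card[class_to_category.get(cls, "Other")] += count
    return dict(card)


# Made-up class usage counts for a music-style dataset.
example_counts = {
    "http://xmlns.com/foaf/0.1/Person": 5,
    "http://purl.org/ontology/mo/MusicArtist": 10,
    "http://purl.org/ontology/mo/Record": 20,
    "http://example.org/schema/Unmapped": 3,
}

card = report_card(example_counts)
for category, total in sorted(card.items()):
    print(f"{category}: {total}")
```

Keeping the mapping as data rather than code means the category set can be adjusted per platform without touching the aggregation step.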

Report Card Categories

Ordnance Survey

http://beta.kasabi.com/dataset/ordnance-survey-linked-data

BBC Music

http://beta.kasabi.com/dataset/bbc-music

British National Bibliography

http://beta.kasabi.com/dataset/british-national-bibliography-bnb

NHS Performance Data

http://beta.kasabi.com/dataset/nhs-performance-data

Summary

● Triple counts tell us nothing
● Vital to present the quality & utility of our data
● Data publishing platforms should support this
● "Progressive disclosure"
● Right detail at the right time
● Dataset analysis can generate useful summaries
● e.g. an RDF report card
