Assessing Linked Data Mappings using Network Measures

Assessing Linked Data Mappings using Network Measures. Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann. 9th Extended Semantic Web Conference (ESWC), May 29, 2012. http://latc-project.eu http://www.vu.nl http://aksw.org

DESCRIPTION

When a large number of Web of Data (WoD) links are generated automatically, data quality becomes a pressing issue. This presentation, and the related paper, introduce LinkQA: a network-based, node-centric framework to analyse the impact of linkage on the network topology and to assess the quality of these links.

TRANSCRIPT

Page 1: Assessing Linked Data Mappings using Network Measures

ESWC - May 2012 Assessing Linked Data mappings 1/25

Assessing Linked Data Mappings using Network Measures

Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann

9th Extended Semantic Web Conference (ESWC), May 29, 2012

http://latc-project.eu http://www.vu.nl http://aksw.org

Page 2: Assessing Linked Data Mappings using Network Measures

The next 25+5 minutes

The impact of links in the Web of Data

Main questions

What is the impact of link creation?

Can we detect “bad” links based on their impact?

Is adding links always a good thing?

Contributions

A framework to assess the impact of links

Results for 5 metrics

Page 3: Assessing Linked Data Mappings using Network Measures

Is this a good or a bad link?

Page 4: Assessing Linked Data Mappings using Network Measures

Measuring the Web of Data

Look at the topology using network analysis tools

Impossible to get the complete graph

Sampling of the graph focusing on specific nodes

See the bigger picture through aggregation

Build the local network around a resource

Repeat the process a sufficient number of times

Page 5: Assessing Linked Data Mappings using Network Measures

Network sampling process

Use a SPARQL endpoint or dereference the resources to get their descriptions
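The sampling step can be sketched as a breadth-first expansion around one resource. This is a minimal illustration, not LinkQA's actual code: `build_local_network`, `fetch_description` and `max_depth` are hypothetical names, and the fetch callback stands in for whatever SPARQL query or HTTP dereferencing is used in practice.

```python
from collections import deque


def build_local_network(start, fetch_description, max_depth=2):
    """Collect the edges reachable from `start` within `max_depth` hops.

    `fetch_description(resource)` is a hypothetical callback returning an
    iterable of (predicate, object) pairs describing `resource`.
    """
    edges = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand nodes on the border of the sample
        for predicate, obj in fetch_description(node):
            edges.add((node, predicate, obj))
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return edges


# Toy in-memory data standing in for a real endpoint (assumption).
TOY_DATA = {
    "a": [("knows", "b"), ("sameAs", "c")],
    "b": [("knows", "d")],
    "d": [("knows", "e")],
}


def toy_fetch(resource):
    return TOY_DATA.get(resource, [])


local_edges = build_local_network("a", toy_fetch, max_depth=2)
```

Repeating this for many start resources gives the set of local networks that are later aggregated.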

Page 6: Assessing Linked Data Mappings using Network Measures

Aggregation of local results

[Figure: Observed vs. Target]

Page 7: Assessing Linked Data Mappings using Network Measures

Metrics

Compute local scores for a resource

Criteria

Use only the local network

Representative of a global property

Not sensitive to change of observation scale

5 metrics currently available in LinkQA

Page 8: Assessing Linked Data Mappings using Network Measures

What do we want to see?

Increase of connectivity within topical groups

Increase chances of finding related information

More bridges between topical groups

Improve browsing capabilities

More connectivity around hubs

Decrease the dependency upon the hubs

Page 9: Assessing Linked Data Mappings using Network Measures

Metric 1 – Degree

Metric

Number of edges around the target node

Target

Power-law distribution of values

Intuition

Presence of hubs
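As a rough sketch of the degree metric, assuming the local network is held as a collection of (subject, predicate, object) triples (a representation chosen here for illustration, with `degree` a hypothetical helper name):

```python
def degree(node, edges):
    """Number of edges touching `node`, counting both directions.

    `edges` is an iterable of (subject, predicate, object) triples.
    A self-loop is counted once.
    """
    return sum(1 for s, _p, o in edges if s == node or o == node)


sample = [
    ("a", "knows", "b"),
    ("c", "knows", "a"),
    ("b", "knows", "c"),
]
hub_degree = degree("a", sample)
```

Comparing the distribution of such values over many sampled nodes against a power law is a separate step not shown here.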

Page 10: Assessing Linked Data Mappings using Network Measures

Metric 2 – Clustering coefficient

Metric

Density of links around the target node

Target

Increase clustering around nodes

Intuition

Topical clusters
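One common way to measure the density of links around a node is the local clustering coefficient. The sketch below uses the standard undirected definition, which is an assumption: the exact variant used in LinkQA may differ. `clustering_coefficient` is a hypothetical helper name.

```python
from itertools import combinations


def clustering_coefficient(node, edges):
    """Fraction of pairs of neighbours of `node` that are themselves linked.

    Edges are (subject, predicate, object) triples, treated as undirected;
    self-loops are ignored.
    """
    neighbours = {}
    for s, _p, o in edges:
        if s == o:
            continue
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    around = neighbours.get(node, set())
    k = len(around)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect
    linked = sum(1 for u, v in combinations(around, 2) if v in neighbours[u])
    return linked / (k * (k - 1) / 2)
```

A node whose neighbours all link to each other scores 1.0, while a node at the centre of a pure star scores 0.0.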

Page 11: Assessing Linked Data Mappings using Network Measures

Metric 3 – Centrality

Metric

Ratio between outgoing and incoming links

Target

Lower the discrepancy between the values

Intuition

Hubs are sensitive
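A possible reading of this metric, sketched below, divides the number of incoming links by the number of outgoing links. The direction of the ratio and the handling of nodes without outgoing links are assumptions made for illustration, and `centrality_ratio` is a hypothetical name.

```python
def centrality_ratio(node, edges):
    """Incoming links divided by outgoing links for `node`.

    Edges are (subject, predicate, object) triples. Returns None when the
    node has no outgoing link, since the ratio is then undefined.
    """
    incoming = sum(1 for _s, _p, o in edges if o == node)
    outgoing = sum(1 for s, _p, _o in edges if s == node)
    if outgoing == 0:
        return None
    return incoming / outgoing


sample = [
    ("a", "links", "b"),
    ("c", "links", "a"),
    ("d", "links", "a"),
]
ratio = centrality_ratio("a", sample)
```

Under this reading, a value far from that of other nodes flags a node that many others depend on.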

Page 12: Assessing Linked Data Mappings using Network Measures

Metric 4 – SameAs chains

Metric

Number of “open” sameAs chains

Target

No open sameAs

Intuition

Peer agreement
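The slide does not define "open" precisely. The sketch below assumes that a sameAs chain starting at a node is closed when following sameAs links eventually leads back to that node, and open otherwise. Both this interpretation and the name `open_sameas_chains` are assumptions.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def open_sameas_chains(node, edges):
    """Count outgoing sameAs links of `node` whose chain never returns to it.

    Edges are (subject, predicate, object) triples.
    """
    successors = {}
    for s, p, o in edges:
        if p == SAME_AS:
            successors.setdefault(s, set()).add(o)

    def leads_back(start):
        # Depth-first search along sameAs links only.
        stack, seen = [start], {start}
        while stack:
            current = stack.pop()
            if current == node:
                return True
            for nxt in successors.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return sum(1 for target in successors.get(node, ()) if not leads_back(target))
```

With this reading, two resources that each assert sameAs to the other form a closed chain and contribute nothing to the count.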

Page 13: Assessing Linked Data Mappings using Network Measures

Metric 5 – Description enrichment

Metric

Richness of resource description

Target

Increase as much as possible

Intuition

“SameAsed” resources are complementary
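One way to approximate description richness, sketched here under assumptions, is to count the statements a resource gains from the resources it is declared sameAs with. The counting rule and the name `description_enrichment` are illustrative choices, not LinkQA's documented definition.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def description_enrichment(node, edges):
    """Number of (predicate, object) pairs gained through sameAs links.

    Edges are (subject, predicate, object) triples. Pairs the node already
    has, and the sameAs statements themselves, are not counted.
    """
    own = {(p, o) for s, p, o in edges if s == node and p != SAME_AS}
    peers = {o for s, p, o in edges if s == node and p == SAME_AS}
    gained = {(p, o) for s, p, o in edges if s in peers and p != SAME_AS}
    return len(gained - own)


sample = [
    ("a", "label", "Amsterdam"),
    ("a", SAME_AS, "b"),
    ("b", "label", "Amsterdam"),
    ("b", "population", "800000"),
    ("b", "country", "NL"),
]
gain = description_enrichment("a", sample)
```

A link to a resource that only repeats what is already known yields a gain of zero, which matches the intuition that linked resources should be complementary.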

Page 14: Assessing Linked Data Mappings using Network Measures

Under the hood of LinkQA

http://www.flickr.com/photos/cradlehall/5747161514

Page 15: Assessing Linked Data Mappings using Network Measures

Workflow of an analysis

Page 16: Assessing Linked Data Mappings using Network Measures

Output of an analysis

Results on the node and aggregated scale

Per metric:

Indication of change with respect to the target

List of outlier nodes, sorted by their distance to the target

Plus, a global ranking of nodes

=> Input for manual inspection by an expert
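The per-metric outlier lists and the global ranking could be produced along these lines. The combination rule (summing each node's position across metrics) is an assumption, as the slides do not say how the global ranking is derived. `rank_outliers` and `global_ranking` are hypothetical names.

```python
def rank_outliers(distances):
    """Nodes sorted by decreasing distance to the target (ties by name).

    `distances` maps a node to its distance to the target for one metric.
    """
    return sorted(distances, key=lambda n: (-distances[n], n))


def global_ranking(per_metric):
    """Combine per-metric rankings by summing each node's positions.

    `per_metric` maps a metric name to a `distances` dictionary; every
    metric is assumed to score the same set of nodes.
    """
    totals = {}
    for distances in per_metric.values():
        for position, node in enumerate(rank_outliers(distances)):
            totals[node] = totals.get(node, 0) + position
    return sorted(totals, key=lambda n: (totals[n], n))


scores = {
    "degree": {"a": 0.9, "b": 0.1, "c": 0.5},
    "clustering": {"a": 0.7, "b": 0.2, "c": 0.8},
}
ranking = global_ranking(scores)
```

The nodes at the top of the combined list are the ones an expert would inspect first.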

Page 17: Assessing Linked Data Mappings using Network Measures

Experimental results

Page 18: Assessing Linked Data Mappings using Network Measures

Global impact of links

Observe the distributions to detect bad links

Page 19: Assessing Linked Data Mappings using Network Measures

First evaluation

160 linking specifications for Silk, developed in the context of LATC

6 linking specifications with manual verification of results

50 positive links

50 negative links

Execute LinkQA with 10 samples of 50 links

Page 20: Assessing Linked Data Mappings using Network Measures

Results of the detection

“C” if change detected in > 50% of runs
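The labelling rule can be written as a small helper. The function name and the empty string used when no majority is reached are assumptions; only the "more than 50% of runs" threshold comes from the slide.

```python
def summarise_runs(detections):
    """Return "C" when a change was detected in more than half of the runs.

    `detections` is a sequence of booleans, one per run. An empty string is
    returned otherwise (placeholder chosen for this sketch).
    """
    if not detections:
        return ""
    share = sum(1 for detected in detections if detected) / len(detections)
    return "C" if share > 0.5 else ""


label = summarise_runs([True] * 6 + [False] * 4)
```

Note that exactly half of the runs is not enough: the threshold is strict.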

Page 21: Assessing Linked Data Mappings using Network Measures

Some explanations

Low sensitivity of metrics:

Lack of data

Stable change

50/50 accuracy of detection:

Targets may not be the right ones

Sample may not be big enough

Semantics-agnostic measures perform worse

Page 22: Assessing Linked Data Mappings using Network Measures

A closer look at the outliers

See if the outliers are necessarily bad links

Page 23: Assessing Linked Data Mappings using Network Measures

Second evaluation

Linking specifications for Silk, developed in the context of LATC

All linking specifications sampled to have

45 positive links

5 negative links

Execute LinkQA five times, on five samples

Page 24: Assessing Linked Data Mappings using Network Measures

Rank of positive and negative links

Page 25: Assessing Linked Data Mappings using Network Measures

Take home message

LinkQA is a node-centric approach to measure the impact of links in the WoD network

Scalable, can be distributed

Current results show that

The 5 metrics defined need to be improved

Metrics considering Semantics perform better

The network sample seems too small

Outlier detection improves with the number of metrics