Metrics-Driven Approach for LOD Quality Assessment

Post on 23-Feb-2016

2014-May-07

Outline

What is the problem?

What have others done?

What is our solution?

Does it work?

What is the problem?

Linked Open Data (LOD): Realizing the Semantic Web by interlinking existing but dispersed data

Main components of LOD:

URIs to identify things

RDF to describe data

HTTP to access data
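The three components can be illustrated with a minimal Python sketch. It models one RDF statement whose parts are URIs, serializes it as an N-Triples line, and builds (without sending) an HTTP request for the subject URI. The `example.org` URIs, the `Triple` type, and the helper names are hypothetical and used only for illustration.

```python
from typing import NamedTuple
from urllib.request import Request


class Triple(NamedTuple):
    subject: str    # URI identifying the thing being described
    predicate: str  # URI identifying the property
    obj: str        # URI of the value (a literal would be quoted instead)


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


def rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for Turtle."""
    return Request(uri, headers={"Accept": "text/turtle"})


# Hypothetical link between two datasets.
link = Triple(
    "http://example.org/dataset-a/Paris",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://example.org/dataset-b/city/paris",
)
line = to_ntriples(link)
req = rdf_request(link.subject)
print(line)
```

The `owl:sameAs` statement is the kind of RDF link that connects one dataset to another in the LOD cloud.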

Datasets: 295

Triples: over 30,000,000,000 (30 B)

Links: over 500,000,000 (500 M)

What is the problem?

Inclusion criteria for publishing and interlinking datasets into the LOD cloud:

Resolvable http/https URIs

Presented in one of the standard formats of Semantic Web (RDF, RDFa, RDF/XML, Turtle, N-Triples)

Contains at least 1000 triples

Connected via at least 50 RDF links to the existing datasets of LOD

Accessible via RDF crawling, RDF dump, or SPARQL endpoint

Is the dataset ready to publish?

What is the problem?

Idea of the LOD: Publishing first, improving later

Results in: quality problems in the published datasets
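The inclusion criteria listed earlier are mechanical thresholds, which is part of why passing them says little about quality. A minimal sketch of such a check follows, assuming a hypothetical `DatasetSummary` record whose field names are invented for illustration. Only the thresholds (1000 triples, 50 links) and the format and access lists come from the slides.

```python
from dataclasses import dataclass

# Formats and access routes as listed in the inclusion criteria.
STANDARD_FORMATS = {"RDF", "RDFa", "RDF/XML", "Turtle", "N-Triples"}
ACCESS_ROUTES = {"crawling", "dump", "sparql"}
MIN_TRIPLES = 1000
MIN_LINKS = 50


@dataclass
class DatasetSummary:
    """Hypothetical summary record; the field names are illustrative."""
    uris_resolvable: bool
    rdf_format: str
    triple_count: int
    links_to_lod: int
    access_route: str


def unmet_criteria(d: DatasetSummary) -> list:
    """Return a description of every inclusion criterion the dataset fails."""
    problems = []
    if not d.uris_resolvable:
        problems.append("URIs are not resolvable")
    if d.rdf_format not in STANDARD_FORMATS:
        problems.append("not in a standard Semantic Web format")
    if d.triple_count < MIN_TRIPLES:
        problems.append("fewer than 1000 triples")
    if d.links_to_lod < MIN_LINKS:
        problems.append("fewer than 50 RDF links to existing LOD datasets")
    if d.access_route not in ACCESS_ROUTES:
        problems.append("no crawl, dump, or SPARQL access")
    return problems


# Made-up candidate: large enough in links, too small in triples.
candidate = DatasetSummary(True, "Turtle", 800, 60, "sparql")
print(unmet_criteria(candidate))
```

A dataset can return an empty list here and still contain inconsistent or inaccurate data, because none of the checks look at the values themselves.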

Missing link: data quality evaluation before release

What have others done?

Data quality in the context of LOD

General validators:

Parsing and Syntax

Accessibility / Dereferenceability

Validators

Quality assessment of published data:

Classifying quality problems of LOD

Using metadata for quality assessment

Filtering poor-quality data (WIQA)

Semantic Annotation using ontologies

What have others done?

Limitations of related works:

Syntax validation, not quality evaluation

Not scalable

Not fully automated

Evaluation after publishing

What is our solution?

Proposing a set of metrics for inherent quality assessment of datasets before interlinking to the LOD cloud

1. Selecting Inherent Quality Dimensions

2. Proposing Metrics

Example:

Goal: Assessment of the consistency of a dataset in the context of LOD

Question: What is the degree of conflict in the context of data values?

Metric: The number of functional properties with inconsistent values
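The example metric can be sketched in a few lines of Python. This is an illustrative reading, not the implementation used in the talk: it assumes triples are given as `(subject, property, value)` tuples, that the set of functional properties is known in advance, and that two different values for the same subject count as a conflict. The `ex:` names are hypothetical.

```python
from collections import defaultdict


def inconsistent_functional_properties(triples, functional_properties):
    """Return the functional properties that have conflicting values.

    A functional property should have at most one value per subject, so
    a subject with two or more distinct values is counted as a conflict.
    """
    values = defaultdict(set)
    for subject, prop, value in triples:
        if prop in functional_properties:
            values[(subject, prop)].add(value)
    return {prop for (_, prop), seen in values.items() if len(seen) > 1}


# Made-up data: ex:birthDate conflicts for ex:alice, ex:capital does not.
triples = [
    ("ex:alice", "ex:birthDate", "1980-01-01"),
    ("ex:alice", "ex:birthDate", "1981-05-17"),
    ("ex:france", "ex:capital", "ex:Paris"),
    ("ex:france", "ex:capital", "ex:Paris"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
]
functional = {"ex:birthDate", "ex:capital"}

conflicts = inconsistent_functional_properties(triples, functional)
metric_value = len(conflicts)
print(metric_value, sorted(conflicts))
```

Treating any two distinct values as a conflict is a simplification. Under OWL semantics, two URI values of a functional property may denote the same individual, so a stricter check would compare only literals or values known to be different.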

3. Developing LODQM

LODQM: Linked Open Data Quality Model

6 quality dimensions, 32 metrics

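4. Theoretical Validation

Metric type    Number of metrics
Complexity     29
Cohesion       2
Coupling       1

Properties checked: Null-Value, Non-Negativity, Symmetry, Monotonicity, Disjoint Module Additivity, Merging, Cohesive Modules

Per-property cells as extracted (column alignment lost): Complexity: n/a, _, _; Cohesion: _, _, _; Coupling: _, n/a, _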
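To give a feel for what checking a metric against such properties involves, here is a small Python sketch. The transcript gives only the property names, so the definitions below are simplified readings of four of them and are assumptions, not the formal definitions used in the talk. `triple_count` is a hypothetical stand-in metric, not one of the 32 LODQM metrics.

```python
def triple_count(dataset):
    """Hypothetical metric: number of distinct triples in the dataset."""
    return len(set(dataset))


def satisfies_null_value(metric):
    """Assumed reading: an empty dataset scores zero."""
    return metric([]) == 0


def satisfies_non_negativity(metric, samples):
    """Assumed reading: the metric is never negative."""
    return all(metric(sample) >= 0 for sample in samples)


def satisfies_monotonicity(metric, base, extra):
    """Assumed reading: adding triples never lowers the metric."""
    return metric(base + extra) >= metric(base)


def satisfies_disjoint_additivity(metric, part_a, part_b):
    """Assumed reading: for disjoint parts, the whole equals the sum."""
    if set(part_a) & set(part_b):
        raise ValueError("parts must be disjoint")
    return metric(part_a + part_b) == metric(part_a) + metric(part_b)


# Made-up triples split into two disjoint parts.
a = [("ex:s1", "ex:p", "ex:o1"), ("ex:s2", "ex:p", "ex:o2")]
b = [("ex:s3", "ex:q", "ex:o3")]

results = {
    "null_value": satisfies_null_value(triple_count),
    "non_negativity": satisfies_non_negativity(triple_count, [[], a, b]),
    "monotonicity": satisfies_monotonicity(triple_count, a, b),
    "disjoint_additivity": satisfies_disjoint_additivity(triple_count, a, b),
}
print(results)
```

Checks like these only test a metric on sample inputs. A theoretical validation argues that the property holds for every possible dataset.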

5. Empirical Evaluation

Steps: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7

Datasets                            No. of triples   No. of instances / classes / properties
FAO Water Areas                     10,730           5863119
Water Economic Zones                29,193           1,074 / 113127
Large Marine Ecosystems             12,012           7162131
Geopolitical Entities               22,725           31288101
ISSCAAP Species Classification      398,166          25,253 / 5293
Species Taxonomic Classification    319,490          11,741 / 3326
Commodities                         56,420           2,788 / 1019
Vessels                             4,236            240622

In the last column, digit runs without a separator are fused values whose column boundaries are not recoverable from the transcript.

Result: Three pairs of metrics are correlated:

{IFP, Im_DT}

{Im_DT, Sml_Cls}

{Inc_Prp_Vlu, IF}

The others are independent
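A correlation analysis of this kind can be sketched as follows. The transcript does not say which correlation coefficient or cut-off was used, so the Pearson coefficient and the 0.8 threshold are assumptions. The metric names `m1` to `m3` and their values are made up and do not correspond to the metrics named above.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlated_pairs(scores, threshold=0.8):
    """Return metric pairs whose absolute correlation reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(scores), 2)
        if abs(pearson(scores[a], scores[b])) >= threshold
    ]


# One made-up value per dataset for each hypothetical metric.
scores = {
    "m1": [0.1, 0.2, 0.3, 0.4],
    "m2": [0.2, 0.4, 0.6, 0.8],  # moves in step with m1
    "m3": [0.5, 0.1, 0.4, 0.2],
}
pairs = correlated_pairs(scores)
print(pairs)
```

Metrics that end up in a correlated pair carry overlapping information, which is one reason to examine them before using all metrics as inputs to a prediction model.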

Result: Only one pair of quality dimensions is correlated:

{Interlinking, Syntactic accuracy}

The others are independent

6. Quality Prediction

Result: 20 out of 32 metrics are selected

Using Neural Network Method: MultiLayerPerceptron

Dataset     No. of triples   No. of instances   Domain
Geonames    6,590            699                Geography
IMDB        866291 (triples and instances fused)   Movie
Anatomy     6,449            6449               Anatomy
Citeseer    948,770          173963             Publication
FAO         248,731          28,098             Food Science
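The slides name MultiLayerPerceptron as the prediction method but do not show its configuration or training data. The following pure-Python sketch of a multilayer perceptron with one hidden layer is therefore only an illustration of the technique, not the setup used in the talk. The three input values per row stand in for metric values, the labels stand in for a quality judgment, and all of them are made up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid activations, one output in (0, 1)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One backpropagation step on the squared error of one example."""
        hidden, out = self.forward(x)
        d_out = (out - target) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Use the old output weight before it is updated.
            d_hidden = d_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out


def mean_squared_error(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Made-up rows: (metric values, 1.0 = good quality, 0.0 = poor quality).
data = [
    ([0.9, 0.8, 0.7], 1.0),
    ([0.8, 0.9, 0.6], 1.0),
    ([0.2, 0.1, 0.3], 0.0),
    ([0.1, 0.3, 0.2], 0.0),
]

model = TinyMLP(n_inputs=3, n_hidden=4)
error_before = mean_squared_error(model, data)
for _ in range(2000):
    for x, y in data:
        model.train_step(x, y, lr=0.5)
error_after = mean_squared_error(model, data)
print(error_before, error_after)
```

In a real study the inputs would be the selected metrics computed for each dataset, and the error would be measured on data held out from training.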

Conclusion on Metrics

Appreciative of your attention and comments