KIT Graduiertenkolloquium 11.05.2016

Page 1: KIT Graduiertenkolloquium 11.05.2016

KIT – The Research University in the Helmholtz Association www.kit.edu

Validation Framework for RDF-based Constraint Languages

M.Sc. (TUM) Thomas Hartmann

Graduate Colloquium, 11.05.2016

Page 2: KIT Graduiertenkolloquium 11.05.2016

enthusiasm for Semantic Web (SW) technologies

problem statement

Page 3: KIT Graduiertenkolloquium 11.05.2016

common need for RDF Validation

problem statement

Page 4: KIT Graduiertenkolloquium 11.05.2016

common needs of data practitioners

W3C RDF Validation Workshop

2 international working groups on RDF validation

constraint languages

SPARQL Query Language for RDF

SPARQL Inferencing Notation (SPIN)

Web Ontology Language (OWL)

Shape Expressions (ShEx)

Resource Shapes (ReSh)

Description Set Profiles (DSP)

no clear favorite

RDF validation as research field

problem statement

Page 5: KIT Graduiertenkolloquium 11.05.2016

Which types of research data and related metadata

are not yet representable in RDF and

how to adequately model them

to be able to validate RDF data

against constraints extractable from these vocabularies?

research question 1

RQ1

LDOW (WWW 2013)

SemStats (ISWC 2013)

DC 2012

IASSIST Quarterly, 38(4) & 39(1), 7-16

IASSIST Quarterly, 38(4) & 39(1), 17-24

IASSIST Quarterly, 38(4) & 39(1), 25-37

IASSIST Quarterly, 38(4) & 39(1), 38-46

ESWC 2011 (Poster)

Page 6: KIT Graduiertenkolloquium 11.05.2016

development of 3 RDF vocabularies:

1. DDI-RDF Discovery Vocabulary (DDI-RDF)

to support the discovery of metadata on unit-record data

2. Physical Data Description (PHDD)

to describe data in tabular format and its physical properties

3. The SKOS Extension for Statistics (XKOS)

to describe the structure and textual properties of formal statistical classifications

to describe relations between classifications and concepts and among concepts

contribution 1

RQ1

Page 7: KIT Graduiertenkolloquium 11.05.2016

research question 2

Page 8: KIT Graduiertenkolloquium 11.05.2016

XML, XML Schema (XSD)

RDF, Web Ontology Language (OWL)

XML Schemas > OWL ontologies

time-consuming work designing domain ontologies from scratch by hand

reuse information contained in XML Schemas

designing OWL domain ontologies

RQ2

Page 9: KIT Graduiertenkolloquium 11.05.2016

How to directly validate XML data

on semantically rich OWL axioms

using common RDF validation tools

when XML Schemas, adequately representing particular domains,

have already been designed?

research question 2

RQ2

Page 10: KIT Graduiertenkolloquium 11.05.2016

sub-class relationships

OWL hasValue restrictions on data properties

OWL universal restrictions on object properties

semantically rich OWL axioms

    <library>
      <book year="February 1890">
        <author>
          <name>Arthur Conan Doyle</name>
        </author>
        <title>The Sign of the Four</title>
      </book>
    </library>

Title ⊑ value.string

Year ⊑ value.integer
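The effect of the two axioms above can be illustrated with a minimal sketch. It assumes the axioms are read as "the value of a title must be a string" and "the value of a year must be an integer", and it uses only Python's standard library rather than an OWL reasoner or an RDF validation tool. The function names `is_integer` and `check_library` are hypothetical and not part of the presented approach.

```python
import xml.etree.ElementTree as ET

# The XML instance from the slide.
XML = """
<library>
  <book year="February 1890">
    <author><name>Arthur Conan Doyle</name></author>
    <title>The Sign of the Four</title>
  </book>
</library>
"""

def is_integer(text):
    """Return True if the text parses as an integer literal."""
    try:
        int(text.strip())
        return True
    except ValueError:
        return False

def check_library(xml_text):
    """Collect (axiom, offending value) pairs for every book element."""
    violations = []
    for book in ET.fromstring(xml_text).iter("book"):
        year = book.get("year", "")
        if not is_integer(year):        # Year must have an integer value
            violations.append(("Year", year))
        title = book.findtext("title")
        if title is None:               # any title that is present is a string
            violations.append(("Title", None))
    return violations

violations = check_library(XML)
print(violations)
```

On the slide's example only the year is reported, because "February 1890" is not an integer, while the title satisfies its string restriction.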

RQ2

Page 11: KIT Graduiertenkolloquium 11.05.2016

transformations based on formal logics

OWL axioms extracted out of XML Schemas

Explicitly

Implicitly

formally underpin transformations

to formally define and model semantics in a semantically correct way

complete extraction of XML Schemas' structural information

XML can directly be validated against semantically rich OWL axioms

any XML Schema is convertible to OWL

minimized effort designing OWL domain ontologies

contributions

IJMSO, 8(3)

RQ2

Page 12: KIT Graduiertenkolloquium 11.05.2016

DC (ISWC 2012)

ICITST 2011

OCAS (ISWC 2011)

RQ2

Page 13: KIT Graduiertenkolloquium 11.05.2016

step 1 of the approach

executed generic test cases created out of the XML Schema meta-model

transformed XML Schemas of 6 XML standards

step 2 of the approach

specified SWRL rules for 3 OWL domain ontologies

verified hypothesis

determined effort for traditional manual approach

estimated effort for semi-automatic approach

DDI-RDF serves as OWL domain ontology

The effort and the time needed to deliver high quality domain ontologies from scratch

by reusing information of already existing XML Schemas is much less than

creating domain ontologies completely manually and from the ground up.

evaluation

IJMSO, 8(3)

RQ2

Page 14: KIT Graduiertenkolloquium 11.05.2016

research question 3

Page 15: KIT Graduiertenkolloquium 11.05.2016

development of constraint languages

http://purl.org/net/rdf-validation

DC 2014

RQ3

Page 16: KIT Graduiertenkolloquium 11.05.2016

Which types of constraints

must be expressible by constraint languages to meet

all collaboratively and comprehensively identified requirements

to formulate constraints and validate RDF data?

research question 3

RQ3

Page 17: KIT Graduiertenkolloquium 11.05.2016

published 81 constraint types

constraints are instantiated from constraint types

each constraint type corresponds to a specific requirement

types of constraints on RDF data

RQ3

Page 18: KIT Graduiertenkolloquium 11.05.2016

expressivity of constraint languages

low-level implementation languages vs. high-level constraint languages

OWL 2 is the most expressive high-level constraint language

RQ3

Page 19: KIT Graduiertenkolloquium 11.05.2016

high-level constraint languages either

lack an implementation or

are based on different implementations

How to consistently validate RDF data

against constraints of any constraint type

expressed in any RDF-based constraint language?

research question 4-1

RQ4

Page 20: KIT Graduiertenkolloquium 11.05.2016

SPIN as basic validation framework

validation environment for RDF-based constraint languages

constraint languages are translated into SPARQL

represented in RDF in form of a SPIN mapping

a SPIN mapping contains one SPIN construct template for each supported constraint type

consistent validation across RDF-based constraint languages

DC 2014

RQ4

Page 21: KIT Graduiertenkolloquium 11.05.2016

validation process

RQ4

Page 22: KIT Graduiertenkolloquium 11.05.2016

validation results

    CONSTRUCT {
        _:constraintViolation
            a spin:ConstraintViolation ;
            spin:violationRoot ?subject ;
            rdfs:label ?violationMessage ;
            spin:violationSource ?violationSource ;
            :severityLevel ?severityLevel ;
            spin:violationPath ?violationPath ;
            spin:fix ?violationFix }

RQ4

Page 23: KIT Graduiertenkolloquium 11.05.2016

full implementations for all OWL 2 and DSP language constructs

all constraint types expressible in OWL 2 and DSP

major constraint types representable by ShEx and ReSh

validation environment

http://purl.org/net/rdfval-demo

RQ4

Page 24: KIT Graduiertenkolloquium 11.05.2016

constraints and constraint language constructs must be representable in RDF

constraint languages and supported constraint types must be expressible in SPARQL

limitations

RQ4

Page 25: KIT Graduiertenkolloquium 11.05.2016

How to represent constraints of any constraint type and

how to reduce the representation of

constraints of any constraint type

to the absolute minimum?

research question 4-2

RQ4

Page 26: KIT Graduiertenkolloquium 11.05.2016

abstraction layer

makes it possible to express each constraint type

straightforward mappings from high-level constraint languages

based on formal logics

validation framework for RDF-based constraint languages

RQ4

Page 27: KIT Graduiertenkolloquium 11.05.2016

conceptual model

DC 2015

RQ4

75%

Page 28: KIT Graduiertenkolloquium 11.05.2016

minimum qualified cardinality restrictions (R-75)

OWL:

    :Publication rdfs:subClassOf
        [ a owl:Restriction ;
          owl:minQualifiedCardinality 1 ;
          owl:onProperty :author ;
          owl:onClass :Person ] .

SHACL:

    :PublicationShape
        a sh:Shape ;
        sh:scopeClass :Publication ;
        sh:property [
            sh:predicate :author ;
            sh:valueShape :PersonShape ;
            sh:minCount 1 ; ] .

    :PersonShape
        a sh:Shape ;
        sh:scopeClass :Person .

RQ4

Page 29: KIT Graduiertenkolloquium 11.05.2016

ShEx:

    :Publication { :author @:Person{1, } }

ReSh:

    :Publication a rs:ResourceShape ;
        rs:property [
            rs:propertyDefinition :author ;
            rs:valueShape :Person ;
            rs:occurs rs:One-or-many ; ] .

DSP:

    [ dsp:resourceClass :Publication ;
      dsp:statementTemplate [
          dsp:minOccur 1 ;
          dsp:property :author ;
          dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] .

RQ4

minimum qualified cardinality restrictions (R-75)

Page 30: KIT Graduiertenkolloquium 11.05.2016

SPARQL and SPIN:

    CONSTRUCT { [ a spin:ConstraintViolation ... . ] }
    WHERE {
        ?subject
            a ?C1 ;
            ?predicate ?object .
        BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
        BIND( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
        FILTER ( ?cardinality < ?minimumCardinality ) .
        FILTER ( ?minimumCardinality = 1 ) .
        FILTER ( ?C1 = :Publication ) .
        FILTER ( ?C2 = :Person ) .
        FILTER ( ?predicate = :author ) . }

    SELECT ( COUNT ( ?arg1 ) AS ?c )
    WHERE { ?arg1 ?arg2 ?object . ?object a ?arg3 . }
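The check expressed by the query and its helper function can be approximated in plain Python over a list of triples. This is only a sketch under stated assumptions: triples are modelled as string tuples, the function names `qualified_cardinality` and `min_qualified_cardinality_violations` and the sample data are hypothetical, and no SPARQL or SPIN engine is involved. Unlike the query on the slide, which only matches subjects that have at least one value for the property, the sketch also reports publications without any author.

```python
RDF_TYPE = "rdf:type"

def qualified_cardinality(triples, subject, predicate, value_class):
    """Count the values of (subject, predicate) that are typed as value_class."""
    typed = {s for (s, p, o) in triples if p == RDF_TYPE and o == value_class}
    return sum(1 for (s, p, o) in triples
               if s == subject and p == predicate and o in typed)

def min_qualified_cardinality_violations(triples, context_class, predicate,
                                         value_class, minimum):
    """Return instances of context_class with too few qualified values."""
    instances = sorted({s for (s, p, o) in triples
                        if p == RDF_TYPE and o == context_class})
    return [s for s in instances
            if qualified_cardinality(triples, s, predicate, value_class) < minimum]

# Hypothetical sample data: one valid and two invalid publications.
DATA = [
    (":paper1", RDF_TYPE, ":Publication"),
    (":paper1", ":author", ":alice"),
    (":alice", RDF_TYPE, ":Person"),
    (":paper2", RDF_TYPE, ":Publication"),
    (":paper2", ":author", ":acme"),
    (":acme", RDF_TYPE, ":Organization"),
    (":paper3", RDF_TYPE, ":Publication"),
]

violations = min_qualified_cardinality_violations(
    DATA, ":Publication", ":author", ":Person", 1)
print(violations)
```

Here `:paper2` is reported because its only author is not a person, and `:paper3` because it has no author at all.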

RQ4

minimum qualified cardinality restrictions (R-75)

Page 31: KIT Graduiertenkolloquium 11.05.2016

minimum qualified cardinality restrictions (R-75):

simple constraints

RQ4

    [ a rdfcv:SimpleConstraint ;
      rdfcv:contextClass :Publication ;
      rdfcv:leftProperties ( :author ) ;
      rdfcv:classes ( :Person ) ;
      rdfcv:constrainingElement "minimum cardinality" ;
      rdfcv:constrainingValue "1" ] .
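One way to picture the abstraction layer is a generic constraint record plus exactly one validator per constraining element, loosely mirroring the idea of one SPIN mapping per constraint type. The sketch below is an assumption-laden illustration in plain Python: the class `SimpleConstraint`, the `VALIDATORS` registry, the `validator` decorator and the sample triples are all hypothetical and do not reproduce the actual rdfcv vocabulary or SPIN implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimpleConstraint:
    """Language-independent description of one constraint (hypothetical model)."""
    context_class: str
    left_properties: tuple
    classes: tuple
    constraining_element: str
    constraining_value: str

# One validator per constraining element, shared by all constraint languages.
VALIDATORS = {}

def validator(element):
    """Register a validation function for a constraining element."""
    def register(fn):
        VALIDATORS[element] = fn
        return fn
    return register

@validator("minimum cardinality")
def check_minimum_cardinality(constraint, triples):
    prop = constraint.left_properties[0]
    value_class = constraint.classes[0]
    minimum = int(constraint.constraining_value)
    typed = {s for (s, p, o) in triples if p == "rdf:type" and o == value_class}
    instances = sorted({s for (s, p, o) in triples
                        if p == "rdf:type" and o == constraint.context_class})
    return [s for s in instances
            if sum(1 for (s2, p, o) in triples
                   if s2 == s and p == prop and o in typed) < minimum]

def validate(constraint, triples):
    """Dispatch to the single validator registered for the constraint's element."""
    return VALIDATORS[constraint.constraining_element](constraint, triples)

constraint = SimpleConstraint(":Publication", (":author",), (":Person",),
                              "minimum cardinality", "1")
triples = [
    (":paper1", "rdf:type", ":Publication"),
    (":paper1", ":author", ":alice"),
    (":alice", "rdf:type", ":Person"),
    (":paper2", "rdf:type", ":Publication"),
]
result = validate(constraint, triples)
print(result)
```

Because the OWL, SHACL, ShEx, ReSh and DSP formulations would all map to the same record, they would be checked by the same validator, which is the consistency argument made on the following slides.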

Page 32: KIT Graduiertenkolloquium 11.05.2016

framework is solely based on the abstract definitions of constraint types

just 1 SPIN mapping for each constraint type

How to ensure for any constraint type that

RDF data is consistently validated against

semantically equivalent constraints of the same constraint type

across RDF-based constraint languages?

research question 4-3

RQ4

Page 33: KIT Graduiertenkolloquium 11.05.2016

mappings from constraint languages to the abstraction layer and back enable…

How to ensure for any constraint type that

semantically equivalent constraints of the same constraint type

can be transformed

from one RDF-based constraint language to another?

RQ4

research question 4-4

Page 34: KIT Graduiertenkolloquium 11.05.2016

What is the role reasoning plays in practical data validation?

research question 5-1

RQ5

SEMANTiCS 2015

Page 35: KIT Graduiertenkolloquium 11.05.2016

reasoning resolves redundancy

Publication ⊑ ∃ publicationDate . xsd:date

Book ⊑ Publication

Conference-Proceeding ⊑ Publication

Journal-Article ⊑ Publication
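The point of these axioms is that the date constraint is stated once on Publication and applies to every subclass through reasoning, instead of being repeated per class. A minimal sketch of that idea follows, assuming a single-inheritance class hierarchy held in a dictionary and resources held as plain dictionaries. The helper names and the sample resources are hypothetical, and `date.fromisoformat` stands in for a full xsd:date check.

```python
from datetime import date

# Sub-class axioms from the slide.
SUBCLASS_OF = {
    "Book": "Publication",
    "Conference-Proceeding": "Publication",
    "Journal-Article": "Publication",
}

def is_a(cls, ancestor):
    """Follow SUBCLASS_OF upwards to decide whether cls is subsumed by ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def has_valid_date(resource):
    """Approximate the xsd:date check with an ISO 8601 date parse."""
    try:
        date.fromisoformat(resource.get("publicationDate", ""))
        return True
    except ValueError:
        return False

def missing_publication_date(resources):
    """The constraint is declared once, on Publication, and reused for subclasses."""
    return sorted(name for name, r in resources.items()
                  if is_a(r["type"], "Publication") and not has_valid_date(r))

# Hypothetical sample resources.
RESOURCES = {
    "book1": {"type": "Book", "publicationDate": "2016-05-11"},
    "article1": {"type": "Journal-Article"},
    "person1": {"type": "Person"},
}

violations = missing_publication_date(RESOURCES)
print(violations)
```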

RQ5

Page 36: KIT Graduiertenkolloquium 11.05.2016

For which constraint types may reasoning be performed

prior to validation to enhance data quality?

research question 5-2

RQ5

Page 37: KIT Graduiertenkolloquium 11.05.2016

> 2/5 of constraint types

property domains (R-25):

constraint types with reasoning

∃ author.⊤ ⊑ Publication

author(Alices-Adventures-In-Wonderland, Lewis-Carroll)

→ rdf:type(Alices-Adventures-In-Wonderland, Publication)
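The inference step shown above can be sketched as a small forward-chaining rule over triples: whenever a property with a declared domain is used, the subject is typed with that domain class. The `DOMAINS` table and the function `infer_types` are hypothetical names for this illustration, and real RDFS or OWL reasoners cover far more than this single rule.

```python
# Property domain axiom from the slide: anything with an author is a Publication.
DOMAINS = {"author": "Publication"}

def infer_types(triples):
    """Return the input triples plus the rdf:type triples entailed by DOMAINS."""
    inferred = set(triples)
    for subject, predicate, _obj in triples:
        if predicate in DOMAINS:
            inferred.add((subject, "rdf:type", DOMAINS[predicate]))
    return inferred

data = {("Alices-Adventures-In-Wonderland", "author", "Lewis-Carroll")}
entailed = infer_types(data)
print(sorted(entailed))
```

Running such a step before validation means that constraints attached to Publication also apply to resources that were never explicitly typed.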

RQ5

Page 38: KIT Graduiertenkolloquium 11.05.2016

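For which constraint types do validation results differ

(1) under the closed-world assumption (CWA) versus the open-world assumption (OWA) and

(2) under the unique name assumption (UNA) versus the non-unique name assumption (nUNA)?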

CWA dependent: 56.8%

UNA dependent: 66.6%
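Why results can depend on these assumptions can be shown with two toy checks. This is a sketch of the general semantics, not of the evaluation behind the percentages above: under the OWA a missing value may simply be unknown, and without the UNA two different names may denote the same individual. Both function names are hypothetical.

```python
def min_cardinality_violated(values, minimum, closed_world):
    """Missing values only count as a violation when the world is closed."""
    return closed_world and len(set(values)) < minimum

def max_cardinality_violated(values, maximum, unique_names):
    """Extra names only prove a violation when different names mean different things."""
    return unique_names and len(set(values)) > maximum

# A publication without any stated author, minimum cardinality 1.
no_authors = []
# Two names that could denote the same person, maximum cardinality 1.
two_names = [":lewis-carroll", ":charles-dodgson"]

print(min_cardinality_violated(no_authors, 1, closed_world=True))
print(min_cardinality_violated(no_authors, 1, closed_world=False))
print(max_cardinality_violated(two_names, 1, unique_names=True))
print(max_cardinality_violated(two_names, 1, unique_names=False))
```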

research question 5-3

RQ5

Page 39: KIT Graduiertenkolloquium 11.05.2016

expressivity of constraint languages

RQ5

Page 40: KIT Graduiertenkolloquium 11.05.2016

collected 115 constraints

from vocabularies or domain experts

on 3 common vocabularies

well-established (QB, SKOS)

under development (DDI-RDF)

classified constraints

implemented constraints

evaluation

ICSC 2016

33 SPARQL endpoints

Page 41: KIT Graduiertenkolloquium 11.05.2016

classification of constraint types

RDFS/OWL based

constraint language based

SPARQL based

classification of constraints

informational

warning

error

evaluation

classification

Page 42: KIT Graduiertenkolloquium 11.05.2016

C (constraints), CV (constraint violations)

values in %

evaluation

main finding

                C       CV
    SPARQL      63.2    78.2
    CL          34.7    21.8
    RDFS/OWL    35.6    21.8

Page 43: KIT Graduiertenkolloquium 11.05.2016

evaluation based on 3 vocabularies

evaluation

limitation

Page 44: KIT Graduiertenkolloquium 11.05.2016

RQ1: future work

publication of RDF vocabularies

DDI Alliance specifications

W3C recommendation for DDI-RDF

DDI-Lifecycle MD (Model-Driven)

new requirements based on experiences with DDI-RDF

international working group: DDI Moving Forward Project

individual contributions

formalize conceptual model (using UML 2)

conceptualize and implement diverse model serializations (e.g., RDFS/OWL)

future work

Page 45: KIT Graduiertenkolloquium 11.05.2016

aligning PHDD and CSV on the Web

overlap in the description of tabular data in CSV format

broader scope of PHDD

description of tabular data with fixed record length

description of tabular data with multiple records per case

evaluation for use in DDI-Lifecycle MD

RQ1: future work

future work

Page 46: KIT Graduiertenkolloquium 11.05.2016

RQ2: future work

bidirectional transformations from models of any meta-model to OWL

generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models

enable validation of any data against constraints extractable from models of any meta-model using common RDF validation tools

future work

Page 47: KIT Graduiertenkolloquium 11.05.2016

RQ3: future work

maintain and extend RDF validation database

collect case studies and use cases

extract requirements

publish constraint types

future work

Page 48: KIT Graduiertenkolloquium 11.05.2016

RQ4: future work

SPIN mappings for constraint languages not expressible in SPARQL

keep framework and constraining elements in sync

combine the framework with SHACL

derive SHACL extensions with SPARQL bodies

define mappings from SHACL to the abstraction layer and back

synchronize consistent implementations of constraint types

future work

Page 49: KIT Graduiertenkolloquium 11.05.2016

acknowledgements, publications, research data

29 publications

5 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports

first author of all (except 1) journal articles, conference articles, and workshop articles

research data

KIT research data repository: http://dx.doi.org/10.5445/BWDD/11

GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis

international working groups

DCMI RDF Application Profiles Task Group

part of the editorial board

RDF Vocabularies Working Group

editor for DDI-RDF and PHDD

W3C RDF Data Shapes Working Group

DDI Moving Forward Project

Page 50: KIT Graduiertenkolloquium 11.05.2016

outlook and summary of main contributions

provide a basis for continued research

incorporate findings of this thesis into the working groups

RDF vocabularies

RDFication of XML

set of constraint types

validation framework for RDF-based constraint languages

role of reasoning for data validation

THANK YOU!

Page 51: KIT Graduiertenkolloquium 11.05.2016

appendix

Page 52: KIT Graduiertenkolloquium 11.05.2016

publications: journal articles

1. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4

2. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015c). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4

3. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4

4. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4

5. Bosch, Thomas & Mathiak, B. (2013b). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760

Please note that in 2015, my last name changed from Bosch to Hartmann.

Page 53: KIT Graduiertenkolloquium 11.05.2016

publications: articles in conference proceedings

1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/

2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368

3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015a). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867

4. Bosch, Thomas & Eckert, K. (2014a). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257

5. Bosch, Thomas & Eckert, K. (2014b). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270

Page 54: KIT Graduiertenkolloquium 11.05.2016

publications: articles in conference proceedings

6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654

7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34

8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html

9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html

Page 55: KIT Graduiertenkolloquium 11.05.2016

publications: articles in workshop proceedings

1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013a). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/

2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013b). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings

3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/

Page 56: KIT Graduiertenkolloquium 11.05.2016

publications: specifications

1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery

2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html

Page 57: KIT Graduiertenkolloquium 11.05.2016

publications: technical reports

1. Hartmann, Thomas (2016a). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062

2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02

3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015b). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements

4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015a). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable

5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015b). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933

Page 58: KIT Graduiertenkolloquium 11.05.2016

publications: technical reports

6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015a). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479

7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015b). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478

8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470

9. Bosch, Thomas & Mathiak, B. (2013a). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/

10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series

Page 59: KIT Graduiertenkolloquium 11.05.2016

research questions

1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?

2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?

3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?

4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?

5. What is the role reasoning plays in practical data validation, and for which constraint types may reasoning be performed prior to validation to enhance data quality?

appendix

Page 60: KIT Graduiertenkolloquium 11.05.2016

summary of contributions

1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies

2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains

3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints

4.1 Consistent validation across RDF-based constraint languages

4.2 Minimal representation of constraints of any type

4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages

4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another

5. We delineated the role reasoning plays in practical data validation and investigated for each constraint type (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics

6. Evaluation of the usability of constraint types for assessing RDF data quality

appendix

Page 61: KIT Graduiertenkolloquium 11.05.2016

summary of limitations

1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way

2. Constraints of supported constraint types must be representable in RDF

3. Constraint languages and supported constraint types must be expressible in SPARQL

4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies

appendix