enabling combined software and data engineering at web-scale

http://aligned-project.eu
ISWC, 21st October 2016, Kobe

Enabling combined Software and Data engineering at Web-scale
The ALIGNED suite of Ontologies

Monika Solanki
https://w3id.org/people/msolanki
@nimonika
University of Oxford

Joint work with Bojan Božić, Markus Freudenberg, Dimitris Kontokostas, Christian Dirschl, Rob Brennan & The ALIGNED consortia

Motivation

- Recent years have seen a significant increase in the demand for data-intensive applications.
- Crucial technical and economic challenge ⇒ Effective, collaborative integration of software and big data engineering for Web-scale systems.
- Current engineering techniques for building these systems are both immature and often partitioned into software engineering and data engineering processes, tasks or teams.
- There is a need for integrated engineering approaches along with an underlying curatorial process to improve and manage data over time.

Semantic models and Linked data

- The expressivity of semantic models makes them useful for both addressing data quality and applying model-driven approaches to software engineering.
- Semantic models can enable tools to easily publish relevant meta-data about engineering processes.
- Linked Data based RESTful APIs can enable tool integration and process or lifecycle synchronisation/communication.

Goal

Develop a common suite of lightweight meta-models or vocabularies to describe both software and data engineering system specifications and lifecycles, thereby creating a common technical space for tools to easily publish relevant meta-data about systems engineering.

Contributions

The ALIGNED* suite of ontologies:

- Specifically designed to model the information exchange needs of combined software and data engineering.
- Aims to align the divergent processes encapsulating data and software engineering.
- Deployed for validation and incremental improvement in the ALIGNED project on four large-scale, data-intensive systems engineering use cases.
- Improves productivity, agility and quality.

*http://aligned-project.eu

The ALIGNED suite of ontologies

Provides support for:

- Semantics-based model-driven software engineering
- Data quality engineering techniques
- Development of tools for unified views of software and data engineering processes
- Software/data test case interlinking

ALIGNED Suite: Overview

[Figure: overview of the ALIGNED suite of ontologies]

ALIGNED Suite: Design Intents

Design Intent Ontology (DIO) documents the design decisions about data-intensive system artefacts such as requirements, designs or datasets.

Available at: https://w3id.org/dio

ALIGNED Suite: Software Engineering

Defines the major agents, activities and entities involved in a software engineering project and their relations, with a special focus on capturing the engineering lifecycle.

Available at: https://w3id.org/slo, https://w3id.org/sip
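
As an illustration of the agent/activity/entity pattern described on this slide, the sketch below records one build step as triples and serialises them as N-Triples. It borrows the W3C PROV-O terms prov:Agent, prov:Activity, prov:Entity, prov:wasAssociatedWith and prov:wasGeneratedBy; using PROV-O here is an assumption made for illustration, and the http://example.org/ resources and function names are hypothetical. It does not reproduce the class names actually defined by the ontologies at the URLs above.

```python
# Hypothetical example: PROV-O terms stand in for the suite's own classes.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/project/"  # made-up namespace


def describe_build(agent: str, activity: str, artefact: str) -> list[tuple[str, str, str]]:
    """Return triples linking a developer, a build activity and its output."""
    a, act, ent = EX + agent, EX + activity, EX + artefact
    return [
        (a, RDF_TYPE, PROV + "Agent"),
        (act, RDF_TYPE, PROV + "Activity"),
        (ent, RDF_TYPE, PROV + "Entity"),
        (act, PROV + "wasAssociatedWith", a),
        (ent, PROV + "wasGeneratedBy", act),
    ]


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_build("alice", "build-42", "release-1.0")
print(to_ntriples(triples))
```

Because the output is plain N-Triples, any RDF-aware tool in a chain could load it without knowing how the build tool works internally, which is the kind of metadata exchange the slide has in mind.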

ALIGNED Suite: Data Engineering (1)

DLO is the basis for deriving specific domain ontologies which represent the lifecycles of concrete data engineering projects: DBpedia and Seshat.

Available at: https://w3id.org/dlo

ALIGNED Suite: Data Engineering (2)

DataID is a multi-layered meta-data system which, at its core, describes datasets and their different manifestations.

Available at: http://dataid.dbpedia.org/ns/core
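
To make "datasets and their different manifestations" concrete, the sketch below models one dataset with two downloadable forms and emits triples for it. It uses the W3C DCAT terms dcat:Dataset, dcat:Distribution, dcat:distribution and dcat:downloadURL together with dcterms:title; treating a manifestation as a DCAT distribution is an assumption for illustration, and the dataset name, IRIs and class names in the code are hypothetical. The DataID core namespace above defines the actual vocabulary.

```python
from dataclasses import dataclass, field

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Distribution:
    """One concrete, downloadable form of a dataset (hypothetical model)."""
    iri: str
    download_url: str


@dataclass
class Dataset:
    iri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) tuples; the title is a quoted literal."""
        yield (self.iri, RDF_TYPE, DCAT + "Dataset")
        yield (self.iri, DCT + "title", f'"{self.title}"')
        for d in self.distributions:
            yield (self.iri, DCAT + "distribution", d.iri)
            yield (d.iri, RDF_TYPE, DCAT + "Distribution")
            yield (d.iri, DCAT + "downloadURL", d.download_url)


ds = Dataset(
    "http://example.org/dataset/labels",
    "Example labels",
    [
        Distribution("http://example.org/dataset/labels/ttl", "http://example.org/files/labels.ttl"),
        Distribution("http://example.org/dataset/labels/nt", "http://example.org/files/labels.nt"),
    ],
)
for s, p, o in ds.triples():
    print(s, p, o)
```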

ALIGNED Suite: Unified quality reports (1)

Defines a unified reporting representation for data quality metrics, ontology reasoning errors, test cases, and test case results based on the W3C SHACL reporting vocabulary.

RUT is designed to capture the lifecycle of RDF validation with the test-driven validation methodology.
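
The test-driven validation idea can be sketched in a few lines: each test case is a check over a set of triples, and every failure is recorded in a uniform result record. The keys below reuse property names from the W3C SHACL validation report vocabulary (sh:conforms, sh:result, sh:focusNode, sh:resultPath, sh:resultSeverity, sh:resultMessage); the test case, the data and the helper name are hypothetical, and this is not the RUT ontology's actual model.

```python
RDF_TYPE = "rdf:type"


def require_property(triples, cls, prop):
    """Test case: every instance of `cls` must have at least one `prop` value."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    having = {s for s, p, _ in triples if p == prop}
    return [
        {
            "sh:focusNode": node,
            "sh:resultPath": prop,
            "sh:resultSeverity": "sh:Violation",
            "sh:resultMessage": f"{node} is a {cls} without {prop}",
        }
        for node in sorted(instances - having)
    ]


# Made-up data: two documents, one of which lacks a title.
data = [
    ("ex:doc1", RDF_TYPE, "ex:Document"),
    ("ex:doc1", "dct:title", '"Act A"'),
    ("ex:doc2", RDF_TYPE, "ex:Document"),
]

results = require_property(data, "ex:Document", "dct:title")
report = {"sh:conforms": not results, "sh:result": results}
print(report)
```

Keeping every failure in the same record shape is what lets results from different checkers be merged into one report.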

ALIGNED Suite: Unified quality reports (2)

RVO describes both ABox and TBox reasoning errors for the integration of reasoners into data lifecycle tool-chains.

Available at: https://w3id.org/rvo
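
The ABox/TBox distinction can be illustrated with a toy consistency check: a TBox error is a problem in the schema itself, while an ABox error is an individual whose assertions contradict the schema. The sketch below handles only subclass and disjointness axioms, uses made-up class and individual names, and tags errors with plain strings rather than RVO terms; it is not a substitute for a real reasoner.

```python
def ancestors(cls, subclass_of):
    """All classes that `cls` is subsumed by, including itself."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(subclass_of.get(c, ()))
    return seen


def check(disjoint, subclass_of, types):
    """Return (kind, name) pairs for disjointness problems.

    disjoint:    set of frozensets {A, B} declared disjoint
    subclass_of: dict mapping a class to the set of its direct superclasses
    types:       dict mapping an individual to the set of its asserted classes
    """
    errors = []
    # TBox: a class subsumed by both members of a disjoint pair is unsatisfiable.
    for cls in sorted(subclass_of):
        if any(pair <= ancestors(cls, subclass_of) for pair in disjoint):
            errors.append(("TBox", cls))
    # ABox: an individual whose inferred types include a disjoint pair.
    for ind, classes in sorted(types.items()):
        inferred = set().union(*(ancestors(c, subclass_of) for c in classes))
        if any(pair <= inferred for pair in disjoint):
            errors.append(("ABox", ind))
    return errors


disjoint = {frozenset({"Statute", "CourtDecision"})}
subclass_of = {"Hybrid": {"Statute", "CourtDecision"}}
types = {"doc1": {"Statute", "CourtDecision"}, "doc2": {"Statute"}}

print(check(disjoint, subclass_of, types))
```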

ALIGNED Suite: Domain Models

- Enterprise information processing: extensions and models for the JURION use case.
- E-research in the Social Sciences and Humanities: extensions and models for the Seshat use case.
- Crowd-sourced public datasets: extensions and models for the DBpedia use case.
- Enterprise software development: extensions and models for the PoolParty use case.

Use case: Wolters Kluwer’s JURION

- JURION: an innovative legal information platform developed by Wolters Kluwer Germany.
- Merges and interlinks over 1 million documents of content and data from diverse sources.
- Data is presented to users, e.g. law offices.
- Data lifecycle stages: extraction, storage, authoring, interlinking, enrichment, quality analysis, repair and publication.
- Information processing pipeline ⇒ highly customised applications for legal information retrieval, alerts, analysis and semantic search.

Use case: Wolters Kluwer’s JURION

Challenge

Currently, the software development process and data life cycle are highly independent from each other and require extensive manual management to coordinate their parallel development, leading to higher costs, quality issues and a slower time-to-market.

Wolters Kluwer’s JURION

Evaluation (1)

Generic criteria and evaluation

Value Addition
- enrich information about process-specific procedures for a tool by adding data and software engineering specific metadata
- add context-dependent information for enabling automation in tools

Potential users
- community of content producers, owners of large amounts of data, data managers, ontology engineers
- software development model designers, and developers of human societies datasets

Evaluation (2)

Generic criteria and evaluation

Availability
- https://w3id.org/*
- http://aligned-project.eu
- https://github.com/aligned-h2020/ALIGNED_Ontologies

Sustainability
- Long-term sustainability has been assured by TCD and the ontology engineers involved in the design

Evaluation (3)

Generic criteria and evaluation

Design and technical quality
- Designed in accordance with ontology engineering principles
- Axiomatisations based on the competency questions identified during requirements scoping from potential exploiting applications

Documentation
- The ALIGNED public deliverables and publications
- Self documentation
- HTML documentation via the LODE service
- Graphical illustrations

Conclusions

- Combining data and software engineering processes to increase productivity and agility is a challenge.
- The proposed ALIGNED suite of ontologies provides semantic models of design intents, domain-specific datasets, software engineering processes, quality heuristics and error handling mechanisms.
- The ALIGNED suite contributes immensely towards enabling interoperability and alleviating some of the complexities involved.
- We have exemplified the usage of the suite on a real-world use case from the legal domain and evaluated it against the desired criteria.
