http://aligned-project.eu ISWC, 21st October 2016, Kobe
Enabling combined Software and Data Engineering at Web-scale
The ALIGNED suite of Ontologies
Monika Solanki, https://w3id.org/people/msolanki
@nimonika, University of Oxford
Joint work with Bojan Božic, Markus Freudenberg, Dimitris Kontokostas,
Christian Dirschl, Rob Brennan & the ALIGNED consortium
[email protected], @nimonika The ALIGNED suite of Ontologies
Motivation
Recent years have seen a significant increase in the demand for data-intensive applications.
Crucial technical and economic challenge ⇒ effective, collaborative integration of software and big data engineering for Web-scale systems.
Current engineering techniques for building these systems are both immature and often partitioned into software engineering and data engineering processes, tasks or teams.
There is a need for integrated engineering approaches, along with an underlying curatorial process to improve and manage data over time.
Semantic models and Linked data
The expressivity of semantic models makes them useful both for addressing data quality and for applying model-driven approaches to software engineering.
Semantic models can enable tools to easily publish relevant metadata about engineering processes.
Linked Data-based RESTful APIs can enable tool integration and process or lifecycle synchronisation/communication.
Goal
Develop a common suite of lightweight meta-models or vocabularies to describe both software and data engineering system specifications and lifecycles, thereby creating a common technical space for tools to easily publish relevant metadata about systems engineering.
Contributions
The ALIGNED* suite of ontologies
Specifically designed to model the information exchange needs of combined software and data engineering.
Aims to align the divergent processes encapsulating data and software engineering.
Deployed for validation and incremental improvement in the ALIGNED project on four large-scale, data-intensive systems engineering use cases.
Improves productivity, agility and quality.
*http://aligned-project.eu
The ALIGNED suite of ontologies
Provides support for:
- semantics-based model-driven software engineering
- data quality engineering techniques
- development of tools for unified views of software and data engineering processes
- software/data test case interlinking
ALIGNED Suite: Overview
ALIGNED Suite: Design Intents
The Design Intent Ontology (DIO) documents the design decisions about data-intensive system artefacts such as requirements, designs or datasets.
Available at: https://w3id.org/dio
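The Python below is a minimal sketch of the kind of record DIO is meant to hold. It serialises one design decision, and the artefacts it concerns, as Turtle. The namespace IRI bound to `dio:` and the terms `dio:DesignIntent`, `dio:hasRationale` and `dio:concerns` are hypothetical placeholders that have not been checked against the published vocabulary. The `ex:` resources are invented.

```python
from dataclasses import dataclass

# Hypothetical namespace IRI and terms; consult https://w3id.org/dio for the real ones.
PREFIXES = (
    "@prefix dio: <https://w3id.org/dio#> .\n"
    "@prefix ex: <http://example.org/> .\n\n"
)


@dataclass
class DesignDecision:
    """One design decision about data-intensive system artefacts."""
    name: str             # local name of the decision, e.g. "decision1"
    rationale: str        # free-text justification (single line)
    artefacts: list[str]  # local names of requirements, designs or datasets


def turtle_string(text: str) -> str:
    """Quote text as a Turtle string, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(decision: DesignDecision) -> str:
    """Serialise a decision using the hypothetical dio: terms."""
    if not decision.artefacts:
        raise ValueError("a decision must concern at least one artefact")
    targets = ", ".join(f"ex:{a}" for a in decision.artefacts)
    return PREFIXES + (
        f"ex:{decision.name} a dio:DesignIntent ;\n"
        f"    dio:hasRationale {turtle_string(decision.rationale)} ;\n"
        f"    dio:concerns {targets} .\n"
    )


if __name__ == "__main__":
    print(to_turtle(DesignDecision(
        "decision1",
        "Publish documents as Linked Data so that they can be interlinked",
        ["requirement7", "documentDataset"],
    )))
```

Keeping the rationale as a plain literal lets a tool publish the decision without extra modelling. A richer design could link the rationale to the requirement as a resource of its own.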
ALIGNED Suite: Software Engineering
Defines the major agents, activities and entities involved in a software engineering project, and their relations, with a special focus on capturing the engineering lifecycle.
Available at: https://w3id.org/slo and https://w3id.org/sip
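Agents, activities and entities are also the three core classes of the W3C PROV-O ontology. The slide does not say how SLO or SIP relate to PROV-O. For illustration only, assume lifecycle metadata is expressed with PROV-O terms. A tool can then answer a question such as "which activity produced this release, what did it use, and who ran it?". The sketch stores triples as plain Python tuples, and the `ex:` resources are invented.

```python
# (subject, predicate, object) tuples. prov: terms are from W3C PROV-O;
# the ex: resources are invented examples.
TRIPLES = {
    ("ex:release_1_2", "rdf:type", "prov:Entity"),
    ("ex:nightlyBuild", "rdf:type", "prov:Activity"),
    ("ex:ciServer", "rdf:type", "prov:Agent"),
    ("ex:release_1_2", "prov:wasGeneratedBy", "ex:nightlyBuild"),
    ("ex:nightlyBuild", "prov:used", "ex:commit_abc123"),
    ("ex:nightlyBuild", "prov:used", "ex:testSuite"),
    ("ex:nightlyBuild", "prov:wasAssociatedWith", "ex:ciServer"),
}


def objects(triples, subject, predicate):
    """Sorted objects o for which (subject, predicate, o) is asserted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def lineage(triples, entity):
    """For each activity that generated `entity`, list what it used and who ran it."""
    return [
        {
            "activity": activity,
            "used": objects(triples, activity, "prov:used"),
            "agents": objects(triples, activity, "prov:wasAssociatedWith"),
        }
        for activity in objects(triples, entity, "prov:wasGeneratedBy")
    ]


if __name__ == "__main__":
    print(lineage(TRIPLES, "ex:release_1_2"))
```

In practice the same question would be a SPARQL query over a triple store. The in-memory set only keeps the example self-contained.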
ALIGNED Suite: Data Engineering (1)
DLO is the basis for deriving specific domain ontologies that represent the lifecycles of concrete data engineering projects, e.g. DBpedia and Seshat.
Available at: https://w3id.org/dlo
ALIGNED Suite: Data Engineering (2)
DataID is a multi-layered metadata system which, at its core, describes datasets and their different manifestations.
Available at: http://dataid.dbpedia.org/ns/core
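DataID's notion of one dataset with several manifestations resembles the W3C DCAT pattern of a `dcat:Dataset` with several `dcat:Distribution`s. The sketch below uses plain DCAT and Dublin Core terms as a stand-in. It does not show how DataID's own classes refine them; that mapping is an assumption not covered here. The dataset and URLs are invented.

```python
from dataclasses import dataclass, field

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
)


@dataclass
class Distribution:
    """One manifestation of a dataset, e.g. a dump in a given RDF syntax."""
    download_url: str
    media_type: str  # IANA media type, e.g. "text/turtle"


@dataclass
class Dataset:
    iri: str
    title: str  # assumed single-line and free of double quotes
    distributions: list[Distribution] = field(default_factory=list)

    def pick(self, media_type: str):
        """Return the first distribution in the requested media type, or None."""
        return next((d for d in self.distributions if d.media_type == media_type), None)

    def to_turtle(self) -> str:
        """Describe the dataset and its distributions with DCAT terms."""
        lines = [f"<{self.iri}> a dcat:Dataset ;", f'    dct:title "{self.title}"']
        for d in self.distributions:
            # dcat:mediaType takes a media-type resource; the IANA registry IRI is used here.
            media = f"<https://www.iana.org/assignments/media-types/{d.media_type}>"
            lines[-1] += " ;"  # continue the subject's predicate list
            lines.append(
                "    dcat:distribution [ a dcat:Distribution ; "
                f"dcat:downloadURL <{d.download_url}> ; dcat:mediaType {media} ]"
            )
        return PREFIXES + "\n".join(lines) + " .\n"


if __name__ == "__main__":
    ds = Dataset("http://example.org/dataset/documents", "Example document corpus", [
        Distribution("http://example.org/dump.ttl", "text/turtle"),
        Distribution("http://example.org/dump.nt", "application/n-triples"),
    ])
    print(ds.pick("application/n-triples"))
    print(ds.to_turtle())
```

Separating the abstract dataset from its distributions lets a consumer choose the manifestation it can process, which is what `pick` illustrates.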
ALIGNED Suite: Unified quality reports (1)
Defines a unified reporting representation for data quality metrics, ontology reasoning errors, test cases and test case results, based on the W3C SHACL reporting vocabulary.
RUT is designed to capture the lifecycle of RDF validation with the test-driven validation methodology.
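The unified reports build on the W3C SHACL reporting vocabulary. A test-driven validation run can therefore be summarised with standard terms such as `sh:ValidationReport`, `sh:conforms`, `sh:result`, `sh:focusNode`, `sh:resultSeverity` and `sh:resultMessage`. The minimal sketch below models test cases as Python functions that return violating resources. These stand in for the query-based test cases a real validator would run. `QualityTest`, the data and the message are hypothetical, and RUT's own extensions to the report are not modelled.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]

PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix ex: <http://example.org/> .\n\n"
)


@dataclass
class QualityTest:
    """Hypothetical test case: a message plus a function returning violating nodes."""
    message: str  # assumed free of double quotes
    find_violations: Callable[[set[Triple]], list[str]]


def documents_without_label(triples: set[Triple]) -> list[str]:
    """Every ex:Document should carry an rdfs:label."""
    docs = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Document"}
    labelled = {s for s, p, _ in triples if p == "rdfs:label"}
    return sorted(docs - labelled)


def validation_report(tests: list[QualityTest], triples: set[Triple]) -> str:
    """Run every test and summarise the outcome as a SHACL validation report.

    Only a subset of result properties is emitted; a fully conformant SHACL
    result would also carry sh:sourceConstraintComponent.
    """
    statements = ["[] a sh:ValidationReport"]
    results = [
        "    sh:result [ a sh:ValidationResult ; "
        f"sh:focusNode {node} ; sh:resultSeverity sh:Violation ; "
        f'sh:resultMessage "{test.message}" ]'
        for test in tests
        for node in test.find_violations(triples)
    ]
    statements.append(f"    sh:conforms {'false' if results else 'true'}")
    return PREFIXES + " ;\n".join(statements + results) + " .\n"


if __name__ == "__main__":
    data = {
        ("ex:doc1", "rdf:type", "ex:Document"),
        ("ex:doc1", "rdfs:label", "Judgment A"),
        ("ex:doc2", "rdf:type", "ex:Document"),
    }
    tests = [QualityTest("Document has no rdfs:label", documents_without_label)]
    print(validation_report(tests, data))
```

Emitting results in a shared vocabulary, rather than in a tool-specific log format, is what lets different validators and reasoners feed one quality dashboard.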
ALIGNED Suite: Unified quality reports (2)
RVO describes both ABox and TBox reasoning errors for the integration of reasoners into data lifecycle tool-chains.
Available at: https://w3id.org/rvo
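The ABox/TBox distinction can be illustrated with a toy checker over a deliberately tiny ontology language: named classes, subclass axioms and disjointness axioms only.
- A class whose superclasses include two disjoint classes is unsatisfiable, which is a TBox error.
- An individual whose asserted types entail two disjoint classes makes the ABox inconsistent.

The labels `TBoxError` and `ABoxError` are hypothetical placeholders rather than RVO terms, and the legal-domain classes are invented. A real tool-chain would obtain such errors from an OWL reasoner, not from code like this.

```python
from itertools import combinations


def superclasses(cls, sub_class_of):
    """All classes reachable from cls via subclass axioms, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    return seen


def has_clash(classes, sub_class_of, disjoint):
    """True if membership in all `classes` entails two mutually disjoint classes."""
    entailed = set()
    for c in classes:
        entailed |= superclasses(c, sub_class_of)
    return any((a, b) in disjoint or (b, a) in disjoint
               for a, b in combinations(entailed, 2))


def reasoning_errors(sub_class_of, disjoint, types):
    """Classify clashes as TBox (unsatisfiable class) or ABox (inconsistent individual)."""
    named = set(sub_class_of) | {p for ps in sub_class_of.values() for p in ps}
    errors = [("TBoxError", c) for c in sorted(named)
              if has_clash([c], sub_class_of, disjoint)]
    errors += [("ABoxError", i) for i, cs in sorted(types.items())
               if has_clash(cs, sub_class_of, disjoint)]
    return errors


if __name__ == "__main__":
    sub_class_of = {
        "ex:Statute": ["ex:Legislation"],
        "ex:Ruling": ["ex:CaseLaw"],
        "ex:HybridDocument": ["ex:Legislation", "ex:CaseLaw"],
    }
    disjoint = {("ex:Legislation", "ex:CaseLaw")}
    types = {"ex:doc1": ["ex:Statute"], "ex:doc2": ["ex:Statute", "ex:Ruling"]}
    print(reasoning_errors(sub_class_of, disjoint, types))
```

Tagging each error with the box it comes from tells a lifecycle tool whether the fix belongs to the ontology engineers (TBox) or to the data curators (ABox).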
ALIGNED Suite: Domain Models
- Enterprise information processing: extensions and models for the JURION use case.
- E-research in the Social Sciences and Humanities: extensions and models for the Seshat use case.
- Crowd-sourced public datasets: extensions and models for the DBpedia use case.
- Enterprise software development: extensions and models for the PoolParty use case.
Use case: Wolters Kluwer’s JURION
JURION: an innovative legal information platform developed by Wolters Kluwer Germany.
- Merges and interlinks over 1 million documents of content and data from diverse sources.
- Data is presented to users, e.g. law offices.
- Data lifecycle stages: extraction, storage, authoring, interlinking, enrichment, quality analysis, repair and publication.
- Information processing pipeline ⇒ highly customised applications for legal information retrieval, alerts, analysis and semantic search.
Use case: Wolters Kluwer’s JURION
Challenge
Currently, the software development process and the data lifecycle are highly independent of each other and require extensive manual management to coordinate their parallel development, leading to higher costs, quality issues and a slower time-to-market.
Wolters Kluwer’s JURION
Evaluation (1)
Generic criteria and their evaluation:
Value addition:
- Enrich information about process-specific procedures for a tool by adding data and software engineering specific metadata.
- Add context-dependent information to enable automation in tools.
Potential users:
- Community of content producers, owners of large amounts of data, data managers, ontology engineers.
- Software development model designers, and developers of datasets about human societies.
Evaluation (2)
Generic criteria and their evaluation:
Availability:
- https://w3id.org/*
- http://aligned-project.eu
- https://github.com/aligned-h2020/ALIGNED_Ontologies
Sustainability:
- Long-term sustainability has been assured by TCD and the ontology engineers involved in the design.
Evaluation (3)
Generic criteria and their evaluation:
Design and technical quality:
- Designed in accordance with ontology engineering principles.
- Axiomatisations based on the competency questions identified during requirements scoping from potential exploiting applications.
Documentation:
- The ALIGNED public deliverables and publications.
- Self-documentation.
- HTML documentation via the LODE service.
- Graphical illustrations.
Conclusions
Combining data and software engineering processes to increase productivity and agility is a challenge.
The proposed ALIGNED suite of ontologies provides semantic models of design intents, domain-specific datasets, software engineering processes, quality heuristics and error handling mechanisms.
The ALIGNED suite contributes towards enabling interoperability and alleviating some of the complexities involved.
We have exemplified the usage of the suite on a real-world use case from the legal domain and evaluated it against the desired criteria.