DIACHRON Preservation: Evolution Management for Preservation

Evolution Management for Preservation PRELIDA Consolidation Workshop 17.10.2014 Giorgos Flouris (FORTH) [email protected]


DESCRIPTION

By Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu

TRANSCRIPT

Page 1: DIACHRON Preservation: Evolution Management for Preservation

Evolution Management for Preservation

PRELIDA Consolidation Workshop 17.10.2014

Giorgos Flouris (FORTH) [email protected]

Page 2: DIACHRON Preservation: Evolution Management for Preservation

Evolution Management Problem

Preservation ↔ Evolution

Page 3: DIACHRON Preservation: Evolution Management for Preservation

Change Detection

• Change detection for evolution management

– Identifying changes between versions

• Challenges (in DIACHRON)

1. Diverse data models

2. Dynamic datasets

3. Recoverable versions

4. Changes as first-class citizens

5. Cross-snapshot queries

Page 4: DIACHRON Preservation: Evolution Management for Preservation

Evolution in DIACHRON

[Figure: pilot dataset and its DIACHRON representation, shown for Version 1 and Version 2]

Page 5: DIACHRON Preservation: Evolution Management for Preservation

Change Types: Motivation

What a naïve diff will report

Add (Rec, diachron:subject, EFO_001927)

Add (Rec, diachron:hasRecordAttribute, rAtt1)

Add (rAtt1, diachron:predicate, rdfs:subClassOf)

Add (rAtt1, diachron:object, ObsoleteClass)

What the pilot expects

Add_SuperClass (EFO_001927, ObsoleteClass)

Page 6: DIACHRON Preservation: Evolution Management for Preservation

Change Hierarchy: Low-level (1/3)

• Low-level changes

– DIACHRON model, for internal use

– Fixed: Add, Delete

– Just additions and deletions of triples

– Simple set difference
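A minimal Python sketch of the set-difference idea, assuming each version is available as an in-memory set of (subject, predicate, object) tuples. The function name `low_level_delta` and the `ex:` triple are hypothetical; the other sample triples are the ones from the naïve-diff slide.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return the low-level changes (added, deleted) between two versions."""
    added = new - old      # triples present only in the new version
    deleted = old - new    # triples present only in the old version
    return added, deleted


# Hypothetical triple that is unchanged between the two versions.
unchanged = ("ex:SomeRecord", "diachron:subject", "ex:SomeClass")

v1 = {unchanged}
v2 = {
    unchanged,
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}

added, deleted = low_level_delta(v1, v2)
```

Here `added` holds the four triples that a naïve diff reports, and `deleted` is empty.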

Page 7: DIACHRON Preservation: Evolution Management for Preservation

Change Hierarchy: Simple (2/3)

• Pilot terminology:

– Add_SuperClass, Add_Dimension

• Fixed, pre-defined

• Composed of low-level changes

• Partitioning is perfect

– Complete and unambiguous
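As an illustration of how low-level changes can be grouped into a simple change, the sketch below looks for the four-triple pattern from the naïve-diff slide and reports it as one Add_SuperClass. It is an assumption-laden toy, not the DIACHRON detection mechanism (which the slides say is based on SPARQL queries); the function name `detect_add_superclass` is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def detect_add_superclass(added: Set[Triple]) -> Tuple[List[Tuple[str, str, str]], Set[Triple]]:
    """Group added triples into Add_SuperClass changes; return (changes, leftover)."""
    changes: List[Tuple[str, str, str]] = []
    consumed: Set[Triple] = set()
    for att, pred, obj in sorted(added):
        # Start from a record attribute whose predicate is rdfs:subClassOf.
        if (pred, obj) != ("diachron:predicate", "rdfs:subClassOf"):
            continue
        recs = sorted(s for s, p, o in added
                      if p == "diachron:hasRecordAttribute" and o == att)
        supers = sorted(o for s, p, o in added
                        if s == att and p == "diachron:object")
        for rec in recs:
            subjects = sorted(o for s, p, o in added
                              if s == rec and p == "diachron:subject")
            for subj in subjects:
                for sup in supers:
                    group = {
                        (rec, "diachron:subject", subj),
                        (rec, "diachron:hasRecordAttribute", att),
                        (att, "diachron:predicate", "rdfs:subClassOf"),
                        (att, "diachron:object", sup),
                    }
                    if group & consumed:
                        continue  # a low-level change belongs to one simple change only
                    consumed |= group
                    changes.append(("Add_SuperClass", subj, sup))
    # Leftover triples would have to be explained by other simple changes.
    return changes, added - consumed


added = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}
changes, leftover = detect_add_superclass(added)
print(changes)  # [('Add_SuperClass', 'EFO_001927', 'ObsoleteClass')]
```

In this example every low-level change is consumed by exactly one simple change, so `leftover` is empty. A complete and unambiguous partitioning would require that property to hold across all defined simple changes.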

Page 8: DIACHRON Preservation: Evolution Management for Preservation

Change Hierarchy: Complex (3/3)

• Pilot terminology:

– Add_Synonym, Mark_As_Obsolete

• Totally custom, pilot-specific (defined at run-time)

Page 9: DIACHRON Preservation: Evolution Management for Preservation

Using Changes for Evolution Management

• DIACHRON data model contains all versions

• Detection based on SPARQL queries

– Provided at deployment time (for simple)

– Generated at creation time (for complex)

• Recoverability

– Allows moving back and forth between versions
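Recoverability can be illustrated with low-level changes alone: given the added and deleted triples between two versions, either version can be rebuilt from the other. This is a hedged sketch over in-memory sets; the function names `apply_delta` and `revert_delta` and the `ex:` triples are hypothetical, and the sketch does not reflect how DIACHRON stores versions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_delta(old: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move forward: rebuild the new version from the old one."""
    return (old - deleted) | added


def revert_delta(new: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move backward: rebuild the old version from the new one."""
    return (new - added) | deleted


# Hypothetical versions: one triple is kept, one dropped, one introduced.
v1 = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")}
v2 = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:d")}

added = v2 - v1
deleted = v1 - v2

forward = apply_delta(v1, added, deleted)
backward = revert_delta(v2, added, deleted)
```

With `added` and `deleted` computed as plain set differences, `forward` equals `v2` and `backward` equals `v1`.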

Page 10: DIACHRON Preservation: Evolution Management for Preservation

Representation Requirements

• Interesting queries

– Return the simple changes that dataset X underwent between versions V1 and V2

– Return the changes that resource X underwent in the first semester of 2014

– Give me all resources of type X that underwent change Y

– Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2

• Access to both the changes and the data is required

– Changes are first-class citizens

– Allowing preservation
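To make two of these queries concrete, here is a small Python sketch that keeps change records next to the data and filters over both. Everything in it is an assumption for illustration: the `Change` record layout, the convention that the first parameter is the affected resource, the helper names, and the `ex:` identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    change_type: str          # e.g. "Add_SuperClass"
    params: Tuple[str, ...]   # assumption: params[0] is the affected resource
    old_version: str
    new_version: str


def changes_between(changes: List[Change], v_old: str, v_new: str) -> List[Change]:
    """Changes recorded between two given versions."""
    return [c for c in changes if c.old_version == v_old and c.new_version == v_new]


def resources_of_type_with_change(changes: List[Change], types: Dict[str, str],
                                  rdf_type: str, change_type: str) -> List[str]:
    """Resources of a given type that underwent a given change (needs data and changes)."""
    return sorted({
        c.params[0] for c in changes
        if c.change_type == change_type and types.get(c.params[0]) == rdf_type
    })


# Data: resource -> type (hypothetical typing information).
types = {"EFO_001927": "owl:Class", "ex:ResourceA": "ex:Other"}

# Changes: the first one is from the slides, the second is hypothetical.
changes = [
    Change("Add_SuperClass", ("EFO_001927", "ObsoleteClass"), "V1", "V2"),
    Change("Add_SuperClass", ("ex:ResourceA", "ex:ResourceB"), "V2", "V3"),
]

v1_to_v2 = changes_between(changes, "V1", "V2")
classes_changed = resources_of_type_with_change(changes, types, "owl:Class", "Add_SuperClass")
```

The second helper only works because the change records and the typing data are reachable from the same place, which is the point of treating changes as first-class citizens.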

Page 11: DIACHRON Preservation: Evolution Management for Preservation

[Figure: DIACHRON data and the changes ontology, split into a data level and a schema level. Labels: C1, Add_SuperClass, V1, V2, asc_p1, asc_p2, Simple_Change, Change, prov:Activity, EFO_001927, ObsoleteClass, old_version, new_version, diachron:Entity, Add_Synonym, Complex_Change]
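One possible reading of the figure labels, sketched in Python with triples as plain tuples: the change C1 is an instance of Add_SuperClass, linked to its versions and parameters, and the change classes form a hierarchy. The exact links (which label connects to which, and the use of `rdf:type` and `rdfs:subClassOf`) are assumptions inferred from the label names, and the helper functions are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

graph: Set[Triple] = {
    # Schema level (assumed hierarchy among the labels in the figure).
    ("Add_SuperClass", "rdfs:subClassOf", "Simple_Change"),
    ("Add_Synonym", "rdfs:subClassOf", "Complex_Change"),
    ("Simple_Change", "rdfs:subClassOf", "Change"),
    ("Complex_Change", "rdfs:subClassOf", "Change"),
    ("Change", "rdfs:subClassOf", "prov:Activity"),
    # Data level (assumed description of the change instance C1).
    ("C1", "rdf:type", "Add_SuperClass"),
    ("C1", "old_version", "V1"),
    ("C1", "new_version", "V2"),
    ("C1", "asc_p1", "EFO_001927"),
    ("C1", "asc_p2", "ObsoleteClass"),
}


def superclasses(cls: str, g: Set[Triple]) -> Set[str]:
    """The class itself plus everything reachable via rdfs:subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in g:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(cls: str, g: Set[Triple]) -> Set[str]:
    """Resources typed with cls or with any of its subclasses."""
    return {s for s, p, o in g if p == "rdf:type" and cls in superclasses(o, g)}


simple_changes = instances_of("Simple_Change", graph)
complex_changes = instances_of("Complex_Change", graph)
```

Because the change and the resources it refers to sit in the same set of triples, a single lookup can combine both, for example finding every change whose `asc_p1` parameter is EFO_001927.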

Page 12: DIACHRON Preservation: Evolution Management for Preservation

Conclusion

• Main DIACHRON message

– (Linked) data preservation is related to evolution management

• DIACHRON challenges

1. Diverse data models

2. Dynamic datasets

3. Recoverable versions

4. Changes as first-class citizens

5. Cross-snapshot queries

• Solutions

– DIACHRON data model (#1)

– Appropriate change definition and detection (#2, #3)

– Changes and data represented at the same level (#4, #5)