Evolution Management for Preservation
PRELIDA Consolidation Workshop 17.10.2014
Giorgos Flouris (FORTH)
Evolution Management Problem
Preservation ↔ Evolution
Change Detection
• Change detection for evolution management
– Identifying changes between versions
• Challenges (in DIACHRON)
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries
Evolution in DIACHRON
Pilot dataset DIACHRON: Version 1
Pilot dataset DIACHRON: Version 2
Change Types: Motivation
What a naïve diff will report
Add (Rec, diachron:subject, EFO_001927)
Add (Rec, diachron:hasRecordAttribute, rAtt1)
Add (rAtt1, diachron:predicate, rdfs:subClassOf)
Add (rAtt1, diachron:object, ObsoleteClass)
What the pilot expects
Add_SuperClass (EFO_001927, ObsoleteClass)
Change Hierarchy: Low-level (1/3)
• Low-level changes
– DIACHRON model, for internal use
– Fixed: Add, Delete
– Just additions and deletions of triples
– Simple set difference
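A minimal sketch of the low-level delta as a plain set difference, assuming each version is available as a set of (subject, predicate, object) tuples. The function name `low_level_delta` and the single triple in `v1` are hypothetical, chosen only for illustration; the four triples in `v2` are the ones from the naïve diff shown earlier.

```python
def low_level_delta(old_version, new_version):
    """Return (added, deleted) triples between two versions."""
    old_set, new_set = set(old_version), set(new_version)
    added = new_set - old_set      # triples present only in the new version
    deleted = old_set - new_set    # triples present only in the old version
    return added, deleted


# Hypothetical old version: one triple that disappears in the new version.
v1 = {("RecOld", "diachron:subject", "ExampleClass")}

# New version: the four triples reported by the naive diff.
v2 = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}

added, deleted = low_level_delta(v1, v2)
for triple in sorted(added):
    print("Add", triple)
for triple in sorted(deleted):
    print("Delete", triple)
```

Only the two fixed low-level change types, Add and Delete, can come out of this step; anything richer has to be built on top of them.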
Change Hierarchy: Simple (2/3)
• Pilot terminology:
– Add_SuperClass, Add_Dimension
• Fixed, pre-defined
• Composed of low-level changes
• Partitioning is perfect
– Complete and unambiguous
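As an illustration of how a simple change groups low-level changes, the sketch below matches the four added triples of the Add_SuperClass example and reports which low-level changes were consumed. It is a plain-Python stand-in, not the DIACHRON detection mechanism; the function name `detect_add_superclass` and the exact matching pattern are assumptions made for this example.

```python
def detect_add_superclass(added):
    """Group added triples into Add_SuperClass changes.

    Returns (changes, consumed), where consumed is the set of low-level
    additions explained by the detected simple changes.
    """
    added = set(added)
    changes, consumed = [], set()
    for rec, pred, attr in sorted(added):
        if pred != "diachron:hasRecordAttribute":
            continue
        marker = (attr, "diachron:predicate", "rdfs:subClassOf")
        if marker not in added:
            continue
        subjects = sorted(o for s, p, o in added
                          if s == rec and p == "diachron:subject")
        supers = sorted(o for s, p, o in added
                        if s == attr and p == "diachron:object")
        if len(subjects) != 1 or len(supers) != 1:
            continue  # incomplete or ambiguous pattern: leave it low-level
        changes.append(("Add_SuperClass", subjects[0], supers[0]))
        consumed |= {
            (rec, "diachron:subject", subjects[0]),
            (rec, pred, attr),
            marker,
            (attr, "diachron:object", supers[0]),
        }
    return changes, consumed


added = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}
changes, consumed = detect_add_superclass(added)
print(changes)
print("all low-level changes explained:", consumed == added)
```

In this example every low-level addition is consumed by exactly one simple change, which is the property the "complete and unambiguous" bullet refers to; the sketch does not enforce that property in general.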
Change Hierarchy: Complex (3/3)
• Pilot terminology:
– Add_Synonym, Mark_As_Obsolete
• Totally custom, pilot-specific (defined at run-time)
Using Changes for Evolution Management
• DIACHRON data model contains all versions
• Detection based on SPARQL queries
– Provided at deployment time (for simple)
– Generated at creation time (for complex)
• Recoverability
– Allows moving back and forth between versions
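Recoverability can be sketched with low-level deltas alone: applying a delta moves forward to the next version, and applying it in reverse moves back. This is a minimal illustration assuming versions are sets of triples and the delta was produced by a set difference; the helper names `apply_delta` and `revert_delta` and the triple in `v1` are hypothetical.

```python
def apply_delta(version, added, deleted):
    """Move forward: remove deleted triples, then add new ones."""
    return (set(version) - set(deleted)) | set(added)


def revert_delta(version, added, deleted):
    """Move backward: remove added triples, then restore deleted ones."""
    return (set(version) - set(added)) | set(deleted)


# Hypothetical old version and the new version from the running example.
v1 = {("RecOld", "diachron:subject", "ExampleClass")}
v2 = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}

# Delta computed by set difference, as for low-level changes.
added, deleted = v2 - v1, v1 - v2

forward = apply_delta(v1, added, deleted)
backward = revert_delta(forward, added, deleted)
print("forward reaches V2:", forward == v2)
print("backward recovers V1:", backward == v1)
```

The round trip works here because the added triples are absent from the old version and the deleted triples are absent from the new one, which a set-difference delta guarantees.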
Representation Requirements
• Interesting queries
– Return the simple changes that dataset X underwent between versions V1 and V2
– Return the changes that resource X underwent in the first semester of 2014
– Give me all resources of type X that underwent change Y
– Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2
• Access to both the changes and the data is required
– Changes are first-class citizens
– Allowing preservation
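The queries above need changes to be stored as queryable records next to the data. The sketch below illustrates that idea in plain Python rather than in the DIACHRON representation; the `Change` record, its fields, the helper functions, the `resource_types` mapping and the Add_Synonym parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_type: str   # e.g. "Add_SuperClass"
    level: str         # "simple" or "complex"
    old_version: str
    new_version: str
    params: tuple      # first parameter is the affected resource


def changes_between(log, old_version, new_version, level=None):
    """Changes recorded between two versions, optionally by level."""
    return [c for c in log
            if c.old_version == old_version
            and c.new_version == new_version
            and (level is None or c.level == level)]


def resources_of_type_with_change(log, resource_types, rtype, change_type):
    """Join the change log with the data: resources of a type hit by a change."""
    return sorted({c.params[0] for c in log
                   if c.change_type == change_type
                   and resource_types.get(c.params[0]) == rtype})


log = [
    Change("Add_SuperClass", "simple", "V1", "V2",
           ("EFO_001927", "ObsoleteClass")),
    # Hypothetical complex change with made-up parameters.
    Change("Add_Synonym", "complex", "V1", "V2",
           ("ExampleClass", "example synonym")),
]
# Hypothetical data-side information about resource types.
resource_types = {"EFO_001927": "owl:Class", "ExampleClass": "owl:Class"}

simple = changes_between(log, "V1", "V2", level="simple")
hit = resources_of_type_with_change(log, resource_types,
                                    "owl:Class", "Add_SuperClass")
print(simple)
print(hit)
```

The second helper needs both the change log and the data, which is the point of the requirement that both be accessible together.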
[Figure: DIACHRON data and the changes ontology, split into a schema level and a data level. Labels shown: Change, Simple_Change, Complex_Change, Add_SuperClass, Add_Synonym, prov:Activity, diachron:Entity, C1, V1, V2, old_version, new_version, asc_p1, asc_p2, EFO_001927, ObsoleteClass.]
Conclusion
• Main DIACHRON message
– (Linked) data preservation is related to evolution management
• DIACHRON challenges
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries
• Solutions
– DIACHRON data model (#1)
– Appropriate change definition and detection (#2, #3)
– Changes and data represented at the same level (#4, #5)