Data Archiving and Networked Services
The Entity Registry System: Collaborative Editing of Entity Data in Poorly Connected Environments
Christophe Guéret (@cgueret)
Philippe Cudré-Mauroux
AAAI Spring Symposium #SD4HumTech15, March 23-25, 2015, Stanford University
The big question
“This symposium aims to address the question of whether the technology is mature enough to warrant further investigation or whether the disadvantages outweigh the utility of SD for this domain”
And the answer (for Linked Data) is…
Yes, it is mature enough!
But Linked Data platforms need to be downscaled before they can deliver their full potential in this specific context. So far, most of what the community has to offer does not fit.
On the upscaling of platforms
● General design approach
– Design a “one size fits all” data model for the common space
– Make a centralised store in the cloud
– Connect users to the store
● Scale up to cater for more users
● Have a hard time trying to fit in users when there is
– Limited or no infrastructure (connectivity, electricity, ...)
– Limited agreement on models / data heterogeneity
– Different levels of (computer) literacy
The opposite approach
● Downscaling platforms to make them fit specific, challenging, usage contexts and use-cases
http://worldwidesemanticweb.org/
Other WWSW aspects
● Interfaces: non-text-centric interaction with data (SPARQL-Voice, Icons, …)
● Relevancy: find the subset of structured data that is the most relevant, contextualised reasoning, local+global data
● Data: publication of development-related data as Linked Open Data (IATI, IDS, ...)
A short video is available in the “About” section of our website
The Entity Registry System (ERS)
Entities
● Semi-structured, interlinked descriptions of shared instances
– Persons
– Objects
– Software
– Locations
– Sensors
– …
Collaboratively describing entities
● A single information space can be useful
● But even when not done in a challenging context, deploying collaborative entity-editing platforms is technically exceedingly challenging
– Local/Global QoS to serve arbitrary entity data
● Performance, scale-out
– Collaborative aspects
● Transactions, versioning, integration
– Offline / mobile concerns
● Caching / replication / serializability
One solution: ERS
● Web-less Linked Data
● Three-tier solution to deploy entity-powered apps
– Flexible
● Seamlessly reconcile entities in local / ad-hoc / global modes
– Collaborative
● Transactional consistency, data versioning
– Scalable
● Shared data store, tunable completeness
– Open-source
● https://github.com/ers-devs
Starting from a centralised design
Introducing the “Contributors”
● The central store is removed
● Contributors are their own trusted data store
● They can cache content from other contributors
● They have a private store for private data
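The contributor role described above can be pictured with a small Python model. This is only an illustrative sketch: the class name `Contributor`, its three dictionaries, and the methods `describe` and `cache_from` are hypothetical and are not taken from the ERS code base.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    """Hypothetical model of an ERS contributor with three local stores."""
    name: str
    public: dict = field(default_factory=dict)   # own statements, shareable with peers
    private: dict = field(default_factory=dict)  # own statements, never shared
    cache: dict = field(default_factory=dict)    # copies of other contributors' statements

    def describe(self, entity, prop, value, private=False):
        """Record a statement about an entity in the public or the private store."""
        store = self.private if private else self.public
        store.setdefault(entity, []).append((prop, value))

    def cache_from(self, other):
        """Copy another contributor's public statements into the local cache."""
        for entity, statements in other.public.items():
            # Keyed by (source, entity) so provenance of cached data is kept.
            self.cache[(other.name, entity)] = list(statements)


alice = Contributor("alice")
bob = Contributor("bob")
alice.describe("house1", "#people", "2")
alice.describe("house1", "#owner", "alice", private=True)
bob.cache_from(alice)  # bob now holds alice's public statement only
```

In this sketch the private statement never leaves `alice`, while `bob` keeps a cached copy of the public one, tagged with its source.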
Adding a “Bridge”
● Can only cache content from Contributors
● Useful for asynchronous messaging
● Convenient for groups (schools, clusters, ...)
And put it on a bus, or something else
● Can be used to implement a sneakernet
● Contributors can also do this when visiting different bridges
Need to get all the data in one place?
● Use the third component of ERS: the Aggregator
● An Aggregator aggregates the content coming from several Bridges
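As a rough illustration of what aggregating could mean, the sketch below merges the containers held by several bridges into one view. It assumes, purely for the example, that each bridge exposes a mapping from container identifier to statements and that copies of the same container are identical; the function name `aggregate` and the sample data are hypothetical.

```python
def aggregate(bridges):
    """Merge the containers cached by several bridges into a single view.

    Each bridge is a dict mapping a container id to its list of statements.
    A container seen on more than one bridge is kept once (first copy seen).
    """
    merged = {}
    for bridge in bridges:
        for container_id, statements in bridge.items():
            merged.setdefault(container_id, statements)
    return merged


# Hypothetical content of two bridges that share one container ("c2").
bridge_school = {
    "c1": [("house1", "#people", "2")],
    "c2": [("house2", "#people", "5")],
}
bridge_clinic = {
    "c2": [("house2", "#people", "5")],
    "c3": [("house3", "#people", "1")],
}

view = aggregate([bridge_school, bridge_clinic])
```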
About consistency of statements
● Different points of view are, by design, kept in separate containers
● Provenance data is available for all containers
● Voting/consensus can resolve conflicts
<house1> “#people” “1”
<house1> “#people” “2”
<house1> “#people” “2”
<house1> “#people” “1”
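One possible reading of voting is a simple majority across containers. The sketch below is an assumption about how such a rule might look, not the mechanism ERS implements; the function `resolve_by_vote` and the container names are hypothetical.

```python
from collections import Counter


def resolve_by_vote(containers, entity, prop):
    """Return the majority value for (entity, prop), or None on a tie or no data."""
    votes = Counter()
    for statements in containers.values():
        # Each container casts one vote per distinct value it asserts.
        values = {v for (e, p, v) in statements if e == entity and p == prop}
        votes.update(values)
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the conflict stays unresolved
    return ranked[0][0]


# Hypothetical containers from three contributors describing the same house.
containers = {
    "alice": [("house1", "#people", "2")],
    "bob": [("house1", "#people", "2")],
    "carol": [("house1", "#people", "1")],
}
majority = resolve_by_vote(containers, "house1", "#people")
```

If each of the four statements listed above came from a different container, two saying "1" and two saying "2", this rule would report a tie and return `None`, so some other criterion (for instance provenance) would be needed.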
About updates and deletions
● Statement containers are uniquely identified
● Updates
– New versions of documents get automatically replicated
● Deletes
– Only the creator of a given container can delete it
– Deletions in a cache store do not get replicated
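The update and delete rules can be sketched as follows. The integer `version` field, the UUID-based identifier, and the `Store` class are assumptions made for illustration; the actual ERS implementation may identify and version containers differently.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Container:
    creator: str
    statements: list
    version: int = 1  # assumed integer revision counter
    container_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Store:
    """Hypothetical store holding the owner's containers and cached copies."""

    def __init__(self, owner):
        self.owner = owner
        self.containers = {}

    def put(self, container):
        """Accept a container if it is new or newer than the stored version."""
        current = self.containers.get(container.container_id)
        if current is None or container.version > current.version:
            self.containers[container.container_id] = container

    def delete(self, container_id):
        """Drop a container locally; return True if the deletion should replicate."""
        container = self.containers.pop(container_id, None)
        if container is None:
            return False
        # Only the creator's deletion propagates; evicting a cached copy stays local.
        return container.creator == self.owner


alice, bob = Store("alice"), Store("bob")
v1 = Container("alice", [("house1", "#people", "1")])
v2 = Container("alice", [("house1", "#people", "2")], version=2,
               container_id=v1.container_id)
for store in (alice, bob):
    store.put(v1)
    store.put(v2)  # the newer version replaces the older one everywhere
```

Calling `bob.delete(v1.container_id)` then only evicts the cached copy and returns `False`, whereas the same call on `alice` returns `True`, signalling that the deletion should be propagated.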
What ERS does not solve (yet)
● Minting of identifiers
– Every contributor can create their own identifiers. There is no enforced scheme
● Global search for existing identifiers
– Only local search is possible
● Modeling of data
– Selection of vocabulary comes from the applications using ERS
Take away message
● Linked Data is a good way to create a globally integrated, yet decentralised, information space for describing entities
● ERS provides simple Linked Data without the Web, without HTTP, without SPARQL, ...
● The reference implementation is open source, based on CouchDB/JSON-LD/Python/Avahi, lightweight, and compatible with the HXL hashtag approach ;-)