christophe gueret: publish web data - an interactive session

Data Archiving and Networked Services

Publishing data on the Web

Christophe Guéret (@cgueret)

Evolution and variation of classification systems, March 4-5, 2015, Amsterdam

Publishing data on the Web

● It's easy! Everybody does it!

– … in very different ways :-/

● Several issues, even if not very big ones:

– Several competing standards and formats

– Data hard to compare across sources

– Lack of documentation

– Limited capacities to assess trust

– Missing dialog publisher ↔ consumer

– etc.

http://www.w3.org/2013/dwbp/

● W3C working group “Data on the Web best practices”

● Part of the Data Activity

– Also in this activity: the CSV on the Web Working Group

● Chartered until July 2016, running since January 2014

● Focus on defining best practices for publishing and using open data via the Web

– Agnostic to technologies

– Scope: government data, research data, cultural heritage data

Goals

● Publish a set of best practices for publishing and consuming open data

– and a supporting list of use cases and requirements

● Publish a vocabulary for quality and granularity description

● Publish a vocabulary for data usage description

Feb 24 headlines: first draft!!!

which means ...

We need your feedback on the work done so far!

© DonkeyHotey, Flickr

Plans for the remaining time

● Go quickly through the best practices

● Split into groups of 3 or 4 people

● Each group reviews the best practices and says what is missing, what should be deleted, what should be added, …

– Write everything on post-its!

● We collect and cluster the input. It will be reported back to the group on Friday.

The 27 best practices

© James Perkins, Flickr

http://www.w3.org/TR/dwbp/

Derived from http://www.w3.org/TR/dwbp-ucr/

Grouped into topics

● Metadata

– What kind of metadata should be considered when describing data on the Web?

– How can metadata be provided in a machine readable way?
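As one possible answer to the machine-readability question, the sketch below describes a dataset with terms from the W3C DCAT and Dublin Core vocabularies and serializes it as JSON-LD using only Python's standard library. Choosing DCAT and JSON-LD is an assumption for illustration, not something the draft prescribes, and every URI, title and date in it is a hypothetical placeholder.

```python
import json

# Prefixes for the vocabularies used below: W3C DCAT, Dublin Core terms, XML Schema.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_dataset(uri, title, description, publisher, issued, download_url, media_type):
    """Return a DCAT description of one dataset with one distribution, as JSON-LD."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:publisher": {"@id": publisher},
        "dct:issued": {"@value": issued, "@type": "xsd:date"},
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": media_type,
        },
    }


# Hypothetical values, for illustration only.
record = describe_dataset(
    uri="https://data.example.org/dataset/example",
    title="Example dataset",
    description="A placeholder description of what the data contains.",
    publisher="https://data.example.org/organisation/example",
    issued="2015-03-04",
    download_url="https://data.example.org/files/example.csv",
    media_type="text/csv",
)
print(json.dumps(record, indent=2))
```

Because the output is plain JSON, it can be published next to the data file or embedded in a landing page, and RDF tools can still read it as linked data.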

● Data Identification

– How can unique identifiers be provided for data resources?

– How should URIs be designed and managed for persistence?
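A common rule of thumb for persistence, in the spirit of the W3C note "Cool URIs don't change", is to build URIs only from stable parts and keep implementation details out of them. The sketch below assumes a hypothetical base `https://data.example.org` and hypothetical helpers `mint_uri` and `persistence_warnings`; the list of "fragile" patterns is illustrative, not exhaustive.

```python
import re
from urllib.parse import urlparse

# Hypothetical base under a domain the publisher expects to control long-term.
BASE = "https://data.example.org"

# Path patterns that tie a URI to one server technology and tend to break when it changes.
FRAGILE = re.compile(r"\.(php|aspx?|jsp|cgi)$|/cgi-bin/", re.IGNORECASE)


def mint_uri(collection, local_id):
    """Build a URI from stable parts only: a collection name and a local identifier."""
    slug = re.sub(r"[^a-z0-9]+", "-", local_id.lower()).strip("-")
    if not slug:
        raise ValueError("local identifier has no usable characters")
    return f"{BASE}/{collection}/{slug}"


def persistence_warnings(uri):
    """List reasons why a URI may not survive a change of software or hosting."""
    parts = urlparse(uri)
    warnings = []
    if parts.query:
        warnings.append("query string")
    if FRAGILE.search(parts.path):
        warnings.append("technology-specific path")
    return warnings


print(mint_uri("dataset", "Example Set 01 (NL)"))
print(persistence_warnings("http://example.org/cgi-bin/show.php?id=3"))
```

Minting is only half of persistence: the publisher also has to keep the URIs resolving, for instance with redirects, when the data moves.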

● Data Formats

– What kind of data formats should be considered when publishing data on the Web?

(List based on https://www.linkedin.com/pulse/open-data-standards-steven-adler)

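The format question is easier to discuss with a concrete case. The sketch below serializes the same hypothetical records as both CSV and JSON using Python's standard library. The records and field names are placeholders, and the two formats stand in for the longer list on the slide.

```python
import csv
import io
import json

# Hypothetical records, for illustration only.
rows = [
    {"id": "a1", "label": "Example A", "value": 10},
    {"id": "a2", "label": "Example B", "value": 20},
]


def to_csv(records):
    """Serialize a list of flat dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records as a JSON array."""
    return json.dumps(records, indent=2)


print(to_csv(rows))
print(to_json(rows))
```

JSON keeps the number `10` as a number, while reading the CSV back yields the string `"10"`. This is one reason to publish a schema or metadata alongside CSV files, which is the kind of gap the CSV on the Web work addresses.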


● Data Vocabularies

– How can existing vocabularies be used to provide semantic interoperability?

– How can a new vocabulary be designed if needed?
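One way to reuse existing vocabularies is to map local field names to well-known terms, which is what a JSON-LD `@context` does. The sketch below is a deliberately simplified stand-in for that mapping step, not a JSON-LD processor. The field names and the choice of Dublin Core and FOAF terms are assumptions for illustration. Fields left unmapped are where a new vocabulary term might be needed.

```python
# Map hypothetical local field names to terms from existing vocabularies.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}


def expand(record, context):
    """Replace local keys with vocabulary IRIs; report unmapped keys instead of guessing."""
    expanded, unmapped = {}, []
    for key, value in record.items():
        if key in context:
            expanded[context[key]] = value
        else:
            unmapped.append(key)
    return expanded, unmapped


expanded, unmapped = expand({"title": "Example", "shelfmark": "B-12"}, CONTEXT)
print(expanded)  # the title now uses the Dublin Core term
print(unmapped)  # 'shelfmark' has no existing term in this mapping
```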

● Data Licenses

– How can data licenses be made machine readable?

– How can license information about data published on the Web be provided/gathered?
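A license becomes machine readable when it is a link to a license identified by a URI rather than a sentence of free text. The sketch below attaches a Dublin Core `dct:license` link to a JSON-LD description and rejects free-text values. The helper names and the dataset URI are hypothetical, and the Creative Commons URI is only an example of a license URI.

```python
from urllib.parse import urlparse

DCT_LICENSE = "http://purl.org/dc/terms/license"


def with_license(metadata, license_uri):
    """Return a copy of a JSON-LD description with a dct:license link added."""
    parsed = urlparse(license_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("license should be identified by an http(s) URI, not free text")
    result = dict(metadata)
    result[DCT_LICENSE] = {"@id": license_uri}
    return result


def license_of(metadata):
    """Read the license URI back, or None if the description has no license link."""
    link = metadata.get(DCT_LICENSE)
    return link["@id"] if link else None


# Hypothetical dataset URI; the license URI is an example.
described = with_license(
    {"@id": "https://data.example.org/dataset/example"},
    "https://creativecommons.org/licenses/by/4.0/",
)
print(license_of(described))
```

A consumer can then group or filter datasets by comparing license URIs, which is not reliable with free-text license statements.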

● Data Provenance

– How can data provenance information about data published on the Web be provided/gathered?
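Provenance can be published with the W3C PROV-O vocabulary. The sketch below records which datasets a result was derived from, which activity generated it, and which agent ran that activity. Using PROV-O in JSON-LD is an assumption for illustration, and all URIs and the timestamp are hypothetical.

```python
from datetime import datetime, timezone

# Prefixes: W3C PROV-O and XML Schema datatypes.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def provenance(dataset_uri, source_uris, activity_uri, agent_uri, started):
    """Describe where a dataset came from, which activity produced it, and who ran it."""
    return {
        "@context": CONTEXT,
        "@id": dataset_uri,
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": [{"@id": uri} for uri in source_uris],
        "prov:wasGeneratedBy": {
            "@id": activity_uri,
            "@type": "prov:Activity",
            "prov:startedAtTime": {"@value": started.isoformat(), "@type": "xsd:dateTime"},
            "prov:wasAssociatedWith": {"@id": agent_uri},
        },
    }


# Hypothetical URIs and time, for illustration only.
record = provenance(
    "https://data.example.org/dataset/merged",
    ["https://data.example.org/dataset/a", "https://data.example.org/dataset/b"],
    "https://data.example.org/activity/merge-1",
    "https://data.example.org/agent/curator",
    datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(record["prov:wasGeneratedBy"]["prov:startedAtTime"])
```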




● Data Quality

– How can data quality information about data on the Web be provided/gathered?
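Quality information is easier to exchange once it is measured. The sketch below computes one simple dimension, completeness, as the share of non-empty cells per column. Picking completeness is an assumption made for illustration. It is only one of many possible quality measures, and the function name and rows are hypothetical. Scores like these are the kind of information the planned quality vocabulary is meant to describe.

```python
def completeness(rows, columns):
    """Share of non-empty cells per column, from 0.0 (all empty) to 1.0 (all filled)."""
    scores = {}
    for column in columns:
        # A cell counts as empty if it is missing, None, or an empty string.
        filled = sum(1 for row in rows if row.get(column) not in (None, ""))
        scores[column] = filled / len(rows) if rows else 0.0
    return scores


# Hypothetical rows, for illustration only.
rows = [
    {"a": "1", "b": ""},
    {"a": "2", "b": "x"},
    {"a": None, "b": "y"},
    {"a": "4"},
]
print(completeness(rows, ["a", "b"]))
```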

● Sensitive Data

– How can data be published without infringing a person's right to privacy or an organization's security?
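One technique sometimes used before publishing is pseudonymization: direct identifiers are replaced by tokens that still allow records to be linked. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library). The field names and helper functions are hypothetical, and it is not a complete privacy solution.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value, key):
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same value and key always give the same token, so records can still be
    linked. Without the key, nobody can recompute a token from a guessed value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def publishable(record, sensitive_fields, key):
    """Return a copy of the record with the sensitive fields pseudonymized."""
    return {
        field: pseudonymize(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# The key stays with the publisher and is never released with the data.
KEY = secrets.token_bytes(32)

# Hypothetical record, for illustration only.
released = publishable({"name": "Jane Doe", "year": 1950}, {"name"}, KEY)
print(released)
```

Pseudonymized data can still identify people when combined with other fields such as dates or places, so this step is not the same as anonymization.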

● Data Access

– What kind of data access should be considered when publishing data on the Web?

– What requirements should be taken into account when deciding how to make data available on the Web?
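A common way to offer one dataset in several formats under a single URI is HTTP content negotiation. The server reads the client's `Accept` header and picks a representation. The sketch below is a simplified selection function with hypothetical names. It honours `q` weights and `*/*` or `type/*` wildcards, but it does not implement the full precedence rules of RFC 7231 (for example, ranking more specific media ranges first).

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest weight first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        choices.append((fields[0].lower(), q))
    # sorted() is stable, so equal weights keep the client's order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(header, available):
    """Pick the first available representation the client accepts, or None."""
    for media_type, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for offered in available:
            major = offered.split("/")[0]
            if media_type in (offered, "*/*", f"{major}/*"):
                return offered
    return None


print(negotiate("text/csv;q=0.5, application/json", ["text/csv", "application/json"]))
```

When `negotiate` returns `None`, a server would typically answer with status 406 (Not Acceptable) or fall back to a default format.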




● Data Versions

– How can different versions of a dataset be tracked and managed?
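One way to track versions is to give each release its own URI and link it both to the dataset as a whole and to the release it replaces. The sketch below uses `dct:isVersionOf`, `owl:versionInfo` and `prov:wasRevisionOf`. That combination of terms is an assumption for illustration, not a convention from the draft, and the dataset URI, version numbers and dates are hypothetical.

```python
DCT_IS_VERSION_OF = "http://purl.org/dc/terms/isVersionOf"
DCT_ISSUED = "http://purl.org/dc/terms/issued"
OWL_VERSION_INFO = "http://www.w3.org/2002/07/owl#versionInfo"
PROV_WAS_REVISION_OF = "http://www.w3.org/ns/prov#wasRevisionOf"


def new_version(dataset_uri, previous, version, issued):
    """Describe one release, linked to the dataset and to the previous release."""
    record = {
        "@id": f"{dataset_uri}/{version}",
        DCT_IS_VERSION_OF: {"@id": dataset_uri},
        OWL_VERSION_INFO: version,
        DCT_ISSUED: issued,
    }
    if previous is not None:
        record[PROV_WAS_REVISION_OF] = {"@id": previous["@id"]}
    return record


def lineage(record, by_id):
    """Follow prov:wasRevisionOf links back to the first release."""
    chain = [record["@id"]]
    while PROV_WAS_REVISION_OF in record:
        record = by_id[record[PROV_WAS_REVISION_OF]["@id"]]
        chain.append(record["@id"])
    return chain


# Hypothetical dataset URI and dates, for illustration only.
base = "https://data.example.org/dataset/example"
v1 = new_version(base, None, "1.0", "2015-01-01")
v2 = new_version(base, v1, "1.1", "2015-03-01")
releases = {r["@id"]: r for r in (v1, v2)}
print(lineage(v2, releases))
```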

● Data Preservation

– How can publishers decide when and how data on the Web should be archived?
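Whatever the archiving policy, an archive has to be able to show that files have not changed since deposit. A common practice is to store fixity checksums. The sketch below writes and verifies a SHA-256 manifest whose `<checksum> <relative path>` lines are modelled on BagIt manifests. It is not a complete BagIt implementation, and the function names and manifest file name are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_name="manifest-sha256.txt"):
    """Record one '<checksum> <relative path>' line for every file in the directory tree."""
    root = Path(directory)
    lines = [
        f"{sha256_of(p)} {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != manifest_name
    ]
    (root / manifest_name).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


def verify_manifest(directory, manifest_name="manifest-sha256.txt"):
    """Return the paths that are missing or whose checksum no longer matches."""
    root = Path(directory)
    damaged = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        checksum, _, relative = line.partition(" ")
        target = root / relative
        if not target.is_file() or sha256_of(target) != checksum:
            damaged.append(relative)
    return damaged
```

Running `verify_manifest` on a schedule, and again after every migration to new storage, turns silent corruption into a detectable event.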

● Feedback

– How can user feedback about data consumed from the Web be gathered?
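Feedback is easier to collect and aggregate when it is stored as structured records pointing at the dataset, not as loose e-mails. The sketch below shapes a comment as an annotation modelled on the W3C Web Annotation Data Model (`Annotation`, `TextualBody`, `target`, `motivation`). Using that model is an assumption for illustration. The helper name and dataset URI are hypothetical, and only a few of the model's motivations are accepted.

```python
import uuid

# A subset of the motivations defined by the Web Annotation Data Model.
MOTIVATIONS = {"commenting", "questioning", "assessing", "tagging"}


def feedback(target_uri, text, motivation="commenting", creator=None):
    """Wrap a user's comment on a dataset as a Web Annotation-style JSON-LD record."""
    if motivation not in MOTIVATIONS:
        raise ValueError(f"unsupported motivation: {motivation}")
    note = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Annotation",
        "motivation": motivation,
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": target_uri,
    }
    if creator:
        note["creator"] = creator
    return note


# Hypothetical dataset URI, for illustration only.
note = feedback("https://data.example.org/dataset/example", "Column 3 has no unit.", "questioning")
print(note["motivation"], note["target"])
```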



