Why 5-Star Data?
Post on 09-Apr-2017
Dr. Sabin-Corneliu Buraga, http://profs.info.uaic.ro/~busaco/
Dr. Sabin Buraga, Faculty of Computer Science, UAIC Iasi, Romania
profs.info.uaic.ro/~busaco/ slideshare.net/busaco
open participation
open data
open software
open app development
open web
open cloud
open (computing) hardware
World Wide Web = “a common information space
in which we communicate by sharing information”
Tim Berners-Lee (2013)
Client (user interface) · Web application server/framework · Storage (data persistence)
Internet (Web)
URL – Uniform Resource Locator
addressability
for example: http://www.slideshare.net/busaco/presentations/
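As a small illustration of addressability, the example address can be split into its parts with Python's standard urllib.parse module. This is only a sketch; the variable names are chosen here for illustration and do not come from the slides.

```python
from urllib.parse import urlparse

# The example address quoted on the slide.
address = "http://www.slideshare.net/busaco/presentations/"

parts = urlparse(address)
scheme = parts.scheme   # protocol used to reach the resource
host = parts.netloc     # server that hosts the resource
path = parts.path       # location of the resource on that server

print(scheme, host, path)
```

Each part answers a different question: how to talk to the server, which server to contact, and which resource to ask for.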
HTTP – HyperText Transfer Protocol
access to resources
a browser asks a Web server to provide a resource representation
HTML, JSON, PDF, PNG, SVG,…
representation(s) of a resource
a Web page includes URLs to other resources: hypermedia
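The idea that one resource may have several representations can be sketched as a lookup keyed by media type. The sketch below assumes a hypothetical resource with two representations and a deliberately simplified reading of the HTTP Accept header (quality values and wildcards are ignored); it is not how a real Web server negotiates content.

```python
# Hypothetical resource with two representations, keyed by media type.
REPRESENTATIONS = {
    "text/html": "<p>Why 5-Star Data?</p>",
    "application/json": '{"title": "Why 5-Star Data?"}',
}


def choose_representation(accept_header, available=REPRESENTATIONS):
    """Return (media_type, body) for the first listed type that is available.

    Simplification: parameters such as q=0.9 are stripped and ignored,
    and wildcards like */* are not handled.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in available:
            return media_type, available[media_type]
    return None  # no acceptable representation


print(choose_representation("application/json, text/html"))
```

A browser would typically ask for text/html, while a script consuming the same resource might ask for application/json.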
Reusing & sharing data available on the Web
data access via a Web service
usually, by using an API
(Application Programming Interface)
Web services, public APIs, mash-ups
www.programmableweb.com
APIs could be described via an open format (see OpenAPI specifications): http://theapistack.com/
aging…
James Governor (2007)
software ≈ fish, data ≈ wine
open data
“A piece of content or data is open
if anyone is free to use, reuse, and redistribute it.”
http://opendefinition.org/
“If you have access to the data,
then you can achieve continuity
even if you don’t have access to
the underlying source of the application.
Open data is more important than open source. […]
Data persists, open data endures.”
Ian Davis, 2009
legal/technical openness
availability & access
reusing & sharing
universal participation
inter-operability
opendatahandbook.org
Reusing data available on the Web
necessity of adopting a (re)use license
fair use
public domain
copyleft
Creative Commons
openness, transparency, respect
https://creativecommons.org/
Data availability
on the Web
as an “opaque” document
(usually, using a proprietary format)
that does not refer, via current Web technologies,
to other resources of interest
Tom Heath (2007)
Data availability
in the Web
ensuring discoverability via hypermedia
uses open data models/formats
(e.g., HTML, XML, JSON, CSV, RDF, etc.)
platform independent
Tom Heath (2007)
Can we evaluate data openness?
5 ★ Open Data
Tim Berners-Lee (2009)
1-star data
the content is available on the Web, in any format,
under an open license
http://opendefinition.org/licenses/
users can view, print, locally store,
and possibly modify the document
the document itself can be shared on the Internet
a PDF containing a scanned image ☹
the document can be easily published on the Web
in order to reuse the data kept in the document,
additional processing might be necessary
2-star data
additionally, the content must be available
as structured data (e.g., relations between entities)
users can process the document by using, in most cases,
a proprietary software application
the document can be exported
into another (structured) format
a proprietary format
containing structured data ☹
the document can be easily published on the Web
data is still “locked” into the document, and
processing depends on a specific application
3-star open data
using an open (non-proprietary) format
to make data available on the Web
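To illustrate why an open format matters, the sketch below reads a small CSV text using only Python's standard library, with no proprietary application involved. The sample row is hypothetical data made up for this example.

```python
import csv
import io

# Hypothetical schedule data, published as plain CSV text.
csv_text = (
    "start,end,title,speaker\n"
    "11:45,12:45,Why 5-Star Data?,Sabin-Corneliu Buraga\n"
)

# DictReader uses the first line as column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
titles = [row["title"] for row in rows]

print(rows[0]["speaker"], titles)
```

Any platform with a CSV parser can do the same, which is the point of the third star.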
the same content as an HTML5 document ☺
<section class="timeslot" lang="en">
  <div class="timeslot-label">
    <time class="start-time" datetime="2016-05-08T11:45">11<span>45</span></time>
    <time class="end-time" datetime="2016-05-08T12:45">12<span>45</span></time>
  </div>
  <p class="title">Why 5-Star Data?</p>
  <p class="speaker">Sabin-Corneliu Buraga</p>
</section>
the markup denotes a certain meaning from
the document author's point of view
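Because the markup is an open format, the data can be extracted with generic tools. The sketch below uses Python's standard html.parser on a simplified copy of the timeslot markup (datetime values written in the hyphenated ISO form); the class TimeslotParser is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser

# Simplified copy of the timeslot markup.
markup = """<section class="timeslot" lang="en">
<time class="start-time" datetime="2016-05-08T11:45">11<span>45</span></time>
<time class="end-time" datetime="2016-05-08T12:45">12<span>45</span></time>
<p class="title">Why 5-Star Data?</p>
<p class="speaker">Sabin-Corneliu Buraga</p>
</section>"""


class TimeslotParser(HTMLParser):
    """Collects datetime attributes of <time> and the text of classed <p> elements."""

    def __init__(self):
        super().__init__()
        self.times = {}        # class name -> datetime attribute
        self.fields = {}       # class name -> text content
        self._current = None   # class of the <p> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time" and "datetime" in attrs:
            self.times[attrs.get("class")] = attrs["datetime"]
        elif tag == "p" and "class" in attrs:
            self._current = attrs["class"]

    def handle_data(self, data):
        # Only text inside a classed <p> is recorded.
        if self._current is not None:
            self.fields[self._current] = self.fields.get(self._current, "") + data

    def handle_endtag(self, tag):
        if tag == "p":
            self._current = None


parser = TimeslotParser()
parser.feed(markup)
parser.close()
print(parser.times, parser.fields)
```

The class names carry the author's intended meaning, so the script relies on them; a different page would need different rules, which is the limitation the next stars address.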
data can be managed (viewed, processed, filtered,
converted, shared, reused, etc.) in any manner
important aspect: platform independence
the document is still rather simple to publish on the Web
exporting data into a proprietary format
could be problematic
4-star open data
each “thing” (entity) of interest from the document
is denoted by a Web address – URL
data, information, and knowledge are identified via URLs
in order to be accessed and (re)used
RDF (Resource Description Framework) model: W3C standards
www.w3.org/standards/semanticweb/
machine-friendly RDF assertions ☺
<!-- the thing identified by ‘busaco’ is a person -->
<div resource="#busaco" typeof="foaf:Person">
<a property="url" href="..."><span property="name">
Sabin Buraga</span></a>
</div>
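The assertions carried by such markup can be pictured as subject, predicate, object triples. The sketch below models them as plain Python tuples rather than using an RDF library. Two things are assumptions made for illustration: the page address http://example.org/page is hypothetical, and the name property is mapped to the FOAF vocabulary, which the markup itself does not state.

```python
# Well-known namespace IRIs for RDF and FOAF.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical address of the page that contains the markup.
BASE = "http://example.org/page"

# The thing being described is identified by a URL, not by a string.
subject = BASE + "#busaco"

triples = {
    (subject, RDF_TYPE, FOAF + "Person"),       # the thing is a person
    (subject, FOAF + "name", "Sabin Buraga"),   # assumed mapping of "name"
}


def objects(graph, s, p):
    """Return every object o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


print(objects(triples, subject, RDF_TYPE))
```

Since both the subject and its type are URLs, any other dataset can refer to exactly the same person or the same class.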
machine-friendly RDF assertions ☺
towards classes of things: presentations,
persons, organizations,
...
things, not strings
publishing content could be much more difficult, requiring
the adoption of Semantic Web (or Web of Data)
technologies, tools, and methodologies
data in the Web: long-term implications
5-star open data
additionally, data is inter-connected to other
datasets, according to the linked data initiative
linkeddata.org
inter-connecting open datasets ☺
graphofthings.org
possibility to discover other (related) data of interest
while consuming the data: network effect
another advantage: Web-based automatic reasoning
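Discovering related data while consuming it amounts to following links from one description to the next. The sketch below does this over a small in-memory table instead of the live Web; all addresses in LINKS are hypothetical, and discover is a helper written for this example.

```python
from collections import deque

# Hypothetical link data: each URL maps to the URLs its description refers to.
LINKS = {
    "http://example.org/people/busaco": ["http://example.org/talks/5-star-data"],
    "http://example.org/talks/5-star-data": ["http://example.net/topics/open-data"],
    "http://example.net/topics/open-data": [],
}


def discover(start, links):
    """Breadth-first traversal: every URL reachable from start, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(discover("http://example.org/people/busaco", LINKS))
```

Starting from a single person, the traversal reaches a dataset hosted elsewhere (example.net), which is the network effect in miniature.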
difficulties:
assuring data/knowledge consistency
problems related to slow adoption
5stardata.info
Michael Hausenblas (2012)
★ make your stuff available on the Web
(whatever format) under an open license
★★ make it available as structured data
e.g., Excel instead of an image scan of a table
★★★ use non-proprietary formats
e.g., CSV (Comma Separated Values) instead of Excel
★★★★ use Web addresses (URLs) to denote things,
so that people can point at your stuff
★★★★★ link your data to other data (see http://datahub.io/)
to provide context
Ed Summers (2010)
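The ladder above is cumulative: each star presupposes the previous ones. A minimal sketch of that rule follows; the function star_rating and its boolean parameters are hypothetical names chosen for this example, not part of any official rating tool.

```python
def star_rating(open_license, structured, open_format, uses_urls, linked):
    """Count stars cumulatively: a level counts only if all earlier levels hold."""
    criteria = [open_license, structured, open_format, uses_urls, linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A scanned PDF under an open license earns one star.
print(star_rating(True, False, False, False, False))
# An Excel sheet adds structure; CSV adds an open format.
print(star_rating(True, True, False, False, False))
print(star_rating(True, True, True, False, False))
```

Note that without an open license the rating stays at zero, however well structured or linked the data is.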
Several real-life examples?
Dr. Sabin Buraga, www.purl.org/net/busaco
augmenting the current Web search activities
via HTML5 schema.org + RDFa rdfa.info
Academic Torrents: http://academictorrents.com/
Awesome Public Datasets: https://github.com/caesar0301/awesome-public-datasets
Awesome JSON Datasets: https://github.com/jdorfman/awesome-json-datasets
Common Crawl: http://commoncrawl.org/the-data/
DataHub: https://datahub.io/dataset
DBpedia.org: a crowd-sourced community effort
to extract structured information from Wikipedia
in order to be “intelligently” processed by software
Wikidata.org – a free knowledge base that can be read
and edited by both humans & machines
open e-government: visualizing + comparing quality indicators
(license, formats, availability, metadata) regarding open datasets
opendatamonitor.eu
“Software – as a service or not – is just a container.
What makes software valuable has always been what
it does to data. Now, in the same spirit of SOA (Service
Oriented Architecture) and SaaS (Software As A Service),
a new concept is emerging, Data-as-a-Service – DaaS.”
Pete Soderling (2010)