Finding Data Sets

Anja Jentzsch, Freie Universität Berlin

17 April 2012

Tutorial: Practical Cross-Dataset Queries on the Web of Data

WWW2012, Lyon, France

Different motivations

• Finding data sets

    • Look for resources to link a data set to

    • Find a data set with relevant data to consume / integrate

• Finding vocabularies

    • Find vocabularies to use to model data sets

    • Find vocabularies to map your existing schema to
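
To make "map your existing schema to a vocabulary" concrete, here is a minimal sketch. It assumes a hypothetical local record whose field names (`full_name`, `homepage`, `created`, `shoe_size`) are invented for illustration. The property URIs come from the FOAF and Dublin Core Terms vocabularies. The mapping table itself is an assumption, not something prescribed by the tutorial.

```python
# Namespaces of two widely used vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Assumed mapping from local (hypothetical) field names to vocabulary properties.
FIELD_TO_PROPERTY = {
    "full_name": FOAF + "name",
    "homepage": FOAF + "homepage",
    "created": DCTERMS + "created",
}


def map_record(subject_uri, record):
    """Return (triples, unmapped): triples for mapped fields, names of fields with no mapping."""
    triples = []
    unmapped = []
    for field, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            # No vocabulary term found yet; this field still needs a vocabulary search.
            unmapped.append(field)
        else:
            triples.append((subject_uri, prop, value))
    return triples, unmapped


# Hypothetical record from an existing schema.
record = {
    "full_name": "Alice Example",
    "homepage": "http://example.org/alice",
    "shoe_size": 38,
}
triples, unmapped = map_record("http://example.org/people/alice", record)
```

The `unmapped` list is the interesting part: those fields are the ones for which a suitable vocabulary still has to be found.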

Different tool types

• Search engines

    • Find data sets based on keywords

• Data catalogs / directories

    • Explore data sets and use faceted search

• Data marketplaces

    • Explore and consume data sets
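
Faceted search, as offered by data catalogs, can be sketched in a few lines. The catalog records below are hypothetical, and the facet names (`format`, `topic`) are assumptions for illustration. Real catalogs expose their own metadata fields.

```python
from collections import Counter

# Hypothetical catalog records; names and values are made up.
CATALOG = [
    {"name": "dataset-a", "format": "RDF", "topic": "geography"},
    {"name": "dataset-b", "format": "CSV", "topic": "government"},
    {"name": "dataset-c", "format": "RDF", "topic": "government"},
    {"name": "dataset-d", "format": "RDF", "topic": "media"},
]


def facet_counts(records, facet):
    """Count how many records fall under each value of the given facet."""
    return Counter(r[facet] for r in records)


def filter_by(records, **selected):
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


# Narrow down to RDF data sets, then see which topics remain.
rdf_sets = filter_by(CATALOG, format="RDF")
topic_counts = facet_counts(rdf_sets, "topic")
```

Each selection narrows the result set, and the counts for the remaining facets show the user where to drill down next.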

Linked Data Search Engines

• Resource descriptions are published as RDF documents

• RDF search engines crawl and index these RDF documents

• The process is similar to that of search engines for HTML documents
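
The indexing step can be illustrated with a toy inverted index over literal values. This is only a sketch under simplifying assumptions: the documents are hypothetical, already parsed into (subject, predicate, object) tuples, and any object starting with `http://` is treated as a URI and skipped. It is not how any particular search engine is implemented.

```python
import re
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical crawled documents: document URL -> parsed triples.
DOCUMENTS = {
    "http://example.org/doc/berlin": [
        ("http://example.org/resource/Berlin", RDFS_LABEL, "Berlin"),
        ("http://example.org/resource/Berlin",
         "http://example.org/vocab/description", "Capital city of Germany"),
    ],
    "http://example.org/doc/lyon": [
        ("http://example.org/resource/Lyon", RDFS_LABEL, "Lyon"),
        ("http://example.org/resource/Lyon",
         "http://example.org/vocab/description", "City in France"),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the (resource, document) pairs whose literals contain it."""
    index = defaultdict(set)
    for doc_url, triples in documents.items():
        for subject, _predicate, obj in triples:
            if obj.startswith("http://"):
                continue  # simplification: treat as a URI, index literals only
            for token in tokenize(obj):
                index[token].add((subject, doc_url))
    return index


def search(index, query):
    """Return the (resource, document) pairs that match every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(DOCUMENTS)
hits = search(index, "city france")
```

Keeping the source document URL next to each resource lets a result point back to where the description was found, much as an HTML search engine links to the page.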

Example search engines:

• Sindice: http://sindice.com

• Sig.ma: http://sig.ma

• Swoogle: http://swoogle.umbc.edu

• Watson: http://kmi-web05.open.ac.uk/WatsonWUI/

• FactForge: http://factforge.net

Suitability

• Look for resources to link a data set to

    • Good

• Find a data set with relevant data to consume

    • Maybe good: depends on how the query is expressed

• Find vocabularies to use to model data sets

    • Not good: everything is indexed, too much noise
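
For the first use case, search results can be turned into link candidates. The sketch below assumes hypothetical local resources and hypothetical search results given as (URI, label) pairs, and it matches on normalized labels only. Exact label matching is a deliberately naive stand-in for a real matching step, so the output should be read as candidates to review, not as confirmed links.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical local resources: URI -> label.
LOCAL = {
    "http://example.org/my/berlin": "Berlin",
    "http://example.org/my/lyon": "Lyon ",
    "http://example.org/my/atlantis": "Atlantis",
}

# Hypothetical results returned by a search engine: (URI, label).
SEARCH_RESULTS = [
    ("http://example.com/remote/Berlin", "berlin"),
    ("http://example.com/remote/Lyon", "Lyon"),
    ("http://example.com/remote/Paris", "Paris"),
]


def normalize(label):
    """Lowercase the label and collapse surrounding or repeated whitespace."""
    return " ".join(label.lower().split())


def link_candidates(local, remote):
    """Propose owl:sameAs triples for resources whose normalized labels are equal."""
    by_label = {}
    for uri, label in remote:
        by_label.setdefault(normalize(label), []).append(uri)
    links = []
    for uri, label in local.items():
        for target in by_label.get(normalize(label), []):
            links.append((uri, OWL_SAME_AS, target))
    return links


links = link_candidates(LOCAL, SEARCH_RESULTS)
```

Resources without a match, such as the "Atlantis" entry here, simply produce no candidate and may need a different query or a different data set.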

Data catalogs

• Several governments and institutions are opening their catalogs

• http://datacatalogs.org provides a manually curated index of 226 data catalogs

The Data Hub

• Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets

• Various metadata for each data set

• Other views over (part of) its content

    • Semantic CKAN (http://semantic.ckan.net)

    • LATC Data Source Inventory

    • LOD Cloud

    • State of the LOD Cloud

http://thedatahub.org

• LATC Data Source Inventory: http://dsi.lod-cloud.net

• LOD Cloud: http://lod-cloud.net

• State of the LOD Cloud: http://lod-cloud.net/state/

Data Marketplaces

• “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)

Kasabi

• Data domain

    • All-purpose, including DBpedia, GeoNames, BBC Linked Data, …

• Data population

    • Public data sets

    • User-submitted data sets

• Data size

    • 186 data sets

• Data model

    • RDF

http://kasabi.com

Freebase

• Metaweb (USA), now Google

• Free for 100K read API calls per day (10K write), paid for higher volumes

• Data access

    • REST API

    • Linked Data endpoint (http://rdf.freebase.com)

    • Triple uploader / RDF dumps

• Data tools

    • Web based – schema editor, review queue, viewers, …

    • GridWorks (Google Refine)

        • Exploring, data cleaning, transformation of tabular data

        • Map data to Freebase schema & RDF export (3rd party extension)

http://www.freebase.com

Linked Open Vocabularies (LOV)

• Initiative similar to the LOD Cloud but focused on vocabularies

• 250+ vocabularies

http://labs.mondeca.com/dataset/lov/
