Collaborative Project

Holistic Benchmarking of Big Linked Data

Project Number: 688227

Start Date of Project: 2015/12/01

Duration: 36 months

Deliverable 1.1.3

Final Community Member List, Use Cases, and Datasets

Dissemination Level Public

Due Date of Deliverable Month 28, 31/03/2018

Actual Submission Date Month 28, 30/03/2018

Work Package WP1 - Requirements Elicitation and Community Building

Task T1.1

Type Report

Approval Status Final

Version 2.0

Number of Pages 16

Abstract: This deliverable is an update of D1.1.2 and presents the updates carried out on the intermediate member list as well as on the use cases. The use cases are the results of discussions carried out during the Hobbit project meetings and are hence endorsed by the project consortium.

The information in this document reflects only the author's views and the European Commission is not liable for any use that may be made of the information contained therein. The information in this document is provided "as is" without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 688227.

D1.1.3 - v. 2.0

History

| Version | Date | Reason | Revised by |
|---|---|---|---|
| 0.0 | 15/05/2017 | First draft created | Axel Ngonga (InfAI) |
| 0.3 | 30/05/2017 | Peer reviewed | Nadine Jochimsen (InfAI) |
| 1.0 | 31/05/2017 | Updates and corrections | Axel-Cyrille Ngonga Ngomo (InfAI) |
| 2.0 | 30/11/2017 | Updates and corrections | Gayane Sedrakyan (imec) |
| 2.0 | 24/03/2018 | Updates and corrections | Gayane Sedrakyan (imec) |
| 2.0 | 28/03/2018 | Peer reviewed | Pavel Smirnov (AGT) |
| 2.0 | 29/03/2018 | Updates and corrections | Gayane Sedrakyan (imec) |

Author List

| Organization | Name | Contact Information |
|---|---|---|
| InfAI | Axel-Cyrille Ngonga Ngomo | [email protected] |
| imec | Gayane Sedrakyan | [email protected] |

Executive Summary

This document details the final state of the Hobbit community and is essentially an update of D1.1.2. During the second project year, the focus of WP1 was not only on expansion but also on consolidation and curation. In particular, the partners focused on curating the contact list and updating it with new contacts. After curation, the removal of unreliable addresses and the addition of new contacts, the address list grew to 300 relevant contacts. The list of datasets was extended to 23. The interaction with experts and other research projects has also led to the definition of use cases within which benchmarking as offered by Hobbit could be of central importance. The updated list of use cases and the relevant benchmarks and datasets are detailed.

Contents

1 Introduction

2 Final State of the Community

2.1 Community Building Channels

2.2 Current State of the Community

3 Datasets

4 Use Cases

List of Tables

1 Dissemination channels of Hobbit

2 Excerpt of dissemination and outreach events in which Hobbit participated.

3 Excerpt of dissemination and outreach events in which Hobbit participated.

4 Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

5 Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

6 Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

7 Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

List of Figures

1 Overview of Hobbit

2 Snapshot of Hobbit's Twitter account

3 Distribution of Hobbit contacts in the world (left) and in Europe (right)

4 Distribution of roles of Hobbit contacts

1 Introduction

In its first year, Hobbit aimed to establish itself as the provider of a benchmarking platform for industry and academia with a focus on Big Linked Data technologies. One of the key steps towards achieving this goal was to build up a community of interested parties around the project. As shown in Figure 1, the idea behind this community is to

1. gather supplementary datasets relevant to the project,

2. gather KPIs for the evaluation of the frameworks,

3. gather solutions to benchmark and

4. collect potential members of the Hobbit association.

Figure 1: Overview of Hobbit (diagram not reproduced; recoverable labels: Data Collection, Industry data, Measure Collection, Benchmark Creation, Benchmark 1 to Benchmark n with KPIs and Tasks, HOBBIT Platform, Solution 1 to Solution k, Challenges, Reports, Participants/Community)

During the second project year, the project continued to build up the infrastructure necessary to achieve the aforementioned goal. In particular, new challenges were organized. As a result, the work on datasets and community focused on dissemination and consolidation. The community has hence grown, with the contact list now at 300 relevant contacts (120% of the goal for the end of the project, see Section 2). The possible use cases (see Section 4) have not been altered and still represent the current position of the consortium.

2 Final State of the Community

2.1 Community Building Channels

In the second year of the project, we continued to use the multi-channel strategy described in Table 1. The results achieved with this strategy are monitored continuously by the Hobbit consortium (especially by the dissemination and outreach group). The outcomes of this monitoring are the subject of deliverable D1.4 of Hobbit.¹

| Channel | Description |
|---|---|
| Mailing list | Subscriptions to the HOBBIT mailing list |
| Survey | Respondents to the survey sent out for requirements gathering |
| Flyers | Distribution of flyers at different events |
| Talks | Presentations of the HOBBIT project |
| Workshops | Organization of workshops at major conferences and events |
| Cooperations | Cooperation with relevant H2020 and national projects |
| Challenges | Organization of challenges at major conferences (ISWC, DEBS, ESWC) |
| Publications | Scientific publications about the core technologies of HOBBIT. Publications which use the HOBBIT platform are upcoming. |

Table 1: Dissemination channels of Hobbit

2.2 Current State of the Community

Over the two years of the project, HOBBIT was disseminated in manifold ways with the aim of building up a community around the project. For example, the project was disseminated at more than 55 events (see Table 2 and Table 3 for an excerpt), within which we also aimed to get interested parties to join HOBBIT even at the lowest level of engagement possible. We also interacted through social media, for example by generating tweet content on a daily basis (see Figure 2). The parties we interacted with across our multi-channel outreach and dissemination strategy (see subsection 2.1) were asked to join the HOBBIT community or to provide us with contact data for further reference. In addition, accommodating the HOBBIT Association in a subgroup of an existing task force of the BDVA association resulted in expanded networks and community contacts.

We gathered the following qualitative information on contacts:

• Email

• Full name, first name and last name

• Role and role type

• Company

• Country

• LinkedIn

• Comment

• Source

• Project Contact

¹ Available at https://project-hobbit.eu/about/deliverables/.

Figure 2: Snapshot of Hobbit's Twitter account

Figure 3: Distribution of Hobbit contacts in the world (left) and in Europe (right)

Figure 4: Distribution of roles of Hobbit contacts

So far, 300 contacts have been established and registered in the project contact database. We mainly focused on attracting the attention of companies to the project. In particular, 38.8% of the members of the contact list are CxOs (e.g., COO, CEO, CTO) or managers, 34% are academics (professors, researchers, etc.), while only 16.4% are company employees (see Figure 4). Of these 300 contacts, 255 were subject to further interactions (107 companies, 76 academics). We hence consider these 255 to be meaningful contacts in the sense of the description of work, meaning that we have already achieved 102% of the target of 250 meaningful contacts. While the contact data cannot be published in this deliverable for reasons of privacy, Figure 3 gives an overview of the geospatial distribution of the community so far. Most of our contacts are European, with 43 contacts from Germany and 32 from Belgium. However, we have also aimed to reach out beyond Europe to get a glimpse of the current ideas, trends and use cases that could benefit the Hobbit association later. For example, our contact list extends to the USA, Canada, Brazil, Australia and China.
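The target-achievement percentages quoted in this report can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are not from the deliverable, and it assumes that the 120% figure for the full contact list is measured against the same target of 250 contacts as the 102% figure.

```python
# Figures quoted in the report; names are illustrative, not from the deliverable.
TARGET_CONTACTS = 250      # target of meaningful contacts (description of work)
TOTAL_CONTACTS = 300       # contacts registered in the contact database
MEANINGFUL_CONTACTS = 255  # contacts that were subject to further interactions


def percent_of_target(count: int, target: int) -> float:
    """Return count expressed as a percentage of target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return 100.0 * count / target


if __name__ == "__main__":
    total = percent_of_target(TOTAL_CONTACTS, TARGET_CONTACTS)
    meaningful = percent_of_target(MEANINGFUL_CONTACTS, TARGET_CONTACTS)
    print(f"All contacts: {total:.0f}% of target")
    print(f"Meaningful contacts: {meaningful:.0f}% of target")
```

Under that assumption, both figures in the text (120% and 102%) follow from the same target value.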

During the second project year, the project contacted the BDVA Association with the aim of creating the HOBBIT association under its umbrella. After several rounds of negotiations, the integration of HOBBIT in a subgroup of an existing task force was achieved in the third year of the project. This subgroup will merge the efforts of several data benchmarking projects under one Big Data Benchmarking umbrella within which HOBBIT will take the lead on linked data benchmarking. The total number of members subscribed to the task force is 176.

| Event name | Attendees/readers (estimates) |
|---|---|
| Big Data Value Association (BDVA) workshops 2017 | ≈ 100 |
| European Big Data Value Forum (EBDVF) 2017 | ≈ 1,200 |
| International Semantic Web Conference (ISWC) 2017 | ≈ 800 |
| European Semantic Web Conference (ESWC) 2017 | ≈ 400 |
| Distributed Event-Based Systems Conference (DEBS) 2017 | ≈ 90 |
| Web Intelligence (WI) 2017 | ≈ 300 |

Table 2: Excerpt of dissemination and outreach events in which Hobbit participated.

| Event name | Attendees/readers (estimates) |
|---|---|
| Knowledge Capture (K-CAP) 2017 | ≈ 75 |
| World Wide Web (WWW) 2017 | ≈ 800 |
| Ontology Matching (OM) 2017 | ≈ 30 |
| NLIWOD 2017 at ISWC 2017 | ≈ 50 |
| BLINK 2017 at ISWC 2017 | ≈ 25 |
| International Conference on Semantic Systems (SEMANTiCS) 2017 | ≈ 370 |
| IEEE BigData 2017 | ≈ 600 |
| AAAI Conference on Artificial Intelligence 2017 | ≈ 1,850 |
| European Big Data Forum | ≈ 300 |
| BITKOM Big Data Summit 2016 | ≈ 600 |
| CEBIT 2016 | ≈ 300 |
| International Semantic Web Conference (ISWC) 2016 - BLINK workshop | ≈ 300 |
| International Semantic Web Conference (ISWC) 2016 - Link Discovery Tutorial | ≈ 300 |
| European Semantic Web Conference (ESWC) 2016 | ≈ 300 |
| European Conference on Artificial Intelligence (ECAI) 2016 | ≈ 300 |
| Diverse project meetings | ≈ 100 |
| Interviews | ≈ 10,000 |

Table 3: Excerpt of dissemination and outreach events in which Hobbit participated.

3 Datasets

Over 24 months, 23 datasets were gathered by the consortium and stored in a CKAN repository, accessible through the URL hobbit.ilabt.imec.be. These datasets are listed in the tables below (see Table 4, Table 5, Table 6 and Table 7).
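Because the datasets are held in a CKAN repository, their identifiers could in principle be listed programmatically through CKAN's Action API, whose `package_list` action returns a JSON object with `success` and `result` fields. The sketch below is a minimal illustration using only the Python standard library. It assumes that the API is exposed under the repository root and that the site is still online, neither of which is confirmed by this report; the function names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the CKAN Action API is served under the repository root.
BASE_URL = "http://hobbit.ilabt.imec.be/"


def package_list_url(base_url: str = BASE_URL) -> str:
    """Build the URL of CKAN's package_list action for a site root."""
    return urljoin(base_url, "api/3/action/package_list")


def parse_package_list(payload: str) -> list[str]:
    """Extract dataset identifiers from a package_list JSON response."""
    body = json.loads(payload)
    if not body.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return list(body["result"])


def fetch_dataset_names(base_url: str = BASE_URL, timeout: float = 10.0) -> list[str]:
    """Download and parse the dataset list (requires network access)."""
    with urlopen(package_list_url(base_url), timeout=timeout) as response:
        return parse_package_list(response.read().decode("utf-8"))
```

Calling `fetch_dataset_names()` would then return the dataset identifiers as a list of strings, provided the assumptions above hold. Parsing is kept separate from the download so that it can be exercised without network access.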

| Dataset | Description | Size (approximation) | Growth (expected) |
|---|---|---|---|
| Medical Subject Headings (MeSH) | Public RDF datasets of the Medical Subject Headings (MeSH) controlled vocabulary | 27,883 descriptors in 2016 MeSH; 87,000 entry terms; 232,000 Supplementary Concept Records (SCRs) | Approximately 2% per year |
| LinkedSpending | LinkedSpending contains government spending from all over the world as Linked Data. LinkedSpending uses the information collected by the OpenSpending project and makes it available as data cubes | 2 million financial transactions | 7% per year |
| DBpedia | DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows answering complex questions using the W3C standard SPARQL. | 3 billion facts, 125 languages, 38.3 million entities | 10-20% per year |
| CER Smart Metering Project | The Smart Metering Electricity Customer Behaviour Trials (CBTs) took place during 2009 and 2010 with over 5,000 Irish homes and businesses participating. | 5,375 homes, 780 businesses | Static |
| Next Bike | Live information on the GPS positions of around 20,000 bicycles in about 70 cities (http://www.nextbike.net/) | Live stream of 3,000 bike positions, 70 cities | Unclear |

Table 4: Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

| Dataset | Description | Size (approximation) | Growth (expected) |
|---|---|---|---|
| BioASQ | Dataset underlying the question answering challenge of the same name. The challenge focuses on large-scale biomedical semantic indexing and question answering | 800 questions | 20% |
| Energy Map Germany | CSV data on the development of solar energy within Germany with installation date, location, nominal capacity, GPS information | 1.5 million entries | 1-2% |
| LDBC | The LDBC-SNB Data Generator (DATAGEN) is responsible for providing the data sets used by all the LDBC benchmarks. | Generator | Generator |
| LinkedGeoData | LinkedGeoData is an effort to add a spatial dimension to the Web of Data / Semantic Web. | 30 billion facts | 5-10% per year |
| TLC Trip Record Data | This dataset includes trip records from all trips completed in yellow and green taxis in NYC in 2014 and selected months of 2015. | 1.1 billion taxi trips | 10-20% per year |
| GitHub Data | GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects. | 31 million projects, 12 million users | 5-10% per year |
| TWIG Ontology | The ontology for the synthetic version of Twitter based on the Twitter7 dataset. | Generator | Generator |
| QALD6 | Question Answering on Linked Data version 6. The dataset contains approximately 500 questions in natural language as well as the corresponding SPARQL queries and keyword queries to gather information from DBpedia, DBpedia abstracts and related datasets. | 500 questions | 10% |

Table 5: Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

| Dataset | Description | Size (approximation) | Growth (expected) |
|---|---|---|---|
| BENGAL | This family of datasets for named entity recognition, entity disambiguation and relation extraction is generated automatically out of RDF data using natural language generation. | Generator | Generator |
| LIVED | The "Long Device Level Energy Data" (LIVED) dataset contains measurements collected from smart plug multi-sensors. | 2.5 billion measurements | Static |
| Linked Connections | Linked Connections is a method for publishing transit data using a low-cost API. It does this by exposing data in JSON(-LD). | Generator | Generator |
| Weidmüller Energy and Injection Molding Data Set | This data set consists of simulated data based on real measurements. The sensor measurements in the data set are taken from manufacturing machines. It contains readings from energy meters as well as sensors that monitor the production process of injection molding machines. | Generator | Generator |
| Linked Software Dependencies | Performs queries over software modules. Experiments covered access to 475,000+ npm JavaScript libraries as 150,000,000+ RDF triples using TPF, HDT or Turtle | Generator | Generator |
| VIAF | The VIAF (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web. | Generator | Generator |

Table 6: Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

| Dataset | Description | Size (approximation) | Growth (expected) |
|---|---|---|---|
| GeoNames | GeoNames contains placenames and covers all countries. | GeoNames contains over 8 million placenames | Static |
| LOV | LOV stands for Linked Open Vocabularies. LOV provides a choice of several hundreds of such vocabularies, based on quality requirements including URI stability and availability on the Web, use of standard formats and publication best practices, quality metadata and documentation, identifiable and trustable publication body, proper versioning policy. | Generator | Generator |
| RISIS | Research Infrastructure for research and innovation policy studies. All datasets can be accessed via the visit request option. | Unclear | Unclear |
| Synthetic Trace Generator | Generates car trips as sequences of GPS points. It takes into account typical origins and destinations for some area, as well as speeds for every road segment. | Generator | Generator |

Table 7: Excerpt of the datasets available to the Hobbit project. A complete list can be found at http://hobbit.ilabt.imec.be. Generators can create datasets of any size. Hence size and expected growth cannot be stated.

4 Use Cases

The use cases of interest to the Hobbit community and contacts vary significantly and are collected continuously. So far, we were able to gather descriptions from

1. dissemination events,

2. interviews,

3. collaborations with other projects and

4. deliverables of other projects.

The use cases gathered through this process hint at applications in the following domains (note that the names and contacts from which the information was gathered are partly omitted on purpose for the sake of privacy):

• Industry 4.0: The use of semantics in Industry 4.0 is of central importance for the creation of machines that can justify their behavior and interact with their users. Amongst other activities, we gathered information from the experts in the SAKE² and STEP³ projects, who expressed interest in benchmarking link discovery, storage, machine learning and visualisation. Datasets such as the CER Smart Metering, LIVED and Weidmüller datasets are of interest.

• Geospatial data analysis: Geospatial datasets belong to the largest and most used datasets on the planet. Contacts with experts from related projects (GeoKnow,⁴ GEISER,⁵ SmartRegio,⁶ STEP, SLIPO, SAGE) revealed that these experts are interested in Hobbit datasets related to geospatial entities and points of interest (LinkedGeoData, Energy Map Germany, Linked Connections, TLC Trip Record Data). The benchmarks of interest here are related to knowledge extraction from structured and unstructured data, storage, versioning, machine learning and visualisation.

• Smart Energy: Devising machinery that can use energy data to provide customers with intelligent energy services, ranging from the automatic selection of energy providers to the detection of unwanted states (machinery on during the weekend, open fridge doors, etc.), is regarded as an innovative goal worthy of pursuit. Benchmarking how well such systems perform demands benchmarks in data acquisition, storage and versioning. Relevant datasets include the LIVED, Weidmüller and CER Smart Metering datasets.

• Weather Data Analysis: The increasing amount of streaming data from weather sensors demands novel techniques for the semantic analysis of streaming data. The area of continuous queries was regarded as one of the key areas for which benchmarking methodologies and unified semantics still need to be addressed. Here, smart metering data (LIVED, Weidmüller, CER) are regarded as being of significance, while storage and acquisition benchmarks are key.

• Human Resource Management: A rather surprising use case is the use of the HOBBIT datasets, generators and benchmarks for finding good candidates for job offers. Novel applications for this purpose demand efficient entity recognition, entity linking and relation extraction, which are the areas targeted by the knowledge extraction benchmark of HOBBIT. Relevant datasets here include the TWIG and the BENGAL datasets.

• Enterprise Search: Searching through streams of ever-changing data is of central importance for data-driven companies. The use cases here range from federated search across several datasets (see projects DIESEL⁷ and WDAqua⁸) to search on mobile devices (e.g., project QAMEL⁹). The QALD 6, DBpedia, BioASQ, MeSH and BENGAL datasets are the most closely related here, while the knowledge acquisition benchmarks are the most important.

• European societal challenges: Through our collaboration with BigDataEurope, we were able to gather use cases for HOBBIT for seven of the societal challenges formulated by the European Union (i.e., health, food and agriculture, energy, transport, climate, social sciences and security). Given the diversity of the challenges, virtually all datasets and benchmarks provided by HOBBIT are relevant for at least one of the challenges or for the technical solutions underlying these challenges. For example, the CER Smart Metering data and the data storage and knowledge benchmarks are of central importance for the energy domain, while Linked Connections and all other transport datasets are relevant for the transport societal challenge.

Minor use cases include work on lingua francas for storage, morphology analysis as well as indexing for storage and question answering.

² http://sake-projekt.de

³ https://www.projekt-step.de/

⁴ http://geoknow.eu/

⁵ http://www.projekt-geiser.de/

⁶ http://www.smartregio.org/

⁷ https://diesel-project.eu/

⁸ http://wdaqua.eu/

⁹ https://qamel.eu/
