"stories" in data and the roles of crowdsourcing – views of a web miner

"Stories" in data and the roles of crowdsourcing – views of a Web miner. Bettina Berendt, Dept. of Computer Science, KU Leuven, Belgium, http://people.cs.kuleuven.be/~bettina.berendt/ Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Drăgan

Post on 30-Dec-2015

TRANSCRIPT

Page 1: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

"Stories" in data and the roles of crowdsourcing – views of a Web miner

Bettina Berendt

Dept. of Computer Science
KU Leuven, Belgium
http://people.cs.kuleuven.be/~bettina.berendt/

Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Drăgan

Page 2: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner
Page 3: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

A story

Page 4: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Story structure

Page 5: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

One case of provenance

Page 6: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Another case of provenance

Page 7: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Formalizing provenance: a high-level view

Page 8: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 1: Many voices

Page 9: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 2

Page 10: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 3: Subjectivity

Page 11: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The STORIES Tool

Page 12: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Uncover (1)

Page 13: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Uncover (2)

Page 14: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Scan (over time)

Page 15: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Uncover

Page 16: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Zoom

Page 17: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Search: formulating ad-hoc concepts

Page 18: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Track (2)

Page 19: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Textual summarization

Page 20: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 4

Page 21: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Crowd-sourcing the truth? Wikipedia (here: the Gaza Flotilla Raid)

Page 22: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 5

Page 23: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Challenge 4: More specifically

Challenge 5: Vagueness – reprise

Page 24: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The “live crowdsourcing activity”

• Goal: crowdsource data citation metadata

• Motivation 1 / possible extension

• Motivation 2 / case study

Page 25: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner
Page 26: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

http://prov.usewod.org

Page 27: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The data

Datasets

Publications

[People]

Page 28: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The datasets

Preloaded:

– USEWOD datasets
– DBpedia
– SWDF
– Bio2RDF
– LinkedGeoData
– BioPortal
– OpenBioMed

Page 29: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The datasets

Preloaded:

– Generic (!)
– Versions/releases
– References

Page 30: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The datasets

Add new:

– Name*
– Version
– Release date
– URL

Page 31: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The publications

Preloaded:

– USEWOD workshop papers

Page 32: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The publications

Add new:

– Title*
– Authors
– Year
– URL
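The two "Add new" forms can be read as simple records with one required field each (the one marked with *). The following is a minimal sketch under that assumption; the class names `DatasetRecord` and `PublicationRecord`, the field types, and the example values are illustrative and not taken from the tool itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record mirroring the 'add new dataset' form."""
    name: str                           # required (marked * on the slide)
    version: Optional[str] = None
    release_date: Optional[str] = None  # kept as free text in this sketch
    url: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("a dataset needs a name")


@dataclass
class PublicationRecord:
    """Hypothetical record mirroring the 'add new publication' form."""
    title: str                          # required (marked * on the slide)
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    url: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.title.strip():
            raise ValueError("a publication needs a title")


# Illustrative values only.
dbpedia = DatasetRecord(name="DBpedia", version="3.9")
paper = PublicationRecord(title="An example workshop paper", year=2014)
```

Keeping every field except the name or title optional matches the "minimise data input" idea: a participant can create an entry with a single value and fill in the rest later.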

Page 33: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The data

Page 34: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The task

Capture

which dataset is used in which publication

and

how

Page 35: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Data representation

Datasets

Publications

Connections between them

schema.org

prov:Entity

?

Page 36: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Data representation

Datasets

Publications

Connections between them

schema.org

prov:Entity

prov:Derivation
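As a rough illustration of this representation, the sketch below types a dataset and a publication with schema.org classes and as `prov:Entity`, and links them with `prov:wasDerivedFrom`, the PROV-O property whose qualified form is `prov:Derivation`. The `ex:` namespace, the identifiers, the choice of `schema:ScholarlyArticle`, and the direction of the link (publication derived from dataset) are assumptions made for the example, not details confirmed by the slides.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
    "ex": "http://example.org/",  # hypothetical namespace for the example
}


def describe_use(publication: str, dataset: str) -> List[Triple]:
    """Return triples stating that a publication is derived from a dataset."""
    return [
        (dataset, "rdf:type", "schema:Dataset"),
        (dataset, "rdf:type", "prov:Entity"),
        (publication, "rdf:type", "schema:ScholarlyArticle"),
        (publication, "rdf:type", "prov:Entity"),
        (publication, "prov:wasDerivedFrom", dataset),
    ]


def to_turtle(triples: List[Triple]) -> str:
    """Serialise the triples as simple Turtle, one statement per line."""
    header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


print(to_turtle(describe_use("ex:paper1", "ex:dbpedia")))
```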

Page 37: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The task

Capture

which dataset is used in which publication

and

how

Page 38: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Connections

Publication – Publication

Publication – Dataset

Dataset – Publication

Dataset – Dataset

Page 39: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Connections

Publication – Publication

citation

Page 40: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Connections

Publication – Dataset

Dataset – Publication

mentions

describes

evaluates

analyses

compares

Page 41: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Connections

Dataset – Dataset

extends

includes

overlaps

transformation of

generalisation of

Page 42: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Data representation

Subclasses of prov:Derivation

(inverse of Publication-DS)
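The connection types listed on the preceding slides can be summarised as a small lookup from endpoint kinds to permitted labels. This is a sketch only: the names `Kind`, `ALLOWED` and `is_valid` are invented for the example, and it assumes that the five publication/dataset labels apply in both directions, which the slides suggest but do not state.

```python
from enum import Enum


class Kind(Enum):
    PUBLICATION = "publication"
    DATASET = "dataset"


# Labels taken from the "Connections" slides.
PUB_DATASET_LABELS = {"mentions", "describes", "evaluates", "analyses", "compares"}

ALLOWED = {
    (Kind.PUBLICATION, Kind.PUBLICATION): {"citation"},
    (Kind.PUBLICATION, Kind.DATASET): PUB_DATASET_LABELS,
    (Kind.DATASET, Kind.PUBLICATION): PUB_DATASET_LABELS,  # assumed symmetric
    (Kind.DATASET, Kind.DATASET): {
        "extends",
        "includes",
        "overlaps",
        "transformation of",
        "generalisation of",
    },
}


def is_valid(source: Kind, target: Kind, label: str) -> bool:
    """Check whether a label is permitted between the two endpoint kinds."""
    return label in ALLOWED.get((source, target), set())


print(is_valid(Kind.PUBLICATION, Kind.DATASET, "evaluates"))
print(is_valid(Kind.DATASET, Kind.DATASET, "citation"))
```

A check like this could run when a participant submits a connection, so that, for example, a "citation" between two datasets is rejected before it reaches the graph.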

Page 43: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

The task

Capture

which dataset is used in which publication

and

how

Page 44: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Data representation

Page 45: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Data representation

Page 46: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Bundles

Page 47: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Live crowdsourcing activity 2014: outcomes

Participants: 6

Bundles: 81 (avg: 13.5, min: 2, max: 27)

Publications: 19

Datasets: 2 (3)

Connections: 95 (Inclusion: 62, Analysis: 21, Mention: 6)
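A quick arithmetic check of these figures, assuming the average is bundles per participant (81 / 6 does give 13.5). The three connection types shown add up to 89, so 6 of the 95 connections are presumably of types not listed on the slide. Variable names are illustrative.

```python
participants = 6
bundles = 81
connections_total = 95
connections_by_type = {"Inclusion": 62, "Analysis": 21, "Mention": 6}

# Average number of bundles per participant.
avg_bundles = bundles / participants

# Connections not covered by the three types shown on the slide.
listed = sum(connections_by_type.values())
unlisted = connections_total - listed

# Share of each listed type among all connections, in percent.
shares = {
    name: round(100 * count / connections_total, 1)
    for name, count in connections_by_type.items()
}

print(avg_bundles, listed, unlisted, shares)
```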

Page 48: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Lessons learned

Data is dirty

– even coming from experts

Focus on the task

– make everything else simpler
– minimise data input

Page 49: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Questionnaire results

Inconclusive results on the suitability of the vocabulary, but interesting answers to the question “What questions would this information answer for you?”:

● “What are popular datasets?”
● “Which datasets are facilitators for research on X?”
● “What publications are related through a dataset (but don't mention each other)?”
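Two of these questions map directly onto queries over a dataset/publication graph. The sketch below uses a toy in-memory edge list; all paper names and links are invented for illustration, and the function names are hypothetical. The question about "research on X" would additionally need topic information for each publication, which is not modelled here.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy data: (publication, dataset) "uses" edges and publication citations.
uses = [
    ("paperA", "DBpedia"),
    ("paperB", "DBpedia"),
    ("paperD", "DBpedia"),
    ("paperB", "LinkedGeoData"),
    ("paperC", "LinkedGeoData"),
]
cites = {("paperA", "paperB")}


def popular_datasets(uses):
    """Rank datasets by the number of publications that use them."""
    return Counter(dataset for _, dataset in uses).most_common()


def related_without_citation(uses, cites):
    """Find publication pairs that share a dataset but do not cite each other."""
    by_dataset = defaultdict(set)
    for publication, dataset in uses:
        by_dataset[dataset].add(publication)
    pairs = set()
    for dataset, publications in by_dataset.items():
        for a, b in combinations(sorted(publications), 2):
            if (a, b) not in cites and (b, a) not in cites:
                pairs.add((a, b, dataset))
    return sorted(pairs)


print(popular_datasets(uses))
print(related_without_citation(uses, cites))
```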

Page 50: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Outlook (1): Dimensions of crowdsourcing

• What is outsourced
• Who is the crowd
• How is the task designed
• How are the results validated
• How can the process be optimised

[Quinn & Bederson, 2012]

Page 51: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

Enlarging the scope: “How come ...?” Storytelling

Dimensions / Specific questions

• [Who] Which crowd(s)? Experts & non-experts
• [What] Enhanced by IE?
• [Design/validation] How to combine these sources of metadata?
• [Optimisation] Incentives?
▫ “Student science”?
▫ Citizen science?
▫ “Learner science”?

Page 52: "Stories" in data  and  the roles of crowdsourcing  –  views of a Web miner

THANK YOU!

Some references:

• Subašić, I. & Berendt, B. (2009). Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowledge and Information Systems. http://people.cs.kuleuven.be/~bettina.berendt/Papers/subasic_berendt_2009.pdf

• Berendt, B., Last, M., Subašić, I. & Verbeke, M. (2013). New formats and interfaces for multi-document news summarization and its evaluation. In: Fiori, A. (ed.), Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding. IGI Global. https://lirias.kuleuven.be/bitstream/123456789/423917/1/berendt_last_subasic_verbeke_2013_withbib.pdf

• Drăgan, L., Luczak-Rösch, M., Simperl, E., Berendt, B. & Moreau, L. (2014). Crowdsourcing data citation graphs using provenance. In: Provenance Analytics (ProvAnalytics2014), Cologne, DE, 09 Jun 2014. 4 pp. http://eprints.soton.ac.uk/365374/

• ~ Presentation at LCPD 2014: Second Workshop on Interlinking and Contextualizing Publications and Datasets, to appear in D-Lib Magazine