"stories" in data and the roles of crowdsourcing – views of a web miner
Bettina Berendt
Dept. of Computer Science
KU Leuven, Belgium
http://people.cs.kuleuven.be/~bettina.berendt/
Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Drăgan
A story
Story structure
One case of provenance
Another case of provenance
Formalizing provenance: a high-level view
Challenge 1: Many voices
Challenge 2
Challenge 3: subjectivity
The STORIES Tool
Uncover (1)
Uncover (2)
Scan (over time)
Uncover
Zoom
Search: formulating ad-hoc concepts
Track (2)
Textual summarization
Challenge 4
Crowd-sourcing the truth? Wikipedia (here: the Gaza Flotilla Raid)
Challenge 5
Challenge 4: More specifically
Challenge 5: vagueness - reprise
The “live crowdsourcing activity”
• Goal: crowdsource data citation metadata
• Motivation 1 / possible extension
• Motivation 2 / case study
http://prov.usewod.org
The data
Datasets
Publications
[People]
The datasets
Preloaded:
– USEWOD datasets
– DBpedia
– SWDF
– Bio2RDF
– LinkedGeoData
– BioPortal
– OpenBioMed
The datasets
Preloaded:
– Generic (!)
– Versions/releases
– References
The datasets
Add new:
– Name*
– Version
– Release date
– URL
The publications
Preloaded:
– USEWOD workshop papers
The publications
Add new:
– Title*
– Authors
– Year
– URL
The data
The task
Capture
which dataset is used in which publication
and
how
Data representation
Datasets
Publications
Connections between them
schema.org
prov:Entity
?
Data representation
Datasets
Publications
Connections between them
schema.org
prov:Entity
prov:Derivation
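One reading of the slide above is that datasets and publications are both typed as prov:Entity, described with schema.org terms, and linked through a prov:Derivation node. The following is a minimal sketch under that reading, not the activity's actual data model. The `http://example.org/` namespace, the identifiers, the choice of `schema:Dataset` and `schema:ScholarlyArticle`, and the direction of the derivation (publication derived from dataset) are all assumptions made for illustration.

```python
PROV = "http://www.w3.org/ns/prov#"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not the activity's URIs


def entity(local_id, schema_class, name):
    """Return a URI and the triples typing it as prov:Entity plus a schema.org class."""
    uri = EX + local_id
    return uri, [
        (uri, RDF_TYPE, PROV + "Entity"),
        (uri, RDF_TYPE, SCHEMA + schema_class),
        (uri, SCHEMA + "name", name),
    ]


def derivation(link_id, derived_uri, source_uri):
    """Link two entities through a qualified prov:Derivation node."""
    link = EX + link_id
    return [
        (derived_uri, PROV + "wasDerivedFrom", source_uri),
        (derived_uri, PROV + "qualifiedDerivation", link),
        (link, RDF_TYPE, PROV + "Derivation"),
        (link, PROV + "entity", source_uri),
    ]


# Hypothetical example: one preloaded dataset and one made-up paper.
dataset_uri, dataset_triples = entity("dataset/dbpedia", "Dataset", "DBpedia")
paper_uri, paper_triples = entity("paper/1", "ScholarlyArticle", "An example paper")
triples = dataset_triples + paper_triples + derivation("link/1", paper_uri, dataset_uri)

for s, p, o in triples:
    print(s, p, o)
```

Keeping the link as its own node means a more specific connection type can later be attached to it without changing the entities.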
Connections
Publication – Publication
Publication – Dataset
Dataset – Publication
Dataset – Dataset
Connections
Publication – Publication
citation
Connections
Publication – Dataset
Dataset – Publication
mentions
describes
evaluates
analyses
compares
Connections
Dataset – Dataset
extends
includes
overlaps
transformation of
generalisation of
Data representation
Subclasses of prov:Derivation
(inverse of Publication-DS)
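The connection types listed on the preceding slides can be read as a small controlled vocabulary keyed by the kinds of the two endpoints. The sketch below encodes that vocabulary and checks whether a proposed connection fits it. The function name `is_valid` and the kind labels are hypothetical, and treating Dataset – Publication links as the inverse of Publication – Dataset links with the same type names is an assumption based on the "(inverse of Publication-DS)" note.

```python
# Connection types per endpoint pair, as listed on the slides.
ALLOWED = {
    ("publication", "publication"): {"citation"},
    ("publication", "dataset"): {
        "mentions", "describes", "evaluates", "analyses", "compares",
    },
    ("dataset", "dataset"): {
        "extends", "includes", "overlaps",
        "transformation of", "generalisation of",
    },
}


def is_valid(source_kind, relation, target_kind):
    """Check a proposed connection against the vocabulary.

    Assumption: a dataset -> publication link is the inverse of a
    publication -> dataset link, so it is checked against the same set.
    """
    if (source_kind, target_kind) == ("dataset", "publication"):
        source_kind, target_kind = target_kind, source_kind
    return relation in ALLOWED.get((source_kind, target_kind), set())


print(is_valid("publication", "analyses", "dataset"))
print(is_valid("publication", "extends", "dataset"))
```

In a PROV-based representation, each of these type names would correspond to one subclass of prov:Derivation.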
Data representation
Bundles
Live crowdsourcing activity 2014: outcomes
Participants: 6
Bundles: 81 (avg: 13.5, min: 2, max: 27)
Publications: 19
Datasets: 2 (3)
Connections: 95 (Inclusion: 62, Analysis: 21, Mention: 6)
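A quick arithmetic check of the outcome figures, using only the numbers on the slide. The slide does not state what the average refers to. Dividing the bundle count by the number of participants reproduces 13.5, which suggests a per-participant average, but that reading is an assumption.

```python
participants = 6
bundles = 81
connections_total = 95
connections_by_type = {"Inclusion": 62, "Analysis": 21, "Mention": 6}

# Assumed interpretation: average number of bundles per participant.
avg_bundles = bundles / participants

# The three listed connection types do not add up to the total.
listed = sum(connections_by_type.values())
unlisted = connections_total - listed

print(avg_bundles)  # 13.5
print(listed)       # 89
print(unlisted)     # 6
```

The remaining 6 connections presumably belong to types not broken out on the slide.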
Lessons learned
Data is dirty
– even coming from experts
Focus on the task
– make everything else simpler
– minimise data input
Questionnaire results
Inconclusive results on the suitability of the vocabulary, but interesting answers to “What questions would this information answer for you?”:
● “What are popular datasets?”
● “Which datasets are facilitators for research on X?”
● “What publications are related through a dataset (but don't mention each other)?”
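Two of these questions can be answered directly from the collected connections. The sketch below does so over a tiny made-up graph. The paper identifiers, the `uses` and `cites` structures, and the function names are hypothetical and are not the data gathered in the activity. The second question (research "on X") would additionally need topic information and is not covered.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical connections: (publication, connection type, dataset).
uses = [
    ("paper_a", "analyses", "DBpedia"),
    ("paper_b", "mentions", "DBpedia"),
    ("paper_c", "evaluates", "SWDF"),
    ("paper_a", "compares", "SWDF"),
]
# Hypothetical publication-to-publication citations.
cites = {("paper_a", "paper_c")}


def papers_per_dataset(uses):
    """Group the publications connected to each dataset."""
    grouped = defaultdict(set)
    for paper, _relation, dataset in uses:
        grouped[dataset].add(paper)
    return grouped


def popular_datasets(uses):
    """'What are popular datasets?': count distinct publications per dataset."""
    return {d: len(p) for d, p in papers_per_dataset(uses).items()}


def related_without_citation(uses, cites):
    """Publication pairs sharing a dataset where neither cites the other."""
    result = set()
    for dataset, papers in papers_per_dataset(uses).items():
        for a, b in combinations(sorted(papers), 2):
            if (a, b) not in cites and (b, a) not in cites:
                result.add((a, b, dataset))
    return sorted(result)


print(popular_datasets(uses))
print(related_without_citation(uses, cites))
```

Here `paper_a` and `paper_c` share SWDF but are excluded because one cites the other, while `paper_a` and `paper_b` are reported as related only through DBpedia.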
Outlook (1): Dimensions of crowdsourcing
Dimensions [Quinn & Bederson, 2012]
• What is outsourced
• Who is the crowd
• How is the task designed
• How are the results validated
• How can the process be optimised
Specific questions
• [Who] Which crowd(s)? Experts & non-experts
• [What] Enhanced by IE?
• [design/validation] How to combine these sources of metadata?
• [Optimisation] Incentives?
▫ “Student science”?
▫ Citizen science?
▫ “Learner science”?
Enlarging the scope: “How come ...?” Storytelling
THANK YOU!
Some references:
• Subašić, I. & Berendt, B. (2009). Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowledge and Information Systems. http://people.cs.kuleuven.be/~bettina.berendt/Papers/subasic_berendt_2009.pdf
• Berendt, Bettina; Last, Mark; Subasic, Ilija; Verbeke, Mathias (2013). New formats and interfaces for multi-document news summarization and its evaluation, In: Fiori, Alessandro (ed.), Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding. IGI Global. https://lirias.kuleuven.be/bitstream/123456789/423917/1/berendt_last_subasic_verbeke_2013_withbib.pdf
• Dragan, Laura, Luczak-Rösch, Markus, Simperl, Elena, Berendt, Bettina and Moreau, Luc (2014) Crowdsourcing data citation graphs using provenance. In, Provenance Analytics (ProvAnalytics2014), Cologne, DE, 09 Jun 2014. 4pp. http://eprints.soton.ac.uk/365374/
• Presentation at LCPD 2014: Second Workshop on Interlinking and Contextualizing Publications and Datasets, to appear in D-Lib Magazine