a whole new zooniverse: guidelines and tools for crowdsourced science
A WHOLE NEW ZOONIVERSE: GUIDELINES AND TOOLS FOR CROWDSOURCED SCIENCE
Elena [email protected] @esimperl
November 16th, 2016
OVERVIEW
• Citizen science is a fascinating subject for Web science research
• Our work helps system designers with
  • Frameworks of motivations and incentives engineering
  • Design guidelines and recommendations
  • Methods to make crowdsourced tasks more effective
  • Methods to study engagement and community health
TUTORIAL@ISWC2013
CITIZEN SCIENCE@WAIS
WHAT IS CITIZEN SCIENCE
CITIZEN SCIENCE PROJECTS
CITIZEN SCIENCE PLATFORMS
STUDYING CITIZEN SCIENCE: HUMAN COMPUTATION & CROWDSOURCING
• Task design
• Task assignment
• Answer validation and aggregation
• Contributors’ performance
• Motivation and incentives
STUDYING CITIZEN SCIENCE: ONLINE COMMUNITY
• Roles and activities
• Patterns of participation
• Community health
• Motivation and incentives
STUDYING CITIZEN SCIENCE: OPEN SCIENCE
• Scientific workflows
• Scientific practice
• Publishing, citation, and peer-review models
STUDYING CITIZEN SCIENCE: EDUCATION AND SCICOMM
• Teaching methods and assessment
• Tutorial design
• Learning analytics
• Engagement strategy
WHAT MAKES CITIZEN SCIENCE SUCCESSFUL
tasks people time
quality science community
learning social media …
WHAT MAKES CITIZEN SCIENCE SUCCESSFUL (2)
[Cox et al., 2015]
LEVELS OF ENGAGEMENT
[Figure: active users in % (y-axis) by month since registration (x-axis)]
~1% of participants contribute 72% of Talk & 29% of Task
[Luczak-Roesch et al., 2014]
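A concentration figure like the ~1% above can be computed from a plain contribution log. The following is a minimal sketch, not the method used in the cited study: it assumes the log is a list of user ids with one entry per contribution, and the function name `top_share` is hypothetical.

```python
from collections import Counter


def top_share(contributions, fraction=0.01):
    """Share of all contributions made by the most active `fraction` of users.

    `contributions` is an iterable of user ids, one entry per contribution
    (an assumed log format, for illustration only).
    """
    # Contributions per user, most active users first.
    counts = sorted(Counter(contributions).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one user in the "top" group.
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Toy log: one very active user and 99 users with a single contribution each.
log = ["u0"] * 50 + [f"u{i}" for i in range(1, 100)]
print(f"Top 1% share: {top_share(log):.1%}")
```

Running the same function separately on a Talk log and a Task log would give the two shares side by side.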
DATA QUALITY
• Existing quality inference algorithms are limited to binary classification
• Conceptualise the problem for realistic workflows
• Develop efficient implementations of algorithms
• Compare them
DATA QUALITY (2)
• Majority Voting
  • Find the annotation with the top vote as the true label for each object
• Message Passing
  • Use object-specific worker messages to represent how reliable a worker is in labelling each specific object
• Expectation Maximization
  • Infer the true label for each object, using annotations from all users, accounting for the error rates of each user
  • Estimate the error rates of each user by comparing their annotations with inferred true labels
• Results measured in terms of different accuracy metrics and time
• Experiments still ongoing
• Preview
  • Majority voting performs exceptionally well for large numbers of annotations
  • If less data is available, one could explore message passing (possibly in combination with majority voting)
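Two of the aggregation approaches named above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation evaluated in the talk: annotations are assumed to be `(worker, object, label)` triples, and the EM variant is a simplified "one-coin" model in which each worker has a single accuracy and errors are spread evenly over the other labels, rather than a full per-label error matrix. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Return {object: label with the most votes}; ties broken by label order."""
    votes = defaultdict(Counter)
    for _worker, obj, label in annotations:
        votes[obj][label] += 1
    return {obj: min(c, key=lambda l: (-c[l], l)) for obj, c in votes.items()}


def one_coin_em(annotations, labels, n_iter=20):
    """Simplified EM: estimate true labels and one accuracy value per worker.

    Assumes at least two possible labels and a uniform prior over them.
    """
    annotations = list(annotations)
    k = len(labels)
    by_obj = defaultdict(list)
    for worker, obj, label in annotations:
        by_obj[obj].append((worker, label))
    # Start every worker at the same better-than-chance accuracy.
    acc = {worker: 0.7 for worker, _, _ in annotations}
    post = {}
    for _ in range(n_iter):
        # E-step: posterior over the true label of each object.
        for obj, obs in by_obj.items():
            scores = {}
            for truth in labels:
                p = 1.0
                for worker, label in obs:
                    p *= acc[worker] if label == truth else (1 - acc[worker]) / (k - 1)
                scores[truth] = p
            z = sum(scores.values())
            post[obj] = {t: s / z for t, s in scores.items()}
        # M-step: accuracy = expected fraction of a worker's labels that are correct.
        correct, total = defaultdict(float), defaultdict(int)
        for worker, obj, label in annotations:
            correct[worker] += post[obj][label]
            total[worker] += 1
        # Clamp away from 0 and 1 so the E-step never divides by zero.
        acc = {w: min(max(correct[w] / total[w], 0.01), 0.99) for w in total}
    truth = {obj: max(p, key=p.get) for obj, p in post.items()}
    return truth, acc


# Toy data: workers "a" and "b" agree, worker "c" always disagrees.
data = [(w, f"o{i}", lab)
        for i, good in enumerate(["cat", "dog", "cat", "dog"], start=1)
        for w, lab in [("a", good), ("b", good),
                       ("c", "dog" if good == "cat" else "cat")]]
print(majority_vote(data))
print(one_coin_em(data, labels=["cat", "dog"]))
```

The product of probabilities in the E-step can underflow when an object has very many annotations; summing log-probabilities would be the usual remedy.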
GAMIFICATION
Survey of 27 papers & 31 VCS projects
SOCIALITY
• Discussions and engagement with volunteers are an integral part of the experience
• Leads to serendipitous scientific discoveries
• Encourages autonomy and helps with community building
WORK VS TALK
[Figure: Talk contributions plotted against Classifications; label: 40.5%]
[Luczak-Roesch et al., 2014; Tinati et al., 2015]
CHAT AND INSTANT MESSAGING
[Figure: Microposts per project (PH, SG, SW, NN, GZ, CC, PF, SF, AP, WS); label: 91%]
[Luczak-Roesch et al., 2014; Tinati et al., 2015, WebSci]
DISCUSSION PROFILES
• Profile 9 (0.1%): Deeply engaged volunteers, few threads but multiple posts within them
• Profile 7 (0.1%): Content producers, posting across many boards and threads
• Profile 8 (0.4%): Thread followers and PM (one-to-one) talkers
• Profile 4 (1%): First to respond and question answerers
• Profile 1 (2.8%): Highly active thread starters and answerers across a wide range of topics
• Profile 5 (5.5%): Infrequent volunteers, single thread posts, no personal messages
• Profile 3 (6.5%): Watcher and starter of many threads, but not first to reply
• Profile 2 (14.6%): Highly active thread starters and first to reply back
• Profile 6 (69.0%): Long active volunteers (the core group), posting sporadically
[Tinati et al., 2015, WebSci]
FROM CROWD TO COMMUNITY
• Survey of 48 projects and 150 publications
• Identifying affordances from online community themes within literature
  • Task visibility
  • Goals
  • Feedback
  • Rewards
• Community features found to have greater role than previously considered
  • Encourage task completion, discussions etc.
• Themes align to key success factors of volunteer engagement, task completion and submission accuracy
[Reeves et al., 2017]
FROM PROJECTS TO ECOSYSTEMS
Project A
Project B
Project C
Participant X
Part. Y
[Luczak-Roesch et al., 2014]
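The ecosystem view above, with participants active in several projects, can be explored by counting how many participants each pair of projects shares. This is a minimal sketch, assuming activity is available as `(participant, project)` pairs; the function name and data layout are hypothetical and not taken from the cited work.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_participants(events):
    """Return {(project_a, project_b): number of participants active in both}."""
    projects_of = defaultdict(set)
    for participant, project in events:
        projects_of[participant].add(project)
    overlap = Counter()
    for projects in projects_of.values():
        # Sorting gives each project pair a single canonical key.
        for pair in combinations(sorted(projects), 2):
            overlap[pair] += 1
    return dict(overlap)


# Toy data mirroring the slide: X is active in A, B and C; Y in B and C.
events = [("X", "A"), ("X", "B"), ("X", "C"), ("Y", "B"), ("Y", "C")]
print(shared_participants(events))
```

The resulting counts can serve as edge weights in a project-to-project graph.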
DESIGNING PLATFORMS
Task specificity
Community development
Task design
PR and engagement
Bootstrapping the community
Serendipitous scientific discovery
Engaging with people, supporting profession team
Supporting individuals, finding new scientific discoveries
Obtaining new citizen scientists
Retaining people
Supporting people, improving task completion
Obtaining new citizen scientists
Reinvigorating old users
[Tinati et al., 2015, CHI]
WHAT’S NEXT?
• Human computation & crowdsourcing
  • Task assignment: what tasks are interesting/relevant for whom?
  • Data quality: scalable and in real-time
  • Peer review, collaborative approaches
  • The role of gamification: is science a game?
• Online community
  • Making discussions more effective
• Science
  • Citizen science platforms that everyone can use
  • New forms of publishing, citation, reproducibility, and replication
[email protected] @ESIMPERL
WWW.SOCIAM.ORG
WWW.STARS4ALL.EU
All publications available at http://dblp.uni-trier.de/pers/hd/s/Simperl:Elena_Paslaru_Bontas