a whole new zooniverse: guidelines and tools for crowdsourced science

A WHOLE NEW ZOONIVERSE: GUIDELINES AND TOOLS FOR CROWDSOURCED SCIENCE

Elena Simperl
[email protected]
@esimperl
November 16th, 2016

OVERVIEW

• Citizen science is a fascinating subject for Web science research
• Our work helps system designers with
  • Frameworks of motivations and incentives engineering
  • Design guidelines and recommendations
  • Methods to make crowdsourced tasks more effective
  • Methods to study engagement and community health

TUTORIAL@ISWC2013

CITIZEN SCIENCE@WAIS

WHAT IS CITIZEN SCIENCE

CITIZEN SCIENCE PROJECTS

CITIZEN SCIENCE PLATFORMS

STUDYING CITIZEN SCIENCE: HUMAN COMPUTATION & CROWDSOURCING

• Task design
• Task assignment
• Answer validation and aggregation
• Contributors’ performance
• Motivation and incentives

STUDYING CITIZEN SCIENCE: ONLINE COMMUNITY

• Roles and activities
• Patterns of participation
• Community health
• Motivation and incentives

STUDYING CITIZEN SCIENCE: OPEN SCIENCE

• Scientific workflows
• Scientific practice
• Publishing, citation, and peer-review models

STUDYING CITIZEN SCIENCE: EDUCATION AND SCICOMM

• Teaching methods and assessment
• Tutorial design
• Learning analytics
• Engagement strategy

WHAT MAKES CITIZEN SCIENCE SUCCESSFUL

• tasks
• people
• time
• quality
• science
• community
• learning
• social media
• …

WHAT MAKES CITIZEN SCIENCE SUCCESSFUL (2)

[Cox et al., 2015]

LEVELS OF ENGAGEMENT

[Figure: active users in % (y-axis, 0 to 20) by month since registration (x-axis, 1 to 31)]

~1% of participants contribute 72% of Talk & 29% of Task

[Luczak-Roesch et al., 2014]
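A concentration figure such as "~1% of participants contribute 72% of Talk" can be computed from a plain contribution log. The following is a minimal sketch, assuming a log with one user id per contribution; the function name `top_share`, the `top_fraction` parameter, and the toy data are illustrative and are not taken from the cited study.

```python
import math
from collections import Counter


def top_share(contributor_ids, top_fraction=0.01):
    """Fraction of all contributions made by the most active `top_fraction` of contributors.

    `contributor_ids` holds one (hypothetical) user id per logged contribution.
    """
    counts = Counter(contributor_ids)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one contributor, even in very small samples.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Toy log (made up): 50 volunteers, one of whom ("u0") is far more active than the rest.
toy_log = ["u0"] * 51 + [f"u{i}" for i in range(1, 50)]
toy_share = top_share(toy_log)
```

Running the same function separately on a Talk log and on a classification log would give the two percentages side by side.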

DATA QUALITY

• Existing quality inference algorithms are limited to binary classification
• Conceptualise the problem for realistic workflows
• Develop efficient implementations of algorithms
• Compare them

DATA QUALITY (2)

• Majority Voting
  • Find the annotation with the top vote as the true label for each object
• Message Passing
  • Use object-specific worker messages to represent how reliable a worker is in labelling each specific object
• Expectation Maximization
  • Infer the true label for each object, using annotations from all users, accounting for the error rates of each user
  • Estimate the error rates of each user by comparing their annotations with inferred true labels
• Results measured in terms of different accuracy metrics and time
• Experiments still ongoing
• Preview
  • Majority voting performs exceptionally well for large numbers of annotations
  • If less data is available, one could explore message passing (possibly in combination with majority voting)
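Two of the approaches above can be sketched in a few lines. This is an illustration only, not the implementation evaluated in the talk: `majority_vote` and `em_labels` are hypothetical names, the annotation format (object, worker, label) is an assumption, and the EM variant is a simplified "one-coin" model in which each worker has a single accuracy rather than a full per-label error matrix. Message passing is not covered.

```python
import math
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Pick the most frequent label per object.

    `annotations` is a list of (object_id, worker_id, label) tuples.
    Ties go to the label that was seen first for that object.
    """
    votes = defaultdict(Counter)
    for obj, _worker, label in annotations:
        votes[obj][label] += 1
    return {obj: counts.most_common(1)[0][0] for obj, counts in votes.items()}


def em_labels(annotations, n_iter=20, smoothing=1.0):
    """One-coin EM: each worker has a single accuracy, errors are spread evenly
    over the remaining labels. Returns (label per object, accuracy per worker).
    `smoothing` must be positive so that no accuracy is exactly 0 or 1."""
    annotations = list(annotations)
    labels = sorted({label for _, _, label in annotations})
    n_wrong = max(len(labels) - 1, 1)
    by_obj = defaultdict(list)
    for obj, worker, label in annotations:
        by_obj[obj].append((worker, label))

    # Initialise the label posteriors with vote fractions (a soft majority vote).
    post = {}
    for obj, votes in by_obj.items():
        counts = Counter(label for _, label in votes)
        post[obj] = {label: counts[label] / len(votes) for label in labels}

    accuracy = {}
    for _ in range(n_iter):
        # M-step: a worker's accuracy is the expected share of their annotations
        # that match the currently inferred labels (with additive smoothing).
        agree, total = defaultdict(float), defaultdict(float)
        for obj, votes in by_obj.items():
            for worker, label in votes:
                agree[worker] += post[obj][label]
                total[worker] += 1.0
        accuracy = {w: (agree[w] + smoothing) / (total[w] + 2 * smoothing) for w in total}

        # E-step: re-infer each object's label posterior, weighting every
        # annotation by the accuracy of the worker who made it (in log space).
        for obj, votes in by_obj.items():
            log_score = {}
            for cand in labels:
                log_score[cand] = sum(
                    math.log(accuracy[w]) if label == cand
                    else math.log((1.0 - accuracy[w]) / n_wrong)
                    for w, label in votes
                )
            top = max(log_score.values())
            weights = {cand: math.exp(s - top) for cand, s in log_score.items()}
            norm = sum(weights.values())
            post[obj] = {cand: wgt / norm for cand, wgt in weights.items()}

    inferred = {obj: max(p, key=p.get) for obj, p in post.items()}
    return inferred, accuracy


# Toy data (made up): "ann" and "bob" always agree, "cy" always disagrees.
toy = [
    ("img1", "ann", "cat"), ("img1", "bob", "cat"), ("img1", "cy", "dog"),
    ("img2", "ann", "dog"), ("img2", "bob", "dog"), ("img2", "cy", "cat"),
    ("img3", "ann", "cat"), ("img3", "bob", "cat"), ("img3", "cy", "dog"),
]
mv = majority_vote(toy)
em, worker_accuracy = em_labels(toy)
```

On this toy input both methods return the same labels, and the EM step additionally assigns the dissenting worker a lower estimated accuracy. With many annotations per object the two tend to agree, which is in line with the preview result on the slide.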

GAMIFICATION

Survey of 27 papers & 31 VCS projects

SOCIALITY

• Discussions and engagement with volunteers are an integral part of the experience
• Leads to serendipitous scientific discoveries
• Encourages autonomy and helps with community building

WORK VS TALK

[Figure: Talk contributions and classifications; labelled value: 40.5%]

[Luczak-Roesch et al., 2014; Tinati et al., 2015]

CHAT AND INSTANT MESSAGING

[Figure: microposts per project (PH, SG, SW, NN, GZ, CC, PF, SF, AP, WS); labelled value: 91%]

[Luczak-Roesch et al., 2014; Tinati et al., 2015, WebSci]

DISCUSSION PROFILES

• Profile 9 (0.1%): Deeply engaged volunteers, few threads but multiple posts within them
• Profile 7 (0.1%): Content producers, posting across many boards and threads
• Profile 8 (0.4%): Thread followers and PM (one-to-one) talkers
• Profile 4 (1%): First to respond and question answerers
• Profile 1 (2.8%): Highly active thread starters and answerers across a wide range of topics
• Profile 5 (5.5%): Infrequent volunteers, single thread posts, no personal messages
• Profile 3 (6.5%): Watcher and starter of many threads, but not first to reply
• Profile 2 (14.6%): Highly active thread starters and first to reply back
• Profile 6 (69.0%): Long active volunteers (the core group), posting sporadically

[Tinati et al., 2015, WebSci]

FROM CROWD TO COMMUNITY

• Survey of 48 projects and 150 publications
• Identifying affordances from online community themes within literature
  • Task visibility
  • Goals
  • Feedback
  • Rewards
• Community features found to have a greater role than previously considered
  • Encourage task completion, discussions etc.
• Themes align to key success factors of volunteer engagement, task completion and submission accuracy

[Reeves et al., 2017]

FROM PROJECTS TO ECOSYSTEMS

Project A

Project B

Project C

Participant X

Part. Y

[Luczak-Roesch et al., 2014]
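The ecosystem view, with participants such as X and Y active in several projects, can be explored by counting how many participants each pair of projects shares. This is a minimal sketch, assuming participation records of the form (participant, project); the function name `shared_participants` and the toy records are hypothetical and do not reproduce the cited analysis.

```python
from collections import defaultdict
from itertools import combinations


def shared_participants(records):
    """Count participants in common for every pair of projects.

    `records` is an iterable of (participant_id, project_id) pairs.
    Returns {(project_a, project_b): number of shared participants},
    with each pair in sorted order.
    """
    projects_of = defaultdict(set)
    for participant, project in records:
        projects_of[participant].add(project)

    overlap = defaultdict(int)
    for projects in projects_of.values():
        # A participant active in n projects links every pair among them.
        for pair in combinations(sorted(projects), 2):
            overlap[pair] += 1
    return dict(overlap)


# Toy records (made up): X works on A, B and C; Y on B and C; Z only on A.
toy_records = [
    ("X", "A"), ("X", "B"), ("X", "C"),
    ("Y", "B"), ("Y", "C"),
    ("Z", "A"),
]
toy_overlap = shared_participants(toy_records)
```

Pairs with a high count are candidates for treating projects as parts of one ecosystem rather than as isolated communities.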

DESIGNING PLATFORMS

• Task specificity
• Community development
• Task design
• PR and engagement

Bootstrapping the community

Serendipitous scientific discovery

Engaging with people, supporting profession team

Supporting individuals, finding new scientific discoveries

Obtaining new citizen scientists

Retaining people

Supporting people, improving task completion

Obtaining new citizen scientists

Reinvigorating old users

[Tinati et al., 2015, CHI]

WHAT’S NEXT?

• Human computation & crowdsourcing
  • Task assignment: what tasks are interesting/relevant for whom?
  • Data quality: scalable and in real-time
  • Peer review, collaborative approaches
  • The role of gamification: is science a game?
• Online community
  • Making discussions more effective
• Science
  • Citizen science platforms that everyone can use
  • New forms of publishing, citation, reproducibility, and replication

[email protected]
@ESIMPERL

WWW.SOCIAM.ORG
WWW.STARS4ALL.EU

All publications available at http://dblp.uni-trier.de/pers/hd/s/Simperl:Elena_Paslaru_Bontas
