Supporting Content Curation Communities: The Case of the Encyclopedia of Life

Dana Rotman
College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]

Kezia Procita
College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]

Derek Hansen
College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]

Cynthia Sims Parr
National Museum of Natural History, Smithsonian Institution, Washington, DC 20013. E-mail: [email protected]

Jennifer Preece
College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]

This article explores the opportunities and challenges of creating and sustaining large-scale “content curation communities” through an in-depth case study of the Encyclopedia of Life (EOL). Content curation communities are large-scale crowdsourcing endeavors that aim to curate existing content into a single repository, making these communities different from content creation communities such as Wikipedia. In this article, we define content curation communities and provide examples of this increasingly important genre. We then follow by presenting EOL, a compelling example of a content curation community, and describe a case study of EOL based on analysis of interviews, online discussions, and survey data. Our findings are characterized into two broad categories: information integration and social integration. Information integration challenges at EOL include the need to (a) accommodate and validate multiple sources and (b) integrate traditional peer-reviewed sources with user-generated, nonpeer-reviewed content. Social integration challenges at EOL include the need to (a) establish the credibility of open-access resources within the scientific community and (b) facilitate collaboration between experts and novices. After identifying the challenges, we discuss the potential strategies EOL and other content curation communities can use to address them, and provide technical, content, and social design recommendations for overcoming them.

Introduction

Scientific progress is, in large part, dependent on the development of high-quality shared information resources tailored to meet the needs of various scientific communities. Traditionally created and maintained by a handful of paid scientists or information professionals, scientific information resources such as repositories, databases, and archives are increasingly being crowdsourced to professional and nonprofessional volunteers in what we define as “content curation communities.” Content curation communities are distributed communities of volunteers who work together to curate data from disparate resources into coherent, validated, and oftentimes freely available repositories. Content curation communities helped to develop resources on drug discovery (Li, Cheng, Wang, & Bryant, 2010), worm habitats or bird migration (Sullivan et al., 2009), astronomic shifts (Raddick et al., 2007), and language (Hughes, 2005), to name a few.

Received July 8, 2011; revised December 14, 2011; accepted December 15, 2011

© 2012 ASIS&T. Published online 28 March 2012 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22633

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 63(6):1092–1107, 2012




Content curation communities are related to content creation communities like Wikipedia, but face different challenges, as curation and creation are fundamentally different activities. They both require a community of contributors to help maintain free content, but the tasks performed by the contributors differ, as does the requisite skill set. Content curation communities aggregate, validate, and annotate existing content with its associated intellectual property claims into a comprehensive repository, while content creation communities avoid intellectual property concerns by creating their own content from scratch. Both content creation and content curation communities share some aspects with organizational repositories, mainly the need to “shape” knowledge, whether created or curated, into a more useful format that will create a “dynamic memory system” (Yates, Wagner, & Majchrzak, 2010) that will serve the community members. As such, both communities fill important niches in the information landscape and have already proven their worth. Although content creation communities, particularly Wikipedia, have been examined extensively to uncover the principles that have led to their successes and failures (cf. Forte & Bruckman, 2005; Kittur, Chi, Pendleton, Suh, & Mytkowicz, 2007; Kriplean, Beschastnikh, & McDonald, 2008; Nov, 2007), there have not been comparable studies of content curation communities. To realize the ambitious goals of such communities, we must understand how to effectively design the technologies, information standards, processes, and social practices that support them.

The goal of this article is to characterize content curation communities and their increasing prominence in the information landscape, and then to use the formative years of the Encyclopedia of Life (EOL) as a case study for providing insights into the design of such communities. Our key research questions are as follows:

• What are the unique informational and social challenges experienced by EOL, as an example of a content curation community?

• What are the key information and social design choices available to designers of content curation communities?

Content Curation Communities

Content curation communities can be defined by describing their core elements:

• Community: The word “community” emphasizes that a large number of people are involved in some type of collective effort. Although not necessary, they typically include volunteers and reflect the social roles and participation patterns that mirror those of other types of online communities (Preece, 2000). Much of the curation effort is done individually, but what separates content curation communities from personal curation is the coordinated curation work that pulls content into a mass repository.

• Content: The word “content” conveys the primary emphasis of the community. Curation communities exist for the purpose of providing content, which can take on many forms such as text, multimedia (e.g., images, videos), and structured data (e.g., metadata). The content is typically free and web-based, but doesn’t necessarily have to be. The nature and scope of the content is based on topics and resources that the specific community finds valuable enough to curate.

• Curation: The word “curation” describes the type of work that is performed by these communities; it is their raison d’être. Curation connotes the activities of identifying, selecting, validating, organizing, describing, maintaining, and preserving existing artifacts. This term follows the term “content” because the artifacts being curated by the community are content, not necessarily the objects themselves (e.g., rocks or animals aren’t curated, but content about them is curated). Curators of museum objects or physical samples are typically experts in their domain, suggesting that this role requires some level of content expertise, which can range from lay interest in the specific domain to professional-level expertise, although this is not a requirement for the definition.

Although there are various examples for content curation communities (see Table 1), in this article, we focus on a scientific content curation community, which builds on a long tradition of sharing and curating scientific data and literature. The scientific enterprise has always been collaborative. However, the advent of the Internet has enabled large distributed scientific communities to collaborate at a scale and pace never before realized (Borgman, 2007). Distributed communities of scientists were coined “collaboratories,” a term first used in the early 1980s, to indicate a laboratory without walls where data are shared via advances in information technology independent of time and location (Bos et al., 2007; Finholt, 2002). Collaboratories and their more recent instantiations led to an increase in the diversity of participants and participation levels (Birnholtz & Bietz, 2003; Bos et al., 2007; Finholt, 2002). Yet issues of data reuse, upholding scientific standards, and maintaining high-quality data persist (Zimmerman, 2007).

Bos and his colleagues (2007) identified seven distinct types of collaboratories, including “community data systems,” where scientific data is created, maintained, and improved by a geographically distributed community. Unlike content curation communities that strive to aggregate various types of content, whether traditional peer-reviewed scientific data or user-generated data, community data systems and related efforts such as “curated databases” (Buneman, Cheney, Tan, & Vansummeren, 2008) focus solely on curating scientific data. In many of these community data systems, volunteers play the role of content contributor by submitting new data that are then vetted and annotated by paid professionals. Content curation communities are not primarily about collecting original data. For example, the EOL reverses the traditional community data system approach by having its volunteer content curators annotate and vet data that have already been collected in community data systems or other publications. In many of these cases, the EOL curators’ primary contribution is to identify and combine data from reputable sources into a meaningful whole.


TABLE 1. Comparison of various genres of content curation communities.

Ecology: Encyclopedia of Life (EOL)

• Curation purpose: Aggregating information about every living species on earth. Focus on a broad community of users, ranging from the general public to academics.

• Structure: Hierarchical structure. Council and executive committee in charge of setting general goals and policies, working groups in charge of day-to-day work, and volunteer curators who are in charge of the data.

• Information integration: Pulling data from various resources (open access and scientific resources). Different data formats. Data needs validation and quality control. Use of different classification schemes.

• Social integration: Interaction with various groups, organizations and individuals coming from different domains; a need to establish working processes and trust between them. Recognition of the value of content curation within the scientific community.

• Coordination mechanisms: Detailed FAQ, mailing list, meetings in person with some curators.

Language and information research: Open language archives (OLAC) (www.languagearchives.org)

• Curation purpose: Harvesting linguistic resources, indexing, archiving and standardizing for use by linguists and others. Focus on a broad community of users.

• Structure: Hierarchical structure. Governing advisory board of respected peers. Council: approved by the advisory board, making decisions for the community. Working groups: ad hoc associations that draft standards, notes, and documentation (that are approved by the council). Individuals volunteer to participate in the community efforts and can advance to governing roles in time.

• Information integration: Pulling data from numerous (sometimes conflicting) resources. Standardization process of data, resources, creating an accepted controlled vocabulary. Metadata standardization has to be approved by the governing advisory and council: a long process, sometimes dependent on individuals coming from different domains with different viewpoints.

• Social integration: Interaction with various groups, organizations and individuals coming from different domains. Very centralized organizational structure. Task regulation and division of labor are not always clear.

• Coordination mechanisms: Web portal for the community, extensive documentation, Q&A, directory of community members.

Sciences: ChemSpider (www.chemspider.com/)

• Curation purpose: Database for curating chemical structures, physical properties, images, spectra, and other related information. Connected to many other databases. Focus on a broad community of users: students, academics, industry.

• Structure: Semihierarchical structure. Governed by the Royal Chemistry Society (UK). Users can comment on others’ actions; registered users can curate names, links, spectra, and other resources. Master curators validate and approve the data. Organizations (university departments, research groups) can deposit data. Prerequisites: registration and basic knowledge of chemistry.

• Information integration: Pulling data from various resources (open access resources such as Wikipedia; open and private repositories). Not all data is vetted and it requires validation according to scientific standards. Links to chemical vendors’ catalogues raise issues of commercialism. No controlled vocabulary or semantic markup.

• Social integration: Users need to formally apply to attain certain curation roles; privileges and access to curation are granted based on credentials. Some users do not have formal credentials and thus are barred from playing a more active role in the community. Emphasis on building intra-community reputation through microcontributions.

• Coordination mechanisms: Discussion forum and pages for data-integration issues. A formal blog published by the RCS.

Literature: Project Gutenberg (www.gutenberg.org/)

• Curation purpose: Digitizing copyright-free and public-domain books. Focus on public access to cultural works.

• Structure: Purposefully decentralized; community members decide independently what to curate. “Posting team” cleans and approves works for publication.

• Information integration: No comprehensive list of cultural works in need of publication. Decentralized structure and no “selection policy” lead to cases where numerous volunteers are unknowingly working on the same book/cultural piece. Proofreading standards are upheld, but metadata is not always available.

• Social integration: Little attribution. In most cases there is no attribution to members who cleaned and validated the data, just to volunteers who digitized (i.e., produced) it.

• Coordination mechanisms: Mailing list, private e-mails, extensive FAQ documentation and newsletters.

Genealogy: GenWeb (usgenweb.org)

• Curation purpose: Providing resources for genealogy research for every county in the United States. Focus on individual genealogy researchers.

• Structure: Completely decentralized; volunteers “adopt” one (or more) county and produce all the data/links for that area. Coordinating team and project coordinators are in charge of each state, as well as linking individual resources to the general website and resolving conflicts, but have minimal control over content. No clear process for becoming a member of the coordinating team.

• Information integration: Varying levels of completeness and accuracy, dependent on the individual volunteer. Federated repositories do not allow for comprehensive search of databases. Different data format, no controlled vocabulary.

• Social integration: Mostly volunteers act individually with little interaction with others or with the coordinating team.

• Coordination mechanisms: Mailing lists and private e-mails. Grievance committee resolves conflicts between members.

Crafts: Ravelry (www.ravelry.com)

• Curation purpose: Curating and sharing yarn-based crafts, projects, ideas and tools. Focus on a broad community of crafters.

• Structure: Emphasizes individual curation and social networking. Each member creates their content and can deposit data into a collective database. Other members can comment. Annotation is done only by members who have joined an editing group and for data that is open to editing. Removal of data is only done by select editors.

• Information integration: Data validation. No controlled vocabulary, standardization, or regulation regarding annotation or contribution of data makes search difficult.

• Social integration: Social integration is done mostly on social networking tools and less on data-based components of the community.

• Coordination mechanisms: Wiki, forums, subgroups, social networking (comments, friends lists). Specific discussion forums for data editing and cleaning purposes.

Web indexing: Open Web Directory (www.dmoz.org)

• Curation purpose: Bookmarking, cataloguing, and curation of web pages. Focus on curation for the greater good.

• Structure: Hierarchical structure: editors have to apply for a position. Senior editors approve junior editors’ applications, curated content and editorial suggestions. Curation is done on all levels but has to be vetted. Admins control all editing privileges and content. Long-serving senior editors can become admins through recommendation from other admins.

• Information integration: Data are structured ontologically, based on existing formal cataloguing principles (Usenet inspired). Changes to the structure made through discussion among the senior editors until reaching consensus. No annotation tools available. Different vocabularies for general content and adult content. Difficulty maintaining quality content and removing spam.

• Social integration: The community promoted self-regulation; therefore no strict policies for social integration were publicly published. Allegations of blacklisting editors and content, personal conflicts and lack of control over removal of editing privileges cause fragmentation in the community.

• Coordination mechanisms: Discussion forums open only to editors. Although the site has been one of the first examples for content curation (established in 1998), it lost some of its appeal due to the rise of wikis and folksonomies, and limited its work beginning in 2007.


In addition to being an influence on collaboratories, scientific content curation communities are also a subset of the growing trend of crowdsourcing scientific work to interested enthusiasts and volunteers. Crowdsourced projects can be found across academic domains, from astronomy (Galaxy Zoo, Cerberus) to biochemistry (FoldIt, ChemSpider), geography and climatology (Old Weather Tracking), history (Civil War Diaries and Letters transcription project, FamilySearch, The Great War archive), archeology (Ancient Lives), and even peer review (IMPROVER). The premise of crowdsourced projects is that engaging a large number of users in solving complex problems that require many resources (time, effort, expertise, or computing power) will facilitate quick and potentially novel findings. The website scistarter.com lists more than 400 crowdsourced scientific projects across different domains. From this list we can learn that crowdsourced projects are more prevalent in domains that are labor intensive and where large-scale questions can be broken down into smaller, manageable tasks. Most crowdsourced scientific projects focus on generating new data rather than curating existing data. Content curation communities are less popular than scientific crowdsourcing projects for various reasons that will be outlined in the article.

Content curation communities have also been influenced by more generic social media initiatives and tools. As web content has proliferated, the need to curate general content, and to separate the wheat from the chaff, has increased. For example, a number of news sites curate stories of interest. Some, like Google News, use automatic processing to group similar articles from reputable sources. Others, such as Slashdot, rely on user-generated ratings to filter out the low-quality discussions about news (Lampe & Resnick, 2004). Social sharing sites like Diigo, Flickr, and YouTube also allow popular content to rise to the top (e.g., Rotman & Preece, 2010). Although many of these approaches work well at identifying “popular” content, they are not designed for situations where specialized expertise is needed, making it unclear how to apply similar design strategies to scientific domains. Many online communities use Frequently Asked Questions (FAQs) and wiki repositories (Hansen et al., 2007) as a way of collecting high-quality content, but unlike content curation communities, they are focused on their own community content, not content created by other sources. One of the goals of this article is to learn from strategies employed by existing social media tools to effectively aid in negotiating the challenges faced by content curation communities.

EOL

Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page . . . comprises a summary of everything known about the species’ genome, proteome, geographical distribution, phylogenetic position, habitat, ecological relationships and, not least, its practical importance for humanity. (Wilson, 2003, p. 77)

The EOL is “a sweeping global effort to gather and share the vast wealth of information on every creature – animals, plants and microorganisms – and make it available as a web-based resource” (EOL, 2011). EOL is an open-access aggregation portal (www.eol.org). One of its bold goals is to engage individuals, including scientists and nonscientists, in contributing to scientific information curation, so that they will “study, protect and prolong the existence of Earth’s precious species” (EOL, 2011). EOL is structured such that each species is presented on a dedicated page, where all available and relevant information, vetted and nonvetted alike, is presented. Currently, there are 3 million pages, representing 1.9 million known species on Earth. Most of these are stubs, but trusted content is available on over 600,000 pages that collectively are viewed by nearly 3 million visitors a year. Over 40,000 people have registered on the site. Paid staff members include information, education, and technical professionals. Contributors include interested volunteers, who can freely contribute information about organisms, as well as credentialed scientists and “citizen scientists” (Cohn, 2008), who agree to curate information on EOL. Seven hundred and fifty content curators are associated with EOL, 200 of whom are currently active. A small number of scientific contributors and content curators receive temporary part-time financial support from a competitive Fellows program, but most content on EOL arises through volunteer partnerships with database and library digitization projects and citizen contributions. All the vetted and unvetted content is freely available online.

EOL’s curation work process brings together automatic tools and human input: EOL automatically aggregates objects such as images or text onto publicly available pages based on metadata from various content sources. Curators are granted access to controls, allowing them to “trust,” “untrust,” and “hide” objects such as images or text. EOL curators cannot change the objects, but they can leave comments and make a trust/untrust/hide decision. The system shares these comments and curator actions with the original content providers, who may choose to make a correction and resubmit their data. Curators can rate objects on EOL, based on their comprehensiveness and quality. EOL uses this rating to sort the objects on the page so the highest rated objects show first. To a lesser extent, curators are also encouraged to be contributors by adding their own text to pages or improving third-party content (e.g., editing Wikipedia pages) before importing them into EOL. Although there is no formal training and accreditation process for curators, they are expected to learn about EOL’s policies and tools by reading documentation on EOL’s website or by following discussions in an opt-in mailing list.
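The curation workflow described above can be illustrated with a minimal sketch. It is not EOL's implementation: the class and function names, the status labels, the 1-to-5 rating scale, and the use of a mean rating for ordering are all assumptions made for illustration. The sketch shows only the behavior stated in the text: curators cannot edit an object, they can trust, untrust, or hide it and leave a comment, each action is reported back to the content provider, and a page lists its objects from highest to lowest rating.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical status labels; the three curator actions named in the text
# map onto the last three. "unreviewed" is an assumed initial state.
ACTION_TO_STATUS = {"trust": "trusted", "untrust": "untrusted", "hide": "hidden"}


@dataclass
class CuratedObject:
    """An aggregated image or text object (illustrative model only)."""
    object_id: str
    provider: str
    status: str = "unreviewed"
    ratings: List[int] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def mean_rating(self) -> float:
        # Unrated objects sort last; this tie-breaking rule is an assumption.
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def curate(obj: CuratedObject, action: str,
           comment: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Record a trust/untrust/hide decision without altering the object's
    content, and return the notice that would go to the content provider."""
    if action not in ACTION_TO_STATUS:
        raise ValueError(f"unknown curator action: {action}")
    obj.status = ACTION_TO_STATUS[action]
    if comment:
        obj.comments.append(comment)
    return {"provider": obj.provider, "object_id": obj.object_id,
            "action": action, "comment": comment}


def rate(obj: CuratedObject, score: int) -> None:
    """Add a curator rating on an assumed 1-to-5 scale."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    obj.ratings.append(score)


def page_order(objects: List[CuratedObject]) -> List[CuratedObject]:
    """Return the visible objects of a species page, highest rated first."""
    visible = [o for o in objects if o.status != "hidden"]
    return sorted(visible, key=lambda o: o.mean_rating(), reverse=True)
```

Whether hidden objects remain visible to curators, and how ratings from several curators are combined, is not specified in the text; the sketch simply drops hidden objects from the public ordering.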

Curator activity is only loosely monitored by EOL staff and other curators, and social conflicts, as discussed later, are usually resolved through backchannels such as personal e-mails. To retain established curators, EOL emphasizes personal feedback and recognition through e-mails but also public recognition through highlighting curators’ work on their personal EOL pages. Curators are also invited to test new features and provide input into the design of future tools.

EOL predecessors include sites that provide freely available, biodiversity-focused, and comprehensive species coverage, such as the Tree of Life Web project (TOLWeb), which was created with a focus on biodiversity and evolution to share with readers the “interrelationships of earth’s wealth of species.” Unlike EOL, TOLWeb is a content creation community where preapproved users from the professional science community directly add information (Maddison, Schulz, & Maddison, 2007).

EOL derives its content from scientific content partners, including many of its predecessors, but also from user-generated content: In December of 2009, EOL began a content partnership with Wikipedia, importing Wikipedia species pages into EOL. Although the addition of these articles increased EOL’s species coverage considerably, it was controversial because of the nontraditional authoring process of Wikipedia that allows anyone to contribute content. Other examples of nonauthoritative source material that EOL aggregates include Flickr and Wikimedia Commons photos. The inclusion of both types of sources provided an excellent window into the challenges that content curation communities face when blending content from sources that have different levels of credibility, as discussed later.

Methods

We chose to perform a case study of EOL for several reasons. Case studies are ideal for understanding an entire system of action (Feagin, Orum, & Sjoberg, 1991), in this case, a content curation community. One of the goals of this research is to provide actionable insights to content curation community designers; a case study approach draws the boundaries of inquiry precisely around the matter (i.e., the system of action) that community designers can influence. Case studies are also ideal for describing emerging phenomena, such as content curation communities, where current perspectives have not yet been empirically substantiated (Yin, 2008). Finally, case studies offer a way of investigating a phenomenon in depth, and within its real-life context, lending high external validity to the findings. EOL has several characteristics that support the choice of case study research: (a) it is a prototypical example of a content curation community; (b) it is a high-impact project with an international reputation in an important scientific domain; (c) it has sufficient history and is large enough to have encountered many of the potential challenges typical of these types of communities; and (d) it is still evolving so that some of the insights from this study can be incorporated into EOL’s design.

Data Collection

EOL conducted a survey in July and August of 2010. The survey was sent to all of EOL’s scientist participants (contributors, database partners, content curators) to obtain information to guide the next phase of EOL’s development. Participants were not offered any incentive for their participation in the survey. Considering the lack of reward, participants’ time demands, and their heterogeneity, the survey was designed such that the questions were mostly open-ended and each question was independent of other questions, so that participants could choose to answer only the questions that were relevant to them and skip questions that they found irrelevant. Survey questions asked about participants’ field of expertise, opinions about EOL and related efforts, involvement in various activities and initiatives, technological aspects of the site, and more. A link to the survey was sent by e-mail to 1,225 participants; 281 (23%) responded and completed the survey. Typical time to completion was 15–20 minutes. Because of the survey structure, response rates varied by question, and for questions that were used in this study, response rate varied from 18% to 86%.

Five interviews were conducted with EOL curators to better understand the role of content curators, the nature of the work they perform, their perceptions of EOL, and the challenges and opportunities they and others at EOL face. These interviewees were contacted based on their active involvement in EOL. A semistructured interview approach was used to allow for follow-up questions that encouraged further elaboration of important findings. Additionally, seven interviews were based on a convenience sample (Patton, 1990) of volunteers who contribute to Wikipedia species pages and are familiar with EOL. These participants were contacted based on their levels of participation on Wikipedia’s Species Pages, which are displayed on EOL. Interviews were conducted face to face, via Skype, and via e-mail, and detailed notes and transcriptions were captured for analysis.

Additional data sources were used. One author took detailed notes during the EOL 2010 Rubenstein Fellows and Mentors Orientation Workshop. The workshop brought together nearly all 2010 fellowship recipients and their mentors, and included a roundtable discussion on expanding the utility of EOL for scientists, which elicited information from participants about their practical needs and their views of the project. The research team also reviewed the nearly 200 posts sent to the EOL curators’ Google Group from its inception in May 2009 until March 17, 2010. These messages helped us to develop our interview protocol and identify issues salient to the community, which are used as evidence throughout the study.

Data Analysis

Notes and transcripts from interviews, workshop observations, open-ended survey questions, and online forum posts were analyzed using a grounded theory approach (Corbin & Strauss, 2007). Grounded theory calls for the systematic analysis of data so that common concepts and ideas are extracted and then axially referenced to produce higher level themes and concepts that frame the theoretical understanding of the researched phenomenon. In the spirit of grounded theory, we have tried to begin “as close as possible to the ideal of no theory under consideration and no hypothesis to test” (Eisenhardt, 1989, p. 536). This was particularly important for this project, which is a first look at a content curation community. Once the major themes were identified and analyzed, we were able to relate them to existing theories and literature in the findings and discussion. All data were reviewed by at least two different authors to reduce personal biases.

Findings

Our first research question asked: “What are the unique information and social challenges experienced by EOL as an example of a content curation community?” We found that the primary challenges faced by EOL relate to integration: both integration of information from disparate sources and integration of social practices and norms from different communities. Although information integration and social integration are highly intertwined, we discuss them separately below.

Information Integration Challenges

Content curation communities deal with large corpora of content from diverse sources, which must be selected, organized, managed, and integrated into a holistic resource. This is no small feat, given the complexity of dealing with different types of data (e.g., photographs, text, video, audio), different taxonomies, metadata standards, licensing, and competing incentives for data sharing. Ideally, content curation communities will create a one-stop shop, in which this underlying complexity is transparent to users. This requires content curators to address several information integration challenges, two of which were identified as particularly important by EOL participants: (a) dealing with multiple and at times competing standards and (b) ensuring information quality. In this section, we discuss these challenges and their effect on the content curation community.

Standards and Classification Challenges

A dominant theme that emerged from all of our data sources was the importance of biological classification in the work that the EOL content curation community performs. Many challenges inherent in the curation work of EOL relate to dealing with different standards of classification and their highly contested nature. Biological classification (the process of describing, naming, and classifying species into stable hierarchies) provides frameworks for all biological research. Indeed, the current body of taxonomic information and practice represents 250 years of research, starting with Linnaeus in the 18th century (Tautz, Arctander, Minelli, Thomas, & Vogler, 2003). When asked to describe their professional role in an open-ended survey question, nearly half (49%) of respondents used the term taxonomist or systematics, both of which refer to a well-defined field of study that refines and applies taxonomic theory and practice as a means to further the study of biological diversity.1 Likewise, although only one of the EOL curator interviewees identified himself as a “senior taxonomist,” all of them work closely with biological classifications as part of their everyday jobs. The frequency and passion associated with comments about how EOL deals with biological classification (as detailed below) suggest that not only is biological classification important to the field of biology, but it is also seen as fundamental to the work occurring at EOL.

Standards and classification challenges at EOL emanate from debates within the greater biology community, as well as usability decisions made by EOL. Although we discuss these issues separately, they are interrelated since the broader classification debates in biology are reified in the ways that EOL presents its aggregated content. Disputes on how to classify species are woven through the history of taxonomy and evolutionary biology. Classification faces persistent challenges in the form of deep knowledge gaps in how much is unknown about biodiversity, arguments over nomenclature and species’ names, as well as the extent to which species should be grouped or separated (Wheeler, 2008). In addition, every year new species are discovered and others go extinct, constantly changing the scope and hierarchy of classification (Gonzalez-Oreja, 2008). Furthermore, existing work on taxonomy is sporadic, with much of the research focused on specific types of organisms, while others have been largely ignored (Tautz et al., 2003). One result of these longstanding issues is the existence of competing classifications within the biological sciences, each of which is recognized as imperfect. Technology for supporting a consensus has not yet solved the challenges of this work (Hine, 2008).

The debates from the greater biological community come to a head at EOL because the work occurring at EOL requires that there be a standard type of classification chosen to organize the effort and the information in the community. One curator described how she encountered “conflicts in higher taxonomies,” including one about a particular species that prompted someone to be “really mad about the way it was done and who was doing it.” Other interviewees and discussions in the forums agreed with the sentiment that “in taxonomy there are huge disputes.” Such strong feelings emerge because taxonomists and species specialists grow accustomed to using a preferred schema. EOL, as an aggregator of content based on different classifications, is unavoidably thrown into the middle of such debates, causing tensions within the content curators’ community.

Having only one biological classification option as a point of contention. Early in its history, all EOL species pages were organized using the Catalogue of Life (COL) annual checklist and classification, the product of global partnerships. Many content curators preferred to use familiar classifications and did not like being forced into using COL, even though it is considered one of the most comprehensive species catalogues available (Orrell, 2011). When asked which classification they preferred, survey respondents named multiple systems (e.g., CAS, COF, COL, EOL, Global Checklist, ITIS, TOLWeb, WoRMS); no single classification system was preferred by the majority of respondents. Early Google Group discussions clarified the problems associated with having a rigid classification standard as the basis for which pages exist; a surveyed curator explained how EOL “species pages [are] based on nomenclatorial lists derived second- or third-hand from various other databases . . . [and] . . . species names that are nomenclatorially available are not necessarily equivalent to species taxa as currently conceptualized by biologists. . . . This nomenclatorial chaos has made it extremely difficult to curate.” One survey respondent described how “the [choice of] taxonomy on the main part of EOL is a big turn off for recruiting the group that I am targeting.”

1 See Society of Systematic Biologists: http://systbiol.org/

Obstacles for creating master taxonomic classifications. In October 2009 the EOL informatics team responded to initial curator complaints about COL and added a function that allows content curators to choose the default taxonomic classification by which they could navigate the site. Participants expressed widely varying opinions about being able to manipulate specific classifications and integrate new self-designated classifications on EOL. One survey question asked about a particular tool (LifeDesk) that is used to create a modified or new classification on EOL. Of all the participants who answered the question, only a third had used this tool for creating new classifications, but the vast majority of them (80%) indicated that they would like to have their classification integrated into EOL, and a smaller majority (56%) would have liked to submit their classifications to other large-scale catalogues as well. However, in practice, most users do not modify or create new classifications. According to survey participants, the main reasons for this have to do with usability (“[It is] difficult to determine how to add or change taxa previously listed in classification” and “I’m not really familiar with the features”) and effort (“[I don’t have the] time”).

Quality Control

Quality control challenges faced by EOL arise because it aggregates content from a variety of sources, including both traditional scientific content and nontraditional user-generated content. The decision to incorporate data from nontraditional sources including Wikipedia and Flickr was meant to help increase species’ coverage, while taking steps to maintain the validity of the data. Wikipedia articles imported into EOL are initially marked as “nonreviewed” material. Curators can approve the content, which removes the markings. The specific version of the Wikipedia article that was imported and approved remains unchanged even if the original Wikipedia article is modified. Content curators are encouraged to fix any problems with the page on Wikipedia itself before approving the text, but cannot make changes to the EOL version of the content. This is similar to the recommendation that EOL curators should submit problems they identify to scientific publishers and information providers rather than changing them on EOL. The difference is that Wikipedia allows them to directly make their own edits. Images of species are offered to EOL by members of the EOL Images Flickr group,2 which, as of this writing, includes over 2,500 members and 90,000 images. These are automatically harvested using user-generated tags and placed on EOL pages, initially with the same marking, suggesting that they have not yet been reviewed. As they are reviewed, they are then trusted or not trusted by content curators on EOL.
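The import-and-approve behavior described above (a fixed copy of the article that stays unchanged after import, plus a “nonreviewed” marking that approval removes) can be sketched as follows. This is an illustrative model only: `Snapshot`, `ImportedArticle`, `import_article`, and the dictionary standing in for the live Wikipedia article are hypothetical names, and EOL’s real data model is not described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Immutable copy of one article revision at import time."""
    title: str
    revision: int
    text: str


class ImportedArticle:
    """Hypothetical EOL-side record wrapping an imported snapshot."""

    def __init__(self, snapshot: Snapshot):
        self.snapshot = snapshot
        self.reviewed = False

    def approve(self) -> None:
        # A curator approves the content; the text itself cannot be edited.
        self.reviewed = True

    @property
    def marking(self) -> Optional[str]:
        # Approval removes the marking.
        return None if self.reviewed else "nonreviewed"


def import_article(live_source: dict, title: str) -> ImportedArticle:
    """Copy the current revision of `title` from a stand-in live source."""
    revision, text = live_source[title]
    return ImportedArticle(Snapshot(title, revision, text))


# Usage: later edits to the live source do not alter the imported copy.
live = {"Example species": (1, "first version")}
article = import_article(live, "Example species")
live["Example species"] = (2, "edited on Wikipedia")
print(article.snapshot.text, article.marking)
```

Freezing the snapshot makes the design consequence visible: a fix made on Wikipedia after import reaches EOL only through a new import, which is why curators are encouraged to edit the Wikipedia page before approving the text.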

Most participants support the integration of content from nontraditional sources (e.g., Wikipedia, Flickr), but would like this content vetted and/or clearly identified. Though an arguably controversial aspect of EOL, many survey respondents and interviewees shared positive feelings regarding EOL’s aggregation of nontraditional sources. When surveyed about integration of such sources, curators responded: “It’s a good start”; “Many of them are [a] good source of information”; “Fine by me”; and “Many hands make light work; can’t expect all the pages to be populated by experts.” However, even the most enthusiastic respondents expressed a desire to ensure that the content was vetted and/or clearly identified as a nontraditional source on EOL. One interviewee captured this sentiment by stating that integrating nontraditional content was a “good idea as long as you keep things apart. The most important thing is that the end user needs to know where stuff comes from and as long as that is given, it is a really good thing to mix specialist content with amateur content.” A large group of survey respondents posted similar thoughts: “I do not see any problem with that in [the] case that the source is clearly shown” and “As long as it is clearly marked up as unvetted.” Several other curators stressed the need to allow them to closely monitor nontraditional resources: “It is often wrong; curators should be able to ‘exclude’ content or mark it as ‘erroneous’ ” and “Fine as long as it is clearly marked as unvetted.”

Some participants strongly oppose the use of nontraditional content from sources such as Wikipedia. A small but vocal minority of survey participants expressed strong opposition to the inclusion of nontraditional data. They used harsh words such as “contamination,” “muddling,” and “vandalism” to express their concerns. This opposition primarily stems from integrating what participants view as lower quality content. One interviewee stated, “It may be better to have no information about a taxon available on the web than information that has not been vetted and may be incorrect.” Some survey participants expressed an even stronger disapproval when asked their opinion of integrating nontraditional content from sources like Wikipedia and Flickr using the following language: “not acceptable”; “many sources are full of errors”; “should not be allowed. Too many reams of spurious data”; “unvetted content continues to perpetuate errors that slip through the scientific peer review process that have been passed into the public domain”; and “a lot of what I have seen is either old or of poor quality.” Although these opinions were about nonpeer-reviewed, nontraditional content in general, some survey participants went so far as to single out Wikipedia and Flickr as undesirable content providers, for example: “In several cases, typos in Flickr contributions have resulted in ‘phantom’ species being added to EOL”; “Wikipedia, I’m not so sure about . . . Wikipedia has lots of mistakes”; and “Not a fan of Flickr material.”

2 http://www.flickr.com/groups/encyclopedia_of_life/

A different set of respondents was concerned that using Wikipedia content in EOL makes the distinction between the two, and the role of EOL as a scientifically trusted resource, less clear. One interviewee stated, “If EOL = Wikipedia, then why have EOL? A worthy question.” Another said: “I suspect that many professionals will get frustrated with the proliferation of unvetted material and lose interest in contributing, and that as a result, EOL will become so much like Wikipedia that it will lose its relevance, and the point of EOL was to have vetted content that was checked by specialists. If this is the route to be taken then why would anyone come to EOL instead of just going to Wikipedia?” And another: “Don’t think this is a good idea. . . . What would then make EOL different from Wikipedia?”

There is no consensus about where the work of fixing errors should occur. The EOL content curation community, by its nature, faces numerous quality control challenges that stem from the variance in accuracy, terminology, and quality of the curated resources, whether these resources are traditional (scientific) or nontraditional (user-generated). Traditional resources are often slow to make changes and fix errors because of their hierarchical and measured work processes, leading to frustration on the part of the content curators who were surveyed: “EOL suggests I communicate with the original authors of the source database . . . where nobody is currently funded to update or repair it.” Yet the alternative of fixing errors directly on EOL is problematic as well, as the incorrect information persists in the original source. Unlike traditional resources that necessitate preapproval for changes and fixing errors, on Wikipedia curators can directly fix errors without having to wait for editorial approval. Some curators who were interviewed liked being able to directly edit Wikipedia pages before importing them to EOL: “[EOL aggregating Wikipedia content] is in some ways easier because you can go in and edit on Wikipedia much easier than on EOL” and “When I heard [EOL] was going to [aggregate Wikipedia], I thought, ‘Oh, great, I’ll make a lot of changes to Wikipedia . . . and have them load into EOL instead of contributing to EOL, this is easier.’ ” A potential problem is having multiple versions of a Wikipedia page, one inside EOL and one outside EOL, or in Wikipedia itself, which could “divide contributors and decrease the quality of content.”

Despite the positive reaction of some participants to making edits on Wikipedia, those with a low opinion of the quality of Wikipedia content did not like being forced to make edits there. For example, one interviewee noted that “Wikipedia is full of errors. . . . I never have time to go clean them all up, there’s always errors . . . it’s like a full time job and nobody has full time to do it.” Others were less condemning of Wikipedia, but still recognized that it is “hard to keep up with the volume” of data from there and it is a “big job to vet it all,” let alone fix it.

Social Integration Challenges

EOL, as a content curation community, brings together two distinct populations: professional scientists and citizen scientists. Each population brings to the table its own motivations, practices, accreditation procedures, and norms (Hara, Solomon, Kim, & Sonnenwald, 2003; Ling et al., 2005; O’Hara, 2008; Van House, 2002a). Amalgamating the differences between the two populations into an efficient work process is a major challenge. This section details the various obstacles to social integration: (a) the role of open-source resources in the scientific community and (b) conflict among scientists and citizen scientists.

The role of content curation communities within the larger scientific community. Despite endorsements from prominent scientists such as Wilson (2003), and EOL providing the esteemed and competitive Rubenstein Fellowships, the larger scientific community has not yet explicitly legitimized curation work that occurs at EOL. This happens for a number of reasons and has multiple implications for EOL and other content curation communities in the scientific realm. Below, we address three challenges associated with this issue including (a) a lack of familiarity with the project, (b) lack of legitimacy of curation work within the larger scientific community, and (c) lack of resources to support the effort.

Lack of familiarity with EOL. Many scientists do not know about EOL despite its growing popularity among Internet users; they do not have a clear mental model of how EOL relates to larger scientific endeavors, or know how they can effectively contribute to EOL. Unlike scientific journals and conferences, which have been around for centuries, content curation communities like EOL are a new and unfamiliar scientific genre. As was evident from the survey, scientists who were not familiar with EOL’s aim were hesitant to participate because they did not know what was expected of them (“[I] was not aware of requirements / needs of the program”). Others were confused about the professional prerequisites needed for an individual to be able to participate in the curation activities (“I doubt I am qualified”).

Lack of legitimacy of content curation work. Another problem is that the scientific community does not yet consider work on EOL as legitimate scientific activity. The scientific community can be thought of as a constellation of communities of practice (Wenger, 1998); it comprises individuals belonging to one or more research disciplines and subdisciplines (i.e., communities) that share a domain, a practice, and a sense of community. As communities of practice, scientific communities socially construct what constitutes a legitimate contribution to the community. Publishing in certain journals (and not others), presenting at certain conferences, reviewing papers, and receiving grants are all well-established academic practices that are legitimate activities among most scientific communities. Scientists, especially junior scientists, must present a strong research track, situated within the discipline’s major publication outlets and other legitimized activities, to advance (Latour & Woolgar, 1979).

Although content curation communities like EOL contribute indirectly to advancing science through improved outreach and provision of information that can support science, the work of curating existing scientific knowledge is not as valued by scientific communities as creating novel scientific knowledge. This marginalizes the contributions of content curators who are torn between using their precious time on traditional, legitimate activities and curation activities that are not yet fully legitimized, but are highly valued by the curators themselves and other interested parties (e.g., the public). The result is that curation activities are not considered a priority among many scientists who were surveyed: “This does not count with respect to my university accounting, not publications, not grants, not teaching, so it is very hard to find time to do this sort of thing.” Curation activities are also almost completely unsupported by the scientific community as a whole, as several curators commented: “It has been difficult to convince tenured scientists to contribute” and “A number of scientists in my field have expressed reservations about sharing content with EOL. So, there is not [sic] impetus from the community, and I have not pursued possible sharing of content due to the concerns that people have expressed.”

Recognizing the importance of individual attribution in scientific domains, EOL has documented each curator’s contributions. For example, following a link to “Who can curate this page?” on a species page takes a reader to a list of content curators. Clicking on any of their names will provide information and links to all of their contributions within the site. As one interviewee noted, this strategy is rewarding to some participants: “Having your content represented on EOL is more rewarding because your name is associated with it, people can know what you’ve done.” However, it does not address the larger issue of the scientific community recognizing these curation activities as legitimate contributions to science, as would be evidenced by, for example, tenure committees counting the number and quality of curated species pages or improvements to the EOL taxonomic classification scheme.

Lack of resources for content curation work. Issues of funding, time availability, and legitimacy of curation work are all intertwined. Almost 40% of the survey participants who were asked about their reason for not actively curating EOL content cited funding, recognition, and time constraints: “Lack of time/funding . . . I have no research grants . . . volunteer work is a luxury” and “Time is my biggest limitation. I’ve thought of using LifeDesk for the fauna of my region but need some support to do this; hint, hint.” Without formal recognition by the scientific community of open-access content curation efforts, significant sustainable funding will be lacking. This will keep curation activities in the realm of “volunteer work” rather than “scientific work,” which will in turn make it harder to receive certain types of funding. For now, the opportunity cost of the time spent curating at EOL is borne by the content curators themselves, which limits participation.

Professionals and nonprofessionals: Collaboration and conflict. A different social integration challenge that EOL faces is the need to bridge different contributor populations. Scientists and citizen scientists participate in scientific endeavors for different reasons: citizen scientists participate in collective scientific activities out of both egoistic and altruistic motivations, such as curiosity, sense of community, and commitment to conservation and related educational efforts (Evans et al., 2005; Raddick et al., 2010; Rotman et al., 2012), while professional scientists contribute to the formal processes and structure of “the scientific method” to advance science and further their own professional career (Cohn, 2008; Latour & Woolgar, 1979). Designing a community that appeals to these distinct populations can be challenging, although it also affords opportunities to leverage the differences to accomplish more than could be accomplished with only one of the groups.

EOL curators in the study held highly divergent views regarding the role citizen scientists can and should play in curation activities at EOL, and their willingness to work with them. The main reason that scientists were reluctant to collaborate with citizen scientists was their fear that citizen scientists do not uphold the same rigorous scientific standards that professional scientists are committed to: “I think that the benefit/cost ratio of participation by nonexperts shifts in groups that are popular with amateurs and untrained dilettantes.”

For other scientists who were interviewed, collaborations with citizen scientists or nonprofessionals were perceived as a net gain: “[Experts and nonexperts] contributing to the same resource pages . . . can be both useful and potentially damaging, though I haven’t seen any of the later” and “[I] really like working with the Flickr community. I like the images. [It is] always fun working with people and seeing their different approaches.” However, even those who supported collaboration with citizen scientists expressed a desire to distinguish clearly between trusted, professional scientists and untrusted nonprofessionals (e.g., “amateurs,” “untrained dilettantes,” “nonexperts”). Surveyed scientists expressed nearly universal desire to “vet,” “approve,” and “control” content submitted by citizen scientists, or at least have it be clearly differentiated from expert content (e.g., “content should be marked with different levels of trustability”). They preferred to retain control of their expert opinions and privileged status that gives them the final word. This view positions citizen scientists at the bottom of the curation ladder and professional scientists at the top, maintaining the traditional power balance between them and reducing some of the collaborative effect of content curation communities.

Discussion: Facing the Challenges That Content Curation Communities Present

Content curation communities flourish in various fields. From science to marketing to hobbies, they serve users and communities who find content aggregation, annotation, and archiving valuable. They may differ in the type of content they curate (images, texts, preformed data, etc.), in their use (personal or collective) and in their structure (centralized and hierarchical vs. decentralized and flat), but, surprisingly, many of them share similar challenges. Table 1 provides examples of content curation communities from different domains, and uses the “informational” and “social” challenges framework identified in EOL to comment on them. Although detailed analysis of each example was beyond the scope of this article, the table provides insights into the commonalities and differences among content curation communities.

When comparing the various examples of content curation communities with EOL (Table 1), we can clearly see that most similarities can be found between EOL and content curation communities in the academic domain, such as chemistry and linguistics. This is not surprising given the overarching purpose of these communities: to share and exchange data and knowledge among a broad spectrum of users, emphasizing professional academically oriented users. A more important distinction is that these academic communities need to uphold scientific principles and academic rigor, in both the work processes they endorse and the quality of the data they offer. These needs emphasize the challenges that pertain to information validation and structure as well as social integration aspects that are tied to the information. Communities addressing more “leisurely” activities, such as news consumption, reading, and other hobbies, represent a different genre of content curation, where the consumption of curated materials can be individual or collaborative. More importantly, the data quality issue, while relevant, can be less controversial and plays a smaller role in establishing the structure and the legitimacy of the community.

Information Integration

In this section, we address the second research question: “What are the key information and social design choices available to designers of content curation communities?” and place it in the broader context of scientific content curation communities. We outline design recommendations that may help to address the challenges related to information integration and social integration. Some of these recommendations defy neat classification as they apply to both themes, so they are discussed jointly.

The ongoing debates regarding standardization choices are a major challenge for information integration on EOL and other taxonomic content curation sites: Standardization, e.g., creating taxonomies and controlled vocabularies, provides a universal framework for scientists to communicate and it presents a shared representation of a field of knowledge that provides the structure for understanding and collaboration. Information standards can be viewed as boundary objects (Bowker & Star, 2000; Star, 2010) or boundary infrastructures (Star, 2010). Boundary infrastructures, in their broadest sense, are items that create a shared space or “a sort of arrangement that allow different groups to work together without consensus” but within organic settings that require them to address “information and work requirements” (Star, 2010, p. 602). EOL focuses on species classification systems that allow for communication between scientists with different subspecialties, languages, and resources. A careful approach is needed for choosing taxonomies that will serve as boundary infrastructures and hence support shared understanding and communication. Some possible options could include the following.

Option 1: Choose or develop a master standard. We use the example of classification to address the broader challenge of creating and maintaining uniform data standards. The initial approach was for EOL to select one master taxonomy to serve as a unifying boundary object; EOL decided on the COL as the default consensus classification. Although this approach is the most parsimonious, it is not without problems. As indicated by the findings above, many participants were not pleased with this option. In addition, using EOL as a platform to foster a new master or universal classification schema would drastically shift the focus of the project, in terms of purpose and resource allocation. It also may not be stable and entirely consistent, as agreement on breakpoints may not be universal or enduring. Therefore, although this option may be relevant to other content curation communities that face standardization challenges, it is not a plausible solution for EOL, unless it can be achieved in a fluid way that does not promise consistency or authority. This is reminiscent of the notion that these databases “do not sweep away the past but engage with it in dynamic fashion” (Hine, 2008, p. 150).

Option 2: Simultaneously support multiple standards. An alternative option is to do what EOL currently does, and allow content curators and users to choose which taxonomic classification to use when viewing species pages. It is clear that this approach helps support content curators’ preferences and allows for additional taxonomic information to be pulled into EOL. However, it does not push the larger community toward the creation of a unifying boundary standard, which could benefit the scientific community and enable more meaningful interdisciplinary exchanges. This option also necessitates extensive support for users, whether by providing detailed documentation and help and discussion pages, or by automating and streamlining the selection process.

Option 3: Flexibility in determining the default standard. Alternative options could also be explored. A compromise between the two options would be to support multiple taxonomic classifications, but let the content curators determine the default classification for any given set of species. This has the benefit of helping novices who may not know which standard is most appropriate to choose, as well as assisting in the development of a unified taxonomic classification. In addition, EOL could allow users and content curators to create and share lists of species (e.g., species in a particular ecosystem, species that are significant to users) to supplement taxonomic classifications as meaningful ways to browse and explore EOL.
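The compromise described in Option 3 can be illustrated with a minimal sketch. It assumes a hypothetical data model in which every taxon records its parent, curators may set a default classification on any group of species, and a species page inherits the default from its nearest ancestor that has one. The names `PARENT`, `CURATOR_DEFAULTS`, `SITE_DEFAULT`, and the classification labels are illustrative assumptions and do not describe EOL's actual implementation.

```python
from typing import Dict, Optional

# Hypothetical taxon hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Life": None,
    "Aves": "Life",
    "Passeriformes": "Aves",
    "Corvus corax": "Passeriformes",
    "Insecta": "Life",
}

# Hypothetical curator decisions: a default classification for a set of species.
CURATOR_DEFAULTS: Dict[str, str] = {
    "Aves": "classification_B",
}

# Site-wide fallback, used when no curator has chosen a default (assumption).
SITE_DEFAULT = "classification_A"


def default_classification(taxon: str) -> str:
    """Return the default set on the nearest ancestor, else the site default."""
    current: Optional[str] = taxon
    while current is not None:
        if current in CURATOR_DEFAULTS:
            return CURATOR_DEFAULTS[current]
        current = PARENT.get(current)
    return SITE_DEFAULT


def classification_for(taxon: str, user_choice: Optional[str] = None) -> str:
    """An explicit user choice wins; otherwise the curated default applies."""
    return user_choice or default_classification(taxon)
```

Under these assumptions, a novice who makes no choice sees the curated default for birds, whereas an experienced user can still switch to any other supported classification.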

Noticeably absent from the EOL participants’ dialogues around standards was a recognition of the potential role of information professionals. This is not surprising given the “invisible” nature of information work more generally (Bates, 1999; Ehrlich & Cash, 1999). Information professionals’ extensive experience working with multiple standards, open-access resources, content databases, and interfaces could be leveraged, as they bridge between the different subcommunities that form the larger content curation community, address standardization challenges, and create new mechanisms for data curation. EOL currently employs information professionals on the project, but may benefit from active solicitation of microcontributions from academic science librarians and other information professionals who would join the ranks of EOL curators. These microcontributions may include work related to standardization, as well as other traditional information work such as verifying citations and writing species summaries (a familiar activity to those who have done abstracting work).

Other factors that complicate the role of curators arise from the rapidly changing digital information world. Data curated in many content curation communities, and specifically scientific content curation communities, are submitted in different formats (text, tables, graphical, visualizations) using multiple media (e.g., static and dynamic visualizations, audio, video, and multimedia) with different technical standards. Knowing the source of the data is becoming increasingly complex. Content curation communities like EOL start to draw from not only scientific (e.g., GBIS), vetted, and known sources (e.g., Wikipedia) but also the growing number of volunteer citizen science projects that are springing up across the world (iNaturalist, iSpot, Project Noah). Keeping track of the sources of this data, checking the validity of the data, and ensuring appropriate credit is given to contributors will become increasingly complex as more people become involved.

To assist both curators and contributors, more visualization tools that clearly articulate changes over time could be developed, akin to “Historyflow,” which was used for the analysis of wiki contributions (Viégas, Wattenberg, & Dave, 2004) or “Tempoviz,” which is used for tracking networks of contribution over time (Ahn, Taieb-Maimon, Sopan, Plaisant, & Shneiderman, 2011; Bongwon, Chi, Kittur, & Pendleton, 2008). Visualizations can be a powerful tool at both the macrolevel and the microlevel. On the macrolevel, visualizations can be used to uncover the implicit activity behind the curation scene, as well as power relations between curators, data gaps, and areas that require intense community efforts. On the microlevel, visualization tools can offer curators and users insights about the history of each contribution: whether it replaced previous content, what its quality and value are as assessed by the community, and who curated or modified it.

Social Integration

Large-scale content curation communities such as EOL can succeed only if they find ways of facilitating effective collaboration among different groups of contributors, who bring significant domain-specific knowledge with them, whether they are professional credentialed scientists, volunteer ecology citizen scientists, or information professionals. We envision a successful model for a content curation community as one that is based on a hybrid structure, where professional scientists collaborate with citizen scientists, with the assistance of professional information specialists, all committed to the same idea of open-access content curation. To achieve this and overcome the social integration issues discussed above and issues pertaining to quality control, which are intertwined with social integration, we suggest the following design recommendations: (a) establishing active subgroups within large content curation communities, (b) establishing the role of open-access resources in the scientific community, (c) facilitating appropriate task routing, (d) recognizing other contributors’ efforts, and (e) building on and developing new techniques to motivate and recognize participation.

Establishing active subgroups within large content curation communities. The size and scope of crowdsourced content curation communities make community-building efforts difficult. One challenge is that members have diverse interests and are associated with multiple communities of practice, each of which has its own shared domain of interest (e.g., for EOL, insects, plant biology), mutual engagement (e.g., conferences, journals, relationships), and shared repertoire (e.g., terminology, values, ways of doing things). In their early stages of development, content curation communities like EOL focus on creating their own community of practice related to curation of content. When they grow sufficiently, we propose that they actively develop interconnected subgroups that mirror other existing communities of practice (e.g., for EOL, invasive species, birds, phylogenetics). Such subgroups could create master lists (“collections”) of pages of shared interest and could discuss relevant topics. This can promote users’ sense of community, emphasizing collaboration over individual data curation. By welcoming subgroups within their walls, content curation communities will not only be places from which to send and receive content; they will also become places to engage directly in the core practices of their members and subgroups.

Understanding the need to support the various practices and norms of its subcommunities, EOL is currently rolling out tools that will facilitate a federated network of subcommunities.

Establishing the value of community-curated content in the scientific community. Scientists are committed to the traditional standards that are set by their individual scientific communities. Currently, work on open-access data repositories is not highly valued in the scientific promotion process, or in acquiring funding for research. Several routes of action can help in changing this predicament:

• Active efforts to integrate community-curated data into scientific publications. This can be achieved by directing efforts to promote content curation communities and their products within the scientific community (in scientific journals and conferences). It can also be done structurally, by creating easy-to-use import tools that will enable scientists to pull curated data into their databases for continuous work (e.g., via the EOL API), or by consolidating vetted content into scientific publications (e.g., citing the curated source in background material, text mining the community for new insights, or aggregating and repurposing analyzable data). For example, the ChemSpider content curation community (Pence & Williams, 2010) is integrated with Royal Society of Chemistry journals and databases like PubMed, which use the curated content to facilitate new types of searches.

• Providing financial support for active content curators. EOL’s Rubenstein Fellowships program provides budding scientists with financial support and mentorship for up to 1 year. Rubenstein Fellowships are competitive and prestigious positions and can contribute to the scientists’ credentials and future funding. Similar efforts can help in establishing the scientists’ status within their respective scientific communities, and simultaneously enhance the role of the content curation communities as reputable entities within the scientific establishment. Interestingly, there is an emergence of a novel “biocurator” profession in molecular biology (Sanderson, 2011), suggesting that some scientific communities acknowledge the merits of such activities. However, developing and continuously supporting positions that uphold these activities requires a substantial financial investment that not all content curation communities can afford. This is especially true for budding content curation communities that arise out of community members’ interests and are not backed up by well-established institutions.

• Endorsing curation work at academic institutions. Many activities performed by scholars, such as reviewing articles and grants, serving on editorial boards, and engaging with local communities, are endorsed by academic institutions that expect their faculty members to engage in them even though they are not explicitly paid to do so. Recognizing contributions to content curation communities as a form of highly valued professional service, like journal reviewing, could go a long way towards motivating participation in curating tasks.

Facilitating the process: Getting the right task to the right people. In all content curation communities, but especially in hybrid ones that are built upon collaboration between different user populations, a special emphasis should be put on facilitating a streamlined curation process. Part of what makes peer-production communities, like those that develop open source software and Wikipedia, successful is the ability of individuals to assign themselves to the tasks that they can complete most readily (Benkler, 2002). Nobody knows a person’s interests and skills better than that person; furthermore, people are highly motivated to contribute when they feel that their contribution is needed and unique (Ling et al., 2005). What is needed, then, are tools that facilitate the matching of individuals’ skills and interests with curation needs. The use of Watchlists (as on Wikipedia) that notify a user when changes are made to a page they are “watching” can help people stay abreast of changes but does not point users to gaps in the existing content. Cosley, Frankowski, Terveen, and Riedl (2006) suggested bridging this gap in the Wikipedia community with their automatic task routing tool that uses past editing behavior to recommend new pages a user may want to edit. Similar approaches could be modified and applied to content curation communities.

Many other options exist as well, such as allowing individual content curators to post lists of microcontributions or “wanted ads” that other contributors can monitor (i.e., subscribe to) or have sent to them based on a profile they have filled out describing their interests and level of expertise. Posting EOL needs to the EOL Flickr group has worked well for soliciting needed photos, but this has not yet been scaled up or automated for other photos or other types of content. Such routing processes can also recommend relatively simple tasks at first and, after an “initiation” period, advance to more complex and high-level tasks, such as vetting resources. This would also allow users to grow and progress and work towards attaining a higher status within the content curation community. No matter the mechanism, helping potential contributors know what is needed and empowering them to accomplish those tasks is paramount to the success of content curation communities.
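The profile-based routing of “wanted ads” described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Cosley et al. (2006) and not an existing EOL tool: the `Task` and `Contributor` structures, the three-step difficulty scale, and the ranking rule (more shared topics first, easier tasks first among ties) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Task:
    title: str
    topics: Set[str]
    difficulty: int  # assumed scale: 1 = simple (add a photo), 3 = advanced (vet a resource)


@dataclass
class Contributor:
    name: str
    interests: Set[str]
    level: int  # highest difficulty unlocked after the "initiation" period


def recommend(contributor: Contributor, tasks: List[Task], limit: int = 3) -> List[Task]:
    """Rank open tasks by topical overlap, hiding tasks above the contributor's level."""
    eligible = [
        t for t in tasks
        if t.difficulty <= contributor.level and t.topics & contributor.interests
    ]
    # More shared topics first; among ties, easier tasks first.
    eligible.sort(key=lambda t: (-len(t.topics & contributor.interests), t.difficulty))
    return eligible[:limit]
```

In this sketch, a newcomer whose `level` is 1 is offered only simple tasks that match their stated interests; raising `level` later exposes vetting tasks without any other change to the routing rule.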

We have identified quality control, with regard to both content and curation, as a crucial factor for the success of a content curation community. Creating a mechanism for task routing can also allow for better quality control: Improved control and feedback of the work that is being done within the content curation community will rid the community of some of the inherent wariness of “dalliances” and “contamination” of resources and the hesitation about associating with nonprofessionals. Such task routing emulates the traditional scientific apprenticeship process, in which a novice is accepted into the scientific community through legitimate peripheral participation (Lave & Wenger, 1991). Through this process of initiation, citizen scientists and novice curators can become trustworthy and established partners in the content curation community. EOL expects to launch tools to support this process in the near future.

Recognizing other contributors’ efforts. To enable social integration, users first have to acknowledge other users’ presence and contributions to the community. In large-scale communities, such as EOL, this is an especially difficult task because of the federated nature of the activity. Therefore, explicit mechanisms are required to create such awareness. Portraying users and their contributions, by way of attributing the data to the users who curated it, on the front page of the community on a rotating basis, as well as on other designated pages, can increase other users’ awareness of them and heighten their status within the community (Kriplean et al., 2008; Preece & Shneiderman, 2009). EOL currently uses “featured pages,” as does Wikipedia, which could be augmented by “featured curators” or “featured contributor” pages to recognize important contributors, make clear to first-time visitors that it is a community-created site, and point to model content curators for others to emulate. Another form of attribution could be facilitated by creating a feedback mechanism that will notify users whether the data they contributed or curated was vetted and approved. Similarly, a feedback mechanism could also alert users to the use and reuse of the data they contributed, both within the community and by third parties, such as when it is downloaded, viewed, or cited in publications. Providing periodic personal reports of the number of data downloads, citations, and reuses, and sharing these statistics with the entire community, openly shows appreciation for contributors and heightens their social presence within the community. Scientists could include data from these reports on their curricula vitae to help legitimize the work being done.
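As a rough illustration of the periodic personal reports suggested above, the following sketch tallies reuse events per contributor. The event format (a contributor name paired with one of the kinds "download", "view", or "citation") and the function names are assumptions made for this example; they do not describe an existing EOL feature.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed event format: (contributor, kind), with kind in {"download", "view", "citation"}.
Event = Tuple[str, str]


def build_reports(events: Iterable[Event]) -> Dict[str, Counter]:
    """Tally reuse events per contributor for a periodic report."""
    reports: Dict[str, Counter] = defaultdict(Counter)
    for contributor, kind in events:
        reports[contributor][kind] += 1
    return dict(reports)


def format_report(name: str, counts: Counter) -> str:
    """Render one contributor's counts as a single line of text."""
    parts = [f"{kind}={counts[kind]}" for kind in ("download", "view", "citation")]
    return f"{name}: " + ", ".join(parts)
```

The same per-contributor tallies could be shared community-wide or exported for a curriculum vitae, as the paragraph above proposes.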

Other mechanisms that are needed for social integration are internal conflict resolution tools, such as Wikipedia “talk” pages (cf. Kittur & Kraut, 2008) or GenWeb’s “grievance committee.” The proposed federated subcommunities can provide a place to discuss disagreements and issues that stem from different disciplinary backgrounds and practices. In this context, dedicated and long-time contributors (both scientists and citizen scientists) may also be conferred the role of “mediators” in case topical and procedural conflicts arise. EOL is currently instituting a role of “master curators,” who will take upon themselves some of the aforementioned social facilitation roles.

Building on and developing new techniques to motivate and recognize participation. Some techniques for motivating contributions, such as “thank you” feedback notes, and recognizing contributors’ participation, such as reputation systems, have already been mentioned. But these examples are just the tip of the iceberg in promoting the social participation needed for content curation communities to flourish. iSpot, a site for teaching about biodiversity developed by the UK Open University (http://www.ispot.org.uk/), provides activities for students and the public to engage, motivate, and reward contributors. iSpot’s state-of-the-art reputation system combines recognizing expertise and accuracy of suggestions with effort and social interaction: Users who correctly helped identify species receive “reputation points” based on the level of agreement and helpfulness they reached. Reputation is based on collective agreement and discussion and cannot be easily manipulated. Building similar, task-based reputation systems is an important step towards improved social integration. An even bolder step would be to implement systems that “transfer” reputation from one content curation community to another. This way, participants need not start afresh when contributing to several communities, as their expertise will already have been substantiated. This is a potent idea, but one that needs to be implemented carefully: Transferring reputation is most relevant to content curation communities that share topical similarities, as different domains require different types and levels of knowledge and expertise.
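To make the two ideas in this paragraph concrete, the sketch below shows one possible agreement-based scoring rule and one possible way to discount reputation when it is carried to another community. It is not iSpot's algorithm, whose details are not given here; the weighting formula, the `base` parameter, and the `topical_similarity` score between 0 and 1 are assumptions chosen purely for illustration.

```python
from typing import List


def award_points(agreeing_reputations: List[float], base: float = 1.0) -> float:
    """Points for one identification, scaled by who agreed with it.

    Each agreeing user contributes a weight of 1 plus their own reputation,
    so agreement from established users counts for more (assumed rule).
    """
    return base * sum(1.0 + rep for rep in agreeing_reputations)


def transfer_reputation(reputation: float, topical_similarity: float) -> float:
    """Carry reputation to another community, discounted by topical similarity."""
    if not 0.0 <= topical_similarity <= 1.0:
        raise ValueError("topical_similarity must be between 0 and 1")
    return reputation * topical_similarity
```

With this rule, an identification nobody agrees with earns nothing, and reputation moved between unrelated domains (similarity near 0) is almost entirely discounted, which reflects the caution expressed above about transferring expertise across dissimilar communities.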

Conclusion

In this article, we define a new genre of online community, “content curation communities,” using EOL as a case study. We conducted a multimethod analysis, combining survey and interview data with observations and analysis of online interaction. Our aim was to explore and understand the challenges faced by content curation communities and to recommend potential design solutions. Although our focus has been on EOL, we believe many of the challenges and design recommendations will apply to other content curation communities as well.

The contribution of citizen scientists, naturalists, and enthusiasts to the development of science, specifically astronomy, biology, and geology, has been well documented (Ellis & Waterton, 2004; Schiff, Van House, & Butler, 1997; Van House, 2002b). In the past, their participation was usually considered peripheral and limited to one-sided data transfer, with little or no mutual knowledge exchange between scientists and enthusiasts, but this is changing. Content curation communities offer the opportunity to shift the balance towards more collaborative practices, where both scientists and citizen scientists work together to create a vetted and reputable repository that can be used for traditional scientific endeavors and for education beyond the scientific community.

Bringing together heterogeneous populations of contributors is challenging from both practical and social standpoints. Creative design solutions, such as those discussed above, may ameliorate these challenges.

EOL is larger and more ambitious than most content curation communities, yet it can be viewed as a typical and representative instance of the new genre of collaborative information resources. It is still evolving and growing, and in its growth, it is currently facing the challenges outlined in this article. By rethinking and redesigning some of the online tools available to content curators, the community is attempting to face these challenges. This is done based on not only the lessons learned from the daily operation of EOL and the findings of this study but also feedback received from EOL curators. An approach that speaks to contributors’ needs and allows their voice to be heard may not only support the design changes of the content curation community but also enhance contributors’ sense of community and their willingness to further contribute to the community in the future. EOL is already implementing some of these changes in its current version that was recently released. These changes will be the basis for future studies of content curation communities.

It should be noted that the study is not devoid of limitations. The concept of content curation communities is a relatively new one, and although we have attempted to focus on the most pressing issues faced by EOL, other types of content curation communities, within and outside the scientific realm, may face other challenges that were not addressed in this article. Comparable examples of collaborations between scientists and citizen scientists in other disciplines (e.g., astronomy or chemistry) may surface different challenges. Similarly, our analysis of scientific content curation communities follows the current structure of the academic world. The advance of crowdsourcing citizen-science projects may bring about a fundamental change to how science is done and to the balance between scientists and volunteers. This change will necessitate a new look at the informational and societal infrastructures that facilitate scientific work. Despite these limitations, we believe the concept of content curation communities is an enduring and useful frame for starting a dialogue about related communities. Furthermore, the focus on information and social challenges seems to apply well to multiple content curation communities, as shown in Table 1.

Future research on content curation communities should encompass other important issues as follows: (a) motivational factors affecting professionals’ and volunteers’ participation in content curation communities, (b) the role of information professionals within content curation communities, (c) strategies for developing a strong sense of community within content curation communities, and (d) different types of content curation communities and design strategies that support them. Our future work will address these topics.

References

Ahn, J., Taieb-Maimon, M., Sopan, A., Plaisant, C., & Shneiderman, B. (2011). Temporal visualization of social network dynamics: Prototypes for nation of neighbors. In J. Salerno, S. Yang, D. Nau, & S.K. Chai (Eds.), Social Computing, Behavioral-Cultural Modeling and Prediction (Vol. 6589, pp. 309–316). Berlin: Springer. doi:10.1007/978-3-642-19656-0_43

Bates, M.J. (1999). The invisible substrate of information science. Journal of the American Society for Information Science and Technology, 50(12), 1043–1050.

Benkler, Y. (2002). Coase’s Penguin, or, Linux and “The Nature of the Firm.” The Yale Law Journal, 112(3), 369–446.

Birnholtz, J.P., & Bietz, M.J. (2003). Data at work: Supporting sharing in science and engineering.

Bongwon S., Chi, E.H., Kittur, A., & Pendleton, B.A. (2008). Lifting the veil: Improving accountability and social transparency in Wikipedia with wikidashboard. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08) (pp. 1037–1040). New York: ACM Press.

Borgman, C.L. (2007). Scholarship in the digital age: Information, infrastructure, and the Internet. Cambridge, MA: The MIT Press.

Bos, N., Zimmerman, A., Olson, J., Yew, J., Yerkie, J., Dahl, E., & Olson, G. (2007). From shared databases to communities of practice: A taxonomy of collaboratories. Journal of Computer-Mediated Communication, 12(2), 652–672.

Bowker, G.C., & Star, S.L. (2000). Sorting things out: Classification and its consequences. Cambridge, MA: The MIT Press.

Buneman, P., Cheney, J., Tan, W.-C., & Vansummeren, S. (2008). Curated databases. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’08) (pp. 1–12). New York: ACM Press.

Cohn, J.P. (2008). Citizen science: Can volunteers do real research? BioScience, 58(3), 192–197.

Corbin, J., & Strauss, A.C. (2007). Basics of qualitative research: Techniques and procedures for developing grounded theory (3rd ed.). Thousand Oaks, CA: Sage.

Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2006). Using intelligent task routing and contribution review to help communities build artifacts of lasting value. In R. Grinter, T. Rodden, P. Aoki, E. Cutrell, R. Jeffries, & G. Olson (Eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’06) (pp. 1037–1046). New York: ACM Press. doi:10.1145/1124772.1124928

Eisenhardt, K.M. (1989). Building theories from case study research. The Academy of Management Review, 14(4), 532–550.

Ellis, R., & Waterton, C. (2004). Environmental citizenship in the making: The participation of volunteer naturalists in UK biological recording and biodiversity policy. Science and Public Policy, 31(2), 95–105.

Encyclopedia of Life. (2011). Retrieved from www.eol.org/about

Ehrlich, K., & Cash, D. (1999). The invisible world of intermediaries: A cautionary tale. Computer Supported Collaborative Work, 8(1–2), 147–167.

Evans, C., Abrams, E., Reitsma, R., Roux, K., Salmonsen, L., & Marra, P.P. (2005). The neighborhood nestwatch program: Participant outcomes of a citizen science ecological research project. Conservation Biology, 19(3), 589–594.

Feagin, J.R., Orum, A.M., & Sjoberg, G. (Eds.). (1991). A case for the case study. Chapel Hill, NC: The University of North Carolina.

Finholt, T.A. (2002). Collaboratories. Annual Review of Information Science and Technology, 36(1), 73–107.

Forte, A., & Bruckman, A. (2005). Why do people write for Wikipedia? Incentives to contribute to open-content publishing. Proc. of GROUP 05 Workshop: Sustaining Community: The Role and Design of Incentive Mechanisms in Online Systems. Sanibel Island, FL. 6–9.

Gonzalez-Oreja, J.A. (2008). The Encyclopedia of Life vs. the brochure of life: Exploring the relationships between the extinction of species and the inventory of life on Earth. Zootaxa, 1965, 61–68.

Hansen, D.L., Ackerman, M., Resnick, P., & Munson, S. (2007). Virtual community maintenance with a repository. In Proceedings of the American Society for Information Science and Technology, 44(1), 1–20.

Hara, N., Solomon, P., Kim, S.L., & Sonnenwald, D.H. (2003). An emerging view of scientific collaboration: Scientists’ perspectives on collaboration and factors that impact collaboration. Journal of the American Society for Information Science and Technology, 54(10), 952–965.

Hine, C. (2008). Systematics as cyberscience: Computers, change and continuity in science. Cambridge, MA: MIT Press.

Hughes, B. (2005). Metadata quality evaluation: Experience from the open language archives community. Digital Libraries: International Collaboration and Cross-Fertilization, 135–148.

Kittur, A., Chi, E., Pendleton, B.A., Suh, B., & Mytkowicz, T. (2007).Power of the Few vs. Wisdom of the Crowd: Wikipedia and the Rise ofthe Bourgeoisie. Alt.CHI, 2007. San Jose, CA.

Kittur, A., & Kraut, R.E. (2008). Harnessing the wisdom of crowds inwikipedia: Quality through coordination. In Proceedings of the 2008ACM Conference on Computer Supported Cooperative Work (CSCW‘08) (pp. 37–46). New York: ACM Press.

Kriplean, T., Beschastnikh, I., & McDonald, D.W. (2008). Articulations ofwikiwork: uncovering valued work in wikipedia through barnstars. InProceedings of the 2008 ACM Conference on Computer SupportedCooperative Work (CSCW ‘08) (pp. 47–56). New York: ACM Press.

Lampe, C., & Resnick, P. (2004). Slash (dot) and burn: Distributed mod-eration in a large online conversation space. In Proceedings of theSIGCHI Conference on Human Factors in Computing Systems (CHI‘04) (pp. 543–550). New York: ACM Press.

Latour, B., & Woolgar, S. (1979). Laboratory life: The construction ofscientific facts: Princeton Univ Press.

Lave, J., & Wenger, E. (1991). Situated Learning: Legitimate peripheralparticipation. Cambridge, UK: Cambridge University Press.

Li, Q., Cheng, T., Wang, Y., & Bryant, S.H. (2010). PubChem as a publicresource for drug discovery. Drug Discovery Today, 15(23–24), 1052–1057. doi: 10.1016/j.drudis.2010.10.003

Ling, K., Beenen, G., Ludford, P., Wang, X., Chang, K., Li, X., Cosley, D.,Frankowski, D., Terveen, L., Al Mamunur, R., Resnick, P., & Kraut, R.(2005). Using social psychology to motivate contributions to onlinecommunities. Journal of Computer-Mediated Communication, 10(4),Article 10.

Maddison, D.R., Schulz, K.S., & Maddison, W.P. (2007). The tree of lifeweb project. Zootaxa, 1668, 19–40.

Nov, O. (2007). What motivates wikipedians? Communications of theACM, 50(11), 60–64.

O’Hara, K. (2008). Understanding geocaching practices and motivations. In Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems (CHI ’08) (pp. 1177–1186). New York: ACM Press.

Orrell, T. (2011). ITIS: The integrated taxonomic information system. Retrieved from http://www.catalogueoflife.org/col

Patton, M.Q. (1990). Qualitative evaluation and research methods. Newbury Park, London: Sage.

Pence, H.E., & Williams, A. (2010). ChemSpider: An online chemical information resource. Journal of Chemical Education, 87(11), 1123–1124.

Preece, J. (2000). Online communities: Designing usability, supporting sociability. Chichester, UK: John Wiley & Sons.

Preece, J., & Shneiderman, B. (2009). The reader-to-leader framework: Motivating technology-mediated social participation. AIS Transactions on Human-Computer Interaction, 1(1), 5.

Raddick, J., Lintott, C.J., Schawinski, K., Thomas, D., Nichol, R.C., Andreescu, D., Bamford, S., Land, K.R., Murray, P., Slosar, A., Szalay, A.S., & Vandenberg, J. (2007). Galaxy zoo: An experiment in public science participation. Bulletin of the American Astronomical Society, 39, 892.

Raddick, M.J., Bracey, G., Gay, P.L., Lintott, C.J., Murray, P., Schawinski, K., Szalay, A., & Vandenberg, J. (2010). Galaxy zoo: Exploring the motivations of citizen science volunteers. Astronomy Education Review, 9(1), 010103. doi: 10.3847/AER2009036

Rotman, D., & Preece, J. (2010). The “WeTube” in YouTube: Creating an online community through video sharing. International Journal of Web-based Communities, 6(2), 317–333.

Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C.S., Lewis, D., & Jacobs, D. (2012). Dynamic changes in motivation in collaborative ecological citizen science projects. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW ’12). New York: ACM Press.

Sanderson, K. (2011). Bioinformatics: Curation generation. Nature, 470(7333), 295–296.

Schiff, L.R., Van House, N.A., & Butler, M.H. (1997). Understanding complex information environments: A social analysis of watershed planning. In Proceedings of the Second ACM International Conference on Digital Libraries (DL ’97) (pp. 161–168). New York: ACM Press.

Star, L.S. (2010). This is not a boundary object: Reflections on the origin of a concept. Science, Technology & Human Values, 35(5), 601–617.

Sullivan, B., Wood, C.L., Iliff, M.J., Bonney, R.E., Fink, D., & Kelling, S. (2009). eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation, 142(10), 2282–2292.

Tautz, D., Arctander, P., Minelli, A., Thomas, R.H., & Vogler, A.P. (2003). A plea for DNA taxonomy. Trends in Ecology & Evolution, 18(2), 70–74.

Van House, N.A. (2002a). Digital libraries and practices of trust: Networked biodiversity information. Social Epistemology, 16(1), 99–114.

Van House, N.A. (2002b). Trust and epistemic communities in biodiversity data sharing. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’02) (pp. 231–239). New York: ACM Press.

Viégas, F.B., Wattenberg, M., & Dave, K. (2004). Studying cooperation and conflict between authors with history flow visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’04) (pp. 575–582). New York: ACM Press.

Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge, MA: Cambridge University Press.

Wheeler, Q. (2008). The new taxonomy. New York: CRC Press.

Wilson, E.O. (2003). The encyclopedia of life. Trends in Ecology & Evolution, 18(2), 77–80.

Yates, D., Wagner, C., & Majchrzak, A. (2010). Factors affecting shapers of organizational wikis. Journal of the American Society for Information Science and Technology, 61(3), 543–554.

Yin, R.K. (2008). Case study research: Design and methods (4th ed.). Thousand Oaks, CA: Sage.

Zimmerman, A. (2007). Not by metadata alone: The use of diverse forms of knowledge to locate data for reuse. International Journal on Digital Libraries, 7(1), 5–16.