Supporting Content Curation Communities: The Case of the Encyclopedia of Life
Dana Rotman, College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]
Kezia Procita, College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]
Derek Hansen, College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]
Cynthia Sims Parr, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013. E-mail: [email protected]
Jennifer Preece, College of Information Studies, University of Maryland, College Park, College Park, Maryland 20742. E-mail: [email protected]
This article explores the opportunities and challenges of creating and sustaining large-scale “content curation communities” through an in-depth case study of the Encyclopedia of Life (EOL). Content curation communities are large-scale crowdsourcing endeavors that aim to curate existing content into a single repository, making these communities different from content creation communities such as Wikipedia. In this article, we define content curation communities and provide examples of this increasingly important genre. We then follow by presenting EOL, a compelling example of a content curation community, and describe a case study of EOL based on analysis of interviews, online discussions, and survey data. Our findings are characterized into two broad categories: information integration and social integration. Information integration challenges at EOL include the need to (a) accommodate and validate multiple sources and (b) integrate traditional peer-reviewed sources with user-generated, nonpeer-reviewed content. Social integration challenges at EOL include the need to (a) establish the credibility of open-access resources within the scientific community and (b) facilitate collaboration between experts and novices. After identifying the challenges, we discuss the potential strategies EOL and other content curation communities can use to address them, and provide technical, content, and social design recommendations for overcoming them.
Introduction
Scientific progress is, in large part, dependent on the development of high-quality shared information resources tailored to meet the needs of various scientific communities. Traditionally created and maintained by a handful of paid scientists or information professionals, scientific information resources such as repositories, databases, and archives are increasingly being crowdsourced to professional and nonprofessional volunteers in what we define as “content curation communities.” Content curation communities are distributed communities of volunteers who work together to curate data from disparate resources into coherent, validated, and oftentimes freely available repositories. Content curation communities helped to develop resources on drug discovery (Li, Cheng, Wang, & Bryant, 2010), worm habitats or bird migration (Sullivan et al., 2009), astronomic shifts (Raddick et al., 2007), and language (Hughes, 2005), to name a few.
Received July 8, 2011; revised December 14, 2011; accepted December 15, 2011
© 2012 ASIS&T • Published online 28 March 2012 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22633
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 63(6):1092–1107, 2012
Content curation communities are related to content creation communities like Wikipedia, but face different challenges, as curation and creation are fundamentally different activities. Both require a community of contributors to help maintain free content, but the tasks performed by the contributors differ, as does the requisite skill set. Content curation communities aggregate, validate, and annotate existing content, with its associated intellectual property claims, into a comprehensive repository, while content creation communities avoid intellectual property concerns by creating their own content from scratch. Both content creation and content curation communities share some aspects with organizational repositories, mainly the need to “shape” knowledge—whether created or curated—into a more useful format that will create a “dynamic memory system” (Yates, Wagner, & Majchrzak, 2010) serving the community members. As such, both communities fill important niches in the information landscape and have already proven their worth. Although content creation communities, particularly Wikipedia, have been examined extensively to uncover the principles that have led to their successes and failures (cf. Forte & Bruckman, 2005; Kittur, Chi, Pendleton, Suh, & Mytkowicz, 2007; Kriplean, Beschastnikh, & McDonald, 2008; Nov, 2007), there have not been comparable studies of content curation communities. To realize the ambitious goals of such communities, we must understand how to effectively design the technologies, information standards, processes, and social practices that support them.
The goal of this article is to characterize content curation communities and their increasing prominence in the information landscape, and then to use the formative years of the Encyclopedia of Life (EOL) as a case study for providing insights into the design of such communities. Our key research questions are as follows:
• What are the unique informational and social challenges experienced by EOL, as an example of a content curation community?
• What are the key information and social design choices available to designers of content curation communities?
Content Curation Communities
Content curation communities can be defined by describing their core elements:
• Community: The word “community” emphasizes that a large number of people are involved in some type of collective effort. Although not necessary, they typically include volunteers and reflect the social roles and participation patterns that mirror those of other types of online communities (Preece, 2000). Much of the curation effort is done individually, but what separates content curation communities from personal curation is the coordinated curation work that pulls content into a mass repository.
• Content: The word “content” conveys the primary emphasis of the community. Curation communities exist for the purpose of providing content, which can take on many forms such as text, multimedia (e.g., images, videos), and structured data (e.g., metadata). The content is typically free and web-based, but doesn’t necessarily have to be. The nature and scope of the content is based on topics and resources that the specific community finds valuable enough to curate.
• Curation: The word “curation” describes the type of work that is performed by these communities; it is their raison d’être. Curation connotes the activities of identifying, selecting, validating, organizing, describing, maintaining, and preserving existing artifacts. This term follows the term “content” because the artifacts being curated by the community are content, not necessarily the objects themselves (e.g., rocks or animals aren’t curated, but content about them is curated). Curators of museum objects or physical samples are typically experts in their domain, suggesting that this role requires some level of content expertise, which can range from lay interest in the specific domain to professional-level expertise, although this is not a requirement for the definition.
Although there are various examples of content curation communities (see Table 1), in this article we focus on a scientific content curation community, which builds on a long tradition of sharing and curating scientific data and literature. The scientific enterprise has always been collaborative. However, the advent of the Internet has enabled large distributed scientific communities to collaborate at a scale and pace never before realized (Borgman, 2007). Distributed communities of scientists were coined “collaboratories,” a term first used in the early 1980s to indicate a laboratory without walls where data are shared via advances in information technology, independent of time and location (Bos et al., 2007; Finholt, 2002). Collaboratories and their more recent instantiations led to an increase in the diversity of participants and participation levels (Birnholtz & Bietz, 2003; Bos et al., 2007; Finholt, 2002). Yet issues of data reuse, upholding scientific standards, and maintaining high-quality data persist (Zimmerman, 2007).
Bos and his colleagues (2007) identified seven distinct types of collaboratories, including “community data systems,” where scientific data is created, maintained, and improved by a geographically distributed community. Unlike content curation communities, which strive to aggregate various types of content, whether traditional peer-reviewed scientific data or user-generated data, community data systems and related efforts such as “curated databases” (Buneman, Cheney, Tan, & Vansummeren, 2008) focus solely on curating scientific data. In many of these community data systems, volunteers play the role of content contributor by submitting new data that are then vetted and annotated by paid professionals. Content curation communities are not primarily about collecting original data. For example, the EOL reverses the traditional community data system approach by having its volunteer content curators annotate and vet data that have already been collected in community data systems or other publications. In many of these cases, the EOL curators’ primary contribution is to identify and combine data from reputable sources into a meaningful whole.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—June 2012 1093DOI: 10.1002/asi
TABLE 1. Comparison of various genres of content curation communities.

Ecology—Encyclopedia of Life (EOL)
• Curation purpose: Aggregating information about every living species on earth. Focus on a broad community of users, ranging from the general public to academics.
• Structure: Hierarchical structure – council and executive committee in charge of setting general goals and policies, working groups in charge of day-to-day work, and volunteer curators who are in charge of the data.
• Information integration: Pulling data from various resources (open access and scientific resources). Different data formats. Data needs validation and quality control. Use of different classification schemes.
• Social integration: Interaction with various groups, organizations, and individuals coming from different domains – a need to establish working processes and trust between them. Recognition of the value of content curation within the scientific community.
• Coordination mechanisms: Detailed FAQ, mailing list, meetings in person with some curators.

Language and information research—Open Language Archives (OLAC) (www.languagearchives.org)
• Curation purpose: Harvesting linguistic resources, indexing, archiving, and standardizing them for use by linguists and others. Focus on a broad community of users.
• Structure: Hierarchical structure – governing advisory board of respected peers; council, approved by the advisory board, making decisions for the community; working groups – ad hoc associations that draft standards, notes, and documentation (that are approved by the council). Individuals volunteer to participate in the community efforts and can advance to governing roles in time.
• Information integration: Pulling data from numerous (sometimes conflicting) resources. Standardization process of data and resources, creating an accepted controlled vocabulary. Metadata standardization has to be approved by the governing advisory board and council – a long process, sometimes dependent on individuals coming from different domains with different viewpoints.
• Social integration: Interaction with various groups, organizations, and individuals coming from different domains. Very centralized organizational structure. Task regulation and division of labor are not always clear.
• Coordination mechanisms: Web portal for the community, extensive documentation, Q&A, directory of community members.

Sciences—ChemSpider (www.chemspider.com/)
• Curation purpose: Database for curating chemical structures, physical properties, images, spectra, and other related information. Connected to many other databases. Focus on a broad community of users – students, academics, industry.
• Structure: Semihierarchical structure – governed by the Royal Society of Chemistry (UK). Users can comment on others’ actions; registered users can curate names, links, spectra, and other resources. Master curators validate and approve the data. Organizations (university departments, research groups) can deposit data. Prerequisites – registration and basic knowledge of chemistry.
• Information integration: Pulling data from various resources (open access resources – Wikipedia; open and private repositories). Not all data is vetted, and it requires validation according to scientific standards. Links to chemical vendors’ catalogues raise issues of commercialism. No controlled vocabulary or semantic markup.
• Social integration: Users need to formally apply to attain certain curation roles – privileges and access to curation are granted based on credentials. Some users do not have formal credentials and thus are barred from playing a more active role in the community. Emphasis on building intra-community reputation through microcontributions.
• Coordination mechanisms: Discussion forum and pages for data-integration issues. A formal blog published by the RSC.

Literature—Project Gutenberg (www.gutenberg.org/)
• Curation purpose: Digitizing copyright-free and public-domain books. Focus on public access to cultural works.
• Structure: Purposefully decentralized – community members decide independently what to curate. A “posting team” cleans and approves works for publication.
• Information integration: No comprehensive list of cultural works in need of publication. The decentralized structure and lack of a “selection policy” lead to cases where numerous volunteers are unknowingly working on the same book/cultural piece. Proofreading standards are upheld, yet project metadata is not always available.
• Social integration: Little attribution. In most cases there is no attribution to members who cleaned and validated the data, just to volunteers who digitized (i.e., produced) it.
• Coordination mechanisms: Mailing list, private emails, extensive FAQ documentation, and newsletters.

Genealogy—GenWeb (usgenweb.org)
• Curation purpose: Providing resources for genealogy research for every county in the United States. Focus on individual genealogy researchers.
• Structure: Completely decentralized – volunteers “adopt” one (or more) county and produce all the data/links for that area. A coordinating team and project coordinators are in charge of each state, as well as linking individual resources to the general website and resolving conflicts, but have minimal control over content. No clear process for becoming a member of the coordinating team.
• Information integration: Varying levels of completeness and accuracy, dependent on the individual volunteer. Federated repositories do not allow for comprehensive search of databases. Different data formats, no controlled vocabulary.
• Social integration: Mostly volunteers act individually, with little interaction with others or with the coordinating team.
• Coordination mechanisms: Mailing lists and private e-mails. A grievance committee resolves conflicts between members.

Crafts—Ravelry (www.ravelry.com)
• Curation purpose: Curating and sharing yarn-based crafts, projects, ideas, and tools. Focus on a broad community of crafters.
• Structure: Emphasizes individual curation and social networking. Each member creates their content and can deposit data into a collective database. Other members can comment. Annotation is done only by members who have joined an editing group, and only for data that is open to editing. Removal of data is only done by selected editors.
• Information integration: Data validation. No controlled vocabulary, standardization, or regulation regarding annotation or contribution of data makes search difficult.
• Social integration: Social integration is done mostly on the social networking tools and less on the database components of the community.
• Coordination mechanisms: Wiki, forums, subgroups, social networking (comments, friends lists). Specific discussion forums for data editing and cleaning purposes.

Web index—Open Web Directory (www.dmoz.org)
• Curation purpose: Bookmarking, cataloguing, and curation of web pages. Focus on curation for the greater good.
• Structure: Hierarchical structure: editors have to apply for a position. Senior editors approve junior editors’ applications, curated content, and editorial suggestions. Curation is done on all levels but has to be vetted. Admins control all editing privileges and content. Long-serving senior editors can become admins through recommendation from other admins.
• Information integration: Data are structured ontologically, based on existing formal cataloguing principles (usenet inspired). Changes to the structure are made through discussion among the senior editors until reaching consensus. No annotation tools available. Different vocabularies for general content and adult content. Difficulty maintaining quality content and removing spam.
• Social integration: The community promoted self-regulation; therefore no strict policies for social integration were publicly published. Allegations of blacklisting editors and content, personal conflicts, and lack of control over removal of editing privileges cause fragmentation in the community.
• Coordination mechanisms: Discussion forums open only to editors. Although the site was one of the first examples of content curation (established in 1998), it lost some of its appeal due to the rise of wikis and folksonomies, and limited its work beginning in 2007.
In addition to being an influence on collaboratories, scientific content curation communities are also a subset of the growing trend of crowdsourcing scientific work to interested enthusiasts and volunteers. Crowdsourced projects can be found across academic domains, from astronomy (Galaxy Zoo, Cerberus) to biochemistry (FoldIt, ChemSpider), geography and climatology (Old Weather Tracking), history (Civil War Diaries and Letters transcription project, FamilySearch, The Great War archive), archeology (Ancient Lives), and even peer review (IMPROVER). The premise of crowdsourced projects is that engaging a large number of users in solving complex problems that require many resources (time, effort, expertise, or computing power) will facilitate quick and potentially novel findings. The website scistarter.com lists more than 400 crowdsourced scientific projects across different domains. From this list we can learn that crowdsourced projects are more prevalent in domains that are labor intensive and where large-scale questions can be broken down into smaller, manageable tasks. Most crowdsourced scientific projects focus on generating new data rather than curating existing data. Content curation communities are less popular than scientific crowdsourcing projects for various reasons that will be outlined in this article.
Content curation communities have also been influenced by more generic social media initiatives and tools. As web content has proliferated, the need to curate general content, and to separate the wheat from the chaff, has increased. For example, a number of news sites curate stories of interest. Some, like Google News, use automatic processing to group similar articles from reputable sources. Others, such as Slashdot, rely on user-generated ratings to filter out the low-quality discussions about news (Lampe & Resnick, 2004). Social sharing sites like Diigo, Flickr, and YouTube also allow popular content to rise to the top (e.g., Rotman & Preece, 2010). Although many of these approaches work well at identifying “popular” content, they are not designed for situations where specialized expertise is needed, making it unclear how to apply similar design strategies to scientific domains. Many online communities use Frequently Asked Questions (FAQs) and wiki repositories (Hansen et al., 2007) as a way of collecting high-quality content, but unlike content curation communities, they are focused on their own community content, not content created by other sources. One of the goals of this article is to learn from strategies employed by existing social media tools to effectively aid in negotiating the challenges faced by content curation communities.
EOL
Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page . . . comprises a summary of everything known about the species’ genome, proteome, geographical distribution, phylogenetic position, habitat, ecological relationships and, not least, its practical importance for humanity. (Wilson, 2003, p. 77)
The EOL is “a sweeping global effort to gather and share the vast wealth of information on every creature—animals, plants and microorganisms—and make it available as a web-based resource” (EOL, 2011). EOL is an open-access aggregation portal (www.eol.org). One of its bold goals is to engage individuals, including scientists and nonscientists, in contributing to scientific information curation, so that they will “study, protect and prolong the existence of Earth’s precious species” (EOL, 2011). EOL is structured such that each species is presented on a dedicated page, where all available and relevant information, vetted and nonvetted alike, is presented. Currently, there are 3 million pages, representing 1.9 million known species on Earth. Most of these are stubs, but trusted content is available on over 600,000 pages that collectively are viewed by nearly 3 million visitors a year. Over 40,000 people have registered on the site. Paid staff members include information, education, and technical professionals. Contributors include interested volunteers, who can freely contribute information about organisms, as well as credentialed scientists and “citizen scientists” (Cohn, 2008), who agree to curate information on EOL. Seven hundred and fifty content curators are associated with EOL, 200 of whom are currently active. A small number of scientific contributors and content curators receive temporary part-time financial support from a competitive Fellows program, but most content on EOL arises through volunteer partnerships with database and library digitization projects and citizen contributions. All the vetted and unvetted content is freely available online.
EOL’s curation work process brings together automatic tools and human input: EOL automatically aggregates objects such as images or text onto publicly available pages based on metadata from various content sources. Curators are granted access to controls, allowing them to “trust,” “untrust,” and “hide” objects such as images or text. EOL curators cannot change the objects, but they can leave comments and make a trust/untrust/hide decision. The system shares these comments and curator actions with the original content providers, who may choose to make a correction and resubmit their data. Curators can rate objects on EOL based on their comprehensiveness and quality. EOL uses this rating to sort the objects on the page so the highest rated objects show first. To a lesser extent, curators are also encouraged to be contributors by adding their own text to pages or improving third-party content (e.g., editing Wikipedia pages) before importing it into EOL. Although there is no formal training and accreditation process for curators, they are expected to learn about EOL’s policies and tools by reading documentation on EOL’s website or by following discussions in an opt-in mailing list.
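As a rough illustration, the trust/untrust/hide controls and rating-based ordering described above can be modeled as a small data structure. This is a hypothetical sketch under our own naming, not EOL’s actual implementation or API:

```python
from dataclasses import dataclass, field

# Hypothetical model of the curation workflow described in the text:
# curators cannot edit aggregated objects, only flag, comment on, and
# rate them; pages show visible objects with the highest-rated first.
# All names here are illustrative.

@dataclass
class DataObject:
    content: str                 # e.g., an image URL or a block of text
    source: str                  # the original content provider
    status: str = "unreviewed"   # "trusted", "untrusted", or "hidden"
    rating: float = 0.0
    comments: list = field(default_factory=list)

def curate(obj, action, comment=None):
    """Record a curator decision; the object itself is never edited."""
    assert action in ("trust", "untrust", "hide")
    obj.status = {"trust": "trusted", "untrust": "untrusted", "hide": "hidden"}[action]
    if comment:
        # Comments are relayed to the provider, who may resubmit corrected data.
        obj.comments.append(comment)

def page_view(objects):
    """Visible objects, highest-rated first, as on a species page."""
    visible = [o for o in objects if o.status != "hidden"]
    return sorted(visible, key=lambda o: o.rating, reverse=True)
```

The key design point this captures is the separation of concerns: provenance and content stay with the provider, while the community layer holds only judgments (status, rating, comments) about each object.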
Curator activity is only loosely monitored by EOL staff and other curators, and social conflicts, as discussed later, are usually resolved through backchannels such as personal e-mails. To retain established curators, EOL emphasizes personal feedback and recognition through e-mails, but also public recognition through highlighting curators’ work on their personal EOL pages. Curators are also invited to test new features and provide input into the design of future tools. EOL predecessors include sites that provide freely available, biodiversity-focused, and comprehensive species coverage, such as the Tree of Life Web project (TOLWeb), which was created with a focus on biodiversity and evolution to share with readers the “interrelationships of earth’s wealth of species.” Unlike EOL, TOLWeb is a content creation community where preapproved users from the professional science community directly add information (Maddison, Schulz, & Maddison, 2007).
EOL derives its content from scientific content partners, including many of its predecessors, but also from user-generated content: In December of 2009, EOL began a content partnership with Wikipedia, importing Wikipedia species pages into EOL. Although the addition of these articles increased EOL’s species coverage considerably, it was controversial because of the nontraditional authoring process of Wikipedia that allows anyone to contribute content. Other examples of nonauthoritative source material that EOL aggregates include Flickr and Wikimedia Commons photos. The inclusion of both types of sources provided an excellent window into the challenges that content curation communities face when blending content from sources that have different levels of credibility, as discussed later.
Methods
We chose to perform a case study of EOL for several reasons. Case studies are ideal for understanding an entire system of action (Feagin, Orum, & Sjoberg, 1991), in this case, a content curation community. One of the goals of this research is to provide actionable insights to content curation community designers; a case study approach draws the boundaries of inquiry precisely around the matter (i.e., the system of action) that community designers can influence. Case studies are also ideal for describing emerging phenomena, such as content curation communities, where current perspectives have not yet been empirically substantiated (Yin, 2008). Finally, case studies offer a way of investigating a phenomenon in depth, and within its real-life context, lending high external validity to the findings. EOL has several characteristics that support the choice of case study research: (a) it is a prototypical example of a content curation community; (b) it is a high-impact project with an international reputation in an important scientific domain; (c) it has sufficient history and is large enough to have encountered many of the potential challenges typical of these types of communities; and (d) it is still evolving, so that some of the insights from this study can be incorporated into EOL’s design.
Data Collection
EOL conducted a survey in July and August of 2010. The survey was sent to all of EOL’s scientist participants (contributors, database partners, content curators) to obtain information to guide the next phase of EOL’s development. Participants weren’t offered any incentive for their participation in the survey. Considering the lack of reward, participants’ time demands, and heterogeneity, the survey was designed such that the questions were mostly open-ended and each question was independent of other questions, so that each participant could choose to answer only the questions that were relevant to them, and skip questions that they found irrelevant. Survey questions asked about participants’ field of expertise, opinions about EOL and related efforts, involvement in various activities and initiatives, technological aspects of the site, and more. A link to the survey was sent by e-mail to 1,225 participants; 281 (23%) responded and completed the survey. Typical time to completion was 15–20 minutes. Because of the survey structure, response rates varied by question, and for questions that were used in this study, the response rate varied from 18% to 86%.
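The overall response figure above can be checked directly; the per-question example uses invented counts, since only the resulting percentages are reported:

```python
# Sanity check of the reported survey figures. The overall rate follows
# directly from the text (281 of 1,225 invitees); the per-question
# example uses a hypothetical count of 51 answers among the 281
# completers, which would yield roughly the lower bound reported (18%).
sent, completed = 1225, 281
assert round(completed / sent * 100) == 23

assert round(51 / 281 * 100) == 18  # hypothetical per-question count
```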
Five interviews were conducted with EOL curators to better understand the role of content curators, the nature of the work they perform, their perceptions of EOL, and the challenges and opportunities they and others at EOL face. These interviewees were contacted based on their active involvement in EOL. A semistructured interview approach was used to allow for follow-up questions that encouraged further elaboration of important findings. Additionally, seven interviews were based on a convenience sample (Patton, 1990) of volunteers who contribute to Wikipedia species pages and are familiar with EOL. These participants were contacted based on their levels of participation on Wikipedia’s Species Pages, which are displayed on EOL. Interviews were conducted face to face, via Skype, and via e-mail, and detailed notes and transcriptions were captured for analysis.
Additional data sources were used. One author took detailed notes during the EOL 2010 Rubenstein Fellows and Mentors Orientation Workshop. The workshop brought together nearly all 2010 fellowship recipients and their mentors, and included a roundtable discussion on expanding the utility of EOL for scientists, which elicited information from participants about their practical needs and their views of the project. The research team also reviewed the nearly 200 posts sent to the EOL curators’ Google Group from its inception in May 2009 until March 17, 2010. These messages helped us to develop our interview protocol and identify issues salient to the community, which are used as evidence throughout the study.
Data Analysis
Notes and transcripts from interviews, workshop observations, open-ended survey questions, and online forum posts were analyzed using a grounded theory approach (Corbin & Strauss, 2007). Grounded theory calls for the systematic analysis of data so that common concepts and ideas are extracted and then axially referenced to produce higher level themes and concepts that frame the theoretical understanding of the researched phenomenon. In the spirit of grounded theory, we have tried to begin “as close as possible to the ideal of no theory under consideration and no hypothesis to test” (Eisenhardt, 1989, p. 536). This was particularly important for this project, which is a first look at a content curation community. Once the major themes were identified and analyzed, we were able to relate them to existing theories and literature in the findings and discussion. All data were reviewed by at least two different authors to reduce personal biases.
Findings
Our first research question asked: “What are the unique information and social challenges experienced by EOL as an example of a content curation community?” We found that the primary challenges faced by EOL relate to integration: both integration of information from disparate sources and integration of social practices and norms from different communities. Although information integration and social integration are highly intertwined, we discuss them separately below.
Information Integration Challenges
Content curation communities deal with large corpora of content from diverse sources, which must be selected, organized, managed, and integrated into a holistic resource. This is no small feat, given the complexity of dealing with different types of data (e.g., photographs, text, video, audio), different taxonomies, metadata standards, licensing, and competing incentives for data sharing. Ideally, content curation communities will create a one-stop shop in which this underlying complexity is transparent to users. This requires content curators to address several information integration challenges, two of which were identified as particularly important by EOL participants: (a) dealing with multiple and at times competing standards and (b) ensuring information quality. In this section, we discuss these challenges and their effect on the content curation community.
Standards and Classification Challenges
A dominant theme that emerged from all of our data sources was the importance of biological classification in the work that the EOL content curation community performs. Many challenges inherent in the curation work of EOL relate to dealing with different standards of classification and their highly contested nature. Biological classification—the process of describing, naming, and classifying species into stable hierarchies—provides frameworks for all biological research. Indeed, the current body of taxonomic information and practice represents 250 years of research, starting with Linnaeus in the 18th century (Tautz, Arctander, Minelli, Thomas, & Vogler, 2003). When asked to describe their professional role in an open-ended survey question, nearly half (49%) of respondents used the term taxonomist or systematics, both of which refer to a well-defined field of study
that refines and applies taxonomic theory and practice as a means to further the study of biological diversity.1 Likewise, although only one of the EOL curator interviewees identified himself as a "senior taxonomist," all of them work closely with biological classifications as part of their everyday jobs. The frequency and passion associated with comments about how EOL deals with biological classification (as detailed below) suggest that not only is biological classification important to the field of biology, but it is also seen as fundamental to the work occurring at EOL.
Standards and classification challenges at EOL emanate from debates within the greater biology community, as well as usability decisions made by EOL. Although we discuss these issues separately, they are interrelated, since the broader classification debates in biology are reified in the ways that EOL presents its aggregated content. Disputes on how to classify species are woven through the history of taxonomy and evolutionary biology. Classification faces persistent challenges in the form of deep gaps in knowledge about biodiversity, arguments over nomenclature and species' names, and disagreements over the extent to which species should be grouped or separated (Wheeler, 2008). In addition, every year new species are discovered and others go extinct, constantly changing the scope and hierarchy of classification (Gonzalez-Oreja, 2008). Furthermore, existing work on taxonomy is sporadic, with much of the research focused on specific types of organisms while other groups have been largely ignored (Tautz et al., 2003). One result of these longstanding issues is the existence of competing classifications within the biological sciences, each of which is recognized as imperfect. Technology for supporting a consensus has not yet solved the challenges of this work (Hine, 2008).
The debates from the greater biological community come to a head at EOL because the work occurring at EOL requires that a standard type of classification be chosen to organize the effort and the information in the community. One curator described how she encountered "conflicts in higher taxonomies," including one about a particular species that prompted someone to be "really mad about the way it was done and who was doing it." Other interviewees and discussions in the forums agreed with the sentiment that "in taxonomy there are huge disputes." Such strong feelings emerge because taxonomists and species specialists grow accustomed to using a preferred schema. EOL, as an aggregator of content based on different classifications, is unavoidably thrown into the middle of such debates, causing tensions within the content curators' community.
Having only one biological classification option was a point of contention. Early in its history, all EOL species pages were organized using the Catalogue of Life (COL) annual checklist and classification, the product of global partnerships. Many content curators preferred to use familiar classifications and did not like being forced into using COL,
1See Society of Systematic Biologists: http://systbiol.org/
even though it is considered one of the most comprehensive species catalogues available (Orrell, 2011). When asked which classification they preferred, survey respondents named multiple systems (e.g., CAS, COF, COL, EOL, Global Checklist, ITIS, TOLWeb, WoRMS); no single classification system was preferred by the majority of respondents. Early Google Group discussions clarified the problems associated with having a rigid classification standard as the basis for which pages exist; a surveyed curator explained how EOL "species pages [are] based on nomenclatorial lists derived second- or third-hand from various other databases . . . [and] . . . species names that are nomenclatorially available are not necessarily equivalent to species taxa as currently conceptualized by biologists. . . . This nomenclatorial chaos has made it extremely difficult to curate." One survey respondent described how "the [choice of] taxonomy on the main part of EOL is a big turn off for recruiting the group that I am targeting."
Obstacles for creating master taxonomic classifications. In October 2009 the EOL informatics team responded to initial curator complaints about COL and added a function that allows content curators to choose the default taxonomic classification by which they could navigate the site. Participants expressed widely varying opinions about being able to manipulate specific classifications and integrate new self-designated classifications on EOL. One survey question asked users of a particular tool (LifeDesk), which is used to create a modified or new classification on EOL; of all the participants who answered the question, only a third had used this tool for creating new classifications, but the vast majority of them (80%) indicated that they would like to have their classification integrated into EOL, and a smaller majority (56%) would have liked to submit their classifications to other large-scale catalogues as well. However, in practice, most users do not modify or create new classifications. According to survey participants, the main reasons for this have to do with usability ("[It is] difficult to determine how to add or change taxa previously listed in classification" and "I'm not really familiar with the features") and effort ("[I don't have the] time").
Quality Control
Quality control challenges faced by EOL arise because it aggregates content from a variety of sources, including both traditional scientific content and nontraditional user-generated content. The decision to incorporate data from nontraditional sources, including Wikipedia and Flickr, was meant to help increase species coverage while taking steps to maintain the validity of the data. Wikipedia articles imported into EOL are initially marked as "nonreviewed" material. Curators can approve the content, which removes the markings. The specific version of the Wikipedia article that was imported and approved remains unchanged even if the original Wikipedia article is modified. Content curators are encouraged to fix any problems with the page on
Wikipedia itself before approving the text, but cannot make changes to the EOL version of the content. This is similar to the recommendation that EOL curators should submit problems they identify to scientific publishers and information providers rather than changing them on EOL. The difference is that Wikipedia allows them to directly make their own edits. Images of species are offered to EOL by members of the EOL Images Flickr group,2 which, as of this writing, includes over 2,500 members and 90,000 images. These are automatically harvested using user-generated tags and placed on EOL pages, initially with the same marking, indicating that they have not yet been reviewed. As they are reviewed, they are then marked trusted or not trusted by content curators on EOL.
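The review workflow described above can be sketched as a small state model: imported items enter in an unreviewed state, curators mark them trusted or untrusted, and the imported snapshot itself is never edited in place on EOL. The following is a hypothetical illustration only, not EOL's actual implementation; all class, field, and value names are our own invention.

```python
from dataclasses import dataclass
from enum import Enum

class ReviewStatus(Enum):
    UNREVIEWED = "unreviewed"   # default marking at import time
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"

@dataclass
class ImportedItem:
    source: str                 # e.g., "Wikipedia" or "Flickr"
    content: str                # frozen snapshot taken at import time
    status: ReviewStatus = ReviewStatus.UNREVIEWED

    def review(self, trusted: bool) -> None:
        # A curator's decision changes only the status flag; the
        # imported snapshot is never modified on the EOL side.
        self.status = ReviewStatus.TRUSTED if trusted else ReviewStatus.UNTRUSTED

# A Wikipedia article arrives marked as unreviewed material.
article = ImportedItem("Wikipedia", "Species description as of import")
assert article.status is ReviewStatus.UNREVIEWED

# Curator approval removes the marking; the content itself is unchanged.
article.review(trusted=True)
assert article.status is ReviewStatus.TRUSTED
assert article.content == "Species description as of import"
```

The design choice the sketch highlights is that approval is metadata, not an edit: fixing an error still has to happen at the original source (e.g., on Wikipedia itself), which is the tension discussed in the following sections.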
Most participants support the integration of content from nontraditional sources (e.g., Wikipedia, Flickr), but would like this content vetted and/or clearly identified. Though an arguably controversial aspect of EOL, many survey respondents and interviewees shared positive feelings regarding EOL's aggregation of nontraditional sources. When surveyed about integration of such sources, curators responded: "It's a good start"; "Many of them are [a] good source of information"; "Fine by me"; and "Many hands make light work—can't expect all the pages to be populated by experts." However, even the most enthusiastic respondents expressed a desire to ensure that the content was vetted and/or clearly identified as a nontraditional source on EOL. One interviewee captured this sentiment by stating that integrating nontraditional content was a "good idea as long as you keep things apart. The most important thing is that the end user needs to know where stuff comes from and as long as that is given, it is a really good thing to mix specialist content with amateur content." A large group of survey respondents posted similar thoughts: "I do not see any problem with that in [the] case that the source is clearly shown" and "As long as it is clearly marked up as unvetted." Several other curators stressed the need to allow them to closely monitor nontraditional resources: "It is often wrong—curators should be able to 'exclude' content or mark it as 'erroneous'" and "Fine as long as it is clearly marked as unvetted."
Some participants strongly oppose the use of nontraditional content from sources such as Wikipedia. A small but vocal minority of survey participants expressed strong opposition to the inclusion of nontraditional data. They used harsh words such as "contamination," "muddling," and "vandalism" to express their concerns. This opposition primarily stems from integrating what participants view as lower quality content. One interviewee stated, "It may be better to have no information about a taxon available on the web than information that has not been vetted and may be incorrect." Some survey participants expressed an even stronger disapproval when asked their opinion of integrating nontraditional content from sources like Wikipedia and Flickr using
2http://www.flickr.com/groups/encyclopedia_of_life/
the following language: "not acceptable"; "many sources are full of errors"; "should not be allowed. Too many reams of spurious data"; "unvetted content continues to perpetuate errors that slip through the scientific peer review process that have been passed into the public domain"; and "a lot of what I have seen is either old or of poor quality." Although these opinions were about nonpeer-reviewed, nontraditional content in general, some survey participants went so far as to single out Wikipedia and Flickr as undesirable content providers, for example: "In several cases, typos in Flickr contributions have resulted in 'phantom' species being added to EOL"; "Wikipedia, I'm not so sure about . . . Wikipedia has lots of mistakes"; and "Not a fan of Flickr material."
A different set of respondents was concerned that using Wikipedia content in EOL makes the distinction between the two, and the role of EOL as a scientifically trusted resource, less clear. One interviewee stated, "If EOL = Wikipedia, then why have EOL? A worthy question." Another said: "I suspect that many professionals will get frustrated with the proliferation of unvetted material and lose interest in contributing, and that as a result, EOL will become so much like Wikipedia that it will lose its relevance, and the point of EOL was to have vetted content that was checked by specialists. If this is the route to be taken then why would anyone come to EOL instead of just going to Wikipedia?" And another: "Don't think this is a good idea. . . . What would then make EOL different from Wikipedia?"
There is no consensus about where the work of fixing errors should occur. The EOL content curation community, by its nature, faces numerous quality control challenges that stem from the variance in accuracy, terminology, and quality of the curated resources, whether these resources are traditional (scientific) or nontraditional (user-generated). Traditional resources are often slow to make changes and fix errors because of their hierarchical and measured work processes, leading to frustration on the part of the content curators who were surveyed: "EOL suggests I communicate with the original authors of the source database . . . where nobody is currently funded to update or repair it." Yet the alternative of fixing errors directly on EOL is problematic as well, as the incorrect information persists in the original source. Unlike traditional resources that necessitate preapproval for changes and fixing errors, on Wikipedia curators can directly fix errors without having to wait for editorial approval. Some curators who were interviewed liked being able to directly edit Wikipedia pages before importing them to EOL: "[EOL aggregating Wikipedia content] is in some ways easier because you can go in and edit on Wikipedia much easier than on EOL" and "When I heard [EOL] was going to [aggregate Wikipedia], I thought, 'Oh, great, I'll make a lot of changes to Wikipedia . . . and have them load into EOL instead of contributing to EOL, this is easier.'" A potential problem is having multiple versions of a Wikipedia page, one inside EOL and one outside EOL, or in Wikipedia itself, which could "divide contributors and decrease the quality of content."
Despite the positive reaction of some participants to making edits on Wikipedia, those with a low opinion of the quality of Wikipedia content did not like being forced to make edits there. For example, one interviewee noted that "Wikipedia is full of errors. . . . I never have time to go clean them all up, there's always errors . . . it's like a full time job and nobody has full time to do it." Others were less condemning of Wikipedia, but still recognized that it is "hard to keep up with the volume" of data from there and it is a "big job to vet it all," let alone fix it.
Social Integration Challenges
EOL, as a content curation community, brings together two distinct populations: professional scientists and citizen scientists. Each population brings to the table its own motivations, practices, accreditation procedures, and norms (Hara, Solomon, Kim, & Sonnenwald, 2003; Ling et al., 2005; O'Hara, 2008; Van House, 2002a). Amalgamating the differences between the two populations into an efficient work process is a major challenge. This section details the various obstacles to social integration: (a) the role of open-access resources in the scientific community and (b) conflict among scientists and citizen scientists.
The role of content curation communities within the larger scientific community. Despite endorsements from prominent scientists such as Wilson (2003), and despite EOL providing the esteemed and competitive Rubenstein Fellowships, the larger scientific community has not yet explicitly legitimized the curation work that occurs at EOL. This happens for a number of reasons and has multiple implications for EOL and other content curation communities in the scientific realm. Below, we address three challenges associated with this issue: (a) a lack of familiarity with the project, (b) a lack of legitimacy of curation work within the larger scientific community, and (c) a lack of resources to support the effort.
Lack of familiarity with EOL. Many scientists do not know about EOL despite its growing popularity among Internet users; they do not have a clear mental model of how EOL relates to larger scientific endeavors, or know how they can effectively contribute to EOL. Unlike scientific journals and conferences, which have been around for centuries, content curation communities like EOL are a new and unfamiliar scientific genre. As was evident from the survey, scientists who were not familiar with EOL's aim were hesitant to participate because they did not know what was expected of them ("[I] was not aware of requirements / needs of the program"). Others were confused about the professional prerequisites needed for an individual to be able to participate in the curation activities ("I doubt I am qualified").
Lack of legitimacy of content curation work. Another problem is that the scientific community does not yet consider work on EOL as legitimate scientific activity. The
scientific community can be thought of as a constellation of communities of practice (Wenger, 1998); it comprises individuals belonging to one or more research disciplines and subdisciplines (i.e., communities) that share a domain, a practice, and a sense of community. As communities of practice, scientific communities socially construct what constitutes a legitimate contribution to the community. Publishing in certain journals (and not others), presenting at certain conferences, reviewing papers, and receiving grants are all well-established academic practices that are legitimate activities among most scientific communities. Scientists, especially junior scientists, must present a strong research track record, situated within the discipline's major publication outlets and other legitimized activities, to advance (Latour & Woolgar, 1979).
Although content curation communities like EOL contribute indirectly to advancing science through improved outreach and provision of information that can support science, the work of curating existing scientific knowledge is not as valued by scientific communities as creating novel scientific knowledge. This marginalizes the contributions of content curators, who are torn between using their precious time on traditional, legitimate activities and curation activities that are not yet fully legitimized, but are highly valued by the curators themselves and other interested parties (e.g., the public). The result is that curation activities are not considered a priority among many scientists who were surveyed: "This does not count with respect to my university accounting—not publications, not grants, not teaching—so it is very hard to find time to do this sort of thing." Curation activities are also almost completely unsupported by the scientific community as a whole, as several curators commented: "It has been difficult to convince tenured scientists to contribute" and "A number of scientists in my field have expressed reservations about sharing content with EOL. So, there is not [sic] impetus from the community, and I have not pursued possible sharing of content due to the concerns that people have expressed."
Recognizing the importance of individual attribution in scientific domains, EOL has documented each curator's contributions. For example, following a link to "Who can curate this page?" on a species page takes a reader to a list of content curators. Clicking on any of their names will provide information and links to all of their contributions within the site. As one interviewee noted, this strategy is rewarding to some participants: "Having your content represented on EOL is more rewarding because your name is associated with it, people can know what you've done." However, it does not address the larger issue of the scientific community recognizing these curation activities as legitimate contributions to science, as would be evidenced by, for example, tenure committees counting the number and quality of curated species pages or improvements to the EOL taxonomic classification scheme.
Lack of resources for content curation work. Issues of funding, time availability, and legitimacy of curation work are all intertwined. Almost 40% of the survey participants who were asked about their reason for not actively curating EOL content mentioned funding, recognition, and time: "Lack of time/funding . . . I have no research grants . . . volunteer work is a luxury" and "Time is my biggest limitation. I've thought of using LifeDesk for the fauna of my region but need some support to do this—hint, hint." Without formal recognition by the scientific community of open-access content curation efforts, significant sustainable funding will be lacking. This will keep curation activities in the realm of "volunteer work" rather than "scientific work," which will in turn make it harder to receive certain types of funding. For now, the opportunity cost of the time spent curating at EOL is borne by the content curators themselves, which limits participation.
Professionals and nonprofessionals: Collaboration and conflict. A different social integration challenge that EOL faces is the need to bridge between different contributor populations. Scientists and citizen scientists participate in scientific endeavors for different reasons: citizen scientists participate in collective scientific activities out of both egoistic and altruistic motivations, such as curiosity, sense of community, and commitment to conservation and related educational efforts (Evans et al., 2005; Raddick et al., 2010; Rotman et al., 2012), while professional scientists contribute to the formal processes and structure of "the scientific method" to advance science and further their own professional careers (Cohn, 2008; Latour & Woolgar, 1979). Designing a community that appeals to these distinct populations can be challenging, although it also affords opportunities to leverage the differences to accomplish more than could be accomplished with only one of the groups.
EOL curators in the study held highly divergent views regarding the role citizen scientists can and should play in curation activities at EOL, and their willingness to work with them. The main reason that scientists were reluctant to collaborate with citizen scientists was their fear that citizen scientists do not uphold the same rigorous scientific standards that professional scientists are committed to: "I think that the benefit/cost ratio of participation by nonexperts shifts in groups that are popular with amateurs and untrained dilettantes."
For other scientists who were interviewed, collaborations with citizen scientists or nonprofessionals were perceived as a net gain: "[Experts and nonexperts] contributing to the same resource pages . . . can be both useful and potentially damaging, though I haven't seen any of the later" and "[I] really like working with the Flickr community. I like the images. [It is] always fun working with people and seeing their different approaches." However, even those who supported collaboration with citizen scientists expressed a desire to distinguish clearly between trusted, professional scientists and untrusted nonprofessionals (e.g., "amateurs," "untrained dilettantes," "nonexperts"). Surveyed scientists expressed a nearly universal desire to "vet," "approve," and "control" content submitted by citizen scientists, or at least
have it be clearly differentiated from expert content (e.g., "content should be marked with different levels of trustability"). They preferred to retain control of their expert opinions and the privileged status that gives them the final word. This view positions citizen scientists at the bottom of the curation ladder and professional scientists at the top, maintaining the traditional power balance between them and reducing some of the collaborative effect of content curation communities.
Discussion—Facing the Challenges That Content Curation Communities Present
Content curation communities flourish in various fields. From science to marketing to hobbies, they serve users and communities who find content aggregation, annotation, and archiving valuable. They may differ in the type of content they curate (images, texts, preformed data, etc.), in their use (personal or collective), and in their structure (centralized and hierarchical vs. decentralized and flat), but, surprisingly, many of them share similar challenges. Table 1 provides examples of content curation communities from different domains, and uses the "informational" and "social" challenges framework identified in EOL to comment on them. Although detailed analysis of each example was beyond the scope of this article, the table provides insights into the commonalities and differences among content curation communities.
When comparing the various examples of content curation communities with EOL (Table 1), we can clearly see that most similarities can be found between EOL and content curation communities in the academic domain, such as chemistry and linguistics. This is not surprising given the overarching purpose of these communities: to share and exchange data and knowledge among a broad spectrum of users, with an emphasis on professional, academically oriented users. A more important distinction is these academic communities' need to uphold scientific principles and academic rigor, in both the work processes they endorse and the quality of the data they offer. These needs emphasize the challenges that pertain to information validation and structure, as well as social integration aspects that are tied to the information. Communities addressing more "leisurely" activities, such as news consumption, reading, and other hobbies, represent a different genre of content curation, where the consumption of curated materials can be individual or collaborative. More importantly, the data quality issue, while relevant, can be less controversial and plays a smaller role in establishing the structure and the legitimacy of the community.
Information Integration
In this section, we address the second research question: "What are the key information and social design choices available to designers of content curation communities?" and place it in the broader context of scientific content
curation communities. We outline design recommendations that may help to address the challenges related to information integration and social integration. Some of these recommendations defy neat classification as they apply to both themes, so they are discussed jointly.
The ongoing debates regarding standardization choices are a major challenge for information integration on EOL and other taxonomic content curation sites. Standardization, e.g., creating taxonomies and controlled vocabularies, provides a universal framework for scientists to communicate, and it presents a shared representation of a field of knowledge that provides the structure for understanding and collaboration. Information standards can be viewed as boundary objects (Bowker & Star, 2000; Star, 2010) or boundary infrastructures (Star, 2010). Boundary infrastructures, in their broadest sense, are items that create a shared space or "a sort of arrangement that allow different groups to work together without consensus," but within organic settings that require them to address "information and work requirements" (Star, 2010, p. 602). EOL focuses on species classification systems that allow for communication between scientists with different subspecialties, languages, and resources. A careful approach is needed for choosing taxonomies that will serve as boundary infrastructures and hence support shared understanding and communication. Some possible options could include the following.
Option 1: Choose or develop a master standard. We use the example of classification to address the broader challenge of creating and maintaining uniform data standards. The initial approach was for EOL to select one master taxonomy to serve as a unifying boundary object; EOL decided on COL as the default consensus classification. Although this approach is the most parsimonious, it is not without problems. As indicated by the findings above, many participants were not pleased with this option. In addition, using EOL as a platform to foster a new master or universal classification schema would drastically shift the focus of the project, in terms of purpose and resource allocation. It also may not be stable and entirely consistent, as agreement on breakpoints may not be universal or enduring. Therefore, although this option may be relevant to other content curation communities that face standardization challenges, it is not a plausible solution for EOL, unless it can be achieved in a fluid way that does not promise consistency or authority. This is reminiscent of the notion that these databases "do not sweep away the past but engage with it in dynamic fashion" (Hine, 2008, p. 150).
Option 2: Simultaneously support multiple standards. An alternative option is to do what EOL currently does, and allow content curators and users to choose which taxonomic classification to use when viewing species pages. It is clear that this approach helps support content curators' preferences and allows for additional taxonomic information to be pulled into EOL. However, it does not push the larger community toward the creation of a unifying boundary standard,
which could benefit the scientific community and enable more meaningful interdisciplinary exchanges. This option also necessitates extensive support for users, whether by providing detailed documentation, help, and discussion pages, or by automating and streamlining the selection process.
Option 3: Flexibility in determining the default standard. Alternative options could also be explored. A compromise between the two options would be to support multiple taxonomic classifications, but let the content curators determine the default classification for any given set of species. This has the benefit of helping novices who may not know which standard is most appropriate to choose, as well as assisting in the development of a unified taxonomic classification. In addition, EOL could allow users and content curators to create and share lists of species (e.g., species in a particular ecosystem, species that are significant to users) to supplement taxonomic classifications as meaningful ways to browse and explore EOL.
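The priority order implied by this compromise (a viewing user's own preference, then the curator-set default for that group of species, then a site-wide fallback) could be resolved with logic along the following lines. This is a hedged sketch of one possible design, not EOL's implementation; the function name, parameters, and the use of COL as the fallback are our own assumptions.

```python
from typing import Optional

def resolve_classification(user_pref: Optional[str],
                           curator_default: Optional[str],
                           site_default: str = "COL") -> str:
    """Pick the taxonomic classification to display for a species page.

    Priority: the user's explicit choice, then the default chosen by
    content curators for this set of species, then the site-wide fallback.
    """
    for choice in (user_pref, curator_default, site_default):
        if choice:
            return choice
    return site_default

# A novice with no preference sees the curator-chosen default.
assert resolve_classification(None, "WoRMS") == "WoRMS"
# An expert's explicit choice always wins.
assert resolve_classification("ITIS", "WoRMS") == "ITIS"
# With neither set, the site-wide default applies.
assert resolve_classification(None, None) == "COL"
```

The point of such a cascade is that it serves both audiences at once: novices never face the choice, while specialists retain control, which is exactly the balance Option 3 aims for.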
Noticeably absent from the EOL participants' dialogues around standards was a recognition of the potential role of information professionals. This is not surprising given the "invisible" nature of information work more generally (Bates, 1999; Ehrlich & Cash, 1999). Information professionals' extensive experience working with multiple standards, open-access resources, content databases, and interfaces could be leveraged, as they bridge between the different subcommunities that form the larger content curation community, address standardization challenges, and create new mechanisms for data curation. EOL currently employs information professionals on the project, but may benefit from active solicitation of microcontributions from academic science librarians and other information professionals who would join the ranks of EOL curators. These microcontributions may include work related to standardization, as well as other traditional information work such as verifying citations and writing species summaries (a familiar activity to those who have done abstracting work).
Other factors that complicate the role of curators arise from the rapidly changing digital information world. Data curated in many content curation communities, and specifically scientific content curation communities, are submitted in different formats (text, tables, graphics, visualizations) using multiple media (e.g., static and dynamic visualizations, audio, video, and multimedia) with different technical standards. Knowing the source of the data is becoming increasingly complex. Content curation communities like EOL are starting to draw not only from scientific (e.g., GBIF), vetted, and known sources (e.g., Wikipedia) but also from the growing number of volunteer citizen science projects that are springing up across the world (iNaturalist, iSpot, Project Noah). Keeping track of the sources of this data, checking the validity of the data, and ensuring appropriate credit is given to contributors will become increasingly complex as more people become involved.
To assist both curators and contributors, more visualization tools that clearly articulate changes over time could be developed, akin to "Historyflow," which was used for the analysis of wiki contributions (Viégas, Wattenberg, & Dave, 2004), or "Tempoviz," which is used for tracking networks of contribution over time (Ahn, Taieb-Maimon, Sopan, Plaisant, & Shneiderman, 2011; Bongwon, Chi, Kittur, & Pendleton, 2008). Visualizations can be a powerful tool at both the macrolevel and the microlevel. On the macrolevel, visualizations can be used to uncover the implicit activity behind the curation scenes, as well as power relations between curators, data gaps, and areas that require intense community effort. On the microlevel, visualization tools can offer curators and users insights into the history of each contribution: whether it replaced previous content, what its quality and value are as assessed by the community, and who curated or modified it.
Social Integration
Large-scale content curation communities such as EOL can succeed only if they find ways of facilitating effective collaboration among different groups of contributors who bring significant domain-specific knowledge with them, whether they are professional credentialed scientists, volunteer ecology citizen scientists, or information professionals. We envision a successful model for a content curation community as one based on a hybrid structure, where professional scientists collaborate with citizen scientists, with the assistance of professional information specialists, all committed to the same idea of open-access content curation. To achieve this and overcome the social integration issues discussed above, along with the issues of quality control that are intertwined with social integration, we suggest the following design recommendations: (a) establishing active subgroups within large content curation communities, (b) establishing the value of community-curated content in the scientific community, (c) facilitating appropriate task routing, (d) recognizing other contributors' efforts, and (e) building on and developing new techniques to motivate and recognize participation.
Establishing active subgroups within large content curation communities. The size and scope of crowdsourced content curation communities make community-building efforts difficult. One challenge is that members have diverse interests and are associated with multiple communities of practice, each of which has its own shared domain of interest (e.g., for EOL, insects, plant biology), mutual engagement (e.g., conferences, journals, relationships), and shared repertoire (e.g., terminology, values, ways of doing things). In their early stages of development, content curation communities like EOL focus on creating their own community of practice related to curation of content. When they grow sufficiently, we propose that they actively develop interconnected subgroups that mirror other existing communities of practice (e.g., for EOL, invasive species, birds, phylogenetics). Such subgroups could create master lists ("collections") of pages of shared interest and discuss relevant topics. This can promote users' sense of community, emphasizing collaboration over individual data curation. By welcoming subgroups within their walls, content curation communities will not only be places from which to send and receive content; they will also become places in which to engage directly in the core practices of their members and subgroups.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—June 2012. DOI: 10.1002/asi
Understanding the need to support the various practices and norms of its subcommunities, EOL is currently rolling out tools that will facilitate a federated network of subcommunities.
Establishing the value of community-curated content in the scientific community. Scientists are committed to the traditional standards set by their individual scientific communities. Currently, work on open-access data repositories is not highly valued in the scientific promotion process or in acquiring funding for research. Several routes of action can help in changing this predicament:
• Active efforts to integrate community-curated data into scientific publications. This can be achieved by directing efforts to promote content curation communities and their products within the scientific community (in scientific journals and conferences). It can also be done structurally, by creating easy-to-use import tools that enable scientists to pull curated data into their databases for continuous work (e.g., via the EOL API), or by consolidating vetted content into scientific publications (e.g., citing the curated source in background material, text mining the community for new insights, or aggregating and repurposing analyzable data). For example, the ChemSpider content curation community (Pence & Williams, 2010) is integrated with Royal Society of Chemistry journals and databases like PubMed, which use the curated content to facilitate new types of searches.
• Providing financial support for active content curators. EOL's Rubenstein Fellowship program provides budding scientists with financial support and mentorship for up to 1 year. Rubenstein Fellowships are competitive and prestigious positions and can contribute to the scientists' credentials and future funding. Similar efforts can help establish scientists' status within their respective scientific communities, and simultaneously enhance the role of content curation communities as reputable entities within the scientific establishment. Interestingly, a novel "biocurator" profession is emerging in molecular biology (Sanderson, 2011), suggesting that some scientific communities acknowledge the merits of such activities. However, developing and continuously supporting positions that uphold these activities requires a substantial financial investment that not all content curation communities have. This is especially true for budding content curation communities that arise out of community members' interests and are not backed by well-established institutions.
• Endorsing curation work at academic institutions. Many activities performed by scholars, such as reviewing articles and grants, serving on editorial boards, and engaging with local communities, are endorsed by academic institutions, which expect their faculty members to engage in them even though they are not explicitly paid to do so. Recognizing contributions to content curation communities as a form of highly valued professional service, like journal reviewing, could go a long way towards motivating participation in curation tasks.
Facilitating the process—Getting the right task to the right people. In all content curation communities, but especially in hybrid ones built upon collaboration between different user populations, special emphasis should be put on facilitating a streamlined curation process. Part of what makes peer-production communities, like those that develop open source software and Wikipedia, successful is the ability of individuals to assign themselves to the tasks that they can complete most readily (Benkler, 2002). Nobody knows a person's interests and skills better than that person does; furthermore, people are highly motivated to contribute when they feel that their contribution is needed and unique (Ling et al., 2005). What is needed, then, are tools that facilitate the matching of individuals' skills and interests with curation needs. Watchlists (as on Wikipedia), which notify a user when changes are made to a page they are "watching," can help people stay abreast of changes but do not point users to gaps in the existing content. Cosley, Frankowski, Terveen, and Riedl (2006) suggested bridging this gap in the Wikipedia community with an automatic task-routing tool that uses past editing behavior to recommend new pages a user may want to edit. Similar approaches could be modified and applied to content curation communities.
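The core of such task routing can be illustrated with a minimal sketch. This is not EOL's or Cosley et al.'s actual implementation; the data model (topic tags logged per past contribution, tags attached to open tasks) is a simplifying assumption made for illustration:

```python
from collections import Counter

def recommend_tasks(user_history, open_tasks, top_n=3):
    """Rank open curation tasks by topical overlap with a user's past activity.

    user_history: list of topic tags from the user's previous contributions
    open_tasks: dict mapping task id -> set of topic tags describing the task
    Returns up to top_n task ids, most relevant first; tasks with no
    overlap are never recommended.
    """
    interests = Counter(user_history)  # how often the user touched each topic
    scored = []
    for task_id, tags in open_tasks.items():
        score = sum(interests[t] for t in tags)
        if score > 0:
            scored.append((score, task_id))
    return [task for _, task in sorted(scored, reverse=True)[:top_n]]

# Example: a curator who mostly edits bird and insect pages
history = ["birds", "birds", "insects", "plants"]
tasks = {
    "verify-citation-123": {"birds"},
    "write-summary-456": {"fungi"},
    "vet-image-789": {"insects", "birds"},
}
print(recommend_tasks(history, tasks))  # ['vet-image-789', 'verify-citation-123']
```

A production system would of course weight recency, task difficulty, and stated preferences, but even this simple overlap score captures the principle of directing contributors to gaps they are equipped to fill.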
Many other options exist as well, such as allowing individual content curators to post lists of microcontributions or "wanted ads" that other contributors can monitor (i.e., subscribe to) or have sent to them based on a profile they have filled out describing their interests and level of expertise. Posting EOL needs to the EOL Flickr group has worked well for soliciting needed photos, but this has not yet been scaled up or automated for other photos or other types of content. Such routing processes can also recommend relatively simple tasks at first and, after an "initiation" period, advance to more complex and high-level tasks, such as vetting resources. This would also allow users to grow, progress, and work towards attaining a higher status within the content curation community. No matter the mechanism, helping potential contributors know what is needed and empowering them to accomplish those tasks is paramount to the success of content curation communities.
We have identified quality control, with regard to both content and curation, as a crucial factor for the success of a content curation community. Creating a mechanism for task routing can also allow for better quality control: Improved control over, and feedback on, the work being done within the content curation community will rid the community of some of its inherent wariness of "dalliances" and "contamination" of resources, and of its hesitation to associate with nonprofessionals. Such task routing emulates the traditional scientific apprenticeship process, in which a novice is accepted into the scientific community
through legitimate peripheral participation (Lave & Wenger, 1991). Through this process of initiation, citizen scientists and novice curators can become trustworthy and established partners in the content curation community. EOL expects to launch tools to support this process in the near future.
Recognizing other contributors' efforts. To enable social integration, users first have to acknowledge other users' presence and contributions to the community. In large-scale communities such as EOL, this is an especially difficult task because of the federated nature of the activity. Therefore, explicit mechanisms are required to create such awareness. Showcasing users and their contributions, by attributing the data to the users who curated it, on the front page of the community on a rotating basis, as well as on other designated pages, can increase other users' awareness of them and heighten their status within the community (Kriplean et al., 2008; Preece & Shneiderman, 2009). EOL currently uses "featured pages," as does Wikipedia; these could be augmented by "featured curator" or "featured contributor" pages to recognize important contributors, make clear to first-time visitors that it is a community-created site, and point to model content curators for others to emulate. Another form of attribution could be facilitated by creating a feedback mechanism that notifies users whether the data they contributed or curated was vetted and approved. Similarly, a feedback mechanism could alert users to the use and reuse of the data they contributed, both within the community and by third parties, such as when it is downloaded, viewed, or cited in publications. By providing periodic personal reports of the number of data downloads, citations, and instances of reuse, and by sharing these statistics with the entire community, contributors are openly appreciated and their social presence within the community is heightened. Scientists could include data from these reports in their curricula vitae to help legitimize the work being done.
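The periodic personal reports described above amount to a simple aggregation over usage logs. As a sketch only, assuming a hypothetical event log of (contributor, event type) pairs, which no system discussed here necessarily exposes in this form:

```python
from collections import defaultdict

def contribution_report(events):
    """Aggregate reuse events (e.g., download, view, citation) per contributor.

    events: iterable of (contributor, event_type) pairs drawn from usage logs.
    Returns a nested dict: {contributor: {event_type: count}}.
    """
    report = defaultdict(lambda: defaultdict(int))
    for contributor, event_type in events:
        report[contributor][event_type] += 1
    # Convert to plain dicts for easy display or serialization
    return {c: dict(kinds) for c, kinds in report.items()}

events = [
    ("alice", "download"), ("alice", "citation"),
    ("alice", "download"), ("bob", "view"),
]
print(contribution_report(events))
# {'alice': {'download': 2, 'citation': 1}, 'bob': {'view': 1}}
```

Such a report could be e-mailed to each contributor periodically and summarized publicly, which is all the mechanism the recommendation requires.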
Other mechanisms needed for social integration are internal conflict resolution tools, such as Wikipedia "talk" pages (cf. Kittur & Kraut, 2008) or GenWeb's "grievance committee." The proposed federated subcommunities can provide a place to discuss disagreements and issues that stem from different disciplinary backgrounds and practices. In this context, dedicated and long-time contributors (both scientists and citizen scientists) may also be conferred the role of "mediators" in case topical or procedural conflicts arise. EOL is currently instituting the role of "master curators," who will take upon themselves some of the aforementioned social facilitation roles.
Building on and developing new techniques to motivate and recognize participation. Some techniques for motivating contributions, such as "thank you" feedback notes, and for recognizing contributors' participation, such as reputation systems, have already been mentioned. But these examples are just the tip of today's iceberg in promoting the social participation needed for content curation communities to flourish. iSpot, a site for teaching about biodiversity developed by the UK Open University (http://www.ispot.org.uk/), provides activities for students and the public to engage, motivate, and reward contributors. iSpot's state-of-the-art reputation system combines recognition of expertise and accuracy of suggestions with effort and social interaction: Users who correctly help identify species receive "reputation points" based on the level of agreement and helpfulness they reached. Reputation is based on collective agreement and discussion and cannot be easily manipulated. Building similar, task-based reputation systems is an important step towards improved social integration. An even bolder step would be to implement systems that "transfer" reputation from one content curation community to another. This way, participants need not start afresh when contributing to several communities, as their expertise will already have been substantiated. This is a potent idea, but one that needs to be implemented carefully: Transferring reputation is most relevant to content curation communities that share topical similarities, as different domains require different types and levels of knowledge and expertise.
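iSpot's actual scoring rules are more elaborate than we can reproduce here; as a sketch of the general idea only, the following hypothetical scheme awards points for an identification in proportion to the current reputation of the users who agree with it, which is what makes the score hard to game with throwaway accounts:

```python
def reputation_points(identifications, user_reputation):
    """Award points for species identifications based on weighted agreement.

    identifications: list of (proposer, [agreeing_users]) tuples
    user_reputation: dict of each user's current reputation score;
    agreement from higher-reputation users counts for more.
    Returns the points earned by each user in this round.
    """
    points = dict.fromkeys(user_reputation, 0)
    for proposer, agreers in identifications:
        points[proposer] += sum(user_reputation[a] for a in agreers)
    return points

reputation = {"novice": 1, "expert": 5, "curator": 10}
ids = [("novice", ["expert", "curator"])]  # two knowledgeable users agree
print(reputation_points(ids, reputation))
# {'novice': 15, 'expert': 0, 'curator': 0}
```

Note how a novice whose identification is confirmed by experts earns substantial reputation, mirroring the legitimate-peripheral-participation path discussed earlier; agreement among low-reputation accounts, by contrast, yields little.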
Conclusion
In this article, we define a new genre of online community, "content curation communities," using EOL as a case study. We conducted a multimethod analysis, combining survey and interview data with observations and analysis of online interaction. Our aim was to explore and understand the challenges faced by content curation communities and to recommend potential design solutions. Although our focus has been on EOL, we believe many of the challenges and design recommendations will apply to other content curation communities as well.
In the past, the contributions of citizen scientists, naturalists, and enthusiasts to the development of science, specifically astronomy, biology, and geology, have been well documented (Ellis & Waterton, 2004; Schiff, Van House, & Butler, 1997; Van House, 2002b), but their participation was usually considered peripheral and limited to one-sided data transfer, with little or no mutual knowledge exchange between scientists and enthusiasts. This is changing, however. Content curation communities offer the opportunity to shift the balance towards more collaborative practices, where both scientists and citizen scientists work together to create a vetted and reputable repository that can be used for traditional scientific endeavors and for education beyond the scientific community.
Bringing together heterogeneous populations of contributors is challenging from both practical and social standpoints. Creative design solutions, such as those discussed above, may ameliorate these challenges.
EOL is larger and more ambitious than most content curation communities, yet it can be viewed as a typical and representative instance of the new genre of collaborative information resources. It is still evolving and growing, and in its growth it is currently facing the challenges outlined in this article. By rethinking and redesigning some of the
online tools available to content curators, the community is attempting to meet these challenges. This effort draws not only on the lessons learned from the daily operation of EOL and the findings of this study but also on feedback received from EOL curators. An approach that speaks to contributors' needs and allows their voices to be heard may not only support the design changes of the content curation community but also enhance contributors' sense of community and their willingness to contribute further in the future. EOL is already implementing some of these changes in its recently released current version. These changes will be the basis for future studies of content curation communities.
It should be noted that the study is not devoid of limitations. The concept of content curation communities is a relatively new one, and although we have attempted to focus on the most pressing issues faced by EOL, other types of content curation communities, within and outside the scientific realm, may face other challenges that were not addressed in this article. Comparable examples of collaborations between scientists and citizen scientists in other disciplines (e.g., astronomy or chemistry) may surface different challenges. Similarly, our analysis of scientific content curation communities follows the current structure of the academic world. The advance of crowdsourced citizen-science projects may bring about a fundamental change in how science is done and in the balance between scientists and volunteers. This change will necessitate a new look at the informational and societal infrastructures that facilitate scientific work. Despite these limitations, we believe the concept of content curation communities is enduring and a useful frame for starting a dialogue about related communities. Furthermore, the focus on information and social challenges seems to apply well to multiple content curation communities, as shown in Table 1.
Future research on content curation communities should encompass other important issues, as follows: (a) motivational factors affecting professionals' and volunteers' participation in content curation communities, (b) the role of information professionals within content curation communities, (c) strategies for developing a strong sense of community within content curation communities, and (d) different types of content curation communities and design strategies that support them. Our future work will address these topics.
References
Ahn, J., Taieb-Maimon, M., Sopan, A., Plaisant, C., & Shneiderman, B. (2011). Temporal visualization of social network dynamics: Prototypes for Nation of Neighbors. In J. Salerno, S. Yang, D. Nau, & S.K. Chai (Eds.), Social computing, behavioral-cultural modeling and prediction (Vol. 6589, pp. 309–316). Berlin: Springer. doi:10.1007/978-3-642-19656-0_43
Bates, M.J. (1999). The invisible substrate of information science. Journal of the American Society for Information Science and Technology, 50(12), 1043–1050.
Benkler, Y. (2002). Coase's Penguin, or, Linux and "The Nature of the Firm." The Yale Law Journal, 112(3), 369–446.
Birnholtz, J.P., & Bietz, M.J. (2003). Data at work: Supporting sharing in science and engineering.
Bongwon, S., Chi, E.H., Kittur, A., & Pendleton, B.A. (2008). Lifting the veil: Improving accountability and social transparency in Wikipedia with WikiDashboard. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08) (pp. 1037–1040). New York: ACM Press.
Borgman, C.L. (2007). Scholarship in the digital age: Information, infrastructure, and the Internet. Cambridge, MA: The MIT Press.
Bos, N., Zimmerman, A., Olson, J., Yew, J., Yerkie, J., Dahl, E., & Olson, G. (2007). From shared databases to communities of practice: A taxonomy of collaboratories. Journal of Computer-Mediated Communication, 12(2), 652–672.
Bowker, G.C., & Leigh Star, S. (2000). Sorting things out: Classification and its consequences. Cambridge, MA: The MIT Press.
Buneman, P., Cheney, J., Tan, W.-C., & Vansummeren, S. (2008). Curated databases. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '08) (pp. 1–12). New York: ACM Press.
Cohn, J.P. (2008). Citizen science: Can volunteers do real research? BioScience, 58(3), 192–197.
Corbin, J., & Strauss, A.C. (2007). Basics of qualitative research: Techniques and procedures for developing grounded theory (3rd ed.). Thousand Oaks, CA: Sage.
Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2006). Using intelligent task routing and contribution review to help communities build artifacts of lasting value. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '06) (pp. 1037–1046). New York: ACM Press. doi:10.1145/1124772.1124928
Eisenhardt, K.M. (1989). Building theories from case study research. The Academy of Management Review, 14(4), 532–550.
Ellis, R., & Waterton, C. (2004). Environmental citizenship in the making: The participation of volunteer naturalists in UK biological recording and biodiversity policy. Science and Public Policy, 31(2), 95–105.
Encyclopedia of Life. (2011). Retrieved from www.eol.org/about
Ehrlich, K., & Cash, D. (1999). The invisible world of intermediaries: A cautionary tale. Computer Supported Cooperative Work, 8(1–2), 147–167.
Evans, C., Abrams, E., Reitsma, R., Roux, K., Salmonsen, L., & Marra, P.P. (2005). The Neighborhood Nestwatch program: Participant outcomes of a citizen science ecological research project. Conservation Biology, 19(3), 589–594.
Feagin, J.R., Orum, A.M., & Sjoberg, G. (Eds.). (1991). A case for the case study. Chapel Hill, NC: The University of North Carolina Press.
Finholt, T.A. (2002). Collaboratories. Annual Review of Information Science and Technology, 36(1), 73–107.
Forte, A., & Bruckman, A. (2005). Why do people write for Wikipedia? Incentives to contribute to open-content publishing. In Proceedings of the GROUP '05 Workshop: Sustaining Community: The Role and Design of Incentive Mechanisms in Online Systems (pp. 6–9). Sanibel Island, FL.
Gonzalez-Oreja, J.A. (2008). The Encyclopedia of Life vs. the brochure of life: Exploring the relationships between the extinction of species and the inventory of life on Earth. Zootaxa, 1965, 61–68.
Hansen, D.L., Ackerman, M., Resnick, P., & Munson, S. (2007). Virtual community maintenance with a repository. Proceedings of the American Society for Information Science and Technology, 44(1), 1–20.
Hara, N., Solomon, P., Kim, S.L., & Sonnenwald, D.H. (2003). An emerging view of scientific collaboration: Scientists' perspectives on collaboration and factors that impact collaboration. Journal of the American Society for Information Science and Technology, 54(10), 952–965.
Hine, C. (2008). Systematics as cyberscience: Computers, change and continuity in science. Cambridge, MA: MIT Press.
Hughes, B. (2005). Metadata quality evaluation: Experience from the Open Language Archives Community. Digital Libraries: International Collaboration and Cross-Fertilization, 135–148.
Kittur, A., Chi, E., Pendleton, B.A., Suh, B., & Mytkowicz, T. (2007). Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. Alt.CHI 2007, San Jose, CA.
Kittur, A., & Kraut, R.E. (2008). Harnessing the wisdom of crowds in Wikipedia: Quality through coordination. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (CSCW '08) (pp. 37–46). New York: ACM Press.
Kriplean, T., Beschastnikh, I., & McDonald, D.W. (2008). Articulations of wikiwork: Uncovering valued work in Wikipedia through barnstars. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (CSCW '08) (pp. 47–56). New York: ACM Press.
Lampe, C., & Resnick, P. (2004). Slash(dot) and burn: Distributed moderation in a large online conversation space. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04) (pp. 543–550). New York: ACM Press.
Latour, B., & Woolgar, S. (1979). Laboratory life: The construction of scientific facts. Princeton, NJ: Princeton University Press.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Li, Q., Cheng, T., Wang, Y., & Bryant, S.H. (2010). PubChem as a public resource for drug discovery. Drug Discovery Today, 15(23–24), 1052–1057. doi:10.1016/j.drudis.2010.10.003
Ling, K., Beenen, G., Ludford, P., Wang, X., Chang, K., Li, X., Cosley, D., Frankowski, D., Terveen, L., Al Mamunur, R., Resnick, P., & Kraut, R. (2005). Using social psychology to motivate contributions to online communities. Journal of Computer-Mediated Communication, 10(4), Article 10.
Maddison, D.R., Schulz, K.S., & Maddison, W.P. (2007). The Tree of Life Web Project. Zootaxa, 1668, 19–40.
Nov, O. (2007). What motivates Wikipedians? Communications of the ACM, 50(11), 60–64.
O'Hara, K. (2008). Understanding geocaching practices and motivations. In Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems (CHI '08) (pp. 1177–1186). New York: ACM Press.
Orrell, T. (2011). ITIS: The Integrated Taxonomic Information System. Retrieved from http://www.catalogueoflife.org/col
Patton, M.Q. (1990). Qualitative evaluation and research methods. Newbury Park, CA: Sage.
Pence, H.E., & Williams, A. (2010). ChemSpider: An online chemical information resource. Journal of Chemical Education, 87(11), 1123–1124.
Preece, J. (2000). Online communities: Designing usability, supporting sociability. Chichester, UK: John Wiley & Sons.
Preece, J., & Shneiderman, B. (2009). The reader-to-leader framework: Motivating technology-mediated social participation. AIS Transactions on Human-Computer Interaction, 1(1), 5.
Raddick, J., Lintott, C.J., Schawinski, K., Thomas, D., Nichol, R.C., Andreescu, D., Bamford, S., Land, K.R., Murray, P., Slosar, A., Szalay, A.S., & Vandenberg, J. (2007). Galaxy Zoo: An experiment in public science participation. Bulletin of the American Astronomical Society, 39, 892.
Raddick, M.J., Bracey, G., Gay, P.L., Lintott, C.J., Murray, P., Schawinski, K., Szalay, A., & Vandenberg, J. (2010). Galaxy Zoo: Exploring the motivations of citizen science volunteers. Astronomy Education Review, 9(1), 010103. doi:10.3847/AER2009036
Rotman, D., & Preece, J. (2010). The "WeTube" in YouTube—Creating an online community through video sharing. International Journal of Web-Based Communities, 6(2), 317–333.
Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C.S., Lewis, D., & Jacobs, D. (2012). Dynamic changes in motivation in collaborative ecological citizen science projects. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW '12). New York: ACM Press.
Sanderson, K. (2011). Bioinformatics: Curation generation. Nature, 470(7333), 295–296.
Schiff, L.R., Van House, N.A., & Butler, M.H. (1997). Understanding complex information environments: A social analysis of watershed planning. In Proceedings of the Second ACM International Conference on Digital Libraries (DL '97) (pp. 161–168). New York: ACM Press.
Star, S.L. (2010). This is not a boundary object: Reflections on the origin of a concept. Science, Technology & Human Values, 35(5), 601–617.
Sullivan, B., Wood, C.L., Iliff, M.J., Bonney, R.E., Fink, D., & Kelling, S. (2009). eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation, 142(10), 2282–2292.
Tautz, D., Arctander, P., Minelli, A., Thomas, R.H., & Vogler, A.P. (2003). A plea for DNA taxonomy. Trends in Ecology & Evolution, 18(2), 70–74.
Van House, N.A. (2002a). Digital libraries and practices of trust: Networked biodiversity information. Social Epistemology, 16(1), 99–114.
Van House, N.A. (2002b). Trust and epistemic communities in biodiversity data sharing. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '02) (pp. 231–239). New York: ACM Press.
Viégas, F.B., Wattenberg, M., & Dave, K. (2004). Studying cooperation and conflict between authors with history flow visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04) (pp. 575–582). New York: ACM Press.
Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge, UK: Cambridge University Press.
Wheeler, Q. (2008). The new taxonomy. New York: CRC Press.
Wilson, E.O. (2003). The encyclopedia of life. Trends in Ecology & Evolution, 18(2), 77–80.
Yates, D., Wagner, C., & Majchrzak, A. (2010). Factors affecting shapers of organizational wikis. Journal of the American Society for Information Science and Technology, 61(3), 543–554.
Zimmerman, A. (2007). Not by metadata alone: The use of diverse forms of knowledge to locate data for reuse. International Journal on Digital Libraries, 7(1), 5–16.