


Wildbook: Crowdsourcing, computer vision, and data science for conservation.

Tanya Y. Berger-Wolf
University of Illinois at Chicago
Chicago, IL

Daniel I. Rubenstein
Princeton University
Princeton, NJ

Charles V. Stewart
Rensselaer Polytechnic Inst.
Troy, NY

Jason A. Holmberg
WildMe.org
Portland, OR

Jason Parham
Rensselaer Polytechnic Inst.
Troy, NY

Sreejith Menon
Bloomberg LP
New York, NY

1. INTRODUCTION

How many African elephants are left in the world and how fast are they being lost to poaching? How far do whales travel? How many turtle hatchlings survive? Answers to these basic questions are critical to saving these and other endangered species and to the conservation of the biodiversity of our planet. However, this basic data is available for only a handful of species. The official body that tracks the conservation status of the planet’s species, the International Union for Conservation of Nature (IUCN) Red List of Threatened Species™ [21], currently lists over 79,000 species [37]. Yet the Living Planet report, the most comprehensive effort to track the population dynamics of species around the world, includes just 10,300 populations of just 3,000 species [13]. That’s not even 4%! Scientists do not have the capacity to observe every species at the needed spatio-temporal scales and resolutions, and there are not enough GPS collars and satellite tags to do so. Moreover, invasive tracking can be dangerous to the animals [50].

Bloomberg Data for Good Exchange Conference. 24-Sep-2017, Chicago, IL, USA.

Images of animals and their environment, intentionally and opportunistically collected, are quickly becoming one of the richest, most abundant, highest-coverage, and most widely available sources of data on wildlife. Coming from camera traps, cameras mounted on vehicles or UAVs (drones), photographs taken by tourists, citizen scientists, field assistants, and scientists, and public photo streams, many thousands of images may be collected per day from just one location. Taking advantage of this rich but big and messy source of data is only possible if we leverage computational approaches for every stage of the process, including image collection, information extraction, data modeling, and query processing. We have developed algorithms and built a system, called Wildbook™¹, based on state-of-the-art machine learning and data management approaches. Wildbook is an autonomous computational system that starts with an arbitrary heterogeneous collection of photographs of animals (Figure 1). Wildbook can detect various species of animals in those images (using deep convolutional neural networks, DCNNs) and identify individual animals of most striped, spotted, wrinkled, or notched species [11]. Wildbook can find matches within the database or determine that the individual is new. Once an animal has been identified, it can be tracked in other photographs. Wildbook stores the information about who the animals are and where and when they were seen in a fully developed database, and provides query tools over that data for scientists researching population demographics, species distributions, individual interactions, and movement patterns, as well as for conservation managers and citizen scientists [24].
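The match-or-new decision described above can be illustrated with a deliberately simplified sketch. It assumes each photograph has already been reduced to a numeric feature vector and compares vectors by cosine similarity against a fixed threshold. The vectors, the `identify` function, the ID format, and the threshold value are all hypothetical, and this is not the matching algorithm that Wildbook itself uses [11].

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, database, threshold=0.9):
    """Return (individual_id, is_new) for a query vector.

    `database` maps individual IDs to one reference vector each
    (a hypothetical layout chosen only for this illustration).
    """
    best_id, best_score = None, -1.0
    for individual_id, reference in database.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_id, best_score = individual_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, False  # matched an existing individual
    # No sufficiently close match: register the query as a new individual.
    new_id = f"individual-{len(database) + 1}"
    database[new_id] = query
    return new_id, True


db = {"individual-1": [1.0, 0.0, 0.2], "individual-2": [0.0, 1.0, 0.1]}
print(identify([0.9, 0.1, 0.2], db))  # close to individual-1, so it matches
print(identify([0.1, 0.1, 1.0], db))  # far from both, so it is added as new
```

In practice the threshold trades missed matches against false matches, which is one reason a semi-automatic review step remains useful.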

The Wildbook system allows users to add biological data, from attributes as simple as sex and age to habitat and weather information, which makes it possible to derive population counts, birth/death dynamics, species ranges, social interactions, and interactions with other species, including humans. An example page from Flukebook, a Wildbook for whales, is shown in Figure 2.

Using Wildbook, it is possible to connect the information about sightings of animals (who? where? when?) derived from images to additional relevant data, providing the historic, current, and projected context of these sightings, thus enabling new science, conservation, and education at unprecedented scales and resolution. By layering additional data sets, covering everything from climate change and extreme weather to habitat ecology, agricultural development, urbanization, deforestation, the exotic animal trade, and the spread of disease, a much more detailed and useful picture of what is happening, and why, can be constructed within our architecture.

2. EXAMPLES OF WILDBOOK USES AND IMPACT

Using our system, estimates of population sizes and movement patterns can be far more accurate, creating a better understanding of social structures and breeding of species, relationships between predators and prey, and responses to environmental pressures, including land use by humans and long-term climate patterns. Wildlife managers are better able to monitor the health of entire populations, discover dangerous trends, and reduce conflicts between humans and wildlife. Access to information about individual animals, particularly visual information, can also increase the public’s understanding of the workings of science and its role in guiding conservation. By contributing their photographs for scientific studies, visitors to parks and nature preserves in return learn the life histories of the individual animals they photograph and become connected to research projects and to the animals. We now present several examples of the real impact a system like Wildbook can make in conservation policy, science, and public engagement.

¹ http://Wildbook.org

arXiv:1710.08880v1 [cs.CY] 24 Oct 2017

Figure 1: The Wildbook pipeline, starting with a collection of images, through species detection and individual animal identification, to the web-based data management layer and individual animal page.

Figure 2: An example of a Flukebook (Wildbook for whales) page displaying information for an individual.

2.1 Evidence-based conservation policy: Lewa

The first deployment of Wildbook was in January 2015 at Lewa Wildlife Conservancy² in Kenya, helping manage the endangered Grevy’s zebra population. The information from Wildbook for Grevy’s showed that not enough foals were surviving to adulthood, mainly due to lion predation. This led to a change in the lion population management policy at Lewa, helping save the endangered zebra.

2.2 Crowdsourcing accurate conservation data: GZGC and GGR

Knowing the number of individual animals within a population (a population census) is one of the most important statistics for research and conservation management in wildlife biology. Moreover, a current population census is often needed repeatedly over time in order to understand changes in a population’s size, demographics, and distribution. This enables assessments of the effects of ongoing conservation management strategies. Furthermore, the number of individuals in a population is seen as a fundamental basis for determining its conservation status.

Unfortunately, producing a population census is difficult to do at scale and across large geographical areas using traditional, manual methods. One of the most popular and prevalent techniques for producing a population size estimate is mark-recapture [35, 36] via a population count. However, performing a mark-recapture study can be prohibitively demanding when the number of individuals in a population grows too large [43], the population moves across large distances, or the species is difficult to capture due to evasiveness or habitat inaccessibility. More importantly, a population count is not as robust as a population census: a count tracks sightings, whereas a census tracks individuals. A census is stronger because it can still produce a population size estimate implicitly but also unlocks more powerful ecological metrics that can track the long-term trends of individuals. In recent years, technology has been used to help improve censusing efforts towards more accurate population size estimates [9, 14, 46, 47] and to scale them up³. However, these types of population counts are still typically custom, one-off efforts, with no uniform collection protocols or data analysis, and do not attempt to accurately track individuals within a population across time.
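The difference between a count and a census can be made concrete with a small sketch. The `Sighting` record and the sample values below are hypothetical. They only illustrate that a count tallies sightings, while a census tallies distinct identified individuals and retains each one's sighting history.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One identified animal in one photograph (hypothetical record layout)."""
    individual_id: str
    day: int


sightings = [
    Sighting("zebra-A", 1),
    Sighting("zebra-B", 1),
    Sighting("zebra-A", 1),  # the same animal photographed again
    Sighting("zebra-A", 2),
    Sighting("zebra-C", 2),
]

# A count tallies sightings, so repeat photographs inflate it.
count = len(sightings)

# A census tallies distinct individuals and keeps their sighting histories.
per_individual = Counter(s.individual_id for s in sightings)
census = len(per_individual)

print(count)                      # 5 sightings
print(census)                     # 3 individuals
print(per_individual["zebra-A"])  # 3 sightings of zebra-A
```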

² http://www.lewa.org/
³ penguinwatch.org, mturk.com

        Cars   Cameras   Photographs
GZGC      27        55         9,406
GGR      121       162        40,810

Table 1: The number of cars, participating cameras (citizen scientists), and photographs collected during the GZGC and the GGR. The GGR had about 3 times as many citizen scientists, who contributed over 4 times the number of photographs for processing.

To address the problems with collecting data and producing a scalable population census, we performed the following using Wildbook technology [33]:

• using citizen scientists [10, 20] to rapidly collect a large number of photographs over a short time period (e.g. two days) and over an area that covers the expected population, and

• using computer vision algorithms to process these photographs semi-automatically to identify all seen animals.

We showed that this proposed process can be leveraged at scale and across large geographical areas by analyzing the results of two completed censuses. The first census is the Great Zebra and Giraffe Count (GZGC), held March 1-2, 2015 at the Nairobi National Park in Nairobi, Kenya, to estimate the resident populations of Masai giraffes (Giraffa camelopardalis tippelskirchi) and plains zebras (Equus quagga). The second is the Great Grevy’s Rally (GGR), held January 30-31, 2016 in a region of central and northern Kenya covering the known migratory range of the endangered Grevy’s zebra (Equus grevyi). While our method relies heavily on collecting a large number of photographs, it is designed to be simple enough for the average person to help collect them. Volunteers typically need only be familiar with a digital camera and be able to follow a small set of collection guidelines.

The number of cars, volunteers, and photographs taken for both rallies can be seen in Table 1.

Mark-recapture is a standard way of estimating the size of an animal population [8, 35, 36]. Typically, a portion of the population is captured at one point in time and the individuals are marked as a group. Later, another portion of the population is captured and the number of previously marked individuals is counted and recorded. Since the proportion of marked individuals in the second sample should match the proportion of marked individuals in the entire population (assuming consistent sampling processes and controlled biases), the size of the entire population can be estimated.

              Annots.   Individuals   Estimate
GZGC Masai        466           103   119 ± 4
GZGC Plains     4,545         1,258   2,307 ± 366
GGR Grevy’s    16,866         1,942   2,250 ± 93

Table 2: The number of annotations, matched individuals, and the final mark-recapture population size estimates for the three species. The Lincoln-Petersen estimates are given with a 95% confidence range.

The population size estimate is calculated by dividing the total number of marked individuals during the first capture by the proportion of marked individuals counted in the second. The formula for the simple Lincoln-Petersen estimator [32] is:

$$N_{est} = \frac{K \cdot n}{k},$$

where $N_{est}$ is the population size estimate, $n$ is the number of individuals captured and marked during the first capture, $K$ is the number of individuals captured during the second capture, and $k$ is the number of recaptured individuals that were marked from the first capture.
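As a minimal sketch of this calculation, the function below computes the Lincoln-Petersen point estimate from the sets of individual IDs identified on two days. Treating day one as the marking occasion and day two as the recapture occasion is an assumption made here for illustration, as are the function name and the sample IDs. The confidence ranges reported in Table 2 are not reproduced.

```python
def lincoln_petersen(first_capture, second_capture):
    """Point estimate N_est = K * n / k from two sets of individual IDs."""
    n = len(first_capture)                   # marked in the first capture
    K = len(second_capture)                  # captured in the second capture
    k = len(first_capture & second_capture)  # recaptured (seen both times)
    if k == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return K * n / k


# Hypothetical IDs: six animals seen on day one, four on day two, three on both.
day1 = {"g01", "g02", "g03", "g04", "g05", "g06"}
day2 = {"g04", "g05", "g06", "g07"}
print(lincoln_petersen(day1, day2))  # 4 * 6 / 3 = 8.0
```

If every animal is seen on both days, the estimate equals the number of animals seen. Fewer recaptures push the estimate higher.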

By giving the collected photographs to a computer vision pipeline, a semi-automated and more sophisticated census can be made. The speed of processing large quantities of photographs allows for a more thorough analysis of the age structure of a population, the distribution of males and females, and the movements of individuals and groups of animals. By tracking individuals, as in related work [23, 44], our method is able to make more confident claims about statistics for the population. The more individuals that are sighted and resighted, the more robust the estimate and ecological analyses will be.⁴ The resulting estimates of the populations of plains zebra and Masai giraffe in Nairobi National Park and the species census of the Grevy’s zebra are the most accurate to date, and the Grevy’s zebra numbers are now used as the official numbers for the Grevy’s zebra global population size by the IUCN Red List [39].

2.3 Crowdsourcing conservation data at scale: Whaleshark, Flukebook, online social media

Today, Wildbooks for over a dozen species are available or in development. Wildbook for whales, Flukebook (http://flukebook.org/), started with just over 800 individuals less than two years ago; it is now fully functional and helps track, protect, and study more than 11,600 marine mammals. Wildbook for whale sharks (http://whaleshark.org/) is the longest-running animal sighting website; it started 16 years ago as a volunteer effort with a couple of hundred individual animals and today has more than 8,500 individuals with over 40,000 sightings. The IUCN Red List uses whaleshark.org for global population estimates [34]. There are several projects with the World Wildlife Fund (WWF), with Wildbooks for Saimaa ringed seals (http://norppagalleria.wwf.fi/), lynx (http://lynx.wildbook.org), and sea turtles (the real IoT, Internet of Turtles: http://iot.wildbook.org/). As each Wildbook goes online, the number of identified individuals for each species grows by over an order of magnitude within less than a year, and Wildbook becomes the most reliable source of data for the species. Moreover, over the last year, we showed that social media can be a reliable source of information about animal populations [29]; Whaleshark.org uses public videos [48], while Flukebook is starting to use public images as a supplementary source of information.

⁴ Portions of the results in this section were previously reported in two technical reports: [40] for the GZGC and [3] for the GGR.

3. WITH GREAT DATA COMES GREAT RESPONSIBILITY

3.1 Bias and accuracy

Like all data, photographic samples of animal ecology are biased. These biases may lead to inaccurate estimates of population size or dynamics. For example, even for such an iconic species as the snow leopard, the last population estimate was made in 2003 and “many of the estimates are acknowledged to be rough and out of date” [22]. Yet its conservation status can change depending on a difference of just a few individuals in some geographic locations. Human observers tend to overestimate population sizes since they may misidentify the same individual as different ones, while photo ID provides evidence that those are indeed the same. Thus, an image-based census can be used to support more accurate population counts, which, in turn, will affect conservation status or policies for a species. That is indeed a big responsibility.

To administer a correct population census, we must take these biases into account explicitly, as different sources of photographs inherently come with different forms of bias. For example, stationary camera traps, cameras mounted on moving vehicles, and drones are each biased by their location, by the presence of animals at that location, by photographic quality, and by the camera settings (such as sensitivity of the motion sensor) [15, 17, 18, 38]. These factors result in biased samples of species and spatial distributions, which recent studies are trying to overcome [1, 25, 53].

Any human observer, including scientists and trained field assistants, is affected by observer bias [26, 27]. Specifically, the harsh constraint of being at a single given location at a given time makes sampling arbitrary. Citizen scientists, as the foundation of the data collection, add further variance through a wide range of training, expertise, goal alignment, sex, age, etc. [12]. Nonetheless, recent ecological studies are starting to successfully take advantage of this source of data, explicitly testing and correcting for bias [49]; recent computational approaches address the question of if and how data from citizen scientists can be considered valid [51], which can be leveraged with new studies in protocol design and validation.

Multiple biases influence the final outcome of estimating the population of a species from images obtained from social media. Some of the most prominent are outlined in Figure 3. There are several layers of bias, which accumulate in the resulting bias of estimating animal population properties from images. First, there are biases in the types of animals that people typically photograph in sufficient numbers in the first place. These may be charismatic or endangered species, or simply the ones easily observed. Second, there are biases in what images people take versus which ones they decide to share publicly on social media. These range from the Hawthorne Effect [4, 31, 42, 45] of changing behavior when knowing one is being observed, to biases introduced by the demographics of the person sharing [6, 31] and the choice of the social media platform [7, 16, 30]. There are biases from our notions of beauty and aesthetics and from cultural differences. Any mark-recapture model used to estimate the population size makes many assumptions and introduces its own biases. The fundamental question, however, is: do any of these actually affect the estimates of the population size and other parameters and, if so, how? Menon has begun to answer this question [28], but a lot of work remains to be done. Moreover, combining these differently biased sources of data mutually constrains these biases and allows much more accurate statistical estimates than any one source of data would individually allow [5].

3.2 Security

The technological sophistication of wildlife crime has followed the increasing connectivity of remote conservation areas and the availability of technology and data [19]. To mitigate some of the effects, nature conservancies warn tourists not to geotag their photos of endangered species [54], but most forget or are unaware that they are doing it in the first place. The push for open research data is often in conflict with conservation, and new technologies, such as animal recognition from photographs, add new challenges. Data security and privacy matter for endangered species, from snow leopards and elephants to pangolins and tortoises. Many of the issues are similar to the problems that humans face in an online globalized world, from insider threats to jurisdictional tangles, from multilevel policies to secondary-use hazards, from covert channels to geofencing, and from security economics to usability [2]. Yet, in the context of wildlife conservation, there is no information security policy. It is urgent to address the issues of privacy and security for image-based wildlife data. To start, Wildbook has worked with the Center for Trustworthy Scientific Cyberinfrastructure to design a secure system that prevents data leakage and poachers’ access, yet more work remains to be done.

4. CONCLUSIONS AND CHALLENGES

We have designed, implemented, and deployed a prototype of Wildbook, an image-based ecological information system, and are continuing to develop it. Using image analysis algorithms and state-of-the-art information management infrastructure, Wildbook adds images, opportunistically and intentionally crowdsourced and scientifically collected, to the sources of data about animals and provides the analytical tools to gain scientific and conservation insight from those data. As a new type of data, the images not only augment the scale and resolution of existing scientific and conservation inference, but also allow new types of questions that lead to new scientific understanding of why animals do what they do, as well as to changes in conservation policy. Moreover, we have already demonstrated that Wildbook provides a platform and a tool for a much more personal and committed public engagement in science and conservation than has been available to date. By enabling events such as the Great Zebra and Giraffe Count and the Great Grevy’s Rally, Wildbook both presents an instant, easily available, no-training-needed route for the general public to contribute to science and conservation and creates a personal bond with animals and nature by providing an instant individual animal identity. Moreover, it provides data for evidence-based conservation policy at large scale and high resolution over time, space, and individual animals.

However, to achieve these new insights and engagement, many challenges need to be overcome. In addition to the many computational, scientific, and societal challenges, there are two directly related to data. First, Wildbook requires a new infrastructure that can function, synchronize, and coordinate across many platforms, from the mobile phones and GPS cameras of the citizen scientists, through the bandwidth- and electricity-starved research stations and conservation outposts in remote uninstrumented locations, to the cloud infrastructure containing information about entire species and regions, as well as the algorithms necessary for its analysis. The data-related aspects of this infrastructure challenge concern information aggregation, integration, synchronization, semantic complexity, and access control. The second big data-related challenge presented by the new data sources and the uses they enable is the unknown data biases that challenge traditional computational methods and analytical tools. Even for the simplest population size and species range estimates, the traditional methods rely on a uniform random sampling regime. While it is not clear that this assumption is true for any of the data collection methods, it most definitely does not hold for the data coming from citizen scientists’ and tourists’ photographs. Our preliminary analysis shows that there are wide variations in the number and rate of images taken and complex patterns of camera and species fatigue that arise from demographic, cultural, and event-specific factors. Accounting for or leveraging those in designing the new generation of analytical tools is a challenge and a goal of Wildbook, and it is critical for reliable conservation policy decisions.
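The effect of violating the uniform-sampling assumption can be sketched with a small simulation. It compares the Lincoln-Petersen estimate when every animal is equally likely to be photographed with the estimate when half of the population is photographed far more often than the rest. The population size, capture probabilities, number of trials, and seed are arbitrary illustrative choices, not values from the Wildbook data.

```python
import random


def simulate_estimate(capture_probs, rng):
    """One two-occasion survey; returns the Lincoln-Petersen estimate K * n / k."""
    first = {i for i, p in enumerate(capture_probs) if rng.random() < p}
    second = {i for i, p in enumerate(capture_probs) if rng.random() < p}
    recaptured = len(first & second)
    if recaptured == 0:
        return float("nan")  # the estimate is undefined without recaptures
    return len(second) * len(first) / recaptured


def mean_estimate(capture_probs, trials=200, seed=0):
    """Average the estimate over repeated simulated surveys."""
    rng = random.Random(seed)
    estimates = [simulate_estimate(capture_probs, rng) for _ in range(trials)]
    return sum(estimates) / len(estimates)


true_size = 1000
uniform = [0.3] * true_size         # every animal is equally likely to be seen
skewed = [0.5] * 500 + [0.1] * 500  # same average rate, but unequal animals

print(round(mean_estimate(uniform)))  # close to the true size of 1000
print(round(mean_estimate(skewed)))   # well below 1000
```

In the skewed case the frequently photographed animals dominate the recaptures, so the estimator behaves as if the population were smaller than it is.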

Finally, one potential use of Wildbook that presents very different data-related challenges is as a tool in wildlife crime prevention. Wildbook’s ability to track individuals through photographs during their life, as well as to identify these individuals later by a reasonably sized part of the body, given that the part has been previously photographed, allows photographic evidence to be used both in forensics and as a deterrent to poaching, killing, and illegal trafficking of animals. Wildlife crime is threatening to wipe out many charismatic species from the planet: the rhino population (across species) is down 90% from its height [41], and 100,000 elephants were killed in Africa for ivory over the last 3 years [52]. Many charismatic fauna species, such as leopards, elephants, tigers, zebras, snow leopards, turtles, and tortoises, have uniquely identifiable individual-level body patterns, essentially ‘body-prints’. With a Wildbook app, a law enforcement official would be able to take a picture of an animal crime victim (alive or not) and find a match in the reference database if one exists. Thus, the identity, geographic origin, and life history of the animal would be instantly available. There are two types of primary users. The first are conservation and wildlife managers responsible for overseeing a particular endangered species of identifiable fauna. They collect photographs and submit information to Wildbook to build up the databases. The second type of users are law enforcement officials, who would take pictures of animals or their hides or carcasses and submit these to Wildbook to obtain, if known, the identity and origin of the specimen.

The use of Wildbook or a similar information system for the purposes of conservation of highly endangered species or wildlife crime prevention presents a unique problem in data security and privacy. Ironically, every new technology is a “double-edged sword” for wildlife and is often used by criminals to aid in illegal wildlife trade [19]. Thus, the location, current or predicted, of valuable or highly endangered species or individuals must be protected. Privacy and security protocols must be developed to protect animal data. The benefits of opening data for science, conservation, and public engagement must be weighed against the threat of species extinction.

Figure 3: High-level schematic representation of the problem of population estimation of wildlife species using images from social media, with its challenges and the biases in play.

Acknowledgements

This work was supported in part by a gift from Microsoft Research, funding from the Department of Computer Science at the University of Illinois at Chicago, and NSF grants CNS-1453555, EAGER: Prototype of an Image-Based Ecological Information System (IBEIS), and EF-1550853, EAGER-NEON: Image-Based Ecological Information System (IBEIS) for Animal Sighting Data for NEON. We would also like to acknowledge in-kind support, volunteer time, and collaborations with Mpala Research Centre (Kenya), Lewa Wildlife Conservancy (Kenya), WildlifeDirect.org, Grevy’s Zebra Trust, and Kenya Wildlife Service.

5. REFERENCES[1] M. Ancrenaz, A. Hearn, J. Ross, R. Sollmann, and

A. Wilting. Handbook for Wildlife Monitoring usingCamera-traps. BBEC, Kota Kinabalu, Sabah,Malaysia, 2012.

[2] R. Anderson and T. Berger-Wolf. Privacy for Tigers,in prep.

[3] T. Berger-Wolf, J. Crall, J. Holberg, J. Parham,C. Stewart, B. L. Mackey, P. Kahumbu, andD. Rubenstein. The Great Grevy’s Rally: The Need,Methods, Findings, Implications and Next Steps.Technical Report, Grevy’s Zebra Trust, Nairobi,Kenya, Aug. 2016.

[4] M. S. Bernstein, A. Monroy-Hernandez, D. Harry,P. Andre, K. Panovich, and G. G. Vargas. 4chanand/b: An analysis of anonymity and ephemerality ina large online community. In ICWSM, pages 50–57,2011.

[5] T. J. Bird, A. E. Bates, J. S. Lefcheck, N. A. Hill,R. J. Thomson, G. J. Edgar, R. D. Stuart-Smith,S. Wotherspoon, M. Krkosek, J. F. Stuart-Smith,G. T. Pecl, N. Barrett, and S. Frusher. Statisticalsolutions for error and bias in global citizen sciencedatasets. Biological Conservation, 173(1):144 – 154,2014.

[6] J. Brenner and M. Duggan. The demographics of social media users. Consultado en, 2013.

[7] A. Bruns and S. Stieglitz. Twitter data: what do they represent? it-Information Technology, 56(5):240–245, 2014.

[8] D. Chapman and N. Chapman. Fallow deer: their history, distribution, and biology. Dalton, Ithaca, NY, 1975.

[9] M. J. Chase, S. Schlossberg, C. R. Griffin, P. J. Bouche, S. W. Djene, P. W. Elkan, S. Ferreira, F. Grossman, E. M. Kohi, K. Landen, P. Omondi, A. Peltier, S. J. Selier, and R. Sutcliffe. Continent-wide survey reveals massive decline in African savannah elephants. PeerJ, 4(1):e2354, Aug. 2016.

[10] J. P. Cohn. Citizen science: Can volunteers do real research? BioScience, 58(3):192–197, Mar. 2008.

[11] J. Crall, C. Stewart, T. Berger-Wolf, D. Rubenstein, and S. Sundaresan. Hotspotter - patterned species instant recognition. Applications of Computer Vision (WACV), pages 230–237, 2013.

[12] J. L. Dickinson, B. Zuckerberg, and D. N. Bonter. Citizen Science as an Ecological Research Tool: Challenges and Benefits. Annual Review of Ecology, Evolution, and Systematics, 41(1):149–172, 2010.

[13] World Wide Fund for Nature. The living planet report, 2016. Accessed May 1, 2017.

[14] T. Forrester, W. J. McShea, R. W. Keys, R. Costello, M. Baker, and A. Parsons. eMammal–citizen science camera trapping as a solution for broad-scale, long-term monitoring of wildlife populations. In North America Congress for Conservation Biology, Missoula, Montana, July 2014.

[15] R. J. Foster and B. J. Harmsen. A critique of density estimation from camera-trap data. The Journal of Wildlife Management, 76(2):224–236, 2012.

[16] S. Gonzalez-Bailon, N. Wang, A. Rivero, J. Borge-Holthoefer, and Y. Moreno. Assessing the bias in samples of large online networks. Social Networks, 38:16–27, 2014.

[17] A. Hodgson, N. Kelly, and D. Peel. Unmanned Aerial Vehicles (UAVs) for Surveying Marine Fauna: A Dugong Case Study. PLoS ONE, 8(11):e79556, 2013.

[18] V. Hombal, A. Sanderson, and D. R. Blidberg. Multiscale adaptive sampling in environmental robotics. In Proceedings of the 2010 IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 80–87, Salt Lake City, UT, Sept. 2010.

[19] M. Hower. Poacher’s delight: Technology is a double-edged sword for wildlife. Green Biz, 2015.

[20] A. Irwin. Citizen Science: A Study of People, Expertise and Sustainable Development. Environment and Society. Routledge, New York, NY, 1995.

[21] IUCN Global Species Programme. The IUCN red list of threatened species website, 2017. Accessed May 14, 2017.

[22] R. Jackson, D. Mallon, T. McCarthy, R. Chundaway, and B. Habib. Panthera uncia. The IUCN Red List of Threatened Species 2008: e.T22732A9381126. http://dx.doi.org/10.2305/IUCN.UK.2008.RLTS.T22732A9381126, 2008. [Online; accessed 07 July 2017].

[23] G. M. Jolly. Explicit estimates from capture-recapture data with both death and immigration-stochastic model. Biometrika, 52(1/2):225–247, 1965.

[24] R. Kondos. From binoculars to big data: Citizen scientists use emerging technology in the wild. O’Reilly Media, July 2017.

[25] N. W. Maputla, C. T. Chimimba, and S. M. Ferreira. Calibrating a camera trap–based biased mark–recapture sampling design to survey the leopard population in the N’wanetsi concession, Kruger National Park, South Africa. African Journal of Ecology, 51(3):422–430, 2013.

[26] D. M. Marsh and T. J. Hanlon. Observer gender and observation bias in animal behaviour research: experimental tests with red-backed salamanders. Animal Behaviour, 68(6):1425–1433, 2004.

[27] D. M. Marsh and T. J. Hanlon. Seeing What We Want to See: Confirmation Bias in Animal Behavior Research. Ethology, 113(11):1089–1098, 2007.

[28] S. Menon. Animal Wildlife Population Estimation Using Social Media Images. Master’s thesis, University of Illinois at Chicago, Chicago, IL, 2017.

[29] S. Menon, T. Berger-Wolf, E. Kiciman, L. Joppa, C. V. Stewart, J. Parham, J. Crall, H. J., and J. Van Oast. Animal population estimation using flickr images. In Proceedings of the 2nd International Workshop on the Social Web for Environmental and Ecological Monitoring (SWEEM 2017), Troy, NY, 2017.

[30] F. Morstatter, J. Pfeffer, and H. Liu. When is it biased?: assessing the representativeness of twitter’s streaming api. In Proceedings of the 23rd International Conference on World Wide Web, pages 555–556. ACM, 2014.

[31] A. Olteanu, C. Castillo, F. Diaz, and E. Kiciman. Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. pages 1–44, 2016.

[32] S. W. Pacala and J. Roughgarden. Population experiments with the Anolis lizards of St. Maarten and St. Eustatius. Ecology, 66(1):129–141, Feb. 1985.

[33] J. Parham, J. Crall, C. Stewart, T. Berger-Wolf, and D. Rubenstein. Animal Population Censusing at Scale with Citizen Science and Photographic Identification. 2016.

[34] S. Pierce and B. Norman. Rhincodon typus. The IUCN Red List of Threatened Species 2016: e.T19488A2365291. http://dx.doi.org/10.2305/IUCN.UK.2016-1.RLTS.T19488A2365291, 2016. [Online; accessed 07 July 2017].

[35] R. Pradel. Utilization of capture-mark-recapture for the study of recruitment and population growth rate. Biometrics, 52(2):703–709, June 1996.

[36] D. S. Robson and H. A. Regier. Sample size in Petersen mark-recapture experiments. Transactions of the American Fisheries Society, 93(3):215–226, July 1964.

[37] A. S. Rodrigues, J. D. Pilgrim, J. F. Lamoreux, M. Hoffmann, and T. M. Brooks. The value of the iucn red list for conservation. Trends in ecology & evolution, 21(2):71–76, 2006.

[38] J. M. Rowcliffe, R. Kays, C. Carbone, and P. A. Jansen. Clarifying assumptions behind the estimation of animal density from camera trap rates. The Journal of Wildlife Management, 77(5):876–876, 2013.

[39] D. Rubenstein, B. Low Mackey, Z. Davidson, F. Kebede, and S. King. Equus grevyi. The IUCN Red List of Threatened Species 2016: e.T7950A89624491. http://dx.doi.org/10.2305/IUCN.UK.2016-3.RLTS.T7950A89624491, 2016. [Online; accessed 07 July 2017].

[40] D. I. Rubenstein, C. V. Stewart, T. Y. Berger-Wolf, J. Parham, J. Crall, C. Machogu, P. Kahumbu, and N. Maingi. The Great Zebra and Giraffe Count: The Power and Rewards of Citizen Science. Technical Report, Kenya Wildlife Service, Nairobi, Kenya, July 2015.

[41] Save the Rhino. Rhino population figures. https://www.savetherhino.org/rhino_info/rhino_population_figures, 2017. [Online; accessed 05 September 2017].

[42] S. Y. Schoenebeck. The secret life of online moms: Anonymity and disinhibition on youbemom.com. In ICWSM. Citeseer, 2013.

[43] G. Seber. The Estimation of Animal Abundance and Related Parameters. Blackburn, Caldwell, NJ, 2nd edition, 2002.

[44] G. A. Seber. A note on the multiple-recapture census. Biometrika, 52(1/2):249–259, 1965.

[45] M. Shelton, K. Lo, and B. Nardi. Online media forums as separate social lives: A qualitative study of disclosure within and beyond reddit. iConference 2015 Proceedings, 2015.

[46] R. Simpson, K. R. Page, and D. De Roure. Zooniverse: Observing the World’s Largest Citizen Science Platform. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14 Companion, pages 1049–1054, New York, NY, 2014. ACM.

[47] A. Swanson, M. Kosmala, C. Lintott, R. Simpson, A. Smith, and C. Packer. Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data, 2(150026):1–14, June 2015.

[48] J. Van Oast and J. Parham. Finding Big Fish Online: How Wild Me Is Turning Vacation Videos Into Shark Science with Artificial Intelligence. http://www.prweb.com/releases/2017/05/prweb14334385.htm, 2017. [Online; accessed 07 July 2017].

[49] A. J. van Strien, C. A. van Swaay, and T. Termaat. Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models. Journal of Applied Ecology, 50(6):1450–1458, Sept. 2013.

[50] C. Welch. Orca killed by satellite tag leads to criticism of science practices. National Geographic, Oct. 2016.

[51] A. Wiggins, G. Newman, R. D. Stevenson, and K. Crowston. Mechanisms for Data Quality and Validation in Citizen Science. In 2011 IEEE Seventh International Conference on e-Science Workshops, pages 14–19, Washington, DC, 2011.

[52] G. Wittemyer, J. M. Northrup, J. Blanc, I. Douglas-Hamilton, P. Omondi, and K. P. Burnham. Illegal killing for ivory drives global decline in African elephants. PNAS, 111(36):13117–13121, 2014.

[53] Y. Xue, I. Davies, D. Fink, C. Wood, and C. P. Gomes. Avicaching: A two stage game for bias reduction in citizen science. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pages 776–785, Sao Paulo, Brazil, 2016.

[54] M. Zhang. Geotagged Wildlife Photos Help Poachers Kill Endangered Animals. PetaPixel, June 2014.