rescuing an endangered species with monte carlo ai

1

Rescuing an Endangered Species with Monte Carlo AI

Tom Dietterichbased on work by Dan Sheldon et al.

2

Overview• Collaborative project to develop optimal conservation

strategies for Red-Cockaded Woodpecker (RCW)– Institute for Computational Sustainability (Cornell and

OSU):Daniel Sheldon, Bistra Dilkina, Adam Elmachtoub, Ryan Finseth, Ashish Sabharwal, Jon Conrad, Carla P. Gomes, David Shmoys

– The Conservation Fund: Will Allen, Ole Amundsen, Buck Vaughan

• Recent paper: Maximizing the Spread of Cascades Using Network Design, UAI 2010

3

Red-Cockaded Woodpecker

• Originally wide-spread species in S. US• Population now shrunken to 1% of original

size5000 breeding groups~12,000 birdsFederally-listed endangered species

• Lifestyle:– nests in holes in 80yo+ Longleaf pine trees– sap from the trees defends the nest– takes several years to excavate the hole

• Will colonize man-made holesWikipedia

4

Spatial Conservation Planning• What is the best land acquisition and management strategy to

support the recovery of the Red-Cockaded Woodpecker (RCW)?

5

Problem Setup

• Given limited budget, what parcels should we conserve to maximize the expected number of occupied patches in T years?

Conserved parcels

Available parcels

Currentpatches

Potentialpatches

6

Metapopulation Model

• Population dynamics in fragmented landscape

• Stochastic patch occupancy model (SPOM)– Patches = occupied /

unoccupied– Colonization– Local extinction

SPOM: Stochastic Patch Occupancy Model– Patches are either occupied or unoccupied– Two types of stochastic events:

• Local extinction: occupied unoccupied• Colonization: unoccupied occupied (from neighbor)

– Independence among all events

Time 1 Time 2

7

8

Network Cascades

• Models for diffusion in (social) networks– Spread of information, behavior, disease, etc.– E.g.: suppose each individual passes rumor to

friends independently with probability ½

Note: “activated” nodes are those reachable by red edges

SPOM Probability Model

k

j

i

j

pij

1-βj

plj

𝑡−1 𝑡l

i

k

l

9

• To determine occupancy of patch at time – For each occupied patch from time , flip

coin with probability to see if colonizes – If is occupied at time , flip a coin with

probability to determine survival (non-extinction)

– If any of these events occurs, is occupied

• Parameters:– : colonization probability– : extinction probability– Simple parametric functions of patch-size,

inter-patch distance, etc.

10

Monte Carlo Simulation of a SPOM• Key idea: a metapopulation model is a cascade in the

layered graph representing patches over time

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Colonization

Non-extinction

Patches

Time

11

Metapopulation = Cascade• Key idea: a metapopulation model is a cascade in the


a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Patches

Time

12



a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Patches

Time

13



a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Patches

Time

14



a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Patches

15

Monte Carlo Simulations• Each simulation can produce a different cascade

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e1 2 3 4 5

Patches

Insight #1: Objective as Network Connectivity

• Conservation objective: maximize expected # occupied patches at time T

• Cascade objective: maximize expected # of target nodes reachable by live edges

i

j

k

l

m

i

j

k

l

m

i

j

k

l

m

i

j

k

l

m

i

j

k

l

m targets

Live edges

16

Insight #2: Management as Network Building

• Conserving parcels adds nodes and (stochastic) edges to the network

Parcel 1

Parcel 2

Initial network

17


• Conserving parcels adds nodes to the network

Parcel 1

Parcel 2

Initial network

18


• Conserving parcels adds nodes to the network

Parcel 1

Parcel 2

Initial network

19

20

Monte Carlo Evaluation of a Proposed Purchase Plan

set of reachable nodes at time • Goal is to maximize , where is our purchasing

plan• Run multiple simulations. Count the number

of occupied parcels at time . Compute the average:

21

Research Question

• How many samples do we need to get a good estimate?

• Answer: We can use basic statistical methods (confidence intervals and hypothesis tests) to measure the accuracy of our estimate.95% confidence interval for the mean is our estimate; is the true value

• We can increase until the accuracy is high enough

22

Evaluating a Purchase Plan• Plan 1: Purchase nothing

Initial network

Parcel 1

Parcel 2

23

Plan 2: Purchase Parcel #1

Initial network

Parcel 1

Parcel 2

24

Plan 3: Purchase Parcels 1 and 2

Initial network

Parcel 1

Parcel 2

25

How many different purchasing plans are there for parcels?

• We can’t afford to evaluate them all

Solution Strategy(aka Sample Average Approximation)

26

1. Assume we own all parcels. Run multiple simulations of bird propagation

2. Join all of those simulations into a single giant graph– Goal of maximizing expected # of

occupied patches at time is approximated by # of reachable patches in the giant graph

3. Define a set of variables , one for each parcel that we can buy

4. Solve a mixed integer program to decide which variables are and which are

𝑥1

𝑥1

𝑥1

𝑥2

𝑥2

𝑥2

27

Solving the Deterministic Problem

• CPLEX commercial optimization package (sold by IBM; free to universities)

• Applies a method known as Branch and Bound• NP-Hard, so can take a long time but often

finds a solution if the problem isn’t too big or too hard

28

Experiments

• 443 available parcels• 2500 territories• 63 initially occupied• 100 years

• Population model is parameterized based (loosely) on RCW ecology

• Short-range colonizations (<3km) within the foraging radius of the RCW are much more likely than long-range colonizations

29

Greedy Baselines• Adapted from previous work on influence

maximization

• Start with empty set, add actions until exhaust budget– Greedy-uc – choose action that results in biggest immediate

increase in objective [Kempe et al. 2003]– Greedy-cb – use ratio of benefit to cost [Leskovec et al.

2007]

• These heuristics lack performance guarantees!

30

Results

M = 50, N = 10, Ntest = 500

Upper bound!

31

Results

M = 50, N = 10, Ntest = 500

Upper bound!

32

Results

Conservation Reservoir

Initial population

M = 50, N = 10, Ntest = 500

Upper bound!

Conservation Strategies

33

• Both approaches build outward from source– Greedy buys best

patches next to currently-owned patches

– Optimal solution builds toward areas of high conservation potential

• In this case, the two strategies are very similar

Conservation Reservoir

Source population

34

A Harder Instance

Move the conservation reservoir so it is more remote.

35

Conservation Strategies

Greedy Baseline

SAA Optimum (our approach)

$150M $260M $320M

Build outward from sources

Path-building (goal-setting)

36

Shortcomings of the Method

• All parcels are purchased at time – Reality: money arrives incrementally

• All parcels are assumed to be for sale at – Reality: parcel availability and price can vary from year to

year• How about an MDP? Each year we can see where the

birds actually spread to and then update our purchase plans accordingly– This is a very hard MDP, no known solution method

• Current method is very slow

37

Status

• The Conservation Fund is making purchasing decisions based (partially) on the plans computed using this model

• Alan Fern, Shan Xue, and Dan Sheldon have developed an extension that proposes a schedule for purchasing the parcels

rescuing an endangered species with monte carlo ai

Documents

conservation organizations

layered graph

optimal conservation

cascadekey idea

endangered specieslifestyle

spread of rcw

limited management budget

spomkey idea