Computational Intro: Conservation and Biodiversity Wildlife Corridor Design


Computational Intro: Conservation and Biodiversity

Wildlife Corridor Design

Topics in Computational Sustainability, Spring 2010

Joint work with Jon Conrad, Bistra Dilkina, Willem van Hoeve, Ashish Sabharwal, and Jordan Suter

Carla P. Gomes

Outline

• Wildlife corridor design problem: problem definition
• How hard is it to solve? Concepts of problem complexity
• How to model it? Mixed Integer Programming formulation and other issues
• How to solve it? How to scale up solutions?
• Experimental results
• Research questions


Problem Definition

Conservation and Biodiversity: Wildlife Corridors

[Image: New York Times (Science), 2006]

Wildlife corridors:
• Preserve wildlife against land fragmentation
• Link core biological areas, allowing animal movement between areas

Limited budget: we must maximize environmental benefits (utility).

Conservation and Biodiversity: Grizzly Bear Wildlife Corridors

Wildlife corridors link core biological areas, allowing animal movement between areas. Typically, budgets to implement corridors are low.

Example goal: preserve grizzly bear populations in the U.S. Northern Rockies by creating wildlife corridors connecting 3 reserves: Yellowstone National Park, Glacier Park, and the Salmon-Selway Ecosystem.

Real-world instance: a corridor for grizzly bears in the Northern Rockies, connecting Yellowstone, the Salmon-Selway Ecosystem, and Glacier Park.

[Figure: Grizzly Bear Corridor in Northern Rockies; map panels show Cost and Habitat Suitability.]

Estimating habitat suitability can be a challenging machine learning problem.

Study area: ~320,000 sq km

Wildlife Corridor Design: Problem Definition (Informal English Definition)

Instance:
– A set of parcels and their neighborhood relationships
– A set of reserves or terminals (a subset of the parcels)
– The cost and the utility (habitat suitability) of each parcel

Question:
– What is the set of connected parcels, containing the reserves, that maximizes utility such that the total cost does not exceed a given budget C?

[Figure legend: reserve; land parcel. Cost and utility information omitted.]

Example

[Figure: grid of parcels, each labeled with a cost and a utility.]

– Budget 10: solution with cost = 10, utility = 9
– Budget 11: solution with cost = 11, utility = 10

Example (continued)

The minimum-cost solution connecting the reserves has cost = 7 and utility = 5.

Wildlife Corridor Design (Graph Representation)

Input:
– A set of parcels and their neighborhood relationships
– A set of reserves or terminals (a subset of the parcels)
– The cost and the utility (habitat suitability) of each parcel

Output:
– A set of connected parcels, containing the reserves, that maximizes utility such that the total cost does not exceed a given budget C

Each parcel becomes a vertex and each neighborhood relationship an edge of an undirected graph $G = (V, E)$; reserves are marked vertices. (Cost and utility information is omitted in the pictures.)

The Connection Subgraph Problem (Optimization Version)

Instance:
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Vertex cost function $c(v)$; utility function $u(v)$
– Cost bound (budget) $C$

Question:
– What is the subgraph $H$ of $G$ with maximum utility such that $H$ is connected, contains $T$, and $\mathrm{cost}(H) \le C$?

Here $\mathrm{cost}(H) = \sum_{v \in H} c(v)$ and $\mathrm{utility}(H) = \sum_{v \in H} u(v)$.

Utility optimization version: given $C$, maximize utility.
Cost optimization version: given $U$, minimize cost.
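To make the utility optimization version concrete, here is a minimal brute-force sketch in Python. The function names and the dict-of-lists graph format are illustrative assumptions, not part of the lecture. It tries every vertex subset that contains the terminals, so its running time is exponential in |V|; it only pins down the definition on toy instances.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if the non-empty vertex set `nodes` induces a connected subgraph."""
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def max_utility_subgraph(adj, cost, util, terminals, budget):
    """Try every vertex subset containing the terminals; keep the best feasible one.

    Runs in O(2^|V|) time, so it is only usable on toy instances.
    Returns (best_vertex_set, best_utility), or (None, None) if infeasible.
    """
    terminals = set(terminals)
    others = [v for v in adj if v not in terminals]
    best, best_util = None, None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            chosen = terminals | set(extra)
            # Skip the empty set and anything over budget.
            if not chosen or sum(cost[v] for v in chosen) > budget:
                continue
            if not is_connected(chosen, adj):
                continue
            u = sum(util[v] for v in chosen)
            if best_util is None or u > best_util:
                best, best_util = chosen, u
    return best, best_util
```

With an empty terminal set the same function answers the "no terminals" variant discussed later: the best connected component within the budget.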

The Connection Subgraph Problem (Decision Version)

Instance:
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Vertex cost function $c(v)$; utility function $u(v)$
– Cost bound (budget) $C$; desired utility $U$

Question:
– Is there a subgraph $H$ of $G$ such that $H$ is connected and contains $T$, $\mathrm{cost}(H) \le C$, and $\mathrm{utility}(H) \ge U$?
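A "yes" answer to the decision version can be checked quickly once a candidate H is given: sum its costs and utilities and run one graph search restricted to H. The sketch below uses a hypothetical function name and assumes a dict-of-lists adjacency format. It does this in time linear in the size of G, which is the polynomial-time verification that places the decision version in NP.

```python
from collections import deque


def verify_certificate(H, adj, cost, util, terminals, budget, target):
    """Check in polynomial time that vertex set H witnesses a 'yes' answer."""
    H = set(H)
    if not H or not set(terminals) <= H:
        return False
    if sum(cost[v] for v in H) > budget or sum(util[v] for v in H) < target:
        return False
    # Breadth-first search restricted to H tests connectivity in O(|V| + |E|).
    start = next(iter(H))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in H and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == H
```

Finding such an H is the hard part; checking one is easy.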

Connection Subgraph: Other Possible Applications

Social networks (vertices: people; edges: whether two people know each other) [Faloutsos, McCurley, Tompkins '04]:
• What characterizes the connection between two individuals? The shortest path? The size of the connected component? A "good" connected subgraph?
• If a person is infected with a disease, who else is likely to be?
• Which people have unexpected ties to any members of a list of other individuals?

Project: Find other applications of the connection subgraph problem and its variants, and apply or extend the ideas presented in this lecture.

Concepts of Problem Complexity: Easy vs. Hard Problems

How hard (complex) is it to solve the connection subgraph problem?

Before answering this question…

How do computer scientists differentiate between good (efficient) and bad (inefficient) algorithms?

The yardstick: any algorithm that runs in at most polynomial time is considered efficient; everything else is not.

Functions Ordered by Their Growth Rates

Order  Function          Name
1      c                 constant
2      lg n              logarithmic
3      lg^c n            polylogarithmic
4      n^r, 0 < r < 1    sublinear
5      n                 linear
6      n^r, 1 < r < 2    subquadratic
7      n^2               quadratic
8      n^3               cubic
9      n^c, c ≥ 1        polynomial
10     r^n, r > 1        exponential

Orders 1 to 9 correspond to efficient algorithms; order 10 (exponential) does not.
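A quick numeric comparison makes the gap between the last rows of the table tangible. This sketch simply tabulates a few of the functions above for growing n (the function name is our own). 2^n overtakes n^3 at n = 10 (1,024 vs. 1,000) and then pulls away very fast: at n = 80, n^3 = 512,000 while 2^n is about 1.2 × 10^24.

```python
import math


def growth_table(sizes):
    """Return (n, lg n, n^2, n^3, 2^n) for each instance size n."""
    return [(n, math.log2(n), n ** 2, n ** 3, 2 ** n) for n in sizes]


for n, lg, quad, cub, expo in growth_table([10, 20, 40, 80]):
    print(f"n={n:>3}  lg n={lg:5.2f}  n^2={quad:>5}  n^3={cub:>7}  2^n={expo:.2e}")
```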

Roughly Speaking…

[Figure: run time (cost) versus instance size N for constant, logarithmic, linear, quadratic, and exponential growth.]

[Figure: polynomial vs. exponential growth (Harel 2000). The polynomial ($N^2$) curve is labeled with LP interior-point methods, min-cost flow algorithms, the transportation algorithm, the assignment algorithm, and Dijkstra's algorithm; the exponential curve with the binary branch-and-bound (B&B) algorithm.]

How can we show a problem is efficiently solvable?
– We can show it constructively: we provide an algorithm and show that it solves the problem efficiently. For example:

• Shortest path problem: Dijkstra's algorithm runs in polynomial time, so the shortest path problem can be solved efficiently.
• Linear programming: the interior point method has polynomial worst-case complexity, so linear programming can be solved efficiently.

Note: the simplex method has exponential worst-case complexity. In practice, however, the simplex algorithm seems to scale as $m^3$, where $m$ is the number of functional constraints.
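As an illustration of the constructive argument, here is a textbook binary-heap version of Dijkstra's algorithm. It is a sketch; the adjacency-list format (node mapped to a list of `(neighbor, weight)` pairs) is an assumption. With a binary heap it runs in O((|V| + |E|) log |V|) time, which is polynomial in the input size.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest-path distances from `source`.

    `graph` maps every node to a list of (neighbor, weight) pairs with weight >= 0.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter distance was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Each vertex is settled once and each edge is relaxed at most once per settled endpoint, which is where the polynomial bound comes from.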


How can we show a problem is not efficiently solvable?

– How do you prove a negative? Much harder!!!

– This is the aim of complexity theory.

Easy (Efficiently Solvable) Problems vs. Hard Problems

Easy problems: we consider a problem X "easy", or efficiently solvable, if there is a polynomial-time algorithm A for solving X. We denote by P the class of problems solvable in polynomial time.

Hard problems: everything else. Any problem for which there is no polynomial-time algorithm is an intractable problem.

[Figure: hard computational problems scale exponentially in the worst case (exponential vs. polynomial function; exponential-time algorithms; explosive combinatorics). Application areas shown: experiment design, software and hardware verification, satisfiability (e.g., $(A \lor B) \land (D \lor E \lor \lnot A)$), data analysis and data mining, fiber optics routing, capital budgeting and financial applications, information retrieval, protein folding and medical applications, combinatorial auctions, and planning, scheduling, and supply chain management. Many more applications!]

Tackling practical-size instances requires powerful computational and mathematical tools!

NP-Complete and NP-Hard Problems

How hard (complex) is the connection subgraph problem?

The connection subgraph problem is NP-hard.

Conrad, J., C. Gomes, W.-J. van Hoeve, A. Sabharwal, and J. Suter. Connections in networks: Hardness of feasibility versus optimality. Proc. CPAIOR 2007, pages 16–28.

Unfortunately, that means we don't know of good, efficient (polynomial-time) algorithms to solve this problem. We believe the connection subgraph problem is intractable: computer scientists only know of exponential-time algorithms to solve it, and they strongly believe that no polynomial-time algorithm will ever be found, but there is no proof either way.

Should we give up on finding good solutions?

The connection subgraph problem is NP-hard! But this is a worst-case result. Real-world problems are not necessarily worst case: they possess hidden substructure that can be exploited, allowing solutions to scale up.

Conrad, J., C. Gomes, W.-J. van Hoeve, A. Sabharwal, and J. Suter. Connections in networks: Hardness of feasibility versus optimality. Proc. CPAIOR 2007, pages 16–28.

Encoding the Connection Subgraph Problem as a Mixed Integer Programming Problem

Single Commodity Flow Encoding

– Variables: $x_i$, a binary variable for each vertex $i$ (1 if included in the corridor; 0 otherwise); $y_{ij}$, a continuous variable for the flow on each edge $(i, j)$
– Cost constraint: $\sum_i c_i x_i \le C$
– Utility optimization function: maximize $\sum_i u_i x_i$
– Connectedness: use a single commodity flow encoding

[Figure: example graph with root $r$; max flow = 9.]

Single Commodity Flow: MIP

[Figure: the MIP model, with constraint groups labeled: max utility (objective); budget constraint; reserves; total flow; flow balance; incoming edges allowed only if selected. One part of the model is annotated "This is what makes the problem hard."]

Note: $E'$ is the set of directed edges obtained by replacing each undirected edge of $E$ with two directed edges.
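The slide's constraints did not survive extraction, only their labels. One standard way to write a single commodity flow model that matches those labels is sketched below; treat it as an assumed reconstruction, not the authors' exact model. It assumes a root $r$ chosen among the terminals, uses $|V| - 1$ as the big-M constant, and sends one unit of flow from $r$ to every other selected vertex. Because flow may only enter selected vertices, every selected vertex is reached from $r$ through selected vertices, which is exactly connectivity.

```latex
\begin{align*}
\max\quad & \sum_{i \in V} u_i x_i \\
\text{s.t.}\quad & \sum_{i \in V} c_i x_i \le C && \text{(budget constraint)} \\
& x_t = 1 && \forall t \in T \quad \text{(reserves)} \\
& \sum_{(r,j) \in E'} y_{rj} = \sum_{i \in V \setminus \{r\}} x_i && \text{(total flow)} \\
& \sum_{(i,j) \in E'} y_{ij} - \sum_{(j,k) \in E'} y_{jk} = x_j && \forall j \in V \setminus \{r\} \quad \text{(flow balance)} \\
& y_{ij} \le (|V| - 1)\, x_j && \forall (i,j) \in E' \quad \text{(incoming edges only if selected)} \\
& x_i \in \{0, 1\} \;\; \forall i \in V, \qquad y_{ij} \ge 0 \;\; \forall (i,j) \in E'
\end{align*}
```

The model has $O(|V| + |E|)$ variables and constraints, so it is compact; the difficulty comes from the binary $x_i$ coupled to the flow through the big-M constraints.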

Solving the Mixed Integer Programming Encoding

Cplex: a state-of-the-art MIP solver (branch and bound, LP relaxation, cut generation).

[Figure: connection subgraph instance → MIP model → CPLEX (feasibility + optimization) → solution.]


Experimental Results

Synthetic Instances for Evaluation

The problem is evaluated on semi-structured graphs:
• An m × m lattice (grid graph) with k terminals, inspired by the conservation corridors problem
• Place one terminal at the top-left and one at the bottom-right corner (maximizes use of the grid)
• Place the remaining terminals randomly
• Assign uniform random costs and utilities from {0, 1, …, 10}

[Figure: example instance with m = 4, k = 4.]
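The instance recipe above can be sketched as follows. The function name, the 4-neighbor grid adjacency, and the use of Python's `random` module are our assumptions about details the slide does not specify.

```python
import random


def make_lattice_instance(m, k, seed=None, max_value=10):
    """Random m x m grid instance with k terminals (sketch of the slide's recipe).

    Returns (adj, cost, util, terminals) with cells identified by (row, col).
    """
    if m < 2 or not 2 <= k <= m * m:
        raise ValueError("need m >= 2 and 2 <= k <= m*m")
    rng = random.Random(seed)
    cells = [(r, c) for r in range(m) for c in range(m)]
    adj = {cell: [] for cell in cells}
    for r, c in cells:
        # 4-neighbor lattice: up, down, left, right.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < m and 0 <= nc < m:
                adj[(r, c)].append((nr, nc))
    # Fixed terminals at the top-left and bottom-right corners.
    terminals = [(0, 0), (m - 1, m - 1)]
    remaining = [cell for cell in cells if cell not in terminals]
    terminals += rng.sample(remaining, k - 2)
    # Uniform random integers in {0, 1, ..., max_value}, inclusive.
    cost = {cell: rng.randint(0, max_value) for cell in cells}
    util = {cell: rng.randint(0, max_value) for cell in cells}
    return adj, cost, util, terminals
```

Passing a fixed `seed` makes an experiment reproducible; sweeping the budget as a fraction of the total cost then gives runtime curves like the ones plotted on the following slides.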

Standard MIP Results: Without Terminals

• No terminals: "find the connected component that maximizes the utility within the given budget"
• Pure optimization problem; always feasible; still NP-hard

[Figure: run time (log scale) vs. budget fraction (0 to 0.8) for 6 × 6, 8 × 8, and 10 × 10 grids.]

A clear easy-hard-easy pattern with uniform random costs and utilities.

Note 1: the plot uses a log scale for better viewing of the sharp transitions.
Note 2: each data point is the median over 100+ random instances.

Standard MIP: 3 Terminals (Feasibility vs. Optimization)

Split the instances into feasible and infeasible ones and plot the median runtime:
• For feasible instances, the computation involves proving optimality.
• For infeasible instances, the computation involves proving infeasibility.

Infeasible instances take much longer than feasible ones!

Results: With Terminals

(Ashish Sabharwal, CP-AI-OR '08, May 23, 2008)

[Figure: connection subgraph instance → MIP model → CPLEX (feasibility + optimization) → solution.]

Problem? MIP + Cplex is really weak at feasibility testing. Poor scaling: it couldn't even get close to handling real data.

Can we do better?

A Related Problem (Ignoring Utilities): Minimum-Cost Solution, the Steiner Tree Problem

Input:
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Edge cost function $c(e)$

Question:
– What is the subgraph $H$ of $G$ with minimum cost such that $H$ is connected and contains $T$?

If all edge costs are positive, the optimal subgraph is a tree: removing any edge of a cycle keeps $H$ connected and lowers its cost.

The Steiner Tree Problem: Min-Cost Tree Connecting the Terminals

Also NP-hard, but:
• With only two terminals, it is a shortest path problem (e.g., Dijkstra's algorithm or an algorithm based on dynamic programming).
• With a bounded number of terminals, there is a fixed-parameter tractable algorithm.
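With two terminals s and t and nonnegative parcel costs, any connected parcel set containing both contains an s-t path that costs no more than the set, so the cheapest corridor is a node-weighted shortest path. Below is a Dijkstra sketch for that case, where entering a parcel pays its cost; the function name and data format are illustrative assumptions.

```python
import heapq
import math


def min_cost_corridor_two_terminals(adj, cost, s, t):
    """Cheapest connected parcel set containing s and t (a node-weighted shortest path).

    Returns (total_cost, path); (inf, []) if t is unreachable.
    """
    dist = {s: cost[s]}
    parent = {s: None}
    heap = [(cost[s], s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v in adj[u]:
            nd = d + cost[v]  # entering parcel v pays for v
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if t not in done:
        return math.inf, []
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = parent[v]
    return dist[t], path[::-1]
```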

The Steiner Tree Problem: Min-Cost Tree Connecting the Terminals

Three terminals (as in our grizzly bear problem):
• Algorithm: to connect the three terminals, find where to place the root of the tree (the vertex where the three branches meet), using all-pairs shortest paths (an easy algorithm based on dynamic programming, or even Dijkstra's).
• The algorithm also provides the starting point of a greedy solution: start with the minimum-cost corridor and extend it greedily, picking nodes in decreasing utility/cost ratio to use the remaining budget.
• The algorithm is also used for pruning: nodes that are so far away that connecting them to the terminals would exceed the budget can be pruned.
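For three terminals, a minimal connecting tree consists of three paths that meet at a single root vertex (possibly a terminal itself). The sketch below is an assumed reading of the "find where to place the root" step: it computes node-weighted all-pairs shortest paths with Floyd-Warshall and then tries every vertex as the root, charging the root's own cost once. It runs in O(|V|^3) time. The node-weighted distance between u and v counts the cost of both endpoints; all helper names are hypothetical.

```python
import math


def node_weighted_apsp(adj, cost):
    """All-pairs node-weighted shortest paths via Floyd-Warshall.

    Arc u->v gets weight cost[v], so a path's total is cost[u] plus its arc
    weights. Returns (dist, nxt): dist[u][v] counts both endpoints' costs and
    nxt[u][v] is the vertex after u on a cheapest u-v path.
    """
    nodes = list(adj)
    D = {u: {v: math.inf for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for u in nodes:
        D[u][u], nxt[u][u] = 0, u
        for v in adj[u]:
            D[u][v], nxt[u][v] = cost[v], v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    nxt[i][j] = nxt[i][k]
    dist = {u: {v: cost[u] + D[u][v] for v in nodes} for u in nodes}
    return dist, nxt


def walk(nxt, u, v):
    """Rebuild the cheapest u-v path from the next-hop table ([] if unreachable)."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


def min_cost_three_terminals(adj, cost, terminals):
    """Try every vertex as the meeting root of the three branches; keep the cheapest."""
    dist, nxt = node_weighted_apsp(adj, cost)
    best_cost, best_root = math.inf, None
    for root in adj:
        # The root lies on all three branches, so subtract its cost twice.
        total = sum(dist[root][t] for t in terminals) - 2 * cost[root]
        if total < best_cost:
            best_cost, best_root = total, root
    if best_root is None:
        return math.inf, set()
    corridor = set()
    for t in terminals:
        corridor.update(walk(nxt, best_root, t))
    return best_cost, corridor
```

The same `dist` table can be reused for pruning, which matches the slide's point that one computation serves several purposes.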

Solving the Connection Subgraph Problem: Two-Phase Approach

1st phase: computes the minimum Steiner tree and, from it, produces a greedy solution. This phase runs in polynomial time for a constant number of terminal nodes.

2nd phase: refines the greedy solution with Cplex to produce an optimal solution.

Solving the Connection Subgraph Problem: Phase I

1st phase: a minimum-Steiner-tree-based algorithm that
– produces the minimum-cost solution;
– produces shortest path information, the all-pairs-shortest-paths matrix, used to prune the search space;
– produces a greedy (and often suboptimal) solution for feasible instances: the parcels with the highest utility/cost ratio are selected to use the remaining budget.

This phase runs in polynomial time for a constant number of terminal nodes.
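A sketch of the greedy extension step: starting from the min-cost corridor, repeatedly add the affordable parcel with the highest utility/cost ratio. Restricting candidates to parcels adjacent to the current corridor, so that it stays connected, is our assumption about how connectivity is preserved; zero-cost parcels are treated as having an infinite ratio. The function name is hypothetical.

```python
import math


def greedy_extend(adj, cost, util, corridor, budget):
    """Grow a connected corridor by best utility/cost ratio until nothing else fits."""

    def ratio(v):
        # Zero-cost parcels are free to add, so rank them first.
        return util[v] / cost[v] if cost[v] > 0 else math.inf

    chosen = set(corridor)
    spent = sum(cost[v] for v in chosen)
    while True:
        # Candidates: parcels adjacent to the corridor (keeps it connected).
        frontier = {w for v in chosen for w in adj[v]} - chosen
        affordable = [w for w in frontier if spent + cost[w] <= budget]
        if not affordable:
            return chosen
        best = max(affordable, key=ratio)
        chosen.add(best)
        spent += cost[best]
```

Because the choice is myopic, a high-ratio parcel that sits behind an expensive neighbor may never be reached, which is one reason the greedy solution is often suboptimal and worth refining in Phase II.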

Solving the Connection Subgraph Problem: Phase II

Phase II refines the greedy solution with Cplex to produce an optimal solution:
– The greedy solution is passed to Cplex as the starting solution (Cplex can change it).
– The all-pairs-shortest-paths matrix computed in Phase I is also passed on to Phase II. It is used to statically (i.e., at the beginning) prune away all nodes that are easily deduced to be too far away to be part of a solution (e.g., if the minimum Steiner tree containing that node and all of the terminal vertices already exceeds the budget). This significantly reduces the size of the search space, often by 40-60%.

Phase II computes an optimal solution (or the optimal extended-min-cost solution) to the utility-maximization version of the connection subgraph problem.
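A simplified sketch of the static pruning test. The slide's criterion uses the minimum Steiner tree containing the node and all terminals; this sketch instead uses a weaker but still safe lower bound, the largest node-weighted distance from the node to any terminal, since any corridor containing v and terminal t must contain a v-t path. It assumes a distance table `dist[u][v]` that counts both endpoint costs, such as the all-pairs-shortest-paths matrix from Phase I; the function name is hypothetical.

```python
def prune_far_nodes(dist, terminals, budget):
    """Return the parcels that survive a simple static pruning test.

    `dist[u][v]` is a node-weighted shortest-path cost that includes both endpoints.
    """
    if not terminals:
        return set(dist)
    keep = set()
    for v in dist:
        # Any corridor containing v and a terminal t contains a v-t path,
        # so its cost is at least dist[v][t]; take the largest such bound.
        lower_bound = max(dist[v][t] for t in terminals)
        if lower_bound <= budget:
            keep.add(v)
    return keep
```

A pruned node can be fixed to $x_v = 0$ before the MIP is solved, which shrinks the model without cutting off any feasible corridor.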

Solving the Connection Subgraph Problem: Exploiting Structure (A Hybrid MIP/CP Approach)

[Figure: workflow. For feasibility, ignore utilities and compute the min-cost Steiner tree on the connection subgraph instance, which yields the min-cost solution and an all-pairs-shortest-paths (APSP) matrix. Greedily extend the min-cost solution to fill the budget ("like" knapsack: max u/c), giving a higher-utility feasible solution that is passed to CPLEX as the starting solution of the MIP model (optimization). The APSP matrix drives dynamic pruning (40-60% pruned).]

Conrad, Gomes, van Hoeve, Sabharwal, Suter 2008

10 × 10 random lattices, 3 reserves:
• ~20x improvement in runtime on feasible instances
• Infeasible instances solved instantaneously!

10 × 10 random lattices, 3 reserves:
• The peak of hardness is still strongly correlated with budget slack.
• The plot also shows the gap between optimal and extended-optimal solutions.


Experimental Results: Yellowstone case


Min Cost Solution for Different Granularities

Real Data, 50 × 50 km Parcels

The gap between optimal and extended-optimal solutions peaks in a critical region right after the min-cost budget.

Real Data, 40 × 40 km Parcels

The gap between optimal and extended-optimal solutions peaks in a critical region right after the min-cost budget.


Research Issues

Encodings

– Complete methods (proof of optimality): Other MIP formulations that scale better in practice? Other formulations that allow us to prove optimality faster? Other paradigms (e.g., constraint-based, SAT modulo theories, extensions of SAT solvers, mixed logic programming)?
– Incomplete methods (cannot prove optimality but may find good solutions): simulated annealing, genetic algorithms, etc.
– Hybrid complete/incomplete methods

Approximation Results (Bistra Dilkina is interested in these issues)

– Cost optimization: NP-hard to approximate within a factor of 1.36
– Utility version?

Related work:
– Moss & Rabani 2001/2007: node-weighted Steiner tree, with costs and utilities on nodes; approximation results
– Costa et al. 2006/2008/2009: Steiner tree with budget, revenues, and hop constraints, with costs and utilities on edges; directed Steiner tree encoding and branch-and-cut

Models Are Important!

• Single commodity flow: quite compact (polynomial size).
• Directed Steiner tree: captures the connectedness structure better and provides good upper bounds, but has an exponential number of constraints.

Conrad, Dilkina, Gomes, van Hoeve, Sabharwal, Suter 2007, 2008, 2009

A Broad Class of Applications for Projects

A family of problems: spatially targeted interventions
• Conservation and biodiversity: site selection, reserve network design, wildlife corridors
• Social welfare: portfolios of asset-based poverty interventions

Bistra Dilkina 2009

Spatially Targeted Interventions

Select a subset $A$ of spatially explicit actions $U$:
– Maximize a sustainability function $F$
– Such that the cost of the actions does not exceed a limited budget $B$

$\max_{A \subseteq U} F(A) \quad \text{s.t.} \quad C(A) \le B$

Complexity is added by:
– Spatial constraints (connectivity, distance, etc.)
– Data uncertainty
– Dynamics: metapopulation models, climate change

Bistra Dilkina 2009

Additional Levels of Complexity: Stochasticity, Uncertainty, Large-Scale Data Modeling

• Multiple species (hundreds or thousands), with interactions (e.g., predator/prey).
• Biological and ecological issues (for a species and within species).
• Movements and migrations.
• Climate change.
• Other factors (e.g., different models of land conservation, such as purchase, conservation easements, or auctions, typically over different time periods).

What different objective functions can we consider for preserving species and biodiversity?

• How do we estimate population distributions and habitat suitability? Where and how should we collect data?

[Figures: Eastern Phoebe migration, modeled with bagged decision trees (Daniel Fink, Wesley Hochachka, Art Munson, Mirek Riedewald, Ben Shaby, Giles Hooker, and Steve Kelling, 2009; source: Daniel Fink); Maxent (Steven Phillips, Miro Dudik & Rob Schapire); Information Sciences.]

Summary

• Wildlife corridor problem: problem formulation
• Computational complexity issues
• Models and solution approaches
• Research questions

Our approaches clearly outperform approaches reported in the literature!

The End!

Theoretical Results: 1

NP-completeness: reduction from the Steiner tree problem, preserving the cost function. Idea:
– The Steiner tree problem is already very similar
– Simulate edge costs with node costs
– Simulate terminal vertices with the utility function

NP-complete even without any terminals.
– Recall: the Steiner tree problem is polynomial-time solvable with a constant number of terminals.

Also holds for planar graphs.
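The reduction idea on this slide can be sketched as an instance transformation (our illustration; names are hypothetical). Every original vertex gets cost 0, each edge is subdivided by a new node carrying the edge's cost, and each Steiner terminal gets utility 1 while all other nodes get utility 0. The resulting instance has no terminals: a Steiner tree of cost at most C exists iff the new graph has a connected subgraph with cost at most C and utility at least |T|.

```python
def steiner_to_connection_subgraph(edges, edge_cost, terminals):
    """Map an edge-weighted Steiner tree instance to a node-weighted connection subgraph instance.

    Returns (adj, cost, util, target_utility); the new instance has no terminals.
    """
    adj, cost, util = {}, {}, {}

    def add_node(node, c, u):
        adj[node] = []
        cost[node] = c
        util[node] = u

    vertices = {v for edge in edges for v in edge} | set(terminals)
    for v in vertices:
        # Original vertices are free; terminals are marked by utility instead.
        add_node(("orig", v), 0, 1 if v in terminals else 0)
    for a, b in edges:
        # Subdivide edge (a, b) with a node that carries the edge's cost.
        mid = ("edge", a, b)
        add_node(mid, edge_cost[(a, b)], 0)
        for end in (("orig", a), ("orig", b)):
            adj[end].append(mid)
            adj[mid].append(end)
    return adj, cost, util, len(set(terminals))
```

Since the terminals are encoded purely through utility, the construction also illustrates why the problem stays hard even without any terminals.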

Theoretical Results: 2

NP-hardness of approximating cost optimization (factor 1.36): reduction from the Vertex Cover problem.

Reduction motivated by Steiner tree work [Bern, Plassmann '89].

[Figure: reduction gadget with vertex nodes $v_1, v_2, v_3, \ldots, v_n$.]

A vertex cover of size $k$ exists iff there is a connection subgraph with cost bound $C = k$ and utility $U = m$.
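One construction consistent with the statement above is sketched here. The slide's exact gadget is not recoverable, so treat it as an assumption, including reading $m$ as the number of edges of the Vertex Cover instance. A root (the only terminal, cost 0) is linked to one node per vertex (cost 1, utility 0), and each of the m edges becomes a node (cost 0, utility 1) linked to its two endpoints. Reaching utility m forces every edge node in, and each must attach through one of its endpoints, so the chosen vertex nodes form a cover whose size equals the corridor's cost. The brute-force helpers only check the equivalence on tiny graphs; all names are hypothetical.

```python
import math
from itertools import combinations


def vertex_cover_to_connection_subgraph(n, edges):
    """Hypothetical gadget: root -- vertex nodes (cost 1) -- edge nodes (utility 1)."""
    root = "r"
    adj, cost, util = {root: []}, {root: 0}, {root: 0}
    for i in range(n):
        v = ("v", i)
        adj[v] = [root]
        adj[root].append(v)
        cost[v], util[v] = 1, 0
    for j, (a, b) in enumerate(edges):
        e = ("e", j)
        adj[e] = [("v", a), ("v", b)]
        adj[("v", a)].append(e)
        adj[("v", b)].append(e)
        cost[e], util[e] = 0, 1
    return adj, cost, util, {root}


def _connected(nodes, adj):
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def min_corridor_cost(adj, cost, util, terminals, target):
    """Brute force (exponential): cheapest connected set with the terminals and utility >= target."""
    nodes = list(adj)
    best = math.inf
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if (terminals <= chosen
                    and sum(util[v] for v in chosen) >= target
                    and _connected(chosen, adj)):
                best = min(best, sum(cost[v] for v in chosen))
    return best


def min_vertex_cover_size(n, edges):
    """Brute force (exponential): size of a smallest vertex cover."""
    for k in range(n + 1):
        for cover in combinations(range(n), k):
            if all(a in cover or b in cover for a, b in edges):
                return k


triangle = [(0, 1), (1, 2), (0, 2)]
adj, cost, util, T = vertex_cover_to_connection_subgraph(3, triangle)
print(min_vertex_cover_size(3, triangle), min_corridor_cost(adj, cost, util, T, len(triangle)))  # prints: 2 2
```

Because the corridor's cost equals the size of the cover in this construction, a good approximation for the cost version would yield one for Vertex Cover; that is the intuition behind transferring the 1.36 factor.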
