Predictive Habitat Distribution Models, Leire Ibaibarriaga

Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011, EURO-BASIN, www.euro-basin.eu

Upload: euro-basin-programme

Post on 02-Nov-2014


DESCRIPTION

Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)

TRANSCRIPT

Page 1: Predictive Habitat Distribution Models, Leire Ibaibarriaga

Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011, EURO-BASIN, www.euro-basin.eu

Page 2: Predictive Habitat Distribution Models, Leire Ibaibarriaga

OUTLINE

• Why model?

• Habitat models

• Model properties

• Steps for modelling

• What about data?

Page 3: Predictive Habitat Distribution Models, Leire Ibaibarriaga

WHY MODEL?

• “All models are wrong, but some are useful” (G. Box)

• Models are how we understand the world:

– We see the world through models

– We learn about the world using formal descriptions

• Model types:

– Static vs dynamic

– Explanatory vs predictive

– Deterministic vs stochastic

– Discrete vs continuous

Page 4: Predictive Habitat Distribution Models, Leire Ibaibarriaga

HABITAT MODELS

• Habitat models focus on how environmental factors control the distribution of species and communities.

• Multiple applications:

– Biogeography, impacts of global change, management, conservation, ecology, …

• New conceptual and operational advances due to the growth in computing power, e.g. GIS, remote sensing, new (computer-intensive) statistical modelling tools, etc.

Page 5: Predictive Habitat Distribution Models, Leire Ibaibarriaga


MODEL PROPERTIES

Some desirable model properties:

• Parsimony (Occam’s razor): “All things being equal, the simplest solution tends to be the best one”

• Tractability: easy to analyse

• Conceptually insightful: reveals fundamental properties

• Generalizability: can be applied to other situations/species/…

• Empirical consistency: consistent with the available data

• Falsifiability: can be tested by observations

• Predictive precision

Page 6: Predictive Habitat Distribution Models, Leire Ibaibarriaga


MODEL PROPERTIES

Levins (1966); Sharpe (1990); Guisan and Zimmermann (2000)

Predictive habitat distribution models

Page 7: Predictive Habitat Distribution Models, Leire Ibaibarriaga

MODEL PROPERTIES

A more complex model is not necessarily better…

[Figure with axes labelled GENERALITY and COMPLEXITY]

Page 8: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

1) Conceptual phase

2) Model formulation

3) Model calibration

4) Spatial predictions

5) Model evaluation

6) Model applicability

Page 9: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

Page 10: Predictive Habitat Distribution Models, Leire Ibaibarriaga

1. Conceptual phase

• Some theoretical model should be in mind before a statistical model is even considered

• This phase includes:

– Literature review

– Define an up-to-date conceptual model

– Set multiple hypotheses

– Assess available and missing data

– Identify an appropriate sampling strategy for new data

– Choose appropriate spatio-temporal resolution and geographic extent

– Identify the most appropriate statistical methods for the other phases

Page 11: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

Page 12: Predictive Habitat Distribution Models, Leire Ibaibarriaga


2. Model formulation

• The model depends on the type of response variable and its associated probability distribution

Distribution         Examples
Gaussian             Biomass
Poisson              Individual counts
Negative Binomial    Individual counts
Multinomial          Communities
Binomial             Presence/absence
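The table above can be read as a lookup from the type of response variable to a candidate distribution. A minimal Python sketch of that idea follows. The link functions are the ones commonly paired with each distribution in GLM practice and are not stated on the slide; the label "overdispersed counts" and the names `CANDIDATE_MODELS` and `suggest_model` are hypothetical.

```python
# Hypothetical lookup: response type -> (candidate distribution, commonly used link).
# The links reflect common GLM practice; they are an assumption, not slide content.
CANDIDATE_MODELS = {
    "biomass": ("Gaussian", "identity"),
    "individual counts": ("Poisson", "log"),
    "overdispersed counts": ("Negative Binomial", "log"),
    "communities": ("Multinomial", "generalised logit"),
    "presence/absence": ("Binomial", "logit"),
}


def suggest_model(response_type):
    """Return a (distribution, link) pair for a response type, ignoring case."""
    key = response_type.strip().lower()
    if key not in CANDIDATE_MODELS:
        raise KeyError(f"no suggestion for response type: {response_type!r}")
    return CANDIDATE_MODELS[key]


print(suggest_model("Presence/absence"))
```

Such a lookup is only a starting point: the final choice should still be checked against the data, for example for overdispersion in counts.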

Page 13: Predictive Habitat Distribution Models, Leire Ibaibarriaga


2. Model formulation

Guisan and Zimmermann (2000)

Page 14: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

[Figure: plot of y (0 to 50) against x (0 to 10)]

Oct 2011 © AZTI-Tecnalia

Page 15: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

[Figure: plot of y (0 to 50) against x (0 to 10)]

Page 16: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

[Figure: plot of y (-5 to 10) against x (0 to 1)]

Page 17: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

[Figure: plot of y (-5 to 10) against x (0 to 1)]

Page 18: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

The response variable y can follow distributions like:

NORMAL, BINOMIAL, POISSON, GAMMA, etc.

LINK FUNCTION

McCullagh and Nelder (1989); Dobson (2008)
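As an illustration of a GLM with a link function, the sketch below fits a binomial model with a logit link to presence/absence data using Newton-Raphson, which for this canonical link coincides with iteratively reweighted least squares. It uses only the Python standard library; the data, the variable names and the functions `fit_logistic` and `predict` are hypothetical, and in practice a statistical package would be used instead.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def predict(b0, b1, x_new):
    """Inverse logit link: map the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_new)))


# Hypothetical data: presence (1) or absence (0) along a temperature gradient.
temperature = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
presence = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(temperature, presence)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

The link function keeps the linear predictor unbounded while the predicted probability stays between 0 and 1.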

Page 19: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

The response variable y can follow distributions like:

NORMAL, BINOMIAL, POISSON, GAMMA, etc.

LINK FUNCTION

SMOOTHS

Hastie and Tibshirani (1990); Wood (2006)

Page 20: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

Linear model (LM)

Generalized linear model (GLM)

Generalized additive model (GAM)

Additive model (AM)
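For reference, these four regression families are usually written as follows, in standard textbook notation in the spirit of McCullagh and Nelder (1989) and Hastie and Tibshirani (1990). The symbols are assumptions introduced here: $g$ is the link function, $\beta_j$ are regression coefficients, $f_j$ are smooth functions and $x_1, \dots, x_p$ are the explanatory variables.

```latex
\begin{align*}
\text{LM:}  \quad & y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon,
                    \qquad \varepsilon \sim N(0, \sigma^2) \\
\text{GLM:} \quad & g\big(\mathrm{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{AM:}  \quad & y = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \varepsilon,
                    \qquad \varepsilon \sim N(0, \sigma^2) \\
\text{GAM:} \quad & g\big(\mathrm{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
\end{align*}
```

Read this way, the GLM generalises the LM through the link function and the response distribution, the AM generalises it through smooth terms, and the GAM combines both.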

Page 21: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

REGRESSION ANALYSIS

Other regression models:

• Mixed models: LMs, GLMs and GAMs that include random-effect terms. Useful for meta-analysis.

• Quantile regression: the quantiles are modelled instead of the mean. Useful for finding limiting factors.

• Segmented regression: the model changes depending on a partition of the explanatory variable. Useful for detecting regime changes.

• Spatial autocorrelation and autoregressive models
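To make the segmented regression idea concrete, here is a minimal sketch that tries every split of the explanatory variable, fits a separate straight line on each side and keeps the split with the lowest total squared error. It is a deliberately simple variant: the two segments are not forced to join at the breakpoint, and the function names, the `min_points` setting and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def best_breakpoint(x, y, min_points=3):
    """Return (total_sse, breakpoint, left_fit, right_fit) for the best split.

    The breakpoint is the first x value assigned to the right-hand segment.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])
        right = fit_line(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, xs[k], left, right)
    return best


# Hypothetical series: rises with slope 1, then falls with slope -2 from x = 5.
x = list(range(10))
y = [0, 1, 2, 3, 4, 4, 2, 0, -2, -4]
total_sse, breakpoint_x, left_fit, right_fit = best_breakpoint(x, y)
print(breakpoint_x, left_fit[1], right_fit[1])
```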

Page 22: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

CLASSIFICATION TECHNIQUES

• Classification is the placement of species and/or sample units into groups based on the environmental variables

Page 23: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

CLASSIFICATION TECHNIQUES

• Classification is the placement of species and/or sample units into groups based on the environmental variables

• Many techniques are included: classification decision trees, regression decision trees, rule-based classification, maximum-likelihood classification

• Mainly two groups:

– Supervised classification: a training data set is required (groups are known beforehand)

– Unsupervised classification: groups are unknown and need to be defined, as in cluster analysis
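As a small illustration of supervised classification, the sketch below fits a one-split decision tree (a "stump") on a single environmental variable, using a training set in which the groups are already known. The function name `fit_stump`, the depth values and the habitat labels are hypothetical, and the stump only considers the rule "label 1 above the threshold".

```python
def fit_stump(values, labels):
    """Find the threshold on one variable that misclassifies the fewest samples.

    The rule predicts label 1 when value > threshold, and label 0 otherwise.
    Returns (threshold, number_of_training_errors).
    """
    candidates = sorted(set(values))
    best_threshold, best_errors = None, len(values) + 1
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        errors = sum((v > threshold) != bool(lab) for v, lab in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical training data: depth (m) and a known habitat group (0 or 1).
depth = [10, 20, 30, 40, 50, 60]
habitat = [0, 0, 0, 1, 1, 1]
print(fit_stump(depth, habitat))
```

A full classification tree repeats this search recursively on each side of the split and over all available variables.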

Page 24: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

ENVIRONMENTAL ENVELOPES

• The environmental envelope of a species is defined as the set of environments within which it is believed that the species can persist (Walker and Cocks, 1991)

Page 25: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

ENVIRONMENTAL ENVELOPES

• The environmental envelope of a species is defined as the set of environments within which it is believed that the species can persist (Walker and Cocks, 1991)

• Examples of models:

– BIOCLIM: minimal rectilinear envelopes based on classification trees

– HABITAT: convex polytope envelopes based on classification trees

– DOMAIN: based on multivariate distance metrics
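The rectilinear envelope idea can be sketched in a few lines: take the minimum and maximum of each environmental variable over the sites where the species was observed, and call a new site suitable if it falls inside every range. This is only an illustration of the concept, not the BIOCLIM implementation; the function names, variable names and values are hypothetical.

```python
def fit_envelope(presence_records):
    """Per-variable (min, max) ranges over sites where the species was observed."""
    variables = presence_records[0].keys()
    return {
        var: (min(r[var] for r in presence_records),
              max(r[var] for r in presence_records))
        for var in variables
    }


def inside_envelope(envelope, site):
    """True if every variable of the site lies within the fitted range."""
    return all(lo <= site[var] <= hi for var, (lo, hi) in envelope.items())


# Hypothetical presence records: sea surface temperature and salinity.
presence = [
    {"sst": 12.0, "salinity": 34.5},
    {"sst": 15.5, "salinity": 35.2},
    {"sst": 14.0, "salinity": 35.0},
]
envelope = fit_envelope(presence)
print(inside_envelope(envelope, {"sst": 13.0, "salinity": 35.0}))
print(inside_envelope(envelope, {"sst": 18.0, "salinity": 35.0}))
```

Because the envelope uses presence records only, it needs no absence data, but a single outlying record can widen it considerably.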

Page 26: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

ORDINATION TECHNIQUES

• Ordination is the arrangement or ‘ordering’ of species and/or sample units along gradients

• Usually applied to community data matrices (row: species, column: samples, value: abundance)

Page 27: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

ORDINATION TECHNIQUES

• Indirect gradient analysis (no environmental data used)

– Distance-based approaches: Polar ordination, Principal Coordinates Analysis, Nonmetric Multidimensional Scaling

– Eigenanalysis-based approaches, linear model: Principal Components Analysis

– Eigenanalysis-based approaches, unimodal model: Correspondence Analysis, Detrended Correspondence Analysis

• Direct gradient analysis (environmental data used)

– Linear model: Redundancy Analysis

– Unimodal model: Canonical Correspondence Analysis, Detrended Canonical Correspondence Analysis

ter Braak and Prentice (1988)
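The distance-based approaches start from a matrix of pairwise dissimilarities between samples. The sketch below builds such a matrix from a small community data matrix (rows: species, columns: samples). The Bray-Curtis dissimilarity is used here as one common choice for abundance data; it is an assumption of this example and is not named in the slides, and the abundances are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical samples, 1 = no shared species."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / total


# Hypothetical community matrix (rows: species, columns: samples).
community = [
    [10, 8, 0],
    [5, 7, 0],
    [0, 0, 9],
]
# Transpose so that each tuple holds the abundances of one sample.
samples = list(zip(*community))
distances = [[bray_curtis(s, t) for t in samples] for s in samples]
for row in distances:
    print([round(d, 3) for d in row])
```

A matrix like `distances` is the input that Principal Coordinates Analysis or Nonmetric Multidimensional Scaling would then arrange along a few axes.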

Page 28: Predictive Habitat Distribution Models, Leire Ibaibarriaga

2. Model formulation

NEURAL NETWORKS

• Models inspired by the human brain (an interconnected group of neurons)

• They define a non-linear function that is decomposed as a weighted sum of functions, each of which can be decomposed further in the same way, and so on. The result is a complex non-parametric model (a black box?)

• Adjusted by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity

• Few examples available so far
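The "weighted sum of functions" structure can be made concrete with the forward pass of a network with one hidden layer. This is a minimal sketch with hand-picked weights; the `tanh` and sigmoid activations, the weights and the function names are illustrative assumptions, and no training is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: sigmoid(sum_k v_k * tanh(sum_j w_kj * x_j + b_k) + c)."""
    # Each hidden neuron applies a non-linear function to a weighted sum of inputs.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output is again a non-linear function of a weighted sum of hidden values.
    return sigmoid(sum(v * h for v, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical network: two environmental inputs, two hidden neurons, one output.
probability = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[1.0, 0.5], [-0.3, 0.8]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(round(probability, 3))
```

Calibration consists of adjusting the weights and biases so that the outputs match the observations, which is why the fitted model is hard to interpret term by term.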

Page 29: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

Page 30: Predictive Habitat Distribution Models, Leire Ibaibarriaga

3. Model calibration

• It includes model fitting (finding the values of the unknown parameters that give the best agreement between the data and the model outputs) and model selection (choosing which explanatory variables to include)

• Points to take into account:

– Use of predictors that are ecologically relevant: direct vs indirect (proxy) variables

– Correlation between explanatory variables

• Each method has its own diagnostic tools, according to its assumptions, e.g. the residual deviance in regression models
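Correlation between explanatory variables can be screened before model selection. The sketch below computes pairwise Pearson correlations and flags pairs above a threshold; the threshold of 0.7, the function names and the predictor values are hypothetical choices for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical predictors measured at five stations.
predictors = {
    "sst": [10, 12, 14, 16, 18],
    "depth": [50, 20, 80, 30, 60],
    "chl": [2.0, 1.6, 1.2, 0.8, 0.4],
}
print(correlated_pairs(predictors))
```

When a pair is flagged, keeping the variable with the more direct ecological interpretation is one way to apply the first point above.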

Page 31: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

Page 32: Predictive Habitat Distribution Models, Leire Ibaibarriaga

4. Spatial predictions

• Spatial predictions can be made on the data set used for calibration or on new data sets. Care must be taken if predictions are made on a new data set that contains new combinations of the explanatory variables or values outside the range covered by the calibration data set

• GIS tools are very often used, but many statistical models are still not implemented in a GIS environment
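Both warnings (values outside the calibration range and new combinations of explanatory variables) can be checked before predicting. The following is a rough sketch, assuming each explanatory variable is cut into equal-width bins over its calibration range; the bin count, the function names and the data are hypothetical, and the result depends on the chosen number of bins.

```python
def bin_index(value, lo, hi, n_bins):
    """Bin number of value within [lo, hi]; None if the value is out of range."""
    if not lo <= value <= hi:
        return None
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def novelty_check(calibration, new_site, n_bins=4):
    """Classify new_site as 'out of range', 'new combination' or 'covered'."""
    n_vars = len(new_site)
    ranges = [
        (min(row[j] for row in calibration), max(row[j] for row in calibration))
        for j in range(n_vars)
    ]

    def cell(row):
        return tuple(bin_index(row[j], *ranges[j], n_bins) for j in range(n_vars))

    seen = {cell(row) for row in calibration}
    new_cell = cell(new_site)
    if None in new_cell:
        return "out of range"
    return "covered" if new_cell in seen else "new combination"


# Hypothetical calibration data: (temperature, depth), negatively related.
calibration = [(10, 100), (12, 80), (14, 60), (16, 40), (18, 20)]
print(novelty_check(calibration, (11, 95)))
print(novelty_check(calibration, (17, 90)))
print(novelty_check(calibration, (25, 50)))
```

The second site has temperature and depth values that are each within range, yet warm and deep conditions never occurred together in the calibration data.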

Page 33: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

Page 34: Predictive Habitat Distribution Models, Leire Ibaibarriaga

5. Model evaluation

• The aim is to evaluate the predictive power of a model

• If only one data set is available (the one already used for calibration), use bootstrap, cross-validation or jackknife

• If other data sets are available (independent of the calibration data set), predicted and observed values are compared using:

– the same goodness-of-fit measure as used for model calibration

– any other measure of association

The data sets for calibration and evaluation are called the training and evaluation data sets, respectively. Sometimes the original single data set is split in two (split-sample approach)
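When only one data set is available, k-fold cross-validation repeatedly holds out part of the data, calibrates on the rest and compares predictions with the held-out observations. A minimal sketch follows, in which a straight-line fit stands in for whatever habitat model was calibrated. The interleaved fold assignment, the function names and the data are hypothetical; in practice the observations would usually be shuffled or blocked spatially first.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_rmse(x, y, k=5):
    """k-fold cross-validation of the straight-line model; returns the pooled RMSE."""
    n = len(x)
    squared_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out for evaluation.
        test_idx = set(range(fold, n, k))
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(train_x, train_y)
        squared_errors += [(y[i] - (a + b * x[i])) ** 2 for i in sorted(test_idx)]
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical observations scattered around y = 2 + 3x.
x = list(range(10))
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.2, 19.9, 23.1, 25.8, 29.2]
print(round(k_fold_rmse(x, y), 3))
```

The split-sample approach mentioned above corresponds to a single train/evaluation split instead of k rotating ones.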

Page 35: Predictive Habitat Distribution Models, Leire Ibaibarriaga


STEPS FOR MODELLING

Guisan and Zimmermann (2000)

APPLICABILITY

Page 36: Predictive Habitat Distribution Models, Leire Ibaibarriaga

6. Model applicability

• It refers to the domain over which a validated model can be properly used

• Potential uses (Decoursey, 1992):

– Screening

– Research

– Planning, monitoring and assessment

Page 37: Predictive Habitat Distribution Models, Leire Ibaibarriaga

WHAT ABOUT DATA?

• Data are even more important than the model itself.

• Usually from multiple sources: surveys (continuous, stations, vertical profiles), remote sensing, circulation models, …

• The scale of the response and the environmental variables might not be the same, so a common scale unit needs to be defined. Sometimes interpolation is needed, which might introduce additional uncertainty.

• Simple exploratory statistics and figures can be very useful before even starting to think about a model. They also help to spot errors in the data.
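As an example of bringing variables to a common scale, the sketch below linearly interpolates a temperature profile, measured at a few depths, onto the depths at which the biological samples were taken. The function name, the profile values and the sampling depths are hypothetical; the sketch assumes strictly increasing depths and refuses to extrapolate beyond the measured range.

```python
def interpolate(x_known, y_known, x_new):
    """Piecewise-linear interpolation at x_new; x_known must be strictly increasing."""
    if not x_known[0] <= x_new <= x_known[-1]:
        raise ValueError("x_new is outside the sampled range; refusing to extrapolate")
    points = list(zip(x_known, y_known))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x_new <= x1:
            return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)


# Hypothetical vertical profile: temperature (deg C) measured at four depths (m).
profile_depth = [0, 10, 20, 50]
profile_temp = [18.0, 17.0, 14.0, 11.0]

# Depths at which the response variable was sampled.
sample_depths = [5, 15, 35]
print([interpolate(profile_depth, profile_temp, d) for d in sample_depths])
```

The interpolated values carry extra uncertainty that the original measurements do not, which is worth keeping in mind when interpreting the fitted model.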
