Predictive habitat distribution models, Leire Ibaibarriaga

Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011, EURO-BASIN, www.euro-basin.eu

OUTLINE

• Why model?

• Habitat models

• Model properties

• Steps for modelling

• What about data?

WHY MODEL?

• “All models are wrong, but some are useful” (G. E. P. Box)

• Models are how we understand the world:

We see the world through models

We learn about the world using formal descriptions

• Model types:

– Static vs dynamic

– Explanatory vs predictive

– Deterministic vs stochastic

– Discrete vs continuous

HABITAT MODELS

• Habitat models focus on how environmental factors control the distribution of species and communities.

• Multiple applications:

– Biogeography, impact of global change, management, conservation, ecology, …

• New conceptual and operational advances due to the growth in computing power, e.g. GIS, remote sensing, new (computer-intensive) statistical modelling tools, etc.

MODEL PROPERTIES

Some desirable model properties:

• Parsimony (Occam’s razor): “All things being equal, the simplest solution tends to be the best one”

• Tractability: easy to analyse

• Conceptually insightful: reveal fundamental properties

• Generalizability: can be applied to other situations/species/…

• Empirical consistency: consistent with the available data

• Falsifiability: can be tested by observations

• Predictive precision

MODEL PROPERTIES

Levins (1966); Sharpe (1990); Guisan and Zimmermann (2000)

Predictive habitat distribution models

MODEL PROPERTIES

A more complex model is not necessarily the best…

[Figure axes: generality, complexity]

STEPS FOR MODELLING

1) Conceptual phase

2) Model formulation

3) Model calibration

4) Spatial predictions

5) Model evaluation

6) Model applicability

STEPS FOR MODELLING

Guisan and Zimmermann (2000)

1. Conceptual phase

• Some sort of theoretical model should be in mind, before a statistical model is even considered

• This phase includes:

– Literature review

– Define an up-to-date conceptual model

– Set multiple hypotheses

– Assess available and missing data

– Identify appropriate sampling strategy for new data

– Choose appropriate spatio-temporal resolution and geographic extent

– Identify the most appropriate statistical methods for the other phases

STEPS FOR MODELLING

2. Model formulation

• The model depends on the type of response variable and its associated probability distribution

Distribution         Examples
Gaussian             Biomass
Poisson              Individual counts
Negative Binomial    Individual counts
Multinomial          Communities
Binomial             Presence/absence

© AZTI-Tecnalia, Oct 2011

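Generalized linear models (GLM): the response variable y can follow distributions such as the normal, binomial, Poisson, gamma, etc.

The mean of y is related to a linear combination of the explanatory variables through a LINK FUNCTION.

McCullagh and Nelder (1989); Dobson (2008)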
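
A minimal sketch of how such a model might be fitted, assuming a Poisson response (individual counts), a log link function and a single explanatory variable. The function name `fit_poisson_glm`, the Newton-Raphson loop and the example values are illustrative assumptions, not material from the course; in practice a statistical package would normally do this step.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    # Start from the intercept-only model: log of the mean response, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: counts of individuals along a temperature gradient.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
counts = [2, 3, 6, 7, 12, 18]
b0, b1 = fit_poisson_glm(temperature, counts)
print("intercept:", b0, "slope:", b1)
```

The log link keeps the predicted mean count positive whatever the value of the linear predictor, which is why it is the usual choice for count data.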

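Generalized additive models (GAM): the response variable y can follow distributions such as the normal, binomial, Poisson, gamma, etc.

The mean of y is related through a LINK FUNCTION to a sum of SMOOTHS (smooth functions) of the explanatory variables.

Hastie and Tibshirani (1990); Wood (2006)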

Linear model (LM)

Generalized linear model (GLM)

Generalized additive model (GAM)

Additive model (AM)
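
The relationship between the LM, GLM, AM and GAM can be made explicit by writing them in a common notation. This is the standard textbook formulation rather than one taken from the slides; the symbols (g for the link function, \beta_j for regression coefficients, f_j for smooth functions, x_j for explanatory variables) are assumptions introduced here.

```latex
\begin{align*}
\text{LM:}  \quad & E[y] = \beta_0 + \sum_{j} \beta_j x_j \\
\text{GLM:} \quad & g\big(E[y]\big) = \beta_0 + \sum_{j} \beta_j x_j \\
\text{AM:}  \quad & E[y] = \beta_0 + \sum_{j} f_j(x_j) \\
\text{GAM:} \quad & g\big(E[y]\big) = \beta_0 + \sum_{j} f_j(x_j)
\end{align*}
```

Read this way, the LM and AM are the special cases of the GLM and GAM with an identity link and normally distributed errors, and the GLM is the special case of the GAM in which every smooth f_j is a straight line.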

Other regression models:

• Mixed models: LM, GLM and GAMs including random effect terms. Useful for meta-analysis.

• Quantile regression: the quantiles are modelled instead of the mean. Useful for finding limiting factors

• Segmented regression: the model changes depending on a partition of the explanatory variable. Useful for detecting regime changes

• Spatial autocorrelation and autoregressive models

• Classification is the placement of species and/or sample units into groups based on the environmental variables

• Many techniques are included: classification decision trees, regression decision trees, rule-based classification, maximum-likelihood classification

• Mainly two groups:

– Supervised classification: a training data set is required (groups are known beforehand)

– Unsupervised classification: groups are unknown and need to be defined, as in cluster analysis

• The environmental envelope of a species is defined as the set of environments within which it is believed that the species can persist (Walker and Cocks, 1991)

• Examples of models:

– BIOCLIM: minimal rectilinear envelopes based on classification trees

– HABITAT: convex polytope envelopes based on classification trees

– DOMAIN: based on multivariate distance metrics
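
The idea of a rectilinear envelope can be sketched in a few lines: for each environmental variable, take the range observed at the sites where the species is present, and treat a new site as suitable only if it falls inside every range. This is a simplified illustration of the concept, not the BIOCLIM implementation; the function names, variable names and values are hypothetical.

```python
def fit_envelope(presence_records):
    """Return {variable: (min, max)} over the sites where the species occurs."""
    variables = presence_records[0].keys()
    return {
        v: (min(r[v] for r in presence_records), max(r[v] for r in presence_records))
        for v in variables
    }

def inside_envelope(envelope, site):
    """True if the site lies within the observed range of every variable."""
    return all(low <= site[v] <= high for v, (low, high) in envelope.items())

# Hypothetical presence records (environment measured where the species was found).
presences = [
    {"temperature": 12.0, "salinity": 35.2},
    {"temperature": 15.5, "salinity": 34.8},
    {"temperature": 14.1, "salinity": 35.6},
]
envelope = fit_envelope(presences)
print(inside_envelope(envelope, {"temperature": 13.0, "salinity": 35.0}))  # True
print(inside_envelope(envelope, {"temperature": 18.0, "salinity": 35.0}))  # False
```

A box like this treats the variables independently, so it can include combinations of conditions that were never actually observed together; a convex polytope is a tighter shape that reduces this problem.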

• Ordination is the arrangement or ‘ordering’ of species and/or sample units along gradients

• Usually applied to community data matrices (row: species, column: samples, value: abundance)

• Indirect gradient analysis (no environmental data used)

– Distance-based approaches: Polar Ordination, Principal Coordinates Analysis, Nonmetric Multidimensional Scaling

– Eigenanalysis-based approaches, linear model: Principal Components Analysis

– Eigenanalysis-based approaches, unimodal model: Correspondence Analysis, Detrended Correspondence Analysis

• Direct gradient analysis (environmental data used)

– Linear model: Redundancy Analysis

– Unimodal model: Canonical Correspondence Analysis, Detrended Canonical Correspondence Analysis

ter Braak and Prentice (1988)

• Neural networks: models inspired by the human brain (an interconnected group of neurons)

• They define a non-linear function that is decomposed as a weighted sum of functions, each of which can be decomposed further in the same way, and so on. The result is a complex non-parametric model (a black box?)

• They are adjusted by varying the parameters, the connection weights, or specifics of the architecture such as the number of neurons or their connectivity

• Few examples available yet

STEPS FOR MODELLING

3. Model calibration

• It includes model fitting (finding the values of the unknown parameters that give the best agreement between the data and the model outputs) and model selection (deciding which explanatory variables to include)

• Points to take into account:

– Use of predictors that are ecologically relevant: direct vs indirect (proxy) variables

– Correlation between explanatory variables

• Each method has its own diagnostic tools according to its assumptions, e.g. the residual deviance in regression models
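
Correlation between explanatory variables can be screened before model selection. The sketch below computes pairwise Pearson correlations and flags strongly correlated pairs; the 0.7 threshold, the function names and the predictor values are assumptions chosen for illustration only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(predictors, threshold=0.7):
    """List (name, name, r) for predictor pairs with |r| >= threshold."""
    names = sorted(predictors)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(predictors[first], predictors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged

# Hypothetical predictors measured at five stations.
predictors = {
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0],
    "stratification": [20.0, 24.0, 28.0, 32.0, 36.0],
    "salinity": [35.1, 34.8, 35.3, 34.9, 35.2],
}
for first, second, r in correlated_pairs(predictors):
    print(first, second, round(r, 2))
```

When a pair is flagged, one common option is to keep only the variable with the more direct ecological interpretation, in line with the preference for direct over proxy predictors.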

STEPS FOR MODELLING

4. Spatial predictions

• Spatial predictions can be made on the data set used for calibration or on new data sets. Care must be taken if predictions are made on a new data set with new combinations of the explanatory variables, or for values outside the range covered by the calibration data set

• GIS tools are very often used, but many statistical models are still not implemented in a GIS environment

STEPS FOR MODELLING

5. Model evaluation

• The aim is to evaluate the predictive power of a model

• If only one data set is available (the one used for calibration), resampling methods are used: bootstrap, cross-validation, jackknife

• If other data sets are available (independent of the calibration data set), predicted and observed values are compared using:

– the same goodness-of-fit measure as used for model calibration

– any other measure of association

The data sets for calibration and evaluation are called the training and evaluation data sets, respectively. Sometimes the original single data set is split in two (split-sample approach)
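
The split-sample approach and k-fold cross-validation both come down to partitioning record indices into training and evaluation sets. A minimal sketch follows; the function names, the 30% evaluation fraction, the number of folds and the fixed random seed are assumptions made for illustration.

```python
import random

def split_sample(n_records, evaluation_fraction=0.3, seed=0):
    """Randomly split record indices into (training, evaluation) lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_eval = round(n_records * evaluation_fraction)
    return sorted(indices[n_eval:]), sorted(indices[:n_eval])

def k_fold(n_records, k=5, seed=0):
    """Yield (training, evaluation) index lists; each record is evaluated once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, evaluation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield sorted(training), sorted(evaluation)

training, evaluation = split_sample(10)
print(len(training), len(evaluation))  # 7 3

for training, evaluation in k_fold(10, k=5):
    # Fit the model on `training`, then compare predictions with observations on `evaluation`.
    print(evaluation)
```

With spatially autocorrelated survey data, a purely random split can leave neighbouring stations on both sides of the partition, so the evaluation set is less independent than it looks.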

STEPS FOR MODELLING

6. Model applicability

• It refers to the domain over which a validated model can be properly used

• Potential uses (Decoursey, 1992):

– Screening

– Research

– Planning, monitoring and assessment

WHAT ABOUT DATA?

• Data is even more important than the model itself.

• Usually from multiple sources: surveys (continuous, stations, vertical profiles), remote sensing, circulation models, …

• The scale of the response and the environmental variables might not be the same. A common scale unit needs to be defined. Sometimes interpolation might be needed, which can introduce additional uncertainties

• Simple exploratory statistics and figures can be very useful before even starting to think about any model. They also help to spot errors in the data.
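
As an illustration of bringing variables to a common scale, the sketch below linearly interpolates a vertical profile onto the depth at which the response was sampled. Linear interpolation is only one of several possible choices, and the function name and profile values are hypothetical.

```python
def interpolate_linear(x_known, y_known, x_new):
    """Piecewise-linear interpolation at x_new.

    x_known must be strictly increasing. Returns None outside the observed
    range, so that extrapolation is never done silently.
    """
    points = list(zip(x_known, y_known))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x_new <= x1:
            return y0 + (x_new - x0) / (x1 - x0) * (y1 - y0)
    return None

# Hypothetical temperature profile: depth in metres, temperature in degrees Celsius.
depth = [0.0, 10.0, 20.0]
temperature = [18.0, 16.0, 12.0]
print(interpolate_linear(depth, temperature, 5.0))   # 17.0
print(interpolate_linear(depth, temperature, 25.0))  # None
```

Returning `None` outside the sampled depths makes the gaps visible, so the extra uncertainty from any extrapolation has to be accepted explicitly.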
