International Council for the Exploration of the Sea

Not to be cited without prior reference to the authors
Theme session O: Spatio-temporal characteristics of fish populations in relation to environmental forcing functions as a component of ecosystem-based assessment: effects on catchability
ICES CM 2006/O:06

Comparative study of habitat modelling strategies to investigate marine fish life cycle: A case study on whiting in the Eastern English Channel

S. Vaz, S. Pavoine, P. Koubbi, C. Loots, F. Coppin

ABSTRACT

Fish habitat is an area where the environmental conditions are suitable for a species to survive and live in a spontaneous state, i.e. environmental factors define the abundance of a particular species. Habitat modelling was used to relate whiting (Merlangius merlangus) spatial distribution to environmental factors. This study was based on data obtained from IFREMER’s Channel Ground Fish Survey, including both species abundance and environmental data. Adult and juvenile stages were treated separately to study ontogenetic shifts in the spatial distribution of whiting. Three methodologies for modelling habitat were used, together with measures of fit and model validation techniques. In brief, habitat modelling based on GLM or GAM (delineating realised habitat) and multi-linear quantile regressions (predicting potential habitat) was used to relate species abundance to depth, temperature, salinity, seabed stress and sediment type. Stepwise selection resulted in habitat models that described species affinity with a subset of significant environmental variables and that were used to map whiting habitats using GIS. Model outputs were compared amongst themselves as well as with interpolated outputs of observed patterns of distribution obtained by geostatistics. The best performing method will be identified and the resulting models discussed for each life stage studied. This work will help numerical ecologists choose the appropriate methodology to model species distribution. Spatially explicit habitat modelling should help in elaborating guidelines for the conservation and protection of natural habitats of marine living resources in the face of climate change and anthropogenic disturbances.

Key-words: Whiting, Eastern English Channel, Fish Habitat, Habitat modelling, GIS

Contact author: S. Vaz: Ifremer, Laboratoire Ressources Halieutiques, 150 quai Gambetta, BP699, 62321, Boulogne/mer, France [tel: (+33) 3 21 99 56 00, fax: (+33) 3 21 99 56 01, e-mail: [email protected]]

Introduction

The relationship between a species and its environment depends on a complex set of responses from this species to the numerous biotic and abiotic factors affecting its survival, growth and reproductive ability. This relationship is described through the concepts of habitat and ecological niche, which define the locations where environmental conditions are suitable for an organism to occur and integrate abiotic and biotic factors to account for the effects of inter- and intra-specific interactions on the organism's presence and abundance respectively.

A large number of modelling techniques exist to predict species distribution. Guisan et al. (2006) detailed many techniques that could be used for this purpose, including multiple regression techniques. These methods generally consist of generating a model that summarises the relationship between a species' presence or abundance and the available, supposedly explanatory, environmental variables. As such, they can also be referred to as habitat modelling techniques. Once the affinity of a species for different types of habitat can be predicted, and provided that the spatial distribution of these habitats is known, one may predict the distribution of this species. Generating maps of predicted species distributions is often the main driver behind the construction of species distribution models.

This study, based on scientific bottom trawl survey data on whiting (Merlangius merlangus) adult and juvenile stages, aimed at applying three methodologies for modelling habitat and comparing their outputs. Generalised linear modelling (GLM) and generalised additive modelling (GAM), both delineating realised habitat, and multi-linear quantile regressions (RQ), predicting potential habitat, were used to relate species abundance to depth, temperature, salinity, seabed stress and sediment type. The objectives were to compare the model outputs and to identify the best performing method.

Material and methods

Data collection

Since 1988, the Channel Ground Fish Survey has taken place every year in October on board the research vessel Gwen Drez and has covered the Eastern English Channel and Dover Strait. The systematic sampling scheme aims at achieving 90 to 120 stations, depending on weather conditions. A bottom trawl with a high vertical opening (about 3 m) and a 10 mm mesh side was used, and hauls were preferably towed facing the current. Position and water depth were systematically recorded during each haul (Fig. 1).

After each haul, all captured species were sorted, identified and counted. The abundance indices at each station were standardised into density per km². Whiting length at maturity data relevant to the area were available in the literature (www.fishbase.org, an information system with key data on the biology of fishes; Froese and Pauly, Eds, 2006). The 26 cm (length at maturity) threshold was used to calculate the proportions of mature (adult) and immature (juvenile) whiting based on their recorded length distributions.

Since 1997, a Micrel hydrological probe attached to the headrope of the trawl has been recording temperature and salinity every 15 seconds. Temperature and salinity conditions were homogeneous throughout the water column due to shallowness, strong currents, and wind and tidal mixing. The sub-surface values recorded during each trawl haul were averaged and constituted in situ observations of the hydrological conditions associated with the catch (Figs. 2b and 2c).

Bed shear stress (in newtons per m²) was estimated using a 2D hydrodynamic model of the north-west European shelf developed at the Proudman Oceanographic Laboratory (Aldridge and Davies, 1993). Bed shear stress is a function of the maximum predicted tidal current and reflects the resulting pressure on the seabed (Fig. 2d).

The sediment type was obtained from the Larsonneur et al. (1979) map of the English Channel, simplified into six main categories of deposit (rock, pebble, gravel, coarse sand, fine sand and mud) according to granulometric criteria. These criteria emphasised the importance of smaller particles on the one hand, and of coarse particles on the other (Fig. 2e).

Geostatistics

Geostatistics embodies a suite of methods for analysing spatial data. It is essentially a methodology for estimating the values of a property of interest at unsampled locations from more or less sparse sample data points. Geostatistical estimation is known by the general term kriging, which also produces an estimate of the interpolation error. It differs from other interpolators in that it uses a model of the spatial autocorrelation pattern of the variable of interest: the variogram. Geostatistics and kriging were used extensively to produce continuous maps of species distribution and environmental variables.
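To illustrate the variogram on which kriging relies, the following minimal Python sketch computes an empirical semivariogram: half the mean squared difference between pairs of observations, grouped by separation distance. The function name, the equal-width distance bins and the choice of Python are assumptions made for illustration only; the software and variogram estimator actually used for the geostatistical analyses are not specified here.

```python
import math


def empirical_variogram(points, values, bin_width, n_bins):
    """Return (lag_centres, semivariances, pair_counts) per distance bin.

    points: list of (x, y) station coordinates (same units as bin_width)
    values: list of observed values at those stations
    Hypothetical helper for illustration; not the authors' implementation.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])   # separation distance
            k = int(h // bin_width)                # index of the distance bin
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    centres = [(k + 0.5) * bin_width for k in range(n_bins)]
    # Semivariance is half the mean squared difference; None marks empty bins.
    gamma = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return centres, gamma, counts
```

A variogram model (for instance spherical or exponential) would then be fitted to these binned semivariances before kriging; that fitting step is not shown.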

Predictive models of species distribution

The basis of virtually all species distribution modelling approaches in current use is the estimation of mean or median (central tendency) species responses to environmental factors. Generalised linear modelling (GLM) and generalised additive modelling (GAM) fall within this family of models and tend to describe the “realised habitat”, where the species was actually observed. Both GLM and GAM required a two-step modelling procedure whereby the presence-absence data are modelled separately from the mean response conditional on presence. This deals with zero-inflated count data, which are common in species distribution models of abundance data (Barry & Welsh, 2002).

GLM is a regression model that is less rigid than classical linear regression and may be applied to data that are not necessarily normally distributed. It relates a linear combination of predictors to the mean of the response variable through a link function. This function transforms the data towards linearity and keeps the model predictions within a range of values coherent with the original data. Several alternative distribution families may be chosen, and for each distribution type a set of corresponding link functions relates the mean of the response variable to the linear predictor.

GAM is a flexible and powerful tool for modelling non-linear relationships. It derives from GLM and works similarly, except that it additively relates a combination of non-parametric functions of the predictors to the mean of the response variable through a link function. The functions used to build the predictor combination are smoothing spline and loess functions. A spline function fits polynomial regressions in small intervals of the predictor range. The spline function over the whole range is obtained by joining these polynomials together at the nodes between the smaller intervals. The degree of the spline function corresponds to the degree of the polynomial regressions used. Loess functions are locally weighted regressions that replace the value of a given observation by the result of a linear regression on the neighbouring points, weighted by their distance to the observation being predicted. The non-parametric smoothing depends on the chosen maximum distance between the predicted value and the neighbouring observations used for the regression. This distance sets the size of the smoothing window: the larger the window, the stronger the smoothing.
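The locally weighted regression described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the tricube weighting is an assumption (it is a common choice in loess, but the weighting actually used by the GAM software is not stated here).

```python
def loess_point(x0, xs, ys, window):
    """Locally weighted linear fit evaluated at x0.

    window: maximum distance from x0 at which an observation still counts.
    Hypothetical helper; tricube weights are an assumed choice.
    """
    # Tricube weights: 1 at x0, falling to 0 at the edge of the window.
    w = [max(0.0, 1 - (abs(x - x0) / window) ** 3) ** 3 for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no observations inside the window")
    # Weighted means of predictor and response.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least-squares slope of the local straight line.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)
```

Calling `loess_point` over a grid of `x0` values traces the smoothed curve; widening `window` brings more neighbours into each local fit and gives a smoother result.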

For both methods, models were selected using the well-established stepwise selection based on the Akaike Information Criterion (AIC), as already implemented in R. For presence/absence data, binomial modelling with a logit link function was chosen to obtain a prediction of the probability of presence of whiting. For non-null abundance data, the data were log-transformed to achieve normality, and gaussian modelling with an identity link function was used to predict positive density on a log scale. The probability of presence was then used as a weighting factor on the positive density prediction to obtain the final predicted value.
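The combination of the two model parts can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the text: that the log transformation is the natural logarithm, and that the positive density is back-transformed with a plain exponential (with no correction for retransformation bias) before being weighted by the probability of presence.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def combined_density(eta_presence, log_density):
    """Probability of presence times back-transformed positive density.

    eta_presence: linear predictor of the binomial model (logit scale)
    log_density:  prediction of the gaussian model (natural-log scale, assumed)
    """
    return inv_logit(eta_presence) * math.exp(log_density)
```

For example, a linear predictor of 0 on the logit scale corresponds to a probability of presence of 0.5, so a predicted positive density is halved in the combined prediction.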

Quantile Regression

The real response of a species to a given limiting factor can only be quantified if all other factors occur at non-limiting levels. Because this situation is unlikely with observations from the natural world, the meaningful determination of species responses to environmental variables required the use of non-standard statistical methods. In quantile regression (RQ), any part of the data distribution may be modelled rather than the mean, and modelling the upper bound of the response data (between the 0.75 and 0.95 quantiles) as a function of environmental factors results in the potential habitat being modelled rather than the realised habitat.
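What distinguishes quantile regression from regression on the mean is its asymmetric loss function (the check or "pinball" loss). The minimal Python sketch below shows this loss and, for the simplest possible model (a constant), that minimising it recovers an upper quantile of the data. The function names are hypothetical and this is not the algorithm used by the R package; it only illustrates the principle.

```python
def pinball_loss(y, pred, tau):
    """Mean check-function loss of predictions `pred` for quantile `tau`."""
    total = 0.0
    for yi, pi in zip(y, pred):
        r = yi - pi
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(y)


def best_constant(y, tau):
    """Constant (chosen among observed values) minimising the pinball loss."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), tau))
```

With a high `tau`, under-predictions are penalised far more than over-predictions, which pushes the fitted surface towards the upper bound of the observations.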

Model selection with RQ modelling is complicated by the large number of candidate models that can be estimated over a range of quantiles. Model selection began by fitting a full model to all available continuous variables (third-order polynomials and first-order interactions between continuous environmental parameters). Sediment type was introduced as a categorical factor coded by 4 dummy variables, both as a main effect and in first-order interactions with the continuous environmental variables. Then, starting from this initial full model, terms were removed by backward elimination extended to RQ modelling, with the aim of arriving at a model where all terms remained significant (P < 0.05) for at least one of the visited quantiles. The confidence intervals (CI) of the selected model were then examined over the specified range of quantiles, and the quantile with the smallest CI was chosen as the final model of potential species distribution. Model building and inference were achieved using the routines available in the R package “quantreg”.

Model validation

Models were validated using datasets internal and external to their development, for which observed and predicted values of species abundance could be compared. From these datasets of observed and predicted densities, 1000 bootstrap datasets were generated by resampling with replacement within the range of observed and predicted densities. Three types of statistical test comparing observed and predicted abundances were applied to these bootstrapped datasets, and the results of these tests were averaged: the Spearman rank correlation, the t-test for comparison of means (for GLM and GAM only) and the correct classification test (for RQ only; Eastwood et al., 2003).
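The bootstrap step can be sketched for the Spearman rank correlation as below. This is a minimal illustration under stated assumptions: observed and predicted values are resampled as pairs, resamples in which either series is constant are skipped because the correlation is undefined, and the function names and fixed random seed are hypothetical. The t-test and the correct classification test are not shown.

```python
import random
from statistics import mean


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_spearman(observed, predicted, n_boot=1000, seed=0):
    """Mean Spearman correlation over paired bootstrap resamples."""
    rng = random.Random(seed)
    n = len(observed)
    rhos = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        obs = [observed[i] for i in idx]
        pred = [predicted[i] for i in idx]
        if len(set(obs)) > 1 and len(set(pred)) > 1:  # rho undefined otherwise
            rhos.append(spearman(obs, pred))
    return mean(rhos)
```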

Model comparison

To compare the fit of the models, the prediction error (absolute difference between observed and predicted species response) was used. Observed and predicted data may not be distributed in the same way (this is particularly the case for RQ modelling, where predicted responses are much higher than observed rates), which affects the result of the comparison. Observed and predicted values from the three models were therefore first centred, standardised and rescaled to fall between 0 and 1 so that all were expressed on the same scale. Average difference rates were computed for each model and error maps were produced.
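The rescaling and error computation described above can be sketched in Python as follows. The function names are hypothetical, and it is an assumption that each series (observed, and predicted by each model) is rescaled separately using its own mean, standard deviation and range; the text does not state this detail.

```python
from statistics import mean, pstdev


def rescale01(values):
    """Centre, standardise, then map linearly onto [0, 1]."""
    m, s = mean(values), pstdev(values)
    z = [(v - m) / s for v in values]
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo) for v in z]


def mean_abs_error(observed, predicted):
    """Mean absolute difference between the separately rescaled series."""
    o, p = rescale01(observed), rescale01(predicted)
    return mean(abs(a - b) for a, b in zip(o, p))
```

Under this assumption, predictions that are systematically higher than the observations but follow the same pattern (as expected for an upper quantile) are not penalised for the difference in level, only for differences in relative pattern.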

Results

Spatial distribution

Geostatistical analyses and kriging were used to explore the spatial structure of adult and juvenile whiting in the Eastern English Channel over 18 years (1988-2005). For each year and life stage, a variogram was modelled and its parameters were used to produce kriged estimates and estimation errors. Only the average distribution maps and corresponding cumulated kriging error maps are presented here (Fig. 3). These show the markedly eastern and coastal distribution of whiting for both life stages. The kriging error is minimal in areas where this species is highly abundant, which augurs well for the representativeness of the maps produced.

Habitat models

Models were developed using data from 1997 to 2005, for which the full set of environmental variables was available (n = 855). Ten models predicting the potential distribution of whiting were developed: five for each life stage (juveniles and adults), comprising two models (binomial and gaussian) each for GLM and GAM and one model for quantile regression. Explanatory variables found significant during model selection are presented in Table 1.

Table 1: Significant explanatory variables for selected models for juveniles (j) and adults (a). The ten models are: GLM binomial (j, a), GLM gaussian (j, a), GAM binomial (j, a), GAM gaussian (j, a) and RQ (j, a).

Variable      Models retaining the variable (X), out of ten
Temperature   8
Salinity      6
Depth         9
Bedstress     10
Sediment      10

Bedstress and sediment type were found significant in all models, whatever the life stage considered. These variables were also systematically involved in the significant interactions retained during model selection for GLM and RQ. Seabed stress and sediment are therefore significant predictors of the species' presence and abundance in the Eastern English Channel. Depth, temperature and salinity were selected nine, eight and six times out of ten respectively and have, on the whole, a significant effect on whiting distribution. RQ models appear to be more parsimonious than GLM or GAM models, especially since the latter methods required two models to obtain the final prediction.

The deviance of the selected model, when available, was compared to the maximum deviance for a model with no explanatory variables. The percentage of explained deviance indicates how much of the data variability is explained by the selected model (Table 2).
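The percentage of explained deviance follows directly from the two deviances compared above. A minimal Python sketch is given below; the function name is hypothetical and the numbers in the usage note are invented for illustration, not taken from the fitted models.

```python
def explained_deviance(null_deviance, residual_deviance):
    """Percentage of the null deviance accounted for by the selected model.

    null_deviance:     deviance of the model with no explanatory variables
    residual_deviance: deviance of the selected model
    """
    return 100.0 * (null_deviance - residual_deviance) / null_deviance
```

For instance, a hypothetical null deviance of 200 and a residual deviance of 110 would give `explained_deviance(200.0, 110.0)`, i.e. 45% of deviance explained.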

Table 2: Percentage of explained deviance in the selected models

             GLM                   GAM
             Binomial   Gaussian   Binomial   Gaussian
             model      model      model      model
Juveniles    45%        26%        41%        26%
Adults       29%        29%        29%        27%

The GLM models explained as much or more data variation than the GAM models. In both cases, the modelled habitat seemed to have a larger impact on the species' probability of presence than on its abundance level. The abundance level of the species may be conditional on other predictors, in particular biotic factors, that are not taken into account in this study.

Predicting spatial distribution of whiting

The selected models were applied to available digital maps of environmental predictors and resulted in predicted realised (GLM and GAM) or potential (RQ) abundance maps (Figs 4-6). The predicted distributions were coherent with the interpolated maps and fairly similar to each other, highlighting the robustness of all three regression methods.

Model validation

The data used for model development were re-used for internal validation. Data from 1988 to 1996 (n = 649) were kept for external model validation. For these data, missing salinity and temperature observations were obtained by sampling the average salinity and temperature maps (Figs. 2b and 2c) at the exact locations of the observations.

Whatever the dataset used for validation, all models passed the Spearman correlation test (ρs), revealing a high and significant positive correlation between observed and predicted abundance values. GLM and GAM models also passed the t-test for comparison of means, supporting the hypothesis that observed and predicted values had similar means. Finally, RQ models passed the correct classification test (positive difference value), showing that the models delineate the upper bound envelope of the data distribution well and correctly describe the limiting effect of the modelled habitat (Table 3).

Table 3: Model validation results. Spearman correlation test (ρs); p-value < 0.001 (***). The t-test applies to GLM and GAM only, and the degree of correct classification to RQ only.

a) Internal model validation (n = 855, 1000 bootstrapped datasets)

Life stage   Model   ρs     p-value (ρs)   T-test   Degree of correct classification
Juveniles    GLM     0.61   ***            0.67     -
Juveniles    GAM     0.58   ***            0.65     -
Juveniles    RQ      0.57   ***            -        1.32
Adults       GLM     0.58   ***            0.64     -
Adults       GAM     0.58   ***            0.62     -
Adults       RQ      0.55   ***            -        1.81

b) External model validation (n = 649, 1000 bootstrapped datasets)

Life stage   Model   ρs     p-value (ρs)   T-test   Degree of correct classification
Juveniles    GLM     0.54   ***            0.59     -
Juveniles    GAM     0.57   ***            0.32     -
Juveniles    RQ      0.52   ***            -        0.54
Adults       GLM     0.52   ***            0.61     -
Adults       GAM     0.56   ***            0.59     -
Adults       RQ      0.50   ***            -        1.89

Model comparison and error maps

The prediction errors generated by the GLM and GAM models were close, although slightly smaller for GLM (Fig. 6). Quantile regression models generated comparatively higher errors than the two other types of model. This pattern is confirmed in the error distribution maps (Fig. 7), where areas with relatively high errors corresponded to areas of high abundance of the species. Overall, the error rate was relatively low, never exceeding 30% whatever the method considered.

Discussion

Overall, the models produced performed quite well and yielded similar results. The models developed here passed all the proposed validation tests, based on both internal and external validation. This result highlights the robustness of the proposed model selection and model building procedures. In that respect, all three methods may be used interchangeably.

Like most species modelling techniques, GLM, GAM and RQ do not account for spatial autocorrelation, whether between the environmental predictors or due to aggregation behaviour, competitive exclusion and density dependence of the species itself. Species distribution maps constructed using geostatistical analyses showed spatial patterns similar to those constructed from the specified habitat models. This suggests that using methods that do not explicitly account for spatial autocorrelation may not necessarily result in inaccurate maps of predicted species distributions.

GLM and GAM yielded more accurate predictions than RQ, which modelled the upper bound of the data distribution and therefore over-estimated the species abundance. GLM predictions seemed slightly more accurate than GAM predictions, although the two were very close in this case study. The use of GAM for species distribution modelling is, however, complicated by its non-parametric functions, which require additional steps and manipulation to obtain digital maps, whereas GLM regression coefficients can be applied directly to the digital maps of the predictors. Moreover, from the point of view of ecological understanding, GLM has the advantage of making interactions explicit, while these are implicitly taken into account by the smoothing functions in GAM. Therefore, for the sole purpose of habitat modelling and if the number of relevant predictors is sufficient, GLM may be more adequate and may yield excellent results.

RQ modelling, however, has the unique advantage of being able to model the upper bounds of species-environment relationships (Cade et al., 1999). This is more ecologically relevant because it is better able to detect the effects of limiting factors on species' responses. The model selection procedure based on null-hypothesis testing and backward elimination extended to RQ proved successful in arriving at models that estimated the limiting effect of the environment on whiting. Such species distribution models tend to describe potential rather than actual patterns of species distributions (Eastwood et al., 2003; Carpentier et al., 2005). In this sense, the “potential habitat”, where the environmental conditions are suitable, was described, as opposed to the “realised habitat”, which is the part of the potential habitat where the species actually occurs. Maps showing potential species distributions are less likely to underestimate species' responses to the environment, and therefore benefit precautionary management.

Habitat modelling has many implications in the field of land management and conservation (Guisan and Zimmermann, 2000). Predictive geographical modelling tools depend on the analysis and quantification of species-environment relationships. Besides their usefulness for ecological research, predictive geographical models may also be useful to assess the impact of accelerated land use and other environmental changes on the distribution of organisms, to improve faunistic atlases (e.g. Carpentier et al., 2005) or to set up conservation priorities.

References

Aldridge, J.N., Davies, A.M., 1993. A high-resolution three-dimensional hydrodynamic tidal model of the Eastern Irish Sea. Journal of Physical Oceanography, 23 (2): 207-224.

Barry, S.C., Welsh, A.H., 2002. Generalized additive modelling and zero inflated count data. Ecological Modelling, 157: 179-188.

Cade, B.S., Terrell, J.W., Schroeder, R.L., 1999. Estimating effects of limiting factors with regression quantiles. Ecology, 80: 311-323.

Carpentier, A., Vaz, S., Martin, C.S., Coppin, F., Dauvin, J.-C., Desroy, N., Dewarumez, J.-M., Eastwood, P.D., Ernande, B., Harrop, S., Kemp, Z., Koubbi, P., Leader-Williams, N., Lefèbvre, A., Lemoine, M., Meaden, G.J., Ryan, N., Walkey, M., 2005. Eastern Channel Habitat Atlas for Marine Resource Management (CHARM), Atlas des Habitats des Ressources Marines de la Manche Orientale, INTERREG IIIA.

Eastwood, P.D., Meaden, G.J., Carpentier, A., Rogers, S.I., 2003. Estimating limits to the spatial extent and suitability of sole (Solea solea) nursery grounds in the Dover Strait. Journal of Sea Research, 50: 151-165.

Froese, R., Pauly, D., Editors, 2006. FishBase. World Wide Web electronic publication. www.fishbase.org, version (06/2006).

Guisan, A., Lehmann, A., Ferrier, S., Austin, M., Overton, J., Aspinall, R., Hastie, T., 2006. Making better biogeographical predictions of species' distributions. Journal of Applied Ecology, 43: 386-392.

Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecological Modelling, 135: 147-186.

Larsonneur, C., Vaslet, D., Auffret, J.-P., 1979. Les Sédiments Superficiels de la Manche, Carte Géologique de la Marge Continentale Française, Bureau des Recherches Géologiques et Minières, Ministère de l'Industrie, Service Géologique National, Orléans, France.

Figure 1. Trawl haul positions of the CGFS survey (1988-2005)

Figure 2. Environmental maps used for spatialised prediction of whiting abundance from the selected habitat models: (a) depth; (b) average surface salinity (Oct. 1997-2005); (c) average surface temperature (Oct. 1997-2005); (d) seabed shear stress; (e) seabed sediment types.

Figure 3. Whiting (Merlangius merlangus) average distribution from 1988 to 2005 and corresponding cumulated kriging error maps: a) juveniles (<= 26 cm), b) adults (> 26 cm).

Figure 4. Species distribution predicted by GLM for juveniles and adults: a) probability of presence predicted by the selected binomial GLM fitted to presence/absence data; b) abundance predicted by the selected gaussian GLM fitted to log-transformed non-null density data; c) potential abundance resulting from the combination of the above models.

Figure 4. Species distribution predicted by GAM for juveniles and adults: a) probability of presence predicted by the selected binomial GAM fitted to presence/absence data; b) abundance predicted by the selected gaussian GAM fitted to log-transformed non-null density data; c) potential abundance resulting from model combination.

Figure 5. Potential species distributions predicted by quantile regression for juveniles and adults (94th quantile for juveniles and 90th quantile for adults).

Figure 6. Boxplots of the error distribution between observed and predicted values (centred, standardised and rescaled between 0 and 1) for the selected GLM (1), GAM (2) and RQ (3) models for juveniles (a) and adults (b).

Figure 7. Spatial distribution of error values for GLM (a), GAM (b) and RQ (c) models, for juveniles and adults.
