ecological-niche factor analysis: how to compute …€¦ · ecological niche, this factor analysis...

10
2027 Ecology, 83(7), 2002, pp. 2027–2036 q 2002 by the Ecological Society of America ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE HABITAT-SUITABILITY MAPS WITHOUT ABSENCE DATA? A. H. HIRZEL, 1 J. HAUSSER, 1 D. CHESSEL, 2 AND N. PERRIN 1,3 1 Laboratory for Conservation Biology, Institute of Ecology, University of Lausanne, CH-1015 Lausanne, Switzerland 2 UMR CNRS 5023, Laboratoire de Biome ´trie et Biologie Evolutive, Universite ´ Lyon I, 69622 Villeurbanne Cedex, France Abstract. We propose a multivariate approach to the study of geographic species dis- tribution which does not require absence data. Building on Hutchinson’s concept of the ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution of the localities where the focal species was observed to a reference set describing the whole study area. The first factor extracted maximizes the marginality of the focal species, defined as the ecological distance between the species optimum and the mean habitat within the reference area. The other factors maximize the specialization of this focal species, defined as the ratio of the ecological variance in mean habitat to that observed for the focal species. Eigenvectors and eigenvalues are readily interpreted and can be used to build habitat-suitability maps. This approach is recommended in situations where absence data are not available (many data banks), unreliable (most cryptic or rare species), or meaningless (invaders). We provide an illustration and validation of the method for the alpine ibex, a species reintroduced in Switzerland which presumably has not yet recolonized its entire range. Key words: Capra ibex; ecological niche; GIS; habitat suitability; marginality; multivariate analysis; presence–absence data; specialization; species distribution; Switzerland. INTRODUCTION Conservation ecology nowadays crucially relies on multivariate, spatially explicit models in all research areas requiring some level of ecological realism. This includes population viability analyses (Akc ¸akaya et al. 1995, Akc ¸akaya and Atwood 1997, Roloff and Haufler 1997), biodiversity-loss risk assessment (Akc ¸akaya and Raphael 1998), landscape management for endangered species (Livingston et al. 1990, Sanchez-Zapata and Calvo 1999), ecosystem restoration (Mladenoff et al. 1995, 1997), and alien-invaders expansions (Higgins et al. 1999). Such studies often conjugate the power of Geographical Information Systems (GIS) with multi- variate statistical tools to formalize the link between the species and their habitat, in particular to quantify the parameters of habitat-suitability models. Most frequently used among multivariate analyses are logistic regressions (Jongman et al. 1987, Peeters and Gardeniers 1998, Higgins et al. 1999, Manel et al. 1999, Palma et al. 1999), Gaussian logistic regressions (ter Braak and Looman 1987, Legendre and Legendre 1998), discriminant analyses (Legendre and Legendre 1998, Livingston et al. 1990, Manel et al. 1999), Ma- halanobis distances (Clark et al. 1993), and artificial neural networks (Manel et al. 1999, O ¨ zesmi and O ¨ zes- mi 1999, Spitz and Lek 1999). All these methods share largely similar principles: Manuscript received 27 March 2000; revised 4 March 2001; accepted 26 July 2001; final version received 12 November 2001. 3 Corresponding author. E-mail: [email protected] 1) The study area is modeled as a raster map com- posed of N adjacent isometric cells. 2) The dependent variable is in the form of presence/ absence data of the focal species in a set of sampled locations. 3) Independent ecogeographical variables (EGV) de- scribe quantitatively some characteristics for each cell. These may express topographical features (e.g., alti- tude, slope), ecological data (e.g., frequency of forests, nitrate concentration), or human superstructures (e.g., distance to the nearest town, road density). 4) A function of the EGV is then calibrated so as to classify the cells as correctly as possible as suitable or unsuitable for the species. The details of the function and of its calibration depend on the analysis. Sampling the presence/absence data is a crucial part of the process. The sample must be unbiased to be representative of the whole population. Absence data in particular are often difficult to obtain accurately. A given location may be classified in the ‘‘absence’’ set because (1) the species could not be detected even though it was present (McArdle 1990, Solow 1993; for example, Ke ´ry [2000] found that 34 unsuccessful visits were needed before one can assume with 95% confi- dence that the snake Coronella austriaca was absent from a given site), (2) for historical reasons the species is absent even though the habitat is suitable, or (3) the habitat is truly unsuitable for the species. Only the last cause is relevant for predictions, but ‘‘false absences’’ may considerably bias analyses. Here we propose a new approach specifically de- signed to circumvent this difficulty. Requiring only

Upload: others

Post on 16-Aug-2020

25 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2027

Ecology, 83(7), 2002, pp. 2027–2036q 2002 by the Ecological Society of America

ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTEHABITAT-SUITABILITY MAPS WITHOUT ABSENCE DATA?

A. H. HIRZEL,1 J. HAUSSER,1 D. CHESSEL,2 AND N. PERRIN1,3

1Laboratory for Conservation Biology, Institute of Ecology, University of Lausanne, CH-1015 Lausanne, Switzerland2UMR CNRS 5023, Laboratoire de Biometrie et Biologie Evolutive, Universite Lyon I, 69622 Villeurbanne Cedex, France

Abstract. We propose a multivariate approach to the study of geographic species dis-tribution which does not require absence data. Building on Hutchinson’s concept of theecological niche, this factor analysis compares, in the multidimensional space of ecologicalvariables, the distribution of the localities where the focal species was observed to areference set describing the whole study area. The first factor extracted maximizes themarginality of the focal species, defined as the ecological distance between the speciesoptimum and the mean habitat within the reference area. The other factors maximize thespecialization of this focal species, defined as the ratio of the ecological variance in meanhabitat to that observed for the focal species. Eigenvectors and eigenvalues are readilyinterpreted and can be used to build habitat-suitability maps. This approach is recommendedin situations where absence data are not available (many data banks), unreliable (mostcryptic or rare species), or meaningless (invaders). We provide an illustration and validationof the method for the alpine ibex, a species reintroduced in Switzerland which presumablyhas not yet recolonized its entire range.

Key words: Capra ibex; ecological niche; GIS; habitat suitability; marginality; multivariateanalysis; presence–absence data; specialization; species distribution; Switzerland.

INTRODUCTION

Conservation ecology nowadays crucially relies onmultivariate, spatially explicit models in all researchareas requiring some level of ecological realism. Thisincludes population viability analyses (Akcakaya et al.1995, Akcakaya and Atwood 1997, Roloff and Haufler1997), biodiversity-loss risk assessment (Akcakaya andRaphael 1998), landscape management for endangeredspecies (Livingston et al. 1990, Sanchez-Zapata andCalvo 1999), ecosystem restoration (Mladenoff et al.1995, 1997), and alien-invaders expansions (Higginset al. 1999). Such studies often conjugate the power ofGeographical Information Systems (GIS) with multi-variate statistical tools to formalize the link betweenthe species and their habitat, in particular to quantifythe parameters of habitat-suitability models.

Most frequently used among multivariate analysesare logistic regressions (Jongman et al. 1987, Peetersand Gardeniers 1998, Higgins et al. 1999, Manel et al.1999, Palma et al. 1999), Gaussian logistic regressions(ter Braak and Looman 1987, Legendre and Legendre1998), discriminant analyses (Legendre and Legendre1998, Livingston et al. 1990, Manel et al. 1999), Ma-halanobis distances (Clark et al. 1993), and artificialneural networks (Manel et al. 1999, Ozesmi and Ozes-mi 1999, Spitz and Lek 1999). All these methods sharelargely similar principles:

Manuscript received 27 March 2000; revised 4 March 2001;accepted 26 July 2001; final version received 12 November 2001.

3 Corresponding author.E-mail: [email protected]

1) The study area is modeled as a raster map com-posed of N adjacent isometric cells.

2) The dependent variable is in the form of presence/absence data of the focal species in a set of sampledlocations.

3) Independent ecogeographical variables (EGV) de-scribe quantitatively some characteristics for each cell.These may express topographical features (e.g., alti-tude, slope), ecological data (e.g., frequency of forests,nitrate concentration), or human superstructures (e.g.,distance to the nearest town, road density).

4) A function of the EGV is then calibrated so as toclassify the cells as correctly as possible as suitable orunsuitable for the species. The details of the functionand of its calibration depend on the analysis.

Sampling the presence/absence data is a crucial partof the process. The sample must be unbiased to berepresentative of the whole population. Absence datain particular are often difficult to obtain accurately. Agiven location may be classified in the ‘‘absence’’ setbecause (1) the species could not be detected eventhough it was present (McArdle 1990, Solow 1993; forexample, Kery [2000] found that 34 unsuccessful visitswere needed before one can assume with 95% confi-dence that the snake Coronella austriaca was absentfrom a given site), (2) for historical reasons the speciesis absent even though the habitat is suitable, or (3) thehabitat is truly unsuitable for the species. Only the lastcause is relevant for predictions, but ‘‘false absences’’may considerably bias analyses.

Here we propose a new approach specifically de-signed to circumvent this difficulty. Requiring only

Page 2: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2028 A. H. HIRZEL ET AL. Ecology, Vol. 83, No. 7

FIG. 1. The distribution of the focal species on any ecogeographical variable (black bars) may differ from that of thewhole set of cells (gray bars) with respect to its mean (mS ± mG), thus allowing marginality to be defined. It may also differwith respect to standard deviations (sS ± sG), thus allowing specialization to be defined.

presence data as input, the Ecological-Niche FactorAnalysis (ENFA) computes suitability functions bycomparing the species distribution in the EGV spacewith that of the whole set of cells. In the present paper,we expose the concepts behind the ENFA, develop themathematical procedures required (and implemented inthe software Biomapper), and illustrate this approachthrough an habitat-suitability analysis of the Alpineibex (Capra ibex).

MARGINALITY, SPECIALIZATION, AND THE

ECOLOGICAL NICHE

Species are expected to be nonrandomly distributedregarding ecogeographical variables. A species with anoptimum temperature, for instance, is expected to occurpreferentially in cells lying within its optimal range.This may be quantified by comparing the temperaturedistribution of the cells in which the species was ob-served with that of the whole set of cells. These dis-tributions may differ with respect to their mean andtheir variances (Fig. 1). The focal species may showsome marginality (expressed by the fact that the speciesmean differs from the global mean) and some special-ization (expressed by the fact that the species varianceis lower than the global variance).

Formally, we define the marginality (M) as the ab-solute difference between global mean (mG) and speciesmean (mS), divided by 1.96 standard deviations (sG) ofthe global distribution

zm 2 m zG SM 5 . (1)1.96sG

Division by sG is needed to remove any bias intro-duced by the variance of the global distribution: a cell

randomly chosen from a distribution is a priori ex-pected to lie that much further from the mean as thevariance in distribution is large. The coefficient weight-ing sG (1.96) ensures that marginality will be mostoften be between zero and one. Namely, if the globaldistribution is normal, the marginality of a randomlychosen cell has only a 5% chance of exceeding unity.A large value (close to one) means that the specieslives in a very particular habitat relative to the referenceset. Note that equation (1) is given here mainly to ex-plain the principle of the method; the operational def-inition of marginality implemented in our software isprovided by equation (10), which is a multivariate ex-tension of (1).

Similarly, we define the specialization (S ) as the ratioof the standard deviation of the global distribution (sG)to that of the focal species (sS),

sGS 5 . (2)sS

A randomly chosen set of cells is expected to havea specialization of one, and any value exceeding unityindicates some form of specialization. We reemphasizethat specific values for these indexes are bound to de-pend on the global set chosen as reference, so that aspecies might appear extremely marginal or specializedon the scale of a whole country, but much less so ona subset of it.

Extending these statistics to a larger set of variablesdirectly leads to Hutchinson’s (1957) concept of theecological niche, defined as a hyper-volume in the mul-tidimensional space of ecological variables withinwhich a species can maintain a viable population(Hutchinson 1957, Begon et al. 1996). The concept is

Page 3: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

July 2002 2029ECOLOGICAL-NICHE FACTOR ANALYSIS

used here exactly in the same sense: by ecological nichewe refer to the subset of cells in the ecogeographicalspace where the focal species has a reasonable prob-ability to occur. This multivariate niche can be quan-tified on any of its axes by an index of marginality andspecialization.

Some of these axes are obviously more interestingthan others, and this is why a factor analysis is intro-duced. The reasons are actually double. First, ecolog-ical variables are not independent. As more and moreare introduced in the description, multicollinearity andredundancy arise. One aim of factor analyses is to trans-form V correlated variables into the same number ofuncorrelated factors. As these factors explain the sameamount of total variance, subsequent analyses may berestricted to the few important factors (e.g., those ex-plaining the largest part of the variance) without losingtoo much information.

Second, specialization is expected to depend on in-teractions among variables. For instance, the temper-ature one species prefers might vary with humidity.Species may thus specialize on a combination of var-iables, rather than on every variable independently. Afactor analysis may allow extraction of the linear com-binations of original variables on which the focal spe-cies shows most of its marginality and specialization.In Principal Component Analyses (Cooley and Lohnes1971, Legendre and Legendre 1998), axes are chosenso as to maximize the variance of the distribution. InENFA, by contrast, the first axis is chosen so as toaccount for all the marginality of the species, and thefollowing axes so as to maximize specialization, i.e.,the ratio of the variance in the global distribution tothat in the species distribution.

FACTOR EXTRACTION

Outline of the principles

We use raster maps, which are grids of N isometriccells covering the whole study area. Each cell of a mapcontains the value of one variable. Ecogeographicalmaps contain continuous values, measured for each ofthe V descriptive variables. Species maps contain boo-lean values (0 or 1), a value of 1 meaning that thepresence of the focal species was proved on this cell.A value of zero simply means absence of proof.

Each cell is thus associated to a vector whose com-ponents are the values of the EGV in the underlyingarea, and can be represented by a point in the multi-dimensional space of the EGVs. If distributions aremultinormal, the scatterplot will have the shape of ahyper-ellipsoid (Fig. 2). The cells where the focal spe-cies was observed constitute a subset of the globaldistribution and are plotted as a smaller hyper-ellipsoidwithin the global one. The first factor, or marginalityfactor, is the straight line passing through the centroidsof the two ellipsoids. The species marginality is thedistance between these centroids, standardized as in Eq.

1. Fig. 2 plots this step for a three-dimensional initialset.

To obtain the specialization factors, the referencesystem is changed in order to transform the speciesellipsoid into a sphere, the variance of which equalsunity in each direction. In this new metrics, the firstspecialization factor is the one that maximizes the var-iance of the global distribution (while orthogonal tothe marginality factor). The other specialization factorsare then extracted in turn, each step removing one di-mension from the space, until all V factors are extract-ed. All specialization factors are orthogonal in the sensethat the distribution of the species subset on any factoris uncorrelated with its distribution on the others. Asfactors are sorted by decreasing order of specialization,the first few (F) will thus generally contain most of therelevant information. Their small number and inde-pendence make them easier to use than the originalEGVs, so that all following operations will be restrictedto them. In particular, the suitability of any cell for thefocal species (be it classified as 0 or 1 for observationdata) will be calculated according to its position in theF-dimensional space.

Mathematical procedures

Ecogeographical variables are first normalized as faras possible, e.g., through Box-Cox transformation (So-kal and Rohlf 1981). Though multinormality is theo-retically needed for factor extraction through eigen-system computation (Legendre and Legendre 1998),this method seems quite robust to deviations from nor-mality (Glass and Hopkins 1984). EGVs are then stan-dardized by retrieving means and dividing by standarddeviations:

x 2 xij jz 5 (3)i j sxj

where xij is the value of the variable xj in cell i, xj themean of this variable over all cells, and s its standardxj

deviation. Let Z be the N 3 V matrix of standardizedmeasurements zij. The V 3 V covariance matrix amongstandardized variables is then computed as

1TR 5 Z Z (4)G N

where ZT is the transposed matrix of Z. Because ofstandardization (Eq. 3), RG is also a correlation matrix.

The NS lines of Z corresponding to the NS cells wherethe focal species was detected are then stored in a newNS 3 V matrix (say S), from which the V 3 V speciescovariance matrix is calculated:

1TR 5 S S. (5)S N 2 1S

Note that in contrast to RG, RS is not a correlationmatrix, since standardization was performed on theglobal data set, not on the species subset.

Page 4: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2030 A. H. HIRZEL ET AL. Ecology, Vol. 83, No. 7

FIG. 2. Geometrical interpretation of the Ecological-Niche Factor Analysis. Square cells of the study area are representedin a three-EGV space. The larger, lighter balloon represents the global cloud of cells, while the smaller, darker balloonrepresents the subset of cells where the focal species was observed. The straight line passing through their centroids (mG andmS) is the marginality factor. In order to extract the variance associated with this factor, the cell coordinates are projectedon a plane p perpendicular to it, thereby producing the two ellipses. In reality, those operations are typically conducted with20–30 EGVs.

Let u be a normed vector of the EGV space. Thevariance of the global distribution on this vector isuTRGu, while that of the species distribution is uTRSu.The first specialization factor should thus maximize theratio Q(u) 5 uTRGu/uTRSu. However, this vector mustalso be orthogonal to the marginality factor m, givenas the vector of means over the V columns of S:

NS1m 5 z . (6)O i j5 6N i51S

The problem therefore becomes that of finding thevector u that maximizes Q(u) under the constraint mTu5 0. This is equivalent to finding u, such that

T u R u 5 1ST u m 5 0 (7)

Tu R u max. G

A change in variables allows us to rewrite the prob-lem

Tv v 5 1Tv y 5 0 (8)

Tv Wv max

where v 5 u, y 5 z/ , and z 5 m, W 5½ T 2½R Ïz z RS S

RG is a symmetric matrix. It can be shown that2½ 2½R RS S

the solution is given by the first eigenvector of

T TH 5 (I 2 yy )W(I 2 yy ). (9)V V

Indeed,1) y is an eigenvector of H because Hy 5 (IV 2

yyT)W (IV 2 yyT)y 5 0;2) H is symmetrical and thus admits a base of or-

thonormed eigenvectors so that Hv 5 lv ⇒ vTy 5 0;and,

3) vTHv is maximum for the first eigenvector, whichalso maximizes vTWv since vTy 5 0 ⇒ vTHv 5 vT(IV

2 yyT)W (IV 2 yyT) v 5 vTWv.The V eigenvectors of H are then back transformed,

and the new eigenvectors (u 5 v) are stored in a2½RS

matrix U. These vectors are RS-orthogonal (all Su dis-tributions have variance 1 and are uncorrelated). Fur-thermore, due to the constraint that u be orthogonal tom, this system has one null eigenvalue. The corre-sponding eigenvector is thus deleted from U, and m issubstituted instead as the first column. It should benoted that, although all marginality is accounted for bythe first factor, this factor is not ‘‘pure,’’ in that theniche of the focal species may also display some re-striction on it, in addition to its departure from the

Page 5: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

July 2002 2031ECOLOGICAL-NICHE FACTOR ANALYSIS

FIG. 3. The suitability of any cell from the global distri-bution is calculated from its situation (arrow) relative to thespecies distribution (histogram) on all selected niche factors.Specifically, it is calculated as twice the dashed area (sum ofall cells from the species distribution that lie as far or fartherfrom the median dashed vertical line) divided by the totalnumber of cells from the species distribution (surface of thehistogram).

mean. The amount of specialization on this first axisis provided by the difference between the traces (sumof all eigenvalues) of W and H.

All these procedures are implemented in the softwareBiomapper (A. H. Hirzel, J. Hausser, and N. Perrin,University of Lausanne, Lausanne, Switzerland).

Interpretation of the factors

The coefficients mi of the marginality factor expressthe marginality of the focal species on each EGV, inunits of standards deviations of the global distribution.The higher the absolute value of a coefficient, the fur-ther the species departs from the mean available habitatregarding the corresponding variable. Negative coef-ficients indicate that the focal species prefers valuesthat are lower than the mean with respect to the studyarea, while positive coefficients indicate preference forhigher-than-mean values. An overall marginality M canbe computed over all EGV as

V2mO i!i51

M 5 (10)1.96

so that the marginalities of different species within agiven area can be directly compared.

The coefficients of the next factors receive a differentinterpretation: the higher the absolute value, the morerestricted is the range of the focal species on the cor-responding variable. Note that only absolute valuesmatter here, since signs are arbitrary. The eigenvalueli associated to any factor expresses the amount ofspecialization it accounts for, i.e., the ratio of the var-iance of the global distribution to that of the speciesdistribution on this axis. Eigenvalues usually rapidlydecrease from the second factor to the last one, so thatonly the first four or five axes are useful to computehabitat suitability. Different criteria may be used forthe selection process, such as direct comparison withthe broken-stick distribution, or threshold value for cu-mulative variance.

A global specialization index can be computed as

V

lO i!i51

S 5 (11)V

and can be used for among-species comparisons, pro-vided the same area is used as reference.

HABITAT-SUITABILITY MAPS

A variety of methods can be envisaged to computethe suitability for the focal species of any cell from thestudy area. Among the several alternatives tested, thefollowing approach turned out to be quite robust andwas implemented in Biomapper. It builds on a countof all cells from the species distribution that lay as faror farther apart from the median than the focal cell on

a factor axis. This count is normalized in such a waythat the suitability index ranges from zero to one.

Practically, this is performed by dividing the speciesrange on each selected factor in a series of classes, insuch a way that the median would exactly separate twoclasses (Fig. 3). For every cell from the global distri-bution, we count the number of cells from the speciesdistribution that lay either in the same class or in anyclass farther apart from the median on the same side(Fig. 3). Normalization is achieved by dividing twicethis number by the total number of cells in the speciesdistribution. Thus, a cell laying in one of the two clas-ses directly adjacent to the median would score one,and a cell laying outside the species distribution wouldscore zero.

An overall suitability index of the focal cell can thenbe computed from a combination of its scores on eachfactor. In order to account for the differential ecologicalimportance of the factors, we attribute equal weight tomarginality and specialization, but, while all the mar-ginality component goes to the first factor, the spe-cialization component is apportioned among all factorsproportionally to their eigenvalue (the marginality fac-tor may thus take more than half of the weight if italso accounts for some specialization).

Repeating this procedure for each cell allows to pro-duce a habitat-suitability map, where suitability valuesrange from 0 to 1. To convert this quantitative (or semi-quantitative) map into a presence/absence one, athreshold value may be chosen, above which the cellwill be considered as suitable. A validation data setcan be used to find out the best threshold value, e.g.,through the ROC plot method (Zweig and Campbell1993, Fielding and Bell 1997), given some a priori costvalues to each type of inferential error. Typically, sinceour method builds on presence data only, lower costs

Page 6: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2032 A. H. HIRZEL ET AL. Ecology, Vol. 83, No. 7

TABLE 1. Nature and source of the 34 ecogeographical variables used in the Ecological-Niche Factor Analysis (ENFA) ofibex distribution.

Officialdatabase Topic Source† Derived EGV

AS85RDHMGWNVector 200SB

cover usetopographyhydrographyland mapibex colonies

OFSOFSOFSOFTOFEFP

frequency and proximity of rock, snow, forests, meadows, etc.altitude, slope, aspect, SD of altitudeproximity of rivers and lakesproximity of villages, towns, railways, roads, etc.calibrating and validation of presence data sets

†Abbreviations: OFT, Swiss Office of Topography; OFS, Swiss Office of Statistics; and OFEFP, Swiss Federal Office ofEnvironment.

TABLE 2. Variance explained by the first five (out of 34) ecological factors, and coefficientvalues for the 13 most important initial variables.

Margin-ality Spec. 1 Spec. 2 Spec. 3 Spec. 4

EGV (46%) (11%) (8%) (6%) (4%)

Rock frequencyGrass frequencyAltitudeDistance to grassDistance to agricultural meadows

0.3500.3210.269

20.2450.231

20.10520.02120.365

0.02120.062

20.13220.04420.56120.096

0.017

20.18620.043

0.00520.121

0.588

20.16320.039

0.11120.07320.784

Frequency of .308 slopeFrequency of dense forestDistance to secondary roadsFrequency of agricultural meadowsDistance to towns

0.23020.228

0.22220.212

0.209

20.0350.025

20.01220.907

0.012

0.07420.72720.04920.011

0.049

0.0110.037

20.1290.236

20.021

0.00420.001

0.11520.383

0.0021 SD of altitudeDistance to forestsDistance to villages

0.2040.2030.200

0.0000.0150.003

20.0050.1650.017

20.03220.06820.097

20.0440.1080.320

Notes: EGVs are sorted by decreasing absolute value of coefficients on the marginality factor.Positive values on this factor mean that ibex prefer locations with higher values on the cor-responding EGV than the mean location in Switzerland. Signs of coefficient have no meaningon the specialization factors. The amount of specialization accounted for is given in parenthesesin each column heading.

should be attributed to cells wrongly considered as suit-able than to cells wrongly considered as unsuitable.

AN APPLICATION TO THE ALPINE IBEX

The alpine ibex (Capra ibex) was exterminated fromthe Swiss Alps in the last century due to excessivehunting. Reintroduction attempts starting in 1911 werehighly successful, and colonies have since grown rap-idly throughout the Swiss Alps. As a legally protectedspecies, ibex populations were carefully monitoredsince their reintroduction, so that presence data arehighly reliable. However, as their expansion is still hin-dered by the patchiness of their habitat, they presum-ably do not occupy yet all suitable regions. Absencedata therefore do not necessarily reflect poor-qualityhabitat, a point which strongly advocates for the useof an analysis relying on presence data only.

The whole of Switzerland was chosen as referencearea, and modeled as a raster map based on the SwissCoordinate System (plane projection), comprising4 145 530 square cells of 1 ha (100 3 100 m) each. Weused 34 ecogeographical variables derived from gov-ernmental data bases (Table 1). Topographical data (al-titude, slope, and aspect) were directly obtained as

quantitative variables. Frequency and distance datawere derived from boolean variables describing soiloccupancy, as official data bases attribute each cell toone category only (snow, rocks, meadow, forest, build-ing, etc), according to a regular sampling. Distance dataexpress the distance between the focal cell and theclosest cell belonging to a given category. Frequencydescribes the proportion of cells from a given categorywithin a circle of 1200 m radius around the focal cell.Circle surface is ;5 km2, which corresponds to themean area explored daily by individual ibexes (Ab-derhalden and Buchli 1997).

The presence database was a digitized map of ibexhome ranges. The polygons drawn by fauna managerswere converted in raster format at the same resolutionas EGV maps. This raster was then randomly parti-tioned into two data sets, every cell having a 0.5 chanceof belonging to each set. The first set (101 564 cells)was used to calibrate the model, and the other set(101 550 cells) to validate it. Application of the ENFAmethod to the calibration set provided an overall mar-ginality of M 5 1.1 and an overall specialization valueof S 5 2.2, showing that ibex’s habitat differs drasti-cally from the mean conditions in Switzerland, and that

Page 7: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

July 2002 2033ECOLOGICAL-NICHE FACTOR ANALYSIS

FIG. 4. Habitat-suitability map for alpine ibex in Switzerland, as computed from ENFA. The scale on the right, marked‘‘HS values,’’ shows the habitat suitability values represented by each shade in the map. The inset is displayed at a largerscale at bottom left, with ibex presence data on the right. In the habitat-suitability map, light shading denotes areas moresuitable for ibex, and dark shading denotes areas less suitable. In the ibex presence map at bottom right, dark shading denotesibex presence. The largest suitable patch is indeed occupied, while the smaller one is not, being either too small or unreachable.

ibex are quite restrictive on the range of conditionsthey withstand. The five factors retained (out of the 34computed) accounted for 74% of the total sum of ei-genvalues (that is, 100% of the marginality and 74%of the specialization). The marginality factor alone ac-counted for 46% of this total specialization, a quiteimportant value, meaning that ibex display a very re-stricted range on those conditions for which they most-ly differ from background Switzerland conditions.

Marginality coefficients (Table 2) showed that ibexesare essentially linked to high-altitude, steep, and rockyslopes, rich in pastures (rock frequency 5 0.35, altitude

5 0.27, frequency of slopes .308 5 0.23, grass fre-quency 5 0.32, distance to grass 5 20.24). By con-trast, ibex tends to avoid forest (frequency 5 20.23)and human activities (distance to secondary roads 50.22, distance to agricultural meadows 5 0.23). Aspect(northness, eastness) as well as snow and water (lakes,rivers) had only marginal effects. The very large ei-genvalue (76.6) attributed to this first factor means thatrandomly chosen cells in Switzerland are ;80 timesmore dispersed on this axis than the cells were ibexwas recorded. Or in other words, ibex are extremelysensitive to shifts from their optimal conditions on this

Page 8: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2034 A. H. HIRZEL ET AL. Ecology, Vol. 83, No. 7

FIG. 5. Box plots presenting the distributions of the hab-itat-suitability values for the whole set of cells (left) and thevalidation subset (right). The latter is made of the 101 550cells with ibex that were not included in the analysis. Boxesdelimit the interquartile range, the middle line indicating themedian; whiskers encompass the 80% confidence interval.The two distributions obviously differ, as the global one ismostly confined to low values (suitability, 5–33%), while thevalidation set concentrates on high-suitability values (75–99%).

axis. The next factors account for some more special-ization, mostly regarding agricultural meadow fre-quency and altitude (second factor) as well as forestfrequency (third factor), showing some sensitivity toshifts away from their optimal values on these vari-ables.

A suitability map was built from these five factorsfor the whole of Switzerland, which is plotted on Fig.4. Enlargement of a small part of it shows one suitablepatch where ibex was indeed recorded (on the right),as well as one smaller suitable patch (on the left) whereibex was not recorded. This patch presumably was notcolonized because of its isolation, or too small to sus-tain a viable colony on the long term. As false positiveslike this give no indication about the quality of ourmodel, standard quality estimators like the kappa index(Monserud and Leemans 1992), which attribute thesame importance to false positives and false negatives,cannot be used to validate it. Instead, we evaluated thedistribution of suitability values of cells from the val-idation set. As shown in Fig. 5, these cells differ dras-tically from the global distribution. Predicted suitabil-ity exceeds 0.5 in 83% of cells, which differs highlysignificantly (P , 0.0001, bootstrap test) from the val-ue of 24% expected if cells were randomly chosen fromthe global distribution.

DISCUSSION

Niche factors and distribution maps

The originality of the present approach lies in thefact that it builds on the concept of ecological niche,

which is central to the whole field of ecology. A basictenet of the niche theory is that fitness (or habitat suit-ability) does not bear monotonic relationships withconditions or resources, but instead decreases from ei-ther side of an optimum. In this respect, our approachdiffers fundamentally from other techniques like dis-criminant functions or first-order regressions, whererelationships are assumed linear and monotonic. Ac-cordingly, our analysis directly provides two key mea-surements regarding the niche of the focal species,namely those of marginality and of specialization. Out-puts thus have intuitive ecological meaning, and allowdirect comparisons with the niche of different species.Application of the analysis to evaluate, e.g., speciespackaging or niche-overlap measurement among mem-bers of a guild, would be straightforward extensionsof the present approach (e.g., Doledec et al. 2000)

Our application to ibex data, for instance, providesquantitative estimates of marginality and specializationfor this species which evidence its very peculiar eco-logical requirements. Furthermore, interpretation of thefactors in terms of the EGVs turns out to be very con-sistent with the experience of field specialists. In par-ticular, the EGVs that correlate with the marginalityfactor are precisely those most often cited as particu-larly relevant for ibex ecology (Rauch 1941, Nievergelt1966, Nievergelt and Zingg 1986, Hausser 1995, Hin-denlang and Nievergelt 1995).

These results obviously suffer from the same caveatas any inferential approach: a variable might turn outto correlate with one of the main axes not because ofits intrinsic importance, but because it correlatesstrongly with another crucially important variable.ENFA is a purely descriptive method and cannot extractcausality relations. Nonetheless, it provides (at worst)important cues about preferential conditions, and re-mains a powerful tool to draw potential habitat maps.

In this respect, a limitation of our software is that itdoes not yet include confidence intervals on distribu-tion maps. Increasingly, conservation managers are de-manding risk analyses that incorporate uncertainties inmodel predictions. These could clearly be obtainedthrough the bootstrapping of presence data. Though notyet implemented in Biomapper, this procedure will cer-tainly provide an important and useful extension.

A second limitation, less easy to deal with, is thatENFA only handles linear dependencies within the spe-cies niche. Multiplicative or nonlinear interactions can-not be accommodated in the present state, exceptthrough transformations or nonlinear combinations ofthe original ecogeographical variables.

A third limitation is that some EGVs may turn outto be constant in S, or in linear combination with otherEGVs, which makes RS singular. This is likely to hap-pen with coarsely measured data or small species datasets. Whenever this happens, Biomapper identifies theconstant or correlated EGVs so that the user can remove(one of) them from the analysis. An alternative ap-

Page 9: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

July 2002 2035ECOLOGICAL-NICHE FACTOR ANALYSIS

proach would obviously consist of improving the fieldsample, either by increasing the presence data set orby measuring EGVs on a finer scale.

Finally, a last important point to emphasize again isthat our approach characterizes ecological niches rel-ative to a reference area. Marginality and specializationare thus bound to depend on the geographic limits ofthe study area. Some species may turn out to occur atthe very edge of their distribution, and may thus appearquite specialized in the reference set, however wide-spread they might be otherwise. Reciprocally, ibexwould have appeared much less marginal and special-ized had our sampling area been restricted to the Alps.The same distinction must be applied here as the onemade between fundamental and realized niches (Hutch-inson 1957). Our analysis does not investigate funda-mental niches, but only their specific realization withina given geographical context.

ENFA vs. logistic regressions

With respect to more standard techniques, a crucialadvantage of ENFA is that it does not require absencedata. Presence data are compared instead with back-ground environment. This of course implies that pres-ence data should be unbiased samples of actual distri-butions, which we suspect might not be the case ofmany available database, since sampling efforts are fre-quently biased with respect to environment. However,though this problem is difficult to circumvent, the pointmust also be made that database often simply lack anyabsence data. And when available, these may turn outto be either unreliable (in the case of cryptic or poorlyknown species) or meaningless (in the case of invadingspecies, or those living in fragmented habitats wheresome patches have become extinct). As many speciesenter one of these categories, our approach potentiallyhas a wide application range. In particular, predictionsabout the expected expansion of invading species seema promising test bed.

In the case of stable populations from well-knownspecies, one might prefer more classical approachessuch as logistic regressions, able to extract relevantinformation from absence or abundance data. Thispoint deserves proper investigation, in order to localizethe threshold where the benefits gained from incor-porating absence data are compensated by the costsinduced by their possible poor quality. Investigationsare presently in progress to compare the power ofENFA to that of classical logistic regression analysesunder different biological and sampling scenarios, inorder to assess their respective advantages and incon-veniences. Preliminary results show ENFA to be morerobust than classical logistic regressions with respectto several habitat-occupancy scenarios (Hirzel et al.2001).

Finally, the point must also be made that the pro-cedures used by standard stepwise analyses to selectsignificant EGVs from the original set turn out to be

highly sensitive to the algorithms chosen, as well as tothe input order. Consequences are that (1) many trialsare needed in order to sort out the ‘‘best’’ model, and(2) variables that bear a causal relationship to the focalspecies’ presence might well be lost in the process, ifother EGVs present spurious correlations. This impliessome subjective choices, and requires a good a prioriknowledge of the focal species’ ecology. In contrast,our factor analysis does not reject any input EGV, butonly weights them. The subjective components and apriori knowledge required are thereby kept minimal,and correlations among variables and axes are imme-diately visible and interpretable.

ACKNOWLEDGMENTS

This research was supported by the Swiss Federal Officefor Environment, Forest and Landscape (OFEFP), grant0310.3600.305. We warmly thank H. J. Blankenhorn for hisenthusiastic endorsement of the Ibex Project, and all peoplewho tested Biomapper, particularly P. Patthey for his thoroughcheck and judicious suggestions.

LITERATURE CITED

Abderhalden, W., and C. Buchli. 1997. Steinbockprojekt Al-bris/SNP, Schlussbericht. ARINAS and FORNAT, Zernez(Graubunden), Switzerland.

Akcakaya, H. R., and J. L. Atwood. 1997. A habitat-basedmetapopulation model of the California Gnatcatcher. Con-servation Biology 11:422–434.

Akcakaya, H. R., M. A. McCarthy, and J. L. Pearce. 1995.Linking landscape data with population viability analysis:management options for the helmeted honeyeater Lichen-ostomus melanops cassidix. Biological Conservation 73:169–176.

Akcakaya, H. R., and M. G. Raphael. 1998. Assessing humanimpact despite uncertainty viability of the northern spottedowl metapopulation in the northwestern USA. Biodiversityand Conservation 7:875–894.

Begon, M., J. L. Harper, and C. R. Townsend. 1996. Ecology.Blackwell Science, Oxford, UK.

Clark, J. D., J. E. Dunn, and K. G. Smith. 1993. A multi-variate model of female black bear habitat use for a geo-graphic information system. Journal of Wildlife Manage-ment 57:519–526.

Cooley, W. W., and P. R. Lohnes. 1971. Multivariate dataanalysis. John Wiley, New York, New York, USA.

Doledec, S., D. Chessel, and C. Gimaret-Carpentier. 2000.Niche separation in community analysis: a new method.Ecology 81:2914–2927.

Fielding, A. H., and J. F. Bell. 1997. A review of methodsfor the assessment of prediction errors in conservation pres-ence/absence models. Environmental Conservation 24:38–49.

Glass, G. V., and K. D. Hopkins. 1984. Statistical methodsin education and psychology. Second edition. Prentice Hall,London, UK.

Hausser, J. 1995. Mammiferes de Suisse. Birkhauser, Bale,Switzerland.

Higgins, S. I., D. M. Richardson, M. C. Richard, and T. H.Trinder-Smith. 1999. Predicting the landscape-scale dis-tribution of alien plants and their threat to plant diversity.Conservation Biology 13:303–313.

Hindenlang, K., and B. Nievergelt. 1995. Capra ibex L.,1758. Pages 450–456 in J. Hausser, editor. Mammiferes deSuisse. Birkhauser, Basel, Switzerland.

Hirzel, H., V. Helfer, and F. Metral. 2001. Assessing habitat-suitability models with a virtual species. Ecological Mod-elling 145:111–121.

Page 10: ECOLOGICAL-NICHE FACTOR ANALYSIS: HOW TO COMPUTE …€¦ · ecological niche, this factor analysis compares, in the multidimensional space of ecological variables, the distribution

2036 A. H. HIRZEL ET AL. Ecology, Vol. 83, No. 7

Hutchinson, G. E. 1957. Concluding remarks. Cold SpringHarbour Symposium on Quantitative Biology 22:415–427.

Jongman, R. H. G., C. J. F. ter Braak, and O. F. R. VanTongeren. 1987. Data analysis in community and landscapeecology. Cambridge University Press, Cambridge, UK.

Kery, M. 2000. Ecology of small populations. Dissertation.University of Zurich, Zurich, Switzerland.

Legendre, L., and P. Legendre. 1998. Numerical ecology.Second English edition. Elsevier Science BV, Amsterdam,The Netherlands.

Livingston, S. A., C. S. Todd, W. B. Krohn, and R. B. Owen.1990. Habitat models for nesting bald eagles in Maine.Journal of Wildlife Management 54:644–657.

Manel, S., J. M. Dias, S. T. Buckton, and S. J. Ormerod.1999. Alternative methods for predicting species distri-bution: an illustration with Himalayan river birds. Journalof Applied Ecology 36:734–747.

McArdle, B. H. 1990. When are rare species not there? Oikos57:276–277.

Mladenoff, D. J., R. C. Haight, T. A. Sickley, and A. P. Wy-deven. 1997. Causes and implications of species restora-tion in altered ecosystems: a spatial landscape projectionof wolf population recovery. Bioscience 47:21–31.

Mladenoff, D. J., T. A. Sickley, R. G. Haight, and A. P. Wy-devens. 1995. A regional landscape analysis and predictionof favorable Gray Wolf habitat in the northern Great Lakesregion. Conservation Biology 9:279–294.

Monserud, R. A., and R. Leemans. 1992. Comparing globalvegetation maps with the Kappa statistic. Ecological Mod-elling 62:275–293.

Nievergelt, B. 1966. Der Alpensteinbock (Capra ibex L.) inseinem Lebensraum: ein okologischer Vergleich verschie-dener Kolonien. Mammalia depicta, Hamburg, West Ger-many.

Nievergelt, B., and R. Zingg. 1986. Capra ibex Linnaeus,1758—Steinbock. Pages 384–404 in J. Niethammer and F.

Krapp, editors. Handbuch der Saugetiere Europas: Paar-hufer. AULA-Verlag, Wiesbaden, Germany.

Ozesmi, S. L., and U. Ozesmi. 1999. An artificial neuralnetwork approach to spatial habitat modelling with inter-specific interaction. Ecological Modelling 116:15–31.

Palma, L., P. Beja, and M. Rodgrigues. 1999. The use ofsighting data to analyse Iberian lynx habitat and distribu-tion. Journal of Applied Ecology 36:812–824.

Peeters, E. T. H., and J. J. P. Gardeniers. 1998. Logistic re-gression as a tool for defining habitat requirements of twocommon gammarids. Freshwater Biology 39:605–615.

Rauch, A. 1941. Le bouquetin dans les Alpes. Payot, Paris,France.

Roloff, G. J., and J. B. Haufler. 1997. Establishing populationviability planning objectives based on habitat potentials.Wildlife Society Journal 25:895–904.

Sanchez-Zapata, J. A., and J. F. Calvo. 1999. Raptor distri-bution in relation to landscape composition in semi-aridMediterranean habitats. Journal of Applied Ecology 36:254–262.

Sokal, R. R., and F. J. Rohlf. 1981. Biometry: the principlesand practice of statistics in biological research. W.H. Free-man, New York, New York, USA.

Solow, A. R. 1993. Inferring extinction from sighting data.Ecology 74:962–964.

Spitz, F., and S. Lek. 1999. Environmental impact predictionusing neural network modelling: an example in wildlifedamage. Journal of Applied Ecology 36:317–326.

ter Braak, C. J. F., and C. W. N. Looman. 1987. Regression.Pages 29–77 in R. H. G. Jongman, C. J. F. ter Braak, andO. F. R. Van Tongeren, editors. Data analysis in communityand landscape ecology. Cambridge University Press, Cam-bridge, UK.

Zweig, M. H., and G. Campbell. 1993. Receiver-operatingcharacteristic (ROC) plots: a fundamental evaluation toolin clinical medicine. Clinical Chemistry 39:561–577.