Rodolphe Devillers
(Almost) everything you always wanted to know (or maybe not…) about Geographically Weighted Regressions
JCU Stats Group, March 2012
Outline
• Background
• Spatial autocorrelation
• Spatial non-stationarity
• Geographically Weighted Regressions (GWR)
Background
Decrease in cod populations
[Maps: one per year, 1984 to 1994]
GeoCod Project (2006-…)
Goal: Get a better understanding of the spatial and temporal dynamics of some fish/shellfish species in the NW Atlantic region, and their relationship with the physical environment
Biological Data
• Scientific surveys
• Fisheries observers
• 4 species
• > 800 000 records
Environmental Data
• Temperature
• Salinity
• Remote Sensing
• > 300 GB
GeoCod project
[Workflow diagram, steps 1 to 4: Collection (fisheries data, environmental data, other data: bathymetry, etc.), Integration (normalized database), Analysis, Visualization]
Context
• A number of statistical methods can be used
• Testing spatial statistics
[Diagram: Species ? Environment]
Spatial autocorrelation
• “…the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations.” (Legendre, 1993)
Spatial autocorrelation - Basics
Positive (Neighbours more similar)
Neutral (Random)
Negative (Neighbours less similar)
http://www.spatialanalysisonline.com/
Spatial autocorrelation – is it common?
• Elevation
• Air/water temperature
• Air humidity
• Disease distribution
• Species abundance
• Housing value
• Etc.
Spatial autocorrelation – why bother?
• Spatial autocorrelation in the data leads to spatial autocorrelation in the residuals
[Figure: maps of GWR residuals and OLS residuals; Moran's I values shown: 0.144 and 0.372]
Spatial autocorrelation – why bother?
• Most statistics are based on the assumption that the values of observations in each sample are independent of one another
• Consequence: spatial autocorrelation violates the assumption that residuals are independent and calls into question the validity of hypothesis testing
• Main effects:
  • Standard errors are underestimated
  • t-scores are overestimated (= increased chance of a Type I error, i.e. incorrect rejection of a null hypothesis)
  • Sometimes inverts the slope of relationships
Spatial autocorrelation – how to measure it?
• Measures of spatial autocorrelation:
• Moran’s I
• Geary’s C
• Others (e.g. Getis’ G)
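As an illustration of the first measure, here is a minimal sketch of global Moran's I in plain Python. The function name `morans_i`, the helper `chain_weights` and the two toy series are hypothetical, chosen only for this example; the weights matrix assumes a simple "adjacent cells on a line are neighbours" scheme, which is one of many possible choices.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    w_total = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))    # weighted cross-products
    variance_term = sum(d * d for d in z)
    return (n / w_total) * (cross / variance_term)


def chain_weights(n):
    """Hypothetical weights: cells on a line, each one neighbouring the adjacent cells."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(6)
smooth = [1, 2, 3, 4, 5, 6]        # neighbours similar
alternating = [1, 6, 1, 6, 1, 6]   # neighbours dissimilar

i_smooth = morans_i(smooth, w)            # positive autocorrelation expected
i_alternating = morans_i(alternating, w)  # negative autocorrelation expected
print(i_smooth, i_alternating)
```

A smooth gradient gives a positive value and an alternating pattern a negative one, matching the positive/negative cases described earlier. In practice a dedicated package would also provide a significance test.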
Spatial autocorrelation – How can I deal with it?
• Many ways to handle this:
• Subsampling, adjusting type I error, adjusting the effective sample size, etc. (Dale and Fortin (2002) Ecoscience 9(2))
• Autocovariate regressions, spatial eigenvector mapping (SEVM), generalised least squares (GLS), conditional autoregressive models (CAR), simultaneous autoregressive models (SAR), generalised linear mixed models (GLMM), generalised estimating equations (GEE), etc. (More details: Dormann et al. (2007) Ecography 30)
• If spatial autocorrelation is not stationary: GWR
Stationarity
• Classical regression models are valid under the assumptions that phenomena are stationary temporally and spatially (=statistical parameters such as the mean, the variance or the spatial autocorrelation do not vary depending on the geographic position)
• E.g. Coral bleaching = 0.55 Temperature + 0.37 Nutrients + … - …
• Studies (in various fields, including terrestrial ecology) have shown that they are rarely stationary
Global vs Local Statistics
Simpson's Paradox
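A small sketch of how a global statistic can hide (or reverse) local relationships, which is the idea behind Simpson's paradox. The two "regions" and their values are hypothetical numbers made up for this example, and `ols_slope` is an illustrative helper, not part of any package.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: within each region, y increases with x ...
region_a_x, region_a_y = [1, 2, 3], [10, 11, 12]
region_b_x, region_b_y = [6, 7, 8], [1, 2, 3]

slope_a = ols_slope(region_a_x, region_a_y)   # local slope, region A
slope_b = ols_slope(region_b_x, region_b_y)   # local slope, region B

# ... but pooling both regions into one global model reverses the sign.
slope_global = ols_slope(region_a_x + region_b_x, region_a_y + region_b_y)
print(slope_a, slope_b, slope_global)
```

Both local slopes are positive while the pooled slope is negative, which is why a single global coefficient can be misleading when the relationship varies by location.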
Local spatial statistics
• Local Indicators of Spatial Association (LISA)
• Local Moran’s I (used to detect clustering)
• Getis-Ord Gi* (hotspot analysis)
• Look at GeoDa (free software from Luc Anselin - http://geodacenter.asu.edu/)
• Local regressions: GWR
GWR
• Brunsdon, Fotheringham and Charlton
GWR
• Increasingly used in various fields (mostly since 2006, and even more since integrated into ArcGIS)
• Sally: yes, it is also available in R… (spgwr)
• Criticized by some authors (e.g. Wheeler 2005, Cho et al. 2009) when using collinear data, potentially leading to:
• Occasional inflation of the variance
• Rare inversion of the sign of the regression
GWR
Windle, M., Rose, G., Devillers, R. and Fortin, M.-J. Exploring spatial non-stationarity of fisheries survey data using geographically weighted regression (GWR): an example from the Northwest Atlantic. ICES Journal of Marine Science, 67: 145-154.
GWR
• Geographically Weighted Regression (GWR)
• (μ,ν): geographic coordinates of the samples
• Multiple regression model (global)
• y: dependent variable, x1 to xp: independent variables, β0: intercept, β1 to βp: coefficients, ε: error.
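For reference, the standard textbook forms of the two models named above, written with the slide's symbols. This is a reconstruction under the usual definitions, not a copy of the slide's equations; the matrix notation X and W in the last line is an assumption introduced here.

```latex
% Global multiple regression: one set of coefficients for the whole study area
y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon

% GWR: same form, but each coefficient depends on the location (\mu, \nu)
y(\mu,\nu) = \beta_0(\mu,\nu) + \beta_1(\mu,\nu)\, x_1 + \dots
             + \beta_p(\mu,\nu)\, x_p + \varepsilon(\mu,\nu)

% Local estimate by weighted least squares; W(\mu,\nu) is a diagonal matrix of
% distance-based weights centred on (\mu,\nu)
\hat{\beta}(\mu,\nu) = \left( X^{\mathsf T} W(\mu,\nu)\, X \right)^{-1}
                       X^{\mathsf T} W(\mu,\nu)\, y
```

The only difference between the two models is that the GWR coefficients are functions of position, estimated separately at each location from nearby observations.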
Cod presence/absence (threshold at 5 kg) for the Fall 2001
Method
Government fisheries scientific survey data (Fisheries and Oceans Canada)
Method – Data interpolation
Method
Combining data in a single point data file
Exporting data points in a file (.dbf)
Temperature
Cod
Crab
Shrimp
Year 2001
Method
GWR software (version 3.0)
200km used for tests
About 25 minutes per file of 5500 points
Fixed
Variable
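To make the role of the bandwidth concrete, here is a minimal sketch of a GWR-style local fit with a fixed bandwidth, in plain Python. It is not the GWR 3.0 software used in the study: the Gaussian kernel, the single predictor, the function names `gaussian_weight` and `local_fit`, and the transect data are all assumptions made for illustration. Only the 200 km bandwidth echoes the value mentioned above.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Fixed Gaussian kernel: weight decays with distance (same units as bandwidth)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, points, x, y, bandwidth):
    """Weighted least squares of y on one predictor x, centred on `target`."""
    w = [gaussian_weight(math.dist(target, p), bandwidth) for p in points]
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope   # (local intercept, local slope)


# Hypothetical data: sample points every 50 km along a west-east transect.
# The true slope is +1 in the west and -1 in the east (spatial non-stationarity).
points = [(float(u), 0.0) for u in range(0, 1000, 50)]
x = [float(i % 5) for i in range(len(points))]
y = [(1.0 if p[0] < 500 else -1.0) * xi for p, xi in zip(points, x)]

# One local regression per sample point, with a fixed 200 km bandwidth.
coefficients = [local_fit(p, points, x, y, bandwidth=200.0) for p in points]
west_slope = coefficients[0][1]
east_slope = coefficients[-1][1]
print(west_slope, east_slope)
```

A global regression on these data would average the two regimes away, while the local fits recover a positive slope in the west and a negative one in the east. A smaller bandwidth gives more local detail but noisier estimates.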
Results
Test of spatial stationarity of independent variables used in the regression
Spatial stationarity
Spatial non-stationarity
Results spatial stationarity
Windle et al. (accepted) - MEPS
Stationarity of bottom temperature used to model shrimp biomass
Results
Comparison of regression models
Results
Test of the spatial auto-correlation of the residuals
Results
K-means clustering of the t values of the GWR coefficients
Positive relationship between crab and shrimp, weak relationship with the coast
Negative relationship with crab and distance, positive with shrimp
Stronger negative relationship with crab
Results
GAM systematically has lower AIC values, suggesting a non-linear relationship between cod and the variables used in the analysis
[Figure legend: Strong / Weak]
AIC: Akaike Information Criterion
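For readers unfamiliar with the criterion, its usual definition is given below. The symbols k and \hat{L} are the conventional ones and do not come from the slide; GWR software often reports a small-sample corrected variant, which is not shown here.

```latex
% k: number of estimated parameters, \hat{L}: maximised likelihood of the model
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

Lower values indicate a better trade-off between fit and complexity, so a model is preferred when its AIC is lower than that of the alternatives fitted to the same data.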
Results
[Chart: R2 by year, 1985 to 1992 (y-axis from 0 to 0.7); series: Logistic_R2, GAM_R2, GWR_R2_ave, with min and max GWR coefficients (R2)]
Model power decreases with years
GWR coefficients – Capelin
[Maps: one per year, 1985 to 1992]
GWR coefficients – Catch per Unit Effort
[Maps: one per year, 1985 to 1992]
Conclusions
• The spatial structure of data matters
• Ecology (and mostly marine ecology) is still in the process of adopting such methods
• GWR is an interesting method but can be hard to interpret and should be used together with other methods
Questions?
http://www.ucs.mun.ca/~rdeville/geocod
Technical questions beyond my knowledge: Matt Windle ([email protected])
Technical questions beyond Matt’s knowledge: [email protected] (allow for several months for an answer)