
Upload: kristian-foster

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Exploratory Tools for Spatial Data: Diagnosing Spatial Autocorrelation Main Message when modeling & analyzing spatial data: SPACE MATTERS! Relationships

Exploratory Tools for Spatial Data: Diagnosing Spatial Autocorrelation

Main Message when modeling & analyzing spatial data: SPACE MATTERS!

Relationships between observations from independent data can be analyzed in numerous ways. Some include:

1. Estimation through Stochastic Dependencies

2. Spatial Regression: Deterministic structure of the mean function.

3. Lattice Modeling: expressing observations as functions of neighboring values.

Chapter Emphasis: exploratory tools for spatial data must allow some insight into the spatial structure in the data.

Page 3

Example of using lattice modeling to demonstrate importance of retaining spatial information:

10 X 10 lattices filled with 100 observations drawn at random.

Lattice A is a completely random assignment of observations to lattice positions.

Lattice B is an assignment to positions such that a value is surrounded by values similar in magnitude.

Page 4

Histograms of the 100 observed values that do not take into account spatial position will be identical for the two lattices:

Note: The density estimate is not an estimate of the probability distribution of the data; that requires a different formula. Even if a histogram calculated by lumping data across spatial locations appears Gaussian, this does not imply that the data are a realization of a Gaussian random field.

Page 5

When the observed values are plotted against the average value of their nearest neighbors, the difference in the spatial distribution between the two lattices emerges:

The data in lattice A are not spatially correlated and the data in lattice B are very strongly autocorrelated.
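The lattice A versus lattice B comparison can be reproduced roughly as follows. This is a minimal sketch, not the slides' own data: the seed, the use of rook (edge-sharing) cells as "nearest neighbors", and the sorted row-by-row layout standing in for lattice B are all assumptions made for illustration.

```python
import random

def rook_neighbor_mean(grid, r, c):
    """Average of the values in the cells that share an edge with (r, c)."""
    rows, cols = len(grid), len(grid[0])
    vals = [grid[r + dr][c + dc]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < rows and 0 <= c + dc < cols]
    return sum(vals) / len(vals)

def lag_correlation(grid):
    """Pearson correlation between each value and its neighbor average."""
    pairs = [(grid[r][c], rook_neighbor_mean(grid, r, c))
             for r in range(len(grid)) for c in range(len(grid[0]))]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # arbitrary seed (assumption)
values = [rng.gauss(0.0, 1.0) for _ in range(100)]

# Lattice A: the 100 values assigned to positions completely at random.
shuffled = values[:]
rng.shuffle(shuffled)
lattice_a = [shuffled[10 * r:10 * r + 10] for r in range(10)]

# Lattice B: the same values, placed so that neighbors are similar in
# magnitude (here simply sorted and filled row by row; an assumption).
ordered = sorted(values)
lattice_b = [ordered[10 * r:10 * r + 10] for r in range(10)]

print("lag correlation, lattice A:", round(lag_correlation(lattice_a), 3))
print("lag correlation, lattice B:", round(lag_correlation(lattice_b), 3))
```

Both lattices hold exactly the same 100 values, so any histogram of them is identical; only the lag relationship separates them.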

Terminology:

Page 6

Distinguishing between spatial and non-spatial arrangements can detect outliers.

In a box plot or a stem-and-leaf plot, outliers are termed “distributional.” A “spatial” outlier is an observation that is unusual compared to its surrounding values.

Diagnosing Spatial Outliers:

Median-polish the data, that is, remove the large-scale trends in the data by an outlier-resistant method, and look for outlying observations in a box plot of the median-polished residuals.
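A median polish can be sketched in a few lines. The version below follows Tukey's usual row-then-column sweep; the function name `median_polish`, the fixed number of sweeps, and the toy 5 x 5 table with one planted spike are assumptions for illustration, not the procedure used for the slides.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Sweep row and column medians out of a two-way table.

    Returns (overall, row_effects, col_effects, residuals).
    """
    rows, cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(n_iter):
        # Row sweep: move each row's median into the row effect.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [e - m for e in col_eff]
        # Column sweep: move each column's median into the column effect.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [e - m for e in row_eff]
    return overall, row_eff, col_eff, resid

# Toy example: an additive row + column trend with one planted spatial outlier.
table = [[10 + i + 2 * j for j in range(5)] for i in range(5)]
table[2][2] += 50
overall, row_eff, col_eff, resid = median_polish(table)
print(resid[2][2])  # the planted spike survives in the residuals
```

Because medians are used instead of means, the spike barely affects the fitted row and column effects and therefore stands out in a box plot of the residuals.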

Use of Lag Plots (Previous example)

Page 7

Concerning Mercer and Hall Grain Yield. 1

S+SpatialStats code:

bwplot(y ~ grain, data=wheat, ylab="Row", xlab="Grain Yield")
bwplot(x ~ grain, data=wheat, ylab="Column", xlab="Grain Yield")

Page 8

Describing, Diagnosing, and Testing the Degree of Spatial Autocorrelation

Geostatistical Data: the empirical semivariogram provides an estimate of the spatial structure.

Lattice data: join-count statistics have been developed for binary and nominal data.

Moran (1950) and Geary (1954): developed autocorrelation coefficients for continuous attributes observed on lattices.

Coefficients: Moran’s I and Geary’s c.

Comparing an estimate of the covariation among the Z(s) to an estimate of their variation. 2

Page 9

Let $Z(s_i)$, $i = 1, 2, 3, \ldots, n$, denote the attribute $Z$ observed at site $s_i$, and let $U_i = Z(s_i) - \bar{Z}$ be its centered version.

$w_{ij}$ denotes the neighborhood connectivity weight between sites $s_i$ and $s_j$, with $w_{ii} = 0$.
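The formulas for the two coefficients did not survive in the transcript. For reference, with the notation just defined, Moran's I and Geary's c are commonly written as below; that the slide used exactly this scaling is an assumption.

```latex
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}\;
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\, U_i U_j}{\sum_{i=1}^{n} U_i^2},
\qquad
c = \frac{n-1}{2\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}\;
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,\bigl(Z(s_i) - Z(s_j)\bigr)^2}{\sum_{i=1}^{n} U_i^2}
```

In both, the numerator estimates covariation between connected sites and the denominator estimates overall variation.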

Page 10

In the absence of spatial autocorrelation, $I$ has expected value $E[I] = -1/(n-1)$.

Values $I > E[I]$ indicate positive autocorrelation; values $I < E[I]$ indicate negative autocorrelation.

To determine whether a deviation of $I$ from its expectation is statistically significant, one relies on the asymptotic distribution of $I$, which is Gaussian with mean $-1/(n-1)$ and variance $\sigma_I^2$.

The hypothesis of no spatial autocorrelation is rejected at the $\alpha \times 100\%$ significance level if $|Z_{obs}| = |I - E[I]| / \sigma_I$ is more extreme than the $z_{\alpha/2}$ cutoff of a standard Gaussian distribution.
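The test can be sketched as follows. This is not the %MoranI macro. It assumes binary rook weights, the standard definition of I (n over the sum of weights, times the weighted cross-product ratio), and the usual Cliff and Ord variance under the Gaussian assumption; the function names are hypothetical.

```python
import math

def rook_weights(rows, cols):
    """Binary rook-contiguity weights as {(i, j): 1.0}, sites indexed row by row."""
    w = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[(i, rr * cols + cc)] = 1.0
    return w

def moran_test_gaussian(z, w):
    """Return (I, E[I], Var[I], Z_obs, two-sided p) under the Gaussian assumption."""
    n = len(z)
    zbar = sum(z) / n
    u = [v - zbar for v in z]
    s0 = sum(w.values())
    num = sum(wij * u[i] * u[j] for (i, j), wij in w.items())
    den = sum(x * x for x in u)
    i_obs = (n / s0) * num / den
    e_i = -1.0 / (n - 1)
    # S1 and S2 are the usual weight summaries entering the variance formula.
    keys = set(w) | {(j, i) for (i, j) in w}
    s1 = 0.5 * sum((w.get((i, j), 0.0) + w.get((j, i), 0.0)) ** 2 for (i, j) in keys)
    row_sum = [0.0] * n
    col_sum = [0.0] * n
    for (i, j), wij in w.items():
        row_sum[i] += wij
        col_sum[j] += wij
    s2 = sum((a + b) ** 2 for a, b in zip(row_sum, col_sum))
    var_i = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_i ** 2
    z_obs = (i_obs - e_i) / math.sqrt(var_i)
    p_value = math.erfc(abs(z_obs) / math.sqrt(2))  # two-sided Gaussian tail area
    return i_obs, e_i, var_i, z_obs, p_value

# Illustration: a 10 x 10 lattice whose value equals its row index (pure gradient).
gradient = [float(r) for r in range(10) for _ in range(10)]
i_obs, e_i, var_i, z_obs, p_value = moran_test_gaussian(gradient, rook_weights(10, 10))
print(i_obs, e_i, z_obs, p_value)
```

A checkerboard of alternating +1 and -1 values gives I = -1 with these weights, the extreme case of negative autocorrelation.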

Page 11

1. Assume Z(si) are Gaussian

Under the null hypothesis, the $Z(s_i)$ are assumed $G(\mu, \sigma^2)$, so that $U_i \sim G(0, \sigma^2(1 - 1/n))$.

2. Randomization Framework

The $Z(s_i)$ are considered fixed and are randomly permuted among the $n$ lattice sites.

There are $n!$ equally likely random permutations, and $\sigma_I^2$ is the variance of the $n!$ Moran I values. 3
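Enumerating all n! permutations is infeasible for any realistic lattice, so the randomization distribution is usually approximated by sampling permutations. The sketch below does that; it is a Monte Carlo stand-in for the exact randomization variance, and the function names, the 199 permutations, and the seed are assumptions.

```python
import random

def rook_pairs(rows, cols):
    """Ordered (i, j) pairs of edge-sharing sites, sites indexed row by row."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs.append((r * cols + c, rr * cols + cc))
    return pairs

def moran_i(z, pairs):
    """Moran's I for binary weights given as ordered neighbor pairs."""
    n = len(z)
    zbar = sum(z) / n
    u = [v - zbar for v in z]
    num = sum(u[i] * u[j] for i, j in pairs)
    den = sum(x * x for x in u)
    return (n / len(pairs)) * num / den

def permutation_test(z, pairs, n_perm=199, seed=0):
    """Approximate the randomization distribution of I by random permutations."""
    rng = random.Random(seed)
    observed = moran_i(z, pairs)
    perm = list(z)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # values stay fixed, only their sites change
        sims.append(moran_i(perm, pairs))
    mean = sum(sims) / n_perm
    var = sum((s - mean) ** 2 for s in sims) / (n_perm - 1)
    extreme = sum(1 for s in sims if abs(s - mean) >= abs(observed - mean))
    p_value = (extreme + 1) / (n_perm + 1)  # two-sided pseudo p-value
    return observed, mean, var, p_value

# Illustration on a 6 x 6 lattice whose value equals its row index.
z = [float(r) for r in range(6) for _ in range(6)]
observed, perm_mean, perm_var, p_value = permutation_test(z, rook_pairs(6, 6))
print(observed, perm_mean, perm_var, p_value)
```

The mean of the permuted I values should sit near -1/(n-1), and their variance estimates the randomization variance of I.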

Best Alternative to Randomization.

Page 12

The SAS® macro %MoranI calculates the Zobs statistics and p-values under the Gaussian and randomization assumptions.

A data set containing the W matrix (W = [wij]) is passed to the macro through the w_data option.

For rectangular lattices, the macro %ContWght (in file \SASMacros\ContiguityWeights.sas) calculates the W matrices for classical neighborhood definitions.

Page 13

%include 'DriveLetterofCDROM:\Data\SAS\MercerWheatYieldData.sas';
%include 'DriveLetterofCDROM:\SASMacros\ContiguityWeights.sas';
%include 'DriveLetterofCDROM:\SASMacros\MoranI.sas';

title1 "Moran's I for Mercer and Hall Wheat Yield, Rook's Move";
%ContWght(rows=20, cols=25, move=rook, out=rook);
%MoranI(data=mercer, y=grain, row=row, col=col, w_data=rook);

4

Page 14

1. Sensitive to large-scale trends in the data

2. Very sensitive to the choice of the neighborhood matrix W

If the rook definition (edges abut) is replaced by the bishop’s move (touching corners), the autocorrelation remains significant but the value of the test statistic is reduced by about 50%.

title1 "Moran's I for Mercer and Hall Wheat Grain Data, Bishop's Move";
%ContWght(rows=20, cols=25, move=bishop, out=bishop);
%MoranI(data=mercer, y=grain, row=row, col=col, w_data=bishop);
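How the two neighborhood definitions differ is easy to see by constructing them. The sketch below builds binary contiguity pairs for a rectangular lattice; it is not the %ContWght macro, whose internals are not shown in the slides. The function name and the extra "queen" option (rook plus bishop) are assumptions.

```python
def contiguity_pairs(rows, cols, move="rook"):
    """Ordered (i, j) pairs of connected sites on a rows x cols lattice.

    rook   : cells that share an edge
    bishop : cells that touch only at a corner
    queen  : rook and bishop combined (assumed extra option)
    """
    steps = {
        "rook": ((-1, 0), (1, 0), (0, -1), (0, 1)),
        "bishop": ((-1, -1), (-1, 1), (1, -1), (1, 1)),
    }
    steps["queen"] = steps["rook"] + steps["bishop"]
    if move not in steps:
        raise ValueError("move must be 'rook', 'bishop' or 'queen'")
    pairs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in steps[move]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs.append((r * cols + c, rr * cols + cc))
    return pairs

# Number of nonzero weights on a 20 x 25 lattice under each definition.
for move in ("rook", "bishop", "queen"):
    print(move, len(contiguity_pairs(20, 25, move)))
```

The rook and bishop sets are disjoint, so the two definitions connect each site to entirely different neighbors, which is one reason the test statistic can change so much between them.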

5

Page 15

Linear model: $Z = 1.4 + 0.1x + 0.2y + 0.002x^2 + e$, $e \sim \text{iid } G(0, 1)$, where $x$ and $y$ are the lattice coordinates.

data simulate;
  do x = 1 to 10;
    do y = 1 to 10;
      z = 1.4 + 0.1*x + 0.2*y + 0.002*x*x + rannor(2334);
      output;
    end;
  end;
run;

title1 "Moran's I for independent data with large-scale trend";
%ContWght(rows=10, cols=10, move=rook, out=rook);
%MoranI(data=simulate, y=z, row=x, col=y, w_data=rook);

Test indicates strong positive “autocorrelation” which is an artifact of the changes in E[Z] rather than stochastic spatial dependency among the sites.
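The same experiment can be sketched outside SAS. The code below simulates the model above with independent errors and computes Moran's I (binary rook weights, standard definition) separately for the noise, the mean surface, and their sum. Python's generator will not reproduce the rannor(2334) stream, so the numbers will differ from the SAS output; the helper names are assumptions.

```python
import random

def rook_pairs(rows, cols):
    """Ordered (i, j) pairs of edge-sharing sites, sites indexed row by row."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs.append((r * cols + c, rr * cols + cc))
    return pairs

def moran_i(z, pairs):
    """Moran's I for binary weights given as ordered neighbor pairs."""
    n = len(z)
    zbar = sum(z) / n
    u = [v - zbar for v in z]
    num = sum(u[i] * u[j] for i, j in pairs)
    den = sum(x * x for x in u)
    return (n / len(pairs)) * num / den

def trend(x, y):
    """Mean function E[Z] of the simulated model."""
    return 1.4 + 0.1 * x + 0.2 * y + 0.002 * x * x

rng = random.Random(2334)  # seed value reused only as a label (assumption)
coords = [(x, y) for x in range(1, 11) for y in range(1, 11)]
noise = [rng.gauss(0.0, 1.0) for _ in coords]        # independent errors
pure_trend = [trend(x, y) for x, y in coords]        # E[Z] alone
observed = [m + e for m, e in zip(pure_trend, noise)]

pairs = rook_pairs(10, 10)
i_noise = moran_i(noise, pairs)
i_trend = moran_i(pure_trend, pairs)
i_observed = moran_i(observed, pairs)
print("I, noise only:   ", round(i_noise, 3))
print("I, trend only:   ", round(i_trend, 3))
print("I, trend + noise:", round(i_observed, 3))
```

The mean surface by itself produces a very large I even though it contains no stochastic dependence at all, which is the artifact the test picks up.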

Page 16

If trend contamination distorts inferences about the spatial autocorrelation coefficient, then it seems reasonable to remove the trend and calculate the autocorrelation coefficient from the residuals.

The residual vector

Modified I test statistic

The mean and variance differ slightly from those of I: E[I*] now depends on the weights W and the X matrix. (6)
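The expressions for the residual vector and the modified statistic are missing from the transcript. Under ordinary least squares they are usually written as below (the Cliff and Ord form); that the slide used this exact notation is an assumption. Here $k$ is the number of columns of $\mathbf{X}$ and $\mathbf{I}_n$ is the identity matrix, not Moran's I.

```latex
\hat{\mathbf{e}} = \mathbf{Z} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{M}\mathbf{Z},
\qquad
\mathbf{M} = \mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'

I^{*} = \frac{n}{\sum_{i}\sum_{j} w_{ij}}\;
        \frac{\hat{\mathbf{e}}'\mathbf{W}\hat{\mathbf{e}}}{\hat{\mathbf{e}}'\hat{\mathbf{e}}},
\qquad
E[I^{*}] = \frac{n}{(n-k)\sum_{i}\sum_{j} w_{ij}}\;\operatorname{tr}(\mathbf{M}\mathbf{W})
```

As a check, with an intercept-only model ($\mathbf{X} = \mathbf{1}$, $k = 1$) and $w_{ii} = 0$, $\operatorname{tr}(\mathbf{M}\mathbf{W}) = -\sum_i\sum_j w_{ij}/n$, so $E[I^{*}]$ reduces to $-1/(n-1)$, the expectation of the ordinary I.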

Page 17

title1 "Moran's I for Mercer and Hall Wheat Yield Data";
title2 "Calculated for Regression Residuals";
%include 'DriveLetterofCDROM:\SASMacros\MoranResiduals.sas';

data xmat;
  set mercer;
  x1 = col; x2 = col**2; x3 = col**3;
  keep x1 x2 x3;
run;

%RegressI(xmat=xmat, data=mercer, z=grain, weight=rook, local=1);

This code fits a large-scale mean model with cubic column effects and no row effects. Adding higher-order terms for the column effects leaves the results essentially unchanged.

7

Page 18

The value of Zobs is slightly reduced from Output 9.3 (slide 14), indicating that the column trends did add some false autocorrelation. The p-value remains highly significant, so conventional tests for independent data are not appropriate for this analysis.

Page 19

Optional Parameter: local= 8

LISA: Local Indicator of Spatial Association

The interpretation is that if the test statistic is less than its expected value, then the sites connected to site $s_i$ have attribute values dissimilar from $Z(s_i)$:

A high (low) value at $s_i$ is surrounded by low (high) values.

If the test statistic is greater than its expected value, then a high (low) value at $s_i$ is surrounded by high (low) values at connected sites.
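A local indicator can be sketched as follows. The slides do not show which LISA the macro computes with local=1, so this uses Anselin's local Moran statistic, I_i = (U_i / m2) * sum_j w_ij U_j with m2 = sum U_i^2 / n, and binary rook weights, purely as an assumed example; the function names are hypothetical.

```python
def rook_neighbors(rows, cols):
    """Map each site index (row by row) to the list of its edge-sharing neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                (r + dr) * cols + (c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return nbrs

def local_moran(z, nbrs):
    """Local Moran statistics and their expected values under no autocorrelation."""
    n = len(z)
    zbar = sum(z) / n
    u = [v - zbar for v in z]
    m2 = sum(x * x for x in u) / n
    local = [u[i] / m2 * sum(u[j] for j in nbrs[i]) for i in range(n)]
    expected = [-len(nbrs[i]) / (n - 1) for i in range(n)]
    return local, expected

nbrs = rook_neighbors(10, 10)

# Checkerboard: every site is surrounded by dissimilar values.
checker = [(-1.0) ** (r + c) for r in range(10) for c in range(10)]
local_c, expected_c = local_moran(checker, nbrs)

# Gradient: every site is surrounded by similar values.
gradient = [float(r) for r in range(10) for _ in range(10)]
local_g, expected_g = local_moran(gradient, nbrs)

print(min(local_g), max(local_c))
```

With this definition the local values sum to the global I times the sum of the weights, so the global coefficient can be read as an average of the local ones.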

Page 20

The graph shows the detrended Mercer and Hall grain yield data, marking the sites with positive LISAs. Hot spots, where autocorrelation is locally much greater than for the remainder of the lattice, are obvious.