3.2.3 Geary's c
Geary's c is based on paired comparisons of the values of a geo-referenced variable X
in neighbouring regions to measure spatial autocorrelation.
Geary's c with unstandardized weights w*ij:
$$ C = \frac{(n-1)\sum_{i=1}^{n}\sum_{j=1}^{n} w^{*}_{ij}\,(x_i - x_j)^2}{2\,S_0 \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3.12} $$
with
$$ S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w^{*}_{ij} $$
Geary's c with standardized weights wij:
$$ C = \frac{(n-1)\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - x_j)^2}{2\,n \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3.13} $$
Range of Geary's c: [0; 2], Expectation of C under independence: E(C) = 1
(spatial randomness)
Positive spatial autocorrelation: 0 ≤ C < 1
Negative spatial autocorrelation: 1 < C ≤ 2
Example:
We show the calculation of Geary's c in the five-region example with the standardized
weight matrix.
Standardized weight matrix:
$$ W = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 & 0 \\ 1/3 & 1/3 & 0 & 1/3 & 0 \\ 0 & 1/3 & 1/3 & 0 & 1/3 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} $$
Observation vector:
$$ x = (8,\ 6,\ 6,\ 3,\ 2)' $$
Table 1: Weighted squared differences wij(xi-xj)^2

| Region | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | (1/2)(8-6)^2 = 2 | (1/2)(8-6)^2 = 2 | 0 | 0 |
| 2 | (1/3)(6-8)^2 = 4/3 | 0 | (1/3)(6-6)^2 = 0 | (1/3)(6-3)^2 = 3 | 0 |
| 3 | (1/3)(6-8)^2 = 4/3 | (1/3)(6-6)^2 = 0 | 0 | (1/3)(6-3)^2 = 3 | 0 |
| 4 | 0 | (1/3)(3-6)^2 = 3 | (1/3)(3-6)^2 = 3 | 0 | (1/3)(3-2)^2 = 1/3 |
| 5 | 0 | 0 | 0 | 1(2-3)^2 = 1 | 0 |

Sum of weighted squared differences: 20
The sum of squared deviations from the mean has already been calculated with
Moran's I (section 3.2.1):
$$ \sum_{i=1}^{5} (x_i - \bar{x})^2 = 24 $$
Geary's c with standardized weights wij (n=5):
$$ C = \frac{(5-1)\cdot 20}{2\cdot 5\cdot 24} = \frac{80}{240} = 0.3333 $$
0 ≤ C = 0.3333 < 1: positive spatial autocorrelation
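The calculation above can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available; the function name `gearys_c` is chosen here for illustration and does not come from the text. It implements the general form (3.12), which reduces to (3.13) for row-standardized weights because S0 = n in that case.

```python
import numpy as np

def gearys_c(x, w):
    """Geary's c for observations x and an n x n spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    s0 = w.sum()                                # S0: sum of all weights
    diff_sq = (x[:, None] - x[None, :]) ** 2    # (x_i - x_j)^2 for all pairs
    ss = ((x - x.mean()) ** 2).sum()            # sum of squared deviations
    return (n - 1) * (w * diff_sq).sum() / (2.0 * s0 * ss)

# Row-standardized weights and observations of the five-region example
W = np.array([
    [0,   1/2, 1/2, 0,   0],
    [1/3, 0,   1/3, 1/3, 0],
    [1/3, 1/3, 0,   1/3, 0],
    [0,   1/3, 1/3, 0,   1/3],
    [0,   0,   0,   1,   0],
])
x = np.array([8, 6, 6, 3, 2])

c = gearys_c(x, W)
print(round(c, 4))  # 0.3333
```

Because the function divides by S0, the same code should also handle the unstandardized binary weights of (3.12).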
Comparison between Moran's I and Geary's c:
Evaluating spatial autocorrelation with Moran's I and Geary's c leads to similar
but not identical results.
Griffith (1987) notes that simulation experiments suggest that the inverse relationship
between Moran's I and Geary's C is basically linear in nature. Departures
from linearity are ascribed to differences in what each of the two indices measures.
Geary's C deals with paired comparisons and Moran's I with covariations.
The relation between Moran's I and Geary's C can be examined by randomization
experiments.
Figure: Relation between Moran's I and Geary's C for 20000 statistics
generated using rook contiguity
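A randomization experiment of this kind can be sketched in a few lines. The setup below is an assumption made for illustration, not the design behind the figure: a 5 x 5 lattice with binary rook contiguity, 2000 independent standard-normal draws, and a fixed random seed. The helper names (`rook_weights`, `morans_i`, `gearys_c`) are hypothetical.

```python
import numpy as np

def rook_weights(rows, cols):
    """Binary rook-contiguity matrix for a rows x cols lattice."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                    # neighbour below
                w[i, i + cols] = w[i + cols, i] = 1
            if c + 1 < cols:                    # neighbour to the right
                w[i, i + 1] = w[i + 1, i] = 1
    return w

def morans_i(x, w):
    z = x - x.mean()
    return x.size * (w * np.outer(z, z)).sum() / (w.sum() * (z ** 2).sum())

def gearys_c(x, w):
    z = x - x.mean()
    diff_sq = (x[:, None] - x[None, :]) ** 2
    return (x.size - 1) * (w * diff_sq).sum() / (2 * w.sum() * (z ** 2).sum())

rng = np.random.default_rng(0)                  # fixed seed (assumption)
w = rook_weights(5, 5)
draws = rng.normal(size=(2000, 25))             # spatially random data
i_vals = np.array([morans_i(x, w) for x in draws])
c_vals = np.array([gearys_c(x, w) for x in draws])

corr = np.corrcoef(i_vals, c_vals)[0, 1]
print(corr)  # clearly negative: high I goes with low C
```

A scatter plot of `i_vals` against `c_vals` would show the roughly linear inverse relation described by Griffith (1987); the scatter around the line comes from the unequal numbers of neighbours of corner, edge and interior cells.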
3.2.4 Getis-Ord G statistic
Getis and Ord (1992) have suggested a somewhat different approach to measuring
spatial association using a distance-based contiguity matrix. Neighbourhoods are
defined by a critical distance d. All regions within the critical distance d from a
spatial unit i are neighbours of that region.
The Getis-Ord G statistic is designed for assessing overall spatial concentration.
The G statistic can only be applied to geo-referenced variables with positive
values and a natural origin.
G statistic:
$$ G(d) = \frac{\sum_{i=1}^{n}\sum_{j\neq i} w_{ij}(d)\,x_i x_j}{\sum_{i=1}^{n}\sum_{j\neq i} x_i x_j} \tag{3.14} $$
The G statistic measures the ratio of the sum of the products of each xi with the
xj values within a distance d from i to the total sum of all products xi·xj, j≠i. A high
G(d) provides evidence of global spatial clustering of high values ("hot spots"). A low
value of G(d) occurs when low values cluster, but it may also arise in the case
of negative spatial autocorrelation.
high G(d): overall concentration of high attribute values
low G(d): lack of overall concentration of high attribute values
Range: 0 ≤ G ≤ 1
Weights of the binary matrix W(d):
$$ w_{ij}(d) = \begin{cases} 1, & \text{if } d_{ij} \le d \\ 0, & \text{otherwise} \end{cases} $$
To allow a uniform usage of the global and local Getis-Ord statistics
(section 3.3.2), we set the elements of the main diagonal wii equal to 1.
Note that the G statistic is not affected by this definition, because its sums exclude j = i.
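Example:
[Figure: map of the five regions, labelled 1 to 5]
Distances between regions are measured by the distances between their centres. In
our five-region example, we assume the following distances between centres (in km):

| Region | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 5 | 11 | 14 |
| 2 | 6 | 0 | 4 | 5 | 8 |
| 3 | 5 | 4 | 0 | 7 | 10 |
| 4 | 11 | 5 | 7 | 0 | 3 |
| 5 | 14 | 8 | 10 | 3 | 0 |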
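The above table contains the entries of the distance matrix D:
$$ D = \begin{pmatrix} 0 & 6 & 5 & 11 & 14 \\ 6 & 0 & 4 & 5 & 8 \\ 5 & 4 & 0 & 7 & 10 \\ 11 & 5 & 7 & 0 & 3 \\ 14 & 8 & 10 & 3 & 0 \end{pmatrix} $$
We set the critical distance d equal to 7.5 kilometres. The spatial weight matrix
W(d) corresponding to d = 7.5 reads:
$$ W(d=7.5) = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} $$
Because of the particular choice of the critical distance, W(d=7.5) is, apart from
the main diagonal elements, identical to the "ordinary" first-order contiguity
matrix W*.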
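Deriving W(d) from the distance matrix is a single thresholding step. A minimal sketch, assuming NumPy; the function name `distance_band_weights` is illustrative only:

```python
import numpy as np

# Distances between region centres in km (five-region example)
D = np.array([
    [0,  6,  5,  11, 14],
    [6,  0,  4,  5,  8],
    [5,  4,  0,  7,  10],
    [11, 5,  7,  0,  3],
    [14, 8,  10, 3,  0],
], dtype=float)

def distance_band_weights(dist, d):
    """Binary weights: 1 where d_ij <= d, else 0.

    The diagonal becomes 1 automatically because d_ii = 0 <= d.
    """
    return (np.asarray(dist) <= d).astype(int)

W_d = distance_band_weights(D, 7.5)
print(W_d)

# Removing the diagonal recovers the first-order contiguity matrix W*
W_star = W_d - np.eye(5, dtype=int)
```

With a different critical distance the neighbourhood structure changes; for example, d = 5 would leave regions 1 and 2 unconnected, since their centres are 6 km apart.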
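Observation vector x: x' = (8 6 6 3 2)

Calculation of the denominator of (3.14):

| Region | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | - | 8·6 = 48 | 8·6 = 48 | 8·3 = 24 | 8·2 = 16 |
| 2 | 6·8 = 48 | - | 6·6 = 36 | 6·3 = 18 | 6·2 = 12 |
| 3 | 6·8 = 48 | 6·6 = 36 | - | 6·3 = 18 | 6·2 = 12 |
| 4 | 3·8 = 24 | 3·6 = 18 | 3·6 = 18 | - | 3·2 = 6 |
| 5 | 2·8 = 16 | 2·6 = 12 | 2·6 = 12 | 2·3 = 6 | - |

Sum of products xixj, j≠i: 476

Calculation of the numerator of (3.14):

| Region | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | - | 8·6 = 48 | 8·6 = 48 | 0 | 0 |
| 2 | 6·8 = 48 | - | 6·6 = 36 | 6·3 = 18 | 0 |
| 3 | 6·8 = 48 | 6·6 = 36 | - | 6·3 = 18 | 0 |
| 4 | 0 | 3·6 = 18 | 3·6 = 18 | - | 3·2 = 6 |
| 5 | 0 | 0 | 0 | 2·3 = 6 | - |

Sum of weighted products wij(d)xixj, j≠i: 348

G statistic:
$$ G = \frac{348}{476} = 0.7311 $$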
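The two tables amount to masked sums over the matrix of cross-products. The sketch below, assuming NumPy, reproduces the numerator, the denominator and G; the name `global_g` is illustrative.

```python
import numpy as np

def global_g(x, w_d):
    """Return (numerator, denominator, G) of the global G statistic (3.14)."""
    x = np.asarray(x, dtype=float)
    off_diag = ~np.eye(x.size, dtype=bool)      # mask that drops the terms j = i
    cross = np.outer(x, x)                      # all products x_i * x_j
    num = (np.asarray(w_d) * cross)[off_diag].sum()
    den = cross[off_diag].sum()
    return num, den, num / den

x = [8, 6, 6, 3, 2]
W_d = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
])

num, den, g = global_g(x, W_d)
print(num, den, round(g, 4))  # 348.0 476.0 0.7311
```

Masking the diagonal explicitly makes the result independent of whether wii is set to 0 or 1.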
Test for global spatial clustering
Null hypothesis H0: Lack of spatial concentration of attribute values
Alternative hypothesis H1: Spatial concentration of high attribute values
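Expected value of G(d):
$$ E[G(d)] = \frac{W}{n(n-1)} \tag{3.16} $$
with
$$ W = \sum_{i=1}^{n}\sum_{j\neq i} w_{ij}(d) $$
Test statistic:
$$ Z(G) = \frac{G - E(G)}{\sqrt{Var(G)}} \overset{a}{\sim} N(0,1) \tag{3.15} $$
Variance of G(d):
$$ Var(G) = E(G^2) - [E(G)]^2 \tag{3.17} $$
with
$$ E(G^2) = \frac{1}{(m_1^2 - m_2)^2\, n(n-1)(n-2)(n-3)} \left( B_0 m_2^2 + B_1 m_4 + B_2 m_1^2 m_2 + B_3 m_1 m_3 + B_4 m_1^4 \right) $$
$$ m_r = \sum_{i=1}^{n} x_i^r, \quad r = 1, 2, 3, 4 $$
(rth non-centered moment of X multiplied by n)
$$ B_0 = (n^2 - 3n + 3) S_1 - n S_2 + 3 W^2 $$
$$ B_1 = -\left[ (n^2 - n) S_1 - 2 n S_2 + 3 W^2 \right] $$
$$ B_2 = -\left[ 2 n S_1 - (n+3) S_2 + 6 W^2 \right] $$
$$ B_3 = 4(n-1) S_1 - 2(n+1) S_2 + 8 W^2 $$
$$ B_4 = S_1 - S_2 + W^2 $$
$$ S_1 = \frac{1}{2} \sum_{i=1}^{n}\sum_{j\neq i} \left[ w_{ij}(d) + w_{ji}(d) \right]^2 $$
$$ S_2 = \sum_{i=1}^{n} \left( w_{i\cdot} + w_{\cdot i} \right)^2 $$
with
$$ w_{i\cdot} = \sum_{j\neq i} w_{ij}(d) \quad \text{and} \quad w_{\cdot i} = \sum_{j\neq i} w_{ji}(d) $$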
Example:
In order to test for global spatial clustering on the basis of the G statistic, we
have to compute its expected value and variance.
Expected value of G(d):
Calculation of W:

| Region | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | - | 1 | 1 | 0 | 0 |
| 2 | 1 | - | 1 | 1 | 0 |
| 3 | 1 | 1 | - | 1 | 0 |
| 4 | 0 | 1 | 1 | - | 1 |
| 5 | 0 | 0 | 0 | 1 | - |

Sum of wij, j≠i: W = 12
$$ E[G(d)] = \frac{W}{n(n-1)} = \frac{12}{5\,(5-1)} = \frac{12}{20} = 0.6 $$
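$$ S_1 = \frac{1}{2} \sum_{i=1}^{5}\sum_{j\neq i} \left[ w_{ij}(d) + w_{ji}(d) \right]^2 = \frac{1}{2}\cdot 12 \cdot 2^2 = 24 $$
Row sums of W(d) (j ≠ i):
$$ w_{1\cdot} = 2, \quad w_{2\cdot} = 3, \quad w_{3\cdot} = 3, \quad w_{4\cdot} = 3, \quad w_{5\cdot} = 1 $$
Column sums of W(d) (j ≠ i):
$$ w_{\cdot 1} = 2, \quad w_{\cdot 2} = 3, \quad w_{\cdot 3} = 3, \quad w_{\cdot 4} = 3, \quad w_{\cdot 5} = 1 $$
$$ S_2 = \sum_{i=1}^{5} (w_{i\cdot} + w_{\cdot i})^2 = (2+2)^2 + (3+3)^2 + (3+3)^2 + (3+3)^2 + (1+1)^2 = 16 + 36 + 36 + 36 + 4 = 128 $$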
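Variance of G(d):
$$ B_0 = (n^2 - 3n + 3) S_1 - n S_2 + 3 W^2 = (5^2 - 3\cdot 5 + 3)\cdot 24 - 5\cdot 128 + 3\cdot 12^2 = 312 - 640 + 432 = 104 $$
$$ B_1 = -\left[ (n^2 - n) S_1 - 2 n S_2 + 3 W^2 \right] = -\left[ (5^2 - 5)\cdot 24 - 2\cdot 5\cdot 128 + 3\cdot 12^2 \right] = -(480 - 1280 + 432) = 368 $$
$$ B_2 = -\left[ 2 n S_1 - (n+3) S_2 + 6 W^2 \right] = -\left( 2\cdot 5\cdot 24 - (5+3)\cdot 128 + 6\cdot 12^2 \right) = -(240 - 1024 + 864) = -80 $$
$$ B_3 = 4(n-1) S_1 - 2(n+1) S_2 + 8 W^2 = 4(5-1)\cdot 24 - 2(5+1)\cdot 128 + 8\cdot 12^2 = 384 - 1536 + 1152 = 0 $$
$$ B_4 = S_1 - S_2 + W^2 = 24 - 128 + 12^2 = 40 $$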
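Observation vector of the attribute variable X: x' = (8 6 6 3 2)
Moments of X multiplied by n:
$$ m_1 = \sum_{i=1}^{n} x_i = 8 + 6 + 6 + 3 + 2 = 25 $$
$$ m_2 = \sum_{i=1}^{n} x_i^2 = 8^2 + 6^2 + 6^2 + 3^2 + 2^2 = 64 + 36 + 36 + 9 + 4 = 149 $$
$$ m_3 = \sum_{i=1}^{n} x_i^3 = 8^3 + 6^3 + 6^3 + 3^3 + 2^3 = 512 + 216 + 216 + 27 + 8 = 979 $$
$$ m_4 = \sum_{i=1}^{n} x_i^4 = 8^4 + 6^4 + 6^4 + 3^4 + 2^4 = 4096 + 1296 + 1296 + 81 + 16 = 6785 $$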
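$$ E(G^2) = \frac{1}{(m_1^2 - m_2)^2\, n(n-1)(n-2)(n-3)} \left( B_0 m_2^2 + B_1 m_4 + B_2 m_1^2 m_2 + B_3 m_1 m_3 + B_4 m_1^4 \right) $$
$$ = \frac{1}{(25^2 - 149)^2 \cdot 5\,(5-1)(5-2)(5-3)} \left[ 104\cdot 149^2 + 368\cdot 6785 + (-80\cdot 25^2\cdot 149) + 0\cdot 25\cdot 979 + 40\cdot 25^4 \right] $$
$$ = \frac{1}{27189120}\,(2308904 + 2496880 - 7450000 + 0 + 15625000) = \frac{12980784}{27189120} = 0.477426 $$
$$ Var(G) = E(G^2) - [E(G)]^2 = 0.477426 - 0.6^2 = 0.117426 $$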
Test statistic:
$$ z(G) = \frac{G - E(G)}{\sqrt{Var(G)}} = \frac{0.7311 - 0.6}{\sqrt{0.117426}} = \frac{0.1311}{0.342675} = 0.3826 $$
Critical value (α=0.05, one-sided test):
z1-α = 1.6449
Test decision:
z(G) = 0.3826 < z0.95 = 1.6449 => H0 is not rejected
Interpretation:
No global evidence for substantive spatial clustering of high unemployment
regions
Hint:
As the normal approximation requires a large sample size, the test on the
Getis-Ord G statistic has only been performed here for illustrative purposes.
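The full test can be checked with a short script. This is a sketch under the formulas given above for E[G(d)], Var(G) and the coefficients B0 to B4, assuming NumPy; the function name `global_g_test` and the returned dictionary keys are illustrative choices. The one-sided p-value uses the standard normal distribution via `math.erfc`, and, as the hint says, the normal approximation is not reliable for n = 5.

```python
import math
import numpy as np

def global_g_test(x, w_d):
    """Global G(d) with E[G], Var(G), z score and one-sided p-value."""
    x = np.asarray(x, dtype=float)
    w = np.array(w_d, dtype=float)
    np.fill_diagonal(w, 0.0)                    # all sums run over j != i
    n = x.size
    cross = np.outer(x, x)
    np.fill_diagonal(cross, 0.0)
    g = (w * cross).sum() / cross.sum()

    W = w.sum()
    S1 = 0.5 * ((w + w.T) ** 2).sum()
    S2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    m1, m2, m3, m4 = (float((x ** r).sum()) for r in (1, 2, 3, 4))

    B0 = (n**2 - 3*n + 3) * S1 - n * S2 + 3 * W**2
    B1 = -((n**2 - n) * S1 - 2 * n * S2 + 3 * W**2)
    B2 = -(2 * n * S1 - (n + 3) * S2 + 6 * W**2)
    B3 = 4 * (n - 1) * S1 - 2 * (n + 1) * S2 + 8 * W**2
    B4 = S1 - S2 + W**2

    e_g = W / (n * (n - 1))
    e_g2 = (B0 * m2**2 + B1 * m4 + B2 * m1**2 * m2 + B3 * m1 * m3 + B4 * m1**4) \
        / ((m1**2 - m2) ** 2 * n * (n - 1) * (n - 2) * (n - 3))
    var_g = e_g2 - e_g**2
    z = (g - e_g) / math.sqrt(var_g)
    p = 0.5 * math.erfc(z / math.sqrt(2))       # P(Z >= z), one-sided
    return {"G": g, "E": e_g, "Var": var_g, "z": z, "p": p,
            "S1": S1, "S2": S2, "B": (B0, B1, B2, B3, B4)}

W_d = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
res = global_g_test([8, 6, 6, 3, 2], W_d)
print(round(res["E"], 4), round(res["Var"], 6), round(res["z"], 4))
```

The p-value is well above 0.05, in line with the decision not to reject H0.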
3.3 Local indicators of spatial association (LISA)
While global spatial autocorrelation analysis aims at summarizing the strength
of spatial dependencies by a single statistic, local spatial autocorrelation analysis
focuses on heterogeneity of spatial association over space. Instead of a single
global statistic, location-specific statistics are provided.
Local indicators of spatial association (LISA) provide detailed information on
spatial clustering (Anselin, 1995). The LISA for each observation gives an indication
of the extent of substantial spatial clustering of similar values around that observation.
Some LISA have also the property that their sum or average is proportional to the
global counterpart.
LISA aims at identifying local clusters and spatial outliers. Local clusters are
characterized by a concentration of high or low values of an attribute variable X. A spatial
clustering of contiguous high-value regions is called a "hot spot", whereas a
concentration of low-value regions defines a "cold spot". Both cases are associated with
positive local autocorrelation. Spatial outliers are regions with a reversed local
orientation compared to the predominant global one. When positive global spatial
autocorrelation has been established, regions with negative local autocorrelation
coefficients represent spatial outliers.
We deal with three well-known local indicators of spatial association,
- the local Moran statistic (Anselin, 1995),
- the Getis-Ord Gi statistic (Getis and Ord, 1992),
- the Getis-Ord Gi* statistic (Getis and Ord, 1992),
which complement one another with regard to the identification of spatial clusters and
spatial outliers. The local Moran coefficient is suited to identifying spatial
outliers and general, but not specific, clustering formations. For the latter purpose
the Getis-Ord Gi and Gi* statistics have to be applied. They can distinguish between
"hot spots" and "cold spots", both of which are characterized by high positive
spatial autocorrelation.
3.3.1 Local Moran statistic
The Local Moran statistic Ii detects local spatial autocorrelation. The Ii's are
indicators of local instability. They decompose Moran's I into contributions for each
location.
According to this property, Local Moran statistics can be used for two purposes:
- Indicators of local spatial clusters,
- Diagnostics for outliers in global spatial patterns.
Local Moran statistic:
$$ I_i = \frac{(x_i - \bar{x}) \sum_{j=1}^{n} w_{ij}\,(x_j - \bar{x})}{\sum_{j=1}^{n} (x_j - \bar{x})^2 / n} \tag{3.15} $$
Numerator
Determines the sign of Ii:
+, if both the ith region and its neighbouring regions have above-average or
below-average values of the geo-referenced variable X
-, if the ith region has an above-average (below-average) value and the neighbouring
regions have below-average (above-average) values of X
Denominator
Standardization of the cross-product by the variance sx² of the geo-referenced
variable X
Relationship between global and local Moran statistics:
The average of the Ii's coincides with Moran's I:
$$ I = \frac{1}{n} \sum_{i=1}^{n} I_i $$
Expected value (under independence):
$$ E(I_i) = -\frac{w_i}{n-1} = -\frac{1}{n-1} \tag{3.16} $$
with
$$ w_i = \sum_{j=1}^{n} w_{ij} = 1 $$
Note: Random permutation tests on local Moran's I statistics are available in
programs like GeoDa and R (package spdep). Because of the high computational
expense, the testing approach is introduced in the computer exercise. In the following
example local Moran's I statistics are interpreted descriptively.
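To give an idea of what such a permutation test involves, the following sketch implements one common variant, conditional randomization: the value at location i is held fixed while the remaining values are shuffled over the other locations. This is an illustration under stated assumptions (NumPy, 999 permutations, one-sided upper pseudo p-values, a zero diagonal in W), not the implementation used in GeoDa or spdep; the function names are hypothetical.

```python
import numpy as np

def local_moran(x, w):
    """Local Moran statistics I_i for values x and weights w."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return z * (w @ z) / ((z ** 2).sum() / z.size)

def conditional_permutation_pvalues(x, w, n_perm=999, seed=0):
    """Pseudo p-values: share of permuted I_i at least as large as observed."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    s2 = (z ** 2).sum() / n
    observed = local_moran(x, w)
    p = np.empty(n)
    for i in range(n):
        others = np.delete(z, i)                # values that get shuffled
        w_i = np.delete(w[i], i)                # weights of i to the other units
        sims = np.array([z[i] * (w_i @ rng.permutation(others)) / s2
                         for _ in range(n_perm)])
        # small tolerance guards against floating-point ties
        p[i] = (np.sum(sims >= observed[i] - 1e-12) + 1) / (n_perm + 1)
    return observed, p

W = np.array([
    [0,   1/2, 1/2, 0,   0],
    [1/3, 0,   1/3, 1/3, 0],
    [1/3, 1/3, 0,   1/3, 0],
    [0,   1/3, 1/3, 0,   1/3],
    [0,   0,   0,   1,   0],
])
obs, p = conditional_permutation_pvalues([8, 6, 6, 3, 2], W)
print(np.round(obs, 4), np.round(p, 3))
```

With only five regions the permutation distribution is very coarse, so the p-values here illustrate the procedure and carry little inferential weight.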
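Example:
We calculate Local Moran statistics with the standardized weights wij.
Standardized weight matrix:
$$ W = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 & 0 \\ 1/3 & 1/3 & 0 & 1/3 & 0 \\ 0 & 1/3 & 1/3 & 0 & 1/3 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} $$
Observation vector ($\bar{x} = 5$):
$$ x = (8,\ 6,\ 6,\ 3,\ 2)' $$
The sum of squared deviations from the mean has already been calculated with
Moran's I (section 3.2.1):
$$ s_x^2 = \frac{1}{n} \sum_{j=1}^{5} (x_j - \bar{x})^2 = \frac{1}{5}\cdot 24 = 4.8 $$
Expected value:
$$ E(I_i) = -\frac{w_i}{n-1} = -\frac{1}{n-1} = -\frac{1}{5-1} = -0.25 $$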
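● Region 1:
Weighted sum of deviations from the mean:
$$ \sum_{j=1}^{5} w_{1j}\,(x_j - \bar{x}) = \frac{1}{2}(6-5) + \frac{1}{2}(6-5) = 1 $$
Local Moran statistic:
$$ I_1 = \frac{(8-5)\cdot 1}{4.8} = \frac{3}{4.8} = 0.6250 $$
● Region 2:
Weighted sum of deviations from the mean:
$$ \sum_{j=1}^{5} w_{2j}\,(x_j - \bar{x}) = \frac{1}{3}(8-5) + \frac{1}{3}(6-5) + \frac{1}{3}(3-5) = \frac{2}{3} $$
Local Moran statistic:
$$ I_2 = \frac{(6-5)\cdot(2/3)}{4.8} = \frac{2}{14.4} = 0.1389 $$
● Region 3:
Weighted sum of deviations from the mean:
$$ \sum_{j=1}^{5} w_{3j}\,(x_j - \bar{x}) = \frac{1}{3}(8-5) + \frac{1}{3}(6-5) + \frac{1}{3}(3-5) = \frac{2}{3} $$
Local Moran statistic:
$$ I_3 = \frac{(6-5)\cdot(2/3)}{4.8} = \frac{2}{14.4} = 0.1389 $$
● Region 4:
Weighted sum of deviations from the mean:
$$ \sum_{j=1}^{5} w_{4j}\,(x_j - \bar{x}) = \frac{1}{3}(6-5) + \frac{1}{3}(6-5) + \frac{1}{3}(2-5) = -\frac{1}{3} $$
Local Moran statistic:
$$ I_4 = \frac{(3-5)\cdot(-1/3)}{4.8} = \frac{2}{14.4} = 0.1389 $$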
● Region 5:
Weighted sum of deviations from the mean:
$$ \sum_{j=1}^{5} w_{5j}\,(x_j - \bar{x}) = 1\cdot(3-5) = -2 $$
Local Moran statistic:
$$ I_5 = \frac{(2-5)\cdot(-2)}{4.8} = \frac{6}{4.8} = 1.2500 $$
Moran's I = Average of Local Moran Statistics:
$$ I = \frac{1}{5} \sum_{i=1}^{5} I_i = (0.6250 + 0.1389 + 0.1389 + 0.1389 + 1.2500)/5 = 2.2917/5 = 0.4583 $$
[Section 3.2.1: I = 0.4583 (with standardized weights)]
Interpretation:
- Local spatial clustering is identified around region 5 and, to a somewhat
lesser extent, around region 1, as both I5 and I1 noticeably exceed the global
Moran's I value.
- Since all Ii values exceed their expected value of -0.25, no outlying region
with respect to orientation is identified.
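The five local statistics and their average can be reproduced with one matrix-vector product. A minimal sketch, assuming NumPy; `local_moran` is an illustrative name:

```python
import numpy as np

def local_moran(x, w):
    """Local Moran statistics I_i = z_i * sum_j w_ij z_j / s_x^2."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                            # deviations from the mean
    s2 = (z ** 2).sum() / x.size                # variance with divisor n
    return z * (w @ z) / s2

W = np.array([
    [0,   1/2, 1/2, 0,   0],
    [1/3, 0,   1/3, 1/3, 0],
    [1/3, 1/3, 0,   1/3, 0],
    [0,   1/3, 1/3, 0,   1/3],
    [0,   0,   0,   1,   0],
])
x = np.array([8, 6, 6, 3, 2])

I_local = local_moran(x, W)
expected = -W.sum(axis=1) / (x.size - 1)        # E(I_i) = -w_i / (n - 1)

print(np.round(I_local, 4))          # [0.625  0.1389 0.1389 0.1389 1.25  ]
print(round(I_local.mean(), 4))      # 0.4583, the global Moran's I
print(I_local > expected)            # True for every region
```

The last line mirrors the interpretation above: no region falls below its expected value of -0.25, so no spatial outlier is flagged.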
3.3.2 Getis-Ord G statistics
The Getis-Ord Gi and Gi* statistics are local measures of spatial concentration.
They indicate the extent of spatial clustering of high values ("hot spots") or low
values ("cold spots") of an attribute variable X around region i.
As with the global G statistic, contiguity is defined by distance bands.
The Gi and Gi* statistics differ in excluding or including observation i from the
summation. While observation i is excluded in Gi, it is included in the computation of Gi*
(Getis and Ord, 1992).
Gi statistic:
$$ G_i(d) = \frac{\sum_{j\neq i} w_{ij}(d)\,x_j}{\sum_{j\neq i} x_j} \tag{3.16} $$
Gi* statistic:
$$ G_i^{*}(d) = \frac{\sum_{j=1}^{n} w_{ij}(d)\,x_j}{\sum_{j=1}^{n} x_j} \tag{3.17} $$
Expected values of Gi and Gi*:
$$ E(G_i) = \frac{W_i}{n-1} \tag{3.18} $$
with
$$ W_i = \sum_{j\neq i} w_{ij}(d) \tag{3.19} $$
$$ E(G_i^{*}) = \frac{W_i^{*}}{n} \tag{3.20} $$
with
$$ W_i^{*} = \sum_{j=1}^{n} w_{ij}(d) \tag{3.21} $$
Local spatial concentration of high values ("hot spots"):
Values of Gi and Gi* above their expected values
Local spatial concentration of low values ("cold spots"):
Values of Gi and Gi* below their expected values
Note: Getis and Ord (1995) also provide standardized Gi and Gi* statistics that
are asymptotically normally distributed. The normal test is even valid for sample
sizes as low as eight when the underlying distribution is not too skewed. For small
samples, however, the random permutation test is preferable. The testing approaches
are available in GeoDa and R (package spdep). In the following example the
Gi and Gi* statistics are interpreted descriptively.
Example:
We calculate the local Getis-Ord statistics Gi and Gi* for the five regions by using
the spatial weights matrix
$$ W(d=7.5) = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} $$
which is defined in section 3.2.4 (global G statistic) for a distance band of 7.5
kilometres.
As the denominator of (3.17) does not vary across regions, it has to be calculated
only once using the entries of the observation vector x' = (8 6 6 3 2):
Denominator of (3.17):
$$ \sum_{j=1}^{5} x_j = 8 + 6 + 6 + 3 + 2 = 25 $$
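Region 1:
Gi statistic:
$$ G_1 = \frac{\sum_{j\neq 1} w_{1j}(d=7.5)\,x_j}{\sum_{j\neq 1} x_j} = \frac{1\cdot 6 + 1\cdot 6}{6 + 6 + 3 + 2} = \frac{12}{17} = 0.7059 $$
$$ E(G_1) = \frac{\sum_{j\neq 1} w_{1j}(d=7.5)}{n-1} = \frac{1 + 1 + 0 + 0}{5-1} = \frac{2}{4} = 0.5 $$
Gi* statistic:
$$ G_1^{*} = \frac{\sum_{j=1}^{n} w_{1j}(d=7.5)\,x_j}{\sum_{j=1}^{n} x_j} = \frac{1\cdot 8 + 1\cdot 6 + 1\cdot 6}{8 + 6 + 6 + 3 + 2} = \frac{20}{25} = 0.8 $$
$$ E(G_1^{*}) = \frac{\sum_{j=1}^{n} w_{1j}(d=7.5)}{n} = \frac{1 + 1 + 1 + 0 + 0}{5} = \frac{3}{5} = 0.6 $$
G1 > E(G1): Tendency of spatial concentration of high values around region 1
(hot spot)
G1* > E(G1*): Tendency of spatial concentration of high values (hot spot):
region 1 and surrounding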
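Region 2:
Gi statistic:
$$ G_2 = \frac{\sum_{j\neq 2} w_{2j}(d=7.5)\,x_j}{\sum_{j\neq 2} x_j} = \frac{1\cdot 8 + 1\cdot 6 + 1\cdot 3}{8 + 6 + 3 + 2} = \frac{17}{19} = 0.8947 $$
$$ E(G_2) = \frac{\sum_{j\neq 2} w_{2j}(d=7.5)}{n-1} = \frac{1 + 1 + 1 + 0}{5-1} = \frac{3}{4} = 0.75 $$
Gi* statistic:
$$ G_2^{*} = \frac{\sum_{j=1}^{n} w_{2j}(d=7.5)\,x_j}{\sum_{j=1}^{n} x_j} = \frac{1\cdot 8 + 1\cdot 6 + 1\cdot 6 + 1\cdot 3}{8 + 6 + 6 + 3 + 2} = \frac{23}{25} = 0.92 $$
$$ E(G_2^{*}) = \frac{\sum_{j=1}^{n} w_{2j}(d=7.5)}{n} = \frac{1 + 1 + 1 + 1 + 0}{5} = \frac{4}{5} = 0.8 $$
G2 > E(G2): Tendency of spatial concentration of high values around region 2
(hot spot)
G2* > E(G2*): Tendency of spatial concentration of high values (hot spot):
region 2 and surrounding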
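Region 3:
Gi statistic:
$$ G_3 = \frac{\sum_{j\neq 3} w_{3j}(d=7.5)\,x_j}{\sum_{j\neq 3} x_j} = \frac{1\cdot 8 + 1\cdot 6 + 1\cdot 3}{8 + 6 + 3 + 2} = \frac{17}{19} = 0.8947 $$
$$ E(G_3) = \frac{\sum_{j\neq 3} w_{3j}(d=7.5)}{n-1} = \frac{1 + 1 + 1 + 0}{5-1} = \frac{3}{4} = 0.75 $$
Gi* statistic:
$$ G_3^{*} = \frac{\sum_{j=1}^{n} w_{3j}(d=7.5)\,x_j}{\sum_{j=1}^{n} x_j} = \frac{1\cdot 8 + 1\cdot 6 + 1\cdot 6 + 1\cdot 3}{8 + 6 + 6 + 3 + 2} = \frac{23}{25} = 0.92 $$
$$ E(G_3^{*}) = \frac{\sum_{j=1}^{n} w_{3j}(d=7.5)}{n} = \frac{1 + 1 + 1 + 1 + 0}{5} = \frac{4}{5} = 0.8 $$
G3 > E(G3): Tendency of spatial concentration of high values around region 3
(hot spot)
G3* > E(G3*): Tendency of spatial concentration of high values (hot spot):
region 3 and surrounding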
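Region 4:
Gi statistic:
$$ G_4 = \frac{\sum_{j\neq 4} w_{4j}(d=7.5)\,x_j}{\sum_{j\neq 4} x_j} = \frac{1\cdot 6 + 1\cdot 6 + 1\cdot 2}{8 + 6 + 6 + 2} = \frac{14}{22} = 0.6364 $$
$$ E(G_4) = \frac{\sum_{j\neq 4} w_{4j}(d=7.5)}{n-1} = \frac{0 + 1 + 1 + 1}{5-1} = \frac{3}{4} = 0.75 $$
Gi* statistic:
$$ G_4^{*} = \frac{\sum_{j=1}^{n} w_{4j}(d=7.5)\,x_j}{\sum_{j=1}^{n} x_j} = \frac{1\cdot 6 + 1\cdot 6 + 1\cdot 3 + 1\cdot 2}{8 + 6 + 6 + 3 + 2} = \frac{17}{25} = 0.68 $$
$$ E(G_4^{*}) = \frac{\sum_{j=1}^{n} w_{4j}(d=7.5)}{n} = \frac{0 + 1 + 1 + 1 + 1}{5} = \frac{4}{5} = 0.8 $$
G4 < E(G4): Tendency of spatial concentration of low values around region 4
(cold spot)
G4* < E(G4*): Tendency of spatial concentration of low values (cold spot):
region 4 and surrounding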
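Region 5:
Gi statistic:
$$ G_5 = \frac{\sum_{j\neq 5} w_{5j}(d=7.5)\,x_j}{\sum_{j\neq 5} x_j} = \frac{1\cdot 3}{8 + 6 + 6 + 3} = \frac{3}{23} = 0.1304 $$
$$ E(G_5) = \frac{\sum_{j\neq 5} w_{5j}(d=7.5)}{n-1} = \frac{0 + 0 + 0 + 1}{5-1} = \frac{1}{4} = 0.25 $$
Gi* statistic:
$$ G_5^{*} = \frac{\sum_{j=1}^{n} w_{5j}(d=7.5)\,x_j}{\sum_{j=1}^{n} x_j} = \frac{1\cdot 3 + 1\cdot 2}{8 + 6 + 6 + 3 + 2} = \frac{5}{25} = 0.2 $$
$$ E(G_5^{*}) = \frac{\sum_{j=1}^{n} w_{5j}(d=7.5)}{n} = \frac{0 + 0 + 0 + 1 + 1}{5} = \frac{2}{5} = 0.4 $$
G5 < E(G5): Tendency of spatial concentration of low values around region 5
(cold spot)
G5* < E(G5*): Tendency of spatial concentration of low values (cold spot):
region 5 and surrounding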
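The region-by-region calculations can be condensed into a few vectorized lines. The sketch below assumes NumPy and a weight matrix whose diagonal is set to 1, as defined in section 3.2.4; the function name `local_g` and the hot/cold labelling are illustrative and purely descriptive (no significance test is involved).

```python
import numpy as np

def local_g(x, w_d):
    """Return G_i, E(G_i), G_i* and E(G_i*) for all regions."""
    x = np.asarray(x, dtype=float)
    w_star = np.asarray(w_d, dtype=float)       # diagonal = 1: includes region i
    w = w_star.copy()
    np.fill_diagonal(w, 0.0)                    # diagonal = 0: excludes region i
    n = x.size
    g = (w @ x) / (x.sum() - x)                 # denominator omits x_i
    e_g = w.sum(axis=1) / (n - 1)
    g_star = (w_star @ x) / x.sum()
    e_g_star = w_star.sum(axis=1) / n
    return g, e_g, g_star, e_g_star

W_d = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
x = [8, 6, 6, 3, 2]

g, e_g, g_star, e_g_star = local_g(x, W_d)
labels = np.where(g_star > e_g_star, "hot", "cold")

print(np.round(g, 4))        # [0.7059 0.8947 0.8947 0.6364 0.1304]
print(np.round(g_star, 2))   # [0.8  0.92 0.92 0.68 0.2 ]
print(labels)                # ['hot' 'hot' 'hot' 'cold' 'cold']
```

In this example Gi and Gi* lead to the same descriptive classification: regions 1 to 3 tend towards a hot spot, regions 4 and 5 towards a cold spot.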