advanced spatial analysis spatial regression modeling

92
Advanced Spatial Advanced Spatial Analysis Analysis Spatial Regression Spatial Regression Modeling Modeling GISPopSci Day 2 Paul R. Voss Paul R. Voss and and Katherine J. Curtis Katherine J. Curtis

Upload: goldy

Post on 11-Jan-2016

83 views

Category:

Documents


1 download

DESCRIPTION

Advanced Spatial Analysis Spatial Regression Modeling. Paul R. Voss and Katherine J. Curtis. Day 2. GISPopSci. Review of yesterday. Overview of spatial data and spatial data analysis Why “spatial is special” characteristics of spatial data scale dependence edge effects - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Advanced Spatial Analysis Spatial Regression Modeling

Advanced Spatial AnalysisAdvanced Spatial Analysis

Spatial Regression ModelingSpatial Regression Modeling

GISPopSci

Day 2

Paul R. VossPaul R. Vossandand

Katherine J. CurtisKatherine J. Curtis

Page 2: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Review of yesterday• Overview of spatial data and spatial data analysis• Why “spatial is special”

– characteristics of spatial data– scale dependence– edge effects– heterogeneity– autocorrelation

– problems caused by spatial data– iid assumptions of standard linear model violated– need for tools to deal with non-iid error variance-covariance

among other problems (boundary problems)

• Classes of problems in spatial data analysis• Review OLS assumptions & violations• Importance of EDA/ESDA

Page 3: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Outline for today• Spatial processes

– spatial heterogeneity– spatial dependence

• Global spatial autocorrelation & weights matrices– Moran’s I– Geary’s c

• Understanding & measuring local spatial association– Moran scatterplot– LISA statistics

• Lab: spatial autocorrelation in GeoDa & R

Page 4: Advanced Spatial Analysis Spatial Regression Modeling

Questions?

GISPopSci

Page 5: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Recall from yesterday…

“What makes the methods of modern [spatial data analysis] different from many of their predecessors is that they have been developed with the recognition that spatial data have unique properties and that these properties make the use of methods borrowed from aspatial disciplines highly questionable”

Fotheringham, Brunsdon & Charlton

Quantitative GeographySage, 2000:xii

Page 6: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

And what are these “unique properties”?

• Spatial heterogeneity

• Spatial dependence

• Spatial inhomogeneity

• Contagion

Lattice data

Event data

“Spatial Effects” or “Spatial Processes”

Page 7: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

“Spatial Effects”(“Spatial Processes”)

• Spatial effects are properties of spatial data resulting in the tendency for spatially proximate observations of an attribute Z(s) in R to be more alike than more distant observations (Tobler’s 1st Law)

• Clustering in space can result from attributes shared by nearby areas in the study region that make them different from other areas in the region (identifiable or not) or from some type of spatially patterned interaction among neighboring units or both

• Think about it as:– reactive processes (which we will call “spatial heterogeneity”)

• We will try to model this process with covariates, but generally we will fail

– interactive processes (which we will call “spatial dependence”)

Page 8: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Heterogeneity

• Typified by regional differentiation; a large scale spatial process expressing itself across the entire region under study; arises from regional differences in the DGP

• Reflects the “spatial continuities” of social processes which, “taken together help bind social space into recognizable structures” – a “mosaic of homogeneous (or nearly homogeneous)” areas in which each is different from its neighbors (Haining, 1990:22)

• We study using 1st-order analytical tools & perspectives (primarily linear or non-linearregression)

… exists when the mean, and/or variance-covariance structure of the DGP “drifts”

over a mapped process

Page 9: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Heterogeneity (2)

• No spatial interaction is assumed in the process generating spatial heterogeneity. Follows from the intrinsic uniqueness of each location

• A troublesome property, because an assumption of spatial homogeneity (spatial stationarity) is assumed to provide the necessary replication for drawing inferences from the process

• Moreover, spatial stationarity is an assumption underlying spatial dependence testing & modeling

• Thus, we’re going to work pretty hard to model the 1st-order spatial effects

• The definition also includes drift in covariance structure, and this is the crucial aspect of spatial heterogeneity for many analysts. We take this issue up on Thursday

Page 10: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

If we assume spatial heterogeneity is a reasonable DGP to entertain…

• We are saying that apparent clustering in the data is not a result of spatial interaction among areas; no small-scale neighbor influences; no contagion; no 2nd-order spatial effects

• For areal data, we are presuming that we can specify a regression model with suitable covariates such that residual autocorrelation evaporates

• We allow that purely spatial structural effects may have to be part of our model specification (trend surface)

Page 11: Advanced Spatial Analysis Spatial Regression Modeling

Questions about the notion of spatial heterogeneity?

GISPopSci

Page 12: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Dependence

• This sounds a lot like spatial autocorrelation (not yet formally defined)… but we will not use the terms interchangeably [not all authors are this cautious]

• It means a lack of independence among observations (by definition); but “functional relationship” is the key

• Expresses itself as a small-scale, localized, short-distance spatial process; 2nd-order spatial process

… the existence of a functional relationship between what happens at one point in space

and what happens elsewhere

Page 13: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Dependence (2)

• For the examination of data on an irregular spatial lattice (e.g., counties), this spatial process generally is handled through the exogenous declaration of a “neighborhood” defined for each observation (and is operationalized by a “weights matrix”)

• Returning to the formal expression for our data (defined yesterday), z(s) = f (X,s,β) + ε(s) , we assume that the 1st-order part of the model leaves behind a disturbance vector ε(s) with a spatial dependence process that is stationary (and, usually, isotropic – although environmental scientists will generally disagree with this last bit)

Page 14: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Dependence (3)

• This process follows informally from the so-called “First Law of Geography”

• Follows formally from a spatial property known as “ergodicity”, where we permit spatial interaction to occur only over a very limited region

• We assume the process is ergodic in order to limit the number of parameters we estimate

• Recall from yesterday:

number of parameters = ((n x n) – n)/2 covariances + n variances

+ k+1 parameters

z(s) = f (X,s,β) + u(s)where u(s) is a random vector with mean 0 and variance Var[u(s)] = Σ(θ)

Page 15: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

If we assume spatial dependence is a reasonable DGP…

• We are saying that clustering in our data results from some type of spatial interaction; existence of small-scale neighbor influences

• We believe we can theoretically posit reasons why clustering results from spatial interaction (“contagion”) among our units of observation

• We need to be thinking about what kind of dependence model should best introduce neighboring effects

• Time to dig into the 2nd-order spatial analysis toolkit

Page 16: Advanced Spatial Analysis Spatial Regression Modeling

Questions about the notion of spatial dependence?

GISPopSci

Page 17: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

So, which is it… spatial dependence orspatial heterogeneity? (Frankly, it’s not even a

proper question! Why?) You almost never know

Notions of spatial heterogeneity and spatial dependence are not really attributes of the data; they’re concepts we

invent to make sense of the data. Indeed, our model may incorporate both processes

Sqrt(PPOV)

Better questions might be: “What do we have?” “What would we like to know?” “What questions

can these data answer?” “What spatial tools do we need to turn to?” “How should we think about

modeling these data given what we want to learn?”

Page 18: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Recall from yesterday…

Page 19: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Page 20: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Page 21: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Page 22: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Which is which? How to proceed?

“reactive” process “interactive” processYou almost never know

Page 23: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

So… how should we proceed?

A useful admonition to keep in mind as we get started:

Page 24: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

“All models are wrong, some models are useful”

G.E.P.Box“Robustness in the Strategy of Scientific Model Building”

pp. 201-236 in Lanner and Wilkerson (eds.)Robustness in Statistics

Academic Press, 1979

Page 25: Advanced Spatial Analysis Spatial Regression Modeling

Questions?

GISPopSci

Page 26: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Global Spatial Autocorrelation

Page 27: Advanced Spatial Analysis Spatial Regression Modeling

• When correlated errors arise from a specification with omitted variables, OLS estimates of t-test values are unreliable

– The OLS estimates are not efficient

– Under positive spatial autocorrelation, the std. errors of the parameter estimates are biased downward

– Informally, you can think of this as arising because the OLS model “thinks” it’s getting more information from the observations than it is

– Correlated errors inflate the value of the R2 statistic

• When correlated errors result from endogeneity, OLS regression parameter estimates are biased and inconsistent

GISPopSci

Recall from yesterday…

Page 28: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

So, where do we go from here?

• Specifically, how do we develop a means to (statistically) differentiate among different kinds of maps?

• That is, can we quantify different kinds of map patterns?

• And once we develop a statistic for describing (quantifying) different kinds of map patterns, can we derive the sampling distribution for this statistic and thus make inferential claims about one map vs. another map?

Page 29: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Suppose we observe the following map of southern counties from the 2000 Census

(% African-American)

Page 30: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

The question for us then is this:If African-Americans had somehow been allocated

in a random fashion to the southern counties, would this observed spatial distribution be a likely outcome of such an allocation procedure?

But what does this question mean? Does it even make sense?

% African American

Page 31: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

What does it mean to ask about the spatial distribution of a census variable as if the observations are an outcome from

some type of sampling experiment?• The data are… well, the data; right?• We’ve got all the counties, not just a sample of them• The data for each county (% African-American) are based

on complete count census data, not on a sample.• So what can it possibly mean to ask whether this percent

has been allocated in a “random fashion” (or not)?• We’re looking at the complete “universe of observations.”

It’s not a sample. Or is it??• Sometimes asked: “Is it a sample of 1,387 (counties)?” Or

is it, rather, a sample of 1 (single realization of a stochastic process)?

Page 32: Advanced Spatial Analysis Spatial Regression Modeling

Answers to these questions point toward the concept of a data generating process (DGP)

This is the conceptual notion that our data actually represent just one realization of a very large number of possible outcomes

GISPopSci

Page 33: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

There are a number of formal perspectives on this topic

and a terrific quote…

Page 34: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

“[Our data often render] the idea that one is working with a (spatial) sample somewhat remote. Great imagination has gone into turning what appears to be a population into a sample, thereby making statistical theory relevant…”

Graham J. G. Upton & Bernard FingletonSpatial Data Analysis by Example, Vol. I(Wiley & Sons, 1985:325)

Page 35: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Sampling Perspectives

Generally there are four spatial sampling perspectives discussed in the literature

based on the sampling design:

with replacement? Yes No

order Important? Yes No

Page 36: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Two common of these sampling perspectives

Sampling without replacement, order is important (“nonfree sampling” or “randomization”): Perspective 2

Sampling with replacement, order is important (“free sampling” or “normalzation”): Perspective 1

Page 37: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Example using the southern counties

The question becomes: How unusual is the pattern (the Moran statistic) in map 1 given the 1,387! possible permutations of

these results under an assumption of nonfree sampling?

This was our “observation”

But here’s just one other under nonfree sampling

Page 38: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

If you subscribe to the randomization approach…

To operationalize this, all (or many of) the different arrangements that are possible need to be identified in order to construct the sampling distribution for our statistic

(When the sampling distribution is simply too large to construct, we can estimate it.

Such an approximate sampling distribution is sometimes called a

“reference distribution”)

Page 39: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Thus…If we create 1,387! maps (or a large sample from

this huge number) and derive our spatial autocorrelation statistic for each of these maps, we then have a reference distribution against

which to weigh the one we actually observed. It might look something like this:

Page 40: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

This will become important when we have a spatial autocorrelation measure and wish to know if it is large enough to reject the null hypothesis of no spatial autocorrelation

GeoDa uses an empirical permutation method to derive the variance of the Moran spatial

autocorrelation statistic

R [spdep package] gives us a choice for generating the variance of a Moran statistic

estimate under either free or nonfree sampling

Why should I care about this?

Page 41: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

By the way…

Australian, Pat Moran, published a version of what

was to become known as the Moran test for spatial

clustering in 1948

Andrew Cliff and J. Keith Ord generalized the Moran statistic to test for spatial autocorrelation among residuals from a linear regression model (under iid normal assumptions) and worked out both the large sample distribution and small sample moments in the 1970s

Page 42: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Spatial Autocorrelation

(Positive) spatial autocorrelation is the coexistence of attribute value similarity and locational similarity

It’s the common, every day, confirmation of Tobler’s first law

Formally expressed as a moment condition:

jiforyEyEyyEyyCov jijiji 0,

Page 43: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Measuring Spatial Autocorrelation

• Global spatial autocorrelation measures– do the data as a whole exhibit a spatial pattern, or are

observations spatially random?– most common measure: Moran’s statistic

• Local indicators of spatial association (LISA) statistics– identifies which units are significantly spatially

autocorrelated with neighboring units– Identifies clustering (“hot spots,” “cold spots”)– localized Moran statistic

Two classes of tests for spatial autocorrelation:

Page 44: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Global Moran’s I

In

w

w y y y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

( )

( )( )

( )11

11

2

1

covariance term

normalization term to scale I to the overall variance in the dataset

Page 45: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Moran’s I coefficient as ameasure of spatial autocorrelation

n

i

n

iii

n

iii

xy

yyxx

yyxxr

1

2

1

2

1

)()(

))((

Pearson product-moment correlation

n

jj

n

ii

n

i

n

jjiij

n

i

n

jij

x

xxxx

xxxxw

w

nI

1

2

1

2

1 1

1 1)()(

))((

Moran’s I coefficient

feasible range: -1 to +1feasible range: -1 to +1

(sort of)

Page 46: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

But… the calculation of the global Moran’s I (or similar measures)requires

the definition of a weights matrix

In

w

w y y y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

( )

( )( )

( )11

11

2

1

Page 47: Advanced Spatial Analysis Spatial Regression Modeling

And, as an aside, if you are interested in the intellectual

history & background of where today’s measures of

autocorrelation originate, see special issue of…

GISPopSci

Geographical Analysis (October, 2009) Vol. 41, Issue 4

Page 48: Advanced Spatial Analysis Spatial Regression Modeling

We need to know something about weights matrices before

we can proceed

GISPopSci

Page 49: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Okay… again, the derivation of the global Moran’s I statistic (and similar statistics) requires the

specification of a weights matrix

In

w

w y y y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

( )

( )( )

( )11

11

2

1

Let’s figure out what these wij elements are…

Page 50: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

So, consider this “map”

Page 51: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

Let’s say we’re interested in area i = 6

Which areas are “neighbors” of area 6?

Page 52: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Queens and Rooks(and—occasionally—Bishops)

These terms are self explanatory, referring to which types of adjacent cells

we choose to include as “neighbors”

Queen Rook Bishop

Page 53: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

Under a (1st order) “queen” criterion

Let’s shift our thinking from the map to a matrix

Page 54: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

…21 16

2

1

16

.

.

.

Obs. i

Neighbors j

Page 55: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

…21 16

2

1

16

.

.

.

6

.

.

.Obs. i

Neighbors j6

Page 56: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Now let’s be a little more precise (the literature is not in agreement

on these matters)

• “Contiguity matrix” – a general term that identifies neighbors with 1 and non-neighbors with 0

• “Weights matrix” – We will almost always reserve this term to refer to a row-standardized contiguity matrix, with weights:

0 wij 1

• There a many varieties of weights matrices

Page 57: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Σ

1 1 1 1 32 1 1 1 1 1 53 1 1 1 1 1 5

4 1 1 1 35 1 1 1 1 1 56 1 1 1 1 1 1 1 1 87 1 1 1 1 1 1 1 1 88 1 1 1 1 1 59 1 1 1 1 1 510 1 1 1 1 1 1 1 1 811 1 1 1 1 1 1 1 1 812 1 1 1 1 1 513 1 1 1 314 1 1 1 1 1 515 1 1 1 1 1 516 1 1 1 3

i

j

Simple

contiguity

matrix

{cij}

Queen1

Criterion

zeros implicit

Page 58: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Σ

1 1/3 1/3 1/3 1

2 1/5 1/5 1/5 1/5 1/5 1

3 1/5 1/5 1/5 1/5 1/5 1

4 1/3 1/3 1/3 1

5 1/5 1/5 1/5 1/5 1/5 1

6 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

7 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

8 1/5 1/5 1/5 1/5 1/5 1

9 1/5 1/5 1/5 1/5 1/5 1

10 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

11 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

12 1/5 1/5 1/5 1/5 1/5 1

13 1/3 1/3 1/3 1

14 1/5 1/5 1/5 1/5 1/5 1

15 1/5 1/5 1/5 1/5 1/5 1

16 1/3 1/3 1/3 1

i

j

wc

cij

ij

ijj

Common row

standardized

weights

matrix

zeros implicit

Page 59: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

So, the elements of the weights matrix serve somewhat the role of an indicator

variable in this equation. Nearby observations have non-zero weights; distant observations have zero weight

In

w

w y y y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

( )

( )( )

( )11

11

2

1

Page 60: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1

72

63

44

55

46

57

48

49

510

611

312

413

314

415

116

2

Now consider these yi values, i = 1,…,16

Armed with this “map”, let’s now

define what we mean by

a “spatial lag”

Page 61: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

For i = 6, the spatial lag operator w6j yj is given by:

w y6 j j

w yj jj

j

61

1 6

1

87

1

86

1

84

1

84

1

84

1

85

1

86

1

83

= 4.9 (zeros not shown)

In words, how would we describe this value of 4.9?

Page 62: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1

72

63

44

55

46

57

48

49

510

611

312

413

314

415

116

2

Can you define the spatial lag for map area 16?

How many neighbors

under a Q1 definition?

What are the weights here?

Recall the weights matrix.

Page 63: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Σ

1 1/3 1/3 1/3 1

2 1/5 1/5 1/5 1/5 1/5 1

3 1/5 1/5 1/5 1/5 1/5 1

4 1/3 1/3 1/3 1

5 1/5 1/5 1/5 1/5 1/5 1

6 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

7 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

8 1/5 1/5 1/5 1/5 1/5 1

9 1/5 1/5 1/5 1/5 1/5 1

10 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

11 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1

12 1/5 1/5 1/5 1/5 1/5 1

13 1/3 1/3 1/3 1

14 1/5 1/5 1/5 1/5 1/5 1

15 1/5 1/5 1/5 1/5 1/5 1

16 1/3 1/3 1/3 1

i

j

wc

cij

ij

ijj

Common row

standardized

weights

matrix

zeros implicit

Page 64: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

1

72

63

44

55

46

57

48

49

510

611

312

413

314

415

116

2

Can you define the spatial lag for map area 16?

How many neighbors

under a Q1 definition?

What are the weights here?

Recall the weights matrix.

What’s the spatial lag

for area 16?

Page 65: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

So, for i = 16, the spatial lag operator w16j yj is given by:

(again, zeros not shown)

And how do we describe the value of 2.7?

16

11616

j

jjjj yww

7.213

14

3

13

3

1

Page 66: Advanced Spatial Analysis Spatial Regression Modeling

The previous several slides have generated the wij elements based

on adjacency. There are lots of other options; e.g.,…

• Distance and inverse distance• k nearest neighbors (knn)• Cliff-Ord weights• Weights matrix should be driven as much as

possible by theory• GeoDa allows us to create a knn weights matrix,

but has difficulty using it• Lots of options in R• Can edit W in a general text editor

GISPopSci

Page 67: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Feasible Range of Moran’s I

• Function of n• Function of the particular weights matrix

used• Function of the structure of the tesselation• Minimum/maximum theoretical values

generally just above |1|• As a practical matter, the minimum

empirical value for an irregular lattice is generally around -0.6

Page 68: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

In general, the spatial lag is expressed (in matrix notation) as:

W y

w yj jj

j

ii

i

1

1 6

1

1 6

where W is a (16 x 16) weights matrix and

y is a (16 x 1) column vector

Page 69: Advanced Spatial Analysis Spatial Regression Modeling

Some interesting questions that might be addressed using spatial

lag operator:

• Local tax rates

(“spillover” in y?)• Expenditures for police

(“spillover” in x?)• Demographic analysis: Are Quitman &

Tallahatchie counties (two contiguous counties in the Mississippi Delta) really two separate observations?

(“spillover” in ε ?)GISPopSci

Page 70: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

We can simplify the expression for Moran’s I using matrix algebra

In

w

w y y y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

( )

( )( )

( )11

11

2

1

zz

WzzI

'

'

Assume W row standardized and zi = yi - y

Page 71: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Expected Value of Moran’s I Under Hypothesis of No Spatial

Autocorrelation

E In

( ) 1

1

Page 72: Advanced Spatial Analysis Spatial Regression Modeling

Variance of Moran’s I Under Hypothesis of No Spatial Autocorrelation

• Theoretical variance: very messy• Cliff & Ord (1973; 1991) derived the theoretical

asymptotic moments of I (under two different assumptions regarding the DGP)

• Boots & Tiefelsdorf (1995) have derived the exact (small sample) moments of I, but, again, it’s messy

• Anselin & Bera (1998:267) give the first two moments of I for OLS errors

• Again… GeoDa derives an empirical standard deviation using a permutation (simulation) approach

GISPopSci

Page 73: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

If n is large…

ZI E I

V a r I

( )

( )

Page 74: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Global Geary’s c

cn

w

w y y

y yij

j

n

i

n

ij i j

j

n

i

n

i

i

n

1

211

2

11

2

1

( )

( )

( )

Robert (Roy) Charles Geary (1896-1983)

Can we deconstruct this?

Page 75: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Expected Value of Geary’s c Under Hypothesis of No Spatial

Autocorrelation

E c( ) 1

Page 76: Advanced Spatial Analysis Spatial Regression Modeling

As with Moran’s I, the variance of Geary’s c under hypothesis of no spatial

autocorrelation is messy

• But, Cliff & Ord (1973; 1981) derived the theoretical asymptotic moments of c

• GeoDa doesn’t provide access to this test statistic

• As with Moran’s I, under the null hypothesis of no spatial autocorrelation, c is asymptotically ~ N(0,1)

GISPopSci

Page 77: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Thus, if n is large…

Zc E c

V a r c

( )

( )

Page 78: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Measures of spatial autocorrelation are scale

dependent

Page 79: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Moran’s I = -1.00 Moran’s I = +0.33

Page 80: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

LISA StatisticsStandard citation:

Anselin, Luc. 1995. “Local Indicators of Spatial Association – LISA.” Geographical Analysis 27:93-115.

Page 81: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Anselin’s Moran Scatterplot

Standard citation:

Anselin, Luc. 1996. “The Moran Scatterplot as an ESDA Tool to Assess Local Instability in Spatial Association.” Pp. 111-125 in Fischer, Manfred, Henk J. Scholten, and David Unwin (eds.) Spatial Analytical Perspectives on GIS: GISDATA 4 (London: Taylor & Francis).

Terrific ESDA tool

Page 82: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Moran Scatterplot of PPOV (1st Order Queen Weights)

H-H

L-L

L-H

H-L

Page 83: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Moran Scatterplot of PPOVGood way to check the outliers

Page 84: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

LISA Map of PPOV (1st Order Queen Weights)

Page 85: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

LISA Map of PPOV (1st Order Queen Weights)

Page 86: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Testing for spatial autocorrelation in our

data is important

Unfortunately, identifying and quantifying the extent of spatial autocorrelation doesn’t tell us

what’s causing it

Page 87: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

It does alert us to the presence of Spatial “Effects” (or Spatial

“Processes”) at work in our data

Spatial dependenceSpatial heterogeneity

• Conceptually, these are very different processes and thus are modeled in very different ways

• Each precludes a straightforward application of standard econometric models

• Each is indicated by spatial autocorrelation

Page 88: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

And when spatial autocorrelation in our data is indicated…

• At least one assumption of the standard linear regression model probably is violated (the classical independence assumption)

• The latent information content in the data is diminished

• We need to do something about it:– get rid of it; model it away– take advantage of it; bring it into the model

• Either spatial dependence or spatial heterogeneity (or both) should be entertained as potential data-generating models

Page 89: Advanced Spatial Analysis Spatial Regression Modeling

Here’s where we pick things up tomorrow morning

GISPopSci

Page 90: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Readings for today• Anselin, Luc. 1996. “The Moran Scatterplot as an ESDA Tool to Assess

Local Instability in Spatial Association.” Pp. 111-125 in Fischer, Manfred, Henk J. Scholten, and David Unwin (eds.) Spatial Analytical Perspectives on GIS: GISDATA 4 (London: Taylor & Francis).

• Tolnay, Stewart E., Glenn Dean, & E.M. Beck. 1996. “Vicarious Violence: Spatial Effects on Southern Lynchings, 1890-1919.” American Journal of Sociology 102(3):788-815.

• Tobler, Waldo R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46:234-240.

• Getis, Arthur. 2007. “Reflections on Spatial Autocorrelation.” Regional Science and Urban Economics 37:491-496.

• Getis, Arthur. 2008. “A History of the Concept of Spatial Autocorrelation: A Geographer’s Perspective.” Geographical Analysis 40:297-309.

• Anselin, Luc. 2005. Exploring Spatial Data with GeoDa: A Workbook, (chapters 15-18).

• Anselin, Luc. 2007. Spatial Regression Analysis in R: A Workbook, (chapter 3).

Page 91: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Afternoon Lab

Spatial autocorrelation (using GeoDa & R)

Page 92: Advanced Spatial Analysis Spatial Regression Modeling

GISPopSci

Questions?