dem 7263 fall 2015 spatial glm's

10/7/2015 Spatial GLM’s

file:///Users/ozd504/Google%20Drive/dem7263/Rcode15/SpatialGLMs.html 1/33

Spatial GLM’sCorey S. Sparks, Ph.D.September 23, 2015

GLM’s for Spatial DataAggregate dataTypically when we get data with a spa+al component, they are aggregated to some geography * Tract,block, county, ZIP code, etc. * These are typically either raw counts + Count of deaths, crimes, births,people with some characteristic (eg poverty) > numerator * Hopefully with a population at risk >denominator, or rates that someone has calculated + Mortality rates from NCHS or some census rate

GLM’s are a class of statitical models that link the mean of the distribuion to a linear combination ofcovariates by some form of link func)on. Where g() is the link function. The linkfunction for a binomial distribution is typically the logit, or: , or for the Poisson model,

Modeling optionsInherently, we want to see what influences the rate in our areas

This is for aggregate dataSo we want regression models for our area- based counts that properly model the independentvariableYou have a few options – Binomial, Poisson, Normal, etc

#I subset to only Texas counties this time, for the sake of computational time.spdat<-readShapePoly("~/Google Drive/dem7263/data/usdata_mort.shp")spdat<-spdat[substr(spdat$COFIPS,1,2)=="48",]spdat$population<-as.numeric(as.character(spdat$Population))spdat$deaths<-as.numeric(as.character(spdat$Deaths))#Create a k=4 nearest neighbor setus.nb4<-knearneigh(coordinates(spdat), k=4)us.nb4<-knn2nb(us.nb4)us.nb4<-make.sym.nb(us.nb4)us.wt4<-nb2listw(us.nb4, style="W")

Binomial model for countsTo use the Binomial model, you have two options Your observations are 1/0, we have done this before.This says that each area has the event or not OR Your observations are a combination of y and n, where yis a count of some event which has happened to some portion of the population at risk, n. For either of

g(u) = g(E(Y)) = βX ′

g(u) = log( )π1+π

g(u) = log(μ)



these, you are modeling the probability of observing an event, or really the logit of the probability: * The good thing is that you have already done this! * All interpretation of parameters is

the same as logistic regression! * All model testing is the same too!

fit.bin.us<-glm(cbind(deaths, population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, family = binomial)summary(fit.bin.us)

round(exp(coef(fit.bin.us)), 3)

## (Intercept) scale(ppersonspo) scale(p65plus) scale(pblack_1) ## 0.010 1.024 1.409 1.035 ## scale(phisp) I(RUCC >= 7)TRUE ## 0.943 0.879

#This should be the crude death rate, per 1,000 persons1000 * mean(fitted(fit.bin.us))

## ## Call:## glm(formula = cbind(deaths, population) ~ scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), family = binomial, data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -27.6665 -1.7024 0.9846 4.0827 17.2912 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -4.631185 0.002016 -2296.70 <2e-16 ***## scale(ppersonspo) 0.024078 0.002151 11.20 <2e-16 ***## scale(p65plus) 0.343010 0.001851 185.29 <2e-16 ***## scale(pblack_1) 0.034587 0.001286 26.90 <2e-16 ***## scale(phisp) -0.059135 0.002439 -24.24 <2e-16 ***## I(RUCC >= 7)TRUE -0.128607 0.005381 -23.90 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for binomial family taken to be 1)## ## Null deviance: 79455.8 on 251 degrees of freedom## Residual deviance: 9901.4 on 246 degrees of freedom## AIC: 12098## ## Number of Fisher Scoring iterations: 4

πlog( ) = βπ

1+π X ′

exp(β)



## [1] 9.701682

hist(1000*fitted(fit.bin.us), main="Texas Crude Death Rate Distribution from Binomial Model")

Poisson model for counts



Poisson model for countsFor the Poisson model, you observe some count of events in each area (crimes, deaths, etc), and you alsohave some population at risk in each area. This is generally not the same for each area, but could be, youwant a Poisson rate , which is really the average count (y). We have several ways of modeling thePoisson count:

Pure count If each area has the same risk set, Rate, include an offset term in your model to incorporateunequal risk , where n is the population at risk in each area. This is called theoffset term in the model. Standardized ratio incorporate differential exposure as an expected count

. Again, all interpretation of parameters is done on a log scale, so = %change in the mean, or % change in the rate, for a 1 unit change in X. All testing is done in the samemanner.

#Form the expected countspdat$E<-spdat$population * (sum(spdat$deaths)/sum(spdat$population))fit.poi.us<-glm(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, family = poisson)summary(fit.poi.us)

## ## Call:## glm(formula = deaths ~ offset(log(E)) + scale(ppersonspo) + scale(p65plus) + ## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), family = poisson, ## data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -27.7942 -1.6952 0.9956 4.1290 17.3567 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.368526 0.002005 183.76 <2e-16 ***## scale(ppersonspo) 0.024405 0.002142 11.39 <2e-16 ***## scale(p65plus) 0.342493 0.001841 185.99 <2e-16 ***## scale(pblack_1) 0.034490 0.001281 26.92 <2e-16 ***## scale(phisp) -0.059474 0.002430 -24.48 <2e-16 ***## I(RUCC >= 7)TRUE -0.128789 0.005355 -24.05 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for poisson family taken to be 1)## ## Null deviance: 80038.6 on 251 degrees of freedom## Residual deviance: 9990.2 on 246 degrees of freedom## AIC: 12189## ## Number of Fisher Scoring iterations: 4

(λ)

log(y) = β + log(n)X ′

log(y) = β + log(E)X ′ exp(β)



round(exp(coef(fit.poi.us)), 3)


hist(fitted(fit.poi.us)/spdat$E,main="Texas SMR Distribution from Poisson Model" )



Normal (Gaussian) ModelIn both the Binomial and Poisson case, if both n and y are large, the Normal model will often work This isgenerally not the case for rare events To do this, we assume the rate is our outcome, not the count, andassume Gaussian errors So our model is: This is just OLS! Or the more exotic SARspecifications we saw last week (http://rpubs.com/corey_sparks/109650). Of course, you need to evaluate

f racyn = β +X ′ ei

http://rpubs.com/corey_sparks/109650



the Normal model’s assumptions and see if you can use it. The other models also need to be examined forlinearity, but since they don’t have the same structure as the Normal model, they have fewer things tocheck.

fit.g.us<-glm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, family = gaussian)summary(fit.g.us)

## ## Call:## glm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + ## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), family = gaussian, ## data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -0.0042947 -0.0005506 0.0000912 0.0006799 0.0029490 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 1.006e-02 9.931e-05 101.332 < 2e-16 ***## scale(ppersonspo) 2.757e-04 1.210e-04 2.278 0.023568 * ## scale(p65plus) 2.437e-03 9.636e-05 25.292 < 2e-16 ***## scale(pblack_1) 2.196e-04 9.112e-05 2.410 0.016702 * ## scale(phisp) -5.250e-04 1.444e-04 -3.636 0.000337 ***## I(RUCC >= 7)TRUE -3.663e-04 1.676e-04 -2.186 0.029786 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for gaussian family taken to be 1.232979e-06)## ## Null deviance: 0.00196081 on 251 degrees of freedom## Residual deviance: 0.00030331 on 246 degrees of freedom## AIC: -2705.7## ## Number of Fisher Scoring iterations: 2

round(1000*coef(fit.g.us), 3)

## (Intercept) scale(ppersonspo) scale(p65plus) scale(pblack_1) ## 10.064 0.276 2.437 0.220 ## scale(phisp) I(RUCC >= 7)TRUE ## -0.525 -0.366

hist(1000*fitted(fit.g.us),main="Texas Crude Death Rate Distribution from Gaussian Model" )



When using the Binomial and Poisson GLM on areal data, you often run into overdispersion * What’soverdispersion? Class? For Binomial and Poisson distributions, the mean and the variance are functionsof one another (variance = mean for Poisson). So when you have more variability than you expect in yourdata, you have overdispersion. This basically says your data don’t fit your model, and is a problembecause overdispersion leads to standard errors for our model parameters that are too small. But, we canfit other models that don’t make such assumptions, or allow there to be more variability.



An easy check on this is to compare the residual deviance to the residual degrees of freedom. Theyratio should be 1 if the model fits the data.

Modeling Overdispersion via a “Quasi” distributionFor the Poisson and Binomial, we can fit a “quasi” distribution that adds an extra parameter to allow themean-variance relationship to not be constant. For e.g. in the Binomial we get:

, instead of For Poisson we get: , instead of

This allows us to include a rough proxy for a “dispersion parameter” for the distribution. Naturally this isfixed at 1 for basic models, and estimated in the quasi models, we can look to see if is much bigger than1. If overdispersion is present and not accounted for you could identify a relationship as being significantwhen it is not!

fit.qbin.us<-glm(cbind(deaths, population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, family = quasibinomial)summary(fit.qbin.us)

## ## Call:## glm(formula = cbind(deaths, population) ~ scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), family = quasibinomial, data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -27.6665 -1.7024 0.9846 4.0827 17.2912 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -4.631185 0.012714 -364.269 < 2e-16 ***## scale(ppersonspo) 0.024078 0.013561 1.776 0.077033 . ## scale(p65plus) 0.343010 0.011672 29.387 < 2e-16 ***## scale(pblack_1) 0.034587 0.008108 4.266 2.84e-05 ***## scale(phisp) -0.059135 0.015380 -3.845 0.000154 ***## I(RUCC >= 7)TRUE -0.128607 0.033929 -3.790 0.000189 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for quasibinomial family taken to be 39.7523)## ## Null deviance: 79455.8 on 251 degrees of freedom## Residual deviance: 9901.4 on 246 degrees of freedom## AIC: NA## ## Number of Fisher Scoring iterations: 4

var(Y) = n 0 p 0 (1 + p) 0 ϕ var(Y) = n 0 p 0 (1 + p)var(Y) = λ 0 ϕ var(Y) = λ



fit.qpoi.us<-glm(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, family = quasipoisson)summary(fit.qpoi.us)

## ## Call:## glm(formula = deaths ~ offset(log(E)) + scale(ppersonspo) + scale(p65plus) + ## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), family = quasipoisson, ## data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -27.7942 -1.6952 0.9956 4.1290 17.3567 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.368526 0.012699 29.021 < 2e-16 ***## scale(ppersonspo) 0.024405 0.013561 1.800 0.073141 . ## scale(p65plus) 0.342493 0.011660 29.374 < 2e-16 ***## scale(pblack_1) 0.034490 0.008111 4.252 3.01e-05 ***## scale(phisp) -0.059474 0.015386 -3.866 0.000142 ***## I(RUCC >= 7)TRUE -0.128789 0.033904 -3.799 0.000183 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for quasipoisson family taken to be 40.09362)## ## Null deviance: 80038.6 on 251 degrees of freedom## Residual deviance: 9990.2 on 246 degrees of freedom## AIC: NA## ## Number of Fisher Scoring iterations: 4

round(exp(coef(fit.poi.us)), 3)


Notice the dispersion parameters in both of these fits, both were around 50, they are assumed to be 1,although substantively, there was no difference in the interpretation of the model effects.

Other 2 - parameter distributionsOf course, we could just fit other distributional models to our data, popular choices are:Normal – y has mean and variance



Negative binomial – Adds a shape parameter to Poisson – has a mean and varianceBeta binomial – Adds a shape parameter to binomial – p has a mean and a variance

library(MASS)fit.nb.us<-glm.nb(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat)summary(fit.nb.us)

## ## Call:## glm.nb(formula = deaths ~ offset(log(E)) + scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), data = spdat, init.theta = 70.16980546, link = log)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -3.3010 -0.6124 0.0721 0.6028 2.2726 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.36759 0.01111 33.092 < 2e-16 ***## scale(ppersonspo) 0.03208 0.01433 2.238 0.025192 * ## scale(p65plus) 0.25781 0.01124 22.929 < 2e-16 ***## scale(pblack_1) 0.02178 0.01036 2.102 0.035571 * ## scale(phisp) -0.05766 0.01719 -3.354 0.000796 ***## I(RUCC >= 7)TRUE -0.03978 0.01966 -2.023 0.043096 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for Negative Binomial(70.1698) family taken to be 1)## ## Null deviance: 1312.84 on 251 degrees of freedom## Residual deviance: 251.57 on 246 degrees of freedom## AIC: 3144.8## ## Number of Fisher Scoring iterations: 1## ## ## Theta: 70.17 ## Std. Err.: 6.99 ## ## 2 x log-likelihood: -3130.805

round(exp(coef(fit.nb.us)), 3)

λ




hist(fitted(fit.nb.us)/spdat$E,main="Texas SMR Distribution from Negative Binomial Model" )



library(aod)fit.bbin.us<-betabin(cbind(deaths, population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), random=~1, spdat@data, method="SANN")summary(fit.bbin.us)



## Beta-binomial model## -------------------## betabin(formula = cbind(deaths, population) ~ scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), random = ~1, data = spdat@data, ... = pairlist(method = "SANN"))## ## Convergence was obtained after 2000 iterations.## ## Fixed-effect coefficients:## Estimate Std. Error z value Pr(> |z|)## (Intercept) -4.484e+00 2.309e-02 -1.942e+02 0.000e+00## scale(ppersonspo) 8.867e-02 2.443e-02 3.629e+00 2.843e-04## scale(p65plus) 8.415e-03 2.022e-02 4.162e-01 6.773e-01## scale(pblack_1) -1.971e-01 2.386e-02 -8.263e+00 2.220e-16## scale(phisp) -1.050e-01 3.055e-02 -3.436e+00 5.914e-04## I(RUCC >= 7)TRUE 1.259e-01 3.627e-02 3.471e+00 5.183e-04## ## Overdispersion coefficients:## Estimate Std. Error z value Pr(> z)## phi.(Intercept) 8.394e-04 2e-13 0e+00 1e+00## ## Log-likelihood statistics## Log-lik nbpar df res. Deviance AIC AICc ## -1.856e+03 7 245 1.528e+03 3.726e+03 3.727e+03

hist(1000*fitted(fit.bbin.us),main="Texas Crude Death Rate Distribution from Beta Binomial Model" )



Modeling Overdispersion via Hierarchical modelsWe can also consider using a random-effect model to help with overdispersion. One way of doing this isto add an observation-level random effect to our model, so now we have: with

or with .log( ) = β +π

1+π X ′ ui

U N(0, )ui σ2 log(λ) = β +X ′ ui U N(0, )ui σ2



library(lme4)fit.hpoi.us<-glmer(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)+(1|COFIPS), spdat@data, family=poisson)summary(fit.hpoi.us)

## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: poisson ( log )## Formula: deaths ~ offset(log(E)) + scale(ppersonspo) + scale(p65plus) + ## scale(pblack_1) + scale(phisp) + I(RUCC >= 7) + (1 | COFIPS)## Data: spdat@data## ## AIC BIC logLik deviance df.resid ## 3152.1 3176.8 -1569.0 3138.1 245 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -1.47313 -0.11097 0.02923 0.16553 1.18443 ## ## Random effects:## Groups Name Variance Std.Dev.## COFIPS (Intercept) 0.01475 0.1215 ## Number of obs: 252, groups: COFIPS, 252## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.36071 0.01128 31.96 < 2e-16 ***## scale(ppersonspo) 0.03168 0.01458 2.17 0.029754 * ## scale(p65plus) 0.25972 0.01145 22.69 < 2e-16 ***## scale(pblack_1) 0.02317 0.01053 2.20 0.027869 * ## scale(phisp) -0.05786 0.01746 -3.31 0.000922 ***## I(RUCC >= 7)TRUE -0.04065 0.01995 -2.04 0.041570 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr) scl(pp) sc(65) sc(_1) scl(ph)## scl(pprsns) 0.057 ## scl(p65pls) 0.283 -0.432 ## scl(pblc_1) -0.149 -0.475 0.349 ## scale(phsp) 0.011 -0.821 0.578 0.609 ## I(RUCC>=7)T -0.688 -0.087 -0.380 0.180 -0.003

round(exp(fixef(fit.hpoi.us)), 3)




hist(fitted(fit.hpoi.us)/spdat$E,main="Texas SMR Distribution from Hierarchical Poisson Model" )

fit.hbin.us<-glmer(cbind(deaths, population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)+(1|COFIPS), spdat@data,family=binomial )summary(fit.hbin.us)



## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: binomial ( logit )## Formula: cbind(deaths, population) ~ scale(ppersonspo) + scale(p65plus) + ## scale(pblack_1) + scale(phisp) + I(RUCC >= 7) + (1 | COFIPS)## Data: spdat@data## ## AIC BIC logLik deviance df.resid ## 3147.2 3171.9 -1566.6 3133.2 245 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -1.47874 -0.11176 0.02936 0.16631 1.18769 ## ## Random effects:## Groups Name Variance Std.Dev.## COFIPS (Intercept) 0.01474 0.1214 ## Number of obs: 252, groups: COFIPS, 252## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -4.63933 0.01128 -411.1 < 2e-16 ***## scale(ppersonspo) 0.03171 0.01458 2.2 0.029617 * ## scale(p65plus) 0.25976 0.01145 22.7 < 2e-16 ***## scale(pblack_1) 0.02315 0.01053 2.2 0.027948 * ## scale(phisp) -0.05789 0.01746 -3.3 0.000917 ***## I(RUCC >= 7)TRUE -0.04064 0.01995 -2.0 0.041675 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr) scl(pp) sc(65) sc(_1) scl(ph)## scl(pprsns) 0.057 ## scl(p65pls) 0.283 -0.432 ## scl(pblc_1) -0.149 -0.475 0.349 ## scale(phsp) 0.011 -0.821 0.578 0.609 ## I(RUCC>=7)T -0.688 -0.087 -0.380 0.180 -0.003

hist(1000*fitted(fit.hbin.us),main="Texas Crude Death Rate Distribution from Hierarchical Binomial Model" )



So, who’s best here? AIC is a good rule:

test<-AIC(fit.bbin.us)AICs<-c(AIC(fit.bin.us),AIC(fit.poi.us),AIC(fit.g.us), AIC(fit.nb.us), test@istats$AIC, AIC(fit.hbin.us),AIC(fit.hpoi.us))plot(AICs, type="l", lwd=1.5, xaxt="n", xlab="")axis(1, at=1:7,labels=F) #7= number of modelslabels<-c("Bin", "Poi","Gau", "NBin","BetaBin", "HierBin", "HierPoi" )text(1:7, par("usr")[3]-.25, srt=45, adj=1, labels=labels, xpd=T)mtext(side=1, text="Model Specification", line=3)symbols(x= which.min(AICs), y=AICs[which.min(AICs)], circles=1, fg=2,lwd=2,add=T)



Autocorrelation in GLMsWhile we don’t have the same assumption about residuals for GLMs as we do for the Gaussian (OLS)model, we still may be curious if our model’s residuals display spatial clustering. While it’s notrecommended (https://stat.ethz.ch/pipermail/r-sig-geo/2009-November/007034.html), we can use thelm.morantest() function to examine correlation among the residuals.

lm.morantest(fit.poi.us, listw = us.wt4)

https://stat.ethz.ch/pipermail/r-sig-geo/2009-November/007034.html



## ## Global Moran's I for regression residuals## ## data: ## model: glm(formula = deaths ~ offset(log(E)) + scale(ppersonspo) +## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),## family = poisson, data = spdat)## weights: us.wt4## ## Moran I statistic standard deviate = 3.8068, p-value = 7.039e-05## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 1.625924e-01 -8.222115e-06 1.824432e-03

lm.morantest(fit.bin.us, listw = us.wt4)

## ## Global Moran's I for regression residuals## ## data: ## model: glm(formula = cbind(deaths, population) ~ scale(ppersonspo)## + scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),## family = binomial, data = spdat)## weights: us.wt4## ## Moran I statistic standard deviate = 1.6679, p-value = 0.04766## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1623571 -1.2045497 0.6716097

lm.morantest(fit.nb.us, listw = us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), data = spdat, init.theta =## 70.16980546, link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.9806, p-value = 3.437e-05## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1696496093 -0.0002054652 0.0018208068

Spatial nuisance and GLMsInherently, we are kind of doing spatial analysis since we are using areal units as our observations, rightPaul? There are other options that have been explored that promote not modeling space explicityly butinstead filtering out the nuisance of spatial autocorrelation. Spatial Filtering is a method Griffith and Peres-Neto (2006) (http://www.esajournals.org/doi/abs/10.1890/0012-9658(2006)87%5B2603:SMIETF%5D2.0.CO%3B2) and Dray S, Legendre P and Peres-Neto PR (2005)(http://www.sciencedirect.com/science/article/pii/S0304380006000925) both describe this method. TheMoran eigenvector filtering function is intended to remove spatial autocorrelation from the residuals of(generalised) linear models. Tiefelsdorf M, Griffith DA. (2007) (http://www.spatialfiltering.com/) alsodescribe a method of using eigenvectors of the spatial weight matrix to filter out autocorrelation in amodel.

Typically this is done by 1) forming the spatial weight matrix, 2)compute the eigenvectors of the centeredweight matrix: , where I is the identity matrix, and 1 is a vector of 1’s. 3) selectthe eigenvectors to be included in a linear model as spatial predictors, so that the level of autocorrelationin the model is minimized, until Moran’s I is nonsigificant. We can do this using the ME() in the spdeplibrary.

ME() example

#you have to provide the theta value for the NB family, I got this from the model fit above.me.fit<-ME(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat@data, family=negative.binomial(70.17), listw = us.wt4, verbose=T,alpha=.05 )

## eV[,161], I: 0.1411555 ZI: NA, pr(ZI): 0.01## eV[,234], I: 0.1165919 ZI: NA, pr(ZI): 0.01## eV[,39], I: 0.0982866 ZI: NA, pr(ZI): 0.06

(I + 1 /n)W(I + 1 /n)1′ 1′

http://www.esajournals.org/doi/abs/10.1890/0012-9658(2006)87%5B2603:SMIETF%5D2.0.CO%3B2

http://www.sciencedirect.com/science/article/pii/S0304380006000925

http://www.spatialfiltering.com/



me.fit

## Eigenvector ZI pr(ZI)## 0 NA NA 0.01## 1 161 NA 0.01## 2 234 NA 0.01## 3 39 NA 0.06

fits<-data.frame(fitted(me.fit))spdat$me161<-fits$vec161spdat$me234<-fits$vec234

spplot(spdat, c("me161", "me234"), at=quantile(c(spdat$me161, spdat$me234), p=seq(0,1,length.out = 7)), col.regions=brewer.pal(n=7, "Greys"), main="Spatial Distribution Moran Eigenvectors 161 & 234")



clean.nb<-glm.nb(deaths~offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)+fitted(me.fit), spdat@data)summary(clean.nb)

## ## Call:## glm.nb(formula = deaths ~ offset(log(E)) + scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7) + fitted(me.fit), data = spdat@data, init.theta = 73.40428524, ## link = log)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -3.1490 -0.5935 0.1117 0.6388 2.2814 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.36388 0.01095 33.222 < 2e-16 ***## scale(ppersonspo) 0.03401 0.01409 2.414 0.015781 * ## scale(p65plus) 0.25258 0.01121 22.523 < 2e-16 ***## scale(pblack_1) 0.02145 0.01020 2.103 0.035491 * ## scale(phisp) -0.06034 0.01699 -3.553 0.000381 ***## I(RUCC >= 7)TRUE -0.03219 0.01951 -1.650 0.099005 . ## fitted(me.fit)vec161 -0.12309 0.12241 -1.006 0.314644 ## fitted(me.fit)vec234 0.19407 0.12088 1.606 0.108374 ## fitted(me.fit)vec39 0.33911 0.12335 2.749 0.005974 ** ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for Negative Binomial(73.4043) family taken to be 1)## ## Null deviance: 1367.15 on 251 degrees of freedom## Residual deviance: 251.14 on 243 degrees of freedom## AIC: 3140.5## ## Number of Fisher Scoring iterations: 1## ## ## Theta: 73.40 ## Std. Err.: 7.34 ## ## 2 x log-likelihood: -3120.455

lm.morantest(clean.nb, listw=us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7) + fitted(me.fit), data = spdat@data,## init.theta = 73.40428524, link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.8382, p-value = 6.198e-05## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1655784395 -0.0001879006 0.0018652724

Auto-modelsOne model that has been proposed is the auto - model. This includes an auto-covariate in the model,which is an average of the neighboring values outcome: . This can be used forbinomial or Poisson but people don’t really seem to like this.

#Creating an auto- model#this involves creating an "auto-covariate"spdat$lag_rate<-lag.listw(x=us.wt4, var=(spdat$deaths/spdat$population))#binomial responsefit.bn.auto<-glm(cbind(deaths, population)~lag_rate+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), family=binomial, data=spdat)summary(fit.bn.auto)

log(y) = β + WyX ′



## ## Call:## glm(formula = cbind(deaths, population) ~ lag_rate + scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), family = binomial, data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -28.156 -1.863 1.055 3.645 18.616 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -4.776658 0.009746 -490.136 < 2e-16 ***## lag_rate 14.901955 0.975323 15.279 < 2e-16 ***## scale(ppersonspo) 0.011381 0.002305 4.938 7.88e-07 ***## scale(p65plus) 0.328584 0.002081 157.927 < 2e-16 ***## scale(pblack_1) 0.038738 0.001315 29.465 < 2e-16 ***## scale(phisp) -0.040886 0.002712 -15.076 < 2e-16 ***## I(RUCC >= 7)TRUE -0.131294 0.005379 -24.407 < 2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for binomial family taken to be 1)## ## Null deviance: 79455.8 on 251 degrees of freedom## Residual deviance: 9668.5 on 245 degrees of freedom## AIC: 11867## ## Number of Fisher Scoring iterations: 4

#likewise for a poissonspdat$lagcount<-lag.listw(x=us.wt4, var=spdat$deaths/spdat$E)fit.poi.auto<-glm(deaths~lagcount+offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), family=poisson, data=spdat)summary(fit.poi.auto)



## ## Call:## glm(formula = deaths ~ lagcount + offset(log(E)) + scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), family = poisson, data = spdat)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -28.249 -1.855 1.072 3.659 18.694 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.222329 0.009703 22.912 < 2e-16 ***## lagcount 0.100923 0.006544 15.421 < 2e-16 ***## scale(ppersonspo) 0.011612 0.002296 5.058 4.23e-07 ***## scale(p65plus) 0.328030 0.002069 158.538 < 2e-16 ***## scale(pblack_1) 0.038660 0.001310 29.518 < 2e-16 ***## scale(phisp) -0.041102 0.002702 -15.211 < 2e-16 ***## I(RUCC >= 7)TRUE -0.131483 0.005353 -24.564 < 2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for poisson family taken to be 1)## ## Null deviance: 80039 on 251 degrees of freedom## Residual deviance: 9753 on 245 degrees of freedom## AIC: 11954## ## Number of Fisher Scoring iterations: 4

#likewise for a negative binomialfit.nb.auto<-glm.nb(deaths~lagcount+offset(log(E))+scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat)summary(fit.nb.auto)



## ## Call:## glm.nb(formula = deaths ~ lagcount + offset(log(E)) + scale(ppersonspo) + ## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= ## 7), data = spdat, init.theta = 72.068416, link = log)## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -3.1730 -0.5644 0.0412 0.6208 2.1977 ## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.22353 0.06025 3.710 0.000207 ***## lagcount 0.09948 0.04096 2.429 0.015158 * ## scale(ppersonspo) 0.02508 0.01451 1.729 0.083893 . ## scale(p65plus) 0.24730 0.01190 20.784 < 2e-16 ***## scale(pblack_1) 0.02231 0.01024 2.178 0.029393 * ## scale(phisp) -0.04223 0.01824 -2.315 0.020624 * ## I(RUCC >= 7)TRUE -0.04668 0.01965 -2.376 0.017511 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for Negative Binomial(72.0684) family taken to be 1)## ## Null deviance: 1344.77 on 251 degrees of freedom## Residual deviance: 251.61 on 245 degrees of freedom## AIC: 3141## ## Number of Fisher Scoring iterations: 1## ## ## Theta: 72.07 ## Std. Err.: 7.20 ## ## 2 x log-likelihood: -3124.969

lm.morantest(fit.bn.auto, listw=us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm(formula = cbind(deaths, population) ~ lag_rate +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), family = binomial, data = spdat)## weights: us.wt4## ## Moran I statistic standard deviate = 1.5315, p-value = 0.06283## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1490254 -1.1968593 0.7723265

lm.morantest(fit.poi.auto, listw=us.wt4)

## ## Global Moran's I for regression residuals## ## data: ## model: glm(formula = deaths ~ lagcount + offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), family = poisson, data = spdat)## weights: us.wt4## ## Moran I statistic standard deviate = 3.4757, p-value = 0.0002548## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 1.490526e-01 -8.185909e-06 1.839284e-03

lm.morantest(fit.nb.auto, listw=us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ lagcount + offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), data = spdat, init.theta = 72.068416,## link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.0207, p-value = 0.001261## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1291953881 -0.0002095377 0.0018352684

spdat$resnb<-rstudent(fit.nb.auto)spdat$resnb1<-rstudent(fit.nb.us)spdat$resclean<-rstudent(clean.nb)spplot(spdat,c("resnb1", "resnb", "resclean"),at=quantile(c(spdat$resnb1,spdat$resnb, spdat$resclean), p=seq(0,1,length.out = 7) ), col.regions=brewer.pal(n=7, "Greys"), main="Spatial Distribution Neg. Bin Residuals", names.attr=c("NB Regresssion","Auto NB Regression", "Filtered NB"))



lm.morantest(fit.nb.us, listw=us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), data = spdat, init.theta =## 70.16980546, link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.9806, p-value = 3.437e-05## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1696496093 -0.0002054652 0.0018208068

lm.morantest(fit.nb.auto, listw=us.wt4)

## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ lagcount + offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7), data = spdat, init.theta = 72.068416,## link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.0207, p-value = 0.001261## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1291953881 -0.0002095377 0.0018352684

lm.morantest(clean.nb, listw=us.wt4)



## ## Global Moran's I for regression residuals## ## data: ## model: glm.nb(formula = deaths ~ offset(log(E)) +## scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +## scale(phisp) + I(RUCC >= 7) + fitted(me.fit), data = spdat@data,## init.theta = 73.40428524, link = log)## weights: us.wt4## ## Moran I statistic standard deviate = 3.8382, p-value = 6.198e-05## alternative hypothesis: greater## sample estimates:## Observed Moran's I Expectation Variance ## 0.1655784395 -0.0001879006 0.0018652724

We almost get rid of the autocorrelation in the binomial model using the autoregressive rate.

dem 7263 fall 2015 spatial glm's

Science