spatial models for data reported as counts over geographic areas

Post on 10-Feb-2016


SPATIAL MODELS FOR DATA REPORTED AS COUNTS OVER GEOGRAPHIC AREAS

Gary Simon, 28 April 2006

With special thanks…

Frank LoPresti, Academic Computing Services, GIS Group

Kevin Tun, Stern I.T. Group

Here’s an interesting obscure formula. Consider a set of points:

Point 1: $(x_1, y_1)$

Point 2: $(x_2, y_2)$

…

Point n: $(x_n, y_n)$

Connect the points in order: draw a line from point 1 to point 2, then from point 2 to point 3, …, and from point n-1 to point n. Finally, draw a line from point n back to point 1.

Assume that none of the segments cross, so that this is a polygon.

The area of the resulting polygon is given by

$$\pm\frac{1}{2}\sum_{i=1}^{n}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)$$

where point n+1 is understood to be point 1.

The + occurs when the perimeter is drawn counter-clockwise, the – when drawn clockwise.

The data:

K regions

Counts $z_1, z_2, \ldots, z_K$

Total count $z_+$

Populations $P_1, P_2, \ldots, P_K$

Total population $P_+$

The obvious null hypothesis of uniformity is tested by

$$G^2 = 2\sum_{k=1}^{K} z_k \log\frac{z_k}{z_+\,P_k/P_+}$$
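As a rough illustration of the uniformity test, here is a minimal Python sketch of the statistic, assuming the form G² = 2 Σ z_k log(z_k / (z_+ P_k / P_+)). The function name and the three-region data are made up for the example and do not come from the slides.

```python
import math

def g2_uniformity(counts, populations):
    """Likelihood-ratio statistic G^2 for the null hypothesis of a uniform rate."""
    z_total = sum(counts)
    p_total = sum(populations)
    total = 0.0
    for z, p in zip(counts, populations):
        expected = z_total * p / p_total   # expected count under uniformity
        if z > 0:                          # a zero count contributes nothing
            total += z * math.log(z / expected)
    return 2.0 * total

# Made-up illustration data (not from the slides): three regions.
counts = [10, 20, 30]
populations = [1000, 1000, 1000]
print(g2_uniformity(counts, populations))
```

The result would be referred to a chi-squared distribution on K - 1 degrees of freedom, which matches the 66 degrees of freedom quoted later for the 67 Florida counties.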

Uniformity is often rejected. What should be the alternative to uniformity?

Techniques like kriging assess covariance structure and not the structure of the expected counts.

There are also techniques that measure spatial association (Cliff and Ord, 1973, 1981) with Moran's I and with Geary's c, and these also relate to covariance notions.

Cliff, A.D. and Ord, J.K. (1973) Spatial Autocorrelation, London: Pion.

Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models and Applications, London: Pion.

Spatial association can also be given angular interpretations (Simon, 1997).

Simon, Gary (1997) An Angular Version of Spatial Correlations, with Exact Significance Tests, Geographical Analysis, vol 29, #3, pp 267-278.

Let’s form a model for the “spatial force” and give this model a central location or hot spot.

Denote this location as $\mathbf{s} = \begin{pmatrix} s_x \\ s_y \end{pmatrix}$.

Here $s_x$ and $s_y$ are parameters to be estimated.

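Let $f(\mathbf{z})$ be the spatial force at location $\mathbf{z} = \begin{pmatrix} x \\ y \end{pmatrix}$.

Then let

$$f(\mathbf{z}) = \frac{c\,\alpha^2}{\alpha^2 + \|\mathbf{z}-\mathbf{s}\|^2} = \frac{c}{1 + \frac{1}{\alpha^2}(\mathbf{z}-\mathbf{s})'(\mathbf{z}-\mathbf{s})}$$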

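Since $f(\mathbf{z}) = \dfrac{c\,\alpha^2}{\alpha^2 + \|\mathbf{z}-\mathbf{s}\|^2}$, we have $f(\mathbf{s}) = c$.

At any $\mathbf{z}$ with $\|\mathbf{z}-\mathbf{s}\| = \alpha$, $f(\mathbf{z}) = \dfrac{c}{2}$.

Thus α is a “half-strength” distance.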

In this form, the only role of c is to assure the condition

$$\mathrm{E}\left(\sum_{k=1}^{K} z_k\right) = \sum_{k=1}^{K} z_k$$

This can be generalized to mix uniform and hot-spot features.

$$f(\mathbf{z}) = c\left(1 + \omega\,\frac{\alpha^2}{\alpha^2 + \|\mathbf{z}-\mathbf{s}\|^2}\right)$$

The parameter ω assesses the strength of the hot-spot relative to uniformity.

A negative ω indicates a protective effect.
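The mixed force function can be sketched in a few lines of Python. This is a minimal sketch that assumes the form f(z) = c (1 + ω α² / (α² + ‖z − s‖²)); the function name `spatial_force` and the default c = 1 are illustrative choices, not part of the slides.

```python
def spatial_force(x, y, sx, sy, alpha, omega, c=1.0):
    """Uniform background plus a hot spot centred at (sx, sy)."""
    d2 = (x - sx) ** 2 + (y - sy) ** 2          # squared distance from the hot spot
    return c * (1.0 + omega * alpha ** 2 / (alpha ** 2 + d2))

# At the hot spot the force is c * (1 + omega); at distance alpha the
# hot-spot part has dropped to half strength, giving c * (1 + omega / 2).
print(spatial_force(0.0, 0.0, 0.0, 0.0, alpha=2.0, omega=3.0))
print(spatial_force(2.0, 0.0, 0.0, 0.0, alpha=2.0, omega=3.0))
```

Setting ω = 0 recovers the uniform model, and far from the hot spot the force tends to c for any ω.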

The maximum likelihood expected counts $\{e_k\}$ will be used in the test statistic

$$G^2 = 2\sum_{k=1}^{K} z_k \log\frac{z_k}{e_k}$$

The value of $e_k$ will be computed as

$P_k$ × “average” force on county k

scaled so that $\sum_{k=1}^{K} z_k = \sum_{k=1}^{K} e_k$.
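The scaling step can be illustrated with a short Python sketch: multiply each population by its average force, then rescale so the expected counts add up to the observed total. The function name and the sample numbers are hypothetical.

```python
def expected_counts(counts, populations, avg_forces):
    """e_k proportional to P_k * (average force on region k), with sum(e) == sum(z)."""
    raw = [p * f for p, f in zip(populations, avg_forces)]
    scale = sum(counts) / sum(raw)      # plays the role of the constant c
    return [scale * r for r in raw]

# Made-up example: equal populations, with the first region under a stronger force.
z = [30, 10, 20]
P = [1000, 1000, 1000]
force = [2.0, 1.0, 1.0]
e = expected_counts(z, P, force)
print(e, sum(e))
```

Because the rescaling absorbs any constant multiplier of the force, only the shape of the force function matters for the fitted counts.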

Consider cancer rates in Florida. “Age-Adjusted Death Rates for Florida, 1998 – 2002.”

http://www.stateofflorida.com

Florida has 67 counties.

There were 38,814 cases in a population of 15,982,378. The rate is 2.43 per 1,000.

The $G^2$ statistic is 2,816.27 on 66 degrees of freedom.

The cancer rates are not uniform.

The maximum likelihood fit occurred at parameter values

$s_x$ = 375.8877

$s_y$ = 300.6793

α = 13.4375

ω = 2.325

This fit has $G^2$ = 2,246.93 on 66 - 4 = 62 degrees of freedom.

This is still an inadequate fit, but the reduction in $G^2$ is 569.34 with four degrees of freedom.

The fitted values are these:

The hot spot is at (82.56° W longitude, 28.80° N latitude), in Citrus County.

Map information comes in (longitude, latitude) form that needs to be converted to (x, y) form in (say) miles.

Each degree of latitude has the same mile equivalent.

[Figure: globe with the North Pole and the equatorial plane marked.]

One degree of latitude cuts off the same arc length at all latitudes.

However, a degree of longitude represents a small distance near the poles and a large distance near the equator.

[Figure: globe with the equator and the 30° N parallel marked.]

Problem: Find the length of one degree of longitude at latitude θ.

Solution: Form a spherical triangle with one corner at the North Pole, an angle of one degree at that corner, and two sides of angular length 90° - θ running from the pole down to latitude θ.

In a spherical triangle, the sides also have angle measure.

[Figure: globe with the equator and the 30° N parallel marked.]

We can use the law of sines for spherical triangles:

$$\frac{\sin a}{\sin A} = \frac{\sin b}{\sin B} = \frac{\sin c}{\sin C}$$

A, B, C are the angles and a, b, c are the sides.
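To put numbers on the longitude problem, here is a minimal Python sketch. It assumes a spherical Earth with a mean radius of 3,959 miles (a value not given in the slides). It also uses the spherical law of cosines in place of the law of sines to get the third side of the polar triangle, since the base angles are not known in advance. All function names are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean Earth radius; not stated in the slides

def miles_per_degree_latitude():
    """Arc length of one degree of latitude on a sphere (the same everywhere)."""
    return EARTH_RADIUS_MILES * math.radians(1.0)

def miles_per_degree_longitude(lat_deg):
    """Arc length along the parallel at lat_deg spanned by one degree of longitude."""
    return miles_per_degree_latitude() * math.cos(math.radians(lat_deg))

def third_side_degrees(lat_deg, apex_deg=1.0):
    """Great-circle side opposite the polar angle, by the spherical law of cosines."""
    b = math.radians(90.0 - lat_deg)       # the two equal sides, pole to latitude
    apex = math.radians(apex_deg)          # the angle at the pole
    cos_a = math.cos(b) ** 2 + math.sin(b) ** 2 * math.cos(apex)
    return math.degrees(math.acos(cos_a))

for lat in (0.0, 30.0, 60.0):
    print(lat, miles_per_degree_longitude(lat), third_side_degrees(lat))
```

For an angle as small as one degree, the great-circle side is very close to cos θ degrees, so a degree of longitude at latitude θ is approximately cos θ times a degree of latitude.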

The computation of $\mathrm{E}(z_k) = e_k$ is found as $P_k$ × “average” force on county k.

This average force could be $f(\mathbf{c}_k)$, where $\mathbf{c}_k$ is the center of the county.

Instead we will use

$$\frac{1}{\text{Area}(\mathcal{B})}\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$$

where $\mathcal{B}$ denotes the county and $\mathbf{h}$ is the two-dimensional variable of integration.

The value of $\text{Area}(\mathcal{B})$ can be obtained from outside sources.

The challenge comes in finding $\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$.

This can be difficult even for simple figures; $\mathcal{B}$ is not simple.

Finding $\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$ requires some organized description of $\partial\mathcal{B}$, the boundary of $\mathcal{B}$.

Fortunately, such descriptions are available from mapping programs.

Consider this geographical region:

Mapping program MapInfo will export an MIF file giving coordinates of (longitude, latitude) points on the boundary.

The file has layout

    26
    -75 40.1288
    -75.0154 40.1378
    -75.1094 40.0454
    ...
    -75 40.0294
    -74.9755 40.0485
    -74.9893 40.1259
    -75 40.1288
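Reading such a boundary listing might look like the Python sketch below. It handles only the fragment shown here, a point count followed by that many longitude/latitude pairs, and ignores everything else a real MIF file contains. The function name and the small sample ring are hypothetical.

```python
def parse_boundary(text):
    """Parse a point count followed by that many 'longitude latitude' pairs."""
    tokens = text.split()
    count = int(tokens[0])
    numbers = [float(tok) for tok in tokens[1:1 + 2 * count]]
    if len(numbers) != 2 * count:
        raise ValueError("fewer coordinate pairs than the declared count")
    points = list(zip(numbers[0::2], numbers[1::2]))
    # The listing closes the ring by repeating the first point; drop the repeat.
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    return points

# Hypothetical four-corner ring (not the region from the slides).
sample = """5
-75 40
-74 40
-74 41
-75 41
-75 40"""
print(parse_boundary(sample))
```

Dropping the repeated closing point is a convenience for later formulas that treat the point index modulo the number of points.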

A graph of these points:

With the boundary so identified, county $\mathcal{B}$ is a polygon, so the task of finding $\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$ is equivalent to integrating over that polygon.

The mathematics can be done with Green’s theorem.

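Green’s theorem for connected region $\mathcal{B}$ and for scalar functions P and Q of two variables is

$$\oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right) = \iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$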

The boundary needs to be parameterized as a function of a single variable, say t. This is possible when the boundary is made up of simple curves or, as in the MapInfo story, straight lines.

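The line connecting $\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right)$ to $\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)$ is parameterized as

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} x_k^{\mathcal{B}} + \left(x_{k+1}^{\mathcal{B}} - x_k^{\mathcal{B}}\right)t \\ y_k^{\mathcal{B}} + \left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right)t \end{pmatrix}; \quad 0 \le t \le 1$$

Note that dy means $\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right)dt$.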

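In the statement of Green’s theorem,

$$\oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right) = \iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$

let’s use $\dfrac{\partial Q}{\partial x} = 1$ and $\dfrac{\partial P}{\partial y} = 0$ so that $\dfrac{\partial Q}{\partial x} - \dfrac{\partial P}{\partial y} = 1$.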

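Green’s theorem is now

$$\iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \iint_{\mathcal{B}} dx\,dy = \text{Area}(\mathcal{B}) = \oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right)$$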

This solves as

P(x, y) = 0 and Q(x, y) = x

and then

$$\text{Area}(\mathcal{B}) = \oint_{\partial\mathcal{B}} x\,dy$$

With the boundary given as a polygon, the calculation is routine.

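The consequence is

$$\text{Area}(\mathcal{B}) = \frac{1}{2}\sum_{k=1}^{m}\left(x_k^{\mathcal{B}}\,y_{k+1}^{\mathcal{B}} - x_{k+1}^{\mathcal{B}}\,y_k^{\mathcal{B}}\right)$$

where m is the number of boundary points of region $\mathcal{B}$.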

This calculation finds the area of region $\mathcal{B}$ and, as a side benefit, discovers whether the point ordering was clockwise or counter-clockwise.
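The polygon area sum, Area = ½ Σ (x_k y_{k+1} − x_{k+1} y_k) with the index wrapping from the last point back to the first, can be sketched in Python as follows. The function name is illustrative, and the unit square is a made-up test shape.

```python
def signed_area(points):
    """Signed polygon area: positive for counter-clockwise order, negative for clockwise."""
    m = len(points)
    total = 0.0
    for k in range(m):
        xk, yk = points[k]
        xn, yn = points[(k + 1) % m]     # the index wraps back to the first point
        total += xk * yn - xn * yk
    return total / 2.0

square_ccw = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(signed_area(square_ccw))           # counter-clockwise ordering
print(signed_area(square_ccw[::-1]))     # same square, clockwise ordering
```

The sign of the result reports the orientation, and its absolute value is the area.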

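We need also the integrated force function

$$\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$$

Match $\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h}$ to Green’s theorem

$$\oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right) = \iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$

with P(x, y) ≡ 0 and

$$\frac{\partial Q}{\partial x} = f(\mathbf{h}) = f(x, y)$$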

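This means that we need to be able to find

$$Q(x, y) = \int f(\mathbf{z})\,dx = \int f(x, y)\,dx$$

The solution is

$$Q(x, y) = c\,x + \frac{c\,\omega\,\alpha^2}{\sqrt{\alpha^2 + (y - s_y)^2}}\,\tan^{-1}\!\left(\frac{x - s_x}{\sqrt{\alpha^2 + (y - s_y)^2}}\right)$$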

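Then

$$\int_{\mathcal{B}} f(\mathbf{h})\,d\mathbf{h} = \oint_{\partial\mathcal{B}} Q(x, y)\,dy = \oint_{\partial\mathcal{B}} \left[c\,x + \frac{c\,\omega\,\alpha^2}{\sqrt{\alpha^2 + (y - s_y)^2}}\,\tan^{-1}\!\left(\frac{x - s_x}{\sqrt{\alpha^2 + (y - s_y)^2}}\right)\right] dy$$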

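Let $\left(x_1^{\mathcal{B}}, y_1^{\mathcal{B}}\right)$, $\left(x_2^{\mathcal{B}}, y_2^{\mathcal{B}}\right)$, …, $\left(x_m^{\mathcal{B}}, y_m^{\mathcal{B}}\right)$ be the boundary points of $\mathcal{B}$. Then

$$\oint_{\partial\mathcal{B}} Q(x, y)\,dy = \sum_{k=1}^{m} \int_{\text{segment } k} Q(x, y)\,dy$$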

Segment k connects point k to point k + 1.

(Last segment goes back to point 1.)

Each segment is parameterized by t, with 0 ≤ t ≤ 1. The integral can be found by any reasonable approximation method.

If the interval is short, use the average of the integrand at the endpoints.

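In particular,

$$\begin{aligned}
\int_{\text{Segment } k} Q(x, y)\,dy &\approx \int_{\text{Segment } k} \frac{Q\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\,dy \\
&= \int_0^1 \frac{Q\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right) dt \\
&= \frac{Q\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right)
\end{aligned}$$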

The summation over k collapses to the very simple form

$$\sum_{k=1}^{m} Q\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right)\frac{y_{k+1}^{\mathcal{B}} - y_{k-1}^{\mathcal{B}}}{2}$$

The counter k is to be interpreted mod(m).
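The collapsed boundary sum, Σ Q(x_k, y_k)(y_{k+1} − y_{k−1})/2 with indices taken mod m, can be sketched in Python. The sketch assumes the antiderivative Q(x, y) = c x + c ω α² / A · arctan((x − s_x)/A) with A = √(α² + (y − s_y)²), which follows from the mixed force f = c (1 + ω α² / (α² + ‖z − s‖²)). Function names and the unit-square test shape are illustrative only.

```python
import math

def q_antiderivative(x, y, sx, sy, alpha, omega, c=1.0):
    """Antiderivative in x of c * (1 + omega * alpha^2 / (alpha^2 + |z - s|^2))."""
    a = math.sqrt(alpha ** 2 + (y - sy) ** 2)
    return c * x + c * omega * alpha ** 2 / a * math.atan((x - sx) / a)

def integrated_force(points, sx, sy, alpha, omega, c=1.0):
    """Approximate integral of the force over a polygon from its boundary points.

    Uses the endpoint-average rule on each segment, so it is only as good as
    the boundary segments are short. Points are assumed counter-clockwise;
    a clockwise ordering flips the sign.
    """
    m = len(points)
    total = 0.0
    for k in range(m):
        xk, yk = points[k]
        y_next = points[(k + 1) % m][1]   # indices are interpreted mod m
        y_prev = points[(k - 1) % m][1]
        total += q_antiderivative(xk, yk, sx, sy, alpha, omega, c) * (y_next - y_prev) / 2.0
    return total

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(integrated_force(square, 0.5, 0.5, alpha=1.0, omega=0.0))  # omega = 0 gives the area
print(integrated_force(square, 0.5, 0.5, alpha=1.0, omega=1.0))
```

With ω = 0 the antiderivative reduces to c x and the sum reproduces the polygon area, which gives a quick sanity check. Dividing the result by the area gives the “average” force on the region.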

For any values of sx, sy, α, ω it is possible to express a Poisson likelihood and thus to get maximum likelihood estimates.

This is not an easy computation.
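As a small illustration of the likelihood being maximized, the Python sketch below writes out the Poisson log-likelihood for given expected counts and checks its link to the G² statistic. When the expected counts are scaled to match the observed total, twice the gap between the saturated and fitted log-likelihoods equals G², so maximizing the likelihood amounts to minimizing G². The search over the four parameters is not shown, and the function names and numbers are hypothetical.

```python
import math

def poisson_loglik(counts, expected):
    """Log-likelihood of independent Poisson counts with the given means."""
    total = 0.0
    for z, e in zip(counts, expected):
        log_term = z * math.log(e) if z > 0 else 0.0
        total += log_term - e - math.lgamma(z + 1)
    return total

def g2(counts, expected):
    """G^2 = 2 * sum z_k log(z_k / e_k); zero counts contribute nothing."""
    return 2.0 * sum(z * math.log(z / e) for z, e in zip(counts, expected) if z > 0)

# Made-up counts and fitted values with matching totals (both sum to 20).
z = [3, 7, 10]
e = [5.0, 5.0, 10.0]
deviance = 2.0 * (poisson_loglik(z, z) - poisson_loglik(z, e))
print(deviance, g2(z, e))
```

In a full fit, the expected counts would be recomputed from the integrated force for each trial value of the four parameters, and a numerical optimizer would search for the smallest G².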

Trevelyan, Smallman-Raynor, and Cliff provided a spatial analysis of the 1916 polio epidemic that hit the northeastern United States.

Trevelyan, Barry, Smallman-Raynor, Matthew, and Cliff, Andrew D. (2005) The Spatial Structure of Epidemic Emergence: Geographical Aspects of Poliomyelitis in North-eastern USA, July-October 1916, Journal of the Royal Statistical Society, Series A, vol 168, part 4, pp 701-722.

Their region of inquiry:

County-based data for 148 counties.

These counties had total population 20,532,602 and 20,777 cases of polio.

This is about 1.01 cases per thousand people.

Observed polio rates:

Test for uniformity gives

$G^2$ = 16,713.64 on 147 degrees of freedom

Maximum likelihood estimates:

$s_x$ = 450.78

$s_y$ = 135.77

α = 56.80

ω = 15.66

The center is offshore, east of Ocean County, New Jersey.

The display of fitted rates:

Fit measure is $G^2$ = 7,045.73 on 143 degrees of freedom.

Reduction in $G^2$ is 9,667.91, for four degrees of freedom.

Next step: Use the integrated force function as a carrier in a Poisson regression.

The End
