Pattern Statistics Michael F. Goodchild University of California Santa Barbara

Post on 22-Dec-2015

TRANSCRIPT

Pattern Statistics

Michael F. Goodchild

University of California

Santa Barbara

Outline

Some examples of analysis
Objectives of analysis
Cross-sectional analysis
Point patterns

What are we trying to do?

Infer process
– processes leave distinct fingerprints on the landscape
– several processes can leave the same fingerprints
  • enlist time to resolve ambiguity
  • invoke Occam's Razor
  • confirm a previously identified hypothesis

Alternatives

Expose aspects of pattern that are otherwise invisible
– Openshaw
– Cova
Expose anomalies, patterns
Convince others of the existence of patterns, problems, anomalies

Cross-sectional analysis

Social data collected in cross-section
– longitudinal data are difficult to construct
– difficult for bureaucracies to sustain
– compare temporal resolution of process to temporal resolution of bureaucracy
Cross-sectional perspectives are rich in context
– can never confirm process
– though they can perhaps falsify
– useful source of hypotheses, insights

What kinds of patterns are of interest?

Unlabeled objects
– how does density vary?
– do locations influence each other?
– are there clusters?
Labeled objects
– is the arrangement of labels random?
– or do similar labels cluster?
– or do dissimilar labels cluster?

First-order effects

Random process (CSR)
– all locations are equally likely
– an event does not make other events more likely in the immediate vicinity
First-order effect
– events are more likely in some locations than others
– events may still be independent
– varying density
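
The contrast between a random process and a first-order effect can be sketched by simulation. This is an illustration only, not part of the original slides: the unit-square study area, the function names, and the choice of a density that rises linearly with x are all assumptions. In both patterns the events are placed independently of one another; only the density differs.

```python
import random

def simulate_csr(n, rng):
    """n independent events; every location in the unit square is equally likely."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def simulate_first_order(n, rng):
    """n independent events whose density rises linearly with x (an assumed trend)."""
    events = []
    while len(events) < n:
        x, y = rng.random(), rng.random()
        # Rejection sampling: keep a candidate with probability x, so events
        # are more likely at large x but still do not influence each other.
        if rng.random() < x:
            events.append((x, y))
    return events

rng = random.Random(42)  # fixed seed so the run is repeatable
csr = simulate_csr(500, rng)
trend = simulate_first_order(500, rng)

mean_x_csr = sum(x for x, _ in csr) / len(csr)
mean_x_trend = sum(x for x, _ in trend) / len(trend)
print(f"mean x under CSR:          {mean_x_csr:.2f}")
print(f"mean x with density trend: {mean_x_trend:.2f}")
```

Under these assumptions the mean x coordinate should sit near 1/2 for the random pattern and near 2/3 for the pattern with a trend, even though neither pattern involves any interaction between events.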

Second-order effects

Event makes others more or less likely in the immediate vicinity
– clustering
– but is a cluster the result of first- or second-order effects?
– is there a prior reason to expect variation in density?

Testing methods

Counts by quadrat
– Poisson distribution

$$P(r) = \frac{e^{-m} m^r}{r!}$$

Deaths by horse-kick in the Prussian army

Mean m = 0.61, n = 200

Deaths per year             0      1      2      3      4
Probability                 0.543  0.331  0.101  0.021  0.003
Number of years expected    109.0  66.3   20.2   4.1    0.6
Number of years observed    109    65     22     3      1
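
The probabilities and expected counts for the horse-kick data can be recomputed from the Poisson formula P(r) = exp(-m) m^r / r!, where r is the number of deaths in a year and m is the mean. The sketch below is an illustration rather than the author's own calculation; the function name `poisson_pmf` is hypothetical. The first expected count computes to about 108.7, slightly below the tabulated 109.0; the other values agree with the table to the precision shown.

```python
import math

def poisson_pmf(r, m):
    """Poisson probability of exactly r events when the mean is m."""
    return math.exp(-m) * m**r / math.factorial(r)

m, n = 0.61, 200                  # mean deaths per year, number of years
observed = [109, 65, 22, 3, 1]    # years observed with 0, 1, 2, 3, 4 deaths

for r, obs in enumerate(observed):
    p = poisson_pmf(r, m)
    print(f"r={r}  P={p:.3f}  expected={n * p:5.1f}  observed={obs}")
```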

Towns in Iowa

1173 towns, 154 quadrats, each 20 mi by 10 mi

Towns per quadrat    Quadrats observed    Quadrats expected
0                    3                    2.4
1                    10                   9.9
2                    11                   20.6
3                    31                   28.7
4                    35                   30.0
5                    28                   25.0
6                    23                   17.4
7                    6                    10.4
8                    6                    5.4
9+                   1                    4.0

Chi-square with 8 df = 12.7
Accept H0
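
The chi-square statistic for the Iowa quadrat counts can be recomputed from the observed and expected numbers of quadrats listed above. This is a sketch, not the author's calculation. Working from the rounded expected counts gives a value near 12.0 rather than the quoted 12.7, presumably because the original used unrounded expected counts. The degrees of freedom are taken as the number of classes minus two, which is an assumption that matches the quoted 8 df. Either value falls below 15.507, the 5% critical value of chi-square with 8 df, so the conclusion is unchanged.

```python
# Quadrats observed and expected with 0, 1, ..., 8, and 9+ towns
observed = [3, 10, 11, 31, 35, 28, 23, 6, 6, 1]
expected = [2.4, 9.9, 20.6, 28.7, 30.0, 25.0, 17.4, 10.4, 5.4, 4.0]

# Pearson chi-square: sum over classes of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumed: one df lost for the fixed total, one for the estimated mean
df = len(observed) - 2

CRITICAL_5PCT_8DF = 15.507  # chi-square critical value, 8 df, 5% level

decision = "accept H0" if chi_square < CRITICAL_5PCT_8DF else "reject H0"
print(f"chi-square = {chi_square:.2f} with {df} df: {decision}")
```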

Distance to nearest neighbor

Observed mean distance $r_o$
Expected mean distance $r_e = 1 / (2\sqrt{d})$
– where d is density per unit area
Test statistic:

$$z = \frac{r_o - r_e}{0.26136 / \sqrt{n d}}$$

– where n is the number of points tested

Towns in Iowa

622 points tested
643 per unit area
Observed mean distance 3.52
Expected mean distance 3.46
Test statistic 0.82
Accept H0
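
The nearest-neighbor figures for Iowa can be reproduced with expected mean distance 1/(2 sqrt(d)) and standard error 0.26136/sqrt(n d). The sketch below rests on one loud assumption that the slides do not state: that "643 per unit area" means 643 towns spread over the whole study area, taken here as 154 quadrats of 20 mi by 10 mi (30,800 square miles). The function name is hypothetical. Under that assumption the expected distance and the test statistic both match the quoted values.

```python
import math

def nearest_neighbor_z(r_o, n, d):
    """Return (expected mean distance, z) for observed mean distance r_o,
    n points tested, and density d per unit area."""
    r_e = 1.0 / (2.0 * math.sqrt(d))
    standard_error = 0.26136 / math.sqrt(n * d)
    return r_e, (r_o - r_e) / standard_error

# Assumption: study area = 154 quadrats, each 20 mi by 10 mi
area_sq_mi = 154 * 20 * 10
d = 643 / area_sq_mi          # towns per square mile

r_e, z = nearest_neighbor_z(r_o=3.52, n=622, d=d)
print(f"expected mean distance = {r_e:.2f} mi, z = {z:.2f}")
```

A z value this small is well inside the usual plus or minus 1.96 limits of a two-sided test at the 5% level, which is consistent with accepting H0.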

But what about scale?

A pattern can be clustered at one scale and random or dispersed at another
Poisson test
– scale reflected in quadrat size
Nearest-neighbor test
– scale reflected in choosing nearest neighbor
– higher-order neighbors could be analyzed

Weaknesses of these simple methods

Difficulty of dealing with scale
Second-order effects only
– density assumed uniform
Better methods are needed

K-function analysis

K(h) = expected number of events within h of an arbitrarily chosen event, divided by d
How to estimate K?
– take an event i
– for every event j lying within h of i:
  • score 1

Allowing for edge effects

score < 1

The K function

In CSR $K(h) = \pi h^2$
So instead plot:

$$L(h) = \left[ \frac{K(h)}{\pi} \right]^{0.5} - h$$
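
A naive estimate of K follows directly from its definition: count the events within h of each event, average, and divide by the density d. The sketch below is a minimal illustration with hypothetical function names. It scores every neighbor as 1 and makes no allowance for edge effects, so it will understate K near the boundary of the study area. The transformation L(h) = sqrt(K(h)/pi) - h assumes the CSR expectation K(h) = pi h^2, so L stays near zero for a random pattern.

```python
import math

def k_estimate(points, h, area):
    """Naive K(h): mean number of other events within h of an event,
    divided by density. No edge correction is applied."""
    n = len(points)
    density = n / area
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                pairs += 1          # score 1 for each j within h of i
    mean_count = pairs / n
    return mean_count / density

def l_function(k, h):
    """L(h) = sqrt(K(h) / pi) - h; zero when K(h) equals pi * h**2."""
    return math.sqrt(k / math.pi) - h

# Tiny worked case: four events at the corners of a unit square
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k = k_estimate(corners, h=1.1, area=1.0)
print(f"K(1.1) = {k:.2f}, L(1.1) = {l_function(k, 1.1):.2f}")
```

In the worked case each corner has two neighbors within 1.1 (the diagonal neighbor is farther away), the density is 4, and so K is 2/4 = 0.5.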

What about labeled points?

How are the points located?
– random, clustered, dispersed
How are the values assigned among the points?
– among possible arrangements
– random
– clustered
– dispersed

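Moran and Geary indices

$$c = \frac{(n-1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2 \sum_i \sum_j w_{ij} \sum_i (x_i - a)^2}$$

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - a)(x_j - a)}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \sum_{i=1}^{n} (x_i - a)^2}$$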
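
Both indices can be computed from a list of values and a matrix of weights. The sketch below is illustrative and makes assumptions that the slides do not spell out: `a` is taken to be the mean of the values, `w[i][j]` is a spatial weight between objects i and j (here 1 for adjacent objects, 0 otherwise), and the function name is hypothetical. The example places four values in a row so that similar values are neighbors.

```python
def moran_geary(x, w):
    """Return (Moran's I, Geary's c) for values x and weight matrix w."""
    n = len(x)
    a = sum(x) / n                                   # assumed: a is the mean
    index = [(i, j) for i in range(n) for j in range(n)]
    w_sum = sum(w[i][j] for i, j in index)
    sum_sq = sum((xi - a) ** 2 for xi in x)
    # Moran: weighted cross-products of deviations from the mean
    cross = sum(w[i][j] * (x[i] - a) * (x[j] - a) for i, j in index)
    # Geary: weighted squared differences between pairs of values
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i, j in index)
    moran_i = n * cross / (w_sum * sum_sq)
    geary_c = (n - 1) * sq_diff / (2 * w_sum * sum_sq)
    return moran_i, geary_c

# Four objects in a row; each is a neighbor of the objects beside it
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
moran_i, geary_c = moran_geary(x, w)
print(f"Moran I = {moran_i:.3f}, Geary c = {geary_c:.3f}")
```

For this steadily rising sequence I is positive (1/3) and c is below 1 (0.3), both of which indicate that similar values sit next to each other.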
