
Pattern Statistics

Michael F. Goodchild

University of California

Santa Barbara

Outline

Some examples of analysis
Objectives of analysis
Cross-sectional analysis
Point patterns

What are we trying to do?

Infer process
– processes leave distinct fingerprints on the landscape
– several processes can leave the same fingerprints
  • enlist time to resolve ambiguity
  • invoke Occam's Razor
  • confirm a previously identified hypothesis

Alternatives

Expose aspects of pattern that are otherwise invisible
– Openshaw
– Cova
Expose anomalies, patterns
Convince others of the existence of patterns, problems, anomalies

Cross-sectional analysis

Social data collected in cross-section
– longitudinal data are difficult to construct
– difficult for bureaucracies to sustain
– compare temporal resolution of process to temporal resolution of bureaucracy
Cross-sectional perspectives are rich in context
– can never confirm process
– though they can perhaps falsify
– useful source of hypotheses, insights

What kinds of patterns are of interest?

Unlabeled objects
– how does density vary?
– do locations influence each other?
– are there clusters?
Labeled objects
– is the arrangement of labels random?
– or do similar labels cluster?
– or do dissimilar labels cluster?

First-order effects

Random process (complete spatial randomness, CSR)
– all locations are equally likely
– an event does not make other events more likely in the immediate vicinity
First-order effect
– events are more likely in some locations than others
– events may still be independent
– varying density
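The distinction between CSR and a first-order effect can be sketched with a small simulation. This is only an illustration under stated assumptions: the unit-square study area, the linear density trend in x, and the function names `simulate_csr` and `simulate_first_order` are all hypothetical choices, not taken from the slides. In both patterns the events are independent; only the density differs.

```python
import random

def simulate_csr(n, rng):
    """n independent events, every location in the unit square equally likely."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def simulate_first_order(n, rng):
    """n independent events whose density increases with x (rejection sampling)."""
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        # Keep the candidate with probability x, so density is proportional to x.
        if rng.random() < x:
            points.append((x, y))
    return points

rng = random.Random(42)
csr = simulate_csr(2000, rng)
trend = simulate_first_order(2000, rng)

# Share of events falling in the right-hand half of the square.
csr_right = sum(x > 0.5 for x, _ in csr) / len(csr)
trend_right = sum(x > 0.5 for x, _ in trend) / len(trend)
print(f"CSR: {csr_right:.2f}  first-order trend: {trend_right:.2f}")
```

Under CSR about half of the events should fall in each half of the square; with the assumed trend roughly three quarters should fall in the right-hand half, even though no event influences any other.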

Second-order effects

Event makes others more or less likely in the immediate vicinity
– clustering
– but is a cluster the result of first- or second-order effects?
– is there a prior reason to expect variation in density?

Testing methods

Counts by quadrat
– Poisson distribution

$P(r) = \dfrac{e^{-m} m^{r}}{r!}$

– where m is the mean count per quadrat and r is the count

Deaths by horse-kick in the Prussian army

Mean m = 0.61, n = 200

Deaths per yr              0      1      2      3      4
Probability                0.543  0.331  0.101  0.021  0.003
Number of years expected   109.0  66.3   20.2   4.1    0.6
Number of years observed   109    65     22     3      1
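The probabilities and expected counts in this example follow from the Poisson formula P(r) = e^(-m) m^r / r!. A minimal sketch of that arithmetic, using m = 0.61 and n = 200 from the slide (the function name `poisson_pmf` is a hypothetical helper):

```python
import math

def poisson_pmf(r, m):
    """P(r) = exp(-m) * m**r / r! for a Poisson distribution with mean m."""
    return math.exp(-m) * m**r / math.factorial(r)

m, n = 0.61, 200                    # mean deaths per year and number of years, from the slide
observed = [109, 65, 22, 3, 1]      # years observed with 0, 1, 2, 3, 4 deaths

for r, obs in enumerate(observed):
    p = poisson_pmf(r, m)
    print(f"r={r}  P={p:.3f}  expected={n * p:5.1f}  observed={obs}")
```

Small differences from the tabulated values may reflect rounding on the slide.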

Towns in Iowa

1173 towns, 154 quadrats 20mi by 10mi

Towns per quadrat   Observed quadrats   Expected quadrats
0                   3                   2.4
1                   10                  9.9
2                   11                  20.6
3                   31                  28.7
4                   35                  30.0
5                   28                  25.0
6                   23                  17.4
7                   6                   10.4
8                   6                   5.4
9+                  1                   4.0

Chi-square with 8 df = 12.7

Accept H0
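The quadrat test compares the observed and expected numbers of quadrats with a chi-square statistic. The sketch below assumes the two numeric columns of the Iowa table are observed and expected quadrat counts, and reads the 8 degrees of freedom as 10 classes, minus 1, minus 1 for the mean estimated from the data; both are interpretations rather than statements from the slide. Because it uses the rounded expected counts shown, it need not reproduce 12.7 exactly.

```python
# Observed and (rounded) expected numbers of quadrats with 0, 1, ..., 8, 9+ towns.
observed = [3, 10, 11, 31, 35, 28, 23, 6, 6, 1]
expected = [2.4, 9.9, 20.6, 28.7, 30.0, 25.0, 17.4, 10.4, 5.4, 4.0]

def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

stat = chi_square(observed, expected)
df = len(observed) - 2          # assumed: classes - 1 - 1 estimated parameter
CRITICAL_5PCT_8DF = 15.507      # upper 5% point of chi-square with 8 df

print(f"chi-square = {stat:.1f} with {df} df")
print("reject H0" if stat > CRITICAL_5PCT_8DF else "do not reject H0")
```

The statistic falls below the 5% critical value, which is consistent with the slide's conclusion that the counts do not contradict a random process.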

Distance to nearest neighbor

Observed mean distance $r_o$

Expected mean distance $r_e = \dfrac{1}{2\sqrt{d}}$
– where d is density per unit area

Test statistic:

$z = \dfrac{r_o - r_e}{0.26136 / \sqrt{n d}}$

– where n is the number of points tested

Towns in Iowa

622 points tested
643 per unit area
Observed mean distance 3.52
Expected mean distance 3.46
Test statistic 0.82
Accept H0
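The figures in this example can be checked with the usual form of the nearest-neighbor test, in which the expected mean distance under CSR is 1/(2√d) and its standard error is 0.26136/√(nd). The sketch below rests on one loud assumption: that "643 per unit area" means 643 towns over the whole study region of 154 quadrats of 20 mi by 10 mi, so that d = 643 / 30800 towns per square mile. That reading is an inference, not something the slide states; the function name `nearest_neighbor_z` is hypothetical.

```python
import math

def nearest_neighbor_z(r_obs, n, d):
    """Return (expected mean distance under CSR, z statistic).

    r_obs: observed mean nearest-neighbor distance
    n:     number of points tested
    d:     density of points per unit area
    """
    r_exp = 1.0 / (2.0 * math.sqrt(d))
    standard_error = 0.26136 / math.sqrt(n * d)
    return r_exp, (r_obs - r_exp) / standard_error

# ASSUMPTION: the unit area is the full set of 154 quadrats, each 20 mi x 10 mi.
study_area = 154 * 20 * 10          # square miles
d = 643 / study_area                # towns per square mile

r_exp, z = nearest_neighbor_z(r_obs=3.52, n=622, d=d)
print(f"expected mean distance = {r_exp:.2f} mi, z = {z:.2f}")
```

Under that assumption the sketch gives an expected distance of about 3.46 and a test statistic of about 0.82, in line with the values on the slide; |z| is well below 1.96, so the random hypothesis is not rejected at the 5% level.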

But what about scale?

A pattern can be clustered at one scale and random or dispersed at another
Poisson test
– scale reflected in quadrat size
Nearest-neighbor test
– scale reflected in choosing nearest neighbor
– higher-order neighbors could be analyzed

Weaknesses of these simple methods

Difficulty of dealing with scale
Second-order effects only
– density assumed uniform
Better methods are needed

K-function analysis

K(h) = expected number of events within h of an arbitrarily chosen event, divided by d

How to estimate K?
– take an event i
– for every event j lying within h of i:
  • score 1

Allowing for edge effects

score < 1

The K function

In CSR $K(h) = \pi h^{2}$

So instead plot:

$L(h) = \left[ K(h) / \pi \right]^{0.5} - h$
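The point of the transformation is that L(h) = √(K(h)/π) − h is zero at every distance when K(h) takes its CSR value πh², so departures from randomness show up as departures from a horizontal line. A brief sketch, with `l_function` as a hypothetical helper name and the K values chosen purely for illustration:

```python
import math

def l_function(k_value, h):
    """L(h) = sqrt(K(h) / pi) - h; zero when K(h) equals the CSR value pi * h**2."""
    return math.sqrt(k_value / math.pi) - h

h = 0.1
l_csr = l_function(math.pi * h**2, h)               # K at its CSR value
l_clustered = l_function(2.0 * math.pi * h**2, h)   # twice as many neighbors as CSR
l_dispersed = l_function(0.5 * math.pi * h**2, h)   # half as many neighbors as CSR

print(l_csr, l_clustered, l_dispersed)
```

Positive values indicate more neighbors within h than a random pattern would give (clustering at that scale), and negative values indicate fewer (dispersion).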

What about labeled points?

How are the points located?
– random, clustered, dispersed
How are the values assigned among the points?
– among possible arrangements
– random
– clustered
– dispersed

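Moran and Geary indices

$I = \dfrac{n \sum_i \sum_j w_{ij} (x_i - a)(x_j - a)}{\sum_i \sum_j w_{ij} \, \sum_i (x_i - a)^{2}}$

– where $w_{ij}$ is the weight for the pair of points i and j, $x_i$ is the value at point i, and a is the mean of the values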
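As an illustration of how such an index responds to the arrangement of values, here is a minimal sketch of Moran's I in its standard form, n Σ_i Σ_j w_ij (x_i − a)(x_j − a) divided by (Σ_i Σ_j w_ij) Σ_i (x_i − a)², with a taken to be the mean of the values. The four-point example and the adjacency weights are hypothetical, and `morans_i` is an illustrative name.

```python
def morans_i(x, w):
    """Moran's I for values x and a square weights matrix w (list of lists)."""
    n = len(x)
    a = sum(x) / n                                   # mean of the values
    dev = [xi - a for xi in x]
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    w_total = sum(sum(row) for row in w)
    return n * cross / (w_total * sum(d * d for d in dev))

# Hypothetical example: four points in a row, weight 1 between adjacent points.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_similar = morans_i([1, 2, 3, 4], w)      # similar values are neighbors
i_dissimilar = morans_i([1, 4, 1, 4], w)   # dissimilar values are neighbors
print(i_similar, i_dissimilar)
```

The first arrangement gives a positive index (similar labels cluster) and the second a negative one (dissimilar labels cluster), matching the questions posed for labeled objects.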
