Pattern Statistics: Michael F. Goodchild, University of California, Santa Barbara
Post on 22-Dec-2015
Pattern Statistics
Michael F. Goodchild
University of California
Santa Barbara
Outline
Some examples of analysis
Objectives of analysis
Cross-sectional analysis
Point patterns
What are we trying to do?
Infer process
  – processes leave distinct fingerprints on the landscape
  – several processes can leave the same fingerprints
    • enlist time to resolve ambiguity
    • invoke Occam's Razor
    • confirm a previously identified hypothesis
Alternatives
Expose aspects of pattern that are otherwise invisible
  – Openshaw
  – Cova
Expose anomalies, patterns
Convince others of the existence of patterns, problems, anomalies
Cross-sectional analysis
Social data collected in cross-section
  – longitudinal data are difficult to construct
  – difficult for bureaucracies to sustain
  – compare temporal resolution of process to temporal resolution of bureaucracy
Cross-sectional perspectives are rich in context
  – can never confirm process
  – though they can perhaps falsify
  – useful source of hypotheses, insights
What kinds of patterns are of interest?
Unlabeled objects
  – how does density vary?
  – do locations influence each other?
  – are there clusters?
Labeled objects
  – is the arrangement of labels random?
  – or do similar labels cluster?
  – or do dissimilar labels cluster?
First-order effects
Random process (CSR)
  – all locations are equally likely
  – an event does not make other events more likely in the immediate vicinity
First-order effect
  – events are more likely in some locations than others
  – events may still be independent
  – varying density
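The contrast between CSR and a first-order effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the 10 by 10 study area, and the west-to-east linear density trend are hypothetical choices made for illustration and do not come from the slides. In both cases the events are placed independently of one another; only the density differs.

```python
import random

def simulate_csr(n, width, height, rng):
    """CSR: every location is equally likely and events are independent."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def simulate_first_order(n, width, height, rng):
    """First-order effect: density rises linearly from west to east.

    Events are still independent of each other. Candidates are drawn
    uniformly and kept with probability x / width (rejection sampling).
    """
    points = []
    while len(points) < n:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if rng.random() < x / width:
            points.append((x, y))
    return points

def east_share(points, width):
    """Fraction of events that fall in the eastern half of the area."""
    return sum(1 for x, _ in points if x > width / 2) / len(points)

rng = random.Random(42)  # fixed seed so the run is repeatable
csr = simulate_csr(2000, 10.0, 10.0, rng)
trend = simulate_first_order(2000, 10.0, 10.0, rng)

print(f"CSR east share:         {east_share(csr, 10.0):.2f}")
print(f"First-order east share: {east_share(trend, 10.0):.2f}")
```

Under CSR about half of the events should fall in the eastern half. Under the assumed linear trend the theoretical share is 0.75, even though no event influences any other.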
Second-order effects
Event makes others more or less likely in the immediate vicinity
  – clustering
  – but is a cluster the result of first- or second-order effects?
  – is there a prior reason to expect variation in density?
Testing methods
Counts by quadrat
  – Poisson distribution

$$P(r) = \frac{e^{-m} m^{r}}{r!}$$
Deaths by horse-kick in the Prussian army
Mean m = 0.61, n = 200

Deaths per yr               0       1       2       3       4
Probability                 0.543   0.331   0.101   0.021   0.003
Number of years expected    109.0   66.3    20.2    4.1     0.6
Number of years observed    109     65      22      3       1
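The probabilities in the horse-kick example follow from the Poisson formula with m = 0.61. A minimal sketch of that arithmetic is given here; the function name `poisson_probability` is a hypothetical helper, while the mean, the sample size, and the observed counts are taken from the slide.

```python
import math

def poisson_probability(r, m):
    """P(r) = e^(-m) * m^r / r! for a Poisson distribution with mean m."""
    return math.exp(-m) * m ** r / math.factorial(r)

m = 0.61                        # mean deaths per year (from the slide)
n = 200                         # number of observations (from the slide)
observed = [109, 65, 22, 3, 1]  # observed counts for 0..4 deaths (from the slide)

probabilities = [poisson_probability(r, m) for r in range(5)]
expected = [n * p for p in probabilities]

for r, (p, e, o) in enumerate(zip(probabilities, expected, observed)):
    print(f"{r} deaths: P = {p:.3f}, expected = {e:.1f}, observed = {o}")
```

The mean itself can be recovered from the observed counts: (65 + 2 × 22 + 3 × 3 + 4 × 1) / 200 = 0.61.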
Towns in Iowa
1173 towns, 154 quadrats 20 mi by 10 mi

Towns per quadrat   Observed   Expected
0                   3          2.4
1                   10         9.9
2                   11         20.6
3                   31         28.7
4                   35         30.0
5                   28         25.0
6                   23         17.4
7                   6          10.4
8                   6          5.4
9+                  1          4.0

Chi-square with 8 df = 12.7
Accept H0
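The quadrat test compares observed and expected frequencies with a chi-square statistic. The sketch below recomputes it from the rounded values in the table, so the result is close to, but not identical with, the 12.7 reported on the slide. Two things are assumptions rather than statements from the source: that the 8 degrees of freedom come from 10 classes minus 1, minus 1 for the estimated mean, and that the test is run at the 5% level, for which the critical value with 8 df is 15.507.

```python
# (towns per quadrat, observed quadrats, expected quadrats), copied from the table
rows = [
    ("0", 3, 2.4), ("1", 10, 9.9), ("2", 11, 20.6), ("3", 31, 28.7),
    ("4", 35, 30.0), ("5", 28, 25.0), ("6", 23, 17.4), ("7", 6, 10.4),
    ("8", 6, 5.4), ("9+", 1, 4.0),
]

def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [o for _, o, _ in rows]
expected = [e for _, _, e in rows]

statistic = chi_square(observed, expected)
df = len(rows) - 2          # assumption: classes - 1 - one estimated parameter
CRITICAL_5PCT_DF8 = 15.507  # chi-square critical value, 8 df, alpha = 0.05 (assumed level)

reject_h0 = statistic > CRITICAL_5PCT_DF8
print(f"chi-square = {statistic:.2f} with {df} df, reject H0: {reject_h0}")
```

Because the statistic falls below the critical value, the hypothesis of a random (Poisson) pattern is not rejected, which agrees with the slide's conclusion.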
Distance to nearest neighbor
Observed mean distance $r_o$
Expected mean distance $r_e = \frac{1}{2\sqrt{d}}$
  – where d is density per unit area
Test statistic:

$$z = \frac{r_o - r_e}{0.26136 / \sqrt{n d}}$$
Towns in Iowa
622 points tested
643 per unit area
Observed mean distance 3.52
Expected mean distance 3.46
Test statistic 0.82
Accept H0
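The nearest-neighbor figures for Iowa can be checked with a short calculation. This is a sketch under one loud assumption: the density is taken to be 643 towns spread over the 154 quadrats of 20 mi by 10 mi from the quadrat example. The slides do not state this directly, although that density does reproduce the expected mean distance and the test statistic shown. The function name is hypothetical.

```python
import math

def nearest_neighbor_test(r_observed, n, density):
    """Return (expected mean distance, z) for the nearest-neighbor test.

    r_e = 1 / (2 * sqrt(d)); the standard error of the mean
    nearest-neighbor distance under CSR is 0.26136 / sqrt(n * d).
    """
    r_expected = 1.0 / (2.0 * math.sqrt(density))
    standard_error = 0.26136 / math.sqrt(n * density)
    z = (r_observed - r_expected) / standard_error
    return r_expected, z

study_area = 154 * 20 * 10   # square miles; ASSUMED from the quadrat example
density = 643 / study_area   # towns per square mile (assumption, see above)

r_e, z = nearest_neighbor_test(r_observed=3.52, n=622, density=density)
print(f"expected mean distance = {r_e:.2f} mi, z = {z:.2f}")
```

A z value well inside the usual ±1.96 bounds gives no grounds to reject a random pattern at this scale.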
But what about scale?
A pattern can be clustered at one scale and random or dispersed at another
Poisson test
  – scale reflected in quadrat size
Nearest-neighbor test
  – scale reflected in choosing nearest neighbor
  – higher-order neighbors could be analyzed
Weaknesses of these simple methods
Difficulty of dealing with scale
Second-order effects only
  – density assumed uniform
Better methods are needed
K-function analysis
K(h) = expected number of events within h of an arbitrarily chosen event, divided by d
How to estimate K?
  – take an event i
  – for every event j lying within h of i:
    • score 1
Allowing for edge effects
score < 1
The K function
In CSR $K(h) = \pi h^{2}$
So instead plot:

$$L(h) = \left[ \frac{K(h)}{\pi} \right]^{0.5} - h$$
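The scoring procedure for K and the transformation to L can be sketched as follows. This is a naive estimator that scores 1 for every pair within h and applies no edge correction, so it will underestimate K near the boundary of the study area. The function names and the four-point toy dataset are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math

def estimate_k(points, h, area):
    """Naive K(h): mean number of other events within h of an event,
    divided by the density d = n / area. No edge correction is applied."""
    n = len(points)
    score = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                score += 1  # event j lies within h of event i
    density = n / area
    return (score / n) / density

def l_function(k_value, h):
    """L(h) = sqrt(K(h) / pi) - h, which is 0 for all h under CSR."""
    return math.sqrt(k_value / math.pi) - h

# Toy data: corners of a unit square inside an assumed 2 x 2 study area
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_hat = estimate_k(points, h=1.0, area=4.0)
l_hat = l_function(k_hat, h=1.0)
print(f"K(1) = {k_hat:.3f}, L(1) = {l_hat:.3f}")
```

Plotting L(h) against h makes departures from CSR easier to read than plotting K(h) directly: positive values suggest clustering at that distance and negative values suggest dispersion.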
What about labeled points?
How are the points located?
  – random, clustered, dispersed
How are the values assigned among the points?
  – among possible arrangements
  – random
  – clustered
  – dispersed
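Moran and Geary indices

$$c = \frac{(n-1) \sum_{i} \sum_{j} w_{ij} (x_i - x_j)^{2}}{2 \sum_{i} \sum_{j} w_{ij} \sum_{i} (x_i - a)^{2}}$$

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - a)(x_j - a)}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \sum_{i=1}^{n} (x_i - a)^{2}}$$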
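Both indices can be computed directly from a list of values and a weights matrix. The sketch below assumes that a denotes the mean of the values and that w_ij is the weight linking objects i and j; the slides do not define either symbol explicitly. The function names, the four-value example, and the binary adjacency weights for objects arranged in a line are hypothetical.

```python
def moran_i(x, w):
    """Moran's I: n * sum_ij w_ij (x_i - a)(x_j - a) / (sum_ij w_ij * sum_i (x_i - a)^2)."""
    n = len(x)
    a = sum(x) / n  # assumed meaning of 'a': the mean of x
    cross = sum(w[i][j] * (x[i] - a) * (x[j] - a) for i in range(n) for j in range(n))
    total_w = sum(sum(row) for row in w)
    variance_sum = sum((xi - a) ** 2 for xi in x)
    return n * cross / (total_w * variance_sum)

def geary_c(x, w):
    """Geary's c: (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * sum_ij w_ij * sum_i (x_i - a)^2)."""
    n = len(x)
    a = sum(x) / n
    squared_diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    total_w = sum(sum(row) for row in w)
    variance_sum = sum((xi - a) ** 2 for xi in x)
    return (n - 1) * squared_diff / (2 * total_w * variance_sum)

# Toy example: four objects in a line, each adjacent to its immediate neighbors
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_value = moran_i(values, weights)
c_value = geary_c(values, weights)
print(f"Moran's I = {i_value:.3f}, Geary's c = {c_value:.3f}")
```

For this smoothly increasing sequence, similar values sit next to each other, so Moran's I is positive and Geary's c is below 1. The two indices run in opposite directions: clustering of similar values pushes I up and c down.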