statistics - zhejiangzhejiang.cs.ua.edu/.../spatialbigdata/slides/spatialstatistics.pdf · what is...
TRANSCRIPT
![Page 2: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/2.jpg)
What is Spatial Statistics?• Statistics• The study of collection, analysis, interpretation of data• Descriptive v.s. inferential
• Spatial statistics• Statistics for spatial data (point, line, polygon, raster)• Variables indexed in 2D or 3D, random locations• Unique properties
• Non i.i.d.• Spatial autocorrelation• Isotropy v.s. anisotropy• Stationarity v.s. non-stationarity
2
![Page 3: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/3.jpg)
Categories of Spatial Statistics• Geostatistics – point reference data• 𝑌 𝑠 : 𝑠 ∈ 𝐷 ,𝐷is 𝑟-dimensional Euclidean space• 𝑌 is random, 𝑠 is fixed
• Lattice statistics – areal data• 𝑌 𝑠 : 𝑠 ∈ 𝐷 ,𝐷 is a tessellation of Euclidean space• 𝑌 is random, 𝑠 is fixed
• Spatial point process – spatial point event data• 𝑠), 𝑠*, 𝑠+, … , 𝑠- , 𝑠./sare point locations• 𝑠 is random
3U.S.riverstreamgaugeobservations Electionresultbycountyin2016 Shooting,Chicago2010Source:http://assets.dnainfo.comSource:http://brilliantmaps.comSource:USGS.gov
![Page 4: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/4.jpg)
Part I: Geostatistics• Point reference data• A stochastic process: 𝑌 𝑠 : 𝑠 ∈ 𝐷 ,𝐷 is 𝑟-dimensional
Euclidean space• Example:
• What is Geostatistics used for?• Exploratory data analysis• Spatial interpolation
4
U.S.riverstreamgaugeobservations Measuresatminingsites
![Page 5: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/5.jpg)
Applications• Estimate precipitation based on records at a set of
weather stations• Infer ground water level based on sensor readings of
a set of gauges• Predict mineral resources based on samples at a
limited number of sites
5U.S.riverstreamgaugeobservations
![Page 6: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/6.jpg)
Spatial Stationarity• 𝑌 𝑠 : 𝑠 ∈ 𝐷 , 𝐷 is 𝑟-dimensional Euclidean space• 𝑌 𝑠 is strictly stationary when• Distribution unchanged when locations shifted• For any 𝑛 ≥ 1, any 𝑛locations 𝑠), 𝑠*, 𝑠+, … , 𝑠- , ℎ ∈ 𝑅6• 𝑌(𝑠)), 𝑌(𝑠*), … , 𝑌(𝑠- ) , 𝑌(𝑠) + ℎ), 𝑌(𝑠* + ℎ), … , 𝑌(𝑠- + ℎ) has
same distribution• 𝑌 𝑠 is weakly stationary when• Mean, (co)variance unchanged when locations shifted• 𝐸 𝑌 𝑠 = 𝜇= ≡ 𝜇 (constant mean)• 𝐶𝑜𝑣 𝑌 𝑠 , 𝑌 𝑠 + ℎ = 𝐶(ℎ) for all ℎ ∈ 𝑅6
6
Covarianceacrossanytwolocationsissimplyafunctionofh!
Oftentoostrong,notrealistic!
Illustrativeexamplesource:http://azvoleff.com/
![Page 7: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/7.jpg)
Variogram• Tobler’s first law of geography:
• “Everything is related to everything else, but near things are more related than distant things.”
• How is “difference” (“irrelevance”) of two observations increase with distance?
• How to measure the range of spatial “relatedness”?
• 𝑌 𝑠 is intrinsically stationary when• 𝐸 𝑌 𝑠 = 𝜇= ≡ 𝜇 (constant mean)• 𝐸[𝑌 𝑠 + ℎ − 𝑌 𝑠 ]*≡ 2𝑟 ℎ for any s, ℎ• 2𝑟 ℎ is called variogram• 𝑟 ℎ is called semi-variogram
• 𝑌 𝑠 is isotropy if𝑟 ℎ ≡ 𝑟( ℎ )
7
“Difference” across two locations only depends on h!
With isotropy, can plot a curve for r(|h|)!
Avg.annualprecip.(wrcc.dri.edu)
But is this assumption always valid?
h
s
Illustrationoffirstlawofgeographysource:http://azvoleff.com/
![Page 8: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/8.jpg)
Variogram Plot• Models of semi-variogram
8
𝑟 𝑑 = G𝜏* + 𝜎*𝑖𝑓𝑑 > 00𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 𝑟 𝑑 =
𝜏* + 𝜎*𝑖𝑓𝑑 >1𝜙
𝜏* + 𝜎*3𝜙𝑑 − 𝜙𝑑 *
2 𝑖𝑓0 < 𝑑 ≤1𝜙
0𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒Linear
Spherical
![Page 9: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/9.jpg)
Variogram Example
9
Diameteratbreastheightontrees
SeeRexample!
![Page 10: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/10.jpg)
Variogram v.s. Covarigram• 𝐸[𝑌 𝑠 + ℎ − 𝑌 𝑠 ]*= 2𝑟 ℎ• 𝐸[𝑌 𝑠 + ℎ − 𝑌 𝑠 ]*= 𝑉𝑎𝑟 𝑠 + ℎ + 𝑉𝑎𝑟 𝑠 −2𝐶𝑜𝑣 𝑠 + ℎ, 𝑠 = 2𝐶𝑜𝑣 0 − 2𝐶𝑜𝑣 ℎ• Thus, 𝑟 ℎ = 𝐶 0 − 𝐶 ℎ , r ℎ and 𝐶 ℎ related!• Covariance 𝐶 ℎ is a function of ℎ
10
![Page 11: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/11.jpg)
Ordinary Kriging• Problem: • Given observations at locations 𝑦(𝑠)), 𝑦(𝑠*), … , 𝑦(𝑠- ) , • How to predict 𝑦(𝑠Y)
• Assumptions:• Weak (intrinsic) stationarity • Known covariance 𝑪 𝒉• Unknown constant 𝐸 𝑌 ≡ 𝜇• Linear estimation:𝑦\(𝑠Y) = ∑ 𝑙.𝑦(𝑠.)-
._)
• Approach:• Minimize expected square loss!
11𝐸 𝑦(𝑠Y −` 𝑙.𝑦(𝑠.)
-
._))*
![Page 12: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/12.jpg)
Ordinary Kriging
12
Minimize:𝐸 𝑦(𝑠Y − ∑ 𝑙.𝑦(𝑠.)-._) )*
𝐸 𝑦(𝑠Y −` 𝑙.𝑦(𝑠.)-
._)) = 0
𝐸(𝑦(𝑠Y)) ≡ 𝐸(𝑦(𝑠.)) ≡ 𝜇1 −` 𝑙. = 0
-
._)
= 𝑉𝑎𝑟(𝑦(𝑠Y)) − 2𝐶𝑜𝑣 𝑦(𝑠Y ,` 𝑙.𝑦(𝑠.)) +-
._)` ` 𝑙.𝑙a𝐶𝑜𝑣 𝑦(𝑠. , 𝑦(𝑠a))
-
a_)
-
._)
= 𝐶Y,Y −2` 𝑙.𝐶Y,. +` ` 𝑙.𝑙a𝐶.,a-
a_)
-
._)
-
._)
= 𝐶Y,Y −2𝐶Y,∗c 𝑙 + 𝑙c𝐶𝑙
𝟏c𝑙 − 1 = 0
Constrainedconvexoptimizationproblem!UsingLagrangian multiplier,Optimalsolution:
Remember 𝐶𝑜𝑣 𝑌 𝑠 , 𝑌 𝑠 + ℎ ≡ 𝐶 ℎ𝐶 ℎ covariogramcan be estimated from data!
𝑙∗ = 𝐶e)[𝐶Y∗ −𝟏c𝐶e)𝐶Y∗ − 1𝟏c𝐶e)1 𝟏] 𝑦\(𝑠Y) = 𝑙∗c𝑌
Notation:𝐶.,a ≡ 𝐶𝑜𝑣(𝑌 𝑠. , 𝑌 𝑠a )
![Page 13: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/13.jpg)
Universal Kriging• Problem: • Given observations y = 𝑦(𝑠)), … , 𝑦(𝑠- ) , covariates 𝑥(𝑠)), … , 𝑥(𝑠- ) , 𝑥(𝑠Y),predict 𝑦(𝑠Y)
• Assumptions:• 𝑦(𝑠.) = 𝑥(𝑠.)c𝛽 + 𝜖., 𝒀 = 𝑿𝛽 + 𝝐• 𝝐~𝑁(0, Σ), where Σ = 𝜎*𝐻 ∅ + 𝜏*𝐼
• Estimator: 𝑦\(𝑠Y) = ℎ(𝑦)• How to find optimal ℎ(𝑦)?• Minimize expected square loss!
13
𝐸 𝑦(𝑠Y − ℎ(𝑦))*
𝑦\(𝑠Y) = ℎ 𝑦 = 𝐸 𝑦(𝑠Y |𝑦)Optimalprediction:
= × +
𝒀 𝑿 𝛽 𝝐
![Page 14: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/14.jpg)
Universal Kriging
14
𝒀 = 𝑿𝛽 + 𝝐
= × +
𝒀 𝑿 𝛽 𝝐
𝝐~𝑁(0, Σ),whereΣ = 𝜎*𝐻 ∅ + 𝜏*𝐼AssumingaGaussianprocess,i.e.,anysetofobservationYfollowsGaussiandistribution
𝑌)𝑌*
~𝑁𝜇)𝜇* , Ω)) Ω)*
Ω*) Ω**
𝐸(𝑌)|𝑌*) = 𝜇) + Ω)*Ω**e)(𝑌* − 𝜇*)
𝑉𝑎𝑟(𝑌)|𝑌*) = Ω)) − Ω)*Ω**e)Ω*)
𝑦\(𝑠Y) = ℎ 𝑦 = 𝐸 𝑦(𝑠Y |𝑦)Optimalprediction:
𝑌) = 𝑦(𝑠Y)𝑌* = 𝑦 = 𝑦(𝑠) , … , 𝑦(𝑠-))c
𝜇) = 𝑥(𝑠Y)c𝛽𝜇* = (𝑥(𝑠))c𝛽, … , 𝑥(𝑠-)c𝛽)c= 𝑿𝛽
𝐸 𝑦(𝑠Y 𝑦 = 𝑥(𝑠Y)c𝛽 + 𝐶Y∗c Σe)(𝑦 − 𝑋𝛽)
𝛽w 𝐶 ℎ + 𝜏*𝐼
![Page 15: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/15.jpg)
Review Questions:• For each of the following statement, True or False?
1. In Geostatistics, strict stationarity is often assumed.2. Variogram can help select a distance threshold for
spatial neighborhood (range of spatial dependency).3. Kriging assumes weak or intrinsic stationarity.4. Ordinary Kriging assumes weak or intrinsic
stationarity.
15
![Page 16: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/16.jpg)
Part II: Lattice Statistics• Areal data model• A tessellation of continuous space into (regular or
irregular) cells• Mapping each unit to a non-spatial attribute value
• Lattice Statistics:• 𝑌 𝑠 : 𝑠 ∈ 𝐷 ,𝐷 is a set of cells in areal data
• What is Lattice Statistics used for?• Explore spatial patterns in areal maps• Model areal maps for interpretation
16http://brilliantmaps.com
![Page 17: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/17.jpg)
17
Electionresultbycountyin2016
Source:http://brilliantmaps.com
Q1:Isthemapspatiallystationary?Q2:Istherestrongautocorrelationgloballyandlocally?Q3:Howtomodelandinterpretcoefficientsthatimpactresults?
![Page 18: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/18.jpg)
W-Matrix• Spatial neighborhood matrix• Wij > 0 when i and j are neighbors• Wij = 0 when i and j are not neighbors
• Example
18
14 7
5 82
3 6
Anarealdatawith8units1
4 7
5 82
3 6
14 7
5 82
3 6
0 1 0 1 1 0 0 01 0 1 0 1 0 0 00 1 0 0 0 1 0 01 0 0 0 1 0 1 01 1 0 1 0 0 0 10 0 1 0 1 0 0 10 0 0 1 0 0 0 10 0 0 0 1 1 1 0
Arookneighborhood
Aqueenneighborhood
0 1 0 1 1 0 0 01 0 1 0 1 1 0 00 1 0 0 1 1 0 01 0 0 0 1 0 1 11 1 1 1 0 1 1 10 1 1 0 1 0 0 10 0 0 1 1 0 0 10 0 0 1 1 0 1 0
![Page 19: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/19.jpg)
Spatial Autocorrelation• Measures the level of global spatial association• Moran’s I:• 𝐼 =
- ∑ ∑ xyz({ye{|)({ze{|)�z
�y
(∑ xyz) ∑ ({ye{|)~�y
�y�z
where 𝑖and 𝑗are locations.
• 𝐼 ∈ [−1, 1], high value shows strong spatial association• Example with rook neighborhood:
19
0 1 0 1 01 0 1 0 10 1 0 1 01 0 1 0 10 1 0 1 0
𝐼 ≈ −1
1 1 1 0 01 1 1 0 01 1 1 0 01 1 1 0 01 1 1 0 0
𝐼 ≈ 1
![Page 20: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/20.jpg)
Spatial Autocorrelation• Geary’s C• 𝐶 =
(-e)) ∑ ∑ xyz({ye{z)~�z
�y
*(∑ xyz) ∑ ({ye{|)~�y
�y�z
where 𝑖and 𝑖are locations.
• 𝐶 ≥ 0, low values show strong spatial association• Example with rook neighborhood:
20
0 1 0 1 01 0 1 0 10 1 0 1 01 0 1 0 10 1 0 1 0
𝐶 =?
1 1 1 0 01 1 1 0 01 1 1 0 01 1 1 0 01 1 1 0 0
𝐶 =?
WhichonehashigherC?
![Page 21: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/21.jpg)
Local Spatial Autocorrelation• Local indicator of spatial association (LISA)• When data is not homogeneous, local behaviors
may differ from global behavior (outliers)• Local Moran’s I: • 𝐼. =
{ye{|�~
∑ 𝑤.a(𝑌a−𝑌|)�a where 𝑚* =
∑ ({ye{|)~�y-
• 𝐼 = )-∑ 𝐼.�.
• Local Geary’s C• 𝐶. =
)�~∑ 𝑤.a(𝑌. − 𝑌a)*�a
• 𝐶 ∝ ∑ 𝐶.�.
21
![Page 22: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/22.jpg)
Spatial Autocorrelation for Nominal Data
• Black-Black Joint Count
22
W B W B WB W B W BW B W B WB W B W BW B W B W
B B B W WB B B W WB B B W WB B B W WB B B W W
Suppose𝑛 locations,𝑛� white,𝑛�black.
𝑃� =-�-,𝑃� = -�
-
𝐽𝐶�� =)*∑ ∑ 𝑤.a𝐼(𝑦. = 𝐵, 𝑦a = 𝐵)�
a�. ,
Test:����e�(����)�𝐸(𝐽𝐶��) =
)*∑ ∑ 𝑤.a�
a�. 𝑃�𝑃�,𝑉𝑎𝑟(𝐽𝐶��) = 𝜎* assumingGaussiandistribution
𝐸(𝐽𝐶��) and𝑉𝑎𝑟(𝐽𝐶��) canalsobegeneratedfromrandompermutation!
H0:BandWareindependentlydistributed,𝐽𝐶�� asymptoticallynormaldistributionH1:BandWarenotindependent,BBtendstocluster.
![Page 23: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/23.jpg)
Markov Random Field• Problem: How to model joint distribution of a field?• Brook’s Lemma: • Joint distribution can be determined by conditional
distribution• 𝑝 𝑦), 𝑦*, … , 𝑦- ⟸ 𝑝 𝑦.|𝑦a, 𝑗 ≠ 𝑖• However, conditional distribution above is too complex!
• Markov property:• 𝑝 𝑦.|𝑦a, 𝑗 ≠ 𝑖 ≡ 𝑝 𝑦.|𝑦a, 𝑗 ∈ 𝑁(𝑖)• Conditional distribution of observation at a location only
depends on its neighbors
23
![Page 24: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/24.jpg)
Spatial Autoregressive Model (SAR)• 𝑌 = 𝜌𝑊𝑌 + 𝑋𝛽 + 𝜖
24
= × +
𝒀 𝑿 𝛽 𝝐
× +×
𝜌 𝑊 𝒀
autoregressiveterm covariates independentnoise
![Page 25: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/25.jpg)
1854BroadStreetcholeraoutbreak(Solidblackrectanglesshowvictims)
Part III: Spatial Point Process• Spatial point process – spatial point event data• 𝑠), 𝑠*, 𝑠+, … , 𝑠- , 𝑠./sare event locations, fixed event type
25
![Page 26: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/26.jpg)
1854BroadStreetcholeraoutbreak(Solidblackrectanglesshowvictims)
Part III: Spatial Point Process• Spatial point process – spatial point event data• 𝑠), 𝑠*, 𝑠+, … , 𝑠- , 𝑠./sare event locations, fixed event type
26
![Page 27: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/27.jpg)
Spatial Point Process• Example• Crime event locations• Disease event locations
• Questions to answer• Are points tend to cluster/de-cluster?• Is point intensity homogeneous or there is a hotspot?
27
Shooting,Chicago2010Source:http://assets.dnainfo.com
K-functionstatisticsSpatialscanstatistics
&
'
%
$
What’s Special About Spatial Data Mining?
Clustering
? Clustering: Find groups of tuples
? Statistical Significance
• Complete spatial randomness, cluster, and decluster
Figure 9: Inputs: Complete Spatial Random (CSR), Cluster, and Decluster
Figure 10: Classical Clustering
Data is of Complete
Spatial Randomness
3: Mean Dense
1: Unusually Dense 2: Desnse
4: Sparse
33
4
3
2
1 2
3
3
2
3
2
2
1Data is of Decluster Pattern
Figure 11: Spatial Clustering
29
completespatialrandom(CSR)
clustering declustering pp
1 pp
CSR Hotspot
![Page 28: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/28.jpg)
Homogeneous Poisson Point Process• Complete spatial randomness (CSR)• Intensity parameter 𝜆, 𝑁(𝐴) as number of points in an area 𝐴• For any 𝐴, 𝑁(𝐴)follows Poisson(𝜆𝐴)• For any disjoint 𝐴) and 𝐴*, 𝑁(𝐴))and 𝑁(𝐴*)are
independent• In any area A, conditioned on 𝑁 𝐴 = 𝑛, these 𝑛 points
independent and uniformly distributed in 𝐴
28
pp1
What can we say about CSR?• The intensity of points are the same everywhere• Once 𝑁 𝐴 is realized, point locations are independent• CSR is often used as a null distribution
![Page 29: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/29.jpg)
Ripley’s K Function• Test if points tend to cluster with each other (micro)• Hypothesis testing• H0: homogeneous Poisson point process (independent)• H1: points tend to cluster with each other• Test statistic:
• 𝐾 𝑑 = 𝜆e)𝐸(#𝑜𝑓𝑝𝑜𝑖𝑛𝑡𝑠𝑤𝑖𝑡ℎ𝑖𝑛𝑟𝑎𝑑𝑖𝑢𝑠𝑑𝑜𝑓𝑎𝑝𝑜𝑖𝑛𝑡)• 𝐾� 𝑑 = 𝜆e) ∑ 𝐼(𝑑.a ≤ 𝑑)/𝑛�
. a• Under H0, 𝐾 𝑑 = 𝜋𝑑*
29
&
'
%
$
What’s Special About Spatial Data Mining?
Clustering
? Clustering: Find groups of tuples
? Statistical Significance
• Complete spatial randomness, cluster, and decluster
Figure 9: Inputs: Complete Spatial Random (CSR), Cluster, and Decluster
Figure 10: Classical Clustering
Data is of Complete
Spatial Randomness
3: Mean Dense
1: Unusually Dense 2: Desnse
4: Sparse
33
4
3
2
1 2
3
3
2
3
2
2
1Data is of Decluster Pattern
Figure 11: Spatial Clustering
29
CSR Clustering Declustering ExampleofKfunctionplot
CSR
clustering
declustering
![Page 30: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/30.jpg)
Ripley’s Cross K Function• Test if of two types of events tend to cluster together• H0: event types i and j are independent• H1: event types i and j tend to cluster together• Test statistic:
• 𝐾.a 𝑑 = 𝜆ae)𝐸(#𝑜𝑓𝑝𝑜𝑖𝑛𝑡𝑠𝑜𝑓𝑡𝑦𝑝𝑒𝑗𝑤𝑖𝑡ℎ𝑖𝑛𝑑𝑜𝑓𝑎𝑝𝑜𝑖𝑛𝑡𝑖)
• 𝐾.a¢ 𝑑 = 𝜆ae) ∑ 𝐼(𝑑.a ≤ 𝑑)/𝑛.�
. a = (𝜆.𝜆a𝐴)e) ∑ 𝐼(𝑑.a ≤ 𝑑)�. a
• Under H0, 𝐾.a 𝑑 = 𝜋𝑑*
30
&
'
%
$
What’s Special About Spatial Data Mining?
Illustration of Cross-Correlation
? Illustration of Cross K-Function for Example Data
0 2 4 6 8 100
200
400
600
800
1000
Distance h
Cro
ss−K
func
tion
Cross−K function of pairs of spatial features
y=pi*h2o and *x and +* and x* and +
Figure 6: Cross K-function for Example Data
23
&
'
%
$
What’s Special About Spatial Data Mining?
Cross-Correlation
? Cross K-Function Definition
• K
ij
(h) = ∏
°1j
E [number of type j event within distance h
of a randomly chosen type i event]
• Cross K-function of some pair of spatial feature types
• Example
– Which pairs are frequently co-located?
– Statistical significance
0 10 20 30 40 50 60 70 800
10
20
30
40
50
60
70
80Co−location Patterns − Sample Data
X
Y
Figure 5: Example Data (o and * ; x and +)
22
![Page 31: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/31.jpg)
Spatial Scan Statistics• Test if point intensity is homogeneous everywhere• Hypothesis testing (assume Poisson point process)• H0: homogeneous intensity 𝜆.- = 𝜆£¤¥ for window W• H1: inhomogeneous intensity 𝜆.- > 𝜆£¤¥ for window W• Test statistic (likelihood ratio)
• 𝐿𝑅 = §¨©ª.«¬.®££¯(°¨¥¨|±²)§¨©ª.«¬.®££¯(°¨¥¨|±³)
=Sup�,·y¸¹·º»¼L(°¨¥¨;�,¿y¸,¿º»¼)Sup·y¸À·º»¼L(°¨¥¨;�,¿y¸,¿º»¼)
• 𝐿𝑅 = Sup�-���
-� -�Á
��Á
-�Á
𝐼(. )
• Significance • P-value, Monte Carlo simulation
31
pp1 pp
CSR Hotspot
𝑛�:observed#inW𝐸�:expected#inW
𝑛�/ :observed#outofW𝐸�/ :expected#outofW
![Page 32: Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is Spatial Statistics? • Statistics • The study of collection, analysis, interpretation](https://reader034.vdocuments.site/reader034/viewer/2022050506/5f976f5fec6fec41746b1c63/html5/thumbnails/32.jpg)
Spatial Scan Statistics: An Example• Input: 13 points• Compute test statistic:
• 𝐿𝑅 = Sup�-���
-� -�Á
��Á
-�Á
𝐼(. )• 𝑛� = 7 (observed # in W)• 𝑛�/ = 6 (observed # out of W)• 𝐸� = )+
)Å ∗ 1 (expected # in W)• 𝐸�/ = )+
)Å ∗ 15 (expected # out of W)• 𝐼 . = 1 (density in W is higher than out)
• 𝐿𝑅 = ( Ç)+/)Å
)Ç( Å)+∗)È/)Å
)Å= 56115.15• Monte Carlo simulation
32
An Example
Scanning window Z , outside window as Z 0
LR =supZ2Z,p>q L(Z , p, q)
supp=qL(Z , p, q)= sup
Z2Z(nZBZ
)nZ (nZ 0
BZ 0)nZ 0 I (·)
nZ = 7 and nZ 0 = 6, |Z | = 1, |Z 0| = 16, BZ = 13/16⇥ 1 = 0.8,BZ 0 = 13/16⇥ 15 = 12.2, so LR = 56115.15
Z Z’
Figure: Illustrative exampleZhe Jiang (University of Alabama) Group Seminar Slides September 22, 2016 8 / 9
EnumeratingwindowWwithsize1
Studyareasize4*4=16