L1 Spatial data - UAntwerpen
Spatial issues in data analysis and model building: distance, scale and complexity. Isabelle THOMAS, Francqui Chair, March 11th 2015

Post on 22-Mar-2022


Page 1: L1 Spatial data - UAntwerpen

Spatial issues in data analysis and model building:

distance, scale and complexity.

Isabelle THOMAS Francqui Chair

March 11th 2015

Page 2: L1 Spatial data - UAntwerpen

Spatial analysis

• Visualization Showing interesting patterns (Maps)

• Exploratory Spatial Data Analysis (ESDA) Finding interesting patterns

• Spatial modelling (regression, …) Explaining interesting patterns

Spatial is special

INTRODUCTION Distance Scale Complexity Accidents Conclusions

BAD NEWS

GOOD NEWS

Page 3: L1 Spatial data - UAntwerpen

ESDA DESCRIPTION

Spatial STATISTICS

Statistical MAPS

Modeling Spatial statistical

analysis and hypothesis testing

(Spatial) modeling and prediction

LEVEL OF DIFFICULTY

INTRODUCTION Distance Scale Complexity Accidents Conclusions

Page 4: L1 Spatial data - UAntwerpen

DISTANCE

DISTANCE: adjacency, interaction, and neighborhoods
SCALE: MAUP, spatial autocorrelation, ecological fallacy, edge/border effect

Page 5: L1 Spatial data - UAntwerpen

Why is distance so important? (1)

[Figure: price of land (P1, P2, P3) against quantity of land (Q1, Q2, Q3) and distance to the CBD; high densities towards downtown, low densities towards the periphery]

The core of (transport) geography. Enters most models and many indices.

Page 6: L1 Spatial data - UAntwerpen

Why is distance so important? (2)

LOCATION
• Absolute: latitude, longitude; an address
• Relative: distance, directions to other places

Distance, Adjacency, Neighbourhood, Interaction

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 7: L1 Spatial data - UAntwerpen

[Figure: zones A to F]

Adjacency, Distance, Interaction, Neighbourhood

Adjacency matrix (or adjacency list)

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 8: L1 Spatial data - UAntwerpen

i and j are adjacent
- if they share a common boundary (share = ?)
- if they are within a specified distance (buffer, neighbourhood)

Binary or distance-based weights.

Order of adjacency.
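
A minimal sketch of the two usual contiguity rules and of the order of adjacency, assuming a regular grid of cells (the function names `grid_adjacency` and `second_order` are illustrative, not from the lecture). Rook adjacency means a shared edge; queen adjacency means a shared edge or corner. Second-order neighbours are taken here as cells reachable in two steps that are not first-order neighbours.

```python
def grid_adjacency(n_rows, n_cols, rule="rook"):
    """Binary adjacency matrix for the cells of a regular grid.

    rule="rook":  cells sharing an edge are adjacent.
    rule="queen": cells sharing an edge or a corner are adjacent.
    """
    if rule == "rook":
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w


def second_order(w):
    """Pairs linked through one intermediate cell but not directly adjacent."""
    n = len(w)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and w[i][j] == 0:
                if any(w[i][k] and w[k][j] for k in range(n)):
                    out[i][j] = 1
    return out


rook = grid_adjacency(3, 3, "rook")
queen = grid_adjacency(3, 3, "queen")
centre = 4  # middle cell of the 3 x 3 grid
print(sum(rook[centre]), sum(queen[centre]))  # number of neighbours under each rule
```

The same binary matrix can later be replaced by distance-based weights without changing the code that consumes it.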

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 9: L1 Spatial data - UAntwerpen

[Figure: rook and queen adjacency, 1st order and 2nd order]

Source: Briggs, Henan University, 2012

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 10: L1 Spatial data - UAntwerpen

[Figure: zones A to F with distances between them (66, 24, 41, 68, 68)]

Adjacency, Distance, Interaction, Neighbourhood

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 11: L1 Spatial data - UAntwerpen

Measuring distance is not simple …

– dij measures the separation between i and j
– (mathematical) definition:
• dij > 0 if i ≠ j (distinction/separation)
• dij = 0 if i = j (co-location/equivalence): diagonal of the adjacency matrix
• dij + djk ≥ dik (triangle inequality)
• dij = dji (symmetry: is the graph symmetric?)

In spatial analysis:
• Objects may not be truly point-like/distinct
• Triangle inequality may not hold
• Symmetry condition may not hold
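
The four conditions can be checked mechanically on any distance matrix. The sketch below is illustrative only: `check_metric` is a hypothetical helper and the travel times are invented, chosen so that a one-way street breaks symmetry while the triangle inequality still holds.

```python
def check_metric(d, tol=1e-9):
    """Report which distance axioms a square matrix d satisfies."""
    idx = range(len(d))
    return {
        "zero_diagonal": all(abs(d[i][i]) <= tol for i in idx),
        "positive": all(d[i][j] > 0 for i in idx for j in idx if i != j),
        "symmetric": all(abs(d[i][j] - d[j][i]) <= tol
                         for i in idx for j in idx),
        "triangle": all(d[i][j] + d[j][k] + tol >= d[i][k]
                        for i in idx for j in idx for k in idx),
    }


# Hypothetical travel times (minutes) between places A, B, C.
# A -> B takes 9 minutes but B -> A only 7 (one-way street).
travel_time = [
    [0, 9, 5],
    [7, 0, 4],
    [5, 4, 0],
]
result = check_metric(travel_time)
print(result)
```

Network travel times and costs often fail the symmetry check in this way, which is one reason "distance" in spatial models is not always a metric in the mathematical sense.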

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 12: L1 Spatial data - UAntwerpen

Measuring distance is not simple …

Terrain distances – cross section view

Source: www.spatialanalysisonline.com

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 13: L1 Spatial data - UAntwerpen

Measuring distance

• Metrics: lp metrics (p = 1 Manhattan; p = 2 Euclidean; ...)

NB. Spherical coordinates call for spherical/ellipsoidal computations:

$$d_{ij} = 2R\,\sin^{-1}\!\left(\sqrt{\sin^2(A) + \cos\phi_i \cos\phi_j \sin^2(B)}\right), \quad \text{where: } A = \frac{\phi_i - \phi_j}{2},\ B = \frac{\lambda_i - \lambda_j}{2}$$

Source: www.spatialanalysisonline.com
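
A short sketch of both kinds of distance, assuming planar coordinates for the lp metric and decimal degrees on a sphere of radius 6371 km for the great-circle (haversine) distance. The function names and the default radius are choices made for this example.

```python
import math


def minkowski(p, q, power=2):
    """l_p distance between two points: power=1 Manhattan, power=2 Euclidean."""
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1.0 / power)


def great_circle(lat_i, lon_i, lat_j, lon_j, radius_km=6371.0):
    """Haversine great-circle distance in km; inputs in decimal degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    a = (phi_i - phi_j) / 2                                 # half latitude difference
    b = (math.radians(lon_i) - math.radians(lon_j)) / 2    # half longitude difference
    h = math.sin(a) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(b) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


manhattan = minkowski((0, 0), (3, 4), power=1)
euclidean = minkowski((0, 0), (3, 4), power=2)
half_equator = great_circle(0.0, 0.0, 0.0, 180.0)
print(manhattan, euclidean, half_equator)
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean one, so the choice of p changes which locations count as "near".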

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 14: L1 Spatial data - UAntwerpen

[Figure: zones A to F]

Distance, Adjacency, Interaction, Neighbourhood

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 15: L1 Spatial data - UAntwerpen

Distance decay models
– Simple inverse power models: $z = f\left(\left\{ z_{ij} / d_{ij}^{\beta} \right\}\right),\ \beta \ge 0$
– Trip distribution models: $T_{ij} = A_i B_j O_i D_j f(d_{ij})$
– Statistical modelling

Source: www.spatialanalysisonline.com
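
One way to read the trip distribution formula T_ij = A_i B_j O_i D_j f(d_ij) is as a doubly constrained gravity model, in which the balancing factors A_i and B_j are found iteratively so that rows sum to the origin totals O_i and columns to the destination totals D_j. The sketch below assumes a negative exponential decay f(d) = exp(-beta d); the function name, the decay form and the toy numbers are assumptions for illustration.

```python
import math


def gravity_model(origins, destinations, dist, beta, n_iter=200):
    """Doubly constrained trips T_ij = A_i B_j O_i D_j f(d_ij), f(d) = exp(-beta*d).

    Requires sum(origins) == sum(destinations).
    """
    n, m = len(origins), len(destinations)
    f = [[math.exp(-beta * dist[i][j]) for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(n_iter):
        # A_i balances the rows so that sum_j T_ij = O_i
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        # B_j balances the columns so that sum_i T_ij = D_j
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * b[j] * origins[i] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]


O = [100.0, 200.0]             # trips leaving each origin (toy values)
D = [150.0, 150.0]             # trips arriving at each destination (toy values)
d = [[1.0, 2.0], [2.0, 1.0]]   # distances between origins and destinations
T = gravity_model(O, D, d, beta=0.5)
for row in T:
    print([round(v, 2) for v in row])
```

A larger beta concentrates trips on the nearest destinations; beta = 0 removes the effect of distance altogether.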

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 16: L1 Spatial data - UAntwerpen

[Figure: zones A to F, with a question mark]

Adjacency, Distance, Interaction, Neighbourhood

Page 17: L1 Spatial data - UAntwerpen

Source: Ovtracht, 2014

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 18: L1 Spatial data - UAntwerpen

Source: http://www.colorado.edu/geography/

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 19: L1 Spatial data - UAntwerpen

Source: http://www.colorado.edu/geography/

Introduction DISTANCE Scale Complexity Accidents Conclusions

Page 20: L1 Spatial data - UAntwerpen

Distance – aggregation & scale

[Figure: locations 1 to 5, i and j]

Errors
A: d(2,j) < d(i,j) < d(5,j)
B: d(1,i) = 0
C: i can be allocated to j while closer to j′

Aggregation decreases
– data collection costs
– modeling costs
– computing costs
– confidentiality concerns
– data statistical uncertainty (smaller sample deviations for larger samples)

Aggregation increases
– modeling errors/biases

Introduction Distance SCALE Complexity Accidents Conclusions

Page 21: L1 Spatial data - UAntwerpen

SCALE

Page 22: L1 Spatial data - UAntwerpen

LOCATION and SCALE

Don’t forget the essence of your problem

• SITE, MICRO (local): land, transportation, amenities, …
• SITUATION, MESO (regional): labor, materials, energy, …
• SOCIOECONOMIC ENVIRONMENT, MACRO (national): capital, subsidies, regulations, …

Page 23: L1 Spatial data - UAntwerpen

SCALE: cartographically

• Large cartographic scale: statistical sectors
• Small cartographic scale: communes, provinces, …

Source: Topomapviewer, NGI/ING

Introduction Distance SCALE Complexity Accidents Conclusions

Page 24: L1 Spatial data - UAntwerpen

SCALE: 2 aspects

• Extent: spatial dimension of an object (or process) observed/analyzed
• Grain (BSU): level of spatial resolution at which an object (or process) is measured/observed

[Figure: extent constant, different grain; increasing extent, grain constant]

Page 25: L1 Spatial data - UAntwerpen

SCALE: Extent

[Map: land rent (per sq m), 2013]

Source: INS. Authors: Larielle and Thomas, 2014

Page 26: L1 Spatial data - UAntwerpen

SCALE: Extent

Results obtained at one scale do not necessarily apply at other scales. A pattern may be clustered at one scale but dispersed at another scale:
• Population clustered into cities
• City populations are dispersed

Scale is always important in spatial analysis!

Source: Briggs, Henan University, 2012

Introduction Distance SCALE Complexity Accidents Conclusions

Page 27: L1 Spatial data - UAntwerpen

Why be concerned about scale?

1. Patterns are dependent upon the scale of observation.
2. The importance of explanatory variables changes with scale.
3. Statistical relationships may change with scale.
4. Patterns are generated by processes acting over various spatial (and temporal) scales.

No unique solution: nested models, power laws, fractals, networks, …

Page 28: L1 Spatial data - UAntwerpen

Power laws
• Summarize how relationships change with changes in scale
• Often expressed on a log-log plot
• Y = constant · X^n
• Similar slopes are thought to have similar structuring processes (n = slope)
• Example: species-area relationships

! However: power laws often lack an explanatory process
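
On a log-log plot the power law Y = constant · X^n becomes a straight line, log Y = log(constant) + n log X, so n can be estimated as an ordinary least-squares slope. A minimal sketch with synthetic species-area data (the numbers are generated from a known law purely to show that the fit recovers it):

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of log(y) = log(c) + n * log(x); returns (c, n)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return math.exp(my - slope * mx), slope


areas = [1, 10, 100, 1000]
species = [5 * a ** 0.25 for a in areas]   # synthetic: constant = 5, n = 0.25
c, n = fit_power_law(areas, species)
print(round(c, 3), round(n, 3))
```

With real data the points scatter around the line, and the fitted slope says nothing by itself about the process that produced it, which is the caveat on the slide.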

Page 29: L1 Spatial data - UAntwerpen

• The same pattern appears across all scales. It is scale invariant.

• The relationship between size of box and pattern in it is constant.

• Fractals follow their own power law, relating how the number of boxes needed to cover a shape changes in relation to their size.

Fractals
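
The box-counting idea can be sketched as follows: cover the pattern with boxes of side s, count the occupied boxes N(s), and take the slope of log N(s) against log(1/s) as the dimension. The helper names and the two test shapes (a straight line and a filled square, whose dimensions should come out as 1 and 2) are illustrative assumptions.

```python
import math


def box_count(points, size):
    """Number of boxes of side `size` that contain at least one point."""
    return len({(math.floor(x / size), math.floor(y / size)) for x, y in points})


def box_dimension(points, sizes):
    """Slope of log N(s) against log(1/s), by least squares."""
    lx = [math.log(1.0 / s) for s in sizes]
    ly = [math.log(box_count(points, s)) for s in sizes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))


line = [(i / 1024, i / 1024) for i in range(1024)]                # diagonal line
square = [(i / 64, j / 64) for i in range(64) for j in range(64)]  # filled square
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
dim_line = box_dimension(line, sizes)
dim_square = box_dimension(square, sizes)
print(round(dim_line, 3), round(dim_square, 3))
```

An urban footprint or a street network would typically give a non-integer value between these two reference cases.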

Introduction Distance SCALE Complexity Accidents Conclusions

Page 30: L1 Spatial data - UAntwerpen

Networks

• Can represent relationships at a variety of scales at once.
• Structural properties of networks provide a means of understanding how they work.
– Nodes and links
– Degree centrality and betweenness
– Weak versus strong links
– Directional versus non-directional graphs

Introduction Distance SCALE Complexity Accidents Conclusions

Page 31: L1 Spatial data - UAntwerpen

Fallacies of scale

1. Modifiable Areal Unit Problem (MAUP)
2. Ecological fallacy
3. Edge/border effect
4. Spatial autocorrelation (…)

Introduction Distance SCALE Complexity Accidents Conclusions

Page 32: L1 Spatial data - UAntwerpen

1. Modifiable Areal Unit Problem (MAUP)

Introduction Distance SCALE Complexity Accidents Conclusions

Page 33: L1 Spatial data - UAntwerpen

2. Ecological fallacy

Ecological fallacy: making claims about local-scale phenomena based on broad-scale observations.
Individualistic fallacy: making claims about broad-scale phenomena based on observations conducted at small, local scales.

Do not generalise conclusions to other scales.
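
A toy illustration of why conclusions do not transfer between scales. The data are invented: inside each of three regions x and y move in opposite directions, yet the regional means move together, so the aggregate correlation has the opposite sign of every local one.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Hypothetical individual observations (x, y) grouped by region.
regions = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}
within = {name: pearson(x, y) for name, (x, y) in regions.items()}
mean_x = [sum(x) / len(x) for x, _ in regions.values()]
mean_y = [sum(y) / len(y) for _, y in regions.values()]
between = pearson(mean_x, mean_y)
print(within, between)
```

Reading the regional (aggregate) relationship as if it described individuals is the ecological fallacy; reading the within-region relationship as if it described regions is the individualistic one.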

Page 34: L1 Spatial data - UAntwerpen

3. Edge/border effects

• Points close to the border are closer to locations outside the studied area.
• Arises when an artificial boundary is imposed on a study, often just to keep it manageable.
• Biases nearest-neighbor distances (and model results?).
• Solution: how to consider “the rest of the world”?

Page 35: L1 Spatial data - UAntwerpen

4. Spatial autocorrelation (1)

1) Biased parameter estimates
2) Data redundancy (affecting the calculation of confidence intervals)
3) Moran and Geary

Page 36: L1 Spatial data - UAntwerpen

4. Spatial autocorrelation (2)

Coefficient
– Coordinate (x, y, z)
– Spatial weights matrix (binary or other), W = {wij}
– Coefficient formulation, desirable properties:
• Reflects co-variation patterns
• Reflects adjacency patterns via weights matrix
• Normalised for absolute cell values
• Normalised for data variation
• Adjusts for number of included cells in totals

Source: www.spatialanalysisonline.com

Introduction Distance SCALE Complexity Accidents Conclusions

Page 37: L1 Spatial data - UAntwerpen

• Moran’s I

$$I = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}, \quad \text{where } p = \sum_i \sum_j w_{ij} / n$$

• Modification for point data
– Replace weights matrix with distance bands, width h
– Pre-normalise z values by subtracting means
– Count number of other points in each band, N(h)

Source: www.spatialanalysisonline.com

$$I(h) = N(h)\,\frac{\sum_i \sum_j z_i z_j}{\sum_i z_i^2}$$
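
A minimal sketch of the global Moran's I for areal data, written as I = (1/p) · Σ_i Σ_j w_ij (z_i − z̄)(z_j − z̄) / Σ_i (z_i − z̄)², with p = Σ_i Σ_j w_ij / n. The four-zone chain and its values are invented for illustration.

```python
def morans_i(z, w):
    """Global Moran's I for values z and a spatial weights matrix w."""
    n = len(z)
    zbar = sum(z) / n
    dev = [v - zbar for v in z]
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    p = sum(sum(row) for row in w) / n
    return num / (p * den)


# Four zones in a row; each zone is adjacent to its immediate neighbours.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], w)     # similar values side by side
alternating = morans_i([1, 5, 1, 5], w)   # neighbours always differ
print(round(clustered, 3), round(alternating, 3))
```

Positive values indicate that neighbouring zones resemble each other, negative values that they differ; the result depends on the chosen weights matrix as much as on the data.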

4. Spatial autocorrelation (3)

Introduction Distance SCALE Complexity Accidents Conclusions

Page 38: L1 Spatial data - UAntwerpen

4. Spatial autocorrelation (4)

Extending SA concepts
– Distance formula weights vs bands
– Lattice models with more complex neighbourhoods and lag models (GeoDa)
– Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
– Significance testing
• Normal model
• Randomisation models
• Bonferroni/other corrections

Introduction Distance SCALE Complexity Accidents Conclusions

Page 39: L1 Spatial data - UAntwerpen

4. Spatial autocorrelation (5)

Moran I correlogram

[Figure panels: source data points; lag distance bands, h; correlogram]

Source: www.spatialanalysisonline.com

Introduction Distance SCALE Complexity Accidents Conclusions

Page 40: L1 Spatial data - UAntwerpen

4. Spatial autocorrelation (6): causes of spatial dependence / interpretation

• Underlying socio-economic process has led to a clustered distribution of variable values
– Grouping, spatial interaction
– Diffusion, dispersal
– Spatial hierarchies
• Mismatch between process and spatial units
– Counties vs retail trade zones
– Census block groups vs neighborhood networks

What is spatial autocorrelation? D. Griffith, 1992, L’Esp. Géo.

Page 41: L1 Spatial data - UAntwerpen

Hypotheses

1. Explore the data
2. Fit an OLS model
3. Perform diagnosis
4. Run adapted model (e.g. GWR)
5. Compare models

RESULTS, DECISION

EDA, ESDA
Global autocorrelation, local autocorrelation
Global model, local model

Introduction Distance SCALE COMPLEXITY Accidents Conclusions

Page 42: L1 Spatial data - UAntwerpen

Start with OLS and look for
– Positive spatial autocorrelation > dependence between samples exists
– Datasets often non-Normal >> transformations may be required (Log, Box-Cox, Logistic)
– Samples are often clustered >> spatial declustering may be required
– Heteroskedasticity is common (violating the iid assumption)
– Spatial coordinates (x, y) may form part of the modelling process

Source: www.spatialanalysisonline.com

Introduction Distance SCALE Complexity Accidents Conclusions

Page 43: L1 Spatial data - UAntwerpen

Regression models

Type of spatial effect > Remedies
– Spatial heterogeneity (Koenker-Bassett test)
• Include covariate which accounts for heterogeneity?
• Split region?
– Spatial autocorrelation (Lagrange Multiplier tests)
• Identify missing variables?
• Explore effects of spatially-lagged independent variables?
• Use appropriate spatial regression model?

Source: www.spatialanalysisonline.com

Introduction Distance SCALE COMPLEXITY Accidents Conclusions

Page 44: L1 Spatial data - UAntwerpen

Regression models

• Identify the source (LM tests will help)
– Regression residuals (LM-Error)
• Mismatch of process and spatial units => systematic errors, correlated across spatial units
– Dependent variable (LM-Lag)
• Underlying socio-economic process has led to clustered distribution of variable values => influence of neighboring values on unit values

LARGE number of solutions: spatial autoregressive process (SAR), spatial moving average process (SMA), …

Source: www.spatialanalysisonline.com

Page 45: L1 Spatial data - UAntwerpen

COMPLEXITY or COMPLICATION ?

Introduction Distance Scale COMPLEXITY Accidents Conclusions

Page 46: L1 Spatial data - UAntwerpen

Complexity is hard to define

• Algorithmic complexity
• Deterministic complexity
• Aggregate complexity

Key generic properties
1. Nonlinear relationships
2. Techniques such as artificial intelligence
3. Emerges from relatively simple interactions

Systems change and evolve

Source: Manson, 2001; R. Martin and Sunley.

Page 47: L1 Spatial data - UAntwerpen

Vocabulary about complexity

Property | Attributes
Has a distributed nature & representation | Multiscalar
Openness | Open system
Non-linear dynamics | Path dependence
Limited functional decomposability |
Emergence and self-organisation | Emergence
Adaptive behaviour and adaptation | Self-organization
Non-deterministic and non-tractable | Stochastic

Source: Manson, 2001; R. Martin and Sunley.

Page 48: L1 Spatial data - UAntwerpen

SYSTEM ANALYSIS
MIT, Jay Forrester (6’), Bertalanffy (67)
General system theory
System’s autonomy

SELF-ORGANIZATION
Prigogine, Haken (1970-80)
Open systems, dissipative structures, unpredictable effects of non-linear micro-interactions on the system’s macro structure and dynamics, path dependence (irreversibility)

COMPLEX SYSTEMS
Santa Fe Institute, ISI, ECSS (1990-2000)
Emerging properties

Models: Multi-Agent Systems
Models: differential equations

Page 49: L1 Spatial data - UAntwerpen

Urban systems are complex systems
• Urban systems are produced by social interactions (conveying information), according to their range in space and duration in time
• Non-linear interactions occur at micro, meso or macro levels, and between levels
• Emergence of collective properties within cities:
– Hierarchical organisation (« cities as systems within systems of cities », Reynaud, 1841; Berry, 1964; Pred, 1977)
– Urban « memory » (dynamic path dependence) as a constraint on urban dynamics at both levels

Page 50: L1 Spatial data - UAntwerpen

From facts … to geography

[Diagram: INTERACTIONS between PLACE(S) (environment, road(s)), PEOPLE (road users) and VEHICLE(S), located at (x, y, t) for t-1, t and t+1]

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 51: L1 Spatial data - UAntwerpen

Multi-level problem

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 52: L1 Spatial data - UAntwerpen

1. Explore the data
2. Fit an OLS model
3. Perform diagnosis
4. Run adapted model (e.g. GWR)
5. Compare models

EDA, ESDA
Global autocorrelation, local autocorrelation
Global model, local model

Page 53: L1 Spatial data - UAntwerpen

Step 1: EDA
Select variable and describe
• Univariate
• Bi- and multivariate
Visualizations
• Tables, charts, plots, autocorr, hot spot
• Maps

Step 2: ESDA
• Test spatial homogeneity
• Spatial weights
• Global & local spatial autocorrelation

Page 54: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

• Point pattern analysis
Describing a point pattern. Black spots, black zones
- Density-based point pattern measures
- Distance-based point pattern measures
Assessing point patterns statistically
• Aggregation
- Segments of road
- Communes (stat sectors)
• Explanation/prediction
- Measuring and modeling numbers/risk

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 55: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

• Pinpoint location (point): black spot
• Black road segment (line)
• Black « region » (polygon)

Multi-scale, multi-dimensional, multi-disciplinary, multi-causal analysis. Necessity: to isolate and to control for, in order to avoid badly specified models.

Describe / Understand / Explain / Predict + ACT (Engineering, Enforcement, Education, Environment)

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 56: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

Poisson or not?
• Poisson > Binomial
• Aggregation effects
• Length of segments

Source: Thomas, 1996
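
A common first check for the "Poisson or not?" question (an assumption here, the slide does not name a test) is the variance-to-mean ratio of the counts per segment: it is close to 1 for a Poisson process, above 1 when accidents cluster, and below 1 when they are spread more evenly than chance. The counts below are invented.

```python
def dispersion_index(counts):
    """Variance-to-mean ratio of counts (sample variance, n - 1 denominator)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical accident counts per road segment (illustrative values only).
clustered = [0, 0, 0, 9, 0, 0, 1, 0, 10, 0]
regular = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(round(dispersion_index(clustered), 2), dispersion_index(regular))
```

Because the ratio changes when segments are merged or split, it also gives a quick way to see the aggregation and segment-length effects listed on the slide.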

Page 57: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

Road accidents N29 Charleroi-Jodoigne

Moran for black segments

Source: Flahaut, 2002

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 58: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.1 Point pattern analyses

Page 59: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

Source: Eckhart, 2002

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 60: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

Kernel

Source: Steenberghen, Defays, Thomas, Flahaut, 2010

Page 61: L1 Spatial data - UAntwerpen

5.1 Point pattern analyses

Mechelen

Source: Steenberghen, Defays, Thomas, Flahaut, 2010

Page 62: L1 Spatial data - UAntwerpen

5.2 Model for i = hectometers

Infrastructure & Environment

Logistic regression
Yi = 1 if hm belongs to a « black segment ».
Yi = 0 otherwise

Xi: characteristics of the road
- Usage
- Physical properties
- Environment (land use, …)
(Official data; Numerical Digital Terrain Model; IGN maps)

Source: Flahaut, 2004
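
A sketch of the logistic specification with a single explanatory variable, P(Yi = 1) = 1 / (1 + exp(−(b0 + b1·xi))), fitted by plain gradient ascent on the log-likelihood. This is not the estimation procedure used in the cited study; the variable (a curvature score per hectometre) and the data are hypothetical.

```python
import math


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the mean log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p)) / n
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


def predict(b0, b1, xi):
    """Predicted probability that a hectometre belongs to a black segment."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


# Hypothetical hectometres: x = curvature score, y = 1 if in a black segment.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(b0, 2), round(b1, 2))
```

Successive hectometres along a road are unlikely to be independent, so in practice the residuals of such a model would still need to be checked for spatial autocorrelation.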

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 63: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.2 Model for i = hectometers

[Map: north arrow; scale bar 0 to 250 m]

Source: Flahaut, 2004

Page 64: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.2 Model for i = hectometers

Source: Flahaut, 2004

Page 65: L1 Spatial data - UAntwerpen

5.3 Model for i = communes

Objective: explain variations in Y
Controlling spatial biases

Source: Vandenbulcke et al, 2011

Page 66: L1 Spatial data - UAntwerpen

5.3 Model for i = communes

2 steps

EXPLORATORY
Identify potential explanatory factors (Factor 1, Factor 2, Factor X?)
Statistical tools:
• Graphics (basic statistics)
• Cluster analyses (PCA)
• Correlations (x, y)

STATISTICAL MODELLING
Relative importance of variables?
Statistical tools:
• Statistical models
• Corrections for multicollinearity & spatial effects

Page 67: L1 Spatial data - UAntwerpen

5.3 Model for i = communes

Exploratory step

[Figure: % cycling against distance (km), from town to village, for H1, H2, H3, H5 and H8; 10 km marked]

• Commuting distances (< 10 km)
• Town size: regional towns > large towns
• Regional differences (culture + …)

Source: Vandenbulcke et al, 2011

Page 68: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.3 Model for i = communes

Exploratory step

Source: Vandenbulcke et al, 2011

Page 69: L1 Spatial data - UAntwerpen

Correlations (ρxy), on a scale from ρxy = –1 to ρxy = 1:
• Dissatisfaction with cycleways: –0.82
• Slopes: –0.77
• Bad health: –0.58
• Commuting distances: –0.54
• Accident risk: –0.32
• No child, town size: 0.23
• Job density: 0.38
• Active people < 25 years: 0.54

[Figure axes: commuting distances (km); average slopes (d°)]

Page 70: L1 Spatial data - UAntwerpen

BICYCLE USE

Scale: communes (INS 5)

POLICY-RELATED FACTORS, ENVIRONMENTAL FACTORS, INDIVIDUAL FACTORS

• Socio-economic data (NIS): income, education, gender, age, car availability, young children/household
• Health data (NIS): subjective health
• Physical data (UCL): slopes (d°)
• Environmental data (IRCEL-CELINE): air pollution (PM10)
• Accident data (NIS): accident risk, f(number of accidents, travel time)
• Land-use data (UCL): land use (e.g. urban), city size, job and pop. densities
• Trip/local characteristics: satisfaction with cycle paths, traffic volume, commuting distance (km)

Vandenbulcke et al., Transportation Research Part A (2011)

Page 71: L1 Spatial data - UAntwerpen

5.3 Model for i = communes

OLS (Ordinary Least Squares): $y = X\beta + \varepsilon$

Diagnostics and remedies
• Multicollinearity (VIF, …): uncorrelated X
• Heteroskedasticity (BP tests): « White correction »
• Spatial autocorrelation (LM tests): spatial autoregressive model (spatial lag), $y = \rho W y + X\beta + \varepsilon$ (Queen matrix)
• Structural instability (Chow tests): inclusion of spatial regimes (ESDA),
$y_1 = \rho W_1 y_1 + X_1\beta_1 + \varepsilon_1$
$y_2 = \rho W_2 y_2 + X_2\beta_2 + \varepsilon_2$

SPATIAL AUTOREGRESSIVE MODEL + REGIMES
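
To see what the spatial lag term does, the model y = ρWy + Xβ + ε can be solved for y given everything else. The sketch below uses fixed-point iteration, which converges when W is row-standardised and |ρ| < 1. It does not estimate ρ (that is done by maximum likelihood in the results that follow); the four-zone chain, the values of Xβ and ρ = 0.6 are illustrative assumptions.

```python
def row_standardise(w):
    """Divide each row of a binary weights matrix by its row sum."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in w]


def spatial_lag_reduced_form(w, xb, eps, rho, n_iter=500):
    """Solve y = rho*W*y + Xb + eps by fixed-point iteration.

    Assumes a row-standardised W and |rho| < 1, so the iteration converges.
    """
    n = len(xb)
    y = [xb[i] + eps[i] for i in range(n)]
    for _ in range(n_iter):
        y = [rho * sum(w[i][j] * y[j] for j in range(n)) + xb[i] + eps[i]
             for i in range(n)]
    return y


# Four communes in a row, each adjacent to its immediate neighbours.
w = row_standardise([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
xb = [1.0, 2.0, 3.0, 4.0]      # the X*beta part for each commune
eps = [0.0, 0.0, 0.0, 0.0]     # no noise, to isolate the spillover effect
y = spatial_lag_reduced_form(w, xb, eps, rho=0.6)
print([round(v, 3) for v in y])
```

With ρ > 0 every commune ends up above its own Xβ because it also inherits part of its neighbours' values, which is why OLS coefficients shrink once the lag term is included.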

Page 72: L1 Spatial data - UAntwerpen

OLS Model (n = 589)

Italics: ln(x+1)

Y = % commuter cyclists in commune i

Estimation OLS (y)

Intercept 6,4124****

Median income 0,0030

Active men 0,0472****

Age 2 (45-54 years) -0,0460****

Young children -0,0567****

Cycleways unsatisfaction -0,0127****

Commuting distance -0,0114***

Air quality 0,0141****

City size -0,0954****

Bad health -0,0521****

Accident risk -0,1673**

Traffic volume 2 (municipal network) -0,9216****

Age 3 (> 54 years) -0,2054*

Education 3 (university degree) -0,4988****

Slopes -0,4873****

R-squared (R²) 0,879

Log Likelihood -102,43

Moran's I of residuals 0,34 (0,00)

Source: Vandenbulcke et al, 2011

Page 73: L1 Spatial data - UAntwerpen

Estimation OLS (y) ML (y)

Intercept 6,4124**** 3,2698****

Median income 0,0030 0,00852

Active men 0,0472**** 0,01673**

Age 2 (45-54 years) -0,0460**** -0,02505***

Young children -0,0567**** -0,0218****

Cycleways unsatisfaction -0,0127**** -0,0049****

Commuting distance -0,0114*** -0,00652**

Air quality 0,0141**** 0,00405

City size -0,0954**** -0,08747****

Bad health -0,0521**** -0,01889****

Accident risk -0,1673** -0,14495***

Traffic volume 2 (municipal network) -0,9216**** -0,46952****

Age 3 (> 54 years) -0,2054* -0,14503*

Education 3 (university degree) -0,4988**** -0,23034***

Slopes -0,4873**** -0,17630****

Lag coefficient (ρ) - 0,6015****

R-squared (R²) 0,879 -

Log Likelihood -102,43 33,68

Moran's I of residuals 0,34 (0,00) 0,01 (0,45)

Y = % commuter cyclists in commune i

OLS

LAG

Source: Vandenbulcke et al, 2011

SAR Model (LAG)

Page 74: L1 Spatial data - UAntwerpen

LAG

Residuals

OLS

Page 75: L1 Spatial data - UAntwerpen

5.3 Model for i = communes

Simpson’s paradox

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 76: L1 Spatial data - UAntwerpen

Spatial LAG model + Regimes N-S

North South

Intercept 2,3084* 4,30951****

Median income 0,0311* -0,0027

Active men 0,0296** 0,0008

Age 2 (45-54 years) -0,0417** -0,0205***

Young children -0,0365*** -0,0247***

Cycleways unsatisfaction -0,0052*** -0,0045***

Commuting distance -0,0165*** -0,0047*

Air quality 0,01384**** -0,0054

City size -0,11459**** -0,03615****

Bad health -0,0098 -0,0146**

Accident risk -0,76319**** -0,14892****

Traffic volume 2 (municipal network) -0,2357 -0,4521**

Age 3 (> 54 years) -0,1074 -0,0680

Education 3 (university degree) -0,0968 -0,3132***

Slopes -0,1931** -0,19718****

Lag coefficient (ρ) 0,5362****

N 589 (NNorth = 308; NSouth = 281)

Log Likelihood 93,923

Y = % commuter cyclists in commune i

North = Flanders South = Wallonia & Brussels

Source: Vandenbulcke et al, 2011

Page 77: L1 Spatial data - UAntwerpen

Main results

– Demographic factors: e.g. gender, children
– Socio-economic: e.g. education
– Environmental & policy-related factors, e.g.:
• Dissatisfaction with cycle facilities
• Town size
• Accident risk
• Traffic volume

5.3 Model for i = communes

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 78: L1 Spatial data - UAntwerpen

5.4 Model for i = addresses

Importance of space/location

[Diagram: street network with network location 1 and network location 2; bicycle traffic is equal (=) at both, accidents at location 2 > location 1. Spatial factors?]

Page 79: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.4 Model for i = addresses

• Binary Yi = 0, 1: logistic specification
• Corrections for
– Multicollinearity
– Heteroskedasticity
– Residual spatial autocorrelation: omitted variables? spatial models
• Spatial models (Bayesian framework)
– ICAR model… but fit not improved
– Hierarchical auto-logistic model

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 80: L1 Spatial data - UAntwerpen

Methodology

Models based on accident-only data
• Regression methods (e.g. multinomial logit models)
• Issues: over-/under-dispersion, underreporting, etc.

Models based on surveys, road trajectories
• Regression methods (e.g. logistic models)
• Main issue: bias in the selection of road trajectories

Models based on case-controls?
• Cases = accidents + Controls = generated absences, yi = (0,1)
• Regression methods (e.g. logistic models)
• Advantage: estimation of risk, reduced statistical bias
• Issues: no vehicle & human factors, selection of controls

Case-control strategy
• Transportation (gravity-based models)
• Epidemiology (case-control studies)
• Ecology (generation of controls)

Presenter
Presentation Notes
The first category of models are those that are based on …
Page 81: L1 Spatial data - UAntwerpen

5.4 Model for i = addresses

Data collection

• Accident risk = time-consuming process
– Accidents (cases) to be geocoded/located
– ‘Absences’ (controls) to be generated
• … but no rigorous sampling method: tricky and questionable results!
– Road network: exclude ‘unbikeable’ links
– Risk factors to be collected…
• Software requirements: GIS

Introduction Distance Scale Complexity ACCIDENTS Conclusions

Page 82: L1 Spatial data - UAntwerpen

Data collection: controls and absences

• Controls = locations without any accident (officially), supposed to be safe
• Generation of controls = random sampling of points along the road network, BUT:
– Proportional to bicycle traffic (stratified sampling)
– Exclude ‘black zones’ (hot spots of accidents) from the bikeable network

[Figure: black zones]

Page 83: L1 Spatial data - UAntwerpen

Stratified random sampling

Sampling intensity: potential bicycle traffic
1) Negative exponential function
2) 500 impedance functions
3) No edge effect

Sampling region: black spots (network kernel densities)

Ncontrols = 4*Naccidents
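
The sampling logic can be sketched as a weighted random draw: controls are taken only from segments outside the black zones, with a probability proportional to bicycle traffic, and four controls are generated per accident. This is a simplified illustration rather than the procedure of the cited study: segments stand in for points along the network, and the ids, traffic values and flags are hypothetical.

```python
import random


def sample_controls(segments, n_accidents, ratio=4, seed=0):
    """Draw ratio * n_accidents controls, weighted by bicycle traffic.

    Segments flagged as black zones are excluded from the sampling region.
    Sampling is with replacement, so a busy segment can host several controls.
    """
    eligible = [s for s in segments if not s["black_zone"]]
    weights = [s["traffic"] for s in eligible]
    rng = random.Random(seed)   # fixed seed so the draw is reproducible
    return rng.choices(eligible, weights=weights, k=ratio * n_accidents)


# Hypothetical network segments (ids, traffic and flags are illustrative).
segments = [
    {"id": "s1", "traffic": 120, "black_zone": False},
    {"id": "s2", "traffic": 30, "black_zone": False},
    {"id": "s3", "traffic": 200, "black_zone": True},
    {"id": "s4", "traffic": 0, "black_zone": False},
]
controls = sample_controls(segments, n_accidents=5)
print(len(controls), sorted({c["id"] for c in controls}))
```

Weighting by traffic matters because a uniform draw along the network would place many controls on streets that cyclists hardly use, which would inflate the apparent risk of busy streets.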

Page 84: L1 Spatial data - UAntwerpen

5. ROAD ACCIDENTS

5.4 Model for i = addresses

Data collection: risk factors

Infrastructure factors
• Cycling facilities & contraflow cycling
• Discontinuities
• Parking areas & garages
• Bridge & funnels
• Crossroads & complexity
• Tram railways
• Traffic-calming areas
• Major roads
• Proximity city centre
• Distance to specific points of interest (e.g. schools, bus stops, etc.)

Traffic conditions
• Cars
• Trucks/lorries & buses
• Vans

Environmental factors
• Gradients
• Green blocks (parks, etc.)

Page 85: L1 Spatial data - UAntwerpen

• Advantage of GIS: combination of several datasets

• Accidents/controls – ‘Attached’ variables – ‘Crossings’

Data collection: risk factors

Page 86: L1 Spatial data - UAntwerpen

DATASET

Results: Modelling process

DEPENDENT VARIABLE (BINARY) Accident data (geocoded)

Controls/absences

INDEPENDENT VARIABLES (RISK FACTORS)

Infrastructure factors

Traffic conditions

Environment (physical)

MODELLING PROCESS

FINAL MODEL

Choice of the specification

Convergence diagnostics

Corrections for spatial effects

PREDICTIONS

GIS

Page 87: L1 Spatial data - UAntwerpen

Results: robust

Page 88: L1 Spatial data - UAntwerpen

Results: Predictions for a trajectory

[Map labels along the trajectory:]
• Schuman’s roundabout
• Tram railways
• High traffic volume
• Exit, high traffic volume
• Succession of crossroads on a major road (Wetstraat/Rue de la Loi) + segregated cycling facility
• End of a separated cycling facility at the crossroad
• Residential ward
• Residential ward + contraflow

Page 89: L1 Spatial data - UAntwerpen

Take home message

• Location(s) and distance(s)
• Scale: independence of scales; nested.
• COMPLEXITY of spatial processes
• UNCERTAINTY

Introduction Distance Scale Complexity Accidents CONCLUSIONS

Page 90: L1 Spatial data - UAntwerpen

SPACE BIASES

Spatial statistics
• Large data sets
• Spatial autocorrelation
• Scales
• Border/edge effects
• MAUP (scale + zoning)
• Heterogeneity
• …

Introduction Distance Scale Complexity Accidents CONCLUSIONS

Page 91: L1 Spatial data - UAntwerpen

Readings

Data analysis
• Fotheringham A., Brunsdon C. & Charlton M. (2000) Quantitative Geography: Perspectives on Spatial Data Analysis. London, SAGE.
• Fotheringham A., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester.
• Bailey T. & Gatrell A. (1995) Interactive Spatial Data Analysis. Essex, UK: Longman.
• www.spatialanalysisonline.com

Road accidents in Belgium
• Thomas I. (1996) Spatial Data Aggregation. Exploratory Analysis of Road Accidents. AAP, 28:2, 251-264.
• Steenberghen T. et al. (2004) Intra-urban location of road accidents black zones: a Belgian example. IJGIS, 18:2, 169-181.
• Vandenbulcke G., Thomas I., Int Panis L. (2014) Predicting cycling accident risk in Brussels: an innovative spatial case-control approach. AAP, 62, 341-357.
• Vandenbulcke G. et al. (2011) Bicycle commuting in Belgium: Spatial determinants and re-cycling strategies. TR-A, 45, 118-137.

Page 92: L1 Spatial data - UAntwerpen

Your exercise – 10 pages.

Take your own data set (if you don’t have one, go to Census11) and « PLAY » with it. Get 3 variables: Y (your choice) + 1 « explanatory » X + a measure of distance.
1. Define/describe them very well; justify the scale (extent and grain) and its limitations.
2. EDA and ESDA + statistical maps of the 3 variables. Compute correlations between variables for several extents and/or 2 levels of aggregation and/or 2 subsets.
3. Compute a simple OLS and map the residuals (compute spatial autocorrelation) for both levels of aggregation.
4. If possible, enhance the regression by adopting another method, e.g. correcting for spatial autocorrelation.
5. Critical and strong conclusion (incl. potentials, challenges, …)