camila casquilho-resende, nick fishbane, seong-hwan jun...

1
Exploring Spatial and Temporal Heterogeneity of Environmental Noise in Toronto Camila Casquilho-Resende, Nick Fishbane, Seong-Hwan Jun, and Yunlong Nie Department of Statistics, The University of British Columbia Introduction Objective The objective is to analyze the variation of environmental noise across temporal and spatial dimensions. In the process, we identify site characteristics that contribute to the noise level in Toronto. Data Noise was measured at 10 sites throughout Toronto over a week (or more) at 30-minute intervals. Time series were recorded at various intervals from mid-June to early September, starting at different times in the week, with various amplitudes and noise levels. This can be seen by the time-series from three of the ten sites. The map shows the site locations of this case study, collected in three phases: Cycle1 (n = 241), Cycle1.rep (n = 100), and Cycle2 (n = 213). 30 40 50 60 70 80 Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Time (labels on midnight) LEQ Site 6 30 40 50 60 70 80 Wed Thu Fri Sat Sun Mon Tue Wed LEQ Site 3 30 40 50 60 70 80 Fri Sat Sun Mon Tue Wed Thu LEQ Site 9 43.60 43.65 43.70 43.75 43.80 43.85 -79.7 -79.6 -79.5 -79.4 -79.3 -79.2 -79.1 Longtitude Latitude 50 60 70 80 90 LEQ Data cycle1 cycle1.rep cycle2 Methods Overview I Temporal variation was modeled using a linear mixed effect (LME) model with harmonic and polynomial com- ponents I Importance of site characteristics assessed with random forest and Dirichlet process mixture models I Spatial variation of adjusted noise (LEQ) explored using residual kriging by assuming a Gaussian process with Mat´ ern covariance function I Aforementioned methods reapplied on test data Temporal Model We assume each site to be independent since we observe no relationship between distance and noise correlation between sites. Also, for each site we only consider the “time point” of the observations, which we define as the time of week (of which there are 48*7 = 336). The LME is as follows: LEQ it = b i + d 1 (t)+ d 2 (t)1 [tW ] + w (t)+ it b i ∼N (02 ) d j (t)= β 0j + β 1j sin 2πt 24 + β 2j cos 2πt 24 + β 3j sin 2πt 12 +β 4j sin 2πt 8 + β 5j cos 2πt 8 j =1, 2 w (t)= α 1 t + α 2 t 2 + α 3 t 3 The components (periods) of d j (t) were chosen based on a spectral analysis on all sites time-series individually. Then, the combination of components and the order of the polynomial are determined by significance and AIC. Finally, the choice of time point baseline reference (t =0 on Sunday midnight) is chosen by AIC. The resulting trend will come from the fixed effects which will look like d 1 (t)+ w (t) for weekdays and d 1 (t)+ d 2 (t)+ w (t) for weekends. Describing temporal variations The isolated harmonic functions d 1 (t) and d 1 (t)+ d 2 (t) are shown below as well as the overall weekly trend. We see that for weekdays, noise is fairly constant throughout work hours, rising and falling at both rush hours, with a peak at 3:30 PM; on weekends noise levels rise later in the day, and higher at that, peaking at 6:00 PM. 48 50 52 54 0:00 2:00 4:00 6:00 8:00 10:00 12:00 14:00 16:00 18:00 20:00 22:00 Time Fitted LEQ Weekend 48 50 52 54 0:00 2:00 4:00 6:00 8:00 10:00 12:00 14:00 16:00 18:00 20:00 22:00 Time Fitted LEQ Weekday -10 0 10 20 6 8 5 3 7 4 1 10 2 9 SiteID (in increasing order of median residual magnitude) Residual LEQ 50 55 60 65 70 Mean LEQ The week-long trend shown below illustrates that Wednesday is the loudest weekday while Saturday is the loudest day of the week. The model was able to fit approximate the observed levels, as seen by the residuals boxplots. 50.0 52.5 55.0 57.5 Sun 0:00 Sun 6:00 Sun 12:00 Sun 18:00 Mon 0:00 Mon 6:00 Mon 12:00 Mon 18:00 Tue 0:00 Tue 6:00 Tue 12:00 Tue 18:00 Wed 0:00 Wed 6:00 Wed 12:00 Wed 18:00 Thu 0:00 Thu 6:00 Thu 12:00 Thu 18:00 Fri 0:00 Fri 6:00 Fri 12:00 Fri 18:00 Sat 0:00 Sat 6:00 Sat 12:00 Sat 18:00 Sat 23:30 Time of Week Fitted LEQ Weekday Sun Mon Tue Wed Thu Fri Sat Site Characteristics The site characteristics can potentially obscure the spatial pattern in the LEQ noise process, thus the need for identifying the important site characteristics. Random Forest Regression The random forest regression implicitly models the interaction effects and it can identify important variables. The variable importance plot shows that the traffic count is the most important variable followed by industrial land use (IND100M) and population density (POP500M). REC100M OPEN100M COM100M DIST_EXP POP500M IND100M Total.Traffic 0 2000 4000 6000 Improvement in split criterion Data Cycle1 Cycle2 0 200 400 600 HIGH MEDIUM LOW Total.Traffic 0 5000 10000 15000 20000 25000 HIGH MEDIUM LOW IND100M 0 3000 6000 9000 12000 HIGH MEDIUM LOW COM100M 0 2500 5000 7500 10000 HIGH MEDIUM LOW REC100M Dirichlet Process Mixture Models We wish to cluster the sites based on the noise levels and identify the characteristics for each cluster, but the number of clusters K is unknown. An appropriate method in this framework is the Dirichlet Process Mixture Models. Obtained three clusters, which we label as HIGH, MEDIUM, and LOW noise. High level of traffic count and IND100M for the HIGH cluster and MEDIUM noise sites. Recreational land use (REC100M) is high for LOW noise sites. Describing spatial variations Question: Describe the spatial variations; are there locations in Toronto experiencing higher noise levels? It is assumed that data Z (s) = (Z (s 1 ),...,Z (s n )) are a realization of a spatial process {Z (s) : s R 2 }. For every site location s i , the residuals of the following regression model, denoted as R(s i ), retain the spatial variability of the process but some of it has been removed, as a result of the external information used in the modelling. Z (s i )= μ(s i )+ ε(s i )(s i ) N (02 ) μ(s i )= X (s i ) > β These residuals are then kriged by using Ordinary Kriging assuming a Gaussian Process with Mat´ ern covariance model with κ =1.5 as described below. R(s) N (μ 0 2 I + σ 2 Σ(φ)). The objective of the kriging procedure is to obtain predictions at unsampled locations using information available elsewhere in the study domain. Describing spatial variations Question: Are the spatial variations of Cycle1.rep similar to Cycle1? I If Cycle1 and Cycle1.rep follow the same spatial pattern in terms of “unexplained noise”, it is likely the kriging prediction of Cycle1.rep obtained from the residual kriging using Cycle1 is somewhat a fair prediction of the noise after adjusting the predictors. I We observe a good agreement for the two sets of residuals, so we augment Cycle1 with Cycle1.rep to provide calibrated predictions on Toronto. These are mapped in the first contour plot to show areas of high residual noise, such as downtown and near the Pearson airport, as expected. Cycle 1 43.60 43.65 43.70 43.75 43.80 43.85 -79.7 -79.6 -79.5 -79.4 -79.3 -79.2 -79.1 Longtitude Latitude -15-10 -5 0 5 10 Residual Noise (LEQ) Cycle 2 43.60 43.65 43.70 43.75 43.80 43.85 -79.7 -79.6 -79.5 -79.4 -79.3 -79.2 -79.1 Longtitude Latitude -10 0 10 20 Residual Noise (LEQ) Describing spatial variations Question: Cycles 1 and 2 similarity I The same methods as described above for Cycle1–linear model with chosen covariates and residual kriging–is implemented on Cycle2 to obtain a contour plot for residual noise in Toronto. I A large porportion of LEQ variation between sites in Cycle 2 can be explained by site characteristics similarly to Cycle 1. Random forest shows nearly identical results for the importance of site characteristics, so the cause of spatial noise variation is comparable. I When employing residual kriging, we obtain the contour map shown above which is dissimilar to that obtained with Cycle 1. This is likely due to the complementarity of the site placements coupled with the low long-distance prediction capability of this particular spatial model.

Upload: others

Post on 28-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Camila Casquilho-Resende, Nick Fishbane, Seong-Hwan Jun ...junseonghwan.github.io/resources/SSC_Poster_2013.pdfCamila Casquilho-Resende, Nick Fishbane, Seong-Hwan Jun, and Yunlong

Exploring Spatial and Temporal Heterogeneity of Environmental Noise in TorontoCamila Casquilho-Resende, Nick Fishbane, Seong-Hwan Jun, and Yunlong Nie

Department of Statistics, The University of British Columbia

Introduction

Objective

The objective is to analyze the variation of environmental noise across temporal and spatial dimensions. In the

process, we identify site characteristics that contribute to the noise level in Toronto.

Data

Noise was measured at 10 sites throughout Toronto over a week (or more) at 30-minute intervals. Time series

were recorded at various intervals from mid-June to early September, starting at different times in the week, with

various amplitudes and noise levels. This can be seen by the time-series from three of the ten sites. The map

shows the site locations of this case study, collected in three phases: Cycle1 (n = 241), Cycle1.rep (n = 100),

and Cycle2 (n = 213).

30

40

50

60

70

80

Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon TueTime (labels on midnight)

LEQ

Site 6

30

40

50

60

70

80

Wed Thu Fri Sat Sun Mon Tue Wed

LEQ

Site 3

30

40

50

60

70

80

Fri Sat Sun Mon Tue Wed Thu

LEQ

Site 9

43.60

43.65

43.70

43.75

43.80

43.85

−79.7 −79.6 −79.5 −79.4 −79.3 −79.2 −79.1Longtitude

Latit

ude

50 60 70 80 90LEQ

Data cycle1 cycle1.rep cycle2

Methods Overview

ITemporal variation was modeled using a linear mixed effect (LME) model with harmonic and polynomial com-

ponents

I Importance of site characteristics assessed with random forest and Dirichlet process mixture models

I Spatial variation of adjusted noise (LEQ) explored using residual kriging by assuming a Gaussian process with

Matern covariance function

IAforementioned methods reapplied on test data

Temporal Model

We assume each site to be independent since we observe no relationship between distance and noise correlation

between sites. Also, for each site we only consider the “time point” of the observations, which we define as the

time of week (of which there are 48*7 = 336). The LME is as follows:

LEQit = bi + d1(t) + d2(t)1[t∈W ] + w(t) + εit bi ∼ N (0, σ2)

dj(t) = β0j + β1j sin(2πt

24

)+ β2j cos

(2πt

24

)+ β3j sin

(2πt

12

)+β4j sin

(2πt

8

)+ β5j cos

(2πt

8

)j = 1, 2

w(t) = α1t+ α2t2 + α3t

3

The components (periods) of dj(t) were chosen based on a spectral analysis on all sites time-series individually.

Then, the combination of components and the order of the polynomial are determined by significance and AIC.

Finally, the choice of time point baseline reference (t = 0 on Sunday midnight) is chosen by AIC. The resulting

trend will come from the fixed effects which will look like d1(t)+w(t) for weekdays and d1(t)+d2(t)+w(t)

for weekends.

Describing temporal variations

The isolated harmonic functions d1(t) and d1(t) + d2(t) are shown below as well as the overall weekly trend.

We see that for weekdays, noise is fairly constant throughout work hours, rising and falling at both rush hours,

with a peak at 3:30 PM; on weekends noise levels rise later in the day, and higher at that, peaking at 6:00 PM.

48

50

52

54

0:00

2:00

4:00

6:00

8:00

10:0

0

12:0

0

14:0

0

16:0

0

18:0

0

20:0

0

22:0

0

Time

Fitt

ed L

EQ

Weekend

48

50

52

54

0:00

2:00

4:00

6:00

8:00

10:0

0

12:0

0

14:0

0

16:0

0

18:0

0

20:0

0

22:0

0

Time

Fitt

ed L

EQ

Weekday

●●

●●

●●

●●

●●

●●

●●

−10

0

10

20

6 8 5 3 7 4 1 10 2 9SiteID (in increasing order of median residual magnitude)

Res

idua

l LE

Q

50 55 60 65 70Mean LEQ

The week-long trend shown below illustrates that Wednesday is the loudest weekday while Saturday is the loudest

day of the week. The model was able to fit approximate the observed levels, as seen by the residuals boxplots.

50.0

52.5

55.0

57.5

Sun

0:0

0

Sun

6:0

0

Sun

12:

00

Sun

18:

00

Mon

0:0

0

Mon

6:0

0

Mon

12:

00

Mon

18:

00

Tue

0:00

Tue

6:00

Tue

12:0

0

Tue

18:0

0

Wed

0:0

0

Wed

6:0

0

Wed

12:

00

Wed

18:

00

Thu

0:0

0

Thu

6:0

0

Thu

12:

00

Thu

18:

00

Fri

0:00

Fri

6:00

Fri

12:0

0

Fri

18:0

0

Sat

0:0

0

Sat

6:0

0

Sat

12:

00

Sat

18:

00

Sat

23:

30

Time of Week

Fitt

ed L

EQ

Weekday

Sun

Mon

Tue

Wed

Thu

Fri

Sat

Site Characteristics

The site characteristics can potentially obscure the spatial pattern in the LEQ noise process, thus the need for

identifying the important site characteristics.

Random Forest Regression The random forest regression implicitly models the interaction effects

and it can identify important variables.

The variable importance plot shows that the traffic count is the most important variable followed by industrial

land use (IND100M) and population density (POP500M).

REC100M

OPEN100M

COM100M

DIST_EXP

POP500M

IND100M

Total.Traffic

0 2000 4000 6000Improvement in split criterion

Data Cycle1 Cycle2

0

200

400

600

HIGH MEDIUM LOW

Total.Traffic

0

5000

10000

15000

20000

25000

HIGH MEDIUM LOW

IND100M

0

3000

6000

9000

12000

HIGH MEDIUM LOW

COM100M

0

2500

5000

7500

10000

HIGH MEDIUM LOW

REC100M

Dirichlet Process Mixture ModelsWe wish to cluster the sites based on the noise levels and identify the characteristics for each cluster, but the

number of clusters K is unknown. An appropriate method in this framework is the Dirichlet Process Mixture

Models.

Obtained three clusters, which we label as HIGH, MEDIUM, and LOW noise. High level of traffic count and

IND100M for the HIGH cluster and MEDIUM noise sites. Recreational land use (REC100M) is high for LOW

noise sites.

Describing spatial variations

Question: Describe the spatial variations; are there locations inToronto experiencing higher noise levels?It is assumed that data Z(s) = (Z(s1), . . . , Z(sn)) are a realization of a spatial process {Z(s) : s ∈ R2}.

For every site location si, the residuals of the following regression model, denoted as R(si), retain the spatial

variability of the process but some of it has been removed, as a result of the external information used in the

modelling.

Z(si) = µ(si) + ε(si), ε(si) ∼ N(0, ψ2)

µ(si) = X(si)>β

These residuals are then kriged by using Ordinary Kriging assuming a Gaussian Process with Matern covariance

model with κ = 1.5 as described below.

R(s) ∼ N(µ0, τ2I + σ2Σ(φ)).

The objective of the kriging procedure is to obtain predictions at unsampled locations using information available

elsewhere in the study domain.

Describing spatial variations

Question: Are the spatial variations of Cycle1.rep similar to Cycle1?

I If Cycle1 and Cycle1.rep follow the same spatial pattern in terms of “unexplained noise”, it is likely the kriging

prediction of Cycle1.rep obtained from the residual kriging using Cycle1 is somewhat a fair prediction of the

noise after adjusting the predictors.

IWe observe a good agreement for the two sets of residuals, so we augment Cycle1 with Cycle1.rep to provide

calibrated predictions on Toronto. These are mapped in the first contour plot to show areas of high residual

noise, such as downtown and near the Pearson airport, as expected.

Cycle 1

43.60

43.65

43.70

43.75

43.80

43.85

−79.7 −79.6 −79.5 −79.4 −79.3 −79.2 −79.1Longtitude

Latit

ude

−15−10 −5 0 5 10

Residual Noise (LEQ)

Cycle 2

43.60

43.65

43.70

43.75

43.80

43.85

−79.7 −79.6 −79.5 −79.4 −79.3 −79.2 −79.1Longtitude

Latit

ude

−10 0 10 20

Residual Noise (LEQ)

Describing spatial variations

Question: Cycles 1 and 2 similarityIThe same methods as described above for Cycle1–linear model with chosen covariates and residual kriging–is

implemented on Cycle2 to obtain a contour plot for residual noise in Toronto.

IA large porportion of LEQ variation between sites in Cycle 2 can be explained by site characteristics similarly

to Cycle 1. Random forest shows nearly identical results for the importance of site characteristics, so the cause

of spatial noise variation is comparable.

IWhen employing residual kriging, we obtain the contour map shown above which is dissimilar to that obtained

with Cycle 1. This is likely due to the complementarity of the site placements coupled with the low

long-distance prediction capability of this particular spatial model.