study on the spatial–temporal change characteristics and ......study on the spatial–temporal...

13
S.I. : EMERGING INTELLIGENT ALGORITHMS FOR EDGE-OF-THINGS COMPUTING Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM Zhuang Wu 1,2 Shuo Zhang 1 Received: 24 February 2018 / Accepted: 4 May 2018 / Published online: 18 June 2018 Ó The Author(s) 2018 Abstract PM 2.5 (particulate matter) is an important object for air quality monitoring, and the research on related influence factors and diffusion process of PM 2.5 plays a key role in the fight against pollution of fog and haze. Based on the air quality monitoring data and related meteorological data of 16 districts of Beijing during November 2016 and December 2017, such methods as time-series analysis and nonparametric test are adopted to describe the variation trend of PM 2.5 concentration in space and time and its disparities in different seasons, time periods and areas. Linear regression method is used in most of the previous research on influence factors and prediction of PM 2.5 concentration, but actually, the relation between these factors is rather intricate and it is usually nonlinear. So, generalized additive model (GAM) is used in this paper to analyze the impact that different influence factors, especially their interaction, have on PM 2.5 concentration and its diffusion process. The result shows that in the dimensionality of time, PM 2.5 concentration has strong autocorrelation over time and it is most significant in the first to the third order (lag 0–lag 3). Throughout the year, PM 2.5 concentration is relatively high in winter and low in summer. It is usually the lowest during 16:00–18:00 and the highest during 9:00–11:00 every day and far higher at night than in the daytime (MD = - 6.455, P = 0.003). And in terms of space, PM 2.5 concentration shows distinct spatial gradient and it gradually decreases from south to north (MD = - 19.250, P = 0.004). It is found in the analysis of influence factors of PM 2.5 concentration that the change of PM 2.5 concentration is a complex nonlinear time series driven and affected by many factors; among these factors, the interaction between air pollutants and meteorological elements is the most prominent, while average wind speed (WS lag 1) plays a decisive role in the entire diffusion process, and it explains the whole diffusion of PM 2.5 concentration to a large extent. Keywords Spatial and temporal variation Interaction Diffusion and evolution Generalized additive model 1 Introduction Haze, as a collective name for fog and haze, is a kind of weather phenomenon in meteorological science. Fog is the outcome of condensation of water vapor (or deposition) in the air of atmospheric surface layer, and the core materials of haze include the aerosol particulate matter suspended in the air, mainly coming from such artificial sources as industrial pollution, fossil fuel combustion and biomass burning as well as the natural sources like soil dust. PM 2.5 is not only an important component of haze, but also a key object for air quality monitoring. It degrades the atmo- spheric visibility and increases the morbidity and mortality of respiratory disease and cerebrovascular disease [1, 2]. In recent years, unusual hazy weather has been frequently sweeping over China, dramatically deteriorating air quality and severely affecting normal production activities. Since the year of 2012, PM 2.5 has been added to ‘‘Air Quality Standards’’ as a conventional index and its real-time con- centration has also been appended to the air quality mon- itoring system of Ministry of Environmental Protection and the People’s Republic of China. Therefore, to figure out the related influence factors and diffusion process of PM 2.5 is vital to find an effective governance approach. & Zhuang Wu [email protected] Shuo Zhang [email protected] 1 School of Information, Capital University of Economics and Business, Beijing 100070, China 2 CTSC Center, Information College, Capital University of Economics and Business, Beijing 100070, China 123 Neural Computing and Applications (2019) 31:1619–1631 https://doi.org/10.1007/s00521-018-3532-z

Upload: others

Post on 17-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

S. I . : EMERGING INTELLIGENT ALGORITHMS FOR EDGE-OF-THINGS COMPUTING

Study on the spatial–temporal change characteristics and influencefactors of fog and haze pollution based on GAM

Zhuang Wu1,2 • Shuo Zhang1

Received: 24 February 2018 / Accepted: 4 May 2018 / Published online: 18 June 2018� The Author(s) 2018

AbstractPM2.5 (particulate matter) is an important object for air quality monitoring, and the research on related influence factors and

diffusion process of PM2.5 plays a key role in the fight against pollution of fog and haze. Based on the air quality

monitoring data and related meteorological data of 16 districts of Beijing during November 2016 and December 2017, such

methods as time-series analysis and nonparametric test are adopted to describe the variation trend of PM2.5 concentration in

space and time and its disparities in different seasons, time periods and areas. Linear regression method is used in most of

the previous research on influence factors and prediction of PM2.5 concentration, but actually, the relation between these

factors is rather intricate and it is usually nonlinear. So, generalized additive model (GAM) is used in this paper to analyze

the impact that different influence factors, especially their interaction, have on PM2.5 concentration and its diffusion

process. The result shows that in the dimensionality of time, PM2.5 concentration has strong autocorrelation over time and

it is most significant in the first to the third order (lag 0–lag 3). Throughout the year, PM2.5 concentration is relatively high

in winter and low in summer. It is usually the lowest during 16:00–18:00 and the highest during 9:00–11:00 every day and

far higher at night than in the daytime (MD = - 6.455, P = 0.003). And in terms of space, PM2.5 concentration shows

distinct spatial gradient and it gradually decreases from south to north (MD = - 19.250, P = 0.004). It is found in the

analysis of influence factors of PM2.5 concentration that the change of PM2.5 concentration is a complex nonlinear time

series driven and affected by many factors; among these factors, the interaction between air pollutants and meteorological

elements is the most prominent, while average wind speed (WS lag 1) plays a decisive role in the entire diffusion process,

and it explains the whole diffusion of PM2.5 concentration to a large extent.

Keywords Spatial and temporal variation � Interaction � Diffusion and evolution � Generalized additive model

1 Introduction

Haze, as a collective name for fog and haze, is a kind of

weather phenomenon in meteorological science. Fog is the

outcome of condensation of water vapor (or deposition) in

the air of atmospheric surface layer, and the core materials

of haze include the aerosol particulate matter suspended in

the air, mainly coming from such artificial sources as

industrial pollution, fossil fuel combustion and biomass

burning as well as the natural sources like soil dust. PM2.5

is not only an important component of haze, but also a key

object for air quality monitoring. It degrades the atmo-

spheric visibility and increases the morbidity and mortality

of respiratory disease and cerebrovascular disease [1, 2]. In

recent years, unusual hazy weather has been frequently

sweeping over China, dramatically deteriorating air quality

and severely affecting normal production activities. Since

the year of 2012, PM2.5 has been added to ‘‘Air Quality

Standards’’ as a conventional index and its real-time con-

centration has also been appended to the air quality mon-

itoring system of Ministry of Environmental Protection and

the People’s Republic of China. Therefore, to figure out the

related influence factors and diffusion process of PM2.5 is

vital to find an effective governance approach.

& Zhuang Wu

[email protected]

Shuo Zhang

[email protected]

1 School of Information, Capital University of Economics and

Business, Beijing 100070, China

2 CTSC Center, Information College, Capital University of

Economics and Business, Beijing 100070, China

123

Neural Computing and Applications (2019) 31:1619–1631https://doi.org/10.1007/s00521-018-3532-z(0123456789().,-volV)(0123456789().,- volV)

Page 2: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

Existing international study on fog and haze mainly

includes: (1) the composition and sources of haze pollu-

tants, (2) the correlation between urban air pollution and

meteorological elements and (3) the virtual and forecasting

of numeric value of regional airborne pollutant concen-

tration. With regard to the root of haze, Zhang et al. [3]

have believed that the use of non-clean energy is the most

fundamental cause of the pollution of fog and haze in

China and that the exploding urbanization process in China

in recent years is the immediate cause [4]. Related mete-

orological conditions, including temperature, relative

humidity and height of planetary boundary layer, play a

significant role in the formation, distribution, maintenance

and variation of these pollutants. Most previous study on

influence factors and prediction of PM2.5 concentration has

adopted the method of linear regression, but in fact, there is

complicated relationship, usually nonlinear, between these

factors. So, scholars have begun to use generalized additive

model to depict the complex relations between these

potential variables and PM2.5 concentration in recent years.

Song and others [5] have applied generalized additive

model and multi-source monitoring data to describe and

changes of PM2.5 concentration in Xi’an City. For the

independent variables, SO2 and CO will be considered as

linear functions, while NO2, O3, AOT and temperature as

single-variable smooth nonlinear functions and wind scale

as two-variable smooth nonlinear function. The final

explanation rate of the model is 69.50%. Li et al. [6] have

brought in principal component analysis (PCA) approach

into generalized additive model (GAM), proposed and used

PCA-GAM to forecast PM2.5 concentrations in Beijing

Municipality, Tianjin Municipality and Hebei Province,

and the results have shown that compared with conven-

tional land-use regression model, that model has a higher

accuracy rate (R2 = 0.94). In conclusion, PM2.5 concen-

tration increases for the following two causes: the meteo-

rological elements featured by southeast wind, temperature

inversion and high relative humidity which go against

diffusion and the pollution factors featured by the increase

in the concentration of suspended particulate matter [7–12].

Among them, pollution is the internal cause, which is

closely related to human activities and which can be con-

trolled while meteorological element is the external cause

and it cannot be controlled [13].

To sum up, the foregoing research work mainly inte-

grates specific pollution process and analyzes the numeric

changes of PM2.5 and the impact a single meteorological

factor has on PM2.5; in other words, it only conducts

research on fog and haze within a short time, and it is not

universally applicable. Even for the same city, PM2.5,

together with air pollutants, meteorological elements and

other factors, has formed a complex nonlinear dynamic

system, which has multilayered scale structure and local

variations in the time domain [14]. Therefore, based on the

analysis of change rule of particulate matter (PM2.5), the

promotion or inhibition that precursor pollutants (i.e., SO2,

NO2, CO) and meteorological elements have on PM2.5 has

been taken into consideration, and research has been con-

ducted to PM2.5 concentration in Beijing from the dimen-

sions of time and space by using the air quality data

released by Beijing Municipal Environmental Monitoring

Center and the meteorological data published by China

Meteorological Administration to analyze its features in

different stages. Generalized additive model (GAM) is a

flexible statistical model, which can be used to detect the

influence of nonlinear regression. Nonparametric regres-

sion does not require models to satisfy linear assumptions

and can detect the complex relationship between data

flexibly. However, when the number of independent vari-

ables in the model is more, the estimated variance of the

model will increase, and each additive item in the GAM

model is estimated by a single smooth function. In each

additive item, how the dependent variable varies with the

independent variable can be explained well. GAM is con-

structed with the focus on the analysis of the impact the

interaction of different influence factors has on the changes

of PM2.5 concentration to find the key influence factors and

analyze the entire change process of PM2.5 concentration.

2 Data and research methods

2.1 Data

According to the information provided by Beijing Munic-

ipal Environmental Monitoring Center, 35 air pollution

monitoring stations have been selected with complete

coverage of all districts of Beijing, and the information and

locations of every monitoring station are shown in Fig. 1.

Collect daily and hourly data of SO2 (ug/m3), NO2 (ug/m

3),

CO (ug/m3), O3 (ug/m3) and PM2.5 (ug/m3), respectively,

and the meteorological data, including wind speed (km/h),

relative humidity (%), temperature (�C), atmospheric

pressure (hpa), precipitation (mm) and sunshine hour (h),

from December 1, 2016, to November 30, 2017. Among

them, air quality data all come from Beijing Municipal

Environmental Monitoring Center (www.bjmemc.com.cn)

and meteorological data from China Meteorological

Administration (www.cma.gov.cn).

1620 Neural Computing and Applications (2019) 31:1619–1631

123

Page 3: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

2.2 Research methods

2.2.1 Analysis of spatial and temporal characteristicsof PM2.5

Based on the air quality monitoring data of 16 districts of

Beijing during December 2016 and November 2017, such

methods as time-series analysis, time-series plot and spatial

correlation analysis have been adopted to study the rule of

temporal and spatial variation of PM2.5 concentration, and

Kruskal–Wallis test, Mann–Whitney U test and Bonferroni

correction have been conducted on PM2.5 concentration,

respectively, to clarify their differences. By reference to the

air quality standards released by environmental protection

departments, the air quality is excellent when PM2.5 is

within 0–35 ug/m3, good when within 35–75 ug/m3, slight

(intermediate) pollution when within 75–150 ug/m3 and

heavy pollution when bigger than 150 ug/m3. In order to

Code Station Name Code Station Name Code Station Name1 DongSi 13 FangShan 25 BaDaLing2 TianTan 14 DaXing 26 MiYunShuiKu3 GuanYuan 15 YiZhuang 27 DongGaoCun4 WanShouXiGong 16 TongZhou 28 YongLeDian5 AoTiZhongXin 17 ShunYi 29 YuFa6 NongZhanGuan 18 ChangPing 30 LiuLiHe7 WanLiu 19 MenTouGou 31 QianMen8 BeiBuXinQu 20 PingGu 32 YongDingMenNei9 ZhiWuYuan 21 HuaiRou 33 XiZhiMenBei10 FengTaiHuaYuan 22 MiYun 34 NanSanHuan11 YunGang 23 YanQing 35 DongSiHuan12 GuCheng 24 DingLing

Fig. 1 Location of the 35 PM2.5 monitoring stations in Beijing

Neural Computing and Applications (2019) 31:1619–1631 1621

123

Page 4: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

simplify the research process, 16 districts of Beijing have

been divided into three regions from north to south (see

Table 1) in four seasons: spring (March–May), summer

(June–August), autumn (September–November) and winter

(December–February). Besides, time-series plot is used to

depict the changes in daily PM2.5 concentration.

2.2.2 Generalized additive model

The function in GAM can be identified by the reverse fit-

ting method; it can be applied to the analysis of a variety of

distributed data. The model can include both the parameter

fitting part and the nonparametric fitting part. All the

interpretative components in the GAM are all kinds of

smoothing function forms to explain the variables. It is

suitable for the analysis of many complex linear relations.

In recent years, nonparametric models have attracted

increasing attention from scholars. Hastie and Tibshirani

have applied additive model into generalized linear model

(GLM) and come up with the concept of generalized

additive model (GAM) in 1990, which, in its essence, is to

connect the expectation of response variable with the

additivity in the additive model via the connection function

[15] in the following formula.

gðEðYÞÞ ¼ s0 þ s1ðX1Þ þ s2ðX2Þ þ � � � spðXpÞ ð1Þ

In this formula, E(Y) is the expectation of Y. s0 is the

intercept and si (.) (i = 1, 2,…p) is a nonparametric smooth

function and Esi(Xi) = 0. It can be a smooth spline func-

tion, a local regression smooth function or a kernel func-

tion. g(.) is the connection function and g(.) can be

represented in the following forms for predictive variables

of different distribution types.

gðEðYÞÞ ¼

EðYÞ Y � Nðl; r2Þ

logEðYÞ

1� EðYÞ

� �Y � B ðn; pÞ

logðEðYÞÞ Y � Ga ða; kÞlogðEðYÞÞ Y � P ðkÞ

8>>>>><>>>>>:

ð2Þ

GAM obtains the most suitable trend line of source data

by identifying and accumulating multiple functions. By

dealing with the complex nonlinear relationship between

the dependent variables and the explanatory variables, the

nonparametric regression is fitted. The algorithm iteratively

fits and adjusts the function to reduce the prediction error.

In fact, GAM pays more attention to nonparametric

exploration of data, which is more suitable for analysis and

interpretation of the relationship between the response

variable and the explanatory variable. The specific analysis

of generalized additive model uses R version Rx64.3.4.3

and mgcv package coming from The R-Project for Statis-

tical Computing, i.e., https://www.r-project.org/.

Table 1 Distribution of PM2.5

concentrations in the 16 districts

of Beijing, 2016–2017 (ug/m3)

Regional category District Mean SD Rate I (%) Rate II (%)

North Yanqing 52.92 40.74 4.81–308.80 191 (55.20)

Huairou 56.23 56.08 5.42–485.00 187 (54.05)

Miyun 57.35 56.74 4.86–459.50 195 (56.36)

Shunyi 65.30 63.51 4.43–398.00 216 (62.42)

Changping 56.97 56.84 4.71–413.75 188 (54.17)

Pinggu 70.04 65.80 6.07–419.75 239 (69.08)

Center Dongcheng 71.73 69.91 7.38–430.92 237 (68.59)

Haidian 65.32 65.09 5.38–463.50 219 (63.29)

Xicheng 72.49 68.65 6.88–462.00 239 (69.08)

Chaoyang 72.85 66.29 5.92–406.67 255 (73.70)

ShiJingShan 71.15 71.25 4.83–495.00 233 (67.34)

Fengtai 77.77 71.10 6.83–495.13 277 (80.06)

South Mentougou 65.61 66.37 6.04–551.00 216 (62.43)

Touzhou 85.21 76.40 6.98–458.46 278 (80.35)

Daxing 79.30 73.58 7.76–443.03 260 (75.14)

Fangshan 91.13 77.68 6.78–502.31 290 (83.81)

SD standard deviation

Rate I (%): non-attainment days and rates (%) for grade I (0–35 ug/m3)

Rate II (%): non-attainment days and rates (%) for grade II (35–75 ug/m3)

1622 Neural Computing and Applications (2019) 31:1619–1631

123

Page 5: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

3 Results

3.1 Overview of PM2.5 pollution in Beijing

According to the PM2.5 data from Beijing Municipal

Environmental Monitoring Center (www.bjmemc.com.cn),

it can be calculated that the annual average PM2.5 con-

centration from December 2016 to November 2017 is

69.46 ug/m3 and the lowest is Yanqing District, northwest

of Beijing with an annual average PM2.5 concentration of

52.92 ug/m3 and the highest is Fangshan District, south-

west of Beijing with an annual average PM2.5 concentra-

tion of 91.13 ug/m3, much higher than 75 ug/m3, the limit

of Grade II level stipulated in Ambient Air Quality Stan-

dards (GB 3095-2012) with significant spatial gradient. In

another word, PM2.5 concentration gradually decreases

from south to north (MD = - 19.250, P = 0.004, as shown

in Table 2). Besides, 16 districts have experienced the limit

of Grade I PM2.5 concentration (55 ug/m3) for over 54%

of total days and the limit of Grade II PM2.5 concentration

(75 ug/m3) for over 22% of total days.

During the analysis, it is found that there also exist

seasonal fluctuations in the variation of PM2.5 concentra-

tion (see Fig. 2). Haze weather occurs 47 times during the

research: 9 in spring, 9 in summer, 14 in autumn and 15 in

winter, and it usually lasts 1–8 days in spring, 1–2 days in

summer, 1–3 days in autumn and 1–9 days in winter

featured by large-scale persistent outbreak. Overall, PM2.5

concentration is relatively high in winter and low in sum-

mer, and the difference between spring and autumn has no

statistical significance (MD = - 0.791, P = 1.000, see

Table 2), showing a U-shaped curve all over the year. The

main reason is that in winter more coal is burned for heat,

increasing the particulate matter discharged to the atmo-

sphere, and from the view of meteorological conditions,

pollutant usually diffuses slowly in winter with stable at-

mospheric stratification of troposphere, no cold air and

high-rise buildings in urban areas while the high temper-

ature, exuberant air convection and large precipitation in

summer have promoted the deposition of particulate mat-

ter. Additionally, Mann–Whitney U test is used to identify

the discrepancy of PM2.5 concentration between weekdays

and weekends, and the result shows that P = 0.544

([ 0.05), suggesting that the difference of these two groups

of data is not significant in statistics.

Expand the above research process and use MATLAB

2014a tool box to conduct autocorrelation analysis on daily

PM2.5 concentration to reveal its time-series characteristics,

as indicated in Fig. 3. The result has shown that the top and

bottom critical values of autocorrelation coefficient are

± 0.163, respectively, and the first-order autocorrelation

coefficient is 0.6. It can be seen that PM2.5 concentration of

Beijing has strong autocorrelation over time, and it is the

most significant from the first to the third order. Besides,

Table 2 Significance tests of

PM2.5 levels for different

seasons, regions, daytime and

day of week

Variable Kruskal–Wallis H test Bonferroni test

Chi-square P MD P

Region#

North versus south 8.661 0.013 - 19.250 0.004

Center versus south - 8.417 0.258

North versus center - 10.833 0.074

Season*

Spring versus summer 18.662 0.000 14.109 0.686

Spring versus autumn - 0.791 1.000

Spring versus winter - 46.322 0.000

Summer versus autumn - 14.900 0.578

Summer versus winter - 60.431 0.000

Autumn versus winter - 45.513 0.000

Daytime#

7–12 a.m. versus 1 pm–6 pm 45.740 0.000 4.319 0.162

7–12 a.m. versus 7 p.m.–6 a.m. - 6.455 0.003

1 pm–6 pm versus 7 p.m.–6 a.m. - 10.775 1.945

Day of week

Weekday versus weekends Mann–Whitney U test

P = 0.544

* The mean difference is significant at the 0.0083 level for Bonferroni test# The mean difference is significant at the 0.0167 level for Bonferroni test

Neural Computing and Applications (2019) 31:1619–1631 1623

123

Page 6: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

PM2.5 concentration also has different diurnal variations in

different seasons (as shown in Fig. 4). In comparison with

autumn and winter, diurnal variation is stronger in spring

and autumn, showing a smooth W shape. Further use

Bonferroni correction for detection, and the result has

demonstrated that PM2.5 concentration in the morning

(7 a.m.–12 a.m.) is far lower than in the nighttime (7 p.m.–

6 a.m.) (MD = - 6.455, P = 0.003), but the difference in

the daytime (7 a.m.–12 a.m./1–6 p.m.) is not notable.

Generally, the lowest daily concentration occurs during the

period of 16:00–18:00, while the highest in 9:00–11:00.

3.2 Analysis of generalized additive modelof PM2.5 and single influence factor

Air pollutants and meteorological factors are the important

environmental factors that affect PM2.5 concentration of a

place. The air pollutants include SO2 (ug/m3), NO2 (ug/

Fig. 2 Day-to-day variation of PM2.5 in different seasons, Beijing 2016–2017

Fig. 3 Autocorrelation coefficient of PM2.5 in Beijing 2016–2017

Fig. 4 Diurnal variation of PM2.5 in different seasons, Beijing

2016–2017

1624 Neural Computing and Applications (2019) 31:1619–1631

123

Page 7: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

m3), CO (ug/m3) and O3 (ug/m3), and meteorological fac-

tors include temperature (�C), atmospheric pressure of sea

level (hpa), relative humidity (%), wind speed (km/h),

precipitation (mm) and sunshine hours (h). On this basis,

adopt generalized additive model to describe the impact a

single influence factor and interaction have on PM2.5

concentration with daily average PM2.5 concentration as

the response variable and the daily average numeric value

of related influence factors as the explanatory variable.

3.2.1 Preanalysis of explanatory variable

According to the analysis result of Q–Q Chart in SPSS, it

can be found that daily average concentration of PM2.5 is

approximately subject to gamma distribution (Y * (a, k)).So, use log connection function to connect explanatory

variables with response variable subject to gamma distri-

bution via linear combination and calculate the Pearson

correlation coefficient between any two explanatory vari-

ables to prevent them from distorting the model estimation

due to the existence of severe collinearity (|r|[ 0.8). It can

be known from the calculation result (Table 3) that the

correlation coefficient between temperature (t) and sun-

shine hours (sh) is 0.922 and that between temperature

(t) and sea-level atmospheric pressure is - 0.900, sug-

gesting that the longer the same region experiences the

sunshine, the higher the temperature is. If other causes such

as power are not taken into consideration, the higher the

temperature is, the faster the atmosphere expands and rises

because of heat, namely that the lower the atmospheric

pressure is. Sunshine hour is used to represent temperature

index in this paper to avoid concurvity in the construction

of multivariable curve model. Moreover, though the cor-

relation coefficient between NO2 and CO has reached

0.856, they are not eliminated as they come from different

sources.

It can be known from Fig. 3 that the autocorrelation

between PM2.5 concentration in Beijing over time is very

strong, which is most obvious in first to third order. But in a

short period of time, the development scale, geographical

conditions, emissions of industrial pollution and automobile

exhaust of a region are relatively fixed, so the change of

PM2.5 concentration is mainly related to local meteorological

conditions. Next, emphasis is placed on analyzing the corre-

lation between PM2.5 and various meteorological factors

located from lag 0 (current PM2.5 concentration) to lag 3

(PM2.5 concentration 3 days ago). With Spearman correlation

coefficient (rs) as a measurement criterion, the meteorological

elements with a Spearman correlation coefficient bigger than

0.3 (rs[0.3) are selected as the explanatory variables of the

model. And the result is shown as follows.

Integrate the result of Table 4 and take SO2, NO2, CO,

O3, wind speed lag 1 (rs = - 0.564, P\ 0.01), sea-level

atmospheric pressure lag 3 (rs = 0.301, P\ 0.01), relative

humidity lag 0 (rs = 0.367, P\ 0.01) and sunshine hour

lag 0 (rs = - 0.308, P\ 0.01) as explanatory variables

and daily PM2.5 concentration as response variable into the

final model (Formula 3). On this basis, spline smooth

function is adopted to analyze the impact every explanatory

variable has on response variable and the final goodness of

fit of the model.

logðEðYÞÞ ¼ s0 þ sðSO2; bs ¼ crÞ þ sðNO2; bs ¼ crÞþ sðCO; bs ¼ crÞ þ sðO3; bs ¼ crÞþ sðWSlag1; bs ¼ crÞ þ sðAPlag3; bs ¼ crÞþ sðRH; bs ¼ crÞ þ sðSH; bs ¼ crÞ

ð3Þ

Table 3 Pearson correlation coefficient among influencing factors

SO2 NO2 AP T WS SH RH P O3 CO

S02 1 .611** .354** - .486** - .100 - .405** - .112* - .110* - .319** .539**

NO2 .611** 1 .310** - .414** - .312** - .461** .176** - .112* - .510** .856**

AP .354** .310** 1 - .900** .013 - .859** - .202** - .182** - .703** .283**

T - .486** - .414** - .900** 1 - .079 .922** .225** .180** .728** - .398**

WS - .100 - .312** .013 - .079 1 .003 - .592** - .102 .140** - .236**

SH - .405** - .461** - .859** .922** .003 1 .105* .230** .792** - .422**

RH - .112* .176** - .202** .225** - .592** .105* 1 .452** - .001 .263**

P - .110* - .112* - .182** .180** - .102 .230** .452** 1 .166** - .046

O3 - .319** - .510** - .703** .728** .140** .792** - .001 .166** 1 - .389**

CO 0.39** .856** .283** - .398** - .236** - .422** .263** - .046 - .389** 1

Bold font used to mark the Pearson correlation coefficient ðjrjÞ if the absolute value of it is greater than 0.8, which means the two explanatory

variables existing severe collinearity

SO2 sulfur dioxide, NO2 nitrogen dioxide, CO carbon monoxide, O3 ozone, AP atmospheric pressure, T temperature, WS wind speed, SH

sunshine hours, RH relative humidity

** Correlation is significant at the 0.01 level (2-tailed); * Correlation is significant at the 0.05 level (2-tailed)

Neural Computing and Applications (2019) 31:1619–1631 1625

123

Page 8: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

The result has shown that all influence factors play a

great impact on the change of PM2.5 concentration when

P\ 0.01, namely that every influence factor is statistically

significant as explanatory variable of variation of PM2.5

concentration alone. Among them, the big explanation

rates of CO, NO2 and wind speed (WS lag 1) to the vari-

ation of PM2.5 concentration (47.5, 44.9, 36.7%) and cor-

rection coefficients of determination (0.468, 0.439, 0.357)

indicate excellent degree of fitting in the existing model,

while the small explanation rate of sea-level atmospheric

pressure (AP lag 3) to the variation of PM2.5 concentration

(7.56) and the correction coefficient of determination

(0.0709) means a bad degree of fitting in the current model

(see Table 5).

Besides, when degree of freedom (df) is 1, the function

is a linear equation, suggesting that there exists a linear

relation between explanatory variables and response vari-

able. When it is bigger than 1, the function is a nonlinear

curve equation and the bigger it is, the more significant the

nonlinear relation is. Among eight explanatory variables in

this experiment, there is certain nonlinear relationship

between SO2, relative humidity (RH) and sea-level atmo-

spheric pressure (AP lag 3) and PM2.5 (with a degree of

freedom of 2) and significant linear relationship between

other factors and PM2.5. So, variation of PM2.5 concentra-

tion is a sophisticated nonlinear time variation series driven

and affected by multiple factors. By building generalized

additive model between explanatory variables and PM2.5

concentration, the effect plot between explanatory vari-

ables and PM2.5 concentration can be obtained. In Fig. 5,

the independent influence every predictive variable has on

PM2.5 is depicted: the dashed line refers to the point-to-

point standard deviation of fitting additive function, namely

the top and bottom limitations of confidential interval and

the full line is the smooth fitting curve of PM2.5 concen-

tration. X-axis is the measured value of various explanatory

variables, and Y-axis is the smooth fitted value of different

explanatory variables over PM2.5 concentration.

It can be found from Fig. 5 that the variation of PM2.5

concentration is affected by multiple influence factors.

Interact the explanatory variables and analyze the impact

the interaction terms have on variation of PM2.5 concen-

tration. It can be known from the calculation result of

Table 6 that the interaction terms have very big numeric

value in degree of freedom, suggesting that they have

significant nonlinear relations with variation of PM2.5

concentration. Additionally, 28 interaction terms in the

model equation have got through the significance test with

an explanation rate between 19.6 and 58.0% and a

Table 4 Spearman correlation coefficient among influencing factors

Independent variable Lag rs Sig.(2-tailed)

Wind speed Lag 0 - 0.356** 0.000

Lag 1 - 0.564** 0.000

Lag 2 - 0.242** 0.000

Lag 3 - 0.075** 0.000

Relative humidity Lag 0 0.367** 0.000

Lag 1 0.235** 0.000

Lag 2 - 0.028 0.569

Lag 3 - 0.087 0.100

Air pressure Lag 0 0.073 0.168

Lag 1 0.211** 0.000

Lag 2 0.252** 0.000

Lag 3 0.301** 0.000

Precipitation Lag 0 - 0.024 0.649

Lag 1 - 0.173** 0.001

Lag 2 - 0.175** 0.001

Lag 3 - 0.154** 0.003

Sunlight hours Lag 0 - 0.308** 0.000

Lag 1 - 0.252** 0.000

Lag 2 - 0.245** 0.000

Lag 3 - 0.238** 0.000

Bold font used to mark the number if the absolute value of Spearman

correlation coefficient ðjrsjÞ is greater than 0.3 and the correlation is

significant at 0.01 level, otherwise the response variable are not

correlated with the explanatory variables

Lag PM2.5 concentration values in different days, rs Spearman cor-

relation coefficient, Sig.(2-tailed) two-tailed tests of significance

** Correlation is significant at the 0.01 level (2-tailed)

Table 5 GAM model

hypothesis test results between

PM2.5 concentration and single

influencing factors

Smooth term df Ref.df F P Deviance explained (%) Adjusted R2

SO2 2.159 2.698 43.18 \ 2E-16*** 24.7 0.243

NO2 6.376 7.632 38.18 \ 2E-16*** 44.9 0.439

CO 4.563 5.578 57.57 \ 2E-16*** 47.5 0.468

O3 7.233 8.252 15.85 \ 2E-16*** 24.7 0.259

AP lag 3 1.798 2.271 12.36 2.73E-06*** 7.56 0.0709

WS lag 1 5.221 6.275 32.24 \ 2E-16*** 36.7 0.357

RH 2.391 3.007 23.75 3.26E-14*** 17.0 0.165

SH 7.569 8.499 7.447 1.93E-09*** 15.7 0.139

*, **, *** Significant level at 5, 1, 0.1%

1626 Neural Computing and Applications (2019) 31:1619–1631

123

Page 9: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

Fig. 5 Effect of influencing

factors on the variation of PM2.5

concentration

Neural Computing and Applications (2019) 31:1619–1631 1627

123

Page 10: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

significant rise compared with single-factor model, proving

that the fitting degree of the model has reached a high

standard. The top five places in fitting degree are CO-WS

lag 1 (58%), CO-SH (56.2%), SO2-WS lag 1 (54.8%),

NO2-WS lag 1 (51.9%) and CO–RH (50.7%), and they are

combinations of precursors and meteorological elements. It

states that PM2.5 concentration changes mainly under the

interaction between air pollutants and meteorological ele-

ments from one side, and the calculation result has also

shown that wind speed (WS lag 1) has played a decisive

role in the entire diffusion, and it can largely explain the

entire change of PM2.5 concentration.

Take average wind speed (WS lag 1) as the key factor

and analyze the impact the interaction between the average

wind speed and concentration of air pollutants has on

PM2.5 concentration as shown in the Fig. 6.

When CO concentration is low, PM2.5 concentration

slowly decreases with the increase in average wind speed

(WS lag 1), and when the average wind speed (WS lag 1) is

quite low, the concentration of PM2.5 sees a fluctuant

increase with the increase in CO concentration. It means

that wind speed can dilute CO concentration in the air and

it significantly affects the mixing effect of pollutants when

the wind speed is low and increases PM2.5 concentration.

NO2 and SO2 have similar variation relations with

average wind speed (WS lag 1). When NO2/SO2 is low in

concentration, average wind speed can diffuse and dilute

and stabilize PM2.5 concentration within a relatively low

interval. However, with the increase in NO2/SO2 concen-

tration, average wind speed plays a less and less significant

role, indicating that their concentration in the air has a

critical value. Once it exceeds that value, the average wind

speed plays a more significant role on the mixture of NO2

and SO2 and promotes the occurrence of secondary

chemical reaction, so the pollutants in the air cannot be

effectively diluted.

As O3 is mainly distributed in the stratosphere with a

height of 10–50 km, average wind speed (WS lag 1) barely

has any influence. It can be learnt from Table 3 that there is

certain negative correlation between O3 and NO2 and the

increase in NOx will increase secondary nitrate particulate

matter and produce release PM2.5. On the other hand, the

low visibility caused by rise of PM2.5 concentration will

suppress the production of O3. So, it can be concluded that

there also exists negative correlation between PM2.5 con-

centration and O3 concentration. In recent years, PM2.5

concentration in China has dropped a bit, but O3 pollution

has become increasingly serious, especially in summer. It

seems that O3 may become the primary pollutant in place

of PM2.5. Therefore, PM2.5 and O3 shall be taken into

collaborative control in the governance.

In the previous research on the relationship between

PM2.5 and meteorological elements, the majority focuses

on wind speed and proves that there is negative correlation

between them [26, 27], which has also been verified in this

paper. What is different is that based on the precursors of

Table 6 GAM model hypothesis test results between PM2.5 concentration and interaction of influencing factors

Cross term SO2-NO2 SO2-CO SO2-O3 SO2-APlag 3 SO2-WSlag 1 SO2-RH SO2-SH

df 8.41 17.63 12.20 4.60 15.40 8.59 9.26

DE 45.9% 49.3% 38.4% 25.1% 54.8% 45.1% 27.5%

Adjusted R2 0.446 0.467 0.362 0.242 0.527 0.437 0.256

Cross term NO2-CO NO2-O3 NO2-RH NO2-APlag 3 NO2-WS lag 1 NO2-SH CO-O3

df 9.127 16.56 3.241 9.716 2.003 10.59 24.55

DE 50.0% 50.3% 48.9% 44.7% 51.9% 46.0% 48.8%

Adjusted R2 0.487 0.479 0.484 0.431 0.516 0.444 0.451

Cross term CO-APlag 3 CO-WSlag 1 CO–RH CO-SH O3-APlag 3 O3-WSlag 1 O3-RH

df 21.42 18.10 19.78 18.76 15.46 13.98 10.88

DE 48.1% 58.0% 50.7% 56.2% 34.0% 45.1% 31.4%

Adjusted R2 0.448 0.558 0.479 0.538 0.311 0.429 0.293

Cross term O3-SH APlag 3-WSlag 1 APlag 3-RH APlag 3-SH WSlag 1-RH WSlag 1-SH RH-SH

df 12.51 11.54 8.975 18.64 16.27 21.26 22.17

DE 29.3% 42.8% 32.6% 19.6% 39.3% 48.6% 37.8%

Adjusted R2 0.268 0.409 0.309 0.152 0.364 0.453 0.338

DE deviance explained

*, **, *** Significant level at 5, 1, 0.1% (all terms: P\ 2E-16***)

1628 Neural Computing and Applications (2019) 31:1619–1631

123

Page 11: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

PM2.5, time autocorrelation has also been taken into con-

sideration and time-delay characteristic of meteorological

elements have also been taken into test. The calculation

result has shown that SO2, NO2, CO, O3, wind speed lag 1

(rs = - 0.564, P\ 0.01), sea-level atmospheric pressure

lag 3 (rs = 0.301, P\ 0.01), relative humidity lag 0 (rs =

0.367, P\ 0.01) and sunshine hour lag 0 (rs = - 0.308,

P\ 0.01) are the variables most correlated with daily

average PM2.5.

4 Conclusions

The pollution of fog and haze, as one of the most severe

environmental pollution problems in recent years in China,

has a great effect on the physical and psychological health

and regular travel of the public and the economic develop-

ment of the state. In the critical period of economic devel-

opment in China, it is urgent to solve fog and haze pollution.

Generalized additive model is adopted in this paper to

analyze the spatial–temporal characteristics of pollution of

fog and haze and the impact brought by various influence

factors and the interactive terms in Beijing and the sur-

rounding regions, and the entire diffusion process is analyzed

on this basis. The key conclusions are as follows.

(1) There exists a significant spatial gradient in pollution

of fog and haze in Beijing and the surrounding areas,

as indicated by progressive decrease from south to

north (MD = - 19.250, P = 0.004, as shown in

Table 2), which is mainly the result of mutual

accumulation of regional pollution. The northern

area is surrounded by mountains and plenty of green

vegetation can purify the air to a certain degree [16].

On the other hand, as the southern area is adjacent to

such severely polluted regions as Tianjin Munici-

pality and Hebei Province, external influence is more

significant [17–19]. So, regional collaborative pre-

vention and treatment is required to control the

pollution of PM2.5 in Beijing and special attention

Fig. 6 Relationship between pollution status and wind speed on the day before

Neural Computing and Applications (2019) 31:1619–1631 1629

123

Page 12: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

shall be paid to the transmission of pollutants from

the southwest and the control of major sources.

(2) PM2.5 concentration fluctuates greatly in different

reasons. As a whole, the concentration of PM2.5 is

relatively high in winter and low in summer with a

U-shaped distribution all year round. According to

‘‘Measures of Beijing Municipality for Administra-

tion of Heat Supply and Use,’’ the heating period of

this Municipality is from November 15 of the current

year to March 15 of the next year, so PM2.5

concentration goes up in winter mainly because

household heating increases and causes more partic-

ulate matter discharged into the air [17, 18, 20].

Earlier, the seasonal characteristics of spring

drought, few rains and strong wind as well as

unreasonable human activities in Beijing have fre-

quently resulted in large-scale outbreak of sand

storm, but in the experiment result of this paper,

PM2.5 concentration does not increase greatly in

spring, suggesting that the sandstorm source control

project in Beijing–Tianjin–Hebei Region has worked

well. Besides, some research achievements have

once pointed out that there is a difference in

pollutant concentration between weekdays and

weekends [21, 22], but according to the experiment

result of Mann–Whitney U test in this paper, the

foregoing difference is not statistically significant

and the credit should be given to the implementation

of vehicle ban policy in China.

(3) The daily change rule of PM2.5 is subject to the

height of boundary layer [23]. According to the

calculation result, the lowest concentration usually

occurs during 16:00–18:00 and the highest during

9:00–11:00 every day. Generally speaking, the peak

PM2.5 concentration in the morning is caused by

human activities, while the fall of PM2.5 concentra-

tion in the afternoon is because the increase in height

of boundary layer has accelerated the diffusion of

PM2.5. In the nighttime, the height of boundary layer

drops again and human activities increase, so PM2.5

concentration rises again. Therefore, PM2.5 concen-

tration every morning (7–12 a.m.) is much lower

than every night (7 p.m.–6 a.m.) (MD = - 6.455,

P = 0.003). Especially in winter, heating increases

from coal burning and so does the particulate matter

released to the air. Besides, less solar radiation also

moves the time when the height of boundary layer

falls in advance, so the nighttime PM2.5 level is

relatively high in winter [24, 25].

(4) PM2.5 concentration changes mainly due to the

interaction between air pollutants and meteorologi-

cal elements [26, 27]. In the entire diffusion process,

average wind speed (WS lag 1) plays a decisive role

and it affects the diffusion of CO and the occurrence

of secondary reaction. For NO2 and SO2, average

wind speed (WS lag 1) has a critical value. The

concentration of NO2 and SO2 remains stable below

the critical value, and secondary reaction occurs

when exceeding the critical value. For O3, as it is

mainly distributed in the stratosphere with a height

of 10–50 km, average wind speed (WS lag 1) has no

effect on it, but it can be concluded that there is

negative correlation between PM2.5 concentration

and O3 concentration through the calculation result.

Therefore, in the governance, they shall be taken into

collaborative control.

Acknowledgements This work was supported by the National Social

Science Foundation of China (No.16BGL145).

Compliance with ethical standards

Conflict of interest We declare that we have no financial and personal

relationships with other people or organizations that can inappropri-

ately influence our work; there is no professional or other personal

interest of any nature or kind in any product, service and/or company

that could be construed as influencing the position presented in, or the

review of, the manuscript entitled.

Open Access This article is distributed under the terms of the Creative

Commons Attribution 4.0 International License (http://creative

commons.org/licenses/by/4.0/), which permits unrestricted use, dis-

tribution, and reproduction in any medium, provided you give

appropriate credit to the original author(s) and the source, provide a

link to the Creative Commons license, and indicate if changes were

made.

References

1. Tian S, Pan Y, Liu Z et al (2014) Size-resolved aerosol chemical

analysis of extreme haze pollution events during early 2013 in

urban Beijing, China. J Hazard Mater 279:452–460

2. Ikram M, Yan Z, Liu Y et al (2015) Seasonal effects of tem-

perature fluctuations on air quality and respiratory disease: a

study in Beijing. Nat Hazards 79(2):833–853

3. Zhang X-Y, Sun J-Y, Wang Y-Q et al (2013) Thinking on causes

and governance of fog and haze in China. Chin Sci Bull

58(13):1178–1187

4. Chan CK, Yao X (2008) Air pollution in mega cities in China.

Atmos Environ 42(1):1–42

5. Song YZ, Yang HL, Peng JH et al (2015) Estimating PM2.5

concentrations in Xi’an City using a generalized additive model

with multi-source monitoring data. PLoS ONE 10(11):e0142149

6. Li S, Zhai L, Zou B et al (2017) A generalized additive model

combining principal component analysis for PM2.5 concentration

estimation. Int J Geo-Inf 6(8):248

7. Quan J, Tie X, Zhang Q et al (2014) Characteristics of heavy

aerosol pollution during the 2012–2013 winter in Beijing, China.

Atmos Environ 88(5):83–89

8. Zhou Y-M, Zhao X-Y (2017) Correlation analysis between

PM2.5 concentration and meteorological elements in Beijing.

Acta Sci Nat Univ 53(1):111–124

1630 Neural Computing and Applications (2019) 31:1619–1631

123

Page 13: Study on the spatial–temporal change characteristics and ......Study on the spatial–temporal change characteristics and influence factors of fog and haze pollution based on GAM

9. Tai APK, Mickley LJ, Jacob DJ (2010) Correlations between fine

particulate matter (PM2.5) and meteorological variables in the

United States: implications for the sensitivity of PM2.5 to climate

change. Atmos Environ 44(32):3976–3984

10. Zhao CX, Wang YQ, Wang YJ et al (2014) [Temporal and spatial

distribution of PM2.5 and PM10 pollution status and the corre-

lation of particulate matters and meteorological factors during

winter and spring in Beijing. Environ Sci 35(2):418

11. Huang F, Li X, Wang C et al (2015) PM2.5 spatiotemporal

variations and the relationship with meteorological factors

during 2013–2014 in Beijing, China. PLoS ONE 10(11):

e0141642

12. Guo L, Zhang Y-K, Liu S-H et al (2011) Correlation analysis

between PM_(10) quality concentration and meteorological ele-

ments of boundary layer. Acta Sci Nat Univ 47(4):607–612

13. Wang Y-S, Wang L-L (2014) Sources, impact and control of

atmospheric haze pollution. Sci Soc 4(2):9–18

14. Qian Z, He Q, Lin HM et al (2007) Association of daily cause-

specific mortality with ambient particle air pollution in Wuhan,

China. Environ Res 105(3):380–389

15. Tibshirani R, Hastie J et al (1990) Generalized additive models.

Chapman and Hall, London, pp 590–606

16. Liu J, Mo L, Zhu L et al (2016) Removal efficiency of particulate

matters at different underlying surfaces in Beijing. Environ Sci

Pollut Res Int 23(1):408–417

17. Wang Y, Ying Q, Hu J et al (2014) Spatial and temporal varia-

tions of six criteria air pollutants in 31 provincial capital cities in

China during 2013–2014. Environ Int 73(1):413–422

18. Chai F, Gao J, Chen Z et al (2014) Spatial and temporal variation

of particulate matter and gaseous pollutants in 26 cities in China.

J Environ Sci 26(1):75–82

19. Ji D, Wang Y, Wang L et al (2012) Analysis of heavy pollution

episodes in selected cities of northern China. Atmos Environ

50(3):338–348

20. Xiao Q, Ma Z, Li S et al (2015) The impact of winter heating on

air pollution in China. PLoS ONE 10(1):e0117311

21. Motallebi N, Tran H, Croes BE et al (2003) Day-of-week patterns

of particulate matter and its chemical components at selected sites

in California. Air Repair 53(7):876–888

22. Blanchard CL, Tanenbaum S (2006) Weekday/weekend differ-

ences in ambient air pollutant concentrations in atlanta and the

southeastern United States. Air Repair 56(3):271

23. Liu Z, Hu B, Wang L et al (2015) Seasonal and diurnal variation

in particulate matter (PM 10, and PM 2.5) at an urban site of

Beijing: analyses from a 9-year study. Environ Sci Pollut Res

22(1):627–642

24. Guinot B, Roger JC, Cachier H et al (2006) Impact of vertical

atmospheric structure on Beijing aerosol distribution. Atmos

Environ 40(27):5167–5180

25. Miao S, Chen F, Lemone MA et al (2009) An observational and

modeling study of characteristics of urban heat island and

boundary layer structures in Beijing. J Appl Meteorol Climatol

48(3):484–501

26. Zhang H, Wang Y, Hu J et al (2015) Relationships between

meteorological parameters and criteria air pollutants in three

megacities in China. Environ Res 140:242–254

27. Zhang Q, Quan J, Tie X et al (2015) Effects of meteorology and

secondary particle formation on visibility during heavy haze

events in Beijing, China. Sci Total Environ 502:578

Neural Computing and Applications (2019) 31:1619–1631 1631

123