study on the spatial–temporal change characteristics and ......study on the spatial–temporal...
TRANSCRIPT
S. I . : EMERGING INTELLIGENT ALGORITHMS FOR EDGE-OF-THINGS COMPUTING
Study on the spatial–temporal change characteristics and influencefactors of fog and haze pollution based on GAM
Zhuang Wu1,2 • Shuo Zhang1
Received: 24 February 2018 / Accepted: 4 May 2018 / Published online: 18 June 2018� The Author(s) 2018
AbstractPM2.5 (particulate matter) is an important object for air quality monitoring, and the research on related influence factors and
diffusion process of PM2.5 plays a key role in the fight against pollution of fog and haze. Based on the air quality
monitoring data and related meteorological data of 16 districts of Beijing during November 2016 and December 2017, such
methods as time-series analysis and nonparametric test are adopted to describe the variation trend of PM2.5 concentration in
space and time and its disparities in different seasons, time periods and areas. Linear regression method is used in most of
the previous research on influence factors and prediction of PM2.5 concentration, but actually, the relation between these
factors is rather intricate and it is usually nonlinear. So, generalized additive model (GAM) is used in this paper to analyze
the impact that different influence factors, especially their interaction, have on PM2.5 concentration and its diffusion
process. The result shows that in the dimensionality of time, PM2.5 concentration has strong autocorrelation over time and
it is most significant in the first to the third order (lag 0–lag 3). Throughout the year, PM2.5 concentration is relatively high
in winter and low in summer. It is usually the lowest during 16:00–18:00 and the highest during 9:00–11:00 every day and
far higher at night than in the daytime (MD = - 6.455, P = 0.003). And in terms of space, PM2.5 concentration shows
distinct spatial gradient and it gradually decreases from south to north (MD = - 19.250, P = 0.004). It is found in the
analysis of influence factors of PM2.5 concentration that the change of PM2.5 concentration is a complex nonlinear time
series driven and affected by many factors; among these factors, the interaction between air pollutants and meteorological
elements is the most prominent, while average wind speed (WS lag 1) plays a decisive role in the entire diffusion process,
and it explains the whole diffusion of PM2.5 concentration to a large extent.
Keywords Spatial and temporal variation � Interaction � Diffusion and evolution � Generalized additive model
1 Introduction
Haze, as a collective name for fog and haze, is a kind of
weather phenomenon in meteorological science. Fog is the
outcome of condensation of water vapor (or deposition) in
the air of atmospheric surface layer, and the core materials
of haze include the aerosol particulate matter suspended in
the air, mainly coming from such artificial sources as
industrial pollution, fossil fuel combustion and biomass
burning as well as the natural sources like soil dust. PM2.5
is not only an important component of haze, but also a key
object for air quality monitoring. It degrades the atmo-
spheric visibility and increases the morbidity and mortality
of respiratory disease and cerebrovascular disease [1, 2]. In
recent years, unusual hazy weather has been frequently
sweeping over China, dramatically deteriorating air quality
and severely affecting normal production activities. Since
the year of 2012, PM2.5 has been added to ‘‘Air Quality
Standards’’ as a conventional index and its real-time con-
centration has also been appended to the air quality mon-
itoring system of Ministry of Environmental Protection and
the People’s Republic of China. Therefore, to figure out the
related influence factors and diffusion process of PM2.5 is
vital to find an effective governance approach.
& Zhuang Wu
Shuo Zhang
1 School of Information, Capital University of Economics and
Business, Beijing 100070, China
2 CTSC Center, Information College, Capital University of
Economics and Business, Beijing 100070, China
123
Neural Computing and Applications (2019) 31:1619–1631https://doi.org/10.1007/s00521-018-3532-z(0123456789().,-volV)(0123456789().,- volV)
Existing international study on fog and haze mainly
includes: (1) the composition and sources of haze pollu-
tants, (2) the correlation between urban air pollution and
meteorological elements and (3) the virtual and forecasting
of numeric value of regional airborne pollutant concen-
tration. With regard to the root of haze, Zhang et al. [3]
have believed that the use of non-clean energy is the most
fundamental cause of the pollution of fog and haze in
China and that the exploding urbanization process in China
in recent years is the immediate cause [4]. Related mete-
orological conditions, including temperature, relative
humidity and height of planetary boundary layer, play a
significant role in the formation, distribution, maintenance
and variation of these pollutants. Most previous study on
influence factors and prediction of PM2.5 concentration has
adopted the method of linear regression, but in fact, there is
complicated relationship, usually nonlinear, between these
factors. So, scholars have begun to use generalized additive
model to depict the complex relations between these
potential variables and PM2.5 concentration in recent years.
Song and others [5] have applied generalized additive
model and multi-source monitoring data to describe and
changes of PM2.5 concentration in Xi’an City. For the
independent variables, SO2 and CO will be considered as
linear functions, while NO2, O3, AOT and temperature as
single-variable smooth nonlinear functions and wind scale
as two-variable smooth nonlinear function. The final
explanation rate of the model is 69.50%. Li et al. [6] have
brought in principal component analysis (PCA) approach
into generalized additive model (GAM), proposed and used
PCA-GAM to forecast PM2.5 concentrations in Beijing
Municipality, Tianjin Municipality and Hebei Province,
and the results have shown that compared with conven-
tional land-use regression model, that model has a higher
accuracy rate (R2 = 0.94). In conclusion, PM2.5 concen-
tration increases for the following two causes: the meteo-
rological elements featured by southeast wind, temperature
inversion and high relative humidity which go against
diffusion and the pollution factors featured by the increase
in the concentration of suspended particulate matter [7–12].
Among them, pollution is the internal cause, which is
closely related to human activities and which can be con-
trolled while meteorological element is the external cause
and it cannot be controlled [13].
To sum up, the foregoing research work mainly inte-
grates specific pollution process and analyzes the numeric
changes of PM2.5 and the impact a single meteorological
factor has on PM2.5; in other words, it only conducts
research on fog and haze within a short time, and it is not
universally applicable. Even for the same city, PM2.5,
together with air pollutants, meteorological elements and
other factors, has formed a complex nonlinear dynamic
system, which has multilayered scale structure and local
variations in the time domain [14]. Therefore, based on the
analysis of change rule of particulate matter (PM2.5), the
promotion or inhibition that precursor pollutants (i.e., SO2,
NO2, CO) and meteorological elements have on PM2.5 has
been taken into consideration, and research has been con-
ducted to PM2.5 concentration in Beijing from the dimen-
sions of time and space by using the air quality data
released by Beijing Municipal Environmental Monitoring
Center and the meteorological data published by China
Meteorological Administration to analyze its features in
different stages. Generalized additive model (GAM) is a
flexible statistical model, which can be used to detect the
influence of nonlinear regression. Nonparametric regres-
sion does not require models to satisfy linear assumptions
and can detect the complex relationship between data
flexibly. However, when the number of independent vari-
ables in the model is more, the estimated variance of the
model will increase, and each additive item in the GAM
model is estimated by a single smooth function. In each
additive item, how the dependent variable varies with the
independent variable can be explained well. GAM is con-
structed with the focus on the analysis of the impact the
interaction of different influence factors has on the changes
of PM2.5 concentration to find the key influence factors and
analyze the entire change process of PM2.5 concentration.
2 Data and research methods
2.1 Data
According to the information provided by Beijing Munic-
ipal Environmental Monitoring Center, 35 air pollution
monitoring stations have been selected with complete
coverage of all districts of Beijing, and the information and
locations of every monitoring station are shown in Fig. 1.
Collect daily and hourly data of SO2 (ug/m3), NO2 (ug/m
3),
CO (ug/m3), O3 (ug/m3) and PM2.5 (ug/m3), respectively,
and the meteorological data, including wind speed (km/h),
relative humidity (%), temperature (�C), atmospheric
pressure (hpa), precipitation (mm) and sunshine hour (h),
from December 1, 2016, to November 30, 2017. Among
them, air quality data all come from Beijing Municipal
Environmental Monitoring Center (www.bjmemc.com.cn)
and meteorological data from China Meteorological
Administration (www.cma.gov.cn).
1620 Neural Computing and Applications (2019) 31:1619–1631
123
2.2 Research methods
2.2.1 Analysis of spatial and temporal characteristicsof PM2.5
Based on the air quality monitoring data of 16 districts of
Beijing during December 2016 and November 2017, such
methods as time-series analysis, time-series plot and spatial
correlation analysis have been adopted to study the rule of
temporal and spatial variation of PM2.5 concentration, and
Kruskal–Wallis test, Mann–Whitney U test and Bonferroni
correction have been conducted on PM2.5 concentration,
respectively, to clarify their differences. By reference to the
air quality standards released by environmental protection
departments, the air quality is excellent when PM2.5 is
within 0–35 ug/m3, good when within 35–75 ug/m3, slight
(intermediate) pollution when within 75–150 ug/m3 and
heavy pollution when bigger than 150 ug/m3. In order to
Code Station Name Code Station Name Code Station Name1 DongSi 13 FangShan 25 BaDaLing2 TianTan 14 DaXing 26 MiYunShuiKu3 GuanYuan 15 YiZhuang 27 DongGaoCun4 WanShouXiGong 16 TongZhou 28 YongLeDian5 AoTiZhongXin 17 ShunYi 29 YuFa6 NongZhanGuan 18 ChangPing 30 LiuLiHe7 WanLiu 19 MenTouGou 31 QianMen8 BeiBuXinQu 20 PingGu 32 YongDingMenNei9 ZhiWuYuan 21 HuaiRou 33 XiZhiMenBei10 FengTaiHuaYuan 22 MiYun 34 NanSanHuan11 YunGang 23 YanQing 35 DongSiHuan12 GuCheng 24 DingLing
Fig. 1 Location of the 35 PM2.5 monitoring stations in Beijing
Neural Computing and Applications (2019) 31:1619–1631 1621
123
simplify the research process, 16 districts of Beijing have
been divided into three regions from north to south (see
Table 1) in four seasons: spring (March–May), summer
(June–August), autumn (September–November) and winter
(December–February). Besides, time-series plot is used to
depict the changes in daily PM2.5 concentration.
2.2.2 Generalized additive model
The function in GAM can be identified by the reverse fit-
ting method; it can be applied to the analysis of a variety of
distributed data. The model can include both the parameter
fitting part and the nonparametric fitting part. All the
interpretative components in the GAM are all kinds of
smoothing function forms to explain the variables. It is
suitable for the analysis of many complex linear relations.
In recent years, nonparametric models have attracted
increasing attention from scholars. Hastie and Tibshirani
have applied additive model into generalized linear model
(GLM) and come up with the concept of generalized
additive model (GAM) in 1990, which, in its essence, is to
connect the expectation of response variable with the
additivity in the additive model via the connection function
[15] in the following formula.
gðEðYÞÞ ¼ s0 þ s1ðX1Þ þ s2ðX2Þ þ � � � spðXpÞ ð1Þ
In this formula, E(Y) is the expectation of Y. s0 is the
intercept and si (.) (i = 1, 2,…p) is a nonparametric smooth
function and Esi(Xi) = 0. It can be a smooth spline func-
tion, a local regression smooth function or a kernel func-
tion. g(.) is the connection function and g(.) can be
represented in the following forms for predictive variables
of different distribution types.
gðEðYÞÞ ¼
EðYÞ Y � Nðl; r2Þ
logEðYÞ
1� EðYÞ
� �Y � B ðn; pÞ
logðEðYÞÞ Y � Ga ða; kÞlogðEðYÞÞ Y � P ðkÞ
8>>>>><>>>>>:
ð2Þ
GAM obtains the most suitable trend line of source data
by identifying and accumulating multiple functions. By
dealing with the complex nonlinear relationship between
the dependent variables and the explanatory variables, the
nonparametric regression is fitted. The algorithm iteratively
fits and adjusts the function to reduce the prediction error.
In fact, GAM pays more attention to nonparametric
exploration of data, which is more suitable for analysis and
interpretation of the relationship between the response
variable and the explanatory variable. The specific analysis
of generalized additive model uses R version Rx64.3.4.3
and mgcv package coming from The R-Project for Statis-
tical Computing, i.e., https://www.r-project.org/.
Table 1 Distribution of PM2.5
concentrations in the 16 districts
of Beijing, 2016–2017 (ug/m3)
Regional category District Mean SD Rate I (%) Rate II (%)
North Yanqing 52.92 40.74 4.81–308.80 191 (55.20)
Huairou 56.23 56.08 5.42–485.00 187 (54.05)
Miyun 57.35 56.74 4.86–459.50 195 (56.36)
Shunyi 65.30 63.51 4.43–398.00 216 (62.42)
Changping 56.97 56.84 4.71–413.75 188 (54.17)
Pinggu 70.04 65.80 6.07–419.75 239 (69.08)
Center Dongcheng 71.73 69.91 7.38–430.92 237 (68.59)
Haidian 65.32 65.09 5.38–463.50 219 (63.29)
Xicheng 72.49 68.65 6.88–462.00 239 (69.08)
Chaoyang 72.85 66.29 5.92–406.67 255 (73.70)
ShiJingShan 71.15 71.25 4.83–495.00 233 (67.34)
Fengtai 77.77 71.10 6.83–495.13 277 (80.06)
South Mentougou 65.61 66.37 6.04–551.00 216 (62.43)
Touzhou 85.21 76.40 6.98–458.46 278 (80.35)
Daxing 79.30 73.58 7.76–443.03 260 (75.14)
Fangshan 91.13 77.68 6.78–502.31 290 (83.81)
SD standard deviation
Rate I (%): non-attainment days and rates (%) for grade I (0–35 ug/m3)
Rate II (%): non-attainment days and rates (%) for grade II (35–75 ug/m3)
1622 Neural Computing and Applications (2019) 31:1619–1631
123
3 Results
3.1 Overview of PM2.5 pollution in Beijing
According to the PM2.5 data from Beijing Municipal
Environmental Monitoring Center (www.bjmemc.com.cn),
it can be calculated that the annual average PM2.5 con-
centration from December 2016 to November 2017 is
69.46 ug/m3 and the lowest is Yanqing District, northwest
of Beijing with an annual average PM2.5 concentration of
52.92 ug/m3 and the highest is Fangshan District, south-
west of Beijing with an annual average PM2.5 concentra-
tion of 91.13 ug/m3, much higher than 75 ug/m3, the limit
of Grade II level stipulated in Ambient Air Quality Stan-
dards (GB 3095-2012) with significant spatial gradient. In
another word, PM2.5 concentration gradually decreases
from south to north (MD = - 19.250, P = 0.004, as shown
in Table 2). Besides, 16 districts have experienced the limit
of Grade I PM2.5 concentration (55 ug/m3) for over 54%
of total days and the limit of Grade II PM2.5 concentration
(75 ug/m3) for over 22% of total days.
During the analysis, it is found that there also exist
seasonal fluctuations in the variation of PM2.5 concentra-
tion (see Fig. 2). Haze weather occurs 47 times during the
research: 9 in spring, 9 in summer, 14 in autumn and 15 in
winter, and it usually lasts 1–8 days in spring, 1–2 days in
summer, 1–3 days in autumn and 1–9 days in winter
featured by large-scale persistent outbreak. Overall, PM2.5
concentration is relatively high in winter and low in sum-
mer, and the difference between spring and autumn has no
statistical significance (MD = - 0.791, P = 1.000, see
Table 2), showing a U-shaped curve all over the year. The
main reason is that in winter more coal is burned for heat,
increasing the particulate matter discharged to the atmo-
sphere, and from the view of meteorological conditions,
pollutant usually diffuses slowly in winter with stable at-
mospheric stratification of troposphere, no cold air and
high-rise buildings in urban areas while the high temper-
ature, exuberant air convection and large precipitation in
summer have promoted the deposition of particulate mat-
ter. Additionally, Mann–Whitney U test is used to identify
the discrepancy of PM2.5 concentration between weekdays
and weekends, and the result shows that P = 0.544
([ 0.05), suggesting that the difference of these two groups
of data is not significant in statistics.
Expand the above research process and use MATLAB
2014a tool box to conduct autocorrelation analysis on daily
PM2.5 concentration to reveal its time-series characteristics,
as indicated in Fig. 3. The result has shown that the top and
bottom critical values of autocorrelation coefficient are
± 0.163, respectively, and the first-order autocorrelation
coefficient is 0.6. It can be seen that PM2.5 concentration of
Beijing has strong autocorrelation over time, and it is the
most significant from the first to the third order. Besides,
Table 2 Significance tests of
PM2.5 levels for different
seasons, regions, daytime and
day of week
Variable Kruskal–Wallis H test Bonferroni test
Chi-square P MD P
Region#
North versus south 8.661 0.013 - 19.250 0.004
Center versus south - 8.417 0.258
North versus center - 10.833 0.074
Season*
Spring versus summer 18.662 0.000 14.109 0.686
Spring versus autumn - 0.791 1.000
Spring versus winter - 46.322 0.000
Summer versus autumn - 14.900 0.578
Summer versus winter - 60.431 0.000
Autumn versus winter - 45.513 0.000
Daytime#
7–12 a.m. versus 1 pm–6 pm 45.740 0.000 4.319 0.162
7–12 a.m. versus 7 p.m.–6 a.m. - 6.455 0.003
1 pm–6 pm versus 7 p.m.–6 a.m. - 10.775 1.945
Day of week
Weekday versus weekends Mann–Whitney U test
P = 0.544
* The mean difference is significant at the 0.0083 level for Bonferroni test# The mean difference is significant at the 0.0167 level for Bonferroni test
Neural Computing and Applications (2019) 31:1619–1631 1623
123
PM2.5 concentration also has different diurnal variations in
different seasons (as shown in Fig. 4). In comparison with
autumn and winter, diurnal variation is stronger in spring
and autumn, showing a smooth W shape. Further use
Bonferroni correction for detection, and the result has
demonstrated that PM2.5 concentration in the morning
(7 a.m.–12 a.m.) is far lower than in the nighttime (7 p.m.–
6 a.m.) (MD = - 6.455, P = 0.003), but the difference in
the daytime (7 a.m.–12 a.m./1–6 p.m.) is not notable.
Generally, the lowest daily concentration occurs during the
period of 16:00–18:00, while the highest in 9:00–11:00.
3.2 Analysis of generalized additive modelof PM2.5 and single influence factor
Air pollutants and meteorological factors are the important
environmental factors that affect PM2.5 concentration of a
place. The air pollutants include SO2 (ug/m3), NO2 (ug/
Fig. 2 Day-to-day variation of PM2.5 in different seasons, Beijing 2016–2017
Fig. 3 Autocorrelation coefficient of PM2.5 in Beijing 2016–2017
Fig. 4 Diurnal variation of PM2.5 in different seasons, Beijing
2016–2017
1624 Neural Computing and Applications (2019) 31:1619–1631
123
m3), CO (ug/m3) and O3 (ug/m3), and meteorological fac-
tors include temperature (�C), atmospheric pressure of sea
level (hpa), relative humidity (%), wind speed (km/h),
precipitation (mm) and sunshine hours (h). On this basis,
adopt generalized additive model to describe the impact a
single influence factor and interaction have on PM2.5
concentration with daily average PM2.5 concentration as
the response variable and the daily average numeric value
of related influence factors as the explanatory variable.
3.2.1 Preanalysis of explanatory variable
According to the analysis result of Q–Q Chart in SPSS, it
can be found that daily average concentration of PM2.5 is
approximately subject to gamma distribution (Y * (a, k)).So, use log connection function to connect explanatory
variables with response variable subject to gamma distri-
bution via linear combination and calculate the Pearson
correlation coefficient between any two explanatory vari-
ables to prevent them from distorting the model estimation
due to the existence of severe collinearity (|r|[ 0.8). It can
be known from the calculation result (Table 3) that the
correlation coefficient between temperature (t) and sun-
shine hours (sh) is 0.922 and that between temperature
(t) and sea-level atmospheric pressure is - 0.900, sug-
gesting that the longer the same region experiences the
sunshine, the higher the temperature is. If other causes such
as power are not taken into consideration, the higher the
temperature is, the faster the atmosphere expands and rises
because of heat, namely that the lower the atmospheric
pressure is. Sunshine hour is used to represent temperature
index in this paper to avoid concurvity in the construction
of multivariable curve model. Moreover, though the cor-
relation coefficient between NO2 and CO has reached
0.856, they are not eliminated as they come from different
sources.
It can be known from Fig. 3 that the autocorrelation
between PM2.5 concentration in Beijing over time is very
strong, which is most obvious in first to third order. But in a
short period of time, the development scale, geographical
conditions, emissions of industrial pollution and automobile
exhaust of a region are relatively fixed, so the change of
PM2.5 concentration is mainly related to local meteorological
conditions. Next, emphasis is placed on analyzing the corre-
lation between PM2.5 and various meteorological factors
located from lag 0 (current PM2.5 concentration) to lag 3
(PM2.5 concentration 3 days ago). With Spearman correlation
coefficient (rs) as a measurement criterion, the meteorological
elements with a Spearman correlation coefficient bigger than
0.3 (rs[0.3) are selected as the explanatory variables of the
model. And the result is shown as follows.
Integrate the result of Table 4 and take SO2, NO2, CO,
O3, wind speed lag 1 (rs = - 0.564, P\ 0.01), sea-level
atmospheric pressure lag 3 (rs = 0.301, P\ 0.01), relative
humidity lag 0 (rs = 0.367, P\ 0.01) and sunshine hour
lag 0 (rs = - 0.308, P\ 0.01) as explanatory variables
and daily PM2.5 concentration as response variable into the
final model (Formula 3). On this basis, spline smooth
function is adopted to analyze the impact every explanatory
variable has on response variable and the final goodness of
fit of the model.
logðEðYÞÞ ¼ s0 þ sðSO2; bs ¼ crÞ þ sðNO2; bs ¼ crÞþ sðCO; bs ¼ crÞ þ sðO3; bs ¼ crÞþ sðWSlag1; bs ¼ crÞ þ sðAPlag3; bs ¼ crÞþ sðRH; bs ¼ crÞ þ sðSH; bs ¼ crÞ
ð3Þ
Table 3 Pearson correlation coefficient among influencing factors
SO2 NO2 AP T WS SH RH P O3 CO
S02 1 .611** .354** - .486** - .100 - .405** - .112* - .110* - .319** .539**
NO2 .611** 1 .310** - .414** - .312** - .461** .176** - .112* - .510** .856**
AP .354** .310** 1 - .900** .013 - .859** - .202** - .182** - .703** .283**
T - .486** - .414** - .900** 1 - .079 .922** .225** .180** .728** - .398**
WS - .100 - .312** .013 - .079 1 .003 - .592** - .102 .140** - .236**
SH - .405** - .461** - .859** .922** .003 1 .105* .230** .792** - .422**
RH - .112* .176** - .202** .225** - .592** .105* 1 .452** - .001 .263**
P - .110* - .112* - .182** .180** - .102 .230** .452** 1 .166** - .046
O3 - .319** - .510** - .703** .728** .140** .792** - .001 .166** 1 - .389**
CO 0.39** .856** .283** - .398** - .236** - .422** .263** - .046 - .389** 1
Bold font used to mark the Pearson correlation coefficient ðjrjÞ if the absolute value of it is greater than 0.8, which means the two explanatory
variables existing severe collinearity
SO2 sulfur dioxide, NO2 nitrogen dioxide, CO carbon monoxide, O3 ozone, AP atmospheric pressure, T temperature, WS wind speed, SH
sunshine hours, RH relative humidity
** Correlation is significant at the 0.01 level (2-tailed); * Correlation is significant at the 0.05 level (2-tailed)
Neural Computing and Applications (2019) 31:1619–1631 1625
123
The result has shown that all influence factors play a
great impact on the change of PM2.5 concentration when
P\ 0.01, namely that every influence factor is statistically
significant as explanatory variable of variation of PM2.5
concentration alone. Among them, the big explanation
rates of CO, NO2 and wind speed (WS lag 1) to the vari-
ation of PM2.5 concentration (47.5, 44.9, 36.7%) and cor-
rection coefficients of determination (0.468, 0.439, 0.357)
indicate excellent degree of fitting in the existing model,
while the small explanation rate of sea-level atmospheric
pressure (AP lag 3) to the variation of PM2.5 concentration
(7.56) and the correction coefficient of determination
(0.0709) means a bad degree of fitting in the current model
(see Table 5).
Besides, when degree of freedom (df) is 1, the function
is a linear equation, suggesting that there exists a linear
relation between explanatory variables and response vari-
able. When it is bigger than 1, the function is a nonlinear
curve equation and the bigger it is, the more significant the
nonlinear relation is. Among eight explanatory variables in
this experiment, there is certain nonlinear relationship
between SO2, relative humidity (RH) and sea-level atmo-
spheric pressure (AP lag 3) and PM2.5 (with a degree of
freedom of 2) and significant linear relationship between
other factors and PM2.5. So, variation of PM2.5 concentra-
tion is a sophisticated nonlinear time variation series driven
and affected by multiple factors. By building generalized
additive model between explanatory variables and PM2.5
concentration, the effect plot between explanatory vari-
ables and PM2.5 concentration can be obtained. In Fig. 5,
the independent influence every predictive variable has on
PM2.5 is depicted: the dashed line refers to the point-to-
point standard deviation of fitting additive function, namely
the top and bottom limitations of confidential interval and
the full line is the smooth fitting curve of PM2.5 concen-
tration. X-axis is the measured value of various explanatory
variables, and Y-axis is the smooth fitted value of different
explanatory variables over PM2.5 concentration.
It can be found from Fig. 5 that the variation of PM2.5
concentration is affected by multiple influence factors.
Interact the explanatory variables and analyze the impact
the interaction terms have on variation of PM2.5 concen-
tration. It can be known from the calculation result of
Table 6 that the interaction terms have very big numeric
value in degree of freedom, suggesting that they have
significant nonlinear relations with variation of PM2.5
concentration. Additionally, 28 interaction terms in the
model equation have got through the significance test with
an explanation rate between 19.6 and 58.0% and a
Table 4 Spearman correlation coefficient among influencing factors
Independent variable Lag rs Sig.(2-tailed)
Wind speed Lag 0 - 0.356** 0.000
Lag 1 - 0.564** 0.000
Lag 2 - 0.242** 0.000
Lag 3 - 0.075** 0.000
Relative humidity Lag 0 0.367** 0.000
Lag 1 0.235** 0.000
Lag 2 - 0.028 0.569
Lag 3 - 0.087 0.100
Air pressure Lag 0 0.073 0.168
Lag 1 0.211** 0.000
Lag 2 0.252** 0.000
Lag 3 0.301** 0.000
Precipitation Lag 0 - 0.024 0.649
Lag 1 - 0.173** 0.001
Lag 2 - 0.175** 0.001
Lag 3 - 0.154** 0.003
Sunlight hours Lag 0 - 0.308** 0.000
Lag 1 - 0.252** 0.000
Lag 2 - 0.245** 0.000
Lag 3 - 0.238** 0.000
Bold font used to mark the number if the absolute value of Spearman
correlation coefficient ðjrsjÞ is greater than 0.3 and the correlation is
significant at 0.01 level, otherwise the response variable are not
correlated with the explanatory variables
Lag PM2.5 concentration values in different days, rs Spearman cor-
relation coefficient, Sig.(2-tailed) two-tailed tests of significance
** Correlation is significant at the 0.01 level (2-tailed)
Table 5 GAM model
hypothesis test results between
PM2.5 concentration and single
influencing factors
Smooth term df Ref.df F P Deviance explained (%) Adjusted R2
SO2 2.159 2.698 43.18 \ 2E-16*** 24.7 0.243
NO2 6.376 7.632 38.18 \ 2E-16*** 44.9 0.439
CO 4.563 5.578 57.57 \ 2E-16*** 47.5 0.468
O3 7.233 8.252 15.85 \ 2E-16*** 24.7 0.259
AP lag 3 1.798 2.271 12.36 2.73E-06*** 7.56 0.0709
WS lag 1 5.221 6.275 32.24 \ 2E-16*** 36.7 0.357
RH 2.391 3.007 23.75 3.26E-14*** 17.0 0.165
SH 7.569 8.499 7.447 1.93E-09*** 15.7 0.139
*, **, *** Significant level at 5, 1, 0.1%
1626 Neural Computing and Applications (2019) 31:1619–1631
123
Fig. 5 Effect of influencing
factors on the variation of PM2.5
concentration
Neural Computing and Applications (2019) 31:1619–1631 1627
123
significant rise compared with single-factor model, proving
that the fitting degree of the model has reached a high
standard. The top five places in fitting degree are CO-WS
lag 1 (58%), CO-SH (56.2%), SO2-WS lag 1 (54.8%),
NO2-WS lag 1 (51.9%) and CO–RH (50.7%), and they are
combinations of precursors and meteorological elements. It
states that PM2.5 concentration changes mainly under the
interaction between air pollutants and meteorological ele-
ments from one side, and the calculation result has also
shown that wind speed (WS lag 1) has played a decisive
role in the entire diffusion, and it can largely explain the
entire change of PM2.5 concentration.
Take average wind speed (WS lag 1) as the key factor
and analyze the impact the interaction between the average
wind speed and concentration of air pollutants has on
PM2.5 concentration as shown in the Fig. 6.
When CO concentration is low, PM2.5 concentration
slowly decreases with the increase in average wind speed
(WS lag 1), and when the average wind speed (WS lag 1) is
quite low, the concentration of PM2.5 sees a fluctuant
increase with the increase in CO concentration. It means
that wind speed can dilute CO concentration in the air and
it significantly affects the mixing effect of pollutants when
the wind speed is low and increases PM2.5 concentration.
NO2 and SO2 have similar variation relations with
average wind speed (WS lag 1). When NO2/SO2 is low in
concentration, average wind speed can diffuse and dilute
and stabilize PM2.5 concentration within a relatively low
interval. However, with the increase in NO2/SO2 concen-
tration, average wind speed plays a less and less significant
role, indicating that their concentration in the air has a
critical value. Once it exceeds that value, the average wind
speed plays a more significant role on the mixture of NO2
and SO2 and promotes the occurrence of secondary
chemical reaction, so the pollutants in the air cannot be
effectively diluted.
As O3 is mainly distributed in the stratosphere with a
height of 10–50 km, average wind speed (WS lag 1) barely
has any influence. It can be learnt from Table 3 that there is
certain negative correlation between O3 and NO2 and the
increase in NOx will increase secondary nitrate particulate
matter and produce release PM2.5. On the other hand, the
low visibility caused by rise of PM2.5 concentration will
suppress the production of O3. So, it can be concluded that
there also exists negative correlation between PM2.5 con-
centration and O3 concentration. In recent years, PM2.5
concentration in China has dropped a bit, but O3 pollution
has become increasingly serious, especially in summer. It
seems that O3 may become the primary pollutant in place
of PM2.5. Therefore, PM2.5 and O3 shall be taken into
collaborative control in the governance.
In the previous research on the relationship between
PM2.5 and meteorological elements, the majority focuses
on wind speed and proves that there is negative correlation
between them [26, 27], which has also been verified in this
paper. What is different is that based on the precursors of
Table 6 GAM model hypothesis test results between PM2.5 concentration and interaction of influencing factors
Cross term SO2-NO2 SO2-CO SO2-O3 SO2-APlag 3 SO2-WSlag 1 SO2-RH SO2-SH
df 8.41 17.63 12.20 4.60 15.40 8.59 9.26
DE 45.9% 49.3% 38.4% 25.1% 54.8% 45.1% 27.5%
Adjusted R2 0.446 0.467 0.362 0.242 0.527 0.437 0.256
Cross term NO2-CO NO2-O3 NO2-RH NO2-APlag 3 NO2-WS lag 1 NO2-SH CO-O3
df 9.127 16.56 3.241 9.716 2.003 10.59 24.55
DE 50.0% 50.3% 48.9% 44.7% 51.9% 46.0% 48.8%
Adjusted R2 0.487 0.479 0.484 0.431 0.516 0.444 0.451
Cross term CO-APlag 3 CO-WSlag 1 CO–RH CO-SH O3-APlag 3 O3-WSlag 1 O3-RH
df 21.42 18.10 19.78 18.76 15.46 13.98 10.88
DE 48.1% 58.0% 50.7% 56.2% 34.0% 45.1% 31.4%
Adjusted R2 0.448 0.558 0.479 0.538 0.311 0.429 0.293
Cross term O3-SH APlag 3-WSlag 1 APlag 3-RH APlag 3-SH WSlag 1-RH WSlag 1-SH RH-SH
df 12.51 11.54 8.975 18.64 16.27 21.26 22.17
DE 29.3% 42.8% 32.6% 19.6% 39.3% 48.6% 37.8%
Adjusted R2 0.268 0.409 0.309 0.152 0.364 0.453 0.338
DE deviance explained
*, **, *** Significant level at 5, 1, 0.1% (all terms: P\ 2E-16***)
1628 Neural Computing and Applications (2019) 31:1619–1631
123
PM2.5, time autocorrelation has also been taken into con-
sideration and time-delay characteristic of meteorological
elements have also been taken into test. The calculation
result has shown that SO2, NO2, CO, O3, wind speed lag 1
(rs = - 0.564, P\ 0.01), sea-level atmospheric pressure
lag 3 (rs = 0.301, P\ 0.01), relative humidity lag 0 (rs =
0.367, P\ 0.01) and sunshine hour lag 0 (rs = - 0.308,
P\ 0.01) are the variables most correlated with daily
average PM2.5.
4 Conclusions
The pollution of fog and haze, as one of the most severe
environmental pollution problems in recent years in China,
has a great effect on the physical and psychological health
and regular travel of the public and the economic develop-
ment of the state. In the critical period of economic devel-
opment in China, it is urgent to solve fog and haze pollution.
Generalized additive model is adopted in this paper to
analyze the spatial–temporal characteristics of pollution of
fog and haze and the impact brought by various influence
factors and the interactive terms in Beijing and the sur-
rounding regions, and the entire diffusion process is analyzed
on this basis. The key conclusions are as follows.
(1) There exists a significant spatial gradient in pollution
of fog and haze in Beijing and the surrounding areas,
as indicated by progressive decrease from south to
north (MD = - 19.250, P = 0.004, as shown in
Table 2), which is mainly the result of mutual
accumulation of regional pollution. The northern
area is surrounded by mountains and plenty of green
vegetation can purify the air to a certain degree [16].
On the other hand, as the southern area is adjacent to
such severely polluted regions as Tianjin Munici-
pality and Hebei Province, external influence is more
significant [17–19]. So, regional collaborative pre-
vention and treatment is required to control the
pollution of PM2.5 in Beijing and special attention
Fig. 6 Relationship between pollution status and wind speed on the day before
Neural Computing and Applications (2019) 31:1619–1631 1629
123
shall be paid to the transmission of pollutants from
the southwest and the control of major sources.
(2) PM2.5 concentration fluctuates greatly in different
reasons. As a whole, the concentration of PM2.5 is
relatively high in winter and low in summer with a
U-shaped distribution all year round. According to
‘‘Measures of Beijing Municipality for Administra-
tion of Heat Supply and Use,’’ the heating period of
this Municipality is from November 15 of the current
year to March 15 of the next year, so PM2.5
concentration goes up in winter mainly because
household heating increases and causes more partic-
ulate matter discharged into the air [17, 18, 20].
Earlier, the seasonal characteristics of spring
drought, few rains and strong wind as well as
unreasonable human activities in Beijing have fre-
quently resulted in large-scale outbreak of sand
storm, but in the experiment result of this paper,
PM2.5 concentration does not increase greatly in
spring, suggesting that the sandstorm source control
project in Beijing–Tianjin–Hebei Region has worked
well. Besides, some research achievements have
once pointed out that there is a difference in
pollutant concentration between weekdays and
weekends [21, 22], but according to the experiment
result of Mann–Whitney U test in this paper, the
foregoing difference is not statistically significant
and the credit should be given to the implementation
of vehicle ban policy in China.
(3) The daily change rule of PM2.5 is subject to the
height of boundary layer [23]. According to the
calculation result, the lowest concentration usually
occurs during 16:00–18:00 and the highest during
9:00–11:00 every day. Generally speaking, the peak
PM2.5 concentration in the morning is caused by
human activities, while the fall of PM2.5 concentra-
tion in the afternoon is because the increase in height
of boundary layer has accelerated the diffusion of
PM2.5. In the nighttime, the height of boundary layer
drops again and human activities increase, so PM2.5
concentration rises again. Therefore, PM2.5 concen-
tration every morning (7–12 a.m.) is much lower
than every night (7 p.m.–6 a.m.) (MD = - 6.455,
P = 0.003). Especially in winter, heating increases
from coal burning and so does the particulate matter
released to the air. Besides, less solar radiation also
moves the time when the height of boundary layer
falls in advance, so the nighttime PM2.5 level is
relatively high in winter [24, 25].
(4) PM2.5 concentration changes mainly due to the
interaction between air pollutants and meteorologi-
cal elements [26, 27]. In the entire diffusion process,
average wind speed (WS lag 1) plays a decisive role
and it affects the diffusion of CO and the occurrence
of secondary reaction. For NO2 and SO2, average
wind speed (WS lag 1) has a critical value. The
concentration of NO2 and SO2 remains stable below
the critical value, and secondary reaction occurs
when exceeding the critical value. For O3, as it is
mainly distributed in the stratosphere with a height
of 10–50 km, average wind speed (WS lag 1) has no
effect on it, but it can be concluded that there is
negative correlation between PM2.5 concentration
and O3 concentration through the calculation result.
Therefore, in the governance, they shall be taken into
collaborative control.
Acknowledgements This work was supported by the National Social
Science Foundation of China (No.16BGL145).
Compliance with ethical standards
Conflict of interest We declare that we have no financial and personal
relationships with other people or organizations that can inappropri-
ately influence our work; there is no professional or other personal
interest of any nature or kind in any product, service and/or company
that could be construed as influencing the position presented in, or the
review of, the manuscript entitled.
Open Access This article is distributed under the terms of the Creative
Commons Attribution 4.0 International License (http://creative
commons.org/licenses/by/4.0/), which permits unrestricted use, dis-
tribution, and reproduction in any medium, provided you give
appropriate credit to the original author(s) and the source, provide a
link to the Creative Commons license, and indicate if changes were
made.
References
1. Tian S, Pan Y, Liu Z et al (2014) Size-resolved aerosol chemical
analysis of extreme haze pollution events during early 2013 in
urban Beijing, China. J Hazard Mater 279:452–460
2. Ikram M, Yan Z, Liu Y et al (2015) Seasonal effects of tem-
perature fluctuations on air quality and respiratory disease: a
study in Beijing. Nat Hazards 79(2):833–853
3. Zhang X-Y, Sun J-Y, Wang Y-Q et al (2013) Thinking on causes
and governance of fog and haze in China. Chin Sci Bull
58(13):1178–1187
4. Chan CK, Yao X (2008) Air pollution in mega cities in China.
Atmos Environ 42(1):1–42
5. Song YZ, Yang HL, Peng JH et al (2015) Estimating PM2.5
concentrations in Xi’an City using a generalized additive model
with multi-source monitoring data. PLoS ONE 10(11):e0142149
6. Li S, Zhai L, Zou B et al (2017) A generalized additive model
combining principal component analysis for PM2.5 concentration
estimation. Int J Geo-Inf 6(8):248
7. Quan J, Tie X, Zhang Q et al (2014) Characteristics of heavy
aerosol pollution during the 2012–2013 winter in Beijing, China.
Atmos Environ 88(5):83–89
8. Zhou Y-M, Zhao X-Y (2017) Correlation analysis between
PM2.5 concentration and meteorological elements in Beijing.
Acta Sci Nat Univ 53(1):111–124
1630 Neural Computing and Applications (2019) 31:1619–1631
123
9. Tai APK, Mickley LJ, Jacob DJ (2010) Correlations between fine
particulate matter (PM2.5) and meteorological variables in the
United States: implications for the sensitivity of PM2.5 to climate
change. Atmos Environ 44(32):3976–3984
10. Zhao CX, Wang YQ, Wang YJ et al (2014) [Temporal and spatial
distribution of PM2.5 and PM10 pollution status and the corre-
lation of particulate matters and meteorological factors during
winter and spring in Beijing. Environ Sci 35(2):418
11. Huang F, Li X, Wang C et al (2015) PM2.5 spatiotemporal
variations and the relationship with meteorological factors
during 2013–2014 in Beijing, China. PLoS ONE 10(11):
e0141642
12. Guo L, Zhang Y-K, Liu S-H et al (2011) Correlation analysis
between PM_(10) quality concentration and meteorological ele-
ments of boundary layer. Acta Sci Nat Univ 47(4):607–612
13. Wang Y-S, Wang L-L (2014) Sources, impact and control of
atmospheric haze pollution. Sci Soc 4(2):9–18
14. Qian Z, He Q, Lin HM et al (2007) Association of daily cause-
specific mortality with ambient particle air pollution in Wuhan,
China. Environ Res 105(3):380–389
15. Tibshirani R, Hastie J et al (1990) Generalized additive models.
Chapman and Hall, London, pp 590–606
16. Liu J, Mo L, Zhu L et al (2016) Removal efficiency of particulate
matters at different underlying surfaces in Beijing. Environ Sci
Pollut Res Int 23(1):408–417
17. Wang Y, Ying Q, Hu J et al (2014) Spatial and temporal varia-
tions of six criteria air pollutants in 31 provincial capital cities in
China during 2013–2014. Environ Int 73(1):413–422
18. Chai F, Gao J, Chen Z et al (2014) Spatial and temporal variation
of particulate matter and gaseous pollutants in 26 cities in China.
J Environ Sci 26(1):75–82
19. Ji D, Wang Y, Wang L et al (2012) Analysis of heavy pollution
episodes in selected cities of northern China. Atmos Environ
50(3):338–348
20. Xiao Q, Ma Z, Li S et al (2015) The impact of winter heating on
air pollution in China. PLoS ONE 10(1):e0117311
21. Motallebi N, Tran H, Croes BE et al (2003) Day-of-week patterns
of particulate matter and its chemical components at selected sites
in California. Air Repair 53(7):876–888
22. Blanchard CL, Tanenbaum S (2006) Weekday/weekend differ-
ences in ambient air pollutant concentrations in atlanta and the
southeastern United States. Air Repair 56(3):271
23. Liu Z, Hu B, Wang L et al (2015) Seasonal and diurnal variation
in particulate matter (PM 10, and PM 2.5) at an urban site of
Beijing: analyses from a 9-year study. Environ Sci Pollut Res
22(1):627–642
24. Guinot B, Roger JC, Cachier H et al (2006) Impact of vertical
atmospheric structure on Beijing aerosol distribution. Atmos
Environ 40(27):5167–5180
25. Miao S, Chen F, Lemone MA et al (2009) An observational and
modeling study of characteristics of urban heat island and
boundary layer structures in Beijing. J Appl Meteorol Climatol
48(3):484–501
26. Zhang H, Wang Y, Hu J et al (2015) Relationships between
meteorological parameters and criteria air pollutants in three
megacities in China. Environ Res 140:242–254
27. Zhang Q, Quan J, Tie X et al (2015) Effects of meteorology and
secondary particle formation on visibility during heavy haze
events in Beijing, China. Sci Total Environ 502:578
Neural Computing and Applications (2019) 31:1619–1631 1631
123