flood susceptibility mapping using integrated bivariate and multivariate statistical models
TRANSCRIPT
![Page 1: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/1.jpg)
ORIGINAL ARTICLE
Flood susceptibility mapping using integrated bivariateand multivariate statistical models
Mahyat Shafapour Tehrany • Moung-Jin Lee •
Biswajeet Pradhan • Mustafa Neamah Jebur •
Saro Lee
Received: 29 July 2013 / Accepted: 15 April 2014
� Springer-Verlag Berlin Heidelberg 2014
Abstract Flooding can have catastrophic effects on human
lives and livelihoods and thus comprehensive flood man-
agement is needed. Such management requires information
on the hydrologic, geotechnical, environmental, social, and
economic aspects of flooding. The number of flood events
that took place in Busan, South Korea, in 2009 exceeded the
normal situation for that city. Mapping the susceptible areas
helps us to understand flood trends and can aid in appropriate
planning and flood prevention. In this study, a combination
of bivariate probability analysis and multivariate logistic
regression was used to produce flood susceptibility maps of
Busan City. The main aim of this research was to overcome
the weakness of logistic regression regarding bivariate
probability capabilities. A flood inventory map with a total of
160 flood locations was extracted from various sources.
Then, the flood inventory was randomly split into a testing
dataset 70 % for training the models and the remaining
30 %, which was used for validation. Independent variables
datasets included the rainfall, digital elevation model, slope,
curvature, geology, green farmland, rivers, slope, soil
drainage, soil effect, soil texture, stream power index, timber
age, timber density, timber diameter, and timber type. The
impact of each independent variable on flooding was eval-
uated by analyzing each independent variable with the
dependent flood layer. The validation dataset, which was not
used for model generation, was used to evaluate the flood
susceptibility map using the prediction rate method. The
results of the accuracy assessment showed a success rate of
92.7 % and a prediction rate of 82.3 %.
Keywords Flood susceptibility � Bivariate probability �Logistic regression � GIS � Busan
Introduction
Rapid urban growth and climate change in recent years
have led to many environmental problems and increased
risks of natural disasters (Kjeldsen 2010), including
flooding and associated losses of human lives and property
(Zwenzner and Voigt 2009). The damage that can occur
due to such disasters is incalculable. As well as the huge
economic cost, floods can bring pathogens into urban
environments and cause lingering damp and microbial
development in buildings and infrastructure (Taylor et al.
2011; Dawod et al. 2012). Because of the tremendous and
irreversible potential damages to agriculture, transporta-
tion, bridges, and many other aspects of urban infrastruc-
ture, flood control and prevention measures are urgently
required (Billa et al. 2006).
Early warnings and emergency responses to floods are
needed (Feng and Wang 2011) so that governments and
M. S. Tehrany � B. Pradhan � M. N. Jebur
Department of Civil Engineering, Geospatial Information
Science Research Center (GISRC), Faculty of Engineering,
Geospatial Information Science Research Center, University
Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
e-mail: [email protected]; [email protected]
M.-J. Lee
Korea Environment Institute, 290 Jinheungno, Eunpyeong-Gu,
Seoul 122-706, Korea
S. Lee
Korea University of Science and Technology, 217 Gajeong-ro
Yuseong-gu, Daejeon 305-350, Korea
S. Lee (&)
Geological Mapping Department, Korea Institute of
Geoscience and Mineral Resources (KIGAM), 124 Gwahang-no,
Yuseong-gu, Daejeon 305-350, Korea
e-mail: [email protected]
123
Environ Earth Sci
DOI 10.1007/s12665-014-3289-3
![Page 2: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/2.jpg)
agencies can prevent as much damage as possible. Because
floods can lead to human injury or death, prevention of
such events is preferable to compensation of damages.
However, measuring the benefits of flood reduction is
difficult because they are not tangible and require a long
time to be shown (Yi et al. 2010). In contrast, damage can
be calculated both qualitatively and quantitatively (De
Moel and Aerts 2011).
In natural hazards research, huge databases are often
needed (Regmi et al. 2013). These are not easy to collect,
and in some cases a lack of appropriate data can hamper
research (Liu and De Smedt 2005). Natural factors such as
hydrological and meteorological characteristics, soil types,
geological structures, geomorphology, and vegetation are
the most influential contributors to flooding. Human
interference in natural cycles by cutting trees and building
with impervious materials can accelerate flooding. New
insights into hydrological research involve determining and
mitigating flooding using geographic information system
(GIS), digital soil-type maps, topography, and land use/
cover data (Liu and De Smedt 2005).
Previous studies
Over the last two decades, remotely sensed data have been
used effectively for monitoring and analyzing hazards, and
high-resolution imagery has revolutionized remote sensing
research (Mason et al. 2010). Typically, studies of hazards
require multi-temporal datasets in order to identify spatial
changes and the process of hazards occurrence (Martinez and
Le Toan 2007). Remote sensing is useful for such research
because data can be required repeatedly over the same area,
sometimes even within a few hours. Active remote sensors
such as synthetic aperture radar (SAR) can be the best choice
for real-time hazard studies because they are weather inde-
pendent and can penetrate vegetation (Mason et al. 2010;
Pradhan et al. 2014). The ability to acquire data for various
bands, wavelengths, polarizations, and spatial resolutions
are other advantages of active remote sensing (Karjalainen
et al. 2012). One of the most important characteristics of
SAR imagery is its high temporal resolution, i.e., data can be
captured within few hours and delivered at real time during
disaster occurrence (Elbialy et al. 2013).
Previous studies have used various methods to identify
areas at risk of flooding. For example, flood susceptibility
can be analyzed by qualitative and quantitative techniques
such as artificial neural networks (ANNs) (Kia et al. 2012),
frequency ratio (Lee et al. 2012), analytical hierarchy
process (AHP) (Billa et al. 2006), logistic regression
(Pradhan 2010a), adaptive neuro-fuzzy interface system
(ANFIS) (Dixon 2005; Mukerji et al. 2009), etc. Qualita-
tive approaches such as AHP, in which the process and the
results rely on expert knowledge, are appropriate for
regional studies (Rozos et al. 2011). Quantitative methods
such as bivariate probability and multivariate logistic
regression statistical methods rely on numerical expres-
sions of the relationship between independent parameters
and flooding. Most multivariate statistical approaches
require the definition of strict assumptions prior to the
study. Logistic regression could overcome this difficulty,
offering an easy method of statistical analysis and also
allowing for the incorporation of bivariate statistical ana-
lysis (BSA) (Ayalew and Yamagishi 2005).
One of the important problems of AHP method is the
inability to determinate the uncertainty. The uncertainty in
spatial decision-making methodology may occur from
selection, comparison and ranking of multiple criteria. The
greatest source of uncertainty is often criteria ranking
(Bathrellos et al. 2013). Billa et al. (2006) used pairwise
comparisons in AHP to identify areas likely to be inundated
by future flooding. The biggest disadvantage of this method
is the need for expert knowledge in assigning weights, which
can be a source of bias. Subramanian and Ramanathan
(2012) stated that the AHP method is suitable for regional
studies; however, a main aim of disaster management is to
develop a transferable method that can be used globally.
Flood susceptibility mapping using an ANN approach has
also been applied in various case studies (Dixon 2005). The
weak points of this method are related to its complexity and
requirements of high computer capacity. Also poor predic-
tions can be expected when the validation data contain values
outside the range of those used for training. Furthermore, if
large numbers of variables are used, the modelling process is
time consuming (Ghalkhani et al. 2013).
Lee et al. (2012) applied individual bivariate probability
models to map flood-susceptible areas in Busan, South
Korea. A drawback of this method is that it considered the
relationship between flood occurrence and each indepen-
dent separately, while not considering the relationships
among all the independent layers themselves. Pradhan
(2010a) utilized multivariate logistic regression to examine
the relations between a dependent variable and several
independent variables to produce a susceptibility zonation
map of Kelantan, Malaysia. He noted that logistic regres-
sion had several advantages: the variables can be either
continuous and/or discrete and they do not necessarily have
to have normal distributions. Although the results of that
study showed the efficiency of logistic regression, the
impacts of classes of each variable were not considered.
As mentioned above, bivariate probability and logistic
regression are both popular methods of statistical analysis
for susceptibility mapping. Bivariate probability is a type
of BSA and logistic regression is a form of multivariate
statistical analysis. Mostly they are used individually, as
either can produce a model of susceptibility. Combining
Environ Earth Sci
123
![Page 3: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/3.jpg)
the two methods is expected to allow for more precise
analysis. Tehrany et al. (2013) combined these methods for
flood susceptibility analysis in tropical region of Kelantan,
Malaysia. Their results showed that the combined method
could enhance the performance of each individual method.
In other words, logistic regression considers the relation-
ship of the independent variables and dependent data but
not the impact of different classes of each independent
variable on flooding. Meanwhile, bivariate probability
analysis allows independent variables to be rearranged
based on the impacts of their different classes on flood
occurrence. In other words, all the classes of all indepen-
dent variables can be arranged based on their flood densi-
ties measured earlier using bivariate probability and then
logistic regression can be performed on the entire dataset to
give the final weights.
Flooding is a major risk in Korea, which has experienced
much flood damage to paddy fields (Okamoto et al. 1998),
property (Chang et al. 2009), and human life (Chae et al.
2005). However, few previous studies have attempted to
delineate these floods or to offer ways to ameliorate damage
(Yi et al. 2010). According to Yi et al. (2010), floods caused
159.6 million USD of economic loss in Korea during the
period 1982–2006, with most damage experienced in road-
side areas in urban or residential areas. Busan has suffered
some of the highest damage from flooding in the country.
Heavy rains lead to flooding in this area and appropriate
flood-prevention plans are needed (Lee et al. 2012).
Prediction of flooding can be highly useful in preventing
damage and loss. Through scientific analysis of floods,
flood-susceptible areas can be detected and thus flood
damages can be decreased. With this aim in mind, this
study conducted flood susceptibility analysis for Busan
using a combined approach of bivariate probability and
logistic regression models. The proposed method improves
on the weak points of each method and combines their
advantages by analyzing the relationships of flooding with
each independent layer and with each class of independent
layers. Furthermore, flood-related independent variables
can be assessed. Since this combined approach is almost
new in flood studies, through this research its efficiency
and capability can be examined. Therefore, the scope of
this research is to assess the reliability of the combined
method of BSA and LR in a non-tropical region. This leads
to increase in the popularity of this technique and assist the
researchers for future flood-related studies.
Data and study area
Busan, Korea’s second-largest city, is located on the south-
eastern tip of the Korean Peninsula at 128�E and 35�N.
Heavy rainfall is the main reason of flooding in this area,
and the lack of flood-prevention measures has led to much
damage (Lee et al. 2012). Most rain falls in the months of
April, May, June, July, August, and September. July is the
wettest month, with an average of 15 rainy days and
average precipitation of 250 mm. The current study
focused on floods that took place in July. Soil types vary by
area and include loam, sandy loam, and loam fine sand.
Andesite, granite, gravel soil, and sedimentary are the
dominant geologies in Busan.
Figure 1 shows the study area and flooded area on 16
July 2009, when 266.5 mm of rain inundated Busan,
causing severe flooding which submerged more than 1,200
homes and 3,000 ha of farmland. Most damage was
reported in Nan-Gu and Yeonje-Gu around the Namhae
coast, where 160 flooded areas occurred.
Flood inventory map
Flooding was identified and an inventory map containing
160 flooded zones was generated over the whole area
through fieldwork and the examination of aerial photo-
graphs. Literature provides several suggestions regarding
Fig. 1 Flood location map with hill-shaded map of Busan city, South
Korea
Environ Earth Sci
123
![Page 4: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/4.jpg)
the size of samples to be used for analysis and validation
(Ohlmacher and Davis 2003). On the basis of a literature
review, 70 % of the flooded locations were used ran-
domly to produce the dichotomous data (0 and 1) for the
purpose of training in order to assess the spatial distri-
bution of flooding (Pradhan 2010a). The rest of the
locations (validation data) were used for testing the
models. The same number of points was chosen for
flooded areas and for non-flooded areas. Non-flooded
areas where randomly selected on a topographic map by
choosing sites high on hills where flooding would not be
possible. Previous studies have often not selected areas
where flooding would certainly not occur; the approach
used in the present study is expected to increase the
precision of the results. Figure 1 shows a map of flooded
and non-flooded areas in the Busan area.
Flood influencing parameters
Various data called conditioning factors are needed as
independent variables in the process of susceptibility
mapping (Liu and De Smedt 2005). These variables con-
tribute to the occurrence of flooding in a specific region. If
all the relevant independent parameters are analyzed and
used in modelling, the model will be comprehensive but
will require much data, which can be challenging to
acquire. It is thus important to determine which variables
can be eliminated without harming the model’s precision
(Sanyal and Lu 2004). Recent studies have aimed to pro-
duce models that use the least number of independent data
while still achieving highly accurate results (Campolo et al.
2003).
The independent data should be measurable and there
should be a conjunction between the independent and
dependent data. Also the independent data should be col-
lected from the whole study area while it should not rep-
resent uniform spatial information (Ayalew and Yamagishi
2005). Furthermore, its effect should not lead to two types
of outcome at the end of the process. These parameters can
be in nominal, ordinal, interval, or ratio scale format (Park
et al. 2013). Several parameters may be influential in flood
occurrence for a specific region, while the same parameters
may not be effective for other environments. No exact
agreement exists on which parameters should be used in
flood susceptibility assessments. However, some of the
variables are mostly used by numerous researchers which
indicate their importance and vital role in flood studies.
Anthropogenic factors, such as urban areas, road network
or land uses, should be taken into consideration in flood
susceptibility assessment. These parameters are related to
flood events (Skilodimou et al. 2003). The parameters used
in the current research were selected through the
knowledge attained from the literature (Pradhan 2010a; Kia
et al. 2012; Lee et al. 2012). The parameters that influence
flooding were collected and compiled into spatial databases
using ArcGIS 9.3 software. The independent datasets
consisted of rainfall, digital elevation model (DEM), slope,
curvature, geology, green farmland, river, slope, soil
drainage, soil effect, soil texture, stream power index (SPI),
timber age, timber density, timber diameter, and timber-
type data. Table 1 describes the independent variables and
their characteristics. Each variable was resized to be pre-
sented by a 10 9 10 m grid, and the grid of the Busan area
was constructed by 757 columns and 1,119 rows (366,044
pixels; 36.6 km2).
The quantile method was used to classify each inde-
pendent variable. In the quantile classification method,
each class contains the same number of features. This
method is used by several researchers due to its efficiency
in classification (Papadopoulou-Vrynioti et al. 2013; Te-
hrany et al. 2014; Umar et al. 2014). This method partitions
the layer into the same number of pixels in each class
which provides fair analysis. In the slope (0–64), SPI
(0–8.6), and DEM (0–420) maps, ten categories were
constructed for each analysis (Fig. 2b, g, k, respectively).
In the curvature map, three classes were considered: con-
vex, flat, and concave (Fig. 2a). In the geology map, five
geology classes were used: Andesite, granite, gravel soil,
sedimentary, and no data (Fig. 2c). Figure 2d shows the
classes of the green farmland map (paddy, other planta-
tions, orchard, no data, natural grassland, grassland, field,
and barren ground). In the rainfall map, nine classes
(2–10.3, 10.4–18.7, 18.8–27, 27.1–35.3, 35.4–43.7,
43.8–52, 52.1–60.3, 60.4–68.7, and 68.8–77 mm) were
constructed (Fig. 2e).
In the case of the river map, four buffer categories (10,
50, 100, and[100 m) were compiled using the buffer tool
in ArcGIS 10 software. The map of the four soil groups
(loam, sandy loam, loam fine sand, and no data) is shown in
Fig. 2j. Figure 2i presents the soil effect map, which con-
sists of five classes (20, 50, 100, 150, and no data). Six
categories were compiled for the soil drainage map: poorly
drained, somewhat poorly, moderately well, well drained,
very well drained, and no data (Fig. 2h). For the timber age
map, classes ranged from the first to sixth age classes as
well as one class of non-forest area (Fig. 2l). Non-forest
area and moderate and dense forest area made up the three
classes of the timber density map (Fig. 2m). Timber
diameter was mapped for four categories: non-forest area,
very small diameter, small diameter, and medium diameter
(Fig. 2n). Timber type was mapped as nine classes of non-
forest area, larch, pine, broadleaf tree, mixed broadleaf,
artificial larch, artificial pine, rigida pine, and paper pulp
(Fig. 2o).
Environ Earth Sci
123
![Page 5: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/5.jpg)
Table 1 Results of FR and LR in the case of each factor
Factor Class FR Logisticcoefficient
Factor Class FR Logisticcoefficient
DEM 0–4 0.35 0.025 Timber type Non forest area 1.19 0.021
4.01–5 0.52 Larch 0
5.01–10 1.58 Pine 0.41
10.01–19 1.37 Broad leaf-tree 0.38
19.01–30 1.66 Mixed broad-leaf 0.80
30.1–47 1.54 Artificial larch 0
47.1–68 1.15 Artificial pine 0
68.1–95 0.50 Rigida pine 0.75
95.1–144 1.01 Paper 0
145–420 0.40 Timber diameter Non forest area 1.18 0.015
Curvature Concave 1.27 -0.006 Very small diameter 0
Flat 0.62 Small diameter 0.44
Convex 0.93 Medium diameter 0.58
Geology Andesite 2.97 0.028 Timber density Non forest area 1.16 -0.019
Granite 0.25 Moderate 0.21
Gravel soil 0.53 Dense area 0.72
No data 0.62 Timberage Non forest area 1.18 -0.014
Sedimentary 1.36 1st age 0
Green farm Paddy 1.08 0.033 2nd age 0
Other plantations 0 3rd age 0.83
Orchard 0 4th age 0.50
No data 0 5th age 0
Natural grassland 0 6th age 0
Grassland 0 SPI 0 0.99 -0.005
Field 0 0.01–0.17 1.73
Barren ground 0.35 0.18–0.34 1.29
Rainfall 2–10.3 0.39 0.012 0.35–0.54 0.78
10.4–18.7 0.13 0.55–0.77 1.72
18.8–27 1.42 0.78–1 0.93
27.1–35.3 1.06 1.1–1.4 0.92
35.4–43.7 1.01 1.5–1.8 0.96
43.8–52 0.46 1.9–2.5 0.32
52.1–60.3 0.54 2.6–8.6 0.50
60.4–68.7 1.59 Soil texture No data 0 -0.005
68.8–77 2.24 Loam 10.34
River 100 m buffer 1.01 0.019 Sandy loam 0.84
50 m buffer 1.30 Loam fine sand 1.11
10 m buffer 0.38 Soil effect No data 0 -0.027
[100 1.31 20 0.65
Slope 0 0.75 0.013 50 1.08
0.001–2.77 2.23 100 1.13
2.78–5.03 1.37 150 0.79
5.04–8.8 1.69 Soil drainage No data 0 0.005
8.81–13.8 0.68 Poorly drained 0.41
13.9–18.6 0.84 Somewhat poorly 1.13
19.9–18.6 0.50 Moderately well 0
18.7–23.4 0.80 Well drained 1.30
23.5–29.2 1.09 Very well drained 0.82
29.3–64.1 0.64
Environ Earth Sci
123
![Page 6: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/6.jpg)
Fig. 2 List of all the influencing data layers. a Curvature, b DEM, c Geology, d Greenfarm, e Rainfall, f River, g slope, h Soil drain, i Soil effect,
j Soil texture, k SPI, l Timber age, m Timber density, n Timber diameter, o Timber type
Environ Earth Sci
123
![Page 7: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/7.jpg)
Fig. 2 continued
Environ Earth Sci
123
![Page 8: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/8.jpg)
Fig. 2 continued
Environ Earth Sci
123
![Page 9: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/9.jpg)
Fig. 2 continued
Environ Earth Sci
123
![Page 10: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/10.jpg)
Methodology
Application of bivariate statistical analysis
To measure flood occurrence in each class of independent
variables, BSA was used. BSA shows the ratio of the prob-
ability of the presence to the absence of any occurrence, such
as a flood or landslide (Lee and Pradhan 2007; Yilmaz 2009;
Pradhan 2010b; Pradhan and Lee 2010a). To build a proba-
bility model of a hazard, it is assumed that the hazard
occurrence can be determined based on its contributing
factors and that future hazards will happen under the same
conditions as past events (Akgun 2012). The bivariate
probability for each independent variable was measured by
the variable’s relationship with flood occurrence. The greater
the bivariate probability, the stronger the relationship
between flood occurrence and the variable (Lee and Talib
2005; Lee and Pradhan 2007; Pradhan and Lee 2010b).
Using the bivariate probability model, the spatial rela-
tionships between flood occurrence location and each of
the variables contributing to flood occurrence were derived.
Figure 2 shows the classified independent variables and all
the data layers. Bivariate probability was performed and
the range of bivariate probability was normalized. Then all
of the independent variables were reclassified based on the
normalized bivariate probability to be used in the logistic
regression analysis. The normalization of all independent
variables is necessary which facilitates final analysis and
interpretation (Ayalew and Yamagishi 2005). A higher
ratio value indicates a stronger relationship between the
variables and vice versa.
Application of multivariate statistical analysis
Logistic regression is a popular multivariate statistical
analysis method that allows for examination of the multi-
variate regression relationship between a dependent vari-
able and several independent variables (Pradhan and Lee
2010b). Logistic regression was created to measure the
probability of a disaster in an area using a specific formula
generated using the independent variables. One of the
requirements of this method is dependent data consisting of
values of 0 and 1, which indicate the absence and existence
of disasters, respectively.
Here, the dependent and reclassified independent vari-
ables derived by bivariate probability were converted from
raster to ASCII format, which is required in SPSS. SPSS
V.19 software was used to perform the multivariate logistic
regression analysis. The coefficients were measured and
are listed in Table 1. The higher the logistic coefficient, the
greater the expected impact on flooding occurrence. Using
the derived logistic coefficients, the probability (p) of flood
occurrence was calculated as
p ¼ 1=ð1 + e�zÞ ð1Þ
where p is the probability of flooding which is acquired
between 0 to 1 on an S-shaped curve. Z is the linear
combination and it follows that logistic regression involves
fitting an equation of the following form to the data:
z ¼ b0 þ b1x1 þ b2x2 þ b3x3 þ bnxn ð2Þ
where b0 is the intercept of the model, bi (i = 0, 1, 2, …,
n) represents the coefficients of the logistic regression
model, and xi (i = 0, 1, 2, …, n) denotes the independent
variables (Lee and Sambath 2006).
To produce a susceptibility map, the probability map
should be divided into different categories (Ohlmacher and
Davis 2003). Different popular categorization methods were
examined in the GIS environment, such as standard devia-
tion, natural break, equal interval, and quantile. In this study,
the best results were achieved through the quantile method.
The other methods exaggerated the areas of flood suscepti-
bility, showing a large part of Busan in the highly susceptible
zone. Ayalew and Yamagishi (2005) stated that the quantile
classification method has the drawback as it classifies
broadly different values into the same category. However,
this method produced reasonably good results in flood sus-
ceptibility mapping. This method is used by other researches
such as Papadopoulou-Vrynioti et al. (2013) and Tehrany
et al. (2013) which proved the capability and efficiency of
this classification method in susceptibility mapping.
Results and validation
First, the BSA method was applied to obtain weights for each
class of each variable. The bivariate probability for 15
variables was calculated from the relationship to the flood
events, as shown in Table 1. The output of the multivariate
statistical analysis was then acquired through the logistic
regression method (Table 1). The relationship between flood
occurrence and flood-influencing parameters was extracted
in SPSS V.19 software. Coefficients were measured using
logistic regression and are listed in Table 1. Furthermore,
through the performance of LR, another factor can be mea-
sured which is significant probability (Sig). This factor rec-
ognizes the independent variables that have a significant
influence on flooding (Papadopoulou-Vrynioti et al. 2013). If
the Sig be\0.05, it indicates that the independent variable
had statistically significant effects on flooding. Results
indicated that Dem, Greenfarm, and Geology with the Sig
values of (0.004) (0.025), and (0.038) were the most effective
independent variables, respectively. Other independent
variables such as curvature, slope, soil effect, etc. gained the
Sig value of more than 0.05 which showed that they were not
statistically significant in the model.
Environ Earth Sci
123
![Page 11: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/11.jpg)
To acquire the flood probability index, the regression
coefficients for each variable were entered in Eq. (2),
yielding
Z ¼�7:535 + 0:025demð Þ + �0:006curvatureð Þ+ 0:028geologyð Þ + 0:033greenfarmð Þ + 0:019riverð Þ+ 0:013slopð Þ + 0:005soil drainð Þ + �0:027soil effectð Þ+ �0:005soil textð Þ + �0:005spið Þ + �0:014timber ageð Þ+ �0:019timber densityð Þ + 0:015timberiað Þ+ 0:021timber typeð Þ + 0:012rainfallð Þ
As the next step, the probability index was calculated
from Eq. 1. The index values can range from 0 to 0.99.
Figure 3 shows the thematic map of flood probability. This
index represents the predicted probabilities of flooding for
each pixel in the presence of a given set of independent
variables. Finally, a flood susceptibility map was obtained
and the study area was divided into five classes of flood
susceptibility: very low (0–0.55), low (0.55–0.84), medium
(0.84–0.94), high (0.94–0.97), and very high (0.97–1).
Figure 4 shows the flood susceptibility map.
The prediction accuracy and quality of the developed
model should be examined. Here, the model was validated
by comparing the acquired flood probability map with
existing flood data (Lee et al. 2007; Tien Bui et al. 2012;
Pourghasemi et al. 2012). Specifically, the results were
quantitatively examined using the receiver operating
characteristic (ROC) curve, in which the basis of the
assessment is the true- and false-positive rates (Chauhan
et al. 2010). To examine the reliability and efficiency of the
flood probability map, both the success rate and prediction
rate curve were calculated. The success rate result was
obtained using the training dataset, which used 70 % of the
inventory flood locations (112 flood locations). The success
rate curves are presented in Fig. 5.
The model produced a value of 0.927 for the area under
the curve (AUC), which represents 92.7 % success accu-
racy. The training flooded areas were used to generate the
flood model, but could not be used to assess the prediction
capability of the model. The prediction rate shows how
well the model could predict flooding in a given area (Tien
Bui et al. 2012).
The prediction accuracy was calculated using the testing
data for the 30 % of the flooded areas (48 flood locations)
that were not used in the training process. This method can
show the generalization ability of a model (Maier and
Dandy 2000; Pourghasemi et al. 2012), in that the area
under the prediction rate curves (AUC) can measure the
prediction accuracy qualitatively (Pradhan and Lee 2010c).
The AUC values varied from 0.5 to 1.0. The value of 1.0
represented the highest accuracy, showing that the model
was completely capable of predicting disaster occurrence
Fig. 3 The probability map obtained after the LR analyses
Fig. 4 Flood susceptibility map produced from the LR model
Environ Earth Sci
123
![Page 12: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/12.jpg)
without any bias (Pradhan et al. 2010). Thus, AUC values
close to 1.0 are considered to show that a model is precise
and trustable (Tien Bui et al. 2012). The prediction curve is
shown in Fig. 5. The AUC value of 0.823 corresponds to a
prediction accuracy of 82.3 %.
Moreover, another way was provided in order to test
the performance of the used method. Confusion matrices
were created for both the training and testing results. The
confusion matrix represents the ratio of correctly pre-
dicted observations and demonstrates the number of true-
positive, false-positive, true-negative and false-negative
observations (Papadopoulou-Vrynioti et al. 2013). Con-
sequently, the outcomes of the confusion matrices pro-
duced by the training and testing datasets were
compared. Table 2 shows the confusion matrices.
Totally, 84.70 % of the 21,600 pixels were correctly
categorized for the training dataset. Furthermore, 10,500
pixels without flood occurrence and 1,600 pixel with
flood were categorized wrongly (Table 2a). The valida-
tion results for testing dataset (Table 2b) represents that
5,800 of the total 10,830 pixels were correctly catego-
rized (88.64 %). However, 500 of 4,800 pixels with
flood occurrence were wrongly categorized. The strength
of the overall model performance can be assessed using
confusion matrices.
Discussion
Due to the existence of few weak points which can be
found in some quantitative methods, scientists started to
produce and propose combined methods in order to over-
come these disadvantages. For instance, ANFIS was
developed through the integration of ANN and fuzzy
interface system (FIS). This method increased the precision
of the results compare to the individual ones. Therefore, it
was used in several studies which made this method highly
demandable in hazard studies. As it has been mentioned,
usually BSA and multivariate statistical analysis are indi-
vidually used in mapping the susceptible areas. However,
some researchers combined these methods to enhance the
capability and precision of each method. Current research
applied the combined method of BSA and LR which is
introduced by Tehrany et al. (2013) for flood susceptibility
mapping. The result indicated that through the combination
of these individual methods the reliability of acquired
susceptibility map increased. Therefore, this research can
be used as an evidence for the proficiency of this combined
method.
Based on the acquired results, the high and very high
susceptibility zones covered 19.4 % of the area and were
mostly located in the northern part of Busan. According to
the flood susceptibility map, most parts of Busan are
located in zones of very low susceptibility. The proportion
of high susceptibility increases significantly in the north
and southwest parts of the study area. Loamy fine sand soil
type, very low elevation, and high precipitation were the
key properties in the very high susceptibility zones.
Moreover, the results showed that the probability of flood
occurrence is large in low slope areas and in concave areas.
The acquired logistic coefficients for curvature, timber
density, timber age, SPI, soil texture, and soil effect were
negative. The negative values of logistic coefficient imply
that the occurrence of flood is negatively related to these
independent variables. For instance, as the timber density
increases, the possibility of flood occurrence decreases.
The independent variables such as DEM, geology, green
farm, rainfall and distance from river, slope, timber type,
timber diameter, and soil drainage had positive effects on
the flood occurrences.
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90 100
Cum
ulat
ive
perc
enta
ge o
f fl
ood
occu
rren
ce
Flood probability index (%)
ROC curve
success curve prediction curve
Fig. 5 The ROC curve
Table 2 Confusion matrices of (a) training dataset and (b) testing
dataset
Predicted Correct %
0 1 Total
(a) Observed 0 8,800 1,700 10,500 83.80
1 1,600 9,500 11,100 85.60
Total 10,400 11,200 21,600 84.70
(b) Observed 0 5,300 730 6,030 83.80
1 500 4,300 4,800 89.50
Total 5,800 5,030 10,830 88.64
0: absence of flood, 1: presence of flood, cut-off value: 0.5
Environ Earth Sci
123
![Page 13: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/13.jpg)
Conclusion
The goal of this research was to improve the efficiency of
logistic regression/multivariate statistical analysis in flood
modelling using this method in combination with the
bivariate probability method. Logistic regression has some
drawbacks for measuring the relationship between flood
occurrence and variable classes. It takes one of the first or
last classes as a reference and does not involve the other
classes in the weighting process. This means that the
information on one of the intervals is low. However, using
bivariate probability this problem was solved. The logistic
regression was performed using independent variables that
were reclassified and weighted based on the bivariate
probability, in order to produce a more precise flood sus-
ceptibility map of the Busan area. The flood inventory map,
which represented 160 flood events that took place in July
2009, was used for the statistical analysis. Among all the
flood events, 112 cases were used to build the method. The
remaining 48 flooded events were used to validate the
developed model. Fifteen independent variables were used
as independent data. These consisted of rainfall, the DEM,
slope, curvature, geology, green farmland, river, slope, soil
drainage, soil effect, soil texture, SPI, timber age, timber
density, timber diameter, and timber type.
Initially, all the independent variables were resized to a
10 9 10 m grid. The grid of the Busan area was con-
structed in 757 columns and 1,119 rows. As a next step,
each variable was classified using the quantile method,
and the bivariate probability was used to evaluate the
relationship of each class with flood occurrence. To
assign the weights that were derived using bivariate
probability, each independent variable was reclassified
and normalized for use in logistic regression multivariate
statistical analysis. The logistic regression coefficients
were calculated in SPSS V.19 software and the proba-
bility index was then calculated. Finally, a flood suscep-
tibility map of Busan City was constructed by classifying
the probability index into five classes: very low, low,
medium, high, and very high susceptibility. The high and
very high susceptibility zones made up 19.4 % of the total
study area and were mostly located in the north part of
Busan.
The proposed combined method overcame the weak
points of logistic regression in terms of BSA. The AUC
was used to examine the efficiency of the proposed method.
The prediction and success rate accuracies that achieved
for the combined bivariate probability and logistic regres-
sion methods were 82.3 and 92.7 %, respectively. Because
the independent variables contributing to flooding may
vary among study sites, it was not easy to define specific
parameters to generate the model. The elevation and cur-
vature were found to be the best predictor variables for
estimating the probability of flood occurrences in the cur-
rent study area.
The results of the bivariate probability analysis indicated
that more flooding should be expected in loamy soil areas
under high rainfall. In addition, most flooding occurred in
concave and low elevation areas. In terms of hydrological
factors, a lower SPI and greater distance to the river
increased the chance of flood occurrence. The timber factor
illustrated a greater flood possibility by type than by age
and diameter. The combined method employed in this
study showed that the green farmland and geology
parameters had strong positive correlations with the pre-
dicted probability map. In contrast, a negative correlation
existed between flood occurrence and SPI, timber age, and
soil effect.
The achieved accuracies demonstrate the efficiency of
the combined method. Other sophisticated methods have
been developed to model flooding, such as ANFIS and
genetic algorithm-based artificial neural network (ANN-
GA) approaches. Although these methods are more
applicable to flood modelling than are statistical methods
like logistic regression, the complexity, computation time,
and requirement of a large number of parameters make
them difficult to use. The method proposed here, com-
bining logistic regression and bivariate probability, pro-
duced acceptable results and the process of analysis was
easily understandable. Thus, this method is recommended
for use in hydrological studies and disaster management.
The proposed method can be considered as a robust
method for flood susceptibility mapping and the produced
map can aid planners, decision makers, and governments
engaged in flood management and planning in the study
area.
Acknowledgments This research was supported by the Basic
Research Project of the Korea Institute of Geoscience and Mineral
Resources (KIGAM) and by the Development Project of Environ-
mental Technology for Climate Change by the Korea Environmental
Industry & Technology Institute. Thanks to anonymous reviewers for
their valuable feedback on the manuscript.
References
Akgun A (2012) A comparison of landslide susceptibility maps
produced by logistic regression, multi-criteria decision, and
likelihood ratio methods: a case study at Izmir, Turkey.
Landslides 9:93–106
Ayalew L, Yamagishi H (2005) The application of GIS-based logistic
regression for landslide susceptibility mapping in the Kakuda-
Yahiko Mountains, Central Japan. Geomorphology 65:15–31
Bathrellos GD, Gaki-Papanastassiou K, Skilodimou HD, Skianis GA,
Chousianitis KG (2013) Assessment of rural community and
agricultural development using geomorphological-geological
factors and GIS in the Trikala prefecture (Central Greece).
Stoch Enviorn Res Risk A 27:573–588
Environ Earth Sci
123
![Page 14: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/14.jpg)
Billa L, Shattri M, Mahmud AR, Ghazali AH (2006) Comprehensive
planning and the role of SDSS in flood disaster management in
Malaysia. Disaster Prev Manage 15:233–240
Campolo M, Soldati A, Andreussi P (2003) Artificial neural network
approach to flood forecasting in the River Arno. Hydrolog Sci J
48:381–398
Chae EH, Kim TW, Rhee SJ, Henderson TD (2005) The impact of
flooding on the mental health of affected people in South Korea.
Community Ment Health J 41:633–645
Chang H, Franczyk J, Kim C (2009) What is responsible for
increasing flood risks? The case of Gangwon Province, Korea.
Nat Hazards 48:339–354
Chauhan S, Sharma M, Arora MK (2010) Landslide susceptibility
zonation of the Chamoli region, Garhwal Himalayas, using
logistic regression model. Landslides 7:411–423
Dawod GM, Mirza MN, Al-Ghamdi KA (2012) GIS-based estimation
of flood hazard impacts on road network in Makkah city, Saudi
Arabia. Environ Earth Sci 67:2205–2215
De Moel H, Aerts J (2011) Effect of uncertainty in land use, damage
models and inundation depth on flood damage estimates. Nat
Hazards 58:407–425
Dixon B (2005) Applicability of neuro-fuzzy techniques in predicting
ground-water vulnerability: a GIS-based sensitivity analysis.
J Hydrol 309:17–38
Elbialy S, Mahmoud A, Pradhan B, Buchroithner M (2013) Appli-
cation of spaceborne synthetic aperture radar data for extraction
of soil moisture and its use in hydrological modelling at
Gottleuba Catchment, Saxony, Germany. J Flood Risk Manag.
doi:10.1111/jfr3.12037
Feng CC, Wang YC (2011) GIScience research challenges for
emergency management in Southeast Asia. Nat Hazards
59:597–616
Ghalkhani H, Golian S, Saghafian B, Farokhnia A, Shamseldin A
(2013) Application of surrogate artificial intelligent models for
real-time flood routing. Water Environ J 27:535–548
Karjalainen M, Kankare V, Vastaranta M, Holopainen M, Hyyppa J
(2012) Prediction of plot-level forest variables using TerraSAR-
X stereo SAR data. Remote Sens Environ 117:338–347
Kia MB, Pirasteh S, Pradhan B, Mahmud AR, Sulaiman WNA,
Moradi A (2012) An artificial neural network model for flood
simulation using GIS: Johor River Basin, Malaysia. Environ
Earth Sci 67:251–264
Kjeldsen TR (2010) Modelling the impact of urbanization on flood
frequency relationships in the UK. Hydrol Res 41:391–405
Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor,
Malaysia using frequency ratio and logistic regression models.
Landslides 4:33–41
Lee S, Sambath T (2006) Landslide susceptibility mapping in the
Damrei Romel area, Cambodia using frequency ratio and logistic
regression models. Environ Geol 50:847–855
Lee S, Talib JA (2005) Probabilistic landslide susceptibility and
factor effect analysis. Environ Geol 47:982–990
Lee S, Ryu JH, Kim IS (2007) Landslide susceptibility analysis and
its verification using likelihood ratio, logistic regression, and
artificial neural network models: case study of Youngin, Korea.
Landslides 4:327–338
Lee MJ, Kang JE, Jeon S (2012) Application of frequency ratio model
and validation for predictive flooded area susceptibility mapping
using GIS. In: Geoscience and Remote Sensing Symposium
(IGARSS), 2012 IEEE International. Munich, 895–898
Liu Y, De Smedt F (2005) Flood modeling for complex terrain using
GIS and remote sensed information. Water Resour Manag
19:605–624
Maier HR, Dandy GC (2000) Neural networks for the prediction and
forecasting of water resources variables: a review of modelling
issues and applications. Environ Model Softw 15:101–124
Martinez JM, Le Toan T (2007) Mapping of flood dynamics and
spatial distribution of vegetation in the Amazon floodplain using
multitemporal SAR data. Remote Sens Environ 108:209–223
Mason DC, Speck R, Devereux B, Schumann GP, Neal JC, Bates PD
(2010) Flood detection in urban areas using TerraSAR-X. IEEE
T Geosci Remote 48:882–894
Mukerji A, Chatterjee C, Raghuwanshi NS (2009) Flood forecasting
using ANN, neuro-fuzzy, and neuro-GA models. J Hydrol Eng
14:647–652
Ohlmacher GC, Davis JC (2003) Using multiple logistic regression
and GIS technology to predict landslide hazard in northeast
Kansas, USA. Eng Geol 69:331–343
Okamoto K, Yamakawa S, Kawashima H (1998) Estimation of flood
damage to rice production in North Korea in 1995. Int J Remote
Sens 19:365–371
Papadopoulou-Vrynioti K, Bathrellos GD, Skilodimou HD, Kaviris
G, Makropoulos K (2013) Karst collapse susceptibility mapping
considering peak ground acceleration in a rapidly growing urban
area. Eng Geol 158:77–88
Park S, Choi C, Kim B, Kim J (2013) Landslide susceptibility
mapping using frequency ratio, analytic hierarchy process,
logistic regression, and artificial neural network methods at the
Inje area, Korea. Environ Earth Sci 68:1443–1464
Pourghasemi HR, Pradhan B, Gokceoglu C (2012) Application of
fuzzy logic and analytical hierarchy process (AHP) to landslide
susceptibility mapping at Haraz watershed. Iran Nat Hazards
63:965–996
Pradhan B (2010a) Flood susceptible mapping and risk area
delineation using logistic regression, GIS and remote sensing.
J Spat Hydrol 9:1–18
Pradhan B (2010b) Landslide susceptibility mapping of a catchment
area using frequency ratio, fuzzy logic and multivariate logistic
regression approaches. J Indian Soc Remote 38:301–320
Pradhan B, Lee S (2010a) Delineation of landslide hazard areas on
Penang Island, Malaysia, by using frequency ratio, logistic
regression, and artificial neural network models. Environ Earth
Sci 60:1037–1054
Pradhan B, Lee S (2010b) Landslide susceptibility assessment and
factor effect analysis: backpropagation artificial neural networks
and their comparison with frequency ratio and bivariate logistic
regression modelling. Environ Model Softw 25:747–759
Pradhan B, Lee S (2010c) Regional landslide susceptibility analysis
using back-propagation neural network model at Cameron
Highland, Malaysia. Landslides 7:13–30
Pradhan B, Oh HJ, Buchroithner M (2010) Weights-of-evidence
model applied to landslide susceptibility mapping in a tropical
hilly area. Geomat Nat Hazards Risk 1:199–223
Pradhan B, Hagemann U, Tehrany MS, Prechtel N (2014) An easy to
use ArcMap based texture analysis program for extraction of
flooded areas from TerraSAR-X satellite image. Comput Geosci
63:34–43
Regmi AD, Yoshida K, Dhital MR, Pradhan B (2013) Weathering and
mineralogical variation in gneissic rocks and their effect inSangrumba Landslide, East Nepal. Environ Earth Sci 71:1–17
Rozos D, Bathrellos GD, Skillodimou HD (2011) Comparison of the
implementation of rock engineering system and analytic hierar-
chy process methods, upon landslide susceptibility mapping,
using GIS: a case study from the Eastern Achaia County of
Peloponnesus, Greece. Environ Earth Sci 63:49–63
Sanyal J, Lu X (2004) Application of remote sensing in flood
management with special reference to monsoon Asia: a review.
Nat Hazards 33:283–301
Skilodimou H, Livaditis G, Bathrellos G, Verikiou-Papaspiridakou E
(2003) Investigating the flooding events of the urban regions of
Glyfada and Voula, Attica, Greece: a contribution to Urban
Geomorphology. Geogr Ann A 85:197–204
Environ Earth Sci
123
![Page 15: Flood susceptibility mapping using integrated bivariate and multivariate statistical models](https://reader035.vdocuments.site/reader035/viewer/2022072113/57509f2b1a28abbf6b174e59/html5/thumbnails/15.jpg)
Subramanian N, Ramanathan R (2012) A review of applications of
analytic hierarchy process in operations management. Int J Prod
Econ 138:215–241
Taylor J, Davies M, Clifton D, Ridley I, Biddulph P (2011) Flood
management: prediction of microbial contamination in large-
scale floods in urban environments. Environ Int 37:1019–1029
Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood
susceptible areas using rule based decision tree (DT) and a novel
ensemble bivariate and multivariate statistical models in GIS.
J Hydrol 504:69–79
Tehrany MS, Pradhan B, Jebur MN (2014) Flood susceptibility
mapping using a novel ensemble weights-of-evidence and
support vector machine models in GIS. J Hydrol 512:332–343.
doi:10.1016/j.jhydrol.2014.03.008
Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012)
Landslide susceptibility mapping at Hoa Binh province (Viet-
nam) using an adaptive neuro-fuzzy inference system and GIS.
Comput Geosci 45:199–211
Umar Z, Pradhan B, Ahmad A, Jebur MN, Tehrany MS (2014)
Earthquake induced landslide susceptibility mapping using an
integrated ensemble frequency ratio and logistic regression
models in West Sumatera Province, Indonesia. Catena
118:124–135
Yi CS, Lee JH, Shim MP (2010) GIS-based distributed technique for
assessing economic loss from flood damage: pre-feasibility study
for the Anyang Stream Basin in Korea. Nat Hazards 55:251–272
Yilmaz I (2009) Landslide susceptibility mapping using frequency
ratio, logistic regression, artificial neural networks and their
comparison: a case study from Kat landslides (Tokat—Turkey).
Comput Geosci 35:1125–1138
Zwenzner H, Voigt S (2009) Improved estimation of flood parameters
by combining space based SAR data with very high resolution
digital elevation data. Hydrol Earth Syst Sci 13:567–576
Environ Earth Sci
123