
ORIGINAL ARTICLE

Flood susceptibility mapping using integrated bivariate and multivariate statistical models

Mahyat Shafapour Tehrany • Moung-Jin Lee •

Biswajeet Pradhan • Mustafa Neamah Jebur •

Saro Lee

Received: 29 July 2013 / Accepted: 15 April 2014

© Springer-Verlag Berlin Heidelberg 2014

Abstract Flooding can have catastrophic effects on human

lives and livelihoods and thus comprehensive flood man-

agement is needed. Such management requires information

on the hydrologic, geotechnical, environmental, social, and

economic aspects of flooding. The number of flood events

that took place in Busan, South Korea, in 2009 exceeded the

normal situation for that city. Mapping the susceptible areas

helps us to understand flood trends and can aid in appropriate

planning and flood prevention. In this study, a combination

of bivariate probability analysis and multivariate logistic

regression was used to produce flood susceptibility maps of

Busan City. The main aim of this research was to overcome

the weakness of logistic regression regarding bivariate

probability capabilities. A flood inventory map with a total of

160 flood locations was extracted from various sources.

Then, the flood inventory was randomly split into a training dataset (70 %) for building the models and a testing dataset (the remaining 30 %) for validation. The independent variable datasets included rainfall, digital elevation model, slope, curvature, geology, green farmland, rivers, soil drainage, soil effect, soil texture, stream power index, timber age, timber density, timber diameter, and timber type. The

impact of each independent variable on flooding was eval-

uated by analyzing each independent variable with the

dependent flood layer. The validation dataset, which was not

used for model generation, was used to evaluate the flood

susceptibility map using the prediction rate method. The

results of the accuracy assessment showed a success rate of

92.7 % and a prediction rate of 82.3 %.

Keywords Flood susceptibility · Bivariate probability · Logistic regression · GIS · Busan

Introduction

Rapid urban growth and climate change in recent years

have led to many environmental problems and increased

risks of natural disasters (Kjeldsen 2010), including

flooding and associated losses of human lives and property

(Zwenzner and Voigt 2009). The damage that can occur

due to such disasters is incalculable. As well as the huge

economic cost, floods can bring pathogens into urban

environments and cause lingering damp and microbial

development in buildings and infrastructure (Taylor et al.

2011; Dawod et al. 2012). Because of the tremendous and

irreversible potential damages to agriculture, transporta-

tion, bridges, and many other aspects of urban infrastruc-

ture, flood control and prevention measures are urgently

required (Billa et al. 2006).

Early warnings and emergency responses to floods are

needed (Feng and Wang 2011) so that governments and

M. S. Tehrany · B. Pradhan · M. N. Jebur

Department of Civil Engineering, Geospatial Information

Science Research Center (GISRC), Faculty of Engineering,

Geospatial Information Science Research Center, University

Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia


M.-J. Lee

Korea Environment Institute, 290 Jinheungno, Eunpyeong-Gu,

Seoul 122-706, Korea

S. Lee

Korea University of Science and Technology, 217 Gajeong-ro

Yuseong-gu, Daejeon 305-350, Korea

S. Lee (corresponding author)

Geological Mapping Department, Korea Institute of

Geoscience and Mineral Resources (KIGAM), 124 Gwahang-no,

Yuseong-gu, Daejeon 305-350, Korea



Environ Earth Sci

DOI 10.1007/s12665-014-3289-3

agencies can prevent as much damage as possible. Because

floods can lead to human injury or death, prevention of

such events is preferable to compensation of damages.

However, measuring the benefits of flood reduction is

difficult because they are not tangible and require a long

time to be shown (Yi et al. 2010). In contrast, damage can

be calculated both qualitatively and quantitatively (De

Moel and Aerts 2011).

In natural hazards research, huge databases are often

needed (Regmi et al. 2013). These are not easy to collect,

and in some cases a lack of appropriate data can hamper

research (Liu and De Smedt 2005). Natural factors such as

hydrological and meteorological characteristics, soil types,

geological structures, geomorphology, and vegetation are

the most influential contributors to flooding. Human

interference in natural cycles by cutting trees and building

with impervious materials can accelerate flooding. New

insights into hydrological research involve determining and

mitigating flooding using geographic information system

(GIS), digital soil-type maps, topography, and land use/

cover data (Liu and De Smedt 2005).

Previous studies

Over the last two decades, remotely sensed data have been

used effectively for monitoring and analyzing hazards, and

high-resolution imagery has revolutionized remote sensing

research (Mason et al. 2010). Typically, studies of hazards

require multi-temporal datasets in order to identify spatial

changes and the process of hazards occurrence (Martinez and

Le Toan 2007). Remote sensing is useful for such research

because data can be acquired repeatedly over the same area,

sometimes even within a few hours. Active remote sensors

such as synthetic aperture radar (SAR) can be the best choice

for real-time hazard studies because they are weather inde-

pendent and can penetrate vegetation (Mason et al. 2010;

Pradhan et al. 2014). The ability to acquire data for various bands, wavelengths, polarizations, and spatial resolutions is another advantage of active remote sensing (Karjalainen

et al. 2012). One of the most important characteristics of

SAR imagery is its high temporal resolution, i.e., data can be

captured within a few hours and delivered in real time during

disaster occurrence (Elbialy et al. 2013).

Previous studies have used various methods to identify

areas at risk of flooding. For example, flood susceptibility

can be analyzed by qualitative and quantitative techniques

such as artificial neural networks (ANNs) (Kia et al. 2012),

frequency ratio (Lee et al. 2012), analytical hierarchy

process (AHP) (Billa et al. 2006), logistic regression

(Pradhan 2010a), adaptive neuro-fuzzy inference system

(ANFIS) (Dixon 2005; Mukerji et al. 2009), etc. Qualita-

tive approaches such as AHP, in which the process and the

results rely on expert knowledge, are appropriate for

regional studies (Rozos et al. 2011). Quantitative methods

such as bivariate probability and multivariate logistic

regression statistical methods rely on numerical expres-

sions of the relationship between independent parameters

and flooding. Most multivariate statistical approaches

require the definition of strict assumptions prior to the

study. Logistic regression could overcome this difficulty,

offering an easy method of statistical analysis and also

allowing for the incorporation of bivariate statistical ana-

lysis (BSA) (Ayalew and Yamagishi 2005).

One important problem of the AHP method is its inability to quantify uncertainty. Uncertainty in spatial decision-making may arise from the selection, comparison, and ranking of multiple criteria. The

greatest source of uncertainty is often criteria ranking

(Bathrellos et al. 2013). Billa et al. (2006) used pairwise

comparisons in AHP to identify areas likely to be inundated

by future flooding. The biggest disadvantage of this method

is the need for expert knowledge in assigning weights, which

can be a source of bias. Subramanian and Ramanathan

(2012) stated that the AHP method is suitable for regional

studies; however, a main aim of disaster management is to

develop a transferable method that can be used globally.

Flood susceptibility mapping using an ANN approach has

also been applied in various case studies (Dixon 2005). The

weak points of this method are its complexity and its high computational requirements. In addition, poor predictions can be expected when the validation data contain values

outside the range of those used for training. Furthermore, if

large numbers of variables are used, the modelling process is

time consuming (Ghalkhani et al. 2013).

Lee et al. (2012) applied individual bivariate probability

models to map flood-susceptible areas in Busan, South

Korea. A drawback of this method is that it considers the relationship between flood occurrence and each independent variable separately, without considering the relationships among the independent layers themselves. Pradhan

(2010a) utilized multivariate logistic regression to examine

the relations between a dependent variable and several

independent variables to produce a susceptibility zonation

map of Kelantan, Malaysia. He noted that logistic regres-

sion had several advantages: the variables can be either

continuous and/or discrete and they do not necessarily have

to have normal distributions. Although the results of that

study showed the efficiency of logistic regression, the

impacts of classes of each variable were not considered.

As mentioned above, bivariate probability and logistic

regression are both popular methods of statistical analysis

for susceptibility mapping. Bivariate probability is a type

of BSA and logistic regression is a form of multivariate

statistical analysis. Mostly they are used individually, as

either can produce a model of susceptibility. Combining


the two methods is expected to allow for more precise

analysis. Tehrany et al. (2013) combined these methods for

flood susceptibility analysis in the tropical region of Kelantan,

Malaysia. Their results showed that the combined method

could enhance the performance of each individual method.

On its own, logistic regression considers the relationship between the independent variables and the dependent data but

not the impact of different classes of each independent

variable on flooding. Meanwhile, bivariate probability

analysis allows independent variables to be rearranged

based on the impacts of their different classes on flood

occurrence. In other words, all the classes of all indepen-

dent variables can be arranged based on their flood densi-

ties measured earlier using bivariate probability and then

logistic regression can be performed on the entire dataset to

give the final weights.

Flooding is a major risk in Korea, which has experienced

much flood damage to paddy fields (Okamoto et al. 1998),

property (Chang et al. 2009), and human life (Chae et al.

2005). However, few previous studies have attempted to

delineate these floods or to offer ways to ameliorate damage

(Yi et al. 2010). According to Yi et al. (2010), floods caused

159.6 million USD of economic loss in Korea during the

period 1982–2006, with most damage experienced in road-

side areas in urban or residential areas. Busan has suffered

some of the highest damage from flooding in the country.

Heavy rains lead to flooding in this area and appropriate

flood-prevention plans are needed (Lee et al. 2012).

Prediction of flooding can be highly useful in preventing

damage and loss. Through scientific analysis of floods,

flood-susceptible areas can be detected and thus flood

damages can be decreased. With this aim in mind, this

study conducted flood susceptibility analysis for Busan

using a combined approach of bivariate probability and

logistic regression models. The proposed method improves

on the weak points of each method and combines their

advantages by analyzing the relationships of flooding with

each independent layer and with each class of independent

layers. Furthermore, flood-related independent variables

can be assessed. Because this combined approach is relatively new in flood studies, this research provides an opportunity to examine its efficiency and capability. The scope of this research is therefore to assess the reliability of the combined BSA and logistic regression (LR) method in a non-tropical region. The results may encourage wider use of the technique and assist researchers in future flood-related studies.

Data and study area

Busan, Korea’s second-largest city, is located on the southeastern tip of the Korean Peninsula at 128°E and 35°N.

Heavy rainfall is the main cause of flooding in this area,

and the lack of flood-prevention measures has led to much

damage (Lee et al. 2012). Most rain falls in the months of

April, May, June, July, August, and September. July is the

wettest month, with an average of 15 rainy days and

average precipitation of 250 mm. The current study

focused on floods that took place in July. Soil types vary by

area and include loam, sandy loam, and loam fine sand.

Andesite, granite, gravel soil, and sedimentary are the

dominant geologies in Busan.

Figure 1 shows the study area and flooded area on 16

July 2009, when 266.5 mm of rain inundated Busan,

causing severe flooding which submerged more than 1,200

homes and 3,000 ha of farmland. Most damage was

reported in Nan-Gu and Yeonje-Gu around the Namhae

coast, where 160 flooded areas occurred.

Flood inventory map

Flooding was identified and an inventory map containing

160 flooded zones was generated over the whole area

through fieldwork and the examination of aerial photo-

graphs. Literature provides several suggestions regarding

Fig. 1 Flood location map with hill-shaded map of Busan city, South

Korea


the size of samples to be used for analysis and validation

(Ohlmacher and Davis 2003). On the basis of a literature

review, 70 % of the flooded locations were randomly selected to produce the dichotomous training data (0 and 1) used to assess the spatial distribution of flooding (Pradhan 2010a). The rest of the

locations (validation data) were used for testing the

models. The same number of points was chosen for

flooded areas and for non-flooded areas. Non-flooded

areas were randomly selected on a topographic map by

choosing sites high on hills where flooding would not be

possible. Previous studies have often not selected areas

where flooding would certainly not occur; the approach

used in the present study is expected to increase the

precision of the results. Figure 1 shows a map of flooded

and non-flooded areas in the Busan area.
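The random 70/30 split of the inventory described above can be sketched as follows. This is only an illustration: the integer location IDs, the function name `split_inventory`, and the fixed random seed are assumptions and are not taken from the study.

```python
import random

def split_inventory(location_ids, train_fraction=0.7, seed=0):
    """Randomly split flood locations into training and validation subsets."""
    shuffled = list(location_ids)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs standing in for the 160 mapped flood locations.
flood_ids = list(range(160))
train_ids, valid_ids = split_inventory(flood_ids)
print(len(train_ids), len(valid_ids))
```

With 160 locations this gives 112 training and 48 validation locations, matching the counts reported later in the text. An equal number of non-flooded points would be sampled separately.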

Flood influencing parameters

Various data called conditioning factors are needed as

independent variables in the process of susceptibility

mapping (Liu and De Smedt 2005). These variables con-

tribute to the occurrence of flooding in a specific region. If

all the relevant independent parameters are analyzed and

used in modelling, the model will be comprehensive but

will require much data, which can be challenging to

acquire. It is thus important to determine which variables

can be eliminated without harming the model’s precision

(Sanyal and Lu 2004). Recent studies have aimed to produce models that use the fewest independent variables

while still achieving highly accurate results (Campolo et al.

2003).

The independent data should be measurable and should have a clear association with the dependent data. They should also be collected over the whole study area and should not be spatially uniform (Ayalew and Yamagishi 2005). Furthermore, their effect should not lead to two types of outcome at the end of the process. These parameters can

be in nominal, ordinal, interval, or ratio scale format (Park

et al. 2013). Several parameters may be influential in flood

occurrence for a specific region, while the same parameters

may not be effective for other environments. No exact

agreement exists on which parameters should be used in

flood susceptibility assessments. However, some variables are used by most researchers, which indicates their importance in flood studies.

Anthropogenic factors, such as urban areas, road network

or land uses, should be taken into consideration in flood

susceptibility assessment. These parameters are related to

flood events (Skilodimou et al. 2003). The parameters used

in the current research were selected through the

knowledge attained from the literature (Pradhan 2010a; Kia

et al. 2012; Lee et al. 2012). The parameters that influence

flooding were collected and compiled into spatial databases

using ArcGIS 9.3 software. The independent datasets consisted of rainfall, digital elevation model (DEM), slope, curvature, geology, green farmland, river, soil drainage, soil effect, soil texture, stream power index (SPI), timber age, timber density, timber diameter, and timber type data. Table 1 describes the independent variables and

their characteristics. Each variable was resampled to a 10 × 10 m grid, and the grid of the Busan area consisted of 757 columns and 1,119 rows (366,044 pixels; 36.6 km²).

The quantile method was used to classify each inde-

pendent variable. In the quantile classification method,

each class contains the same number of features. This

method is used by several researchers due to its efficiency

in classification (Papadopoulou-Vrynioti et al. 2013; Te-

hrany et al. 2014; Umar et al. 2014). This method partitions

the layer into the same number of pixels in each class

which provides fair analysis. In the slope (0–64), SPI (0–8.6), and DEM (0–420) maps, ten categories were constructed for each analysis (Fig. 2g, k, b, respectively).

In the curvature map, three classes were considered: con-

vex, flat, and concave (Fig. 2a). In the geology map, five

geology classes were used: Andesite, granite, gravel soil,

sedimentary, and no data (Fig. 2c). Figure 2d shows the

classes of the green farmland map (paddy, other planta-

tions, orchard, no data, natural grassland, grassland, field,

and barren ground). In the rainfall map, nine classes

(2–10.3, 10.4–18.7, 18.8–27, 27.1–35.3, 35.4–43.7,

43.8–52, 52.1–60.3, 60.4–68.7, and 68.8–77 mm) were

constructed (Fig. 2e).
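The quantile classification described above can be sketched in a few lines. This is a simplified, rank-based illustration rather than the GIS implementation used in the study: the function name and the sample slope values are hypothetical, and tied values are split by input order, whereas GIS software normally derives break values so that ties stay in one class.

```python
def quantile_classes(values, n_classes):
    """Assign each value a class index (0 .. n_classes - 1) so that the
    classes hold (nearly) equal numbers of values."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values)
    return classes

# Hypothetical slope values (degrees) for ten pixels, split into five classes.
slopes = [0.0, 1.2, 2.5, 3.1, 4.8, 6.0, 9.5, 14.2, 21.7, 40.3]
slope_classes = quantile_classes(slopes, 5)
print(slope_classes)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Each class receives two pixels even though the value ranges differ widely, which is the equal-count property the text refers to.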

In the case of the river map, four buffer categories (10,

50, 100, and >100 m) were compiled using the buffer tool

in ArcGIS 10 software. The map of the four soil groups

(loam, sandy loam, loam fine sand, and no data) is shown in

Fig. 2j. Figure 2i presents the soil effect map, which con-

sists of five classes (20, 50, 100, 150, and no data). Six

categories were compiled for the soil drainage map: poorly

drained, somewhat poorly, moderately well, well drained,

very well drained, and no data (Fig. 2h). For the timber age

map, classes ranged from the first to sixth age classes as

well as one class of non-forest area (Fig. 2l). Non-forest

area and moderate and dense forest area made up the three

classes of the timber density map (Fig. 2m). Timber

diameter was mapped for four categories: non-forest area,

very small diameter, small diameter, and medium diameter

(Fig. 2n). Timber type was mapped as nine classes of non-

forest area, larch, pine, broadleaf tree, mixed broadleaf,

artificial larch, artificial pine, rigida pine, and paper pulp

(Fig. 2o).


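Table 1 Results of FR and LR in the case of each factor

| Factor | Class | FR | Logistic coefficient |
|---|---|---|---|
| DEM | 0–4 | 0.35 | 0.025 |
| | 4.01–5 | 0.52 | |
| | 5.01–10 | 1.58 | |
| | 10.01–19 | 1.37 | |
| | 19.01–30 | 1.66 | |
| | 30.1–47 | 1.54 | |
| | 47.1–68 | 1.15 | |
| | 68.1–95 | 0.50 | |
| | 95.1–144 | 1.01 | |
| | 145–420 | 0.40 | |
| Curvature | Concave | 1.27 | -0.006 |
| | Flat | 0.62 | |
| | Convex | 0.93 | |
| Geology | Andesite | 2.97 | 0.028 |
| | Granite | 0.25 | |
| | Gravel soil | 0.53 | |
| | No data | 0.62 | |
| | Sedimentary | 1.36 | |
| Green farm | Paddy | 1.08 | 0.033 |
| | Other plantations | 0 | |
| | Orchard | 0 | |
| | No data | 0 | |
| | Natural grassland | 0 | |
| | Grassland | 0 | |
| | Field | 0 | |
| | Barren ground | 0.35 | |
| Rainfall | 2–10.3 | 0.39 | 0.012 |
| | 10.4–18.7 | 0.13 | |
| | 18.8–27 | 1.42 | |
| | 27.1–35.3 | 1.06 | |
| | 35.4–43.7 | 1.01 | |
| | 43.8–52 | 0.46 | |
| | 52.1–60.3 | 0.54 | |
| | 60.4–68.7 | 1.59 | |
| | 68.8–77 | 2.24 | |
| River | 100 m buffer | 1.01 | 0.019 |
| | 50 m buffer | 1.30 | |
| | 10 m buffer | 0.38 | |
| | >100 | 1.31 | |
| Slope | 0 | 0.75 | 0.013 |
| | 0.001–2.77 | 2.23 | |
| | 2.78–5.03 | 1.37 | |
| | 5.04–8.8 | 1.69 | |
| | 8.81–13.8 | 0.68 | |
| | 13.9–18.6 | 0.84 | |
| | 19.9–18.6 | 0.50 | |
| | 18.7–23.4 | 0.80 | |
| | 23.5–29.2 | 1.09 | |
| | 29.3–64.1 | 0.64 | |
| Timber type | Non forest area | 1.19 | 0.021 |
| | Larch | 0 | |
| | Pine | 0.41 | |
| | Broad leaf-tree | 0.38 | |
| | Mixed broad-leaf | 0.80 | |
| | Artificial larch | 0 | |
| | Artificial pine | 0 | |
| | Rigida pine | 0.75 | |
| | Paper | 0 | |
| Timber diameter | Non forest area | 1.18 | 0.015 |
| | Very small diameter | 0 | |
| | Small diameter | 0.44 | |
| | Medium diameter | 0.58 | |
| Timber density | Non forest area | 1.16 | -0.019 |
| | Moderate | 0.21 | |
| | Dense area | 0.72 | |
| Timber age | Non forest area | 1.18 | -0.014 |
| | 1st age | 0 | |
| | 2nd age | 0 | |
| | 3rd age | 0.83 | |
| | 4th age | 0.50 | |
| | 5th age | 0 | |
| | 6th age | 0 | |
| SPI | 0 | 0.99 | -0.005 |
| | 0.01–0.17 | 1.73 | |
| | 0.18–0.34 | 1.29 | |
| | 0.35–0.54 | 0.78 | |
| | 0.55–0.77 | 1.72 | |
| | 0.78–1 | 0.93 | |
| | 1.1–1.4 | 0.92 | |
| | 1.5–1.8 | 0.96 | |
| | 1.9–2.5 | 0.32 | |
| | 2.6–8.6 | 0.50 | |
| Soil texture | No data | 0 | -0.005 |
| | Loam | 10.34 | |
| | Sandy loam | 0.84 | |
| | Loam fine sand | 1.11 | |
| Soil effect | No data | 0 | -0.027 |
| | 20 | 0.65 | |
| | 50 | 1.08 | |
| | 100 | 1.13 | |
| | 150 | 0.79 | |
| Soil drainage | No data | 0 | 0.005 |
| | Poorly drained | 0.41 | |
| | Somewhat poorly | 1.13 | |
| | Moderately well | 0 | |
| | Well drained | 1.30 | |
| | Very well drained | 0.82 | |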


Fig. 2 List of all the influencing data layers. a Curvature, b DEM, c Geology, d Greenfarm, e Rainfall, f River, g slope, h Soil drain, i Soil effect,

j Soil texture, k SPI, l Timber age, m Timber density, n Timber diameter, o Timber type


Methodology

Application of bivariate statistical analysis

To measure flood occurrence in each class of independent

variables, BSA was used. BSA shows the ratio of the prob-

ability of the presence to the absence of any occurrence, such

as a flood or landslide (Lee and Pradhan 2007; Yilmaz 2009;

Pradhan 2010b; Pradhan and Lee 2010a). To build a proba-

bility model of a hazard, it is assumed that the hazard

occurrence can be determined based on its contributing

factors and that future hazards will happen under the same

conditions as past events (Akgun 2012). The bivariate

probability for each independent variable was measured by

the variable’s relationship with flood occurrence. The greater

the bivariate probability, the stronger the relationship

between flood occurrence and the variable (Lee and Talib

2005; Lee and Pradhan 2007; Pradhan and Lee 2010b).

Using the bivariate probability model, the spatial rela-

tionships between flood occurrence location and each of

the variables contributing to flood occurrence were derived.

Figure 2 shows the classified independent variables and all

the data layers. Bivariate probability was performed and

the range of bivariate probability was normalized. Then all

of the independent variables were reclassified based on the

normalized bivariate probability to be used in the logistic

regression analysis. Normalization of all independent variables is necessary because it facilitates the final analysis and interpretation (Ayalew and Yamagishi 2005). A higher

ratio value indicates a stronger relationship between the

variables and vice versa.
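A per-class bivariate weight of the kind reported in the FR column of Table 1 can be sketched as follows. The text does not spell out the formula, so this assumes the common frequency-ratio definition (share of flood pixels in a class divided by the share of area in that class) and a min-max normalization; the pixel counts and function names are hypothetical.

```python
def frequency_ratio(class_pixels, flood_pixels):
    """Frequency ratio per class.

    class_pixels[c]: number of pixels in class c
    flood_pixels[c]: number of flooded pixels in class c
    """
    total_pixels = sum(class_pixels.values())
    total_floods = sum(flood_pixels.values())
    fr = {}
    for c, n in class_pixels.items():
        flood_share = flood_pixels.get(c, 0) / total_floods
        area_share = n / total_pixels
        fr[c] = flood_share / area_share
    return fr

def normalize(fr):
    """Min-max rescaling to [0, 1] (the choice of rescaling is an assumption)."""
    lo, hi = min(fr.values()), max(fr.values())
    if hi == lo:
        return {c: 0.0 for c in fr}
    return {c: (v - lo) / (hi - lo) for c, v in fr.items()}

# Hypothetical counts for a three-class curvature layer.
class_pixels = {"concave": 400, "flat": 200, "convex": 400}
flood_pixels = {"concave": 50, "flat": 10, "convex": 40}
fr = frequency_ratio(class_pixels, flood_pixels)
weights = normalize(fr)
print(fr, weights)
```

A ratio above 1 means a class contains more flood pixels than its share of the area would suggest. The normalized values are what each class would be recoded to before the logistic regression step.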

Application of multivariate statistical analysis

Logistic regression is a popular multivariate statistical

analysis method that allows for examination of the multi-

variate regression relationship between a dependent vari-

able and several independent variables (Pradhan and Lee

2010b). Logistic regression was created to measure the

probability of a disaster in an area using a specific formula

generated using the independent variables. One of the

requirements of this method is dependent data consisting of

values of 0 and 1, which indicate the absence and existence

of disasters, respectively.

Here, the dependent and reclassified independent vari-

ables derived by bivariate probability were converted from

raster to ASCII format, which is required in SPSS. SPSS

V.19 software was used to perform the multivariate logistic

regression analysis. The coefficients were measured and

are listed in Table 1. The higher the logistic coefficient, the

greater the expected impact on flooding occurrence. Using

the derived logistic coefficients, the probability (p) of flood

occurrence was calculated as

$$p = \frac{1}{1 + e^{-z}} \qquad (1)$$

where p is the probability of flooding, which ranges from 0 to 1 on an S-shaped curve, and z is the linear combination of the independent variables. Logistic regression thus involves fitting an equation of the following form to the data:

$$z = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_n x_n \qquad (2)$$

where $b_0$ is the intercept of the model, $b_i$ (i = 1, 2, …, n) are the coefficients of the logistic regression model, and $x_i$ (i = 1, 2, …, n) are the independent variables (Lee and Sambath 2006).
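Equations (1) and (2) translate directly into code. The following minimal sketch evaluates the linear combination z and passes it through the logistic function; the function name and the example inputs are illustrative only.

```python
import math

def flood_probability(intercept, coefficients, values):
    """Evaluate Eq. (2) for z and Eq. (1) for p."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs: the two terms cancel, so z = 0 and p = 0.5.
p_mid = flood_probability(0.0, [1.0, -1.0], [2.0, 2.0])
print(p_mid)  # 0.5
```

Large negative z values push p toward 0 and large positive values push it toward 1, which produces the S-shaped curve mentioned above.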

To produce a susceptibility map, the probability map

should be divided into different categories (Ohlmacher and

Davis 2003). Different popular categorization methods were

examined in the GIS environment, such as standard devia-

tion, natural break, equal interval, and quantile. In this study,

the best results were achieved through the quantile method.

The other methods exaggerated the areas of flood suscepti-

bility, showing a large part of Busan in the highly susceptible

zone. Ayalew and Yamagishi (2005) stated that the quantile classification method has the drawback of placing widely different values in the same category. However, this method produced reasonably good results in flood susceptibility mapping. It has also been used by other researchers, such as Papadopoulou-Vrynioti et al. (2013) and Tehrany et al. (2013), who demonstrated its capability and efficiency in susceptibility mapping.

Results and validation

First, the BSA method was applied to obtain weights for each

class of each variable. The bivariate probability for 15

variables was calculated from the relationship to the flood

events, as shown in Table 1. The output of the multivariate

statistical analysis was then acquired through the logistic

regression method (Table 1). The relationship between flood

occurrence and flood-influencing parameters was extracted

in SPSS V.19 software. Coefficients were measured using

logistic regression and are listed in Table 1. LR also yields a significance probability (Sig) for each independent variable, which identifies the variables that have a significant influence on flooding (Papadopoulou-Vrynioti et al. 2013). A Sig value below 0.05 indicates that the independent variable has a statistically significant effect on flooding. The results indicated that DEM, green farm, and geology, with Sig values of 0.004, 0.025, and 0.038, respectively, were the most influential independent variables. Other independent variables, such as curvature, slope, and soil effect, had Sig values above 0.05, which showed that they were not statistically significant in the model.


To acquire the flood probability index, the regression

coefficients for each variable were entered in Eq. (2),

yielding

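$$
\begin{aligned}
z ={} & -7.535 + (0.025 \times \text{DEM}) + (-0.006 \times \text{curvature}) + (0.028 \times \text{geology}) \\
& + (0.033 \times \text{green farm}) + (0.019 \times \text{river}) + (0.013 \times \text{slope}) \\
& + (0.005 \times \text{soil drainage}) + (-0.027 \times \text{soil effect}) + (-0.005 \times \text{soil texture}) \\
& + (-0.005 \times \text{SPI}) + (-0.014 \times \text{timber age}) + (-0.019 \times \text{timber density}) \\
& + (0.015 \times \text{timber diameter}) + (0.021 \times \text{timber type}) + (0.012 \times \text{rainfall})
\end{aligned}
$$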

As the next step, the probability index was calculated

from Eq. 1. The index values can range from 0 to 0.99.

Figure 3 shows the thematic map of flood probability. This

index represents the predicted probabilities of flooding for

each pixel in the presence of a given set of independent

variables. Finally, a flood susceptibility map was obtained

and the study area was divided into five classes of flood

susceptibility: very low (0–0.55), low (0.55–0.84), medium

(0.84–0.94), high (0.94–0.97), and very high (0.97–1).

Figure 4 shows the flood susceptibility map.
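As a worked illustration of the steps above, the sketch below plugs the intercept and the coefficients from Table 1 into Eqs. (1) and (2) and assigns the resulting index to one of the five susceptibility classes. The input values are hypothetical, because the numeric scale of the reclassified layers is not stated in the text, and assigning a boundary value to the higher class is also an assumption.

```python
import math

# Intercept and coefficients as reported in the text and Table 1.
INTERCEPT = -7.535
COEFFICIENTS = {
    "dem": 0.025, "curvature": -0.006, "geology": 0.028, "green_farm": 0.033,
    "river": 0.019, "slope": 0.013, "soil_drainage": 0.005,
    "soil_effect": -0.027, "soil_texture": -0.005, "spi": -0.005,
    "timber_age": -0.014, "timber_density": -0.019, "timber_diameter": 0.015,
    "timber_type": 0.021, "rainfall": 0.012,
}

def probability_index(pixel):
    """Flood probability for one pixel given its reclassified layer values."""
    z = INTERCEPT + sum(b * pixel.get(name, 0.0) for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def susceptibility_class(p):
    """Map a probability to the five classes listed in the text."""
    # Upper bounds of the first four classes; anything above is "very high".
    bounds = [(0.55, "very low"), (0.84, "low"), (0.94, "medium"), (0.97, "high")]
    for upper, label in bounds:
        if p < upper:
            return label
    return "very high"

# Hypothetical pixel: every reclassified layer value set to 50.
pixel = {name: 50.0 for name in COEFFICIENTS}
p = probability_index(pixel)
print(round(p, 3), susceptibility_class(p))
```

Applying `probability_index` to every pixel of the stacked layers would give a probability map, and `susceptibility_class` would turn it into a classified map.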

The prediction accuracy and quality of the developed

model should be examined. Here, the model was validated

by comparing the acquired flood probability map with

existing flood data (Lee et al. 2007; Tien Bui et al. 2012;

Pourghasemi et al. 2012). Specifically, the results were

quantitatively examined using the receiver operating

characteristic (ROC) curve, in which the basis of the

assessment is the true- and false-positive rates (Chauhan

et al. 2010). To examine the reliability and efficiency of the

flood probability map, both the success rate and prediction

rate curve were calculated. The success rate result was

obtained using the training dataset, which used 70 % of the

inventory flood locations (112 flood locations). The success

rate curves are presented in Fig. 5.

The model produced a value of 0.927 for the area under

the curve (AUC), which represents 92.7 % success accuracy. Because the training flood locations were used to generate the flood model, they could not be used to assess the prediction

capability of the model. The prediction rate shows how

well the model could predict flooding in a given area (Tien

Bui et al. 2012).

The prediction accuracy was calculated using the testing

data for the 30 % of the flooded areas (48 flood locations)

that were not used in the training process. This method can

show the generalization ability of a model (Maier and

Dandy 2000; Pourghasemi et al. 2012), in that the area

under the prediction rate curve (AUC) can measure the prediction accuracy quantitatively (Pradhan and Lee 2010c). AUC values range from 0.5 to 1.0. A value of 1.0

represented the highest accuracy, showing that the model

was completely capable of predicting disaster occurrence

Fig. 3 The probability map obtained after the LR analyses

Fig. 4 Flood susceptibility map produced from the LR model


without any bias (Pradhan et al. 2010). Thus, AUC values

close to 1.0 are considered to show that a model is precise

and reliable (Tien Bui et al. 2012). The prediction curve is

shown in Fig. 5. The AUC value of 0.823 corresponds to a

prediction accuracy of 82.3 %.
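The AUC values discussed above can be illustrated with a short sketch. It uses the rank-based form of the ROC AUC: the probability that a randomly chosen flooded sample receives a higher index than a randomly chosen non-flooded one. This is a generic formulation and not necessarily the exact procedure used to draw Fig. 5; the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical probability indices and observed labels (1 = flooded).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

Computing this on the training locations corresponds to the success rate, and computing it on the held-out locations corresponds to the prediction rate.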

The performance of the method was also tested using confusion matrices, which were created for both the training and testing results. The

confusion matrix represents the ratio of correctly pre-

dicted observations and demonstrates the number of true-

positive, false-positive, true-negative and false-negative

observations (Papadopoulou-Vrynioti et al. 2013). Con-

sequently, the outcomes of the confusion matrices pro-

duced by the training and testing datasets were

compared. Table 2 shows the confusion matrices.

In total, 84.70 % of the 21,600 pixels were correctly categorized for the training dataset. Furthermore, 10,500 pixels without flood occurrence and 1,600 pixels with flood were categorized wrongly (Table 2a). The validation results for the testing dataset (Table 2b) show that 5,800 of the total 10,830 pixels were correctly categorized (88.64 %). However, 500 of 4,800 pixels with flood occurrence were wrongly categorized. The strength

of the overall model performance can be assessed using

confusion matrices.
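A confusion matrix of the kind summarized above can be built as in the following sketch. The 0.5 probability cutoff and the sample values are assumptions for illustration; the text does not state which cutoff was used, and the numbers below are not those of Table 2.

```python
def confusion_matrix(probabilities, observed, cutoff=0.5):
    """Count true/false positives and negatives at a given probability cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probabilities, observed):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def overall_accuracy(counts):
    """Share of observations that were categorized correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical probabilities and observed labels (1 = flooded).
probs = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
obs = [1, 0, 1, 1, 0, 0]
cm = confusion_matrix(probs, obs)
print(cm, overall_accuracy(cm))
```

The "Correct %" column of a table such as Table 2 would correspond to `overall_accuracy` computed for each row or for the whole matrix.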

Discussion

Because some quantitative methods have weak points, researchers have proposed combined methods to overcome them. For instance, ANFIS was developed by integrating ANN with a fuzzy inference system (FIS). This method produced more precise results than either individual method and has therefore been widely used in hazard studies. As mentioned above, BSA and multivariate statistical analysis are usually applied individually to map susceptible areas. However, some researchers have combined these methods to enhance the capability and precision of each. The current research applied the combined BSA and LR method introduced by Tehrany et al. (2013) for flood susceptibility mapping. The results indicated that combining these individual methods increased the reliability of the resulting susceptibility map. This research therefore provides evidence of the effectiveness of the combined method.

Based on the acquired results, the high and very high susceptibility zones covered 19.4 % of the area and were mostly located in the northern part of Busan. According to the flood susceptibility map, most parts of Busan are located in zones of very low susceptibility. The proportion of high susceptibility increases significantly in the northern and southwestern parts of the study area. Loamy fine sand soil, very low elevation, and high precipitation were the key properties of the very high susceptibility zones. Moreover, the results showed that the probability of flood occurrence is high in low-slope and concave areas. The logistic coefficients acquired for curvature, timber density, timber age, SPI, soil texture, and soil effect were negative. Negative logistic coefficients imply that flood occurrence is negatively related to these independent variables. For instance, as timber density increases, the possibility of flood occurrence decreases. The remaining independent variables, namely DEM, geology, green farmland, rainfall, distance from river, slope, timber type, timber diameter, and soil drainage, had positive effects on flood occurrence.
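The role of the coefficient sign can be illustrated with the standard logistic function. This is only a sketch: the intercept, the coefficients, and the input values below are hypothetical and are not the values estimated in this study.

```python
import math


def flood_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-z)), with z = intercept + sum(b_i * x_i)."""
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept and coefficients, chosen only to show the effect of the sign.
intercept = -0.5
coefficients = {"timber_density": -1.2, "rainfall": 0.8}

# Same rainfall weight, different timber density weight.
sparse_timber = flood_probability(
    intercept, coefficients, {"timber_density": 0.2, "rainfall": 0.6}
)
dense_timber = flood_probability(
    intercept, coefficients, {"timber_density": 0.9, "rainfall": 0.6}
)
print(f"sparse timber: {sparse_timber:.3f}, dense timber: {dense_timber:.3f}")
```

Because the timber density coefficient is negative, a larger timber density value lowers the computed probability, while a positive coefficient such as the one assumed for rainfall would raise it.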

Fig. 5 The ROC curve, showing the success curve and the prediction curve (x-axis: flood probability index (%); y-axis: cumulative percentage of flood occurrence)

Table 2 Confusion matrices of (a) training dataset and (b) testing dataset

                    Predicted 0   Predicted 1   Total     Correct %
(a) Observed 0      8,800         1,700         10,500    83.80
    Observed 1      1,600         9,500         11,100    85.60
    Total           10,400        11,200        21,600    84.70
(b) Observed 0      5,300         730           6,030     83.80
    Observed 1      500           4,300         4,800     89.50
    Total           5,800         5,030         10,830    88.64

0: absence of flood, 1: presence of flood, cut-off value: 0.5


Conclusion

The goal of this research was to improve the efficiency of logistic regression (multivariate statistical analysis) in flood modelling by using it in combination with the bivariate probability method. Logistic regression has some drawbacks for measuring the relationship between flood occurrence and variable classes. It takes either the first or the last class as a reference and does not involve the other classes in the weighting process. This means that the information obtained for one of the intervals is limited. Using bivariate probability solved this problem. The logistic regression was performed using independent variables that were reclassified and weighted based on the bivariate probability, in order to produce a more precise flood susceptibility map of the Busan area. The flood inventory map, which represented 160 flood events that took place in July 2009, was used for the statistical analysis. Of these flood events, 112 were used to build the model and the remaining 48 were used to validate it. Fifteen independent variables were used: rainfall, the DEM, slope, curvature, geology, green farmland, river, soil drainage, soil effect, soil texture, SPI, timber age, timber density, timber diameter, and timber type.
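The 112/48 partition corresponds to a 70/30 random split of the 160 inventory locations. A minimal sketch of such a split is given below; the function name, the use of integer location IDs, and the fixed seed are assumptions made for illustration, not details of the study.

```python
import random


def split_inventory(location_ids, train_percent=70, seed=0):
    """Randomly split flood locations into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(location_ids)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the subset size.
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs standing in for the 160 flood locations.
train_ids, validation_ids = split_inventory(range(160))
print(len(train_ids), len(validation_ids))
```

Keeping the validation subset out of model fitting is what allows the prediction rate to be interpreted as a test on unseen flood locations.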

Initially, all the independent variables were resampled to a 10 × 10 m grid. The grid of the Busan area consisted of 757 columns and 1,119 rows. As a next step, each variable was classified using the quantile method, and the bivariate probability was used to evaluate the relationship of each class with flood occurrence. To assign the weights derived from the bivariate probability, each independent variable was reclassified and normalized for use in the logistic regression multivariate statistical analysis. The logistic regression coefficients were calculated in SPSS V.19 software and the probability index was then calculated. Finally, a flood susceptibility map of Busan City was constructed by classifying the probability index into five classes: very low, low, medium, high, and very high susceptibility. The high and very high susceptibility zones made up 19.4 % of the total study area and were mostly located in the northern part of Busan.
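The class weighting step can be sketched as follows, assuming that the bivariate probability of a class takes the common frequency-ratio form (the share of flood locations falling in the class divided by the share of pixels in the class) and assuming min-max scaling for the normalization. Neither formula is spelled out in this section, and the counts and function names are hypothetical.

```python
def frequency_ratio(flood_counts, pixel_counts):
    """Ratio of the flood share to the area share for each class of one variable."""
    total_flood = sum(flood_counts)
    total_pixels = sum(pixel_counts)
    if total_flood == 0 or total_pixels == 0 or min(pixel_counts) <= 0:
        raise ValueError("every class needs pixels and at least one flood is required")
    return [
        (floods / total_flood) / (pixels / total_pixels)
        for floods, pixels in zip(flood_counts, pixel_counts)
    ]


def min_max_normalize(values):
    """Rescale weights to the range 0..1 (assumed normalization)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical counts for five quantile classes of one variable (e.g. elevation).
flood_counts = [60, 30, 15, 5, 2]               # training flood locations per class
pixel_counts = [1000, 1000, 1000, 1000, 1000]   # quantile classes hold equal areas

weights = frequency_ratio(flood_counts, pixel_counts)
normalized = min_max_normalize(weights)
for w, n in zip(weights, normalized):
    print(f"weight = {w:.3f}, normalized = {n:.3f}")
```

A weight above 1 marks a class with more floods than its area share would suggest. In this sketch the normalized weights, rather than the raw class labels, would be the values passed to the logistic regression.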

The proposed combined method overcame the weak points of logistic regression in terms of BSA. The AUC was used to examine the efficiency of the proposed method. The prediction and success rate accuracies achieved by the combined bivariate probability and logistic regression method were 82.3 and 92.7 %, respectively. Because the independent variables contributing to flooding may vary among study sites, it was not easy to define specific parameters to generate the model. Elevation and curvature were found to be the best predictor variables for estimating the probability of flood occurrence in the current study area.

The results of the bivariate probability analysis indicated that more flooding should be expected in loamy soil areas under high rainfall. In addition, most flooding occurred in concave and low-elevation areas. In terms of hydrological factors, a lower SPI and a greater distance to the river increased the chance of flood occurrence. Among the timber factors, type indicated a greater flood possibility than age or diameter. The combined method employed in this study showed that the green farmland and geology parameters had strong positive correlations with the predicted probability map. In contrast, a negative correlation existed between flood occurrence and SPI, timber age, and soil effect.

The achieved accuracies demonstrate the efficiency of the combined method. Other sophisticated methods have been developed to model flooding, such as ANFIS and genetic algorithm-based artificial neural network (ANN-GA) approaches. Although these methods are more applicable to flood modelling than statistical methods such as logistic regression, their complexity, computation time, and requirement for a large number of parameters make them difficult to use. The method proposed here, combining logistic regression and bivariate probability, produced acceptable results, and the process of analysis was easy to understand. Thus, this method is recommended for use in hydrological studies and disaster management. The proposed method can be considered a robust method for flood susceptibility mapping, and the produced map can aid planners, decision makers, and governments engaged in flood management and planning in the study area.

Acknowledgments This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) and by the Development Project of Environmental Technology for Climate Change by the Korea Environmental Industry & Technology Institute. Thanks to anonymous reviewers for their valuable feedback on the manuscript.

References

Akgun A (2012) A comparison of landslide susceptibility maps

produced by logistic regression, multi-criteria decision, and

likelihood ratio methods: a case study at Izmir, Turkey.

Landslides 9:93–106

Ayalew L, Yamagishi H (2005) The application of GIS-based logistic

regression for landslide susceptibility mapping in the Kakuda-

Yahiko Mountains, Central Japan. Geomorphology 65:15–31

Bathrellos GD, Gaki-Papanastassiou K, Skilodimou HD, Skianis GA, Chousianitis KG (2013) Assessment of rural community and agricultural development using geomorphological-geological factors and GIS in the Trikala prefecture (Central Greece). Stoch Environ Res Risk A 27:573–588


Billa L, Shattri M, Mahmud AR, Ghazali AH (2006) Comprehensive

planning and the role of SDSS in flood disaster management in

Malaysia. Disaster Prev Manage 15:233–240

Campolo M, Soldati A, Andreussi P (2003) Artificial neural network

approach to flood forecasting in the River Arno. Hydrolog Sci J

48:381–398

Chae EH, Kim TW, Rhee SJ, Henderson TD (2005) The impact of

flooding on the mental health of affected people in South Korea.

Community Ment Health J 41:633–645

Chang H, Franczyk J, Kim C (2009) What is responsible for

increasing flood risks? The case of Gangwon Province, Korea.

Nat Hazards 48:339–354

Chauhan S, Sharma M, Arora MK (2010) Landslide susceptibility

zonation of the Chamoli region, Garhwal Himalayas, using

logistic regression model. Landslides 7:411–423

Dawod GM, Mirza MN, Al-Ghamdi KA (2012) GIS-based estimation

of flood hazard impacts on road network in Makkah city, Saudi

Arabia. Environ Earth Sci 67:2205–2215

De Moel H, Aerts J (2011) Effect of uncertainty in land use, damage

models and inundation depth on flood damage estimates. Nat

Hazards 58:407–425

Dixon B (2005) Applicability of neuro-fuzzy techniques in predicting

ground-water vulnerability: a GIS-based sensitivity analysis.

J Hydrol 309:17–38

Elbialy S, Mahmoud A, Pradhan B, Buchroithner M (2013) Application of spaceborne synthetic aperture radar data for extraction of soil moisture and its use in hydrological modelling at Gottleuba Catchment, Saxony, Germany. J Flood Risk Manag. doi:10.1111/jfr3.12037

Feng CC, Wang YC (2011) GIScience research challenges for

emergency management in Southeast Asia. Nat Hazards

59:597–616

Ghalkhani H, Golian S, Saghafian B, Farokhnia A, Shamseldin A

(2013) Application of surrogate artificial intelligent models for

real-time flood routing. Water Environ J 27:535–548

Karjalainen M, Kankare V, Vastaranta M, Holopainen M, Hyyppa J

(2012) Prediction of plot-level forest variables using TerraSAR-

X stereo SAR data. Remote Sens Environ 117:338–347

Kia MB, Pirasteh S, Pradhan B, Mahmud AR, Sulaiman WNA,

Moradi A (2012) An artificial neural network model for flood

simulation using GIS: Johor River Basin, Malaysia. Environ

Earth Sci 67:251–264

Kjeldsen TR (2010) Modelling the impact of urbanization on flood

frequency relationships in the UK. Hydrol Res 41:391–405

Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor,

Malaysia using frequency ratio and logistic regression models.


Lee S, Sambath T (2006) Landslide susceptibility mapping in the

Damrei Romel area, Cambodia using frequency ratio and logistic

regression models. Environ Geol 50:847–855

Lee S, Talib JA (2005) Probabilistic landslide susceptibility and

factor effect analysis. Environ Geol 47:982–990

Lee S, Ryu JH, Kim IS (2007) Landslide susceptibility analysis and

its verification using likelihood ratio, logistic regression, and

artificial neural network models: case study of Youngin, Korea.


Lee MJ, Kang JE, Jeon S (2012) Application of frequency ratio model

and validation for predictive flooded area susceptibility mapping

using GIS. In: Geoscience and Remote Sensing Symposium

(IGARSS), 2012 IEEE International. Munich, 895–898

Liu Y, De Smedt F (2005) Flood modeling for complex terrain using

GIS and remote sensed information. Water Resour Manag

19:605–624

Maier HR, Dandy GC (2000) Neural networks for the prediction and

forecasting of water resources variables: a review of modelling

issues and applications. Environ Model Softw 15:101–124

Martinez JM, Le Toan T (2007) Mapping of flood dynamics and

spatial distribution of vegetation in the Amazon floodplain using

multitemporal SAR data. Remote Sens Environ 108:209–223

Mason DC, Speck R, Devereux B, Schumann GP, Neal JC, Bates PD

(2010) Flood detection in urban areas using TerraSAR-X. IEEE

T Geosci Remote 48:882–894

Mukerji A, Chatterjee C, Raghuwanshi NS (2009) Flood forecasting

using ANN, neuro-fuzzy, and neuro-GA models. J Hydrol Eng

14:647–652

Ohlmacher GC, Davis JC (2003) Using multiple logistic regression

and GIS technology to predict landslide hazard in northeast

Kansas, USA. Eng Geol 69:331–343

Okamoto K, Yamakawa S, Kawashima H (1998) Estimation of flood

damage to rice production in North Korea in 1995. Int J Remote

Sens 19:365–371

Papadopoulou-Vrynioti K, Bathrellos GD, Skilodimou HD, Kaviris

G, Makropoulos K (2013) Karst collapse susceptibility mapping

considering peak ground acceleration in a rapidly growing urban

area. Eng Geol 158:77–88

Park S, Choi C, Kim B, Kim J (2013) Landslide susceptibility

mapping using frequency ratio, analytic hierarchy process,

logistic regression, and artificial neural network methods at the

Inje area, Korea. Environ Earth Sci 68:1443–1464

Pourghasemi HR, Pradhan B, Gokceoglu C (2012) Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat Hazards 63:965–996

Pradhan B (2010a) Flood susceptible mapping and risk area

delineation using logistic regression, GIS and remote sensing.

J Spat Hydrol 9:1–18

Pradhan B (2010b) Landslide susceptibility mapping of a catchment

area using frequency ratio, fuzzy logic and multivariate logistic

regression approaches. J Indian Soc Remote 38:301–320

Pradhan B, Lee S (2010a) Delineation of landslide hazard areas on

Penang Island, Malaysia, by using frequency ratio, logistic

regression, and artificial neural network models. Environ Earth

Sci 60:1037–1054

Pradhan B, Lee S (2010b) Landslide susceptibility assessment and

factor effect analysis: backpropagation artificial neural networks

and their comparison with frequency ratio and bivariate logistic

regression modelling. Environ Model Softw 25:747–759

Pradhan B, Lee S (2010c) Regional landslide susceptibility analysis

using back-propagation neural network model at Cameron

Highland, Malaysia. Landslides 7:13–30

Pradhan B, Oh HJ, Buchroithner M (2010) Weights-of-evidence

model applied to landslide susceptibility mapping in a tropical

hilly area. Geomat Nat Hazards Risk 1:199–223

Pradhan B, Hagemann U, Tehrany MS, Prechtel N (2014) An easy to

use ArcMap based texture analysis program for extraction of

flooded areas from TerraSAR-X satellite image. Comput Geosci

63:34–43

Regmi AD, Yoshida K, Dhital MR, Pradhan B (2013) Weathering and mineralogical variation in gneissic rocks and their effect in Sangrumba Landslide, East Nepal. Environ Earth Sci 71:1–17

Rozos D, Bathrellos GD, Skilodimou HD (2011) Comparison of the implementation of rock engineering system and analytic hierarchy process methods, upon landslide susceptibility mapping, using GIS: a case study from the Eastern Achaia County of Peloponnesus, Greece. Environ Earth Sci 63:49–63

Sanyal J, Lu X (2004) Application of remote sensing in flood

management with special reference to monsoon Asia: a review.

Nat Hazards 33:283–301

Skilodimou H, Livaditis G, Bathrellos G, Verikiou-Papaspiridakou E

(2003) Investigating the flooding events of the urban regions of

Glyfada and Voula, Attica, Greece: a contribution to Urban

Geomorphology. Geogr Ann A 85:197–204


Subramanian N, Ramanathan R (2012) A review of applications of

analytic hierarchy process in operations management. Int J Prod

Econ 138:215–241

Taylor J, Davies M, Clifton D, Ridley I, Biddulph P (2011) Flood

management: prediction of microbial contamination in large-

scale floods in urban environments. Environ Int 37:1019–1029

Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood

susceptible areas using rule based decision tree (DT) and a novel

ensemble bivariate and multivariate statistical models in GIS.

J Hydrol 504:69–79

Tehrany MS, Pradhan B, Jebur MN (2014) Flood susceptibility

mapping using a novel ensemble weights-of-evidence and

support vector machine models in GIS. J Hydrol 512:332–343.

doi:10.1016/j.jhydrol.2014.03.008

Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012) Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Comput Geosci 45:199–211

Umar Z, Pradhan B, Ahmad A, Jebur MN, Tehrany MS (2014)

Earthquake induced landslide susceptibility mapping using an

integrated ensemble frequency ratio and logistic regression

models in West Sumatera Province, Indonesia. Catena

118:124–135

Yi CS, Lee JH, Shim MP (2010) GIS-based distributed technique for

assessing economic loss from flood damage: pre-feasibility study

for the Anyang Stream Basin in Korea. Nat Hazards 55:251–272

Yilmaz I (2009) Landslide susceptibility mapping using frequency

ratio, logistic regression, artificial neural networks and their

comparison: a case study from Kat landslides (Tokat—Turkey).

Comput Geosci 35:1125–1138

Zwenzner H, Voigt S (2009) Improved estimation of flood parameters

by combining space based SAR data with very high resolution

digital elevation data. Hydrol Earth Syst Sci 13:567–576
