Using spatial analysis and Bayesian network to model the vulnerability and make insurance pricing of catastrophic risk

Lian-Fa Li$^{a,b*}$, Jin-Feng Wang$^{a}$ and Hareton Leung$^{b}$

$^{a}$State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographical Sciences & Natural Resources Research, Chinese Academy of Sciences, Beijing, China;

$^{b}$Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

(Received 12 May 2009; final version received 6 September 2009)

Vulnerability refers to the degree of an individual subject to the damage arising from a catastrophic disaster. It is affected by multiple indicators that include hazard intensity, environment, and individual characteristics. The traditional area aggregate approach does not differentiate the individuals exposed to the disaster. In this article, we propose a new solution of modeling vulnerability. Our strategy is to use spatial analysis and Bayesian network (BN) to model vulnerability and make insurance pricing in a spatially explicit manner. Spatial analysis is employed to preprocess the data; for example, kernel density analysis (KDA) is employed to quantify the influence of geo-features on catastrophic risk and relate such influence to spatial distance. BN provides a consistent platform to integrate a variety of indicators including those extracted by spatial analysis techniques to model uncertainty of vulnerability. Our approach can differentiate attributes of different individuals at a finer scale, integrate quantitative indicators from multiple sources, and evaluate the vulnerability even with missing data. In the pilot study case of seismic risk, our approach obtains a spatially located result of vulnerability and makes an insurance price at a finer scale for the insured buildings. The result obtained with our method is informative for decision-makers to make a spatially located planning of buildings and allocation of resources before, during, and after the disasters.

Keywords: spatial analysis; Bayesian network; insurance pricing; vulnerability; data mining

1. Introduction

This century has witnessed the rise of the loss of lives and properties from catastrophic disasters because of the trend of global changes that include meteorological, ecological, physical, and environmental ones (Easterling et al. 2000, Shi 2002, Linnerooth-Bayer et al. 2005). The last recorded catastrophic event is the Wenchuan earthquake of 8 Mw occurring on 12 May 2008 that caused the casualties of about 300,000 people, among which over 80,000 were dead or missing, and the property loss of US $20 billion (Civil and Structural Groups of Tsinghua University 2008, Paterson et al. 2008). Another tragedy is the tsunami of 26 December 2004, beneath the Indian Ocean west of Sumatra, Indonesia, that took lives of over 300,000 with over 1,000,000 people displaced (ADB 2005). The catastrophic events have caused the huge loss of lives and properties and they received an increasing concern from the government and public. There is an urgent need for new technologies to provide better services for forecast and prevention before the disaster, mitigation during the disaster, and assessment and planning after the disaster (Easterling et al. 2000, Hanson and Roberts 2005).

International Journal of Geographical Information Science, Vol. 24, No. 12, December 2010, 1759–1784

*Corresponding author. Email: [email protected]

ISSN 1365-8816 print/ISSN 1362-3087 online. © 2010 Taylor & Francis. DOI: 10.1080/13658816.2010.510473. http://www.informaworld.com

As a geospatial technology, geographical information system (GIS) has been widely applied in prevention, mitigation, and assessment of natural disasters because of its efficient spatial location and management of spatial databases (John and Langley 1995, Jiang and Eastman 2000, Easterman 2001, FEMA 2001, Huang 2001, Li et al. 2005). However, although GIS is widely used, there are still some constraints on the existing methods of using GIS to make the catastrophic risk assessment:

• Although insurance companies in the United States use precise estimates of wind loss risk (e.g., distance from the coast) to make an insurance price, in many assessments of catastrophic risks such as China’s seismic risk, the often-used approach is mainly based on coarse geographical scales or ‘area aggregate’ method (AAM) (William and Arthur 1982, Alexander 1993, Shi 2002). In the AAM, related indicators are aggregated into census-designated statistical areas whose size has a serious influence upon the result (even different conclusions have been obtained because of different aggregated areas) (Kloog et al. 2009). So the approach based on aggregate areas is not suitable for the duties with a requirement of high-accuracy location of risk, say rescue of the casualties, assessment and planning of the disaster, and insurance pricing.

• Many of the existing risk assessment methods are based on classical statistics or fuzzy comprehensive evaluation (William and Arthur 1982, Easterman 2001, Arnold et al. 2006). These existing methods are limited in quantifying the uncertainty of predictive indicators and their interactions (Straub 2005, Gret-Regamey and Straub 2006). The use of a Bayesian network (BN) offers innovation relative to existing methods of applying spatial vulnerability: uncertainty quantification of predictive indicators and their interaction within BN is a major advantage of BN that offers an efficient means (Korb and Nicholson 2004) to model the inherent uncertainty of natural disasters.

• The exact use of precise spatial risk by the insurance companies is proprietary, which makes the companies’ results of spatial risk publicly unavailable. The proprietary constraint hinders the precise risk information from being publicly known and the estimated risk from being improved (Allmann and Smolka 2000, Hayes and Jacobson 2001, Li et al. 2005).

Spatial analysis is to detect spatial patterns of geo-features according to their spatial or nonspatial attributes (Anselin 1992, Goodchild et al. 1992). Such techniques of spatial analysis as kernel density analysis (KDA) and accumulative road cost surface modeling (ARCSM) can quantify such spatial patterns, which are used with other quantitative indicators for vulnerability assessment. This extends the set of predictive indicators, which is beneficial for improvement in vulnerability prediction. But in many of the existing traditional approaches, such as China’s seismic risk assessment, such techniques of spatial analysis have been seldom used, which may lower the performance of the traditional approach.

In this article, we propose a spatially explicit approach to modeling vulnerability and making insurance pricing of catastrophic risk. We use BN to make uncertainty analysis and computation of vulnerability and insurance pricing under new scenarios within the spatially explicit modeling environment. Our approach is based on the grid dataset in support of GIS. Based on the grid dataset and GIS, we can use spatial analysis such as KDA and ARCSM to extract predictive influence indicators from the vulnerability-related geo-features such as active faults and rivers. The cells of the grid dataset can be extracted as a two-dimensional table, each record of which corresponds to a cell of the grid and can be used either as a training sample for uncertainty inference and learning or as a test or new sample for prediction. The results and their uncertainties obtained can be visualized in GIS. So, in our approach, we not only utilize GIS to visualize and manage spatial data, but also regard it as a platform of uncertainty analysis and representation of the result.

Throughout this article, we make the following contributions:

• Proposal of a spatially explicit approach to uncertainty analysis of vulnerability and insurance pricing. BN is the framework to integrate spatiotemporal data and domain knowledge from different sources. We generalize the generic framework of BN and spatially explicit modeling techniques for vulnerability assessment of natural disasters. Based on BN that is used as a means of uncertainty analysis, our approach is different from the methods of classical risk assessment (Gret-Regamey and Straub 2006) and offers innovation relative to the classical statistical methods of applying spatial vulnerability. Our approach can locate a more specific location of vulnerability and insurance premium estimated, as it is based on the finer scale (a cell of the grid). It can differentiate vulnerability of the individuals exposed to the hazard. The vulnerable individuals such as buildings at different locations have different environmental profiles and different individuals at the finer grid scale have different characteristics (for instance, different buildings have different ages, types, and materials). Thus, the impacts from the same hazard on different individuals at different locations are different, which results in their different vulnerabilities to the same hazard. AAM cannot differentiate such individual characteristics and environmental profiles at the finer grid scale and thus cannot differentiate their vulnerabilities.

• Combination of spatial indicators with other indicators using spatial analysis techniques such as KDA to derive new predictive indicators from risk-related geo-features for vulnerability evaluation. AAM does not consider the differences of individuals. Our method can preserve the original number of observations without a need either to aggregate or average them. Thus, it is less biased than the prior methods. Through derivation of new spatial predictive influence indicators such as from active faults and rivers, the set of predictors can be strengthened and thereby the model’s performance, for example, the probability of detection (pd) of high-risk individuals exposed to the hazard, and precision can be improved, as demonstrated in our pilot study case of seismic risk.

• Use of publicly available information to make a precise estimate of risk. Through this article, it is illustrated that our spatially explicit BN-based approach can have a marked improvement in pd, precision, and location of the high-risk individuals compared to the traditional AAM.

Next, we outline the structure of this article. Section 2 briefly describes the spatially explicit modeling environment that is based on a GIS and a grid dataset. Section 3 presents our methodology and relevant techniques of BN and spatial analysis for vulnerability analysis. Section 4 provides an introduction to our study case and presents our result. Section 5 makes a comparison of our approach and the AAM’s prediction and discusses our approach’s implications. Section 6 concludes and discusses the future work.

2. Spatially explicit modeling environment

Figure 1 presents the spatially explicit modeling environment for our approach where GIS, spatial analysis, and BN are used to conduct vulnerability assessment and insurance pricing.

Our modeling is based on the grid dataset (Figure 1b), each cell of which corresponds to an independent individual that has the predictive indicators and the target variable. We can extract the table of attributes (Figure 1c) from the dataset for further analysis. The grid-based format enables us to collect a variety of data from different sources and to integrate them within a consistent modeling system of BN using geospatial techniques such as rasterization, KDA, and resampling.
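As a minimal sketch of the grid-to-table step described above, the following Python fragment flattens a set of co-registered grid layers into one record per cell. The function name `grid_to_table`, the layer names, and all values are hypothetical and chosen only for illustration; the modeling system itself is implemented in Java, and this is not its code.

```python
def grid_to_table(layers):
    """Flatten co-registered 2-D grids into one record (dict) per cell."""
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    for name in names:
        grid = layers[name]
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError(f"layer {name!r} is not aligned with the grid")
    table = []
    for i in range(n_rows):
        for j in range(n_cols):
            record = {"row": i, "col": j}
            for name in names:
                record[name] = layers[name][i][j]
            table.append(record)
    return table

# Hypothetical 2 x 2 grid with two indicator layers (illustrative values).
layers = {
    "elevation": [[130.0, 154.0], [120.0, 141.0]],
    "slope": [[10.0, 11.0], [8.0, 9.0]],
}
table = grid_to_table(layers)
```

Each record keeps its row and column, so a prediction made for a record can be written back to the same cell for visualization.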

Our modeling environment is partially constructed on the base of open-source packages which considerably decrease the development effort. In the modeling environment, we use Geotools to implement the GIS functionality (Figure 1a); NASA’s World Wind to generate a three-dimensional visualization of spatial distribution of related indicators or estimates of the target variable; Java’s advanced image library to assist input, processing, and output of the grid dataset; WEKA (Witten and Frank 2005) and Netica API (Norsys 2009) to conduct data analysis and mining. We use MatLab as the engine to support the computation that can be packaged into components. In the modeling system, we have developed some specialist functionalities including data conversion, construction, and learning (Figure 1d) of BN, vulnerability assessment and insurance pricing (Figure 1e), using Java in support of MatLab computation engine, WEKA codes, and Netica API. But for some complex functionalities such as KDA and ARCSM, we used ArcGIS’ spatial analysis module (ESRI 2001) to conduct the analysis and then convert the result into the grid dataset for further analysis.

[Figure 1. Spatially explicit modeling using Bayesian network. Panels: (a) GIS, with inputs (points, polylines, polygons, image) processed by KDA or ARCSM, rasterization, and resampling; (b) grid dataset; (c) data table (columns: cell, elevation, slope, traffic cost); (d) Bayesian network modeling (learning, inference, and prediction), supported by a database, domain knowledge, and a knowledge base; (e) vulnerability assessment and insurance pricing. Software components: computation engine, MatLab; data mining, WEKA + Netica API; spatial analysis, Java library with support of ArcGIS; GIS, Geotools/SuperMap; 3-D functionality, World Wind; image processing, Java Image Library; DBMS, MS Access.]

In the spatially explicit modeling environment, we not only use GIS to manage and visualize spatial data, but also use spatial analysis methods, such as KDA and ARCSM, to extract predictive indicators from relevant geo-features. As the first law of geography states, ‘Everything is related to everything else, but near things are more related than distant things’ (Tobler 1979). Natural disasters, as the geospatial events, occur in geographical areas and they should also have spatial characteristics. Thus, Tobler’s first law of geography is applicable to them and we use KDA in vulnerability assessment that considers influence of points such as pollution sources or polylines, for example, faults of an earthquake on the surroundings; the influence gradually decreases with the increase of the distance from the points or polylines. KDA models such influences and derives predictive influence indicators from the geo-features (Hastie et al. 2001, Miller and Han 2001). ARCSM calculates the accumulative cost for each cell to the neighbor shelters and the cost determined by relevant geographical indicators has an important influence on vulnerability of the individual exposed to the disaster (ESRI 2001, Torun and Duzgun 2006, Varnakovida and Messina 2008).
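The accumulative cost idea behind ARCSM can be sketched with a shortest-path search over the grid. The fragment below is an illustration under stated assumptions, not the ArcGIS routine used in this study: it assumes 4-connected cells, takes the cost of a move as the mean of the two cells' costs, and treats shelters as zero-cost sources. The function name `accumulated_cost` and the cost values are hypothetical.

```python
import heapq

def accumulated_cost(cost, sources):
    """Least accumulated travel cost from every cell to its nearest source.

    cost    -- 2-D list of non-negative per-cell traversal costs
    sources -- list of (row, col) cells, e.g. shelters, with zero cost
    Moving between 4-connected neighbours costs the mean of both cells.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    acc = [[float("inf")] * n_cols for _ in range(n_rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > acc[r][c]:
            continue  # stale queue entry, a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                step = (cost[r][c] + cost[nr][nc]) / 2.0
                if dist + step < acc[nr][nc]:
                    acc[nr][nc] = dist + step
                    heapq.heappush(heap, (dist + step, nr, nc))
    return acc

# Hypothetical 3 x 3 cost grid with one shelter in the top-left cell.
cost = [[1.0, 1.0, 5.0],
        [1.0, 9.0, 1.0],
        [1.0, 1.0, 1.0]]
surface = accumulated_cost(cost, sources=[(0, 0)])
```

In this toy grid the cheapest route to the right-hand column goes around the expensive central cell, which is the behavior a road cost surface is meant to capture.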

3. Bayesian methodology for vulnerability analysis

In our spatially explicit modeling environment, the techniques of spatial analysis are used to process spatial attributes of geo-features to derive the relevant indicators. Then these indicators with other quantitative indicators are integrated within the consistent modeling system of BN with each node corresponding to an indicator for vulnerability estimation and insurance pricing.

3.1. Indicators involved

In total, there are two aspects of indicators for vulnerability analysis:

• Indicators related to exposure: These indicators are directly related to the occurrence of natural disasters. For instance, heavy rainfall can cause flood and landslide; extremely intense wind can cause typhoon; movements of the earth’s crusts can cause earthquakes. Most exposure-related indicators are natural and some are artificial or related with artificial activities.

• Indicators related to vulnerability: Vulnerability-related indicators include environmental- and system-resistance ones. Environmental indicators are relevant to the environment that breeds the disasters. They are physical or artificial and are able to mitigate or amplify the destructive power of a hazard. For instance, a land with a good water-soil conservation capability can avoid a mudslide or mitigate its destructive effect. System-resistance indicators refer to the characteristics of an individual or system against the damage from a natural disaster. For instance, a house of lightweight steel structure has a better ability against earthquakes than that of woods or bricks; people who live in the floodplain are more vulnerable to the flood disaster than those who do not; people who are closer to a shelter have more opportunities to avoid damages from the disasters, thus having less vulnerability; citizens with more knowledge about tsunami have higher vigilance against tsunami.

Table 1 lists some exposure-related and vulnerability-related indicators of five natural disasters, namely earthquake, flood, typhoon, mudslide, and avalanche. This table also gives some empirical modeling methods that can be used as domain knowledge to construct BN.

3.2. Spatial analysis techniques

Spatial analysis techniques are used to deal with spatial or nonspatial attributes of geo-features to detect and discover relevant spatial patterns. These techniques include local spatial autocorrelation, hot-spot identification of G-statistics, KDA, and ARCSM. In vulnerability analysis, spatial analysis techniques can be used to preprocess and quantify the discrete or qualitative indicators for use with other quantitative indicators. For instance, spatial autocorrelation is used to detect similarity or dissimilarity of the damage or loss from the disaster and G-statistics is used to detect hot spots (high risk) or cold spots (low risk). In our approach, the KDA is especially useful for processing those qualitative spatial features related to catastrophic risks such as rivers or faults and quantifying them to be used with other quantitative indicators in vulnerability assessment. This section mainly describes this technique (Section 3.2.1).

3.2.1. Kernel density analysis of points or polylines

KDA is a technique of spatial analysis that transforms a sample of observations recorded as geographically referenced point or polyline data into a continuous surface, indicating the intensity of individual observation over space (Silverman 1986, Kloog et al. 2009). Points or polylines near the center of a search area are weighted more heavily than those lying near the edge, which embodies Tobler’s first law of geography.

Table 1. Exposure-related indicators (parameters), vulnerability-related indicators, and experienced modeling methods.

| Disaster type | Exposure-related indicators (parameters) | Vulnerability-related indicators | Experienced modeling methods |
| --- | --- | --- | --- |
| Earthquake | Magnitude, distance from seismic sources, PGA | Soil type, soil profile, soil response, geology type, slope, etc. | Gutenberg and Richter’s (1944) magnitude recurrence relationship for Mw’s annual probabilities, attenuation relationship by Boore et al. (1997) for estimating the pseudo-acceleration response spectra, a version of MIMQKE (Lestuzzi and Badoux 2003) to simulate the seismic demand |
| Flood | Flooding cover, water depth | Rainfall, elevation, terrain, slope, soil, vegetation, etc. | SHE (Abbott et al. 1986), TOPOMODEL (Ambroise et al. 1996), and DBIN (Garrote et al. 1995) for estimating the relationship between rainfall and flows |
| Typhoon | Wind speed, route of typhoon | Meteorological states, elevation, terrain, slope, vegetation, etc. | Typhoon re-occurrence model (Huang 2001, Fang et al. 2002, William and Arthur 1982) to simulate the typhoon’s occurrence and route |
| Mudslide | Volume of mud, flow rich of sands | Rainfall, elevation, terrain, slope, soil, vegetation, etc. | Jakob’s (1996) equation to model the relationship between the volume of mudslide and the peak value of the flow |
| Avalanche | Pressure | Meteorological states, terrain characteristics, geological profile | Extreme value statistics for extrapolated history data, flow simulation model (Burkhard and Salm 1992) for the relationship between fracture and resistance, Bayesian inference for estimating a type friction |

Release scenarios: different intensity with different return periods.

The kernel weights vary within its ‘sphere of influence’ according to their distance from the point or polyline as the intensity estimated: the surface value is highest at the location of the geo-feature and diminishes over distance from the geo-feature. The z’s density estimate using KD function is

$$\mathrm{Density}(z) = \frac{1}{n} \sum_{i=1}^{n} Z_i \, K_\lambda(z, Z_i) \qquad (1)$$

where $n$ is the number of sample units, $z$ any unit in the geographical area, $Z_i$ the value of the sample unit, and $K_\lambda(z, Z_i)$ the kernel density function.

For the KD function, we can use the normal function to simulate it:

$$K_\lambda(z, Z_i) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{d(z, Z_i)}{\lambda} \right)^{2} \right] \qquad (2)$$

where the bandwidth or search radius, $\lambda$, can be set according to relevant empirical knowledge of the factor variable and the goal, and $d(z, Z_i)$ is the Euclidean distance between any unit (cell), $z$, and the sample unit $Z_i$ on the geographical area under investigation. The unit can represent loss of a disaster event or intensity of a certain indicator. For instance, if a cell is closer to a river, it is more likely to suffer greater loss in a flood.

How to set $\lambda$ is affected by empirical knowledge and the goal. A large $\lambda$ gives a surface that is more generalized over the entire study area, whereas a small $\lambda$ gives a more localized surface. As our goal is to reflect the influence of relevant indicators such as rivers and faults on the damage to the individuals exposed to a disaster, $\lambda$ can be set according to the biggest influence range of the relevant indicators in practical disasters.
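To make Equations (1) and (2) concrete, the following sketch evaluates the kernel density at a single cell from a handful of sample units. It is a minimal illustration, assuming point samples with planar coordinates; the function names, coordinates, values, and the bandwidth of 2.0 are hypothetical, and the study itself used the ArcGIS spatial analysis module for KDA.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Equation (2): normal kernel of the scaled distance d / lambda."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2) / math.sqrt(2.0 * math.pi)

def kernel_density(cell, samples, bandwidth):
    """Equation (1): mean of the sample values weighted by the kernel.

    cell    -- (x, y) coordinates of the cell z
    samples -- list of ((x, y), value) pairs, i.e. the sample units Z_i
    """
    total = 0.0
    for (sx, sy), value in samples:
        d = math.hypot(cell[0] - sx, cell[1] - sy)  # Euclidean distance
        total += value * gaussian_kernel(d, bandwidth)
    return total / len(samples)

# Two hypothetical sample points with unit value.
samples = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0)]
near = kernel_density((0.0, 0.0), samples, bandwidth=2.0)
far = kernel_density((10.0, 0.0), samples, bandwidth=2.0)
```

The cell that coincides with a sample point receives a much higher density than the distant cell, and a larger bandwidth would narrow that gap, which is the generalization effect of the bandwidth discussed above.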

As a smoothing technique of spatial data, KDA enables us to derive the continuous surface data from geo-features such as points or polylines. This facilitates integration of our modeling system with the data of geo-features such as points and polylines. Another advantage of KDA is that it avoids the drawback of the ‘aggregate’ approach, which assumes that the estimated average exposure in a particular region may serve as a reasonable surrogate for the actual exposure of individuals. Individual exposure levels cannot accurately be inferred from aggregated data and KDA helps to preserve the original number and intensity of observation without a need either to aggregate or average them (McCoy and Johnston 2001).

According to the KD, we can get the intensity classification of the related indicator. Different intensity can have different influence upon the disaster. For instance, if a unit of buildings is closer to the active faults, the faults have more intensive influences on the damage to the unit of buildings exposed to a seismic disaster.

3.3. Bayesian network modeling framework

A BN is a directed acyclic graph

$$B_S = G(V, E) \qquad (3)$$

called the network structure of the network, $B$, where $V$ is the set of random variables (r.v.), $E \subseteq V \times V$ is the set of directed edges, representing the probabilistically conditional dependency relationship between r.v. nodes that satisfies the Markov property, that is, there are no direct dependencies in $B_S$ which are not already explicitly shown via edges, $E$, and

$$B_P = \{ g_u : \Omega_u \times \Omega_{\pi_u} \to [0, 1] \mid u \in V \} \qquad (4)$$

is a set of assessment functions, where the state space $\Omega_u$ is the finite set of values of $u$; $\pi_u$ is the set of parent nodes of $u$, indicated by $B_S$; if $X$ is a set of variables, $\Omega_X$ is the Cartesian product of all state spaces of the variables in $X$; and $g_u$ uniquely defines the conditional probability distribution $P(u \mid \pi_u)$ of $u$ given its parent set, $\pi_u$.

BN is based on Bayes’ theorem, that is, inference of the posterior probability (called belief) of a hypothesis according to some evidence. In the assessment of catastrophic risk, evidence comes from exposure-related or vulnerability-related indicators (Table 1) and the hypothesis refers to the state of risk that may be classified as several damage states from low levels to high levels. Let $r$ be such a hypothesis variable of damage risk whose state space $\Omega_r$ has, say, seven states, $\Omega_r = \{$none, slight, light, moderate, heavy, major, destroyed$\}$. In a specified BN, if some evidence is given, we can estimate the posterior probability or belief of the target variable $r$ being in a certain state $t$ by calculating the marginal probability:

$$\mathrm{Belief}(x, t) = \sum_{u_i \in V,\; u_i \neq r} p(u_1, u_2, \ldots, r, \ldots, u_n) \qquad (5)$$

where $x$ is the unit or entity, for example, a cell of the grid surface that represents persons or buildings exposed to the disaster; $t$ is the value of a certain state of the damage, $t \in \Omega_r$; and $p(u_1, \ldots, u_n) = \prod_{u_i \in V} p(u_i \mid \pi_{u_i})$ is the joint probability over $V$.
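The marginalization in Equation (5) can be illustrated by brute-force enumeration on a very small network. The sketch below assumes a hypothetical three-node structure (exposure E and building characteristics BC as parents of building damage BD) with made-up probabilities; it normalizes by the probability of the evidence so that the beliefs sum to one. It is not the Netica-based inference used in the modeling system.

```python
from itertools import product

# Hypothetical three-node network E -> BD <- BC; all probabilities are made up.
states = {
    "E": ["low", "high"],
    "BC": ["weak", "strong"],
    "BD": ["none", "moderate", "destroyed"],
}
parents = {"E": [], "BC": [], "BD": ["E", "BC"]}
cpt = {
    "E": {(): {"low": 0.8, "high": 0.2}},
    "BC": {(): {"weak": 0.3, "strong": 0.7}},
    "BD": {
        ("low", "weak"): {"none": 0.6, "moderate": 0.3, "destroyed": 0.1},
        ("low", "strong"): {"none": 0.9, "moderate": 0.1, "destroyed": 0.0},
        ("high", "weak"): {"none": 0.1, "moderate": 0.4, "destroyed": 0.5},
        ("high", "strong"): {"none": 0.4, "moderate": 0.4, "destroyed": 0.2},
    },
}

def joint(assignment):
    """Chain-rule product of p(u_i | parents(u_i)) over all nodes."""
    prob = 1.0
    for node in states:
        key = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][key][assignment[node]]
    return prob

def belief(target, evidence):
    """Posterior of `target` given `evidence`, summing out the other nodes."""
    hidden = [n for n in states if n != target and n not in evidence]
    scores = {}
    for t in states[target]:
        total = 0.0
        for values in product(*(states[h] for h in hidden)):
            assignment = dict(evidence, **dict(zip(hidden, values)))
            assignment[target] = t
            total += joint(assignment)
        scores[t] = total
    norm = sum(scores.values())  # probability of the evidence
    return {t: s / norm for t, s in scores.items()}

posterior = belief("BD", {"BC": "weak"})
```

Enumeration grows exponentially with the number of unobserved nodes, which is why the exact and approximate inference algorithms are used in practice for larger networks.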

To construct a BN, we need to identify the variables of indicators necessary and sufficient to model the problem framework, to build the interdependencies between the r.v. nodes, graphically shown by arrows, to assign the states or quantities to nodes, and to calculate the assessment parameters of conditional probabilities.

Figure 2 describes our BN modeling framework for loss risk assessment of properties, say buildings. In this framework, R denotes release scenario, MP the set of exposure-related indicators, E the exposure intensity, EV the environmental variables such as soil type, landform, or geological condition, BC building characteristics related to vulnerability of building, and BD damage to buildings. BD is the target variable, and the probabilities of its being in a certain state are what we aim to estimate. In this framework, we use capitalized italicized characters to represent a univariate and capitalized characters to represent the set of multivariates. So in this framework, R and BD denote a univariate but MP, E, EV, and BC denote a set of multiple variables. Also, MP, R, and E are exposure indicators; EV and BC are system-resistance indicators; BD is the target variable.

[Figure 2. Generic Bayesian modeling framework for vulnerability assessment. Nodes: MP, modeling parameters of exposure; R, release scenario; E, exposure intensity and derived hazards; EV, environmental variables; BC, building characteristics such as type, story area, history, and physical environment; BD, building/content damage. Groups: exposure, system resistance, vulnerability. States of the target variable, BD: none, slight, light, moderate, major, destroyed.]

We construct the framework according to domain knowledge (William and Arthur 1982, Alexander 1993, Amendola et al. 2000, Huang 2001, Straub 2005). In this framework, it is assumed that release scenarios (R) and related parameters (MP) of exposure have deterministic influences on the exposure intensity or derived hazards (E); environmental variables (EV) have a mitigating or amplifying effect on exposure intensity or derived hazards that in turn affects the vulnerability; building damage (BD) is assumed to be determined by exposure (E) and building characteristics (BC). For the set of multivariates (MP, E, BC), there may be interdependencies between these variables besides their dependencies described in Figure 2. We can construct their interdependencies according to domain knowledge or learning from the training data. Once a BN framework is constructed, we need to refine the interdependencies between local domain variables in MP and BC and calculate the conditional assessment parameters (probability).
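The dependency assumptions above can be written down as a small directed graph and checked for acyclicity. The sketch below is only an illustration: the parent sets are read off the description of the framework, the EV-to-E edge is an assumption about how the environmental effect enters the graph, and each multivariate set is collapsed into a single node for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Parent sets read off the generic framework; the EV -> E edge is an assumption.
parents = {
    "R": set(),
    "MP": set(),
    "EV": set(),
    "BC": set(),
    "E": {"R", "MP", "EV"},
    "BD": {"E", "BC"},
}

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(parents).static_order())

# Chain-rule factorization implied by the structure.
factors = [
    f"p({node}|{','.join(sorted(parents[node]))})" if parents[node] else f"p({node})"
    for node in order
]
```

Refining the interdependencies inside MP, E, or BC would amount to replacing a collapsed node with its member variables and adding the learned edges among them.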

Table 2 lists major methods for construction, inference, and prediction of BN. In practice, we can choose adaptive methods for construction, learning, and prediction of BN for vulnerability assessment.

3.4. Vulnerability assessment

Vulnerability assessment is to estimate the susceptibility of individuals exposed to natural disasters (William and Arthur 1982) and it corresponds to the potential degree of damage to the individuals exposed to natural disasters. The greater the vulnerability, the more damage the individual has.

Table 2. Methods for construction, inference, and prediction of BN.

| Steps | Type | Methods |
| --- | --- | --- |
| Structure | Domain knowledge based | Construct BN according to domain or empirical knowledge |
| Structure | Dependency analysis based | Conditional independence (CI) (Korb and Nicholson 2004) |
| Structure | Search scoring based (Bouckaert 1995): quality measure | Bayesian approach, information criterion approach, and minimum description length approach |
| Structure | Search scoring based (Bouckaert 1995): learning methods | Heuristic search strategies: K2, hill climbing, TAN, etc. General-purpose search strategies: Tabu, simulated annealing, genetic programming, etc. |
| Parameter learning | Domain knowledge based | Reports, statistics, and experienced models |
| Parameter learning | Distribution based | Dirichlet-based parameter estimator |
| Parameter learning | With missing data (Korb and Nicholson 2004) | Expectation maximization, Gibbs sampling |
| Inference | Exact inference | Joint probability, naïve Bayesian, graph reduction, and polytree (Pearl 1988) |
| Inference | Approximate inference | Forward simulation, random simulation (Pearl 1988) |
| Prediction | Three types of reasoning | Causal, diagnostic and intercausal (Korb and Nicholson 2004) |
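Among the parameter-learning options in Table 2, the Dirichlet-based estimator is simple enough to sketch. The fragment below assumes complete data, a single parent, and a symmetric Dirichlet prior with pseudo-count `alpha`; the function name `estimate_cpt`, the variable names, and the records are hypothetical, and WEKA or Netica would normally perform this step.

```python
from collections import Counter

def estimate_cpt(records, child, parent, child_states, alpha=1.0):
    """Posterior-mean CPT p(child | parent) under a symmetric Dirichlet prior.

    alpha is the pseudo-count added to every state (alpha=1 is Laplace smoothing).
    """
    counts = Counter((rec[parent], rec[child]) for rec in records)
    parent_states = sorted({rec[parent] for rec in records})
    cpt = {}
    for p in parent_states:
        total = sum(counts[(p, c)] for c in child_states) + alpha * len(child_states)
        cpt[p] = {c: (counts[(p, c)] + alpha) / total for c in child_states}
    return cpt

# Hypothetical training records (one per grid cell); values are illustrative.
records = [
    {"E": "high", "BD": "destroyed"},
    {"E": "high", "BD": "moderate"},
    {"E": "high", "BD": "destroyed"},
    {"E": "low", "BD": "none"},
]
cpt = estimate_cpt(records, child="BD", parent="E",
                   child_states=["none", "moderate", "destroyed"])
```

The pseudo-counts keep every damage state at a nonzero probability even when it never appears in the training cells, which matters for rare states such as "destroyed".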



There is no concerted definition for vulnerability. In this study, we regard vulnerability as the comprehensive damage index because of exposure to a natural disaster. In our BN framework (Figure 2), the target variable is damage states of buildings that are affected by multiple indicators such as exposure intensity, environmental variables, and characteristics of buildings. Using BN, we can integrate a variety of exposure and environmental- and vulnerability-related indicators from different sources to estimate the probabilities of the buildings being in a damage state. Each state (none, slight, light, moderate, heavy, major, or destroyed) of the target variable, BD, has a corresponding damage factor range and a central damage factor (Table 3). Damage factor is the fraction or percent of the damaged part of the individual exposed to a natural disaster.

Integrating the multiplication of the probability of the vulnerable individual being in a damage state and its corresponding central damage factor, we can get the vulnerability index of the individual:

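$$\mathrm{Vul}(x) = \int_{t=\mathrm{Min}}^{t=\mathrm{Max}} \mathrm{CDF}(t)\, \mathrm{Belief}(x, t)\, dt \approx \sum_{t=\mathrm{none}}^{t=\mathrm{destroyed}} \mathrm{CDF}(t)\, \mathrm{Belief}(x, t) \qquad (6)$$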

where $x$ denotes a unit to be estimated, for example, a cell in a grid data that represents the vulnerable buildings, $t$ the damage state of the target variable, BD ($t \in \Omega_r = \{$none, slight, light, moderate, heavy, major, destroyed$\}$), CDF the central damage factor, and $\mathrm{Belief}(x, t)$ the likelihood or posterior probability of $x$ being in damage state $t$.
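The discrete form of Equation (6) is a weighted sum, as the short sketch below shows. The central damage factors are those of Table 3 converted from percent to fractions; the belief values for the cell are hypothetical and only illustrate the arithmetic.

```python
# Central damage factors from Table 3, expressed as fractions rather than percent.
CENTRAL_DAMAGE_FACTOR = {
    "none": 0.0, "slight": 0.005, "light": 0.05, "moderate": 0.20,
    "heavy": 0.45, "major": 0.80, "destroyed": 1.00,
}

def vulnerability_index(belief):
    """Discrete form of Equation (6): sum over states of CDF(t) * Belief(x, t)."""
    if abs(sum(belief.values()) - 1.0) > 1e-9:
        raise ValueError("beliefs over damage states must sum to 1")
    return sum(CENTRAL_DAMAGE_FACTOR[state] * p for state, p in belief.items())

# Hypothetical posterior for one grid cell (illustrative numbers only).
belief = {"none": 0.10, "slight": 0.20, "light": 0.30, "moderate": 0.20,
          "heavy": 0.10, "major": 0.05, "destroyed": 0.05}
vul = vulnerability_index(belief)
```

For these illustrative beliefs the index is 0.191, that is, an expected damage of about 19% of the insured unit.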

3.5. Insurance pricing

To set an insurance premium, we assume that the more vulnerable the insured individual is, the higher the premium. Under this assumption, we derive the definition of insurance rate of a disaster according to Hayes and Jacobson (2001), Meng and Yuan (2000), and Li et al. (2005):

$$r(x) = \left( \int_{I=\mathrm{Min}}^{\mathrm{Max}} p(I)\, \mathrm{Vul}(x, I)\, dI \right) \times d \times \sigma \times \frac{u}{e} \qquad (7)$$

where $x$ is defined the same as in Equation (6); Min, minimum exposure intensity of the natural disaster; Max, maximum exposure intensity of the natural disaster; $I$, exposure intensity determined by release scenarios; $p(I)$, natural probability of the disaster at exposure level $I$, given by the release scenario; $\mathrm{Vul}(x, I)$, the fraction of the potential damage ratio to the insured individual (persons or buildings), same as the vulnerability index in Equation (6); $d$, damage adjustment expense factor (expressed as a percentage of damages or claims payments to policyholders); $\sigma$, deductible offset, determined by the size of claim; $u$, under-insurance factor, reflecting the difference between property values and the amount of insurance purchased within basic and additional coverage limits for each category of risk, and determined by a review of insurance claims; $e$, expected loss ratio, serving to load the actuarial rates for insurance agents’ commissions and other acquisition expenses and a small contingency loading.

Table 3. Different states of the target variable and their damage factors.

| Damage state | Damage factor range (%) | Central damage factor (%) |
| --- | --- | --- |
| None | 0 | 0 |
| Slight | 0–1 | 0.5 |
| Light | 1–10 | 5 |
| Moderate | 10–30 | 20 |
| Heavy | 30–60 | 45 |
| Major | 60–100 | 80 |
| Destroyed | 100 | 100 |

With the insurance rates obtained, we can easily compute the premium for a cell that represents the buildings with an insurance term:

$$\mathrm{pre}(x) = a(x) \times r(x) \times \mathit{co} \qquad (8)$$

where pre is the premium to be estimated, $x$ the cell, $a(x)$ the insured amount of $x$ in monetary values, $r(x)$ the insurance rate of $x$ inferred using our Bayesian model, and $\mathit{co}$ the transfer coefficient of insurance term which is given by insurance experts.

3.6. Uncertainty and sensitivity analyses

Uncertainty and sensitivity analyses establish the effect of the uncertainty associated with the distribution of the variables used in the BN. Because natural disasters are inherently uncertain, it is necessary to detect the uncertainty of the risk variable and the indicators and to decrease such uncertainties by conducting uncertainty and sensitivity analyses.

There are two sources of uncertainty: epistemic uncertainty and aleatory uncertainty. Lack of knowledge causes epistemic uncertainty, which can be remedied by further learning and experiments. Aleatory uncertainty is associated with randomness. Aleatory uncertainty can be better estimated, but it cannot be reduced.

Our method is based on the Bayesian probability model, which has a better detection of catastrophic risk and can decrease the aleatory uncertainty that the disaster-reducing department of the government and the insurance industry need to address. Therefore, we only incorporate epistemic uncertainty in the uncertainty and sensitivity analyses.

In uncertainty analysis, the BN, as an inherent means of uncertainty inference, can calculate the probabilities of the damage states given evidence such as the characteristics of buildings and the hazard risk. Thus, when the uncertainties of predictive indicators are propagated across the network, the model can determine the distribution of the expected vulnerability with respect to these uncertainties. We use the upper (u) and lower (l) bounds of the 95% Bayesian credible interval (the Bayesian counterpart of a confidence interval) to represent the uncertainty of the target variable, Vul of Equation (6):

$$u = F^{-1}_{\mathrm{Vul}}(0.975) \qquad (9)$$

$$l = F^{-1}_{\mathrm{Vul}}(0.025) \qquad (10)$$

with $F^{-1}_{\mathrm{Vul}}(p)$ being the inverse of the cumulative probability distribution function of Vul.

Sensitivity analysis detects how much impact each of the predictive indicators related to epistemic uncertainty has on the uncertainty of the vulnerability index (Vul). We use Shannon's mutual information (Shannon and Weaver 1949, Pearl 1988) to measure the sensitivity of the target variable:



$$I(T; X) = H(T) - H(T \mid X) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(t_j, x_i) \log \frac{P(t_j, x_i)}{P(t_j)\,P(x_i)} \qquad (11)$$

where T is the target variable, BD, and the variables X are the predictive indicators described in the BN of Figure 2.

Using this analysis, we can quantify the impact of the predictive indicators on the vulnerability and determine the indicator with the maximum impact. In this way, we can identify the indicator that contributes most to the uncertainty, which has implications for vulnerability analysis and insurance pricing.
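The mutual information of Equation (11) can be computed directly from a joint probability table of the target variable and one indicator. The sketch below is a generic implementation, not the software used in the study; the natural logarithm is an assumption, because the base of the logarithm is not stated, and the joint tables in the usage lines are hypothetical.

```python
import math


def mutual_information(joint):
    """Shannon mutual information I(T; X) from a joint table.

    joint[j][i] holds P(t_j, x_i); all entries must sum to 1.
    Uses the natural logarithm (result in nats), which is an assumption.
    """
    p_t = [sum(row) for row in joint]          # marginal P(t_j)
    p_x = [sum(col) for col in zip(*joint)]    # marginal P(x_i)
    mi = 0.0
    for j, row in enumerate(joint):
        for i, p in enumerate(row):
            if p > 0.0:                        # 0 * log(0) is taken as 0
                mi += p * math.log(p / (p_t[j] * p_x[i]))
    return mi


# Hypothetical joint tables: an uninformative and a fully informative indicator.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

An indicator that is independent of the target yields zero, while an indicator that determines the target exactly yields the entropy of the target, here ln 2.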

4. The study case

We use the earthquake disaster as an illustration of our methodology. Section 4.1 introduces the study region and goal, and Section 4.2 describes the dataset. Section 4.3 presents the simulation of the peak ground acceleration (PGA) distribution under different scenarios (the occurrence frequency). In Section 4.4, we use the simulation data together with other geospatial data to compute the vulnerability and then make insurance pricing based on that result. Section 4.5 presents the uncertainty and sensitivity analyses.

4.1. Study region and goal

The study region of interest (ROI) (Figure 3a) is a rectangular region located at Du Jiangyan, Sichuan province, China, lying between north latitude 30°57′57.318″ and 31°1′12.768″ and between east longitude 103°35′19.657″ and 103°41′7.6″. The study region is close to the 12 May 2008 Wenchuan earthquake, which took the lives of about 80,000 people and caused losses of billions. This earthquake resulted from the interacting movements of the Indian Plate and the Eurasian Plate in opposite directions.

Our specific goal in the study case is to use the historic seismic catalog to simulate, under the seismicity context (Figure 3b), the probabilistic seismic hazard risk, that is, PGA values at two levels of exceedance probability, and to use the two release scenarios to conduct vulnerability analysis and insurance pricing of the residential buildings lying in the ROI.

4.2. Dataset

Because of the difficulty of obtaining survey data of buildings in the ROI, our simulation is simplified using spatial data of residency and landuse. The simplified simulation is based on a dataset (Figure 3c) of rectangular polygons, each of which is simply regarded as one basic unit of buildings, that is, a cell of the grid dataset. The rectangular polygons (Figure 3c) are extracted from the spatial data of residency and landuse from the National Geomatics Center of China, using the resampling technique. The data acquisition is described as follows:

• Indicators related to exposure include release scenario (rs), magnitude (m), distance (d), landslide risk (lsr), liquefaction risk (lfr), and ground-motion risk (pga). We can obtain rs, m, d, and pga by exposure modeling according to the catalog of historical earthquakes and seismicity around this region (Figure 3b). Also, lsr can be quantified using the relevant method (Jakob 1996, Chen et al. 2008) and relevant environmental indicators that include slope, soil type, PGA, and kernel densities of rivers and active faults. For lfr, however, the lack of relevant soil data for this region makes us discard the relevant analysis, so lfr and its relevant indicators are not included in the dataset.

• Indicators related to system resistance include environmental variables and building characteristics. Environmental variables include soil type (st), close to faults? (kdf), close to rivers? (kdr), and slope (sl). We used the kernel density function described in Section 3.2.1 to quantify kdf and kdr and made a suitable classification of them (Table 4). Characteristics of buildings include building presence (bpr), (average) age of buildings (bag), the (average) number of stories (bst), and (major) structure type (btype). If there are multiple buildings in a rectangular polygon unit, we use their average age as bag, their average number of stories as bst, and their major structure type as btype. We assigned relevant building characteristics (bpr, bag, bst, and btype) to each rectangular polygon according to relevant surveys, aerial photos, and other materials.

• These rectangular polygons are set as buildings according to the spatial data of residency and landuse in Du Jiangyan. Each polygon is a square of 2 m × 2 m with an area of 4 m². This gives our dataset enough resolution to simulate the practical situation. The values of each polygon's relevant indicators are assigned this way: vector spatial data such as faults and rivers are converted into a grid surface using the kernel density method, all the grids are resampled to the target grid of 2 m × 2 m, and the target grid is then vectorized into the dataset (Figure 3c) of rectangular polygons, with each polygon corresponding to one or several residential buildings.

4.3. Hazard analysis

We use the well-known probabilistic seismic hazard analysis (PSHA) to conduct the hazard analysis of earthquake. The goal of PSHA is to quantify the rate (or probability) of exceeding various ground-motion levels at a site (say, a cell in a grid dataset) given all possible earthquakes. The numerical PSHA method was initially developed by Cornell (1968), and its computer form was developed by McGuire (1978) and Algermissen and Perkins (1976).

Figure 3. The study region of interest (ROI) (a), the ROI's background of seismicity (b), and the targeted polygons of buildings (c). PB: simulated polygons of buildings. Legend entries: historical magnitude (6.0–6.5, 6.5–7.5, 7.5–8.0); faults (broken belts, active faults, Pleistocene faults); ROI, facilities, rivers, residency.

Table 4. Variables and their descriptions in BN of seismic vulnerability.

| Variable | # | States or intervals (unit) | Source of probability distribution |
|---|---|---|---|
| Release scenario (rs) | 2 | 10% in 50 years; 10% in 100 years | Set according to the PGA exceedance probability in T years |
| Magnitude (m) | 6 | 0–5.0–5.5–6.5–7.0–7.5–∞ | The annual probabilities calculated using the Gutenberg–Richter magnitude recurrence relationship (Gutenberg and Richter 1944) |
| Distance (d) | 5 | 0–10–20–40–80–∞ (km) | Distance from the seismic sources to the site of interest |
| Ground motions (PGA) risk (pga) | 11 | 0–30–50–150–250–350–450–550–650–750–850–∞ (gal) | Deterministic relations, calculated as a function of magnitude, distance, and soil type by PSHA (Appendix 1) |
| Soil type (st) | 5 | Unknown; hard rock; soft rock; medium soil; soft soil | Amplification factor for PGA and landslide risk: 1.0 (unknown, medium soil), 0.55 (hard rock), 0.7 (soft rock), 1.3 (soft soil) (Bard 1994, Day 2002) |
| Close to faults? (kdf) | 6 | 0–1–2–3–4–5– | Quantified using the kernel density function (Section 3.2.1); assume that the closer to active faults, the higher the risk of damage (Hastie 2001) |
| Close to rivers? (kdr) | 11 | 0–100–200–300–400–500–600–700–800–900–1000– | Quantified using the kernel density function (Section 3.2.1); assume that the closer to rivers, the higher the risk of damage (Hastie 2001) |
| Slope (sl) | 9 | 0–5–10–15–20–25–30–35–40–90 | Slope is assumed to cause mudslides that damage buildings (Jang 2002) |
| Landslide risk (lsr) | 3 | Safe or slight risk; moderate risk; high risk | Five factors, that is, rivers, faults, soil, slope, and PGA, are responsible for landslides; modeled using the fuzzy method (Chen et al. 2008) |
| Liquefaction risk (lfr) | 2 | Ground amplification; liquefaction | Modeled using the method of earthquake engineering (Bard 1994, Day 2002) |
| Building presence (bpr) | 2 | Yes/no | Assume that the prior probability p = the area of buildings / the area of ROI |
| Age (bag) | 4 | 0–15–30–40– (years) | Older houses are more vulnerable; classification according to Zhang (2008) |
| Structure type (btype) | 5 | Shear wall; steel or concrete moment frame; wood; masonry; brick or mud wall | Different structures are assumed to have different resistance to the seismic disaster, that is, different damage factors (Yi 1995, Day 2002) |
| Story (bst) | 4 | 1–5–7–13– | Higher buildings are assumed to have a lower vulnerability (Yi 1995, Day 2002) |
| Building damage (bd) | 7 | None; slight; light; moderate; heavy; major; destroyed | Refer to Table 3 for the corresponding damage factors of the states; the conditional probabilities are set according to domain and empirical knowledge (Guo and Chen 1992, Yi 1995, Kramer 1996, Day 2002) |

Note: #, number of states.
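Continuous indicators have to be mapped to the discrete states of Table 4 before they can be entered into the BN as evidence. The following sketch shows one way to do this for the PGA intervals. The left-closed interval convention and the function names are assumptions, since the table does not say to which state a boundary value belongs.

```python
from bisect import bisect_right

# Interior break points of the PGA intervals of Table 4 (gal):
# 0-30, 30-50, 50-150, ..., 750-850, 850-inf  ->  11 states.
PGA_BREAKS_GAL = [30, 50, 150, 250, 350, 450, 550, 650, 750, 850]


def state_index(value, breaks):
    """Return the 0-based index of the interval that contains `value`.

    Assumption: intervals are closed on the left, e.g. 30 gal falls in 30-50.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return bisect_right(breaks, value)


def state_label(value, breaks):
    """Return a readable label such as '150-250' for the matching interval."""
    k = state_index(value, breaks)
    lower = 0 if k == 0 else breaks[k - 1]
    upper = "inf" if k == len(breaks) else breaks[k]
    return f"{lower}-{upper}"
```

For example, a PGA of 189.5 gal falls in the 150–250 gal state and a PGA of 370.3 gal in the 350–450 gal state.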


In our analysis, we use the traditional PGA to simulate the ground motion in PSHA. PGA is used to define lateral forces and shear stresses in the equivalent-static-force procedures of some building codes and in liquefaction analyses, and it is a good indicator of ground motions.

PSHA has five steps (Figure 4) that are described in Appendix 1.

In our study case, we conducted PSHA according to the incomplete catalog of historical earthquakes and the seismicity context (Figure 3b) around the study region. Also, in the third step of PSHA, we consider the influence of an environmental variable, soil, on PGA, and derive the attenuation relationship from the original Equation (A.3):

$$\mathrm{PGA}(m, d, s, r) = \exp\left\{ t(s) \left[ t_1 (m - a_1) - t_2 \ln\left(d^2 + a_2\right) + a_3 \right] \right\} \qquad (12)$$

where s is the soil type, which can mitigate or amplify PGA. The amplification factor is quantified by the function t(s). Specifically, we modeled the relationship based on Boore et al. (1997). The amplification factor is set as t = 1.0 when the soil is unknown or medium soil, t = 0.55 when the soil is hard rock, t = 0.7 when the soil is soft rock, and t = 1.3 when the soil is soft soil (Bard 1994, Day 2002).

In the study case, we obtained the PGA maps for two levels of exceedance probability, that is, a 10% chance of exceedance within 50 years (Figure 5a) and a 10% chance of exceedance within 100 years (Figure 5b), which have return periods of 475 and 950 years, respectively.
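The link between an exceedance probability and a return period can be checked with a short calculation. The sketch below assumes a Poisson occurrence model, for which the probability of at least one exceedance in t years is 1 − exp(−t/T), so that T = −t / ln(1 − p). The function name is hypothetical.

```python
import math


def return_period(p_exceed, years):
    """Return period T (years) for exceedance probability p_exceed in `years`.

    Assumes Poisson occurrences: p_exceed = 1 - exp(-years / T).
    """
    if not 0.0 < p_exceed < 1.0:
        raise ValueError("p_exceed must lie strictly between 0 and 1")
    return -years / math.log(1.0 - p_exceed)


t_50 = return_period(0.10, 50)    # about 474.6 years
t_100 = return_period(0.10, 100)  # about 949.1 years
```

The two scenarios therefore correspond to return periods of about 475 and 949 years; the latter is conventionally rounded to 950 years.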

4.4. Vulnerability modeling and insurance pricing

Vulnerability modeling involves specification of the BN, which includes construction of the BN topology and extraction of the CPT parameters, as well as computation of the expected vulnerability index.

Figure 4. Steps of probabilistic seismic hazard analysis (PSHA): (a) scenarios of seismic sources; (b) recurrence of magnitude (Poissonian); (c) ground motion model; (d) exceedance probability; (e) generation of the PGA map.



We constructed the BN of seismic vulnerability according to domain knowledge of earthquake engineering (Guo and Chen 1992, Bard 1994, Kramer 1996, Day 2002, ADB 2005, Bayraktarli et al. 2005) and the generic framework of Figure 2. Figure 6 presents its network topology.

As shown in Figure 6, exposure-related indicators include rs, m, d, pga, lfr, and lsr. Among these indicators, m and d are modeling parameters of exposure, pga is the intensity indicator, and pga, lfr, and lsr are three risk indicators responsible for the damage to buildings, bd. Thus, we have three causal links from the three risk indicators, pga, lfr, and lsr, to the target variable, bd. Because some soil data, such as soil profile and clay content, are lacking, the modeling of liquefaction risk was discarded, although these variables should be considered if such data are available. Therefore, to give a complete model of seismic risk, the soil variables are also given in Figure 6, shown with dotted ovals to indicate that they were not used in the study case. Thus, in the practical test, by modeling pga and lsr, we simulated the different vulnerability states of the ROI under two scenarios of different PGA at the two exceedance probabilities.

In Figure 6, the indicators relevant to system resistance include environmental variables and characteristics of buildings. Environmental variables include soil type (st), close to faults? (kdf), close to rivers? (kdr), and slope (sl). We used the kernel density function of Equation (1) to quantify kdf and kdr and made a suitable classification of them (Table 4). Characteristics of buildings include building presence (bpr), age (bag), the number of stories (bst), and structure (btype). Day (2002) and Guo and Chen (1992) demonstrate that characteristics of buildings such as age, structure, and stories have a considerable effect in mitigating or amplifying the damaging power of an earthquake disaster.

Figure 5. The surfaces of 10% chance of exceedance PGA (a) within 50 years (legend range 172.2–189.5 gal) and 10% chance of exceedance PGA (b) within 100 years (legend range 233.6–370.3 gal).

The target variable of the Bayesian model is building damage (bd), which has seven states as described in Figure 2. Its corresponding damage factors are set according to experience in earthquake engineering (Table 3) (Yi 1995, Day 2002).

Table 4 gives a brief description of the variables (indicators) involved in our BN (Figure 6) and the sources of their probability distributions.

Using the modeled BN to predict the damage probabilities, together with Equation (6), we computed the vulnerability index for the rectangular polygons of buildings in the ROI (Figure 7). Figure 7a presents the vulnerability index of buildings in the ROI under the scenario of a 10% chance of exceedance of PGA within 50 years; Figure 7b presents the vulnerability index under the scenario of a 10% chance of exceedance of PGA within 100 years. The former corresponds to a return period of 475 years and the latter to a return period of 950 years, which is similar to the Wenchuan earthquake of 12 May 2008. Comparing the vulnerability indexes of the 475- and 950-year return periods, we found that the 950-year vulnerability is considerably higher than the 475-year vulnerability. The spatial distribution of the 950-year vulnerability also differs somewhat from that of the 475-year vulnerability. This can be explained by the difference between the two PGA surfaces: the 950-year PGA is stronger than the 475-year PGA, and their spatial distributions are different (Figure 5a vs. 5b).

Table 5. Shannon mutual information with vulnerability index, Vul, as the target variable, and other predictive factors influencing Vul.

| Predictive factors | Shannon mutual information |
|---|---|
| Structure type (btype) | 0.3692 |
| Age (bag) | 0.1417 |
| Slope (sl) | 0.0855 |
| Close to river? (kdr) | 0.0719 |
| PGA risk (pga) | 0.0585 |
| Close to faults (kdf) | 0.0549 |
| #Storey (bst) | 0.0532 |
| Soil type (st) | 0.0346 |

Figure 6. Bayesian network topology of seismic vulnerability. Target variable: building damage (bd). Indicators related to exposure: release scenario (rs), magnitude (m), distance (d), ground motions (PGA) risk (pga), landslide risk (lsr), and liquefaction risk (lfr). Indicators relevant to system resistance: soil type (st), close to faults? (kdf), close to rivers? (kdr), slope (sl), building presence (bpr), age (bag), #story (bst), and structure type (btype). Not modelled in the study case: soil profile, liquid limit, clay content, and liquefaction susceptibility.



Based on the vulnerability index of the two scenarios computed by the BN, we can estimate the insurance rate for the rectangular polygons of buildings in the ROI. In the simulation of insurance rates, the values of the other parameters in Equation (7) can be determined by a review of insurance claim files or by insurance experts: d = 0.95, u = 0.99, σ = 0.98, e = 0.97. Figure 8b presents the rates (Equation 7) simulated for the ROI using our Bayesian method under two scenarios of different PGA return periods.

As shown in Figure 8b, the spatial distribution of insurance rates is not even across the ROI: the polygons with a higher vulnerability index have higher rates, which are determined by their building characteristics, surrounding environment, and the estimated risk intensity of PGA and landslide. This illustrates a major advantage of our method for the disaster insurance industry: better spatial location of vulnerability and thus more objective pricing of the rates.

If we choose a polygon of buildings as the insured object (Figure 8a), assume its insured amount (a) to be RMB 500,000 yuan and the transfer coefficient (co) for an insurance term of 10 years to be 8.91, and take its insurance rate to be 1.21‰ as computed using our approach, then we can use Equation (8) to compute the premium for 10 years, that is, 500,000 × 1.21‰ × 8.91 = 5390.55 yuan.
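The worked example can be reproduced with a few lines of code. This is only a restatement of Equation (8) with the figures quoted above; the function and variable names are hypothetical.

```python
def premium(insured_amount, rate, term_coefficient):
    """Premium of one cell: insured amount x insurance rate x term coefficient."""
    return insured_amount * rate * term_coefficient


# Figures from the worked example: RMB 500,000 insured, rate 1.21 per mille,
# transfer coefficient 8.91 for a 10-year term.
example_premium = premium(500_000, 1.21e-3, 8.91)
print(round(example_premium, 2))
```

Rounded to two decimals, the result matches the 5390.55 yuan stated in the text.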

Figure 7. Vulnerability index estimated by the Bayesian method: (a) 10% chance of exceedance PGA probability within 50 years; (b) 10% within 100 years.


4.5. Uncertainty and sensitivity analyses

In the uncertainty and sensitivity analyses of this study case, we consider the epistemic uncertainty arising from eight variables: btype, bag, sl, kdr, pga, kdf, bst, and st.

Figure 9 shows the expected values and the lower (l) and upper (u) bounds of the 95% Bayesian credible intervals for the vulnerability of the polygons of buildings. As shown in Figure 9, some polygons have a narrow interval, whereas others have a wider one (the difference between l and u is larger), which indicates more uncertainty. For the units with wider intervals, more survey work is needed to reduce the uncertainty.
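The bounds of Equations (9) and (10) are the 2.5% and 97.5% quantiles of the distribution of Vul. As an illustration, the sketch below estimates them from a set of sampled vulnerability values for one polygon. Representing the distribution by samples and drawing them from a beta distribution are assumptions made for this example only; they do not describe how the BN software in the study propagates uncertainty.

```python
import random
from statistics import quantiles


def credible_interval_95(samples):
    """Return (l, u): the empirical 2.5% and 97.5% quantiles of the samples."""
    # n=40 yields cut points at multiples of 2.5%; the first and last ones
    # are the 2.5% and 97.5% quantiles.
    cuts = quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical sampled vulnerability values for one polygon.
rng = random.Random(0)
vul_samples = [rng.betavariate(2, 20) for _ in range(5000)]
lower, upper = credible_interval_95(vul_samples)
width = upper - lower  # a wider interval indicates more uncertainty
```

Polygons can then be ranked by the interval width to decide where additional survey effort is most useful.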

Table 5 presents the result of the sensitivity analysis: the Shannon mutual information of the eight variables with vulnerability, Vul, as the target variable. From this table, we can see that btype and bag have the largest values, which indicates that they have the greatest influence on the vulnerability index. They are followed by sl, kdr, pga, and the remaining variables. As the uncertainty sources of these leading indicators are epistemic, this result suggests that decision-makers should prioritize local data collection efforts on these indicators rather than on the other indicators listed in the BN.

5. Discussion

As a means of probabilistic inference, BN offers several specific advantages over other methods in the evaluation of catastrophic risk.

Primarily, BN provides a good platform for integrating information sources from multidisciplinary specialist fields and using them to make uncertainty inference. We propose a generic Bayesian framework for modeling catastrophic risk (Figure 2). As illustrated in our study case of seismic risk, BN integrates the probabilistic seismic hazard model of seismic experts, the assessment of the environment and system resistance by architects and engineers, and the knowledge of insurance pricing from actuaries within one consistent modeling system. BN can also update the prediction of risk given partial new evidence, even with missing data. This helps decision-makers to gain a good knowledge of the practical situation or of potential future scenarios before taking action or making emergency plans.
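The ability to update a prediction from partial evidence can be illustrated with a deliberately small network. The sketch below uses a hypothetical three-node fragment (ground-motion level and structure type as parents of building damage) with invented probabilities; it is not the network of Figure 6, and the state names and numbers are assumptions. Inference is done by plain enumeration, in which unobserved variables are summed out.

```python
from itertools import product

PGA = ("low", "high")
BTYPE = ("masonry", "frame")
BD = ("none", "moderate", "destroyed")

# Hypothetical priors and conditional probability table (illustrative only).
P_PGA = {"low": 0.7, "high": 0.3}
P_BTYPE = {"masonry": 0.6, "frame": 0.4}
P_BD = {  # P(bd | pga, btype), ordered as in BD
    ("low", "masonry"): (0.70, 0.25, 0.05),
    ("low", "frame"): (0.90, 0.09, 0.01),
    ("high", "masonry"): (0.20, 0.50, 0.30),
    ("high", "frame"): (0.50, 0.40, 0.10),
}


def joint(pga, btype, bd):
    """Joint probability of one full assignment."""
    return P_PGA[pga] * P_BTYPE[btype] * P_BD[(pga, btype)][BD.index(bd)]


def posterior_bd(evidence):
    """P(bd | evidence); evidence may fix 'pga', 'btype', both, or neither."""
    scores = {bd: 0.0 for bd in BD}
    for pga, btype, bd in product(PGA, BTYPE, BD):
        assignment = {"pga": pga, "btype": btype}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the observed evidence
        scores[bd] += joint(pga, btype, bd)
    total = sum(scores.values())
    return {bd: score / total for bd, score in scores.items()}


# Structure type is missing: it is marginalized out using its prior.
partial = posterior_bd({"pga": "high"})
# Both parents observed.
full = posterior_bd({"pga": "high", "btype": "frame"})
```

With the structure type unknown, the belief in the destroyed state is a prior-weighted mixture of the two structure types; once the structure type is observed, the belief sharpens to the corresponding row of the conditional probability table.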

Figure 8. Insurance rates for the rectangular polygons of buildings estimated by BN: (a) the assumed insured polygon of buildings; (b) the insurance rate map (legend range 0.000061–0.00121).

Figure 9. Expected values, lower and upper bounds for the polygons of buildings: (a) 10% chance of exceedance probability within 50 years; (b) 10% within 100 years.

Our simulations under two different scenarios of PGA (Figure 5) demonstrate that different ground motions produce different vulnerabilities for the target polygons of buildings (Figure 7). In the simulation of the return period of 950 years, we also used the AAM to predict the loss risk and compared AAM and our approach with the ROI's damage situation estimated from aerial photos and practical surveys (Civil and Structural Groups of Tsinghua University 2008) of the Wenchuan earthquake of 12 May 2008. In AAM, we used the census-related areas (streets) as the statistical areas (Figure 10) and made the assessment of loss risk. Table 6 lists the probability of detection (pd), probability of false alarms (pf), precision, and receiver operating characteristic area (ROC area) for each damage state predicted by AAM and by our approach. As shown in Table 6, our method achieved a better pd, a better precision, and a better ROC area for each damage state than AAM. In the pd, precision, and ROC area columns of the BN rows of Table 6, the percentages in brackets indicate the improvement of our approach relative to AAM. As seen from this table, our approach's improvements in pd and precision at each damage state are marked (ranging from 21.6% to 1811%). Our approach's improvements in ROC area range from 2.5% to 38.2%; the improvements are slight for the 'none and slight' and 'destroyed' states, but our approach is still better than AAM. In total, our approach achieved a prediction accuracy of 0.746 with a kappa statistic of 0.635, whereas AAM had an accuracy of only 0.55 with a kappa statistic of 0.344, which is much worse. From the comparison, it can be seen that our approach is better than the aggregate area method and that its prediction is acceptable for monitoring and forecasting seismic disasters.
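The per-state measures of Table 6 and the overall accuracy and kappa statistic can all be derived from a confusion matrix of observed versus predicted damage states. The sketch below shows the standard definitions; the function names and the two-state matrix in the usage lines are hypothetical, because the underlying confusion matrices are not reported here.

```python
def class_metrics(confusion, k):
    """Return (pd, pf, precision) for state k.

    confusion[i][j] counts units observed in state i and predicted in state j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    pd = tp / (tp + fn)          # probability of detection (recall)
    pf = fp / (fp + tn)          # probability of false alarm
    precision = tp / (tp + fp)
    return pd, pf, precision


def accuracy_and_kappa(confusion):
    """Return overall accuracy and Cohen's kappa."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / total ** 2
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical two-state confusion matrix (rows: observed, columns: predicted).
example = [[40, 10], [5, 45]]
pd0, pf0, precision0 = class_metrics(example, 0)
accuracy, kappa = accuracy_and_kappa(example)
```

Kappa discounts the agreement that would be expected by chance, which is why it is lower than the raw accuracy for both methods.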

Another advantage of our method is the use of spatial analysis and GIS to assess, visualize, and locate the vulnerability and insurance rate. We use the KDA technique of spatial analysis to quantify the influence of potential geo-features such as faults and rivers. We assume that the closer the exposed individuals are to active faults or rivers, the higher the vulnerability. This assumption holds in many situations. The use of spatial analysis techniques such as KDA and ARCSM is significant because potential influences from many risk-related geo-features can be quantified and modeled together with other quantitative variables. The use of GIS also makes it easier to locate and represent the geographical variations of the vulnerability and their uncertainties. Similarly, we can visualize the geographical variation of insurance rates (e.g., Figure 8), which is more objective and conforms to the practical situation. The use of spatial analysis and GIS together in our grid-based modeling has a significant implication: the risk-prone sites are located specifically in space at a finer scale. This enables decision-makers to make better planning of buildings before the disaster and to allocate resources more effectively during the disaster. It can also help the insurance industry to set a more adaptive premium.

Figure 10. The census-related areas (streets, as gray polygons) used for the aggregate areas in the AAM.

Table 6. Comparison of the simulation of 950 years with the practical situation.

| Damage state | Method | Pd | Pf | Precision | ROC area |
|---|---|---|---|---|---|
| None and slight | BN | 0.580 (34.6%) | 0.078 | 0.637 (22.0%) | 0.820 (2.5%) |
| None and slight | AAM | 0.431 | 0.109 | 0.522 | 0.801 |
| Light | BN | 0.876 (21.6%) | 0.249 | 0.749 (22.4%) | 0.874 (10.5%) |
| Light | AAM | 0.713 | 0.400 | 0.612 | 0.791 |
| Moderate | BN | 0.707 (1811%) | 0.013 | 0.866 (246.4%) | 0.982 (38.2%) |
| Moderate | AAM | 0.037 | 0.013 | 0.250 | 0.711 |
| Heavy | BN | 0.776 (68.7%) | 0.024 | 0.691 (128.1%) | 0.973 (18.7%) |
| Heavy | AAM | 0.460 | 0.073 | 0.303 | 0.820 |
| Major | BN | 0.956 (243.9%) | 0.026 | 0.783 (127.0%) | 0.974 (11.3%) |
| Major | AAM | 0.278 | 0.051 | 0.345 | 0.875 |
| Destroyed | BN | 0.788 (31.3%) | 0.001 | 0.976 (33.3%) | 0.955 (6.1%) |
| Destroyed | AAM | 0.601 | 0.018 | 0.732 | 0.901 |

Note: BN, our BN-based method; AAM, the aggregate area method; in the pd, precision, and ROC area columns of the BN rows, the percentages in brackets indicate the improvement of our approach compared to AAM.

Because of the limited record of historical seismicity around the ROI, we simulated only two scenarios with two return periods (475 and 950 years) to compute the insurance rate. If more records or simulations of earthquake occurrence become available, we can input them into our system and estimate the PGA and vulnerability for more return periods, which can be used to update the estimation of the insurance rate and premium.

BN provides a flexible framework (Figure 2) on which we can easily extend the network by adding more predictive indicators from other sources, such as soil profile and clay content (if such data are available), or by using sophisticated learning algorithms to improve the network. In the future, we will put more emphasis on the extension of our approach.

6. Conclusion and future work

Given the uncertainty of natural disasters, we propose a generic Bayesian framework for vulnerability assessment and insurance pricing of catastrophic risk in a spatially explicit manner. Our framework and method are applicable to most natural disasters. In this method, BN provides a platform that integrates a variety of information sources from different fields and facilitates communication between different domain specialists, for example, experts on natural disasters, architects and engineers, and economists. BN is a means of uncertainty analysis through propagation of uncertainties from different sources, which makes it an innovative means relative to existing methods of assessing spatial vulnerability. Thus, our approach is suitable for vulnerability assessment of natural disasters that are affected by multiple random indicators: exposure-related, environmental, and system-resistance indicators.

Integrating a probabilistic approach into a GIS and using spatial analysis allow the derivation of more quantitative predictive indicators from potentially relevant geo-features such as active faults and rivers, and the visualization of uncertainties in a spatially explicit manner. Our study case of seismic risk illustrates that different ground motions, environmental conditions (e.g., slope and soil), and characteristics of buildings produce different vulnerabilities and insurance rates, whose spatial pattern is uneven as shown in GIS (Figures 7 and 8). The estimated vulnerability and insurance rates are more objective and more precisely located in space than those obtained with the AAM, which has implications for vulnerability assessment and insurance pricing. Using publicly available information, this article also illustrates how much improvement can be made in estimating seismic risk with our approach compared to the AAM.

Our study case also included uncertainty and sensitivity analyses. The uncertainty analysis identified where the uncertainty was high, and the sensitivity analysis helped to identify which indicators had the largest influence on the vulnerability. Thus, uncertainty and sensitivity analyses enable us to quantify the sources of uncertainty and to focus our survey and data collection efforts where necessary.

Although our study case focuses on seismic risk, the Bayesian modeling framework and the relevant methods are generic (Section 3), and our methodology can be applied to other types of catastrophic risk such as flood, typhoon, mudslides, and avalanches by incorporating relevant domain knowledge and learning for the specification of the BN, as illustrated in our study case.

BN provides a flexible framework (Figure 2) on which we can easily extend the framework and enhance the model's prediction if more data or better learning algorithms are available. In the future, we primarily consider the following aspects as extensions of our work:

• We can combine domain knowledge and learning from spatial data to improve the performance. Given enough training samples, we can learn the interdependency relationships between variables in EV, MP, and BC of the modeling framework and use domain knowledge to refine the learned relationships, which is beneficial for enhancing the prediction.

• We will consider the influence of the temporal factor in BN for seismic vulnerability assessment. It is supposed that the probability of earthquake occurrence increases with the time elapsed since the last major (or characteristic) earthquake on the fault that controls the regional earthquake hazard. Thus, a temporal node can be added to the BN to improve the hazard analysis and the accuracy of the forecast. This is a major direction of our future work.

• Another natural extension of our work is more simulation of seismic probability risk (PGA) and vulnerability, which can improve vulnerability assessment and insurance pricing of seismic risk.

• Based on the application of our spatially explicit BN-based approach in seismic risk assessment, we will extend our approach to other types of catastrophic risk, primarily flood, typhoon, and mudslide (three typical natural disasters in China). We will study the exposure-related indicators and set up a specific BN modeling framework for each type of disaster. With the accumulation of domain knowledge and continual learning from historical data, the models will be refined and become smarter. The output will be better applied in practical assessment, mitigation, and related decision-making for natural disasters.

Acknowledgements

The authors appreciate the constructive comments from the anonymous referees. They are also thankful for the support from the NSFC grants (40601077/D0120 and 40471111/D0120), MOST grants (2007AA12Z233 and O88RA204SA), and PolyU grant (H-ZG20).

References

Abbott, B.M., et al., 1986a. An introduction to the European Hydrological System-SystemeHydrologique Europeen, ‘SHE’, 1: History and philosophy of a physically-based, distributedmodeling system. Journal of Hydrology, 87, 45–49.

ADB, 2005. An initial assessment of the impact of the Earthquake and Tsunami of December 26, 2004on South and Southeast Asia. Metro Manila, Philippines: Asian Development Bank.

Alexander, D., 1993. Natural disasters. New York: Chapman & Hall.Algermissen, S.T. and Perkins, D.M., 1976. A probabilistic estimate of maximum acceleration in rock

in the contiguous United States (Open File Report): USGS.Allmann, A. and Smolka, A., 2000. Increasing loss potential in earth risk-a reinsurance perspective.

Paper presented at the Euroconference on Earthquake Risk, IIASA Laxenburg bei Wien,Singapore, http://www.iiasa.ac.at/Research/RMP/july2000/papers.html

Ambroise, B., Beven, J.K., and Freer, J., 1996. Toward a generalization of the TOPMODEL concepts:topographic indices of hydrological similarity. Water Resources Research, 32, 2135–2145.

Amendola, A., et al. 2000. A systems approach to modeling catastrophic risk and insurability. NaturalHazards, 21, 381–393.

Anselin,L.(1992).SpatialdataanalysiswithGIS:anintroductiontoapplicationinthesocialsciences.NationalCenter for Geographic Information and Analysis, University of California, Santa Barbara, CA. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.9557&rep=repl&type=pdf.

Arnold, M., et al. eds., (2006). Natural disasters hotspots: case studies. Washington, DC: TheInternational Bank for Reconstruction and Development/The World Bank.



Bard, P., 1994. Local effects of strong ground motion: basic physical phenomena and estimationmethods for microzoning studies: Laboratoire Central de Ponts-et-Chausees and Observatoire deGrenoble.

Bayraktarli, Y.Y., et al., 2005. On the application of Bayesian probabilistic networks for earthquakerisk management Paper presented at the 9th International Conference on Structural Safety andReliability (ICOSSAR 05).

Boore, M.D., Joyner, B.W., and Fumal, E.T., 1997. Equations for estimating horizontal responsespectra and peak acceleration from Western North American Earthquakes: a summary of recentwork. Seismological Research Letters, 68, 128–155.

Bouckaert, R.R. 1995. Bayesian Belief Network: from Construction to Inference [Dissertation].Utrecht: Universiteit Utrecht.

Burkhard, A. and Salm, B., 1992. Die Bestimmung der mittleren Anrissmaechtigkeit d0 zurBerechnung von Fliesslawinen. Switzerland: SLF, Davos.

Chen, X., Qi, W., and Ye, H., 2008. Fuzzy comprehensive study on seismic landslide hazard based onGIS. Acta Scientiarum Naturalium Universitatis Pekinensis, 44 (3), 434–438.

Civil and Structural Groups of Tsinghua University, Xi’an Jiaotong University and Beijing JiaotongUniversity, 2008. Analysis on seismic damage of buildings in the Wenchuan Earthquake. Journalof Building Structures (in Chinese), 29 (4), 1–9.

Cornell, C.A., 1968. Engineering seismic risk analysis. Bulletin of the Seismological Society ofAmerica, 58, 1583–1606.

Day, R.W., 2002. Geotechnical earthquake engineering handbook. New York: McGraw-Hill.Easterling, D.R., et al., 2000. Climate extremes: observations, modeling, and impacts. Science, 289

(5487), 2068–2074.Easterman, J.R., 2001. Uncertainty management in GIS: decision support tools for effective use of

spatial data. Paper presented at the Spatial Uncertainty in Ecology: Implications for RemoteSensing and GIS Applications, New York.

ESRI, 2001. ArcGIS spatial analyst: advanced GIS spatial analysis using raster and vector data.New York: Environmental System Research Institute.

Fang, X., Ji, L., and Ma, Y., 2002. The application of artificial intelligence technology in tropicalcyclone forecast. Marine Forecast (in Chinese) (19), 54–63.

FEMA, 2001. The road to E-FEMA: information technology architecture. In FEMA ITS Directorate(Vol. 1). Washington DC: Federal Emergency Management Agency Information TechnologyServices (ITS) Directorate.

Garrote, L., and Bras, R.L., 1995. A distributed model for real-time forecasting using digital elevationmodels, Journal of Hydrology, 167(1995), 279–306.

Goodchild, M., Haining, R., and Wise, S., 1992. Integrating GIS and spatial data analysis: problemsand possibilities. International Journal of Geographical Information Systems, 6, 407–423.

Gret-Regamey, A. and Straub, D., 2006. Spatially explicit avalanche risk assessment linking Bayesiannetworks to a GIS. Natural Hazards and Earth System Sciences, 6, 911–926.

Guo, Z. and Chen, X., 1992. Strategies against Earthquakes for Cities (in Chinese). Beijing:Earthquake Press.

Gutenberg, B. and Richter, C.-F., 1944. Frequency of earthquake in California. Bulletin of theSeismological Society of America, 34, 185–188.

Hanson, B. and Roberts, L., 2005. Resiliency in the face of disaster. Science, 309 (1), 1029.Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: data mining,

inference and prediction. New York: Springer-Verlag.Hayes, T.L. and Jacobson, R.A., 2001. Actuarial rate review, Federal Emergency Management Agency

and Federal Insurance and Mitigation Administration. Washington, DC: FEMA.Huang, C., 2001. Risk analysis of natural disasters. Beijing: Beijing Normal University Press.Jakob, M., 1996.Morphometric and geotechnical controls of debris flow frequency and magnitude in

southwestern British Columbia. Unpublished dissertation (PhD). University of British Columbia,Vancouver.

Jang, J., Lee, C., and Chen Y., 2002. A preliminary study of earthquake building damage and life lossdue to the Chi-Chi earthquake, Journal of the Chinese Institute of Engineers, 25(5), 567–576.

Jiang, H. and Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria evaluation in GIS.International Journal of Geographical Information Science, 14 (2), 173–184.

1782 L. Li et al.


John, H.G. and Langley, P., 1995. Estimating continuous distributions in Bayesian classifiers. Paperpresented at the Proceeding of the Eleventh Conference on Uncertainty in Artificial Intelligence.Montreal, Quebec: Morgan Kaufmann.

Kloog, I., Haim, A., and Portnov, A.B., 2009. Using kernel density functions as an urban analysis tool:investigating the association between nightlight exposure and the incidence of breast cancer inHaifa, Israel. Computers, Environment and Urban Systems, 33, 55–63.

Korb, K.B., and Nicholson, A.E., 2004. Bayesian artificial intelligence. Boca Raton, FL: Chapman &Hall/CRC.

Kramer, S.L., 1996. Geotechnical earthquake engineering. New Jersey: Prentice Hall.Lestuzzi, P. and Badoux, M., 2003. The gamma-model: a simple hysteretic model for reinforced

concrete walls. Paper presented at the Proceedings of the Fib-Symposium: Concrete Structuresin Seismic Regions, Athens (Karageorgi Servias St.).

Li, L., Wang, J., and Wang, C., 2005. Typhoon insurance pricing with spatial decision support tools.International Journal of Geographical Information Science, 19 (3), 363–384.

Linnerooth-Bayer, J., Mechler, R., and Pflug, G., 2005. Refocusing disaster aid. Science, 309 (1),1044–1046.

McCoy, J. and Johnston, K., 2001. Using ArcGIS spatial analyst. Redlands: ESRI.McGuire, R.K., 1978. FRISK – a computer program for seismic risk analysis (Open File Report).

Virginia: Department of Interior, Geological Survey.Meng, S. and Yuan,W., 2000. Practical non-life actuarial sciences. Beijing: Economics Science Press.Miller, H.J. and Han, J., 2001. Geographic data mining and knowledge discovery. London and New

York: Taylor & Francis.Norsys, S.C., 2009. Netica APIs. Vancouver: Norsys Software Corp.Paterson, E., Re, D., and Wang, Z., 2008. The 2008 Wenchuan earthquake: risk management lessons

and implications. Beijing: Risk Management Solutions, Inc.Pearl, J., 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference. San

Francisco: Morgan Kaufmann.Shannon, C.E. and Weaver, W., 1949. The Mathematical Theory of Communication. Urbana, Illinois:

University of Illinois Press.Shi, P., 2002. Theory on disaster science and disaster dynamics. Natural Disasters (in Chinese), 11(3),

1–9.Silverman, B.W., 1986. Density estimation for statistics and data analysis. New York: Chapman and

Hall.Straub, D., 2005. Natural hazards risk assessment using Bayesian networks. Paper presented at the 9th

International Conference on Structural Safety and Reliability (ICOSSAR 05) (Millpress).Tobler, W.R., 1979. Cellular geography, philosophy in geography. Dordrecht: Reidel.Torun, A. and Duzgun, S., 2006.Using spatial data mining techniques to reveal vulnerability of people

and places due to oil transportation and accidents: a case study of Istanbul strait. Paper presentedat the International Achieves of Photogrammetry, Remote Sensing, and Spatial InformationSciences (SPRS), Technical Commission II Symposium.

Varnakovida, P. and Messina, P.J., 2008. Hospital site selection analysis. Michigan: Michigan StateUniversity.

William, J.P. and Arthur, A.A., 1982. Natural hazard risk assessment and public policy. New York:Springer-Verlag New York, Inc.

Witten, I.H. and Frank, E., 2005. Data mining: practical machine learning tools and techniques. 2 ed.San Francisco: Morgan Kaufmann.

Yi, Z., 1995. Forecast methods of seismic disaster and loss. Beijing: Geology Publisher.Zhang, P., Steinbach, M., Kumar, V., and Shekhar, S., 2005, Discovery of Patterns in Earth Science

Data Using Data Mining. In New Generation of Data Mining Applications. New York: JohnWiley& Sons.

Appendix 1: Steps of probabilistic seismic risk analysis

Probabilistic seismic hazard analysis (PSHA) has five steps (Figure 4), described as follows:

(1) Acquisition of $N$ scenarios of historical or simulated seismic sources, each of which includes a description of the magnitude ($m_i$), location ($L_i$), and timing ($r_i$) of all earthquakes (limited to those that pose a significant threat):



$$E_i = E(m_i, L_i, r_i) \tag{A.1}$$

$m_i$ is given for a specific fault or as a discrete value sampled from a fault or a region that has a continuous (say, Gutenberg–Richter) distribution of events. $L_i$ is given as a point or a rectangular surface.

(2) Modeling of the annual rate ($r_i$) of each seismic source scenario. A time-independent Poisson process is used to convert $r_i$ into a probability:

$$P_{pos} = 1 - \exp(-r_i T) \tag{A.2}$$

where $T$ is the average return period.

(3) Modeling of an attenuation relationship. The attenuation relationship gives the ground-motion level as a function of magnitude and distance with other parameters, such as soil types:

$$\mathrm{PGA}(m, d, s, r) = \exp\left[ t_1 (m - a_1) - t_2 \ln(d^2 + a_2) + a_3 \right] \tag{A.3}$$

where $m$ is the magnitude, $d$ is the distance, and $a_1$, $a_2$, and $a_3$ are parameters estimated from historical data by experts. In practice, other forms can be derived from Equation (A.3) to account for the influence of other factors such as soil and landform type.

(4) Computation of the exceedance probability of PGA. First, we get the distribution of possible ground-motion levels for a seismic source scenario from the attenuation relationship:

$$P_i(\ln \mathrm{PGA}) = \frac{1}{\sigma_n \sqrt{2\pi}} \exp\left\{ -\frac{\left[ \ln \mathrm{PGA} - g(m_i, d_i) \right]^2}{2\sigma_n^2} \right\} \tag{A.4}$$

where $g(m_i, d_i)$ and $\sigma_n$ are the mean and standard deviation of $\ln \mathrm{PGA}$. Then, we get the exceedance probability by integration:

$$P_i(> \ln \mathrm{PGA}) = \frac{1}{\sigma_n \sqrt{2\pi}} \int_{\ln \mathrm{PGA}}^{\infty} \exp\left\{ -\frac{\left[ u - g(m_i, d_i) \right]^2}{2\sigma_n^2} \right\} \, du \tag{A.5}$$

Then, summing over all seismic source scenarios, we get the total annual rate of exceeding each $\ln \mathrm{PGA}$:

$$R_{tot}(> \ln \mathrm{PGA}) = \sum_{i=1}^{N} R_i(> \ln \mathrm{PGA}) = \sum_{i=1}^{N} r_i P_i(> \ln \mathrm{PGA}) \tag{A.6}$$

Then, using the Poisson distribution, we can compute the exceedance probability of each ground-motion level within $T$ years:

$$P(> \ln \mathrm{PGA}, T) = 1 - \exp(-R_{tot} T) \tag{A.7}$$

(5) Generation of the map of PGA with a certain level of exceedance probability. With the curve of $P(> \ln \mathrm{PGA}, T)$ from Equation (A.7), we can get the PGA for each site (cell) at a given level of exceedance probability, for example the PGA with a 10% chance of exceedance. Thus, with the simulated PGA value of each cell, we simulate the PGA grid surface at that level of exceedance probability.
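
The computations in steps (3) to (5) can be sketched for a single site (cell) as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the mean of ln PGA, $g(m_i, d_i)$, is assumed to be the exponent of Equation (A.3); $T$ is treated as the length of the time window in years, as in Equation (A.7); and the coefficient values, standard deviation, and scenarios are hypothetical numbers chosen only to make the example run.

```python
import math

# Hypothetical attenuation coefficients and scenarios, for illustration only.
COEFFS = {"t1": 0.9, "t2": 0.8, "a1": 6.0, "a2": 50.0, "a3": 1.5}
SIGMA = 0.6  # assumed standard deviation of ln PGA
SCENARIOS = [
    {"m": 6.0, "d": 10.0, "rate": 0.010},  # magnitude, distance, annual rate r_i
    {"m": 7.0, "d": 20.0, "rate": 0.002},
]

def attenuation_ln_pga(m, d, t1, t2, a1, a2, a3):
    """Exponent of Equation (A.3), used here as the mean of ln PGA."""
    return t1 * (m - a1) - t2 * math.log(d ** 2 + a2) + a3

def exceed_prob(ln_pga, mean, sigma):
    """Equation (A.5): upper tail of a normal distribution of ln PGA."""
    return 0.5 * math.erfc((ln_pga - mean) / (sigma * math.sqrt(2.0)))

def total_rate(ln_pga, scenarios, coeffs, sigma):
    """Equation (A.6): sum of r_i * P_i(> ln PGA) over all scenarios."""
    return sum(
        s["rate"]
        * exceed_prob(ln_pga, attenuation_ln_pga(s["m"], s["d"], **coeffs), sigma)
        for s in scenarios
    )

def poisson_exceed(ln_pga, scenarios, coeffs, sigma, years):
    """Equation (A.7): probability of at least one exceedance within `years`."""
    return 1.0 - math.exp(-total_rate(ln_pga, scenarios, coeffs, sigma) * years)

def pga_at_probability(target, scenarios, coeffs, sigma, years, lo=-10.0, hi=5.0):
    """Step (5): bisect for the PGA whose exceedance probability equals `target`.

    Assumes the probability is above `target` at ln PGA = lo and below it at
    ln PGA = hi; the exceedance curve decreases as ln PGA grows.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_exceed(mid, scenarios, coeffs, sigma, years) > target:
            lo = mid  # still exceeded too often: move to stronger ground motion
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# PGA with a 10% chance of exceedance in 50 years at this hypothetical site.
PGA_10_IN_50 = pga_at_probability(0.10, SCENARIOS, COEFFS, SIGMA, 50.0)
```

Repeating the last call for every cell, with the distances `d` recomputed per cell, would yield the PGA grid surface described in step (5).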
