Applied Soft Computing 11 (2011) 460–467
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc

On performance of case-based reasoning in Chinese business failure prediction from sensitivity, specificity, positive and negative values

Hui Li, Jie Sun
School of Business Administration, Zhejiang Normal University, 91 Sub-mailbox in P.O. Box 62, YingBinDaDao 688, Jinhua 321004, Zhejiang, PR China

Article history: Received 6 January 2008; received in revised form 25 March 2009; accepted 6 December 2009; available online 16 December 2009.

Keywords: Business failure prediction; Case-based reasoning; Sensitivity; Specificity; Positive and negative values; Support vector machines

Abstract: Case-based reasoning (CBR) is a machine learning technique of high performance in classification problems, and it is also a chief method in predicting business failure. Recently, several techniques have been introduced into the life-cycle of CBR for business failure prediction (BFP). The drawback of former researches on CBR-based BFP is that they only use total predictive accuracy when assessing CBR. In this research, we provide evidence on the performance of CBR in Chinese BFP from the views of sensitivity, specificity, positive and negative values. Data are collected from the Shanghai Stock Exchange and the Shenzhen Stock Exchange in China, and we present how the data are preprocessed from the view of data mining. The classical CBR model based on the Euclidean metric, the grey CBR model based on the grey coefficient metric, and the pseudo CBR model based on pseudo outranking relations are employed to make a comparative study of CBR's predictive performance in BFP. Meanwhile, support vector machine (SVM) is employed as a baseline model for comparison.
The results indicate that pseudo CBR produces significantly better performance in Chinese BFP than classical CBR and grey CBR on the whole, and it outperforms SVM marginally by total predictive accuracy and sensitivity, while it is not significantly worse than SVM by specificity.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Bankruptcy and business failure prediction (BFP) for financial companies and banks have been an important research area since the late 1960s. With the globalization of the world economy and the diversity of customers' demands, it is more risky for Chinese commercial banks and financial institutions to make a wrong decision. Identification of companies that are at risk remains a goal of auditors, investors, managers and employees. BFP for listed companies in China is both an interesting and important problem. There is no underlying theory of bankruptcy or business failure; thus, research on BFP has been a searching process for more accurate methods [8]. In the beginning, statistical techniques, such as multivariate discriminant analysis (MDA) [1,4] and Logit regression [21], were employed. Then, intelligent techniques, such as neural networks (NN) [25] and support vector machine (SVM) [33], began to be applied in this area, and they often outperformed statistical ones. Meanwhile, group decision techniques and information fusion were also employed in the area [28,37,38]. Data mining refers to mining knowledge from data [11]. Refs. [19,36] used the data mining approach to find hidden patterns in business data.

* Corresponding author. Tel.: +86 158 8899 3616. E-mail addresses: [email protected] (H. Li), [email protected] (J. Sun).

Case-based reasoning (CBR) could also be regarded as a data mining technique [26]. It was born around 1980 [3,31,32,34] and began to be used in BFP 13 years ago. The life-cycle of CBR consists of four Rs: case Retrieval, solution Reuse, solution Revise, and case library Retain. CBR systems can be applied for a variety of accounting tasks [7]. Refs. [8,15,16] respectively attempted to use CBR for BFP, with the result that CBR is not more applicable than comparative models. In order to improve the predictive performance of CBR in BFP, Ref. [27] introduced the analytic hierarchy process (AHP) into CBR and presented a framework under the mechanism of k-nearest neighbor (KNN). Refs. [47,48] successfully applied CBR for BFP with data collected from Australian firms. The majority of former researches employ the classical CBR algorithm with the Euclidean metric at its heart when dealing with the problem of BFP. Lately, Ref. [35] presented an improved BFP model employing a grey correlation metric and weighted KNN. Ref. [18] attempted to introduce pseudo outranking approaches into the retrieval process of CBR for BFP. The drawback of former researches is that they only use total predictive accuracy for performance evaluation of CBR.

CBR is similar to human beings' reasoning process. It can learn over time and reason with concepts that have not been fully defined or modeled. Finally, CBR can provide an explanation of how the prediction is produced. These advantages of CBR make it a chief predictive tool in BFP. The drawback of CBR lies in the computational cost needed in the process of case retrieval.

1568-4946/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2009.12.005




However, the majority of researches in the area of BFP use data sets with no more than 1000 samples [17]. It is also the case for BFP in China: the number of companies listed on the Shenzhen and Shanghai Stock Exchanges is just about 2000, so only limited samples can be collected for BFP of Chinese listed companies. This situation is suitable for those techniques that are good for small data sets or tools needing abundant computational costs. Thus, we concentrate on CBR-based BFP in this research and investigate the predictive performance of CBR in Chinese BFP from the views of sensitivity, specificity, positive and negative predictive values. Meanwhile, we also present how data are preprocessed from the view of data mining.

This paper is organized as follows: Section 2 gives a brief description of validation standards on classifiers' predictive performance. Section 3 describes the specific methodology of using CBR to predict business failure; CBRs deriving respectively from the Euclidean metric, the grey coefficient metric, and the pseudo outranking metric are summarized. Sections 4 and 5 present the experimental design and Chinese BFP from the view of data mining. Empirical results are discussed in Section 6. Section 7 concludes.

2. Assessment standards

Total predictive accuracy is a chief standard in assessing the performance of CBR models. However, there are other assessment standards that can be used for model evaluation. There are two types of errors in typical classification problems: Type I error and Type II error. A Type I error refers to predicting a company in business failure as healthy. A Type II error is committed when a healthy company is classified as one in failure. In traditional areas of pattern recognition, such as cancer diagnosis and voice identification, Type I and Type II errors have different costs: it is more cautious to predict normal cells as cancerous than to predict cancer cells as normal. When introducing the technique of classification into the area of BFP, it is not surely more cautious to predict normal companies as failed ones than to predict failed companies as healthy ones. Classifying a company in failure as a healthy one may lead to incorrect decisions and cause serious economic damage, while the misclassification of a healthy one may just cause additional investigations costing more time. However, there are some differences between the areas of traditional pattern recognition and BFP. Human beings and their wills are not involved in the former area, but they are in the latter: a company that is not in serious distress may be forced to run into failure just because human beings take it as one in failure and treat it accordingly. Whatever the case, distinguishing Type I and Type II errors provides more information.

Besides total predictive accuracy, which weights the two types of errors equally, four statistical indices can be calculated for the case of two financial state labels (failure and healthy): sensitivity (Se_D and Se_H), specificity (Sp_D and Sp_H), positive value (PPV_D and PPV_H), and negative value (NPV_D and NPV_H). They can compare the performance of predictive models from broader views. True positive (TP_i) is the number of companies whose financial states are correctly predicted as class i; true negative (TN_i) is the number of companies whose financial states do not belong to class i and are not predicted as class i; false positive (FP_i) is the number of companies erroneously predicted as class i; false negative (FN_i) is the number of companies of class i predicted as a different class. Consider that there are i (i > 1) classes. The four types of statistics can be computed by the following formulas [14,22].

Se_i = TP_i / (TP_i + FN_i)    (1)

Sp_i = TN_i / (TN_i + FP_i)    (2)

PPV_i = TP_i / (TP_i + FP_i)    (3)

NPV_i = TN_i / (TN_i + FN_i)    (4)

where Sp_D = Se_H, Sp_H = Se_D, NPV_D = PPV_H, NPV_H = PPV_D, the ratio of Type I errors = 1 − Se_D, and the ratio of Type II errors = 1 − Sp_D.
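For the two-class case (failure D, healthy H), Eqs. (1)-(4) can be sketched in code as follows; the function name and list-based data layout are illustrative, not from the paper:

```python
def class_statistics(y_true, y_pred, label):
    """Sensitivity, specificity, PPV and NPV for one class label, Eqs. (1)-(4)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    se = tp / (tp + fn)    # Eq. (1): sensitivity
    sp = tn / (tn + fp)    # Eq. (2): specificity
    ppv = tp / (tp + fp)   # Eq. (3): positive value
    npv = tn / (tn + fn)   # Eq. (4): negative value
    return se, sp, ppv, npv
```

On a toy prediction, the identity Sp_D = Se_H noted above can be checked directly.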

In this research, we use the four statistical indices to assess the predictive performance of CBR-based BFP. This provides a comprehensive understanding of the performance of the predictive models. Meanwhile, a statistical technique is employed to test the significance of the difference between each pair of comparative classifiers.

3. Methodology of CBR-based BFP

3.1. Concept model of CBR-based BFP

CBR focuses on how people solve new problems based on their past experiences. It arises out of research on cognitive science. The idea of CBR is appealing because it is similar to the problem-solving behavior of human beings. When employing CBR for BFP, the function of the case library is implemented by sample data; each sample expresses a case. The prediction of a company's business state is produced by integrating the label values of companies that have symptoms similar to the current one. Thus, we give the definition of the concept model of CBR-based BFP as follows.

Definition 1. The concept model of CBR-based BFP could be expressed as:

The concept model of prediction based on CBR = case library + similarity measure + majority voting.    (5)

The four Rs in CBR refer to Retrieval, Reuse, Revise, and Retain. When applying CBR in the area of prediction, two Rs, Retrieval and Reuse, are used. The process of Retrieval is commonly founded on the case library and the similarity measure. When all similar cases are retrieved, the process of Reuse generates a prediction by integrating the labels of the similar cases; commonly, the mechanism of majority voting is used.

The task of supervised learning with a training data set is to obtain the optimal parameters and structures of the various models. The chief task of supervised learning in CBR concerns the scale of the case library, the mechanism of the similarity measure, the means of integrating class labels, and the optimal parameters. The most commonly used type of CBR is based on distances between cases. Thus, we can give the definition of the concept model of CBR's supervised learning as follows.

Definition 2. The concept model of supervised learning for CBR-based BFP could be expressed as:

The concept model of supervised learning = scale of case library + type of similarity measure + type of majority voting + optimal parameter values.    (6)

where distance-based similarity measures include the traditional measure based on the Euclidean metric, the measure based on the grey coefficient metric, and the measure based on the pseudo outranking relations metric; the means of integrating class labels include pure majority voting and weighted majority voting.


Parameters to be optimized are correlated with the mechanism of the similarity measure.

The concept model of CBR-based BFP consists of the prediction model and the supervised learning model. In this research, we attempt to investigate the predictive performance of CBR from the views of sensitivity, specificity, positive and negative values. The two important factors in the concept model, i.e., the type of similarity and the optimal parameter values, differ among the CBRs; the remaining issues are set the same for all CBRs.

3.2. Model I: CBR with similarity measure based on Euclidean metric

The common CBR employs a similarity measure based on feature distance. Consider two cases, case a and case b. Let d_abk express the kth feature distance between the two cases, and let x_ak and x_bk denote the values of the kth feature of the two cases. The feature distance can be computed by the following formula.

d_abk = |x_ak − x_bk|    (7)

After the feature distance on each feature has been calculated, the Euclidean distance between the two cases can be computed, and the distance can be transferred into a similarity by subtracting it from 1. If features have different importance, let w_k express the weight of the kth feature. After all data values are scaled into the range of [−1, 1], the similarity between case a and case b can be calculated through the following formula, and the CBR based on the Euclidean metric is named Model I.

SIM_ab = 1 − [ (1/2) Σ_{k=1..m} ( w_k · d_abk / Σ_{k=1..m} w_k )² ]^{1/2}    (8)
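Eqs. (7) and (8) can be sketched as a short function; the name and plain-list data layout are illustrative, and feature values are assumed to be pre-scaled into [−1, 1] as the text requires:

```python
def euclidean_similarity(x_a, x_b, w):
    """Weighted Euclidean-metric similarity of Model I, Eqs. (7)-(8)."""
    total_w = sum(w)
    s = 0.0
    for xa, xb, wk in zip(x_a, x_b, w):
        d_abk = abs(xa - xb)                 # Eq. (7): feature distance
        s += (wk * d_abk / total_w) ** 2
    return 1.0 - (0.5 * s) ** 0.5            # Eq. (8)
```

Identical cases get similarity 1; with one feature pair at the extremes of [−1, 1] per dimension and two equally weighted features, the similarity drops to 0.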

3.3. Model II: CBR with similarity measure based on grey coefficient metric

The grey coefficient metric is a mechanism from grey theory, and this metric is calculated with the maximum and minimum feature distances. All grey correlation coefficients belong to the range of [0, 1]. After the grey coefficient metric has been calculated, case similarity can be computed from it. Let y_abk express the grey correlation degree between case a and case b. Assume that Ω(a) expresses the rest of the case library without case a, and b ∈ Ω(a). The grey correlation coefficient between two cases can be calculated by the following formula [35].

y_abk = ( inf_k |x_ak − x_Ω(a)k| + β · sup_k |x_ak − x_Ω(a)k| ) / ( d_abk + β · sup_k |x_ak − x_Ω(a)k| )    (9)


where inf_k |x_ak − x_Ω(a)k| and sup_k |x_ak − x_Ω(a)k| respectively express the minimum and maximum distances between case a and all cases in Ω(a) on the kth feature, and β ∈ [0, 1] is a tuning parameter.

After the grey correlation coefficients have been computed, we use Euclidean distance to transfer them into a similarity. The transformation can be expressed as the following formula, and the CBR model based on the grey coefficient metric is called Model II.

SIM_ab = 1 − [ Σ_{k=1..m} ( w_k · (1 − y_abk) / Σ_{k=1..m} w_k )² ]^{1/2}    (10)
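A sketch of Eqs. (9) and (10) follows. Here the inf/sup of Eq. (9) are taken over all cases in Ω(a) and all features, which is one common reading of grey relational coefficients; the paper's exact convention may differ, and the names and default β are illustrative:

```python
def grey_similarity(x_a, x_b, library, w, beta=0.5):
    """Model II similarity, Eqs. (9)-(10). library is Omega(a), excluding case a."""
    # Feature distances between case a and every case in the library.
    deltas = [[abs(xa - xc) for xa, xc in zip(x_a, c)] for c in library]
    d_min = min(min(row) for row in deltas)   # inf over cases and features
    d_max = max(max(row) for row in deltas)   # sup over cases and features
    total_w = sum(w)
    s = 0.0
    for xa, xb, wk in zip(x_a, x_b, w):
        d_abk = abs(xa - xb)
        y_abk = (d_min + beta * d_max) / (d_abk + beta * d_max)   # Eq. (9)
        s += (wk * (1.0 - y_abk) / total_w) ** 2
    return 1.0 - s ** 0.5                      # Eq. (10)
```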

3.4. Model III: CBR with similarity measure based on pseudo outranking relations

Concepts of outranking relations derive from the European school of multi-criteria decision making [5,6,29,30,40]. Pseudo preference is a function from the outranking approach of ELECTRE III. Ref. [18] presents the use of pseudo outranking relations in CBR to construct a new similarity measure. When we claim that case a and case b are similar, we also mean that case a is indifferent to case b. Two positive parameters of outranking relations, q and p (q < p), are introduced as the indifference and preference thresholds. They are both in the range of [0, 2] if all feature values are in the range of [−1, 1]. Meanwhile, a veto indicator between case a and case b on the kth feature, named d_k(a, b), is defined by introducing a veto threshold v (v > p). The similarity measure of CBR based on pseudo outranking relations can be expressed as the following formula.

SIM_ab = SIM̂_ab, if d_k(a, b) ≤ SIM̂_ab for every feature;
SIM_ab = SIM̂_ab · Π_{k ∈ {k : d_k(a,b) > SIM̂_ab}} (1 − d_k(a, b)) / (1 − SIM̂_ab), if d_k(a, b) > SIM̂_ab for at least one feature.    (11)

where

d_k(a, b) = 0, if d_abk ≤ p;
d_k(a, b) = (d_abk − p) / (v − p), if p < d_abk ≤ v;
d_k(a, b) = 1, if d_abk > v.    (12)

SIM̂_ab = Σ_{k=1..m} w_k · c_k(a, b) / Σ_{k=1..m} w_k    (13)

where

c_k(a, b) = 1, if d_abk ≤ q;
c_k(a, b) = (p − d_abk) / (p − q), if q < d_abk ≤ p;
c_k(a, b) = 0, if d_abk > p.    (14)

The CBR model based on pseudo outranking relations is called Model III.
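The pieces of Model III can be sketched together; the discordance ramp in d_k follows the standard ELECTRE III form, and function names are illustrative:

```python
def concordance(d_abk, q, p):
    """Eq. (14): per-feature concordance, 1 inside the indifference band."""
    if d_abk <= q:
        return 1.0
    if d_abk <= p:
        return (p - d_abk) / (p - q)
    return 0.0

def discordance(d_abk, p, v):
    """Eq. (12): per-feature veto indicator, 1 beyond the veto threshold."""
    if d_abk <= p:
        return 0.0
    if d_abk <= v:
        return (d_abk - p) / (v - p)
    return 1.0

def pseudo_similarity(x_a, x_b, w, q, p, v):
    """Eqs. (11) and (13): weighted concordance discounted by strong vetoes."""
    total_w = sum(w)
    d = [abs(xa - xb) for xa, xb in zip(x_a, x_b)]
    sim_hat = sum(wk * concordance(dk, q, p) for wk, dk in zip(w, d)) / total_w
    sim = sim_hat
    for dk in d:
        dd = discordance(dk, p, v)
        if dd > sim_hat:                      # Eq. (11), second branch
            sim *= (1.0 - dd) / (1.0 - sim_hat)
    return sim
```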

3.5. Generating a prediction using CBR

The mechanism of distance-based CBR in generating a prediction is KNN. Thus, the first issue is to determine how many nearest neighbors should be retrieved from the case library, whose labels are then combined to generate the prediction. An effective means to control the number of nearest neighbors is to set a threshold on similarity. We use a relative threshold: let per be a threshold percentage of the similarity of the most similar case to the current one. Assume that K expresses the number of nearest neighbors and max(SIM_aΩ(a)) denotes the similarity value of the most similar case. The concept model determining the value of K can be expressed as follows.

K = the number of cases meeting the condition: SIM_aΩ(a) ≥ per × max(SIM_aΩ(a))    (15)


Thus, we define the subset of Ω(a) consisting of the K nearest neighbors as Ω(a − K). The expected label value of case a, expressed by LV_a, can be computed by the following formula.

LV_a = LV_b, if Σ_{LV_Ω(a−K) = LV_b} SIM_aΩ(a−K) = max( Σ_{same LV} SIM_aΩ(a−K) )    (16)

where LV_b denotes the predicted label value of case b. The former summation of similarity values, on the condition LV_Ω(a−K) = LV_b, is taken over the cases in Ω(a − K) with the same label as case b. The latter summation, on the condition of the same LV, is taken over the cases in Ω(a − K) sharing one label. The maximum value is used to produce the predictive label of case a. Similarity values thus serve as the weights of the various labels, forming a weighted KNN.
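Eqs. (15) and (16) amount to threshold-based neighbour selection followed by similarity-weighted voting, which can be sketched as follows (names illustrative):

```python
def predict_label(sims, labels, per):
    """sims[i]: similarity of case a to library case i; labels[i]: its class."""
    max_sim = max(sims)
    # Eq. (15): keep the K cases whose similarity reaches per * max_sim.
    neighbours = [(s, l) for s, l in zip(sims, labels) if s >= per * max_sim]
    # Eq. (16): sum similarities per label and return the heaviest label.
    votes = {}
    for s, l in neighbours:
        votes[l] = votes.get(l, 0.0) + s
    return max(votes, key=votes.get)
```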

4. Experimental design

4.1. Validation techniques

Resubstitution, the holdout method, and V-fold cross-validation are three common techniques used to estimate the predictive performance of models [11,42]. Using all data to construct a classifier and then estimating the performance of the model on the same data results in over-fitting. In the holdout method, the data set is randomly split into a training set for classifier construction and a testing set for performance evaluation; the main disadvantage of this technique is that it reduces the effective sample size. In V-fold cross-validation, all data are randomly split into V exclusive folds. Each time, one fold is held out and the rest are used as the training set to construct a classifier, whose performance is then evaluated on the held-out data. This operation is repeated V times, producing V predictive results, and their mean value gives the final predictive accuracy.

Leave-one-out cross-validation (LOO-CV) is the specific implementation of V-fold cross-validation obtained by taking V as the number of cases. The common argument for employing five-fold or 10-fold cross-validation instead is that the computational cost of LOO-CV is unacceptable on the large data sets typical in pattern recognition; as a result, LOO-CV has low efficiency. Even so, there are still some researches [2,9,10,39] that focus on LOO-CV. CBR-based BFP is a case with small sample size, especially when the strategy of pairing is employed to collect initial data. As we all know, bankruptcy or business failure is a rare event in the population; for example, only about 4% of large publicly traded firms in the USA declare bankruptcy [8]. Thus, the efficiency of LOO-CV is acceptable in the area of BFP. Another reason why LOO-CV is preferred is that it gives an unbiased estimate of performance [20], while the other techniques do not. Thus, we use LOO-CV to estimate the predictive performance of the various models.
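The LOO-CV procedure described above can be sketched as a loop that holds out each case once; `classify` stands for any of the models in this paper, and the scalar 1-NN helper is purely illustrative:

```python
def loo_cv_accuracy(cases, labels, classify):
    """Hold each case out once, predict it from the rest, and average."""
    correct = 0
    for i in range(len(cases)):
        train_x = cases[:i] + cases[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, cases[i]) == labels[i]:
            correct += 1
    return correct / len(cases)

def nn1(train_x, train_y, query):
    """Illustrative 1-NN classifier on scalar features (not from the paper)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]
```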

4.2. Model IV: the baseline model of SVM

SVM is an excellent model for BFP. Abundant researches [12,13,23,24,33,45] have validated that SVM is more applicable than NN, MDA, and logit for BFP. The objective of SVM is to find an optimal separating plane, which not only separates training samples with the least error but also maximizes the margin width between the two parallel bounding planes. To fulfill this purpose, kernel functions are employed to map data into a high-dimensional space. The three commonly used kernel functions are the polynomial kernel, (x·t + 1)^p, the Gaussian kernel, exp(−v||x − t||²), and the sigmoid kernel, tanh(k·x·t + δ). In this research, results of SVM-based BFP are utilized as a comparison.
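The three kernels listed above can be sketched directly; x and t are feature vectors, and p, v, k, delta are the kernel parameters named in the text:

```python
import math

def dot(x, t):
    return sum(a * b for a, b in zip(x, t))

def polynomial_kernel(x, t, p):
    return (dot(x, t) + 1) ** p               # (x.t + 1)^p

def gaussian_kernel(x, t, v):
    return math.exp(-v * sum((a - b) ** 2 for a, b in zip(x, t)))  # exp(-v||x-t||^2)

def sigmoid_kernel(x, t, k, delta):
    return math.tanh(k * dot(x, t) + delta)   # tanh(k.x.t + delta)
```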


4.3. Classifier training and parameter searching

The four models used in this research belong to two different types. The three CBR models belong to lazy learning, in which no specific model is generated; the objective of CBR's learning process is to find the optimal parameters, which directly determine the results of case retrieval and case reuse. Taking the validation technique of LOO-CV into consideration, one sample is held out and the rest are used as the case library for the CBRs in the training process. Assuming the number of available samples is N, this operation is repeated N times to find the parameter values that produce the best average predictive accuracy. Refer to Refs. [41,44,46,47,48,50,52] for more details about how to train a CBR. For SVM, the training process finds a specific model; refer to Refs. [43,45,49,51] for more details about how to train a SVM. We implemented all four models on the platform of Matlab, and the toolkit of LibSVM was used for SVM training and predicting.

For CBR-based BFP, we put equal importance on each feature obtained by data preprocessing. Thus, there are no parameters to be optimized in CBR with the Euclidean metric; there is one parameter, β, to be optimized in CBR with the grey coefficient metric; and there are three parameters, q, p, and v, to be optimized in CBR with the pseudo outranking metric. Meanwhile, the number of nearest neighbors, determined by per, should be optimized. For SVM-based BFP, we employed the Gaussian kernel function, as the majority of researches on SVM-based BFP do; there are two parameters, C and v, to be optimized. To obtain optimized parameter values, we employed the grid-search technique over the candidate spaces. Optimal values of β, q, p, v, and per were respectively searched in [0.1:0.05:1], [0:0.01:0.06], [0.5:0.01:1], [p:0.01:1], and [0.85:0.01:1]. Optimal values of C and v were both searched in [2^(−10), 2^(−9), ..., 2^9, 2^10]. Total predictive accuracy was used as the assessment in parameter optimization.
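The grid search over Matlab-style [start:step:stop] ranges can be sketched as nested loops; `evaluate` is a placeholder for the LOO-CV total predictive accuracy under given parameter values, shown here for β and per only:

```python
def frange(start, step, stop):
    """Expand a Matlab-style [start:step:stop] range into a list of values."""
    vals, x = [], start
    while x <= stop + 1e-6:
        vals.append(round(x, 10))   # snap accumulated float error
        x += step
    return vals

def grid_search(evaluate):
    """Exhaustively evaluate each grid point and keep the best pair."""
    best, best_acc = None, -1.0
    for beta in frange(0.1, 0.05, 1.0):     # grid for the grey parameter
        for per in frange(0.85, 0.01, 1.0): # grid for the neighbour threshold
            acc = evaluate(beta, per)
            if acc > best_acc:
                best, best_acc = (beta, per), acc
    return best, best_acc
```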

5. Chinese business failure prediction (BFP)

There are five steps in data mining when using CBR for BFP: data collection, data preprocessing, constructing the CBR model, predictive performance assessment, and prediction. Data collection is the first step, in which relevant data are collected from publicly revealed information of listed companies. Features of the data set may include financial ratios or items, the class of financial state, and other essential information such as the operating strategy to help the company get out of business failure. Data preprocessing consists of data cleaning, data integration, data transformation, and data reduction. Constructing the CBR model is to learn a classification pattern from the preprocessed data and build a CBR model representing the classification knowledge for BFP. Performance assessment estimates the performance of CBR through the training data set and the validation data set. If the predictive performance of CBR is acceptable, it can be utilized to predict business failure.

5.1. Data collection

The China Securities Supervision and Management Committee specially treats Chinese listed companies that have had negative net profit in two consecutive years. We regard these companies as in business failure, and we regard companies that have never been specially treated as healthy samples. Thirty-five financial ratios were used to represent cases. They cover activity ratios, long-term debt ratios, short-term debt ratios, profitability ratios, growth ratios, and structural ratios. We initially collected 135 pairs of failure and healthy companies listed in the Shenzhen Stock Exchange and

Table 2. Correlation coefficient, tolerance and variance inflation factor of financial ratios.

Variables   X1      X2   X3   X4   X5   TOL    VIF
X1          1       –    –    –    –    0.94   1.06
X2          0.156   1    –    –    –    0.91   1.10


Shanghai Stock Exchange. The earlier a company that falls into business failure is predicted, the less cost or loss the company causes to society, investors, and employees. Thus, we attempt to investigate the predictive performance of the various models 3 years before failure. Suppose that t expresses the year of a company's falling into failure; the year we concentrate on is t − 3.

5.2. Data preprocessing

Data preprocessing improves the quality of the data and the efficiency and ease of the training process. There are four commonly used techniques of data preprocessing: data cleaning, data integration, data reduction, and data transformation. Whether the data obtained after these four processes are suitable for BFP should be verified; thus, a process named data verification is added to data preprocessing.

.2.1. Data cleaningData cleaning refers to fill missing values, smooth noisy data,

dentify outliers, and resolve inconsistencies. In this process, werst filtered out duplicated data, then eliminated sample compa-ies that miss at least one financial ratio value, and finally excludedompanies with financial ratios deviating from the mean value asuch as three times of S.D. The final number of sample companies

s 182.

.2.2. Data integrationThis process merges the following three types of data for BFP

nto a data cube, i.e., business symptoms of listed companies, classabels whether the company runs into business failure or not, andelated operating strategies to help the company in failure to getut.

.2.3. Data reductionThis process produces a reduced representation of the data set.

he new data representation is much smaller in volume, but itan produce better predictive performance. The strategy of dimen-ion reduction realized by removing irrelevant, weakly relevantr redundant features through stepwise discriminant analysis wassed in this research for data reduction. There are some researcheshat use feature subsets picked out by domain experts. However,his type of process is a time-consuming task and it is also difficulto carry out effectively, because the behavior of the data for BFP isot well known. The procedure of stepwise discriminant analysistarted with the full or empty set of 35 features. Performance ofultivariate discriminant function was employed as assessment.t each step, it removed the worst feature or added the best fea-

ure remaining in the initial set. Finally, we got five features. Theyre shown in Table 1.

.2.4. Data verification

Whether or not the data set after data preprocessing is suitable

or BFP should be verified. The two techniques for data verificationnclude the test of multi-collinearity among feature values and sig-ificance test on difference between samples. Tolerance (TOL) andariance inflation factor (VIF) were used to carry out verification of

able 1seful feature subset.

Data sample Variables Meaning

t − 3 X1 Current asset turnoverX2 Fixed asset turnoverX3 The ratio of cash to current liabilityX4 Asset-liability ratioX5 The proportion of current liability

X3 0.186 −0.087 1 – – 0.85 1.18X4 0.067 0.235 −0.388 1 – 0.75 1.33X5 −0.095 −0.021 −0.136 −0.170 1 0.92 1.07

multi-collinearity among the five features. Statistics of useful datafor BFP is shown in Table 2.

It is commonly assumed that there is multi-collinearity among features when TOL < 0.1 or VIF > 10. Hence, we can find that there is no multi-collinearity among the five features, because all TOL values are higher than 0.1 and all VIF values are no more than 2.
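The TOL/VIF check can be illustrated for the simplest two-feature case, where the R-squared of one feature regressed on the other equals their squared Pearson correlation, so TOL = 1 − r² and VIF = 1/TOL. The data below are made up for illustration, not the paper's:

```python
# Tolerance and VIF for the two-feature case (illustrative data, not the paper's).
# With only two features, R^2 of one regressed on the other is r^2, so
# TOL = 1 - r^2 and VIF = 1 / TOL.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

x1 = [0.5, 1.1, 0.8, 1.4, 0.9, 0.7]
x2 = [3.9, 2.4, 4.1, 2.8, 3.2, 2.6]

r = pearson(x1, x2)
tol = 1 - r ** 2  # tolerance
vif = 1 / tol     # variance inflation factor
print(tol > 0.1 and vif < 10)  # True: no multi-collinearity is flagged here
```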

The t statistic is commonly used to test the significance of differences between samples in failure and in health. There is limited information contained for BFP in the data set 3 years before the company is specially treated. Thus, we make the assumption that there are significant differences between samples in failure and those in health if the significance level (Sig.) is no more than 0.1. Statistical characteristics of the data are shown in Table 3, from which we can find that the assumption is accepted at the significance level of 0.1.
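A minimal sketch of the pooled two-sample t statistic used in such a comparison, with made-up ratio values rather than the paper's data:

```python
# Pooled two-sample t statistic for testing whether a financial ratio differs
# between failed and healthy samples. The numbers are illustrative only.

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

healthy = [0.55, 0.62, 0.48, 0.70, 0.64]
failed = [1.05, 1.20, 0.98, 1.15, 1.10]
t = pooled_t(failed, healthy)
print(abs(t) > 2.306)  # True: exceeds the two-tailed 5% critical value for df = 8
```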

5.2.5. Data transformation

The three CBR models, i.e. the classical CBR based on the Euclidean metric, the grey CBR based on the grey coefficient metric, and the pseudo CBR based on pseudo outranking relations, all belong to distance-based methods. In this process, we attempt to keep the original distribution of the data set and to normalize the data in order to prevent features with an initially large range from outweighing those with initially smaller ranges. To fulfill this purpose, we scaled the data to the range of [−1, 1] by maximum normalization. Suppose max_k |x′_ik| is the maximum absolute value of a feature F_k, and x′_ik denotes the value of case c_i at feature F_k. Max normalization maps x′_ik to x_ik in the range of [−1, 1] by computing

x_ik = x′_ik / max_k |x′_ik|    (17)
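Eq. (17) can be sketched directly; each feature column is divided by its largest absolute value, which maps every value into [−1, 1]:

```python
# Maximum normalization of Eq. (17): each feature value is divided by the
# largest absolute value of that feature across all cases.

def max_normalize(cases):
    n_features = len(cases[0])
    scale = []
    for k in range(n_features):
        col_max = max(abs(c[k]) for c in cases)
        scale.append(col_max if col_max else 1.0)  # guard against an all-zero feature
    return [[c[k] / scale[k] for k in range(n_features)] for c in cases]

cases = [[2.0, -50.0], [1.0, 25.0], [-4.0, 10.0]]
print(max_normalize(cases))  # [[0.5, -1.0], [0.25, 0.5], [-1.0, 0.2]]
```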

If all data are ready, CBRs and SVM can be used for Chinese BFP.
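The retrieval step of the classical CBR model can be sketched as a nearest-neighbor search under the Euclidean metric, reusing the retrieved case's class label. This is a minimal illustration under stated assumptions (k = 1 majority voting; the case library and query values are made up), not the paper's full implementation:

```python
# Minimal classical-CBR classifier: retrieve the k nearest cases under the
# Euclidean metric and reuse the majority class label. Library and query
# values are hypothetical.

from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cbr_predict(library, query, k=1):
    nearest = sorted(library, key=lambda case: euclidean(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

library = [
    ([0.5, -1.0], "healthy"),
    ([0.25, 0.5], "healthy"),
    ([-1.0, 0.2], "failed"),
]
print(cbr_predict(library, [-0.9, 0.1]))  # failed
```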

6. Results and discussions

In this section, the predictive abilities of the three CBRs and the comparative model of SVM are compared. Table 4 lists the total predictive accuracy and the ratios of Type I error and Type II error of each model. Table 5 describes Se, Sp, PPV, and NPV obtained with the four different models, and their graphical representations are illustrated in Fig. 1.

From Table 4 we can find that Model III achieved higher total predictive accuracy than Model I, Model II, and Model IV by 5.5%, 4.9%, and 0.5%, respectively. As Table 5 shows, Model III achieved higher NPVD than Model I, Model II, and Model IV by 8.48%, 3.03%, and 3.83%. Model III also achieved higher SeD than Model I, Model II, and Model IV by 7.61%, 1.09%, and 4.35%. Model III achieved higher SpD than Model I and Model II by 3.33% and 8.89%, and achieved lower SpD than Model IV by 3.33%. And Model III achieved higher PPVD than Model I and Model II by 3.8% and 5.32%, and produced lower PPVD than Model IV by 1.18%. We can find that all of the four models achieved a lower ratio of Type I error than of Type II error. This result may stem from characteristics of the data set and the searching method for parameter optimization. Model III achieved the highest hit ratios of total accuracy, NPVD, and SeD, and produced the second highest ratios of SpD and PPVD. Model III also produced lower hit ratios on both types of error rates than Model I and Model II. If Type I error is considered more important, Model III outperforms Model IV by 4.35%.

The McNemar test was utilized to examine whether or not Model III has statistically superior predictive ability to the other three models. The three standards of total accuracy, Se, and Sp could be analyzed statistically because the numbers of total samples, samples in failure, and healthy samples are constant values in the experiment. Hence, statistical analysis could be carried out by frequency in LOO-CV. Denominators of PPV and NPV, i.e. TP + FP and TN + FN, are variables determined by predictive results. As a result, statistical analysis on them cannot be realized by frequency. Thus, the former three standards were used for statistical analysis. Table 6 shows the results of the significance test on the base of total accuracy. Table 7 lists the results of the significance test on the base of Type I error (SeD), and Table 8 lists the results of the significance test on the base of Type II error (SpD).

As Table 6 shows, Model III is statistically better than Model I and Model II at the significance levels of 5% and 10%, respectively. And Model III is at least as good as Model IV in statistics by total predictive ability. Meanwhile, Model I and Model II achieved almost the same predictive performance in statistics. Model IV outperforms Model I at the significance level of 10%, and Model IV does not statistically outperform Model II significantly. From Table 7 we can find that Model III outperforms Model I at the significance level of 5%, and Model III is at least as good as Model II by Type I errors. Though Model II and Model III outperform Model IV, the difference is not statistically significant. From Table 8 we can find that Model III achieved better results than Model II by Type II errors at the significance level of 10%. At the same time, Model IV outperforms Model I by Type II errors at the significance level of 10% and outperforms Model II at the significance level of 1%.

Statistical characteristics on whether or not Model III outperforms the other three models by the three standards of total predictive accuracy, Type I error and Type II error are listed in Table 9.

Fig. 1. Statistical indices estimated by the four models for BFP.

Table 3
t-Values and statistics for input variables.

Variables  Mean (S.D.) of companies in health  Mean (S.D.) of companies in failure  t-Value  Sig. (two-tailed)
X1         0.5990 (0.7774)                     1.1298 (0.4063)                      5.676    0.000***
X2         2.4380 (7.5598)                     3.9247 (3.6994)                      1.670    0.095*
X3         0.0220 (0.3048)                     0.2060 (0.1735)                      4.922    0.000***
X4         0.4803 (0.1439)                     0.3921 (0.1512)                      −3.965   0.000***
X5         0.9231 (0.1481)                     0.8828 (0.1081)                      −1.690   0.091*

* Significant at the level of 10%.
*** Significant at the level of 1%.

Table 4
Performances of the four different models.

                             Model I  Model II  Model III  Model IV
Mean accuracy (%)            73.6     74.2      79.1       78.6
Ratio of Type I errors (%)   18.48    11.96     10.87      15.22
Ratio of Type II errors (%)  34.44    40.00     31.11      27.78

Table 5
Se, Sp, PPV, and NPV obtained with the four different models.

Models     Statistical indices  Failure  Healthy
Model I    NPV (%)              77.63    70.75
           Se (%)               81.52    65.56
           Sp (%)               65.56    81.52
           PPV (%)              70.75    77.63
Model II   NPV (%)              83.08    69.23
           Se (%)               88.04    60.00
           Sp (%)               60.00    88.04
           PPV (%)              69.23    83.08
Model III  NPV (%)              86.11    74.55
           Se (%)               89.13    68.89
           Sp (%)               68.89    89.13
           PPV (%)              74.55    86.11
Model IV   NPV (%)              82.28    75.73
           Se (%)               84.78    72.22
           Sp (%)               72.22    84.78
           PPV (%)              75.73    82.28

Table 6
p-Values of the four models by total accuracy.

           Model I  Model II  Model III  Model IV
Model I    1        1.000     0.021**    0.093*
Model II   –        1         0.078*     0.185
Model III  –        –         1          1.000
Model IV   –        –         –          1

* Significant at the level of 10%.
** Significant at the level of 5%.

Table 7
p-Values of the four models by SeD.

           Model I  Model II  Model III  Model IV
Model I    1        0.180     0.039**    0.607
Model II   –        1         1.000      0.581
Model III  –        –         1          0.344
Model IV   –        –         –          1

** Significant at the level of 5%.

Table 8
p-Values of the four models by SpD.

           Model I  Model II  Model III  Model IV
Model I    1        0.227     0.453      0.070*
Model II   –        1         0.057*     0.007***
Model III  –        –         1          0.375
Model IV   –        –         –          1

* Significant at the level of 10%.
*** Significant at the level of 1%.
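The four indices of Table 5 are all ratios of confusion-matrix counts, with failure treated as the positive class. A minimal sketch (the counts below are illustrative, chosen to be consistent with 92 failure and 90 healthy samples and Model III's reported error ratios):

```python
# Se, Sp, PPV, and NPV from confusion-matrix counts, with failure as the
# positive class. Counts are illustrative, not taken verbatim from the paper.

def indices(tp, fn, tn, fp):
    return {
        "Se": tp / (tp + fn),   # sensitivity: share of failed firms detected
        "Sp": tn / (tn + fp),   # specificity: share of healthy firms detected
        "PPV": tp / (tp + fp),  # positive predictive value
        "NPV": tn / (tn + fn),  # negative predictive value
    }

print(indices(tp=82, fn=10, tn=62, fp=28))
```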

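The McNemar test compares two classifiers on the same samples through their discordant predictions, i.e. the samples on which exactly one of the two is correct. A sketch using the chi-square approximation with continuity correction (the discordant counts are made up):

```python
# McNemar's test on paired predictions: only the discordant pairs matter.
# b and c are illustrative counts, not from the paper.

import math

def mcnemar_p(b, c):
    # b, c: counts of samples where exactly one of the two models is correct.
    # Chi-square approximation with continuity correction, df = 1.
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Two-sided p-value for a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

p = mcnemar_p(b=21, c=7)
print(p < 0.05)  # True: the two hypothetical models differ at the 5% level
```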

Table 9
Statistical characteristics of the fact that Model III outperforms the other models.

           Total accuracy                                         SeD                                                    SpD
Model I    Model III is better at the significance level of 5%.   Model III is better at the significance level of 5%.   Model III is marginally better.
Model II   Model III is better at the significance level of 10%.  Model III is at least as good as it.                   Model III is better at the significance level of 10%.
Model IV   Model III is at least as good as it.                   Model III is marginally better.                        Model III is marginally worse.

Finally, we find from Table 9 that Model III achieved significantly better results than Model I and Model II on the whole, and Model III outperforms Model IV marginally by total predictive accuracy and Type I errors, while Model III is not significantly worse than Model IV by Type II errors.

7. Conclusion

The conclusion of this study is that the pseudo CBR on the base of pseudo outranking relations can achieve acceptable results from the various views of sensitivity, specificity, positive and negative values. The predictive performance of the CBR system 3 years before failure has been improved significantly by using pseudo outranking relations. BFP is presented from the view of data mining, i.e., data collection, data preprocessing, model construction, and performance assessment. In the process of data preprocessing, we add the process of data verification to carry out statistical analysis on whether or not the useful data are suitable for data mining. LOO-CV and the four statistical indices, i.e. sensitivity, specificity, positive and negative predictive values, are employed to assess the predictive performance of the various CBR models. From the results of the experiment, we can conclude that the pseudo CBR on the base of pseudo outranking relations offers a viable approach for Chinese BFP. Empirical results show that pseudo CBR offers significantly better predictive performance than the classical CBR on the base of the Euclidean metric and the grey CBR on the base of the grey coefficient metric. Pseudo CBR also outperforms SVM by total predictive accuracy and Type I error, though the difference is not statistically significant.

This research also has some limitations. Conclusions drawn are on the foundation of the experiment. All discussions should be considered under the environment of the assessment standards, experimental design, data used and data preprocessing. Of course, the generalization of pseudo CBR should be tested further by applying it to some other problems or using data for business failure from other countries.

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (#70801055), and the Zhejiang Provincial Natural Science Foundation of China (#Y6090392). The authors gratefully thank the anonymous referees for their useful comments and the editors for their work.

References

[1] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (4) (1968) 589–609.
[2] S. An, W. Liu, S. Venkatesh, Fast cross-validation algorithms for least squares support vector machine and kernel ridge regression, Pattern Recognition 40 (2007) 2154–2162.
[3] R. Barletta, An introduction to case-based reasoning, AI Expert 8 (1991) 42–49.
[4] W. Beaver, Financial ratios as predictors of failure, Journal of Accounting Research 4 (1966) 71–111.
[5] J.P. Brans, Ph. Vincke, A preference ranking organization method: the PROMETHEE method for multiple criteria decision making, Management Science 31 (6) (1985) 647–656.
[6] D. Bouyssou, M. Pirlot, A characterization of concordance relations, European Journal of Operational Research 167 (2005) 427–443.


[7] C.E. Brown, U. Gupta, Applying case-based reasoning to the accounting domain, Intelligent Systems in Accounting, Finance and Management 3 (1994) 205–221.
[8] S.M. Bryant, A case-based reasoning approach to bankruptcy prediction modeling, Intelligent Systems in Accounting, Finance and Management 6 (1997) 195–214.
[9] G. Cawley, N. Talbot, Efficient leave-one-out cross-validation of kernel Fisher discriminant classifiers, Pattern Recognition 36 (2003) 2585–2592.
[10] G. Cawley, N. Talbot, Fast exact leave-one-out cross-validation of sparse least-squares support vector machines, Neural Networks 17 (10) (2004) 1467–1475.
[11] J.-W. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Mateo, 2001.
[12] Z.-S. Hua, Y. Wang, X.-Y. Xu, et al., Predicting corporate financial distress based on integration of support vector machine and logistic regression, Expert Systems with Applications 33 (2) (2007) 434–440.
[13] X.-F. Hui, J. Sun, An application of support vector machine to companies' financial distress prediction, Lecture Notes in Artificial Intelligence 3885 (2006) 274–282.
[14] I. Jekova, G. Bortolan, I. Christov, Assessment and comparison of different methods for heartbeat classification, Medical Engineering and Physics 30 (2) (2008) 248–257.
[15] H. Jo, I. Han, Integration of case-based forecasting, neural network, and discriminant analysis for bankruptcy prediction, Expert Systems with Applications 11 (4) (1996) 415–422.
[16] H. Jo, I. Han, H. Lee, Bankruptcy prediction using case-based reasoning, neural network and discriminant analysis, Expert Systems with Applications 13 (2) (1997) 97–108.
[17] P.R. Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques: a review, European Journal of Operational Research 180 (4) (2007) 1–28.
[18] H. Li, J. Sun, Hybridizing principles of the Electre method with case-based reasoning for data mining, European Journal of Operational Research 197 (2009) 214–224.
[19] F.-Y. Lin, S. McClean, A data mining approach to the prediction of corporate failure, Knowledge-Based Systems 14 (3–4) (2001) 189–195.
[20] A. Lunts, V. Brailovskiy, Evaluation of attributes obtained in statistical decision rules, Engineering Cybernetics 3 (1967) 98–109.
[21] D. Martin, Early warning of bank failure: a Logit regression approach, Journal of Banking and Finance 1 (1977) 249–276.
[22] J. Michaelis, S. Wellek, J.L. Willems, Reference standards for software evaluation, Methods of Information in Medicine 29 (1990) 289–297.
[23] J.-H. Min, Y.-C. Lee, Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Systems with Applications 28 (2005) 603–614.
[24] S.-H. Min, J.-M. Lee, I. Han, Hybrid genetic algorithms and support vector machines for bankruptcy prediction, Expert Systems with Applications 31 (2006) 652–660.
[25] M. Odom, R. Sharda, A neural networks model for bankruptcy prediction, in: Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, 1990, pp. 163–168.
[26] S.K. Pal, S. Shiu, Foundations of Soft Case-Based Reasoning, Wiley, New Jersey, 2004.
[27] C.-S. Park, I. Han, A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction, Expert Systems with Applications 23 (3) (2002) 255–264.
[28] V. Ravi, H. Kurniawan, P. Thai, et al., Soft computing system for bank performance prediction, Applied Soft Computing 8 (2008) 305–315.
[29] B. Roy, Problems and methods with multiple objective functions, Mathematical Programming 1 (1971) 239–266.
[30] B. Roy, The outranking approach and the foundations of ELECTRE methods, Theory and Decision 31 (1) (1991) 49–73.
[31] R. Schank, Dynamic Memory: A Theory of Learning in Computers and People, Cambridge University Press, New York, 1982.
[32] R. Schank, R.P. Abelson, Scripts, Plans, Goals and Understanding, Erlbaum, Hillsdale, NJ, 1977.
[33] K.-S. Shin, T.-S. Lee, H.-J. Kim, An application of support vector machines in bankruptcy prediction model, Expert Systems with Applications 28 (1) (2005) 127–135.
[34] S.C.K. Shiu, S.K. Pal, Case-based reasoning: concepts, features and soft computing, Applied Intelligence 21 (2004) 233–238.
[35] J. Sun, X.-F. Hui, Financial distress prediction based on similarity weighted voting CBR, in: X. Li, R. Zaiane, Z. Li (Eds.), Advanced Data Mining and Applications, Springer-Verlag, Berlin, 2006, pp. 947–958.


[36] J. Sun, H. Li, Data mining method for listed companies' financial distress prediction, Knowledge-Based Systems 21 (2008) 1–5.
[37] J. Sun, H. Li, Financial distress early warning based on group decision making, Computers & Operations Research 36 (3) (2009) 885–906.
[38] J. Sun, H. Li, Listed companies' financial distress prediction based on weighted majority voting combination of multiple classifiers, Expert Systems with Applications 35 (3) (2008) 818–827.
[39] V. Vapnik, O. Chapelle, Bounds on error expectation for support vector machines, Neural Computation 12 (9) (2000) 2013–2036.
[40] B. Roy, R. Slowinski, Handling effects of reinforced preference and counter-veto in credibility of outranking, European Journal of Operational Research 188 (2008) 185–190.
[41] H. Wang, C. Chiu, Y. Juan, Decision support model based on case-based reasoning approach for estimating the restoration budget of historical buildings, Expert Systems with Applications 35 (4) (2008) 1601–1610.
[42] S.M. Weiss, I. Kapouleas, An empirical comparison of pattern recognition, neural nets, and machine learning classification methods, in: Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, MI, 1989, pp. 781–785.
[43] W. Wong, F. Shih, J. Liu, Shape-based image retrieval using support vector machines, Fourier descriptors and self-organizing maps, Information Sciences 177 (8) (2007) 1878–1891.
[44] D.-S. Wu, T. Liang, Zero anaphora resolution by case-based reasoning and pattern conceptualization, Expert Systems with Applications 36 (4) (2009) 7544–7551.
[45] C.-H. Wu, G.-H. Tzeng, Y.-J. Goo, et al., A real-valued genetic algorithm to optimize the parameters of support vector machine for predicting bankruptcy, Expert Systems with Applications 32 (2007) 397–408.
[46] H. Yang, C. Wang, Two stages of case-based reasoning: integrating genetic algorithm with data mining mechanism, Expert Systems with Applications 35 (1–2) (2008) 262–272.


[47] A.Y.N. Yip, H. Deng, A case-based reasoning approach to business failure prediction, in: V. Palade, R.J. Howlett, L.C. Jain (Eds.), Knowledge-Based Intelligent Information and Engineering Systems, Springer-Verlag, Berlin, 2003, pp. 1075–1080.
[48] A.Y.N. Yip, Predicting business failure with a case-based reasoning approach, in: M. Negoita, R. Howlett, L. Jain, et al. (Eds.), Knowledge-Based Intelligent Information and Engineering Systems, Springer-Verlag, Berlin, 2004, pp. 665–671.
[49] W. Yu, X. Li, On-line fuzzy modeling via clustering and support vector machines, Information Sciences 178 (22) (2008) 4264–4279.
[50] F.-C. Yuan, C. Chiu, A hierarchical design of case-based reasoning in the balanced scorecard application, Expert Systems with Applications 36 (1) (2009) 333–342.
[51] J. Zhang, Y. Wang, A rough margin based support vector machine, Information Sciences 178 (9) (2008) 2204–2214.
[52] Z. Zhuang, L. Churilov, F. Burstein, et al., Combining data mining and case-based reasoning for intelligent decision support for pathology ordering by general practitioners, European Journal of Operational Research 195 (3) (2009) 662–675.

Hui Li, who received his Ph.D. from Harbin Institute of Technology in China, is a young member of the World Federation on Soft Computing and a member of the Association for Information Systems. He is an associate professor at Zhejiang Normal University in China. He has nearly fifty papers published or accepted in significant journals and conferences, e.g., European Journal of Operational Research, Information Sciences, Expert Systems with Applications, Computers and Operations Research, Knowledge-Based Systems, Journal of Forecasting, Applied Soft Computing, and Intelligent Data Engineering and Automated Learning. He has also led several nationally funded research projects, e.g., under the National Natural Science Foundation of China and the Zhejiang Provincial Natural Science Foundation of China. His recent research interests include business failure prediction, applied soft computing, case-based reasoning, and business intelligence.