8/13/2019 FA Competition (Final)
NUS FINANCIAL ANALYTICS COMPETITION:
BANKRUPTCY PREDICTION OF FIRMS IN
CHINA, HONG KONG AND SINGAPORE
USING MACHINE-LEARNING APPROACHES
Submitted By:
Choy Pui Yee (A0119666) | Ho Jun Hao (A0028383) | Lee Seng Yin, Daniel (A0040255) |
Tay Yuzhong, Zeldon (A0119093)
ABSTRACT
For many corporations, assessing the creditworthiness of investment targets is vital to investment
decisions. Data mining and machine learning techniques are known to be applicable to
bankruptcy prediction and credit scoring problems. However, many default prediction models
draw on empirical data extracted from mature, Western markets. In this report, our
team adopts an experimental approach to identifying data mining techniques that might be more
suitable for profiling Asian firms. The six classification approaches used are: ID3 Decision Tree (J48),
Random Forest, Random Tree, Logistic Regression, Support Vector Machines and Neural Networks.
From the experiments, the team found that Decision Tree models generate the highest expected
returns for all countries, while Logistic Regression, SVM and Neural Network models produced lower
Type II errors. In addition, the most-recent-year core ratios are sufficient to predict bankruptcy. With
these findings, the team recommends that financial institutions assess the risk of extending loans to
firms separately, based on their country. Our team recognises that our experiments are preliminary to
developing a full default prediction system for the Asian markets. Future studies can extend our
work to yield a more complete model.
CONTENTS
Abstract
1 Introduction
2 Literature Review
2.1 Bankruptcy Prediction
2.2 Classification Models for Bankruptcy Prediction
2.2.1 ID3 Decision Tree (J48)
2.2.2 Random Forest
2.2.3 Random Tree
2.2.4 Logistic Regression
2.2.5 Support Vector Machines
2.2.6 Neural Networks (NN)
3 Data Preparation
4 Experimental Design
5 Evaluation of Classification Models
5.1 Accuracy
5.2 Expected Returns
6 Results and Discussion
6.1 China
6.2 Hong Kong
6.3 Singapore
6.4 Combination of China, Hong Kong and Singapore
6.5 Key Observations from China, Hong Kong and Singapore
7 Recommendations
8 Conclusion
9 References
1 INTRODUCTION
Understanding default likelihood is critical to credit risk management, macro policy making and
financial regulation. Data mining techniques have been developed very early on in the field of
bankruptcy and default prediction. These machine learning techniques have the ability to read huge
amounts of data, while filtering out redundant or irrelevant information and identifying correlated
attributes, which describe the characteristics of likely defaulting firms [1]. These are then used to
build models to evaluate whether corporations face financial distress. For financial institutions, the
models act as early warning systems as well as decision making tools for evaluation of candidate
firms for collaboration or investment. Such decisions have to take into account the opportunity cost
and the risk of failures [2]. However, many studies on default prediction models either examined or
incorporated empirical data extracted from mature, Western markets as part of their sample data.
With many emerging markets and developing firms now centering their
activities on the commodity-rich, emerging Asia region, defaulting Asian firms may share hidden
combinations of characteristics. These factors may differ from those of their Western
counterparts, and identifying them can make default prediction more effective. The data mining algorithms
commonly used for profiling firms from a global sample might be more effective
if a tailored regional sample is used for modelling.
The Credit Research Initiative (CRI) is a non-profit undertaking by the Risk Management Institute at
the National University of Singapore (NUS). It seeks to promote research and developments in the
critical area of credit risk. As such, it welcomes suggestions and improvements to this area of
research and grants the public access to its database of firm specific data which covers over 60,400
listed firms around the world for the purpose of such work. Its foundation is the probability of
default (PD) model developed from its extensive database. It continually calibrates its model and has
on-going work in identifying common company-specific attributes that are more indicative of
defaults in emerging markets [3].
Our team seeks to discover data mining techniques that may be more effective in classifying Asian
defaulting firms, by applying 6 different data mining techniques to sample data of firms listed on
3 Asian financial bourses, taken from the RMI CRI's database. Prediction accuracy will be used to
evaluate each technique's performance.
2 LITERATURE REVIEW
2.1 BANKRUPTCY PREDICTION
Sometimes a distressed firm can continue to operate in that condition for a prolonged period of
years. Other times, firms enter bankruptcy immediately after a highly distressing event, such as a
major fraud. This seemingly disordered chance of default can be correlated to a combination of firm-
specific and external economic factors. Lensberg et al. [4] have investigated much of the related work and
categorized the various factors that potentially affect bankruptcy.
In the past, Beaver [5,6] used financial ratios as the input variables of linear regression models for
firm bankruptcy classification. Altman [7] was one of the first to apply the classical multivariate
discriminant analysis technique. On the other hand, many recent studies focus on using data mining
techniques [8] for bankruptcy prediction. Other groups of researchers showed that data mining
models which require less financial knowledge, such as neural networks, outperform
statistical approaches such as logistic regression, linear discriminant analysis, and multiple
discriminant analysis, which rely on financial ratios and statistical rules [9-12, 30].
Table 1 below compares related studies published from 2001 to 2008 and the
models they built. Many of the studies emphasized designing more sophisticated classifiers.
Table 1. Summary of related studies
Author(s)                    Feature Selection   Prediction Models
Atiya (2001)                 Yes                 Neural Networks
Lee et al. (2002)            No                  Discriminant Analysis & Neural Network
Malhotra and Malhotra        No                  Fuzzy Logic & Neural Networks
McKee and Lensberg (2002)    No                  Genetic Algorithms
Shin and Lee (2002)          Yes                 Genetic Algorithms
Kim and Han (2003)           No                  Genetic Algorithms
Huang et al. (2004)          Yes                 SVM
Canbas et al. (2005)         Yes                 Discriminant Analysis & Logistic Regression
Lee et al. (2005)            No                  Self-organizing Maps
Min and Lee (2005)           Yes                 Support Vector Machines
Ong et al. (2005)            No                  Neural Networks & Discriminant Analysis
Shin et al. (2005)           Yes                 Support Vector Machines
Gestel et al. (2006)         No                  Support Vector Machines
Huysmans et al. (2006)       No                  Self-organizing Maps & Neural Networks
Lensberg et al. (2006)       No                  Genetic Algorithms
Min et al. (2006)            No                  Genetic Algorithms & Support Vector Machines
Tsakonas et al. (2006)       No                  Neural Logic Networks & Genetic Algorithms
Wu et al. (2007)             No                  Genetic Algorithms & Support Vector Machines
Tsai and Wu (2008)           No                  Neural Networks
2.2 CLASSIFICATION MODELS FOR BANKRUPTCY PREDICTION
2.2.1 ID3 Decision Tree (J48)
Instead of generating a decision rule in the form of a discriminant function, the ID3 algorithm
produces a decision tree that classifies the training sample by using the entropy measure
[13,14]. This is an inductive machine learning method and has been applied to many business
classification problems today, including credit scoring [15], corporate failures prediction [16],
stock portfolio construction [17], stock market behaviour prediction [18], and bankruptcy risk
prediction [19]. For our project, our team used the open source machine learning program Weka
3.6, which provides the J48 decision tree algorithm. J48 is Weka's implementation of C4.5, an
extension of the ID3 algorithm developed by Ross Quinlan.
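To make the entropy measure concrete, the following is a minimal Python sketch of the entropy and information-gain calculation that ID3-style trees use to choose a split. It is illustrative only: the function names and the toy data are hypothetical, and it is not the Weka J48 implementation, which additionally handles continuous attributes and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy example (hypothetical data): leverage band versus default flag.
leverage = ["high", "high", "low", "low"]
default = [True, True, False, False]
gain = information_gain(leverage, default)  # the split separates the classes perfectly
```

An ID3-style learner would evaluate this gain for every candidate attribute at a node and split on the attribute with the largest value.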
2.2.2 Random Forest
Random Forests (RF) are classification algorithms developed by Breiman [20]. They use an ensemble
of classification trees [8, 21] in the construction of the model. The concept of RF is to combine many
binary decision trees learnt using several bootstrap samples coming from the main sample, then
choosing randomly at each node, a further subset of explanatory variables. RFs rank variables by a
variable importance index [19], and so can suggest the significance of a variable based on the
classification accuracy, while considering the interaction among variables. The algorithm estimates
the importance of a variable by looking at how much prediction error increases when the values of
that variable in the out-of-bag data (data not present in the bootstrap sample) are permuted while all others are held unchanged.
The necessary calculations are carried out tree by tree as the RF is constructed. Typically, the rank
order of the importance score is reported [23].
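The permutation idea behind the importance score can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a single stand-in model and one held-out sample rather than per-tree out-of-bag samples, and `toy_model`, the column meanings and the data are hypothetical.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for r, y in zip(rows, labels) if predict(r) != y)
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, col, seed=0):
    """Increase in error rate after shuffling column `col` across rows."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    shuffled_col = [r[col] for r in rows]
    rng.shuffle(shuffled_col)
    # Rebuild each row with only the chosen column replaced by a shuffled value.
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled_col)]
    return error_rate(predict, permuted, labels) - baseline

# Hypothetical stand-in for a fitted tree: flags default when column 0 exceeds 1.0.
def toy_model(row):
    return row[0] > 1.0

rows = [[0.2, 5.0], [0.4, 1.0], [1.5, 3.0], [2.0, 2.0], [0.1, 4.0], [1.8, 0.5]]
labels = [False, False, True, True, False, True]

imp_used = permutation_importance(toy_model, rows, labels, col=0)
imp_unused = permutation_importance(toy_model, rows, labels, col=1)  # model ignores column 1
```

A variable the model never uses shows no increase in error when permuted, whereas permuting an informative variable can only leave the error unchanged or raise it in this toy setting.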
2.2.3 Random Tree
The Random Tree algorithm constructs a decision tree that considers K randomly chosen attributes
at each node. It performs no pruning and has the ability to allow estimation of class probabilities
based on a hold-out set. The Random Tree is usually used as a 'building block' with RFs, with many
Random Trees coming together to make an RF as mentioned above. Generally, the Random Tree on
its own tends to be too weak and requires an ensemble to make it strong enough.
Little research has used it as a standalone technique to evaluate bankruptcy data, but
our team decided to test it because of its simplicity.
2.2.4 Logistic Regression
Logistic Regression has been the classic answer to many credit default analysis problems for many
years. Ohlson [24] was the first to apply Multiple Logistic Regression Analysis (Logit) to
failure prediction, claiming that the model was superior to multiple discriminant analysis (MDA)
because it is less restricted by statistical normality assumptions. He developed the model with nine predictors (7
financial ratios and 2 categorical variables), and much subsequent research built on his study by using Logit
analysis instead of MDA [24-27].
2.2.5 Support Vector Machines
Support Vector Machines (SVM) are derived from statistical learning theories and follow structural
risk minimization principles [28, 29]. The basic idea of SVM is to define a hyperplane that
geometrically separates binary classes in high-dimensional spaces. The optimal hyperplane is obtained
by maximizing the margin between the data points of the two classes whereby a structural risk
minimum is achieved [31, 32].
2.2.6 Neural Networks (NN)
Studies on bankruptcy prediction using non-linear NNs have appeared since 1990, and the area
is still active. NNs have generally outperformed the other existing methods. Currently, several
of the major commercial loan default prediction products are based on NNs. For example, Moody's
Public Firm Risk Model [33] is based on NNs as the main technology. Many banks have also
developed and are using proprietary NN default prediction models.
3 DATA PREPARATION
The team opted to use Core Ratios (CR) as the financial data to build the classifier models. With
CR data, meaningful comparisons can be made between companies of different sizes. Of the available CR
data, only the most recent annual report data are selected. In addition, only CR attributes that are
more than 90% complete in all three countries studied are selected. The 45 CR attributes used are
listed in Table 2. As classifier accuracy may be adversely affected by missing values, missing
values are replaced with the country mean.
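The two preparation steps described above (completeness filtering and country-mean imputation) can be sketched as follows. This is a hedged illustration rather than the team's actual pipeline: the function names, the dictionary-based record layout and the toy values are assumptions, and missing values are represented as `None`.

```python
def select_complete_attributes(rows_by_country, attributes, threshold=0.9):
    """Keep attributes that are more than `threshold` complete in every country."""
    kept = []
    for attr in attributes:
        complete_everywhere = True
        for rows in rows_by_country.values():
            present = sum(1 for r in rows if r.get(attr) is not None)
            if present / len(rows) <= threshold:
                complete_everywhere = False
                break
        if complete_everywhere:
            kept.append(attr)
    return kept

def impute_country_mean(rows, attributes):
    """Replace missing (None) values with the mean of observed values in the same country."""
    for attr in attributes:
        observed = [r[attr] for r in rows if r.get(attr) is not None]
        mean = sum(observed) / len(observed)
        for r in rows:
            if r.get(attr) is None:
                r[attr] = mean
    return rows

# Hypothetical toy records keyed by country.
data = {
    "SG": [{"ASSET_TURNOVER": 1.0, "TAX_BURDEN": None},
           {"ASSET_TURNOVER": None, "TAX_BURDEN": None},
           {"ASSET_TURNOVER": 3.0, "TAX_BURDEN": 0.7}],
    "HK": [{"ASSET_TURNOVER": 2.0, "TAX_BURDEN": 0.8},
           {"ASSET_TURNOVER": 4.0, "TAX_BURDEN": 0.6}],
}
# A looser threshold is used here only because the toy sample is tiny.
kept = select_complete_attributes(data, ["ASSET_TURNOVER", "TAX_BURDEN"], threshold=0.6)
for country_rows in data.values():
    impute_country_mean(country_rows, kept)
```

Imputing per country rather than across the pooled sample keeps each market's typical ratio levels from leaking into another market's records.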
Table 2. List of selected CR attributes
CR Attributes 1 to 15        CR Attributes 16 to 30      CR Attributes 31 to 45
SALES_GROWTH                 PRETAX_MARGIN               TOT_DEBT_TO_COM_EQY
ASSET_GROWTH                 TRAIL_12M_SALES_PER_SH      TOT_DEBT_TO_TOT_EQY
ASSET_TURNOVER               TRAIL_12M_EPS_BEF_XO_ITEM   TOT_DEBT_TO_TOT_ASSET
OPER_MARGIN                  CASH_AND_ST_INVESTMENTS     NET_DEBT_TO_SHRHLDR_EQTY
PRETAX_INC_TO_NET_SALES      TRAIL_12M_NET_SALES         SHORT_AND_LONG_TERM_DEBT
PROF_MARGIN                  TRAIL_12M_OPER_INC          NET_DEBT
RETURN_ON_ASSET              TRAIL_12M_EPS               NET_CHNG_ST_DEBT
TAX_BURDEN                   TRAIL_12M_CASH_FROM_OPER    NET_CHANGE_LIABILITIES
FNCL_LVRG                    TRAIL_12M_NET_INC           INCR_IN_LIAB_PCT_OF_TOT
REVENUE_PER_SH               TOT_COMMON_EQY              NET_CHANGE_TOTAL_EQUITY
OPER_INC_PER_SH              TOT_DEBT_TO_TOT_CAP         INCR_IN_EQY_PCT_OF_TOT
PRETAX_INC_PER_SH            ASSET_TO_EQY                COM_EQY_TO_TOT_ASSET
CONT_INC_PER_SH              LT_DEBT_TO_COM_EQY          CASH_TO_TOT_ASSET
CASH_ST_INVESTMENTS_PER_SH   LT_DEBT_TO_TOT_CAP          ACCT_RCV_TO_TOT_ASSET
BOOK_VAL_PER_SH              LT_DEBT_TO_TOT_ASSET        TRAIL_12M_INC_BEF_XO_ITEM
For each instance of the CR data, five default indicators, numbered 1 to 5, are appended. The
number represents the years of consideration counted back from the latest credit event.
A CR instance is assigned a True indicator if the company recorded a credit event listed in Table 3 and
the instance falls within the period of consideration. For example, if a company recorded a credit event listed in
Table 3 and has annual report filings from FY 2000 to 2004, the indicators appended to the CR data
will be as follows:
Figure 1. Illustration of class indicators
FY YEAR B1 B2 B3 B4 B5
2000 FALSE FALSE FALSE FALSE TRUE
2001 FALSE FALSE FALSE TRUE TRUE
2002 FALSE FALSE TRUE TRUE TRUE
2003 FALSE TRUE TRUE TRUE TRUE
2004 TRUE TRUE TRUE TRUE TRUE
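The labelling rule illustrated in Figure 1 can be expressed as a short Python sketch. The function name and signature are hypothetical, and the sketch assumes that indicator Bk is TRUE when a record lies within the k most recent fiscal years up to and including the year of the credit event; firms with no credit event would simply receive FALSE for all five indicators.

```python
def default_indicators(fiscal_year, event_year, horizons=(1, 2, 3, 4, 5)):
    """Return the B1..B5 flags for one annual record of a firm with a credit event.

    Assumption: Bk is True when the record falls within the k most recent
    fiscal years up to and including the credit event year.
    """
    gap = event_year - fiscal_year
    return {f"B{k}": 0 <= gap < k for k in horizons}

# Reproduce the Figure 1 example: filings from FY 2000 to 2004, event in 2004.
table = {fy: default_indicators(fy, event_year=2004) for fy in range(2000, 2005)}
for fy, flags in table.items():
    print(fy, *(str(flag).upper() for flag in flags.values()))
```

Each indicator therefore defines a separate classification target, so the same CR record can be a positive example for B5 while remaining a negative example for B1.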
Table 3. Credit events that are classified as default events
Action Type Subcategory
Delisting Filing Type: Administration
Delisting Filing Type: Canadian CCAA
Delisting Filing Type: Chapter 11
Delisting Filing Type: Judicial Management
Delisting Filing Type: Liquidation
Delisting Filing Type: Protection
Delisting Filing Type: Receivership
Delisting Filing Type: Reorganization
Delisting Filing Type: Restructuring
Delisting Filing Type: Unknown
Delisting Filing Type: Winding Up
Bankruptcy Filing Reason: Bankruptcy
Bankruptcy Filing Reason: Coupon & principal payment
Bankruptcy Filing Reason: Coupon payment only
Bankruptcy Filing Reason: Debt Restructuring
Bankruptcy Filing Reason: Interest payment
Bankruptcy Filing Reason: Loan payment
Bankruptcy Filing Reason: Principal payment
Bankruptcy Filing Reason: Unknown
Bankruptcy Filing Reason for delisting: Bankruptcy
Noting that some industries may be inherently riskier than others, an attribute to account for
the industry is appended to the data. The BICS sector attribute from the company information, which
indicates the industry in which the company operates, is selected. The structure of the final data
is shown below in Table 4.
Table 4. Structure of data for model building
Data Source: Company Information
  Selected attributes: BICS_SECTOR
  Output: BICS_SECTOR attribute
Data Source: Fundamentals CR
  Selected attributes: 45 CR attributes (see Table 2)
  Data preparation conducted:
    1. Selected most recent annual report data.
    2. Selected only attributes that are 90% complete in all three countries.
    3. Replaced missing values with country mean.
  Output: 45 CR attributes that are 100% complete
Data Source: Credit Events
  Selected attributes: Action type, subcategory and date
  Data preparation conducted: Created 5 class indicators.
  Output: 5 class indicators
Different combinations of the countries' data were also explored. The class distribution
for each combination is summarised below.
Table 5. Class distribution of each combination (ND = Non-Default, D = Default)
                                                   Default Indicators
Combination of countries   Instances   Class   B1      B2      B3      B4      B5
Singapore (SG)             6828        ND      6795    6764    6738    6716    6695
                                       D       33      64      90      112     133
                                       D (%)   0.48%   0.94%   1.32%   1.64%   1.95%
Hong Kong (HK)             12239       ND      12196   12157   12121   12090   12063
                                       D       43      82      118     149     176
                                       D (%)   0.35%   0.67%   0.96%   1.22%   1.44%
China (CN)                 21091       ND      20850   20624   20410   20205   20006
                                       D       241     467     681     886     1085
                                       D (%)   1.14%   2.21%   3.23%   4.20%   5.14%
SG and HK                  19067       ND      18991   18921   18859   18806   18758
                                       D       76      146     208     261     309
                                       D (%)   0.40%   0.77%   1.09%   1.37%   1.62%
SG and CN                  27919       ND      27645   27388   27148   26921   26701
                                       D       274     531     771     998     1218
                                       D (%)   0.98%   1.90%   2.76%   3.57%   4.36%
HK and CN                  33330       ND      33046   32781   32531   32295   32069
                                       D       284     549     799     1035    1261
                                       D (%)   0.85%   1.65%   2.40%   3.11%   3.78%
SG, HK and CN              40158       ND      39841   39545   39269   39011   38764
                                       D       317     613     889     1147    1394
                                       D (%)   0.79%   1.53%   2.21%   2.86%   3.47%
4 EXPERIMENTAL DESIGN
The classification models used in our project experiments are: ID3 Decision Tree (J48), Random
Forest, Random Tree, Logistic Regression, Support Vector Machines and Neural Networks. The three
key questions that the team aims to address are:
a) How many years of data from a firm are required for accurate prediction of default?
b) Which is the most appropriate Data Mining Algorithm for a particular country-specific dataset?
c) How accurate and reliable is each Data Mining Technique in predicting default for each dataset?
We ran the six algorithms on each of the three datasets and tabulated their confusion matrix results
to obtain prediction accuracy. Each experiment was repeated with different bankruptcy indicators
from B1 to B5, to ascertain the number of years of data required for accurate default prediction. We
further combined the datasets and re-ran the tests to see whether different combinations of datasets would
cause the algorithms to yield different performances.
Lastly, in order to gain a better perspective of how the results can help in loan decisions, we applied
a cost function to the confusion matrix of each result to simulate the expected gains of prediction.
We assumed that good loans return an optimistic gain of 6% interest, while bad loans return the worst
outcome of losing 100% of the investment along with the opportunity cost of 6%. Loans that were not
made because the algorithm wrongly classified a firm as bankrupt incur an opportunity cost of 6%
interest.
5 EVALUATION OF CLASSIFICATION MODELS
The team built more than 30 models for each financial market. To identify the most appropriate
model, the team compared the models across the six classification approaches, using accuracy and
expected returns as key performance measures.
5.1 ACCURACY
The team determined accuracy based on the proportion of incorrectly classified cases. Two types of
errors may occur during classification, i.e. Type I and Type II errors. Type I errors occur when the
model classifies a non-bankrupt firm into the bankruptcy group. These errors represent
potential loss in interest revenue for the financial institutions. Type II errors occur when the model
classifies a bankrupt firm into the non-bankruptcy group. In comparison, Type II errors appear to
be more costly to financial institutions as they might be unable to recover the principal amount.
Hence, we placed a higher emphasis on Type II errors. A model is considered to have high accuracy if
it has a low proportion of Type II cases.
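As a minimal sketch of how these proportions can be read off a confusion matrix, the Python snippet below treats bankruptcy as the positive class. The function name and the example counts are hypothetical, and dividing by all classified cases is an assumption, since the report does not state the denominator explicitly.

```python
def error_proportions(tp, fp, tn, fn):
    """Type I and Type II errors as proportions of all classified cases.

    Positive class = bankruptcy; arguments are confusion-matrix counts.
    """
    total = tp + fp + tn + fn
    return {
        "type_i": fp / total,   # non-bankrupt firm classified as bankrupt (lost interest)
        "type_ii": fn / total,  # bankrupt firm classified as non-bankrupt (lost principal)
    }

# Hypothetical confusion matrix for 1,000 firms, 10 of which default.
rates = error_proportions(tp=6, fp=20, tn=970, fn=4)
```

Because defaults are rare in these datasets, the Type II proportion is bounded above by the default rate, so small differences between models can still correspond to many missed bankruptcies.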
5.2 EXPECTED RETURNS
The team calculated the expected returns for each model using the assigned costs for each of the
four possible outcomes, i.e. correctly classified bankruptcy, incorrectly classified bankruptcy,
correctly classified non-bankruptcy, and incorrectly classified non-bankruptcy.
Table 6: Cost Assignment
Bankruptcy Non-Bankruptcy
Correctly Classified 0% 6%
Incorrectly Classified -6% -106%
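The cost assignment in Table 6 can be applied to a confusion matrix as in the following Python sketch. It rests on two assumptions that the report does not spell out: the table columns refer to the predicted class (so "incorrectly classified non-bankruptcy" is a bankrupt firm that received a loan), and the expected return is the average over all classified firms. The names `RETURNS` and `expected_return` and the example counts are hypothetical.

```python
# Returns per outcome, in percent, taken from Table 6 (positive class = bankruptcy).
RETURNS = {
    "tp": 0.0,     # correctly classified bankruptcy: no loan made
    "fp": -6.0,    # incorrectly classified bankruptcy: good loan refused, interest forgone
    "tn": 6.0,     # correctly classified non-bankruptcy: loan repaid with interest
    "fn": -106.0,  # incorrectly classified non-bankruptcy: principal and interest lost
}

def expected_return(tp, fp, tn, fn):
    """Average return (in %) per firm, weighting each outcome by its count."""
    counts = {"tp": tp, "fp": fp, "tn": tn, "fn": fn}
    total = sum(counts.values())
    return sum(RETURNS[k] * n for k, n in counts.items()) / total

# Hypothetical portfolio of 1,000 firms with 10 defaults.
perfect = expected_return(tp=10, fp=0, tn=990, fn=0)  # 5.94
misses = expected_return(tp=0, fp=0, tn=990, fn=10)   # 4.88
```

Under these costs a single missed bankruptcy offsets the interest earned on roughly eighteen good loans, which is why the expected return is far more sensitive to Type II errors than to Type I errors.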
6 RESULTS AND DISCUSSION
6.1 CHINA
Amongst all the classification models, decision tree models (i.e. J48, Random Forest, Random Tree)
yielded the highest accuracy, with no Type II errors. This implies that the models developed are
highly accurate in bankruptcy prediction. The expected returns are also higher for decision tree
models, with the Random Forest model generating an expected return of close to 6%. A financial
institution can expect an average return of 5.8% using the decision tree models.
Table 7: China (N/A = Not Available)
Model                 B1       B2       B3       B4    B5
Accuracy
J48                   0.000    0.000    0.000    N/A   N/A
Random Forest         0.000    0.000    0.000    N/A   N/A
Random Tree           0.000    0.000    0.000    N/A   N/A
Logistic Regression   0.005    0.010    0.016    N/A   N/A
SVM                   0.007    0.014    0.021    N/A   N/A
NN(MLP)               0.004    0.002    0.009    N/A   N/A
Expected Returns
J48                   5.80     5.58     5.52     N/A   N/A
Random Forest         5.91     5.79     5.71     N/A   N/A
Random Tree           5.79     5.65     5.44     N/A   N/A
Logistic Regression   2.16     1.56     1.03     N/A   N/A
SVM                   1.96     0.79     -0.15    N/A   N/A
NN(MLP)               4.26     3.99     3.17     N/A   N/A
6.2 HONG KONG
The Support Vector Machine (SVM) model produced the lowest proportion of Type II errors.
However, the expected returns from this model are also the lowest compared to the other models. On
the other hand, the decision tree models yielded higher expected returns, ranging from 5.44% to
5.57%.
Table 8: Hong Kong
Model                 B1       B2       B3       B4       B5
Accuracy
J48                   0.0036   0.0073   0.0103   0.0130   0.0147
Random Forest         0.0038   0.0074   0.0106   0.0131   0.0147
Random Tree           0.0039   0.0074   0.0110   0.0130   0.0140
Logistic Regression   0.0038   0.0069   0.0102   0.0132   0.0159
SVM                   0.0035   0.0071   0.0105   0.0144   0.0163
NN(MLP)               0.0039   0.0075   0.0102   0.0122   0.0138
Expected Returns
J48                   5.48     5.06     4.73     4.38     4.18
Random Forest         5.57     5.16     4.80     4.51     4.32
Random Tree           5.44     5.03     4.43     4.42     4.18
Logistic Regression   2.22     1.98     2.15     2.27     1.56
SVM                   0.24     0.06     1.33     1.07     1.01
NN(MLP)               4.64     1.66     2.68     2.37     1.84
6.3 SINGAPORE
The SVM model has the lowest proportion of Type II errors, followed by the Logistic Regression model.
These models, however, produced very low expected returns. Despite their lower accuracy, the
decision tree models work best in generating a good return on loans.
Table 9: Singapore
Model                 B1       B2       B3       B4       B5
Accuracy
J48                   0.0055   0.0110   0.0121   0.0159   0.0160
Random Forest         0.0049   0.0098   0.0132   0.0147   0.0171
Random Tree           0.0050   0.0097   0.0091   0.0158   0.0162
Logistic Regression   0.0034   0.0104   0.0142   0.0147   0.0153
SVM                   0.0033   0.0090   0.0146   0.0166   0.0162
NN(MLP)               0.0041   0.0107   0.0136   0.0134   0.0143
Expected Returns
J48                   5.27     4.46     4.11     4.00     3.98
Random Forest         5.44     4.90     4.51     4.34     4.08
Random Tree           5.08     4.40     4.62     3.80     3.82
Logistic Regression   2.38     1.62     1.21     1.58     1.12
SVM                   0.92     1.19     0.63     0.31     0.47
NN(MLP)               5.06     4.34     3.05     2.06     1.14
6.4 COMBINATION OF CHINA, HONG KONG AND SINGAPORE
The team combined the datasets for China, Hong Kong and Singapore to predict bankruptcy for Asia
as a whole. The logistic regression model works best for Asia in minimising the number of Type II
errors. However, the expected returns are better for decision tree models.
Table 10: China, Hong Kong and Singapore
Model                 B1       B2       B3       B4       B5
Accuracy
J48                   0.0041   0.0089   0.0128   0.0150   0.0173
Random Forest         0.0046   0.0088   0.0125   0.0152   0.0180
Random Tree           0.0046   0.0086   0.0125   0.0156   0.0178
Logistic Regression   0.0040   0.0081   0.0102   0.0130   0.0165
SVM                   0.0045   0.0102   0.0121   0.0153   0.0168
NN(MLP)               0.0042   0.0080   0.0101   0.0123   0.0153
Expected Returns
J48                   5.35     4.83     4.36     4.08     3.76
Random Forest         5.49     5.01     4.59     4.26     3.97
Random Tree           5.31     4.85     4.33     4.01     3.69
Logistic Regression   1.49     1.18     1.51     1.03     0.68
SVM                   1.28     -0.26    0.95     0.75     -0.64
NN(MLP)               1.30     -0.25    0.88     1.06     -0.46
6.5 KEY OBSERVATIONS FROM CHINA, HONG KONG AND SINGAPORE
The team made the following observations when comparing the models of China, Hong Kong and
Singapore:
a) Decision Tree models generate high expected returns for all countries. Currently, many financial
institutions use logistic regression and other statistical approaches in predicting bankruptcy. Our
models seem to suggest that classifications using the decision tree approaches yield better
outcomes in terms of expected returns. In particular, the Random Forest models work well across
financial markets.
b) Logistic Regression, SVM and Neural Network models have lower Type II errors. These models
are better at preventing bad loans.
c) Bankruptcy of firms in China is predicted accurately. Of the three countries studied, China yielded the
highest accuracy and expected returns. This could be due to a more mature loan market, where
the financial institutions are relatively accurate in predicting bankruptcy of firms.
d) It is better to build the models for different countries separately. The models built using the
combined datasets for China, Hong Kong and Singapore did not improve accuracy or expected
returns.
e) The most-recent-year core ratios are sufficient to predict bankruptcy. Comparing the models
built using core ratios of different years, the models built using the most-recent-year core ratios
(i.e. B1) yielded the best outcomes.
7 RECOMMENDATIONS
Financial institutions should assess the credit risk of firms from different countries separately. The
decision tree models work best for the China market, while the SVM model predicts bankruptcies
better for Hong Kong and Singapore. Risk-seeking financial institutions can maximise their returns
using the Random Forest model. This model yielded the highest expected returns for all countries.
8 CONCLUSION
Because of bankruptcy risk, numerous studies have attempted to develop credit risk or default
prediction models using statistical methods or machine learning algorithms. Much of the
research literature based its studies on datasets that included company data from developed,
Western markets. Through our team's experiments with 6 different data mining techniques and
datasets of company core ratios obtained from 3 different Asian markets (Singapore, Hong Kong and
China), we discovered that, with just 1 year of company data, decision trees are more effective in
classifying default. Country-specific data for Asia should also be studied on a standalone basis. Our team
recognises that our experiments are preliminary to developing a full default prediction system for the
Asian markets. Future studies can extend our work to yield a more
complete model. For example, features such as externally-driven events (interest rates, commodity
prices, government restrictions) that cause changes in company policies or balance sheet values
should also be studied so that profiling and company assessment are not limited to firm-specific data.
Nevertheless, we hope that this study is sufficient to spur future studies.
9 REFERENCES
[1] J. Yang, S. Olafsson, Optimization-based feature selection with adaptive instance sampling,
Computers & Operations Research 33 (11) (2006) 3088-3106.
[2] A.I. Dimitras, S.H. Zanakis, C. Zopounidis, A survey of business failures with an emphasis on
prediction methods and industrial applications, European Journal of Operational Research 90
(1996) 487-513.
[3] RMI (2013), NUS-RMI Credit Research Initiative Technical Report Version: 2013 Update 2b,
Global Credit Review 3, pp105
[4] T. Lensberg, A. Eilifsen, T.E. McKee, Bankruptcy theory development and classification via
genetic programming, European Journal of Operational Research 169 (2006) 677-697.
[5] W.H. Beaver, Financial ratios as predictors of failure, Journal of Accounting Research 4 (1966)
71-102.
[6] W.H. Beaver, Alternative accounting measures as predictors of failure, Accounting Review 43 (1)
(1968) 113-122.
[7] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy,
Journal of Finance 23 (1968) 589-609.
[8] T. Hastie, R. Tibshirani, J.H. Friedman, The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, Springer, New York, 2001.
[9] P.R. Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent
techniques - a review, European Journal of Operational Research 180 (1) (2007) 1-28.
[10] J.H. Min, Y.-C. Lee, Bankruptcy prediction using support vector machine with optimal choice of
kernel function parameters, Expert Systems with Applications 28 (2005) 603-614.
[11] K.S. Shin, T.S. Lee, H.J. Kim, An application of support vector machines in bankruptcy
prediction model, Expert Systems with Applications 28 (2005) 127-135.
[12] G. Zhang, M.Y. Hu, B.E. Patuwo, D.C. Indro, Artificial neural networks in bankruptcy prediction:
general framework and cross-validation analysis, European Journal of Operational Research
116 (1999) 16-32.
[13] J.R. Quinlan, Discovering Rules by Induction from Large Collection of Examples, in: D.
Michie, Ed., Expert Systems in the Micro Electronic Age (Edinburgh University Press, 1979).
[14] J.R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81-106.
[15] C. Carter and J. Catlett, Assessing Credit Card Applications Using Machine Learning, IEEE
Expert (Fall 1987) 71-79.
[16] W.F. Messier and J.V. Hansen, Inducing Rules for Expert System Development: An
Example Using Default and Bankruptcy data, Management Science 34, No. 12 (Dec 1988)
1403-1415.
[17] K.Y. Tam and R. Chi, Inducing Stock Screening Rules for Portfolio Construction, Journal of
Operations Research Society (1991, to appear).
[18] H. Braun and J.S. Chandler, Predicting Stock Market Behaviour through Rule Induction:
An Application of the Learning-From-Example Approach, Decision Science 18 (1987) 415-429.
[19] S.B. Lee and S.H. Oh, A Comparative Study of Recursive Partitioning Algorithm and
Analog Concept Learning System, Expert Systems with Applications 1 (1990) 403-416.
[20] L. Breiman, Random forest, Machine Learning 45 (2001) 5-32.
[21] L. Breiman, Bagging predictors, Machine Learning 26 (2) (1996) 123-140.
[22] A. Liaw, M. Wiener, Classification and regression by random forest, R News 2 (3) (2002) 18-22.
[23] R. Diaz-Uriarte, S. Alvarez de Andres, Gene selection and classification of microarray data
using random forest, BMC Bioinformatics 7 (3) (2006).
[24] Ohlson, J. A. (1980), Financial Ratios and the Probabilistic Prediction of Bankruptcy,
Journal of Accounting Research, 18 (1), 109-31.
[25] Zavgren, C. V. (1985), Assessing the Vulnerability to Failure of American Industrial Firms:
A Logistic Analysis, Journal of Business Finance and Accounting, 12 (1), 19-45.
[26] Altman, E. I., and G. Sabato (2007), Modeling Credit Risk for SMEs: Evidence from the
U.S. Market, Abacus, 43 (3), 332-57.
[27] Altman, E. I., G. Sabato, and N. Wilson (2008), The Value of Non-Financial Information in
SME Risk Management, Working Paper, New York University.
[28] Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In:
Haussler D (ed) Proceedings of the 5th annual ACM workshop on computational learning
theory. ACM Press, New York, pp 144-152
[29] Cortes C, Vapnik VN (1995) Support-vector networks. Mach Learn 20(3):273-297
[30] Huang Z, Chen H, Hsu CJ, Chen WH, Wu S (2004) Credit rating analysis with support vector
machines and neural networks: a market comparative study. Decis Support Syst 37:543-558
[31] Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer, Berlin
[32] Scholkopf B, Smola AJ (2002) Learning with kernels. MIT Press, Cambridge
[33] Moody's Quantitative Risks Public Firm Risk Model. [Online]. Available: www.moodysqra.com