fa competition (final)

8/13/2019 FA Competition (Final)


NUS FINANCIAL ANALYTICS COMPETITION:

BANKRUPTCY PREDICTION OF FIRMS IN

CHINA, HONG KONG AND SINGAPORE

USING MACHINE-LEARNING APPROACHES

Submitted By:

Choy Pui Yee (A0119666) | Ho Jun Hao (A0028383) | Lee Seng Yin, Daniel (A0040255) |

Tay Yuzhong, Zeldon (A0119093)



ABSTRACT

For many corporations, assessing the creditworthiness of investment targets is vital to investment

decisions. Data mining and machine learning techniques have been known to be applicable in solving

bankruptcy prediction and credit scoring problems. However, many default prediction models may

have drawn on studies of empirical data extracted from mature, Western markets. In this report, our

team has adopted an experimental approach to seeking data mining techniques that might be more

suitable for profiling Asian firms. The six classification approaches used are: ID3 Decision Tree (J48),

Random Forest, Random Tree, Logistic Regression, Support Vector Machines and Neural Networks.

From the experiments, the team found that Decision Tree models generate the highest expected

returns for all countries, while Logistic Regression, SVM and Neural Network models produced lower

Type II errors. In addition, the most-recent-year core ratios are sufficient to predict bankruptcy. With

these findings, the team recommends that financial institutions assess the risk of extending loans to

firms separately, based on their country. Our team recognises our experiments to be preliminary to

developing a full default prediction system for the Asian markets. Future studies can extend this work towards a more complete model.



CONTENTS

Abstract
1 Introduction
2 Literature Review
2.1 Bankruptcy Prediction
2.2 Classification Models for Bankruptcy Prediction
2.2.1 ID3 Decision Tree (J48)
2.2.2 Random Forest
2.2.3 Random Tree
2.2.4 Logistic Regression
2.2.5 Support Vector Machines
2.2.6 Neural Networks (NN)
3 Data Preparation
4 Experimental Design
5 Evaluation of Classification Models
5.1 Accuracy
5.2 Expected Returns
6 Results and Discussion
6.1 China
6.2 Hong Kong
6.3 Singapore
6.4 Combination of China, Hong Kong and Singapore
6.5 Key Observations from China, Hong Kong and Singapore
7 Recommendations
8 Conclusion
9 References



1 INTRODUCTION

Understanding default likelihood is critical to credit risk management, macro policy making and

financial regulation. Data mining techniques have long been applied in the field of

bankruptcy and default prediction. These machine learning techniques have the ability to read huge

amounts of data, while filtering out redundant or irrelevant information and identifying correlated

attributes, which describe the characteristics of likely defaulting firms [1]. These are then used to

build models to evaluate whether corporations face financial distress. For financial institutions, the

models act as early warning systems as well as decision making tools for evaluation of candidate

firms for collaboration or investment. Such decisions have to take into account the opportunity cost

and the risk of failures [2]. However, many studies on default prediction models either looked at, or

incorporated empirical data, extracted from mature, Western markets as part of their sample data.

In this century, many emerging markets and developing firms centre their activities on the

commodity-rich, emerging Asia region, and defaulting Asian firms may share hidden combinations of

characteristics. These factors may differ from those of their Western counterparts, and accounting

for them could make default prediction more effective. The data mining algorithms commonly used to

profile firms from a global sample might be made more effective if a regional sample is used for

modelling instead.

The Credit Research Initiative (CRI) is a non-profit undertaking by the Risk Management Institute at

the National University of Singapore (NUS). It seeks to promote research and developments in the

critical area of credit risk. As such, it welcomes suggestions and improvements to this area of

research and grants the public access to its database of firm specific data which covers over 60,400

listed firms around the world for the purpose of such work. Its foundation is the probability of

default (PD) model developed from its extensive database. It continually calibrates its model and has

on-going work in identifying common company-specific attributes that are more indicative of

defaults in emerging markets. [3]

Our team seeks to discover data mining techniques that may be more effective in classifying Asian

defaulting firms, by considering 6 different data mining techniques, on sample data of firms listed in

3 Asian financial bourses taken from the RMI CRI's database. Prediction accuracy will be used to

evaluate each technique's performance.

2 LITERATURE REVIEW

2.1 BANKRUPTCY PREDICTION

Sometimes a distressed firm can continue to operate in that condition for a prolonged period of

years. Other times, firms enter bankruptcy immediately after a highly distressing event, such as a

major fraud. This seemingly disordered chance of default can be correlated to a combination of firm-

specific and external economic factors. Lensberg et al. [4] surveyed much of the related work and

categorized the various factors that potentially affect bankruptcy.



In the past, Beaver [5, 6] used financial ratios as the input variables of linear regression models for

firm bankruptcy classification. Altman [7] was one of the first to apply the classical multivariate

discriminant analysis technique. On the other hand, many recent studies focus on using data mining

techniques [8] for bankruptcy prediction. Other groups of researchers showed that data mining

models that require less financial knowledge, such as neural networks, outperform

statistical approaches such as logistic regression, linear discriminant analysis, and multiple

discriminant analysis, which rely on financial ratios and statistical rules [9-12, 30].

Table 1 below compares related studies published from 2001 to 2008 and the models they

built. Many of the studies emphasized designing more sophisticated classifiers.

Table 1. Summary of related studies

Author(s)                    Feature Selection   Prediction Models
Atiya (2001)                 Yes                 Neural Networks
Lee et al. (2002)            No                  Discriminant Analysis & Neural Network
Malhotra and Malhotra        No                  Fuzzy Logic & Neural Networks
McKee and Lensberg (2002)    No                  Genetic Algorithms
Shin and Lee (2002)          Yes                 Genetic Algorithms
Kim and Han (2003)           No                  Genetic Algorithms
Huang et al. (2004)          Yes                 SVM
Canbas et al. (2005)         Yes                 Discriminant Analysis & Logistic Regression
Lee et al. (2005)            No                  Self-organizing Maps
Min and Lee (2005)           Yes                 Support Vector Machines
Ong et al. (2005)            No                  Neural Networks & Discriminant Analysis
Shin et al. (2005)           Yes                 Support Vector Machines
Gestel et al. (2006)         No                  Support Vector Machines
Huysmans et al. (2006)       No                  Self-organizing Maps & Neural Networks
Lensberg et al. (2006)       No                  Genetic Algorithms
Min et al. (2006)            No                  Genetic Algorithms & Support Vector Machines
Tsakonas et al. (2006)       No                  Neural Logic Networks & Genetic Algorithms
Wu et al. (2007)             No                  Genetic Algorithms & Support Vector Machines
Tsai and Wu (2008)           No                  Neural Networks

2.2 CLASSIFICATION MODELS FOR BANKRUPTCY PREDICTION

2.2.1 ID3 Decision Tree (J48)

Instead of generating a decision rule in the form of a discriminant function, the ID3 algorithm

produces a decision tree that classifies the training sample by using the entropy measure

[13,14]. This is an inductive machine learning method and has been applied to many business

classification problems today, including credit scoring [15], corporate failures prediction [16],



stock portfolio construction [17], stock market behaviour prediction [18], and bankruptcy risk

prediction [19]. For our project, our team used the open source machine learning program Weka

3.6, which provides the J48 decision tree algorithm. J48 is Weka's implementation of C4.5, Ross

Quinlan's extension of his ID3 algorithm.
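The entropy measure mentioned above can be illustrated with a minimal sketch of the ID3 split criterion (information gain). The toy data, the attribute name `leverage` and the function names are hypothetical and not taken from the team's Weka runs; C4.5, on which J48 is based, refines this criterion with the gain ratio, which is not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the partitions that the split produces.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: a discretised leverage band and a default flag.
rows = [{"leverage": "high"}, {"leverage": "high"}, {"leverage": "low"}, {"leverage": "low"}]
labels = [True, True, False, False]
gain = information_gain(rows, labels, "leverage")
```

In this toy case the split separates the two classes perfectly, so the gain equals the full entropy of the labels. ID3 would pick, at each node, the attribute with the largest gain.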

2.2.2 Random Forest

Random Forests (RF) are classification algorithms developed by Breiman [20]. They use an ensemble

of classification trees [8, 21] in the construction of the model. The concept of RF is to combine many

binary decision trees learnt using several bootstrap samples coming from the main sample, then

choosing randomly at each node, a further subset of explanatory variables. RFs rank variables by a

variable importance index [22], and so can suggest the significance of a variable based on the

classification accuracy, while considering the interaction among variables. The algorithm estimates

the importance of a variable by looking at how much prediction error increases when the out-of-bag

data (the data not in the bootstrap sample) for that variable is permuted while all others are left unchanged.

The necessary calculations are carried out tree by tree as the RF is constructed. Typically, the rank

order of the importance score is reported [23].
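The permutation idea behind the importance index can be sketched as follows. This is a simplified illustration under stated assumptions: it uses one fixed prediction function and one held-out sample, whereas a Random Forest repeats the calculation tree by tree on each tree's out-of-bag data. The `predict` rule and the data are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Proportion of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(labels)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Increase in error after shuffling one column and leaving the others unchanged."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - baseline


# Hypothetical stand-in for a fitted tree: predicts default when column 0 exceeds 0.5.
def predict(row):
    return row[0] > 0.5


rows = [[0.9, 1.0], [0.8, 2.0], [0.1, 3.0], [0.2, 4.0]]
labels = [True, True, False, False]
importance_used = permutation_importance(predict, rows, labels, column=0)
importance_unused = permutation_importance(predict, rows, labels, column=1)
```

A variable the model never uses (column 1 here) gets an importance of zero, because shuffling it cannot change any prediction.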

2.2.3 Random Tree

The Random Tree algorithm constructs a decision tree that considers K randomly chosen attributes

at each node. It performs no pruning and has the ability to allow estimation of class probabilities

based on a hold-out set. The Random Tree is usually used as a 'building block' with RFs, with many

Random Trees coming together to make an RF as mentioned above. Generally, the Random Tree on

its own tends to be too weak and requires an ensemble of algorithms to make it strong enough.

Little research has used it as a standalone technique to evaluate bankruptcy data, but

our team decided to test it because of its simplicity.

2.2.4 Logistic Regression

Logistic Regression has been the classic answer to many credit default analysis problems for many

years. Ohlson [24] was the first to apply multiple logistic regression analysis (Logit) to

failure prediction, claiming that the model was superior to multiple discriminant analysis (MDA)

because it is less constrained by assumptions of statistical normality. He developed the model with

nine predictors (7 financial ratios and 2 categorical variables), and much subsequent research built

on his study by using Logit analysis instead of MDA [24-27].

2.2.5 Support Vector Machines

Support Vector Machines (SVM) are derived from statistical learning theories and follow structural

risk minimization principles [28, 29]. The basic idea of SVM is to define a hyperplane that

geometrically separates two classes in a high-dimensional space. The optimal hyperplane is obtained

by maximizing the margin between the data points of the two classes whereby a structural risk

minimum is achieved [31, 32].
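The margin maximisation described above is usually written as the following optimisation problem for the linearly separable (hard-margin) case. The notation is assumed here for illustration and does not come from the report: w is the normal vector of the hyperplane, b its offset, x_i the attribute vector of firm i, and y_i its class label coded as +1 or -1.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n
```

The distance between the two supporting hyperplanes is 2 / ||w||, so minimising ||w|| maximises the margin. Practical implementations relax the constraints with slack variables so that overlapping classes can still be handled.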

2.2.6 Neural Networks (NN)

Studies on bankruptcy prediction using non-linear NNs have been published since 1990, and the area

is still active. NNs have often been reported to outperform other existing methods. Currently, several

of the major commercial loan default prediction products are based on NNs. For example, Moody's

Public Firm Risk Model [33] is based on NNs as the main technology. Many banks have also

developed and are using proprietary NN default prediction models.



3 DATA PREPARATION

The team has opted to use Core Ratios (CR) as the financial data to build the classifier model. With

CR data, meaningful comparisons can be made on companies of different sizes. Of the available CR

data, only the most recent annual report data are selected. In addition, only CR attributes that are

more than 90% complete in all three countries studied are selected. The 45 CR attributes used are

listed in Table 2. As the classifier accuracy may be adversely affected by missing values, the missing

values are replaced with the country mean.
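The attribute selection and imputation steps above can be sketched as follows. This is a minimal illustration, not the team's actual procedure: rows are assumed to be dictionaries with `None` marking a missing value, and the attribute names `A` and `B` and the toy threshold of 0.5 are hypothetical (the report uses 90%).

```python
def select_complete_attributes(rows_by_country, attributes, threshold=0.9):
    """Keep attributes whose share of non-missing values exceeds `threshold` in every country."""
    selected = []
    for attr in attributes:
        complete_everywhere = True
        for rows in rows_by_country.values():
            present = sum(1 for row in rows if row.get(attr) is not None)
            if present / len(rows) <= threshold:
                complete_everywhere = False
                break
        if complete_everywhere:
            selected.append(attr)
    return selected


def impute_country_mean(rows, attributes):
    """Replace missing values with the mean of the observed values of the same country.

    Call this only with attributes that have at least one observed value.
    """
    for attr in attributes:
        observed = [row[attr] for row in rows if row.get(attr) is not None]
        mean = sum(observed) / len(observed)
        for row in rows:
            if row.get(attr) is None:
                row[attr] = mean
    return rows


# Hypothetical toy data for two countries.
data = {
    "SG": [{"A": 1.0, "B": None}, {"A": None, "B": None}, {"A": 3.0, "B": 1.0}],
    "HK": [{"A": 2.0, "B": 1.0}, {"A": 4.0, "B": 1.0}],
}
selected = select_complete_attributes(data, ["A", "B"], threshold=0.5)
sg_rows = impute_country_mean(data["SG"], selected)
```

Imputing per country rather than over the pooled data keeps one market's typical ratio levels from leaking into another market's missing values.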

Table 2. List of selected CR attributes

CR Attributes 1 to 15        CR Attributes 16 to 30       CR Attributes 31 to 45
SALES_GROWTH                 PRETAX_MARGIN                TOT_DEBT_TO_COM_EQY
ASSET_GROWTH                 TRAIL_12M_SALES_PER_SH       TOT_DEBT_TO_TOT_EQY
ASSET_TURNOVER               TRAIL_12M_EPS_BEF_XO_ITEM    TOT_DEBT_TO_TOT_ASSET
OPER_MARGIN                  CASH_AND_ST_INVESTMENTS      NET_DEBT_TO_SHRHLDR_EQTY
PRETAX_INC_TO_NET_SALES      TRAIL_12M_NET_SALES          SHORT_AND_LONG_TERM_DEBT
PROF_MARGIN                  TRAIL_12M_OPER_INC           NET_DEBT
RETURN_ON_ASSET              TRAIL_12M_EPS                NET_CHNG_ST_DEBT
TAX_BURDEN                   TRAIL_12M_CASH_FROM_OPER     NET_CHANGE_LIABILITIES
FNCL_LVRG                    TRAIL_12M_NET_INC            INCR_IN_LIAB_PCT_OF_TOT
REVENUE_PER_SH               TOT_COMMON_EQY               NET_CHANGE_TOTAL_EQUITY
OPER_INC_PER_SH              TOT_DEBT_TO_TOT_CAP          INCR_IN_EQY_PCT_OF_TOT
PRETAX_INC_PER_SH            ASSET_TO_EQY                 COM_EQY_TO_TOT_ASSET
CONT_INC_PER_SH              LT_DEBT_TO_COM_EQY           CASH_TO_TOT_ASSET
CASH_ST_INVESTMENTS_PER_SH   LT_DEBT_TO_TOT_CAP           ACCT_RCV_TO_TOT_ASSET
BOOK_VAL_PER_SH              LT_DEBT_TO_TOT_ASSET         TRAIL_12M_INC_BEF_XO_ITEM

For each instance of the CR data, five default indicators, B1 to B5, are appended. The number in

each indicator is the number of years considered before the company's latest credit event.

A CR instance is assigned a TRUE indicator if the company recorded a credit event listed in Table 3

and the instance falls within the period of consideration. For example, if a company recorded a

credit event listed in Table 3 and has annual report filings from FY 2000 to 2004, the indicators

appended to the CR data will be as follows:

Figure 1. Illustration of class indicators

FY YEAR B1 B2 B3 B4 B5

2000 FALSE FALSE FALSE FALSE TRUE

2001 FALSE FALSE FALSE TRUE TRUE

2002 FALSE FALSE TRUE TRUE TRUE

2003 FALSE TRUE TRUE TRUE TRUE

2004 TRUE TRUE TRUE TRUE TRUE
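The labelling rule illustrated in Figure 1 can be sketched as below. This is a minimal sketch that assumes the credit event follows the company's latest filing year, as in the example above; the function name and the dictionary layout are hypothetical.

```python
def default_indicators(fiscal_years, has_credit_event, horizons=(1, 2, 3, 4, 5)):
    """Return {fiscal_year: {"B1": bool, ..., "B5": bool}} for one company.

    Assumption: the credit event follows the latest filing year, so a filing that
    lies n years before the latest one is TRUE for every indicator Bk with k > n.
    """
    last_year = max(fiscal_years)
    table = {}
    for fy in fiscal_years:
        years_before = last_year - fy
        table[fy] = {f"B{k}": has_credit_event and years_before < k for k in horizons}
    return table


# Reproduces the company of Figure 1 (filings from FY 2000 to FY 2004).
indicators = default_indicators(range(2000, 2005), has_credit_event=True)
```

A company without a recorded credit event receives FALSE for all five indicators in every year.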



Table 3. Credit events that are classified as default events

Action Type Subcategory

Delisting Filing Type: Administration

Delisting Filing Type: Canadian CCAA

Delisting Filing Type: Chapter 11

Delisting Filing Type: Judicial Management

Delisting Filing Type: Liquidation

Delisting Filing Type: Protection

Delisting Filing Type: Receivership

Delisting Filing Type: Reorganization

Delisting Filing Type: Restructuring

Delisting Filing Type: Unknown

Delisting Filing Type: Winding Up

Bankruptcy Filing Reason: Bankruptcy

Bankruptcy Filing Reason: Coupon & principal payment

Bankruptcy Filing Reason: Coupon payment only

Bankruptcy Filing Reason: Debt Restructuring

Bankruptcy Filing Reason: Interest payment

Bankruptcy Filing Reason: Loan payment

Bankruptcy Filing Reason: Principal payment

Bankruptcy Filing Reason: Unknown

Bankruptcy Filing Reason for delisting: Bankruptcy

Noting that some industries may inherently be more risky than others, an attribute to account for

the industry is appended to the data. The BICS sector attribute from the company information, which

indicates the industry that the company is functioning in, is selected. The structure of the final data

is shown below in Table 4.

Table 4. Structure of data for model building

Data source: Company Information
  Selected attributes: BICS_SECTOR
  Output: BICS_SECTOR attribute

Data source: Fundamentals CR
  Selected attributes: 45 CR attributes (see Table 2)
  Data preparation conducted:
    1. Selected the most recent annual report data.
    2. Selected only attributes that are more than 90% complete in all three countries.
    3. Replaced missing values with the country mean.
  Output: 45 CR attributes that are 100% complete

Data source: Credit Events
  Selected attributes: Action type, subcategory and date
  Data preparation conducted: Created 5 class indicators.
  Output: 5 class indicators



Different combinations of the countries' data were also explored. The summary of the class distribution

for each combination is shown below.

Table 5. Class distribution of each combination

ND = Non-Default, D = Default. Columns B1 to B5 are the default indicators.

Combination of countries   Instances   Class    B1      B2      B3      B4      B5
Singapore (SG)             6828        ND       6795    6764    6738    6716    6695
                                       D        33      64      90      112     133
                                       D (%)    0.48    0.94    1.32    1.64    1.95
Hong Kong (HK)             12239       ND       12196   12157   12121   12090   12063
                                       D        43      82      118     149     176
                                       D (%)    0.35    0.67    0.96    1.22    1.44
China (CN)                 21091       ND       20850   20624   20410   20205   20006
                                       D        241     467     681     886     1085
                                       D (%)    1.14    2.21    3.23    4.20    5.14
SG and HK                  19067       ND       18991   18921   18859   18806   18758
                                       D        76      146     208     261     309
                                       D (%)    0.40    0.77    1.09    1.37    1.62
SG and CN                  27919       ND       27645   27388   27148   26921   26701
                                       D        274     531     771     998     1218
                                       D (%)    0.98    1.90    2.76    3.57    4.36
HK and CN                  33330       ND       33046   32781   32531   32295   32069
                                       D        284     549     799     1035    1261
                                       D (%)    0.85    1.65    2.40    3.11    3.78
SG, HK and CN              40158       ND       39841   39545   39269   39011   38764
                                       D        317     613     889     1147    1394
                                       D (%)    0.79    1.53    2.21    2.86    3.47

4 EXPERIMENTAL DESIGN

The classification models used in our project experiments are: ID3 Decision Tree (J48), Random

Forest, Random Tree, Logistic Regression, Support Vector Machines and Neural Networks. The three

key questions that the team aims to address are:

a) How many years of data from a firm are required for accurate prediction of default?

b) Which is the most appropriate Data Mining Algorithm for a particular country specific dataset?

c) How accurate and reliable is each Data Mining Technique in predicting default for each dataset?

We ran the six algorithms on each of the three datasets and tabulated their confusion matrix results

to obtain prediction accuracy. Each experiment was repeated with different bankruptcy indicators

from B1 to B5, to ascertain the number of years of data required for accurate default prediction. We

further combined the datasets and re-ran the tests to see if different combinations of datasets would

cause the algorithms to yield different performances.
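The experiment grid described above can be enumerated with a short sketch. The labels are shorthand chosen for illustration; the sketch only lists the model, dataset and indicator combinations and does not train anything.

```python
from itertools import combinations, product

MODELS = ["J48", "Random Forest", "Random Tree", "Logistic Regression", "SVM", "NN (MLP)"]
COUNTRIES = ["SG", "HK", "CN"]
INDICATORS = ["B1", "B2", "B3", "B4", "B5"]


def dataset_combinations(countries):
    """All non-empty combinations of the country datasets (single, paired, all three)."""
    combos = []
    for size in range(1, len(countries) + 1):
        combos.extend(combinations(countries, size))
    return combos


def experiment_grid():
    """Every (model, dataset combination, default indicator) triple."""
    return list(product(MODELS, dataset_combinations(COUNTRIES), INDICATORS))


grid = experiment_grid()
china_only_runs = [run for run in grid if run[1] == ("CN",)]
```

Under these assumptions there are seven dataset combinations, and each single-country dataset accounts for six models times five indicators.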



Lastly, in order to gain a better perspective of how the results can help in loan decisions, we applied

a cost function to the confusion matrix of each result to simulate the expected gains of prediction.

We assumed that good loans return an optimistic gain of 6% interest, while bad loans return the worst

outcome of losing 100% of the investment along with the opportunity cost of 6%. Loans that were not

made because the algorithm wrongly classified a healthy firm as bankrupt incur an opportunity cost

of 6% interest.

5 EVALUATION OF CLASSIFICATION MODELS

The team built more than 30 models for each financial market. To identify the most appropriate

model, the team compared the models across the six classification approaches, using accuracy and

expected returns as key performance measures.

5.1 ACCURACY

The team determined accuracy based on the proportion of incorrectly classified cases. Two types of

errors may occur during classification, i.e. Type I and Type II errors. Type I errors occur when the

model classifies the non-bankruptcy group into the bankruptcy group. These errors represent

potential loss in interest revenue of the financial institutions. On the other hand, Type II errors

classify the bankruptcy group into the non-bankruptcy group. In comparison, Type II errors appear to

be more costly to financial institutions as they might be unable to recover the principal amount.

Hence, we placed a higher emphasis on Type II errors. A model is considered to have high accuracy if

it has a low proportion of Type II cases.
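The two error types can be read off a confusion matrix as in the sketch below. It treats bankruptcy as the positive class and divides by the total number of cases; the choice of denominator and the example counts are assumptions made for illustration, not figures from the experiments.

```python
def error_proportions(tp, fp, fn, tn):
    """Error proportions from a confusion matrix with bankruptcy as the positive class.

    tp: bankrupt firms classified as bankrupt
    fp: non-bankrupt firms classified as bankrupt (Type I error)
    fn: bankrupt firms classified as non-bankrupt (Type II error)
    tn: non-bankrupt firms classified as non-bankrupt
    """
    total = tp + fp + fn + tn
    return {
        "type_1": fp / total,
        "type_2": fn / total,
        "misclassified": (fp + fn) / total,
    }


# Hypothetical confusion matrix for 1000 firms.
errors = error_proportions(tp=30, fp=10, fn=3, tn=957)
```

With heavily imbalanced classes, the overall misclassification proportion can look small even when most bankrupt firms are missed, which is one reason to inspect the Type II proportion separately.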

5.2 EXPECTED RETURNS

The team calculated the expected returns for each model using the assigned costs for each of the

four possible outcomes, i.e. correctly classified bankruptcy, incorrectly classified bankruptcy,

correctly classified non-bankruptcy, and incorrectly classified non-bankruptcy.

Table 6: Cost Assignment

                         Bankruptcy   Non-Bankruptcy
Correctly Classified     0%           6%
Incorrectly Classified   -6%          -106%
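Applying the costs of Table 6 to a confusion matrix can be sketched as below. The sketch assumes equally sized loans and averages the outcome over all firms; the mapping of table cells to confusion matrix cells follows the definitions in Section 5, and the example counts are hypothetical.

```python
# Return in percent for each outcome, following Table 6 (bankruptcy is the positive class).
COSTS = {
    "tp": 0.0,     # correctly classified bankruptcy: no loan made, nothing lost
    "fp": -6.0,    # firm wrongly classified as bankrupt: forgone interest
    "fn": -106.0,  # bankrupt firm wrongly classified as non-bankrupt: principal plus interest lost
    "tn": 6.0,     # correctly classified non-bankruptcy: interest earned
}


def expected_return(tp, fp, fn, tn, costs=COSTS):
    """Average return per loan decision, in percent, assuming equally sized loans."""
    total = tp + fp + fn + tn
    gain = tp * costs["tp"] + fp * costs["fp"] + fn * costs["fn"] + tn * costs["tn"]
    return gain / total


# Hypothetical confusion matrix for 1000 firms.
example = expected_return(tp=30, fp=10, fn=3, tn=957)

# Upper bound for a perfect classifier on the China B1 class distribution of Table 5.
china_b1_ceiling = expected_return(tp=241, fp=0, fn=0, tn=20850)
```

Because a missed bankruptcy costs far more than a wrongly rejected loan under this cost table, a handful of Type II errors can erase the interest earned on many good loans.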

6 RESULTS AND DISCUSSION

6.1 CHINA

Amongst all the classification models, decision tree models (i.e. J48, Random Forest, Random Tree)

yielded the highest accuracy, with no Type II errors. This implies that the models developed are

highly accurate in bankruptcy prediction. The expected returns are also higher for decision tree

models, with the Random Forest model generating an expected return of close to 6%. The financial

institution can expect an average return of 5.8% using the decision tree models.

Table 7: China

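Model                 B1      B2      B3      B4              B5

Accuracy
J48                   0.000   0.000   0.000   Not Available   Not Available
Random Forest         0.000   0.000   0.000   Not Available   Not Available
Random Tree           0.000   0.000   0.000   Not Available   Not Available
Logistic Regression   0.005   0.010   0.016   Not Available   Not Available
SVM                   0.007   0.014   0.021   Not Available   Not Available
NN(MLP)               0.004   0.002   0.009   Not Available   Not Available

Expected Returns
J48                   5.80    5.58    5.52    Not Available   Not Available
Random Forest         5.91    5.79    5.71    Not Available   Not Available
Random Tree           5.79    5.65    5.44    Not Available   Not Available
Logistic Regression   2.16    1.56    1.03    Not Available   Not Available
SVM                   1.96    0.79    -0.15   Not Available   Not Available
NN(MLP)               4.26    3.99    3.17    Not Available   Not Available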

6.2 HONG KONG

The Support Vector Machine (SVM) model produced the lowest proportion of Type II errors.

However, the expected returns from this model are also the lowest compared to the other models. On

the other hand, the decision tree models yielded higher expected returns, ranging from 5.44% to

5.57% for B1.

Table 8: Hong Kong


Model                 B1       B2       B3       B4       B5

Accuracy
J48                   0.0036   0.0073   0.0103   0.0130   0.0147
Random Forest         0.0038   0.0074   0.0106   0.0131   0.0147
Random Tree           0.0039   0.0074   0.0110   0.0130   0.0140
Logistic Regression   0.0038   0.0069   0.0102   0.0132   0.0159
SVM                   0.0035   0.0071   0.0105   0.0144   0.0163
NN(MLP)               0.0039   0.0075   0.0102   0.0122   0.0138

Expected Returns
J48                   5.48     5.06     4.73     4.38     4.18
Random Forest         5.57     5.16     4.80     4.51     4.32
Random Tree           5.44     5.03     4.43     4.42     4.18
Logistic Regression   2.22     1.98     2.15     2.27     1.56
SVM                   0.24     0.06     1.33     1.07     1.01
NN(MLP)               4.64     1.66     2.68     2.37     1.84

6.3 SINGAPORE

The SVM model has the lowest proportion of Type II errors, followed by Logistic Regression model.

These models, however, produced very low expected returns. Despite the lower accuracy rate, the

decision tree models work best in generating a good return on loans.

Table 9: Singapore


Model                 B1       B2       B3       B4       B5

Accuracy
J48                   0.0055   0.0110   0.0121   0.0159   0.0160
Random Forest         0.0049   0.0098   0.0132   0.0147   0.0171
Random Tree           0.0050   0.0097   0.0091   0.0158   0.0162
SVM                   0.0033   0.0090   0.0146   0.0166   0.0162
NN(MLP)               0.0041   0.0107   0.0136   0.0134   0.0143

Expected Returns
J48                   5.27     4.46     4.11     4.00     3.98
Random Forest         5.44     4.90     4.51     4.34     4.08
Random Tree           5.08     4.40     4.62     3.80     3.82
SVM                   0.92     1.19     0.63     0.31     0.47
NN(MLP)               5.06     4.34     3.05     2.06     1.14

6.4 COMBINATION OF CHINA, HONG KONG AND SINGAPORE

The team combined the datasets for China, Hong Kong and Singapore to predict bankruptcy for Asia

as a whole. The logistic regression model works best for Asia in minimising the number of Type II

errors. However, the expected returns are better for decision tree models.

Table 10: China, Hong Kong and Singapore


Model                 B1       B2       B3       B4       B5

Accuracy
J48                   0.0041   0.0089   0.0128   0.0150   0.0173
Random Forest         0.0046   0.0088   0.0125   0.0152   0.0180
Random Tree           0.0046   0.0086   0.0125   0.0156   0.0178
SVM                   0.0045   0.0102   0.0121   0.0153   0.0168
NN(MLP)               0.0042   0.0080   0.0101   0.0123   0.0153

Expected Returns
J48                   5.35     4.83     4.36     4.08     3.76
Random Forest         5.49     5.01     4.59     4.26     3.97
Random Tree           5.31     4.85     4.33     4.01     3.69
SVM                   1.28     -0.26    0.95     0.75     -0.64
NN(MLP)               1.30     -0.25    0.88     1.06     -0.46

6.5 KEY OBSERVATIONS FROM CHINA, HONG KONG AND SINGAPORE

The team made the following observations when comparing the models of China, Hong Kong and

Singapore:

a) Decision Tree models generate high expected returns for all countries. Currently, many financial

institutions use logistic regression and other statistical approaches in predicting bankruptcy. Our

models seem to suggest that classifications using the decision tree approaches yield better

outcomes in terms of expected returns. In particular, the Random Tree models work well across

financial markets.

b) Logistic Regression, SVM and Neural Network models have lower Type II errors. These models

are better at preventing bad loans.

c) Bankruptcy of firms in China is predicted accurately. Of the three countries studied, China yielded

the highest accuracy and expected returns. This could be due to a more mature loan market, where

the financial institutions are relatively accurate in predicting bankruptcy of firms.



d) It is better to build the models for different countries separately. The models built using the

combined datasets for China, Hong Kong and Singapore did not improve accuracy and expected

returns.

e) The most-recent-year core ratios are sufficient to predict bankruptcy. Comparing the models

built using core ratios of different years, the models built using the most-recent-year core ratios

(i.e. B1) yielded the best outcomes.

7 RECOMMENDATIONS

Financial institutions should assess the credit risk of firms from different countries separately. The

decision tree models work best for the China market, while the SVM model predicts bankruptcies

better for Hong Kong and Singapore. Risk-seeking financial institutions can maximise their returns

using the Random Forest model. This model yielded the highest expected returns for all countries.

8 CONCLUSION

Due to bankruptcy risk, numerous studies have attempted to develop credit risk or default

prediction models by using several statistical methods or machine learning algorithms. Much

research literature based their studies on datasets that included company data from developed,

western markets. Through our team's experiments with 6 different data mining techniques and

datasets of company core ratios obtained from 3 different Asian markets (Singapore, Hong Kong and

China), we discovered that, with just 1 year of company data, decision trees are more effective in

classifying default. Country-specific data for Asia should also be studied as standalones. Our team

recognises our experiments to be preliminary to developing a full default prediction system for the

Asian markets. Future studies can extend this work towards a more

complete model. For example, features such as externally-driven events (interest rates, commodity

prices, government restrictions) that cause changes in company policies or balance sheet values

should also be studied so that profiling and company assessment are not limited to firm-specific data.

We hope that our findings are sufficient to spur such studies.



9 REFERENCES

[1] J. Yang, S. Olafsson, Optimization-based feature selection with adaptive instance sampling, Computers & Operations Research 33 (11) (2006) 3088-3106.

[2] A.I. Dimitras, S.H. Zanakis, C. Zopounidis, A survey of business failures with an emphasis on prediction methods and industrial applications, European Journal of Operational Research 90 (1996) 487-513.

[3] RMI (2013), NUS-RMI Credit Research Initiative Technical Report Version: 2013 Update 2b, Global Credit Review 3, pp105

[4] T. Lensberg, A. Eilifsen, T.E. McKee, Bankruptcy theory development and classification via genetic programming, European Journal of Operational Research 169 (2006) 677-697.

[5] W.H. Beaver, Financial ratios as predictors of failure, Journal of Accounting Research 4 (1966) 71-102.

[6] W.H. Beaver, Alternative accounting measures as predictors of failure, Accounting Review 43 (1) (1968) 113-122.

[7] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (1968) 589-609.

[8] T. Hastie, R. Tibshirani, J.H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2001.

[9] P.R. Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques - a review, European Journal of Operational Research 180 (1) (2007) 1-28.

[10] J.H. Min, Y.-C. Lee, Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Systems with Applications 28 (2005) 603-614.

[11] K.S. Shin, T.S. Lee, H.J. Kim, An application of support vector machines in bankruptcy prediction model, Expert Systems with Applications 28 (2005) 127-135.

[12] G. Zhang, M.Y. Hu, B.E. Patuwo, D.C. Indro, Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis, European Journal of Operational Research 116 (1999) 16-32.

[13] J.R. Quinlan, Discovering Rules by Induction from Large Collection of Examples, in: D. Michie, Ed., Expert Systems in the Micro Electronic Age (Edinburgh University Press, 1979).

[14] J.R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81-106.

[15] C. Carter and J. Catlett, Assessing Credit Card Applications Using Machine Learning, IEEE Expert (Fall 1987) 71-79.

[16] W.F. Messier and J.V. Hansen, Inducing Rules for Expert System Development: An Example Using Default and Bankruptcy data, Management Science 34, No. 12 (Dec 1988) 1403-1415.

[17] K.Y. Tam and R. Chi, Inducing Stock Screening Rules for Portfolio Construction, Journal of Operations Research Society (1991, to appear).

[18] H. Braun and J.S. Chandler, Predicting Stock Market Behaviour through Rule Induction: An Application of the Learning-From-Example Approach, Decision Science 18 (1987) 415-429.

[19] S.B. Lee and S.H. Oh, A Comparative Study of Recursive Partitioning Algorithm and Analog Concept Learning System, Expert Systems with Applications 1 (1990) 403-416.

[20] L. Breiman, Random forest, Machine Learning 45 (2001) 5-32.

[21] L. Breiman, Bagging predictors, Machine Learning 26 (2) (1996) 123-140.

[22] A. Liaw, M. Wiener, Classification and regression by random forest, R News 2 (3) (2002) 18-22.

[23] R. Diaz-Uriarte, S. Alvarez de Andres, Gene selection and classification of microarray data using random forest, BMC Bioinformatics 7 (3) (2006).

[24] Ohlson, J. A. (1980), Financial Ratios and the Probabilistic Prediction of Bankruptcy, Journal of Accounting Research, 18 (1), 109-31.

[25] Zavgren, C. V. (1985), Assessing the Vulnerability to Failure of American Industrial Firms: A Logistic Analysis, Journal of Business Finance and Accounting, 12 (1), 19-45.

[26] Altman, E. I., and G. Sabato (2007), Modeling Credit Risk for SMEs: Evidence from the U.S. Market, Abacus, 43 (3), 332-57.

[27] Altman, E. I., G. Sabato, and N. Wilson (2008), The Value of Non-Financial Information in SME Risk Management, Working Paper, New York University.

[28] Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Haussler D (ed) Proceedings of the 5th annual ACM workshop on computational learning theory. ACM Press, New York, pp 144-152

[29] Cortes C, Vapnik VN (1995) Support-vector networks. Mach Learn 20(3):273-297

[30] Huang Z, Chen H, Hsu CJ, Chen WH, Wu S (2004) Credit rating analysis with support vector machines and neural networks: a market comparative study. Decis Support Syst 37:543-558

[31] Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer, Berlin

[32] Scholkopf B, Smola AJ (2002) Learning with kernels. MIT Press, Cambridge

[33] Moody's Quantitative Risks Public Firm Risk Model. [Online]. Available: www.moodysqra.com
