machine learning improves accounting estimates · 2019-11-27 · smu classification: restricted...
TRANSCRIPT
Machine Learning Improves Accounting Estimates
Presented by
Ting Sun, The College of New Jersey
The views and opinions expressed in this working paper are those of the author(s) and not necessarily those of the School of Accountancy, Singapore Management University.
Machine Learning Improves Accounting Estimates
Kexing Ding1
Baruch Lev2
Xuan Peng3
Ting Sun4
Miklos A. Vasarhelyi5
Nov 15, 2019
1 Southwestern University of Finance and Economics; and Rutgers, the State University of New Jersey, [email protected]. 2 Stern School of Business, New York University, [email protected]. 3 Southwestern University of Finance and Economics; and Rutgers, the State University of New Jersey, [email protected]. 4 The College of New Jersey, [email protected]. 5 Corresponding author. Rutgers, the State University of New Jersey, [email protected]; Acknowledgement: The authors would like to thank Richard Sloan (the Editor), Stephen Ryan, and two anonymous referees for helpful comments and suggestions. Kexing Ding and Xuan Peng are thankful to Rutgers Continuous Auditing Research Lab for financial support.
ABSTRACT
Managerial estimates are ubiquitous in accounting: most balance sheet and income
statement items are based on estimates; some, such as the pension and employee stock options
expenses, derive from multiple estimates. These estimates are affected by objective estimation
errors, as well as by managerial manipulation, thereby adversely affecting the reliability and
relevance of financial reports. We show in this study that machine learning can substantially
improve managerial estimates. Specifically, using insurance companies’ data on loss reserves
(future customer claims) estimates and realizations, we document that the loss estimates
generated by machine learning were superior to actual managerial estimates reported in financial
statements, in four out of five insurance lines examined. We subject our findings to various
robustness tests and find that they hold. Our evidence suggests that machine learning techniques
can be highly useful to managers and auditors in improving accounting estimates, thereby
enhancing the usefulness of financial information to investors. Obviously, the generalizability of
our findings beyond insurance data remains to be examined by future research.
Keywords: machine learning, accounting estimates
I. INTRODUCTION
The PCAOB recently introduced a new standard for auditing accounting estimates,
stating:
“Accounting estimates are an essential part of financial statements. Most
companies’ financial statements reflect accounts or amounts in disclosures
that require estimation. Accounting estimates are pervasive in financial
statements, often substantially affecting a company’s financial position and
results of operations… The evolution of financial reporting frameworks
toward greater use of estimates includes expanded use of fair value
measurements that need to be estimated.” (PCAOB 2018, p.3)
Indeed, most financial statement items are based on managerial subjective estimates: fixed assets
are presented net of depreciation―an estimate―and accounts receivable, net of estimated bad
debts. Liabilities, like pensions and post-retirement benefits, are estimates, and revenues from
long-term projects or from contracts with future deliverables include estimates. Many expenses,
such as the stock options or warranty expenses, also require estimates. Some items, like the
pension expense, are based on multiple estimates, some of which, such as the expected gain on
pension assets, are essentially guesses. How influential are these estimates? General Electric
recently provided an example in its 2016 report: of an $8.2 billion net income, a full $3.9 billion
(48%) came from a single change in estimated future profits from long-term customer contracts.
Cambridge Dictionary defines an estimate as: “A guess of what the size, value, amount,
cost, etc. of something might be.” Accounting estimates are not just guesses; as research has
shown, they are often strategically biased guesses.1 Financial statement manipulation is
frequently done through tweaking the assumptions underlying the estimates, because even if an
1 For example, Dechow, Ge, Larson, and Sloan (2007, Introduction) reported that: “Manipulations of reserves,
including the allowance for doubtful debt is also common, occurring in 10 percent of the sample. Manipulations of
inventory and cost of goods sold [both including numerous estimates] occurred in 25 percent of the sample.” See
also Li and Sloan (2017) on the manipulation of the timing of goodwill write-offs, yet another substantial estimate.
ex-post realization turns out to be far off the estimate, managers can always claim that the
deviation was due to events that were unknown to them at the time of estimation.2 “Monday
morning quarterbacking” is often managers’ response to questions about the quality of their
estimates.
Generally effective audit and control procedures, such as obtaining third party
confirmations of assets and liabilities, inventory counts, or conformity of financial data with
market values are inapplicable to estimates, which are opinions, rather than facts that can be
validated. Auditors assessing managerial estimates fall back on weak procedures, like reliance on
the firm’s internal controls, or examining the reasonableness of the assumptions underlying the
estimates. By and large, accounting estimates are very difficult to audit, and their quality is even
harder for investors to assess. There is, therefore, an urgent need to provide both managers and
auditors with an independent, bias-free estimates generator to be used as a substitute for, or as
a benchmark against, managerial estimates.
Machine learning, quickly spreading in diverse areas, such as credit-card fraud detection,
medical diagnosis, and customer recommendation systems, has the potential of providing such an
independent estimates generator. We, accordingly, explore in this study the use of machine
learning (ML) in generating accounting estimates. In particular, we demonstrate, using insurance
companies’ data on estimates and realizations of “loss reserves” (estimates of future claims
related to current policies), that loss estimates derived from machine learning are, with a few
exceptions, superior to the actual managerial loss estimates underlying financial reports. We thus
establish, for the first time, the potential of ML to independently assess the reliability of
2 The fact that the subsequent realizations of most accounting estimates are not even disclosed in financial reports
further increases managerial incentives to manipulate financial items through estimates.
estimates underlying financial reports, thereby improving the quality and usefulness of financial
information. Furthermore, ML has the potential to substantially improve auditors’ ability to
evaluate accounting estimates, thereby enhancing the usefulness of financial information to
investors.
How did accounting, which used to be primarily based on facts (after all, accounting
comes from counting: units of money, inventory, etc.), become so totally reliant on managerial
estimates? The defining event was the significant policy shift made by the FASB in the late
1970s, later adopted by the IASB, from an income-statement to a balance-sheet standard-setting
model: the FASB rejected revenue-cost matching as the main standard-setting principle in favor
of the balance sheet model, which calls for the continuous valuation of assets and liabilities at
fair value―essentially, a large-scale process of estimation. Indeed, this standard-setting shift
from the income statement to the balance sheet
model opened the way to the most consequential estimates-intensive accounting standards
enacted in the past three decades: Primarily, fair value accounting, assets and goodwill
impairments, stock option expensing, and revenue recognition. Research indeed relates the
dramatic rise in the number and impact of accounting estimates primarily to the various fair
value and asset write-off standards (Chen and Li 2017). Relatedly, Khan et al. (2017)
documented that estimates-intensive FASB standards detract from shareholder value.
There is, therefore, an urgent need to enhance the quality of accounting estimates and
auditors’ ability to independently evaluate the reliability of these estimates. Machine learning, as
demonstrated in this study, offers a unique opportunity to perform this task.
The order of discussion is as follows: Section II provides background on machine
learning in accounting and reviews the literature on insurance claims loss
estimation. Section III and Section IV discuss the machine learning algorithms used in this study
and the application of machine learning to generate insurance companies’ loss estimates,
respectively. We present our sample in Section V, while Section VI provides the empirical
results concerning the machine learning estimates compared with managers’ estimates.
Additional analyses on estimation errors are presented in Section VII. Finally, Section VIII
concludes the study.
II. BACKGROUND
1. Artificial Intelligence and Machine Learning in Accounting
The “management coefficients theory” proposed by Bowman (1963) highlights the
potential of automatic decision-making systems to generate consistent and reliable decisions.
Accordingly, researchers have explored the applications of mathematical models in the decision-
making process and found that statistical models frequently generate more reliable decisions and
forecasts than humans, especially when the latter rely on their intuition (e.g., Hoffman 1960;
Goldberg 1970; Dawes 1971). Building on these findings, researchers and practitioners have
developed computer-based techniques to aid and sometimes even replace human judgment (e.g.,
Dawes and Corrigan 1974). With advances in computational capability and the introduction of
more powerful algorithms, business has entered a higher level of automation, and artificial
intelligence has spread rapidly in the era of ‘big data’. Furthermore, instead of programming
machines a priori with known formulas or rules to solve problems, it is now possible to let
machines learn the rules on their own. By design, this data-driven process is more opaque than
traditional closed-form models, which are usually guided by theory. Machine learning, generally defined as an
automatic strategy that optimizes a performance criterion through learning from experience or
examples (Alpaydin 2010), is now often used to deal with very complex data.
When used as a predictive tool, machine learning techniques have wide applications in
many domains. In accounting, the exploration of artificial intelligence applications started in the
1980s and covered a variety of tasks from financial report analyses to audit and assurance cases
(e.g., Hansen and Messier 1982; Murphy and Brown 1992; Lam 2004). For example, machine
learning models were developed to predict or detect firm events, such as corporate failure or
bankruptcy (see O’Leary 1998; Huang et al. 1994; Bell 1997; Zhang et al. 1999; McKee and
Lensberg 2002; Pendharkar 2005; Ding et al. 2019), bond ratings (Maher and Sen 1997), and
management fraud (Fanning and Cogger 1998). In financial market studies, Galeshchuk and
Mukherjee (2017) and Parot et al. (2019) used machine learning techniques to predict foreign
exchange rate changes and achieved remarkable forecasting accuracy (also see Dunis and Huang
2002; Dhamija and Bhalla 2011; Rivas et al. 2017). Furthermore, Cao and Tay (2003) showed the
promising performance of support vector machine (SVM) in predicting futures asset price
movements. Other machine learning models, such as artificial neural networks, also performed well
in stock price predictions (e.g., Leigh et al. 2002; Zhang and Wu 2009).
Overall, increasing evidence indicates the potential of applying machine learning
techniques to perform complicated tasks in finance and accounting, and artificial intelligence has
been suggested as critical to the development of the accounting profession in the future (Elliot
1992). Along with previous studies developing systems to conduct audit tasks, including analytical
review procedures (e.g., Koskivaara 2004), risk assessment (Lin et al. 2003; Davis et al. 1997),
and going concern decisions (Lenard et al. 1995; Etheridge et al. 2000), we investigate the
application of machine learning in accounting in this paper. Focusing on the insurance industry,
we generate model estimates for a particular line item—insurance claims losses—and find that the
machine learning models provide more accurate estimates than managers do, and that the
machine estimates are less affected by managerial incentives.
2. Errors in Insurance Claims Loss Estimation
Insurance companies provide protection to policyholders from certain risks that occur
within a predefined period. While insurers receive the policy premium payments before or early
during the period of coverage, the full costs of their activities―the total losses or claims by
policyholders―usually remain unknown for several years after the coverage period ends.
Insurance regulations generally require insurers to provide “management’s best estimate” for these
future claims in financial reports, and to disclose the gradual settlement of loss claims in the
following years. In other words, insurers match the ex-post payoffs directly to the year in which
the initial estimate was made and the related insurance premium revenue recognized. This setting
enables researchers to compare managers’ reported loss estimates to their eventual realizations
(Petroni 1992).
The unpaid component of the estimated future losses (claims) is the insurance loss reserve.
The loss reserve is often the most significant component in property and casualty (P/C) insurance
firms’ liabilities: it is reported that, on average, loss and loss adjustment expense reserves make
up approximately two-thirds of an insurer’s liabilities (Zhang and Browne 2013). Managers’ loss
estimation process is obviously subjective and requires considerable judgment because not all
claims for accidents that occur during a year are filed and settled by year-end. A substantial amount
of losses incurred may be “Incurred But Not Reported Claims”, in which case the insurance
policyholders do not report the losses to insurance firms by the end of the current year, but file the
claims in later years. In addition, after the claims are filed, the final cash settlement may take years
to complete. For example, injuries in a car accident may lead to several years of treatment and
result in extended payments. Thus, insurance firms must estimate the costs of claims filed during
the year as well as claims that are related to the current year but will be filed in subsequent years.
Given the material impact of loss estimation on insurance firms’ financial results and
condition, auditors, investors, and regulators are naturally concerned with the quality of the
estimates reported by managers. The initial loss estimates rarely equal the ultimate losses paid,
thereby generating estimation errors. Previous studies classified these estimation errors into two
major types: nondiscretionary and discretionary (e.g., McNichols 2000; Beaver and McNichols
1998). Nondiscretionary errors arise from the uncertainties in predicting future events. The
difficulties may be related to the economic conditions affecting the firm or the characteristics of
the business conducted (e.g., auto liability, homeowner/farmowner insurance). Discretionary
errors, on the other hand, are the result of intentional management actions (e.g., Beaver and
McNichols 1998). Studies have examined managers’ reasons for manipulating loss reserves
and identified several incentives: tax deferral (Grace 1990; Petroni 1992; Nelson 2000), income
smoothing (Anderson 1973; Smith 1980; Weiss 1985; Beaver et al. 2003), solvency and regulatory
concerns (Forbes 1970; Petroni 1992; Nelson 2000; Gaver and Paterson 2004), and executive
compensation incentives (Browne et al. 2009; Eckles and Halek 2010; Hoyt and McCullough
2010). Anderson (1971) analyzed the insurance industry over the period 1955 to 1964 and
documented that insurers heavily overestimated losses in the early years but gradually moved from
over-reserving to slight under-reserving. However, Bierens and Bradford (2005) found
that insurance firms from 1983 to 1993, generally, tended to over-reserve. Grace and Leverty (2012)
utilized a more recent sample (1989 to 2002) and found that firms generally overestimated losses,
but there was considerable variation in insurers’ practices.
In sum, the findings of prior studies indicate the existence of considerable errors in loss
estimates and relate a certain portion of these errors to intentional management of earnings and
other financial items. In this study, we propose using machine learning techniques to generate
improved loss estimates.
III. MACHINE LEARNING TECHNIQUES
1. Machine learning algorithms
We ran in this study “horseraces” of four popular machine learning algorithms to predict
future insurance losses for five business lines and selected the algorithm with the best predictive
accuracy among those examined. 3 The model-generated predictions were then compared to
managers’ estimates included in financial reports. The four algorithms we used are: linear
regression, random forest, gradient boosting machine, and artificial neural network. We briefly
discuss each ML model used.
1) Linear regression
Some algorithms that were initially developed in the statistics field are “borrowed” by
machine learning for prediction purposes (Brownlee 2016a). Linear regression is a typical example.
When used as a machine learning algorithm, linear regression is a supervised learning method that
makes predictions based on the linear relationship between the numeric output and numeric input
attributes (Friedman 2001; Bishop 2006). The learning process estimates the coefficients of the
input attributes and aims to produce a prediction model that minimizes the mean squared error
between the prediction and the true value.
3 “Horseraces” refers to comparing the accuracy of machine learning algorithms in predicting insurance losses for
each business line of insurance.
To select data attributes for linear regression, we used the M5 method: In each iteration,
the attribute with the smallest standardized coefficient is tentatively removed, and the
regression is re-estimated (Witten et al. 2011). If the model’s predictive accuracy improves
in terms of the Akaike information criterion,4 the attribute is permanently eliminated. This
process is repeated until no improvement is observed. We used the Cartesian grid searching
approach to configure optimal hyper-parameters5 for the linear regression model development as
well as for the other three models. Grid search methodically builds and evaluates a model for
each possible combination of a specified subset of hyper-parameter values. For linear regression, the
hyper-parameters are the learning rate and the number of iterations. The grids of hyper-parameter
values we have tried are listed in Table 1. The learning rate, or step size, determines the amount
that the model weights are adjusted during training. The number of iterations is the number of
batches needed to complete one epoch, where an epoch is completed once an entire dataset is
passed forward and backward through the learning model. In other words, an epoch is one complete
cycle through the entire training data.6
- Table 1 here -
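As an illustration of this procedure, a minimal Python sketch of gradient-descent linear regression with a Cartesian grid search over the two hyper-parameters might look as follows (the data and grid values here are synthetic examples, not the actual grids from Table 1):

```python
import numpy as np

def fit_linear_gd(X, y, learning_rate, n_iterations):
    """Fit linear regression by batch gradient descent on mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iterations):
        err = X @ w + b - y
        w -= learning_rate * (2.0 / n) * (X.T @ err)
        b -= learning_rate * (2.0 / n) * err.sum()
    return w, b

def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Cartesian grid search: build and evaluate a model for every combination."""
    best = None
    for lr in grid["learning_rate"]:
        for n_iter in grid["n_iterations"]:
            w, b = fit_linear_gd(X_tr, y_tr, lr, n_iter)
            mae = np.mean(np.abs(X_val @ w + b - y_val))
            if best is None or mae < best[0]:
                best = (mae, lr, n_iter)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
grid = {"learning_rate": [0.01, 0.1], "n_iterations": [50, 500]}  # illustrative grid
best_mae, best_lr, best_iter = grid_search(X[:150], y[:150], X[150:], y[150:], grid)
```

The combination with the lowest validation error is retained, exactly as in the Cartesian grid search described above.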
2) Random forest
The random forest algorithm is derived from decision trees, a machine learning
technique that extracts information from data and displays it in a tree-like structure. A decision
tree consists of three components: nodes, branches, and leaves. The root node of the tree denotes
4 Akaike information criterion (AIC) is an estimator of the relative quality of a model, named after the statistician
Hirotugu Akaike.
5 A hyper-parameter is a parameter whose value is set before the learning process begins; setting the value of a
hyper-parameter controls the process of defining the model.
6 For more information about learning rate and iterations, please see: https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
an input attribute; the tree splits into branches based on the input attributes, with each branch
representing a decision. The end of the branch is called a leaf, and each leaf node leads to a
prediction of the target value. Decision trees can be applied to either regression or classification
problems. We employed regression trees because the target variable (insurance losses) has
continuous values. A single decision tree could have limited capabilities to learn the data, whereas
a random forest improves the accuracy of decision trees with an ensemble technique (Breiman
2001, 2002). Specifically, the random forest algorithm first generates a “forest” of decision trees;
each tree uses a subset of randomly selected attributes. It then aggregates over the trees to yield
the most efficient predictor. In general, the higher the number of trees, the better the predictive
performance of the model. However, as adding too many trees can considerably slow down the
training process, after a certain point, the benefit in prediction performance from using more trees
will be lower than the cost of computation time for these additional trees (Ramon 2013).
The hyper-parameters for grid searching are the number of trees, the maximum tree depth,
and the minimum leaf size. The maximum tree depth limits the depth of each tree in the random
forest; a deeper tree has more branches between the root node and the leaf nodes and captures
more information about the data. However, as the tree depth increases, the model may suffer from
overfitting because it captures too many details (Brownlee 2016b). The third hyper-parameter is
the minimum number of observations per leaf. A smaller leaf makes the model more prone to
capturing noise in the training data, so a leaf size that is too small may result in overfitting. On
the other hand, if we choose a leaf size that is too large, the tree will stop growing after a few
splits, resulting in poor predictive performance. Table 1 provides the grids
of values that we have tried in the random forest.
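To make the bagging idea concrete, here is a toy numpy sketch of a random forest built from depth-one regression trees (stumps): each tree sees a bootstrap sample of the rows and a random subset of the features, and the forest averages the tree predictions. An actual implementation grows much deeper trees and tunes the three hyper-parameters above:

```python
import numpy as np

def fit_stump(X, y, feat_idx):
    """Depth-1 regression tree: best single split over the given feature subset."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())
    for j in feat_idx:
        for t in np.unique(X[:, j]):
            m = X[:, j] <= t
            if m.all() or not m.any():
                continue
            left, right = y[m], y[~m]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    return best[1:]  # (feature, threshold, left_value, right_value)

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap row sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, d // 2)  # candidate features per tree (illustrative choice)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, n)                  # bootstrap sample
        feats = rng.choice(d, size=k, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Aggregate over the trees by averaging their predictions."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = 2.0 * (X[:, 0] > 0) + 1.0 * (X[:, 1] > 0)
forest = fit_forest(X, y)
mae_forest = np.mean(np.abs(predict_forest(forest, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))  # naive constant predictor
```

Even with such weak base learners, the averaged ensemble beats a naive constant prediction on this toy target.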
3) Gradient boosting machine
Gradient boosting machine (GBM) uses an ensemble technique termed boosting to train
new prediction models with respect to the errors of the previous models, and converts weak
prediction models to stronger ones (Schapire 1990). The objective of boosting is to minimize the
model errors by adding weak learners (i.e., regression trees). After adding new trees, the learning
procedure subsequently corrects for errors made by previous trees and improves predictions to
reduce the residuals in a gradient descent manner (Friedman 2001; Mason et al. 2000). After the
number of trees reaches a limit, adding more trees will no longer improve the prediction
performance (Brownlee 2016b). The grid searching approach for GBM was the same as that for
the random forest, except that we started with five as the maximum tree depth to ensure that the
learners were weak but could still be constructed in the gradient boosting machine. The grids of
values tried are reported in Table 1.
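The boosting procedure itself can be sketched in a few lines of numpy, again with stumps as the weak learners; each new tree is fit to the residuals of the ensemble built so far (an illustrative sketch, not the configuration used in the paper):

```python
import numpy as np

def fit_stump(X, y):
    """Weak learner: best single-split regression tree over all features."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            m = X[:, j] <= t
            if m.all() or not m.any():
                continue
            left, right = y[m], y[~m]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_gbm(X, y, n_trees=50, learning_rate=0.3):
    """Boosting: each new tree fits the residuals (the negative gradient of the
    squared-error loss) of the ensemble built so far."""
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_trees):
        residuals = y - pred
        stump = fit_stump(X, residuals)
        trees.append(stump)
        pred += learning_rate * predict_stump(stump, X)
    return y.mean(), trees

def predict_gbm(model, X, learning_rate=0.3):
    base, trees = model
    pred = np.full(len(X), base)
    for stump in trees:
        pred += learning_rate * predict_stump(stump, X)
    return pred

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = 2.0 * (X[:, 0] > 0) + 1.0 * (X[:, 1] > 0)
model = fit_gbm(X, y)
mae_boost = np.mean(np.abs(predict_gbm(model, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))
```

The learning rate shrinks each tree's contribution, so the residuals are reduced gradually rather than in one step, which is the gradient-descent flavor of the method.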
4) Artificial neural networks
An artificial neural network consists of multiple layers of interconnected nodes between
the input and output data. Each layer transforms its input data into a more abstract representation,
which is then used as the input by the next layer. An artificial neural
network has three types of layers: input, hidden, and output layers. The input layer receives the
raw data of explanatory variables, and the number of nodes in the input layer equals the number
of the explanatory variables. Next, the input layer is connected to hidden layers, which apply
complex transformations to the incoming data and transmit the output to the next hidden layers.
The data processing is performed through a system of weighted connections: the values entering a
hidden node are multiplied by connection weights (learned during training), and the weighted inputs are then
added to produce a single output. There may be one or more hidden layers. A neural network is
called deep when more than two hidden layers exist. The output of the final layer (called the output
layer) represents the extracted high-level information from the raw data (Sun and Vasarhelyi 2017).
We normalized all input variables to improve the model performance and used several different
values of hyper-parameters for grid searching. The details are provided in Table 1.
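The weighted-connection mechanics described above can be illustrated with a toy one-hidden-layer network trained by gradient descent on normalized inputs (a from-scratch numpy sketch with synthetic data; the architecture and hyper-parameters here are illustrative, not those tuned in Table 1):

```python
import numpy as np

def train_nn(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden);      b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)        # hidden layer: weighted sums + activation
        pred = h @ W2 + b2              # output layer
        g = 2.0 * (pred - y) / n        # gradient of MSE w.r.t. predictions
        gW2, gb2 = h.T @ g, g.sum()
        gh = np.outer(g, W2) * (1.0 - h ** 2)   # back-propagate through tanh
        gW1, gb1 = X.T @ gh, gh.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict_nn(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize inputs, as in our models
params = train_nn(X, y)
mae_nn = np.mean(np.abs(predict_nn(params, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))
```

The normalization step mirrors the pre-processing noted above; without it, inputs on very different scales slow down or destabilize the weight updates.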
2. Data splitting and performance validation
We employed a training/validation/testing approach to partition our data. For each machine
learning algorithm, we fit the model based on a training dataset and tested it on a different set for
validation using the five-fold cross-validation method. Then, we utilized a new test dataset, known
as the holdout dataset, for robustness tests of the developed and validated models.
The five-fold cross-validation method is a widely-used technique in machine learning to
estimate a model’s performance. It is a resampling procedure used to evaluate the performance of
machine learning models on a limited data sample. Specifically, the data is shuffled randomly and
split into five groups. Each group is taken as a validation set, and the remaining four groups
combined are used as a training set. A model is developed based on the training set and evaluated
on the validation set using various evaluation metrics. The results are averaged to produce a single
performance estimate. With this method, all observations are used for both training and validation, with each
observation used only once for validation.
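The five-fold procedure can be sketched as follows (a minimal numpy illustration, with ordinary least squares standing in for the model being validated):

```python
import numpy as np

def five_fold_cv(X, y, fit, predict, seed=0):
    """Shuffle, split into five folds; each fold serves once as the validation
    set, and the averaged validation MAE is the performance estimate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 5)
    maes = []
    for k in range(5):
        val = folds[k]
        train = np.concatenate([folds[i] for i in range(5) if i != k])
        model = fit(X[train], y[train])
        maes.append(np.mean(np.abs(predict(model, X[val]) - y[val])))
    return float(np.mean(maes))

# Ordinary least squares as an illustrative model to validate.
def fit_ols(X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=100)
cv_mae = five_fold_cv(X, y, fit_ols, predict_ols)
```

Every observation is used four times for training and exactly once for validation, as described above.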
We used a separate holdout sample to evaluate the practical usefulness of the machine
learning models. Specifically, we employed the models developed from the cross-validation
process to produce loss estimates for the holdout period following the training and validation
period. Observations in the holdout period have not been used in the model development process.
Figure 1 illustrates the splits. This method has two advantages in our setting. First, cross-validation
generally results in a less biased and more robust estimate of the model accuracy than a simple
train/test split method (Brownlee 2018). Second, the training/validation/testing approach alleviates
the potential problems introduced by the inherent time-series order of the data.
- Figure 1 here -
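The resulting chronological partition can be sketched as follows (the years and holdout length shown are hypothetical, purely to illustrate the scheme in Figure 1):

```python
def chronological_split(years, n_holdout):
    """Reserve the most recent n_holdout years as a strictly out-of-sample test
    set; earlier years are used for training and five-fold cross-validation."""
    ordered = sorted(years)
    return ordered[:-n_holdout], ordered[-n_holdout:]

train_val_years, holdout_years = chronological_split(range(1996, 2011), 3)
```

Because the holdout years follow the training and validation period, no observation from the holdout period can leak into model development.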
IV. APPLICATION TO INSURANCE LOSS ESTIMATION
We now explore the use of machine learning techniques in generating loss estimates for
insurance companies. We conducted two sets of tests for each line of insurance (private passenger
auto liability, commercial auto liability, etc.). The first set of tests did not include managers’ loss
estimates as an input attribute, keeping only variables that are not directly affected by managerial
judgement. In other words, the variables used in these tests are based on facts, such as the number
of outstanding claims, the number of loss claims closed with and without payment, etc. In the
second set of tests, we added managers’ initial estimates to the machine learning models. This
design enables us to evaluate the performance of machine learning techniques on a stand-alone
basis, as well as their incremental improvement over managers’ estimation. Moreover, because
firms may have experienced different loss claim and payment patterns during the 2008 financial
crisis, we constructed three test sets using different periods (See Figure 2).
- Figure 2 here -
1. Insurance Company Data
U.S. insurance companies follow the Statutory Accounting Principles (SAP)7 to prepare
statutory financial statements. Managers are required to provide the initial estimate for all losses
(payment to insured) incurred in the current year (paid, as well as expected to be paid in the future)
7 State laws and insurance regulations require that insurance companies operating in the United States and its
territories prepare statutory financial statements in accordance with Statutory Accounting Principles (SAP). SAP is
designed to assist state insurance departments to regulate insurance companies’ solvency.
as well as re-estimating the losses incurred in each of the previous nine years. In the statutory
filings, insurance firms report the estimated total losses as “incurred losses" and the actual
cumulative payments in each of the past ten years (including the current year). The difference
between the reported incurred losses and the cumulative paid losses for a given year is the reserve
for future loss payment.
As mentioned above, an advantage of the insurance industry data is that the extensive
reporting requirements under SAP make it possible to match the actual payoffs to insured directly
to the year in which the initial estimate was made. Table 2 provides an illustrative example of this
disclosure. Panel A shows the development of incurred losses for the National Lloyds Insurance
Company. By the end of 2008, the estimates for the most recent ten years (including the current
year) are disclosed in Column 10. For example, in 2008, the current estimates for the losses
incurred in the years 2007 and 2008 were $24,334 thousand and $36,893 thousand, respectively.
This is compared with the initial estimate for the losses incurred in 2007, which was $24,226
thousand (Column 9), meaning that Lloyds revised up the estimated losses incurred in 2007 by
$108 thousand ($24,334 - $24,226). Panel B reports the cumulative paid losses for each accident
year, from 1999 to 2008, by the end of 2008. For example, by the end of 2008, the company had
paid $32,324 thousand for the losses incurred during the year 2008, and $23,585 thousand for the
losses incurred during 2007 (Column 10). In this example, the cumulative paid losses and the
losses incurred for the year 1999 converge in 2006 and remain unchanged at $4,618 thousand
(Row 2), indicating that Lloyds likely paid off all the claims for accidents that occurred in 1999
by the end of 2006.
- Table 2 here -
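The reserve relationship described above can be checked arithmetically with the figures from the Lloyds example (amounts in $ thousands):

```python
# Accident year 2008, as reported at the end of 2008 (amounts in $ thousands).
incurred_2008 = 36_893   # estimated total ("incurred") losses for accident year 2008
paid_2008 = 32_324       # cumulative paid losses for accident year 2008
reserve_2008 = incurred_2008 - paid_2008   # reserve for future loss payments

# Revision of the 2007 accident-year estimate between the 2007 and 2008 filings.
initial_2007, revised_2007 = 24_226, 24_334
revision_2007 = revised_2007 - initial_2007
```

The difference between reported incurred losses and cumulative paid losses gives the reserve still carried for that accident year, and the year-over-year change in the incurred-loss figure is the $108 thousand upward revision discussed above.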
2. Dependent Variable
Our dependent variable is ActualLosses, the actual ultimate cost of insured
events occurring in the year of coverage. It is the sum of losses already paid in the coverage year
and the losses to be paid in the future. We chose total actual losses as our dependent variable
because they can be directly compared to managers’ “incurred losses” reported in annual reports.
Previous studies that examined insurance companies’ loss reserve errors measured the “actual
losses” as the cumulative losses paid during several subsequent years (Weiss 1985; Grace 1990;
Petroni and Beasley 1996; Browne et al. 2009; Grace and Leverty 2012). We measure in this study
the ActualLosses for an accident year t as the ten-year cumulative payment of losses incurred in
year t. This variable is extracted from the financial report in year t+9, which is also the last time
managers disclose the loss payment for the initial accident year t.8 For losses incurred in each
business line during a given accident year, we generated only one estimate based on the
information available at the end of this year, and the model estimates were compared to managers’
initial predictions made in year t.
3. Independent Variables
Our independent (predictor) variables consist of information already known at the time of
estimation, i.e., year t (no look-ahead bias). We included three sets of independent variables. First,
operational variables (e.g., claims outstanding, premiums written, premiums ceded to reinsurers)
for each business line were obtained from Schedule P of the statutory filings. The second set of
variables are company characteristics (e.g., total assets, State of operation) for the accident year.
8 We acknowledge the possibility that loss claims may take longer than ten years to settle, in which case the actual
losses are not fully captured by the cumulative payments in year t+9. To alleviate this concern, we first discuss the
payment patterns of each business line studied in the next section. Second, we used the manager estimates in year t+9 as
our proxy for actual losses instead, and the main inferences remained unchanged.
Finally, we added exogenous environmental variables (personal consumption, GDP growth) that
reflect the macro-economic factors that may influence the payment for the accident year.
Definitions of all the independent variables are provided in Appendix A.
4. Evaluation Metrics
We used the mean absolute error (MAE) measure as the primary evaluation metric to
compare estimates with actuals. MAE is the average of the absolute differences between
predictions and actual observations. Compared to RMSE (root mean square error), MAE is easier
to interpret and less sensitive to outliers:
MAE = (1/n) Σ_{i=1}^{n} |TrueValue_i − ModelEstimate_i|    (1)
We also report results for the RMSE, which gives higher weights to large errors. The model
with the highest prediction power typically has a relatively small RMSE. We calculate the RMSE
as:
RMSE = √[(1/n) Σ_{i=1}^{n} (TrueValue_i − ModelEstimate_i)²]    (2)
We compared the machine learning loss predictions with managers’ estimates to evaluate the
machine’s performance, using the MAE or the RMSE.
V. OUR SAMPLE
We used the annual reports of US-based property-casualty insurance companies filed with
the National Association of Insurance Commissioners (NAIC). The data were extracted from the
SNL FIG website (S&P Global Market Intelligence platform), covering the period 1996 to 2017.
PC (property & casualty) insurers offer a wide variety of insurance products that cover
many different business lines, with each line having unique operating characteristics. The
following screening was separately performed on each business line to obtain the test samples:
1) For each business line, we first identified the insurance companies that had conducted
business in this line. To be included in the sample, the firm must have started this line’s
business before 2008 and remained active until 2017, so that we can extract the ultimate
payment (over ten years) for at least one accident year.
2) Firm-years with missing (zero) values for all operational variables were deleted from the
sample. If total assets, total liabilities, net premiums written, or the direct premiums written
were zero or negative, the firm-year was also excluded from the sample.
3) For each business line, we only kept observations that had positive total premiums and
cumulative paid losses.
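The three screening steps can be sketched in pandas; the column names below are hypothetical stand-ins for the actual SNL/NAIC fields, not the paper's variable names:

```python
import pandas as pd

def screen_business_line(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three sample screens to one business line's firm-year panel."""
    # 1) Keep firms active in this line from before 2008 through 2017, so that a
    #    full ten-year payment window exists for at least one accident year.
    years_by_firm = df.groupby("firm_id")["year"]
    active = years_by_firm.transform("min").lt(2008) & years_by_firm.transform("max").ge(2017)
    df = df[active]

    # 2) Drop firm-years with all operational variables missing/zero, or with
    #    non-positive assets, liabilities, or premiums written.
    op_cols = ["claims_outstanding", "premiums_written", "premiums_ceded"]
    df = df[df[op_cols].fillna(0).ne(0).any(axis=1)]
    for col in ["total_assets", "total_liabilities", "net_premiums_written", "direct_premiums_written"]:
        df = df[df[col] > 0]

    # 3) Keep only observations with positive premiums and cumulative paid losses.
    return df[(df["total_premiums"] > 0) & (df["cumulative_paid_losses"] > 0)]
```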
We focused on five business lines: (1) private passenger auto liability, (2) commercial auto
liability, (3) workers’ compensation, (4) commercial multi-peril, and (5) homeowner/farmowner
line. The five lines were selected from the 20 business lines identified by NAIC primarily because
these business lines had a sufficiently large number of observations remaining after the screening
(more than 400 insurers), indicating significant operations in the business lines. We excluded
minor lines such as special liability, products liability, and international line, together making up
less than 5% of industry loss reserves (A. M. Best 1994). In the final sample, we have a total of
32,939 line-firm-year observations for all five business lines combined, with each line’s number
of observations provided in Table 4.
- Table 3 here -
Table 3 reports the payment patterns for each business line. The “tail” is an insurance
expression that describes the time between the occurrence of the insured event (accident) and the
final settlement of the claim. A longer tail usually implies higher uncertainty regarding the estimate
of ultimate losses. As indicated in Table 3, for the private passenger auto liability line, 40.64% of
the ultimate losses are paid off during the initial accident year, and 99.82% of the total payments
throughout the ten years are made during the first five years. The homeowner/farmowner business
line (bottom line) has a payment pattern that differs from the private passenger auto liability line
(top line). For homeowner/farmowner policies, the insurers pay off 72.62% of the ultimate losses
within the first year, and 93.50% of the loss payments are made by the end of the second year. This
suggests that the homeowner/farmowner business line has a relatively short tail compared to the
other four lines, consistent with prior studies that investigated the tail characteristics of insurance
business lines (Nelson 2000). Overall, for all five business lines, the majority of payments are
made during the first five years after the initial accident year.
Table 4 reports summary statistics for firms that operate in each business line. The private
passenger auto liability insurance is the largest business line in dollar terms, with total premiums
written of $142 million on average, followed by the homeowner/farmowner line, and workers’
compensation line. The total premiums written in the private passenger auto liability line during
2015 amounted to $199.37 billion, making up 34% of all the PC insurance business.9 The commercial
auto liability and workers’ compensation are usually provided by larger insurance companies, with
average assets of $1,467 million and $1,437 million, respectively. In general, managers
overestimate the ultimate losses when they report their future loss projections for the first time,
except in the commercial auto liability line.
9 Insurance Information Institute, The Insurance Fact Book 2017.
- Table 4 Here -
VI. EMPIRICAL RESULTS
In this section, we first report the five-fold cross-validation machine learning results for
each business line and then present the holdout test results.10 For each machine learning algorithm,
we developed models with and without managers’ loss estimates as an input attribute, and the
results are reported separately. We then compared the model-generated estimates to managers’
predictions in financial reports. For example, the Sentry Select Insurance Company initially
estimated the total losses incurred for the year 2006 in the private passenger auto liability line as
$49,127 thousand, while the actual losses were $41,787 thousand.11 In the cross-validation process,
the random forest algorithm without managers’ loss estimates generated an estimate of $43,657
thousand, and the prediction was $41,990 thousand with managers’ estimates included in the
model. Thus, both machine models generated estimates which were substantially closer to the
actual losses than managers’ prediction. Moreover, in the holdout tests, where we used the model
developed from 1996 to 2006 to predict the losses incurred in 2007, the random forest models with
and without manager estimates predicted $38,953 thousand and $40,996 thousand, respectively.
The managers’ estimate was $43,650 thousand for the same year, while the actual losses turned
out to be $39,791 thousand, suggesting that the machine learning predictions were, again, more
accurate than the estimate provided by the firm.
We used the MAE and RMSE metrics to evaluate model performance. Overall, we found
that the random forest algorithm produced good predictions when the insurance business line has
long tails, whereas linear regression performed well when the tail is short and the business line is
10 The holdout test sample contains observations that were not used in the cross-validation test.
11 This example is drawn from the cross-validation sample for the period 1996-2006.
complicated. Thus, we report the linear regression prediction results for the
homeowner/farmowner line and present the random forest prediction results for the other four
lines.12 We also report the model accuracy edge relative to manager estimates. The accuracy edge
of a machine learning model is computed as managers’ estimation MAE (RMSE) minus model
estimation MAE (RMSE), divided by managers’ estimation MAE (RMSE).
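The accuracy-edge calculation described above is straightforward; a small illustrative helper (the numbers in the example are taken from the paper's Table 5 figures quoted below):

```python
def accuracy_edge(manager_error: float, model_error: float) -> float:
    """Relative improvement of the model over managers: positive means the
    model's MAE (or RMSE) is smaller than managers' estimation error."""
    return (manager_error - model_error) / manager_error

# Private passenger auto liability, 1996-2005 sample, MAE comparison:
# managers' MAE of 9,461 vs. the random forest's 7,526.
print(round(accuracy_edge(9461, 7526), 2))  # prints 0.2, i.e., the 20% edge
```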
1. Five-fold cross-validation
As illustrated in Figure 2, we used three samples to train and validate the machine learning
models: the 1996-2005, 1996-2006, and 1996-2007 samples. Results are reported in Table 5 Panel
A. For each of the five business lines examined, we show the MAE and RMSE metrics of
managers’ estimation and the machine learning estimations, as well as the corresponding accuracy
edges. For example, managers’ estimation MAE (RMSE) for the private passenger auto liability
line (line no. 1, first row) during the period 1996-2005 were 9,461 (37,494). The random forest
algorithm without managers’ estimates yielded an MAE (RMSE) of 7,526 (36,694), having an
accuracy edge of 20% (2%) over managers’ estimates. In the samples of 1996-2006 and 1996-
2007, the MAE and RMSE of the random forest estimation are both lower than those of managers,
indicating the higher prediction accuracy of machine learning. In the commercial auto liability line
(line no. 2), the random forest estimates were, on average, 26% (35%) more accurate than
managers’ predictions in terms of MAE (RMSE). When predicting losses for the workers’
compensation line (line no. 3), the random forest model exhibited a considerable accuracy edge:
on average, its MAE (RMSE) was 46% (39%) lower than managers’ estimation. Turning to the
commercial multi-peril line (line no. 4), the average accuracy edge of the random forest model
12 The results of cross-validation tests and holdout sample tests of the other machine learning models are provided in
Appendices B and C.
measured by the MAE (RMSE) was 25% (30%) relative to managers. Overall, the cross-validation
analyses of the first four business lines indicate that the random forest algorithm produced more
accurate estimations than managers.
- Table 5 here -
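A minimal sketch of the five-fold cross-validation procedure using scikit-learn; the synthetic data below merely stands in for the insurer panel and predictor set:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                  # stand-ins for the predictors
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)   # stand-in for ActualLosses

# Five-fold cross-validation: train on four folds, evaluate on the held-out fold.
fold_mae = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_mae.append(np.mean(np.abs(y[val_idx] - pred)))

print(f"cross-validated MAE: {np.mean(fold_mae):.3f}")
```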
After including managers’ loss estimates as an additional attribute, machine learning
models continue to produce more accurate estimates than managers in these lines. The results are
reported in the right-hand side of Table 5. When estimating losses in the private passenger auto
liability line (line no. 1), the random forest algorithm had lower prediction errors than managers.
The MAE (RMSE) comparison results suggest that the random forest model achieved 29% (13%)
increase in estimation accuracy relative to managers. In the commercial auto liability (line no. 2)
and commercial multi-peril lines (line no. 4), the MAE and RMSE results suggest that the random
forest model achieved an increase in predictive accuracy by using managers’ estimates as inputs.
The accuracy edge of the random forest in the workers’ compensation line (line no. 3) was the
most significant: 50% (42%) on average, based on the MAE (RMSE).
In the homeowner/farmowner line (no. 5), however, managers outperformed the machine
learning models when managers’ loss estimates were not included in the models, which was the
only exception to the dominance of machine learning over managers in cross-validation analyses.
We consulted industry experts about the homeowner/farmowner exception; a possible explanation is
that the homeowner/farmowner line contains unique types of losses, such as catastrophes, property
damage, and bodily injury. These loss categories are not differentiated in firms’ financial reports.
Mixing up different loss types is problematic because different loss categories have their unique
payment patterns, which are intimately known to managers but not available in the development
of the machine learning prediction models. In addition, the homeowner/farmowner line has a
relatively short tail, implying that the majority of losses (claims) are reported and paid off during
the first year (See Table 3). In this case, the total losses for most homeowner/farmowner accidents
are already known to managers by year-end, making it challenging for machine learning models
to outperform managers. After we added managers’ loss estimates as an attribute to estimate losses
for the homeowner/farmowner line, we found the linear regression algorithm performed well. Its
MAE was lower than that of managers’ estimates in all samples, achieving an average accuracy
edge of 22%.
Collectively, our results suggest that machine learning models generate more accurate loss
predictions than managers in most circumstances. Furthermore, we found that, in general, models
incorporating ManagerEstimate have higher predictive accuracy compared to the models without
it.
To provide insight into the machine learning black box, we present in Table 5 Panel B the
top three significant variables identified by the random forest algorithm for each business line, and
the model MAE and RMSE after dropping these variables.13 For example, the premiums written
in the current accident year (LinePremiums) is the most powerful predictor for the random forest
algorithm to estimate losses in the private passenger auto liability line. After excluding
LinePremiums, the MAE (RMSE) of the random forest estimation increased from 7,526 (36,694)
to 7,824 (36,535), showing a decline in prediction accuracy. Direct premiums written during the
accident year (DPW) and total payment for the current accident year (LinePayment) are the second
and third most influential variables, respectively. The MAE (RMSE) was 7,859 (39,815) without
13 We thank an anonymous reviewer for suggesting this test. For brevity, we only report the variable importance for
models trained during the period 1996-2007 and without managers’ estimates due to the significant overlap of
important variables among these situations. The untabulated results suggest that ManagerEstimate (Managers’
estimation results) is an important predictor and including it generally increases the model accuracy. In addition, the
important variables for each line appear to be constant over the years.
LinePremiums and DPW, and 8,206 (46,073) without LinePremiums, DPW and LinePayment.
Similarly, in other lines, dropping the influential variables reduced the models’ prediction accuracy
as measured by both the MAE and RMSE. Overall, we observed that several predictors, such as
PaidLoss, LinePayment, and LinePremiums, consistently play an important role in generating
model predictions.
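The variable-importance ranking and drop-variable check can be sketched as follows. The data are synthetic, and the predictor names are borrowed from the text purely for illustration; in this toy setup only the first two predictors actually drive the target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
names = ["LinePremiums", "DPW", "LinePayment", "Other"]

def make_data(n):
    # Hypothetical data: only the first two predictors drive the target.
    X = rng.normal(size=(n, len(names)))
    y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=n)
    return X, y

X, y = make_data(400)
X_new, y_new = make_data(100)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(names, model.feature_importances_), key=lambda kv: -kv[1])
print("top predictors:", [name for name, _ in ranked[:3]])

# Drop-variable check: refit without the top predictor and compare MAE on new data.
top = names.index(ranked[0][0])
reduced = RandomForestRegressor(n_estimators=200, random_state=0).fit(np.delete(X, top, axis=1), y)
mae_full = np.mean(np.abs(y_new - model.predict(X_new)))
mae_drop = np.mean(np.abs(y_new - reduced.predict(np.delete(X_new, top, axis=1))))
print(f"MAE with all predictors: {mae_full:.3f}; without the top one: {mae_drop:.3f}")
```

Dropping the dominant predictor raises the MAE, mirroring the accuracy decline reported in Panel B.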
2. Holdout tests
Our holdout tests examined the predictive accuracy of machine learning models on holdout
samples―a more demanding prediction test. Specifically, the models developed from the cross-
validation process using the 1996-2005 sample were used to predict losses in 2006, and the models
estimated from the 1996-2006 (1996-2007) sample were employed to predict the losses in 2007
(2008).
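The holdout design amounts to an expanding-window split on accident years; a schematic sketch with synthetic data standing in for the insurer panel:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
years = rng.integers(1996, 2009, size=400)    # accident year of each synthetic observation
X = rng.normal(size=(400, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=400)

# Train on 1996-2005 to predict 2006, on 1996-2006 to predict 2007,
# and on 1996-2007 to predict 2008.
holdout_mae = {}
for holdout_year in (2006, 2007, 2008):
    train, test = years < holdout_year, years == holdout_year
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], y[train])
    holdout_mae[holdout_year] = np.mean(np.abs(y[test] - model.predict(X[test])))
    print(f"holdout {holdout_year}: MAE {holdout_mae[holdout_year]:.3f}")
```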
- Table 6 here -
Table 6 Panel A reports the holdout test results. When ManagerEstimate was not included
in the model, the random forest algorithm consistently generated more accurate estimates than
managers’ estimation in the private passenger auto liability line (no. 1) based on the MAE analyses.
The RMSEs of the random forest algorithm’s prediction were similar to managers’ for the 2006
and 2007 holdout samples, and exhibited an accuracy edge of 17% for the 2008 holdout sample.
After we added ManagerEstimate to the model, its performance measured by both MAE and
RMSE was more accurate than managers’. In the commercial auto liability line (line no. 2), the
random forest without ManagerEstimate had smaller estimation errors compared to managers.
After we added ManagerEstimate to the model, its performance was further improved. In the
workers’ compensation line (line no. 3), the random forest predictions, with and without managers’
estimates, were markedly more accurate than managers’ estimation. Its accuracy edge over
managers was more than 50% for the 2006 and 2007 samples, although the edge became smaller
in 2008. Results of the commercial multi-peril line (line no. 4) suggest that the random forest
models outperformed managers in 2006 and 2007, but were surpassed by managers in 2008. The
model performance downturn in 2008 for the workers’ compensation and commercial multi-peril
lines might be caused by the economic turbulence during the financial crisis years, 2007-2009,
which interrupted the patterns of insurance loss payments. 14 Finally, the analyses of the
homeowner/farmowner line (no. 5) indicate that managers’ predictions were more accurate than
those of the linear regression when ManagerEstimate was not incorporated in the model. After
adding ManagerEstimate as an independent variable, the linear regression predictions improved
greatly, as indicated by the smaller values of RMSE than managers’ in all three holdout samples.
We used the bootstrap technique to examine the statistical significance of the difference in
prediction performance between machine learning models and managers.15 The bootstrap method
uses the existing sample to create a large number of simulated samples that can be used to estimate
the distribution of the performance difference. Specifically, we used the bootstrap to construct
10,000 simulated samples for each of our holdout samples, and for each of these bootstrap samples,
we computed the prediction difference as the average of the differences between managers’ and
the machine learning models’ prediction errors. These differences varied across the simulated
samples and formed a distribution. Table 6, Panel B reports the bootstrap mean of the prediction
difference, its standard error, and the result of a significance test of the prediction difference based
on the standard error. Overall, the differences between managers’ estimates and the machine
14 The National Bureau of Economic Research (NBER) dates the recession from December 2007 (peak) to June
2009 (trough).
15 We thank an anonymous reviewer for this suggestion.
learning predictions in the bootstrap analyses were consistent with the analyses using the holdout
samples.
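The bootstrap procedure can be sketched as follows, with synthetic absolute errors standing in for the actual holdout prediction errors (names and data are illustrative, not the authors' code):

```python
import numpy as np

def bootstrap_error_difference(manager_abs_err, model_abs_err, n_boot=10_000, seed=0):
    """Bootstrap the mean difference between managers' and the model's absolute
    prediction errors; returns the bootstrap mean, standard error, and z-statistic."""
    manager_abs_err = np.asarray(manager_abs_err, dtype=float)
    model_abs_err = np.asarray(model_abs_err, dtype=float)
    diff = manager_abs_err - model_abs_err          # positive = model is more accurate
    rng = np.random.default_rng(seed)
    n = len(diff)
    boot_means = np.array([
        diff[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)
    ])
    mean, se = boot_means.mean(), boot_means.std(ddof=1)
    return mean, se, mean / se

# Toy example: managers' errors run systematically larger than the model's.
# (n_boot is reduced here to keep the sketch quick; the paper uses 10,000.)
rng = np.random.default_rng(1)
manager = np.abs(rng.normal(1.5, 1.0, size=200))
model = np.abs(rng.normal(1.0, 1.0, size=200))
mean, se, z = bootstrap_error_difference(manager, model, n_boot=2_000)
print(f"mean difference {mean:.3f} (SE {se:.3f}, z {z:.1f})")
```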
Taken together, our results indicate the usefulness of machine learning models in
estimating insurance losses, particularly for long-tail lines of insurance business. In addition, the
random forest algorithm consistently shows superior prediction accuracy for long-tail business
lines and the linear regression performs better when the claims tail is short. Thus, it is essential to
understand the economics of a business line before applying a model to predict its losses.
Furthermore, leveraging the information in managers’ estimates enhances the prediction
performance of machine learning models.
VII. ADDITIONAL ANALYSES: ESTIMATION ERRORS
This section provides detailed analyses of managers’ and machine learning’s estimation
errors. We started by examining managers’ estimation errors and the rationales identified by prior
literature. Then, we investigated whether the ML model estimation errors were affected by the
incentives that lead to managers’ biases. In this section, managers’ estimation error
(ManagerError) is defined as the reported loss estimate minus the actual loss, divided by total
assets. Similarly, ML model estimation error (ModelError) is the model estimate minus the actual
loss, divided by total assets. Since the random forecast model performed well in most business
lines, we used the holdout prediction results generated by the random forest models for the years
2006, 2007, and 2008, to calculate model estimation errors. Table 7 compares the errors of
managers’ estimates to those of models’ estimates. On average, the model estimates were more
accurate than manager estimates: the average absolute estimation error of machine learning models
with (without) manager estimates as an input attribute was 0.0092 (0.0106), while the average
manager estimation error was 0.0128. The difference is significant at the 1% level. In addition, the
managers’ estimation errors were more positive than model errors, suggesting that managers
tended to overstate insurance losses. The results are generally consistent with previous studies that
investigated insurance loss estimation errors (e.g., Grace and Leverty 2012).
1. Managers’ Estimation Errors
As discussed in Section II, insurance companies might manage (manipulate) loss estimates
to achieve certain objectives, like misreporting future loss estimates due to tax deferral incentives.
Because determining the taxable income involves loss estimates, insurers tend to shelter earnings
by overestimating loss reserves so that they can reduce current tax liabilities. Accordingly, we
measured insurers’ tax incentives with the variable TaxShield, which takes a larger value if the
insurer has a higher tax reduction incentive (e.g., Grace 1990):
TaxShield_t = (NetIncome_t + LossReserve_t) / TotalAssets
Second, we evaluated the impact of the income smoothing incentive on managers’
estimation errors. According to the literature, insurers prefer smoother earnings to reduce external
finance costs (Froot et al. 1993), address managers’ career concerns (Grace 1990), or alleviate
regulatory concerns (Grace 1990). We followed prior research and used an insurer’s average return
on assets (ROA) during the past three years as our proxy for the income smoothing incentive
(Smooth). The intuition is that following three previous “good” years, insurers tend to
underestimate loss reserves in the current year in order to inflate earnings and continue the positive
trend (Grace 1990). In addition to the Smooth variable, we utilized another indicator variable,
SmallProfit, to capture the incentive of income smoothing. Specifically, Beaver et al. (2003) found
that firms with low positive earnings are likely to boost reported income by understating the loss
reserves. Similar to Grace and Leverty (2012), we identified the firm-years in the bottom 5 percent
of the positive earnings distribution. We expected these firms to have understated insurance loss
reserves.
Financial distress also drives insurance firms to manage reserve estimates. Financially
weak firms tend to under-reserve losses to present firm growth (Harrington and Danzon 1994) and
avoid regulatory scrutiny (Petroni 1992; Gaver and Paterson 2004). Insurance Regulatory
Information System (IRIS) ratio is used by regulators to monitor insurance firms’ financial
condition. The NAIC provides a “usual range” for each ratio; if a ratio falls outside the range,
a ratio violation occurs. Therefore, we set the variable Violation equal to 1 if the firm has at least
one IRIS ratio violation, and 0 otherwise. We expected that firms with IRIS ratio violations are
more likely to underestimate losses. In addition, we added an indicator variable, Insolvency, which
equals one if the Risk Based Capital (RBC) ratio is smaller than two, and 0 otherwise. The RBC
ratio is defined as the total adjusted capital divided by the authorized control level RBC, a
hypothetical minimum capital level determined by the RBC formula. Insurers are required to
maintain sufficient capital, and regulators may take actions if the RBC ratio is less than two.
We included a series of control variables in our regression analyses: The insurance firms’
organizational structure—Mutual (whether it is a mutual insurer), Group (whether it is associated
with a group), or Public company (whether the firm is publicly traded). We also controlled for
other firm and business line characteristics. Detailed control variable definitions are provided in
Appendix A. Finally, we included the year and business line fixed effects. The following
regression model was used to examine the incentives related to manager estimation errors:
ManagerError = α0 + α1 TaxShield + β1 Smooth + β2 Violation + β3 SmallProfit +
β4 Insolvency + β5 Size + β6 SmallLoss + β7 Profit + β8 Loss + β9 LineSize +
β10 Reinsurance + β11 Public + β12 Mutual + β13 Group + FixedEffects + ε.    (3)
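Equation (3) is a pooled OLS regression with year and business-line fixed effects; a minimal sketch on synthetic data, using only an illustrative subset of the regressors (the variable names mirror the text, the data are ours):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "ManagerError": rng.normal(size=n),
    "TaxShield": rng.normal(size=n),
    "Smooth": rng.normal(size=n),
    "SmallProfit": rng.integers(0, 2, size=n),
    "Violation": rng.integers(0, 2, size=n),
    "Insolvency": rng.integers(0, 2, size=n),
    "year": rng.integers(2006, 2009, size=n),
    "line": rng.integers(1, 6, size=n),
})

# Design matrix: intercept, incentive variables, and year/line fixed-effect dummies.
fe = pd.get_dummies(df[["year", "line"]].astype(str), drop_first=True, dtype=float)
X = pd.concat([df[["TaxShield", "Smooth", "SmallProfit", "Violation", "Insolvency"]], fe], axis=1)
X.insert(0, "const", 1.0)

# OLS via least squares; real work would use a stats package for standard errors.
beta, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), df["ManagerError"].to_numpy(), rcond=None)
print(dict(zip(X.columns[:6], np.round(beta[:6], 3))))
```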
The regression results are reported in columns (1) and (2) of Table 8. Consistent with
prior research, we found that firms with a stronger tax reduction incentive were more likely to
over-reserve, as indicated by the significant and positive coefficient of TaxShield (Coeff. = 0.052,
p<0.01). The significantly negative values for the coefficients of Violation (Coeff. = −0.002,
p<0.01) and SmallProfit (Coeff. = −0.003, p<0.1) suggest that financially weak firms and firms
with greater incentives to smooth income tended to report underestimated insurance losses. The
findings support the literature that managerial incentives, including tax reduction, income
smoothing, and financial strength concerns, affect insurance firms’ reserve levels. We performed
this analysis here to provide a benchmark for the next, new analysis of machine learning errors.
2. Model Estimation Errors
In this study we are particularly interested in whether our machine model estimates are less
affected by managers’ incentives, as one would expect. Therefore, we re-ran equation (3), but
replaced the dependent variable with ModelError. If the model estimates were less affected by the
incentive biases than manager estimates, we would expect the coefficients of the incentive
variables in the model regressions to be insignificant. Column (3) in Table 8 reports the regression
results of the models without managers’ estimates as an input variable. The results indeed indicate
that the incentives that drive managerial estimation biases did not affect the model estimates, as
none of the incentive variables were statistically significant. However, when we used the models
including manager estimates, the coefficient of SmallProfit became significant at the 5% level,
suggesting that incorporating managers’ estimates might also bring their biases into the models.
Overall, the results indicate that the influence of managerial incentives is hardly present in the
model estimation, and explain, in part, the model’s superior performance.16 Our findings imply
that machine learning models can to some extent alleviate the concerns with managers’ intentional
biases in making accounting estimates.
VIII. CONCLUSIONS
Managerial subjective estimates are endemic to financial information, and their frequency
and impact are constantly increasing, driven mainly by the expansion of fair value accounting by standard
setters. The adverse impact of managerial estimation errors, both intentional and unintentional, on
the quality of financial information is largely unknown (and unresearched), but it is likely high.
Undoubtedly, improvement in the quality and reliability of accounting estimates will substantially
enhance the relevance and usefulness of financial information.
Accounting estimates generated by machine learning are potentially superior to managerial
estimates because machines don’t manipulate, and they may use the archival (training) data more
consistently and systematically than managers. On the other hand, managers may include in their
estimates (forecasts) forward-looking information (e.g., on expected inflation, or the state of the
economy) that machines obviously ignore. Accordingly, the superiority of machines over humans
in generating accounting estimates is an empirical question, examined in this study.
Our results, based on a large set of insurance companies’ loss (future claim payments)
estimates, revisions, and realizations, indicate that with but one exception (homeowner/farmowner
insurance), our loss estimates generated by machine learning are superior to managers’ actual
estimates underlying financial reports, and in some cases, substantially superior. This is a
surprising and very encouraging finding, given the urgency to improve accounting estimates. At
16 We have re-estimated the regressions excluding the Workers’ Compensation line/using linear regression algorithm
predictions for the homeowner/farmowner line, and the results remain unchanged.
this early stage of applying machine learning to estimate accounting numbers, we don’t know how
general our findings are. Obviously, more research is needed to establish and generalize the use of
machine learning for other types of accounting estimates, such as bad debts and warranty reserves.
Accounting estimates generated by machine learning may have multiple uses in practice.
They can be used by managers and auditors as benchmarks against which managers’ estimates will
be compared, with large deviations suggesting a reexamination of managers’ estimates.
Alternatively, machine learning could be used to generate managers’ estimates in the first place,
enhancing the reliability (no manipulation) and consistency of accounting estimates. In any case,
the potential of machine learning, whose use is fast expanding in many other fields, to improve
financial information should be further researched.
REFERENCES
A. M. Best Company. 1994. Best’s Aggregates and Averages: Property-Casualty Edition.
Oldwick, NJ: A. M. Best Company.
Alpaydin, E. 2010. Introduction to Machine Learning. MIT Press.
Anderson, D. R. 1973. Effects of loss reserve evaluation upon policyholders’ surplus. Madison,
Wisconsin: Bureau of Business Research and Service, University of Wisconsin, Monograph 6.
Anderson, D. R. 1971. Effects of under and overevaluations in loss reserves. Journal of Risk and
Insurance: 585-600.
Beaver, W. H., and M. F. McNichols. 1998. The characteristics and valuation of loss reserves of
property casualty insurers. Review of Accounting Studies 3 (1-2): 73-95.
Beaver, W. H., M. F. McNichols, and K. K. Nelson. 2003. Management of the loss reserve
accrual and the distribution of earnings in the property-casualty insurance industry. Journal of
Accounting and Economics 35 (3): 347-376.
Bell, T. B. 1997. Neural nets or the logit model? A comparison of each model’s ability to predict
commercial bank failures. Intelligent Systems in Accounting, Finance & Management 6 (3): 249-
264.
Bierens, H. J., and D. F. Bradford. 2005. Are Property-Casualty Insurance Reserves Biased? A
Non-Standard Random Effects Panel Data Analysis.
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer.
Bowman, E. H. 1963. Consistency and optimality in managerial decision making. Management
Science 9 (2): 310-321.
Breiman, L. 2001. Random forests. Machine Learning 45 (1): 5-32.
Breiman, L. 2002. Using models to infer mechanisms. IMS Wald Lecture 2.
Browne, M. J., Yu-Luen Ma, and P. Wang. 2009. Stock-Based Executive Compensation and
Reserve Errors in the Property and Casualty Insurance Industry. Journal of Insurance Regulation
27 (4): 35-54.
Brownlee, J. 2016a. How to Tune the Number and Size of Decision Trees with XGBoost in
Python. Machine learning mastery. https://machinelearningmastery.com/tune-number-size-
decision-trees-xgboost-python/.
Brownlee, J. 2016b. Linear Regression for Machine Learning. Machine learning
mastery. https://machinelearningmastery.com/linear-regression-for-machine-learning/.
Brownlee, J. 2018. A gentle introduction to k-fold cross-validation. Machine learning
mastery. https://machinelearningmastery.com/k-fold-cross-validation/.
Cao, L., and F. E. H. Tay. 2003. Support vector machine with adaptive parameters in financial
time series forecasting. IEEE Transactions on Neural Networks 14 (6): 1506-1518.
Chen, J. V., and F. Li. 2017. Estimating the amount of estimation in accruals. Available at SSRN
2738842.
Davis, J. T., A. P. Massey, and R. E. Lovell II. 1997. Supporting a complex audit judgment task:
An expert network approach. European Journal of Operational Research 103 (2): 350-372.
Dawes, R. M. 1971. A case study of graduate admissions: Application of three principles of
human decision making. American psychologist 26 (2): 180.
Dawes, R. M., and B. Corrigan. 1974. Linear models in decision making. Psychological bulletin
81 (2): 95.
Dechow, P. M., W. Ge, C. R. Larson, and R. G. Sloan. 2011. Predicting material accounting
misstatements. Contemporary accounting research 28 (1): 17-82.
Dhamija, A., and V. Bhalla. 2011. Exchange rate forecasting: comparison of various
architectures of neural networks. Neural Computing and Applications 20 (3): 355-363.
Ding, K., X. Peng, and Y. Wang. 2019. A Machine Learning-Based Peer Selection Method with
Financial Ratios. Accounting Horizons.
Dunis, C. L., and X. Huang. 2002. Forecasting and trading currency volatility: An application of
recurrent neural regression and model combination. Journal of Forecasting 21 (5): 317-354.
Eckles, D. L., and M. Halek. 2010. Insurer reserve error and executive compensation. Journal of
Risk and Insurance 77 (2): 329-346.
Elliott, R. K. 1992. The third wave breaks on the shores of accounting. Accounting Horizons 6
(2): 61.
Etheridge, H. L., R. S. Sriram, and H. K. Hsu. 2000. A comparison of selected artificial neural
networks that help auditors evaluate client financial viability. Decision Sciences 31 (2): 531-550.
Fanning, K. M., and K. O. Cogger. 1998. Neural network detection of management fraud using
published financial data. Intelligent Systems in Accounting, Finance & Management 7 (1): 21-
41.
Financial Accounting Standards Board (FASB). 1998. The Framework of Financial Accounting
Concepts and Standards.
Forbes, S. W. 1970. Loss reserving performance within the regulatory framework. Journal of
Risk and Insurance: 527-538.
Friedman, J. H. 2001. Greedy function approximation: a gradient boosting machine. Annals of
statistics: 1189-1232.
Froot, K. A., D. S. Scharfstein, and J. C. Stein. 1993. Risk management: Coordinating corporate
investment and financing policies. the Journal of Finance 48 (5): 1629-1658.
Galeshchuk, S., and S. Mukherjee. 2017. Deep networks for predicting direction of change in
foreign exchange rates. Intelligent Systems in Accounting, Finance and Management 24 (4):
100-110.
Gaver, J. J., and J. S. Paterson. 2004. Do insurers manipulate loss reserves to mask solvency
problems?. Journal of Accounting and Economics 37 (3): 393-416.
Goldberg, L. R. 1970. Man versus model of man: A rationale, plus some evidence, for a method
of improving on clinical inferences. Psychological bulletin 73 (6): 422.
Grace, E. V. 1990. Property-liability insurer reserve errors: A theoretical and empirical analysis.
Journal of Risk and Insurance (1986-1998) 57 (1): 28.
Grace, M. F., and J. T. Leverty. 2012. Property–liability insurer reserve error: Motive,
manipulation, or mistake. Journal of Risk and Insurance 79 (2): 351-380.
Hansen, J. V., and W. F. Messier. 1982. Expert systems for decision support in EDP auditing.
International journal of computer & information sciences 11 (5): 357-379.
Harrington, S. E., and P. M. Danzon. 1994. Price cutting in liability insurance markets. Journal
of Business: 511-538.
Hoffman, P. J. 1960. The paramorphic representation of clinical judgment. Psychological
bulletin 57 (2): 116.
Hoyt, R. E., and K. A. McCullough. 2010. Managerial Discretion and the Impact of Risk-Based
Capital Requirements on Property-Liability Insurer Reserving Practices. Journal of Insurance
Regulation 29 (2).
Huang, C., R. E. Dorsey, and M. A. Boose. 1994. Life Insurer Financial Distress Prediction: A
Neural Network Model. Journal of Insurance Regulation 13 (2).
Khan, U., B. Li, S. Rajgopal, and M. Venkatachalam. 2017. Do the FASB's Standards Add
Shareholder Value?. The Accounting Review 93 (2): 209-247.
Koskivaara E. 2004. Artificial neural networks in analytical review procedures. Managerial
Auditing Journal 19(2): 191–223.
Lam, M. 2004. Neural network techniques for financial performance prediction: integrating
fundamental and technical analysis. Decision Support Systems 37 (4): 567-581.
Leigh, W., R. Purvis, and J. M. Ragusa. 2002. Forecasting the NYSE composite index with
technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in
romantic decision support. Decision Support Systems 32 (4): 361-377.
Lenard, M. J., P. Alam, and G. R. Madey. 1995. The application of neural networks and a
qualitative response model to the auditor's going concern uncertainty decision. Decision Sciences
26 (2): 209-227.
Li, K. K., and R. G. Sloan. 2017. Has goodwill accounting gone bad?. Review of Accounting
Studies 22 (2): 964-1003.
Lin, J. W., M. I. Hwang, and J. D. Becker. 2003. A fuzzy neural network for assessing the risk of
fraudulent financial reporting. Managerial Auditing Journal 18 (8): 657-665.
Maher, J. J., and T. K. Sen. 1997. Predicting bond ratings using neural networks: a comparison
with logistic regression. Intelligent Systems in Accounting, Finance & Management 6 (1): 59-72.
Mason, L., J. Baxter, P. L. Bartlett, and M. R. Frean. 2000. Boosting algorithms as gradient
descent. Paper presented at Advances in neural information processing systems.
McKee, T. E., and T. Lensberg. 2002. Genetic programming and rough sets: A hybrid approach
to bankruptcy classification. European Journal of Operational Research 138 (2): 436-451.
McNichols, M. F. 2000. Research design issues in earnings management studies. Journal of
Accounting and Public Policy 19 (4-5): 313-345.
Murphy, D., and C. E. Brown. 1992. The uses of advanced information technology in audit
planning. Intelligent Systems in Accounting, Finance and Management 1 (3): 187-193.
Nelson, K. K. 2000. Rate regulation, competition, and loss reserve discounting by property-
casualty insurers. The Accounting Review 75 (1): 115-138.
O’Leary, D. E. 1998. Using neural networks to predict corporate failure. Intelligent Systems in
Accounting, Finance & Management 7 (3): 187-197.
Parot, A., K. Michell, and W. D. Kristjanpoller. 2019. Using Artificial Neural Networks to
forecast Exchange Rate, including VAR‐VECM residual analysis and prediction linear
combination. Intelligent Systems in Accounting, Finance and Management 26 (1): 3-15.
Paton, W. A., and A. C. Littleton. 1970. An introduction to corporate accounting standards:
American Accounting Association.
Public Company Accounting Oversight Board (PCAOB), 2018, Auditing Accounting Estimates,
Including Fair Value Measurements, PCAOB Release No. 2018-005.
Pendharkar, P. C. 2005. A threshold-varying artificial neural network approach for classification
and its application to bankruptcy prediction problem. Computers & Operations Research 32 (10):
2561-2582.
Petroni, K. R. 1992. Optimistic reporting in the property-casualty insurance industry. Journal of
Accounting and Economics 15 (4): 485-508.
Petroni, K., and M. Beasley. 1996. Errors in Accounting Estimates and Their Relation to Audit
Firm Type. Journal of Accounting Research 34 (1): 151-171.
Ramon, J. 2013. Responses to question “How to determine the number of trees to be generated in
Random Forest algorithm?”. ResearchGate.
https://www.researchgate.net/post/How_to_determine_the_number_of_trees_to_be_generated_in
_Random_Forest_algorithm
Rivas, V., E. Parras‐Gutiérrez, J. Merelo, M. G. Arenas, and P. García‐Fernández. 2017. Time
series forecasting using evolutionary neural nets implemented in a volunteer computing system.
Intelligent Systems in Accounting, Finance and Management 24 (2-3): 87-95.
Schapire, R. E. 1990. The strength of weak learnability. Machine Learning 5 (2): 197-227.
Smith, B. D. 1980. An analysis of auto liability loss reserves and underwriting results. Journal of
Risk and Insurance: 305-320.
Sun, T., and M. A. Vasarhelyi. 2017. Deep Learning and the Future of Auditing: How an
Evolving Technology Could Transform Analysis and Improve Judgment. CPA Journal 87 (6).
Weiss, M. 1985. A Multivariate Analysis of Loss Reserving Estimates in Property-Liability
Insurers. The Journal of risk and insurance 52 (2): 199-221.
Witten, I. H., E. Frank, and M. A. Hall. 2011. Data Mining: Practical machine learning tools and
techniques.
Zhang, C., and M. J. Browne. 2013. Loss Reserve Errors, Income Smoothing and Firm Risk of
Property and Casualty Insurance Companies. Paper presented at the Annual Meeting of the
American Risk and Insurance Association. Working Paper.
Zhang, G., M. Y. Hu, B. E. Patuwo, and D. C. Indro. 1999. Artificial neural networks in
bankruptcy prediction: General framework and cross-validation analysis. European Journal of
Operational Research 116 (1): 16-32.
Zhang, Y., and L. Wu. 2009. Stock market prediction of S&P 500 via combination of improved
BCO approach and BP neural network. Expert Systems with Applications 36 (5): 8849-885
Figure 1: Illustration of the training/validation/testing approach
[Figure: schematic of the training/validation/testing design. The training/validation set is split into folds, each fold serving once as the validation set while the remaining folds train the model; a separate holdout sample is reserved for testing.]
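The scheme in Figure 1 can be sketched as follows. This is an illustration of the general k-fold-plus-holdout design, not the authors' code; ordinary least squares and synthetic data stand in for the paper's learners and insurer data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# First 150 observations form the training/validation set; the remaining
# 50 are set aside as the holdout sample, touched only once at the end.
X_dev, y_dev, X_hold, y_hold = X[:150], y[:150], X[150:], y[150:]

def fit_ols(X, y):
    # Ordinary least squares stands in for the paper's learners here.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

k = 5
folds = np.array_split(np.arange(len(X_dev)), k)
fold_mae = []
for i in range(k):
    val = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    beta = fit_ols(X_dev[train], y_dev[train])
    fold_mae.append(np.mean(np.abs(X_dev[val] @ beta - y_dev[val])))

# Refit on the full training/validation set, then score the holdout once.
beta = fit_ols(X_dev, y_dev)
holdout_mae = np.mean(np.abs(X_hold @ beta - y_hold))
print(round(float(np.mean(fold_mae)), 3), round(float(holdout_mae), 3))
```

The cross-validated error guides model selection; the holdout error, computed only once, estimates out-of-sample performance.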
Figure 2: Three sets of test sample
[Figure: the three test-sample configurations, each panel showing the alternating train/validate folds used for the corresponding training/validation window.]
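The rolling design suggested by Figure 2 and Tables 5-6 can be sketched as below. This is a hedged illustration, not the authors' code: three expanding training/validation windows, each assumed to be paired with the following year as its holdout sample.

```python
# Build three expanding windows ending in 2005, 2006, and 2007; the year
# after each window serves as that window's holdout sample.
windows = []
for last_train_year in (2005, 2006, 2007):
    dev_years = list(range(1996, last_train_year + 1))
    holdout_year = last_train_year + 1
    windows.append((dev_years, holdout_year))

for dev_years, holdout_year in windows:
    print(dev_years[0], dev_years[-1], "->", holdout_year)
```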
Table 1: Tuning details for machine learning algorithms

Linear regression
  Learning rate: 0.1, 0.01, 0.001, 0.0001
  Number of iterations: 10, 50, 100, 500, 1000
Random forest
  Number of trees: 50, 100
  Maximum depth of the tree: 20, 30, 50
  Minimum number of rows: 1, 5, 10, 50
Gradient boosting machine
  Number of trees: 50, 100
  Maximum depth of the tree: 5, 10, 20, 30, 50
  Minimum number of rows: 1, 5, 10, 50
Artificial neural networks
  Activation function: Rectifier, Tanh, Maxout, Rectifier with dropout, Tanh with dropout, Maxout with dropout
  Number of hidden layers: 2, 3, 4
  Number of nodes: the first layer had a number of nodes equal to the number of independent variables; in each additional layer the number of nodes decreased by approximately 50%
  Number of epochs: 10, 50, 100
  Learning rate: examined at 100 possible learning rates, scaled from 0.0001 to 0.000001
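Tuning over grids like those in Table 1 amounts to scoring every combination of hyperparameter values by cross-validated error and keeping the best. The sketch below is illustrative only (the `cv_error` scoring function is a placeholder, not the paper's procedure):

```python
from itertools import product

# Random-forest grid from Table 1.
random_forest_grid = {
    "n_trees": [50, 100],
    "max_depth": [20, 30, 50],
    "min_rows": [1, 5, 10, 50],
}

def cv_error(params):
    # Placeholder for the cross-validated MAE of a model trained with
    # `params`; a real run would fit and validate on the insurer data.
    return abs(params["max_depth"] - 30) + params["min_rows"] / 10

# Enumerate every combination and keep the one with the lowest error.
combos = [dict(zip(random_forest_grid, values))
          for values in product(*random_forest_grid.values())]
best = min(combos, key=cv_error)
print(len(combos), best)
```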
Table 2: Illustration of incurred losses and cumulative paid net losses reported in 2008 for National Lloyds Insurance Company
[Table body not recoverable from the extracted text.]
Table 3: Cumulative payment percentage in the first five years for each business line

Business Line                      Year 0   Year 1   Year 2   Year 3   Year 4   Year 5
Private Passenger Auto Liability   40.64%   72.44%   86.76%   94.20%   97.61%   99.06%
Commercial Auto Liability          25.03%   50.74%   70.90%   85.57%   93.88%   97.70%
Workers' Compensation              24.99%   56.11%   72.90%   83.20%   89.09%   93.03%
Commercial Multi-Peril             44.52%   69.22%   80.03%   88.58%   93.85%   97.11%
Homeowner/Farmowner                72.62%   93.50%   96.83%   98.58%   99.48%   99.82%
Table 4: Summary Statistics for the five insurance business lines

Each cell reports Mean / Std. Dev. Observations per line: Private Passenger Auto Liability (PPAL) 7,239; Commercial Auto Liability (CAL) 6,549; Workers' Compensation (WC) 5,118; Commercial Multi-Peril (CMP) 6,409; Homeowner/Farmowner (HF) 7,624.

Variable         PPAL                    CAL                     WC                      CMP                     HF
ActualLosses     90,246 / 562,776        17,764 / 52,139         38,836 / 119,348        24,430 / 81,218         41,256 / 256,340
ManagerEstimate  93,183 / 580,029        16,864 / 45,651         44,987 / 136,526        24,818 / 79,099         41,630 / 257,594
Assets           1,255,516 / 5,130,236   1,466,545 / 5,597,631   1,436,801 / 4,278,044   1,275,870 / 4,302,558   495,159 / 3,776,991
Liabilities      772,594 / 2,891,796     903,094 / 3,178,499     953,594 / 2,854,313     834,212 / 2,842,832     356,919 / 1,289,327
NPW              429,264 / 1,775,834     462,630 / 1,864,671     405,332 / 1,124,403     393,667 / 1,414,870     365,591 / 1,311,209
DPW              373,990 / 1,411,322     404,979 / 1,480,464     370,433 / 952,908       349,002 / 1,042,452     328,399 / 977,791
NPE              420,376 / 1,755,638     451,707 / 1,840,323     394,541 / 1,091,865     383,353 / 1,387,281     356,919 / 1,289,327
NI               40,806 / 234,719        45,725 / 252,965        41,200 / 220,877        39,710 / 237,906        35,589 / 215,027
PaidClaim        14,601 / 86,669         1,472 / 4,354           3,861 / 13,142          1,385 / 6,097           8,823 / 55,271
UnpaidClaim      7,360 / 49,956          773 / 2,764             1,792 / 8,858           744 / 2,378             2,831 / 18,026
OutstClaim       5,977 / 32,178          686 / 1,897             1,976 / 5,647           546 / 1,547             1,103 / 5,053
ReportedClaim    27,935 / 162,349        2,929 / 8,508           7,542 / 25,867          2,670 / 9,384           12,668 / 76,276
LinePremiums     142,228 / 812,935       31,912 / 87,021         77,485 / 223,037        48,453 / 143,724        73,833 / 397,783
PremiumCeded     12,980 / 45,691         6,891 / 29,892          16,297 / 64,708         10,757 / 41,518         11,700 / 47,174
LinePayment      39,751 / 245,636        4,547 / 12,484          9,889 / 29,312          10,827 / 33,913         32,754 / 191,933
PaymentCeded     3,451 / 13,043          851 / 4,254             1,625 / 7,218           2,035 / 9,569           4,652 / 34,432
LineDCC          619 / 2,963             203 / 889               740 / 3,350             269 / 885               462 / 2,356
DCCCeded         89 / 580                52 / 430                116 / 961               56 / 427                64 / 388
SSR              581 / 4,158             81 / 2,058              42 / 1,357              155 / 4,829             108 / 802
PaidLoss         37,182 / 245,663        3,964 / 10,980          9,220 / 28,873          9,293 / 30,133          28,883 / 182,834
GdpChg           2.55 / 2.36             2.43 / 2.29             2.45 / 2.33             2.50 / 2.30             2.55 / 2.38
Inflation        5.25 / 1.46             5.23 / 1.43             5.24 / 1.43             5.26 / 1.41             5.28 / 1.47

All variables are defined in Appendix A and all variables are in thousands of dollars except GdpChg and Inflation, which are in percentages.
Table 5: Cross-validation results

Panel A: Cross-validation results

The parenthetical after each business line names the algorithm selected for the machine-learning model (without manager estimates; with manager estimates). Each row reports the training/validation sample, the number of observations, the MAE and RMSE of managers' estimates, and the MAE, RMSE, and accuracy edge (Δ(MAE) / Δ(RMSE)) of the machine-learning models without and with manager estimates as an input.

Private Passenger Auto Liability (random forest; random forest)
  1996-2005: Obs 5,949; managers 9,461 / 37,494; ML without 7,526 / 36,694, edge 20% / 2%; ML with 6,742 / 31,352, edge 29% / 16%
  1996-2006: Obs 6,298; managers 9,793 / 38,266; ML without 7,367 / 34,830, edge 25% / 9%; ML with 6,945 / 34,366, edge 29% / 10%
  1996-2007: Obs 6,602; managers 9,575 / 37,940; ML without 7,363 / 35,127, edge 23% / 7%; ML with 6,695 / 32,724, edge 30% / 14%

Commercial Auto Liability (random forest; random forest)
  1996-2005: Obs 5,383; managers 4,209 / 18,562; ML without 3,135 / 12,230, edge 26% / 34%; ML with 2,773 / 10,802, edge 34% / 42%
  1996-2006: Obs 5,661; managers 4,155 / 18,375; ML without 3,037 / 11,771, edge 27% / 36%; ML with 2,887 / 11,767, edge 31% / 36%
  1996-2007: Obs 5,957; managers 4,338 / 19,175; ML without 3,242 / 12,251, edge 25% / 36%; ML with 2,936 / 12,047, edge 32% / 37%

Workers' Compensation (random forest; random forest)
  1996-2005: Obs 4,183; managers 11,547 / 43,652; ML without 6,462 / 28,046, edge 44% / 36%; ML with 6,293 / 28,252, edge 46% / 35%
  1996-2006: Obs 4,398; managers 12,360 / 44,187; ML without 6,609 / 27,155, edge 47% / 39%; ML with 6,112 / 25,073, edge 51% / 43%
  1996-2007: Obs 4,645; managers 13,214 / 47,541; ML without 6,840 / 27,769, edge 48% / 42%; ML with 6,300 / 24,812, edge 52% / 48%

Commercial Multi-Peril (random forest; random forest)
  1996-2005: Obs 5,235; managers 5,737 / 27,615; ML without 4,382 / 19,889, edge 24% / 28%; ML with 4,087 / 18,569, edge 29% / 33%
  1996-2006: Obs 5,457; managers 5,871 / 27,931; ML without 4,276 / 18,784, edge 27% / 33%; ML with 4,036 / 18,391, edge 31% / 34%
  1996-2007: Obs 5,846; managers 6,017 / 28,349; ML without 4,500 / 20,224, edge 25% / 29%; ML with 4,135 / 19,253, edge 31% / 32%

Homeowner/Farmowner (linear regression; linear regression)
  1996-2005: Obs 6,121; managers 3,905 / 16,789; ML without 5,764 / 24,242, edge -48% / -44%; ML with 3,303 / 17,154, edge 15% / -2%
  1996-2006: Obs 6,544; managers 3,878 / 16,611; ML without 5,232 / 20,223, edge -35% / -22%; ML with 2,916 / 12,703, edge 25% / 24%
  1996-2007: Obs 6,946; managers 3,962 / 16,826; ML without 5,274 / 20,438, edge -33% / -21%; ML with 2,950 / 12,786, edge 26% / 24%

Panel B: Influential variables

Each row lists the top three important variables, followed by the MAE / RMSE without the first variable, without the first two variables, and without the first three variables.

Private Passenger Auto Liability: LinePremiums, DPW, LinePayment; 7,824 / 36,535; 7,859 / 39,815; 8,206 / 46,073
Commercial Auto Liability: PaidLoss, LinePremiums, LinePayment; 3,473 / 12,915; 4,189 / 16,047; 4,493 / 16,723
Workers' Compensation: PaidLoss, LinePayment, LinePremiums; 7,556 / 32,063; 8,887 / 35,384; 9,554 / 35,974
Commercial Multi-Peril: PaidLoss, LinePremiums, LinePayment; 5,350 / 23,229; 5,888 / 25,346; 6,318 / 27,192
Homeowner/Farmowner: LinePremiums, PaidLoss, PaidClaim; 7,354 / 20,188; 5,343 / 20,663; 5,290 / 20,402
Table 6: Holdout test results

Panel A: Holdout test results

The parenthetical after each business line names the algorithm selected for the machine-learning model (without manager estimates; with manager estimates). Each row reports the holdout sample year, the number of observations, the MAE and RMSE of managers' estimates, and the MAE, RMSE, and accuracy edge (Δ(MAE) / Δ(RMSE)) of the machine-learning models without and with manager estimates as an input.

Private Passenger Auto Liability (random forest; random forest)
  2006: Obs 670; managers 9,337 / 36,120; ML without 7,861 / 36,590, edge 16% / -1%; ML with 7,359 / 34,918, edge 21% / 3%
  2007: Obs 659; managers 9,435 / 36,221; ML without 8,193 / 38,704, edge 13% / -7%; ML with 8,057 / 35,533, edge 15% / 2%
  2008: Obs 637; managers 10,616 / 50,851; ML without 8,594 / 42,150, edge 19% / 17%; ML with 8,773 / 43,037, edge 17% / 15%

Commercial Auto Liability (random forest; random forest)
  2006: Obs 620; managers 3,852 / 17,287; ML without 3,659 / 13,800, edge 5% / 20%; ML with 3,062 / 13,883, edge 21% / 20%
  2007: Obs 609; managers 3,288 / 13,481; ML without 3,198 / 10,848, edge 3% / 20%; ML with 2,842 / 10,334, edge 14% / 23%
  2008: Obs 592; managers 4,219 / 19,361; ML without 3,099 / 9,857, edge 27% / 49%; ML with 2,974 / 10,274, edge 30% / 47%

Workers' Compensation (random forest; random forest)
  2006: Obs 499; managers 18,871 / 67,596; ML without 8,077 / 31,074, edge 57% / 54%; ML with 7,358 / 27,639, edge 61% / 59%
  2007: Obs 498; managers 15,469 / 62,124; ML without 6,955 / 28,186, edge 55% / 55%; ML with 6,464 / 27,367, edge 58% / 56%
  2008: Obs 473; managers 12,507 / 44,755; ML without 8,669 / 36,076, edge 31% / 19%; ML with 8,699 / 36,754, edge 30% / 18%

Commercial Multi-Peril (random forest; random forest)
  2006: Obs 582; managers 6,122 / 26,034; ML without 3,835 / 13,134, edge 37% / 50%; ML with 3,719 / 13,435, edge 39% / 48%
  2007: Obs 570; managers 5,725 / 27,361; ML without 4,472 / 21,451, edge 22% / 22%; ML with 3,974 / 18,548, edge 31% / 32%
  2008: Obs 563; managers 6,685 / 32,112; ML without 7,831 / 37,900, edge -17% / -18%; ML with 6,817 / 35,203, edge -2% / -10%

Homeowner/Farmowner (linear regression; linear regression)
  2006: Obs 697; managers 2,964 / 12,231; ML without 5,452 / 15,274, edge -84% / -25%; ML with 2,099 / 6,068, edge 29% / 50%
  2007: Obs 692; managers 3,525 / 14,042; ML without 4,912 / 13,689, edge -39% / 3%; ML with 2,362 / 8,606, edge 33% / 39%
  2008: Obs 678; managers 5,565 / 24,434; ML without 7,354 / 24,657, edge -32% / -1%; ML with 2,588 / 8,606, edge 54% / 65%

Panel B: Bootstrap analyses

Each cell reports the mean and standard error of the prediction error difference between the machine learning model (without or with manager estimates) and managers; *** marks a significant difference.

Private Passenger Auto Liability (random forest; random forest)
  2006: without 1,473 (12.08)***; with 1,980 (11.67)***
  2007: without 1,250 (11.97)***; with 1,381 (11.63)***
  2008: without 2,026 (12.52)***; with 1,840 (11.65)***
Commercial Auto Liability (random forest; random forest)
  2006: without 190 (4.96)***; with 792 (3.42)***
  2007: without 87 (3.41)***; with 445 (3.07)***
  2008: without 1,120 (5.63)***; with 1,238 (5.08)***
Workers' Compensation (random forest; random forest)
  2006: without 10,821 (22.14)***; with 11,502 (23.05)***
  2007: without 8,517 (22.14)***; with 8,993 (21.79)***
  2008: without 3,844 (15.03)***; with 3,803 (15.63)***
Commercial Multi-Peril (random forest; random forest)
  2006: without 2,285 (7.99)***; with 2,412 (6.37)***
  2007: without 1,253 (10.63)***; with 1,757 (8.27)***
  2008: without -1,141 (10.17)***; with -123 (8.18)***
Homeowner/Farmowner (linear regression; linear regression)
  2006: without -2,485 (4.03)***; with 864 (4.53)***
  2007: without -1,383 (3.28)***; with 1,163 (3.83)***
  2008: without -1,786 (3.86)***; with 2,980 (7.86)***
Table 7: Comparing Manager Estimation Errors with Model Estimation Errors

                  ManagerError       ModelError (without manager          ModelError (with manager
                                     estimates as input variable)         estimates as input variable)
                  Mean     Median    Mean     Median   Diff.    p-value   Mean     Median   Diff.    p-value   Obs
Unsigned Error    0.0128   0.0036    0.0106   0.0029   0.0022   0.00***   0.0092   0.0026   0.0036   0.00***   8,603
Signed Error      0.0042   0.0015   -0.0023  -0.0002   0.0065   0.00***  -0.0020  -0.0001   0.0059   0.00***   8,603
This table presents differences in manager estimation errors and model estimation errors. The model estimates were obtained from the
holdout sample predictions for the period 2006-2008. ManagerError is defined as the reported loss estimates minus true loss, divided by total
assets. ModelError is the model estimate minus true loss, divided by total assets. We compare both signed estimation errors and unsigned
estimation errors. P-values reflect the differences in mean estimation errors. *, **, *** Indicate significant difference in means at the 0.10,
0.05, and 0.01 levels, respectively, using a two-tailed t-test.
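The two error definitions used throughout Table 7 can be made concrete with a toy computation (illustrative numbers, not the paper's data): the signed error scales the over- or under-estimate by total assets, and the unsigned error takes its absolute value.

```python
# Hypothetical single observation: a reported estimate, a model estimate,
# the eventually realized true loss, and the insurer's total assets.
reported_estimate, model_estimate = 105.0, 98.0
true_loss, total_assets = 100.0, 1000.0

# ManagerError = (reported loss estimate - true loss) / total assets
manager_signed = (reported_estimate - true_loss) / total_assets
# ModelError = (model estimate - true loss) / total assets
model_signed = (model_estimate - true_loss) / total_assets

# Unsigned errors drop the direction of the miss before averaging.
manager_unsigned = abs(manager_signed)
model_unsigned = abs(model_signed)
print(manager_signed, model_signed)
```

A positive signed error indicates over-reserving; comparing unsigned errors compares accuracy regardless of direction.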
Table 8: The association between estimation errors and managerial incentives

Columns: (1) ManagerError (1999-2008); (2) ManagerError (2006-2008); (3) ModelError without managers' estimates; (4) ModelError with managers' estimates. Each column reports Coeff. (Std. Err.); columns (3) and (4) also report Diff., the coefficient difference from the manager-error regression.

Variable      Pred.   (1) Coeff. (SE)      (2) Coeff. (SE)      (3) Coeff. (SE)     Diff.      (4) Coeff. (SE)     Diff.
TaxShield     +        0.052 (0.0120)***    0.053 (0.0208)***   -0.068 (0.0158)      0.121***  -0.038 (0.0170)      0.091***
Smooth        -       -0.005 (0.0129)       0.016 (0.0207)       0.069 (0.0580)     -0.053***   0.049 (0.0421)     -0.033
SmallProfit   -       -0.003 (0.0019)*     -0.005 (0.0028)**    -0.002 (0.0025)     -0.003     -0.004 (0.0020)**   -0.001
Insolvency    -       -0.001 (0.0013)       0.003 (0.0029)       0.002 (0.0026)      0.001      0.000 (0.0022)      0.003*
Violation     -       -0.002 (0.0007)***   -0.003 (0.0011)***   -0.001 (0.0011)     -0.002     -0.001 (0.0012)     -0.002
Size          ?       -0.001 (0.0003)***   -0.001 (0.0003)***   -0.001 (0.0005)*     0.000     -0.001 (0.0005)     -0.001
SmallLoss     ?       -0.003 (0.0017)**    -0.001 (0.0046)      -0.001 (0.0050)      0.001     -0.001 (0.0047)      0.000
Profit        +       -0.002 (0.0009)      -0.004 (0.0017)       0.002 (0.0015)     -0.005*     0.000 (0.0013)     -0.004***
Loss          ?        0.001 (0.0014)      -0.002 (0.0037)      -0.006 (0.0042)      0.004     -0.005 (0.0036)      0.004*
LineSize      ?        0.000 (0.0001)**     0.000 (0.0001)       0.000 (0.0002)      0.000      0.000 (0.0001)      0.000
Reinsurance   ?        0.003 (0.0016)*      0.003 (0.0025)      -0.007 (0.0080)      0.010     -0.008 (0.0074)      0.010
Public        ?        0.002 (0.0012)*      0.001 (0.0014)       0.003 (0.0017)*    -0.003**    0.002 (0.0015)     -0.002
Mutual        ?        0.002 (0.0010)       0.002 (0.0015)       0.002 (0.0011)*     0.000      0.002 (0.0010)      0.000
Group         ?       -0.006 (0.0121)      -0.006 (0.0109)      -0.004 (0.0018)**   -0.002     -0.003 (0.0016)**   -0.002
Line Fixed Effects     Yes                  Yes                  Yes                            Yes
Year Fixed Effects     Yes                  Yes                  Yes                            Yes
Obs                    23,196               8,247                8,247                          8,247
R-squared              0.0226               0.0206               0.0182                         0.0119

This table reports regression analysis results of the estimation errors by managers and machine learning models. The regressions are estimated based on 8,247 line-year observations. The model estimates are the holdout sample prediction results based on Random Forest models for the years 2006 to 2008. P-values are based on one-sided tests for coefficients with predicted signs, and two-sided otherwise. Firm-clustering adjusted standard errors are used to calculate p-values. Column (1) presents the regression results for the full sample of manager estimates during the period 1996-2008. Column (2) reports the results for the period 2006-2008 so that they can be directly compared to model estimates. Columns (3) and (4) show the regression results when the dependent variable is ModelError, without and with managers' estimates as model input attribute. The table also presents the coefficient differences between manager estimation errors and model estimation errors, using two-sided t-tests. TaxShield is calculated as the sum of net income and total reserves, divided by total assets. Smooth is defined as the average return on assets over the previous three years. SmallProfit is an indicator variable which equals 1 if the earnings fall in the bottom 5 percent of the positive earnings distribution. Insolvency is set to 1 if the risk capital ratio (RBC ratio) is smaller than 2, and 0 otherwise. Violation is an indicator variable that is equal to 1 if the insurer has more than 2 IRIS ratio violations, and 0 otherwise. Other variable definitions are provided in Appendix A.
Appendix A: Variable definitions

Insurance Claims Loss Prediction
ActualLosses: The cumulative payments on all events for a given accident year over the next ten years (including the accident year).
ManagerEstimate: Managers' initial estimates of the losses incurred during the accident year.

Business Line Operational Variables
OutstClaim: Cumulative claims outstanding for the current accident year.
ReportedClaim: Cumulative reported claims for the current accident year.
PaidClaim: Number of loss claims closed with payment.
UnpaidClaim: Number of loss claims closed without payment.
LinePremiums: Premiums written in the current accident year on the business line.
PremiumsCeded: Premiums ceded to reinsurers.
LinePayment: Total payments for the current accident year.
PaymentCeded: Payments ceded to reinsurers.
LineDCC: Defense & cost containment payments, direct & assumed.
DCCCeded: Defense & cost containment payments ceded.
SSR: Salvage & subrogation received.
PaidLoss: Cumulative paid losses for the current accident year.

Company Characteristics
State: Categorical variable that represents the state to which the insurer files the statutory filing.
Assets: Total assets at the end of the accident year.
Liabilities: Total liabilities at the end of the accident year.
DPW: Direct premiums written during the accident year.
NPW: Total net premiums written during the accident year.
NPE: Total net premiums earned during the accident year.
NI: Net income.
Accident year dummies: Dummy variables for the accident year.
Company identifier: Identifier variables for insurers.

External Environment Variables
Pricegrowth: Personal consumption expenditure growth at the state level.
Gdpgrowth: GDP growth at the state level.

Estimation Error Analyses
ManagerError: Calculated as the managers' initial estimate of the losses in year t minus the true losses, divided by total assets.
ModelError: Calculated as the model estimate (model without managers' estimates) of the losses in year t minus the true losses, divided by total assets.
TaxShield: Calculated as the sum of net income and true reserves, divided by total assets.
Smooth: Calculated as the average return on assets over the three years preceding year t.
SmallProfit: Set to 1 if the insurer's earnings fall into the lowest 5 percent of the positive earnings distribution, and zero otherwise.
Insolvency: Set to 1 if the risk capital ratio is smaller than two, and zero otherwise.
Violation: Set to 1 if the insurer has at least one ratio violation, and zero otherwise.
Size: The logarithm of the firm's total assets (in thousands).
SmallLoss: Set to 1 if the insurer's earnings fall into the highest 5 percent of the negative earnings distribution, and zero otherwise.
Profit: Set to 1 if the insurer's earnings fall into the top 90 percent of the positive earnings distribution, and zero otherwise.
Loss: Set to 1 if the insurer's earnings fall into the bottom 90 percent of the negative earnings distribution, and zero otherwise.
LineSize: Calculated as the total premiums written in this line divided by the total premiums written by the insurer for all lines combined.
Reinsurance: Calculated as the premiums reinsured divided by total premiums written.
Public: Set to 1 if the company is publicly traded, and zero otherwise.
Mutual: Set to 1 if the company has a mutual organization structure, and zero otherwise.
Group: Set to 1 if the company is a member of a group, and zero otherwise.
Appendix B: Cross-validation results of machine learning models

For each business line and training/validation sample, each cell reports MAE / RMSE for random forest (RF), deep neural network (DNN), gradient boosting machine (GBM), and linear regression (LR), first without and then with manager estimates as a model input.

Private Passenger Auto Liability
  1996-2005 without: RF 7,624 / 36,802; DNN 14,151 / 43,834; GBM 9,286 / 52,134; LR 7,404 / 21,560
            with:    RF 7,270 / 33,028; DNN 13,330 / 35,262; GBM 10,046 / 60,378; LR 7,300 / 21,734
  1996-2006 without: RF 8,070 / 38,004; DNN 14,837 / 40,593; GBM 9,907 / 48,702; LR 7,595 / 22,065
            with:    RF 6,988 / 33,456; DNN 12,347 / 44,359; GBM 8,991 / 47,355; LR 7,436 / 22,202
  1996-2007 without: RF 7,527 / 37,123; DNN 23,148 / 55,226; GBM 10,015 / 51,604; LR 7,538 / 22,345
            with:    RF 6,773 / 32,465; DNN 15,707 / 45,700; GBM 9,549 / 52,287; LR 7,312 / 21,717

Commercial Auto Liability
  1996-2005 without: RF 3,142 / 12,130; DNN 4,954 / 15,846; GBM 2,995 / 9,663; LR 3,758 / 11,775
            with:    RF 2,868 / 11,383; DNN 4,858 / 15,464; GBM 2,862 / 10,862; LR 3,734 / 12,004
  1996-2006 without: RF 3,115 / 12,377; DNN 4,279 / 15,918; GBM 2,961 / 9,879; LR 3,967 / 11,718
            with:    RF 2,853 / 11,121; DNN 4,514 / 14,733; GBM 3,006 / 11,971; LR 3,938 / 11,842
  1996-2007 without: RF 3,193 / 12,094; DNN 4,962 / 16,172; GBM 3,122 / 10,745; LR 4,122 / 12,938
            with:    RF 3,076 / 12,581; DNN 4,732 / 17,649; GBM 3,021 / 10,878; LR 4,073 / 12,676

Workers' Compensation
  1996-2005 without: RF 6,605 / 29,031; DNN 10,275 / 30,204; GBM 7,853 / 32,522; LR 9,192 / 36,682
            with:    RF 6,272 / 26,615; DNN 8,828 / 27,545; GBM 7,158 / 30,019; LR 8,907 / 34,592
  1996-2006 without: RF 6,887 / 29,058; DNN 12,826 / 33,117; GBM 7,513 / 29,740; LR 9,044 / 33,031
            with:    RF 6,374 / 26,876; DNN 10,873 / 26,814; GBM 6,622 / 24,363; LR 8,418 / 27,355
  1996-2007 without: RF 7,126 / 29,864; DNN 13,835 / 30,352; GBM 7,488 / 28,583; LR 9,389 / 31,530
            with:    RF 6,369 / 26,332; DNN 10,922 / 33,558; GBM 6,692 / 23,958; LR 8,892 / 28,625

Commercial Multi-Peril
  1996-2005 without: RF 4,523 / 21,332; DNN 7,301 / 23,806; GBM 4,671 / 20,160; LR 5,486 / 18,045
            with:    RF 4,236 / 19,195; DNN 8,424 / 24,730; GBM 4,265 / 18,415; LR 5,528 / 18,188
  1996-2006 without: RF 4,575 / 20,801; DNN 7,476 / 23,400; GBM 4,686 / 19,552; LR 6,018 / 29,288
            with:    RF 4,033 / 18,667; DNN 7,715 / 25,592; GBM 4,523 / 19,743; LR 6,027 / 29,524
  1996-2007 without: RF 4,739 / 22,989; DNN 8,070 / 26,657; GBM 4,787 / 20,829; LR 6,168 / 21,501
            with:    RF 4,192 / 19,940; DNN 8,023 / 27,552; GBM 4,379 / 18,062; LR 6,141 / 21,531

Homeowner/Farmowner
  1996-2005 without: RF 4,981 / 36,340; DNN 7,087 / 26,352; GBM 4,664 / 28,708; LR 5,765 / 24,242
            with:    RF 4,574 / 35,456; DNN 7,357 / 29,374; GBM 4,938 / 34,432; LR 4,314 / 18,921
  1996-2006 without: RF 4,870 / 35,486; DNN 6,867 / 24,120; GBM 4,735 / 28,444; LR 5,232 / 20,223
            with:    RF 4,217 / 30,341; DNN 7,902 / 24,517; GBM 4,997 / 34,578; LR 4,135 / 14,669
  1996-2007 without: RF 5,300 / 43,968; DNN 8,866 / 27,393; GBM 4,873 / 21,059; LR 5,274 / 20,438
            with:    RF 4,376 / 34,501; DNN 9,578 / 28,297; GBM 4,958 / 34,963; LR 4,198 / 14,385
Appendix C: Holdout test results of machine learning models

For each business line and holdout sample year, each cell reports MAE / RMSE for random forest (RF), deep neural network (DNN), gradient boosting machine (GBM), and linear regression (LR), first without and then with manager estimates as a model input.

Private Passenger Auto Liability
  2006 without: RF 7,656 / 31,481; DNN 25,258 / 50,939; GBM 8,902 / 30,532; LR 7,825 / 20,594
       with:    RF 6,751 / 27,938; DNN 12,361 / 58,797; GBM 8,902 / 30,532; LR 7,511 / 20,531
  2007 without: RF 7,806 / 31,055; DNN 15,725 / 53,009; GBM 11,133 / 47,445; LR 9,813 / 39,888
       with:    RF 7,021 / 30,286; DNN 12,308 / 38,209; GBM 11,169 / 57,350; LR 9,267 / 32,258
  2008 without: RF 8,544 / 41,858; DNN 57,502 / 72,602; GBM 10,168 / 44,816; LR 10,233 / 38,692
       with:    RF 7,635 / 38,999; DNN 23,993 / 68,434; GBM 9,797 / 44,037; LR 9,570 / 37,335

Commercial Auto Liability
  2006 without: RF 3,400 / 13,154; DNN 7,120 / 21,492; GBM 3,545 / 13,283; LR 4,404 / 12,808
       with:    RF 3,312 / 13,959; DNN 5,190 / 15,287; GBM 3,675 / 16,091; LR 4,324 / 13,361
  2007 without: RF 3,107 / 10,410; DNN 8,322 / 16,562; GBM 3,447 / 11,358; LR 4,560 / 13,000
       with:    RF 3,045 / 10,979; DNN 6,740 / 15,083; GBM 2,960 / 9,679; LR 4,464 / 12,674
  2008 without: RF 2,855 / 9,070; DNN 4,865 / 13,783; GBM 2,999 / 9,784; LR 4,276 / 11,392
       with:    RF 2,836 / 9,420; DNN 7,657 / 18,131; GBM 2,827 / 9,128; LR 4,301 / 11,270

Workers' Compensation
  2006 without: RF 7,642 / 28,437; DNN 14,410 / 31,475; GBM 7,430 / 23,523; LR 10,313 / 31,721
       with:    RF 7,599 / 28,173; DNN 10,003 / 27,294; GBM 7,167 / 21,617; LR 10,745 / 36,142
  2007 without: RF 7,529 / 31,143; DNN 10,222 / 39,177; GBM 8,115 / 29,873; LR 10,942 / 33,152
       with:    RF 6,422 / 27,063; DNN 19,419 / 57,888; GBM 7,380 / 30,020; LR 9,637 / 32,836
  2008 without: RF 9,218 / 38,337; DNN 12,972 / 39,445; GBM 9,883 / 38,380; LR 12,110 / 46,065
       with:    RF 8,605 / 38,380; DNN 24,002 / 41,219; GBM 9,287 / 39,225; LR 11,816 / 42,062

Commercial Multi-Peril
  2006 without: RF 3,943 / 13,549; DNN 13,373 / 23,258; GBM 4,586 / 16,889; LR 6,163 / 18,434
       with:    RF 3,644 / 13,917; DNN 6,727 / 18,485; GBM 4,307 / 15,542; LR 6,203 / 18,410
  2007 without: RF 4,671 / 22,901; DNN 9,554 / 45,940; GBM 4,460 / 18,489; LR 7,424 / 32,861
       with:    RF 4,309 / 20,062; DNN 9,638 / 48,366; GBM 4,239 / 18,485; LR 7,369 / 32,692
  2008 without: RF 7,832 / 37,435; DNN 12,843 / 42,572; GBM 8,730 / 38,860; LR 8,592 / 36,234
       with:    RF 6,850 / 34,425; DNN 9,653 / 29,239; GBM 6,229 / 29,095; LR 8,502 / 34,960

Homeowner/Farmowner
  2006 without: RF 4,708 / 31,377; DNN 7,203 / 30,867; GBM 5,127 / 33,768; LR 5,452 / 15,274
       with:    RF 3,918 / 28,898; DNN 6,232 / 21,973; GBM 4,725 / 33,353; LR 3,642 / 9,574
  2007 without: RF 4,773 / 30,437; DNN 8,165 / 18,938; GBM 5,087 / 33,993; LR 4,912 / 13,689
       with:    RF 3,875 / 23,993; DNN 6,241 / 22,491; GBM 4,948 / 34,664; LR 3,969 / 12,103
  2008 without: RF 13,044 / 30,034; DNN 14,887 / 84,413; GBM 8,730 / 38,860; LR 7,354 / 24,658
       with:    RF 10,829 / 22,057; DNN 9,699 / 66,250; GBM 10,873 / 38,158; LR 5,335 / 20,574