machine learning improves accounting estimates · 2019-11-27 · smu classification: restricted...
TRANSCRIPT
Machine Learning Improves Accounting Estimates
Presented by
Ting Sun, The College of New Jersey
The views and opinions expressed in this working paper are those of the author(s) and not necessarily those of the School of Accountancy, Singapore Management University.
Machine Learning Improves Accounting Estimates
Kexing Ding1
Baruch Lev2
Xuan Peng3
Ting Sun4
Miklos A. Vasarhelyi5
Nov 15, 2019
1 Southwestern University of Finance and Economics; and Rutgers, the State University of New Jersey, [email protected]. 2 Stern School of Business, New York University, [email protected]. 3 Southwestern University of Finance and Economics; and Rutgers, the State University of New Jersey, [email protected]. 4 The College of New Jersey, [email protected]. 5 Corresponding author. Rutgers, the State University of New Jersey, [email protected]; Acknowledgement: The authors would like to thank Richard Sloan (the Editor), Stephen Ryan, and two anonymous referees for helpful comments and suggestions. Kexing Ding and Xuan Peng are thankful to Rutgers Continuous Auditing Research Lab for financial support.
ABSTRACT
Managerial estimates are ubiquitous in accounting: most balance sheet and income
statement items are based on estimates; some, such as the pension and employee stock options
expenses, derive from multiple estimates. These estimates are affected by objective estimation
errors, as well as by managerial manipulation, thereby adversely affecting the reliability and
relevance of financial reports. We show in this study that machine learning can substantially
improve managerial estimates. Specifically, using insurance companies’ data on loss reserves
(future customer claims) estimates and realizations, we document that the loss estimates
generated by machine learning were superior to actual managerial estimates reported in financial
statements, in four out of five insurance lines examined. We subject our findings to various
robustness tests and find that they hold. Our evidence suggests that machine learning techniques
can be highly useful to managers and auditors in improving accounting estimates, thereby
enhancing the usefulness of financial information to investors. Obviously, the generalizability of
our findings beyond insurance data remains to be examined by future research.
Keywords: machine learning, accounting estimates
I. INTRODUCTION
The PCAOB recently introduced a new standard for auditing accounting estimates,
stating:
“Accounting estimates are an essential part of financial statements. Most
companies’ financial statements reflect accounts or amounts in disclosures
that require estimation. Accounting estimates are pervasive in financial
statements, often substantially affecting a company’s financial position and
results of operations… The evolution of financial reporting frameworks
toward greater use of estimates includes expanded use of fair value
measurements that need to be estimated.” (PCAOB 2018, p.3)
Indeed, most financial statement items are based on managerial subjective estimates: fixed assets
are presented net of depreciation―an estimate―and accounts receivable, net of estimated bad
debts. Liabilities, like pensions and post-retirement benefits, are estimates, and revenues from
long-term projects or from contracts with future deliverables include estimates. Many expenses,
such as the stock options or warranty expenses, also require estimates. Some items, like the
pension expense, are based on multiple estimates, some of which, such as the expected gain on
pension assets, are essentially guesses. How influential are these estimates? General Electric
recently provided an example in its 2016 report: of an $8.2 billion net income, a full $3.9 billion
(48%) came from a single change in estimated future profits from long-term customer contracts.
Cambridge Dictionary defines an estimate as: “A guess of what the size, value, amount,
cost, etc. of something might be.” Accounting estimates are not just guesses; as research has
shown, they are often strategically biased guesses.1 Financial statement manipulation is
frequently done through tweaking the assumptions underlying the estimates, because even if an
1 For example, Dechow, Ge, Larson, and Sloan (2007, Introduction) reported that: “Manipulations of reserves,
including the allowance for doubtful debt is also common, occurring in 10 percent of the sample. Manipulations of
inventory and cost of goods sold [both including numerous estimates] occurred in 25 percent of the sample.” See
also Li and Sloan (2017) on the manipulation of the timing of goodwill write-offs, yet another substantial estimate.
ex-post realization turns out to be far off the estimate, managers can always claim that the
deviation was due to events that were unknown to them at the time of estimation.2 “Monday
morning quarterbacking” is often managers’ response to questions about the quality of their
estimates.
Generally effective audit and control procedures, such as obtaining third party
confirmations of assets and liabilities, inventory counts, or conformity of financial data with
market values are inapplicable to estimates, which are opinions, rather than facts that can be
validated. Auditors assessing managerial estimates fall back on weak procedures, like reliance on
the firm’s internal controls, or examining the reasonableness of the assumptions underlying the
estimates. By and large, accounting estimates are very difficult to audit, and their quality is even
harder for investors to assess. There is, therefore, an urgent need to provide both managers and
auditors with an independent, bias-free estimates generator to be used as a substitute for, or as
a benchmark against, managerial estimates.
Machine learning, quickly spreading in diverse areas, such as credit-card fraud detection,
medical diagnosis, and customer recommendation systems, has the potential of providing such an
independent estimates generator. We, accordingly, explore in this study the use of machine
learning (ML) in generating accounting estimates. In particular, we demonstrate, using insurance
companies’ data on estimates and realizations of “loss reserves” (estimates of future claims
related to current policies), that loss estimates derived from machine learning are, with a few
exceptions, superior to the actual managerial loss estimates underlying financial reports. We thus
establish, for the first time, the potential of ML to independently assess the reliability of
2 The fact that the subsequent realizations of most accounting estimates are not even disclosed in financial reports
further increases managerial incentives to manipulate financial items through estimates.
estimates underlying financial reports, thereby improving the quality and usefulness of financial
information. Furthermore, ML has the potential to substantially improve auditors’ ability to
evaluate accounting estimates, thereby enhancing the usefulness of financial information to
investors.
How did accounting, which used to be primarily based on facts (after all, accounting
comes from counting: units of money, inventory, etc.), become so totally reliant on managerial
estimates? The defining event was the significant policy shift made by the FASB in the late
1970s, later adopted by the IASB, from an income-statement to a balance-sheet standard-setting
model: the FASB rejected revenue-cost matching as the main standard-setting principle in favor
of the balance sheet model, which calls for the continuous valuation of assets and liabilities at
fair value―essentially, a large-scale process of estimation. Indeed, this standard-setting shift
from the income statement to the balance sheet
model opened the way to the most consequential estimates-intensive accounting standards
enacted in the past three decades: Primarily, fair value accounting, assets and goodwill
impairments, stock option expensing, and revenue recognition. Research indeed relates the
dramatic rise in the number and impact of accounting estimates primarily to the various fair
value and asset write-off standards (Chen and Li 2017). Relatedly, Khan et al. (2017)
documented that estimates-intensive FASB standards detract from shareholder value.
There is, therefore, an urgent need to enhance the quality of accounting estimates and
auditors’ ability to independently evaluate the reliability of these estimates. Machine learning, as
demonstrated in this study, offers a unique opportunity to perform this task.
The order of discussion is as follows: Section II provides background on machine
learning in accounting and reviews the literature on insurance claims loss
estimation. Section III and Section IV discuss the machine learning algorithms used in this study
and the application of machine learning to generate insurance companies’ loss estimates,
respectively. We present our sample in Section V, while Section VI provides the empirical
results concerning the machine learning estimates compared with managers’ estimates.
Additional analyses on estimation errors are presented in Section VII. Finally, Section VIII
concludes the study.
II. BACKGROUND
1. Artificial Intelligence and Machine Learning in Accounting
The “management coefficients theory” proposed by Bowman (1963) highlights the
potential of automatic decision-making systems to generate consistent and reliable decisions.
Accordingly, researchers have explored the applications of mathematical models in the decision-
making process and found that statistical models frequently generate more reliable decisions and
forecasts than humans, especially when the latter rely on their intuition (e.g., Hoffman 1960;
Goldberg 1970; Dawes 1971). Building on these findings, researchers and practitioners have
developed computer-based techniques to aid and sometimes even replace human judgment (e.g.,
Dawes and Corrigan 1974). With advances in computational capability and the introduction of
more powerful algorithms, business has entered a higher level of automation, and artificial
intelligence has spread rapidly in the era of ‘big data’. Furthermore, instead of programming
machines a priori with known formulas or rules to solve problems, it is now possible to let
machines learn the rules on their own. By design, this data-driven process is more opaque than
traditional closed-form models, which are usually guided by theory. Machine learning, generally defined as an
automatic strategy that optimizes a performance criterion through learning from experience or
examples (Alpaydin 2010), is now often used to deal with very complex data.
When used as a predictive tool, machine learning techniques have wide applications in
many domains. In accounting, the exploration of artificial intelligence applications started in the
1980s and covered a variety of tasks from financial report analyses to audit and assurance cases
(e.g., Hansen and Messier 1982; Murphy and Brown 1992; Lam 2004). For example, machine
learning models were developed to predict or detect firm events, such as corporate failure or
bankruptcy (see O’Leary 1998; Huang et al. 1994; Bell 1997; Zhang et al. 1999; McKee and
Lensberg 2002; Pendharkar 2005; Ding et al. 2019), bond ratings (Maher and Sen 1997), and
management fraud (Fanning and Cogger 1998). In financial market studies, Galeshchuk and
Mukherjee (2017) and Parot et al. (2019) used machine learning techniques to predict foreign
exchange rate changes and achieved remarkable forecasting accuracy (also see Dunis and Huang
2002; Dhamija and Bhalla 2011; Rivas et al. 2017). Furthermore, Cao and Tay (2003) showed the
promising performance of support vector machine (SVM) in predicting futures asset price
movements. Other machine learning models, such as artificial neural networks, also performed well
in stock price predictions (e.g., Leigh et al. 2002; Zhang and Wu 2009).
Overall, increasing evidence indicates the potential of applying machine learning
techniques to perform complicated tasks in finance and accounting, and artificial intelligence has
been suggested as critical to the development of the accounting profession in the future (Elliot
1992). Along with previous studies developing systems to conduct audit tasks, including analytical
review procedures (e.g., Koskivaara 2004), risk assessment (Lin et al. 2003; Davis et al. 1997),
and going concern decisions (Lenard et al. 1995; Etheridge et al. 2000), we investigate the
application of machine learning in accounting in this paper. Focusing on the insurance industry,
we generate model estimates for a particular line item—insurance claims losses—and find that the
machine learning models provide more accurate estimates than managers do, and that the
machine estimates are less affected by managerial incentives.
2. Errors in Insurance Claims Loss Estimation
Insurance companies provide protection to policyholders from certain risks that occur
within a predefined period. While insurers receive the policy premium payments before or early
during the period of coverage, the full costs of their activities―the total losses or claims by
policyholders―usually remain unknown for several years after the coverage period ends.
Insurance regulations generally require insurers to provide “management’s best estimate” for these
future claims in financial reports, and to disclose the gradual settlement of loss claims in the
following years. In other words, insurers match the ex-post payoffs directly to the year in which
the initial estimate was made and the related insurance premium revenue recognized. This setting
enables researchers to compare managers’ reported loss estimates to their eventual realizations
(Petroni 1992).
The unpaid component of the estimated future losses (claims) is the insurance loss reserve.
The loss reserve is often the most significant component in property and casualty (P/C) insurance
firms’ liabilities: it is reported that, on average, loss and loss adjustment expense reserves make
up approximately two-thirds of an insurer’s liabilities (Zhang and Browne 2013). Managers’ loss
estimation process is obviously subjective and requires considerable judgment because not all
claims for accidents that occur during a year are filed and settled by year-end. A substantial amount
of losses incurred may be “Incurred But Not Reported Claims”, in which case the insurance
policyholders do not report the losses to insurance firms by the end of the current year, but file the
claims in later years. In addition, after the claims are filed, the final cash settlement may take years
to complete. For example, injuries in a car accident may lead to several years of treatment and
result in extended payments. Thus, insurance firms must estimate the costs of claims filed during
the year as well as claims that are related to the current year but will be filed in subsequent years.
Given the material impact of loss estimation on insurance firms’ financial results and
condition, auditors, investors, and regulators are naturally concerned with the quality of the
estimates reported by managers. The initial loss estimates rarely equal the ultimate losses paid,
thereby generating estimation errors. Previous studies classified these estimation errors into two
major types: nondiscretionary and discretionary (e.g., McNichols 2000; Beaver and McNichols
1998). Nondiscretionary errors arise from the uncertainties in predicting future events. The
difficulties may be related to the economic conditions affecting the firm or the characteristics of
the business conducted (e.g., auto liability, homeowner/farmowner insurance). Discretionary
errors, on the other hand, are the result of intentional management actions (e.g., Beaver and
McNichols 1998). Studies have examined managers’ reasons for manipulating loss reserves
and identified several incentives: tax deferral (Grace 1990; Petroni 1992; Nelson 2000), income
smoothing (Anderson 1973; Smith 1980; Weiss 1985; Beaver et al. 2003), solvency and regulatory
concerns (Forbes 1970; Petroni 1992; Nelson 2000; Gaver and Paterson 2004), and executive
compensation incentives (Browne et al. 2009; Eckles and Halek 2010; Hoyt and McCullough
2010). Anderson (1971) analyzed the insurance industry over the period 1955 to 1964 and
documented that insurers heavily overestimated losses in the early years but gradually moved from
over-reserving to slight under-reserving. However, Bierens and Bradford (2005) found
that insurance firms from 1983 to 1993, generally, tended to over-reserve. Grace and Leverty (2012)
utilized a more recent sample (1989 to 2002) and found that firms generally overestimated losses,
but there was considerable variation in insurers’ practices.
In sum, the findings of prior studies indicate the existence of considerable errors in loss
estimates and relate a certain portion of these errors to intentional management of earnings and
other financial items. In this study, we propose using machine learning techniques to generate
improved loss estimates.
III. MACHINE LEARNING TECHNIQUES
1. Machine learning algorithms
We ran in this study “horseraces” of four popular machine learning algorithms to predict
future insurance losses for five business lines and selected the algorithm with the best predictive
accuracy among those examined. 3 The model-generated predictions were then compared to
managers’ estimates included in financial reports. The four algorithms we used are: linear
regression, random forest, gradient boosting machine, and artificial neural network. We briefly
discuss each ML model used.
1) Linear regression
Some algorithms that were initially developed in the statistics field are “borrowed” by
machine learning for prediction purposes (Brownlee 2016a). Linear regression is a typical example.
When used as a machine learning algorithm, linear regression is a supervised learning method that
makes predictions based on the linear relationship between the numeric output and numeric input
attributes (Friedman 2001; Bishop 2006). The learning process estimates the coefficients of the
input attributes and aims to produce a prediction model that minimizes the mean squared error
between the prediction and the true value.
3 “Horseraces” refers to comparing the accuracy of machine learning algorithms in predicting insurance losses for
each business line of insurance.
To select data attributes for linear regression, we used the M5 method: In each iteration,
the attribute with the smallest standardized coefficient is tentatively removed, and the
regression is re-estimated (Witten et al. 2011). If the model’s predictive accuracy improves
in terms of the Akaike information criterion,4 the attribute is permanently eliminated. This
process is repeated until no improvement is observed. We used the Cartesian grid searching
approach to configure optimal hyper-parameters5 for the linear regression model development as
well as for the other three models. Grid search methodically builds and evaluates a model for
each possible combination of a specified subset of hyper-parameter values. For linear regression, the
hyper-parameters are the learning rate and the number of iterations. The grids of hyper-parameter
values we have tried are listed in Table 1. The learning rate, or step size, determines the amount
that the model weights are adjusted during training. The number of iterations is the number of
batches needed to complete one epoch, where an epoch is completed once an entire dataset is
passed forward and backward through the learning model. In other words, an epoch is one complete
cycle through the entire training data.6
- Table 1 here -
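As an illustration of this procedure, a minimal Python sketch of gradient-descent linear regression with a Cartesian grid search over the two hyper-parameters might look as follows (the data and grid values here are synthetic examples, not the actual grids from Table 1):

```python
import numpy as np

def fit_linear_gd(X, y, learning_rate, n_iterations):
    """Fit linear regression by batch gradient descent on mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iterations):
        err = X @ w + b - y
        w -= learning_rate * (2.0 / n) * (X.T @ err)
        b -= learning_rate * (2.0 / n) * err.sum()
    return w, b

def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Cartesian grid search: build and evaluate a model for every combination."""
    best = None
    for lr in grid["learning_rate"]:
        for n_iter in grid["n_iterations"]:
            w, b = fit_linear_gd(X_tr, y_tr, lr, n_iter)
            mae = np.mean(np.abs(X_val @ w + b - y_val))
            if best is None or mae < best[0]:
                best = (mae, lr, n_iter)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
grid = {"learning_rate": [0.01, 0.1], "n_iterations": [50, 500]}  # illustrative grid
best_mae, best_lr, best_iter = grid_search(X[:150], y[:150], X[150:], y[150:], grid)
```

The combination with the lowest validation error is retained, exactly as in the Cartesian grid search described above.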
2) Random forest
The random forest algorithm is derived from decision trees, a machine learning
technique that extracts information from data and displays it in a tree-like structure. A decision
tree consists of three components: nodes, branches, and leaves. The root node of the tree denotes
4 Akaike information criterion (AIC) is an estimator of the relative quality of a model, named after the statistician
Hirotugu Akaike.
5 A hyper-parameter is a parameter whose value is set before the learning process begins; setting the value of a
hyper-parameter controls the process of defining the model.
6 For more information about learning rate and iterations, please see: https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
an input attribute; the tree splits into branches based on the input attributes, with each branch
representing a decision. The end of the branch is called a leaf, and each leaf node leads to a
prediction of the target value. Decision trees can be applied to either regression or classification
problems. We employed regression trees because the target variable (insurance losses) has
continuous values. A single decision tree could have limited capabilities to learn the data, whereas
a random forest improves the accuracy of decision trees with an ensemble technique (Breiman
2001, 2002). Specifically, the random forest algorithm first generates a “forest” of decision trees;
each tree uses a subset of randomly selected attributes. It then aggregates over the trees to yield
the most efficient predictor. In general, the higher the number of trees, the better the predictive
performance of the model. However, as adding too many trees can considerably slow down the
training process, after a certain point, the benefit in prediction performance from using more trees
will be lower than the cost of computation time for these additional trees (Ramon 2013).
The hyper-parameters for grid searching are the number of trees, the maximum tree depth,
and the minimum leaf size. The maximum tree depth limits the depth of each tree in the random
forest; a deeper tree has more branches between the root node and the leaf nodes and captures
more information about the data. However, as the tree depth increases, the model may suffer from
overfitting because it captures too many details (Brownlee 2016b). The third hyper-parameter is
the minimum number of observations per leaf. A smaller leaf makes the model more prone to
capturing noise in the training data, so a leaf size that is too small may result in overfitting. On
the other hand, if we choose a leaf size that is too large, the tree will stop growing after a few
splits, resulting in poor predictive performance. Table 1 provides the grids
of values that we have tried in the random forest.
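To make the bagging idea concrete, here is a toy numpy sketch of a random forest built from depth-one regression trees (stumps): each tree sees a bootstrap sample of the rows and a random subset of the features, and the forest averages the tree predictions. An actual implementation grows much deeper trees and tunes the three hyper-parameters above:

```python
import numpy as np

def fit_stump(X, y, feat_idx):
    """Depth-1 regression tree: best single split over the given feature subset."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())
    for j in feat_idx:
        for t in np.unique(X[:, j]):
            m = X[:, j] <= t
            if m.all() or not m.any():
                continue
            left, right = y[m], y[~m]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    return best[1:]  # (feature, threshold, left_value, right_value)

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap row sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, d // 2)  # candidate features per tree (illustrative choice)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, n)                  # bootstrap sample
        feats = rng.choice(d, size=k, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Aggregate over the trees by averaging their predictions."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = 2.0 * (X[:, 0] > 0) + 1.0 * (X[:, 1] > 0)
forest = fit_forest(X, y)
mae_forest = np.mean(np.abs(predict_forest(forest, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))  # naive constant predictor
```

Even with such weak base learners, the averaged ensemble beats a naive constant prediction on this toy target.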
3) Gradient boosting machine
Gradient boosting machine (GBM) uses an ensemble technique termed boosting to train
new prediction models with respect to the errors of the previous models, and converts weak
prediction models to stronger ones (Schapire 1990). The objective of boosting is to minimize the
model errors by adding weak learners (i.e., regression trees). After adding new trees, the learning
procedure subsequently corrects for errors made by previous trees and improves predictions to
reduce the residuals in a gradient descent manner (Friedman 2001; Mason et al. 2000). After the
number of trees reaches a limit, adding more trees will no longer improve the prediction
performance (Brownlee 2016b). The grid searching approach for GBM was the same as that for
the random forest, except that we started with five as the maximum tree depth to ensure that the
learners were weak but could still be constructed in the gradient boosting machine. The grids of
values tried are reported in Table 1.
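The boosting procedure itself can be sketched in a few lines of numpy, again with stumps as the weak learners; each new tree is fit to the residuals of the ensemble built so far (an illustrative sketch, not the configuration used in the paper):

```python
import numpy as np

def fit_stump(X, y):
    """Weak learner: best single-split regression tree over all features."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            m = X[:, j] <= t
            if m.all() or not m.any():
                continue
            left, right = y[m], y[~m]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_gbm(X, y, n_trees=50, learning_rate=0.3):
    """Boosting: each new tree fits the residuals (the negative gradient of the
    squared-error loss) of the ensemble built so far."""
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_trees):
        residuals = y - pred
        stump = fit_stump(X, residuals)
        trees.append(stump)
        pred += learning_rate * predict_stump(stump, X)
    return y.mean(), trees

def predict_gbm(model, X, learning_rate=0.3):
    base, trees = model
    pred = np.full(len(X), base)
    for stump in trees:
        pred += learning_rate * predict_stump(stump, X)
    return pred

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = 2.0 * (X[:, 0] > 0) + 1.0 * (X[:, 1] > 0)
model = fit_gbm(X, y)
mae_boost = np.mean(np.abs(predict_gbm(model, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))
```

The learning rate shrinks each tree's contribution, so the residuals are reduced gradually rather than in one step, which is the gradient-descent flavor of the method.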
4) Artificial neural networks
An artificial neural network consists of multiple layers of interconnected nodes between
the input and output data. Each layer transforms its input data into a more abstract representation,
which is then used as the input by the next layer. An artificial neural
network has three types of layers: input, hidden, and output layers. The input layer receives the
raw data of explanatory variables, and the number of nodes in the input layer equals the number
of the explanatory variables. Next, the input layer is connected to hidden layers, which apply
complex transformations to the incoming data and transmit the output to the next hidden layers.
The data processing is performed through a system of weighted connections: the values entering a
hidden node are multiplied by connection weights (learned during training), and the weighted inputs are then
added to produce a single output. There may be one or more hidden layers. A neural network is
called deep when more than two hidden layers exist. The output of the final layer (called the output
layer) represents the extracted high-level information from the raw data (Sun and Vasarhelyi 2017).
We normalized all input variables to improve the model performance and used several different
values of hyper-parameters for grid searching. The details are provided in Table 1.
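The weighted-connection mechanics described above can be illustrated with a toy one-hidden-layer network trained by gradient descent on normalized inputs (a from-scratch numpy sketch with synthetic data; the architecture and hyper-parameters here are illustrative, not those tuned in Table 1):

```python
import numpy as np

def train_nn(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden);      b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)        # hidden layer: weighted sums + activation
        pred = h @ W2 + b2              # output layer
        g = 2.0 * (pred - y) / n        # gradient of MSE w.r.t. predictions
        gW2, gb2 = h.T @ g, g.sum()
        gh = np.outer(g, W2) * (1.0 - h ** 2)   # back-propagate through tanh
        gW1, gb1 = X.T @ gh, gh.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict_nn(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize inputs, as in our models
params = train_nn(X, y)
mae_nn = np.mean(np.abs(predict_nn(params, X) - y))
mae_mean = np.mean(np.abs(y - y.mean()))
```

The normalization step mirrors the pre-processing noted above; without it, inputs on very different scales slow down or destabilize the weight updates.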
2. Data splitting and performance validation
We employed a training/validation/testing approach to partition our data. For each machine
learning algorithm, we fit the model based on a training dataset and tested it on a different set for
validation using the five-fold cross-validation method. Then, we utilized a new test dataset, known
as the holdout dataset, for robustness tests of the developed and validated models.
The five-fold cross-validation method is a widely-used technique in machine learning to
estimate a model’s performance. It is a resampling procedure used to evaluate the performance of
machine learning models on a limited data sample. Specifically, the data is shuffled randomly and
split into five groups. Each group is taken as a validation set, and the remaining four groups
combined are used as a training set. A model is developed based on the training set and evaluated
on the validation set using various evaluation metrics. The results are averaged to produce a single
performance estimate. With this method, all observations are used for both training and validation, with each
observation used only once for validation.
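The five-fold procedure can be sketched as follows (a minimal numpy illustration, with ordinary least squares standing in for the model being validated):

```python
import numpy as np

def five_fold_cv(X, y, fit, predict, seed=0):
    """Shuffle, split into five folds; each fold serves once as the validation
    set, and the averaged validation MAE is the performance estimate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 5)
    maes = []
    for k in range(5):
        val = folds[k]
        train = np.concatenate([folds[i] for i in range(5) if i != k])
        model = fit(X[train], y[train])
        maes.append(np.mean(np.abs(predict(model, X[val]) - y[val])))
    return float(np.mean(maes))

# Ordinary least squares as an illustrative model to validate.
def fit_ols(X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=100)
cv_mae = five_fold_cv(X, y, fit_ols, predict_ols)
```

Every observation is used four times for training and exactly once for validation, as described above.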
We used a separate holdout sample to evaluate the practical usefulness of the machine
learning models. Specifically, we employed the models developed from the cross-validation
process to produce loss estimates for the holdout period following the training and validation
period. Observations in the holdout period have not been used in the model development process.
Figure 1 illustrates the splits. This method has two advantages in our setting. First, cross-validation
generally results in a less biased and more robust estimate of the model accuracy than a simple
train/test split method (Brownlee 2018). Second, the training/validation/testing approach alleviates
the potential problems introduced by the inherent time-series order of the data.
- Figure 1 here -
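The resulting chronological partition can be sketched as follows (the years and holdout length shown are hypothetical, purely to illustrate the scheme in Figure 1):

```python
def chronological_split(years, n_holdout):
    """Reserve the most recent n_holdout years as a strictly out-of-sample test
    set; earlier years are used for training and five-fold cross-validation."""
    ordered = sorted(years)
    return ordered[:-n_holdout], ordered[-n_holdout:]

train_val_years, holdout_years = chronological_split(range(1996, 2011), 3)
```

Because the holdout years follow the training and validation period, no observation from the holdout period can leak into model development.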
IV. APPLICATION TO INSURANCE LOSS ESTIMATION
We now explore the use of machine learning techniques in generating loss estimates for
insurance companies. We conducted two sets of tests for each line of insurance (private passenger
auto liability, commercial auto liability, etc.). The first set of tests did not include managers’ loss
estimates as an input attribute, keeping only variables that are not directly affected by managerial
judgement. In other words, the variables used in these tests are based on facts, such as the number
of outstanding claims, the number of loss claims closed with and without payment, etc. In the
second set of tests, we added managers’ initial estimates to the machine learning models. This
design enables us to evaluate the performance of machine learning techniques on a stand-alone
basis, as well as their incremental improvement over managers’ estimation. Moreover, because
firms may have experienced different loss claim and payment patterns during the 2008 financial
crisis, we constructed three test sets using different periods (See Figure 2).
- Figure 2 here -
1. Insurance Company Data
U.S. insurance companies follow the Statutory Accounting Principles (SAP)7 to prepare
statutory financial statements. Managers are required to provide the initial estimate for all losses
(payment to insured) incurred in the current year (paid, as well as expected to be paid in the future)
7 State laws and insurance regulations require that insurance companies operating in the United States and its
territories prepare statutory financial statements in accordance with Statutory Accounting Principles (SAP). SAP is
designed to assist state insurance departments to regulate insurance companies’ solvency.
as well as re-estimating the losses incurred in each of the previous nine years. In the statutory
filings, insurance firms report the estimated total losses as “incurred losses" and the actual
cumulative payments in each of the past ten years (including the current year). The difference
between the reported incurred losses and the cumulative paid losses for a given year is the reserve
for future loss payment.
As mentioned above, an advantage of the insurance industry data is that the extensive
reporting requirements under SAP make it possible to match the actual payoffs to insured directly
to the year in which the initial estimate was made. Table 2 provides an illustrative example of this
disclosure. Panel A shows the development of incurred losses for the National Lloyds Insurance
Company. By the end of 2008, the estimates for the most recent ten years (including the current
year) are disclosed in Column 10. For example, in 2008, the current estimates for the losses
incurred in the years 2007 and 2008 were $24,334 thousand and $36,893 thousand, respectively.
This is compared with the initial estimate for the losses incurred in 2007, which was $24,226
thousand (Column 9), meaning that Lloyds revised up the estimated losses incurred in 2007 by
$108 thousand ($24,334 - $24,226). Panel B reports the cumulative paid losses for each accident
year, from 1999 to 2008, by the end of 2008. For example, by the end of 2008, the company had
paid $32,324 thousand for the losses incurred during the year 2008, and $23,585 thousand for the
losses incurred during 2007 (Column 10). In this example, the cumulative paid losses and the
losses incurred for the year 1999 converge in 2006 and remain unchanged at $4,618 thousand
(Row 2), indicating that Lloyds likely paid off all the claims for accidents that occurred in 1999
by the end of 2006.
- Table 2 here -
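The reserve relationship described above can be checked arithmetically with the figures from the Lloyds example (amounts in $ thousands):

```python
# Accident year 2008, as reported at the end of 2008 (amounts in $ thousands).
incurred_2008 = 36_893   # estimated total ("incurred") losses for accident year 2008
paid_2008 = 32_324       # cumulative paid losses for accident year 2008
reserve_2008 = incurred_2008 - paid_2008   # reserve for future loss payments

# Revision of the 2007 accident-year estimate between the 2007 and 2008 filings.
initial_2007, revised_2007 = 24_226, 24_334
revision_2007 = revised_2007 - initial_2007
```

The difference between reported incurred losses and cumulative paid losses gives the reserve still carried for that accident year, and the year-over-year change in the incurred-loss figure is the $108 thousand upward revision discussed above.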
2. Dependent Variable
Our dependent variable is ActualLosses, the actual ultimate cost of insured
events occurring in the year of coverage. It is the sum of losses already paid in the coverage year
and the losses to be paid in the future. We chose total actual losses as our dependent variable
because they can be directly compared to managers’ “incurred losses” reported in annual reports.
Previous studies that examined insurance companies’ loss reserve errors measured the “actual
losses” as the cumulative losses paid during several subsequent years (Weiss 1985; Grace 1990;
Petroni and Beasley 1996; Browne et al. 2009; Grace and Leverty 2012). We measure in this study
the ActualLosses for an accident year t as the ten-year cumulative payment of losses incurred in
year t. This variable is extracted from the financial report in year t+9, which is also the last time
managers disclose the loss payment for the initial accident year t.8 For losses incurred in each
business line during a given accident year, we generated only one estimate based on the
information available at the end of this year, and the model estimates were compared to managers’
initial predictions made in year t.
3. Independent Variables
Our independent (predictor) variables consist of information already known at the time of
estimation, i.e., year t (no look-ahead bias). We included three sets of independent variables. First,
operational variables (e.g., claims outstanding, premiums written, premiums ceded to reinsurers)
for each business line were obtained from Schedule P of the statutory filings. The second set of
variables are company characteristics (e.g., total assets, State of operation) for the accident year.
8 We acknowledge the possibility that loss claims may take longer than ten years to settle, in which case the actual
losses are not fully captured by the cumulative payments in year t+9. To alleviate this concern, we first discuss the
payment patterns of each business line studied in the next section. Second, we used the manager estimates in year t+9 as
our proxy for actual losses instead, and the main inferences remained unchanged.
Finally, we added exogenous environmental variables (personal consumption, GDP growth) that
reflect the macro-economic factors that may influence the payment for the accident year.
Definitions of all the independent variables are provided in Appendix A.
4. Evaluation Metrics
We used the mean absolute error (MAE) measure as the primary evaluation metric to
compare estimates with actuals. MAE is the average of the absolute differences between
predictions and actual observations. Compared to RMSE (root mean square error), MAE is easier
to interpret and less sensitive to outliers:
MAE = (1/n) Σ_{i=1}^{n} |TrueValue_i − ModelEstimate_i|    (1)
We also report results for the RMSE, which gives higher weights to large errors. The model
with the highest prediction power typically has a relatively small RMSE. We calculate the RMSE
as:
RMSE = √[(1/n) Σ_{i=1}^{n} (TrueValue_i − ModelEstimate_i)²]    (2)
We compared the machine learning loss predictions with managers’ estimates to evaluate the
machine’s performance, using the MAE or the RMSE.
V. OUR SAMPLE
We used the annual reports of US-based property-casualty insurance companies filed with
the National Association of Insurance Commissioners (NAIC). The data were extracted from the
SNL FIG website (S&P Global Market Intelligence platform), covering the period 1996 to 2017.
PC (property & casualty) insurers offer a wide variety of insurance products that cover
many different business lines, with each line having unique operating characteristics. The
following screening was separately performed on each business line to obtain the test samples:
1) For each business line, we first identified the insurance companies that had conducted
business in this line. To be included in the sample, the firm must have started this line’s
business before 2008 and remained active until 2017, so that we can extract the ultimate
payment (over ten years) for at least one accident year.
2) Firm-years with missing (zero) values for all operational variables were deleted from the
sample. If total assets, total liabilities, net premiums written, or the direct premiums written
were zero or negative, the firm-year was also excluded from the sample.
3) For each business line, we only kept observations that had positive total premiums and
cumulative paid losses.
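The three screening steps can be sketched in pandas; the column names below are hypothetical stand-ins for the actual SNL/NAIC fields, not the paper's variable names:

```python
import pandas as pd

def screen_business_line(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three sample screens to one business line's firm-year panel."""
    # 1) Keep firms active in this line from before 2008 through 2017, so that a
    #    full ten-year payment window exists for at least one accident year.
    years_by_firm = df.groupby("firm_id")["year"]
    active = years_by_firm.transform("min").lt(2008) & years_by_firm.transform("max").ge(2017)
    df = df[active]

    # 2) Drop firm-years with all operational variables missing/zero, or with
    #    non-positive assets, liabilities, or premiums written.
    op_cols = ["claims_outstanding", "premiums_written", "premiums_ceded"]
    df = df[df[op_cols].fillna(0).ne(0).any(axis=1)]
    for col in ["total_assets", "total_liabilities", "net_premiums_written", "direct_premiums_written"]:
        df = df[df[col] > 0]

    # 3) Keep only observations with positive premiums and cumulative paid losses.
    return df[(df["total_premiums"] > 0) & (df["cumulative_paid_losses"] > 0)]
```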
We focused on five business lines: (1) private passenger auto liability, (2) commercial auto
liability, (3) workers’ compensation, (4) commercial multi-peril, and (5) homeowner/farmowner
line. The five lines were selected from the 20 business lines identified by NAIC primarily because
these business lines had a sufficiently large number of observations remaining after the screening
(more than 400 insurers), indicating significant operations in the business lines. We excluded
minor lines such as special liability, products liability, and international line, together making up
less than 5% of industry loss reserves (A. M. Best 1994). In the final sample, we have a total of
32,939 line-firm-year observations for all five business lines combined, with each line’s number
of observations provided in Table 4.
- Table 3 here -
Table 3 reports the payment patterns for each business line. The “tail” is an insurance
expression that describes the time between the occurrence of the insured event (accident) and the
final settlement of the claim. A longer tail usually implies higher uncertainty regarding the estimate
of ultimate losses. As indicated in Table 3, for the private passenger auto liability line, 40.64% of
the ultimate losses are paid off during the initial accident year, and 99.82% of the total payments
throughout the ten years are made during the first five years. The homeowner/farmowner business
line (bottom line) has a payment pattern that differs from the private passenger auto liability line
(top line). For homeowner/farmowner policies, the insurers pay off 72.62% of the ultimate losses
within the first year, and 93.50% of the loss payments are made by the end of the second year. This
suggests that the homeowner/farmowner business line has a relatively short tail compared to the
other four lines, consistent with prior studies that investigated the tail characteristics of insurance
business lines (Nelson 2000). Overall, for all five business lines, the majority of payments are
made during the first five years after the initial accident year.
Table 4 reports summary statistics for firms that operate in each business line. The private
passenger auto liability insurance is the largest business line in dollar terms, with total premiums
written of $142 million on average, followed by the homeowner/farmowner line, and workers’
compensation line. The total premiums written in the private passenger auto liability line during
2015 amounted to $199.37 billion, making up 34% of all the PC insurance business.9 The commercial
auto liability and workers’ compensation are usually provided by larger insurance companies, with
average assets of $1,467 million and $1,437 million, respectively. In general, managers
overestimate the ultimate losses when they report their future loss projections for the first time,
except in the commercial auto liability line.
9 Insurance Information Institute, The Insurance Fact Book 2017.
- Table 4 Here -
VI. EMPIRICAL RESULTS
In this section, we first report the five-fold cross-validation machine learning results for
each business line and then present the holdout test results.10 For each machine learning algorithm,
we developed models with and without managers’ loss estimates as an input attribute, and the
results are reported separately. We then compared the model-generated estimates to managers’
predictions in financial reports. For example, the Sentry Select Insurance Company initially
estimated the total losses incurred for the year 2006 in the private passenger auto liability line as
$49,127 thousand, while the actual losses were $41,787 thousand.11 In the cross-validation process,
the random forest algorithm without managers’ loss estimates generated an estimate of $43,657
thousand, and the prediction was $41,990 thousand with managers’ estimates included in the
model. Thus, both machine models generated estimates which were substantially closer to the
actual losses than managers’ prediction. Moreover, in the holdout tests, where we used the model
developed from 1996 to 2006 to predict the losses incurred in 2007, the random forest models with
and without manager estimates predicted $38,953 thousand and $40,996 thousand, respectively.
The managers’ estimate was $43,650 thousand for the same year, while the actual losses turned
out to be $39,791 thousand, suggesting that the machine learning predictions were, again, more
accurate than the estimate provided by the firm.
We used the MAE and RMSE metrics to evaluate model performance. Overall, we found
that the random forest algorithm produced good predictions when the insurance business line has
long tails, whereas linear regression performed well when the tail is short and the business line is
10 The holdout test sample contains observations that were not used in the cross-validation test.
11 This example is drawn from the cross-validation sample for the period 1996-2006.
complicated. Thus, we report the linear regression prediction results for the
homeowner/farmowner line and present the random forest prediction results for the other four
lines.12 We also report the model accuracy edge relative to manager estimates. The accuracy edge
of a machine learning model is computed as managers’ estimation MAE (RMSE) minus model
estimation MAE (RMSE), divided by managers’ estimation MAE (RMSE).
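The accuracy-edge calculation described above is straightforward; a small illustrative helper (the numbers in the example are taken from the paper's Table 5 figures quoted below):

```python
def accuracy_edge(manager_error: float, model_error: float) -> float:
    """Relative improvement of the model over managers: positive means the
    model's MAE (or RMSE) is smaller than managers' estimation error."""
    return (manager_error - model_error) / manager_error

# Private passenger auto liability, 1996-2005 sample, MAE comparison:
# managers' MAE of 9,461 vs. the random forest's 7,526.
print(round(accuracy_edge(9461, 7526), 2))  # prints 0.2, i.e., the 20% edge
```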
1. Five-fold cross-validation
As illustrated in Figure 2, we used three samples to train and validate the machine learning
models: the 1996-2005, 1996-2006, and 1996-2007 samples. Results are reported in Table 5 Panel
A. For each of the five business lines examined, we show the MAE and RMSE metrics of
managers’ estimation and the machine learning estimations, as well as the corresponding accuracy
edges. For example, managers’ estimation MAE (RMSE) for the private passenger auto liability
line (line no. 1, first row) during the period 1996-2005 were 9,461 (37,494). The random forest
algorithm without managers’ estimates yielded an MAE (RMSE) of 7,526 (36,694), having an
accuracy edge of 20% (2%) over managers’ estimates. In the samples of 1996-2006 and 1996-
2007, the MAE and RMSE of the random forest estimation are both lower than those of managers,
indicating the higher prediction accuracy of machine learning. In the commercial auto liability line
(line no. 2), the random forest estimates were, on average, 26% (35%) more accurate than
managers’ predictions in terms of MAE (RMSE). When predicting losses for the workers’
compensation line (line no. 3), the random forest model exhibited a considerable accuracy edge:
on average, its MAE (RMSE) was 46% (39%) lower than managers’ estimation. Turning to the
commercial multi-peril line (line no. 4), the average accuracy edge of the random forest model
12 The results of cross-validation tests and holdout sample tests of the other machine learning models are provided in
Appendices B and C.
measured by the MAE (RMSE) was 25% (30%) relative to managers. Overall, the cross-validation
analyses of the first four business lines indicate that the random forest algorithm produced more
accurate estimations than managers.
- Table 5 here -
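A minimal sketch of the five-fold cross-validation procedure using scikit-learn; the synthetic data below merely stands in for the insurer panel and predictor set:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                  # stand-ins for the predictors
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)   # stand-in for ActualLosses

# Five-fold cross-validation: train on four folds, evaluate on the held-out fold.
fold_mae = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_mae.append(np.mean(np.abs(y[val_idx] - pred)))

print(f"cross-validated MAE: {np.mean(fold_mae):.3f}")
```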
After including managers’ loss estimates as an additional attribute, machine learning
models continue to produce more accurate estimates than managers in these lines. The results are
reported in the right-hand side of Table 5. When estimating losses in the private passenger auto
liability line (line no. 1), the random forest algorithm had lower prediction errors than managers.
The MAE (RMSE) comparison results suggest that the random forest model achieved 29% (13%)
increase in estimation accuracy relative to managers. In the commercial auto liability (line no. 2)
and commercial multi-peril lines (line no. 4), the MAE and RMSE results suggest that the random
forest model achieved an increase in predictive accuracy by using managers’ estimates as inputs.
The accuracy edge of the random forest in the workers’ compensation line (line no. 3) was the
most significant: 50% (42%) on average, based on the MAE (RMSE).
In the homeowner/farmowner line (no. 5), however, managers outperformed the machine
learning models when managers’ loss estimates were not included in the models, which was the
only exception to the dominance of machine learning over managers in cross-validation analyses.
We consulted industry experts about the homeowner/farmowner exception; a possible explanation is
that the homeowner/farmowner line contains unique types of losses, such as catastrophes, property
damage, and bodily injury. These loss categories are not differentiated in firms’ financial reports.
Mixing up different loss types is problematic because different loss categories have their unique
payment patterns, which are intimately known to managers but not available in the development
of the machine learning prediction models. In addition, the homeowner/farmowner line has a
relatively short tail, implying that the majority of losses (claims) are reported and paid off during
the first year (See Table 3). In this case, the total losses for most homeowner/farmowner accidents
are already known to managers by year-end, making it challenging for machine learning models
to outperform managers. After we added managers’ loss estimates as an attribute to estimate losses
for the homeowner/farmowner line, we found the linear regression algorithm performed well. Its
MAE was lower than that of managers’ estimates in all samples, achieving an average accuracy
edge of 22%.
Collectively, our results suggest that machine learning models generate more accurate loss
predictions than managers in most circumstances. Furthermore, we found that, in general, models
incorporating ManagerEstimate have higher predictive accuracy compared to the models without
it.
To provide insight into the machine learning black box, we present in Table 5 Panel B the
top three significant variables identified by the random forest algorithm for each business line, and
the model MAE and RMSE after dropping these variables.13 For example, the premiums written
in the current accident year (LinePremiums) is the most powerful predictor for the random forest
algorithm to estimate losses in the private passenger auto liability line. After excluding
LinePremiums, the MAE (RMSE) of the random forest estimation increased from 7,526 (36,694)
to 7,824 (36,535), showing a decline in prediction accuracy. Direct premiums written during the
accident year (DPW) and total payment for the current accident year (LinePayment) are the second
and third most influential variables, respectively. The MAE (RMSE) was 7,859 (39,815) without
13 We thank an anonymous reviewer for suggesting this test. For brevity, we only report the variable importance for
models trained during the period 1996-2007 and without managers’ estimates due to the significant overlap of
important variables among these situations. The untabulated results suggest that ManagerEstimate (Managers’
estimation results) is an important predictor and including it generally increases the model accuracy. In addition, the
important variables for each line appear to be constant over the years.
LinePremiums and DPW, and 8,206 (46,073) without LinePremiums, DPW and LinePayment.
Similarly, in other lines, dropping the influential variables reduced the models’ prediction accuracy
as measured by both the MAE and RMSE. Overall, we observed that several predictors, such as
PaidLoss, LinePayment, and LinePremiums, consistently play an important role in generating
model predictions.
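The variable-importance ranking and drop-variable check can be sketched as follows. The data are synthetic, and the predictor names are borrowed from the text purely for illustration; in this toy setup only the first two predictors actually drive the target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
names = ["LinePremiums", "DPW", "LinePayment", "Other"]

def make_data(n):
    # Hypothetical data: only the first two predictors drive the target.
    X = rng.normal(size=(n, len(names)))
    y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=n)
    return X, y

X, y = make_data(400)
X_new, y_new = make_data(100)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(names, model.feature_importances_), key=lambda kv: -kv[1])
print("top predictors:", [name for name, _ in ranked[:3]])

# Drop-variable check: refit without the top predictor and compare MAE on new data.
top = names.index(ranked[0][0])
reduced = RandomForestRegressor(n_estimators=200, random_state=0).fit(np.delete(X, top, axis=1), y)
mae_full = np.mean(np.abs(y_new - model.predict(X_new)))
mae_drop = np.mean(np.abs(y_new - reduced.predict(np.delete(X_new, top, axis=1))))
print(f"MAE with all predictors: {mae_full:.3f}; without the top one: {mae_drop:.3f}")
```

Dropping the dominant predictor raises the MAE, mirroring the accuracy decline reported in Panel B.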
2. Holdout tests
Our holdout tests examined the predictive accuracy of machine learning models on holdout
samples―a more demanding prediction test. Specifically, the models developed from the cross-
validation process using the 1996-2005 sample were used to predict losses in 2006, and the models
estimated from the 1996-2006 (1996-2007) sample were employed to predict the losses in 2007
(2008).
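The holdout design amounts to an expanding-window split on accident years; a schematic sketch with synthetic data standing in for the insurer panel:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
years = rng.integers(1996, 2009, size=400)    # accident year of each synthetic observation
X = rng.normal(size=(400, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=400)

# Train on 1996-2005 to predict 2006, on 1996-2006 to predict 2007,
# and on 1996-2007 to predict 2008.
holdout_mae = {}
for holdout_year in (2006, 2007, 2008):
    train, test = years < holdout_year, years == holdout_year
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], y[train])
    holdout_mae[holdout_year] = np.mean(np.abs(y[test] - model.predict(X[test])))
    print(f"holdout {holdout_year}: MAE {holdout_mae[holdout_year]:.3f}")
```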
- Table 6 here -
Table 6 Panel A reports the holdout test results. When ManagerEstimate was not included
in the model, the random forest algorithm consistently generated more accurate estimates than
managers’ estimation in the private passenger auto liability line (no. 1) based on the MAE analyses.
The RMSEs of the random forest algorithm’s prediction were similar to managers’ for the 2006
and 2007 holdout samples, and exhibited an accuracy edge of 17% for the 2008 holdout sample.
After we added ManagerEstimate to the model, its performance measured by both MAE and
RMSE was more accurate than managers’. In the commercial auto liability line (line no. 2), the
random forest without ManagerEstimate had smaller estimation errors compared to managers.
After we added ManagerEstimate to the model, its performance was further improved. In the
workers’ compensation line (line no. 3), the random forest predictions, with and without managers’
estimates, were markedly more accurate than managers’ estimation. Its accuracy edge over
managers was more than 50% for the 2006 and 2007 samples, although the edge became smaller
in 2008. Results of the commercial multi-peril line (line no. 4) suggest that the random forest
models outperformed managers in 2006 and 2007, but were surpassed by managers in 2008. The
model performance downturn in 2008 for the workers’ compensation and commercial multi-peril
lines might be caused by the economic turbulence during the financial crisis years, 2007-2009,
which interrupted the patterns of insurance loss payments. 14 Finally, the analyses of the
homeowner/farmowner line (no. 5) indicate that managers’ predictions were more accurate than
those of the linear regression when ManagerEstimate was not incorporated in the model. After
adding ManagerEstimate as an independent variable, the linear regression predictions improved
greatly, as indicated by the smaller values of RMSE than managers’ in all three holdout samples.
We used the bootstrap technique to examine the statistical significance of the difference in
prediction performance between machine learning models and managers.15 The bootstrap method
uses the existing sample to create a large number of simulated samples that can be used to estimate
the distribution of the performance difference. Specifically, we used the bootstrap to construct
10,000 simulated samples for each of our holdout samples, and for each of these bootstrap samples,
we computed the prediction difference as the average of the differences between managers’ and
the machine learning models’ prediction errors. These differences varied across the simulated
samples and formed a distribution. Table 6, Panel B reports the bootstrap mean of the prediction
difference, its standard error, and the result of a significance test of the prediction difference based
on the standard error. Overall, the differences between managers’ estimates and the machine
14 The National Bureau of Economic Research (NBER) dates the recession from December 2007 (peak) to June
2009 (trough).
15 We thank an anonymous reviewer for this suggestion.
learning predictions in the bootstrap analyses were consistent with the analyses using the holdout
samples.
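The bootstrap procedure can be sketched as follows, with synthetic absolute errors standing in for the actual holdout prediction errors (names and data are illustrative, not the authors' code):

```python
import numpy as np

def bootstrap_error_difference(manager_abs_err, model_abs_err, n_boot=10_000, seed=0):
    """Bootstrap the mean difference between managers' and the model's absolute
    prediction errors; returns the bootstrap mean, standard error, and z-statistic."""
    manager_abs_err = np.asarray(manager_abs_err, dtype=float)
    model_abs_err = np.asarray(model_abs_err, dtype=float)
    diff = manager_abs_err - model_abs_err          # positive = model is more accurate
    rng = np.random.default_rng(seed)
    n = len(diff)
    boot_means = np.array([
        diff[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)
    ])
    mean, se = boot_means.mean(), boot_means.std(ddof=1)
    return mean, se, mean / se

# Toy example: managers' errors run systematically larger than the model's.
# (n_boot is reduced here to keep the sketch quick; the paper uses 10,000.)
rng = np.random.default_rng(1)
manager = np.abs(rng.normal(1.5, 1.0, size=200))
model = np.abs(rng.normal(1.0, 1.0, size=200))
mean, se, z = bootstrap_error_difference(manager, model, n_boot=2_000)
print(f"mean difference {mean:.3f} (SE {se:.3f}, z {z:.1f})")
```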
Taken together, our results indicate the usefulness of machine learning models in
estimating insurance losses, particularly for long-tail lines of insurance business. In addition, the
random forest algorithm consistently shows superior prediction accuracy for long-tail business
lines and the linear regression performs better when the claims tail is short. Thus, it is essential to
understand the economics of a business line before applying a model to predict its losses.
Furthermore, leveraging the information in managers’ estimates enhances the prediction
performance of machine learning models.
VII. ADDITIONAL ANALYSES: ESTIMATION ERRORS
This section provides detailed analyses of managers’ and machine learning’s estimation
errors. We started by examining managers’ estimation errors and the rationales identified by prior
literature. Then, we investigated whether the ML model estimation errors were affected by the
incentives that lead to managers’ biases. In this section, managers’ estimation error
(ManagerError) is defined as the reported loss estimate minus the actual loss, divided by total
assets. Similarly, ML model estimation error (ModelError) is the model estimate minus the actual
loss, divided by total assets. Since the random forecast model performed well in most business
lines, we used the holdout prediction results generated by the random forest models for the years
2006, 2007, and 2008, to calculate model estimation errors. Table 7 compares the errors of
managers’ estimates to those of models’ estimates. On average, the model estimates were more
accurate than manager estimates: the average absolute estimation error of machine learning models
with (without) manager estimates as an input attribute was 0.0092 (0.0106), while the average
manager estimation error was 0.0128. The difference is significant at the 1% level. In addition, the
managers’ estimation errors were more positive than model errors, suggesting that managers
tended to overstate insurance losses. The results are generally consistent with previous studies that
investigated insurance loss estimation errors (e.g., Grace and Leverty 2012).
1. Managers’ Estimation Errors
As discussed in Section II, insurance companies might manage (manipulate) loss estimates
to achieve certain objectives, like misreporting future loss estimates due to tax deferral incentives.
Because determining the taxable income involves loss estimates, insurers tend to shelter earnings
by overestimating loss reserves so that they can reduce current tax liabilities. Accordingly, we
measured insurers’ tax incentives with the variable TaxShield, which takes a larger value if the
insurer has a higher tax reduction incentive (e.g., Grace 1990):
TaxShield_t = (NetIncome_t + LossReserve_t) / TotalAssets
Second, we evaluated the impact of the income smoothing incentive on managers’
estimation errors. According to the literature, insurers prefer smoother earnings to reduce external
finance costs (Froot et al. 1993), address managers’ career concerns (Grace 1990), or alleviate
regulatory concerns (Grace 1990). We followed prior research and used an insurer’s average return
on assets (ROA) during the past three years as our proxy for the income smoothing incentive
(Smooth). The intuition is that following three previous “good” years, insurers tend to
underestimate loss reserves in the current year in order to inflate earnings and continue the positive
trend (Grace 1990). In addition to the Smooth variable, we utilized another indicator variable,
SmallProfit, to capture the incentive of income smoothing. Specifically, Beaver et al. (2003) found
that firms with low positive earnings are likely to boost reported income by understating the loss
reserves. Similar to Grace and Leverty (2012), we identified the firm-years in the bottom 5 percent
of the positive earnings distribution. We expected these firms to have understated insurance loss
reserves.
Financial distress also drives insurance firms to manage reserve estimates. Financially
weak firms tend to under-reserve losses to present firm growth (Harrington and Danzon 1994) and
avoid regulatory scrutiny (Petroni 1992; Gaver and Paterson 2004). Insurance Regulatory
Information System (IRIS) ratio is used by regulators to monitor insurance firms’ financial
condition. The NAIC provides a “usual range” for each ratio; if a ratio falls outside the range,
a ratio violation occurs. Therefore, we set the variable Violation equal to 1 if the firm has at least
one IRIS ratio violation, and 0 otherwise. We expected that firms with IRIS ratio violations are
more likely to underestimate losses. In addition, we added an indicator variable, Insolvency, which
equals one if the Risk Based Capital (RBC) ratio is smaller than two, and 0 otherwise. The RBC
ratio is defined as the total adjusted capital divided by the authorized control level RBC, a
hypothetical minimum capital level determined by the RBC formula. Insurers are required to
maintain sufficient capital, and regulators may take actions if the RBC ratio is less than two.
We included a series of control variables in our regression analyses: The insurance firms’
organizational structure—Mutual (whether it is a mutual insurer), Group (whether it is associated
with a group), or Public company (whether the firm is publicly traded). We also controlled for
other firm and business line characteristics. Detailed control variable definitions are provided in
Appendix A. Finally, we included the year and business line fixed effects. The following
regression model was used to examine the incentives related to manager estimation errors:
ManagerError = α0 + α1 TaxShield + β1 Smooth + β2 Violation + β3 SmallProfit +
β4 Insolvency + β5 Size + β6 SmallLoss + β7 Profit + β8 Loss + β9 LineSize +
β10 Reinsurance + β11 Public + β12 Mutual + β13 Group + FixedEffects + ε.    (3)
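Equation (3) is a pooled OLS regression with year and business-line fixed effects; a minimal sketch on synthetic data, using only an illustrative subset of the regressors (the variable names mirror the text, the data are ours):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "ManagerError": rng.normal(size=n),
    "TaxShield": rng.normal(size=n),
    "Smooth": rng.normal(size=n),
    "SmallProfit": rng.integers(0, 2, size=n),
    "Violation": rng.integers(0, 2, size=n),
    "Insolvency": rng.integers(0, 2, size=n),
    "year": rng.integers(2006, 2009, size=n),
    "line": rng.integers(1, 6, size=n),
})

# Design matrix: intercept, incentive variables, and year/line fixed-effect dummies.
fe = pd.get_dummies(df[["year", "line"]].astype(str), drop_first=True, dtype=float)
X = pd.concat([df[["TaxShield", "Smooth", "SmallProfit", "Violation", "Insolvency"]], fe], axis=1)
X.insert(0, "const", 1.0)

# OLS via least squares; real work would use a stats package for standard errors.
beta, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), df["ManagerError"].to_numpy(), rcond=None)
print(dict(zip(X.columns[:6], np.round(beta[:6], 3))))
```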
The regression results are reported in columns (1) and (2) of Table 8. Consistent with
prior research, we found that firms with a stronger tax reduction incentive were more likely to
over-reserve, as indicated by the significant and positive coefficient of TaxShield (Coeff. = 0.052,
p<0.01). The significantly negative values for the coefficients of Violation (Coeff. = −0.002,
p<0.01) and SmallProfit (Coeff. = −0.003, p<0.1) suggest that financially weak firms and firms
with greater incentives to smooth income tended to report underestimated insurance losses. The
findings support the literature that managerial incentives, including tax reduction, income
smoothing, and financial strength concerns, affect insurance firms’ reserve levels. We performed
this analysis here to provide a benchmark for the next, new analysis of machine learning errors.
2. Model Estimation Errors
In this study we are particularly interested in whether our machine model estimates are less
affected by managers’ incentives, as one would expect. Therefore, we re-ran equation (3), but
replaced the dependent variable with ModelError. If the model estimates were less affected by the
incentive biases than manager estimates, we would expect the coefficients of the incentive
variables in the model regressions to be insignificant. Column (3) in Table 8 reports the regression
results of the models without managers’ estimates as an input variable. The results indeed indicate
that the incentives that drive managerial estimation biases did not affect the model estimates, as
none of the incentive variables were statistically significant. However, when we used the models
including manager estimates, the coefficient of SmallProfit became significant at the 5% level,
suggesting that incorporating managers’ estimates might also bring their biases into the models.
Overall, the results indicate that the influence of managerial incentives is hardly present in the
model estimation, and explain, in part, the model’s superior performance.16 Our findings imply
that machine learning models can to some extent alleviate the concerns with managers’ intentional
biases in making accounting estimates.
VIII. CONCLUSIONS
Managerial subjective estimates are endemic to financial information, and their frequency
and impact are constantly increasing, driven mainly by the expansion of fair value accounting by standard
setters. The adverse impact of managerial estimation errors, both intentional and unintentional, on
the quality of financial information is largely unknown (and unresearched), but it is likely high.
Undoubtedly, improvement in the quality and reliability of accounting estimates will substantially
enhance the relevance and usefulness of financial information.
Accounting estimates generated by machine learning are potentially superior to managerial
estimates because machines don’t manipulate, and they may use the archival (training) data more
consistently and systematically than managers. On the other hand, managers may include in their
estimates (forecasts) forward-looking information (e.g., on expected inflation, or the state of the
economy) that machines obviously ignore. Accordingly, the superiority of machines over humans
in generating accounting estimates is an empirical question, examined in this study.
Our results, based on a large set of insurance companies’ loss (future claim payments)
estimates, revisions, and realizations, indicate that with but one exception (homeowner/farmowner
insurance), our loss estimates generated by machine learning are superior to managers’ actual
estimates underlying financial reports, and in some cases, substantially superior. This is a
surprising and very encouraging finding, given the urgency to improve accounting estimates. At
16 We have re-estimated the regressions excluding the Workers’ Compensation line/using linear regression algorithm
predictions for the homeowner/farmowner line, and the results remain unchanged.
this early stage of applying machine learning to estimate accounting numbers, we don’t know how
general our findings are. Obviously, more research is needed to establish and generalize the use of
machine learning for other types of accounting estimates, such as bad debts and warranty reserves.
Accounting estimates generated by machine learning may have multiple uses in practice.
They can be used by managers and auditors as benchmarks against which managers’ estimates will
be compared, with large deviations suggesting a reexamination of managers’ estimates.
Alternatively, machine learning could be used to generate managers’ estimates in the first place,
enhancing the reliability (no manipulation) and consistency of accounting estimates. In any case,
the potential of machine learning, whose use is fast expanding in many other fields, to improve
financial information should be further researched.
REFERENCES
A. M. Best Company. 1994. Best’s Aggregates and Averages: Property-Casualty Edition.
Oldwick, NJ: A. M. Best Company.
Alpaydin, E. 2010. Introduction to Machine Learning. MIT Press.
Anderson, D. R. 1973. Effects of loss reserve evaluation upon policyholders’ surplus. Madison,
Wisconsin: Bureau of Business Research and Service, University of Wisconsin, Monograph 6.
Anderson, D. R. 1971. Effects of under and overevaluations in loss reserves. Journal of Risk and
Insurance: 585-600.
Beaver, W. H., and M. F. McNichols. 1998. The characteristics and valuation of loss reserves of
property casualty insurers. Review of Accounting Studies 3 (1-2): 73-95.
Beaver, W. H., M. F. McNichols, and K. K. Nelson. 2003. Management of the loss reserve
accrual and the distribution of earnings in the property-casualty insurance industry. Journal of
Accounting and Economics 35 (3): 347-376.
Bell, T. B. 1997. Neural nets or the logit model? A comparison of each model’s ability to predict
commercial bank failures. Intelligent Systems in Accounting, Finance & Management 6 (3): 249-
264.
Bierens, H. J., and D. F. Bradford. 2005. Are Property-Casualty Insurance Reserves Biased? A
Non-Standard Random Effects Panel Data Analysis.
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer.
Bowman, E. H. 1963. Consistency and optimality in managerial decision making. Management
Science 9 (2): 310-321.
Breiman, L. 2001. Random forests. Machine Learning 45 (1): 5-32.
Breiman, L. 2002. Using models to infer mechanisms. IMS Wald Lecture 2.
Browne, M. J., Yu-Luen Ma, and P. Wang. 2009. Stock-Based Executive Compensation and
Reserve Errors in the Property and Casualty Insurance Industry. Journal of Insurance Regulation
27 (4): 35-54.
Brownlee, J. 2016a. How to Tune the Number and Size of Decision Trees with XGBoost in
Python. Machine learning mastery. https://machinelearningmastery.com/tune-number-size-
decision-trees-xgboost-python/.
Brownlee, J. 2016b. Linear Regression for Machine Learning. Machine learning
mastery. https://machinelearningmastery.com/linear-regression-for-machine-learning/.
Brownlee, J. 2018. A gentle introduction to k-fold cross-validation. Machine learning
mastery. https://machinelearningmastery.com/k-fold-cross-validation/.
Cao, L., and F. E. H. Tay. 2003. Support vector machine with adaptive parameters in financial
time series forecasting. IEEE Transactions on Neural Networks 14 (6): 1506-1518.
Chen, J. V., and F. Li. 2017. Estimating the amount of estimation in accruals. Available at SSRN
2738842.
Davis, J. T., A. P. Massey, and R. E. Lovell II. 1997. Supporting a complex audit judgment task:
An expert network approach. European Journal of Operational Research 103 (2): 350-372.
Dawes, R. M. 1971. A case study of graduate admissions: Application of three principles of
human decision making. American psychologist 26 (2): 180.
Dawes, R. M., and B. Corrigan. 1974. Linear models in decision making. Psychological bulletin
81 (2): 95.
Dechow, P. M., W. Ge, C. R. Larson, and R. G. Sloan. 2011. Predicting material accounting
misstatements. Contemporary accounting research 28 (1): 17-82.
Dhamija, A., and V. Bhalla. 2011. Exchange rate forecasting: comparison of various
architectures of neural networks. Neural Computing and Applications 20 (3): 355-363.
Ding, K., X. Peng, and Y. Wang. 2019. A Machine Learning-Based Peer Selection Method with
Financial Ratios. Accounting Horizons.
Dunis, C. L., and X. Huang. 2002. Forecasting and trading currency volatility: An application of
recurrent neural regression and model combination. Journal of Forecasting 21 (5): 317-354.
Eckles, D. L., and M. Halek. 2010. Insurer reserve error and executive compensation. Journal of
Risk and Insurance 77 (2): 329-346.
Elliott, R. K. 1992. The third wave breaks on the shores of accounting. Accounting Horizons 6
(2): 61.
Etheridge, H. L., R. S. Sriram, and H. K. Hsu. 2000. A comparison of selected artificial neural
networks that help auditors evaluate client financial viability. Decision Sciences 31 (2): 531-550.
Fanning, K. M., and K. O. Cogger. 1998. Neural network detection of management fraud using
published financial data. Intelligent Systems in Accounting, Finance & Management 7 (1): 21-
41.
Financial Accounting Standards Board (FASB). 1998. The Framework of Financial Accounting
Concepts and Standards.
Forbes, S. W. 1970. Loss reserving performance within the regulatory framework. Journal of
Risk and Insurance: 527-538.
Friedman, J. H. 2001. Greedy function approximation: a gradient boosting machine. Annals of
statistics: 1189-1232.
Froot, K. A., D. S. Scharfstein, and J. C. Stein. 1993. Risk management: Coordinating corporate
investment and financing policies. the Journal of Finance 48 (5): 1629-1658.
Galeshchuk, S., and S. Mukherjee. 2017. Deep networks for predicting direction of change in
foreign exchange rates. Intelligent Systems in Accounting, Finance and Management 24 (4):
100-110.
Gaver, J. J., and J. S. Paterson. 2004. Do insurers manipulate loss reserves to mask solvency
problems?. Journal of Accounting and Economics 37 (3): 393-416.
Goldberg, L. R. 1970. Man versus model of man: A rationale, plus some evidence, for a method
of improving on clinical inferences. Psychological bulletin 73 (6): 422.
Grace, E. V. 1990. Property-liability insurer reserve errors: A theoretical and empirical analysis.
Journal of Risk and Insurance (1986-1998) 57 (1): 28.
Grace, M. F., and J. T. Leverty. 2012. Property–liability insurer reserve error: Motive,
manipulation, or mistake. Journal of Risk and Insurance 79 (2): 351-380.
Hansen, J. V., and W. F. Messier. 1982. Expert systems for decision support in EDP auditing.
International journal of computer & information sciences 11 (5): 357-379.
Harrington, S. E., and P. M. Danzon. 1994. Price cutting in liability insurance markets. Journal
of Business: 511-538.
Hoffman, P. J. 1960. The paramorphic representation of clinical judgment. Psychological
bulletin 57 (2): 116.
Hoyt, R. E., and K. A. McCullough. 2010. Managerial Discretion and the Impact of Risk-Based
Capital Requirements on Property-Liability Insurer Reserving Practices. Journal of Insurance
Regulation 29 (2).
Huang, C., R. E. Dorsey, and M. A. Boose. 1994. Life Insurer Financial Distress Prediction: A
Neural Network Model. Journal of Insurance Regulation 13 (2).
Khan, U., B. Li, S. Rajgopal, and M. Venkatachalam. 2017. Do the FASB's Standards Add
Shareholder Value?. The Accounting Review 93 (2): 209-247.
Koskivaara E. 2004. Artificial neural networks in analytical review procedures. Managerial
Auditing Journal 19(2): 191–223.
Lam, M. 2004. Neural network techniques for financial performance prediction: integrating
fundamental and technical analysis. Decision Support Systems 37 (4): 567-581.
Leigh, W., R. Purvis, and J. M. Ragusa. 2002. Forecasting the NYSE composite index with
technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in
romantic decision support. Decision Support Systems 32 (4): 361-377.
Lenard, M. J., P. Alam, and G. R. Madey. 1995. The application of neural networks and a
qualitative response model to the auditor's going concern uncertainty decision. Decision Sciences
26 (2): 209-227.
Li, K. K., and R. G. Sloan. 2017. Has goodwill accounting gone bad?. Review of Accounting
Studies 22 (2): 964-1003.
Lin, J. W., M. I. Hwang, and J. D. Becker. 2003. A fuzzy neural network for assessing the risk of
fraudulent financial reporting. Managerial Auditing Journal 18 (8): 657-665.
Maher, J. J., and T. K. Sen. 1997. Predicting bond ratings using neural networks: a comparison
with logistic regression. Intelligent Systems in Accounting, Finance & Management 6 (1): 59-72.
Mason, L., J. Baxter, P. L. Bartlett, and M. R. Frean. 2000. Boosting algorithms as gradient
descent. Paper presented at Advances in neural information processing systems.
McKee, T. E., and T. Lensberg. 2002. Genetic programming and rough sets: A hybrid approach
to bankruptcy classification. European Journal of Operational Research 138 (2): 436-451.
McNichols, M. F. 2000. Research design issues in earnings management studies. Journal of
Accounting and Public Policy 19 (4-5): 313-345.
Murphy, D., and C. E. Brown. 1992. The uses of advanced information technology in audit
planning. Intelligent Systems in Accounting, Finance and Management 1 (3): 187-193.
Nelson, K. K. 2000. Rate regulation, competition, and loss reserve discounting by property-
casualty insurers. The Accounting Review 75 (1): 115-138.
O’Leary, D. E. 1998. Using neural networks to predict corporate failure. Intelligent Systems in
Accounting, Finance & Management 7 (3): 187-197.
Parot, A., K. Michell, and W. D. Kristjanpoller. 2019. Using Artificial Neural Networks to
forecast Exchange Rate, including VAR‐VECM residual analysis and prediction linear
combination. Intelligent Systems in Accounting, Finance and Management 26 (1): 3-15.
Paton, W. A., and A. C. Littleton. 1970. An introduction to corporate accounting standards:
American Accounting Association.
Public Company Accounting Oversight Board (PCAOB), 2018, Auditing Accounting Estimates,
Including Fair Value Measurements, PCAOB Release No. 2018-005.
Pendharkar, P. C. 2005. A threshold-varying artificial neural network approach for classification
and its application to bankruptcy prediction problem. Computers & Operations Research 32 (10):
2561-2582.
Petroni, K. R. 1992. Optimistic reporting in the property-casualty insurance industry. Journal of
Accounting and Economics 15 (4): 485-508.
Petroni, K., and M. Beasley. 1996. Errors in Accounting Estimates and Their Relation to Audit
Firm Type. Journal of Accounting Research 34 (1): 151-171.
Ramon, J. 2013. Responses to question “How to determine the number of trees to be generated in
Random Forest algorithm?”. ResearchGate.
https://www.researchgate.net/post/How_to_determine_the_number_of_trees_to_be_generated_in
_Random_Forest_algorithm
Rivas, V., E. Parras‐Gutiérrez, J. Merelo, M. G. Arenas, and P. García‐Fernández. 2017. Time
series forecasting using evolutionary neural nets implemented in a volunteer computing system.
Intelligent Systems in Accounting, Finance and Management 24 (2-3): 87-95.
Schapire, R. E. 1990. The strength of weak learnability. Machine Learning 5 (2): 197-227.
Smith, B. D. 1980. An analysis of auto liability loss reserves and underwriting results. Journal of
Risk and Insurance: 305-320.
Sun, T., and M. A. Vasarhelyi. 2017. Deep Learning and the Future of Auditing: How an
Evolving Technology Could Transform Analysis and Improve Judgment. CPA Journal 87 (6).
Weiss, M. 1985. A Multivariate Analysis of Loss Reserving Estimates in Property-Liability
Insurers. The Journal of risk and insurance 52 (2): 199-221.
Witten, I. H., E. Frank, and M. A. Hall. 2011. Data Mining: Practical machine learning tools and
techniques.
Zhang, C., and M. J. Browne. 2013. Loss Reserve Errors, Income Smoothing and Firm Risk of
Property and Casualty Insurance Companies. Paper presented at the Annual Meeting of the
American Risk and Insurance Association. Working Paper.
Zhang, G., M. Y. Hu, B. E. Patuwo, and D. C. Indro. 1999. Artificial neural networks in
bankruptcy prediction: General framework and cross-validation analysis. European Journal of
Operational Research 116 (1): 16-32.
Zhang, Y., and L. Wu. 2009. Stock market prediction of S&P 500 via combination of improved
BCO approach and BP neural network. Expert Systems with Applications 36 (5): 8849-885
Figure 1: Illustration of the training/validation/testing approach
[Figure: schematic of the training/validation/testing design. The training/validation set is split into folds, each fold serving once as the validation set while the remaining folds train the model; a separate holdout sample is reserved for testing.]
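The scheme in Figure 1 can be sketched as follows. This is an illustration of the general k-fold-plus-holdout design, not the authors' code; ordinary least squares and synthetic data stand in for the paper's learners and insurer data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# First 150 observations form the training/validation set; the remaining
# 50 are set aside as the holdout sample, touched only once at the end.
X_dev, y_dev, X_hold, y_hold = X[:150], y[:150], X[150:], y[150:]

def fit_ols(X, y):
    # Ordinary least squares stands in for the paper's learners here.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

k = 5
folds = np.array_split(np.arange(len(X_dev)), k)
fold_mae = []
for i in range(k):
    val = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    beta = fit_ols(X_dev[train], y_dev[train])
    fold_mae.append(np.mean(np.abs(X_dev[val] @ beta - y_dev[val])))

# Refit on the full training/validation set, then score the holdout once.
beta = fit_ols(X_dev, y_dev)
holdout_mae = np.mean(np.abs(X_hold @ beta - y_hold))
print(round(float(np.mean(fold_mae)), 3), round(float(holdout_mae), 3))
```

The cross-validated error guides model selection; the holdout error, computed only once, estimates out-of-sample performance.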
Figure 2: Three sets of test sample
[Figure: the three test-sample configurations, each panel showing the alternating train/validate folds used for the corresponding training/validation window.]
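The rolling design suggested by Figure 2 and Tables 5-6 can be sketched as below. This is a hedged illustration, not the authors' code: three expanding training/validation windows, each assumed to be paired with the following year as its holdout sample.

```python
# Build three expanding windows ending in 2005, 2006, and 2007; the year
# after each window serves as that window's holdout sample.
windows = []
for last_train_year in (2005, 2006, 2007):
    dev_years = list(range(1996, last_train_year + 1))
    holdout_year = last_train_year + 1
    windows.append((dev_years, holdout_year))

for dev_years, holdout_year in windows:
    print(dev_years[0], dev_years[-1], "->", holdout_year)
```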
Table 1: Tuning details for machine learning algorithms

Linear regression
  Learning rate: 0.1, 0.01, 0.001, 0.0001
  Number of iterations: 10, 50, 100, 500, 1000
Random forest
  Number of trees: 50, 100
  Maximum depth of the tree: 20, 30, 50
  Minimum number of rows: 1, 5, 10, 50
Gradient boosting machine
  Number of trees: 50, 100
  Maximum depth of the tree: 5, 10, 20, 30, 50
  Minimum number of rows: 1, 5, 10, 50
Artificial neural networks
  Activation function: Rectifier, Tanh, Maxout, Rectifier with dropout, Tanh with dropout, Maxout with dropout
  Number of hidden layers: 2, 3, 4
  Number of nodes: the first layer had a number of nodes equal to the number of independent variables; in each additional layer the number of nodes decreased by approximately 50%
  Number of epochs: 10, 50, 100
  Learning rate: examined at 100 possible learning rates, scaled from 0.0001 to 0.000001
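Tuning over grids like those in Table 1 amounts to scoring every combination of hyperparameter values by cross-validated error and keeping the best. The sketch below is illustrative only (the `cv_error` scoring function is a placeholder, not the paper's procedure):

```python
from itertools import product

# Random-forest grid from Table 1.
random_forest_grid = {
    "n_trees": [50, 100],
    "max_depth": [20, 30, 50],
    "min_rows": [1, 5, 10, 50],
}

def cv_error(params):
    # Placeholder for the cross-validated MAE of a model trained with
    # `params`; a real run would fit and validate on the insurer data.
    return abs(params["max_depth"] - 30) + params["min_rows"] / 10

# Enumerate every combination and keep the one with the lowest error.
combos = [dict(zip(random_forest_grid, values))
          for values in product(*random_forest_grid.values())]
best = min(combos, key=cv_error)
print(len(combos), best)
```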
Table 2: Illustration of incurred losses and cumulative paid net losses reported in 2008 for National Lloyds Insurance Company
[Table body not recoverable from the extracted text.]
Table 3: Cumulative payment percentage in the first five years for each business line

Business Line                      Year 0   Year 1   Year 2   Year 3   Year 4   Year 5
Private Passenger Auto Liability   40.64%   72.44%   86.76%   94.20%   97.61%   99.06%
Commercial Auto Liability          25.03%   50.74%   70.90%   85.57%   93.88%   97.70%
Workers' Compensation              24.99%   56.11%   72.90%   83.20%   89.09%   93.03%
Commercial Multi-Peril             44.52%   69.22%   80.03%   88.58%   93.85%   97.11%
Homeowner/Farmowner                72.62%   93.50%   96.83%   98.58%   99.48%   99.82%
Table 4: Summary Statistics for the five insurance business lines

Each cell reports Mean / Std. Dev. Observations per line: Private Passenger Auto Liability (PPAL) 7,239; Commercial Auto Liability (CAL) 6,549; Workers' Compensation (WC) 5,118; Commercial Multi-Peril (CMP) 6,409; Homeowner/Farmowner (HF) 7,624.

Variable         PPAL                    CAL                     WC                      CMP                     HF
ActualLosses     90,246 / 562,776        17,764 / 52,139         38,836 / 119,348        24,430 / 81,218         41,256 / 256,340
ManagerEstimate  93,183 / 580,029        16,864 / 45,651         44,987 / 136,526        24,818 / 79,099         41,630 / 257,594
Assets           1,255,516 / 5,130,236   1,466,545 / 5,597,631   1,436,801 / 4,278,044   1,275,870 / 4,302,558   495,159 / 3,776,991
Liabilities      772,594 / 2,891,796     903,094 / 3,178,499     953,594 / 2,854,313     834,212 / 2,842,832     356,919 / 1,289,327
NPW              429,264 / 1,775,834     462,630 / 1,864,671     405,332 / 1,124,403     393,667 / 1,414,870     365,591 / 1,311,209
DPW              373,990 / 1,411,322     404,979 / 1,480,464     370,433 / 952,908       349,002 / 1,042,452     328,399 / 977,791
NPE              420,376 / 1,755,638     451,707 / 1,840,323     394,541 / 1,091,865     383,353 / 1,387,281     356,919 / 1,289,327
NI               40,806 / 234,719        45,725 / 252,965        41,200 / 220,877        39,710 / 237,906        35,589 / 215,027
PaidClaim        14,601 / 86,669         1,472 / 4,354           3,861 / 13,142          1,385 / 6,097           8,823 / 55,271
UnpaidClaim      7,360 / 49,956          773 / 2,764             1,792 / 8,858           744 / 2,378             2,831 / 18,026
OutstClaim       5,977 / 32,178          686 / 1,897             1,976 / 5,647           546 / 1,547             1,103 / 5,053
ReportedClaim    27,935 / 162,349        2,929 / 8,508           7,542 / 25,867          2,670 / 9,384           12,668 / 76,276
LinePremiums     142,228 / 812,935       31,912 / 87,021         77,485 / 223,037        48,453 / 143,724        73,833 / 397,783
PremiumCeded     12,980 / 45,691         6,891 / 29,892          16,297 / 64,708         10,757 / 41,518         11,700 / 47,174
LinePayment      39,751 / 245,636        4,547 / 12,484          9,889 / 29,312          10,827 / 33,913         32,754 / 191,933
PaymentCeded     3,451 / 13,043          851 / 4,254             1,625 / 7,218           2,035 / 9,569           4,652 / 34,432
LineDCC          619 / 2,963             203 / 889               740 / 3,350             269 / 885               462 / 2,356
DCCCeded         89 / 580                52 / 430                116 / 961               56 / 427                64 / 388
SSR              581 / 4,158             81 / 2,058              42 / 1,357              155 / 4,829             108 / 802
PaidLoss         37,182 / 245,663        3,964 / 10,980          9,220 / 28,873          9,293 / 30,133          28,883 / 182,834
GdpChg           2.55 / 2.36             2.43 / 2.29             2.45 / 2.33             2.50 / 2.30             2.55 / 2.38
Inflation        5.25 / 1.46             5.23 / 1.43             5.24 / 1.43             5.26 / 1.41             5.28 / 1.47

All variables are defined in Appendix A and all variables are in thousands of dollars except GdpChg and Inflation, which are in percentages.
Table 5: Cross-validation results

Panel A: Cross-validation results

The parenthetical after each business line names the algorithm selected for the machine-learning model (without manager estimates; with manager estimates). Each row reports the training/validation sample, the number of observations, the MAE and RMSE of managers' estimates, and the MAE, RMSE, and accuracy edge (Δ(MAE) / Δ(RMSE)) of the machine-learning models without and with manager estimates as an input.

Private Passenger Auto Liability (random forest; random forest)
  1996-2005: Obs 5,949; managers 9,461 / 37,494; ML without 7,526 / 36,694, edge 20% / 2%; ML with 6,742 / 31,352, edge 29% / 16%
  1996-2006: Obs 6,298; managers 9,793 / 38,266; ML without 7,367 / 34,830, edge 25% / 9%; ML with 6,945 / 34,366, edge 29% / 10%
  1996-2007: Obs 6,602; managers 9,575 / 37,940; ML without 7,363 / 35,127, edge 23% / 7%; ML with 6,695 / 32,724, edge 30% / 14%

Commercial Auto Liability (random forest; random forest)
  1996-2005: Obs 5,383; managers 4,209 / 18,562; ML without 3,135 / 12,230, edge 26% / 34%; ML with 2,773 / 10,802, edge 34% / 42%
  1996-2006: Obs 5,661; managers 4,155 / 18,375; ML without 3,037 / 11,771, edge 27% / 36%; ML with 2,887 / 11,767, edge 31% / 36%
  1996-2007: Obs 5,957; managers 4,338 / 19,175; ML without 3,242 / 12,251, edge 25% / 36%; ML with 2,936 / 12,047, edge 32% / 37%

Workers' Compensation (random forest; random forest)
  1996-2005: Obs 4,183; managers 11,547 / 43,652; ML without 6,462 / 28,046, edge 44% / 36%; ML with 6,293 / 28,252, edge 46% / 35%
  1996-2006: Obs 4,398; managers 12,360 / 44,187; ML without 6,609 / 27,155, edge 47% / 39%; ML with 6,112 / 25,073, edge 51% / 43%
  1996-2007: Obs 4,645; managers 13,214 / 47,541; ML without 6,840 / 27,769, edge 48% / 42%; ML with 6,300 / 24,812, edge 52% / 48%

Commercial Multi-Peril (random forest; random forest)
  1996-2005: Obs 5,235; managers 5,737 / 27,615; ML without 4,382 / 19,889, edge 24% / 28%; ML with 4,087 / 18,569, edge 29% / 33%
  1996-2006: Obs 5,457; managers 5,871 / 27,931; ML without 4,276 / 18,784, edge 27% / 33%; ML with 4,036 / 18,391, edge 31% / 34%
  1996-2007: Obs 5,846; managers 6,017 / 28,349; ML without 4,500 / 20,224, edge 25% / 29%; ML with 4,135 / 19,253, edge 31% / 32%

Homeowner/Farmowner (linear regression; linear regression)
  1996-2005: Obs 6,121; managers 3,905 / 16,789; ML without 5,764 / 24,242, edge -48% / -44%; ML with 3,303 / 17,154, edge 15% / -2%
  1996-2006: Obs 6,544; managers 3,878 / 16,611; ML without 5,232 / 20,223, edge -35% / -22%; ML with 2,916 / 12,703, edge 25% / 24%
  1996-2007: Obs 6,946; managers 3,962 / 16,826; ML without 5,274 / 20,438, edge -33% / -21%; ML with 2,950 / 12,786, edge 26% / 24%

Panel B: Influential variables

Each row lists the top three important variables, followed by the MAE / RMSE without the first variable, without the first two variables, and without the first three variables.

Private Passenger Auto Liability: LinePremiums, DPW, LinePayment; 7,824 / 36,535; 7,859 / 39,815; 8,206 / 46,073
Commercial Auto Liability: PaidLoss, LinePremiums, LinePayment; 3,473 / 12,915; 4,189 / 16,047; 4,493 / 16,723
Workers' Compensation: PaidLoss, LinePayment, LinePremiums; 7,556 / 32,063; 8,887 / 35,384; 9,554 / 35,974
Commercial Multi-Peril: PaidLoss, LinePremiums, LinePayment; 5,350 / 23,229; 5,888 / 25,346; 6,318 / 27,192
Homeowner/Farmowner: LinePremiums, PaidLoss, PaidClaim; 7,354 / 20,188; 5,343 / 20,663; 5,290 / 20,402
Table 6: Holdout test results

Panel A: Holdout test results

The parenthetical after each business line names the algorithm selected for the machine-learning model (without manager estimates; with manager estimates). Each row reports the holdout sample year, the number of observations, the MAE and RMSE of managers' estimates, and the MAE, RMSE, and accuracy edge (Δ(MAE) / Δ(RMSE)) of the machine-learning models without and with manager estimates as an input.

Private Passenger Auto Liability (random forest; random forest)
  2006: Obs 670; managers 9,337 / 36,120; ML without 7,861 / 36,590, edge 16% / -1%; ML with 7,359 / 34,918, edge 21% / 3%
  2007: Obs 659; managers 9,435 / 36,221; ML without 8,193 / 38,704, edge 13% / -7%; ML with 8,057 / 35,533, edge 15% / 2%
  2008: Obs 637; managers 10,616 / 50,851; ML without 8,594 / 42,150, edge 19% / 17%; ML with 8,773 / 43,037, edge 17% / 15%

Commercial Auto Liability (random forest; random forest)
  2006: Obs 620; managers 3,852 / 17,287; ML without 3,659 / 13,800, edge 5% / 20%; ML with 3,062 / 13,883, edge 21% / 20%
  2007: Obs 609; managers 3,288 / 13,481; ML without 3,198 / 10,848, edge 3% / 20%; ML with 2,842 / 10,334, edge 14% / 23%
  2008: Obs 592; managers 4,219 / 19,361; ML without 3,099 / 9,857, edge 27% / 49%; ML with 2,974 / 10,274, edge 30% / 47%

Workers' Compensation (random forest; random forest)
  2006: Obs 499; managers 18,871 / 67,596; ML without 8,077 / 31,074, edge 57% / 54%; ML with 7,358 / 27,639, edge 61% / 59%
  2007: Obs 498; managers 15,469 / 62,124; ML without 6,955 / 28,186, edge 55% / 55%; ML with 6,464 / 27,367, edge 58% / 56%
  2008: Obs 473; managers 12,507 / 44,755; ML without 8,669 / 36,076, edge 31% / 19%; ML with 8,699 / 36,754, edge 30% / 18%

Commercial Multi-Peril (random forest; random forest)
  2006: Obs 582; managers 6,122 / 26,034; ML without 3,835 / 13,134, edge 37% / 50%; ML with 3,719 / 13,435, edge 39% / 48%
  2007: Obs 570; managers 5,725 / 27,361; ML without 4,472 / 21,451, edge 22% / 22%; ML with 3,974 / 18,548, edge 31% / 32%
  2008: Obs 563; managers 6,685 / 32,112; ML without 7,831 / 37,900, edge -17% / -18%; ML with 6,817 / 35,203, edge -2% / -10%

Homeowner/Farmowner (linear regression; linear regression)
  2006: Obs 697; managers 2,964 / 12,231; ML without 5,452 / 15,274, edge -84% / -25%; ML with 2,099 / 6,068, edge 29% / 50%
  2007: Obs 692; managers 3,525 / 14,042; ML without 4,912 / 13,689, edge -39% / 3%; ML with 2,362 / 8,606, edge 33% / 39%
  2008: Obs 678; managers 5,565 / 24,434; ML without 7,354 / 24,657, edge -32% / -1%; ML with 2,588 / 8,606, edge 54% / 65%

Panel B: Bootstrap analyses

Each cell reports the mean and standard error of the prediction error difference between the machine learning model (without or with manager estimates) and managers; *** marks a significant difference.

Private Passenger Auto Liability (random forest; random forest)
  2006: without 1,473 (12.08)***; with 1,980 (11.67)***
  2007: without 1,250 (11.97)***; with 1,381 (11.63)***
  2008: without 2,026 (12.52)***; with 1,840 (11.65)***
Commercial Auto Liability (random forest; random forest)
  2006: without 190 (4.96)***; with 792 (3.42)***
  2007: without 87 (3.41)***; with 445 (3.07)***
  2008: without 1,120 (5.63)***; with 1,238 (5.08)***
Workers' Compensation (random forest; random forest)
  2006: without 10,821 (22.14)***; with 11,502 (23.05)***
  2007: without 8,517 (22.14)***; with 8,993 (21.79)***
  2008: without 3,844 (15.03)***; with 3,803 (15.63)***
Commercial Multi-Peril (random forest; random forest)
  2006: without 2,285 (7.99)***; with 2,412 (6.37)***
  2007: without 1,253 (10.63)***; with 1,757 (8.27)***
  2008: without -1,141 (10.17)***; with -123 (8.18)***
Homeowner/Farmowner (linear regression; linear regression)
  2006: without -2,485 (4.03)***; with 864 (4.53)***
  2007: without -1,383 (3.28)***; with 1,163 (3.83)***
  2008: without -1,786 (3.86)***; with 2,980 (7.86)***
Table 7: Comparing Manager Estimation Errors with Model Estimation Errors

                  ManagerError       ModelError (without manager          ModelError (with manager
                                     estimates as input variable)         estimates as input variable)
                  Mean     Median    Mean     Median   Diff.    p-value   Mean     Median   Diff.    p-value   Obs
Unsigned Error    0.0128   0.0036    0.0106   0.0029   0.0022   0.00***   0.0092   0.0026   0.0036   0.00***   8,603
Signed Error      0.0042   0.0015   -0.0023  -0.0002   0.0065   0.00***  -0.0020  -0.0001   0.0059   0.00***   8,603
This table presents differences in manager estimation errors and model estimation errors. The model estimates were obtained from the
holdout sample predictions for the period 2006-2008. ManagerError is defined as the reported loss estimates minus true loss, divided by total
assets. ModelError is the model estimate minus true loss, divided by total assets. We compare both signed estimation errors and unsigned
estimation errors. P-values reflect the differences in mean estimation errors. *, **, *** Indicate significant difference in means at the 0.10,
0.05, and 0.01 levels, respectively, using a two-tailed t-test.
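The two error definitions used throughout Table 7 can be made concrete with a toy computation (illustrative numbers, not the paper's data): the signed error scales the over- or under-estimate by total assets, and the unsigned error takes its absolute value.

```python
# Hypothetical single observation: a reported estimate, a model estimate,
# the eventually realized true loss, and the insurer's total assets.
reported_estimate, model_estimate = 105.0, 98.0
true_loss, total_assets = 100.0, 1000.0

# ManagerError = (reported loss estimate - true loss) / total assets
manager_signed = (reported_estimate - true_loss) / total_assets
# ModelError = (model estimate - true loss) / total assets
model_signed = (model_estimate - true_loss) / total_assets

# Unsigned errors drop the direction of the miss before averaging.
manager_unsigned = abs(manager_signed)
model_unsigned = abs(model_signed)
print(manager_signed, model_signed)
```

A positive signed error indicates over-reserving; comparing unsigned errors compares accuracy regardless of direction.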
Table 8: The association between estimation errors and managerial incentives

Columns: (1) ManagerError (1999-2008); (2) ManagerError (2006-2008); (3) ModelError without managers' estimates; (4) ModelError with managers' estimates. Each column reports Coeff. (Std. Err.); columns (3) and (4) also report Diff., the coefficient difference from the manager-error regression.

Variable      Pred.   (1) Coeff. (SE)      (2) Coeff. (SE)      (3) Coeff. (SE)     Diff.      (4) Coeff. (SE)     Diff.
TaxShield     +        0.052 (0.0120)***    0.053 (0.0208)***   -0.068 (0.0158)      0.121***  -0.038 (0.0170)      0.091***
Smooth        -       -0.005 (0.0129)       0.016 (0.0207)       0.069 (0.0580)     -0.053***   0.049 (0.0421)     -0.033
SmallProfit   -       -0.003 (0.0019)*     -0.005 (0.0028)**    -0.002 (0.0025)     -0.003     -0.004 (0.0020)**   -0.001
Insolvency    -       -0.001 (0.0013)       0.003 (0.0029)       0.002 (0.0026)      0.001      0.000 (0.0022)      0.003*
Violation     -       -0.002 (0.0007)***   -0.003 (0.0011)***   -0.001 (0.0011)     -0.002     -0.001 (0.0012)     -0.002
Size          ?       -0.001 (0.0003)***   -0.001 (0.0003)***   -0.001 (0.0005)*     0.000     -0.001 (0.0005)     -0.001
SmallLoss     ?       -0.003 (0.0017)**    -0.001 (0.0046)      -0.001 (0.0050)      0.001     -0.001 (0.0047)      0.000
Profit        +       -0.002 (0.0009)      -0.004 (0.0017)       0.002 (0.0015)     -0.005*     0.000 (0.0013)     -0.004***
Loss          ?        0.001 (0.0014)      -0.002 (0.0037)      -0.006 (0.0042)      0.004     -0.005 (0.0036)      0.004*
LineSize      ?        0.000 (0.0001)**     0.000 (0.0001)       0.000 (0.0002)      0.000      0.000 (0.0001)      0.000
Reinsurance   ?        0.003 (0.0016)*      0.003 (0.0025)      -0.007 (0.0080)      0.010     -0.008 (0.0074)      0.010
Public        ?        0.002 (0.0012)*      0.001 (0.0014)       0.003 (0.0017)*    -0.003**    0.002 (0.0015)     -0.002
Mutual        ?        0.002 (0.0010)       0.002 (0.0015)       0.002 (0.0011)*     0.000      0.002 (0.0010)      0.000
Group         ?       -0.006 (0.0121)      -0.006 (0.0109)      -0.004 (0.0018)**   -0.002     -0.003 (0.0016)**   -0.002
Line Fixed Effects     Yes                  Yes                  Yes                            Yes
Year Fixed Effects     Yes                  Yes                  Yes                            Yes
Obs                    23,196               8,247                8,247                          8,247
R-squared              0.0226               0.0206               0.0182                         0.0119

This table reports regression analysis results of the estimation errors by managers and machine learning models. The regressions are estimated based on 8,247 line-year observations. The model estimates are the holdout sample prediction results based on Random Forest models for the years 2006 to 2008. P-values are based on one-sided tests for coefficients with predicted signs, and two-sided otherwise. Firm-clustering adjusted standard errors are used to calculate p-values. Column (1) presents the regression results for the full sample of manager estimates during the period 1996-2008. Column (2) reports the results for the period 2006-2008 so that they can be directly compared to model estimates. Columns (3) and (4) show the regression results when the dependent variable is ModelError, without and with managers' estimates as model input attribute. The table also presents the coefficient differences between manager estimation errors and model estimation errors, using two-sided t-tests. TaxShield is calculated as the sum of net income and total reserves, divided by total assets. Smooth is defined as the average return on assets over the previous three years. SmallProfit is an indicator variable which equals 1 if the earnings fall in the bottom 5 percent of the positive earnings distribution. Insolvency is set to 1 if the risk capital ratio (RBC ratio) is smaller than 2, and 0 otherwise. Violation is an indicator variable that is equal to 1 if the insurer has more than 2 IRIS ratio violations, and 0 otherwise. Other variable definitions are provided in Appendix A.
Appendix A: Variable definitions

Insurance Claims Loss Prediction
ActualLosses: The cumulative payments on all events for a given accident year over the next ten years (including the accident year).
ManagerEstimate: Managers' initial estimates of the losses incurred during the accident year.

Business Line Operational Variables
OutstClaim: Cumulative claims outstanding for the current accident year.
ReportedClaim: Cumulative reported claims for the current accident year.
PaidClaim: Number of loss claims closed with payment.
UnpaidClaim: Number of loss claims closed without payment.
LinePremiums: Premiums written in the current accident year on the business line.
PremiumsCeded: Premiums ceded to reinsurers.
LinePayment: Total payments for the current accident year.
PaymentCeded: Payments ceded to reinsurers.
LineDCC: Defense & cost containment payments, direct & assumed.
DCCCeded: Defense & cost containment payments ceded.
SSR: Salvage & subrogation received.
PaidLoss: Cumulative paid losses for the current accident year.

Company Characteristics
State: Categorical variable that represents the state to which the insurer files the statutory filing.
Assets: Total assets at the end of the accident year.
Liabilities: Total liabilities at the end of the accident year.
DPW: Direct premiums written during the accident year.
NPW: Total net premiums written during the accident year.
NPE: Total net premiums earned during the accident year.
NI: Net income.
Accident year dummies: Dummy variables for the accident year.
Company identifier: Identifier variables for insurers.

External Environment Variables
Pricegrowth: Personal consumption expenditure growth at the state level.
Gdpgrowth: GDP growth at the state level.

Estimation Error Analyses
ManagerError: Calculated as the managers' initial estimate of the losses in year t minus the true losses, divided by total assets.
ModelError: Calculated as the model estimate (model without managers' estimates) of the losses in year t minus the true losses, divided by total assets.
TaxShield: Calculated as the sum of net income and true reserves, divided by total assets.
Smooth: Calculated as the average return on assets over the three years preceding year t.
SmallProfit: Set to 1 if the insurer's earnings fall into the lowest 5 percent of the positive earnings distribution, and zero otherwise.
Insolvency: Set to 1 if the risk capital ratio is smaller than two, and zero otherwise.
Violation: Set to 1 if the insurer has at least one ratio violation, and zero otherwise.
Size: The logarithm of the firm's total assets (in thousands).
SmallLoss: Set to 1 if the insurer's earnings fall into the highest 5 percent of the negative earnings distribution, and zero otherwise.
Profit: Set to 1 if the insurer's earnings fall into the top 90 percent of the positive earnings distribution, and zero otherwise.
Loss: Set to 1 if the insurer's earnings fall into the bottom 90 percent of the negative earnings distribution, and zero otherwise.
LineSize: Calculated as the total premiums written in this line divided by the total premiums written by the insurer for all lines combined.
Reinsurance: Calculated as the premiums reinsured divided by total premiums written.
Public: Set to 1 if the company is publicly traded, and zero otherwise.
Mutual: Set to 1 if the company has a mutual organization structure, and zero otherwise.
Group: Set to 1 if the company is a member of a group, and zero otherwise.
Appendix B: Cross-validation results of machine learning models

For each business line and training/validation sample, each cell reports MAE / RMSE for random forest (RF), deep neural network (DNN), gradient boosting machine (GBM), and linear regression (LR), first without and then with manager estimates as a model input.

Private Passenger Auto Liability
  1996-2005 without: RF 7,624 / 36,802; DNN 14,151 / 43,834; GBM 9,286 / 52,134; LR 7,404 / 21,560
            with:    RF 7,270 / 33,028; DNN 13,330 / 35,262; GBM 10,046 / 60,378; LR 7,300 / 21,734
  1996-2006 without: RF 8,070 / 38,004; DNN 14,837 / 40,593; GBM 9,907 / 48,702; LR 7,595 / 22,065
            with:    RF 6,988 / 33,456; DNN 12,347 / 44,359; GBM 8,991 / 47,355; LR 7,436 / 22,202
  1996-2007 without: RF 7,527 / 37,123; DNN 23,148 / 55,226; GBM 10,015 / 51,604; LR 7,538 / 22,345
            with:    RF 6,773 / 32,465; DNN 15,707 / 45,700; GBM 9,549 / 52,287; LR 7,312 / 21,717

Commercial Auto Liability
  1996-2005 without: RF 3,142 / 12,130; DNN 4,954 / 15,846; GBM 2,995 / 9,663; LR 3,758 / 11,775
            with:    RF 2,868 / 11,383; DNN 4,858 / 15,464; GBM 2,862 / 10,862; LR 3,734 / 12,004
  1996-2006 without: RF 3,115 / 12,377; DNN 4,279 / 15,918; GBM 2,961 / 9,879; LR 3,967 / 11,718
            with:    RF 2,853 / 11,121; DNN 4,514 / 14,733; GBM 3,006 / 11,971; LR 3,938 / 11,842
  1996-2007 without: RF 3,193 / 12,094; DNN 4,962 / 16,172; GBM 3,122 / 10,745; LR 4,122 / 12,938
            with:    RF 3,076 / 12,581; DNN 4,732 / 17,649; GBM 3,021 / 10,878; LR 4,073 / 12,676

Workers' Compensation
  1996-2005 without: RF 6,605 / 29,031; DNN 10,275 / 30,204; GBM 7,853 / 32,522; LR 9,192 / 36,682
            with:    RF 6,272 / 26,615; DNN 8,828 / 27,545; GBM 7,158 / 30,019; LR 8,907 / 34,592
  1996-2006 without: RF 6,887 / 29,058; DNN 12,826 / 33,117; GBM 7,513 / 29,740; LR 9,044 / 33,031
            with:    RF 6,374 / 26,876; DNN 10,873 / 26,814; GBM 6,622 / 24,363; LR 8,418 / 27,355
  1996-2007 without: RF 7,126 / 29,864; DNN 13,835 / 30,352; GBM 7,488 / 28,583; LR 9,389 / 31,530
            with:    RF 6,369 / 26,332; DNN 10,922 / 33,558; GBM 6,692 / 23,958; LR 8,892 / 28,625

Commercial Multi-Peril
  1996-2005 without: RF 4,523 / 21,332; DNN 7,301 / 23,806; GBM 4,671 / 20,160; LR 5,486 / 18,045
            with:    RF 4,236 / 19,195; DNN 8,424 / 24,730; GBM 4,265 / 18,415; LR 5,528 / 18,188
  1996-2006 without: RF 4,575 / 20,801; DNN 7,476 / 23,400; GBM 4,686 / 19,552; LR 6,018 / 29,288
            with:    RF 4,033 / 18,667; DNN 7,715 / 25,592; GBM 4,523 / 19,743; LR 6,027 / 29,524
  1996-2007 without: RF 4,739 / 22,989; DNN 8,070 / 26,657; GBM 4,787 / 20,829; LR 6,168 / 21,501
            with:    RF 4,192 / 19,940; DNN 8,023 / 27,552; GBM 4,379 / 18,062; LR 6,141 / 21,531

Homeowner/Farmowner
  1996-2005 without: RF 4,981 / 36,340; DNN 7,087 / 26,352; GBM 4,664 / 28,708; LR 5,765 / 24,242
            with:    RF 4,574 / 35,456; DNN 7,357 / 29,374; GBM 4,938 / 34,432; LR 4,314 / 18,921
  1996-2006 without: RF 4,870 / 35,486; DNN 6,867 / 24,120; GBM 4,735 / 28,444; LR 5,232 / 20,223
            with:    RF 4,217 / 30,341; DNN 7,902 / 24,517; GBM 4,997 / 34,578; LR 4,135 / 14,669
  1996-2007 without: RF 5,300 / 43,968; DNN 8,866 / 27,393; GBM 4,873 / 21,059; LR 5,274 / 20,438
            with:    RF 4,376 / 34,501; DNN 9,578 / 28,297; GBM 4,958 / 34,963; LR 4,198 / 14,385
Appendix C: Holdout test results of machine learning models

For each business line and holdout sample year, each cell reports MAE / RMSE for random forest (RF), deep neural network (DNN), gradient boosting machine (GBM), and linear regression (LR), first without and then with manager estimates as a model input.

Private Passenger Auto Liability
  2006 without: RF 7,656 / 31,481; DNN 25,258 / 50,939; GBM 8,902 / 30,532; LR 7,825 / 20,594
       with:    RF 6,751 / 27,938; DNN 12,361 / 58,797; GBM 8,902 / 30,532; LR 7,511 / 20,531
  2007 without: RF 7,806 / 31,055; DNN 15,725 / 53,009; GBM 11,133 / 47,445; LR 9,813 / 39,888
       with:    RF 7,021 / 30,286; DNN 12,308 / 38,209; GBM 11,169 / 57,350; LR 9,267 / 32,258
  2008 without: RF 8,544 / 41,858; DNN 57,502 / 72,602; GBM 10,168 / 44,816; LR 10,233 / 38,692
       with:    RF 7,635 / 38,999; DNN 23,993 / 68,434; GBM 9,797 / 44,037; LR 9,570 / 37,335

Commercial Auto Liability
  2006 without: RF 3,400 / 13,154; DNN 7,120 / 21,492; GBM 3,545 / 13,283; LR 4,404 / 12,808
       with:    RF 3,312 / 13,959; DNN 5,190 / 15,287; GBM 3,675 / 16,091; LR 4,324 / 13,361
  2007 without: RF 3,107 / 10,410; DNN 8,322 / 16,562; GBM 3,447 / 11,358; LR 4,560 / 13,000
       with:    RF 3,045 / 10,979; DNN 6,740 / 15,083; GBM 2,960 / 9,679; LR 4,464 / 12,674
  2008 without: RF 2,855 / 9,070; DNN 4,865 / 13,783; GBM 2,999 / 9,784; LR 4,276 / 11,392
       with:    RF 2,836 / 9,420; DNN 7,657 / 18,131; GBM 2,827 / 9,128; LR 4,301 / 11,270

Workers' Compensation
  2006 without: RF 7,642 / 28,437; DNN 14,410 / 31,475; GBM 7,430 / 23,523; LR 10,313 / 31,721
       with:    RF 7,599 / 28,173; DNN 10,003 / 27,294; GBM 7,167 / 21,617; LR 10,745 / 36,142
  2007 without: RF 7,529 / 31,143; DNN 10,222 / 39,177; GBM 8,115 / 29,873; LR 10,942 / 33,152
       with:    RF 6,422 / 27,063; DNN 19,419 / 57,888; GBM 7,380 / 30,020; LR 9,637 / 32,836
  2008 without: RF 9,218 / 38,337; DNN 12,972 / 39,445; GBM 9,883 / 38,380; LR 12,110 / 46,065
       with:    RF 8,605 / 38,380; DNN 24,002 / 41,219; GBM 9,287 / 39,225; LR 11,816 / 42,062

Commercial Multi-Peril
  2006 without: RF 3,943 / 13,549; DNN 13,373 / 23,258; GBM 4,586 / 16,889; LR 6,163 / 18,434
       with:    RF 3,644 / 13,917; DNN 6,727 / 18,485; GBM 4,307 / 15,542; LR 6,203 / 18,410
  2007 without: RF 4,671 / 22,901; DNN 9,554 / 45,940; GBM 4,460 / 18,489; LR 7,424 / 32,861
       with:    RF 4,309 / 20,062; DNN 9,638 / 48,366; GBM 4,239 / 18,485; LR 7,369 / 32,692
  2008 without: RF 7,832 / 37,435; DNN 12,843 / 42,572; GBM 8,730 / 38,860; LR 8,592 / 36,234
       with:    RF 6,850 / 34,425; DNN 9,653 / 29,239; GBM 6,229 / 29,095; LR 8,502 / 34,960

Homeowner/Farmowner
  2006 without: RF 4,708 / 31,377; DNN 7,203 / 30,867; GBM 5,127 / 33,768; LR 5,452 / 15,274
       with:    RF 3,918 / 28,898; DNN 6,232 / 21,973; GBM 4,725 / 33,353; LR 3,642 / 9,574
  2007 without: RF 4,773 / 30,437; DNN 8,165 / 18,938; GBM 5,087 / 33,993; LR 4,912 / 13,689
       with:    RF 3,875 / 23,993; DNN 6,241 / 22,491; GBM 4,948 / 34,664; LR 3,969 / 12,103
  2008 without: RF 13,044 / 30,034; DNN 14,887 / 84,413; GBM 8,730 / 38,860; LR 7,354 / 24,658
       with:    RF 10,829 / 22,057; DNN 9,699 / 66,250; GBM 10,873 / 38,158; LR 5,335 / 20,574