predicting financial - victoria university of wellington · 2012-01-08 · ©2007 thomas e. mckee,...
Predicting Financial Statement Fraud
Thomas E. McKee Ph.D., CPA, CMA, CIA
East Tennessee State University
Norwegian School of Economics and Business Administration
©2007 Thomas E. McKee, Ph.D., CPA 2
Motivation
- U.S. SAS 113 requires auditors to consider fraud risk factors
- ISA 240 has a similar requirement
- More than 40 fraud risk factors are identified
- Assessing the risk of fraudulent financial reporting is difficult because the factors interact in complex ways
Fraud “Red Flags”
[Figure: Red Flags 1 to 5 shown against a fraud company and nonfraud companies; caption: “No One-to-One Mapping”]
K. Hackenbrack 1993, AJPT, Vol. 12, No. 1
- Auditors assigned highly variable importance ratings to the various fraud risk factors
- Small-company auditors placed less emphasis on fraud risk factors than large-company auditors did
Purpose of This Study
- Improved financial fraud theory
- Financial fraud prediction
Financial Fraud Rate
- Annual financial fraud occurrence rate of .0028 for U.S. public companies
Source: T.J.F. Bishop, 2001 Auditors Report
Auditor Fraud Experience
- 40% of audit partners never encounter a single case during their entire career
- The remaining 60% of audit partners experience a material irregularity [either asset theft or financial fraud] at a .013 rate
Source: Loebbecke, Eining, Willingham, 1989
Early L.W. Fraud Model
[Figure: three model components: Perceived Opportunity, Rationalization, Perceived Pressure]
J.K. Loebbecke and J.J. Willingham, “Review of SEC AAERs,” 1988 Working Paper
Test of L.W. Fraud Model
- At least 1 of the 3 model components was present in 88% of 77 material financial fraud cases from a single “Big 8” firm
- Loebbecke, J., Eining, M. and Willingham, J. 1989. “Auditors’ Experience With Material Irregularities: Frequency, Nature, and Detectability.” Auditing: A Journal of Practice & Theory. Spring, pp. 1-28.
Prior Financial Fraud Prediction Studies
Author(s): model type; number of variables; number of cases; model accuracy on holdout sample
- Loebbecke, Eining, & Willingham: general model; 3; 77 fraud; 88% of fraud cases had at least 1 of the 3 variables
- Bell, Szykowny, & Willingham: cascaded logistic regression; 47 first stage, 3 second stage; 77 fraud/305 nonfraud; 85.7%
- Fanning, Cogger, & Srivastava: logistic regression & neural networks; 47; 77 fraud/305 nonfraud; 87% logistic regression & 90% neural networks with varying validation samples
- Hansen, McDonald, Messier, & Bell: generalized qualitative-response model [EGB2]; 47; 77 fraud/305 nonfraud; 89.3%
- Green & Choi: neural network; 8; 86 fraud/86 nonfraud; 63%
- Beneish: probit; 12; 64 fraud/1,989 nonfraud; 71%
- Fanning & Cogger: neural network; 20; 102 fraud/102 nonfraud; 63%
- Bell & Carcello: logistic regression; 7; 77 fraud/305 nonfraud; 75%
- Feroz, Kwon, Pastena, & Park: logistic regression & neural network; 7; 42 fraud/90 nonfraud; 52% logistic regression & 72% neural network
- Chen, Huang, & Lin: CPAs' unaided judgment, logistic regression, & neural networks; 27; 74 fraud/148 nonfraud [Taiwan data]; 60% CPAs, 73% logistic regression, & 81% neural network
Gillett & Uddin 2005 AJPT
- Structural equation modeling tested whether Red Flags influenced intentions to commit fraudulent financial reporting
- Reasoned action model → intentions are assumed to mediate overt behavior
Gillett & Uddin 2005 AJPT
- Five fraud scenarios administered via a survey instrument
- Random sample of 2,000 U.S. publicly traded firms, with a 7% response rate
- Did not measure behavior
Gillett & Uddin Initial Model [figure not reproduced]
Gillett & Uddin Final Model [figure not reproduced]
GU Model Hypotheses
- H1: Compensation structure is NOT related to fraud risk
- H2: Company size has a positive relation to fraud risk
Mgmt. Stock Ownership Research
- Does stock ownership by management increase its incentive to commit fraud?
- Mixed evidence from research
- Choi, Jeon, and Park, 2004, and Krishnan, 2005, find that management ownership increases the likelihood of earnings management
- Lin, Li, and Yang, 2006, do not find a significant relationship between management ownership and earnings management
Auditor Tenure Research
- Beck, Frecka, and Solomon, 1988, and Lys and Watts, 1994, find that auditor independence decreases as the length of tenure increases
- Arens, Elder, and Beasley, 2005, argue that increased auditor tenure results in better risk assessment and better insight into operations, strategies, and internal controls
- Myers, Myers, and Omer, 2003, find a significant negative relationship between auditor tenure and earnings management
Auditor Size Research
- The argument is that big audit firms are better able to detect financial fraud because of superior knowledge and greater incentives to protect their reputations
- Becker, DeFond, Jiambalvo, and Subramanyam, 1998, and Francis, Maydew, and Sparks, 1999, find that large-firm auditors are associated with less earnings management
- Bedard, Chtourou, and Courteau, 2004, and Lin, Li, and Yang, 2006, find evidence to the contrary
Current Study Financial Statement Fraud Model [figure not reproduced]
Unique Aspects of This Research
- Hypothesized model
- Data analysis techniques: classification tree [SEE 5], agglomerative cluster [PNC2], logistic regression
- Recent data: 1995-2002
- Prediction window: prior to the FIRST occurrence of fraud [not just its discovery]
Research Design
- 50 fraud companies identified by the SEC
- 50 nonfraud companies matched on:
  - SIC number
  - Market value = or >
  - + Net Income % but < 25%
- Data years 1995-2002, from Compustat data as of Oct. 31, 2005
Sample Description
Item           Description          Fraud Companies     Nonfraud Companies
Revenue        Highest              40,656,000,000      165,639,000,000
               Average              6,553,195,000       6,582,795,000
               Minimum              11,727,000          493,000
               Standard Deviation   10,784,000          24,246,240,000
Net Income     Highest              5,636,000,000       6,295,000,000
               Average              1,613,225           355,917,581
               Minimum              -7,751,000,000      240,000,000
               Standard Deviation   2,049,275,040       1,100,295,110
Total Assets   Highest              306,577,000,000     255,018,000,000
               Average              19,590,000,000      10,119,980,000
               Minimum              14,064,000          15,301,000
               Standard Deviation   53,904,750,000      38,142,540,000
Fraud Prediction Variables
Hypothesized predictive variables:
1. Past Net Income Significantly Higher Than Current
2. Age of CFO
3. Average Age of Top 5 Management Personnel
4. High Earnings Growth Rate Expected
5. Financial Stress
6. Management Stock Options
7. Management Compensation
8. Company Size
9. Top 5 Management Ownership %
10. Big 4/Non-Big 4 Auditor
11. Auditor Tenure
12. Change in Total Accruals
13. Earnings Quality
14. Size of Available Discretionary Accruals
15. Change in Auditor
Individual Variable Significance
Variable                    Nonfraud mean  Nonfraud SD   Fraud mean    Fraud SD      Corr.  Sig.
V1 Change Net Inc.          97.47          959.71        358.29        1415.38       .11    .31
V2 Age CFO                  47.2           7.37          45.56         7.10          .11    .26
V3 Age Top 5 Officers       50.03          4.90          48.44         6.21          .14    .16
V4 Sales Growth             .81            1.21          1.72          3.73          .17    .10
V5 ML Bankruptcy Prob.      .29            .30           .36           .29           .12    .24
V6 Mgmt. Stock Options      87,162,314     421,421,732   116,622,805   284,280,547   .04    .70
V7 Mgmt. Compensation       2,897,782      2,640,526     5,341,795     8,269,177     .20    .05**
V8 Company Size             8.79           .99           9.21          1.07          .20    .05**
V9 Top 5 Mgmt. Ownership    12.01          13.72         8.31          12.87         .14    .17
V10 Big 4 Auditor           .90            .30           .94           .24           .07    .47
V11 Auditor Tenure          3.14           1.82          2.56          1.79          .16    .11
V12 Change In Tot. Accruals .04            .80           4.16          27.69         .10    .33
V13 Earnings Quality        .88            .38           .74           .59           .14    .18
V14 Size Tot. Accruals      .09            .78           .11           .32           .17    .11
V15 Change In Auditor       .10            .30           .20           .40           .14    .19
Corr. = Pearson correlation with fraud status (total sample); Sig. = two-tailed significance; ** = significant at the .05 level
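The correlation column is a Pearson correlation between each variable and a 0/1 fraud indicator (a point-biserial correlation), so it can be rebuilt approximately from the group means and standard deviations alone. The Python sketch below assumes 50 firms per group, no missing values, and sample standard deviations; it uses a normal approximation to the t distribution for the two-tailed p-value, so its p-values only approximate the table, whose case counts may differ by variable.

```python
import math


def point_biserial_from_summary(mean0, sd0, n0, mean1, sd1, n1):
    """Pearson correlation between a 0/1 group indicator and a variable,
    rebuilt from each group's mean, sample SD (n - 1 divisor), and size."""
    n = n0 + n1
    grand_mean = (n0 * mean0 + n1 * mean1) / n
    # Total sum of squares = within-group part + between-group part
    ss_total = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2
                + n0 * (mean0 - grand_mean) ** 2 + n1 * (mean1 - grand_mean) ** 2)
    sd_all = math.sqrt(ss_total / n)  # population SD of the pooled variable
    return (mean1 - mean0) / sd_all * math.sqrt(n0 * n1) / n


def two_tailed_p(r, n):
    """Two-tailed p-value for r, via t = r*sqrt(n-2)/sqrt(1-r^2) and a normal
    approximation to the t distribution (adequate for n around 100)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return math.erfc(abs(t) / math.sqrt(2))


# Group 0 = nonfraud, group 1 = fraud; 50 firms per group is an assumption
r_size = point_biserial_from_summary(8.79, 0.99, 50, 9.21, 1.07, 50)      # V8
r_tenure = point_biserial_from_summary(3.14, 1.82, 50, 2.56, 1.79, 50)    # V11
print(f"V8:  r = {r_size:.2f}, p = {two_tailed_p(r_size, 100):.3f}")
print(f"V11: r = {r_tenure:.2f}, p = {two_tailed_p(r_tenure, 100):.3f}")
```

Under these assumptions V8 Company Size comes out at r ≈ .20 (p ≈ .04) and V11 Auditor Tenure at r ≈ -.16 (p ≈ .11), the negative sign reflecting the fraud firms' shorter mean tenure.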
GU Model Hypothesis 1
- H1: Compensation structure is NOT related to fraud risk
- This study REJECTS H1
- V7 Mgmt. Compensation is significant and positively related to fraud risk
- V6 Mgmt. Stock Options is not significant [a component of total compensation]
GU Model Hypothesis 2
- H2: Company size has a positive relation to fraud risk
- This study ACCEPTS H2: V8 Company Size is significant and positively related to fraud risk
Mgmt. Ownership Hypothesis 3
- H3: Management stock ownership increases the likelihood of fraudulent financial reporting
- This study does not support H3: V9 Top 5 Mgmt. Ownership is significant only at the .17 level
Auditor Tenure Hypothesis
- H4: Fraud risk decreases with increased auditor tenure
- Marginal support for H4
- V11 Auditor Tenure significant at the .11 level
- V11 effective in decision tree models
Auditor Size Hypothesis
- H5: Fraud risk decreases with larger audit firms
- Not supported by this research
- V10 Big 4 Auditor significant only at the .47 level
Variable Scaling
- Some variables were converted to deciles so the algorithms could process them efficiently:
  - V1 Change in Net Income
  - V6 Mgmt. Stock Options
  - V7 Mgmt. Compensation
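A minimal sketch of decile scaling in Python. It assumes deciles coded 1 to 10 and cut points taken from `statistics.quantiles` with its default method; the deck does not state its exact binning rule, and the input values here are hypothetical.

```python
import statistics
from bisect import bisect_right
from collections import Counter


def to_deciles(values):
    """Return a decile code (1-10) for each value.

    Assumption: codes run 1..10 and the 9 cut points come from
    statistics.quantiles with its default ("exclusive") method.
    """
    cuts = statistics.quantiles(values, n=10)  # 9 interior cut points
    # bisect_right counts the cut points <= v; adding 1 gives a code in 1..10
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical stand-in for V7 Mgmt. Compensation on 100 firms
compensation = [1_000 * i for i in range(1, 101)]
codes = to_deciles(compensation)
print(Counter(codes))
```

With 100 evenly spread values each decile receives 10 firms; with real, skewed compensation data the codes compress extreme values, which is the point of the rescaling.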
Models Tested
- SEE5 Decision Tree
- PNC2 Agglomerative Cluster Algorithm
- SPSS Logistic Regression
Decision Tree Analysis
- SEE 5 software (2006 version) used in this study
- Extension of Quinlan’s C4.5 (1993) and Iterative Dichotomizer [ID3] (1979)
- Enhanced error pruning
Decision Tree Analysis
- A classification tree algorithm iteratively selects the attributes that maximize information gain, measured as the change in data entropy [or another measure].
- The result is a tree in which each node specifies a test of an attribute, each branch corresponds to a test outcome, and each leaf constitutes a classification prediction.
- Attributes with the highest information gain appear closest to the tree root.
Decision Tree Analysis
- Entropy for a possible set partition [attribute value cutting point]:
  \( \text{Entropy} = -p_{\oplus}\log_2 p_{\oplus} - p_{\ominus}\log_2 p_{\ominus} \)
- \( p_{\oplus} \) = proportion of positive examples
- \( p_{\ominus} \) = proportion of negative examples
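The entropy formula and the information gain of a candidate cutting point can be sketched in Python as follows. For illustration only, the counts are those of the first split in the pruned XL Miner tree later in this deck (91 cases split into 34 cases with 11 fraud and 23 nonfraud, and 57 cases with the remaining 39 fraud and 18 nonfraud). This is plain information gain, not necessarily the exact split criterion SEE 5 applies.

```python
import math


def entropy(pos, neg):
    """Two-class entropy: -p+ log2 p+ - p- log2 p- (0 log 0 taken as 0)."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(parent, children):
    """Entropy reduction from splitting `parent` (pos, neg) into `children`."""
    n = sum(parent)
    weighted = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - weighted


# (fraud, nonfraud) counts: 91 cases split into 34 and 57 cases
gain = information_gain((50, 41), [(11, 23), (39, 18)])
print(f"information gain = {gain:.3f}")
```

The split lowers entropy from about 0.99 bits to a weighted 0.90 bits, a gain of roughly 0.09 bits; the algorithm would compare this against the gain of every other candidate cutting point.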
SEE 5 Decision Tree Results
- Model A: 3 variables (V2 Age CFO, V6 Mgmt Stock Options, V9 Top 5 Mgmt Ownership); 4 rules; 70% accuracy on the 100-company sample; 60% mean accuracy with 10-fold cross-validation
- Model B: 4 variables (V5 ML Bank Prob, V8 Company Size, V9 Top 5 Mgmt Ownership, V11 Auditor Tenure); 3 rules; 81% accuracy on the 100-company sample; 69% mean accuracy with 10-fold cross-validation
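The cross-validated figures are lower than the full-sample figures because each fold is scored only on cases the model was not built from. A generic Python sketch of k-fold cross-validation follows; `fit_threshold` is a hypothetical toy learner and the 100 cases are synthetic, so it shows the mechanics only, not SEE 5's own cross-validation option.

```python
import random


def k_fold_accuracy(xs, ys, fit, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(train_x, train_y)` must return a function mapping one x to a label.
    """
    if len(xs) < k:
        raise ValueError("need at least k cases")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(model(xs[i]) == ys[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


def fit_threshold(train_x, train_y):
    """Hypothetical toy learner: cut halfway between the two class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


# 100 synthetic cases: 50 with label 0 below -9, 50 with label 1 above 9
xs = list(range(-59, -9)) + list(range(10, 60))
ys = [0] * 50 + [1] * 50
print(k_fold_accuracy(xs, ys, fit_threshold))
```

On these cleanly separated synthetic data every fold is classified perfectly; on real fraud data the gap between full-sample and cross-validated accuracy is a direct measure of the overfitting noted under Research Limitations.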
XL Miner Classification Tree
- 80.2% accurate on 91 cases
- 3 variables: ML Bankruptcy Probability, Company Size, Auditor Tenure
SEE 5 Model A
- If V9 Top 5 Mgmt. Ownership > 0.151, classify as Nonfraud; or
- If V6 Mgmt. Stock Options = 5, classify as Nonfraud; or
- If V2 Age CFO > 48 and V6 Mgmt. Stock Options = 7, classify as Nonfraud; or
- If V6 Mgmt. Stock Options = 1 and V9 Top 5 Mgmt. Ownership > 0.0364, classify as Nonfraud
- Otherwise, classify as Fraud
SEE 5 Model A
                     Model predictions
Actual               Nonfraud   Fraud   % Correct
Nonfraud             28         22      56%
Fraud                8          42      84%
Overall                                 70%
70% accuracy on the full 100-company sample
60% accuracy with 10-fold cross-validation
SEE 5 Model B
Model correctly classified 81 of 100 cases [81% accuracy]
- IF V11 Auditor Tenure <= 3 and V8 Company Size <= 8.34, classify as Nonfraud; or
- IF V11 Auditor Tenure > 3 and V9 Top 5 Mgmt. Ownership > 0.1, classify as Nonfraud; or
- IF V11 Auditor Tenure > 3 and V9 Top 5 Mgmt. Ownership <= 0.1 and V5 ML Bankruptcy Probability <= .193, classify as Nonfraud
- Otherwise, classify as Fraud
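Model B's rule list translates directly into a function. This Python sketch transcribes the slide's thresholds; the argument names are hypothetical, and V9 is assumed to be on the same fractional scale as the 0.1 threshold. The usage line feeds in the WorldCom/MCI values from the logistic regression example later in the deck purely as an illustration; the deck itself does not apply Model B to WorldCom.

```python
def see5_model_b(v5_bankruptcy_prob, v8_company_size, v9_top5_ownership, v11_auditor_tenure):
    """SEE 5 Model B rules as listed on the slide; returns "Nonfraud" or "Fraud"."""
    # Rule 1: short tenure and small company
    if v11_auditor_tenure <= 3 and v8_company_size <= 8.34:
        return "Nonfraud"
    # Rule 2: longer tenure and substantial top-5 management ownership
    if v11_auditor_tenure > 3 and v9_top5_ownership > 0.1:
        return "Nonfraud"
    # Rule 3: longer tenure, low ownership, but low bankruptcy probability
    if v11_auditor_tenure > 3 and v9_top5_ownership <= 0.1 and v5_bankruptcy_prob <= 0.193:
        return "Nonfraud"
    return "Fraud"  # default class when no Nonfraud rule fires


# Illustrative inputs: V5 = .23, V8 = 10.96, V9 = 0, V11 = 3
print(see5_model_b(0.23, 10.96, 0.0, 3))
```

Because every rule concludes Nonfraud, a company is flagged only when none of the three rules fires, which keeps the model compact (3 rules, 4 variables).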
SEE 5 Model B
                     Model predictions
Actual               Nonfraud   Fraud   % Correct
Nonfraud             41         9       82%
Fraud                10         40      80%
Overall              51         49      81%
81% accuracy on the full 100-company sample
69% accuracy with 10-fold cross-validation
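Reading the Model B matrix with fraud as the positive class gives 80% sensitivity and 82% specificity. These rates come from a matched 50/50 sample. If they carried over unchanged to a population with the .0028 annual fraud rate cited earlier (a strong, purely illustrative assumption), Bayes' rule implies that only about 1% of companies classified as fraud would actually be fraud companies. A minimal Python sketch of both calculations:

```python
def confusion_metrics(tn, fp, fn, tp):
    """Rates from a 2x2 confusion matrix, with fraud as the positive class."""
    return {
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),  # share of fraud firms classified as fraud
        "specificity": tn / (tn + fp),  # share of nonfraud firms classified as nonfraud
    }


def positive_predictive_value(sensitivity, specificity, base_rate):
    """Bayes' rule: share of companies classified as fraud that really are fraud."""
    true_alarms = sensitivity * base_rate
    false_alarms = (1 - specificity) * (1 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


model_b = confusion_metrics(tn=41, fp=9, fn=10, tp=40)
ppv = positive_predictive_value(model_b["sensitivity"], model_b["specificity"], 0.0028)
print(model_b, round(ppv, 4))
```

This is one concrete way to see why the non-random sample and the misclassification costs listed under Research Limitations matter when moving from a matched sample to audit practice.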
XL Miner Decision Tree
[Figure: unpruned tree on 91 cases (root split into 34 and 57 cases; intermediate nodes of 25, 29, and 28 cases) with splits If Size ≤ 8.52, If Size ≤ 7.62, If Bankruptcy ≤ .19, If Bankruptcy ≤ .29, and If Tenure ≤ 3.5 (twice); leaves: 9 (4 F, 5 NF), 13 (5 F, 8 NF), 12 (2 F, 10 NF), 12 (10 F, 2 NF), 17 (3 F, 14 NF), 14 (12 F, 2 NF), 14 (14 F, 0 NF); each leaf labeled NonFraud Classification or Fraud Classification]
Pruned XL Miner Decision Tree (91 cases)
If Size ≤ 8.52: 34 cases (11 F, 23 NF) → NonFraud Classification
Else (57 cases):
  If Bankruptcy ≤ .19 (29 cases):
    If Tenure ≤ 3.5: 12 cases (10 F, 2 NF) → Fraud Classification
    Else: 17 cases (3 F, 14 NF) → NonFraud Classification
  Else: 28 cases (26 F, 2 NF) → Fraud Classification
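The 80.2% accuracy reported for the XL Miner tree follows from the leaf counts: each leaf predicts its majority class, so the majority count in each leaf is the number of cases it classifies correctly. In this Python sketch the branch labels reflect one reading of the tree layout; the calculation depends only on the counts.

```python
# (fraud, nonfraud) counts per leaf of the pruned XL Miner tree
leaves = {
    "Size <= 8.52": (11, 23),
    "Size > 8.52, Bankruptcy <= .19, Tenure <= 3.5": (10, 2),
    "Size > 8.52, Bankruptcy <= .19, Tenure > 3.5": (3, 14),
    "Size > 8.52, Bankruptcy > .19": (26, 2),
}


def majority_leaf_accuracy(leaf_counts):
    """(correct, total) when each leaf predicts its majority class."""
    correct = sum(max(counts) for counts in leaf_counts.values())
    total = sum(sum(counts) for counts in leaf_counts.values())
    return correct, total


correct, total = majority_leaf_accuracy(leaves)
print(f"{correct}/{total} = {correct / total:.1%}")
```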
Cluster Analysis
- Cluster algorithms divide a set of objects into groups [clusters] based on a similarity measure
- Objects in a group should be as similar as possible
- Objects in different groups should be as dissimilar as possible
Taxonomy Of Cluster Models
Cluster Algorithms
- Partitional: Fuzzy, Hard
- Hierarchical: Agglomerative, Divisive
Partitional Algorithms
- Optimize a partition with respect to an objective function; the number of clusters is usually prespecified
- Common algorithms:
  - K-means algorithm: hard assignment of membership {0, 1}
  - Fuzzy C-means algorithm: membership in the interval [0, 1] (spherical clusters only)
  - Gustafson-Kessel algorithm: membership in the interval [0, 1] (can represent ellipsoidal clusters)
Partitional Algorithms
- A problem with high-complexity cluster representations is that the algorithms tend to get stuck in a local optimum
- Success in those situations depends on a good initialization
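Sensitivity to initialization shows up even with plain k-means (Lloyd's algorithm) on six one-dimensional points. This Python sketch and its data are illustrative and are not part of the study.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's k-means on 1-D data, starting from the given initial centers."""
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in points:  # hard assignment: each point joins its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to its cluster mean (keep it if the cluster is empty)
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments stable: converged
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum(min((x - c) ** 2 for c in centers) for x in points)


data = [0, 1, 10, 11, 20, 21]
good = kmeans_1d(data, [0, 10, 20])   # one seed per natural group
bad = kmeans_1d(data, [0, 1, 10])     # two seeds in the same group
print(good, sse(data, good))
print(bad, sse(data, bad))
```

Seeding one center per natural group converges to centers 0.5, 10.5, and 20.5 with a sum of squares of 1.5; seeding two centers in the first group stops at 0, 1, and 15.5 with a sum of squares of 101, a local optimum the algorithm cannot escape.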
Hierarchical Cluster Algorithms
- Produce a complete series of nested partitions
- Divisive → starts with all data tuples in a single cluster and iteratively divides clusters until each data tuple belongs to its own cluster or a termination criterion is met
- Agglomerative → starts with each data tuple as its own cluster and then iteratively merges clusters that are close to each other based on a similarity measure
PNC2 Agglomerative Cluster Algorithm
- Based on a merge test, data tuples [cases] with the same output value are iteratively merged until an abortion criterion is met or until all clusters are merged into a single cluster for each outcome state.
- Each cluster [a group of merged cases] is represented by an output value and a “cuboid.” The “cuboid” is the space that includes all input vectors of the data tuples merged into the cluster.
- Cuboids are transformed into If-Then rules in which the cuboid corresponds to the premise and the output value forms the conclusion.
PNC2 Agglomerative Cluster Algorithm
- “A prediction of the output value given an input position is made by using a weighted average of the output values of the agglomerative clusters, whose cuboids are nearest to the given input position.
- Thus the PNC2 Agglomerative Cluster Algorithm behaves like a rule-based system if the input position is inside of a cuboid.
- If no rule is active, i.e. if the input position is outside of any cuboid, the procedure acts like a k-nearest neighbor approach with the difference, that not distances between an input position and learn data tuples are evaluated, but the distances are determined from the input position to a cuboid.
- Thus the clusters can be viewed as generalized data tuples.” (Haendel, 2003, p. 20)
PNC2 Agglomerative Cluster Algorithm Example
Case   Gender (1=male, 2=female)   Height   Weight
A      1                           70       190
B      1                           75       180
C      2                           65       140
D      2                           70       150
PNC2 Agglomerative Cluster Algorithm Example
# Positive examples   # Negative examples   Output (gender)   Height   Weight
2                     0                     1                 any      ≥ 180
2                     0                     2                 any      ≤ 150
PNC2 Agglomerative Cluster Algorithm Example
- The previous model can be simplified into the following two decision rules in If-Then format:
- Rule A: If Variable 2 (weight) ≥ 180, classify as 1 (male)
- Rule B: If Variable 2 (weight) ≤ 150, classify as 2 (female)
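A simplified Python sketch of how the example's cuboids arise: all cases with the same output are merged into one cuboid (the per-input minimum and maximum), and any input whose intervals do not overlap between the two outputs can serve as a rule premise. PNC2's actual merge test, abortion criterion, and rule pruning are not reproduced here.

```python
# Cases from the example: (gender, height, weight); 1 = male, 2 = female
cases = {"A": (1, 70, 190), "B": (1, 75, 180), "C": (2, 65, 140), "D": (2, 70, 150)}
inputs = ("height", "weight")


def merge_into_cuboids(cases):
    """Merge all cases sharing an output into one cuboid: per-input (min, max)."""
    cuboids = {}
    for gender, *values in cases.values():
        box = cuboids.setdefault(gender, {name: (v, v) for name, v in zip(inputs, values)})
        for name, v in zip(inputs, values):
            lo, hi = box[name]
            box[name] = (min(lo, v), max(hi, v))  # stretch the cuboid to cover v
    return cuboids


def separating_inputs(cuboids):
    """Inputs whose intervals do not overlap between the two output classes."""
    (_, box_a), (_, box_b) = sorted(cuboids.items())
    return [name for name in inputs
            if box_a[name][1] < box_b[name][0] or box_b[name][1] < box_a[name][0]]


cuboids = merge_into_cuboids(cases)
print(cuboids)
print(separating_inputs(cuboids))
```

The two height intervals touch at 70, so height cannot separate the classes, while the weight intervals (180 to 190 versus 140 to 150) do, which matches Rules A and B.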
PNC2 Agglomerative Cluster Algorithm Example [two slides; figures not reproduced]
PNC2 Agglomerative Cluster Algorithm Example
- Minkowski distance metric, defined as \( \text{distance} = \left( \sum_j d_j^{\,p} \right)^{1/p} \), with \( p = 1 \) for block-wise distances
- For continuous inputs, the component-wise distances \( d_j \) are calculated as \( d_j = \lvert x_{aj} - x_{bj} \rvert \), where \( x_{aj} \) and \( x_{bj} \) denote the jth components of the input vectors \( x_a \) and \( x_b \), respectively
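When an input lies outside every cuboid, PNC2 evaluates distances to cuboids rather than to individual cases. The Python sketch below implements the Minkowski distance and, as an assumption, measures the distance to a cuboid from the nearest point inside it (found by clamping each component into the cuboid's interval). Inputs are left unscaled, the query point is hypothetical, and PNC2's weighted averaging over several nearby clusters is not shown.

```python
def minkowski(xa, xb, p=1):
    """(sum_j |x_aj - x_bj|^p)^(1/p); p = 1 gives the block-wise distance."""
    return sum(abs(a - b) ** p for a, b in zip(xa, xb)) ** (1 / p)


def distance_to_cuboid(x, cuboid, p=1):
    """Distance from point x to the nearest point of an axis-aligned cuboid.

    Assumption: the nearest point is found by clamping each component into
    the cuboid's [low, high] interval, so points inside have distance 0.
    """
    nearest = [min(max(xj, lo), hi) for xj, (lo, hi) in zip(x, cuboid)]
    return minkowski(x, nearest, p)


# Cuboids as (height interval, weight interval): 1 = male, 2 = female
cuboids = {1: [(70, 75), (180, 190)], 2: [(65, 70), (140, 150)]}
query = (72, 165)  # hypothetical person outside both cuboids
distances = {g: distance_to_cuboid(query, box) for g, box in cuboids.items()}
print(distances, min(distances, key=distances.get))
```

The query (height 72, weight 165) lies 15 units from the male cuboid and 17 from the female cuboid, so the nearest-cuboid prediction is 1 (male).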
PNC2 Agglomerative Cluster Algorithm Example
Elementary cluster   Gender   Height   Weight   Minkowski metric
A                    1        70       190      10.95
B                    1        75       180      10.25
Cuboid component-wise distance (A, B): 0.71
C                    2        65       140      8.66
D                    2        70       150      8.94
Cuboid component-wise distance (C, D): 0.28
Agglomerative Clustering [figure not reproduced]
23 Rule Cluster Model AA
+ O V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
2 0 1 5 > 6 00100 000100 1000 0 > 2 2 > 3 2 > 6 00100 01000000 10 0000100000 000100 000001 000110 01
3 0 1 9 > 10 01010 100110 1010 3 > 3 1 > 7 1 > 8 00110 10000000 10 0000100000 000100 000011 110100 01
2 0 1 0 > 2 00010 000100 1100 1 > 2 1 > 10 7 > 9 00110 11000000 10 1000100000 000101 100001 101000 01
1 0 1 8 > 8 01000 001000 1000 1 > 1 9 > 9 1 > 1 00010 10000000 10 0000100000 001000 000001 000010 01
3 0 1 0 > 0 10100 111000 1001 0 > 3 4 > 6 2 > 4 10110 10100000 11 1000000000 100100 001001 000101 01
4 0 1 8 > 10 01010 000110 1100 0 > 0 2 > 4 1 > 6 00100 11010000 11 0100100001 101000 000001 001011 11
3 0 1 1 > 4 10110 001010 1010 5 > 9 2 > 9 1 > 5 01010 10100000 10 0100000000 001001 010001 100010 10
4 0 1 4 > 5 00110 001110 1000 3 > 4 8 > 9 8 > 10 00011 10000000 10 1000100000 001100 010001 101100 01
5 0 1 0 > 6 11000 111000 1101 5 > 10 4 > 8 1 > 6 01110 01010001 10 0001100000 100111 000001 101010 01
5 0 1 1 > 10 01110 011011 1100 0 > 2 8 > 10 2 > 10 00111 11001000 10 0011000000 011100 000011 011110 01
8 0 1 1 > 9 11000 011100 1010 1 > 10 6 > 10 5 > 10 00111 11100000 11 1101000000 011110 001011 111101 11
2 0 1 9 > 10 10100 001000 1100 0 > 2 3 > 4 3 > 6 00100 01100000 10 0011000000 001100 000011 001100 01
8 0 1 0 > 3 01110 011110 1001 2 > 7 1 > 10 4 > 10 00011 10001000 10 0101100000 001101 100001 111000 01
2 0 2 3 > 4 01100 010100 1000 3 > 7 2 > 5 8 > 9 00010 10010000 10 0000100000 001000 011000 001010 01
4 0 2 0 > 8 01100 001000 1110 1 > 4 4 > 7 1 > 4 00100 01110000 10 1110000000 101001 000001 001111 11
7 0 2 6 > 10 11110 011110 1100 0 > 2 8 > 10 7 > 10 00011 11000000 10 0000100000 001101 000001 111100 01
3 0 2 3 > 5 01101 000100 1000 0 > 2 4 > 5 5 > 7 00100 01000000 10 0000100100 000110 100001 010100 01
3 0 2 3 > 5 10101 001010 1000 3 > 6 1 > 1 1 > 4 01100 01001000 11 0000100000 001010 000001 011000 01
2 0 2 7 > 9 00010 000110 1000 0 > 0 4 > 9 8 > 9 00011 10000000 10 0100000000 000100 010001 010100 11
5 0 2 7 > 10 11100 001100 1000 0 > 1 5 > 7 3 > 6 00110 10110010 11 0010100000 001110 000011 010100 01
8 0 2 0 > 2 11010 11110 1110 6 > 8 1 > 6 1 > 4 00110 11110000 10 1000100100 101111 000001 111001 01
9 0 2 5 > 8 01101 001110 1000 0 > 2 2 > 8 5 > 9 00111 10100000 10 0100100100 001110 000001 001010 01
7 0 2 0 > 8 01110 001110 1000 0 > 9 1 > 3 2 > 5 01101 10110100 11 1101100000 011100 000001 010100 11
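3 Agglomerative Cluster Models
- Model with 23 rules: 9 rules after pruning; 5.7 average inputs per rule (unpruned); unpruned accuracy on development sample 100% [100 companies]; pruned accuracy on development sample 79% [100 companies]; accuracy on separate validation sample n/a
- Model with 14 rules: 4 rules after pruning; 2.9 average inputs per rule (unpruned); unpruned accuracy on development sample 100% [50 companies]; accuracy on separate validation sample 62% [50 companies]
- Model with 11 rules: 4 rules after pruning; 1.9 average inputs per rule (unpruned); unpruned accuracy on development sample 100% [50 companies]; accuracy on separate validation sample 60% [50 companies]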
Comparison of Two Techniques Via Program Size
- Weak “least Herbrand model” comparison: SEE 5 = 9.32 bits; PNC2 = 50.7 bits
- Byte count comparison: SEE 5 = 890 bytes; PNC2 = 4617 bytes
- Conclusion: the SEE 5 model is 18-19% as complex as the PNC2 model
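The 18-19% figure is the ratio of the two size measures for each comparison:

\[
\frac{9.32\ \text{bits}}{50.7\ \text{bits}} \approx 0.184, \qquad \frac{890\ \text{bytes}}{4617\ \text{bytes}} \approx 0.193
\]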
Logistic Regression Model
- SPSS version 13 used to create 10 binary logistic regression models
- All models statistically significant
- 3 models based on forward stepwise entry
- 3 models based on backward stepwise entry
- 4 models based on entering specified variables
Logistic Regression Models
Model name: method of variable entry; number of variables in model; number of cases utilized; accuracy on development sample; significance of model coefficient; accuracy on 50% random validation sample
- AA: forward stepwise, conditional; 2 [V10, V11]; 81; 70.4%; .027; 61.1% on 54 cases
- BB: forward stepwise, likelihood ratio; 2 [V10, V11]; 81; 70.4%; .027; 61.1% on 54 cases
- CC: forward stepwise, Wald; 2 [V10, V11]; 81; 70.4%; .027; 61.1% on 54 cases
- DD: backward stepwise, conditional; 5 [V3, V8, V12, V13, V14]; 81; 70.4%; .02; 68.1% on 47 cases
- EE: backward stepwise, likelihood ratio; 4 [V3, V8, V12, V14]; 81; 67.9%; .019; 61.7% on 47 cases
- FF: backward stepwise, Wald; 5 [V3, V8, V12, V14, V15]; 81; 66.7%; .019; 70.2% on 47 cases
- GG: enter specified variables; 3 [V4, V11, V14]; 88; 55.7%; .042; 60.4% on 48 cases
- HH: enter specified variables; 3 [V4, V7, V8]; 98; 59.2%; .045; 60.4% on 53 cases
- II: enter specified variables; 3 [V2, V6, V9]; 100; 58%; .165; 61.1% on 54 cases
- JJ: enter specified variables; 4 [V5, V8, V9, V11]; 91; 68.1%; .021; 64.6% on 48 cases
Logistic Regression “Best” Model
- Logistic score: \( L = -5.346 + 1.337\,V_5 + 0.623\,V_8 - 0.726\,V_9 - 0.195\,V_{11} \), where V5 = ML Bankruptcy Probability, V8 = Company Size, V9 = Top 5 Mgmt. Ownership, and V11 = Auditor Tenure
- Fraud probability: \( P = \dfrac{e^{L}}{1 + e^{L}} \), where \( e \approx 2.72 \) (the base of the natural logarithm)
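The scoring formula is easy to apply in code. In this Python sketch the coefficient signs are a reconstruction: a negative intercept and V11 coefficient with positive V5 and V8 coefficients is the only sign pattern that reproduces the WorldCom/MCI score of 1.20 on the next slide, while the negative sign on V9 is an assumption (V9 is 0 in that example, so it cannot confirm the sign). The variable names are hypothetical.

```python
import math

# Assumption: signs reconstructed. Intercept and V11 negative, V5 and V8
# positive reproduce the WorldCom score of 1.20; the negative V9 sign is
# assumed (V9 = 0 in that example).
INTERCEPT = -5.346
COEFFICIENTS = {
    "v5_ml_bankruptcy_prob": 1.337,
    "v8_company_size": 0.623,
    "v9_top5_mgmt_ownership": -0.726,
    "v11_auditor_tenure": -0.195,
}


def logistic_score(**values):
    """L = intercept + sum of coefficient * value over the four variables."""
    return INTERCEPT + sum(COEFFICIENTS[name] * v for name, v in values.items())


def fraud_probability(score):
    """e^L / (1 + e^L), computed in the equivalent form 1 / (1 + e^-L)."""
    return 1.0 / (1.0 + math.exp(-score))


# WorldCom/MCI inputs at December 31, 2001, as listed in the deck
score = logistic_score(v5_ml_bankruptcy_prob=0.23, v8_company_size=10.96,
                       v9_top5_mgmt_ownership=0.0, v11_auditor_tenure=3)
print(round(score, 2), round(fraud_probability(score), 2))
```

The score evaluates to about 1.20 and the fraud probability to about .77, consistent with the WorldCom/MCI slide.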
Logistic Regression “Best” Model
- Development sample accuracy: 68.1% on 91 cases
- Significance of model coefficient: .021
- Validation sample accuracy: 64.6% on a 50% random validation sample of 48 cases
                     Model predictions
Actual               NonFraud   Fraud   % Correct
NonFraud             30         16      65.2%
Fraud                13         32      71.1%
Overall              43         48      68.1%
Logistic Regression “Best” Model Applied To WorldCom/MCI at December 31, 2001
- Logistic score: \( L = -5.346 + 1.337 \times 0.23 + 0.623 \times 10.96 - 0.726 \times 0 - 0.195 \times 3 \approx 1.20 \)
- Fraud probability: \( P = e^{1.20} / (1 + e^{1.20}) \approx .77 \), where \( e \approx 2.72 \) (the base of the natural logarithm)
- [In June 2002 WorldCom announced that profits had been inflated by $3.8 billion over the previous 5 quarters, a figure later increased to $11 billion]
Research Limitations
- Sample selection not random
- Misclassification costs not formally considered
- Variable and model selection somewhat subjective
- Nonfraud companies may have had fraud
- Latest data was from 2002, and the relationship between fraud and risk factors may have changed after SOX
- Overfitting bias evident in the models
Final Models
- Agglomerative cluster model: 61% accurate with cross-validation; 79% accurate on entire sample; 9 rules and 15 variables
- Decision tree model: 69% accurate with cross-validation; 81% accurate on entire sample; 3 rules and 4 variables
- Logistic regression model: 64.5% accurate with cross-validation; 68.1% accurate on overall sample; 4 variables
Next Step? Meta-Learning Model [stacking]
- Combining output from multiple models
- For example:
  - Output → single fraud prediction
  - Inputs → agglomerative cluster prediction, decision tree prediction, logistic regression prediction
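One common way to build the stacked model is to fit a logistic regression meta-learner on the base models' 0/1 predictions. The Python sketch below does this with plain batch gradient descent; the predictions and labels are hypothetical, and in practice the meta-learner should be trained on out-of-fold predictions (for example, from cross-validation) so that it does not simply reward overfit base models.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(base_preds, labels, lr=1.0, epochs=5000):
    """Logistic regression on base-model outputs by batch gradient descent.

    Returns weights [bias, w_1, ..., w_k], one weight per base model.
    """
    k = len(base_preds[0])
    w = [0.0] * (k + 1)
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(base_preds, labels):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]  # mean-gradient step
    return w


def meta_predict(w, x):
    """Final 0/1 fraud prediction from the base models' predictions."""
    return 1 if sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5 else 0


# Hypothetical out-of-fold predictions: (cluster, tree, logistic)
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [tree for _, tree, _ in rows]  # synthetic: the tree is always right
weights = fit_meta_learner(rows, labels)
print([meta_predict(weights, r) for r in rows] == labels)
```

In this synthetic example the labels always agree with the decision tree's prediction, and the meta-learner learns to reproduce every label from the three inputs.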
Selected References
- AICPA. 2007. SAS 113 Consideration of Fraud In A Financial Statement Audit.
- Albrecht, S. and Romney, M. 1986. “Red-Flagging Management: A Validation.” Advances In Accounting, pp. 323-333.
- Bell, T.B. and J.V. Carcello. 2000. “A Decision Aid For Assessing The Likelihood of Fraudulent Financial Reporting.” Auditing: A Journal of Practice & Theory, Vol. 19, No. 1, pp. 169-184.
- Bell, T.B., S. Szykowny, and J.J. Willingham. 1993. “Assessing The Likelihood of Fraudulent Financial Reporting.” Working paper, KPMG Peat Marwick, Montvale, NJ.
- Beneish, M.D. 1997. “Detecting GAAP Violation: Implications For Assessing Earnings Management Among Firms With Extreme Financial Performance.” Journal of Accounting and Public Policy, Vol. 16: 271-309.
- Chen, H., S. Huang, and Y. Lin. 2006. “Using Artificial Neural Networks To Predict Fraud Litigation: Some Empirical Evidence From Emerging Markets.” Collected Papers of the Fifteenth Annual Research Workshop On Artificial Intelligence and Emerging Technologies In Accounting, Auditing and Tax. Editors C.E. Brown, S. Grabski, and A.A. Baldwin: 97-104.
- Conklin, D. and I.H. Witten. 1994. “Complexity-Based Induction.” Machine Learning, 16: 203-225.
- Fanning, K.M., K.O. Cogger and R. Srivastava. 1995. “Detection of Management Fraud: A Neural Network Approach.” International Journal of Intelligent Systems In Accounting, Finance, and Management, Vol. 4, pp. 113-126.
- Fanning, K.M. and K.O. Cogger. 1998. “Neural Network Detection of Management Fraud Using Published Financial Data.” International Journal of Intelligent Systems In Accounting, Finance and Management, Vol. 7, pp. 21-41.
- Feroz, E.H., T.M. Kwon, V.S. Pastena and K. Park. 2000. “The Efficacy of Red Flags In Predicting The SEC’s Targets: An Artificial Neural Networks Approach.” International Journal of Intelligent Systems in Accounting, Finance & Management, Vol. 9: 145-157.
- Gillett, P.R. and N. Uddin. 2005. “CFO Intentions of Fraudulent Financial Reporting.” Auditing: A Journal of Practice & Theory, May, Vol. 24, No. 1, pp. 55-75.
- Green, B.P. and J.H. Choi. 1997. “Assessing The Risk of Management Fraud Through Neural Network Technology.” Auditing: A Journal of Practice & Theory, Vol. 16, No. 1, Spring, pp. 14-28.
- Haendel, L. 2003. The PNC2 Agglomerative Cluster Algorithm: An Integrated Learning Algorithm For Rule Induction. 66 pages. http://www.newty.de/pnc2/PNC2.html, accessed 08/24/2006.
- Hansen, J.V., J.B. McDonald, W.F. Messier, and T.B. Bell. 1996. “A Generalized Qualitative-Response Model And The Analysis of Management Fraud.” Management Science, Vol. 42, pp. 1022-1032.
- Loebbecke, J. and J. Willingham. 1988. Review of SEC Accounting and Auditing Enforcement Releases. Working Paper.
- Loebbecke, J., Eining, M. and Willingham, J. 1989. “Auditors’ Experience With Material Irregularities: Frequency, Nature, and Detectability.” Auditing: A Journal of Practice & Theory, Spring, pp. 1-28.
- McKee, T.E. 2006. “Predicting Fraudulent Financial Reporting.” Unpublished working paper. 38 pages.
- McKee, T.E. 1995. “Predicting Bankruptcy Via Induction.” Journal Of Information Technology, Vol. 10, pp. 26-36.
- Quinlan, J.R. 1986. “Induction of Decision Trees.” Machine Learning, 1: 81-106.
- Roberts, D.M. and P.D. Wedemeyer. 1988. “Assessing The Likelihood of Financial Statement Errors Using A Discriminant Model.” Journal of Accounting Literature, Vol. 7: 133-146.
- Romney, M.B., W.S. Albrecht and D.J. Cherrington. 1980. “Red-Flagging The White Collar Criminal.” Management Accounting, May.
- SEE5. Rulequest Research Pty Ltd. www.rulequest.com.
- Standard & Poor’s Research Insight With Compustat Data. 2005. Standard & Poor’s, Centennial, CO.