Predicting the type of financial statement fraud
School of Business
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2019
Department
Major/Subject
Mcode
Degree programme
Accounting
Language
en
Pages
100
Series
Abstract
Fraud is a problem for all kinds of companies, both large and small. According to a study by the Association of Certified Fraud Examiners, fraud losses could be as much as 5% of world Gross Domestic Product, which amounts to approximately $4 trillion. Financial statement fraud is the costliest form of fraud when it occurs, with a median loss of $800,000 per case, and in 22% of financial statement fraud cases the loss exceeds $1,000,000. The problem is that the main way of detecting fraud has been whistleblowing, so there is a clear need for other effective detection methods. For financial statement fraud, one can attempt to use artificial intelligence methods to predict whether a financial statement is fraudulent. This has usually been studied using models that predict only whether the financial statement is fraudulent or not. Here the type of fraud is also studied, so that the information could be used to predict which part of the financial statement the fraud is in. We use a dataset combined from the Audit Analytics and Compustat datasets from Wharton Research Data Services. The data covers the years 1995-2016 and consists of predictive variables formed from financial statement data and other public data on the companies. Altogether there are 347 fraudulent financial statements and 58,892 non-fraudulent financial statements in the final dataset. Nine predictive models are built using regularized logistic regression and 35 predictive variables: one model for fraud as a whole and eight for different fraud types. Finally, a predictive model of fraud is built from three of the fraud types, and it is compared whether it produces better results than modelling fraud directly.
Of the 35 predictive variables, 7 appear in at least 8 of the 9 models: whether new securities were issued, the value of issued securities relative to market value, accounts receivable, accounts receivable to total assets, whether the auditor is one of the Big 4, net sales, and whether the Standard Industrial Classification code is between 3000 and 3999. The performance of the models in predicting fraud or fraud type is measured using expected relative cost of misclassification, accuracy, precision, sensitivity, receiver operating characteristic (ROC) curves and the areas under those curves. The ROC curves for fraud and for the fraud types are quite similar, as are the areas under the curves: 0.71 for fraud and 0.68 for the combination of three fraud types. The rest of the results depend on the prior probability of fraud, which is taken to be between 0.1% and 10%, and on the ratio of the cost of misclassifying fraud as non-fraud to the cost of misclassifying non-fraud as fraud, which varies between 1:1 and 100:1. The accuracy, which measures the percentage of correct classifications among all cases, is between 80% and 99% for the combination of three types and between 81% and 99% for fraud. The precision, which measures the percentage of correct fraud classifications among all predicted fraud cases, varies between 1.3% and 3.5% for fraud and between 1.4% and 4.2% for the combination of three types; these numbers are low because of the huge imbalance between fraudulent and non-fraudulent cases. The sensitivity, which measures the percentage of correct fraud classifications among all actual fraud cases, varies between 1.4% and 42% for fraud and between 1.7% and 48% for the combination of three types. The expected relative cost of misclassification for the combination of three types differs by between -3.7% and +0.05% from that for fraud, depending on the prior fraud probability and the relative costs of misclassification.
The combination of three types performs better at predicting fraud than direct fraud prediction for most combinations of prior fraud probability and relative cost of misclassification.
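The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names are hypothetical, and the expected-cost formula is an assumed form that is common in the fraud-prediction literature (prior-weighted error rates, with the cost of a missed fraud expressed as a multiple of the cost of a false alarm). The thesis may define its expected relative cost differently.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and sensitivity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                         # correct among all cases
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # correct among predicted fraud
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # correct among actual fraud
    return accuracy, precision, sensitivity


def expected_cost(prior_fraud, cost_ratio, miss_rate, false_alarm_rate):
    """Expected misclassification cost, in units of one false alarm.

    Assumed form (not taken from the thesis):
        prior * cost_ratio * P(miss | fraud) + (1 - prior) * P(alarm | non-fraud)
    where cost_ratio is the cost of a missed fraud relative to a false alarm.
    """
    return (prior_fraud * cost_ratio * miss_rate
            + (1 - prior_fraud) * false_alarm_rate)


def precision_at_prior(prior_fraud, sensitivity, false_alarm_rate):
    """Precision implied by Bayes' rule at a given prior fraud probability."""
    hits = prior_fraud * sensitivity
    alarms = hits + (1 - prior_fraud) * false_alarm_rate
    return hits / alarms if alarms else 0.0


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the effect of class imbalance.
    for prior in (0.001, 0.01, 0.1):
        print(prior, precision_at_prior(prior, sensitivity=0.4, false_alarm_rate=0.1))
```

With a fixed sensitivity and false-alarm rate, the implied precision falls sharply as the prior fraud probability decreases, which is the imbalance effect the abstract refers to.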
Thesis advisor
Myllymäki, Emma-Riikka
Keywords
fraud, financial statement, logistic regression, classification cost, accuracy, precision