Question

In: Accounting

Explain the circumstances in which the auditor can use data mining as an audit technique

Formulate audit procedures to test assertions over investment property valuation at fair value

Solutions

Expert Solution

1. Introduction
This paper analyses the use of big data techniques in auditing, and finds that the practice is not as widespread as it is in other related fields. We first introduce contemporary big data techniques and their origins in the multivariate statistical literature to help unfamiliar auditors understand the techniques. We then review existing research on big data in accounting and finance to ascertain the state of the field. Our analysis shows that – in addition to auditing – existing research on big data in accounting and finance extends across three other genealogies: (1) financial distress modelling, (2) financial fraud modelling, and (3) stock market prediction and quantitative modelling. Compared to the other three research streams, auditing is lagging behind in the use of valuable big data techniques. Anecdotal evidence from audit partners indicates that some leading firms have started to adopt big data techniques in practice; nevertheless, our literature review reveals a general consensus that big data is underutilized in auditing. A possible explanation for this trend is that auditors are reluctant to use techniques and technology that are far ahead of those adopted by their client firms (Alles, 2015). Nonetheless, the lack of progress in implementing big data techniques into auditing practice remains surprising, given that early use of random sampling auditing techniques put auditors well ahead of the practices of their client firms.

This paper contributes to bridging the gap between audit research and practice in the area of big data. We make the important point that big data techniques can be a valuable addition to the audit profession, in particular when rigorous analytical procedures are combined with audit techniques and expert judgement. Other papers have looked at the implications of clients’ growing use of big data (Appelbaum, Kogan, & Vasarhelyi, 2017) and the sources of useful big data for auditing (e.g., Vasarhelyi, Kogan, & Tuttle, 2015; Zhang, Hu, et al., 2015); our work focuses more on valuable opportunities to use contemporary big data techniques in auditing. We contribute to three research questions regarding the use of big data in auditing, raised by Appelbaum et al. (2017) and Vasarhelyi et al. (2015): “What models can be used?”, “Which of these methods are the most promising?” and “What will be the algorithms of prioritization?” We provide key information about the main big data techniques to help researchers and practitioners understand when to apply them. We also call for more research to further align theory and practice in this area; for instance, to better understand the application of big data techniques in auditing and to investigate the actual usage of big data techniques across the auditing profession as a whole.

This paper also integrates research in big data across the fields of accounting and finance. We reveal future opportunities to use big data in auditing by analyzing research conducted in related fields that have been more willing to embrace big data techniques. We offer general suggestions about combining multiple big data models with expert judgement, and we specifically recommend that the audit profession make greater use of contemporary big data models to predict financial distress and detect financial fraud.

The paper proceeds as follows. Section 2 introduces big data techniques, tracing their origins in the multivariate statistical literature and relating them to the modern mathematical statistics literature. Section 3 offers a systematic literature review of existing research on big data in accounting and finance. This section highlights how auditing substantially differs from the other major research streams. Section 4 identifies novel future research directions for using big data in auditing. Finally, Section 5 concludes the paper with important recommendations for the use of big data in auditing in the 21st century and a call for further research.

2. An introduction to big data techniques
This section presents an overview of big data and big data techniques to promote a greater understanding of their potential application. Auditors who use more advanced techniques need to understand them (Appelbaum et al., 2017). An introduction to big data provides the necessary background to present the main big data techniques available and the key information needed to determine which are appropriate in a given circumstance. Appendix A describes the main big data techniques, summarizes their key features and provides suggested references for readers who want more information.

Big data refers to structured or unstructured data sets that are commonly described according to the four Vs: Volume, Variety, Velocity, and Veracity. Volume refers to data sets that are so large that traditional tools are inadequate. Variety reflects different data formats, such as quantitative, text-based, and mixed forms, as well as images, video, and other formats. Velocity measures the frequency at which new data becomes available, which increasingly occurs at a very rapid rate. Finally, the quality and relevance of the data can change dramatically over time, which is described as its veracity. The auditing profession has a large and growing volume of data available to it, of increasing variety and veracity. Textual information obtained online is one new type of data, and we discuss this phenomenon later in the paper. Auditors also face an increasing velocity of data, particularly in the context of real-time information, and this is described in Section 4.

Big data comes in a variety of flavors – “small p, large n”, “large p, small n”, and “large p, large n”, where n refers to the number of responses and p the number of variables measured at each response. These categorizations are important because they can influence which technique is the most suitable. The big data techniques described in Appendix A are suited to different categorizations; for instance, Random Forests is particularly useful for “large p, small n” problems. High-frequency trading generates massive data sets of both high volume and high velocity, creating major challenges for data analysis. Nevertheless, such “small p, large n” problems are perhaps the easiest of the three scenarios and the analytic tools used are, in the main, adaptations of existing statistical techniques. The “large p, small n” scenario is best exemplified by genomics. A single human genome contains about 100 gigabytes of data. Essentially the data is a very long narrow matrix with each column corresponding to an individual and each row corresponding to a gene. The cost of sequencing a genome has now fallen to a point where it is possible for individuals to purchase their own genome. As a consequence, genomics is rapidly transitioning to the “large p, large n” scenario. Climate change research is another example of science at the forefront of the big data “large p, large n” scenario, with multivariate time-series collected from a world-wide grid of sites over very long time frames.

Big data also refers to the techniques and technology used to draw inferences from the variety of flavors of data. These techniques seek to infer non-linear relationships and causal effects from data that is often very sparse in information. Given the nature of the data, these techniques typically make few, if any, distributional assumptions. Computer scientists approach big data from the point of view of uncovering patterns in the complete record – this is often called the algorithmic approach. The patterns are regarded as approximations of the complexity of the data set. By comparison, statisticians are more inclined to treat the data as observations of an underlying process and to extract information and make inferences about the underlying process.

The statistical techniques used in big data necessitate more flexible models, since highly structured traditional regression models are very unlikely to fit big data well. Furthermore, the volume (as well as variety and velocity) of big data is such that it is not feasible to uncover the appropriate structure for models in many cases. The popularity of more flexible approaches dates back to Efron’s (1979) introduction of the bootstrap at a time when increasing computer power made such new techniques feasible. The bootstrap is a widely applicable statistical tool that is often used to provide accuracy estimates, such as standard errors that can be used to produce confidence intervals. Regularization is another widely used technique which imposes a complexity penalty that shrinks estimated parameters towards zero to prevent over-fitting or to solve ill-posed problems. Ridge regression, which uses an L2 penalty, was initially proposed by Hoerl and Kennard (1970); however, it has only become popular in recent decades with the advent of increased computing power. More recently, regularization techniques such as LARS (least angle regression and shrinkage), proposed by Efron, Hastie, Johnstone, and Tibshirani (2004), and Tibshirani’s (1996) Lasso (least absolute shrinkage and selection operator), which uses an L1 penalty, have become popular alternatives. The use of an L1 penalty is important because it is very effective in variable reduction and so results in sparse models that are easier to interpret. These simpler models are often easier to communicate to clients. Penalties that are a mixture of L1 and L2 are also available (Friedman, Hastie, & Tibshirani, 2010); indeed, contemporary statistics scholars continue to investigate new penalties for regularization.
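
To illustrate why the L1 penalty yields sparse models, the following minimal sketch (our own, on synthetic data with scikit-learn; the penalty weights are arbitrary, and it is not code from the paper) fits ridge and lasso regressions to the same problem and counts non-zero coefficients.

```python
# Contrast ridge (L2) and lasso (L1) regularization on synthetic data.
# The L1 penalty sets most irrelevant coefficients exactly to zero,
# producing the sparse, easier-to-interpret models described above.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 20                       # 100 observations, 20 candidate predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]          # only the first three predictors matter
y = X @ beta + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: shrinks and selects

print("non-zero ridge coefficients:", int(np.sum(ridge.coef_ != 0)))  # all 20
print("non-zero lasso coefficients:", int(np.sum(lasso.coef_ != 0)))  # typically close to 3
```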

Supervised learning develops explanatory or predictive models from data with known outcomes to apply to data with unknown outcomes. Some popular ways to conduct supervised learning include artificial neural networks, classification and regression trees (decision trees), Random Forests, Naïve Bayes, regularized regression (as mentioned above), support vector machines, and multivariate adaptive regression splines (MARS). In contrast, unsupervised learning seeks to uncover patterns in unlabeled data. Popular methods are unsupervised neural networks, latent variable models, association rules, and cluster analysis. Machine learning is an overarching term that encompasses both supervised and unsupervised learning. The techniques mentioned in this paragraph are briefly described in Appendix A.
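
The distinction can be made concrete in a few lines. The sketch below is a toy example of our own (synthetic data, scikit-learn): a supervised Random Forest is fitted to labelled data, and an unsupervised k-means clustering is applied to the same data without labels.

```python
# Supervised vs. unsupervised learning on the same synthetic data set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # known outcomes (labels)

# Supervised: learn from labelled data, then predict unknown outcomes.
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
print("predicted labels:", clf.predict(X[:5]))

# Unsupervised: no labels; search for structure (here, two clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print("cluster assignments:", km.labels_[:5])
```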

3. The use of big data in accounting and finance research
This paper offers a systematic literature review of the use of big data techniques in auditing research and practice and follows methodical steps for collecting data to arrive at a comprehensive data set of articles to include in the review. First, we searched the Social Sciences Citation Index for ‘big data’ papers, searching for articles that contained the key words “big data” or “analytics” or “data mining” in the title, abstract, or keywords. To ensure that the search was not too broad, we limited the search to articles that also contained the keywords “accounting” or “financ*” in the title, abstract, or keywords. Our search identified a total of 286 records as of November 2016. Next, we screened the resulting articles to only retain those of interest to the current research. This reduced the original article base to 45 records. Excluded articles discussed other big data and quantitative applications in the context of business decision-making (e.g., improving customer retention in financial services; see Benoit and Van den Poel (2012)). Next, we conducted further searches via cited references and Google Scholar to manually add another 47 articles into the data set. The articles were then assessed by the author team and categorized according to their main research focus. The analysis revealed four main genealogies, which we review below: (1) financial distress modelling; (2) financial fraud modelling; (3) stock market prediction and quantitative modelling; and (4) auditing. We find that there has been much progress in the first three fields, but that auditors have been slow to implement research findings into practice. We then proceed to address the lack of uptake of big data measures.

3.1. Financial distress modelling

Papers in the financial distress modelling stream use data mining techniques to detect and forecast the financial distress (or financial failure) of companies. These techniques are also of interest to auditors, as they can assist with going concern evaluations.

Multiple studies have used decision tree-based models. Sun and Li (2008) apply data mining techniques based on decision trees in order to predict financial distress. Starting with 35 financial ratios and 135 listed company-pairs, the researchers design and test a prediction model to show theoretical feasibility and practical effectiveness. Koyuncugil and Ozgulbas (2012b) use data mining methods to design a financial distress early warning system for small- to medium-sized enterprises. They test the model on over 7000 small- to medium-sized enterprises and develop a number of risk profiles, risk indicators, early warning systems, and financial road maps that can be used for mitigating financial risk. Similar work has also been undertaken by Koyuncugil and Ozgulbas (2012a) and Kim and Upneja (2014). Li, Sun, and Wu (2010) use classification and regression tree methods to estimate financial distress and failure for a sample of Chinese listed companies, while Gepp, Kumar, and Bhattacharya (2010) use US listed companies.
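
To make the decision-tree approach concrete, the following minimal sketch (synthetic data and hypothetical ratio names; not the data or code of the studies above) grows a shallow classification tree and prints its rules.

```python
# Decision-tree distress classifier in the spirit of the studies above.
# The three "ratios" and the distress rule are invented for illustration;
# a real model would be trained on audited ratios for matched company pairs.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 270                                        # e.g. 135 matched company pairs
X = rng.normal(size=(n, 3))                    # hypothetical ratio columns
y = (X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=n) < 0).astype(int)  # 1 = distressed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)
tree = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X_tr, y_tr)
print("hold-out accuracy:", round(tree.score(X_te, y_te), 2))
print(export_text(tree, feature_names=["current_ratio", "roa", "leverage"]))
```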

Chen and Du (2009) propose a different approach and apply data mining techniques in the form of neural networks to build and test financial distress prediction models. Using 37 ratios across 68 listed companies, they demonstrate the feasibility and validity of their modelling. Additional research supports their approach and suggests that neural networks perform better for financial distress modelling than decision trees and alternative approaches such as support vector machines (Geng, Bose, & Chen, 2015).

Zhou, Lu, and Fujita (2015) compare the performance of financial distress prediction models based on big data analytics versus prediction models based on predetermined models from domain professionals in accounting and finance. They find that there is no significant difference in the predictions. However, a combination of both approaches performs significantly better than each on its own (Zhou et al., 2015). Lin and McClean (2001) also find that a hybrid approach of professional judgement and data mining produces more accurate predictions. Kim and Han (2003) go one step further and argue that analyses should incorporate qualitative data mining approaches to elicit and represent expert knowledge about bankruptcy predictions from data sets such as loan management databases.
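
The hybrid finding suggests one simple way to operationalize the combination. The sketch below is our own illustration, not Zhou, Lu, and Fujita's method: it averages the distress probabilities of a fully data-driven model and a model restricted to hypothetical expert-chosen variables, all on synthetic data.

```python
# Illustrative hybrid: average the predicted distress probabilities of a
# data-driven model and a model using only expert-chosen variables.
# Data, variable choices, and the equal weighting are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.7, size=300) > 0).astype(int)

expert_vars = [0, 3, 5]                        # ratios an expert deems relevant
data_model = LogisticRegression(max_iter=1000).fit(X, y)
expert_model = LogisticRegression(max_iter=1000).fit(X[:, expert_vars], y)

p_data = data_model.predict_proba(X)[:, 1]
p_expert = expert_model.predict_proba(X[:, expert_vars])[:, 1]
p_hybrid = 0.5 * (p_data + p_expert)           # simple equal-weight combination
print("hybrid distress probabilities:", np.round(p_hybrid[:5], 2))
```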

The literature recognises that financial distress might not be limited to a company, but may also extend to corporate stakeholders. Khandani, Kim, and Lo (2010) use machine learning techniques to construct models of consumer credit risk at the level of the individual customer, rather than the corporation. They combine customer transactions and credit bureau data and are able to use machine learning to significantly improve classification rates on credit card default and delinquencies. Singh, Bozkaya, and Pentland (2015) were inspired by animal ecology studies to analyze the transactions of thousands of people; they found that individual financial outcomes are associated with spatio-temporal traits (e.g., exploration and exploitation) and that these traits are over 30% better at predicting future financial difficulties than comparable demographic models.

Auditors could harness big data techniques and methods for forecasting financial distress and, combined with their professional judgement, be better able to judge the future financial viability of a firm. This would improve the going concern evaluations required in audits by the Statement on Auditing Standards No. 59 for public companies (AICPA, 1988). Incorporating big data models should help avoid the costly error of issuing an unmodified opinion prior to bankruptcy. Read and Yezegel (2016) found that this problem is particularly apparent in non-Big 4 firms within the first five years of an audit engagement. The authors do not offer an underlying reason, but it may be that smaller audit firms are reluctant to issue modified going concern opinions early in an engagement for fear of losing clients. If this is the case, then smaller audit firms may be better able to justify modified opinions to their clients by presenting them with objective results from big data models, thereby increasing the independence of the going concern evaluations. The use of these models also represents an opportunity to increase the efficiency of the going concern evaluation part of the audit, notwithstanding the initial overhead cost of becoming familiar with big data models and techniques.

Although it is likely that the focus will be on one-year predictions that relate to going concern opinions, financial distress models could also be used for longer forecasts. These longer forecasts could be used by internal auditors who tend to have longer time-horizons than external auditors. Financial distress models that are supplemented by the opinion of the internal audit team as to the veracity of the forecasts could provide valuable information for senior management and the Board of Directors. Longer range forecasts and opinions give management more time to make strategic changes to minimize the likelihood that predicted financial distress will occur.

3.2. Financial fraud modelling

A second major research stream centers on modelling financial fraud, which can help auditors assess the risk of fraud (Bell & Carcello, 2000) when conducting fraud risk assessments. Section 200 of the Statement on Auditing Standards No. 122/123 requires that external auditors “obtain reasonable assurance about whether the financial statements as a whole are free from material misstatement, whether due to fraud or error” (AICPA, 2011). By adopting contemporary big data models, auditors could provide this assurance, notwithstanding the current debate as to the exact meaning of “reasonable assurance” (Hogan, Rezaee, Riley, & Velury, 2008).

Financial fraud is a substantial concern for organizations and economies around the world. The Association of Certified Fraud Examiners (2016) estimates that the typical organization loses 5% of revenue each year to fraud. Applying this to the Gross World Product for 2014, global fraud loss amounts to nearly 4 trillion US dollars. These numbers have prompted researchers to consider the application of big data techniques to fraud detection, prediction, and prevention. For instance, Chang et al. (2008) suggest using visual data analytics to interactively examine millions of bank wire transactions; they argue that this approach is both feasible and effective. In contrast, Abbasi, Albrecht, Vance, and Hansen (2012) model financial fraud using meta-learning, which is a specialized form of machine learning that combines the outputs of multiple machine learning techniques in a self-adaptive way to improve accuracy. They find the method to be more effective than individual approaches.
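
Meta-learning of this kind is related to stacked generalisation, and a minimal sketch can show the idea. The example below (synthetic data, scikit-learn; not the authors' code, and far simpler than their self-adaptive framework) stacks a random forest and naïve Bayes under a logistic-regression meta-model that learns how to weight their outputs.

```python
# Stacked generalisation: a second-level model learns to combine the
# outputs of several base learners. Purely illustrative synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))
y = (X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=5)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),      # meta-model over base outputs
).fit(X, y)
print("training accuracy of stacked model:", round(stack.score(X, y), 2))
```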

Other approaches use supervised neural networks (Green & Choi, 1997; Krambia‐Kapardis, Christodoulou, & Agathocleous, 2010) or unsupervised neural networks based on a growing hierarchical self-organising map (e.g., Huang, Tsaih, & Lin, 2014; Huang, Tsaih, & Yu, 2014) to build financial fraud detection models. The approach proposed by Huang, Tsaih, and Lin (2014) involves three stages: first, selecting statistically significant variables; second, clustering into small sub-groups based on the significant variables; and third, applying principal component analysis.
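
As a rough, hypothetical rendering of that three-stage structure (with k-means standing in for the growing hierarchical self-organising map, and synthetic data throughout; not the authors' implementation):

```python
# Three-stage pipeline sketch: (1) screen for statistically significant
# variables, (2) cluster observations on those variables, (3) apply
# principal component analysis to the selected variables.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 12))                 # candidate fraud indicators
y = (X[:, 0] + X[:, 4] + rng.normal(size=400) > 1).astype(int)  # 1 = fraud flag

# Stage 1: keep variables that differ significantly between the two groups.
pvals = np.array([stats.ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue
                  for j in range(X.shape[1])])
sig = np.where(pvals < 0.05)[0]

# Stage 2: cluster observations into small sub-groups on those variables.
clusters = KMeans(n_clusters=4, n_init=10, random_state=4).fit_predict(X[:, sig])

# Stage 3: principal component analysis on the selected variables.
pca = PCA(n_components=2).fit(X[:, sig])
print("significant variables:", sig.tolist())
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 2))
```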

3.3. Stock market prediction and quantitative modelling

In addition to the two research streams outlined above, a third stream is focused on stock market predictions and other quantitative modelling. This stream of research is particularly interested in predictive analysis and providing investment advice to managers and investors. Although this stream is not directly relevant to auditing, relevant lessons can be drawn from the ways in which big data techniques are applied in this area.

Chun and Kim (2004) use neural networks and case-based reasoning, applied across two markets and both passive and active trading strategies, to generate financial predictions substantially in excess of buy-and-hold returns. Lam (2004) employs neural networks and predicts market returns using financial ratios and macroeconomic variables. Chun and Park (2006) later find that a hybrid model outperforms a pure case-based reasoning approach in predicting a stock market index, although the result is not statistically significant. Equity portfolios that outperform a benchmark index portfolio have also been constructed using popularity in Google searches (Kristoufek, 2013) and changes in Google search queries (Preis, Moat, & Stanley, 2013). Guerard, Rachev, and Shao (2013) also study equity portfolio construction and Pachamanova and Fabozzi (2014) review other studies on the topic. In addition, Zhang, Hu, et al. (2015) use a genetic algorithm-based model to generate stock trading rules (quantitative investment), which outperforms both a decision tree and a Bayesian network.

Curme, Preis, Stanley, and Moat (2014) find that an increase in Google and Wikipedia searches on politics or business is related to subsequent stock market falls. Li, Ma, Wang, and Zhang (2015) use the Google search volume index as a measure of investor attention and find a significant association between the search index and trader positions and future crude oil prices. Adopting a different approach, Sun, Shen, and Cheng (2014) use individual stock transaction data to create a trading network that characterizes the trading behaviour of stock investors. They show that trading networks can be used to predict individual stock returns. Shapira, Berman, and Ben-Jacob (2014) model the stock market as a network of many investors, while Gui, Li, Cao, and Li (2014) model it as a network of communities of stocks.

Many studies have analysed news articles in order to make stock market predictions. Tetlock (2007) uses daily content from a popular Wall Street Journal column and finds that when media pessimism is high, stock prices decline but then return to fundamentals. Additionally, unusually high or low media pessimism helps predict high trading volume. Alanyali, Moat, and Preis (2013) find the daily number of mentions of a stock in the Financial Times is positively correlated with daily volume, both before and on the day of the news release. Piskorec et al. (2014) construct a news cohesiveness index based on online financial news and show that this is correlated with and driven by volatility in financial markets. Research has also examined the sentiment of news articles (Smales, 2014a, 2014b, 2015). Jensen, Ahire, and Malhotra (2013) find a significant association between firm-specific news sentiment and intraday volatility persistence, especially for bad news. Nardo, Petracco-Giudici, and Naltsidis (2016) review the literature and conclude that while there is merit in using online news to predict changes in financial markets, the gains from implementing such an approach are usually less than 5%. However, Ranco et al. (2016) find substantial benefit in coupling news sentiment with web browsing data. Some studies (Dhar, 2014; Kao et al., 2015; Zheludev, Smith, & Aste, 2014) have also incorporated non-traditional online sources of information such as social media, blogs, and forums, and proposed many questions for future research.

