Audits of Operational Spreadsheets

Ray Panko's Spreadsheet Research Website

Synopsis

In audits of operational spreadsheets, the reporting individual or organization examines real-world spreadsheets to look for and count errors. Some studies were done by commercial spreadsheet auditing consultancies, but most were done by researchers who used methodologies far less comprehensive than those of the commercial consultancies.

We will look first at the "Core 5" studies that used strong methodologies likely to catch many errors and that only reported important errors based on the spreadsheet's intended use. These studies collectively examined 55 spreadsheets. They found important errors in 91% of them.

We will then expand this to the "Known 9" studies that published their methodology in reasonably understandable form. These studies collectively examined 163 spreadsheets and found errors in 83% of them. The slightly lower rate of incorrect spreadsheets may be due to the fact that the additional four studies relied heavily on static analysis programs (also called auditing programs), which look for patterns in a spreadsheet that might indicate errors. Static analysis programs can only detect certain types of errors. When Panko and Aurigemma (2010) applied two static analysis programs to spreadsheets that had been produced in an experiment, the results were extremely disappointing. However, they did not examine the map analysis tools that most static analysis programs provide.

In general, these studies probably underestimated the number of errors substantially. In experiments in which participants attempted to identify errors in a spreadsheet, they caught only about 60% of the errors. While humans are fairly good at avoiding commission errors as they work, they are generally far less successful when they attempt to find errors. This has been studied extensively in software inspection and in proofreading.

The term "auditing" is somewhat misleading because auditors do not look at everything when they audit. They look for indications of lapses from proper practice, including errors, but they sample individual items to develop an evidence-based opinion on various issues. Only the Butler (1992) data represent a true audit in this sense. Most of the other studies used a methodology similar to software code inspection, in which they examined all formulas and presumably the data as well. However, even these studies used inspection methodologies much less comprehensive than those used in software code inspection. For instance, all but the Hicks (1995) study used a single inspector; in spreadsheet inspection experiments, single inspectors have caught only 34% to 69% of the errors in the spreadsheets they inspected.

Given these limitations, we will look at the data on audits of operational spreadsheets in a series of slices through the data.

The "Core 5" Audits that Only Reported Serious Errors and Did Not Depend Primarily on Static Analysis Programs

Given concerns about the efficacy of static analysis programs, the following table excludes studies that depended on these programs primarily. In this more limited pool, the studies share two common attributes. First, they only reported errors that were serious in the context of the spreadsheet's use. Second, they used methodologies more likely to detect errors.

Limiting reporting to more promising methodologies and serious errors led to an even higher percentage of incorrect spreadsheets--91% of the 55 spreadsheets. We consider these studies to be the core audit studies based on methodology.

Table 1: The "Core 5" Audits that Only Reported Serious Errors and Did Not Depend Primarily on Static Analysis Programs

Author(s) SSs Percent with Errors Cell Error Rate (CER) Description
Hicks (1995) 1 100% 1.2% Used three-person inspection, generally following the Fagan (1976) method used in software code inspection. One omission error would have cost more than a billion dollars.
Coopers & Lybrand (1997) 23 91% NR Accounting spreadsheets with at least 100 rows. Only reported errors that made the spreadsheet wrong by at least 5%. In accounting, such errors are material.
KPMG (1998) 22 91% NR Decision-supporting spreadsheets. Only reported errors large enough to create a danger of making the wrong decision.
Lukasik (1998) 2 100% 2.2%, 2.5% Recreated in a financial modeling program two large spreadsheets used to propose massive development plans. Both were wrong. One was off by 16%, which would have endangered the project. The other had a smaller, less serious error.
Butler (2000) 7 86% NR Government auditing process for spreadsheets required to be submitted along with excise taxes to HM Customs & Excise. Examined all formulas. Only reported errors large enough to demand a change in the submitted amount.
Serious Errors Only (weighted average) 55 91%    
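The weighted average in the bottom row of Table 1 can be reproduced directly from the table's figures. The sketch below is illustrative only; the per-study error counts are inferred by applying each reported percentage to its sample size, since the studies themselves reported percentages rather than raw counts:

```python
# Reproduce the "Core 5" weighted average from Table 1.
# Each tuple: (study, spreadsheets audited, spreadsheets with serious errors).
# Error counts are inferred from the reported percentages.
studies = [
    ("Hicks (1995)", 1, 1),
    ("Coopers & Lybrand (1997)", 23, 21),  # 91% of 23
    ("KPMG (1998)", 22, 20),               # 91% of 22
    ("Lukasik (1998)", 2, 2),
    ("Butler (2000)", 7, 6),               # 86% of 7
]

total_ss = sum(n for _, n, _ in studies)
total_err = sum(e for _, _, e in studies)

# Weighted average = total erroneous spreadsheets / total spreadsheets
print(f"{total_err}/{total_ss} = {total_err / total_ss:.0%}")  # 50/55 = 91%
```

This is why the pooled figure is a weighted average rather than a simple mean of the five percentages: larger studies such as Coopers & Lybrand contribute proportionally more.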

Audits of Operational Spreadsheets with Known Methodologies

At a broader level, we have data from nine studies that used known methodologies. This excludes a few early audits in the 1990s. This corpus includes both comprehensive inspection studies and studies that relied heavily on static analysis programs to find errors. These nine inspections audited 163 spreadsheets. They found errors in 83%, using a weighted average. This is an alarming percentage. Only two studies, in fact, found errors in fewer than 86% of the spreadsheets they examined. The nine studies used different methodologies, so they are difficult to compare in detail. However, the consistency of the results across these varied methodologies lends them robustness.

Table 2: Errors in Audits of Operational Spreadsheets with Known Methodologies

Author(s) SSs Percent with Errors Cell Error Rate (CER) Description
Hicks (1995) 1 100% 1.2% Used three-person inspection, generally following the Fagan (1976) method used in software code inspection. One omission error would have cost more than a billion dollars.
Coopers & Lybrand (1997) 23 91% NR Accounting spreadsheets with at least 100 rows. Only reported errors that made the spreadsheet wrong by at least 5%. In accounting, such errors are material.
KPMG (1998) 22 91% NR Decision-supporting spreadsheets. Only reported errors large enough to create a danger of making the wrong decision.
Lukasik (1998) 2 100% 2.2%, 2.5% Recreated in a financial modeling program two large spreadsheets used to propose massive development plans. Both were wrong. One was off by 16%, which would have endangered the project.
Butler (2000) 7 86% NR Government auditing process for spreadsheets required to be submitted along with excise taxes to HM Customs & Excise. Examined all formulas. Only reported errors large enough to demand a change in the submitted amount.
Lawrence & Lee (2004) 30 100% NR The 30 most financially significant SSs audited in the previous year by Mercer Finance & Risk Consulting, a commercial spreadsheet auditing firm.
Clermont, Hanin, & Mittermeier (2002) 3 67% 0.2% Quantitative errors only. Used a static analysis program (spreadsheet "auditing" program) exclusively. Note the low cell error rate compared to the Hicks (1995) and Lukasik (1998) studies, which used human error detection.
Powell, Baker & Lawson (2008a) 50 86% 0.9% Spreadsheets sampled from the EUSES corpus. Inspectors did not have a description of the spreadsheet's requirements, design, or flow. Depended primarily on a static analysis program to find errors, although inspectors were instructed to look at all formulas, at least briefly. Inspectors spent only 3.25 hours per spreadsheet, on average.
Powell, Baker & Lawson (2008b) 25 44% NR In this study, the auditors worked with each spreadsheet's developer/owner to understand the spreadsheet and evaluate errors. Depended primarily on a static analysis program to find errors, although inspectors were instructed to look at all formulas. Of the spreadsheets with errors, all had errors of $100,000 or more, 7 had errors of $1 million or more, 6 had errors of $10 million or more, and 1 had an error of more than $100 million.

All Studies (weighted average) 163 83%    

Experiences of Spreadsheet Auditing Consultancies

We have just looked at data from two spreadsheet auditing consultancies, which typically spend days auditing a single spreadsheet. These studies only reported errors that were serious in the context of the spreadsheet's use. In interviews by the author with the principals of two spreadsheet auditing consultancies, both reported that serious errors were very frequent and that neither had ever audited a spreadsheet model without finding an error. This has also been the author's experience in doing 30 commercial spreadsheet audits. While spreadsheets with serious errors are bad, the principals said that about five percent of the spreadsheets they audit go beyond serious errors and have "show stopper" errors.

Early Audits of Operational Spreadsheets

So far, we have not reported on four studies that were done earlier.

Table 3: Early Audits of Operational Spreadsheets

Author(s) SSs Percent with Errors Comment
Davies and Ikin (1987) 19 21% Methodology unknown. Only reported serious errors.
Cragg and King (1993) 20 25% Spreadsheets with 150 to 10,000 cells. Acknowledged that the inspection was cursory, lasting only about two hours, and probably missed many errors.
Butler (1992) 273 11% Government auditing process for spreadsheets required to be submitted along with excise taxes to HM Customs & Excise. Mostly audited spreadsheets with a static analysis program to find errors. Only looked at high-risk portions of the spreadsheet. A true audit, as opposed to a limited inspection.
Dent (1994) Unknown 30% Percentage of operational spreadsheets in an Australian mining company that contained a single type of error: formulas that had been overridden with numbers during data entry in operational use.

Research Opportunities

In future studies, methodology should be enhanced. The most recent studies depended heavily on static analysis programs to find errors. Given that static analysis programs seem to be designed to detect slip errors rather than logic errors and that most errors in development experiments appear to be logic errors, depending heavily on these programs requires careful assessment.

Future studies should spend much longer per spreadsheet. It seems best to examine whether the Fagan (1976, 1986) code inspection rules should apply to spreadsheet inspection. This includes using multiple inspectors (five is a good number for data analysis on effectiveness), an initial meeting at which someone goes over a spreadsheet module in depth for the reviewers, the limitation of inspections to modules, and the placing of strict limitations on module size, inspection rates, and inspection time. Code inspection has proven to be highly effective in finding errors in software, where commission rates are similar to those in spreadsheet development, although detection rates are lower.

Using multiple inspectors has another benefit. By considering how many errors each inspector found and how much overlap there was in their detected errors, it is possible to estimate how many errors were not found in an inspection. Not being able to do this has been a major limitation on past studies.
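One standard way to make such an estimate is the capture-recapture (Lincoln-Petersen) technique, long used in ecology and in software inspection research: if two inspectors work independently, the overlap in their findings indicates how thoroughly the error population was sampled. The sketch below uses made-up numbers for two hypothetical inspectors; it is an illustration of the technique, not a method from any of the studies above:

```python
# Capture-recapture (Lincoln-Petersen) estimate of the total number of
# errors in a spreadsheet, from two inspectors' independent findings.
# All numbers are illustrative, not from any study cited in the text.
found_by_a = 12     # errors inspector A found
found_by_b = 10     # errors inspector B found
found_by_both = 6   # overlap: errors found by both A and B

# Lincoln-Petersen estimator: total N is approximately (A * B) / overlap
estimated_total = found_by_a * found_by_b / found_by_both

# Distinct errors actually found (union of the two inspectors' findings)
found_overall = found_by_a + found_by_b - found_by_both
estimated_missed = estimated_total - found_overall

print(f"estimated total errors: {estimated_total:.0f}")      # 20
print(f"found: {found_overall}, estimated missed: {estimated_missed:.0f}")
```

The intuition is that a small overlap between two independent inspectors suggests many errors remain undiscovered, while a large overlap suggests the inspection was nearly exhaustive. With a single inspector, as in most of the audits above, no such estimate is possible.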

Annotated References


Butler, Raymond, Tax audit study, personal communications by electronic mail, August and September, 1996 and in August, 1997. Mr. Butler is Team Leader, Central Computer Audit Unit, HM Customs & Excise Computer Audit Unit. Data collected 1992.

273 spreadsheets were audited using spreadsheet auditing software. The staff used the auditing program to find core formulas that were copied. They examined the core formulas and then examined the general copying of each formula rather than examining all copied cells. They only reported errors serious enough to lead to a demand for additional tax payment. They found such errors in 10.7% of the spreadsheets.


Butler, Raymond, "Is this Spreadsheet a Tax Evader? How H. M. Customs & Excise Test Spreadsheet Applications" Proceedings of the Thirty-Third Hawaii International Conference on System Sciences, Maui, Hawaii, January 2000. (.PDF format)

Spreadsheet models are commonly used by UK Taxpayers to calculate their liabilities. The risks of error from spreadsheets have been exhaustively documented, and applications in this domain are no less at risk of error than those in any other. Officers of H. M. Customs and Excise in the United Kingdom have been performing field audits of taxpayers’ spreadsheet applications since 1985. Building on the experience gained, H. M. Customs and Excise computer auditors have developed a testing methodology and supporting audit support software for use by generalist tax inspectors. This paper briefly summarizes the audit experience, describes the methodology and outlines the results to date of a campaign of spreadsheet testing started in July of 1999.


Clermont, M., Hanin, C. & Mittermeier, R. (2000, July). A spreadsheet auditing tool evaluated in an industrial context. In Proceedings of the EuSpRIG 2000 Symposium, Spreadsheet Risks—the Hidden Corporate Gamble. (pp. 35-46). Greenwich, England: Greenwich University Press.

Relied primarily on a static analysis program to find errors.


Coopers & Lybrand in London. Description available at http://www.planningobjects.com/jungle1.htm. Contact information is available at that webpage.

The company audited 23 operational spreadsheets that had more than 150 rows. Errors in 21 (91%) made bottom-line values off by more than 5%. In accounting, these are considered to be material errors.


Cragg, P.G. & King, M. "Spreadsheet Modelling Abuse: An Opportunity for OR?" Journal of the Operational Research Society (44:8) August 1993, pp.743-752.

By Dr. Cragg: Spreadsheet models are increasingly being used in decision making within organizations. With questions about the quality of these models, an investigation was conducted into the spreadsheet practices in ten firms, with an emphasis on the process of building spreadsheet models. The study showed that spreadsheet models were usually built in an informal, iterative manner, by people from all organizational levels. These people had received very little training in the building of models, which could help explain why at least 25% of the models contained errors. Other problems were also found. It was evident that the spreadsheet practices in the firms were inadequate. There is a need for increased training as well as setting and enforcing organizational spreadsheet standards. This could provide an opportunity for OR workers if carefully handled. Although the study was viewed as exploratory, it indicated a need for further study into the effect of formal design practices on the incidence of errors and model creation time.

Our summary: Audited 20 operational models from 10 firms. 150 to 10,000 cells in size. Audits lasted only two hours each, which by program code inspection guidelines was quite short. Cragg and King acknowledged that they probably missed some errors. Five (25%) were found to have quantitative errors. Spreadsheets had been in use a median of 6 months and had an average of 7 updates. Only one had been tested by another person. All but one had been tested with data. Half had poor layout. Only a third used cell protection. Eight were for use by others, but in only two cases did others have input on the design. In 6 cases, there had been problems with the model previously. Development had been informal and iterative.


Croll, Grenville J. (2003). "A Typical Model Audit Approach: Spreadsheet Audit Methodologies in the City of London," IFIP, Integrity and Internal Control in Information Systems, Vol 124, pp. 213-219


Davies, N. & Ikin, C. "Auditing Spreadsheets," Australian Accountant, December 1987, pp. 54-56.

Audited 19 operational models from 10 different people in 10 different firms. Four (21%) had serious errors, including a $7 million error in interdivisional transfers, different exchange rates for Australian dollars in the same time period, and a negative balance for stock on hand. Thirteen had no documentation, and only ten used cell protection. Only five were free of quantitative and qualitative (potential) errors, and these five were from two of the ten participants. Yet subjects expressed strong beliefs that their spreadsheets were error free. Manual audits were "rare."


Dent, A., personal communication with the first author via electronic mail, April 2, 1995.

In an Australian mining firm, an audit found that 30% of the spreadsheets audited had been corrupted because cell protection had not been used and users had typed numbers into formula cells.


Hicks, Lee, Audit of Capital Budgeting Spreadsheet at NYNEX, personal communication via electronic mail, June 21, 1995.

A three-person code inspection of a module in a spreadsheet about to become operational for capital budgeting at NYNEX. The team found errors in 45 of 3,856 cells, for a cell error rate of 1.2%. There were 27 mechanical, 14 logic, and 2 omission errors. Two errors were not classified. One omission error would have produced an error of almost a billion dollars.


KPMG Management Consulting, "Supporting the Decision Maker - A Guide to the Value of Business Modeling," press release, July 30, 1998.

Reports on an audit of 22 spreadsheets by KPMG Management Consulting. Spreadsheets were from 21 major UK banking and financial organizations. Twenty (91%) had significant errors. 92% of the spreadsheets dealing with tax issues had significant errors; 75% had significant accounting errors, and 59% were judged to have poor design. Contact was Mr. David Parlby, Partner in charge of Business Modeling at KPMG Management Consulting.


Lawrence, Robert J. & Lee, Jasmine (2004). "Financial Modelling of Project Financing Transactions," Institute of Actuaries of Australia Financial Services Forum. The Institute of Actuaries of Australia, Level 7, Challis House, 4 Martin Place, Sydney, NSW 2000, Australia.

Examined spreadsheets created to justify project funding. The 30 most financially significant spreadsheets of this kind audited by Mercer Finance & Risk Consulting in previous year. An error of 2% in the bid price can lose the business. The average spreadsheet had 2,182 unique formulas.


Lukasik, Todd, CPS. Personal communication by e-mail, August 10, 1998.

Audited two operational spreadsheets by recreating the analyses with a financial analysis program (Alcar) and then examining the results to identify errors. Model 1: 7,027 cells, 156 errors, cell error rate (CER) of 2.2%. Model 2: 2,270 cells, 57 errors, CER of 2.5%. In Model 2, the amount of money required to fund the project was understated by 16%, making the project look substantially more attractive than it really was. This was a very serious error.


Powell, Stephen G.; Lawson, Barry; and Baker, Kenneth R. (July 11-13, 2007). “Impact of Errors in Operational Spreadsheets,” EuSpRIG Conference Proceedings, University of London, Greenwich 2007, 57-67.

Used a spreadsheet auditing methodology created by the authors. An earlier version of this methodology had been used to examine 50 spreadsheets, although in the earlier study, the authors simply audited spreadsheets without talking to the spreadsheet authors to understand the spreadsheet. In this 2007 paper, the authors did discuss the spreadsheets initially with the spreadsheets' authors. In addition, Powell, et al. discussed the issues they identified with the spreadsheet authors. Some of the issues they identified were not considered to be errors by the spreadsheet authors; others were identified as errors, but the spreadsheet authors said that these errors had zero impact. If the study had only reported spreadsheets with issues, it would have reported that 84% of spreadsheets were incorrect.