Spreadsheet Development Error Experiments

Ray Panko's Spreadsheet Research Website

Synopsis

Studies described at the Human Error website indicate that when people do simple but nontrivial cognitive actions such as writing a line of computer code, they make errors about 1% to 5% of the time. This is nearly perfect. For everyday life, this is sufficient accuracy nearly all of the time. However, spreadsheets contain chains of formulas. And an accuracy rate of 95% to 99% plummets rapidly as the number of steps in a cascade grows. For a spreadsheet of any moderate size, a 1% to 5% error rate at the cell level almost always brings bottom line errors.

This page looks at error rates during spreadsheet development. Specifically, it looks at the cell error rate (CER), which is the percentage of cells that is incorrect in the final model, when the developer declares the model finished to his or her satisfaction.

For the nine studies that collected data on errors when developers worked alone, the CERs averaged 1.1% to 5.6%. The results are consistent with error rates in other simple but nontrivial cognitive action such as writing, doing a calculation, or writing a line of software code.

Cell Error Rates (CERs)

As just noted, spreadsheet base error rates are called cell error rates (CERs). Counting CERs uses the "original sin" procedure. Cells that are incorrect only because precedent cells are incorrect are not counted as errors. Cells that are copied are called root cells. If a root cell formula is copied correct, the copy is not considered an error. However, in such cases, the denominator must be adjusted to take this into account. The basis for the CER may be all cells, value cells (numbers and formulas), or formula cells. In these cases, the cell error rate is referred to as CERA, CERV, and CERF, respectively.

Cell error rates are important because they help us estimate how likely large and even medium spreadsheets are to have an error and, with typical human error rates such as these, how many errors there are likely to be in a spreadsheets. CER is a long-term average across many people and events and will not be accurate for specific spreadsheets. However, no one is immune to making errors.

Individuals Working Alone

Table 1 shows data from studies that had individuals build models working by themselves. If experiments have treatment and control groups, only control group results are reported for comparability to other studies. The nine studies in the table collectively used 547 participants. These studies reported average error rates of 1.1% to 5.6%. Note that error rates were higher for the simple Wall task than for the more difficulty Microslo/Kooker/Galumpke task

Table 1: Summary of Results from Spreadsheet Development Experiments in which Participants Worked Alone.

Study Task SSs Percent Incorrect CER Basis
Hassinen (1988) Unknown 92 55% 4.3% CERF
Irons (2003) Wall 11 18% 1.7% CERV
Kreie (2002) posttest Purpose-Built 73 52% 2.8% Unknown
Panko & Halverson (1997) Galumpke 42 79% 5.6% CERV
Panko & Halverson (2001) Microslo 35 86% 4.6% CERV
Panko & Sprague (1998) MBA students < 150 hrs Wall 26 35% 2.1% CERV
Panko & Sprague (1998) MBA students >250 hrs Wall 17 24% 1.1% CERV
Takaki (2005) Microslo plus Wall 83   4.8% CERV
Teo & Tan (1997) Wall 168 42% 2.1% CERV
Overall   547   3.2%  

The Broader Picture

Table 2 gives summary statistics from all spreadsheet development experiments. Two studies compared individual and group development. They found that having multiple people work on a spreadsheet simultaneously reduced errors but only for group sizes larger than dyads. They also showed that having people work jointly on building a spreadsheet had a problem; when a mistake was made, the other group members were often looking away at the screen, typically to study the problem statement.

Table 2: Summary of Results from All Spreadsheet Development Experiments

Study Task SSs Percent Incorrect CER Basis
Brown and Gould (1987) Unknown 27 63%    
Hassinen (1988) Unknown 92 55% 4.3% CERF
Panko & Halverson (1997)          
Working alone Galumpke 42 79% 5.6% CERV
Working in dyads Galumpke 23 76% 3.8% CERV
Working in tetrads Galumpke 11 64% 1.9% CERV
Teo & Tan (1997) Wall 168 42% 2.1% CERV
Panko & Sprague (1998) MBA Students Wall 43 30% 1.7% CERV
Panko & Halverson (2001)          
Working alone Microslo 35 86% 4.6% CERV
Working in triads Microslo 15 27% 1.0% CERV
Kreie (2002) posttest Purpose-Built 73 52% 2.8% Unknown
Goo (2002) control          
Task 1 Juice or Loan 61 89% 4.7% CERV
Task 3 Microslo 61 82% 5.2% CERV
Irons (2003)          
Task 1 Wall 11 18% 1.7% CERV
Task 2 (4 did not finish) Ball 7 71%    
Takaki (2005) Both Microslo and Wall 83   4.8% CERV
Kruck & Sheetz (2006) Control Bids for kitchen construction 28   11.0% CERV
Panko (2007) Control Both Microslo and Wall        
Overall   697   3.9%  

Types of Errors

Panko and Aurigemma (2010) used a new taxonomy to categorize errors in spreadsheets developed during an experiment. They individually categorized errors in 39 spreadsheets, then compared their results. They jointly classified 85 of the 88 errors the same. An important conclusion is that domain-related logic errors were far more common than execution errors (slips and lapses), constituting four errors in five.

Table 3: Types of Errors in Spreadsheets Developed in an Experiment

Category Number Percent
Total Errors Jointly Classified 85 100%
Planning Errors (Mistakes) 70 82%
Domain 69 81%
Spreadsheet Expression 1 1%
Execution Errors 15 18%
Slip 6 7%
Lapse 9 11%

Source: Panko and Aurigemma (2010).

Opportunities for Research

Traditionally, spreadsheet development studies have attempted to estimate the cell error rate in spreadsheet development. The purpose was to test (and demonstrate) the presence of an error problem. This has been done, and spreadsheet CERs have been shown to be comparable to other simple but nontrivial cognitive actions such as writing, doing calculations, and writing a line of computer code.

It is time for a new round of studies that examine ways that have been proposed to reduce errors during spreadsheet development. Human error research has shown that people have poor intuition about how many errors they make, and intuitions about what will reduce errors also are suspect. Prescriptions for reducing errors should not be accepted until they have been tested and proven to be "safe and effective."

The existing experiments have often used the Wall and the Microsoft/Galumpke/Kooker tasks. It would be good to have more tasks selected to represent different conditions of interest. Developing new tasks is difficult. If too many participants are unable to complete them, little can be learned about error rates.

Furthermore, because this research is likely to be done by computer scientists who are testing tools they have developed, the past tendency of computer scientists to use poor methodology must be addressed. There must be nonconvenience samples, large sample sizes, rigorous randomization for individual participants, careful experimental design, proper statistical analysis, and the proper presentation of statistical results. In addition, the number of errors must be an important dependent variable, rather than some other measure that is good for the tool but that may or may not actually reduce errors.

Finally, something that reduces errors during development may not be good if it reduces the ability to find errors during subsequent testing. For instance, to reduce long distances for reference arrows, Raffensperger has suggested the creation of longer formulas. However, inspection success when looking for errors in formulas has been shown to fall as formula size increases [Panko 1991]. New tools, then, should be tested in both development and in implications for subsequent testing.

References

Brown, P. S., & Gould, J. D. (1987). “An Experimental Study of People Creating Spreadsheets.” ACM Transactions on Office Information Systems, 5(3), 258-272.

Butler, R. J. (2000). “Is this spreadsheet a tax evader? How HM Customs and Excise test spreadsheet applications.” In Hawaii International Conference on System Sciences. Published by the IEEE Computer Society.

Goo, Justin M. W. (2002). “The Effect of Negative Feedback on Confidence Calibration and Error Reduction in Spreadsheet Development,” Ph.D. dissertation, Communication and Information Sciences Program, University of Hawaii.

Hassinen, K. (1988) An Experimental Study of Spreadsheet Errors Made by Novice Spreadsheet Users, Department of Computer Science, University of Joensuu, P. O. Box 111, SF-80101 Joensuu, Finland.

Hassinen, K., University of Joensuu, private communication with the first author, January 1995.  Provided further data from his 1988 study, so that the cell error rate could be computed.

Irons, Richard J (2003). The Wall and the Ball: A Study of Domain Referent Spreadsheet Errors, EuSpRIG Conference Proceedings 2003, Trinity College, Dublin, July 24-25 2003, 33-43.

Kreie, Jennifer; Cronan, Timothy Paul; Pendley, John; & Renwick, Janet S. (2000) "Application Development by End-Users: Can Quality be Improved?" Decision Support Systems, 29, pp. 143-152.

Kruck, S., and S. Sheetz, “Spreadsheet accuracy theory,” Journal of Information Systems Education, 12 (2001) 93-106.

Lerch, F. J. (1988). Computerized Financial Planning: Discovering Cognitive Difficulties in Knowledge Building. Unpublished Ph.D. Dissertation, University of Michigan, Ann Arbor, MI.

Olson, J. R., & Nilsen, E. (1987-1988). Analysis of the Cognition Involved in Spreadsheet Interaction. Human-Computer Interaction, 3(4), 309-349.

Panko, Raymond R., “Two Experiments in Reducing Overconfidence in Spreadsheet Development,” Journal of Organizational and End User Computing, 19(1), January–March 2007, 1-23.

Panko, Raymond R. and Aurigemma, Salvatore, “Revising the Panko–Halverson Taxonomy of Spreadsheet Errors,” Decision Support Systems, Volume 49, Issue 2, May 2010, Pages 235-244.

Panko, Raymond R. and Halverson, Richard P., Jr., “An Experiment in Collaborative Spreadsheet Development,” 2(4) Journal of the Association for Information Systems, July 2001.

Panko, Raymond R. & Halverson, Richard Jr., “Are Two Heads Better than One? (At Reducing Errors in Spreadsheet Modeling),” Office Systems Research Journal 15(1), Spring 1997, 21-32.

Panko, R.R. and Sprague, R.H. Jr., Hitting the Wall: Errors in Developing and Code Inspecting a “Simple” Spreadsheet Model, Decision Support Systems, 22(4) (1998, April).

Raffensperger, John F. (2003) “New Guidelines for Spreadsheets,” International Journal of Business and Economics, 2(2). 141-154.

Takaki, S. T. (2005). “Self-Efficacy and Overconfidence as Contributing Factors to Spreadsheet Development Errors.” Working Paper. Information Technology Management Department, College of Business Administration, 2404 Maile Way, Honolulu, HI, 96822.

Teo, T. S. H., & Tan, M. (1997, January). “Quantitative and Qualitative Errors in Spreadsheet Development.” Proceedings of the Thirtieth Hawaii International Conference on System Sciences, Maui, Hawai`i.