Excel statistical functions: VAR and VARP improvements and pivot tables

Applies to: Microsoft Office Excel 2007

Summary

The Microsoft Office Excel statistical functions, VAR and VARP, are computed with better precision in Microsoft Office Excel 2003 and later versions of Excel. The same algorithmic improvements were made for VAR and VARP when used to summarize data for pivot tables. Unfortunately, some computational issues remain unresolved. Use extreme caution when you summarize data with VAR and VARP in Excel 2003 and later versions of Excel. This article describes the remaining problem.

You may notice different values when worksheets that were created in earlier versions of Excel are recalculated in Excel 2003 and later versions of Excel. This article describes the values that are more accurate in Excel 2003 and later versions of Excel and the values that are less accurate.

Microsoft Excel 2004 for Macintosh Information

The statistical functions in Microsoft Office Excel 2004 for Macintosh were updated using the same algorithms as Excel 2003 and later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and later versions of Excel also applies to Excel 2004 for Macintosh.

The articles on VAR and VARP describe how a two-pass procedure has replaced the "calculator formula" for computing these measures. With infinite precision arithmetic, both procedures return the same results and the calculator formula takes half the computation time. However, with finite precision arithmetic, the potential for round off errors becomes a factor. The result of using the two-pass procedure is less susceptibility to round off errors in extreme situations in Excel 2003 and later versions of Excel. However, you may not notice the effect of these improvements because for most data sets, there are not enough significant digits in the data to cause different results from the calculator formula and the two-pass procedure. VAR and VARP are always more accurate in Excel 2003 and later versions of Excel, but VAR and VARP differences between earlier versions of Excel and Excel 2003 are likely to be noticeable only when data values contain many significant digits and yet have small variation.

Microsoft recommends that you read the article about VAR because it explains modifications in the computational procedure and it also provides a worksheet for more extensive experimentation with adding a power of ten to data values. For additional information about VAR, click the following article number to view the article in the Microsoft Knowledge Base:
826112 Excel Statistical Functions: VAR

The intent of the improvements was to also improve separate code for computing VAR for summarizing data in pivot tables. This code has been improved for part of the pivot table, but problems remain unresolved for other parts. The numeric example later in this article helps clarify the situation.

Create a blank Microsoft Office Excel worksheet, copy the table that follows later in this section, and then select cell A1 in your blank Excel worksheet. Then paste the entries so that the table fills cells A1:I28 in your worksheet.
 Group Trial Value Modified Value Power of 10 1 1 1 =C2+10^\$E\$2 8 1 1 2 =C3+10^\$E\$2 All Excel versions: 1 1 3 =C4+10^\$E\$2 Var of Value Trial 1 2 4 =C5+10^\$E\$2 Group 1 2 Grand Total 1 2 5 =C6+10^\$E\$2 1 1 1 3.5 1 2 6 =C7+10^\$E\$2 2 4 4 14 2 1 2 =C8+10^\$E\$2 3 1 1 3.5 2 1 4 =C9+10^\$E\$2 Grand Total 2.5 6.25 8.35294117647059 2 1 6 =C10+10^\$E\$2 2 2 8 =C11+10^\$E\$2 Excel 2002 values with E2 set to 8: 2 2 10 =C12+10^\$E\$2 Var of Modified Value Trial 2 2 12 =C13+10^\$E\$2 Group 1 2 Grand Total 3 1 3 =C14+10^\$E\$2 1 0 0 4.8 3 1 4 =C15+10^\$E\$2 2 4 4 14.4 3 1 5 =C16+10^\$E\$2 3 0 0 1.6 3 2 6 =C17+10^\$E\$2 Grand Total 4 6 7.52941176470588 3 2 7 =C18+10^\$E\$2 3 2 8 =C19+10^\$E\$2 Excel 2003 and later versions of Excel values with E2 set to 8: Var of Modified Value Trial Values with E2 set to 8: Group 1 2 Grand Total Excel 2002 Excel 2003 and later versions 1 1 1 4.8 all =VAR(C2:C19) 7.52941176470588 8.35294117647059 2 4 4 28.7 group 1 =VAR(C2:C7) 4.8 3.5 3 1 1 1.6 group 2 =VAR(C8:C13) 14.4 14 Grand Total 7500000450000010 0 7058824164705910 group 3 =VAR(C14:C19) 1.6 3.5 trial 1 =VAR(C2:C4, C8:C10, C14:C16) 4 2.5 trial 2 =VAR(C5:C7, C11:C13, C17:C19) 6 6.25

Note After you paste this table into your new Excel worksheet, click Paste Options, and then click Match Destination Formatting. With the pasted range still selected, use one of the following methods:
• In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
• In Microsoft Office Excel 2003 and earlier versions of Excel, point to
Column on the Format menu, and then click
AutoFit Selection.
Data for pivot tables are in cells A1:D19. Columns F through I show values that are obtained in three pivot tables. The first shows VAR of Value (column C). The other two pivot table results are the same. By using this worksheet, you can create Modified Values (column D) by adding the power of 10 in cell E2 to each Value. Adding a constant to each data value will not affect variances. The second pivot table demonstrates that all variances are incorrect in Microsoft Excel 2002 and earlier versions if the data is modified by adding the large constant 10^8 to each value. This produces data with many significant digits but small variance. Such extreme data is unlikely to occur in practice, and you are unlikely to be affected by this degree of numeric imprecision. This experiment is designed to magnify round off errors. All results in this table, while incorrect, are consistent with results from computing various VARs by using the Excel VAR function in cells D23:D28.

The third pivot table shows results from Excel 2003 and later versions of Excel. The good news is that VAR has been correctly upgraded for the interior of the table, cells G22:H24. Values in these six cells agree with corresponding values in the first pivot table as they should. However, something yet unresolved is significantly wrong in the Grand Total row and Grand Total column. Cells E23:E28 show results from computing various VARs by using the Excel 2003 and later versions of Excel VAR function for Modified Values with cell E2 set to 8. These results agree with entries in the Grand Total row and Grand Total column of the first pivot table as they should.

Unfortunately, this author recommends that you ignore values in the Grand Total row and Grand Total column when you use VAR to summarize data in an Excel pivot table. For earlier versions of Excel, all entries in such pivot tables are likely to be correct provided that data values are not so extreme as to cause significant round off problems. To examine the accuracy of any VAR in the table, identify the range of data, Range, of whose VAR you want. For earlier versions of Excel, the correct value is obtained by computing DEVSQ(Range) / (COUNT(Range) – 1). For Excel 2003 and later versions of Excel, the correct value comes from VAR(Range).

The same comments and same performance issues apply to VARP. To verify the accuracy of any VARP in the table in any version, identify the range of data, Range, of whose VARP that you want. For earlier versions of Excel, the correct value is obtained by computing DEVSQ(Range) / COUNT(Range). For Excel 2003 and later versions of Excel, the correct value comes from VARP(Range).

Results in Earlier Versions of Excel

Use of the calculator formula in earlier versions of Excel makes VAR and VARP, STDEV, STDEVP more susceptible to round off errors. However, round off errors are very rarely significant enough that a typical user will notice them. These errors are likely to occur only in extreme situations, particularly when data contains many significant digits but small variations.

Results in Excel 2003 and later versions of Excel

An improved procedure that involves two passes through the data is correctly implemented for the interior of a pivot table. On the first pass, the sample mean is calculated; on the second pass, the sum of squared deviations about this sample mean is calculated. This sum is then divided by the number of observations minus 1 for VAR or by the number of observations for VARP. This procedure is generally recommended over the calculator formula to minimize the risk of round off errors.

Unfortunately, this has not been correctly implemented for the Grand Total row and Grand Total column. The author suggests an alternative mechanism for obtaining correct values of these VAR or VARPs.

Conclusions

You will occasionally see differences when you compare the values of VAR or VARP that are calculated with earlier versions of Excel to values that are calculated with Excel 2003 and later versions of Excel. In these cases, Excel 2003 and later versions of Excel will always be more accurate.

If Excel 2003 and later versions of Excel are used to summarize data with VAR or VARP in a pivot table, you can rest assured that results in the interior of the table are equally accurate or more accurate than in earlier versions of Excel. Users of Excel 2003 and later versions of Excel can ignore results in the Grand Total row and Grand Total column until computational issues with these entries are resolved.

XL2007