The computational formulas for the DVAR, DVARP, DSTDEV, and DSTDEVP functions have been improved for Excel 2003. This article discusses the effects of these improvements.
Microsoft Excel 2004 for Mac information
The statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Microsoft Office Excel 2003. Any information in this article that describes how a function works or how a function was modified for Excel 2003 also applies to Excel 2004 for Mac.
These four functions are among Excel's Database and List Management Functions. Where the functions calculate the variance or the standard deviation over a set of observations, DVAR, DVARP, DSTDEV, and DSTDEVP differ from VAR, VARP, STDEV, and STDEVP (respectively) in the way they define the set of observations.
DVAR, DVARP, DSTDEV, and DSTDEVP take as arguments a database, a field, and criteria. The functions take as observations the field values in each database record that satisfies criteria.
VAR, VARP, STDEV, and STDEVP take as arguments a set of up to 30 value arguments. Most frequently, VAR, VARP, STDEV, and STDEVP use a single value argument that corresponds to a range of cells, such as VAR(A1:B100).
Therefore, with VAR, VARP, STDEV, and STDEVP, you specify the cells that contain the values you want to include. With DVAR, DVARP, DSTDEV, and DSTDEVP, you specify a database (table), field (column) and criteria. They include only those values that occur in cells that satisfy the criteria.
DVAR(database, field, criteria)
Database is the range of cells that makes up the list or database. A database is a list of related data where the rows of related information are records, and the columns of data are fields. The first row of the list contains labels for each column.
Field indicates what column the function uses. The field argument can appear as text with the column label enclosed between quotation marks, such as "Age" or "Yield," or as a number that represents the position of the column in the list: 1 (without quotation marks) for the first column, 2 for the second column, and so on.
Criteria is the range of cells that contains the conditions you specify. You can use any range for the criteria argument if it includes at least one column label and at least one cell below the column label that specifies a condition for the column.
Note The syntax of DVARP, DSTDEV, and DSTDEVP is the same as DVAR.
The Excel Help files contain a useful example for these functions. The following example is a slightly modified version of the example in the Help files. The following example focuses on the computational aspects of the functions.
Create a blank Excel worksheet, copy the following table, select cell A1 in your blank Excel worksheet, and then click Paste on the Edit menu so that the entries in the table below fill cells A1:F15 in your worksheet.
=13 + 10^$F$12
=9 + 10^$F$12
=8 + 10^$F$12
=9 + 10^$F$12
=7 + 10^$F$12
=5 + 10^$F$12
The estimated standard deviation in the yield of apple and pear trees if the data in the database is only a sample of the total orchard population. (2.9664794)
=STDEV(D5, D6, D8, D9, D10)
Power of 10 added to Yield
The true standard deviation in the yield of apple and pear trees if the data in the database is the whole population. (2.6532998)
=STDEVP(D5, D6, D8, D9, D10)
The estimated variance in the yield of apple and pear trees if the data in the database is only a sample of the total orchard population. (8.8000000)
=VAR(D5, D6, D8, D9, D10)
The true variance in the yield of apple and pear trees if the data in the database is the whole orchard population. ( 7.0400000)
You may want to format cells A12:A15 and C12:C15 as Number with 7 decimal places and cells D5:D10 as Number with 0 (zero) decimal places.
This worksheet is designed to show the differences in behavior between Excel 2002 (and earlier) and Excel 2003. Similar worksheets in the articles for VAR, VARP, STDEV, and STDEVP give you an opportunity for a more complete investigation of these differences.
For additional information, click the following article numbers to view the articles in the Microsoft Knowledge Base:
Each of the four calls to these "D" functions in cells A12:A15 use the same database, field, and criteria. Data is in A4:E10, the specified field is "Yield" whose entries are in column D, and the criteria specified in A1:A3 is that the value in the "Tree" field must be "Apple" or "Pear". There are five values that satisfy the criteria and are in cells D5, D6, D8, D9, and D10. Entries in C12:C15 are the corresponding calls to STDEV, STDEVP, VAR, and VARP that use these five cells as arguments, for example STDEV(D5, D6, D8, D9, D10). Entries in C12:C15 are the same as those in A12:A15. If the value in cell F12 has not changed from its initial value of 0, these values appear in parentheses in the text descriptions in cells B12:B15.
If you have a version of Excel earlier than Excel 2003, there is the potential for significant round off errors in extreme situations. In cell F12, you can add a power of 10 to each Yield data value. If you add the same constant to all the Yield data values, this should not affect the values of any of the four "D" functions or their counterparts STDEV, STDEVP, VAR, and VARP. If you gradually increase the value in cell F12 to 1, 2, 3, and 4, you can see that the addition of a constant does not affect the values of any of the eight functions shown in A12:A15 and C12:C15. Then, if you continue to increase the value in F12 to 5, 6, and 7, minor round off errors occur only in DVAR and VAR. If you continue to increase the value in F12 to 8 and 9, serious round off errors occur in all eight functions. When you set F12 to 9, these entries agree with the fixed numbers that appear in D12:D15.
These round off errors may alarm you, but they also illustrate the extreme type of situation where they occur. If there are many significant digits in the data and a very small variability between the data points, these round off errors can occur. When you set F12 to 9, every value is over 1 billion, but the sample and population standard deviations are both below 3.
If you have Excel 2003, there are no round off errors when you conduct this experiment. If you set F12 to 9, the entries in D12:D15 are the values that you would have obtained with the versions of Excel earlier than Excel 2003.
Results in earlier versions of Excel
In extreme cases where there are many significant digits in the data but a small variance, the old formula leads to inaccurate results. Earlier versions of Excel use a single pass through the data to calculate the sum of the squares of the data values, the sum of the data values, and the count of the data values (sample size). Excel 2002 and earlier then combine these quantities into the formula in the Help files for VAR, VARP, STDEV, and STDEVP. These formulas are also known as "calculator formulas" because they are suitable for use on a hand calculator for small sets of numerically well-behaved data. If, for example, you had weights in pounds for a sample of 100 people and these weights were all whole numbers between 100 and 400, then the "calculator formulas" work just as effectively as a calculator in practice.
Results in Excel 2003
In Excel 2003, the procedure uses two passes through the data. On the first pass, Excel 2003 calculates the sum and count of the data values, and from these it can calculate the sample mean (average). On the second pass, Excel finds the squared difference between each data point and the sample mean, and then sums these squared differences. In the numeric examples, even a high power of 10 in cell F12 does not affect the squared differences, and the results of the second pass are independent of the entry in cell F12. Therefore, Excel 2003 gives results that are numerically more stable.
A two-pass approach gives better numeric performance of the functions DVAR, DVARP, DSTDEV, and DSTDEVP and the functions VAR, VARP, STDEV, and STDEVP in Excel 2003 than in earlier versions of Excel. The Excel 2003 results are never less accurate than the results in earlier versions of Excel.
In most practical cases, however, you are not likely see a difference between Excel 2003 results and the results in earlier versions of Excel. Typical data is not likely to exhibit the unusual behavior that the earlier experiment illustrates. Numeric instability will most likely occur in earlier versions of Excel when the data contains many significant digits and little variation between data values.
The following procedure that finds the sum of squared deviations about a sample mean
Find the sample mean.
Calculate each squared deviation.
Sum the squared deviations.
is more accurate than the alternative procedure (the "calculator formula")
Find the sum of the squares of all the observations, the sample size, and the sum of all the observations.
Compute the sum of the squares of all the observations minus ((sum of all the observations)^2)/sample size)
Because Excel 2003 replaces the one-pass procedure with the two-pass procedure that finds the sample mean on the first pass and computes the sum of squared deviations about it on the second pass, many other functions are improved in Excel 2003. A short list of such improved functions includes VAR, VARP, STDEV, STDEVP, DVAR, DVARP, DSTDEV, DSTDEVP, FORECAST, SLOPE, INTERCEPT, PEARSON, RSQ, and STEYX. Excel 2003 has similar improvements in each of the three Analysis of Variance tools in the Analysis ToolPak.
For more information about DVAR, DVARP, DSTDEV, or DSTDEVP, click Microsoft Excel Help on the Help menu, type dvar, dvarp, dstdev, or dstdevp in the Search for box in the Assistance pane, and then click Start searching to view the topic.