Microsoft Excel 2004 for Mac informationThe statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Microsoft Office Excel 2003 and in later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and for later versions of Excel also applies to Excel 2004 for Mac.
For several ATP tools, numeric performance has been improved for Excel 2003 and for later versions of Excel because the tool calls an Excel statistical function that has been improved for Excel 2003 and for later versions of Excel. In cases where results are different for earlier versions of Excel and later versions of Excel, the values for Excel 2003 and for later versions of Excel are more accurate.
Most users will not notice a difference in results between different versions of Excel. This is because differences are generally caused by round-off errors that are significant only in extreme cases. However, this article must first point out one case where differences occur because of a wrong formula in Microsoft Excel 2002 and in earlier versions of Excel. Avoid the tool in those versions.
A second example involves an incorrect formula in Excel 2002 and in earlier versions of Excel that persists in Excel 2003 and in later versions of Excel. Avoid the ATP tool in this scenario for all versions of Excel.
First, avoid the Regression tool when you have to click to check the Constant is Zero check box. This has been corrected in Excel 2003 and in later versions of Excel. You do not have to avoid the Regression tool when the Constant is Zero check box is clear (the more typical case in practice).
Second, users of all versions of Excel should avoid the ATP t-Test: Paired Two Sample for Means tool unless you can guarantee that there are no missing data observations. The tool gives inappropriate answers (or no answers at all) if there are one or more missing observations.
For more information about the ATP Matched Pairs Two Sample t-Test tool, click the following article number to view the article in the Microsoft Knowledge Base:
Separate sections are provided later in this article for individual ATP tools. Tools that are not listed have not been affected by improvements in Excel 2003 and in later versions of Excel.
ANOVA: Single Factor, Two-Factor with Replication, and Two-Factor without ReplicationEach of these three ANOVA tools has been rewritten to upgrade the computational procedure to a two-pass algorithm that is more numerically robust. These improvements are similar to improvements in statistical functions that compute sums of squared deviations about a mean (for example: VAR, STDEV, SLOPE, PEARSON).
For more information about ATP ANOVA, click the following article number to view the article in the Microsoft Knowledge Base:
CorrelationThis tool has not been changed. However, there is a small difference between the Correlation tool and the Covariance tool that persists in all versions of Excel. The Correlation tool returns a lower triangular correlation table with 1's on the diagonal and correlations off the diagonal. The tool uses CORREL to compute off-diagonal entries and fills those entries with the value that is returned by CORREL. (Therefore, if any data entry changes, no entry in the table changes. Contrast this behavior with the behavior of Covariance.)
CovarianceThis tool returns a lower triangular covariance table with variances on the diagonal and covariances off the diagonal. Cells on the diagonal contain a formula "=VARP(...)" so that if a data entry changes, the result in the table also changes. VARP has been improved for Office Excel 2003 and for later versions of Excel.
For more information about VARP, click the following article number to view the article in the Microsoft Knowledge Base:
Descriptive StatisticsThis tool calls Excel statistical functions for everything that it computes. Because VAR and STDEV are improved for Excel 2003 and for later versions of Excel, different values are possible because of round-off errors in extreme cases.
For more information about VAR, click the following article number to view the article in the Microsoft Knowledge Base:
F-Test Two-Sample for VariancesLike the Descriptive Statistics tool, this tool calls VAR. Again, different values are possible because of round-off errors in extreme cases.
Random Number GenerationThis tool fills a range with random observations. Values of these observations are put directly in the cells so that these cell values are not recomputed and replaced with fresh observations when the sheet is recalculated. The built-in RAND function in Excel, on the other hand, replaces existing random numbers with new ones every time the sheet is recalculated. You can use RAND to preserve values. To do so, copy the results in a range, and then use the Paste Special command to paste values into the same range.
The Random Number Generation (RNG) tool also produces random observations from various probability distributions whereas RAND corresponds to the single option in the tool: Uniform with range between 0 and 1. This article describes how to combine RAND with the statistical functions in Excel to generate such observations.
Therefore, in terms of functionality, you can emulate the ATP random number tool by using RAND and some ingenuity. Sometimes this is useful, particularly when many random numbers are wanted.
For Excel 2002 and earlier, both the ATP random number generator and RAND were known to perform poorly on standard tests of randomness. Performance was poor because the length of a cycle before the sequence of pseudo-random numbers starts repeating was too short. This is an issue only when many random numbers are required.
RAND has been improved for Excel 2003 and for later versions of Excel so that RAND now passes all such standard tests. RAND's sequence of random numbers will start repeating itself after more than 1 trillion numbers are generated.
For more information about RAND, click the following article number to view the article in the Microsoft Knowledge Base:
The RNG tool provides random observations from various probability distributions and the Uniform[0,1], the distribution used for outputting random numbers through RAND. The ATP tool first draws a Uniform[0,1] random number (or more than one such number) and then converts the answer into an observation from one of the following specific distributions. For the benefit of those who prefer RAND because they are going to generate many observations, this article suggests formulas that use RAND in the table below. Following the table, there are some cautionary comments about the Analysis ToolPak's normal distribution case.
|Distribution||Excel formula using RAND()|
|Bernoulli(p)||=IF(RAND() <= p, 1, 0)|
|Binomial(n,p)||=CRITBINOM(n, p, RAND())|
|Normal(mu, sigma)||=NORMINV(RAND(), mu, sigma)|
|Patterned||Not really random|
|Uniform(low, high)||= low + (high – low) * RAND()|
For observations from a discrete distribution, assume that the values are in column B and that their probabilities are in column C. One would then want to fill each row of column A with the probability of observing a value strictly less than the value in column B in that row. Assuming that there were 10 values, assume that this data is in cells A1:C10. Then, because A1 contains the probability of observing a value strictly less than the first value, it must be set to 0. You can use VLOOKUP(RAND(), A1:C10, 2); the fourth argument to VLOOKUP is optional and must be omitted or set to TRUE. The "2" means that you want to return the value in the second column (column B in this example).
The ATP uses an adaptation of the method of generating Poisson observations in Press, W.H., S.A. Teukolsky, W. T. Vetterling, and B.P. Flannery, Numerical Recipes in C, The Art of Scientific Computing, 2nd ed., Cambridge University Press, 1992, pp. 293-295. There are two methods to take easy advantage of existing Excel functions.
The first uses the observation that a POISSON random variable with mean m has a distribution that is well approximated by a BINOMIAL(n, m/n) for large n. You can then call CRITBINOM(n, m/n, RAND()). Choice of n depends on m; n greater than 1,000 times m should be large enough.
The second relates the POISSON distribution to the Exponential. If events occur according to a POISSON process at rate m per unit time, the time between events has an Exponential distribution with mean 1/m. For a POISSON observation you can take a sequence of observations from this Exponential distribution and count how many of them occur before their sum exceeds 1. To obtain an observation from this Exponential distribution, use GAMMAINV(RAND(), 1, 1/m). This method would be suitable when m is relatively close to 0.
RegressionThe Regression tool calls Excel's LINEST. The article on LINEST describes extensive improvements for Excel 2003 and for later versions of Excel.
For more information about LINEST, click the following article number to view the article in the Microsoft Knowledge Base:
- Regression Sum of Squares, r squared, and f statistic values are always incorrect for the case where the regression is forced through the origin.
For LINEST, this means "third argument set to FALSE instead set to TRUE or omitted." For the ATP tool, it means "Constant is Zero check box is checked."
- LINEST and the ATP tool are insensitive to collinearity issues. The article on LINEST discusses the computational approach to LINEST in Excel 2003 and in later versions of Excel that is designed to find collinearity or near-collinearity when it exists and to act appropriately.
The following table shows the Regression tool's output for earlier versions of Excel and for later versions of Excel with the Constant is Zero check box selected. It illustrates the first shortcoming that is mentioned earlier. In earlier versions of Excel, the Regression sum of squares is negative as is the R Square value.
|Excel 2002 and earlier versions|
|Adjusted R Square||-20.9285714|
|Excel 2003 and later versions of Excel version|
|Adjusted R Square||0.401250823|
t-Test: Paired Two Sample for MeansAs mentioned earlier, avoid this tool if there is any chance of one or more missing data values. The prototypical application of this test is an experiment with measurements on subjects Before and After a treatment (such as weights Before and After a 60-day diet plan). If there are no missing observations, the tool will behave fine. If there are different numbers of missing Before and After observations, you receive an error message and the tool will not compute anything. If there are missing observations and the numbers of missing Before and After observations are equal, the tool will return answers that contain several errors.
Standard procedure is to remove a subject from the data if either the Before or After measurement is missing and to analyze the data that contains only those subjects that have both Before and After measurements. Excel's TTEST function handles missing data according to this standard procedure.
The other two t-Test tools, Two-Sample Assuming Equal Variances and Two-Sample Assuming Unequal Variances, do not share this defect.
z-Test: Two Sample for MeansThis article noted that the normal distribution case of the random number generation tool does not call the NORMSINV function (or more precisely, NORMINV that calls NORMSINV) but has its own inferior procedure for finding normal inverse values.
The z-Test tool does call the NORMSINV function and takes advantage of improvements for Excel 2003 and for later versions of Excel.
Results in earlier versions of ExcelThere are ATP tools whose performance has been improved for Excel 2003 and for later versions of Excel because they call Excel statistical functions that have been improved for Excel 2003 and for later versions of Excel. One of these improvements to LINEST, when its third argument is set to FALSE, implies that the ATP regression tool returns incorrect results in Excel 2002 and in earlier versions of Excel when the Constant is Zero check box is selected. In other cases where Excel functions have been improved, users of earlier versions are not likely to notice differences (most of these differences involve round-off errors in extreme situations).
The three ATP ANOVA tools have been improved by editing ATP code to substitute a more numerically robust algorithm (in the same spirit as the improvement in Excel's VAR). Users of these tools in earlier versions of Excel are likely to notice differences only in extreme situations.
Warning for users of all versions: avoid the t-Test: Paired Two Sample for Means if there is the slightest chance of missing data.
Results in Excel 2003 and in later versions of ExcelSignificant improvements have been made in Excel's statistical functions. This translates into improvements in many ATP tools that call these functions. There is one ATP tool, the random number generator, that does not take advantage of an improved RAND function (because it is implemented in a self-contained manner and does not call RAND). This is unfortunate, but even more unfortunate is the special case of normally distributed random observations. The inverse normal distribution is also implemented in a self-contained manner and does not call the much improved NORMSINV function.
The following table lists ATP tools and those Excel functions that they call that have been improved for Excel 2003 and for later versions of Excel. Readers are referred to separate articles on each Excel function that is called.
|ATP tool||Excel functions that are called|
|ANOVA: Single Factor||VAR, FINV|
|ANOVA: Two-Factor With Replication||VAR, FINV|
|ANOVA: Two-Factor Without Replication||VAR, FINV|
|Descriptive Statistics||STDEV, TINV, VAR|
|F-Test Two-Sample for Variances||VAR, FINV|
|Random Number Generation|
|Rank and Percentile|
|t-Test: Paired Two Sample for Means||VAR, PEARSON, TINV|
|t-Test: Two-Sample Assuming Equal Variances||VAR, TINV|
|t-Test: Two-Sample Assuming Unequal Variances||VAR, TINV|
|z-Test: Two Sample for Means||NORMSDIST, NORMSINV|