You are currently offline, waiting for your internet to reconnect

For several ATP tools, numeric performance has been improved for Excel 2003 and for later versions of Excel because the tool calls an Excel statistical function that has been improved for Excel 2003 and for later versions of Excel. In cases where results are different for earlier versions of Excel and later versions of Excel, the values for Excel 2003 and for later versions of Excel are more accurate.

Most users will not notice a difference in results between different versions of Excel. This is because differences are generally caused by round-off errors that are significant only in extreme cases. However, this article must first point out one case where differences occur because of a wrong formula in Microsoft Excel 2002 and in earlier versions of Excel. Avoid the tool in those versions.

A second example involves an incorrect formula in Excel 2002 and in earlier versions of Excel that persists in Excel 2003 and in later versions of Excel. Avoid the ATP tool in this scenario for all versions of Excel.

First, avoid the Regression tool when you have to click to check the

Second, users of all versions of Excel should avoid the ATP t-Test: Paired Two Sample for Means tool unless you can guarantee that there are no missing data observations. The tool gives inappropriate answers (or no answers at all) if there are one or more missing observations.

For more information about the ATP Matched Pairs Two Sample t-Test tool, click the following article number to view the article in the Microsoft Knowledge Base:

829252 You may obtain incorrect results and misleading labels when you use the Analysis ToolPak t-Test in Excel

If you want to use this tool, and if there is missing data (or if there is even the chance of missing data), the TTEST function in Excel will handle it correctly.Separate sections are provided later in this article for individual ATP tools. Tools that are not listed have not been affected by improvements in Excel 2003 and in later versions of Excel.

For more information about ATP ANOVA, click the following article number to view the article in the Microsoft Knowledge Base:

829215 Description of numeric improvement in Analysis ToolPak ANOVA tools in Excel

For more information about VARP, click the following article number to view the article in the Microsoft Knowledge Base:

826393 Excel statistical functions: VARP

The Covariance tool uses COVAR to compute off-diagonal entries and fills those entries with the value that is returned by COVAR. Therefore if a data entry changes, the off-diagonal entries do not change.For more information about VAR, click the following article number to view the article in the Microsoft Knowledge Base:

826112 Excel statistical functions: VAR

The Random Number Generation (RNG) tool also produces random observations from various probability distributions whereas RAND corresponds to the single option in the tool: Uniform with range between 0 and 1. This article describes how to combine RAND with the statistical functions in Excel to generate such observations.

Therefore, in terms of functionality, you can emulate the ATP random number tool by using RAND and some ingenuity. Sometimes this is useful, particularly when many random numbers are wanted.

For Excel 2002 and earlier, both the ATP random number generator and RAND were known to perform poorly on standard tests of randomness. Performance was poor because the length of a cycle before the sequence of pseudo-random numbers starts repeating was too short. This is an issue only when many random numbers are required.

RAND has been improved for Excel 2003 and for later versions of Excel so that RAND now passes all such standard tests. RAND's sequence of random numbers will start repeating itself after more than 1 trillion numbers are generated.

For more information about RAND, click the following article number to view the article in the Microsoft Knowledge Base:

828795 Description of the RAND function in Excel

However, the ATP's separate random number generator was not upgraded. As in the version of RAND that is in Excel 2002 and in earlier versions of Excel, the ATP's separate random number generator is known to perform poorly on standard tests of randomness and to have a short repetition cycle. This has negative implications only if you require a very lengthy sequence of random numbers (for example, 1 million). The RNG tool provides random observations from various probability distributions and the Uniform[0,1], the distribution used for outputting random numbers through RAND. The ATP tool first draws a Uniform[0,1] random number (or more than one such number) and then converts the answer into an observation from one of the following specific distributions. For the benefit of those who prefer RAND because they are going to generate many observations, this article suggests formulas that use RAND in the table below. Following the table, there are some cautionary comments about the Analysis ToolPak's normal distribution case.

Distribution | Excel formula using RAND() |

Bernoulli(p) | =IF(RAND() <= p, 1, 0) |

Binomial(n,p) | =CRITBINOM(n, p, RAND()) |

Discrete | See below |

Normal(mu, sigma) | =NORMINV(RAND(), mu, sigma) |

Patterned | Not really random |

Poisson(mean) | See below |

Uniform(low, high) | = low + (high – low) * RAND() |

For observations from a discrete distribution, assume that the values are in column B and that their probabilities are in column C. One would then want to fill each row of column A with the probability of observing a value strictly less than the value in column B in that row. Assuming that there were 10 values, assume that this data is in cells A1:C10. Then, because A1 contains the probability of observing a value strictly less than the first value, it must be set to 0. You can use VLOOKUP(RAND(), A1:C10, 2); the fourth argument to VLOOKUP is optional and must be omitted or set to TRUE. The "2" means that you want to return the value in the second column (column B in this example).

The ATP uses an adaptation of the method of generating Poisson observations in Press, W.H., S.A. Teukolsky, W. T. Vetterling, and B.P. Flannery, Numerical Recipes in C, The Art of Scientific Computing, 2nd ed., Cambridge University Press, 1992, pp. 293-295. There are two methods to take easy advantage of existing Excel functions.

The first uses the observation that a POISSON random variable with mean m has a distribution that is well approximated by a BINOMIAL(n, m/n) for large n. You can then call CRITBINOM(n, m/n, RAND()). Choice of n depends on m; n greater than 1,000 times m should be large enough.

The second relates the POISSON distribution to the Exponential. If events occur according to a POISSON process at rate m per unit time, the time between events has an Exponential distribution with mean 1/m. For a POISSON observation you can take a sequence of observations from this Exponential distribution and count how many of them occur before their sum exceeds 1. To obtain an observation from this Exponential distribution, use GAMMAINV(RAND(), 1, 1/m). This method would be suitable when m is relatively close to 0.

For more information about LINEST, click the following article number to view the article in the Microsoft Knowledge Base:

828533 Description of the LINEST function in Excel

If you use Excel 2002 or an earlier version of Excel, note the same two shortcomings of the ATP Regression tool as of LINEST: - Regression Sum of Squares, r squared, and f statistic values are always incorrect for the case where the regression is forced through the origin.

For LINEST, this means "third argument set to FALSE instead set to TRUE or omitted." For the ATP tool, it means "**Constant is Zero**check box is checked." - LINEST and the ATP tool are insensitive to collinearity issues. The article on LINEST discusses the computational approach to LINEST in Excel 2003 and in later versions of Excel that is designed to find collinearity or near-collinearity when it exists and to act appropriately.

The following table shows the Regression tool's output for earlier versions of Excel and for later versions of Excel with the

X's | Y's | ||||

1 | 11 | ||||

2 | 12 | ||||

3 | 13 | ||||

Excel 2002 and earlier versions | |||||

SUMMARY OUTPUT | |||||

Regression Statistics | |||||

Multiple R | 65535 | ||||

R Square | -20.4285714 | ||||

Adjusted R Square | -20.9285714 | ||||

Standard Error | 4.629100499 | ||||

Observations | 3 | ||||

ANOVA | |||||

df | SS | MS | F | Significance F | |

Regression | 1 | -40.85714286 | -40.85714286 | -1.90666667 | #NUM! |

Residual | 2 | 42.85714286 | 21.42857143 | ||

Total | 3 | 2 | |||

Excel 2003 and later versions of Excel version | |||||

SUMMARY OUTPUT | |||||

Regression Statistics | |||||

Multiple R | 0.949342311 | ||||

R Square | 0.901250823 | ||||

Adjusted R Square | 0.401250823 | ||||

Standard Error | 4.629100499 | ||||

Observations | 3 | ||||

ANOVA | |||||

df | SS | MS | F | Significance F | |

Regression | 1 | 391.1428571 | 391.1428571 | 18.25333333 | 0.14637279 |

Residual | 2 | 42.85714286 | 21.42857143 | ||

Total | 3 | 434 |

Standard procedure is to remove a subject from the data if either the Before or After measurement is missing and to analyze the data that contains only those subjects that have both Before and After measurements. Excel's TTEST function handles missing data according to this standard procedure.

The other two t-Test tools, Two-Sample Assuming Equal Variances and Two-Sample Assuming Unequal Variances, do not share this defect.

The z-Test tool does call the NORMSINV function and takes advantage of improvements for Excel 2003 and for later versions of Excel.

The three ATP ANOVA tools have been improved by editing ATP code to substitute a more numerically robust algorithm (in the same spirit as the improvement in Excel's VAR). Users of these tools in earlier versions of Excel are likely to notice differences only in extreme situations.

Warning for users of all versions: avoid the t-Test: Paired Two Sample for Means if there is the slightest chance of missing data.

The following table lists ATP tools and those Excel functions that they call that have been improved for Excel 2003 and for later versions of Excel. Readers are referred to separate articles on each Excel function that is called.

ATP tool | Excel functions that are called |

ANOVA: Single Factor | VAR, FINV |

ANOVA: Two-Factor With Replication | VAR, FINV |

ANOVA: Two-Factor Without Replication | VAR, FINV |

Correlation | |

Covariance | |

Descriptive Statistics | STDEV, TINV, VAR |

Exponential Smoothing | |

F-Test Two-Sample for Variances | VAR, FINV |

Fourier Analysis | |

Histogram | |

Moving Average | |

Random Number Generation | |

Rank and Percentile | |

Regression | LINEST |

Sampling | RAND |

t-Test: Paired Two Sample for Means | VAR, PEARSON, TINV |

t-Test: Two-Sample Assuming Equal Variances | VAR, TINV |

t-Test: Two-Sample Assuming Unequal Variances | VAR, TINV |

z-Test: Two Sample for Means | NORMSDIST, NORMSINV |

Properties

Article ID: 829208 - Last Review: 01/22/2007 23:34:37 - Revision: 4.2

Microsoft Office Excel 2007, Microsoft Office Excel 2003, Microsoft Excel 2004 for Mac

- kbformula kbexpertisebeginner kbfunctions kbfuncstat kbinfo KB829208