Article ID: 828515 - View products that this article applies to.
The purpose of this article is to describe the HYPGEOMDIST function in Microsoft Office Excel 2003 and later, to illustrate how the function is used, and to compare the results of the HYPGEOMDIST function for Excel 2003 and later with the results of the HYPGEOMDIST function in earlier versions of Excel.
Microsoft Excel 2004 for Macintosh InformationThe statistical functions in Excel 2004 for Macintosh were updated using the same algorithms as Microsoft Office Excel 2003 and later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and later versions of Excel also applies to Excel 2004 for Macintosh.
The HYPGEOMDIST(sample_s, number_sample, population_s, number_population) function returns the probability of exactly sample_s successes in a sample of size number_sample that is drawn from a population of size number_population that is known to contain exactly population_s successes.
Note All four arguments are non-negative integers, sample_s is less than or equal to population_s, population_s is less than number_population, and number_sample is less than number_population.
Example UsageThere are fifty states in the U.S. and fifty state governors. Assume twenty-two governors are Democrats and twenty-eight governors are Republicans. If you create a committee of seven governors and you randomly select the governors who sit on the committee, you can use the HYPGEOMDIST function to answer the following questions:
Collapse this tableExpand this table
Cells B2:B9 show the probabilities of exactly d Democrats on the committee for d = 0 to 7.
The HYPGEOMDIST Help file provides a formula to compute HYPGEOMDIST in Excel 2003 and later versions of Excel (where COMBIN(n, k) returns the number of combinations of size k in a population of size n):
There is no cumulative version of HYPGEOMDIST. The cumulative probabilities are in column C of the worksheet. Therefore, the answers to the two previous questions are in cells B5 and C5, respectively.
Results in Earlier Versions of ExcelKnusel (see the "References" section) documented instances where HYPGEOMDIST does not return a numeric answer and instead yields #NUM! because of a numeric overflow. When HYPGEOMDIST returns numeric answers, the answers are correct. Therefore, HYPGEOMDIST does not suffer from round off problems. However, in some extreme cases, HYPGEOMDIST returns #NUM! when it should be able to return an answer. The following functions also exhibit this behavior in earlier versions of Excel:
(http://support.microsoft.com/kb/827459/ )Excel Statistical Functions: BINOMDIST
(http://support.microsoft.com/kb/828117/ )Excel Statistical Functions: CRITBINOM
(http://support.microsoft.com/kb/828130/ )Excel Statistical Functions: POISSON
(http://support.microsoft.com/kb/828361/ )Excel Statistical Functions: NEGBINOMDIST
#NUM! is returned only when COMBIN overflows. COMBIN is called three times in the previous formula for HYPGEOMDIST. This overflow occurs only if the first argument of COMBIN is greater than or equal to 1,030. There are no computational problems as long as this argument is less than 1,030. Because number_population is the largest first argument to COMBIN, HYPGEOMDIST returns correct results if number_population is less than 1,030.
Results in Excel 2003 and later versions of ExcelBecause an overflow causes HYPGEOMDIST to return #NUM! and HYPGEOMDIST is well-behaved when overflow does not occur, Microsoft has implemented a conditional algorithm in Excel 2003 and later versions of Excel. The conditional algorithm uses HYPGEOMDIST code from earlier versions of Excel (the computational formula that involves COMBIN that is mentioned previously) when number_population is less than 1,030. In cases where number_population is greater than or equal to 1,030, Microsoft has implemented an alternative plan. Pseudocode is provided in the "Appendix" section in this article. The alternative plan for HYPGEOMDIST is in the same spirit as the alternative plan for BINOMDIST, CRITBINOM, and NEGBINOMDIST. The plan is used to avoid the evaluation of COMBIN with a first argument that is greater than or equal to 1,030.
ConclusionsInaccuracies in earlier versions of Excel occur only when the HYPGEOMDIST formula results in a call to COMBIN with a first argument that is greater than or equal to 1,030. In these cases, HYPGEOMDIST returns #NUM! in earlier versions of Excel because COMBIN overflows.
The following functions also exhibit similar behavior in earlier versions of Excel:
AppendixThe following variables are used in the pseudocode:
Step 0: Initialization. Initialize both TotalUnscaledProbability and UnscaledResult to 0. Initialize the constant EssentiallyZero to a very small number, such as 10^(-12).
Step 1: Find M*n/N and round down to the nearest whole number, m.
Step 2: Calculate the unscaled probabilities for I is greater than m:
Step 3: Calculate unscaled probabilities for I is less than m:
Step 4: Combine the unscaled results:
Knusel, L. "On the Accuracy of Statistical Distributions in Microsoft Excel 97." Computational Statistics and Data Analysis, 1998, 26, 375-377.