This article describes the FORECAST function in Microsoft Excel. This article discusses how the function is used. This article also compares the results of FORECAST in Microsoft Office Excel 2003 and in later versions of Excel with the results of FORECAST in earlier versions of Excel.
The FORECAST(x, known_y's,known_x's) function returns the predicted value of the dependent variable (represented in the data by known_y's) for the specific value, x, of the independent variable (represented in the data by known_x's) by using a best fit (least squares) linear regression to predict y values from x values.
If you assume that data pairs are plotted in a scatter plot with x values that are measured on the horizontal axis and with y values that are measured on the vertical axis, FORECAST returns the height of the best fit regression line at the specific value x on the horizontal axis. FORECAST is the value of y that would be predicted based on both the value of x and the regression line (characterized by its slope and intercept that can be found by using Excel's SLOPE and INTERCEPT functions).
The parameter x must have a numeric value, known_y's and known_x's must be arrays or cell ranges that contain equal numbers of numeric data values.
The most common usage of FORECAST includes a specific x value plus 2 ranges of cells that contain the data, such as FORECAST(125, A1:A100, B1:B100).
Example of usage
To illustrate the FORECAST function, follow these steps:
- Create a blank Excel worksheet, and then copy the following table.
|1||= 3 + 10^$D$3||Power of 10 to add to data|
|2||=4 + 10^$D$3||0|
|3||=2 + 10^$D$3||Chosen x|
|4||=5 + 10^$D$3||6|
|5||=4+10^$D$3||Chosen x + Power of 10|
|6||=7+10^$D$3||=D5 + 10^$D$3|
|=SLOPE(A2:A7,B2:B7)||Excel 2002 and earlier|
|=INTERCEPT(A2:A7,B2:B7)||when D3 = 7.5|
|=A10 + A9*D7||when D3 = 8|
- Select cell A1 in your blank Excel worksheet, and then paste the entries so that the table fills cells A1:D13 in your Excel worksheet.
- Click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:
- In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFix Column Width.
- In Excel 2003 and in earlier versions of Excel, point to Column on the Format menu, and then click AutoFit Selection.
You may want to format cells B2:B7 as Number with 0 decimal places, and cells A9:D13 as Number with 6 decimal places.
Cells A2:A7 and B2:B7 contain both the y-values and the x-values that are used to call FORECAST in cell A10. The x value is in cell D7.
If you have a version of Excel that is earlier than Excel 2003, FORECAST may exhibit round-off errors in the earlier versions. This behavior of FORECAST has been improved for Excel 2003 and for later versions of Excel. FORECAST(x, known_y's,known_x's) is the result that is obtained when the following expression is evaluated:
INTERCEPT(known_y's,known_x's) + SLOPE(known_y's,known_x's) * x.
While the code for FORECAST has not been directly changed for Excel 2003 and for later versions of Excel, the behavior of FORECAST is improved as a result of improved code for SLOPE. The code for INTERCEPT has also not been directly changed, but it is improved also, because it also calls SLOPE.
In versions of Excel that are earlier than Excel 2003, FORECAST may exhibit round-off errors. If you add a positive constant to each of the observations in B2:B7, and add that same constant to x at the same time, it should not affect the value of FORECAST. If you were to plot x,y pairs with x on the horizontal axis and with y on the vertical axis, the data would just shift to the right if you add a positive constant to each x value. If x is shifted to the right by the same amount, the value of FORECAST should be unaffected. The best fit regression line would still have the same slope. However, the shifted data would have a different intercept.
With the default value of 0 in D3, SLOPE in A9 is 0.775280899. Cells A10:A11 show values of INTERCEPT and FORECAST. Cell A12 contains the following expression that is evaluated when you calculate FORECAST:
INTERCEPT(known_y's,known_x's) + SLOPE(known_y's,known_x's) * x
Values in cells A11 and A12 will always agree because the value in A12 is exactly what FORECAST returns. SLOPE should not vary as you add different positive constants to the known_x's.
If you increase the value in D3, a larger constant is added to B2:B7, and that same constant is added to the first argument of FORECAST. If D3 <= 7, there are no round-off errors that appear in the first six decimal places of SLOPE. As a result, INTERCEPT and FORECAST are well-behaved. Now change the value in D3 to 7.25, 7.5, 7.75, and 8. You will notice that SLOPE in A9 changes. This causes variations in the values in cells A11:A12. These values should remain constant throughout the experiment.
Cells D7:D13 show that the values that are returned by FORECAST and the values that should have been returned by FORECAST had SLOPE (and INTERCEPT) not changed. These pairs of values are shown for the cases where D3 = 7.5 and 8, respectively.
Note that round-off errors have become so severe that division by 0 occurs when D3 = 8.
Earlier versions of Excel exhibit incorrect answers in these cases because the effects of round-off errors are more profound with the computational formula that is used by these versions. Still, the cases used in this experiment could be viewed as extreme.
If you have Excel 2003 or a later version of Excel, there are not any changes in the common values in A11 and A12 if you try the experiment. However, cells D7:D13 show round-off errors that you would have obtained with earlier versions of Excel.
Results in earlier versions of Excel
The article for SLOPE describes the less numerically robust formula that is used in earlier versions of Excel. It requires only one pass through the data. It is the shortcomings of SLOPE in these earlier versions of Excel that cause FORECAST to also exhibit round-off errors in extreme cases.
Results in Excel 2003 and in later versions of Excel
SLOPE is calculated by an improved procedure in Excel 2003 and in later versions of Excel, and this improves the performance of FORECAST. This procedure requires two passes through the data. Again, the article on SLOPE describes the improvement.
Replacing a one-pass approach with a two-pass approach guarantees better numeric performance of SLOPE in Excel 2003 and in later versions of Excel, and this translates into a better performance of FORECAST. The results for Excel 2003 and for later versions of Excel will never be less accurate than results in earlier versions of Excel.
However, you are not likely to see a difference between the results in later versions of Excel and the results in earlier versions of Excel in most practical examples. This occurs because typical data is unlikely to exhibit the kind of unusual behavior that this experiment illustrates. Numeric instability is most likely to appear in earlier versions of Excel when the data contains a high number of significant digits combined with relatively little variation between data values.
The procedure of finding the sum of squared deviations about a sample mean has been improved in Excel 2003 and in later versions of Excel. The improved procedure performs the following actions:
- Finds the sample mean
- Computes each squared deviation
- Sums the squared deviations
This procedure is more accurate than the alternative procedure. The alternative procedure was frequently named the "calculator formula" because it was suitable for use of a calculator on a small number of data points. The alternative procedure would perform the following actions:
- Find the sum of squares of all observations, the sample size, and the sum of all observations
- Compute the sum of squares of all observations minus ((sum of all observations)^2)/sample size)
There are many other functions that have been improved for Excel 2003 and for later versions of Excel because this latter one-pass procedure is replaced by the two-pass procedure. The two-pass procedure finds the sample mean on the first pass and computes the sum of squared deviations about it on the second pass.
The following list is a list of the functions that have been improved in Excel 2003 and in later versions of Excel in this way:
Similar improvements were made in each of the three Analysis of Variance tools in the Analysis ToolPak.