LINEST() (also SLOPE(), INTERCEPT(), VAR(), STDEV(), LOGEST(), TREND(), FORECAST(), etc.) uses a numerically unstable algorithm. With
challenging data (such as yours), rounding error has accumulated to the point that none of its calculations (slope, intercept, etc.) can be believed. In your case, you were lucky enough to get an impossible R^2,
so that it was obvious that there was a problem. There may still be a problem even with data that give more reasonable R^2 values. These
problems with Excel's algorithms have been well documented for years
(cf. Sawitzki, 1994, "Report on the reliability of data analysis
systems" Comput. Statist. Data Anal. 18:289-301) yet Microsoft continues
to ignore them.
Harlan Grove's matrix formulation simply recreates the same problem.
DEVSQ(), COVAR(), and CORREL() are the only 2nd moment functions in
Excel that are numerically reliable. For simple linear regression, use
the following formulas instead of LINEST(), SLOPE(), INTERCEPT(), RSQ(), STEYX(), etc.
slope = COVAR(y,x)/DEVSQ(x)*COUNT(y)
intercept = AVERAGE(y) - slope*AVERAGE(x)
rsq = CORREL(y,x)^2
SSreg = rsq*DEVSQ(y)
SSresid = (1-rsq)*DEVSQ(y)
df = COUNT(y)-2
F = SSreg/SSresid*df
steyx = SQRT(SSresid/df)
se1 = steyx/SQRT(DEVSQ(x))
seb = steyx/SQRT(1/COUNT(y)+AVERAGE(x)^2/DEVSQ(x))
This approach has the added advantage over LINEST that it allows missing values in the data range. However that cuts both ways, because they
will give a wrong answer if there are data pairs where only x or y (but
not both) are missing.
Similarly, for univariate statistics use the following formulas instead
of VAR(), VARP(), STDEV(), and STDEVP()
var = DEVSQ(x)/(COUNT(x)-1)
varp = DEVSQ(x)/COUNT(x)
stdev = SQRT(var)
stdevp = SQRT(varp)
Since Microsoft has already programmed routines that would be superior
to their unstable routines, it is puzzling why they continue to maintin redundant inferior code. The unstable formulas that Excel programed are mathematically exact (with infinite precision), so my formulas will
agree with the Excel functions for non-challenging data sets. When they disagree, the dedicated Excel functions are wrong.
There is no DEVSQA function, there is no hel for VARA(), VARPA(),
STDEVA(), or STDEVPA() other than doing those calculations manually.
If you are wedded to using LINEST(), then test to see if
STDEV(x) = SQRT(DEVSQ(x)/COUNT(x))
STDEV(y) = SQRT(DEVSQ(y)/COUNT(y))
PEARSON(y,x) = CORREL(y,x)
If all three of these are approximately true (say to at least 12 figures each), then LINEST() can probably be believed for simple linear
regression. Figuring out when LINEST() can be believed for more complex models is not so simple.
Jerry
Richard Nolan wrote:
Having used LINEST for Linear regression, I think
successfully a few times, I now have a data set that
returns an r2 value of -1.18, which is not possible. I can
look at the data and tell r2 must be +, not negative.
Are there two logic problems with LINEST. (a) r2 can never
greater than +/- 1, and (b) I can see the relationship is
+, not -.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 298 |
Nodes: | 16 (2 / 14) |
Uptime: | 12:40:31 |
Calls: | 6,677 |
Files: | 12,219 |
Messages: | 5,340,800 |