MSE of a linear regression line + prediction intervals
From
Alex van der Spek@21:1/5 to
All on Tue Aug 29 14:54:18 2023
I worked out a way to compute and plot the wide and narrow prediction
intervals of a linear regression line. A self contained working plot
script goes below.
I have two questions.
1. It seems as if the MSE could have been computed by the stats command.
It don't think it is so I put in a work around which is rather convoluted.
Am I overlooking something?
2. The shading between the prediction intervals doesn't clip correctly at
the plot boundaries. I believe this is a known issue for which no work
around exists? It would be a nice to have only, currently I just plot the
lines instead of a fill between curves (which is in the example code
below).
Thanks in advance,
Alex
+++++++++++++++++++++
#!/usr/bin/env gnuplot
#Make a Weibull plot of DC motor data taken from Philips databook C18
(1986)
#Clean slate
unset multiplot
unset out
reset
set term qt size 640,480
set encoding utf8
#Inline data N=20
$MDAT << EOF
#Benard Hours Exact
0.034 1750 0.0341
0.083 2000 0.0825
0.132 2525 0.1315
0.181 2600 0.1805
0.230 2815 0.2297
EOF
#Linear fit
stats $MDAT using (log10(column(2))):(log(-log(1-column(1))))
n = STATS_records
x_mean = STATS_mean_x; y_mean = STATS_mean_y
x_std = STATS_stddev_x; y_std = STATS_stddev_y
a = STATS_slope; b = STATS_intercept
a_err = STATS_slope_err; b_err = STATS_intercept_err
#Mean Square Error
set table $MSED
plot $MDAT using (($1)-(1-exp(-exp(a*log10($2)+b))))**2
unset table
stats $MSED using 2
MSE = STATS_sum / (n-2)
#Wide prediction intervals
set table $WPINT
set xrange [200:20000]
set samples 1000
t = 3.18
plot '+' using ($1):(1-exp(-exp(a*log10($1)+b))) + t * \
sqrt(MSE * (1+1/n+(log10($1)-x_mean)**2/(n*x_std**2))): \
(1-exp(-exp(a*log10($1)+b))) - t * \
sqrt(MSE * (1+1/n+(log10($1)-x_mean)**2/(n*x_std**2))) \
with filledcurves
unset table
#Narrow prediction intervals
set table $NPINT
set xrange [200:20000]
set samples 1000
t = 3.18
plot '+' using ($1):(1-exp(-exp(a*log10($1)+b))) + t * \
sqrt(MSE * (1/n+(log10($1)-x_mean)**2/(n*x_std**2))):\
(1-exp(-exp(a*log10($1)+b))) - t * \
sqrt(MSE * (1/n+(log10($1)-x_mean)**2/(n*x_std**2))) \
with filledcurves
unset table
#Compute B10, MTBF and FR via shape (k) and scale (l) parameters
log10exp1 = log10(exp(1))
k = a * log10(exp(1))
l = exp(-b / k)
#Quantile function for Weibull distribution
Q(p, k, l) = l * (-log(1-p))**(1.0/k)
#B10 hours, 90% lives longer
B10 = Q(0.1, k, l)
#Characteristic time and failure rate
CT = Q(1-exp(-1), k, l)
FR = 1 / CT
#Mean Time Between Failures
MTBF = l * gamma(1 + 1.0/k)
#Plot specifics
set samples 100
set grid x y mx
set tics out
set xlabel 'Operating hours'
set ylabel 'Median rank'
set key top left opaque
set style fill transparent solid 0.25
#Non linear, Weibull distributed y axis, x axis log
set yrange [0.01:0.99]
set xrange [1000:7000]
set nonlinear y via log(-log(1-y)) inverse 1-exp(-exp(y))
set logscale x
set ytics (0.01, 0.02, 0.05, 0.1, 0.2, 0.5, \
"1-e^{-1}" 0.632, 0.9, 0.99)
set xtics (1000, '' 1500, 2000, 3000, 4000, 5000, 6000, 7000)
set title sprintf("B_{10}=%.fh; MTBF=%.fh; FR = %.2f/1000h", B10, MTBF, FR*1000)
plot $WPINT using 1:2:3 with filledcurves lc 2 notitle, \
$NPINT using 1:2:3 with filledcurves lc 4 notitle, \
$MDAT using 2:1 with points pt 7 ps 2 lc 8 title 'Data', \
$MDAT using 2:3 with points pt 6 ps 2 lc 8 title 'Data', \
1-exp(-1) with lines dt 2 title '{/Symbol t}', \
0.1 with lines dt 3 title 'B_{10}', \
(1-exp(-exp(a*log10(x)+b))) with lines lc 7 lw 2 title 'Fit Weibull',
\
(1-exp(-exp(log(x/l)))) with lines lc 7 dt 3 title 'Fit Exponential'
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)