• Q confidence intervals for model parameters and future predictions

    From Cosine@21:1/5 to All on Sun Apr 16 15:06:48 2023
    Hi:

    Often we want to build a model to make predictions about a population. To do that, we draw a set of samples and then estimate the parameters of the model in some sense, e.g., the least-squares sense. Having the model, we can use it to predict future outcomes.
    However, because we are dealing with random variables, the estimated model parameters have uncertainty, i.e., their values would differ if we drew another set of samples. Therefore, we need to determine confidence intervals for
    these parameters. For the same reason, a predicted future outcome also needs such an interval.

    We have explicit expressions for these confidence intervals when we fit a linear model by least squares. The question is: how do we determine these confidence intervals when using a model other than linear least squares?
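    For concreteness, the explicit least-squares expressions mentioned above might be sketched as follows. This is only an illustration on made-up simulated data (the true coefficients 2 and 3, the noise level, and the new point x0 = 5 are all invented for the example): parameter intervals come from s^2 (X'X)^-1, and the prediction interval for a new observation additionally carries the noise variance s^2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data from y = 2 + 3x + noise, purely for illustration
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance on n - p degrees of freedom
p = X.shape[1]
resid = y - X @ beta
s2 = resid @ resid / (n - p)

# Covariance of the estimates: s^2 * (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(s2 * np.diag(XtX_inv))

# 95% confidence intervals for the two parameters
tcrit = stats.t.ppf(0.975, n - p)
ci = np.column_stack([beta - tcrit * se, beta + tcrit * se])

# Prediction interval for a new observation at x0 = 5:
# adds the noise variance s^2 to the variance of the fitted mean
x0 = np.array([1.0, 5.0])
yhat = x0 @ beta
se_pred = np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
pi = (yhat - tcrit * se_pred, yhat + tcrit * se_pred)
```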

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Cosine on Mon Apr 17 17:42:45 2023
    Cosine wrote:

    [snip]

    The question is answered by the theory of maximum likelihood. You might
    find the details already worked out for some specific models. In
    particular, see https://en.wikipedia.org/wiki/Generalized_linear_model
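    As a small illustration of the maximum-likelihood route, here is a sketch for one specific model where the information is known in closed form (an exponential sample; the rate 2 and sample size are invented for the example). The observed Fisher information for the exponential rate is n / lambda^2, giving a Wald-type interval:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up sample from an exponential distribution with rate 2
n = 200
data = rng.exponential(scale=1 / 2.0, size=n)

# MLE of the rate: lambda_hat = 1 / sample mean
lam_hat = 1.0 / data.mean()

# Observed Fisher information is n / lambda^2, so the
# asymptotic standard error is lambda_hat / sqrt(n)
se = lam_hat / np.sqrt(n)

# Wald-type 95% confidence interval
z = stats.norm.ppf(0.975)
ci = (lam_hat - z * se, lam_hat + z * se)
```

    For models without closed-form information, the same idea applies with the Hessian of the log-likelihood evaluated numerically at the maximum.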

  • From Cosine@21:1/5 to All on Mon Apr 17 20:45:18 2023
    What if we use the method of cross-validation, e.g., the k-fold method?

    Then we will have k sample values for each of the parameters and for the predicted value.

    We could then calculate the sample mean and standard error of each of them to build the corresponding confidence interval.

    However, this requires the assumption that the parameter estimates and predicted values are normally or Student-t distributed.
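    The scheme described above might be sketched as follows (the data, the true slope 2, and k = 5 are all made up for illustration). Note a caveat that comes up later in this thread: the k training sets overlap heavily, so the k estimates are correlated and the naive standard error can be misleading.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Made-up data from y = 1 + 2x + noise
n, k = 100, 5
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)

# Fit the slope on each fold's training portion
idx = rng.permutation(n)
folds = np.array_split(idx, k)
slopes = []
for i in range(k):
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    X = np.column_stack([np.ones(train.size), x[train]])
    beta, *_ = np.linalg.lstsq(X, y[train], rcond=None)
    slopes.append(beta[1])
slopes = np.array(slopes)

# t-interval from the k estimates (assumes approximate normality;
# the overlapping training sets make this SE optimistic)
mean = slopes.mean()
se = slopes.std(ddof=1) / np.sqrt(k)
tcrit = stats.t.ppf(0.975, k - 1)
ci = (mean - tcrit * se, mean + tcrit * se)
```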

  • From Rich Ulrich@21:1/5 to All on Tue Apr 18 00:54:56 2023
    On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:


    [snip]

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/

    Here is a long article from a generally good site, discussing their
    own proposal and earlier ones. They use k-fold plus bootstrap, and
    aim to remove the biases in parameter estimates (and their errors)
    inherent in simple applications of k-fold or the bootstrap.

    In the early part that I read, it does mention confidence intervals
    as a product.
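    A simpler relative of what that article builds on is the ordinary case-resampling bootstrap, which gives a percentile interval without any normality assumption. A minimal sketch on made-up data (the model, sample size, and B = 2000 resamples are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data from y = 1 + 2x + noise
n, B = 100, 2000
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)

def fit_slope(xs, ys):
    X = np.column_stack([np.ones(xs.size), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return beta[1]

# Case-resampling bootstrap: refit on rows drawn with replacement
boot = np.empty(B)
for b in range(B):
    i = rng.integers(0, n, n)
    boot[b] = fit_slope(x[i], y[i])

# Percentile 95% confidence interval for the slope
ci = np.percentile(boot, [2.5, 97.5])
```

    The article's point is that naive combinations of k-fold and bootstrap carry biases; this sketch shows only the plain bootstrap baseline.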

    --
    Rich Ulrich

  • From David Jones@21:1/5 to Rich Ulrich on Tue Apr 18 08:33:45 2023
    Rich Ulrich wrote:

    On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:


    [snip]

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/

    Here is a long article from a generally good site, discussing their
    own proposal and earlier ones. They use k-fold plus bootstrap, and
    aim to remove the biases in parameter estimates (and their errors)
    inherent in simple applications of k-fold or the bootstrap.

    In the early part that I read, it does mention confidence intervals
    as a product.

    Some of the ideas here relate to the now-old idea of balanced
    bootstrapping: see
    https://mathweb.ucsd.edu/~ronspubs/90_09_bootstrap.pdf
    for example.
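    The balanced bootstrap in that reference can be sketched as follows: concatenate B copies of the index vector, permute, and slice into B resamples, so that every observation appears exactly B times in total across the resamples. The data and sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up sample, purely for illustration
n, B = 50, 1000
data = rng.normal(10.0, 2.0, n)

# Balanced bootstrap: each index occurs exactly B times overall
indices = np.tile(np.arange(n), B)
rng.shuffle(indices)
resamples = indices.reshape(B, n)

# Bootstrap distribution of the mean, and a percentile interval
means = data[resamples].mean(axis=1)
ci = np.percentile(means, [2.5, 97.5])
```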

    I have seen early work on cross-validation for model selection in
    multiple regression where a typical suggestion was to leave out 20%
    of the samples at a time, but that may relate to the context of
    overall sample size and having data that is not from designed
    experiments.

    But the joint questions of "balance" and of "designed experiments"
    raise the question of whether any of the considerations of
    partially balanced factorial designs can be employed or extended to
    provide a scheme for slicing the data into units for
    cross-validation or other analysis.

    The OP says "However, this requires the assumption that the parameter
    and predicted value are normal distributions or student distributions."
    This may indicate that the plan is to do multiple analyses on small
    sections of the data, in contrast to doing multiple analyses on
    nearly complete versions of the data where only a small part is
    left out each time. The possible benefits of either approach depend
    on what is being attempted. In theory, if all the usual assumptions
    apply, the best answers come from a single analysis of the complete
    dataset. That one contemplates doing something else suggests that
    there are worries about the assumptions: not having a fixed model
    in mind, not having Gaussian random errors, or not having
    independence between observations.

  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Apr 18 18:13:38 2023
    On Tue, 18 Apr 2023 08:33:45 -0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

    In theory, if all the usual assumptions apply, the best answers come
    from a single analysis of the complete dataset. That one contemplates
    doing something else suggests that there are worries about the
    assumptions: not having a fixed model in mind, not having Gaussian
    random errors, or not having independence between observations.

    Nicely put.

    "All the usual assumptions" must include having the proper
    model, appropriate scales of measurement, and a suitable sample.


    --
    Rich Ulrich
