Hi:
Often we want to build a model to make predictions about a
population. To do that, we need to draw a set of samples and then
estimate the parameters of the model in some sense, e.g., the
least-squares sense. Having the model, we could use it to predict
future outcomes.
However, as we are dealing with random variables, the obtained model
parameters have uncertainty, i.e., their values would be different
if we drew another set of samples to determine them. Therefore, we
need to determine the confidence intervals of these parameters. For
the same reason, the future outcome predicted by the model also needs
such a confidence interval.
We have explicit expressions for these confidence intervals when we
use the linear least-squares model. The question is, how do we
determine these confidence intervals when using a model other than
linear least squares?
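For reference, the explicit expressions in the simplest case can be written out as below. This is only a sketch for a one-predictor model y = b0 + b1*x + e with independent Gaussian errors; the symbols (s, S_xx, x_0, alpha) are notation assumed here, not taken from the post.

```latex
% s^2 = RSS / (n - 2),  S_{xx} = \sum_i (x_i - \bar{x})^2
% Confidence interval for the slope:
\hat\beta_1 \;\pm\; t_{n-2,\,1-\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}
% Prediction interval for a new observation at x_0:
\hat y_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

Dropping the leading 1 under the square root gives the interval for the mean response at x_0 rather than for a single new observation.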
What if we use the method of cross-validation, e.g., the k-fold method?
Then we will have k sample values for each of the parameters and the predicted value.
We could then calculate the sample mean and standard error for each of them to build the corresponding confidence interval.
However, this requires the assumption that the parameter estimates and the predicted values follow a normal or Student's t distribution.
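The k-fold idea described above could be sketched roughly as follows. Everything here is an illustrative assumption: the data are synthetic, `fit_slope` is a stand-in for whatever model-fitting routine is actually used, `kfold_estimates` is a hypothetical helper, and the critical value 2.776 is the two-sided 95% Student t value for 4 degrees of freedom (k = 5).

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope; a stand-in for any model-fitting routine."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def kfold_estimates(xs, ys, k, fit, seed=0):
    """Refit on each of the k training sets and return the k estimates."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out folds
    estimates = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        estimates.append(fit([xs[i] for i in train], [ys[i] for i in train]))
    return estimates

# Synthetic data, made up for illustration: y = 2x + Gaussian noise
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

k = 5
est = kfold_estimates(xs, ys, k, fit_slope)
mean = statistics.fmean(est)
se = statistics.stdev(est) / k ** 0.5  # naive standard error of the mean
T_CRIT = 2.776  # assumed: two-sided 95% t critical value, k - 1 = 4 df
ci = (mean - T_CRIT * se, mean + T_CRIT * se)
print("slope estimates:", est)
print(f"naive 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the k training sets overlap heavily, so the k estimates are not independent draws. The interval above is therefore a naive one and should not be read as having its nominal coverage.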
On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:
> What if we use the method of cross-validation, e.g., the k-fold
> method?
> Then we will have k sample values for each of the parameters and
> the predicted value.
> We could then calculate the sample mean and standard error for each
> of them to build the corresponding confidence interval.
> However, this requires the assumption that the parameter and
> predicted value are normal distributions or student distributions.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/
Here is a long article from a generally good site, discussing their
own proposal and earlier ones. They are using k-fold plus bootstrap,
and intend to remove the biases for parameter-estimates (and their
errors) inherent in the simple applications of k-fold or bootstrap.
In the early part of it that I read, it does mention CIs as a
product.
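As a point of comparison, a plain percentile bootstrap (not the combined k-fold plus bootstrap procedure of the linked article) might be sketched like this. The data are synthetic, and `fit_slope` and `bootstrap_percentile_ci` are hypothetical names standing in for the real model fit and interval routine.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope; a stand-in for any model-fitting routine."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_percentile_ci(xs, ys, fit, n_boot=2000, alpha=0.05, seed=0):
    """Resample (x, y) pairs with replacement, refit, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        stats.append(fit([xs[i] for i in sample], [ys[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]          # approximate lower quantile
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]  # approximate upper quantile
    return lo, hi

# Synthetic data, made up for illustration: y = 2x + Gaussian noise
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

lo, hi = bootstrap_percentile_ci(xs, ys, fit_slope)
print(f"bootstrap 95% percentile CI for the slope: ({lo:.3f}, {hi:.3f})")
```

The percentile interval makes no normality assumption about the estimate, but it still assumes the observations are independent, since whole (x, y) pairs are resampled.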
In theory, if all the usual assumptions apply, the best answers come
from a single analysis of the complete dataset. That one contemplates
doing something else suggests that there are worries about the
assumptions: not having a fixed model in mind, not having Gaussian
random errors, or not having independence between observations.