Hi:
How do we determine the minimal number of samples required for a
statistical experiment?
For example, we found on the internet that "N
< 30" is considered a set with a small number of samples. But how do
we decide if the number of samples is too small? For example, are 2,
3, ..., 10 samples too small? Why is that?
Is there any theory to support the decision? Likewise, what is the
theory that decides "N < 30" is a set with a small number of samples?
Next, let's consider the number of metrics (e.g., accuracy and
specificity) analyzed in the experiment. If we use too many metrics,
it would be considered that we are fishing in the dataset. But again,
how do we determine the proper number of metrics to analyze in the
experiment?
I'm about a month late to this party, but I have a couple of thoughts. See below.
On Tuesday, August 2, 2022 at 6:00:37 PM UTC-4, Cosine wrote:

Hi:

How do we determine the minimal number of samples required for a statistical experiment? For example, we found on the internet that "N < 30" is considered a set with a small number of samples. But how do we decide if the number of samples is too small? For example, are 2, 3, ..., 10 samples too small? Why is that? Any theory to support the decision? Likewise, what is the theory that decides "N < 30" is a set with a small number of samples?
Are you talking about the central limit theorem (CLT) and the so-called "rule of 30"? If so, remember that the shape of the sampling distribution of the mean depends on both the shape of the raw score (population) distribution and the sample size. If the population of raw scores is normal, the sampling distribution of the mean will be normal for any sample size (even n=1, in which case it will be an exact copy of the normal population distribution). How large n must be to ensure that the sampling distribution of the mean is approximately normal depends on how far the population departs from normality: the more skewed or heavy-tailed the population, the larger n has to be.
If this is what you were asking about, you may find some of the following discussion interesting:
https://stats.stackexchange.com/questions/2541/what-references-should-be-cited-to-support-using-30-as-a-large-enough-sample-siz
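The interplay between population shape and sample size that Bruce describes is easy to check with a quick simulation. The sketch below (plain Python, standard library only; the exponential population, the seed, and the replicate count are arbitrary illustrative choices, not anything from the thread) estimates the skewness of the sampling distribution of the mean for n = 2 and n = 30, drawing from a strongly right-skewed exponential population:

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Sample skewness: average cubed z-score."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def mean_sampling_skew(n, reps=20000):
    """Estimate the skewness of the distribution of sample means
    for samples of size n drawn from an exponential population."""
    rng = random.Random(0)  # fixed seed so the run is reproducible
    means = [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)

# The exponential population has skewness 2; in theory the mean of n
# observations has skewness 2 / sqrt(n), so it shrinks as n grows.
skew_n2 = mean_sampling_skew(2)
skew_n30 = mean_sampling_skew(30)
print(f"skewness of sample means, n=2:  {skew_n2:.3f}")
print(f"skewness of sample means, n=30: {skew_n30:.3f}")
```

With a normal population both estimates would sit near zero at any n; with a skewed population like this one, n = 30 is noticeably closer to normal than n = 2 but still not exactly normal, which is the point about "30" being a rule of thumb rather than a guarantee.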
Next, let's consider the number of metrics (e.g., accuracy and specificity) analyzed in the experiment. If we use too many metrics, it would be considered that we are fishing in the dataset. But again, how do we determine the proper number of metrics analyzed in the experiment?
You talk about accuracy and specificity. But I wonder if you are really just talking about having multiple dependent (or outcome) variables--i.e., the so-called multiplicity problem. If you are, I recommend two 2005 Lancet articles by Schulz and Grimes (links below). For me, they are two of the most thoughtful articles I have read on the multiplicity problem. HTH.
https://pubmed.ncbi.nlm.nih.gov/15866314/
https://pubmed.ncbi.nlm.nih.gov/15885299/
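The core of the multiplicity problem is a little arithmetic: with k independent tests each run at level alpha, the probability of at least one false positive under the null is 1 - (1 - alpha)^k. A plain-Python sketch (k = 10 metrics and alpha = 0.05 are just illustrative values, not from the articles above):

```python
# Family-wise error rate (FWER) for k independent tests at level alpha:
#   P(at least one false positive) = 1 - (1 - alpha)^k
alpha, k = 0.05, 10

fwer_uncorrected = 1 - (1 - alpha) ** k
# Bonferroni correction: run each of the k tests at alpha / k instead.
fwer_bonferroni = 1 - (1 - alpha / k) ** k

print(f"{k} metrics, no correction: FWER = {fwer_uncorrected:.3f}")  # 0.401
print(f"{k} metrics, Bonferroni:    FWER = {fwer_bonferroni:.3f}")   # 0.049
```

So with ten uncorrected metrics there is roughly a 40% chance of at least one spurious "significant" result, which is what "fishing" amounts to; the Bonferroni-adjusted version stays at about the nominal 5%.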
On Tue, 6 Sep 2022 12:56:51 -0700 (PDT), Bruce Weaver <bwe...@lakeheadu.ca> wrote:

--- snip ---

I'm about a month late to this party, but I have a couple of thoughts. See below.

Bruce - This does not indicate that you saw the long reply from me.
I talked about power analysis, and also the multiplicity problem. What you
add about normality is good. And the references.
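For the original sample-size question, the standard tool Rich mentions is a power analysis. A minimal sketch (plain Python, using the textbook normal-approximation formula for a two-sided two-sample t-test; the effect sizes, alpha = 0.05, and power = 0.80 are illustrative inputs, and an exact t-based calculation would give a slightly larger n):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample t-test,
    via the normal approximation: n = 2 * ((z_{1-a/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "medium" standardized effect needs far more subjects than a large one.
print(n_per_group(0.5))  # 63 per group
print(n_per_group(0.8))  # 25 per group
```

The point is that "enough samples" is not a fixed number like 30: it depends on the effect size you care about detecting, the alpha level, and the power you want, which is why the sample-size justification belongs in the study design rather than in a rule of thumb.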
On Wednesday, September 7, 2022 at 1:26:30 PM UTC-4, Rich Ulrich wrote:
On Tue, 6 Sep 2022 12:56:51 -0700 (PDT), Bruce Weaver <bwe...@lakeheadu.ca> wrote:

--- snip ---

I'm about a month late to this party, but I have a couple of thoughts. See below.

Bruce - This does not indicate that you saw the long reply from me. I talked about power analysis, and also the multiplicity problem. What you add about normality is good. And the references.
Hi Rich. I had seen your post, but clearly skimmed through it too quickly, because I missed that you had talked about multiplicity. Sorry about that.
Your comment in your later reply about writing up tests that were not in the proposal reminded me of this recent article, which I think is very good.
Hollenbeck, J. R., & Wright, P. M. (2017). Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43(1), 5-18. https://journals.sagepub.com/doi/full/10.1177/0149206316679487
I don't know if that link will work for everyone, but it might.