• Q power for the analysis of the results

    From Cosine@21:1/5 to All on Wed Feb 1 23:05:48 2023
    Hi:

    I have found that many academic journals require that submissions
    report the statistical significance (in terms of a p-value or
    confidence interval) of the results; however, it seems less common
    for a journal to require reporting the statistical power of the
    results. Why is that?

    Should a "complete" report always include both statistical
    significance (p-value or alpha) and power (1 - beta)? What is the
    "practical" meaning of a power analysis? Say, would it be possible
    for the results to be not significant but of high power? What would
    be the practical meaning of that situation?

  • From David Jones@21:1/5 to Cosine on Thu Feb 2 08:18:49 2023
    Cosine wrote:

    Hi:

    I have found that many academic journals require that submissions
    report the statistical significance (in terms of a p-value or
    confidence interval) of the results; however, it seems less common
    for a journal to require reporting the statistical power of the
    results. Why is that?

    Should a "complete" report always include both statistical
    significance (p-value or alpha) and power (1 - beta)? What is the
    "practical" meaning of a power analysis? Say, would it be possible
    for the results to be not significant but of high power? What would
    be the practical meaning of that situation?

    You should look into the similarities in the theory behind power
    analyses and confidence intervals. Specifically, not approximate
    confidence intervals, but the approach where the points in a
    confidence interval are defined to be those not rejected by a
    significance test. So low power means a wide confidence interval.

    But a power analysis may not be thought of as part of the "results"
    of an experiment, but rather as something that goes before the
    experiment, used to help decide on the design and sample size. After
    an initial experiment, you might do a power analysis to help you
    design the next one in terms of sample size, given that the power
    analysis will require assumptions or estimates of the sampling
    variation inherent in the experimental procedure.
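
    A minimal Python sketch of that kind of pre-experiment calculation,
    assuming a two-sided two-sample t-test and an illustrative guessed
    effect size (none of these numbers come from the thread):

    # Prospective power analysis for a two-sample t-test, given a guessed
    # standardized effect size (Cohen's d).  Illustrative numbers only.
    import numpy as np
    from scipy import stats

    def power_two_sample_t(d, n_per_group, alpha=0.05):
        """Power of a two-sided two-sample t-test with n_per_group per arm."""
        df = 2 * n_per_group - 2
        ncp = d * np.sqrt(n_per_group / 2)       # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
        # P(|T| > t_crit) under the noncentral-t alternative
        return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

    def n_for_power(d, target=0.80, alpha=0.05):
        """Smallest n per group reaching the target power."""
        n = 2
        while power_two_sample_t(d, n, alpha) < target:
            n += 1
        return n

    d = 0.5                      # assumed effect, from pilot data or literature
    n = n_for_power(d)
    print(f"n per group for 80% power at d = {d}: {n}")      # about 64
    print(f"power at that n: {power_two_sample_t(d, n):.3f}")

    This computes the power directly from the noncentral t distribution;
    packages such as statsmodels offer equivalent helpers.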

  • From David Duffy@21:1/5 to David Jones on Thu Feb 2 23:38:19 2023
    David Jones <dajhawkxx@nowherel.com> wrote:
    Cosine wrote:

    Hi:

    I have found that many academic journals require that submissions
    report the statistical significance (in terms of a p-value or
    confidence interval) of the results; however, it seems less common
    for a journal to require reporting the statistical power of the
    results. Why is that?

    Should a "complete" report always include both statistical
    significance (p-value or alpha) and power (1 - beta)? What is the
    "practical" meaning of a power analysis? Say, would it be possible
    for the results to be not significant but of high power? What would
    be the practical meaning of that situation?

    You should look into the similarities in the theory behind power
    analyses and confidence intervals. Specifically, not approximate
    confidence intervals, but the approach where the points in a
    confidence interval are defined to be those not rejected by a
    significance test. So low power means a wide confidence interval.

    But a power analysis may not be thought of as part of the "results"
    of an experiment, but rather as something that goes before the
    experiment, used to help decide on the design and sample size. After
    an initial experiment, you might do a power analysis to help you
    design the next one in terms of sample size, given that the power
    analysis will require assumptions or estimates of the sampling
    variation inherent in the experimental procedure.

    There are a few papers now on "posterior power" or "reproducibility
    probability", where "the power is estimated only after a statistical
    test has been performed, in order to evaluate the reproducibility of
    the test result", including:

    Goodman SN (1992). A comment on replication, p-values and evidence.
    Statistics in Medicine, 11, 875-879.
    De Martini D (2008). Reproducibility probability estimation for
    testing statistical hypotheses. Statistics & Probability Letters,
    78, 1056-1061.
    Boos DD, Stefanski LA (2011). P-value precision and reproducibility.
    The American Statistician, 65(4), 213-221.
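
    A rough Python sketch of the quantity those papers discuss, in the
    spirit of Goodman (1992): treat the observed z as if it were the
    true standardized effect and ask how often an identical replication
    (same design, same n, same alpha) would again be significant. All
    numbers are illustrative.

    # Estimated "reproducibility probability": plug the observed z in as
    # the assumed true effect and compute the power of an exact replication
    # at the same two-sided alpha.
    from scipy import stats

    def replication_probability(p_obs, alpha=0.05):
        z_obs = stats.norm.isf(p_obs / 2)    # observed |z| from a two-sided p
        z_crit = stats.norm.isf(alpha / 2)   # two-sided critical value
        # chance the replication's z lands beyond +/- z_crit when the true
        # standardized effect is taken to be z_obs
        return stats.norm.sf(z_crit - z_obs) + stats.norm.cdf(-z_crit - z_obs)

    for p in (0.05, 0.01, 0.001):
        print(f"observed p = {p}: estimated replication probability "
              f"= {replication_probability(p):.2f}")
    # observed p = 0.05  -> about 0.50
    # observed p = 0.01  -> about 0.73
    # observed p = 0.001 -> about 0.91

    A result sitting exactly at p = 0.05 gets an estimated replication
    probability of about 50%, which is the point Rich Ulrich makes later
    in the thread; the cited papers discuss how such plug-in estimates
    behave.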

  • From Rich Ulrich@21:1/5 to All on Fri Feb 3 16:00:16 2023
    On Wed, 1 Feb 2023 23:05:48 -0800 (PST), Cosine <asecant@gmail.com>
    wrote:

    Hi:

    I have found that many academic journals require that submissions
    report the statistical significance (in terms of a p-value or
    confidence interval) of the results; however, it seems less common
    for a journal to require reporting the statistical power of the
    results. Why is that?

    If you found something, obviously you had enough power.

    In the US, granting agencies such as the NIH want to hear what
    you have to say about power, to justify giving you money.

    I remember a few things relevant about power and journals.

    1970s - my stats professor told the class that The New England
    Journal of Medicine had specified 'Use /no/ p-levels' for an article
    he co-authored reporting the results of a health survey of 30,000
    people. Anything big enough to be interesting would be 'significant'.

    A number of non-interesting things would also be significant at 0.05.
    Years later, I analyzed a data set of similar size. I convinced the
    PI that the F-tests of 245 and 350 were the ones that were
    interesting. There were some ANOVA 2-way interactions at p < 0.05
    which were uninteresting -- some of them were the consequence of
    'non-linear scaling' across 3 or 4 points on a rating scale, rather
    than any inherent interaction on the latent dimension being measured.
    So we only reported p < .001, and also dwelt carefully on effect
    sizes.

    In the opposite direction -- In one study, we did report a
    predicted one-tailed result at p < 0.05 for an interaction. The good
    journal we submitted to accepted our 'one-tailed test' (both
    'one-tailed' and 'interaction' suggest ugly data-dredging) only
    because we could point to it as one of the (few) tests specified in
    advance in our research proposal.

    I liked the European standard that I heard of long ago --
    I don't know how widespread it is or was -- of reporting the
    "minimum N for which the test would be significant." I think
    that people who are not statisticians can relate to this more
    easily than to saying p < .05 and p < .001, or to exact numbers.
    An experimental result that would be significant (.05) with N=10
    is huge; one that requires N=500 is small.

    (By the way, epidemiology results often /require/ huge N's
    because of small effects as measured by variance; that's why
    their effect sizes are reported as odds ratios. Effect sizes based
    on N or power or p-level do not work well for rare outcomes.)
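
    A sketch of that "minimum significant N" convention in Python, for a
    two-sided two-sample t-test, with made-up standardized effects
    (Cohen's d) standing in for observed results:

    # Smallest n per group at which an observed standardized effect would
    # reach p < .05 in a two-sided two-sample t-test.  Effect sizes made up.
    import numpy as np
    from scipy import stats

    def min_significant_n(d, alpha=0.05, n_max=100000):
        for n in range(2, n_max):
            df = 2 * n - 2
            t = abs(d) * np.sqrt(n / 2)     # t statistic implied by d at this n
            if 2 * stats.t.sf(t, df) < alpha:
                return n
        return None

    for d in (1.5, 0.5, 0.1):
        print(f"d = {d}: minimum n per group = {min_significant_n(d)}")
    # d = 1.5 -> about 5 per group (a huge effect)
    # d = 0.5 -> about 30 per group
    # d = 0.1 -> several hundred per group (a small effect)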


    Should a "complete" report always include both statistical
    significance (p-value or alpha) and power (1 - beta)? What is the
    "practical" meaning of a power analysis? Say, would it be possible
    for the results to be not significant but of high power? What would
    be the practical meaning of that situation?

    As suggested elsewhere, high power gives a narrow confidence
    interval for the size of the actual effect in the experiment --
    usually, "very close to zero difference in means."

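    A made-up numerical illustration of the "not significant, but high
    power" case: a large study that observes only a tiny difference in
    means, so the test is far from significant but the interval hugs
    zero. (The summary statistics below are invented.)

    # "Not significant, but high power": large n, tiny observed difference.
    import numpy as np
    from scipy import stats

    m1, m2, sd, n = 0.02, 0.00, 1.0, 2000     # invented per-group summaries

    t, p = stats.ttest_ind_from_stats(m1, sd, n, m2, sd, n)
    diff = m1 - m2
    se = sd * np.sqrt(2.0 / n)
    ci = (diff - 1.96 * se, diff + 1.96 * se)

    print(f"p = {p:.2f}")                                  # about 0.53
    print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
    # about (-0.042, 0.082): the data rule out anything but a difference
    # very close to zero, which is the practical payoff of high power
    # when the result is "negative".
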
    --
    Rich Ulrich

  • From Bruce Weaver@21:1/5 to Rich Ulrich on Sat Feb 4 11:02:35 2023
    On Friday, February 3, 2023 at 4:00:24 PM UTC-5, Rich Ulrich wrote:

    If you found something, obviously you had enough power.

    Rich, a former boss of mine made a statement similar to yours in a stats book he co-authored, and I challenged it (using simulation) in this short presentation:

    https://www.researchgate.net/publication/299533433_Does_Statistical_Significance_Really_Prove_that_Power_was_Adequate

    Cheers,
    Bruce

  • From Rich Ulrich@21:1/5 to bweaver@lakeheadu.ca on Sun Feb 5 14:52:18 2023
    On Sat, 4 Feb 2023 11:02:35 -0800 (PST), Bruce Weaver
    <bweaver@lakeheadu.ca> wrote:

    On Friday, February 3, 2023 at 4:00:24 PM UTC-5, Rich Ulrich wrote:

    If you found something, obviously you had enough power.

    Rich, a former boss of mine made a statement similar to yours in a stats book he co-authored, and I challenged it (using simulation) in this short presentation:

    https://www.researchgate.net/publication/299533433_Does_Statistical_Significance_Really_Prove_that_Power_was_Adequate


    Okay, you are right -- mine was a careless statement.

    Saying 'enough power' misleads the non-statistician reader,
    who might be tempted to replicate. Pointing to 'luck' is not
    a bad idea.

    I usually tried this --
    When a study shows something /barely/ at 0.05, then the
    power for replication is pretty close to 50% and not the
    80% to 95% that most grant applications like to show.

    That's the simple logic of saying, "Any replication will come
    up weaker or stronger. If you are right at 0.05 in the first
    place, then weaker or stronger is equally likely -- 50% power,
    by definition."

    IIRC, one rule of thumb I used: the 5% critical value for a
    chi-squared test (1 d.f.) is about 4.0, and the expected test
    statistic that yields 80% power is the one that gives X2 = 8.0;
    so, roughly twice the N, since X2 (for a 2x2 contingency table,
    etc.) is linear in N. (IIRC. All from memory.)
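
    One way to check that rule of thumb in Python, reading the "expected
    X2" as the noncentrality parameter of a 1-d.f. noncentral chi-squared
    (a gloss that may not be the exact form intended):

    # Power of a 1-d.f. chi-squared test as a function of the noncentrality.
    from scipy import stats

    alpha = 0.05
    crit = stats.chi2.ppf(1 - alpha, df=1)     # about 3.84 -- the "about 4.0"

    for nc in (4.0, 8.0):
        power = stats.ncx2.sf(crit, df=1, nc=nc)
        print(f"noncentrality {nc}: power = {power:.2f}")
    # noncentrality 4.0: power ~ 0.52 (a result sitting right at the cutoff)
    # noncentrality 8.0: power ~ 0.81

    So doubling the N of a study that just reached X2 = 4 roughly
    doubles the noncentrality and lifts the power from about 50% to
    about 80%, consistent with the rule of thumb.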

    --
    Rich Ulrich
