• Q: sources of, identification of, and remedies for insignificance

    From Cosine@21:1/5 to All on Fri Jan 27 10:01:58 2023
    Hi:

    What are the potential sources of statistical insignificance?

    What are the potential means to improve it?

    Yes, it might simply reflect the truth, i.e., that the new drug is no better than the traditional one. However, it might be due to inaccurate measurement or even to the design of the experiment, e.g., not using a paired comparison.

    How do we identify the factors actually contributing to the insignificance obtained?
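
    For a concrete, if hypothetical, sketch of the design point (Python,
    made-up numbers): the very same measurements can look insignificant
    under an unpaired test but significant under a paired one, because
    pairing removes the between-subject variability.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)

      # Hypothetical data: 20 patients, each measured on both drugs.
      # A large per-patient component makes the two groups noisy,
      # while the within-patient difference stays consistent.
      patient = rng.normal(0.0, 5.0, size=20)         # between-subject spread
      old = patient + rng.normal(10.0, 1.0, size=20)  # response to old drug
      new = patient + rng.normal(10.8, 1.0, size=20)  # response to new drug

      t_unp, p_unp = stats.ttest_ind(new, old)  # ignores the pairing
      t_par, p_par = stats.ttest_rel(new, old)  # uses the pairing

      print(f"unpaired: p = {p_unp:.3f}")  # typically large (n.s.)
      print(f"paired:   p = {p_par:.3f}")  # typically small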

  • From Rich Ulrich@21:1/5 to All on Fri Jan 27 15:26:40 2023
    On Fri, 27 Jan 2023 10:01:58 -0800 (PST), Cosine <asecant@gmail.com>
    wrote:

    > Hi:
    >
    > What are the potential sources of statistical insignificance?
    >
    > What are the potential means to improve it?
    >
    > Yes, it might simply reflect the truth, i.e., that the new drug is no better than the traditional one. However, it might be due to inaccurate measurement or even to the design of the experiment, e.g., not using a paired comparison.
    >
    > How do we identify the factors actually contributing to the insignificance obtained?

    I can get an interesting question out of this, beyond agreeing
    that, Yes, you should use the right test.

    Why do the data look wrong? -- Who cleaned them?

    Fifty years ago, with info provided on 80-column cards,
    there were more sources of foul-up than there are today.

    In the 1960s, a friend worked on data for the US magazine Consumer
    Reports, on auto-repair expenses for used cars. The first check
    the investigator ran was to see whether Corvettes showed their
    notoriously high cost; they did not. SO, the data were bad. It
    turned out that the two-card format was not properly sorted: half
    the time a number did not come from the proper card and column,
    and a car's ID was seldom matched with its own data.

    Cards also allowed the error of describing the wrong columns, and
    errors that shifted everything by one column (or more). Fortran-
    style formats leave the analysis open to programmer errors that
    were often detected only by seeing bad values, or too many
    'missings'.

    Back before there was on-line data entry that checked for errors,
    data cleaning was easily 90% of the time needed for "data
    analysis." Even with relatively clean entry assured, I always
    started my analyses by looking at univariate distributions of
    EVERYTHING, to check for invalid values or outliers that would
    screw up assumptions ('equal intervals') and tests.
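
    A minimal version of that first pass, sketched in pandas (the file
    and column names here are made up):

      import pandas as pd

      df = pd.read_csv("study.csv")  # hypothetical data file

      # Univariate look at EVERYTHING before any modeling.
      print(df.describe(include="all"))

      # Range and code checks catch impossible values early.
      print((df["age"] < 0).sum(), "negative ages")  # hypothetical column
      print(df["sex"].value_counts(dropna=False))    # stray codes, NaNs

      # Crude outlier screen: anything beyond 4 SDs of its column mean.
      num = df.select_dtypes("number")
      suspect = (num - num.mean()).abs() > 4 * num.std()
      print(suspect.sum())  # count of suspect values per column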

    Given real values? Outliers can be interesting. If they screw up
    your testing, they need special handling.

    OUTLIERS.
    Ordinary 'high cholesterol' is in the hundreds. Do not put into
    your study, as an ordinary case, the subject whose CHO is 4000
    and who is expected to die before age 45 if untreated.

    Detecting the hole in the ozone layer over the southern
    hemisphere was slowed by the automated deletion of 'extreme
    values' from the regular satellite reports. Then someone looked
    at the raw values and took them seriously.
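
    One way to honor both lessons is to flag extremes for human review
    rather than delete them automatically. A sketch using a median-based
    rule (the threshold k is arbitrary):

      import numpy as np

      def flag_extremes(x, k=6.0):
          """Boolean mask of suspect values -- flag them, never delete."""
          x = np.asarray(x, dtype=float)
          med = np.median(x)
          mad = np.median(np.abs(x - med))  # robust spread estimate
          if mad == 0:
              return np.zeros(x.shape, dtype=bool)
          # 1.4826 * MAD approximates one SD for normal data.
          return np.abs(x - med) > k * 1.4826 * mad

      cho = np.array([180.0, 210.0, 195.0, 240.0, 4000.0])  # the CHO=4000 case
      print(np.flatnonzero(flag_extremes(cho)))  # -> [4]: review it, don't drop it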

    I consulted for a PI who collected data from heartbeat monitors
    that were strapped on over the chest and recorded while the
    phobic subjects went shopping. A few recordings had stretches
    of values that were HIGH, sometimes over 200 and likely to be
    wrong. I eventually produced correlations with 'everything' and
    that revealed an association with weight: It turned out that the
    straps in size-regular did not tighten up enough for subjects who
    were size-small, and COULD produce counts that were doubled.
    (Their manual did not mention that.) The PI had saved money
    by not buying a harness in the small size.
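
    That 'correlate with everything' tactic is easy to mechanize. A
    sketch (file and column names invented): build an indicator for the
    suspect recordings and correlate it against every covariate; a
    strong hit, like weight here, points at the mechanism.

      import pandas as pd

      df = pd.read_csv("monitor.csv")  # hypothetical: one row per subject

      # Indicator for recordings with implausible stretches (> 200 bpm).
      df["suspect"] = (df["max_hr"] > 200).astype(int)

      # Correlate the indicator with every numeric covariate.
      covars = df.select_dtypes("number").drop(columns=["suspect"])
      print(covars.corrwith(df["suspect"]).sort_values())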

    DATA DREDGING?
    Last year, I read about a study that leaned towards an undesirable
    degree of 'data-dredging' -- not finding what they expected in their
    large sample, the PIs pursued detailed analyses of subsamples based
    on not-quite-significant interactions. If I recall correctly, the
    first subsample, selected by age, also failed to produce
    'significance', and they chased another minor interaction. They
    eventually achieved 'significance'.

    What justified these steps in the eyes of the reviewer is that the
    PIs had, from the start, a particular biological hypothesis which
    corresponded to those interactions.
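
    When subgroup chases like that cannot be avoided, the usual guard
    is an explicit multiplicity correction. A sketch with statsmodels
    (the p-values are invented):

      from statsmodels.stats.multitest import multipletests

      # p-values from a series of post-hoc subgroup tests (invented).
      pvals = [0.04, 0.03, 0.20, 0.008, 0.15]

      reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
      for p, pa, r in zip(pvals, p_adj, reject):
          print(f"raw {p:.3f} -> Holm-adjusted {pa:.3f}  reject: {r}")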

    Hope this was interesting.

    --
    Rich Ulrich
