• Q: problems with a small-sized sample, parametric and non-parametric approaches

    From Cosine@21:1/5 to All on Sat Jan 23 03:22:54 2021
    Hi:

    When we only have a small sample, what comes to mind is to use a
    non-parametric statistical method. But does using a non-parametric
    method really solve the problem?

    Also, what are the drawbacks of using a non-parametric method when
    the problem actually has some kind of distribution?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Cosine on Sat Jan 23 15:14:22 2021
    Cosine wrote:

    Hi:

    When we only have a small sample, what comes to mind is to use a
    non-parametric statistical method. But does using a non-parametric
    method really solve the problem?

    It's all about assumptions. With a "non-parametric statistical method"
    you avoid the need to make assumptions about a particular
    distributional form, but you are still making assumptions, typically
    that you have observations that are statistically independent in some
    respect.


    Also, what are the drawbacks of using a non-parametric
    method when the problem actually has some kind of distribution?

    Again, assumptions. If your assumption of a particular distribution
    happens to be true, then you have an analysis that is better (in some
    sense) than a non-parametric one. The opposite also holds: if your
    assumption of a distribution is wrong, then your analysis may be worse
    than a non-parametric one ... and if the assumption is only slightly
    wrong, the analysis may still be better than a non-parametric one.

    If the task is estimation, analyses based on different sets of
    assumptions may have performances that differ in two general ways: (i)
    they both produce results that are "correct" but for one the results
    are more variable; (ii) the results for the one with incorrect
    assumptions are incorrect.
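    Case (i) is easy to see in a quick simulation (a Python sketch with
    arbitrary sample size and replication count): under a normal model,
    the sample mean and the sample median both estimate the same center,
    but the median, which leans on weaker assumptions, is more variable.

```python
import numpy as np

# Sketch: both estimators are "correct" (centered on the true value 0),
# but the median's sampling variance is larger -- for large n, by a
# factor approaching pi/2 under normality.
rng = np.random.default_rng(0)
n, reps = 25, 5000
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print(means.mean(), medians.mean())  # both near 0
print(means.var(), medians.var())    # median's variance is larger
```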

    If the task is "testing", the analysis with fewer incorrect assumptions
    will give a more powerful test. If assumptions are wrong, the test may
    not have the size (alpha) you think it has.

    A typical solution in small samples is to do both analyses and see if
    the results are radically different. With large samples there is more
    opportunity to do some assumption-checking.
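    The "do both analyses" advice is easy to sketch with SciPy (the data
    below are invented for illustration; the second group contains one
    outlier):

```python
import numpy as np
from scipy import stats

# Hypothetical small samples; group b contains one visible outlier.
a = np.array([4.1, 5.2, 3.9, 4.8, 5.0, 4.4, 4.6])
b = np.array([5.5, 6.1, 5.8, 6.4, 5.9, 6.2, 19.0])

# Parametric analysis: Welch two-sample t-test (assumes roughly normal data).
t_stat, p_t = stats.ttest_ind(a, b, equal_var=False)

# Non-parametric analysis: Mann-Whitney U (rank-based).
u_stat, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"t-test p = {p_t:.4f}, Mann-Whitney p = {p_u:.4f}")
```

    If the two p-values lead to radically different conclusions, as they
    do here (the outlier inflates the t-test's variance estimate while
    the ranks see complete separation), the parametric assumptions
    deserve a closer look.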

  • From Cosine@21:1/5 to All on Sat Jan 23 08:28:24 2021
    While one is free to make any assumptions, the assumptions made need to be verified.

    But then how does one verify under such circumstances?

  • From Rich Ulrich@21:1/5 to All on Sat Jan 23 14:44:50 2021
    On Sat, 23 Jan 2021 03:22:54 -0800 (PST), Cosine <asecant@gmail.com>
    wrote:

    Hi:

    When we only have a small sample, what comes to mind is to use a
    non-parametric statistical method.

    I assume that when you say "non-parametric statistical method,"
    you are referring to those methods based on ranks.

    What you describe is unfortunate, but too often it is true. I think
    the idea was spread especially by psychologists who were
    looking for an easy "out", to avoid dealing with those stats-
    assumptions that they did not understand.

    Psychologists are notoriously weak in math, for reasons I don't
    know. My easiest A in college was in psy-stats; the final exam
    took me less than 10 minutes. Maybe it was less than 5.

    But does using a non-parametric method really solve the problem?

    In the 1980s, Conover provided a fine perspective. Using those
    rank-order tests is - effectively - performing a rank-transformation
    on the data, followed by the usual ANOVA. That's often the textbook
    prescription for the "large-sample" use of rank tests. If you
    have to take into account the textbook's approximated adjustments
    for "tied values", you can be /better/ off using transform-plus-ANOVA
    for the small samples, too.

    If the rank-transformed data are closer to "equal interval" for
    scores than the raw data are, then you get a better test after the
    rank transformation.
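    Conover's rank-transformation view can be sketched directly
    (made-up data for three small groups; kruskal is the usual
    large-sample rank test for several groups):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three small groups.
g1 = [2.3, 3.1, 2.8, 3.5, 2.9]
g2 = [3.6, 4.0, 3.8, 4.4, 3.3]
g3 = [4.5, 5.1, 4.8, 5.6, 4.2]

# Rank-transform the pooled data, then run the ordinary one-way ANOVA
# on the ranks -- Conover's "rank transformation" view of rank tests.
pooled = np.concatenate([g1, g2, g3])
ranks = stats.rankdata(pooled)
r1, r2, r3 = ranks[:5], ranks[5:10], ranks[10:]
f_stat, p_rank_anova = stats.f_oneway(r1, r2, r3)

# The textbook rank test for several groups, for comparison:
h_stat, p_kruskal = stats.kruskal(g1, g2, g3)

print(p_rank_anova, p_kruskal)  # both point the same way here
```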


    Also, what are the drawbacks of using a non-parametric method
    when the problem actually has some kind of distribution?

    The obvious problem to which ranking offers a quick fix is the
    presence of visible outliers. Those can screw up both the means
    and the variances.

    If you don't know anything at all about your data, including the
    likely distribution of scores, you should probably put off trying
    to make any sense of it until you learn something.

    If you think that the arithmetic average, the mean, ought to be
    meaningful, then you probably don't want the rank transform.
    Or any transform, if that holds for all likely samples. I've probably
    chosen to "winsorize" data (set an extreme to a moderated value)
    more often than I've taken rank-transformations as the tool.
    (But for winsorizing, I am speaking of large samples, weird scaling.)

    Where data comes from can imply that certain transforms should
    "bring in" the outliers, or otherwise fix the tails. For instance,
    take the square root for (Poisson) counts; take the logit for
    proportions; take the log for chemical traces in blood samples.
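    The square-root rule for counts is easy to check by simulation
    (a sketch; the two Poisson means are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups of Poisson counts with very different means.
low = rng.poisson(lam=4.0, size=10000)
high = rng.poisson(lam=25.0, size=10000)

# Raw counts: variance tracks the mean (for Poisson, var == mean),
# so the equal-variance assumption behind ANOVA is badly violated.
print(low.var(), high.var())  # near 4 and 25

# After the square root, both variances are of the same order
# (roughly 1/4): the point of a variance-stabilizing transform.
print(np.sqrt(low).var(), np.sqrt(high).var())
```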

    The "drawbacks" of using the rank-transformation approach
    are (a) you throw away the mean and inter-group comparisons,
    and (b) you can get a weaker test.

    --
    Rich Ulrich

  • From David Duffy@21:1/5 to Cosine on Mon Jan 25 09:37:14 2021
    Cosine <asecant@gmail.com> wrote:

    When we only have a small sample, what comes to mind is to use a
    non-parametric statistical method. But does using a non-parametric
    method really solve the problem?


    You might like to read the 2017 review by Szekely and Rizzo _The Energy of Data_ (you can find it via Google Scholar) which discusses their
    particular general nonparametric approach, _and_ then the more recent
    papers citing that paper. Some relevant R packages are energy and HHG.
    The latter evaluates its nonparametric test when the true
    dependence patterns are non-monotone:

    W, Diamond, Parabola, 2Parabolas, Circle, Cubic, Sine, Wedge, Cross,
    Spiral, Circles, Heavisine, Doppler, 5Clouds, 4Clouds.

    Standard rank-based tests do poorly against many of these, but the
    energy and Heller et al. methods are OK.
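    For a flavor of the energy approach, here is a from-scratch
    distance correlation applied to the "Parabola" pattern above (a
    minimal illustration only, not the energy package's implementation):

```python
import numpy as np
from scipy import stats

def dcov_sq(x, y):
    """Squared sample distance covariance (Szekely-Rizzo style)
    for 1-D samples, via double-centered distance matrices."""
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def dcor(x, y):
    """Sample distance correlation: 0 iff (asymptotically) independent."""
    denom = np.sqrt(dcov_sq(x, x) * dcov_sq(y, y))
    return float(np.sqrt(max(dcov_sq(x, y), 0.0) / denom)) if denom > 0 else 0.0

# A non-monotone "Parabola" relationship: y depends on x exactly, yet
# Pearson and Spearman both see nothing because of the symmetry.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2
print(stats.pearsonr(x, y)[0])   # ~ 0
print(stats.spearmanr(x, y)[0])  # ~ 0
print(dcor(x, y))                # clearly > 0: dependence detected
```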

  • From Rich Ulrich@21:1/5 to All on Sun Jan 31 16:01:48 2021
    On Sat, 23 Jan 2021 08:28:24 -0800 (PST), Cosine <asecant@gmail.com>
    wrote:

    While one is free to make any assumptions, the assumptions made need to be verified.

    But then how does one verify under such circumstances?

    - Pay attention to what generates the numbers, and to what sort
    of sample, and you will often have a good idea about what
    distribution to expect. Will anyone object to the assumption?

    Rather than what you wrote, I would say that assumptions
    need to be understood and accounted for.

    ANOVA is defined in terms of normal distributions.

    However, it is known to be robust (generally) to many sorts of
    violations of normality, owing to the Central Limit Theorem.
    Binary variables where proportions are between 20% and 80%
    are sufficiently "normal" for ANOVA when the Ns are not tiny.

    However, even a single /extreme/ outlier can screw up an ANOVA.
    Strong skewness can weaken the power, which is why I like
    power transformations (log, square root, reciprocal usually suffice).

    The assumption most often ignored by the users of rank-
    transformations is that two distributions being compared should
    be of the same kind, the same shape. - When that is true, it is
    also - often - true that a power transformation will provide
    better "equal intervals" than what you get from ranking, and
    pretty good normality.

    --
    Rich Ulrich
