• Lund's Test & Data Transformations

    From Bruce Weaver@21:1/5 to All on Fri Dec 11 14:35:11 2015
    On Friday, December 11, 2015 at 4:07:08 PM UTC-5, Ilovestats!! wrote:
    > Hi,
    >
    > My data does not follow a normal distribution. I wanted to run a Lund's Test first to remove outliers from my data then transform my data. Is it a good idea to run both procedures together on my data? Or would I just run one procedure?
    >
    > Thanks!

    No real data follows a Normal distribution exactly; see George Box's famous article "Science and Statistics", for example (http://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf). But that does not necessarily mean a transformation is required. Back up a couple of steps and give us a bit more information and context. What variables do you have, and what analyses do you want to carry out?

    HTH.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to lucia.costanzo47@gmail.com on Fri Dec 11 22:33:01 2015
    On Fri, 11 Dec 2015 13:07:06 -0800 (PST), "Ilovestats!!" <lucia.costanzo47@gmail.com> wrote:

    > Hi,
    >
    > My data does not follow a normal distribution. I wanted to run a Lund's Test first to remove outliers from my data then transform my data. Is it a good idea to run both procedures together on my data? Or would I just run one procedure?


    Let's see here. This rather turns logic upside down:

    The outliers of a reasonable distribution are the scores that
    have the most information about what transformation will
    work. Using a /test/ to remove outliers before doing a
    transformation is like shooting yourself in the foot. Crippling.
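
    A minimal Python sketch of this point, using made-up, perfectly
    multiplicative scores (each one doubles the last). The z-score of
    the largest score, used here as a hypothetical stand-in for an
    outlier screen rather than Lund's test itself, is about 2.6 on the
    raw scale but only about 1.5 after a log transformation. The
    "outlier" is the clearest evidence that the data are multiplicative.

```python
import math
import statistics


def max_z(values):
    """z-score of the largest value, using the sample mean and sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return (max(values) - m) / s


# Hypothetical multiplicative data: 1, 2, 4, ..., 1024
scores = [2 ** k for k in range(11)]

raw_z = max_z(scores)
log_z = max_z([math.log(x) for x in scores])

print(f"largest score, raw scale: z = {raw_z:.2f}")  # about 2.63
print(f"largest score, log scale: z = {log_z:.2f}")  # about 1.51
```

    Dropping the top score first would hide exactly the pattern that
    suggests the log.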

    Here are a few guidelines.
    1. What transformation is appropriate? Consider /what/ is
    being measured, for what purpose.

    Counts suggest Poisson, chemical concentrations suggest
    logarithms, distances sometimes suggest reciprocals, and so on.
    But the purpose, which might be a reflection of something like
    a "latent factor", may be determinative instead. "Dollars"
    are usually left untransformed by economists, but as a
    measure of, or latent score for, a construct such as
    "wealth" over a wide range, some transformation is surely
    needed.
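
    "Counts suggest Poisson" is commonly read as pointing to a
    square-root transformation, the standard variance-stabilizing
    transformation for Poisson counts. The sketch below assumes that
    reading and simulates Poisson counts at two arbitrary means (9 and
    100) with a hand-written sampler (Knuth's multiplication method,
    since Python's standard random module has no Poisson generator).
    The raw variance grows with the mean, while the variance of the
    square roots stays near 1/4 at both means.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(2015)  # arbitrary seed, for reproducibility
results = {}
for lam in (9, 100):
    counts = [poisson_draw(lam, rng) for _ in range(5000)]
    raw_var = statistics.variance(counts)
    sqrt_var = statistics.variance([math.sqrt(c) for c in counts])
    results[lam] = (raw_var, sqrt_var)
    print(f"mean {lam:>3}: raw variance {raw_var:7.2f}, variance of sqrt {sqrt_var:.3f}")
```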

    Three purposes of transformation are (a) to achieve
    linearity with an outcome; (b) to achieve equal error
    variance across the range of the variable; and (c) to
    achieve a normal-looking distribution. Surprisingly often,
    a single transformation achieves all three at once with
    natural data... but that has led to some ignorant reliance
    on (c), "looks", when it is the least important of the
    three. Linearity matters most for simple model-building;
    homogeneous residual variance matters most for the
    robustness of the distribution of the test statistic.



    2. Are some scores simply unreasonable? This is not a
    question for Lund's test.
    2a. There are outliers that /need/ to be removed because
    they are bad data; that is data cleaning, not analysis.
    2b. There are outliers that /need/ to be removed because,
    while they are not invalid in the sense of 'bad data', they
    are invalid in the sense of not belonging to the
    homogeneous set that the analyses should deal with.
    Data of this sort may be set aside from the analyses, and
    should probably be explained by a note in the eventual
    report.
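
    The 2a/2b screening can be sketched as explicit subject-matter
    rules applied before analysis, with the reason for each set-aside
    case recorded for the eventual report. The field name, the
    plausible-age range, and the "adults only" population rule below
    are hypothetical. Nothing is estimated from the data, so an
    extreme but valid score is kept.

```python
from dataclasses import dataclass


@dataclass
class Record:
    subject_id: str
    age_years: float  # hypothetical field


# Hypothetical rules fixed in advance from subject-matter knowledge.
PLAUSIBLE_AGE = (0, 120)  # 2a: outside this range means a data error
MIN_TARGET_AGE = 18       # 2b: the analysis is about adults only


def screen(records):
    """Split records into analysed cases and set-aside cases with a reason."""
    kept, set_aside = [], []
    lo, hi = PLAUSIBLE_AGE
    for r in records:
        if not lo <= r.age_years <= hi:
            set_aside.append((r.subject_id, "2a: implausible age (bad data)"))
        elif r.age_years < MIN_TARGET_AGE:
            set_aside.append((r.subject_id, "2b: outside the target population"))
        else:
            kept.append(r)
    return kept, set_aside


records = [
    Record("A1", 54),
    Record("A2", 540),  # probably a keying error
    Record("A3", 12),
    Record("A4", 97),   # extreme but valid, so it stays
]
kept, set_aside = screen(records)
for subject_id, reason in set_aside:
    print(subject_id, reason)
```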


    3. Throwing away data is usually a bad idea. I have, on a
    few occasions, drawn in the outermost few percent of scores
    in order to avoid the bad effect that the extremes would have
    on the test statistics; I do this when I also expect that the
    extreme scores reflect more scoring error than actually
    extreme phenomena.
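
    Drawing in the extremes rather than discarding them is usually
    called winsorizing. A minimal sketch, assuming the cut points are
    simple order statistics k = floor(proportion * n) places in from
    each end (software packages differ in how they choose the cut
    points, so this is one convention among several):

```python
import math


def winsorize(values, proportion=0.05):
    """Pull the lowest and highest `proportion` of scores in to the nearest kept value.

    Uses k = floor(proportion * n) scores per tail; no scores are dropped.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    n = len(values)
    k = math.floor(proportion * n)
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


scores = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 480]
print(winsorize(scores, 0.05))  # the 480 becomes 9; all 20 cases are kept
```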

    --
    Rich Ulrich

  • From Ilovestats!!@21:1/5 to All on Fri Dec 11 13:07:06 2015
    Hi,

    My data does not follow a normal distribution. I wanted to run a Lund's Test first to remove outliers from my data then transform my data. Is it a good idea to run both procedures together on my data? Or would I just run one procedure?

    Thanks!
