• Help needed....

    From Rich Ulrich@21:1/5 to peter@pmoylan.org.invalid on Fri Sep 10 17:18:25 2021
    XPost: alt.usage.english

    cross-posted to sci.stat.math where there is little traffic of late.

    I hope that what follows "LECTURE MODE" is useful enough that
    the reader will excuse my self-indulgence for going on at length.

    On Fri, 10 Sep 2021 11:48:20 +1100, Peter Moylan
    <peter@pmoylan.org.invalid> wrote:

    On 10/09/21 10:31, Mark Brader wrote:
    Tak To:
    The units are respectively "degree Celsius" and "degree
    Fahrenheit". They are both ordinals (marks on a scale) and
    cardinals (distances between pairs of marks).

    No, both words are being used wrongly.

    "Cardinal" refers to a number that tells how many of something
    exist. It is always either a non-negative integer or an infinite

    "Ordinal" refers to a number that tells the position of one
    particular thing in a sequence of things, by giving the number of
    things -- always a cardinal number -- from the start of the sequence
    to the one in question.

    In the languages I know about, the fact that a number is ordinal is
    expressed by modifying its name. For example, the cardinal number
    "six" or "6" corresponds to the ordinal "sixth" or "6th", or if that
    word "six" was in French, then to "sixième" or "6e".

    Neither one is relevant either to temperature readings or to
    distances; these are not constrained to be integers.

    In the case of temperature the distinction is between "degrees Celsius"
    and "Celsius degrees". The first is a point on a scale, and the second
    is an interval.

    The two are related by simple arithmetic operations (addition and
    subtraction). Multiplication and division are meaningful when applied to
    intervals, but not when applied to points on a scale, except when it
    happens to be a scale with an absolute zero (which Celsius doesn't have).
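A minimal Python sketch of the point-versus-interval distinction described above. The function names are illustrative, not from the thread: a reading on the scale picks up the 32-degree offset when converted, while a difference between two readings does not.

```python
def c_to_f_point(temp_c: float) -> float:
    """Convert a reading on the Celsius scale (a point) to Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def c_to_f_interval(delta_c: float) -> float:
    """Convert a temperature difference (an interval) to Fahrenheit degrees."""
    return delta_c * 9.0 / 5.0


# Hypothetical readings: 10 degrees Celsius in the morning, 20 in the afternoon.
morning_c, afternoon_c = 10.0, 20.0
rise_c = afternoon_c - morning_c  # an interval of 10 Celsius degrees

# Point minus point gives an interval, on either scale.
rise_f = c_to_f_point(afternoon_c) - c_to_f_point(morning_c)
print(rise_f, c_to_f_interval(rise_c))

# Doubling the interval is meaningful; "20 C is twice 10 C" is not,
# because the ratio of the two points changes with the scale.
print(afternoon_c / morning_c, c_to_f_point(afternoon_c) / c_to_f_point(morning_c))
```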

    Counter-example: One star may be "twice as hot" as another
    and F, C or K does not matter because, given the precision
    of estimates, the nominal zero is close enough to absolute zero.
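To put rough numbers on the star example, here is a small sketch that computes the same "how many times as hot" ratio on all three scales. The star temperatures (20000 K and 10000 K) and the room temperatures are made-up illustrative values, not figures from the post.

```python
def kelvin_to_celsius(temp_k: float) -> float:
    return temp_k - 273.15


def kelvin_to_fahrenheit(temp_k: float) -> float:
    return temp_k * 9.0 / 5.0 - 459.67


def ratio_on_each_scale(hot_k: float, cool_k: float) -> dict:
    """Ratio of two temperatures, computed naively on each scale."""
    return {
        "K": hot_k / cool_k,
        "C": kelvin_to_celsius(hot_k) / kelvin_to_celsius(cool_k),
        "F": kelvin_to_fahrenheit(hot_k) / kelvin_to_fahrenheit(cool_k),
    }


# Hypothetical stars: the offsets are small next to the readings,
# so all three ratios land close to 2.
stars = ratio_on_each_scale(20000.0, 10000.0)
print(stars)

# Hypothetical room temperatures: the offsets dominate and the
# three ratios disagree badly.
rooms = ratio_on_each_scale(300.0, 290.0)
print(rooms)
```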

    Zero degrees centigrade is the freezing point of water; it is an
    absolute zero in relation to the heat added to ice which is at
    that temperature.

    Ratio vs. interval vs. ordinal (ordered).

    I'm not sure whether all professional statisticians are clear on
    the notion that the intervals in "scaling" depend on the context
    and what you are comparing to. The important underlying idea
    for the usual testing by ANOVA is that there are "equal intervals"
    between scores. When someone intuitively refers to multiples like
    "twice as much" of something, it implies (reversing the logic) that
    there IS some unmentioned value that serves as an "absolute zero".

    I suggest, for an example, that some balmy temperature serves
    as zero so that an extra 5 degrees C or 10 degrees F is "hot",
    and twice as much increase makes it "twice as hot" -- probably
    on the heat-discomfort scale that compensates for relative humidity.

    The short-sighted error is to believe that WHAT you are
    measuring is all that you need to know, /because/ units of Time
    or Distance (or Temperature) have "inherently equal" intervals.

    John Tukey recommended that any time your largest "natural"
    score (having some zero) is 10 times the smallest, you should
    consider whether a transformation will improve the analysis.
    You keep in mind what constitutes "equal intervals" for your data.
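The 10-to-1 rule of thumb above is easy to check mechanically. A minimal sketch, assuming strictly positive scores measured from a meaningful zero; the function names and the default threshold argument are my own, not Tukey's.

```python
def range_ratio(scores):
    """Largest score divided by smallest; only defined for positive scores."""
    if not scores or min(scores) <= 0:
        raise ValueError("scores must be non-empty and strictly positive")
    return max(scores) / min(scores)


def worth_considering_transform(scores, threshold=10.0):
    """True when the largest score is at least `threshold` times the smallest."""
    return range_ratio(scores) >= threshold


# Hypothetical data sets.
print(worth_considering_transform([2.0, 5.0, 40.0]))    # ratio 20
print(worth_considering_transform([10.0, 12.0, 15.0]))  # ratio 1.5
```

A True result only says a transformation deserves a look; which one (root, log, reciprocal) still depends on what makes "equal intervals" for the data.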

    "Counts" sound like they should create natural, equal intervals.
    However, as counts arise in the world, they very often come
    out with Poisson distributions; and taking the square root of
    counts is often the best starting point for analyses when there
    is that 10-fold range.
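One way to see why the square root helps with counts: for a Poisson variable the variance of the raw count equals its mean, while the variance of its square root stays near 1/4 once the mean is not too small. The sketch below computes both directly from the Poisson probabilities, with no simulation; the helper names and the truncation point of the sum are my own choices.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def variance_of(transform, lam: float) -> float:
    """Variance of transform(X) for X ~ Poisson(lam), by direct summation."""
    # Sum far enough into the upper tail that the omitted mass is negligible.
    upper = int(lam + 20.0 * math.sqrt(lam) + 50)
    mean = 0.0
    mean_sq = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, lam)
        y = transform(k)
        mean += p * y
        mean_sq += p * y * y
    return mean_sq - mean * mean


# A 10-fold range of means: raw variance grows 10-fold, root variance barely moves.
for lam in (10.0, 30.0, 100.0):
    print(lam, variance_of(float, lam), variance_of(math.sqrt, lam))
```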

    Distances for spread of disease in epidemiology are modeled
    on the reciprocal of the values in meters or kilometers.

    Miles per gallon (US) and Liters per 100 kilometers (Europe) are
    implicitly reciprocal, though both "look" like they could have
    equal intervals. The latter works better in most analyses I've
    seen (that is, has a better "equal-interval" nature).
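A short sketch of the reciprocal relationship, assuming the European figure is liters per 100 kilometers and using the exact definitions of the US gallon and the mile. The mpg values are made up for illustration.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact by definition
KM_PER_MILE = 1.609344              # exact by definition


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert US miles per gallon to liters per 100 kilometers."""
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100.0 / km_per_liter


# Equal 10-mpg steps are very unequal steps in fuel actually used.
saving_low = mpg_to_l_per_100km(10.0) - mpg_to_l_per_100km(20.0)
saving_high = mpg_to_l_per_100km(30.0) - mpg_to_l_per_100km(40.0)
print(saving_low, saving_high)
```

If the quantity of interest is fuel burned over a fixed distance, the liters-per-distance form is the one whose intervals are equal in that sense.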

    Bio-chemical levels (hormones, whatever) are often log-transformed
    at the start of analyses; they often represent geometric or
    exponential growth of /something/.
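As a toy illustration of the log transform: if a level doubles at each step (hypothetical values below), the raw differences keep growing, while the differences of the logs are constant. Base 2 is used here only so the steps come out as whole numbers; any base gives equal steps.

```python
import math


def successive_differences(values):
    """Differences between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical hormone levels that double from one measurement to the next.
levels = [1.0, 2.0, 4.0, 8.0, 16.0]

raw_steps = successive_differences(levels)
log_steps = successive_differences([math.log2(x) for x in levels])

print(raw_steps)  # unequal intervals on the raw scale
print(log_steps)  # equal intervals on the log scale
```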

    Proportions (P) bounded at 0 and 1 are often analyzed these
    days by the "logit", which is the log of P/(1-P).
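The logit as defined above, log of P/(1-P), with its inverse, as a minimal sketch. The natural log is assumed, and proportions of exactly 0 or 1 are rejected because the logit is undefined there.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# The scale is symmetric around P = 0.5 and stretches out near the bounds.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(p, logit(p))
```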

    The so-called "non-parametric approach" to statistical analysis
    most often starts out by ranking the ordered scores; then it
    treats the differences between ranks as equal. There are
    so-called "exact" tests which may be better in tiny samples
    with no ties; for large samples ANOVAs performed on the
    ranks, as scores, work as well as (and often better than)
    the authors' approximations from pre-computer days.
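A bare-bones sketch of "ANOVA on the ranks": pool the scores, replace them with ranks (ties get the average rank), then compute the ordinary one-way F statistic on those ranks. This is plain Python for illustration, not any particular package's routine; it returns only F, leaves the p-value lookup out, and does not guard against zero within-group variation.

```python
def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def one_way_f(groups):
    """Ordinary one-way ANOVA F statistic for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def f_on_ranks(groups):
    """Rank the pooled scores, then run the one-way F on the ranks."""
    ranks = average_ranks([x for g in groups for x in g])
    ranked_groups, start = [], 0
    for g in groups:
        ranked_groups.append(ranks[start:start + len(g)])
        start += len(g)
    return one_way_f(ranked_groups)


# Hypothetical data: the wild value 100.0 counts only as "the largest".
print(f_on_ranks([[1.2, 3.4, 2.2], [5.1, 4.8, 100.0]]))
```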

    For "ranks" where both ends are meaningful, rankings
    can be converted to percentiles, and then to logits.
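One way to carry out the rank-to-percentile-to-logit conversion. The percentile formula (rank - 0.5)/n is my assumption, chosen so that no rank maps to exactly 0 or 1; other conventions exist and the post does not name one.

```python
import math


def rank_to_logit(rank: int, n: int) -> float:
    """Convert a rank among n items to a logit via a mid-rank percentile."""
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")
    percentile = (rank - 0.5) / n  # assumed convention; keeps 0 < percentile < 1
    return math.log(percentile / (1.0 - percentile))


# With 10 ranked items, gaps at either end are wider than gaps in the middle.
n = 10
for rank in range(1, n + 1):
    print(rank, rank_to_logit(rank, n))
```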

    For ranks where #1 matters most, the log of the rankings
    improves the distance between scores, though not very
    precisely. For instance, the implied gap between #1 and #2
    is much closer in meaning to the gap between 40 and 80
    than to the gap between 40 and 41 or 80 and 81.
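The arithmetic behind that last comparison, as a quick sketch: on the log scale the gap between two ranks depends only on their ratio, so #1 to #2 and #40 to #80 are both a gap of log 2. The function name is illustrative.

```python
import math


def log_rank_gap(better: int, worse: int) -> float:
    """Distance between two rankings after taking logs."""
    return math.log(worse) - math.log(better)


print(log_rank_gap(1, 2))    # a doubling
print(log_rank_gap(40, 80))  # also a doubling, so the same gap
print(log_rank_gap(40, 41))  # far smaller
print(log_rank_gap(80, 81))  # smaller still
```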

    Rich Ulrich
