• Q interpretation of statistically negative values

    From Cosine@21:1/5 to All on Sat Sep 24 00:29:03 2022
    Hi:

    When doing statistical analysis, we often compute the mean and standard error (SE) of the sample. Then we examine the cumulative probability of an interval centered at the mean and extending some number of SEs in the positive and negative directions.
    However, such an interval sometimes includes negative values. How do we interpret this kind of result if the variable, by definition, must always be positive, e.g., age, weight, height, or salary?

  • From Rich Ulrich@21:1/5 to All on Sat Sep 24 13:47:05 2022
    On Sat, 24 Sep 2022 00:29:03 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:

    > Hi:
    >
    > When doing statistical analysis, we often compute the mean and
    > standard error (SE) of the sample. Then we examine the cumulative
    > probability of an interval centered at the mean and extending some
    > number of SEs in the positive and negative directions. However,
    > such an interval sometimes includes negative values. How do we
    > interpret this kind of result if the variable, by definition, must
    > always be positive, e.g., age, weight, height, or salary?

    Q: What does it mean when the computed confidence interval
    extends beyond the range of the variable?

    A: The assumptions for constructing and using the CI as an accurate
    indicator have not been met. And if one tail is obviously too long,
    the other tail is often too short, which might be a concern.

    In my experience, I have seen people confused by CIs on proportions
    when the intervals went beyond 0 or 100%. The statistical literature
    contains several alternatives for those CIs, which vary the
    assumptions about the underlying distribution ("logistic"?) and
    construct intervals that are more precise and legitimate. (Note:
    approximations can be easier to compute than exact answers.)
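
    One of those alternatives is the Wilson score interval. A minimal
    sketch, assuming Python with SciPy (the choice of the Wilson
    interval here is illustrative, not a claim about which alternative
    was meant above):

        from scipy.stats import norm

        def wald_ci(k, n, level=0.95):
            # Naive normal-approximation (Wald) CI for a proportion;
            # its endpoints can fall outside [0, 1].
            z = norm.ppf(1 - (1 - level) / 2)
            p = k / n
            half = z * (p * (1 - p) / n) ** 0.5
            return p - half, p + half

        def wilson_ci(k, n, level=0.95):
            # Wilson score CI; always contained in [0, 1].
            z = norm.ppf(1 - (1 - level) / 2)
            p = k / n
            center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
            half = (z / (1 + z**2 / n)) * (
                p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5
            return center - half, center + half

        print(wald_ci(1, 20))    # about (-0.05, 0.15): dips below zero
        print(wilson_ci(1, 20))  # about (0.01, 0.24): stays inside [0, 1]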

    For natural measures which have a large range and are never zero,
    starting with the log transformation is often appropriate: Transform;
    get the average; back-transform if you prefer the original units.
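
    A minimal sketch of that recipe, assuming Python with NumPy and
    SciPy (the salary figures are made up for illustration):

        import numpy as np
        from scipy.stats import t

        def log_scale_ci(x, level=0.95):
            # Transform, average, back-transform: a CI for the geometric
            # mean of strictly positive data. Exponentiating the
            # endpoints guarantees they stay positive.
            logs = np.log(x)
            m = logs.mean()
            se = logs.std(ddof=1) / np.sqrt(len(logs))
            half = t.ppf(1 - (1 - level) / 2, df=len(logs) - 1) * se
            return np.exp(m - half), np.exp(m + half)

        salaries = np.array([28e3, 31e3, 35e3, 42e3, 55e3, 61e3, 90e3, 240e3])
        print(log_scale_ci(salaries))  # both endpoints positive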

    For well-behaved distributions, transformations to achieve "equal
    interval" (in the measurement space of whatever matters) will
    usually give good CIs.
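
    One common way to hunt for such a transformation is the Box-Cox
    family, of which the log is the lambda = 0 member. A minimal sketch,
    assuming Python with SciPy (the gamma sample is simulated just to
    have something positive and skewed):

        import numpy as np
        from scipy.stats import boxcox, t
        from scipy.special import inv_boxcox

        rng = np.random.default_rng(3)
        x = rng.gamma(shape=2.0, scale=5.0, size=80)  # positive, skewed

        y, lam = boxcox(x)  # lambda chosen by maximum likelihood
        m = y.mean()
        se = y.std(ddof=1) / np.sqrt(len(y))
        half = t.ppf(0.975, df=len(y) - 1) * se
        # Back-transform the endpoints to the original units.
        print(inv_boxcox(m - half, lam), inv_boxcox(m + half, lam))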

    For data in hand whose distribution is not well-behaved, you might be
    well-advised to switch from the mean to the median as your central
    measure, and to use some version of ranges instead of the standard
    deviation/error. Bootstrap methods are used in some problems to
    overcome the "oddness" of distributions.

    --
    Rich Ulrich

  • From Cosine@21:1/5 to All on Sat Sep 24 16:11:09 2022
    Hi:

    Thank you for replying.

    However, different transformations distort the original number line in different ways.

    For example, while the log function transforms the original non-negative number line [0, inf] to the full number line [-inf, inf], it "expands" the part [0, 1] to [-inf, 0]. If we use another nonlinear transformation, we will get a different distortion. After all, we only require the transformation to be one-to-one.

    Since the width of the confidence interval represents a cumulative proportion, would the choice of transformation affect the determination of statistical significance?

  • From Rich Ulrich@21:1/5 to All on Sat Sep 24 23:00:05 2022
    On Sat, 24 Sep 2022 16:11:09 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:

    > Hi:
    >
    > Thank you for replying.
    >
    > However, different transformations distort the original number
    > line in different ways.

    That does not deserve a "However,"....

    Yes, you will compute different values when formulas use
    different assumptions. As I wrote,

    * * For well-behaved distributions, transformations to achieve "equal
    interval" (in the measurement space of whatever matters) will
    usually give good CIs. * *



    > For example, while the log function transforms the original
    > non-negative number line [0, inf] to the full number line
    > [-inf, inf], it "expands" the part [0, 1] to [-inf, 0]. If we use
    > another nonlinear transformation, we will get a different
    > distortion. After all, we only require the transformation to be
    > one-to-one.

    I don't take the log of zero. Undefined, not -inf.

    Also note: Some people misconstrue "equal intervals." Wealth is
    measured in dollars; 'dollars' are seen (erroneously) to make the
    factor linear and equal-interval when /measured/ in dollars. But
    adding a million dollars is a grossly different contribution to
    'wealth' depending on the start -- there are unequal intervals
    at the extremes. Think of the variables as 'latent factors' for
    what you are interested in, and imagine what makes equal intervals
    for that factor. Like 'wealth' or whatever, the available units are
    often misleading.


    > Since the width of the confidence interval represents a cumulative
    > proportion, would the choice of transformation affect the
    > determination of statistical significance?

    If you want a statement about cumulative proportions, the
    safe way is to use rank-order. The range from the 40th to
    the 60th percentile (for instance) will be a 95% CI for the
    median, for some easily computed N.
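
    A minimal sketch of that computation, assuming Python with SciPy
    (the exponential sample is simulated for illustration):

        import numpy as np
        from scipy.stats import binom

        def median_ci(x, level=0.95):
            # Distribution-free CI for the median from order statistics:
            # take the largest k with P(Binomial(n, 1/2) <= k) <= alpha/2;
            # the (k+1)-th and (n-k)-th order statistics then bracket the
            # median with at least the requested coverage.
            x = np.sort(np.asarray(x))
            n = len(x)
            alpha = 1 - level
            k = int(binom.ppf(alpha / 2, n, 0.5))
            if binom.cdf(k, n, 0.5) > alpha / 2:
                k -= 1
            if k < 0:
                raise ValueError("sample too small for this level")
            return x[k], x[n - k - 1]

        rng = np.random.default_rng(1)
        sample = rng.exponential(scale=10.0, size=100)
        # For n = 100 this brackets the median with the 40th and 61st
        # order statistics -- roughly the 40th-60th percentile range.
        print(median_ci(sample))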

    "Statistical significance" (to me) implies testing, rather than
    presenting CIs. If you don't have 'equal intervals' in the
    sense I describe above, your testing will be deficient to some
    extent.

    Does it matter? The usual tests are pretty robust against
    moderate distortion of scaling, when you use the usual 5% test
    size (actual size remains in the range 4-6%). ANOVA tests at
    0.001 on moderately skewed distributions are often wrong
    by five-fold or more.
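
    That claim is easy to probe by simulation. A minimal sketch,
    assuming Python with NumPy and SciPy, using a one-sample t test on
    lognormal data as a stand-in for the ANOVA case:

        import numpy as np
        from scipy.stats import ttest_1samp

        rng = np.random.default_rng(2)
        true_mean = np.exp(0.5)  # mean of a lognormal(0, 1) variable
        reps, hits_05, hits_001 = 20000, 0, 0
        for _ in range(reps):
            x = rng.lognormal(0.0, 1.0, size=30)
            p = ttest_1samp(x, popmean=true_mean).pvalue
            hits_05 += p < 0.05
            hits_001 += p < 0.001
        # Estimated actual size at each nominal level; the relative
        # inflation is typically far worse at 0.001 than at 0.05.
        print(hits_05 / reps, hits_001 / reps)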

    Extremely fat tails or far outliers mess up p-values even at the
    5% size. This is why cleaning your data takes at least 90% of
    the time of a competent data analyst hired for a job: We
    want to know for ourselves that the means will be meaningful,
    et cetera. That usually means fixing stuff, or writing cautions
    at the end.

    --
    Rich Ulrich
