Given a population of unemployed persons, i.e. a list of names
and phone numbers, you wish to construct a histogram
of # of persons vs. time (# of days) out of work.
Stats 101, a student homework assignment, right?
Call some random subset of the list and ask them: when
were you laid off? Assuming the sample is unbiased,
it will satisfy the conditions for estimating the distribution.
No, this method is flawed, because a person out of
work a long time has a greater chance of receiving
multiple calls (or at least one call) than one who is
quickly re-employed. This biases the sample and skews
the numbers toward the long side.
Therefore, officially published statistics are unreliable.
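The length-bias objection can be checked with a quick simulation. This is a hedged sketch, not anything from the thread itself: it assumes spell lengths are exponential with a mean of 30 days (an arbitrary illustrative choice) and that a snapshot phone survey reaches each person with probability proportional to how long they sit on the list.

```python
import random

random.seed(42)

# Assumed model: true completed unemployment spells (days),
# exponential with mean 30 -- an arbitrary illustrative choice.
true_spells = [random.expovariate(1 / 30) for _ in range(100_000)]

# A person out of work for d days sits on the call list for d days,
# so a snapshot survey reaches them with probability proportional to d.
snapshot = random.choices(true_spells, weights=true_spells, k=100_000)

mean_true = sum(true_spells) / len(true_spells)
mean_seen = sum(snapshot) / len(snapshot)
print(f"true mean spell:     {mean_true:.1f} days")
print(f"snapshot mean spell: {mean_seen:.1f} days")
```

For an exponential distribution the length-biased mean is exactly twice the true mean, so the snapshot roughly doubles the apparent spell length: the skew "on the long side" described above.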
Well, the number represents what it represents.
It is only a mis-report if you mis-report it.
It is always proper to warn readers of ways that
they might misinterpret what is being reported.
> Therefore, officially published statistics are unreliable.
I think you mean "invalid". And you are wrong, mainly.
Technically, in statistics, we have both "reliability" and
"validity". Good reliability says that the number is
reproducible, whereas good validity says that it measures
what it purports to measure. You should complain about
validity: the statistics imply something untrue.
On October 4, Rich Ulrich wrote:
> I think you mean "invalid". And you are wrong, mainly.
> Technically, in statistics, we have both "reliability" and
> "validity". Good reliability says that the number is
> reproducible, whereas good validity says that it measures
> what it purports to measure. You should complain about
> validity: the statistics imply something untrue.
Given the goal of the study, is the objection mentioned above justified? I.e., is the methodology flawed?
Given all that, review the objection mentioned above: those unemployed longer
will have a greater chance of getting a call. Therefore, the methodology is flawed;
the sample isn't unbiased.
> Given the goal of the study, is the objection mentioned above justified? I.e., is the methodology flawed?
My position is that you can collect and report information for
any numbers that might be interesting.
The initial problem is, "Where do these data come from?" That
might put hard limits on what you can infer.
You are jumping ahead to "bad inference." Showing a
histogram of a cross-section of a stated population (sample)
is not "drawing an inference."
Assuming a simplified, instantaneous cross-sectional sample from
that population, you might use your observations above
about the implicit weighting to compute a weighted mean:
each person would be weighted by their TIME (as the
"probability of being sampled"), and you compute that weighted
mean... as an estimate of ... hmm.
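One way to make the weighted-mean idea concrete (a sketch with assumed numbers, not anything computed in the thread): if each person's chance of being sampled is proportional to their time out of work, then weighting each respondent by 1/time, Horvitz-Thompson style, undoes that tilt and recovers the population mean.

```python
import random

random.seed(1)

# Assumed setup: true spells exponential with mean 30 days, and a
# snapshot sample whose inclusion probability is proportional to length.
true_spells = [random.expovariate(1 / 30) for _ in range(100_000)]
sample = random.choices(true_spells, weights=true_spells, k=50_000)

# The naive mean of the snapshot is inflated by the implicit weighting.
naive_mean = sum(sample) / len(sample)

# Weight each respondent by 1/time -- the inverse of their (relative)
# inclusion probability -- and take the weighted mean.
weights = [1 / d for d in sample]
debiased_mean = sum(w * d for w, d in zip(weights, sample)) / sum(weights)

print(f"naive snapshot mean:  {naive_mean:.1f} days")
print(f"1/time-weighted mean: {debiased_mean:.1f} days")
```

Note that the 1/time-weighted mean of the durations is just the harmonic mean of the sample; under the assumed model it lands back near the true 30-day average, while the naive mean sits near double that.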
On October 5, Rich Ulrich wrote:
> Assuming a simplified, instantaneous cross-sectional sample from
> that population, you might use your observations above,
> about the implicit weighting, to compute a weighted mean --
> Each person would be weighted by their TIME (as the
> "probability of being sampled") and you compute that weighted
> mean... as an estimate of ... hmm.
The goal isn't to estimate the chance a person might receive a call.
The goal is to estimate the distribution of the population vs. time unemployed,
given a histogram of samples of the unemployed. Then, perhaps, one might
predict, probabilistically, how much time a newly unemployed person will require to
find new work.
Intuitively, the distribution should match the sample histogram. That's the
desired inference. Very simple.
Given all that, review the objection mentioned above: those unemployed longer
will have a greater chance of getting a call. Therefore, the methodology is flawed;
the sample isn't unbiased.
I have an ulterior motive for posting this -
> Well, the number represents what it represents.
> It is only a mis-report if you mis-report it.
> My position is that you can collect and report information for
> any numbers that might be interesting.
Every experiment is correct, in the sense that it is what it is. Start
with initial conditions, observe the results. Ask a question of nature,
she answers. She doesn't care about your confusion.
First, one must specify a hypothesis to be tested and the desired inference
to be drawn. One assesses the correctness of the experimental design according
to whether the experiment meets those goals.
Let's recap: we want to learn the distribution of unemployed persons vs.
days out of work.
We are given a list of unemployed persons, i.e. names
and phone numbers. Presumably, the list is complete. We call
a sample and ask: how many days since you were laid off?
Couldn't be simpler.
Later, perhaps, one might predict, probabilistically, how much time a
newly unemployed person will require to find new work.
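Given an unbiased estimate of the spell distribution, that prediction step is just reading off the empirical CDF. A hedged sketch, where the exponential spells with a 30-day mean are an assumed stand-in for real survey data:

```python
import bisect
import random

random.seed(7)

# Assumed stand-in for an unbiased sample of completed spells (days).
spells = sorted(random.expovariate(1 / 30) for _ in range(100_000))

def prob_reemployed_within(days):
    """Empirical P(spell <= days): the chance a newly unemployed
    person finds work within the given number of days."""
    return bisect.bisect_right(spells, days) / len(spells)

for horizon in (7, 30, 90):
    print(f"P(back to work within {horizon:2d} days) = "
          f"{prob_reemployed_within(horizon):.2f}")
```

One caveat in the thread's own spirit: a real phone survey observes elapsed, not completed, durations, so the snapshot histogram would first need the length-bias correction debated above before this read-off is valid.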
Sysop: Keyop | Location: Huddersfield, West Yorkshire, UK