Given a population of unemployed persons, i.e. a list of names
and phone numbers, you wish to construct a histogram
of # of persons vs. time (# of days) out of work.
Stats 101, a student homework assignment, right?
Call some random subset of the list and ask them: when
were you laid off? Assuming the sample is unbiased,
it will satisfy the conditions for estimating the distribution.
No, this method is flawed, because a person out of
work a long time has a greater chance of receiving
multiple calls (or at least one call) than one who is
quickly re-employed. This biases the sample and skews
the numbers toward the long side.
Therefore, officially published statistics are unreliable.
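The length-bias objection can be checked with a quick simulation. This is a hedged sketch, not anything from the thread itself: it assumes spell lengths are exponential with a mean of 30 days (an arbitrary illustrative choice) and that a snapshot phone survey reaches each person with probability proportional to how long they sit on the list.

```python
import random

random.seed(42)

# Assumed model: true completed unemployment spells (days),
# exponential with mean 30 -- an arbitrary illustrative choice.
true_spells = [random.expovariate(1 / 30) for _ in range(100_000)]

# A person out of work for d days sits on the call list for d days,
# so a snapshot survey reaches them with probability proportional to d.
snapshot = random.choices(true_spells, weights=true_spells, k=100_000)

mean_true = sum(true_spells) / len(true_spells)
mean_seen = sum(snapshot) / len(snapshot)
print(f"true mean spell:     {mean_true:.1f} days")
print(f"snapshot mean spell: {mean_seen:.1f} days")
```

For an exponential distribution the length-biased mean is exactly twice the true mean, so the snapshot roughly doubles the apparent spell length: the skew "on the long side" described above.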
Well, the number represents what it represents.
It is only a mis-report if you mis-report it.
It is always proper to warn readers of ways that
they might misinterpret what is being reported.
> Therefore, officially published statistics are unreliable.
I think you mean "invalid". And you are wrong, mainly.
Technically, in statistics, we have both "reliability" and
"validity". Good reliability says that the number is
reproducible, whereas good validity says that it measures
what it purports to measure. You should complain about
validity: the statistics imply something untrue.
On October 4, Rich Ulrich wrote:
> I think you mean "invalid". And you are wrong, mainly.
> Technically, in statistics, we have both "reliability" and
> "validity". Good reliability says that the number is
> reproducible, whereas good validity says that it measures
> what it purports to measure. You should complain about
> validity: the statistics imply something untrue.
Given the goal of the study, is the objection mentioned above justified? I.e., is the methodology flawed?
Given all that, review the objection mentioned above: those unemployed longer
will have a greater chance of getting a call. Therefore, the methodology is flawed;
the sample isn't unbiased.
> Given the goal of the study, is the objection mentioned above justified? I.e., is the methodology flawed?
My position is that you can collect and report information for
any numbers that might be interesting.
The initial problem is, "Where do these data come from?" That
might put hard limits on what you can infer.
You are jumping ahead to "bad inference." Showing a
histogram of a cross-section of a stated population (sample)
is not "drawing an inference."
Assuming a simplified, instantaneous cross-sectional sample from
that population, you might use your observations above
about the implicit weighting to compute a weighted mean:
each person would be weighted by their TIME (as the
"probability of being sampled"), and you compute that weighted
mean... as an estimate of ... hmm.
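One way to make the weighted-mean idea concrete (a sketch with assumed numbers, not anything computed in the thread): if each person's chance of being sampled is proportional to their time out of work, then weighting each respondent by 1/time, Horvitz-Thompson style, undoes that tilt and recovers the population mean.

```python
import random

random.seed(1)

# Assumed setup: true spells exponential with mean 30 days, and a
# snapshot sample whose inclusion probability is proportional to length.
true_spells = [random.expovariate(1 / 30) for _ in range(100_000)]
sample = random.choices(true_spells, weights=true_spells, k=50_000)

# The naive mean of the snapshot is inflated by the implicit weighting.
naive_mean = sum(sample) / len(sample)

# Weight each respondent by 1/time -- the inverse of their (relative)
# inclusion probability -- and take the weighted mean.
weights = [1 / d for d in sample]
debiased_mean = sum(w * d for w, d in zip(weights, sample)) / sum(weights)

print(f"naive snapshot mean:  {naive_mean:.1f} days")
print(f"1/time-weighted mean: {debiased_mean:.1f} days")
```

Note that the 1/time-weighted mean of the durations is just the harmonic mean of the sample; under the assumed model it lands back near the true 30-day average, while the naive mean sits near double that.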
On October 5, Rich Ulrich wrote:
> Assuming a simplified, instantaneous cross-sectional sample from
> that population, you might use your observations above,
> about the implicit weighting, to compute a weighted mean --
> Each person would be weighted by their TIME (as the
> "probability of being sampled") and you compute that weighted
> mean... as an estimate of ... hmm.
The goal isn't to estimate the chance a person might receive a call.
The goal is to estimate the distribution of the population vs. time unemployed,
given a histogram of samples of the unemployed. Then, perhaps, one might
predict, probabilistically, how much time a newly unemployed person will require to
find new work.
Intuitively, the distribution should match the sample histogram. That's the
desired inference. Very simple.
Given all that, review the objection mentioned above: those unemployed longer
will have a greater chance of getting a call. Therefore, the methodology is flawed;
the sample isn't unbiased.
I have an ulterior motive for posting this -
> Well, the number represents what it represents.
> It is only a mis-report if you mis-report it.
> My position is that you can collect and report information for
> any numbers that might be interesting.
Every experiment is correct, in the sense that it is what it is. Start
with initial conditions, observe the results. Ask a question of nature,
she answers. She doesn't care about your confusion.
First, one must specify a hypothesis to be tested and the desired inference
to be drawn. One assesses the correctness of the experimental design according
to whether the experiment meets those goals.
Let's recap: we want to learn the distribution of unemployed persons vs.
days out of work.
We are given a list of unemployed persons, i.e. names
and phone numbers. Presumably, the list is complete. We call
a sample and ask: how many days since you were laid off?
Couldn't be simpler.
Later, perhaps, one might predict, probabilistically, how much time a
newly unemployed person will require to find new work.
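Given an unbiased estimate of the spell distribution, that prediction step is just reading off the empirical CDF. A hedged sketch, where the exponential spells with a 30-day mean are an assumed stand-in for real survey data:

```python
import bisect
import random

random.seed(7)

# Assumed stand-in for an unbiased sample of completed spells (days).
spells = sorted(random.expovariate(1 / 30) for _ in range(100_000))

def prob_reemployed_within(days):
    """Empirical P(spell <= days): the chance a newly unemployed
    person finds work within the given number of days."""
    return bisect.bisect_right(spells, days) / len(spells)

for horizon in (7, 30, 90):
    print(f"P(back to work within {horizon:2d} days) = "
          f"{prob_reemployed_within(horizon):.2f}")
```

One caveat in the thread's own spirit: a real phone survey observes elapsed, not completed, durations, so the snapshot histogram would first need the length-bias correction debated above before this read-off is valid.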
Sysop: Keyop | Location: Huddersfield, West Yorkshire, UK