• unemployment stats

    From RichD@21:1/5 to All on Sun Oct 3 16:35:48 2021
    Given a population of unemployed persons, i.e. names
    and phone numbers. You wish to construct a histogram
    of # of persons vs. time (# of days) out of work.

    Stats 101, a student homework assignment, right?
    Call some random subset of the list, ask them: when
    were you laid off? Assuming the sample is unbiased,
    it will satisfy the conditions.

    No, this method is flawed, because the person out of
    work a long time has a greater chance of receiving
    multiple calls (or at least one call) than one who is
    soon re-employed. This biases the sample and skews
    the numbers on the long side.

    Therefore, officially published statistics are unreliable.
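
    The length-bias claim above is easy to sanity-check with a toy
    simulation (a sketch under an assumed exponential spell-length model;
    the numbers are illustrative, not official survey methodology):

```python
import random

random.seed(42)

# Assume (for illustration only) that true unemployment spells are
# exponentially distributed with a mean of 30 days.
TRUE_MEAN = 30.0
spells = [random.expovariate(1 / TRUE_MEAN) for _ in range(100_000)]

# A phone survey taken on a random day catches a spell with probability
# roughly proportional to its length: a 100-day spell overlaps ten times
# as many survey days as a 10-day spell.  Model that by sampling with
# weights proportional to spell length.
surveyed = random.choices(spells, weights=spells, k=100_000)

pop_mean = sum(spells) / len(spells)
survey_mean = sum(surveyed) / len(surveyed)
print(f"mean spell in population: {pop_mean:.1f} days")
print(f"mean spell in survey:     {survey_mean:.1f} days")
# The survey mean comes out near double the population mean -- the
# length bias described above, also known as the inspection paradox.
```

    For an exponential distribution the length-biased mean is exactly
    twice the true mean, so the overstatement is not a small effect.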

    --
    Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to r_delaney2001@yahoo.com on Mon Oct 4 14:35:16 2021
    On Sun, 3 Oct 2021 16:35:48 -0700 (PDT), RichD
    <r_delaney2001@yahoo.com> wrote:

    > Given a population of unemployed persons, i.e. names
    > and phone numbers. You wish to construct a histogram
    > of # of persons vs. time (# of days) out of work.
    >
    > Stats 101, a student homework assignment, right?
    > Call some random subset of the list, ask them: when
    > were you laid off? Assuming the sample is unbiased,
    > it will satisfy the conditions.
    >
    > No, this method is flawed, because the person out of
    > work a long time has a greater chance of receiving
    > multiple calls (or at least one call) than one who is
    > soon re-employed. This biases the sample and skews
    > the numbers on the long side.

    Well, the number represents what it represents.
    It is only a mis-report if you mis-report it.

    It is always proper to warn readers of ways that
    they might misinterpret what is being reported.


    > Therefore, officially published statistics are unreliable.

    I think you mean "invalid". And you are wrong, mainly.

    Technically, in statistics, we have both "reliability" and
    "validity". Good reliability says that the number is
    reproducible, whereas good validity says that it measures
    what it purports to measure. You should complain about
    validity: the statistics imply something untrue.

    I do know that official "unemployment statistics" have
    nuances -- like, you don't get counted in the popular
    number if you have given up looking for a job. Yes,
    amateurs are apt to be misled by raw numbers. I suppose
    that the emphasis on "changes" makes use of the
    underlying "reliability" -- and the /changes/ lead to
    inferences that are generally meaningful and valid.

    --
    Rich Ulrich

  • From RichD@21:1/5 to Rich Ulrich on Mon Oct 4 19:37:14 2021
    On October 4, Rich Ulrich wrote:
    >> Given a population of unemployed persons, i.e. names
    >> and phone numbers. You wish to construct a histogram
    >> of # of persons vs. time (# of days) out of work.
    >> Stats 101, right?
    >> Call some random subset of the list, ask them: when
    >> were you laid off? Assuming the sample is unbiased,
    >> it will satisfy the conditions.
    >> No, this method is flawed, because the person out of
    >> work a long time has a greater chance of receiving
    >> multiple calls (or at least one call) than one who is
    >> soon re-employed. This biases the sample and skews
    >> the numbers on the long side.

    > Well, the number represents what it represents.
    > It is only a mis-report if you mis-report it.
    > It is always proper to warn readers of ways that
    > they might misinterpret what is being reported.

    >> Therefore, officially published statistics are unreliable.

    > I think you mean "invalid". And you are wrong, mainly.
    >
    > Technically, in statistics, we have both "reliability" and
    > "validity". Good reliability says that the number is
    > reproducible, whereas good validity says that it measures
    > what it purports to measure. You should complain about
    > validity: the statistics imply something untrue.

    Given the goal of the study, is the objection mentioned above justified?
    I.e., is the methodology flawed?

    --
    Rich

  • From Rich Ulrich@21:1/5 to r_delaney2001@yahoo.com on Tue Oct 5 20:40:37 2021
    On Mon, 4 Oct 2021 19:37:14 -0700 (PDT), RichD
    <r_delaney2001@yahoo.com> wrote:

    > On October 4, Rich Ulrich wrote:
    >>> Given a population of unemployed persons, i.e. names
    >>> and phone numbers. You wish to construct a histogram
    >>> of # of persons vs. time (# of days) out of work.
    >>> Stats 101, right?
    >>> Call some random subset of the list, ask them: when
    >>> were you laid off? Assuming the sample is unbiased,
    >>> it will satisfy the conditions.
    >>> No, this method is flawed, because the person out of
    >>> work a long time has a greater chance of receiving
    >>> multiple calls (or at least one call) than one who is
    >>> soon re-employed. This biases the sample and skews
    >>> the numbers on the long side.

    >> Well, the number represents what it represents.
    >> It is only a mis-report if you mis-report it.
    >> It is always proper to warn readers of ways that
    >> they might misinterpret what is being reported.

    >>> Therefore, officially published statistics are unreliable.

    >> I think you mean "invalid". And you are wrong, mainly.
    >>
    >> Technically, in statistics, we have both "reliability" and
    >> "validity". Good reliability says that the number is
    >> reproducible, whereas good validity says that it measures
    >> what it purports to measure. You should complain about
    >> validity: the statistics imply something untrue.

    > Given the goal of the study, is the objection mentioned above justified?
    > I.e., is the methodology flawed?

    My position is that you can collect and report information for
    any numbers that might be interesting.

    The initial problem is, "Where do these data come from?" - That
    might put hard limits on what you can infer. Does a person
    have to be unemployed for two weeks before they get in that
    list? If you call and the person is now employed, were they
    asked the two questions, "How long were you unemployed?"
    and "How long ago did you get the new job?"

    You are jumping ahead to "bad inference." Showing a
    histogram of a cross-section of a stated population (sample)
    is not "drawing an inference."

    Assuming a simplified, instantaneous cross-sectional sample from
    that population, you might use your observations above,
    about the implicit weighting, to compute a weighted mean --
    Each person would be weighted by their TIME (as the
    "probability of being sampled") and you compute that weighted
    mean... as an estimate of ... hmm. It estimates something that
    might be fairly robust, but it can't be labeled, I think, without
    knowing something about what else gets a person off the list,
    OTHER than getting employed. It is a weighted "observed time
    of unemployment for the newly unemployed" with some cutoff
    being enforced. I think I would get the average for only those
    under 6 months, and give some additional comment on the others.
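
    The inverse-weighting idea sketched above can be made concrete. Under
    the simplifying assumption that a spell's chance of entering the
    sample is exactly proportional to its length T, weighting each
    observation by 1/T undoes the bias (the exponential model and all
    numbers here are illustrative):

```python
import random

random.seed(0)

# Illustrative setup: true spells exponential with mean 30 days, and a
# length-biased sample in which each spell is drawn with probability
# proportional to its length T.
TRUE_MEAN = 30.0
spells = [random.expovariate(1 / TRUE_MEAN) for _ in range(200_000)]
sampled = random.choices(spells, weights=spells, k=200_000)

# The naive mean of the biased sample overshoots the true mean.
naive = sum(sampled) / len(sampled)

# Weight each observation by 1/T, the inverse of its (relative)
# inclusion probability.  The weighted mean sum(w*t)/sum(w) with w = 1/t
# collapses to the harmonic mean of the sample, and it recovers the
# population mean.
corrected = len(sampled) / sum(1 / t for t in sampled)

print(f"naive mean of biased sample:  {naive:.1f}")
print(f"1/T-weighted (harmonic) mean: {corrected:.1f}")
```

    This is the Horvitz-Thompson idea applied to length-biased data; it
    works only under the stated proportional-inclusion assumption, and it
    says nothing about people who leave the list for reasons other than
    finding a job, which is exactly the caveat raised above.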

    --
    Rich Ulrich

  • From RichD@21:1/5 to RichD on Fri Oct 8 12:10:02 2021
    On October 8, RichD wrote:
    > Given all that, review the objection mentioned above: those longer
    > unemployed will have a greater chance of getting a call. Therefore,
    > the methodology is flawed; the sample isn't unbiased.

    To be more precise: not that the long-time unemployed person is more
    likely to be sampled on a particular day, but that he is more likely
    to be sampled at some point during his lifetime, so to speak.

    --
    Rich

  • From RichD@21:1/5 to Rich Ulrich on Fri Oct 8 12:04:18 2021
    On October 5, Rich Ulrich wrote:
    >>>> Given a population of unemployed persons, i.e. names
    >>>> and phone numbers. You wish to construct a histogram
    >>>> of # of persons vs. time (# of days) out of work.
    >>>> Stats 101, right?
    >>>> Call some random subset of the list, ask them: when
    >>>> were you laid off? Assuming the sample is unbiased,
    >>>> it will satisfy the conditions.
    >>>> No, this method is flawed, because the person out of
    >>>> work a long time has a greater chance of receiving
    >>>> multiple calls (or at least one call) than one who is
    >>>> soon re-employed. This biases the sample and skews
    >>>> the numbers on the long side.

    >>> Well, the number represents what it represents.
    >>> It is only a mis-report if you mis-report it.

    >>>> Therefore, officially published statistics are unreliable.

    >>> I think you mean "invalid". And you are wrong, mainly.
    >>> Technically, in statistics, we have both "reliability" and
    >>> "validity"... good validity says that it measures
    >>> what it purports to measure. You should complain about
    >>> validity: the statistics imply something untrue.

    >> Given the goal of the study, is the objection mentioned above justified?
    >> I.e., is the methodology flawed?

    > My position is that you can collect and report information for
    > any numbers that might be interesting.
    > The initial problem is, "Where do these data come from?" - That
    > might put hard limits on what you can infer.
    > You are jumping ahead to "bad inference." Showing a
    > histogram of a cross-section of a stated population (sample)
    > is not "drawing an inference."
    > Assuming a simplified, instantaneous cross-sectional sample from
    > that population, you might use your observations above,
    > about the implicit weighting, to compute a weighted mean --
    > Each person would be weighted by their TIME (as the
    > "probability of being sampled") and you compute that weighted
    > mean... as an estimate of ... hmm.

    The goal isn't to estimate the chance a person might receive a call.
    The goal is to estimate the distribution of population vs. time
    unemployed, given a histogram of samples of the unemployed. Then,
    perhaps, one might predict, probabilistically, how much time a newly
    unemployed person will require to find new work.

    Intuitively, the distribution should match the sample histogram.
    That's the desired inference. Very simple.

    Given all that, review the objection mentioned above: those longer
    unemployed will have a greater chance of getting a call. Therefore,
    the methodology is flawed; the sample isn't unbiased.

    I have an ulterior motive for posting this -

    --
    Rich

  • From Rich Ulrich@21:1/5 to r_delaney2001@yahoo.com on Sat Oct 9 19:10:33 2021
    On Fri, 8 Oct 2021 12:04:18 -0700 (PDT), RichD
    <r_delaney2001@yahoo.com> wrote:

    > On October 5, Rich Ulrich wrote:
    >>>>> Given a population of unemployed persons, i.e. names
    >>>>> and phone numbers. You wish to construct a histogram
    >>>>> of # of persons vs. time (# of days) out of work.
    >>>>> Stats 101, right?
    >>>>> Call some random subset of the list, ask them: when
    >>>>> were you laid off? Assuming the sample is unbiased,
    >>>>> it will satisfy the conditions.
    >>>>> No, this method is flawed, because the person out of
    >>>>> work a long time has a greater chance of receiving
    >>>>> multiple calls (or at least one call) than one who is
    >>>>> soon re-employed. This biases the sample and skews
    >>>>> the numbers on the long side.

    >>>> Well, the number represents what it represents.
    >>>> It is only a mis-report if you mis-report it.

    >>>>> Therefore, officially published statistics are unreliable.

    >>>> I think you mean "invalid". And you are wrong, mainly.
    >>>> Technically, in statistics, we have both "reliability" and
    >>>> "validity"... good validity says that it measures
    >>>> what it purports to measure. You should complain about
    >>>> validity: the statistics imply something untrue.

    >>> Given the goal of the study, is the objection mentioned above justified?
    >>> I.e., is the methodology flawed?

    >> My position is that you can collect and report information for
    >> any numbers that might be interesting.
    >> The initial problem is, "Where do these data come from?" - That
    >> might put hard limits on what you can infer.
    >> You are jumping ahead to "bad inference." Showing a
    >> histogram of a cross-section of a stated population (sample)
    >> is not "drawing an inference."
    >> Assuming a simplified, instantaneous cross-sectional sample from
    >> that population, you might use your observations above,
    >> about the implicit weighting, to compute a weighted mean --
    >> Each person would be weighted by their TIME (as the
    >> "probability of being sampled") and you compute that weighted
    >> mean... as an estimate of ... hmm.

    > The goal isn't to estimate the chance a person might receive a call.
    > The goal is to estimate the distribution of population vs. time
    > unemployed, given a histogram of samples of the unemployed. Then,
    > perhaps, one might predict, probabilistically, how much time a newly
    > unemployed person will require to find new work.
    >
    > Intuitively, the distribution should match the sample histogram.

    Yes, that is apt to be the naive intuition of someone who has
    never considered "sampling." Any good course on sampling is
    going to replace that bad idea, early on.

    > That's the desired inference. Very simple.

    > Given all that, review the objection mentioned above: those longer
    > unemployed will have a greater chance of getting a call. Therefore,
    > the methodology is flawed; the sample isn't unbiased.

    I will repeat: What you do with the numbers, what you say
    about them, is what matters. I think I would say that you may
    label this methodology as "problematic" because of the bias.

    A whole lot of sampling schemes are biased. If all in a set share
    the same bias, you might even compare the results fairly without
    ever estimating and correcting the bias. But you always do
    want to let your audience know that you recognize the bias
    (so the wiser ones don't think you are an ignoramus).


    > I have an ulterior motive for posting this -

    --
    Rich Ulrich

  • From RichD@21:1/5 to Rich Ulrich on Sat Oct 23 17:18:50 2021
    Forgot about this one -

    On October 9, Rich Ulrich wrote:
    >>>>>> Given a population of unemployed persons, i.e. names
    >>>>>> and phone numbers. You wish to construct a histogram
    >>>>>> of # of persons vs. time (# of days) out of work.
    >>>>>> Stats 101, right?
    >>>>>> Call some random subset of the list, ask them: when
    >>>>>> were you laid off? Assuming the sample is unbiased,
    >>>>>> it will satisfy the conditions.

    >>>>> Well, the number represents what it represents.
    >>>>> It is only a mis-report if you mis-report it.
    >>> My position is that you can collect and report information for
    >>> any numbers that might be interesting.

    That's essentially the philosophy of science.

    Every experiment is correct, in the sense that it is what it is. Start
    with initial conditions, observe the results. Ask a question of nature,
    she answers. She doesn't care about your confusion.

    First, one must specify a hypothesis to be tested, and desired inference
    to be drawn. One assesses experimental design correctness according
    to whether the experiment meets these goals.

    Let's recap: we want to learn the distribution of unemployed persons vs.
    days out of work.

    We are given a list of unemployed persons, i.e. names
    and phone numbers. Presumably, the list is complete. We call
    a sample, ask: how many days since you were laid off?
    Couldn't be simpler.

    Later, perhaps, one might predict, probabilistically, how much time a
    newly unemployed person will require to find new work.

    A reviewer objects. Those longer unemployed will have a greater
    chance of getting a call (or repeat calls). Therefore, the
    methodology is flawed; the sample isn't unbiased. Hence the desired
    inference is invalid.

    I find this objection spurious. Of course, the longer one is
    unemployed, the greater the chance of being sampled! That's inherent
    to the experiment, not a defect. If Joe is out of work 100 days, the
    only question is whether he gets a call, and whether 100 goes into
    the data. It doesn't matter if he was also sampled 50 days ago.

    The goal isn't to estimate the chance a person might receive a call
    during his lifetime, so to speak. That would be another hypothesis,
    another experiment.

    Correct?
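
    One way to adjudicate the question is a day-by-day toy simulation:
    compare the durations of everyone who ever enters unemployment (the
    inflow, which is what a prediction about the newly unemployed needs)
    against the durations of the people a snapshot survey can reach. All
    parameters below are illustrative assumptions:

```python
import random

random.seed(1)

# Toy steady-state process (illustrative numbers): every day 100 people
# are laid off, and each spell's total length is uniform on 1..60 days.
DAYS = 1000
active = []   # (start_day, total_length) of spells still running
inflow = []   # total length of every spell ever started

for day in range(DAYS):
    for _ in range(100):
        t = random.randint(1, 60)
        active.append((day, t))
        inflow.append(t)
    # spells that have already ended drop off the calling list
    active = [(s, t) for (s, t) in active if s + t > day]

# A snapshot phone survey on the final day reaches only `active` -- and
# long spells are over-represented there, because they sit on the list
# longer before disappearing.
snapshot = [t for (_, t) in active]

inflow_mean = sum(inflow) / len(inflow)
snap_mean = sum(snapshot) / len(snapshot)
print(f"mean duration over all spells started:   {inflow_mean:.1f} days")
print(f"mean duration among snapshot responders: {snap_mean:.1f} days")
```

    The snapshot mean comes out well above the inflow mean even though no
    one is ever called twice, which is the reviewer's point: the bias
    comes from who is still on the list when you call, not from repeat
    calls.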

    --
    Rich

  • From Rich Ulrich@21:1/5 to r_delaney2001@yahoo.com on Sun Oct 24 13:39:49 2021
    On Sat, 23 Oct 2021 17:18:50 -0700 (PDT), RichD
    <r_delaney2001@yahoo.com> wrote:



    > Every experiment is correct, in the sense that it is what it is. Start
    > with initial conditions, observe the results. Ask a question of nature,
    > she answers. She doesn't care about your confusion.
    >
    > First, one must specify a hypothesis to be tested, and desired inference
    > to be drawn. One assesses experimental design correctness according
    > to whether the experiment meets these goals.
    >
    > Let's recap: we want to learn the distribution of unemployed persons
    > vs. days out of work.

    Ahem. What is your "hypothesis to be tested" or "desired
    inference to be drawn"? A "distribution" is mum on that.


    > We are given a list of unemployed persons, i.e. names
    > and phone numbers. Presumably, the list is complete. We call
    > a sample, ask: how many days since you were laid off?
    > Couldn't be simpler.
    >
    > Later, perhaps, one might predict, probabilistically, how much time a
    > newly unemployed person will require to find new work.

    Ay, there's the rub.

    "When someone is fired or quits a job, how long do they
    stay unemployed?" That's neither hypothesis nor inference.
    It asks for a description.

    But it is an "interesting" question -- an ordinary person
    might assume it was being answered by that "distribution"
    mentioned earlier, but it is not. That is why there was
    a post.

    IN THE REAL WORLD -- A better starting point is the
    list of people with the time they register as "unemployed."
    That suggests limitations: Not everyone registers; and
    no one registers (US) if they expect a new job quickly.

    And in the US, you can drop off the rolls of "unemployed"
    after some time or lack of effort to find a job.

    Otherwise, you could survey and ask EVERYONE if they
    have ever been unemployed, and for how long, for some
    previous time period. That suffers from errors of memory,
    among other problems, but it is a direct attack on the
    question that most people assume is being answered.

    --
    Rich Ulrich
