• Analyzing Covid Data

    From root@21:1/5 to All on Sun May 31 23:14:36 2020
    I have been looking into the NYT covid data available from:

    https://github.com/nytimes/covid-19-data/archive/master.zip

    My main focus has been to estimate the ratio

    m=(true number of infected)/(reported number of infected)

    This ratio alters the reported lethality of the SARS-CoV-2 virus.

    Before getting into the details of analyzing the covid data I want to discuss an
    analogous problem of coin flipping. Suppose you flip a coin 10 times: you would expect, on average, 5 heads and 5 tails, but you know that the number of heads can be anything from 0 to 10 in any set of ten flips. Out of 10 flips, 5 is the most likely number. What about 10,000 flips? In that case we would expect 5,000 heads, but now the range of likely numbers of heads is sharply restricted. If you were to repeat many sets of 10,000 flips, the number of heads would approach a bell-shaped normal curve with a standard deviation of 50. That means the number of heads in 10,000 flips will most likely lie between 4,950 and 5,050. The formula for this standard deviation is sqrt( p * (1-p) * N ), where N is the number of flips and p is the probability of a head, which is 1/2.

    We can turn the coin flipping problem around. Suppose I told you that after a session of coin flipping I got, say, 50 heads. What would you guess is the number of times I flipped the coin to get 50 heads? Your best guess would be 100
    times, but the actual number on any given trial might be some number near 100.
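
    In Python the arithmetic looks like this (a minimal sketch using only the
    standard library; the numbers are the ones used above):

      import math

      def binomial_sd(p, n):
          # standard deviation of the number of heads in n flips, head probability p
          return math.sqrt(p * (1 - p) * n)

      print(binomial_sd(0.5, 10))       # about 1.6 heads for 10 flips
      print(binomial_sd(0.5, 10_000))   # 50 heads for 10,000 flips

      # Turning the problem around: given 50 observed heads and p = 1/2, the
      # natural point estimate of the number of flips is heads / p = 100.
      print(50 / 0.5)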

    Now we are ready to discuss the GitHub covid data. For each day from Jan 21, 2020 until today, the data give the cumulative totals of reported cases and reported deaths for the US as a whole, for each state or territory, and for each county. An example few lines for the US (date, cumulative cases, cumulative deaths) would be:



    2020-05-17 1493350 89568
    2020-05-18 1515177 90414
    2020-05-19 1536129 91934

    In the discussion that follows the number of currently infected people will assume the role of the number of tosses of a coin. Instead of 50/50 odds of heads, there will be a probability (p) of an infected person infecting someone new in the next day. Otherwise we will be using the method described above to estimate the number of flips (the number of infected people).



    Using the numbers above we can compute the daily differences (dC) in the number
    of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of this ratio? It is the probability that any one of the C people will infect a new
    person on that day. Call this ratio p; this just says that the expected number of newly infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
    s = sqrt( p * (1-p) * C ).
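
    As a concrete sketch of those two steps (assuming pandas and the us.csv file
    from the archive above, whose columns are date, cases, deaths; this is only an
    illustration, not the actual code behind the figures below):

      import numpy as np
      import pandas as pd

      us = pd.read_csv("us.csv", parse_dates=["date"])  # columns: date, cases, deaths

      C = us["cases"]              # cumulative reported cases
      dC = C.diff()                # daily new reported cases
      p = dC / C.shift(1)          # dC/C, using the previous day's cumulative total
      s = np.sqrt(p * (1 - p) * C.shift(1))   # expected SD if the reported C were all of the infected

      print(p.tail(3))
      print(s.tail(3))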

    We can also directly compute the standard deviation (S) of the daily new cases. We find, however, that (S) is considerably larger than (s). The reason for this discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number. In the case of the US data as a whole the number of infected people is over ten times the reported number.

    The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.
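
    One such measure, continuing the sketch above, is a centred 7-day moving
    average of the daily new cases; every weekday appears exactly once in each
    window, so a weekly reporting cycle is averaged out:

      # centred 7-day moving average of the daily new cases
      dC_smooth = dC.rolling(window=7, center=True).mean()
      print(dC_smooth.tail(10))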

    Here are my findings to-date:

    State/Territory True Reported Factor Lethality(%)
    Alabama 81732.8 16530 4.94452 0.723088
    Alaska 587.342 430 1.36591 1.36207
    Arizona 86081.5 17763 4.84611 0.995568
    Arkansas 43757.2 6538 6.69275 0.285667
    California 828019 104071 7.95629 0.488153
    Colorado 127288 25116 5.06799 1.11715
    Connecticut 475436 41559 11.44 0.804735
    Delaware 52705.4 9171 5.74696 0.654582
    DistrictofColumbia 31440.9 8492 3.70242 1.4408
    Florida 402480 53277 7.55449 0.587109
    Georgia 445281 43363 10.2687 0.43613
    Guam 9154.68 1141 8.02339 0.0655402
    Hawaii 1044.66 637 1.63997 1.62733
    Idaho 11638.4 2770 4.20157 0.704567
    Illinois 1.1155e+06 116128 9.60574 0.468402
    Indiana 164705 33885 4.86071 1.25558
    Iowa 116177 18672 6.22198 0.438986
    Kansas 59938.9 9512 6.30139 0.348689
    Kentucky 59169.7 9510 6.22184 0.716583
    Louisiana 553734 38907 14.2323 0.494822
    Maine 4922.62 2189 2.2488 1.70641
    Maryland 385367 50334 7.6562 0.630048
    Massachusetts 1.10488e+06 94895 11.6432 0.600971
    Michigan 472088 55944 8.43858 1.13792
    Minnesota 102622 22957 4.47019 0.952035
    Mississippi 61517.4 14372 4.28036 1.12651
    Missouri 58292.4 12815 4.54876 1.22658
    Montana 801.297 485 1.65216 2.12156
    Nebraska 79160.9 13261 5.96945 0.214753
    Nevada 30918.9 8247 3.74911 1.32605
    NewHampshire 11029.1 4389 2.5129 2.10352
    NewJersey 1.76773e+06 157815 11.2013 0.644953
    NewMexico 23145.5 7364 3.14306 1.44737
    NewYork 7.05482e+06 371559 18.9871 0.417275
    NorthCarolina 163655 25616 6.38879 0.524884
    NorthDakota 6564.57 2484 2.64274 0.913997
    NorthernMarianaIslands 32.5908 22 1.4814 6.13671
    Ohio 203122 33915 5.98914 1.03288
    Oklahoma 22629 6270 3.60909 1.44063
    Oregon 8513.77 4086 2.08364 1.7736
    Pennsylvania 572519 74312 7.70427 0.942675
    PuertoRico 25735.7 3486 7.38259 0.509021
    RhodeIsland 54769.8 14494 3.77879 1.23608
    SouthCarolina 43660.9 10788 4.04718 1.07648
    SouthDakota 20844.9 4793 4.34903 0.259056
    Tennessee 156846 21763 7.20702 0.224423
    Texas 427105 60787 7.02626 0.378829
    Utah 24485.9 8953 2.73494 0.432902
    Vermont 2213.81 974 2.2729 2.48441
    VirginIslands 132.527 69 1.92068 4.52739
    Virginia 295112 41401 7.12815 0.453387
    Washington 170473 21825 7.81092 0.654648
    WestVirginia 5243.41 1935 2.70977 1.41129
    Wisconsin 88229.6 17211 5.12635 0.623374
    Wyoming 2282.81 876 2.60595 0.657085

    US 18.12M 1.73M 10.65 .548


    I will respond to any post asking for more details.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Mon Jun 1 13:02:14 2020
    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    I've been following this, too. My own calculations have been
    on scraps of paper, using numbers from Johns Hopkins and
    random news reports.

    I have to say that I don't understand what numbers you
    are computing and what you are assuming.


    Using the numbers above we can compute the daily differences (dC) in the number
    if cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
    this ratio? It is the probability that any one of the C people will infect a new
    person on that day.

    Complications that I see: The Case does not show up until 5 or 6 days
    after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
    The day-to-day increase does indicate the exponential rate, however,
    when you can assume the increase is exponential. (But that is less
    often, now.)


    Rural "exponential increase" is much slower, except when it is a
    big splash of reports either in or near a prison or meatpacking plant.
    The Case in some cities is often counted only when it becomes
    a hospitalization, and (therefore) is no longer in the community
    infecting anyone.


    Call this ratio (p); this just says the expected number of
    infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
    s = sqrt( p * (1-p) * C).

    We can also directly compute the standard deviation (S) of the daily new cases.
    We find, however, that (S) is considerably larger than (s). The reason for this
    discrepancy that the new cases are derived from a larger number of actual

    Trump refused to let the CDC set up a proper monitoring system,
    because he is an idiot who still doesn't believe in serious epidemics.
    So, the counting sucks. For cases. For tests. For results of tests
    which can mix virus-results with antigen-results with tests by
    manufacturers whose tests may be neither calibrated nor reliable.

    Day-to-day variation depends on sloppy reporting practices, and
    is contaminated by sloppy standards. And further contaminated
    by political concerns, which may-or-may-not intentionally count
    the cases at a prison or meatpacking plant or intentionally avoid
    counting them. And not all nursing homes are /able/ to get the
    test kits that they want.

    infected people than the reported number. In the case of the US data as a whole
    the number of infected people is over ten times the reported number.

    " ... ten times the reported number" is a guess, based on
    incomplete and insufficient surveys of antigen levels. That is a
    very important thing to know. Sweden recently reduced their
    claim (made a month ago) about subjects in Stockholm having
    antigens from 26% to 7%. I am hoping for better surveys.

    I read the CDC report (online) with 5 scenarios for the antigen
    levels, etc. CDC used data from April, and offers poor
    documentation. About 5 out of 6 epidemiologists (I gather) think
    that the CDC's highest mortality-rate estimate is more like a minimum.

    The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.

    Day to day variation is one problem among many.


    Here are my findings to-date:

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Mon Jun 1 18:20:41 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:



    I have to say that I don't understand what numbers you
    are computing and what you are assuming.


    I suppose that there may be more people infected than are reported
    in the data as "cases". I hope to derive that number, or more
    exactly, the ratio of (true number)/reported number. My
    approach does not assume any model for the contagion.


    Complications that I see: The Case does not show up until 5 or 6 days
    after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
    The day-to-day increase does indicate the exponential rate, however,
    when you can assume the increase is exponential. (But that is less
    often, now.)

    I have looked into the lag between infection and reporting. I have
    considered lags from one to 50 days. The effect, while not strong,
    is to increase the estimated ratio. That is because the day/day
    variance decreases as you go back in time. So the ratio of
    current variance to earlier variance increases.

    Nothing in my presentation assumed exponential growth, although
    you may have been led to believe that since I utilized
    (delta cases)/cases. If you reread what I wrote, I only took
    that to be a probability (p), which is a tautology from the
    equation p = (delta cases)/cases.



    Trump refused to let the CDC set up a proper monitoring system,
    because he is an idiot who still doesn't believe in serious epidemics.
    So, the counting sucks. For cases. For tests. For results of tests
    which can mix virus-results with antigen-results with tests by
    manufacturers whose tests may be neither calibrated nor reliable.

    Day-to-day variation depends on sloppy reporting practices, and
    is contaminated by sloppy standards. And further contaminated
    by political concerns, which may-or-may-not intentionally count
    the cases at a prison or meatpacking plant or intentionally avoid
    counting them. And not all nursing homes are /able/ to get the
    test kits that they want.

    I don't want to engage in any political issues surrounding the
    contagion.


    infected people than the reported number. In the case of the US data as a whole
    the number of infected people is over ten times the reported number.


    " ... ten times the reported number" is a guess, based on
    incomplete and insufficient surveys of antigen levels. That is a
    very important thing to know. Sweden recently reduced their
    claim (made a month ago) about subjects in Stockholm having
    antigens from 26% to 7%. I am hoping for better surveys.

    I derived my figure of 10 from the data, not from any external
    sources. I welcome criticism of the procedure apart from
    the factor since only experimental results based upon
    widespread (reliable) testing will provide the answer.


    Day to day variation is one problem among many.

    By "many" do you mean problems in my approach?
    If so I would like to hear what you mean. If you
    are referring to the data itself we can only use
    what we have.

    Thanks for your comments.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Mon Jun 1 18:39:49 2020
    Rich Ulrich wrote:

    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    I've been following this, too. My own calculations have been
    on scraps of paper, using numbers from Johns Hopkins and
    random news reports.

    I have to say that I don't understand what numbers you
    are computing and what you are assuming.


    Using the numbers above we can compute the daily differences (dC)
    in the number if cases (C). Furthermore we can compute the ratio
    dC/C. What is the meaning of this ratio? It is the probability that
    any one of the C people will infect a new person on that day.

    Complications that I see: The Case does not show up until 5 or 6 days
    after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
    The day-to-day increase does indicate the exponential rate, however,
    when you can assume the increase is exponential. (But that is less
    often, now.)


    Rural "exponential increase" is much slower, except when it is a
    big splash of reports either in or near a prison or meatpacking plant.
    The Case in some cities is often counted only when it becomes
    a hospitilization, and (therefore) is no longer in the community
    infecting anyone.


    Call this ratio (p) this just says the expected mumber of infected people is p*C = dC. We can use the formula given above to
    compute the expected standard deviation (s) of the number infected
    by C people: s = sqrt( p * (1-p) * C).

    We can also directly compute the standard deviation (S) of the
    daily new cases. We find, however, that (S) is considerably larger
    than (s). The reason for this discrepancy that the new cases are
    derived from a larger number of actual

    Trump refused to let the CDC set up a proper monitoring system,
    because he is an idiot who still doesn't believe in serious epidemics.
    So, the counting sucks. For cases. For tests. For results of tests
    which can mix virus-results with antigen-results with tests by
    manufacturers whose tests may be neither calibrated nor reliable.

    Day-to-day variation depends on sloppy reporting practices, and
    is contaminated by sloppy standards. And further contaminated
    by political concerns, which may-or-may-not intentionally count
    the cases at a prison or meatpacking plant or intentionally avoid
    counting them. And not all nursing homes are able to get the
    test kits that they want.

    infected people than the reported number. In the case of the US
    data as a whole the number of infected people is over ten times the reported number.

    " ... ten times the reported number" is a guess, based on
    incomplete and insufficient surveys of antigen levels. That is a
    very important thing to know. Sweden recently reduced their
    claim (made a month ago) about subjects in Stockholm having
    antigens from 26% to 7%. I am hoping for better surveys.

    I read the CDC report (online) with 5 scenarios for the antigen
    levels, etc. CDC used data from April, and offers poor
    documentation. About 5 out of 6 epidemiologists (I gather) think
    that the CDC's highest mortality-rate estimate is more like a minimum.

    The data show a pronounced weekly variation such as might happen if
    new cases are only reported on Monday. Careful measures must be
    taken to eliminate this source of variation.

    Day to day variation is one problem among many.


    Here are my findings to-date:

    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the
    following list of articles associated with the Significance magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS Covid-19
    Task Force, which is outlined here:

    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

    There is some overlap in these lists.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to David Jones on Mon Jun 1 12:56:02 2020
    On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
    --- snip ---

    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the following list of articles associtaed with the Significance magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS Covid-19
    Task Force, which is outlined here:

    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

    There is some overlap in these lists.


    Thanks David. Those links are helpful.


    PS- I see the RSS is still calling that publication "Significance", despite the ASA's (2019) pronouncement that "it is time to stop using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Bruce Weaver on Mon Jun 1 20:14:23 2020
    Bruce Weaver <bweaver@lakeheadu.ca> wrote:


    Thanks David. Those links are helpful.

    +1



    PS- I see the RSS is still calling that publication "Significance", despite
    the ASA's (2019) pronouncement that "it is time to stop using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)



    The RSS is jumping into the fray by scheduling a meeting in
    Sept 2021!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to root on Mon Jun 1 21:49:58 2020
    root wrote:

    Bruce Weaver <bweaver@lakeheadu.ca> wrote:


    Thanks David. Those links are helpful.

    +1



    PS- I see the RSS is still calling that publication "Significance",
    despite the ASA's (2019) pronouncement that "it is time to stop
    using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)



    The RSS is jumping into the fray by scheduling a meeting in
    Sept 2021!

    No, it is jumping into the fray by setting up its task force since 09
    April 2020 at least. I am not clear if ASA has anything similar to
    co-ordinate research?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Bruce Weaver on Tue Jun 2 00:17:31 2020
    Bruce Weaver wrote:

    On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
    --- snip ---

    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the following list of articles associtaed with the Significance
    magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS
    Covid-19 Task Force, which is outlined here:


    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

    There is some overlap in these lists.


    Thanks David. Those links are helpful.


    PS- I see the RSS is still calling that publication "Significance",
    despite the ASA's (2019) pronouncement that "it is time to stop using
    the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

    Well, in fact it is a joint RSS and ASA publication.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to David Jones on Tue Jun 2 07:04:46 2020
    On Monday, June 1, 2020 at 8:17:41 PM UTC-4, David Jones wrote:
    Bruce Weaver wrote:

    PS- I see the RSS is still calling that publication "Significance",
    despite the ASA's (2019) pronouncement that "it is time to stop using
    the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

    Well, in fact it is a joint RSS and ASA publication.


    So it is. I never noticed that. From the bottom of the front page:

    "Significance Magazine is published for the Royal Statistical Society and American Statistical Association by John Wiley & Sons Ltd."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to David Jones on Tue Jun 2 18:47:05 2020
    David Jones wrote:

    root wrote:

    Bruce Weaver <bweaver@lakeheadu.ca> wrote:


    Thanks David. Those links are helpful.

    +1



    PS- I see the RSS is still calling that publication
    "Significance", despite the ASA's (2019) pronouncement that "it
    is time to stop using the term “statistically significant”
    entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)



    The RSS is jumping into the fray by scheduling a meeting in
    Sept 2021!

    No, it is jumping into the fray by setting up its task force since 09
    April 2020 at least. I am not clear if ASA has anything similar to co-ordinate research?

    In fact ASA has this:

    https://magazine.amstat.org/blog/2020/04/29/online-communities-created-for-covid-19-discussion/

    But the discussions can't be seen if one (such as I) is not a member of
    ASA.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Wed Jun 3 16:50:30 2020
    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Okay, I will back up and discuss your original post.
    I thought you were leaving out vital steps.
    Now I figure that the whole process is a comparison
    based on variances.



    In the discussion that follows the number of currently infected people will assume the role of the number of tosses of a coin. Instead of 50/50 odds of heads, there will be a probability (p) of an infected person infecting someone
    new in the next day. Otherwise we will be using the method described above to estimate the number of flips (the number of infected people).

    No. Your coin-flipping example had nothing to do with
    variances.



    Using the numbers above we can compute the daily differences (dC) in the number
    of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
    this ratio? It is the probability that any one of the C people will infect a new
    person on that day. Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
    s = sqrt( p * (1-p) * C).

    We can also directly compute the standard deviation (S) of the daily new cases.
    We find, however, that (S) is considerably larger than (s). The reason for this
    discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number.

    No, you have that backwards. The change has to be an
    increase in p. If you increase C, you would implicitly decrease
    p, which decreases the expected variation. Binomial
    distributions have larger variance near 50%, smaller in
    the tails.


    In the case of the US data as a whole
    the number of infected people is over ten times the reported number.

    Contrariwise, you should conclude something like "the
    effective number of people spreading the disease is
    perhaps a tenth of the number" being pointed to.

    That might be somewhat consistent with the observation
    that 10% of the cases are "super-spreaders" who account
    for 80% of all new infections. ("Super" must be a function
    of how much virus they shed and how many people they
    breathe or cough on.)


    Taking another tack - for small proportions, like your
    dC over C, the Poisson distribution is neater than the
    binomial. The variance of a Poisson observation is equal
    to the observation. Note, too, that it says /nothing/
    about the total N that may be generating the sample.

    Your proper conclusion is a Goodness of Fit conclusion:
    the distribution at hand has too much variation to be Poisson.
    "N of infected" does not offer an explanation.

    The reason that would-be Poisson observations fail to
    be Poisson - by having too much variance, as in these
    data - is (in my experience) that they are not independent.

    That is: I've occasionally done this to augment an eyeball
    check of a possible Poisson: divide the observed counts
    by 2 or 5 or 10 to get a smaller "effective N". I decide that
    there is non-independence if the resulting counts now
    match the expected spread of a Poisson. (This notion of
    "effective N" is something I picked up from some British
    authors, years ago. I haven't seen it widely used by
    US biostatisticians.)
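
    A rough rendering of that eyeball check (my sketch of the idea, not Rich's
    actual procedure; the counts here are made-up daily new-case numbers over a
    fairly flat stretch):

      import numpy as np

      def dispersion_index(x):
          # variance / mean; approximately 1 for Poisson counts
          x = np.asarray(x, dtype=float)
          return x.var(ddof=1) / x.mean()

      # illustrative daily new-case counts over a roughly flat stretch
      counts = np.array([950, 1100, 870, 1300, 1020, 1180, 905, 1250, 990, 1140])

      for k in (1, 2, 5, 10):
          print(k, round(dispersion_index(counts / k), 1))

      # Dividing by k scales the variance by 1/k**2 and the mean by 1/k, so the
      # dispersion index drops by a factor of k; the k at which it reaches ~1 is
      # a crude "effective N" correction for non-independent counts.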


    Non-independence could be accounted for by various things.
    I mentioned (1st Reply) reporting errors, and the problem
    of modeling the surge of cases from nursing homes,
    prisons, etc., complicated by "politics" that interfere
    with accurate reporting. I think you can't pretend that
    those problems don't exist and affect the data.

    "Superspreaders" and super-spreading events also yield
    non-independent cases.


    If I have missed what you are doing, please let me know.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Thu Jun 4 02:27:31 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Okay, I will back up and discuss your original post.
    I thought you were leaving out vital steps.
    Now I figure that the whole process is a comparison
    based on variances.

    Thanks for responding Rich, and thanks for investing your
    time.






    No. Your coin-flipping example had nothing to do with
    variances.


    I don't see how you can say that. Certainly there is a difference in that
    coins are usually fair, but the same equations apply. The limiting form
    of the binomial distribution is normal with a variance as I specified.




    No, you have that backwards. The change has to be an
    increase in p. If you increase C, you would implicitly decrease
    p, which decreases the expected variation. Binomial
    distributions have larger variance near 50%, smaller in
    the tails.

    True, and that is exactly my point. As the number of "trials"
    increases the proportion of cases approaches the expected
    number, and the standard deviation around that expected number grows
    as the sqrt of the number of trials.




    Contrarywise, you should conclude something like "the
    effective number of people spreading the disease is
    perhaps a tenth of the number" being pointed to.

    I can't see how you say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with a SD of 31 cases. With a million infected we would expect 10,000 new cases with a SD of 100 cases. In the case of
    100,000 people the SD represents about 3% of the cases. For a million
    people the SD represents about 1% of the cases.

    The point I wish to make here is that with these large numbers of
    infected people we should expect smaller variance than we see.
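
    The numbers behind that comparison (the same binomial formula as in the
    original post):

      import math

      p = 0.01
      for C in (100_000, 1_000_000):
          expected = p * C
          sd = math.sqrt(p * (1 - p) * C)
          print(C, expected, round(sd, 1), f"{sd / expected:.1%}")
      # 100,000 infected: 1,000 expected new cases, SD ~31, about 3% of the expected count
      # 1,000,000 infected: 10,000 expected new cases, SD ~100, about 1%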





    That might be somewhat consistent with the observation
    that 10% of the cases are "super-spreaders" who account
    for 80% of all new infections. ("Super" must be a function
    of how much virus they shed and how many people they
    breathe or cough on.)

    I would expect that there is a variation among the population
    in terms of their contagion. I would also expect that such
    a distribution would scale with the population. A million
    infected people should have 100 times as many super-spreaders
    as a population of 100,000.



    Taking another tack - for small proportions, like your
    dC over C, the Poisson distribution is neater than the
    binomial. The variance of a Poisson observation is equal
    to the observation. Note, too, that it says /nothing/
    about the total N that may be generating the sample.

    In fact the limit of a Poisson is also normal. The Poisson
    represents how radioactive atoms behave. The number of
    atoms (N) certainly affects the number of clicks on
    the counter. The early stages of the contagion are represented
    by a Poisson. However, during that phase, when the Poisson
    dominates, the infection grows linearly with time.



    Your proper conclusion is a Goodness of Fit conclusion:
    the distribution at hand has too much variation to be Poisson.
    "N of infected" does not offer an explanation.

    The reason that would-be Poisson observations fail to
    be Poisson - by having too much variance, as in these
    data - is (in my experience) that they are not independent.

    That is: I've occasionally done this to augment an eyeball
    check of a possible Poisson: divide the observed counts
    by 2 or 5 or 10 to get a smaller "effective N". I decide that
    there is non-independence if the resulting counts now
    match the expected spread of a Poisson. (This notion of
    "effective N" is something I picked up from some British
    authors, years ago. I haven't seen it widely used by
    US biostatisticians.)


    Non-independence could be accounted for by various things.
    I mentioned (1st Reply) reporting errors, and the problem
    of modeling the surge of cases from nursing homes,
    prisons, etc., complicated by "politics" that interfere
    with accurate reporting. I think you can't pretend that
    those problems don't exist and affect the data.

    I don't contend that these factors don't exist. (Sorry
    for the awkward statement). I do maintain that whatever
    the distribution it should scale with the population.

    "Superspreaders" and super-spreading events also yield
    non-independent cases.

    See above.


    If I have missed what you are doing, please let me know.

    Rich I greatly appreciate the effort you have put in to
    thinking and commenting on the subject.

    Your first post asked what assumptions I made which I did not reveal in my first
    post. I have been thinking about that. Let's imagine a large number of identical infections spreading over the US (or any country), with each realization differing only in the variations caused by chance. Strictly speaking, I am assuming that the lateral variation across realizations is similar to the longitudinal variation in time over one case. This is sort of an ergodic assumption. It amounts to an assumption that the day to day variation is slow enough to justify my approach.

    I don't know what facilities you have to look at the data. What I would
    like you, and everyone reading this, to do is pretty simple. Stick
    with the us.csv data from GitHub. Read in the data and perform
    a 7 day moving average on the number of cases. As you noted above,
    reporting issues create a pronounced 7 day variation. The moving
    average suppresses the variation as well as variation due to
    any other causes.

    Compute first order differences of the smoothed data and follow
    the steps I have outlined. You will find that the variance of the
    smoothed data is far larger than I have indicated. See for
    yourselves.
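
    For anyone who wants to try it, here is one way to set the comparison up in
    Python (a sketch only: the posts do not spell out exactly how S is measured or
    how the S-versus-s comparison becomes the factor m, so this version takes S to
    be the scatter of the raw daily new cases about the smoothed ones and stops at
    the ratio):

      import numpy as np
      import pandas as pd

      us = pd.read_csv("us.csv", parse_dates=["date"])       # columns: date, cases, deaths

      C_raw = us["cases"]
      C = C_raw.rolling(window=7, center=True).mean()        # 7-day moving average of cumulative cases
      dC = C.diff()                                           # smoothed daily new cases
      p = dC / C.shift(1)

      # expected SD on each day if the reported count were the whole infected population
      s = np.sqrt(p * (1 - p) * C.shift(1))

      # observed day-to-day scatter: raw daily new cases around the smoothed ones
      S = (C_raw.diff() - dC).std()

      print("median expected SD (s):", float(s.median()))
      print("observed SD of new cases (S):", float(S))
      print("ratio S / s:", float(S / s.median()))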

    Thanks again Rich.





    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Thu Jun 4 14:17:25 2020
    On Thu, 4 Jun 2020 02:27:31 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    No. Your coin-flipping example had nothing to do with
    variances.


    I don't see how you can say that. Certainly there is a difference in that coins are usually fair, but the same equations apply. The limiting form
    of the binomial distribution is normal with a variance as I specified.

    Okay. Let me revise that. Your coin-flipping had nothing
    to do with estimating total N from variances.


    ...

    Contrarywise, you should conclude something like "the
    effective number of people spreading the disease is
    perhaps a tenth of the number" being pointed to.

    I'm not particularly happy with my argument from the binomial.
    I think I was implying that there are unstated premises, and
    my logic, though poor, was as good as yours.

    I will stand by my argument from the Poisson.


    I can't see how you say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with a SD of 31 cases. With a million infected we would expect 10,000 new cases with a SD of 100 cases.

    In the real world, this estimate of "within group" variance is
    widely (though not widely enough) recognized as an underestimate
    of the true variance - for most sampling situations. The explanation
    is that there are unrecognized dependencies between cases.


    In the case of
    100,000 people the SD represents about 4% of the cases. For a million
    people the SD represents about 1% of the cases.

    The point I wish to make here is that with these large numbers of
    infected people we should expect smaller variance than we see.

    But. You certainly are ignoring the "compound distribution"
    that exists. You have to account for the fact that your mean of
    cases-observed is 1000, not 10,000.

    Not only is there a p1 of infection, there is a p2 for the infected
    case being recognized. Compound distributions have long tails,
    thus, larger variances.

    I'm a little bit curious if you could build a model that way.
    It seems like there are too many unknown parameters.



    Taking another tack - for small proportions, like your
    dC over C, the Poisson distribution is neater than the
    binomial. The variance of a Poisson observation is equal
    to the observation. Note, too, that it says /nothing/
    about the total N that may be generating the sample.

    In fact the limit of a Poisson is also normal. The Poisson
    represents how radioactive atoms behave. The number of
    atoms (N) certainly affects the number of clicks on
    the counter. The early stages of the contagion is represented
    by a Poisson. However during that phase, when the Poisson
    dominates, the infection grows linearly with time

    No, and No.

    You are constructing a model where the p is small, always.
    Therefore, the Poisson is appropriate, not the binomial
    and not the normal.

    The Poisson with small p gives a convenient expression for
    the variance, which is NOT defined by either the p or the N,
    and the observed cases (alone) can't un-confound them.


    ...

    The reason that would-be Poisson observations fail to
    be Poisson - by having too much variance, as in these
    data - is (in my experience) that they are not independent.

    ...

    Non-independence could be accounted for by various things.

    ...


    Compute first order differences of the smoothed data and follow
    the steps I have outlined. You will find that the variance of the
    smoothed data is far larger than I have indicated. See for
    yourselves.

    Oh, I readily agree that the data are not Poisson. Or
    binomial, if you use a larger p.

    I pointed to many sources of dependency.

    I read an article about the public discussion in Germany.
    They made heavy use of "R0", the rate of passing on
    disease from a single case. The public was encouraged
    to help keep R0 low by those things like wearing masks
    and avoiding unnecessary gatherings. It was clear that
    they regarded R0 as a parameter that can be controlled.

    The places or events (super-spreader) with high R0 are
    particularly strong sources of dependency in the counts,
    making the variance much larger than for independent events
    - whether modeled as Poisson, binomial or normal.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Thu Jun 4 19:48:19 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    I read an article about the public discussion in Germany.
    They made heavy use of "R0", the rate of passing on
    disease from a single case. The public was encouraged
    to help keep R0 low by those things like wearing masks
    and avoiding unnecessary gatherings. It was clear that
    they regarding R0 as a parameter that can be controlled.

    Sure it can be controlled: just ensure that no two people
    are ever closer than 100 feet from each other.

    Regardless of the model, all contagions begin with
    an exponential increase. R0 is the initial slope of the log
    of that increase vs time. However, the very first
    stages of the contagion undoubtedly begin with a
    Poisson process. You can see that in the GitHub data.



    The places or events (super-spreader) with high R0 are
    particularly strong sources of dependency in the counts,
    making the variance much larger for independent events
    - whether modeled as Poisson, binomial or normal.



    I tried to analyze the possibility for super-spreaders
    but I was unable to come up with anything justifiable.
    It certainly doesn't work if only super-spreaders are
    responsible.

    About Poisson, remember that the distribution was invented
    to analyze the situation where there is an average number
    of events/time. That means that over time the cumulative
    number of cases would be linear regardless of the specific
    sequence of arrivals. Poisson is used in queuing theory
    to determine the optimum number of servers, etc.

    I just read the results of a study of tests in Wisconsin.
    16,000 people were tested and 6.4% of them tested positive
    for SARS-CoV-2. Extrapolating that to the 5.82M people
    in Wisconsin that would suggest that 372,480 people in
    Wisconsin carry the antibodies. About 20,000 "cases"
    have been reported in Wisconsin. That is a ratio of
    18.6 (people expected to carry the virus)/(known cases).
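
    The arithmetic, for anyone checking it (a trivial sketch):

      positive_rate = 0.064          # 6.4% of roughly 16,000 people tested positive
      population = 5_820_000         # Wisconsin
      reported_cases = 20_000

      implied = positive_rate * population
      print(implied)                     # 372,480
      print(implied / reported_cases)    # about 18.6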


    Earlier a Stanford study in Santa Clara county estimated
    that between 2.4% and 4.1% of residents had been infected
    with Cov2. The Stanford study was widely criticized for
    procedural errors.

    Thanks again for your comments.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to root on Fri Jun 5 07:15:15 2020
    root wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    I read an article about the public discussion in Germany.
    They made heavy use of "R0", the rate of passing on
    disease from a single case. The public was encouraged
    to help keep R0 low by those things like wearing masks
    and avoiding unnecessary gatherings. It was clear that
    they regarding R0 as a parameter that can be controlled.

    Sure it can be controlled: just ensure that no two people
    are ever closer than 100 feet from each other.

    Regardless of the model, all contagions begin with
    an exponential increase. R0 is the initial slope of the log
    of that increase vs time. However, the very first
    stages of the contagion undoubtedly begin with a
    Poisson process. You can see that in the GitHub data.



    The places or events (super-spreader) with high R0 are
    particularly strong sources of dependency in the counts,
    making the variance much larger for independent events
    - whether modeled as Poisson, binomial or normal.



    I tried to analyze the possibility for super-spreaders
    but I was unable to come up with anything justifiable.
    It certainly doesn't work if only super-spreaders are
    responsible.

    About Poisson, remember that the distribution was invented
    to analyze the situation where there is an average number
    of events/time. That means that over time the cumulative
    number of cases would be linear regardless of the specific
    sequence of arrivals. Poisson is used in queuing theory
    to determine the optimum number of servers, etc.


    If one goes on from Rich's comments about dependencies possibly
    explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
    Thus someone could do an autocorrelation analysis of the counts (or
    square roots of counts if this seems good). This could be extended to
    a cross-correlation analysis between states. And ... if a simple ARMA
    modelling approach were added, one could start from ideas such as, if
    an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
    moving-average-type component extending over 6 days. If any of this
    showed anything worthwhile, one could then be more ambitious (but
    probably unnecessary) by developing a doubly-stochastic Poisson or
    Binomial model.
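
    A minimal version of the first of those suggestions, sketched with pandas
    (assuming the us-states.csv file from the same NYT archive, which has columns
    date, state, fips, cases, deaths):

      import numpy as np
      import pandas as pd

      states = pd.read_csv("us-states.csv", parse_dates=["date"])
      ny = states[states["state"] == "New York"].set_index("date").sort_index()

      new_cases = ny["cases"].diff().dropna()
      x = np.sqrt(new_cases.clip(lower=0))   # square roots, as suggested, to steady the variance

      for lag in range(1, 15):
          print(lag, round(x.autocorr(lag), 2))
      # a spike at lags 7 and 14 is the day-of-week reporting effect; dependence at
      # the other lags is the sort of thing an ARMA model would try to capture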

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Fri Jun 5 03:27:59 2020
    On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    I just read the results of a study of tests in Wisconsin.
    16,000 people were tested and 6.4% of them tested positive
    for SARS-Cov2. Extrapolating that to the 5.82M people
    in Wisconsin that would suggest that 372,480 people in
    Wisconsin carry the antibodies. About 20,000 "cases"
    have been reported in Wisconsin. That is a ratio of
    18.6 (people expected to carry the virus)/(known cases).

    I've been trying to get a handle on this antibody testing
    business. I just read an original article about surveillance
    for TB, 10,000 people or so, using two different antibody
    tests. This was 2012, comparing their data to a similar
    survey done 8 or 10 years prior.

    People have been testing for TB for over a century.
    I'm a bit appalled that they don't have a good,
    systematic and validated method for estimating
    population rates. But they don't. If you really want
    prevalence, you want the same number of false positives
    as false negatives. I don't think that their method gets
    that, but they never even discuss the question. So far
    as I noticed.

    What they presented used the 10mm cutoff for the
    skin test and a single cutoff for the other test. From
    this, I think they eventually reported all three combos
    for rates. Only 2.7% were high on both tests. But they
    preferred looking at the other two numbers, which were,
    oh, about 5 and about 6.5. I think I have to go back to
    that and save the study.

    Anyway, my own (tentative) conclusion about this TB
    study is that the 2.7% represents more false positives
    than false negatives. So it, their minimum, is too high.

    That gives me less hope than I started with, in regards
    to whether the surveys being done should be believed.



    Earlier a Stanford study in Santa Clara county estimated
    that between 2.4% and 4.1% of residents had been infected
    with Cov2. The Stanford study was widely criticized for
    procedural errors.

    In the bit I recall about the Stanford study, they said they did
    /attempt/ to take into account and calibrate for false positives.
    I don't remember what procedures they were criticized for.

    The Swedish study from the end of April that estimated
    26% coronavirus antibody has been replaced with a
    claim of 7%. And disappointment in Sweden.

    The CDC released estimates last week that gave five models,
    all of which estimated huge population exposures. That
    study, released online, was using data from April, too. It
    was criticised for lacking citations, and for producing those
    rates, outside the usual range, without decent explanation.
    https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
    What I like about the reference is it gives some numbers
    for things like "hospitalizations" and "mean days" ....

    The Chinese have done so much testing that they ought to
    have data that would settle some questions. I don't know if
    no one has seen it, or if no one trusts it.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 09:31:55 2020
    Rich Ulrich wrote:

    On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    I just read the results of a study of tests in Wisconsin.
    16,000 people were tested and 6.4% of them tested positive
    for SARS-Cov2. Extrapolating that to the 5.82M people
    in Wisconsin that would suggest that 372,480 people in
    Wisconsin carry the antibodies. About 20,000 "cases"
    have been reported in Wisconsin. That is a ratio of
    18.6 (people expected to carry the virus)/(known cases).

    I've been trying to get a handle on this antibody testing
    business. I just read an original article about surveillance
    for TB, 10,000 people or so, using two different antibody
    tests. This was 2012, comparing their data to a similar
    survey done 8 or 10 years prior.

    People have been testing for TB for over a century.
    I'm a bit appalled that they don't have a good,
    systematic and validated method for estimating
    population rates. But they don't. If you really want
    prevalence, you want the same number of false positives
    as false negatives. I don't think that their method gets
    that, but they never even discuss the question. So far
    as I noticed.

    What they presented used the 10mm cutoff for the
    skin test and a single cutoff for the other test. From
    this, I think they eventually reported all three combos
    for rates. Only 2.7% were high on both tests. But they
    preferred looking at the other two numbers, which were,
    oh, about 5 and about 6.5. I think I have to go back to
    that and save the study.

    Anyway, my own (tentative) conclusion about this TB
    study is that the 2.7% represents more false positives
    than false negatives. So it, their minimum, is too high.

    That gives me less hope than I started with, in regards
    to whether the surveys being done should be believed.



    Earlier a Stanford study in Santa Clara county estimated
    that between 2.4% and 4.1% of residents had been infected
    with Cov2. The Stanford study was widely criticized for
    procedural errors.

    In the bit I recall about the Stanford study, they said they did
    attempt to take into account and calibrate for false positives.
    I don't remember what procedures they were criticized for.

    The Swedish study from the end of April that estimated
    26% coronavirus antibody has been replaced with a
    claim of 7%. And disappointment in Sweden.

    The CDC released estimates last week that gave five models,
    all of which estimated huge population exposures. That
    study, released online, was using data from April, too. It
    was criticised for lacking citations, and for producing those
    rates, outside the usual range, without decent explanation. https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
    What I like about the reference is it gives some numbers
    for things like "hospitalizations" and "mean days" ....

    The Chinese have done so much testing that they ought to
    have data that would settle some questions. I don't know if
    no one has seen it, or if no one trusts it.

    The latest results/methodology from survey analysis in the UK are given
    at:

    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From duncan smith@21:1/5 to Rich Ulrich on Fri Jun 5 16:54:07 2020
    On 05/06/2020 08:27, Rich Ulrich wrote:
    On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
    wrote:


    I just read the results of a study of tests in Wisconsin.
    16,000 people were tested and 6.4% of them tested positive
    for SARS-Cov2. Extrapolating that to the 5.82M people
    in Wisconsin that would suggest that 372,480 people in
    Wisconsin carry the antibodies. About 20,000 "cases"
    have been reported in Wisconsin. That is a ratio of
    18.6 (people expected to carry the virus)/(known cases).

    I've been trying to get a handle on this antibody testing
    business. I just read an original article about surveillance
    for TB, 10,000 people or so, using two different antibody
    tests. This was 2012, comparing their data to a similar
    survey done 8 or 10 years prior.

    People have been testing for TB for over a century.
    I'm a bit appalled that they don't have a good,
    systematic and validated method for estimating
    population rates. But they don't. If you really want
    prevalence, you want the same number of false positives
    as false negatives. I don't think that their method gets
    that, but they never even discuss the question. So far
    as I noticed.

    What they presented used the 10mm cutoff for the
    skin test and a single cutoff for the other test. From
    this, I think they eventually reported all three combos
    for rates. Only 2.7% were high on both tests. But they
    preferred looking at the other two numbers, which were,
    oh, about 5 and about 6.5. I think I have to go back to
    that and save the study.

    Anyway, my own (tentative) conclusion about this TB
    study is that the 2.7% represents more false positives
    than false negatives. So it, their minimum, is too high.

    That gives me less hope than I started with, in regards
    to whether the surveys being done should be believed.



    Earlier a Stanford study in Santa Clara county estimated
    that between 2.4% and 4.1% of residents had been infected
    with Cov2. The Stanford study was widely criticized for
    procedural errors.

    In the bit I recall about the Stanford study, they said they did
    /attempt/ to take into account and calibrate for false positives.
    I don't remember what procedures they were criticized for.


    [snip]

    AFAICT they got that part right. It's simple enough to estimate the
    population proportion (who would test positive), then perform the simple algebra required to generate a point estimate / CI for the prevalence
    for a given sensitivity and specificity. It *is* possible to generate inadmissible prevalence estimates if the observed data are utterly
    inconsistent with the given sensitivity / specificity, but I ran some simulations and this didn't seem to be a real issue.

    The study used the Delta method for the CI to account for uncertainty in sensitivity / specificity. I didn't work through that, but their CI was
    a bit wider than the naive CI (assuming known sensitivity and
    specificity). So it looked reasonable.
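
    For reference, the "simple algebra" is the usual test-adjusted (Rogan-Gladen)
    estimate. A sketch with a naive interval that treats sensitivity and
    specificity as known (the study's delta-method interval, which also propagates
    their uncertainty, comes out a bit wider); the numbers below are purely
    illustrative, not the study's counts:

      import math

      def adjusted_prevalence(pos, n, sensitivity, specificity):
          # Rogan-Gladen point estimate with a naive 95% CI that treats
          # sensitivity and specificity as known constants
          p_obs = pos / n
          denom = sensitivity + specificity - 1
          prev = (p_obs + specificity - 1) / denom
          se = math.sqrt(p_obs * (1 - p_obs) / n) / denom
          return prev, (prev - 1.96 * se, prev + 1.96 * se)

      # purely illustrative numbers, not the Santa Clara study's data
      print(adjusted_prevalence(pos=50, n=3300, sensitivity=0.83, specificity=0.995))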

    Their data were not representative of the population, so they introduced weights to generate a number of positive tests that they would have
    expected to have observed if the data had been representative. That's
    where I'm guessing the criticisms lie, but all I've looked at is the
    study itself. Cheers.

    Duncan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to David Jones on Fri Jun 5 16:33:51 2020
    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are given
    at:

    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of the population
    in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
    to die of the infection the lethality would be no worse than seasonal flu.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to David Jones on Fri Jun 5 16:21:15 2020
    David Jones <dajhawkxx@nowherel.com> wrote:

    If one goes on from Rich's comments about dependencies possibly
    explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
    Thus someone could do an autocorrelation anlayis of the counts (or
    square roots of counts if this seems good). This could be extended to
    a cross-correlation analysis between states. And ... if a simple ARMA modelling approach were added, one could start from ideas such as, if
    an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
    moving-average-type component extending over 6 days. If any of this
    showed anything worthwhile, one could then be more ambitious (but
    probably unnecessary) by developing a doubly-stochastic Poisson or
    Binomial model.

    I have aligned the data for the 11 states with the highest infection
    rates:
    NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

    After alignment there are only 80 days, or so, of data. There's
    not much time series stuff I can do with that. New York strongly
    dominates and the aforementioned weekly variation in the data
    is pronounced. Smoothing out the weekly variations reduces the
    number of independent data points to a dozen or so.

    When the data are aligned you can see that the states follow their
    own path through the stages of infection. For instance I have looked
    at Illinois together with Wisconsin. It is evident that Wisconsin
    suffered from a spillover from the infection in Illinois. As
    Illinois grows the infection spills across the border.

    I can post the aligned data if anyone is interested. The file
    is about 7K bytes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to root on Fri Jun 5 17:07:18 2020
    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    If one goes on from Rich's comments about dependencies possibly
    explaining some of the variation, it may be good to leave off from
    the Poisson/Binomial questions and do a fairly basic time-series
    analysis. Thus someone could do an autocorrelation anlayis of the
    counts (or square roots of counts if this seems good). This could
    be extended to a cross-correlation analysis between states. And ...
    if a simple ARMA modelling approach were added, one could start
    from ideas such as, if an infected person is infectious but not
    symptomatic for say 6 days (or whatever the figure is), this might
    lead to there being a moving-average-type component extending over
    6 days. If any of this showed anything worthwhile, one could then
    be more ambitious (but probably unnecessary) by developing a doubly-stochastic Poisson or Binomial model.

    I have aligned the data for the 11 states with the highest infection
    rates:
    NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

    After alignment there are only 80 days, or so, of data. There's
    not much time series stuff I can do with that. New York strongly
    dominates and the aforementioned weekly variation in the data
    is pronounced. Smoothing out the weekly variations reduces the
    number of independent data points to a dozen or so.

    When the data are aligned you can see that the states follow their
    own path through the stages of infection. For instance I have looked
    at Illinois together with Wisconsin. It is evident that Wisconsin
    suffered from a spillover from the infection in Illinois. As
    Illinois grows the infection spills across the border.

    I can post the aligned data if anyone is interested. The file
    is about 7K bytes.

    80 time points is about the size of the data sets used in econometrics,
    weather data etc. for correlation-type analysis and time-series
    modelling. But keep as much data as possible for autocorrelations.

    I would not suggest starting by smoothing the data, particularly if you
    want to look at short-term variations. As there are marked day-of-week
    effects, you would expect a raw autocorrelation analysis to be
    overwhelmed by this effect. This suggests a need to "detrend" the data
    to remove this effect (and any special holidays), and if there are
    long-term trends you might well want to remove those as well.

    This would all be a lot of work, and you would need to consider if the
    possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
    person goes on infecting other people.
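
    As an illustration, here is a minimal Python sketch of that kind of
    analysis: remove a smooth trend and the average day-of-week effect, then
    look at the autocorrelations of the residuals. The file name and column
    names are only placeholders for however the NYT daily counts are loaded:

    import pandas as pd

    def autocorr_after_detrend(daily: pd.Series, max_lag: int = 21) -> pd.Series:
        """Remove a 7-day rolling trend and the mean day-of-week effect,
        then report autocorrelations of the residuals at lags 1..max_lag."""
        trend = daily.rolling(7, center=True, min_periods=4).mean()
        detrended = daily - trend
        # Day-of-week effect: the average residual for each weekday.
        dow_effect = detrended.groupby(detrended.index.dayofweek).transform("mean")
        residual = (detrended - dow_effect).dropna()
        return pd.Series({lag: residual.autocorr(lag) for lag in range(1, max_lag + 1)})

    # Example usage with a hypothetical file of cumulative US counts:
    # us = pd.read_csv("us.csv", parse_dates=["date"]).set_index("date")
    # print(autocorr_after_detrend(us["cases"].diff().dropna()))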

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to root on Fri Jun 5 18:12:51 2020
    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are
    given at:


    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of
    the population in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK.
    If all those people were then to die of the infection the lethality
    would be no worse than seasonal flu.

    Not "only 0.1% of the population in the UK have been infected",
    instead "only 0.1% of the population in the UK presently have the
    infection." Thus it excludes those who have either recovered after
    showing symptoms, or recovered without showing symptoms. This refers to
    17 May to 30 May 2020. This percentage has gone down a lot, but I don't
    know the peak value.

    In the UK, many of the infections have occurred within care homes and
    within hospitals, as opposed to being new cases entering those places
    having already been detected as having the infection. This probably
    relates to the lack of fully effective personal protective equipment at
    the early and middle stages of the epidemic (and the close contacts in
    those places).

    Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
    only deaths in hospitals or care homes). Excess deaths compared with what
    would be expected in a normal year are around 60,000. These numbers are
    higher than those reported in any other country except the USA, so
    would be judged high, I guess. The problem is that other countries
    report deaths on different bases ... for example, I have read that in
    Germany if someone dies of a heart attack while suffering from Covid-19
    it would be counted only as a heart attack, but in the UK it would be
    counted in the Covid-19 totals. Beyond that, those who survive Covid-19
    having had the extreme version of the symptoms will have had a very extreme experience.

    Total death rates in the UK and USA are presently 600 and 330 per million, respectively, according to https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
    The UK value is the second highest in the world, but with the above caveat
    about comparability.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Fri Jun 5 14:06:54 2020
    On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are given
    at:

    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of the population
    in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
    to die of the infection the lethality would be no worse than seasonal flu.

    As I read it, 0.1% have the active disease at any given
    time. The new infections amount to 0.07% per week.

    That implies that the duration of infection is 10 days,
    for these people outside the hospitals. That surprises
    me. If it is six days until symptoms, even the folks with
    symptoms must show them for only a few days. I
    thought the disease was more tenacious.
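    (Presumably via the steady-state relation prevalence = incidence x mean
    duration: 0.1% divided by 0.07% per week is about 1.4 weeks, i.e.
    roughly 10 days.)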

    I wonder again at "false positivies." They do have a
    section on sensitivity and specificity. I have not yet
    understood their claims for robustness of their reported
    estimates. It does say "85 to 95% sensitive" and "above 95%"
    specific for their test of the virus. Bad self-testing, they
    say, could revise the 0.1% to 0.19% for prevalence.


    Their point estimate is 6.78% for the prevalence of
    antibodies (Ever had it?), 24 May. (Section 4).
    That is for their particular sample, not weighted to
    be representative. That's a pretty high rate.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 18:42:41 2020
    Rich Ulrich wrote:

    On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are
    given at:



    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of
    the population in the UK have been infected? Either they have done
    a wonderful job of social isolation, or Covid19 is a fizzle in the
    UK. If all those people were then to die of the infection the
    lethality would be no worse than seasonal flu.

    As I read it, 0.1% have the active disease at any given
    time. The new infections amount to 0.07% per week.

    That implies that the duration of infection is 10 days,
    for these people outside the hospitals. That surprises
    me. If it is six days until symptoms, even the folks with
    symptoms must show them for only a few days. I
    thought the disease was more tenacious.


    Those who develop bad symptoms will be quickly moved to hospital (no
    cost worries with the NHS) and so are not in the outside population for
    long. Those who don't develop bad symptoms have those lesser symptoms
    (and count as infected) for a relatively short time.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@gmail.com on Fri Jun 5 21:42:38 2020
    On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
    <dajhawkxx@gmail.com> wrote:



    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the following list of articles associated with the Significance magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS Covid-19
    Task Force, which is outlined here:

    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/


    Thanks - I finally got around to checking those, and I've
    read some good articles already.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Jun 5 21:53:28 2020
    On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    If one goes on from Rich's comments about dependencies possibly
    explaining some of the variation, it may be good to leave off from
    the Poisson/Binomial questions and do a fairly basic time-series
    analysis. Thus someone could do an autocorrelation analysis of the
    counts (or square roots of counts if this seems good). This could

    ...

    80 time points is about the size of the data sets used in econometrics, weather data etc. for correlation-type analysis and time-series
    modelling. But keep as much data as possible for autocorrelations.

    I would not suggest starting by smoothing the data, particularly if you
    want to look at short-term variations. As there are marked day-of-week effects, you would expect a raw autocorrelation analysis to be
    overwhelmed by this effect. This suggests a need to "detrend" the data
    to remove this effect (and any special holidays), and if there are
    long-term trends you might well want to remove those as well.

    I wasn't thinking of the sort of dependency that would
    show up in these data as autocorrelation across days.

    You do see dependency in the existence of clusters. I don't think
    you can call those "independent random observations of the
    infections from each single case." You have clusters when you
    have multiple cases from a nursing home, a prison, a factory, or
    after a choir practice or church service.



    This would all be a lot of work, and you would need to consider if the
    possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
    person goes on infecting other people.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Sat Jun 6 07:00:21 2020
    Rich Ulrich wrote:

    On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    If one goes on from Rich's comments about dependencies possibly
    explaining some of the variation, it may be good to leave off
    from the Poisson/Binomial questions and do a fairly basic
    time-series analysis. Thus someone could do an autocorrelation
    analysis of the counts (or square roots of counts if this seems
    good). This could

    ...

    80 time points is about the size of the data sets used in econometrics,
    weather data etc. for correlation-type analysis and time-series modelling. But keep as much data as possible for autocorrelations.

    I would not suggest starting by smoothing the data, particularly if
    you want to look at short-term variations. As there are marked
    day-of-week effects, you would expect a raw autocorrelation
    analysis to be overwhelmed by this effect. This suggests a need to "detrend" the data to remove this effect (and any
    special holidays), and if there are long-term trends you might
    well want to remove those as well.

    I wasn't thinking of the sort of dependency that would
    show up in these data as autocorrelation across days.

    You do see dependency in the existence of clusters. I don't think
    you can call those "independent random observations of the
    infections from each single case." You have clusters when you
    have multiple cases from a nursing home, a prison, a factory, or
    after a choir practice or church service.


    There are two versions of this: one where the situation is such that a
    high number of cases are generated over a period of several days, and a
    second where a high number of cases are recorded on a single day, as in
    the surprise discovery of a bad situation in an unnoticed care home. I
    think either of these could be modelled (as random-in-time occurrences
    of such situations) so as to lead to serial correlation in the
    counts. The autocorrelations may not be the best way to detect such
    effects, but they are notionally easy to compute for a data series. On
    a theoretical basis they correspond to short periods of time where the
    rate of occurrence is high compared to a background rate.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to David Jones on Sat Jun 6 11:21:39 2020
    David Jones wrote:

    Rich Ulrich wrote:

    On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are
    given at:




    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of
    the population in the UK have been infected? Either they have
    done a wonderful job of social isolation, or Covid19 is a fizzle
    in the UK. If all those people were then to die of the infection
    the lethality would be no worse than seasonal flu.

    As I read it, 0.1% have the active disease at any given
    time. The new infections amount to 0.07% per week.

    That implies that the duration of infection is 10 days,
    for these people outside the hospitals. That surprises
    me. If it is six days until symptoms, even the folks with
    symptoms must show them for only a few days. I
    thought the disease was more tenacious.


    Those who develop bad symptoms will be quickly moved to hospital (no
    cost worries with the NHS) and so are not in the outside population
    for long. Those who don't develop bad symptoms have those lesser
    symptoms (and count as infected) for a relatively short time.

    ... actually the lockdown rules were rather strict (if anyone followed
    them) in that anyone having symptoms (if not needing hospitalisation)
    was meant to self-isolate, even from their own family, but still within
    the family home.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:37:47 2020
    On Sat, 6 Jun 2020 07:00:21 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    me >
    I wasn't thinking of the sort of dependency that would
    show up in these data as autocorrelation across days for
    these data.

    You do see dependency in the existence of clusters. I don't think
    you can call those "independent random observations of the
    infections from each single case." You have clusters when you
    have multiple cases from a nursing home, a prison, a factory, or
    after a choir practice or church service.


    There are two versions of this: one where the situation is such that a
    high number of cases are generated over a period of several days, and a second where a high number of cases are recorded on a single day, as in
    the surprise discovery of a bad situation in an unnoticed care home. I
    think either of these could be modelled (as random-in-time occurrences
    of such situations) so as to lead to serial correlation in the
    counts. The autocorrelations may not be the best way to detect such
    effects, but they are notionally easy to compute for a data series. On
    a theoretical basis they correspond to short periods of time where the
    rate of occurrence is high compared to a background rate.

    Okay, yes, autocorrelation is a statistic you could generate, whatever
    you figure to do with those lumps of data.

    I'm satisfied with observing the clearly-non-Poisson variation
    and pointing to the known clusters, etc. -- which /ought to/ be
    studied up close and in detail, at least a few times. Why has that
    not been done? Meatpackers? Nursing homes?

    My inference is that our CDC has been shut out of any leadership
    role in both management and science. Nobody else is in the same
    position, where they could essentially /mandate/ participation.
    That is a shame. So we are left guessing.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:21:09 2020
    On Fri, 5 Jun 2020 18:12:51 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The latest results/methodology from survey analysis in the UK are
    given at:

    https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020


    Can I be reading this correctly: outside of hospitals only 0.1% of
    the population in the UK have been infected? Either they have done a
    wonderful job of social isolation, or Covid19 is a fizzle in the UK.
    If all those people were then to die of the infection the lethality
    would be no worse than seasonal flu.

    Not "only 0.1% of the population in the UK have been infected",
    instead "only 0.1% of the population in the UK presently have the
    infection." Thus it excludes those who have either recovered after
    showing symptoms, or recovered without showing symptoms. This refers to
    17 May to 30 May 2020. This percentage has gone down a lot, but I don't
    know the peak value.

    In the UK, many of the infections have occurred within care homes and
    within hospitals, as opposed to being new cases entering those places
    having already been detected as having the infection. This probably
    relates to the lack of fully effective personal protective equipment at
    the early and middle stages of the epidemic (and the close contacts in
    those places).

    Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
    only deaths in hospitals or care homes). Excess deaths compared with what
    would be expected in a normal year are around 60,000. These numbers are
    higher than those reported in any other country except the USA, so
    would be judged high, I guess. The problem is that other countries
    report deaths on different bases ... for example, I have read that in
    Germany if someone dies of a heart attack while suffering from Covid-19
    it would be counted only as a heart attack, but in the UK it would be
    counted in the Covid-19 totals. Beyond that, those who survive Covid-19
    having had the extreme version of the symptoms will have had a very extreme experience.

    Total death rates in the UK and USA are presently 600 and 330 per million, respectively, according to https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
    The UK value is the second highest in the world, but with the above caveat
    about comparability.

    The graphs that show "excess deaths" show excesses (beyond
    annual trends + Covid-19 reports) for most countries. That's
    despite the lower death rates in some other specific categories.

    Covid-19 reportedly does manifest as heart attacks. Also, a
    large fraction of those on ventilators also need dialysis. I think
    that, whether it is 100% legit or not, counting all those related
    deaths as Covid-19 won't result in an over-count; too many
    cases are missed elsewhere.

    For a couple of weeks, there were reports that 85-90% of
    those on ventilators eventually die. That led to advice to
    put patients on their bellies, and to hold off the ventilators
    for as long as possible. I haven't seen those mentioned lately.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Tue Jun 16 07:55:08 2020
    Rich Ulrich wrote:

    On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
    <dajhawkxx@gmail.com> wrote:



    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the following list of articles associated with the Significance
    magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS
    Covid-19 Task Force, which is outlined here:


    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/


    Thanks - I finally got around to checking those, and I've
    read some good articles already.

    The pages are often updated, so may be worth checking again. For
    example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
    titled "Antibody tests, early lockdown advice and European deaths" and
    it discusses various problems with the statistics and modelling used.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to David Jones on Tue Jun 16 14:21:41 2020
    David Jones <dajhawkxx@nowherel.com> wrote:
    Rich Ulrich wrote:

    On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
    <dajhawkxx@gmail.com> wrote:



    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about the
    following list of articles associated with the Significance
    magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS
    Covid-19 Task Force, which is outlined here:


    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/


    Thanks - I finally got around to checking those, and I've
    read some good articles already.

    The pages are often updated, so may be worth checking again. For
    example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
    titled "Antibody tests, early lockdown advice and European deaths" and
    it discusses various problems with the statistics and modelling used.


    One of the papers in the RSS publication news URL above is:

    How many people are infected with Covid-19?

    At least one of the references cited in this paper arrives at numbers
    comparable to those I have inferred from the reported data.

    I tried to contact both the RSS and the CDC about my method for estimating
    that number directly from the reported data. I have, as yet, received no response from either group.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to root on Tue Jun 16 15:35:28 2020
    root wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:
    Rich Ulrich wrote:

    On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
    <dajhawkxx@gmail.com> wrote:



    I have not tried to follow any of the above. But, anyone with a
    statistical interest in this epidemic should probably know about
    the following list of articles associated with the Significance
    magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS
    Covid-19 Task Force, which is outlined here:



    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/


    Thanks - I finally got around to checking those, and I've
    read some good articles already.

    The pages are often updated, so may be worth checking again. For
    example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
    titled "Antibody tests, early lockdown advice and European deaths"
    and it discusses various problems with the statistics and modelling
    used.


    One of the papers in the RSS publication news URL above is:

    How many people are infected with Covid-19?

    At least one of the references cited in this paper arrives at numbers comparable to those I have inferred from the reported data.

    I tried to contact both the RSS and the CDC about my method for
    estimating that number directly from the reported data. I have, as
    yet, received no response from either group.

    That article has the following info about the author

    "About the author:
    Tarak Shah is a data scientist at the Human Rights Data Analysis Group
    (HRDAG), where he processes data about violence and fits models in
    order to better understand evidence of human rights abuses."

    ... so you could try contacting HRDAG via their covid webpage : https://hrdag.org/covid19/

    ... or the author's info at
    https://hrdag.org/people/tarak-shah/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to David Jones on Wed Jun 17 14:27:35 2020
    David Jones wrote:

    Rich Ulrich wrote:

    On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones" <dajhawkxx@gmail.com> wrote:



    I have not tried to follow any of the above. But, anyone with a statistical interest in this epidemic should probably know about
    the following list of articles associated with the Significance
    magazine:

    https://www.significancemagazine.com/business/647

    The list features the UK rather heavily and relates to the RSS
    Covid-19 Task Force, which is outlined here:



    https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

    and:

    https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/


    Thanks - I finally got around to checking those, and I've
    read some good articles already.

    The pages are often updated, so may be worth checking again. For
    example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
    titled "Antibody tests, early lockdown advice and European deaths" and
    it discusses various problems with the statistics and modelling used.

    A video appeared today on YouTube that is partly related and that discusses what has been going on with the statistics of Covid in the
    UK. It derives from May 20.

    https://www.youtube.com/watch?v=OrRoeQaucF0
    titled: Using data to improve health from the time of the Crimea to the
    time of the coronavirus

    SPEAKER: Prof. Deborah Ashby, President of the Royal Statistical
    Society and Director of the School of Public Health at Imperial College
    London

    In this talk, Prof Deborah Ashby takes us on a journey through the life
    of Florence Nightingale and comments on the aptness of celebrating her
    bicentenary in the first year of the COVID-19 pandemic.

    Other details in the heading on YouTube.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Thu Jul 9 14:36:13 2020
    On Fri, 05 Jun 2020 03:27:59 -0400, Rich Ulrich
    <rich.ulrich@comcast.net> wrote:
    ...

    The Swedish study from the end of April that estimated
    26% coronavirus antibody prevalence has been replaced with a
    claim of 7%. And disappointment in Sweden.

    The CDC released estimates last week that gave five models,
    all of which estimated huge population exposures. That
    study, released online, was using data from April, too. It
    was criticised for lacking citations, and for producing those
    rates, outside the usual range, without decent explanation. https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
    What I like about the reference is it gives some numbers
    for things like "hospitalizations" and "mean days" ....


    The idea that the "real" infection rate is 10 times the reported
    rate is gaining currency in news reports.

    My impression is that it springs entirely from a comment
    10 days or so ago, to that effect, made by Redfield, the
    Director of CDC. (My current impression of Redfield is low.)

    Redfield made no citations, so he seems to be pointing back
    to the CDC report that I cited above. Which I still do not
    place any faith in. That study used only data from April
    and earlier, and did not give good citations.

    I just now read some online comments about the Stanford
    study (April) which reported high infection prevalence.
    I think that David Jones mentioned criticism about sample
    selection. I see that they advertised on Facebook ... which
    IMHO is an assured way to get volunteers who expect that
    they may be positive. So that is a distinct bias, noted in
    comments.

    I also read an assertion that the test they used has been
    shown to have 97.5% specificity, instead of 99.5%, with
    the consequence that ALL their "cases" could have been
    false-positives. I don't know if that criticism is valid. All
    comments I read were two months old.

    Looking for other citations to prevalence surveys in
    Google-news, most of what Google showed were articles from
    single, local newspapers, not formal reports of wide distribution.

    The exception was an article from Lancet, which reported
    on a survey of Geneva, tapping a pre-existing survey sample.

    That article, as it happens, DOES support the hypothesis of
    very widespread infection. They estimate about 10% infection
    in their population -- which had about 1% reported cases.
    Like the US, their apparent case-fatality rate (cases vs deaths)
    was 5 or 6% at the time. Adjusting 6% by tenfold yields an overall,
    "true" case fatality rate of around 0.6% -- which is not
    wholly unreasonable. It compares to the outside-of-Wuhan
    data for China's original epidemic.

    What I believed a month ago is little changed. The best
    extrapolated "true infection" rates may be what you get by
    starting with the reported fatality rate, adjusting that for biases
    you can guess, discounting for excess fatalities in care homes,
    and multiplying by 100 or 150 to account for a fatality rate
    between 0.67% and 1%.


    The Chinese have done so much testing that they ought to
    have data that would settle some questions. I don't know if
    no one has seen it, or if no one trusts it.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Thu Jul 9 21:03:36 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    What I believed a month ago is little changed. The best
    extrapolated "true infection" rates may be what you get by
    starting with the reported fatality rate, adjusting that for biases
    you can guess, discounting for excess fatalities in care homes,
    and multiplying by 100 or 150 to account for a fatality rate
    between 0.67% and 1%.



    In other words, a guess. The proposed numbers agree pretty
    well with what I derived in an earlier post to which you
    objected.



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Fri Jul 10 13:14:31 2020
    On Thu, 9 Jul 2020 21:03:36 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    What I believed a month ago is little changed. The best
    extrapolated "true infection" rates may be what you get by
    starting with the reported fatality rate, adjusting that for biases
    you can guess, discounting for excess fatalities in care homes,
    and multiplying by 100 or 150 to account for a fatality rate
    between 0.67% and 1%.



    In other words, a guess. The proposed numbers agree pretty
    well with what I derived in an earlier post to which you
    objected.



    A guess, yup. But an educated guess.

    Applying what I said there -- 90,000 deaths outside of care
    facilities (say) yields 9 to 13.5 million infected, rather than 30
    million.

    I still consider the non-Poisson daily counts as reflecting the
    clumping of cases, owing to artifacts of reporting in the cases
    where it doesn't owe to super-spread events or care homes.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Fri Jul 10 18:54:08 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    I still consider the non-Poisson daily counts as reflecting the
    clumping of cases, owing to artifacts of reporting in the cases
    where it doesn't owe to super-spread events or care homes.


    An article in a recent WSJ indicated that Covid testing is
    employing pooling, a method developed in WW2 for syphilis
    testing. As described, pooling involves combining a number
    of blood samples and testing the pool for antibodies.
    If the pool is clear then all the samples in the pool
    are clear. If the pool is not clear then the samples
    are tested again individually. Another article I read
    said the pools now consist of 5 samples.

    A little math will reveal that a pool of 5 samples
    is optimum (in the sense of minimum tests) for a
    population with a 20% infection rate. This suggests
    that 20% is the rate at which samples are proving positive.
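
    As a point of reference, here is a minimal Python sketch of the standard
    two-stage (Dorfman) pooling arithmetic, assuming a positive pool is
    followed by individual retests and that dilution is not limiting; the
    best pool size it reports depends entirely on the prevalence assumed:

    def expected_tests_per_person(pool_size: int, prevalence: float) -> float:
        """One pooled test per group, plus individual retests whenever
        the pool contains at least one positive sample."""
        p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
        return 1.0 / pool_size + p_pool_positive

    def best_pool_size(prevalence: float, max_size: int = 50) -> int:
        """Pool size minimising expected tests per person, by brute force."""
        return min(range(2, max_size + 1),
                   key=lambda n: expected_tests_per_person(n, prevalence))

    for p in (0.20, 0.10, 0.05, 0.01):
        n = best_pool_size(p)
        print(f"prevalence {p:.0%}: pool of {n}, "
              f"{expected_tests_per_person(n, p):.2f} tests per person")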

    At 20% infection rate there is not much chance to do
    better than this pooling method. But, if the infection
    rate were much less there is a vastly superior testing
    method which involves sequential pooling.

    As I like to consider coin problems, we have a batch
    of suspect coins for which it is known that the bad coins
    are always lighter than good coins. How should they
    be tested if we only have a balance scale?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Sat Jul 11 17:50:16 2020
    On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    I still consider the non-Poisson daily counts as reflecting the
    clumping of cases, owing to artifacts of reporting in the cases
    where it doesn't owe to super-spread events or care homes.


    An article in a recent WSJ indicated that Covid testing is
    employing pooling, a method developed in WW2 for syphilis
    testing. As described, pooling involves pooling a number
    of blood samples and testing the pool for antibodies.
    If the pool is clear then all the samples in the pool
    are clear. If the pool is not clear then the samples
    are tested again individually. Another article I read
    said the pools now consist of 5 samples.

    A little math will reveal that a pool of 5 samples
    is optimum (in the sense of minimum tests) for a
    population with a 20% infection rate. This suggests
    that 20% is the rate at which samples are proving positive.

    I think that Fauci mentioned pooling, and used "10".
    One limiting factor -- for some tests, anyway -- is how
    much the dilution of the sample affects the sensitivity.

    I read that Abbott's quick test (15 minutes) originally
    allowed for either dry-swab or wet-stored-swab, but they
    changed the instructions when the wet-swabs showed
    lower sensitivity -- which was attributed to dilution.

    That was in the discussion after an outside lab found
    very low sensitivity for that test. Latest instructions:
    "Now, the company says only direct swabs from
    patients should be inserted into the machine." https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

    At 20% infection rate there is not much chance to do
    better than this pooling method. But, if the infection
    rate were much less there is a vastly superior testing
    method which involves sequential pooling.

    As I like to consider coin problems, we have a batch
    of suspect coins for which it is known that the bad coins
    are always lighter than good coins. How should they
    be tested if we only have a balance scale?

    By thirds. If two thirds balance, the other third is off.
    That's the trick for a single bad coin. I expect it
    generalizes, but correcting multiple errors does get
    trickier.
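
    A minimal sketch of that ternary scheme for the single-light-coin case,
    assuming a balance that reports which of two equal-sized groups is
    lighter (the coin labels and weights below are made up for the example):

    def find_light_coin(coins, lighter_side):
        """lighter_side(a, b) returns -1 if group a is lighter, 1 if group b
        is lighter, and 0 if they balance; a and b always have equal size."""
        group = list(coins)
        while len(group) > 1:
            third = (len(group) + 2) // 3                # ceil(len/3)
            a, b, rest = group[:third], group[third:2 * third], group[2 * third:]
            result = lighter_side(a, b)
            group = a if result == -1 else b if result == 1 else rest
        return group[0]

    # Example: 27 coins, coin 17 is light.
    weights = {i: (0.9 if i == 17 else 1.0) for i in range(27)}
    def balance(a, b):
        wa, wb = sum(weights[c] for c in a), sum(weights[c] for c in b)
        return -1 if wa < wb else (1 if wa > wb else 0)
    print(find_light_coin(range(27), balance))           # -> 17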

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Tue Jul 14 06:15:20 2020
    Rich Ulrich wrote:

    On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:
    I still consider the non-Poisson daily counts as reflecting the
    clumping of cases, owing to artifacts of reporting in the cases
    where it doesn't owe to super-spread events or care homes.


    An article in a recent WSJ indicated that Covid testing is
    employing pooling, a method developed in WW2 for syphilis
    testing. As described, pooling involves pooling a number
    of blood samples and testing the pool for antibodies.
    If the pool is clear then all the samples in the pool
    are clear. If the pool is not clear then the samples
    are tested again individually. Another article I read
    said the pools now consist of 5 samples.

    A little math will reveal that a pool of 5 samples
    is optimum (in the sense of minimum tests) for a
    population with a 20% infection rate. This suggests
    that 20% is the rate at which samples are proving positive.

    I think that Fauci mentioned pooling, and used "10".
    One limiting factor - for some tests, anyway -- is how
    much the dilution of the sample affects the sensitivity.

    I read that Abbott's quick test (15 minutes) originally
    allowed for either dry-swab or wet-stored-swab, but they
    changed the instructions when the wet-swabs showed
    lower sensitivity -- which was attributed to dilution.

    That was in the discussion after an outside lab found
    very low sensitivity for that test. Latest instructions:
    "Now, the company says only direct swabs from
    patients should be inserted into the machine."

    https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

    At 20% infection rate there is not much chance to do
    better than this pooling method. But, if the infection
    rate were much less there is a vastly superior testing
    method which involves sequential pooling.

    As I like to consider coin problems, we have a batch
    of suspect coins for which it is known that the bad coins
    are always lighter than good coins. How should they
    be tested if we only have a balance scale?

    By thirds. If two thirds balance, the other third is off.
    That's the trick for a single bad coin. I expect it
    generalizes, but correcting multiple errors does get
    trickier.

    On the topic of testing by pooling, there is some relevant discussion
    of multidimensional pooling in the following BBC radio podcast,
    starting at about 10:45 for about 10 minutes: https://www.bbc.co.uk/sounds/play/w3cszh0k

    This should be accessible worldwide. Blurb says:

    "African scientists have developed a reliable, quick and cheap testing
    method which could be used by worldwide as the basis for mass testing programmes.

    The method, which produces highly accurate results, is built around mathematical algorithms developed at the African Institute for
    Mathematical Sciences in Kigali. We speak to Neil Turok who founded the institute, Leon Mutesa Professor of human genetics on the government coronavirus task force, and Wilfred Ndifon, the mathematical biologist
    who devised the algorithm."

    The idea is to do very few tests to identify rare infected individuals
    among large populations. For multidimensional pooling, each individual
    sample is put into several different pools, and those pools which turn
    out to test positive can lead to a quick identification of infected
    candidates from very few actual tests.
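
    A minimal sketch of the simplest, two-dimensional version of that idea,
    assuming an ideal error-free test: lay the samples out on a grid, pool
    each row and each column, then retest only the intersections of positive
    rows and positive columns (the numbers below are made up):

    import math

    def grid_pool_candidates(positives, n_samples):
        """Samples 0..n_samples-1 sit on a near-square grid; each row and
        each column is tested as one pool. Returns the candidate set and
        the number of pooled tests used."""
        side = math.isqrt(n_samples - 1) + 1             # grid dimension
        positive_rows = {s // side for s in positives}
        positive_cols = {s % side for s in positives}
        candidates = {s for s in range(n_samples)
                      if s // side in positive_rows and s % side in positive_cols}
        return candidates, 2 * side                      # one test per row and per column

    # Example: 3 positives hidden among 10,000 samples.
    cands, pooled = grid_pool_candidates({123, 4567, 8910}, 10_000)
    print(pooled, "pooled tests;", len(cands), "candidates to retest individually")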

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to David Jones on Tue Jul 14 08:02:04 2020
    David Jones <dajhawkxx@nowherel.com> wrote:

    The idea is to do very few tests to identify rare infected individuals
    among large populations. For multidimensional pooling, each individual
    sample is put into several different pools, and those pools which turn
    out to test positive can lead to a quick identification of infected candidates from very few actual tests.

    The results of sequential pooling when the infection is rare are
    surprising: for a condition which affects 0.1% of the population,
    the infected people in a population of 1 million can be isolated
    with only a few thousand tests. Certainly under 4,000. Were it
    not for random variations in the binomial distribution it could
    be done in about 1,700 tests.
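
    The totals depend on the pool sizes and the splitting schedule chosen,
    but the flavour of sequential pooling is easy to simulate. A minimal
    sketch, assuming an error-free test and simple halving of every positive
    pool; the population, batch size and seed here are arbitrary, and
    dilution limits are ignored:

    import random

    def pool_is_positive(pool, infected):
        return any(s in infected for s in pool)          # idealised, error-free test

    def sequential_pooling(samples, infected, counter):
        """Recursively halve every positive pool down to individuals.
        counter[0] accumulates the number of tests performed."""
        counter[0] += 1
        if not pool_is_positive(samples, infected):
            return set()
        if len(samples) == 1:
            return {samples[0]}
        mid = len(samples) // 2
        return (sequential_pooling(samples[:mid], infected, counter)
                | sequential_pooling(samples[mid:], infected, counter))

    random.seed(0)
    population, prevalence, batch = 100_000, 0.001, 1024
    infected = set(random.sample(range(population), int(population * prevalence)))
    tests, found = [0], set()
    for start in range(0, population, batch):
        found |= sequential_pooling(list(range(start, min(start + batch, population))),
                                    infected, tests)
    print(len(found), "of", len(infected), "infected found with", tests[0], "tests")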

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Jul 14 14:05:14 2020
    On Tue, 14 Jul 2020 06:15:20 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:


    On the topic of testing by pooling, there is some relevant discussion
    of multidimensional pooling in the following BBC radio podcast,
    starting at about 10:45 for about 10 minutes: https://www.bbc.co.uk/sounds/play/w3cszh0k

    This should be accessible worldwide. Blurb says:

    "African scientists have developed a reliable, quick and cheap testing
    method which could be used by worldwide as the basis for mass testing >programmes.

    The method, which produces highly accurate results, is built around mathematical algorithms developed at the African Institute for
    Mathematical Sciences in Kigali. We speak to Neil Turok who founded the institute, Leon Mutesa Professor of human genetics on the government coronavirus task force, and Wilfred Ndifon, the mathematical biologist
    who devised the algorithm."

    The idea is to do very few tests to identify rare infected individuals
    among large populations. For multidimensional pooling, each individual
    sample is put into several different pools, and those pools which turn
    out to test positive can lead to a quick identification of infected candidates from very few actual tests.


    I think immediately about the work done (1980s, I think)
    to provide reliable disk drives. Simple checksums can detect
    a read error in a sector. Algorithms were developed to use
    bit-wise "pooling" (like the above) to provide error correction that detects and corrects up to some maximum number of errors, using
    minimal computing resources. Magnetic media were prone to
    developing errors.

    I have no idea whether that technology is still in use, or how
    much of it is in use. I've seen no talk of faulty disk drives in
    years. The need to re-load some .exe is also pretty rare, and
    seems to be assumed to be the fault of bad program execution
    (or virus) instead of memory-rot.


    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Sun Aug 9 12:47:16 2020
    On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The idea is to do very few tests to identify rare infected individuals
    among large populations. For multidimensional pooling, each individual
    sample is put into several different pools, and those pools which turn
    out to test positive can lead to a quick identification of infected
    candidates from very few actual tests.

    The results of sequential pooling when the infection is rare are
    surprising: for a condition which affects 0.1% of the population,
    the infected people in a population of 1 million can be isolated
    with only a few thousand tests. Certainly under 4,000. Were it
    not for random variations in the binomial distribution it could
    be done in about 1,700 tests.

    I am a bit surprised that the whole topic of "pooling" for coronavirus
    testing made a tiny splash and then disappeared in the media that I
    read and view.

    On the other hand, the /need/ for pooling seems ever more
    apparent, now that doctors and pundits have started repeating
    (over and over) that a week between test and results is far too
    long. One pundit suggested that "capitalist incentive" would fix the
    delays, if no one had to pay for a test that took more than 48 hours.

    I've heard exactly one interview about pooling which may have
    already started. One university (US) is using its own lab resources,
    for what they are doing now (I think) and what they plan for
    students (too) when they open. Frequent re-tests mean that
    they will need many thousands of test results per day.

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have for illustration,
    or based on dilution, or based on expected Positives.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From root@21:1/5 to Rich Ulrich on Sun Aug 9 17:45:45 2020
    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have for illustration,
    or based on dilution, or based on expected Positives.


    The size of the pool depends upon the expected frequency of
    the infection. A pool size of 10 would be too large if
    the expected fraction of infected is 0.5. The optimum size
    of the pool depends upon the proposed schedule of testing:
    what you do if the pool tests positive. Sequential testing
    yields the minimum number of tests. With the observed frequency
    of infected and the schedule of using individual tests after
    a pool has failed, the optimum pool size is now around 5.

    I have read that dilution does impose a limit on pool size.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Rich Ulrich on Sun Aug 9 23:46:09 2020
    Rich Ulrich wrote:

    On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The idea is to do very few tests to identify rare infected
    individuals among large populations. For multidimensional pooling,
    each individual sample is put into several different pools, and
    those pools which turn out to test positive can lead to a quick identification of infected candidates from very few actual tests.

    The results of sequential pooling when the infection is rare are surprising: for a condition which affects 0.1% of the population,
    the infected people in a population of 1 million can be isolated
    with only a few thousand tests. Certainly under 4,000. Were it
    not for random variations in the binomial distribution it could
    be done in about 1,700 tests.

    I am a bit surprised that the whole topic of "pooling" for coronavirus testing made a tiny splash and then disappeared in the media that I
    read and view.

    On the other hand, the need for pooling seems ever more
    apparent, now that doctors and pundits have started repeating
    (over and over) that a week between test and results is far too
    long. One pundit suggested that "capitalist incentive" would fix the
    delays, if no one had to pay for a test that took more than 48 hours.

    I've heard exactly one interview about pooling which may have
    already started. One university (US) is using its own lab resources,
    for what they are doing now (I think) and what they plan for
    students (too) when they open. Frequent re-tests mean that
    they will need many thousands of test results per day.

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have for illustration,
    or based on dilution, or based on expected Positives.

    In the UK, the recently identified need to recompute the Covid
    statistics because of double counting etc. throws some doubt on the
    ability of the bureaucracy to cope with pooled testing (needing records
    of those in each pool). At least some of the poor performance of the
    "test and trace" initiative has been attributed to poor record keeping
    and uncooperative response from testees.

    On the subject of pooling, in the UK there has been a push for research
    on testing of sewage outfalls for early evidence of Covid outbreaks
    ... https://www.bbc.co.uk/news/science-environment-53635692

    On the subject of the time taken for outcomes of Covid tests, the UK
    has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Tue Aug 11 01:11:08 2020
    On Sun, 9 Aug 2020 17:45:45 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    Rich Ulrich <rich.ulrich@comcast.net> wrote:

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have for illustration,
    or based on dilution, or based on expected Positives.


    The size of the pool depends upon the expected frequency of
    the infection. A pool size of 10 would be too large if
    the expected fraction of infected is 0.5. The optimum size
    of the pool depends upon the proposed schedule of testing:
    what you do if the pool tests positive. Sequential testing
    yields the minimum number of tests. With the observed frequency
    of infected and the schedule of using individual tests after
    a pool has failed, the optimum pool size is now around 5.

    I have read that dilution does impose a limit on pool size.

    I can see a potential problem from /relying/ on pooling
    to achieve thousands of test results, like the university
    I mentioned.

    Since the number to pool depends directly on the Infected
    rate, if the rate of infection doubles, suddenly you have to
    double the number of lab-tests-performed to get the same
    coverage for people-tested.

    That becomes a big number.

    Maybe the standard for the future will be a single lab test,
    performed at home, on the combined sample from
    all members of the household. Once a week?

    The retail price of tests seems to be fairly high. I think
    the US labs are charging $100 a pop. That British quick-
    test in the article cited by David Jones was (IIRC) less
    than half that -- though, maybe that was just for the kits
    and not for the completed testing.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Aug 11 01:35:57 2020
    On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    Rich Ulrich wrote:

    On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    The idea is to do very few tests to identify rare infected
    individuals among large populations. For multidimensional pooling,
    each individual sample is put into several different pools, and
    those pools which turn out to test positive can lead to a quick
    identification of infected candidates from very few actual tests.

    The results of sequential pooling when the infection is rare are
    surprising: for a condition which affects 0.1% of the population,
    the infected people in a population of 1 million can be isolated
    with only a few thousand tests. Certainly under 4,000. Were it
    not for random variations in the binomial distribution it could
    be done in about 1,700 tests.

    I am a bit surprised that the whole topic of "pooling" for coronavirus
    testing made a tiny splash and disappeared, in the media that I
    read and view.

    On the other hand, the need for pooling seems ever more
    apparent, now that doctors and pundits have started repeating
    (over and over) that a week between test and results is far too
    long. One pundit suggested that "capitalist incentive" would fix the
    delays, if no one had to pay for a test that took more than 48 hours.

    I've heard exactly one interview about pooling which may have
    already started. One university (US) is using its own lab resources,
    for what they are doing now (I think) and what they plan for
    students (too) when they open. Frequent re-tests mean that
    they will need many thousands of test results per day.

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have for illustration,
    or based on dilution, or based on expected Positives.

    In the UK, the recently identified need to recompute the Covid
    statistics because of double counting etc. throws some doubt on the
    ability of the bureaucracy to cope with pooled testing (needing records
    of those in each pool). At least some of the poor performance of the
    "test and trace" initiative has been attributed to poor record keeping
    and uncooperative response from testees.

    I've read of 35% non-follow-up in some US cities, from lack of
    cooperation. But we also have raging disease, places where there
    are too many to follow, and insufferable delays in getting test
    results back. The media (finally) have started repeating the
    complaints about the delays in testing. The two major companies
    that together process for half the hospitals in the country have
    reported delays of 5 to 7 days for the low-priority tests (not
    in-hospital; not professional sports....).


    On the subject of pooling, in the UK there has been a push for research
    on testing of sewage outfalls for early evidence of Covid outbreaks
    ... https://www.bbc.co.uk/news/science-environment-53635692

    I've seen scattered reports of that. I think a couple of states
    are trying that, but I haven't read of great predictive success.


    On the subject of the time taken for outcomes of Covid tests, the UK
    has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

    Sounds great! I notice that it tests for more than just coronavirus.

    I'd say, "Order a billion" except that before they made that many,
    we can hope for testing that is cheaper and quicker.

    By the way -- the CDC test that went bad this year was a failed
    attempt to piggyback three or four other diagnoses on top of the
    covid-19 test. I read an accusatory article that said that the CDC
    made a similar, lesser error with their Zika test a few years ago.

    For zita, the new test was not a total failure, but it was less
    reliable than advertised (and desired). Like with covid, other
    people started using their own tests when the tests from the
    CDC proved to be unreliable. The director (who was never
    called to account for the bad zita test) also directed the covid
    effort.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Aug 28 15:42:40 2020
    On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    Rich Ulrich wrote:

    On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    On the subject of pooling, in the UK there has been a push for research
    on testing of sewage outfalls for early evidence of Covid outbreaks
    ... https://www.bbc.co.uk/news/science-environment-53635692

    Here is a news article on sewage tests. Apparently "dilution" does
    not ruin all the possible tests, because sewage is surely dilute.
    The University of Arizona may have prevented an outbreak -

    https://www.washingtonpost.com/nation/2020/08/28/arizona-coronavirus-wastewater-testing/

    << Researchers around the world have been studying whether wastewater
    testing can effectively catch cases early to prevent covid-19
    clusters. There are programs in Singapore, China, Spain, Canada and
    New Zealand, while in the United States, more than 170 wastewater
    facilities across 37 states are being tested. Earlier this month,
    officials in Britain announced testing at 44 water treatment
    facilities. The Netherlands has been collecting samples at 300 sewage
    treatment plants. >>




    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Tue Sep 1 01:30:38 2020
    On Sun, 09 Aug 2020 12:47:16 -0400, Rich Ulrich
    <rich.ulrich@comcast.net> wrote:

    On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
    wrote:

    David Jones <dajhawkxx@nowherel.com> wrote:

    I don't know if "dilution" puts a limit on how many samples
    can be pooled -- the interviewee talked about combining 10
    samples for one lab test. That "10" could have been for illustration,
    or based on dilution, or based on the expected number of positives.

    More about pooling -

    https://www.nytimes.com/2020/08/18/health/coronavirus-pool-testing.html

    << Experts disagree, for instance, on the cutoff at which pooling
    stops being useful. The Centers for Disease Control and Prevention’s coronavirus test, which is used by most public health laboratories in
    the United States, stipulates that pooling shouldn’t be used when
    positivity rates exceed 10 percent. But at Mayo Clinic, “we’d have to
    start to question it once prevalence goes above 2 percent, definitely
    above 5 percent,” Dr. Pritt said.

    << And prevalence isn’t the only factor at play. The more individual
    samples grouped, the more efficient the process gets. But at some
    point, pooling’s perks hit an inflection point: A positive specimen
    can only get diluted so much before the coronavirus becomes
    undetectable. That means pooling will miss some people who harbor very
    low amounts of the virus. >>

    Per the article -
    Various US labs have received permission to officially use
    pooling, but not all have started. Pool sizes of 25, 10, 7 and 5
    samples are all mentioned, in various labs.
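
    For anyone who wants to see why prevalence and pool size interact this
    way, the textbook two-stage (Dorfman) calculation is easy to write down.
    This is not necessarily what any of the labs in the article actually run,
    and it assumes a perfect test with no dilution losses: a pool of k costs
    one test shared by everyone in it, plus k individual retests whenever the
    pool is positive, which happens with probability 1 - (1-p)^k.

        def tests_per_person(p, k):
            """Expected tests per person under two-stage (Dorfman) pooling:
            1/k for the pooled test, plus one retest each whenever the pool
            is positive (probability 1 - (1-p)**k). Perfect test assumed."""
            return 1.0 / k + 1.0 - (1.0 - p) ** k

        for p in (0.001, 0.02, 0.05, 0.10, 0.20):      # prevalence
            best_k = min(range(2, 101), key=lambda k: tests_per_person(p, k))
            row = "  ".join(f"k={k}: {tests_per_person(p, k):.2f}"
                            for k in (5, 7, 10, 25))
            print(f"p={p:5.1%}  {row}  (best k={best_k})")

    The output lines up with the quoted rules of thumb: at a fraction of a
    percent prevalence a pool of 25 cuts the workload by more than a factor
    of ten, but by 10 or 20 percent prevalence the savings have shrunk to
    the point where the extra handling and record keeping may no longer be
    worth it.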

    One more "factor in play" that is mentioned is the human-
    intensive part -- measuring out the test materials to be combined,
    and keeping track of what sample is where and what to do
    with the results. One sample, one result: is obviously simpler.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)