• #### Analyzing Covid Data

From root@21:1/5 to All on Sun May 31 23:14:36 2020
I have been looking into the NYT covid data available from:

https://github.com/nytimes/covid-19-data/archive/master.zip

My main focus has been to estimate the ratio

m=(true number of infected)/(reported number of infected)

This ratio alters the reported lethality of the SARS-CoV-2 virus.

Before getting into the details of analyzing the covid data I want to discuss an
analogous problem of coin flipping. Suppose you flip a coin 10 times: you would expect, on average, 5 heads and 5 tails, but the number of heads can be anything from 0 to 10 in any set of ten flips; out of 10 flips, 5 is simply the most likely number. What about 10,000 flips? In that case we would expect 5,000 heads, but now the range of likely numbers of heads is sharply restricted. If you were to repeat many sets of 10,000 flips, the distribution of the number of heads would approach a bell-shaped normal curve with a standard deviation of 50. That means the number of heads in 10,000 flips would most likely lie between 4,950 and 5,050. The formula for this standard deviation is sqrt( p * (1-p) * N ), where N is the number of flips and p is the probability of a head, here 1/2.
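A quick numerical check of that formula (a minimal Python sketch; `binomial_sd` is just a name I chose):

```python
import math

def binomial_sd(p, n):
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(p * (1 - p) * n)

print(binomial_sd(0.5, 10))      # ~1.58 heads of spread for 10 flips
print(binomial_sd(0.5, 10_000))  # 50.0 for 10,000 flips

# Turning the problem around, as in the next paragraph: 50 observed
# heads at p = 1/2 suggests about 50 / 0.5 = 100 flips.
print(50 / 0.5)                  # 100.0
```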

We can turn the coin flipping problem around. Suppose I told you that after a session of coin flipping I got, say, 50 heads. What would you guess is the number of times I flipped the coin to get 50 heads? Your best guess would be 100
times, but the actual number on any given trial might be some number near 100.

Now we are ready to discuss the GitHub covid data. For each day from Jan 21, 2020 until today, and for each state or territory, and for each county in the US, the data include the cumulative totals of reported cases and reported deaths. A few example lines for the US (date, cumulative cases, cumulative deaths) would be:

2020-05-17 1493350 89568
2020-05-18 1515177 90414
2020-05-19 1536129 91934

In the discussion that follows the number of currently infected people will assume the role of the number of tosses of a coin. Instead of 50/50 odds of heads, there will be a probability (p) of an infected person infecting someone new in the next day. Otherwise we will be using the method described above to estimate the number of flips (the number of infected people).

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of this ratio? It is the probability that any one of the C people will infect a new
person on that day. Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases. We find, however, that (S) is considerably larger than (s). The reason for this discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number. In the case of the US data as a whole the number of infected people is over ten times the reported number.
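As a concrete sketch of the two computations above, using just the three example rows (a real run would use the full us.csv series; the variable names are my own):

```python
import math

# Cumulative reported cases from the three example rows above
cases = [1493350, 1515177, 1536129]

dC = [b - a for a, b in zip(cases, cases[1:])]   # daily new cases
p  = [d / c for d, c in zip(dC, cases)]          # dC/C for each day
s  = [math.sqrt(q * (1 - q) * c)                 # expected (binomial) SD
      for q, c in zip(p, cases)]

print(dC)   # [21827, 20952]
print(s)    # roughly 147 and 144 expected-SD cases per day
# The empirical SD of the daily new cases, (S), needs the whole series:
#   S = statistics.stdev(full_dC_series)
```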

The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.

Here are my findings to-date:

State True Reported Factor Lethality
Alabama 81732.8 16530 4.94452 0.723088
Alaska 587.342 430 1.36591 1.36207
Arizona 86081.5 17763 4.84611 0.995568
Arkansas 43757.2 6538 6.69275 0.285667
California 828019 104071 7.95629 0.488153
Colorado 127288 25116 5.06799 1.11715
Connecticut 475436 41559 11.44 0.804735
Delaware 52705.4 9171 5.74696 0.654582
DistrictofColumbia 31440.9 8492 3.70242 1.4408
Florida 402480 53277 7.55449 0.587109
Georgia 445281 43363 10.2687 0.43613
Guam 9154.68 1141 8.02339 0.0655402
Hawaii 1044.66 637 1.63997 1.62733
Idaho 11638.4 2770 4.20157 0.704567
Illinois 1.1155e+06 116128 9.60574 0.468402
Indiana 164705 33885 4.86071 1.25558
Iowa 116177 18672 6.22198 0.438986
Kansas 59938.9 9512 6.30139 0.348689
Kentucky 59169.7 9510 6.22184 0.716583
Louisiana 553734 38907 14.2323 0.494822
Maine 4922.62 2189 2.2488 1.70641
Maryland 385367 50334 7.6562 0.630048
Massachusetts 1.10488e+06 94895 11.6432 0.600971
Michigan 472088 55944 8.43858 1.13792
Minnesota 102622 22957 4.47019 0.952035
Mississippi 61517.4 14372 4.28036 1.12651
Missouri 58292.4 12815 4.54876 1.22658
Montana 801.297 485 1.65216 2.12156
Nebraska 79160.9 13261 5.96945 0.214753
Nevada 30918.9 8247 3.74911 1.32605
NewHampshire 11029.1 4389 2.5129 2.10352
NewJersey 1.76773e+06 157815 11.2013 0.644953
NewMexico 23145.5 7364 3.14306 1.44737
NewYork 7.05482e+06 371559 18.9871 0.417275
NorthCarolina 163655 25616 6.38879 0.524884
NorthDakota 6564.57 2484 2.64274 0.913997
NorthernMarianaIslands 32.5908 22 1.4814 6.13671
Ohio 203122 33915 5.98914 1.03288
Oklahoma 22629 6270 3.60909 1.44063
Oregon 8513.77 4086 2.08364 1.7736
Pennsylvania 572519 74312 7.70427 0.942675
PuertoRico 25735.7 3486 7.38259 0.509021
RhodeIsland 54769.8 14494 3.77879 1.23608
SouthCarolina 43660.9 10788 4.04718 1.07648
SouthDakota 20844.9 4793 4.34903 0.259056
Tennessee 156846 21763 7.20702 0.224423
Texas 427105 60787 7.02626 0.378829
Utah 24485.9 8953 2.73494 0.432902
Vermont 2213.81 974 2.2729 2.48441
VirginIslands 132.527 69 1.92068 4.52739
Virginia 295112 41401 7.12815 0.453387
Washington 170473 21825 7.81092 0.654648
WestVirginia 5243.41 1935 2.70977 1.41129
Wisconsin 88229.6 17211 5.12635 0.623374
Wyoming 2282.81 876 2.60595 0.657085

US 18.12M 1.73M 10.65 0.548

I will respond to any post asking for more details.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Mon Jun 1 13:02:14 2020
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I've been following this, too. My own calculations have been
on scraps of paper, using numbers from Johns Hopkins and
random news reports.

I have to say that I don't understand what numbers you
are computing and what you are assuming.

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
this ratio? It is the probability that any one of the C people will infect a new
person on that day.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

Rural "exponential increase" is much slower, except when it is a
big splash of reports either in or near a prison or meatpacking plant.
The Case in some cities is often counted only when it becomes
a hospitalization, and (therefore) is no longer in the community
infecting anyone.

Call this ratio (p); this just says the expected number of
infected people is p*C = dC. We can use the formula given above to compute the
expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases.
We find, however, that (S) is considerably larger than (s). The reason for this
discrepancy is that the new cases are derived from a larger number of actual

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are /able/ to get the
test kits that they want.

infected people than the reported number. In the case of the US data as a whole
the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I read the CDC report (online) with 5 scenarios for the antigen
levels, etc. CDC used data from April, and offers poor
documentation. About 5 out of 6 epidemiologists (I gather) think
that the CDC's highest mortality-rate estimate is more like a minimum.

The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.

Day to day variation is one problem among many.

Here are my findings to-date:

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Mon Jun 1 18:20:41 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I have to say that I don't understand what numbers you
are computing and what you are assuming.

I suppose that there may be more people infected than are reported
in the data as "cases". I hope to derive that number, or more
exactly, the ratio of (true number)/reported number. My
approach does not assume any model for the contagion.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

I have looked into the lag between infection and reporting. I have
considered lags from one to 50 days. The effect, while not strong,
is to increase the estimated ratio. That is because the day/day
variance decreases as you go back in time. So the ratio of
current variance to earlier variance increases.

Nothing in my presentation assumed exponential growth, although
you may have been led to believe that since I utilized
(delta cases)/cases. If you reread what I wrote, I only took
that to be a probability (p), which is a tautology from the
equation p = (delta cases)/cases.

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are /able/ to get the
test kits that they want.

I don't want to engage in any political issues surrounding the
contagion.

infected people than the reported number. In the case of the US data as a whole
the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I derived my figure of 10 from the data, not from any external
sources. I welcome criticism of the procedure apart from
the factor since only experimental results based upon
widespread (reliable) testing will provide the answer.

Day to day variation is one problem among many.

By "many" do you mean problems in my approach?
If so I would like to hear what you mean. If you
are referring to the data itself we can only use
what we have.

Thanks for your comments.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Mon Jun 1 18:39:49 2020
Rich Ulrich wrote:

On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I've been following this, too. My own calculations have been
on scraps of paper, using numbers from Johns Hopkins and
random news reports.

I have to say that I don't understand what numbers you
are computing and what you are assuming.

Using the numbers above we can compute the daily differences (dC)
in the number of cases (C). Furthermore we can compute the ratio
dC/C. What is the meaning of this ratio? It is the probability that
any one of the C people will infect a new person on that day.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

Rural "exponential increase" is much slower, except when it is a
big splash of reports either in or near a prison or meatpacking plant.
The Case in some cities is often counted only when it becomes
a hospitalization, and (therefore) is no longer in the community
infecting anyone.

Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to
compute the expected standard deviation (s) of the number infected
by C people: s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the
daily new cases. We find, however, that (S) is considerably larger
than (s). The reason for this discrepancy is that the new cases are
derived from a larger number of actual

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are able to get the
test kits that they want.

infected people than the reported number. In the case of the US
data as a whole the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I read the CDC report (online) with 5 scenarios for the antigen
levels, etc. CDC used data from April, and offers poor
documentation. About 5 out of 6 epidemiologists (I gather) think
that the CDC's highest mortality-rate estimate is more like a minimum.

The data show a pronounced weekly variation such as might happen if
new cases are only reported on Monday. Careful measures must be
taken to eliminate this source of variation.

Day to day variation is one problem among many.

Here are my findings to-date:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Bruce Weaver@21:1/5 to David Jones on Mon Jun 1 12:56:02 2020
On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
--- snip ---

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

Thanks David. Those links are helpful.

PS- I see the RSS is still calling that publication "Significance", despite the ASA's (2019) pronouncement that "it is time to stop using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Bruce Weaver on Mon Jun 1 20:14:23 2020
Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication "Significance", despite
the ASA's (2019) pronouncement that "it is time to stop using the term
“statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Mon Jun 1 21:49:58 2020
root wrote:

Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop
using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

No, it is jumping into the fray by setting up its task force since 09
April 2020 at least. I am not clear if ASA has anything similar to
co-ordinate research?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Bruce Weaver on Tue Jun 2 00:17:31 2020
Bruce Weaver wrote:

On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
--- snip ---

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

Thanks David. Those links are helpful.

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop using
the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

Well, in fact it is a joint RSS and ASA publication.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Bruce Weaver@21:1/5 to David Jones on Tue Jun 2 07:04:46 2020
On Monday, June 1, 2020 at 8:17:41 PM UTC-4, David Jones wrote:
Bruce Weaver wrote:

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop using
the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

Well, in fact it is a joint RSS and ASA publication.

So it is. I never noticed that. From the bottom of the front page:

"Significance Magazine is published for the Royal Statistical Society and American Statistical Association by John Wiley & Sons Ltd."

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to David Jones on Tue Jun 2 18:47:05 2020
David Jones wrote:

root wrote:

Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication
"Significance", despite the ASA's (2019) pronouncement that "it
is time to stop using the term “statistically significant”
entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

No, it is jumping into the fray by setting up its task force since 09
April 2020 at least. I am not clear if ASA has anything similar to co-ordinate research?

In fact ASA has this:

https://magazine.amstat.org/blog/2020/04/29/online-communities-created-for-covid-19-discussion/

But the discussions can't be seen if one (such as I) is not a member of
ASA.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Wed Jun 3 16:50:30 2020
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Okay, I will back up and discuss your original post.
I thought you were leaving out vital steps.
Now I figure that the whole process is a comparison
based on variances.

In the discussion that follows the number of currently infected people will
assume the role of the number of tosses of a coin. Instead of 50/50 odds of
heads, there will be a probability (p) of an infected person infecting someone
new in the next day. Otherwise we will be using the method described above to
estimate the number of flips (the number of infected people).

No. Your coin-flipping example had nothing to do with
variances.

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
this ratio? It is the probability that any one of the C people will infect a new
person on that day. Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases.
We find, however, that (S) is considerably larger than (s). The reason for this
discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number.

No, you have that backwards. The change has to be an
increase in p. If you increase C, you would implicitly decrease
p, which decreases the expected variation. Binomial
distributions have larger variance near 50%, smaller in
the tails.

In the case of the US data as a whole
the number of infected people is over ten times the reported number.

Contrariwise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

That might be somewhat consistent with the observation
that 10% of the cases are "super-spreaders" who account
for 80% of all new infections. ("Super" must be a function
of how much virus they shed and how many people they
breathe or cough on.)

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

Your proper conclusion is a Goodness of Fit conclusion:
the distribution at hand has too much variation to be Poisson.
"N of infected" does not offer an explanation.

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

That is: I've occasionally done this to augment an eyeball
check of a possible Poisson: divide the observed counts
by 2 or 5 or 10 to get a smaller "effective N". I decide that
there is non-independence if the resulting counts now
match the expected spread of a Poisson. (This notion of
"effective N" is something I picked up from some British
authors, years ago. I haven't seen it widely used by
US biostatisticians.)
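That eyeball check can be sketched numerically (the daily counts here are hypothetical, and `dispersion_index` is my own name for the variance-to-mean ratio, which is about 1 for Poisson data):

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio; roughly 1 if the counts are Poisson."""
    return statistics.variance(counts) / statistics.mean(counts)

def shrunk_dispersion(counts, k):
    """The "effective N" trick: divide the counts by k and re-check."""
    return dispersion_index([c / k for c in counts])

daily_new = [95, 210, 130, 400, 80, 260, 150]   # hypothetical daily counts
print(dispersion_index(daily_new))              # far above 1: overdispersed
print(shrunk_dispersion(daily_new, 10))         # dividing by k divides the index by k
```

Dividing the counts by k divides the index by exactly k, so the k that brings the index back near 1 is one crude gauge of how far the counts are from independent.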

Non-independence could be accounted for by various things.
I mentioned (1st Reply) reporting errors, and the problem
of modeling the surge of cases from nursing homes,
prisons, etc., complicated by "politics" that interfere
with accurate reporting. I think you can't pretend that
those problems don't exist and affect the data.

"Superspreaders" and super-spreading events also yield
non-independent cases.

If I have missed what you are doing, please let me know.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Thu Jun 4 02:27:31 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Okay, I will back up and discuss your original post.
I thought you were leaving out vital steps.
Now I figure that the whole process is a comparison
based on variances.

Thanks for responding Rich, and thanks for investing your
time.

No. Your coin-flipping example had nothing to do with
variances.

I don't see how you can say that. Certainly there is a difference in that
coins are usually fair, but the same equations apply. The limiting form
of the binomial distribution is normal with a variance as I specified.

No, you have that backwards. The change has to be an
increase in p. If you increase C, you would implicitly decrease
p, which decreases the expected variation. Binomial
distributions have larger variance near 50%, smaller in
the tails.

True, and that is exactly my point. As the number of "trials"
increases the number of cases approaches the expected
number, and the standard deviation around that expected number grows
only as the sqrt of the number of trials.

Contrariwise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

I can't see how you can say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with a SD of 31 cases. With a million infected we would expect 10,000 new cases with a SD of 100 cases. In the case of
100,000 people the SD represents about 3% of the cases. For a million
people the SD represents about 1% of the cases.

The point I wish to make here is that with these large numbers of
infected people we should expect smaller variance than we see.
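The arithmetic in this example can be spelled out (a sketch with the same p = 0.01; `binomial_sd` is my own name):

```python
import math

def binomial_sd(p, n):
    """Binomial SD of the number of new cases from n infected people."""
    return math.sqrt(p * (1 - p) * n)

for n in (100_000, 1_000_000):
    expected = 0.01 * n
    sd = binomial_sd(0.01, n)
    print(n, expected, round(sd, 1), f"{sd / expected:.1%}")
# 100000 1000.0 31.5 3.1%
# 1000000 10000.0 99.5 1.0%
```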

That might be somewhat consistent with the observation
that 10% of the cases are "super-spreaders" who account
for 80% of all new infections. ("Super" must be a function
of how much virus they shed and how many people they
breathe or cough on.)

I would expect that there is a variation among the population
in terms of their contagion. I would also expect that such
a distribution would scale with the population. A million
infected people should have ten times as many super-spreaders
as a population of 100,000.

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

In fact the limit of a Poisson is also normal. The Poisson
represents how radioactive atoms behave. The number of
atoms (N) certainly affects the number of clicks on
the counter. The early stages of the contagion are represented
by a Poisson. However, during that phase, when the Poisson
dominates, the infection grows linearly with time.

Your proper conclusion is a Goodness of Fit conclusion:
the distribution at hand has too much variation to be Poisson.
"N of infected" does not offer an explanation.

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

That is: I've occasionally done this to augment an eyeball
check of a possible Poisson: divide the observed counts
by 2 or 5 or 10 to get a smaller "effective N". I decide that
there is non-independence if the resulting counts now
match the expected spread of a Poisson. (This notion of
"effective N" is something I picked up from some British
authors, years ago. I haven't seen it widely used by
US biostatisticians.)

Non-independence could be accounted for by various things.
I mentioned (1st Reply) reporting errors, and the problem
of modeling the surge of cases from nursing homes,
prisons, etc., complicated by "politics" that interfere
with accurate reporting. I think you can't pretend that
those problems don't exist and affect the data.

I don't contend that these factors don't exist. (Sorry
for the awkward statement.) I do maintain that whatever
the distribution, it should scale with the population.

"Superspreaders" and super-spreading events also yield
non-independent cases.

See above.

If I have missed what you are doing, please let me know.

Rich I greatly appreciate the effort you have put in to
thinking and commenting on the subject.

Your first post asked what assumptions I made, which I did not reveal in my first
post. I have been thinking about that. Let's imagine a large number of identical infections spreading over the US (or any country) with each realization differing only in the variations caused by chance. Strictly speaking I am assuming that the lateral variation across realizations is similar to the longitudinal variation in time over one case. This is sort of an ergodic assumption. It amounts to an assumption that the day to day variation is slow enough to justify my approach.

I don't know what facilities you have to look at the data. What I would
like you, and everyone reading this, to do is pretty simple. Stick
with the us.csv data from GitHub. Read in the data and perform
a 7 day moving average on the number of cases. As you noted above,
reporting issues create a pronounced 7 day variation. The moving
average suppresses the variation as well as variation due to
any other causes.

Compute first order differences of the smoothed data and follow
the steps I have outlined. You will find that the variance of the
smoothed data is far larger than I have indicated. See for
yourselves.
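The smoothing step can be sketched with a toy cumulative series (the weekly spike below is artificial; in practice `cumulative` would be the cases column of us.csv):

```python
import statistics

def moving_average(xs, window=7):
    """Trailing 7-day average; one value per full window."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

def daily_differences(cumulative):
    """First-order differences: daily new cases from cumulative cases."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

# Toy cumulative-case series: the same weekly total, but dumped
# mostly on one reporting day to mimic the weekly cycle.
cumulative, total = [], 0
for day in range(28):
    total += 700 if day % 7 == 0 else 50
    cumulative.append(total)

raw_dC    = daily_differences(cumulative)
smooth_dC = daily_differences(moving_average(cumulative, 7))

print(statistics.stdev(raw_dC))     # large: dominated by the weekly cycle
print(statistics.stdev(smooth_dC))  # near zero: the cycle is averaged out
```

Differencing the 7-day average of the cumulative series equals (C[t+7] - C[t])/7, which is why the weekly cycle cancels exactly in this toy example.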

Thanks again Rich.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Thu Jun 4 14:17:25 2020
On Thu, 4 Jun 2020 02:27:31 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

No. Your coin-flipping example had nothing to do with
variances.

I don't see how you can say that. Certainly there is a difference in that
coins are usually fair, but the same equations apply. The limiting form
of the binomial distribution is normal with a variance as I specified.

Okay. Let me revise that. Your coin-flipping had nothing
to do with estimating total N from variances.

...

Contrarywise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

I'm not particularly happy with my argument from the binomial.
I think I was implying that there are unstated premises, and
my logic, though poor, was as good as yours.

I will stand by my argument from the Poisson.

I can't see how you can say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with an SD of 31 cases. With a million infected we would expect 10,000 new cases with an SD of 100 cases.

In the real world, this estimate of "within group" variance is
widely (though not widely enough) recognized as an underestimate
of the true variance - for most sampling situations. The explanation
is that there are unrecognized dependencies between cases.

In the case of
100,000 people the SD represents about 3% of the expected cases. For a million
people the SD represents about 1% of the cases.

The point I wish to make here is that with these large numbers of
infected people we should expect smaller variance than we see.
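The binomial arithmetic in this exchange is easy to check. A small Python sketch; the 1% infection probability is the hypothetical value from the example above, not an estimate:

```python
import math

def binomial_sd(n, p):
    # Standard deviation of a binomial(n, p) count: sqrt(n * p * (1 - p)).
    return math.sqrt(n * p * (1 - p))

for n in (100_000, 1_000_000):
    mean = n * 0.01
    sd = binomial_sd(n, 0.01)
    print(f"N={n}: expect {mean:.0f} new cases, SD {sd:.1f} "
          f"({100 * sd / mean:.1f}% of the mean)")
```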

But. You certainly are ignoring the "compound distribution"
that exists. You have to account for the fact that your mean of
cases-observed is 1000, not 10,000.

Not only is there a p1 of infection, there is a p2 for the infected
case being recognized. Compound distributions have long tails,
thus, larger variances.

I'm a little bit curious if you could build a model that way.
It seems like there are too many unknown parameters.

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

In fact the limit of a Poisson is also normal. The Poisson
represents how radioactive atoms behave. The number of
atoms (N) certainly affects the number of clicks on
the counter. The early stages of the contagion are represented
by a Poisson. However, during that phase, when the Poisson
dominates, the infection grows linearly with time.
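The claim that a constant-rate Poisson process gives linear cumulative growth, with variance equal to the mean, can be checked with a small simulation. A sketch in Python, using Knuth's sampling method; the rate of 5 cases per day is an arbitrary illustration:

```python
import math
import random

def poisson_sample(rate, rng):
    # Knuth's method for a Poisson-distributed integer with the given mean.
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_daily_counts(rate, days, seed=0):
    # Daily new-case counts from a constant-rate Poisson process; the
    # cumulative total then grows linearly in time, and the variance of
    # the daily counts is approximately equal to their mean.
    rng = random.Random(seed)
    return [poisson_sample(rate, rng) for _ in range(days)]

counts = simulate_daily_counts(rate=5.0, days=2000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean {mean:.2f}, variance {var:.2f}")  # both close to 5
```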

No, and No.

You are constructing a model where the p is small, always.
Therefore, the Poisson is appropriate, not the binomial
and not the normal.

The Poisson with small p gives a convenient expression for
the variance, which is NOT defined by either the p or the N,
and the observed cases (alone) can't un-confound them.

...

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

...

Non-independence could be accounted for by various things.

...

Compute first order differences of the smoothed data and follow
the steps I have outlined. You will find that the variance of the
smoothed data is far larger than I have indicated. See for
yourselves.

Oh, I readily agree that the data are not Poisson. Or
binomial, if you use a larger p.

I pointed to many sources of dependency.

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Thu Jun 4 19:48:19 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

Sure it can be controlled: just ensure that no two people
are ever closer than 100 feet from each other.

Regardless of the model, all contagions begin with
an exponential increase. R0 is the initial slope of the log
of that increase vs. time. However, the very first
stages of the contagion undoubtedly begin with a
Poisson process. You can see that in the GitHub data.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

I tried to analyze the possibility for super-spreaders
but I was unable to come up with anything justifiable.
It certainly doesn't work if only super-spreaders are
responsible.

About Poisson, remember that the distribution was invented
to analyze the situation where there is an average number
of events/time. That means that over time the cumulative
number of cases would be linear regardless of the specific
sequence of arrivals. Poisson is used in queuing theory
to determine the optimum number of servers, etc.

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).
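The extrapolation arithmetic above, spelled out; the population figure is the approximate one quoted in the post:

```python
# Reproducing the Wisconsin extrapolation from the post.
positive_rate = 0.064        # 6.4% of the 16,000 tested were positive
population = 5_820_000       # approximate population of Wisconsin
reported_cases = 20_000      # reported "cases" at the time

estimated_infected = positive_rate * population
ratio = estimated_infected / reported_cases
print(round(estimated_infected), round(ratio, 1))  # 372480 18.6
```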

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

Thanks again for your comments.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 07:15:15 2020
root wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

Sure it can be controlled: just ensure that no two people
are ever closer than 100 feet from each other.

Regardless of the model, all contagions begin with
an exponential increase. R0 is the initial slope of the log
of that increase vs. time. However, the very first
stages of the contagion undoubtedly begin with a
Poisson process. You can see that in the GitHub data.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

I tried to analyze the possibility for super-spreaders
but I was unable to come up with anything justifiable.
It certainly doesn't work if only super-spreaders are
responsible.

About Poisson, remember that the distribution was invented
to analyze the situation where there is an average number
of events/time. That means that over time the cumulative
number of cases would be linear regardless of the specific
sequence of arrivals. Poisson is used in queuing theory
to determine the optimum number of servers, etc.

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
Thus someone could do an autocorrelation analysis of the counts (or
square roots of counts if this seems good). This could be extended to
a cross-correlation analysis between states. And ... if a simple ARMA
modelling approach were added, one could start from ideas such as, if
an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
moving-average-type component extending over 6 days. If any of this
showed anything worthwhile, one could then be more ambitious (but
probably unnecessary) by developing a doubly-stochastic Poisson or
Binomial model.
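The suggested autocorrelation analysis can be sketched in a few lines of Python (`autocorrelation` is an illustrative helper, not a library routine). A purely weekly toy series shows how the day-of-week effect would appear:

```python
def autocorrelation(series, max_lag):
    # Sample autocorrelation at lags 1..max_lag. A spike near lag 7
    # would confirm the day-of-week reporting effect; slower decay
    # would hint at the dependencies Rich describes.
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag)) / var
            for lag in range(1, max_lag + 1)]

# A purely weekly pattern shows its period clearly:
weekly = [0, 1, 2, 3, 4, 5, 6] * 12
acf = autocorrelation(weekly, 10)
print(max(range(10), key=lambda i: acf[i]) + 1)  # 7
```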

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Fri Jun 5 03:27:59 2020
On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
/attempt/ to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 09:31:55 2020
Rich Ulrich wrote:

On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
attempt to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From duncan smith@21:1/5 to Rich Ulrich on Fri Jun 5 16:54:07 2020
On 05/06/2020 08:27, Rich Ulrich wrote:
On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
/attempt/ to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

[snip]

AFAICT they got that part right. It's simple enough to estimate the
population proportion (who would test positive), then perform the simple algebra required to generate a point estimate / CI for the prevalence
for a given sensitivity and specificity. It *is* possible to generate inadmissible prevalence estimates if the observed data are utterly
inconsistent with the given sensitivity / specificity, but I ran some simulations and this didn't seem to be a real issue.

The study used the Delta method for the CI to account for uncertainty in sensitivity / specificity. I didn't work through that, but their CI was
a bit wider than the naive CI (assuming known sensitivity and
specificity). So it looked reasonable.
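The "simple algebra" mentioned here is the standard test-adjusted prevalence (Rogan-Gladen) correction. A Python sketch; the sensitivity and specificity values in the example are arbitrary illustrations, not the study's:

```python
def adjusted_prevalence(p_obs, sensitivity, specificity):
    # Invert p_obs = sens * p + (1 - spec) * (1 - p) for the true
    # prevalence p, clamping to [0, 1]: the raw estimate is
    # inadmissible when the observed positive rate is inconsistent
    # with the assumed error rates.
    p = (p_obs + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)

# Illustrative numbers only: 1.5% observed positive, 80% sensitive,
# 99.5% specific.
print(adjusted_prevalence(0.015, 0.80, 0.995))
```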

Their data were not representative of the population, so they introduced weights to generate a number of positive tests that they would have
expected to have observed if the data had been representative. That's
where I'm guessing the criticisms lie, but all I've looked at is the
study itself. Cheers.

Duncan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to David Jones on Fri Jun 5 16:33:51 2020
David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of the population
in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
to die of the infection the lethality would be no worse than seasonal flu.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to David Jones on Fri Jun 5 16:21:15 2020
David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
Thus someone could do an autocorrelation analysis of the counts (or
square roots of counts if this seems good). This could be extended to
a cross-correlation analysis between states. And ... if a simple ARMA modelling approach were added, one could start from ideas such as, if
an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
moving-average-type component extending over 6 days. If any of this
showed anything worthwhile, one could then be more ambitious (but
probably unnecessary) by developing a doubly-stochastic Poisson or
Binomial model.

I have aligned the data for the 11 states with the highest infection
rates:
NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

After alignment there are only 80 days, or so, of data. There's
not much time series stuff I can do with that. New York strongly
dominates and the aforementioned weekly variation in the data
is pronounced. Smoothing out the weekly variations reduces the
number of independent data points to a dozen or so.

When the data are aligned you can see that the states follow their
own path through the stages of infection. For instance I have looked
at Illinois together with Wisconsin. It is evident that Wisconsin
suffered from a spillover from the infection in Illinois. As
Illinois grows the infection spills across the border.

I can post the aligned data if anyone is interested. The file
is about 7K bytes.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 17:07:18 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could
be extended to a cross-correlation analysis between states. And ...
if a simple ARMA modelling approach were added, one could start
from ideas such as, if an infected person is infectious but not
symptomatic for say 6 days (or whatever the figure is), this might
lead to there being a moving-average-type component extending over
6 days. If any of this showed anything worthwhile, one could then
be more ambitious (but probably unnecessary) by developing a doubly-stochastic Poisson or Binomial model.

I have aligned the data for the 11 states with the highest infection
rates:
NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

After alignment there are only 80 days, or so, of data. There's
not much time series stuff I can do with that. New York strongly
dominates and the aforementioned weekly variation in the data
is pronounced. Smoothing out the weekly variations reduces the
number of independent data points to a dozen or so.

When the data are aligned you can see that the states follow their
own path through the stages of infection. For instance I have looked
at Illinois together with Wisconsin. It is evident that Wisconsin
suffered from a spillover from the infection in Illinois. As
Illinois grows the infection spills across the border.

I can post the aligned data if anyone is interested. The file
is about 7K bytes.

80 time points is about the size of the data-sets used in econometric data,
weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if you
want to look at short-term variations. As there are marked day-of-week
effects, you would expect a raw auto-correlation analysis to be
overwhelmed by this effect. This suggests a need to "detrend" the data
to remove this effect (and if there are any special holidays), and if
there are long-term trends you might well want to remove such trends
as well.
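One simple way to detrend the day-of-week effect, sketched in Python (`remove_day_of_week` is an illustrative helper; holidays and long-term trends would still need separate handling):

```python
def remove_day_of_week(daily_counts):
    # Subtract the mean for each day-of-week slot (0..6), removing the
    # weekly reporting cycle before looking at autocorrelations.
    # Assumes a gap-free daily series starting on a fixed weekday.
    slots = [[] for _ in range(7)]
    for i, c in enumerate(daily_counts):
        slots[i % 7].append(c)
    means = [sum(s) / len(s) for s in slots]
    return [c - means[i % 7] for i, c in enumerate(daily_counts)]

# A purely weekly pattern detrends to zero:
print(max(abs(x) for x in remove_day_of_week([0, 1, 2, 3, 4, 5, 6] * 4)))  # 0.0
```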

This would all be a lot of work, and you would need to consider if the
possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
person goes on infecting other people.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 18:12:51 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK.
If all those people were then to die of the infection the lethality
would be no worse than seasonal flu.

Not "only 0.1% of the population in the UK have been infected",
instead "only 0.1% of the population in the UK presently have the
infection." Thus it excludes those who have either recovered after
showing symptoms, or recovered without showing symptoms. This refers to
17 May to 30 May 2020. This percentage has gone down a lot, but I don't
know the peak value.

In the UK, many of the infections have occurred within care homes and
within hospitals, as opposed to being new cases entering those places
having been detected as already having the infection. This probably
relates to the lack of fully effective personal protective equipment at
the early and middle stages of the epidemic (and the close contacts in
those places).

Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
only deaths in hospital or care homes). Excess deaths compared with what
would be expected in a normal year are around 60,000. These numbers are
higher than those reported in any other country except the USA, so
would be judged high, I guess. The problem is that other countries
report deaths on different bases ... for example, I have read that in
Germany if someone dies of a heart attack while suffering from Covid-19,
it would be counted only as a heart attack, but in the UK it would be
counted in the Covid-19 totals. Even more, those who survive Covid-19
after the extreme version of the symptoms will have been through a very
hard experience.

Total death rates in the UK and USA are presently 600 and 330 per million,
respectively, according to
https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
The UK value is the second highest in the world, but with the above caveat
about comparability.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Fri Jun 5 14:06:54 2020
On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of the population
in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
to die of the infection the lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.
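The implied-duration arithmetic, for the record: in a steady state, point prevalence equals the incidence rate times the mean duration of infection.

```python
# Implied mean duration of infection from the ONS figures quoted above.
point_prevalence = 0.001    # 0.1% currently infected
weekly_incidence = 0.0007   # 0.07% newly infected per week

duration_days = 7 * point_prevalence / weekly_incidence
print(round(duration_days))  # 10
```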

I wonder again at "false positives." They do have a
section on sensitivity and specificity. I have not yet
understood their claims for robustness of their reported
estimates. It does say "85 to 95% sensitive" and "above 95%"
specific for their test of the virus. Bad self-testing, they
say, could revise the 0.1% to 0.19% for prevalence.

Their point estimate is 6.78% for the prevalence of
antibodies (Ever had it?), 24 May. (Section 4).
That is for their particular sample, not weighted to
be representative. That's a pretty high rate.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 18:42:41 2020
Rich Ulrich wrote:

On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given >> at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done
a wonderful job of social isolation, or Covid19 is a fizzle in the
UK. If all those people were then to die of the infection the
lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.

Those who develop bad symptoms will be quickly moved to hospital (no
cost worries with the NHS) and so are not in the outside population for
long. Those who don't develop bad symptoms have those lesser symptoms
(and count as infected) for a relatively short time.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@gmail.com on Fri Jun 5 21:42:38 2020
On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Jun 5 21:53:28 2020
On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could

...

80 time points is about the size of the data-sets used in econometric data, weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if you
want to look at short-term variations. As there are marked day-of-week
effects, you would expect a raw auto-correlation analysis to be
overwhelmed by this effect. This suggests a need to "detrend" the data
to remove this effect (and if there are any special holidays), and if
there are long-term trends you might well want to remove such trends
as well.

I wasn't thinking of the sort of dependency that would
show up as autocorrelation across days in these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

This would all be a lot of work, and you would need to consider if the
possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
person goes on infecting other people.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Sat Jun 6 07:00:21 2020
Rich Ulrich wrote:

On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could

...

80 time points is about the size of data-sets used in econometric
data, weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if
you want to look at short-term variations. As there are marked
day-of-week effects, you would expect a raw auto-correlation
analysis to be overwhelmed by this effect. This suggests a need to
"detrend" the data to remove this effect (and the effect of any
special holidays), and if there are long-term trends you might
well want to remove such trends as well.

I wasn't thinking of the sort of dependency that would
show up as autocorrelation across days in these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

There are two versions of this: one where the situation is such that a
high number of cases are generated for a period of several days, and a
second where a high number of cases are recorded on a single day, as in
the surprise discovery of a bad situation in an unnoticed care home. I
think either of these could be modelled (as random-in-time occurrences
of such situations) such as to lead to serial correlation in the
counts. The autocorrelations may not be the best way to detect such
effects, but they are notionally easy to compute for a data-series. On
a theoretical basis they correspond to short periods of time where the
rate of occurrence is high compared to a background rate.

• From David Jones@21:1/5 to David Jones on Sat Jun 6 11:21:39 2020
David Jones wrote:

Rich Ulrich wrote:

On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have
done a wonderful job of social isolation, or Covid19 is a fizzle
in the UK. If all those people were then to die of the infection
the lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.

Those who develop bad symptoms will be quickly moved to hospital (no
cost worries with the NHS) and so are not in the outside population
for long. Those who don't develop bad symptoms have those lesser
symptoms (and count as infected) for a relatively short time.

... actually the lockdown rules were rather strict (if anyone followed
them) in that anyone having symptoms (if not needing hospitalisation)
was meant to self-isolate, even from their own family but still within
the family home.

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:37:47 2020
On Sat, 6 Jun 2020 07:00:21 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

me >
I wasn't thinking of the sort of dependency that would
show up in these data as autocorrelation across days for
these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

There are two versions of this: one where the situation is such that a
high number of cases are generated for a period of several days, and a
second where a high number of cases are recorded on a single day, as in
the surprise discovery of a bad situation in an unnoticed care home. I
think either of these could be modelled (as random-in-time occurrences
of such situations) such as to lead to serial correlation in the
counts. The autocorrelations may not be the best way to detect such
effects, but they are notionally easy to compute for a data-series. On
a theoretical basis they correspond to short periods of time where the
rate of occurrence is high compared to a background rate.

Okay, yes, autocorrelation is a statistic you could generate, whatever
you figure to do with those lumps of data.

I'm satisfied with observing the clearly-non-Poisson variation
and pointing to the known clusters, etc. -- which /ought to/ be
studied up close and in detail, at least a few times. Why has that
not been done? Meatpackers? Nursing homes?

My inference is that our CDC has been shut out of any leadership
role in both management and science. Nobody else is in the same
position, where they could essentially /mandate/ participation.
That is a shame. So we are left guessing.

--
Rich Ulrich

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:21:09 2020
On Fri, 5 Jun 2020 18:12:51 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done a
wonderful job of social isolation, or Covid19 is a fizzle in the UK.
If all those people were then to die of the infection the lethality
would be no worse than seasonal flu.

Not "only 0.1% of the population in the UK have been infected",
instead "only 0.1% of the population in the UK presently have the
infection." Thus it excludes those who have either recovered after
showing symptoms, or recovered without showing symptoms. This refers to
17 May to 30 May 2020. This percentage has gone down a lot, but I don't
know the peak value.

In the UK, many of the infections have occurred within care-homes and
within hospitals, as opposed to being new cases entering those places
having been detected as already having the infection. This probably
relates to the lack of fully effective personal protection equipment at
the early and middle stages of the epidemic (and the close contacts in
those places).

Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
only deaths in hospital or care homes). Excess deaths compared with what
would be expected in a normal year are around 60,000. These numbers are
higher than those reported in any other country except the USA, so
would be judged high, I guess. The problem is that other countries
report deaths on different bases ... for example, I have read that in
Germany if someone dies of a heart attack while suffering Covid-19,
it would be counted only as a heart attack, but in the UK it would be
counted in the Covid-19 totals. Even more, those who survive Covid-19
having had the extreme version of symptoms will have had a very extreme
experience.

Total death rates in the UK and USA are presently 600 and 330 per
million, respectively, according to
https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
The UK value is second highest in the world, but with the above caveat
about comparability.

The graphs that show "excess deaths" show excesses (beyond
annual trends + Covid-19 reports) for most countries. That's
despite the lower death rates in some other specific categories.

Covid-19 reportedly does manifest as heart attacks. Also, a
large fraction of those on ventilators also need dialysis. I think
that, whether it is 100% legit or not, counting all those related
deaths as Covid-19 won't result in an over-count; too many
cases are missed elsewhere.

For a couple of weeks, there were reports that 85-90% of
those on ventilators eventually die. That led to advice to
put patients on their bellies, and to hold off the ventilators
for as long as possible. I haven't seen those mentioned lately.

--
Rich Ulrich

• From David Jones@21:1/5 to Rich Ulrich on Tue Jun 16 07:55:08 2020
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

• From root@21:1/5 to David Jones on Tue Jun 16 14:21:41 2020
David Jones <dajhawkxx@nowherel.com> wrote:
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

One of the papers in the RSS publication news URL above is:

How many people are infected with Covid-19?

At least one of the references cited in this paper arrives at numbers
comparable to those I have inferred from the reported data.

I tried to contact both the RSS and the CDC about my method for estimating
that number directly from the reported data. I have, as yet, received no response from either group.

• From David Jones@21:1/5 to root on Tue Jun 16 15:35:28 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about
the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths"
and it discusses various problems with the statistics and modelling
used.

One of the papers in the RSS publication news URL above is:

How many people are infected with Covid-19?

At least one of the references cited in this paper arrives at numbers comparable to those I have inferred from the reported data.

I tried to contact both the RSS and the CDC about my method for
estimating that number directly from the reported data. I have, as
yet, received no response from either group.

That article has the following info about the author

"About the author:
Tarak Shah is a data scientist at the Human Rights Data Analysis Group
(HRDAG), where he processes data about violence and fits models in
order to better understand evidence of human rights abuses."

... so you could try contacting HRDAG via their covid webpage : https://hrdag.org/covid19/

.. or the author's info at
https://hrdag.org/people/tarak-shah/

• From David Jones@21:1/5 to David Jones on Wed Jun 17 14:27:35 2020
David Jones wrote:

Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones" <dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about
the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

A video appeared today on YouTube that is partly related and that
discusses what has been going on with the statistics of Covid in the
UK. It derives from May 20.

https://www.youtube.com/watch?v=OrRoeQaucF0
titled: Using data to improve health from the time of the Crimea to the
time of the coronavirus

SPEAKER: Prof. Deborah Ashby, President of the Royal Statistical
Society and Director of the School of Public Health at Imperial College
London

In this talk, Prof Deborah Ashby takes us on a journey through the life
of Florence Nightingale and comments on the aptness of celebrating her
centenary in the first year of the COVID-19 pandemic.

Other details in the heading on YouTube.

• From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Thu Jul 9 14:36:13 2020
On Fri, 05 Jun 2020 03:27:59 -0400, Rich Ulrich
<rich.ulrich@comcast.net> wrote:
...

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The idea that the "real" infection rate is 10 times the reported
rate is gaining currency in news reports.

My impression is that it springs entirely from a comment
10 days or so ago, to that effect, made by Redfield, the
Director of CDC. (My current impression of Redfield is low.)

Redfield made no citations, so he seems to be pointing back
to the CDC report that I cited above. Which I still do not
place any faith in. That study used only data from April
and earlier, and did not give good citations.

I just now read some online comments about the Stanford
study (April) which reported high infection prevalence.
I think that David Jones mentioned criticism about sample
selection. I see that they advertised on Facebook ... which
IMHO is an assured way to get volunteers who expect that
they may be positive. So that is a distinct bias, noted in
comments.

I also read an assertion that the test they used has been
shown to have 97.5% specificity, instead of 99.5%, with
the consequence that ALL their "cases" could have been
false-positives. I don't know if that criticism is valid. All
comments I read were two months old.

Looking for other citations to prevalence surveys in
Google-news, most of what Google showed were articles from
single, local newspapers, not formal reports of wide distribution.

The exception was an article from Lancet, which reported
on a survey of Geneva, tapping a pre-existing survey sample.

That article, as it happens, DOES support the hypothesis of
very widespread infection. They estimate about 10% infection
in their population -- which had about 1% reported cases.
Like the US, their apparent case-fatality rate (cases vs deaths)
was 5 or 6% at the time. Adjusting 6% by tenfold yields an overall,
"true" case fatality rate of around 0.6% -- which is not
wholly unreasonable. It compares to the outside-of-Wuhan
data for China's original epidemic.
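The Geneva adjustment above is a one-line rescaling; as a sketch (with the figures rounded as quoted in this post):

```python
def true_fatality_rate(apparent_cfr, m):
    """apparent_cfr = deaths / reported cases; m = true infections / reported.
    If infections are undercounted m-fold, the fatality rate shrinks m-fold."""
    return apparent_cfr / m

# ~10% of the population infected vs ~1% reported cases -> m ~ 10,
# so an apparent 6% case-fatality rate becomes roughly 0.6%.
print(true_fatality_rate(0.06, 10))
```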

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

--
Rich Ulrich

• From root@21:1/5 to Rich Ulrich on Thu Jul 9 21:03:36 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

In other words, a guess. The proposed numbers agree pretty
well with what I derived in an earlier post to which you
objected.

• From Rich Ulrich@21:1/5 to All on Fri Jul 10 13:14:31 2020
On Thu, 9 Jul 2020 21:03:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

In other words, a guess. The proposed numbers agree pretty
well with what I derived in an earlier post to which you
objected.

A guess, yup. But an educated guess.

Applying what I said there -- 90,000 deaths outside of care
facilities (say) yields 9 to 13.5 million infected, rather than 30
million.

I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

--
Rich Ulrich

• From root@21:1/5 to Rich Ulrich on Fri Jul 10 18:54:08 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involves sequential pooling.
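The expected cost of the two-stage scheme described above (pool once; if the pool is positive, retest each member individually) is easy to write down, assuming infections are independent and the test is error-free. This is a sketch of that Dorfman-style calculation, not of whatever schedule the labs in the WSJ article actually use:

```python
def expected_tests_per_person(p, n):
    """Two-stage (Dorfman) pooling: one pooled test per group of n, plus n
    individual retests whenever the pool is positive.  p = prevalence."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

def best_pool_size(p, n_max=200):
    """Pool size (>= 2) minimizing the expected number of tests per person."""
    return min(range(2, n_max + 1), key=lambda n: expected_tests_per_person(p, n))

for p in (0.2, 0.05, 0.001):
    n = best_pool_size(p)
    print(p, n, round(expected_tests_per_person(p, n), 3))
```

Under these particular assumptions a pool of about 3 comes out best at 20% prevalence, and a pool of 5 corresponds to a prevalence nearer 5%; the sizes quoted in news reports presumably reflect different retesting schedules or practical constraints such as dilution.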

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

• From Rich Ulrich@21:1/5 to All on Sat Jul 11 17:50:16 2020
On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

I think that Fauci mentioned pooling, and used "10".
One limiting factor - for some tests, anyway -- is how
much the dilution of the sample affects the sensitivity.

I read that Abbott's quick test (15 minutes) originally
allowed for either dry-swab or wet-stored-swab, but they
changed the instructions when the wet-swabs showed
lower sensitivity -- which was attributed to dilution.

That was in the discussion after an outside lab found
very low sensitivity for that test. Latest instructions:
"Now, the company says only direct swabs from
patients should be inserted into the machine."
https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involve sequential pooling.

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

By thirds. Weigh one third against another: if the two thirds
balance, the light coin is in the remaining third.
That's the trick for a single bad coin. I expect it
generalizes, but correcting multiple errors does get
trickier.
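The by-thirds answer for the single-light-coin case can be sketched as follows (a toy model, assuming exactly one coin is lighter than the rest):

```python
def find_light_coin(weights):
    """Locate the single lighter coin with a balance scale by splitting the
    candidates into thirds: weigh one third against another; if they balance,
    the light coin is in the remainder.  Returns (index, weighings used)."""
    lo, hi = 0, len(weights)              # candidates live in [lo, hi)
    weighings = 0
    while hi - lo > 1:
        third = (hi - lo + 2) // 3        # ceil((hi - lo) / 3)
        weighings += 1                    # one use of the balance
        left = sum(weights[lo:lo + third])
        right = sum(weights[lo + third:lo + 2 * third])
        if left < right:
            hi = lo + third                         # light coin in the left pan
        elif right < left:
            lo, hi = lo + third, lo + 2 * third     # light coin in the right pan
        else:
            lo = lo + 2 * third                     # pans balance: in the rest
    return lo, weighings
```

Since each weighing has three outcomes, 3^k coins need k weighings (27 coins in 3), which is the information limit for a balance scale.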

--
Rich Ulrich

• From David Jones@21:1/5 to Rich Ulrich on Tue Jul 14 06:15:20 2020
Rich Ulrich wrote:

On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

I think that Fauci mentioned pooling, and used "10".
One limiting factor - for some tests, anyway -- is how
much the dilution of the sample affects the sensitivity.

I read that Abbott's quick test (15 minutes) originally
allowed for either dry-swab or wet-stored-swab, but they
changed the instructions when the wet-swabs showed
lower sensitivity -- which was attributed to dilution.

That was in the discussion after an outside lab found
very low sensitivity for that test. Latest instructions:
"Now, the company says only direct swabs from
patients should be inserted into the machine."

https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involves sequential pooling.

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

By thirds. If two thirds balance, the other third is off.
That's the trick for a single bad coin. I expect it
generalizes, but correcting multiple errors does get
trickier.

On the topic of testing by pooling, there is some relevant discussion
of multidimensional pooling in the following BBC radio podcast,
starting at about 10:45 for about 10 minutes:
https://www.bbc.co.uk/sounds/play/w3cszh0k

This should be accessible worldwide. Blurb says:

"African scientists have developed a reliable, quick and cheap testing
method which could be used worldwide as the basis for mass testing
programmes.

The method, which produces highly accurate results, is built around
mathematical algorithms developed at the African Institute for
Mathematical Sciences in Kigali. We speak to Neil Turok, who founded the
institute, Leon Mutesa, Professor of human genetics on the government
coronavirus task force, and Wilfred Ndifon, the mathematical biologist
who devised the algorithm."

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.
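A toy two-dimensional version of that idea, assuming at most one positive sample per grid (this is my illustration of the principle, not the Kigali algorithm itself):

```python
import math

def grid_pools(samples):
    """Lay n*n samples out on an n-by-n grid and form 2n pools: one per row
    and one per column.  Each sample ends up in exactly two pools."""
    n = math.isqrt(len(samples))
    assert n * n == len(samples), "needs a square number of samples"
    rows = [any(samples[r * n + c] for c in range(n)) for r in range(n)]
    cols = [any(samples[r * n + c] for r in range(n)) for c in range(n)]
    return rows, cols

def locate_single_positive(rows, cols):
    """With exactly one positive sample, the single positive row pool and the
    single positive column pool intersect at it."""
    (r,) = [i for i, hit in enumerate(rows) if hit]
    (c,) = [i for i, hit in enumerate(cols) if hit]
    return r * len(cols) + c
```

So 100 samples are screened with 20 pooled tests. With more than one positive the row/column intersections become ambiguous and a handful of confirmatory tests are needed, which is where the published algorithms do the real work.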

• From root@21:1/5 to David Jones on Tue Jul 14 08:02:04 2020
David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects 0.1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Jul 14 14:05:14 2020
On Tue, 14 Jul 2020 06:15:20 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

On the topic of testing by pooling, there is some relevant discussion
of multidimensional pooling in the following BBC radio podcast,
starting at about 10:45 for about 10 minutes:
https://www.bbc.co.uk/sounds/play/w3cszh0k

This should be accessible worldwide. Blurb says:

"African scientists have developed a reliable, quick and cheap testing
method which could be used worldwide as the basis for mass testing
programmes.

The method, which produces highly accurate results, is built around
mathematical algorithms developed at the African Institute for
Mathematical Sciences in Kigali. We speak to Neil Turok, who founded the
institute, Leon Mutesa, Professor of human genetics on the government
coronavirus task force, and Wilfred Ndifon, the mathematical biologist
who devised the algorithm."

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

I think immediately about the work done (1980s, I think)
to provide reliable disk drives. Simple checksums can detect
some read-errors in a sector. Algorithms were developed to use
bit-wise "pooling" (like the above) to provide error-correction that
can detect and correct up to some maximum number of errors, using
minimum computing resources. Magnetic media were prone to
developing errors.

I have no idea whether that technology is still in use, or how
much of it is in use. I've seen no talk of faulty disk drives in
years. The need to re-load some .exe is also pretty rare, and
seems to be assumed to be the fault of bad program execution
(or virus) instead of memory-rot.

--
Rich Ulrich

• From Rich Ulrich@21:1/5 to All on Sun Aug 9 12:47:16 2020
On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects 0.1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the /need/ for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Sun Aug 9 17:45:45 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

The size of the pool depends upon the expected frequency of
the infection. A pool size of 10 would be too large if
the expected fraction of infected is 0.5. The optimum size
of the pool depends upon the proposed schedule of testing:
what you do if the pool tests positive. Sequential testing
yields the minimum number of tests. With the observed frequency
of infected and the schedule of using individual tests after
a pool has failed, the optimum pool size is now around 5.
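The trade-off described above can be sketched with the standard two-stage (Dorfman) scheme: test pools of n samples, then retest each member of a positive pool individually. That schedule is my assumption, not necessarily the one meant above. Its expected number of lab tests per person is 1/n + 1 - (1-p)^n, and minimizing over n does give a pool size of 5 at around 5% prevalence:

```python
def expected_tests_per_person(p, n):
    """Two-stage (Dorfman) pooling: one pool test per n people,
    plus n individual retests whenever the pool comes back positive."""
    return 1 / n + 1 - (1 - p) ** n

def optimal_pool_size(p, n_max=100):
    """Pool size (2..n_max) minimizing expected tests per person."""
    return min(range(2, n_max + 1), key=lambda n: expected_tests_per_person(p, n))

for p in (0.001, 0.01, 0.05, 0.10):
    n = optimal_pool_size(p)
    print(p, n, round(expected_tests_per_person(p, n), 3))
```

The rarer the infection, the larger the optimal pool: roughly n = 1/sqrt(p) for small p.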

I have read that dilution does impose a limit on pool size.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Sun Aug 9 23:46:09 2020
Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Tue Aug 11 01:11:08 2020
On Sun, 9 Aug 2020 17:45:45 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

The size of the pool depends upon the expected frequency of
the infection. A pool size of 10 would be too large if
the expected fraction of infected is 0.5. The optimum size
of the pool depends upon the proposed schedule of testing:
what you do if the pool tests positive. Sequential testing
yields the minimum number of tests. With the observed frequency
of infected and the schedule of using individual tests after
a pool has failed, the optimum pool size is now around 5.

I have read that dilution does impose a limit on pool size.

I can see a potential problem from /relying/ on pooling
to achieve thousands of tests results, like the university
I mentioned.

Since the number to pool depends directly on the Infected
rate, if the rate of infection doubles, suddenly you have to
double the number of lab-tests-performed to get the same
coverage for people-tested.

That becomes a big number.
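A rough check of that scaling, using the two-stage (Dorfman) expected-test formula with a fixed pool size of 10 (both the formula and the pool size are assumptions for illustration): the follow-up retests roughly double when prevalence doubles, while the first-stage pool tests stay fixed, so the total grows fast, though somewhat less than 2x.

```python
def expected_tests_per_person(p, n):
    # Dorfman two-stage pooling: one pool test per n people,
    # plus n individual retests when the pool comes back positive.
    return 1 / n + 1 - (1 - p) ** n

n = 10                              # fixed pool size (hypothetical)
low, high = 0.01, 0.02              # prevalence before and after doubling
t_low = expected_tests_per_person(low, n)
t_high = expected_tests_per_person(high, n)
print(round(t_low, 3), round(t_high, 3), round(t_high / t_low, 2))
```

Keeping the same per-person coverage at double the prevalence thus needs roughly 45% more lab tests at this pool size, and re-optimizing the pool size only softens, not removes, the growth.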

Maybe the standard for the future will be a single lab test,
performed at home, on the combined sample from all
all members of the household. Once a week?

The retail price of tests seems to be fairly high. I think
the US labs are charging \$100 a pop. That British quick-
test in the article cited by David Jones was (IIRC) less
than half that -- though, maybe that was just for the kits
and not for the completed testing.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Aug 11 01:35:57 2020
On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

I've read of 35% non-follow-up in some US cities, from lack of
cooperation. But we also have raging disease, places where there
are too many to follow, and insufferable delays on getting test
results back. The media (finally) have started repeating the
complaints about the delays in testing. The two major companies
that together process for half the hospitals in the country have
reported delays of 5 to 7 days for the low-priority tests (not
in-hospital; not professional sports....).

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

I've seen scattered reports of that. I think a couple of states
are trying that, but I haven't read of great predictive success.

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

Sounds great! I notice that it tests for more than just coronavirus.

I'd say, "Order a billion" except that before they made that many,
we can hope for testing that is cheaper and quicker.

By the way -- The CDC test that went bad this year was a failed
attempt to piggyback three or four other diagnoses on top of the
covid-19 test. I read an accusatory article that said that the CDC
made a similar, lesser error with their Zika test a few years ago.

For Zika, the new test was not a total failure, but it was less
reliable than advertised (and desired). Like with covid, other
people started using their own tests when the tests from the
CDC proved to be unreliable. The director (who was never
called to account for the bad Zika test) also directed the covid
effort.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Aug 28 15:42:40 2020
On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

Here is a news article on sewage tests. Apparently "dilution" does
not ruin all the possible tests, because sewage is surely dilute.
The University of Arizona may have prevented an outbreak -

https://www.washingtonpost.com/nation/2020/08/28/arizona-coronavirus-wastewater-testing/

<< Researchers around the world have been studying whether wastewater
testing can effectively catch cases early to prevent covid-19
clusters. There are programs in Singapore, China, Spain, Canada and
New Zealand, while in the United States, more than 170 wastewater
facilities across 37 states are being tested. Earlier this month,
officials in Britain announced testing at 44 water treatment
facilities. The Netherlands has been collecting samples at 300 sewage
treatment plants. >>

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Tue Sep 1 01:30:38 2020
On Sun, 09 Aug 2020 12:47:16 -0400, Rich Ulrich
<rich.ulrich@comcast.net> wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the /need/ for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

More about pooling -

https://www.nytimes.com/2020/08/18/health/coronavirus-pool-testing.html

<< Experts disagree, for instance, on the cutoff at which pooling
stops being useful. The Centers for Disease Control and Prevention’s coronavirus test, which is used by most public health laboratories in
the United States, stipulates that pooling shouldn’t be used when
positivity rates exceed 10 percent. But at Mayo Clinic, “we’d have to
start to question it once prevalence goes above 2 percent, definitely
above 5 percent,” Dr. Pritt said.

<< And prevalence isn’t the only factor at play. The more individual
samples grouped, the more efficient the process gets. But at some
point, pooling’s perks hit an inflection point: A positive specimen
can only get diluted so much before the coronavirus becomes
undetectable. That means pooling will miss some people who harbor very
low amounts of the virus. >>

Per the article -
Various folks (US) have received permission to officially use
pooling, but not all have started. 25, 10, 7 and 5 are all mentioned
in there as numbers of samples being pooled, in various labs.

One more "factor in play" that is mentioned is the human-
intensive part -- measuring out the test materials to be combined,
and keeping track of what sample is where and what to do
with the results. One sample, one result is obviously simpler.
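The "inflection point" in the quoted article can be located, ignoring dilution and workflow, with the two-stage (Dorfman) expected-test formula. This is an assumption on my part; the article does not say which scheme the labs use. With pools of 5, pooling beats one-test-per-person on raw arithmetic until prevalence gets near 27%, which suggests the practical cutoffs quoted above (2, 5, or 10 percent) are driven mostly by dilution and record-keeping rather than the test-count arithmetic.

```python
def expected_tests_per_person(p, n):
    # Dorfman two-stage pooling: one pool test per n people,
    # plus n individual retests when the pool comes back positive.
    return 1 / n + 1 - (1 - p) ** n

n = 5
# Scan upward to find the prevalence at which pooling no longer
# beats one test per person (ignoring dilution).
p = 0.0
while expected_tests_per_person(p, n) < 1.0:
    p += 0.001
print(round(p, 3))   # break-even prevalence for pools of 5
```

The break-even comes out around 27-28% here, far above any positivity rate at which labs actually abandon pooling.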

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)