• #### Analyzing Covid Data

From root@21:1/5 to All on Sun May 31 23:14:36 2020
I have been looking into the NYT covid data available from:

https://github.com/nytimes/covid-19-data/archive/master.zip

My main focus has been to estimate the ratio

m=(true number of infected)/(reported number of infected)

This ratio alters the reported lethality of the SARS-CoV-2 virus.

Before getting into the details of analyzing the covid data I want to discuss an
analogous problem of coin flipping. Suppose you flip a coin 10 times: you would expect, on average, 5 heads and 5 tails, but the number of heads can be anything from 0 to 10 in any set of ten flips; out of 10 flips, 5 is simply the most likely number. What about 10,000 flips? In that case we would expect 5,000 heads, but now the range of likely numbers of heads is sharply restricted. If you were to repeat many sets of 10,000 flips, the distribution of the number of heads would approach a bell-shaped normal curve with a standard deviation of 50. That means the number of heads in 10,000 flips would most likely lie between 4,950 and 5,050. The formula for this standard deviation is sqrt( p * (1-p) * N ), where N is the number of flips and p is the probability of a head, here 1/2.
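A quick numerical check of that formula (a minimal Python sketch; `binomial_sd` is just a name I chose):

```python
import math

def binomial_sd(p, n):
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(p * (1 - p) * n)

print(binomial_sd(0.5, 10))      # ~1.58 heads of spread for 10 flips
print(binomial_sd(0.5, 10_000))  # 50.0 for 10,000 flips

# Turning the problem around, as in the next paragraph: 50 observed
# heads at p = 1/2 suggests about 50 / 0.5 = 100 flips.
print(50 / 0.5)                  # 100.0
```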

We can turn the coin flipping problem around. Suppose I told you that after a session of coin flipping I got, say, 50 heads. What would you guess is the number of times I flipped the coin to get 50 heads? Your best guess would be 100
times, but the actual number on any given trial might be some number near 100.

Now we are ready to discuss the GitHub covid data. For each day from Jan 21, 2020 until today, and for each state or territory, and for each county in the US, the data include the cumulative totals of reported cases and reported deaths. A few example lines for the US (date, cumulative cases, cumulative deaths) would be:

2020-05-17 1493350 89568
2020-05-18 1515177 90414
2020-05-19 1536129 91934

In the discussion that follows the number of currently infected people will assume the role of the number of tosses of a coin. Instead of 50/50 odds of heads, there will be a probability (p) of an infected person infecting someone new in the next day. Otherwise we will be using the method described above to estimate the number of flips (the number of infected people).

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of this ratio? It is the probability that any one of the C people will infect a new
person on that day. Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases. We find, however, that (S) is considerably larger than (s). The reason for this discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number. In the case of the US data as a whole the number of infected people is over ten times the reported number.
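As a concrete sketch of the two computations above, using just the three example rows (a real run would use the full us.csv series; the variable names are my own):

```python
import math

# Cumulative reported cases from the three example rows above
cases = [1493350, 1515177, 1536129]

dC = [b - a for a, b in zip(cases, cases[1:])]   # daily new cases
p  = [d / c for d, c in zip(dC, cases)]          # dC/C for each day
s  = [math.sqrt(q * (1 - q) * c)                 # expected (binomial) SD
      for q, c in zip(p, cases)]

print(dC)   # [21827, 20952]
print(s)    # roughly 147 and 144 expected-SD cases per day
# The empirical SD of the daily new cases, (S), needs the whole series:
#   S = statistics.stdev(full_dC_series)
```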

The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.

Here are my findings to-date:

State True Reported Factor Lethality
Alabama 81732.8 16530 4.94452 0.723088
Alaska 587.342 430 1.36591 1.36207
Arizona 86081.5 17763 4.84611 0.995568
Arkansas 43757.2 6538 6.69275 0.285667
California 828019 104071 7.95629 0.488153
Colorado 127288 25116 5.06799 1.11715
Connecticut 475436 41559 11.44 0.804735
Delaware 52705.4 9171 5.74696 0.654582
DistrictofColumbia 31440.9 8492 3.70242 1.4408
Florida 402480 53277 7.55449 0.587109
Georgia 445281 43363 10.2687 0.43613
Guam 9154.68 1141 8.02339 0.0655402
Hawaii 1044.66 637 1.63997 1.62733
Idaho 11638.4 2770 4.20157 0.704567
Illinois 1.1155e+06 116128 9.60574 0.468402
Indiana 164705 33885 4.86071 1.25558
Iowa 116177 18672 6.22198 0.438986
Kansas 59938.9 9512 6.30139 0.348689
Kentucky 59169.7 9510 6.22184 0.716583
Louisiana 553734 38907 14.2323 0.494822
Maine 4922.62 2189 2.2488 1.70641
Maryland 385367 50334 7.6562 0.630048
Massachusetts 1.10488e+06 94895 11.6432 0.600971
Michigan 472088 55944 8.43858 1.13792
Minnesota 102622 22957 4.47019 0.952035
Mississippi 61517.4 14372 4.28036 1.12651
Missouri 58292.4 12815 4.54876 1.22658
Montana 801.297 485 1.65216 2.12156
Nebraska 79160.9 13261 5.96945 0.214753
Nevada 30918.9 8247 3.74911 1.32605
NewHampshire 11029.1 4389 2.5129 2.10352
NewJersey 1.76773e+06 157815 11.2013 0.644953
NewMexico 23145.5 7364 3.14306 1.44737
NewYork 7.05482e+06 371559 18.9871 0.417275
NorthCarolina 163655 25616 6.38879 0.524884
NorthDakota 6564.57 2484 2.64274 0.913997
NorthernMarianaIslands 32.5908 22 1.4814 6.13671
Ohio 203122 33915 5.98914 1.03288
Oklahoma 22629 6270 3.60909 1.44063
Oregon 8513.77 4086 2.08364 1.7736
Pennsylvania 572519 74312 7.70427 0.942675
PuertoRico 25735.7 3486 7.38259 0.509021
RhodeIsland 54769.8 14494 3.77879 1.23608
SouthCarolina 43660.9 10788 4.04718 1.07648
SouthDakota 20844.9 4793 4.34903 0.259056
Tennessee 156846 21763 7.20702 0.224423
Texas 427105 60787 7.02626 0.378829
Utah 24485.9 8953 2.73494 0.432902
Vermont 2213.81 974 2.2729 2.48441
VirginIslands 132.527 69 1.92068 4.52739
Virginia 295112 41401 7.12815 0.453387
Washington 170473 21825 7.81092 0.654648
WestVirginia 5243.41 1935 2.70977 1.41129
Wisconsin 88229.6 17211 5.12635 0.623374
Wyoming 2282.81 876 2.60595 0.657085

US 18.12M 1.73M 10.65 0.548

I will respond to any post asking for more details.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Mon Jun 1 13:02:14 2020
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I've been following this, too. My own calculations have been
on scraps of paper, using numbers from Johns Hopkins and
random news reports.

I have to say that I don't understand what numbers you
are computing and what you are assuming.

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
this ratio? It is the probability that any one of the C people will infect a new
person on that day.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

Rural "exponential increase" is much slower, except when it is a
big splash of reports either in or near a prison or meatpacking plant.
The Case in some cities is often counted only when it becomes
a hospitalization, and (therefore) is no longer in the community
infecting anyone.

Call this ratio (p); this just says the expected number of
infected people is p*C = dC. We can use the formula given above to compute the
expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases.
We find, however, that (S) is considerably larger than (s). The reason for this
discrepancy is that the new cases are derived from a larger number of actual

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are /able/ to get the
test kits that they want.

infected people than the reported number. In the case of the US data as a whole
the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I read the CDC report (online) with 5 scenarios for the antigen
levels, etc. CDC used data from April, and offers poor
documentation. About 5 out of 6 epidemiologists (I gather) think
that the CDC's highest mortality-rate estimate is more like a minimum.

The data show a pronounced weekly variation such as might happen if new cases are only reported on Monday. Careful measures must be taken to eliminate this source of variation.

Day to day variation is one problem among many.

Here are my findings to-date:

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Mon Jun 1 18:20:41 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I have to say that I don't understand what numbers you
are computing and what you are assuming.

I suppose that there may be more people infected than are reported
in the data as "cases". I hope to derive that number, or more
exactly, the ratio of (true number)/reported number. My
approach does not assume any model for the contagion.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

I have looked into the lag between infection and reporting. I have
considered lags from one to 50 days. The effect, while not strong,
is to increase the estimated ratio. That is because the day/day
variance decreases as you go back in time. So the ratio of
current variance to earlier variance increases.

Nothing in my presentation assumed exponential growth, although
you may have been led to believe that since I utilized
(delta cases)/cases. If you reread what I wrote, I only took
that to be a probability (p), which is a tautology from the
equation p = (delta cases)/cases.

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are /able/ to get the
test kits that they want.

I don't want to engage in any political issues surrounding the
contagion.

infected people than the reported number. In the case of the US data as a whole
the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I derived my figure of 10 from the data, not from any external
sources. I welcome criticism of the procedure apart from
the factor since only experimental results based upon
widespread (reliable) testing will provide the answer.

Day to day variation is one problem among many.

By "many" do you mean problems in my approach?
If so I would like to hear what you mean. If you
are referring to the data itself we can only use
what we have.

Thanks for your comments.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Mon Jun 1 18:39:49 2020
Rich Ulrich wrote:

On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

I've been following this, too. My own calculations have been
on scraps of paper, using numbers from Johns Hopkins and
random news reports.

I have to say that I don't understand what numbers you
are computing and what you are assuming.

Using the numbers above we can compute the daily differences (dC)
in the number of cases (C). Furthermore we can compute the ratio
dC/C. What is the meaning of this ratio? It is the probability that
any one of the C people will infect a new person on that day.

Complications that I see: The Case does not show up until 5 or 6 days
after the exposure. In the explosive phase in NY City, 6 days was a quadrupling of the number of cases. Louisiana was even faster.
The day-to-day increase does indicate the exponential rate, however,
when you can assume the increase is exponential. (But that is less
often, now.)

Rural "exponential increase" is much slower, except when it is a
big splash of reports either in or near a prison or meatpacking plant.
The Case in some cities is often counted only when it becomes
a hospitalization, and (therefore) is no longer in the community
infecting anyone.

Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to
compute the expected standard deviation (s) of the number infected
by C people: s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the
daily new cases. We find, however, that (S) is considerably larger
than (s). The reason for this discrepancy is that the new cases are
derived from a larger number of actual

Trump refused to let the CDC set up a proper monitoring system,
because he is an idiot who still doesn't believe in serious epidemics.
So, the counting sucks. For cases. For tests. For results of tests
which can mix virus-results with antigen-results with tests by
manufacturers whose tests may be neither calibrated nor reliable.

Day-to-day variation depends on sloppy reporting practices, and
is contaminated by sloppy standards. And further contaminated
by political concerns, which may-or-may-not intentionally count
the cases at a prison or meatpacking plant or intentionally avoid
counting them. And not all nursing homes are able to get the
test kits that they want.

infected people than the reported number. In the case of the US
data as a whole the number of infected people is over ten times the reported number.

" ... ten times the reported number" is a guess, based on
incomplete and insufficient surveys of antigen levels. That is a
very important thing to know. Sweden recently reduced their
claim (made a month ago) about subjects in Stockholm having
antigens from 26% to 7%. I am hoping for better surveys.

I read the CDC report (online) with 5 scenarios for the antigen
levels, etc. CDC used data from April, and offers poor
documentation. About 5 out of 6 epidemiologists (I gather) think
that the CDC's highest mortality-rate estimate is more like a minimum.

The data show a pronounced weekly variation such as might happen if
new cases are only reported on Monday. Careful measures must be
taken to eliminate this source of variation.

Day to day variation is one problem among many.

Here are my findings to-date:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Bruce Weaver@21:1/5 to David Jones on Mon Jun 1 12:56:02 2020
On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
--- snip ---

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

Thanks David. Those links are helpful.

PS- I see the RSS is still calling that publication "Significance", despite the ASA's (2019) pronouncement that "it is time to stop using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Bruce Weaver on Mon Jun 1 20:14:23 2020
Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication "Significance", despite
the ASA's (2019) pronouncement that "it is time to stop using the term
“statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Mon Jun 1 21:49:58 2020
root wrote:

Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop
using the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

No, it is jumping into the fray by setting up its task force since 09
April 2020 at least. I am not clear if ASA has anything similar to
co-ordinate research?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Bruce Weaver on Tue Jun 2 00:17:31 2020
Bruce Weaver wrote:

On Monday, June 1, 2020 at 2:39:54 PM UTC-4, David Jones wrote:
--- snip ---

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

There is some overlap in these lists.

Thanks David. Those links are helpful.

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop using
the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

Well, in fact it is a joint RSS and ASA publication.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Bruce Weaver@21:1/5 to David Jones on Tue Jun 2 07:04:46 2020
On Monday, June 1, 2020 at 8:17:41 PM UTC-4, David Jones wrote:
Bruce Weaver wrote:

PS- I see the RSS is still calling that publication "Significance",
despite the ASA's (2019) pronouncement that "it is time to stop using
the term “statistically significant” entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

Well, in fact it is a joint RSS and ASA publication.

So it is. I never noticed that. From the bottom of the front page:

"Significance Magazine is published for the Royal Statistical Society and American Statistical Association by John Wiley & Sons Ltd."

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to David Jones on Tue Jun 2 18:47:05 2020
David Jones wrote:

root wrote:

Bruce Weaver <bweaver@lakeheadu.ca> wrote:

Thanks David. Those links are helpful.

+1

PS- I see the RSS is still calling that publication
"Significance", despite the ASA's (2019) pronouncement that "it
is time to stop using the term “statistically significant”
entirely" (https://www.tandfonline.com/toc/utas20/73/sup1). ;-)

The RSS is jumping into the fray by scheduling a meeting in
Sept 2021!

No, it is jumping into the fray by setting up its task force since 09
April 2020 at least. I am not clear if ASA has anything similar to co-ordinate research?

In fact ASA has this:

https://magazine.amstat.org/blog/2020/04/29/online-communities-created-for-covid-19-discussion/

But the discussions can't be seen if one (such as I) is not a member of
ASA.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Wed Jun 3 16:50:30 2020
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Okay, I will back up and discuss your original post.
I thought you were leaving out vital steps.
Now I figure that the whole process is a comparison
based on variances.

In the discussion that follows the number of currently infected people will
assume the role of the number of tosses of a coin. Instead of 50/50 odds of
heads, there will be a probability (p) of an infected person infecting someone
new in the next day. Otherwise we will be using the method described above to
estimate the number of flips (the number of infected people).

No. Your coin-flipping example had nothing to do with
variances.

Using the numbers above we can compute the daily differences (dC) in the number
of cases (C). Furthermore we can compute the ratio dC/C. What is the meaning of
this ratio? It is the probability that any one of the C people will infect a new
person on that day. Call this ratio (p); this just says the expected number of infected people is p*C = dC. We can use the formula given above to compute the expected standard deviation (s) of the number infected by C people:
s = sqrt( p * (1-p) * C).

We can also directly compute the standard deviation (S) of the daily new cases.
We find, however, that (S) is considerably larger than (s). The reason for this
discrepancy is that the new cases are derived from a larger number of actual infected people than the reported number.

No, you have that backwards. The change has to be an
increase in p. If you increase C, you would implicitly decrease
p, which decreases the expected variation. Binomial
distributions have larger variance near 50%, smaller in
the tails.

In the case of the US data as a whole
the number of infected people is over ten times the reported number.

Contrariwise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

That might be somewhat consistent with the observation
that 10% of the cases are "super-spreaders" who account
for 80% of all new infections. ("Super" must be a function
of how much virus they shed and how many people they
breathe or cough on.)

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

Your proper conclusion is a Goodness of Fit conclusion:
the distribution at hand has too much variation to be Poisson.
"N of infected" does not offer an explanation.

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

That is: I've occasionally done this to augment an eyeball
check of a possible Poisson: divide the observed counts
by 2 or 5 or 10 to get a smaller "effective N". I decide that
there is non-independence if the resulting counts now
match the expected spread of a Poisson. (This notion of
"effective N" is something I picked up from some British
authors, years ago. I haven't seen it widely used by
US biostatisticians.)
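That eyeball check can be sketched numerically (the daily counts here are hypothetical, and `dispersion_index` is my own name for the variance-to-mean ratio, which is about 1 for Poisson data):

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio; roughly 1 if the counts are Poisson."""
    return statistics.variance(counts) / statistics.mean(counts)

def shrunk_dispersion(counts, k):
    """The "effective N" trick: divide the counts by k and re-check."""
    return dispersion_index([c / k for c in counts])

daily_new = [95, 210, 130, 400, 80, 260, 150]   # hypothetical daily counts
print(dispersion_index(daily_new))              # far above 1: overdispersed
print(shrunk_dispersion(daily_new, 10))         # dividing by k divides the index by k
```

Dividing the counts by k divides the index by exactly k, so the k that brings the index back near 1 is one crude gauge of how far the counts are from independent.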

Non-independence could be accounted for by various things.
I mentioned (1st Reply) reporting errors, and the problem
of modeling the surge of cases from nursing homes,
prisons, etc., complicated by "politics" that interfere
with accurate reporting. I think you can't pretend that
those problems don't exist and affect the data.

"Superspreaders" and super-spreading events also yield
non-independent cases.

If I have missed what you are doing, please let me know.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Thu Jun 4 02:27:31 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Okay, I will back up and discuss your original post.
I thought you were leaving out vital steps.
Now I figure that the whole process is a comparison
based on variances.

Thanks for responding Rich, and thanks for investing your
time.

No. Your coin-flipping example had nothing to do with
variances.

I don't see how you can say that. Certainly there is a difference in that
coins are usually fair, but the same equations apply. The limiting form
of the binomial distribution is normal with a variance as I specified.

No, you have that backwards. The change has to be an
increase in p. If you increase C, you would implicitly decrease
p, which decreases the expected variation. Binomial
distributions have larger variance near 50%, smaller in
the tails.

True, and that is exactly my point. As the number of "trials"
increases the number of cases approaches the expected
number, and the standard deviation around that expected number grows
only as the sqrt of the number of trials.

Contrariwise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

I can't see how you can say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with a SD of 31 cases. With a million infected we would expect 10,000 new cases with a SD of 100 cases. In the case of
100,000 people the SD represents about 3% of the cases. For a million
people the SD represents about 1% of the cases.

The point I wish to make here is that with these large numbers of
infected people we should expect smaller variance than we see.
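The arithmetic in this example can be spelled out (a sketch with the same p = 0.01; `binomial_sd` is my own name):

```python
import math

def binomial_sd(p, n):
    """Binomial SD of the number of new cases from n infected people."""
    return math.sqrt(p * (1 - p) * n)

for n in (100_000, 1_000_000):
    expected = 0.01 * n
    sd = binomial_sd(0.01, n)
    print(n, expected, round(sd, 1), f"{sd / expected:.1%}")
# 100000 1000.0 31.5 3.1%
# 1000000 10000.0 99.5 1.0%
```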

That might be somewhat consistent with the observation
that 10% of the cases are "super-spreaders" who account
for 80% of all new infections. ("Super" must be a function
of how much virus they shed and how many people they
breathe or cough on.)

I would expect that there is a variation among the population
in terms of their contagion. I would also expect that such
a distribution would scale with the population. A million
infected people should have ten times as many super-spreaders
as a population of 100,000.

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

In fact the limit of a Poisson is also normal. The Poisson
represents how radioactive atoms behave. The number of
atoms (N) certainly affects the number of clicks on
the counter. The early stages of the contagion are represented
by a Poisson. However, during that phase, when the Poisson
dominates, the infection grows linearly with time.

Your proper conclusion is a Goodness of Fit conclusion:
the distribution at hand has too much variation to be Poisson.
"N of infected" does not offer an explanation.

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

That is: I've occasionally done this to augment an eyeball
check of a possible Poisson: divide the observed counts
by 2 or 5 or 10 to get a smaller "effective N". I decide that
there is non-independence if the resulting counts now
match the expected spread of a Poisson. (This notion of
"effective N" is something I picked up from some British
authors, years ago. I haven't seen it widely used by
US biostatisticians.)

Non-independence could be accounted for by various things.
I mentioned (1st Reply) reporting errors, and the problem
of modeling the surge of cases from nursing homes,
prisons, etc., complicated by "politics" that interfere
with accurate reporting. I think you can't pretend that
those problems don't exist and affect the data.

I don't contend that these factors don't exist. (Sorry
for the awkward statement.) I do maintain that whatever
the distribution, it should scale with the population.

"Superspreaders" and super-spreading events also yield
non-independent cases.

See above.

If I have missed what you are doing, please let me know.

Rich I greatly appreciate the effort you have put in to
thinking and commenting on the subject.

Your first post asked what assumptions I made, which I did not reveal in my first
post. I have been thinking about that. Let's imagine a large number of identical infections spreading over the US (or any country) with each realization differing only in the variations caused by chance. Strictly speaking I am assuming that the lateral variation across realizations is similar to the longitudinal variation in time over one case. This is sort of an ergodic assumption. It amounts to an assumption that the day to day variation is slow enough to justify my approach.

I don't know what facilities you have to look at the data. What I would
like you, and everyone reading this, to do is pretty simple. Stick
with the us.csv data from GitHub. Read in the data and perform
a 7 day moving average on the number of cases. As you noted above,
reporting issues create a pronounced 7 day variation. The moving
average suppresses the variation as well as variation due to
any other causes.

Compute first order differences of the smoothed data and follow
the steps I have outlined. You will find that the variance of the
smoothed data is far larger than I have indicated. See for
yourselves.
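The smoothing step can be sketched with a toy cumulative series (the weekly spike below is artificial; in practice `cumulative` would be the cases column of us.csv):

```python
import statistics

def moving_average(xs, window=7):
    """Trailing 7-day average; one value per full window."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

def daily_differences(cumulative):
    """First-order differences: daily new cases from cumulative cases."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

# Toy cumulative-case series: the same weekly total, but dumped
# mostly on one reporting day to mimic the weekly cycle.
cumulative, total = [], 0
for day in range(28):
    total += 700 if day % 7 == 0 else 50
    cumulative.append(total)

raw_dC    = daily_differences(cumulative)
smooth_dC = daily_differences(moving_average(cumulative, 7))

print(statistics.stdev(raw_dC))     # large: dominated by the weekly cycle
print(statistics.stdev(smooth_dC))  # near zero: the cycle is averaged out
```

Differencing the 7-day average of the cumulative series equals (C[t+7] - C[t])/7, which is why the weekly cycle cancels exactly in this toy example.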

Thanks again Rich.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Thu Jun 4 14:17:25 2020
On Thu, 4 Jun 2020 02:27:31 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
On Sun, 31 May 2020 23:14:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

No. Your coin-flipping example had nothing to do with
variances.

I don't see how you can say that. Certainly there is a difference in that
coins are usually fair, but the same equations apply. The limiting form
of the binomial distribution is normal with a variance as I specified.

Okay. Let me revise that. Your coin-flipping had nothing
to do with estimating total N from variances.

...

Contrarywise, you should conclude something like "the
effective number of people spreading the disease is
perhaps a tenth of the number" being pointed to.

I'm not particularly happy with my argument from the binomial.
I think I was implying that there are unstated premises, and
my logic, though poor, was as good as yours.

I will stand by my argument from the Poisson.

I can't see how you can say this. Consider two cases, one with 100,000 people and another with a million people, but in each case each person has a 1% chance of infecting a new person in the next day. In the case of 100,000 infected people we would expect 1,000 new cases with an SD of 31 cases. With a million infected we would expect 10,000 new cases with an SD of 100 cases.

In the real world, this estimate of "within group" variance is
widely (though not widely enough) recognized as an underestimate
of the true variance - for most sampling situations. The explanation
is that there are unrecognized dependencies between cases.

In the case of
100,000 people the SD represents about 3% of the expected cases. For a million
people the SD represents about 1% of the cases.

The point I wish to make here is that with these large numbers of
infected people we should expect smaller variance than we see.
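The binomial arithmetic in this exchange is easy to check. A small Python sketch; the 1% infection probability is the hypothetical value from the example above, not an estimate:

```python
import math

def binomial_sd(n, p):
    # Standard deviation of a binomial(n, p) count: sqrt(n * p * (1 - p)).
    return math.sqrt(n * p * (1 - p))

for n in (100_000, 1_000_000):
    mean = n * 0.01
    sd = binomial_sd(n, 0.01)
    print(f"N={n}: expect {mean:.0f} new cases, SD {sd:.1f} "
          f"({100 * sd / mean:.1f}% of the mean)")
```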

But. You certainly are ignoring the "compound distribution"
that exists. You have to account for the fact that your mean of
cases-observed is 1000, not 10,000.

Not only is there a p1 of infection, there is a p2 for the infected
case being recognized. Compound distributions have long tails,
thus, larger variances.

I'm a little bit curious if you could build a model that way.
It seems like there are too many unknown parameters.

Taking another tack - for small proportions, like your
dC over C, the Poisson distribution is neater than the
binomial. The variance of a Poisson observation is equal
to the observation. Note, too, that it says /nothing/
about the total N that may be generating the sample.

In fact the limit of a Poisson is also normal. The Poisson
represents how radioactive atoms behave. The number of
atoms (N) certainly affects the number of clicks on
the counter. The early stages of the contagion are represented
by a Poisson. However, during that phase, when the Poisson
dominates, the infection grows linearly with time.
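The claim that a constant-rate Poisson process gives linear cumulative growth, with variance equal to the mean, can be checked with a small simulation. A sketch in Python, using Knuth's sampling method; the rate of 5 cases per day is an arbitrary illustration:

```python
import math
import random

def poisson_sample(rate, rng):
    # Knuth's method for a Poisson-distributed integer with the given mean.
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_daily_counts(rate, days, seed=0):
    # Daily new-case counts from a constant-rate Poisson process; the
    # cumulative total then grows linearly in time, and the variance of
    # the daily counts is approximately equal to their mean.
    rng = random.Random(seed)
    return [poisson_sample(rate, rng) for _ in range(days)]

counts = simulate_daily_counts(rate=5.0, days=2000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean {mean:.2f}, variance {var:.2f}")  # both close to 5
```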

No, and No.

You are constructing a model where the p is small, always.
Therefore, the Poisson is appropriate, not the binomial
and not the normal.

The Poisson with small p gives a convenient expression for
the variance, which is NOT defined by either the p or the N,
and the observed cases (alone) can't un-confound them.

...

The reason that would-be Poisson observations fail to
be Poisson - by having too much variance, as in these
data - is (in my experience) that they are not independent.

...

Non-independence could be accounted for by various things.

...

Compute first order differences of the smoothed data and follow
the steps I have outlined. You will find that the variance of the
smoothed data is far larger than I have indicated. See for
yourselves.

Oh, I readily agree that the data are not Poisson. Or
binomial, if you use a larger p.

I pointed to many sources of dependency.

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Thu Jun 4 19:48:19 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

Sure it can be controlled: just ensure that no two people
are ever closer than 100 feet from each other.

Regardless of the model, all contagions begin with
an exponential increase. R0 is the initial slope of the log
of that increase vs. time. However, the very first
stages of the contagion undoubtedly begin with a
Poisson process. You can see that in the GitHub data.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

I tried to analyze the possibility for super-spreaders
but I was unable to come up with anything justifiable.
It certainly doesn't work if only super-spreaders are
responsible.

About Poisson, remember that the distribution was invented
to analyze the situation where there is an average number
of events/time. That means that over time the cumulative
number of cases would be linear regardless of the specific
sequence of arrivals. Poisson is used in queuing theory
to determine the optimum number of servers, etc.

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).
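The extrapolation arithmetic above, spelled out; the population figure is the approximate one quoted in the post:

```python
# Reproducing the Wisconsin extrapolation from the post.
positive_rate = 0.064        # 6.4% of the 16,000 tested were positive
population = 5_820_000       # approximate population of Wisconsin
reported_cases = 20_000      # reported "cases" at the time

estimated_infected = positive_rate * population
ratio = estimated_infected / reported_cases
print(round(estimated_infected), round(ratio, 1))  # 372480 18.6
```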

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

Thanks again for your comments.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 07:15:15 2020
root wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

I read an article about the public discussion in Germany.
They made heavy use of "R0", the rate of passing on
disease from a single case. The public was encouraged
to help keep R0 low by things like wearing masks
and avoiding unnecessary gatherings. It was clear that
they regarded R0 as a parameter that can be controlled.

Sure it can be controlled: just ensure that no two people
are ever closer than 100 feet from each other.

Regardless of the model, all contagions begin with
an exponential increase. R0 is the initial slope of the log
of that increase vs. time. However, the very first
stages of the contagion undoubtedly begin with a
Poisson process. You can see that in the GitHub data.

The places or events (super-spreader) with high R0 are
particularly strong sources of dependency in the counts,
making the variance much larger for independent events
- whether modeled as Poisson, binomial or normal.

I tried to analyze the possibility for super-spreaders
but I was unable to come up with anything justifiable.
It certainly doesn't work if only super-spreaders are
responsible.

About Poisson, remember that the distribution was invented
to analyze the situation where there is an average number
of events/time. That means that over time the cumulative
number of cases would be linear regardless of the specific
sequence of arrivals. Poisson is used in queuing theory
to determine the optimum number of servers, etc.

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
Thus someone could do an autocorrelation analysis of the counts (or
square roots of counts if this seems good). This could be extended to
a cross-correlation analysis between states. And ... if a simple ARMA
modelling approach were added, one could start from ideas such as, if
an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
moving-average-type component extending over 6 days. If any of this
showed anything worthwhile, one could then be more ambitious (but
probably unnecessary) by developing a doubly-stochastic Poisson or
Binomial model.
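The suggested autocorrelation analysis can be sketched in a few lines of Python (`autocorrelation` is an illustrative helper, not a library routine). A purely weekly toy series shows how the day-of-week effect would appear:

```python
def autocorrelation(series, max_lag):
    # Sample autocorrelation at lags 1..max_lag. A spike near lag 7
    # would confirm the day-of-week reporting effect; slower decay
    # would hint at the dependencies Rich describes.
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag)) / var
            for lag in range(1, max_lag + 1)]

# A purely weekly pattern shows its period clearly:
weekly = [0, 1, 2, 3, 4, 5, 6] * 12
acf = autocorrelation(weekly, 10)
print(max(range(10), key=lambda i: acf[i]) + 1)  # 7
```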

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Fri Jun 5 03:27:59 2020
On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
/attempt/ to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 09:31:55 2020
Rich Ulrich wrote:

On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
attempt to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From duncan smith@21:1/5 to Rich Ulrich on Fri Jun 5 16:54:07 2020
On 05/06/2020 08:27, Rich Ulrich wrote:
On Thu, 4 Jun 2020 19:48:19 -0000 (UTC), root <NoEMail@home.org>
wrote:

I just read the results of a study of tests in Wisconsin.
16,000 people were tested and 6.4% of them tested positive
for SARS-Cov2. Extrapolating that to the 5.82M people
in Wisconsin that would suggest that 372,480 people in
Wisconsin carry the antibodies. About 20,000 "cases"
have been reported in Wisconsin. That is a ratio of
18.6 (people expected to carry the virus)/(known cases).

I've been trying to get a handle on this antibody testing
business. I just read an original article about surveillance
for TB, 10,000 people or so, using two different antibody
tests. This was 2012, comparing their data to a similar
survey done 8 or 10 years prior.

People have been testing for TB for over a century.
I'm a bit appalled that they don't have a good,
systematic and validated method for estimating
population rates. But they don't. If you really want
prevalence, you want the same number of false positives
as false negatives. I don't think that their method gets
that, but they never even discuss the question. So far
as I noticed.

What they presented used the 10mm cutoff for the
skin test and a single cutoff for the other test. From
this, I think they eventually reported all three combos
for rates. Only 2.7% were high on both tests. But they
preferred looking at the other two numbers, which were,
oh, about 5 and about 6.5. I think I have to go back to
that and save the study.

Anyway, my own (tentative) conclusion about this TB
study is that the 2.7% represents more false positives
than false negatives. So it, their minimum, is too high.

That gives me less hope than I started with, in regards
to whether the surveys being done should be believed.

Earlier a Stanford study in Santa Clara county estimated
that between 2.4% and 4.1% of residents had been infected
with Cov2. The Stanford study was widely criticized for
procedural errors.

In the bit I recall about the Stanford study, they said they did
/attempt/ to take into account and calibrate for false positives.
I don't remember what procedures they were criticized for.

[snip]

AFAICT they got that part right. It's simple enough to estimate the
population proportion (who would test positive), then perform the simple algebra required to generate a point estimate / CI for the prevalence
for a given sensitivity and specificity. It *is* possible to generate inadmissible prevalence estimates if the observed data are utterly
inconsistent with the given sensitivity / specificity, but I ran some simulations and this didn't seem to be a real issue.

The study used the Delta method for the CI to account for uncertainty in sensitivity / specificity. I didn't work through that, but their CI was
a bit wider than the naive CI (assuming known sensitivity and
specificity). So it looked reasonable.
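The "simple algebra" mentioned here is the standard test-adjusted prevalence (Rogan-Gladen) correction. A Python sketch; the sensitivity and specificity values in the example are arbitrary illustrations, not the study's:

```python
def adjusted_prevalence(p_obs, sensitivity, specificity):
    # Invert p_obs = sens * p + (1 - spec) * (1 - p) for the true
    # prevalence p, clamping to [0, 1]: the raw estimate is
    # inadmissible when the observed positive rate is inconsistent
    # with the assumed error rates.
    p = (p_obs + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)

# Illustrative numbers only: 1.5% observed positive, 80% sensitive,
# 99.5% specific.
print(adjusted_prevalence(0.015, 0.80, 0.995))
```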

Their data were not representative of the population, so they introduced weights to generate a number of positive tests that they would have
expected to have observed if the data had been representative. That's
where I'm guessing the criticisms lie, but all I've looked at is the
study itself. Cheers.

Duncan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to David Jones on Fri Jun 5 16:33:51 2020
David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of the population
in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
to die of the infection the lethality would be no worse than seasonal flu.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to David Jones on Fri Jun 5 16:21:15 2020
David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from the Poisson/Binomial questions and do a fairly basic time-series analysis.
Thus someone could do an autocorrelation analysis of the counts (or
square roots of counts if this seems good). This could be extended to
a cross-correlation analysis between states. And ... if a simple ARMA modelling approach were added, one could start from ideas such as, if
an infected person is infectious but not symptomatic for say 6 days (or whatever the figure is), this might lead to there being a
moving-average-type component extending over 6 days. If any of this
showed anything worthwhile, one could then be more ambitious (but
probably unnecessary) by developing a doubly-stochastic Poisson or
Binomial model.

I have aligned the data for the 11 states with the highest infection
rates:
NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

After alignment there are only 80 days, or so, of data. There's
not much time series stuff I can do with that. New York strongly
dominates and the aforementioned weekly variation in the data
is pronounced. Smoothing out the weekly variations reduces the
number of independent data points to a dozen or so.

When the data are aligned you can see that the states follow their
own path through the stages of infection. For instance I have looked
at Illinois together with Wisconsin. It is evident that Wisconsin
suffered from a spillover from the infection in Illinois. As
Illinois grows the infection spills across the border.

I can post the aligned data if anyone is interested. The file
is about 7K bytes.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 17:07:18 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could
be extended to a cross-correlation analysis between states. And ...
if a simple ARMA modelling approach were added, one could start
from ideas such as, if an infected person is infectious but not
symptomatic for say 6 days (or whatever the figure is), this might
lead to there being a moving-average-type component extending over
6 days. If any of this showed anything worthwhile, one could then
be more ambitious (but probably unnecessary) by developing a doubly-stochastic Poisson or Binomial model.

I have aligned the data for the 11 states with the highest infection
rates:
NewYork NewJersey Illinois Massachusetts California Pennsylvania Michigan Connecticut Florida Texas Georgia

After alignment there are only 80 days, or so, of data. There's
not much time series stuff I can do with that. New York strongly
dominates and the aforementioned weekly variation in the data
is pronounced. Smoothing out the weekly variations reduces the
number of independent data points to a dozen or so.

When the data are aligned you can see that the states follow their
own path through the stages of infection. For instance I have looked
at Illinois together with Wisconsin. It is evident that Wisconsin
suffered from a spillover from the infection in Illinois. As
Illinois grows the infection spills across the border.

I can post the aligned data if anyone is interested. The file
is about 7K bytes.

80 time points is about the size of the data-sets used in econometric data,
weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if you
want to look at short-term variations. As there are marked day-of-week
effects, you would expect a raw auto-correlation analysis to be
overwhelmed by this effect. This suggests a need to "detrend" the data
to remove this effect (and if there are any special holidays), and if
there are long-term trends you might well want to remove such trends
as well.
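One simple way to detrend the day-of-week effect, sketched in Python (`remove_day_of_week` is an illustrative helper; holidays and long-term trends would still need separate handling):

```python
def remove_day_of_week(daily_counts):
    # Subtract the mean for each day-of-week slot (0..6), removing the
    # weekly reporting cycle before looking at autocorrelations.
    # Assumes a gap-free daily series starting on a fixed weekday.
    slots = [[] for _ in range(7)]
    for i, c in enumerate(daily_counts):
        slots[i % 7].append(c)
    means = [sum(s) / len(s) for s in slots]
    return [c - means[i % 7] for i, c in enumerate(daily_counts)]

# A purely weekly pattern detrends to zero:
print(max(abs(x) for x in remove_day_of_week([0, 1, 2, 3, 4, 5, 6] * 4)))  # 0.0
```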

This would all be a lot of work, and you would need to consider if the
possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
person goes on infecting other people.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to root on Fri Jun 5 18:12:51 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK.
If all those people were then to die of the infection the lethality
would be no worse than seasonal flu.

Not "only 0.1% of the population in the UK have been infected",
instead "only 0.1% of the population in the UK presently have the
infection." Thus it excludes those who have either recovered after
showing symptoms, or recovered without showing symptoms. This refers to
17 May to 30 May 2020. This percentage has gone down a lot, but I don't
know the peak value.

In the UK, many of the infections have occurred within care homes and
within hospitals, as opposed to being new cases entering those places
having been detected as already having the infection. This probably
relates to the lack of fully effective personal protective equipment at
the early and middle stages of the epidemic (and the close contacts in
those places).

Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
only deaths in hospital or care homes). Excess deaths compared with what
would be expected in a normal year are around 60,000. These numbers are
higher than those reported in any other country except the USA, so
would be judged high, I guess. The problem is that other countries
report deaths on different bases ... for example, I have read that in
Germany if someone dies of a heart attack while suffering from Covid-19,
it would be counted only as a heart attack, but in the UK it would be
counted in the Covid-19 totals. Even more, those who survive Covid-19
after the extreme version of the symptoms will have been through a very
hard experience.

Total death rates in the UK and USA are presently 600 and 330 per million,
respectively, according to
https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
The UK value is the second highest in the world, but with the above caveat
about comparability.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Fri Jun 5 14:06:54 2020
On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are given
at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of the population
in the UK have been infected? Either they have done a wonderful job of social isolation, or Covid19 is a fizzle in the UK. If all those people were then
to die of the infection the lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.
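The implied-duration arithmetic, for the record: in a steady state, point prevalence equals the incidence rate times the mean duration of infection.

```python
# Implied mean duration of infection from the ONS figures quoted above.
point_prevalence = 0.001    # 0.1% currently infected
weekly_incidence = 0.0007   # 0.07% newly infected per week

duration_days = 7 * point_prevalence / weekly_incidence
print(round(duration_days))  # 10
```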

I wonder again at "false positives." They do have a
section on sensitivity and specificity. I have not yet
understood their claims for robustness of their reported
estimates. It does say "85 to 95% sensitive" and "above 95%"
specific for their test of the virus. Bad self-testing, they
say, could revise the 0.1% to 0.19% for prevalence.

Their point estimate is 6.78% for the prevalence of
antibodies (Ever had it?), 24 May. (Section 4).
That is for their particular sample, not weighted to
be representative. That's a pretty high rate.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Fri Jun 5 18:42:41 2020
Rich Ulrich wrote:

On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given >> at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done
a wonderful job of social isolation, or Covid19 is a fizzle in the
UK. If all those people were then to die of the infection the
lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.

Those who develop bad symptoms will be quickly moved to hospital (no
cost worries with the NHS) and so are not in the outside population for
long. Those who don't develop bad symptoms have those lesser symptoms
(and count as infected) for a relatively short time.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@gmail.com on Fri Jun 5 21:42:38 2020
On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS Covid-19
Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Jun 5 21:53:28 2020
On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could

...

80 time points is about the size of the data-sets used in econometric data, weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if you
want to look at short-term variations. As there are marked day-of-week
effects, you would expect a raw auto-correlation analysis to be
overwhelmed by this effect. This suggests a need to "detrend" the data
to remove this effect (and if there are any special holidays), and if
there are long-term trends you might well want to remove such trends
as well.

I wasn't thinking of the sort of dependency that would
show up as autocorrelation across days in these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

This would all be a lot of work, and you would need to consider if the
possible outcomes fit in with what you are interested in. Any temporal correlations in local variations might relate to how long an infected
person goes on infecting other people.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Sat Jun 6 07:00:21 2020
Rich Ulrich wrote:

On Fri, 5 Jun 2020 17:07:18 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

If one goes on from Rich's comments about dependencies possibly
explaining some of the variation, it may be good to leave off from
the Poisson/Binomial questions and do a fairly basic time-series
analysis. Thus someone could do an autocorrelation analysis of the
counts (or square roots of counts if this seems good). This could

...

80 time points is about the size of data-sets used in econometric
data, weather data etc. for correlation-type analysis and time-series
modelling. But keep as much data as possible for autocorrelations.

I would not suggest starting by smoothing the data, particularly if
you want to look at short-term variations. As there are marked
day-of-week effects, you would expect a raw auto-correlation
analysis to be overwhelmed by this effect. This suggests a need to
"detrend" the data to remove this effect (and the effect of any
special holidays), and if there are long-term trends you might
well want to remove such trends as well.

I wasn't thinking of the sort of dependency that would
show up as autocorrelation across days in these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

There are two versions of this: one where the situation is such that a
high number of cases are generated for a period of several days, and a
second where a high number of cases are recorded on a single day, as in
the surprise discovery of a bad situation in an unnoticed care home. I
think either of these could be modelled (as random-in-time occurrences
of such situations) such as to lead to serial correlation in the
counts. The autocorrelations may not be the best way to detect such
effects, but they are notionally easy to compute for a data-series. On
a theoretical basis they correspond to short periods of time where the
rate of occurrence is high compared to a background rate.

• From David Jones@21:1/5 to David Jones on Sat Jun 6 11:21:39 2020
David Jones wrote:

Rich Ulrich wrote:

On Fri, 5 Jun 2020 16:33:51 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have
done a wonderful job of social isolation, or Covid19 is a fizzle
in the UK. If all those people were then to die of the infection
the lethality would be no worse than seasonal flu.

As I read it, 0.1% have the active disease at any given
time. The new infections amount to 0.07% per week.

That implies that the duration of infection is 10 days,
for these people outside the hospitals. That surprises
me. If it is six days until symptoms, even the folks with
symptoms must show them for only a few days. I
thought the disease was more tenacious.

Those who develop bad symptoms will be quickly moved to hospital (no
cost worries with the NHS) and so are not in the outside population
for long. Those who don't develop bad symptoms have those lesser
symptoms (and count as infected) for a relatively short time.

... actually the lockdown rules were rather strict (if anyone followed
them) in that anyone having symptoms (if not needing hospitalisation)
was meant to self-isolate, even from their own family but still within
the family home.

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:37:47 2020
On Sat, 6 Jun 2020 07:00:21 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

me >
I wasn't thinking of the sort of dependency that would
show up in these data as autocorrelation across days for
these data.

You do see dependency in the existence of clusters. I don't think
you can call those "independent random observations of the
infections from each single case." You have clusters when you
have multiple cases from a nursing home, a prison, a factory, or
after a choir practice or church service.

There are two versions of this: one where the situation is such that a
high number of cases are generated for a period of several days, and a
second where a high number of cases are recorded on a single day, as in
the surprise discovery of a bad situation in an unnoticed care home. I
think either of these could be modelled (as random-in-time occurrences
of such situations) such as to lead to serial correlation in the
counts. The autocorrelations may not be the best way to detect such
effects, but they are notionally easy to compute for a data-series. On
a theoretical basis they correspond to short periods of time where the
rate of occurrence is high compared to a background rate.

Okay, yes, autocorrelation is a statistic you could generate, whatever
you figure to do with those lumps of data.

I'm satisfied with observing the clearly-non-Poisson variation
and pointing to the known clusters, etc. -- which /ought to/ be
studied up close and in detail, at least a few times. Why has that
not been done? Meatpackers? Nursing homes?

My inference is that our CDC has been shut out of any leadership
role in both management and science. Nobody else is in the same
position, where they could essentially /mandate/ participation.
That is a shame. So we are left guessing.

--
Rich Ulrich

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Sat Jun 6 21:21:09 2020
On Fri, 5 Jun 2020 18:12:51 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The latest results/methodology from survey analysis in the UK are
given at:

https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

Can I be reading this correctly: outside of hospitals only 0.1% of
the population in the UK have been infected? Either they have done a
wonderful job of social isolation, or Covid19 is a fizzle in the UK.
If all those people were then to die of the infection the lethality
would be no worse than seasonal flu.

Not "only 0.1% of the population in the UK have been infected",
instead "only 0.1% of the population in the UK presently have the
infection." Thus it excludes those who have either recovered after
showing symptoms, or recovered without showing symptoms. This refers to
17 May to 30 May 2020. This percentage has gone down a lot, but I don't
know the peak value.

In the UK, many of the infections have occurred within care-homes and
within hospitals, as opposed to being new cases entering those places
having been detected as already having the infection. This probably
relates to the lack of fully effective personal protection equipment at
the early and middle stages of the epidemic (and the close contacts in
those places).

Deaths so far with confirmed Covid-19 have just passed 40,000 (counting
only deaths in hospital or care homes). Excess deaths compared with what
would be expected in a normal year are around 60,000. These numbers are
higher than those reported in any other country except the USA, so
would be judged high, I guess. The problem is that other countries
report deaths on different bases ... for example, I have read that in
Germany if someone dies of a heart attack while suffering Covid-19,
it would be counted only as a heart attack, but in the UK it would be
counted in the Covid-19 totals. Even more, those who survive Covid-19
having had the extreme version of symptoms will have had a very extreme
experience.

Total death rates in the UK and USA are presently 600 and 330 per
million, respectively, according to
https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
The UK value is second highest in the world, but with the above caveat
about comparability.

The graphs that show "excess deaths" show excesses (beyond
annual trends + Covid-19 reports) for most countries. That's
despite the lower death rates in some other specific categories.

Covid-19 reportedly does manifest as heart attacks. Also, a
large fraction of those on ventilators also need dialysis. I think
that, whether it is 100% legit or not, counting all those related
deaths as Covid-19 won't result in an over-count; too many
cases are missed elsewhere.

For a couple of weeks, there were reports that 85-90% of
those on ventilators eventually die. That led to advice to
put patients on their bellies, and to hold off the ventilators
for as long as possible. I haven't seen those mentioned lately.

--
Rich Ulrich

• From David Jones@21:1/5 to Rich Ulrich on Tue Jun 16 07:55:08 2020
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

• From root@21:1/5 to David Jones on Tue Jun 16 14:21:41 2020
David Jones <dajhawkxx@nowherel.com> wrote:
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about the
following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

One of the papers in the RSS publication news URL above is:

How many people are infected with Covid-19?

At least one of the references cited in this paper arrives at numbers
comparable to those I have inferred from the reported data.

I tried to contact both the RSS and the CDC about my method for estimating
that number directly from the reported data. I have, as yet, received no response from either group.

• From David Jones@21:1/5 to root on Tue Jun 16 15:35:28 2020
root wrote:

David Jones <dajhawkxx@nowherel.com> wrote:
Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones"
<dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about
the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths"
and it discusses various problems with the statistics and modelling
used.

One of the papers in the RSS publication news URL above is:

How many people are infected with Covid-19?

At least one of the references cited in this paper arrives at numbers comparable to those I have inferred from the reported data.

I tried to contact both the RSS and the CDC about my method for
estimating that number directly from the reported data. I have, as
yet, received no response from either group.

That article has the following info about the author

"About the author:
Tarak Shah is a data scientist at the Human Rights Data Analysis Group
(HRDAG), where he processes data about violence and fits models in
order to better understand evidence of human rights abuses."

... so you could try contacting HRDAG via their covid webpage : https://hrdag.org/covid19/

.. or the author's info at
https://hrdag.org/people/tarak-shah/

• From David Jones@21:1/5 to David Jones on Wed Jun 17 14:27:35 2020
David Jones wrote:

Rich Ulrich wrote:

On Mon, 1 Jun 2020 18:39:49 +0000 (UTC), "David Jones" <dajhawkxx@gmail.com> wrote:

I have not tried to follow any of the above. But, anyone with a
statistical interest in this epidemic should probably know about
the following list of articles associated with the Significance
magazine:

https://www.significancemagazine.com/business/647

The list features the UK rather heavily and relates to the RSS
Covid-19 Task Force, which is outlined here:

https://rss.org.uk/news-publication/news-publications/2020/general-news/rss-launches-new-covid-19-task-force/

and:

https://rss.org.uk/policy-campaigns/policy/covid-19-task-force/

Thanks - I finally got around to checking those, and I've
read some good articles already.

The pages are often updated, so may be worth checking again. For
example, this is a recent (10 June) BBC radio podcast : https://www.bbc.co.uk/sounds/play/m000jw02
titled "Antibody tests, early lockdown advice and European deaths" and
it discusses various problems with the statistics and modelling used.

A video appeared today on YouTube that is partly related and that
discusses what has been going on with the statistics of Covid in the
UK. It derives from May 20.

https://www.youtube.com/watch?v=OrRoeQaucF0
titled: Using data to improve health from the time of the Crimea to the
time of the coronavirus

SPEAKER: Prof. Deborah Ashby, President of the Royal Statistical
Society and Director of the School of Public Health at Imperial College
London

In this talk, Prof Deborah Ashby takes us on a journey through the life
of Florence Nightingale and comments on the aptness of celebrating her
centenary in the first year of the COVID-19 pandemic.

Other details in the heading on YouTube.

• From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Thu Jul 9 14:36:13 2020
On Fri, 05 Jun 2020 03:27:59 -0400, Rich Ulrich
<rich.ulrich@comcast.net> wrote:
...

The Swedish study from the end of April that estimated
26% coronavirus antibody prevalence has been replaced with a
claim of 7%. And disappointment in Sweden.

The CDC released estimates last week that gave five models,
all of which estimated huge population exposures. That
study, released online, was using data from April, too. It
was criticised for lacking citations, and for producing those
rates, outside the usual range, without decent explanation.
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
What I like about the reference is it gives some numbers
for things like "hospitalizations" and "mean days" ....

The idea that the "real" infection rate is 10 times the reported
rate is gaining currency in news reports.

My impression is that it springs entirely from a comment
10 days or so ago, to that effect, made by Redfield, the
Director of CDC. (My current impression of Redfield is low.)

Redfield made no citations, so he seems to be pointing back
to the CDC report that I cited above. Which I still do not
place any faith in. That study used only data from April
and earlier, and did not give good citations.

I just now read some online comments about the Stanford
study (April) which reported high infection prevalence.
I think that David Jones mentioned criticism about sample
selection. I see that they advertised on Facebook ... which
IMHO is an assured way to get volunteers who expect that
they may be positive. So that is a distinct bias, noted in
comments.

I also read an assertion that the test they used has been
shown to have 97.5% specificity, instead of 99.5%, with
the consequence that ALL their "cases" could have been
false-positives. I don't know if that criticism is valid. All
comments I read were two months old.

Looking for other citations to prevalence surveys in
Google-news, most of what Google showed were articles from
single, local newspapers, not formal reports of wide distribution.

The exception was an article from Lancet, which reported
on a survey of Geneva, tapping a pre-existing survey sample.

That article, as it happens, DOES support the hypothesis of
very widespread infection. They estimate about 10% infection
in their population -- which had about 1% reported cases.
Like the US, their apparent case-fatality rate (cases vs deaths)
was 5 or 6% at the time. Adjusting 6% by tenfold yields an overall,
"true" case fatality rate of around 0.6% -- which is not
wholly unreasonable. It compares to the outside-of-Wuhan
data for China's original epidemic.
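The Geneva adjustment above is a one-line rescaling; as a sketch (with the figures rounded as quoted in this post):

```python
def true_fatality_rate(apparent_cfr, m):
    """apparent_cfr = deaths / reported cases; m = true infections / reported.
    If infections are undercounted m-fold, the fatality rate shrinks m-fold."""
    return apparent_cfr / m

# ~10% of the population infected vs ~1% reported cases -> m ~ 10,
# so an apparent 6% case-fatality rate becomes roughly 0.6%.
print(true_fatality_rate(0.06, 10))
```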

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

The Chinese have done so much testing that they ought to
have data that would settle some questions. I don't know if
no one has seen it, or if no one trusts it.

--
Rich Ulrich

• From root@21:1/5 to Rich Ulrich on Thu Jul 9 21:03:36 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

In other words, a guess. The proposed numbers agree pretty
well with what I derived in an earlier post to which you
objected.

• From Rich Ulrich@21:1/5 to All on Fri Jul 10 13:14:31 2020
On Thu, 9 Jul 2020 21:03:36 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

What I believed a month ago is little changed. The best
extrapolated "true infection" rates may be what you get by
starting with the reported fatality rate, adjusting that for biases
you can guess, discounting for excess fatalities in care homes,
and multiplying by 100 or 150 to account for a fatality rate
between 0.67% and 1%.

In other words, a guess. The proposed numbers agree pretty
well with what I derived in an earlier post to which you
objected.

A guess, yup. But an educated guess.

Applying what I said there -- 90,000 deaths outside of care
facilities (say) yields 9 to 13.5 million infected, rather than 30
million.

I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

--
Rich Ulrich

• From root@21:1/5 to Rich Ulrich on Fri Jul 10 18:54:08 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involves sequential pooling.
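The expected cost of the two-stage scheme described above (pool once; if the pool is positive, retest each member individually) is easy to write down, assuming infections are independent and the test is error-free. This is a sketch of that Dorfman-style calculation, not of whatever schedule the labs in the WSJ article actually use:

```python
def expected_tests_per_person(p, n):
    """Two-stage (Dorfman) pooling: one pooled test per group of n, plus n
    individual retests whenever the pool is positive.  p = prevalence."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

def best_pool_size(p, n_max=200):
    """Pool size (>= 2) minimizing the expected number of tests per person."""
    return min(range(2, n_max + 1), key=lambda n: expected_tests_per_person(p, n))

for p in (0.2, 0.05, 0.001):
    n = best_pool_size(p)
    print(p, n, round(expected_tests_per_person(p, n), 3))
```

Under these particular assumptions a pool of about 3 comes out best at 20% prevalence, and a pool of 5 corresponds to a prevalence nearer 5%; the sizes quoted in news reports presumably reflect different retesting schedules or practical constraints such as dilution.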

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

• From Rich Ulrich@21:1/5 to All on Sat Jul 11 17:50:16 2020
On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

I think that Fauci mentioned pooling, and used "10".
One limiting factor - for some tests, anyway -- is how
much the dilution of the sample affects the sensitivity.

I read that Abbott's quick test (15 minutes) originally
allowed for either dry-swab or wet-stored-swab, but they
changed the instructions when the wet-swabs showed
lower sensitivity -- which was attributed to dilution.

That was in the discussion after an outside lab found
very low sensitivity for that test. Latest instructions:
"Now, the company says only direct swabs from
patients should be inserted into the machine."
https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involve sequential pooling.

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

By thirds. Weigh one third against another: if the two thirds
balance, the light coin is in the remaining third.
That's the trick for a single bad coin. I expect it
generalizes, but correcting multiple errors does get
trickier.
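The by-thirds answer for the single-light-coin case can be sketched as follows (a toy model, assuming exactly one coin is lighter than the rest):

```python
def find_light_coin(weights):
    """Locate the single lighter coin with a balance scale by splitting the
    candidates into thirds: weigh one third against another; if they balance,
    the light coin is in the remainder.  Returns (index, weighings used)."""
    lo, hi = 0, len(weights)              # candidates live in [lo, hi)
    weighings = 0
    while hi - lo > 1:
        third = (hi - lo + 2) // 3        # ceil((hi - lo) / 3)
        weighings += 1                    # one use of the balance
        left = sum(weights[lo:lo + third])
        right = sum(weights[lo + third:lo + 2 * third])
        if left < right:
            hi = lo + third                         # light coin in the left pan
        elif right < left:
            lo, hi = lo + third, lo + 2 * third     # light coin in the right pan
        else:
            lo = lo + 2 * third                     # pans balance: in the rest
    return lo, weighings
```

Since each weighing has three outcomes, 3^k coins need k weighings (27 coins in 3), which is the information limit for a balance scale.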

--
Rich Ulrich

• From David Jones@21:1/5 to Rich Ulrich on Tue Jul 14 06:15:20 2020
Rich Ulrich wrote:

On Fri, 10 Jul 2020 18:54:08 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:
I still consider the non-Poisson daily counts as reflecting the
clumping of cases, owing to artifacts of reporting in the cases
where it doesn't owe to super-spread events or care homes.

An article in a recent WSJ indicated that Covid testing is
employing pooling, a method developed in WW2 for syphilis
testing. As described, pooling involves pooling a number
of blood samples and testing the pool for antibodies.
If the pool is clear then all the samples in the pool
are clear. If the pool is not clear then the samples
are tested again individually. Another article I read
said the pools now consist of 5 samples.

A little math will reveal that a pool of 5 samples
is optimum (in the sense of minimum tests) for a
population with a 20% infection rate. This suggests
that 20% is the rate at which samples are proving positive.

I think that Fauci mentioned pooling, and used "10".
One limiting factor - for some tests, anyway -- is how
much the dilution of the sample affects the sensitivity.

I read that Abbott's quick test (15 minutes) originally
allowed for either dry-swab or wet-stored-swab, but they
changed the instructions when the wet-swabs showed
lower sensitivity -- which was attributed to dilution.

That was in the discussion after an outside lab found
very low sensitivity for that test. Latest instructions:
"Now, the company says only direct swabs from
patients should be inserted into the machine."

https://khn.org/news/abbott-rapid-test-problems-grow-fda-standards-on-covid-tests-under-fire/

At 20% infection rate there is not much chance to do
better than this pooling method. But, if the infection
rate were much less there is a vastly superior testing
method which involves sequential pooling.

As I like to consider coin problems, we have a batch
of suspect coins for which it is known that the bad coins
are always lighter than good coins. How should they
be tested if we only have a balance scale?

By thirds. If two thirds balance, the other third is off.
That's the trick for a single bad coin. I expect it
generalizes, but correcting multiple errors does get
trickier.

On the topic of testing by pooling, there is some relevant discussion
of multidimensional pooling in the following BBC radio podcast,
starting at about 10:45 for about 10 minutes:
https://www.bbc.co.uk/sounds/play/w3cszh0k

This should be accessible worldwide. Blurb says:

"African scientists have developed a reliable, quick and cheap testing
method which could be used worldwide as the basis for mass testing
programmes.

The method, which produces highly accurate results, is built around
mathematical algorithms developed at the African Institute for
Mathematical Sciences in Kigali. We speak to Neil Turok, who founded the
institute, Leon Mutesa, Professor of human genetics on the government
coronavirus task force, and Wilfred Ndifon, the mathematical biologist
who devised the algorithm."

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.
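A toy two-dimensional version of that idea, assuming at most one positive sample per grid (this is my illustration of the principle, not the Kigali algorithm itself):

```python
import math

def grid_pools(samples):
    """Lay n*n samples out on an n-by-n grid and form 2n pools: one per row
    and one per column.  Each sample ends up in exactly two pools."""
    n = math.isqrt(len(samples))
    assert n * n == len(samples), "needs a square number of samples"
    rows = [any(samples[r * n + c] for c in range(n)) for r in range(n)]
    cols = [any(samples[r * n + c] for r in range(n)) for c in range(n)]
    return rows, cols

def locate_single_positive(rows, cols):
    """With exactly one positive sample, the single positive row pool and the
    single positive column pool intersect at it."""
    (r,) = [i for i, hit in enumerate(rows) if hit]
    (c,) = [i for i, hit in enumerate(cols) if hit]
    return r * len(cols) + c
```

So 100 samples are screened with 20 pooled tests. With more than one positive the row/column intersections become ambiguous and a handful of confirmatory tests are needed, which is where the published algorithms do the real work.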

• From root@21:1/5 to David Jones on Tue Jul 14 08:02:04 2020
David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects 0.1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Jul 14 14:05:14 2020
On Tue, 14 Jul 2020 06:15:20 +0000 (UTC), "David Jones" <dajhawkxx@nowherel.com> wrote:

On the topic of testing by pooling, there is some relevant discussion
of multidimensional pooling in the following BBC radio podcast,
starting at about 10:45 for about 10 minutes:
https://www.bbc.co.uk/sounds/play/w3cszh0k

This should be accessible worldwide. Blurb says:

"African scientists have developed a reliable, quick and cheap testing
method which could be used worldwide as the basis for mass testing
programmes.

The method, which produces highly accurate results, is built around
mathematical algorithms developed at the African Institute for
Mathematical Sciences in Kigali. We speak to Neil Turok, who founded the
institute, Leon Mutesa, Professor of human genetics on the government
coronavirus task force, and Wilfred Ndifon, the mathematical biologist
who devised the algorithm."

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

I think immediately about the work done (1980s, I think)
to provide reliable disk drives. Simple checksums can detect
some read-errors in a sector. Algorithms were developed to use
bit-wise "pooling" (like the above) to provide error-correction that
can detect and correct up to some maximum number of errors, using
minimum computing resources. Magnetic media were prone to
developing errors.

I have no idea whether that technology is still in use, or how
much of it is in use. I've seen no talk of faulty disk drives in
years. The need to re-load some .exe is also pretty rare, and
seems to be assumed to be the fault of bad program execution
(or virus) instead of memory-rot.

--
Rich Ulrich

• From Rich Ulrich@21:1/5 to All on Sun Aug 9 12:47:16 2020
On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects 0.1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the /need/ for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From root@21:1/5 to Rich Ulrich on Sun Aug 9 17:45:45 2020
Rich Ulrich <rich.ulrich@comcast.net> wrote:

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

The size of the pool depends upon the expected frequency of
the infection. A pool size of 10 would be too large if
the expected fraction of infected is 0.5. The optimum size
of the pool depends upon the proposed schedule of testing:
what you do if the pool tests positive. Sequential testing
yields the minimum number of tests. With the observed frequency
of infected and the schedule of using individual tests after
a pool has failed, the optimum pool size is now around 5.
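The trade-off described above can be sketched with the standard two-stage (Dorfman) scheme: test pools of n samples, then retest each member of a positive pool individually. That schedule is my assumption, not necessarily the one meant above. Its expected number of lab tests per person is 1/n + 1 - (1-p)^n, and minimizing over n does give a pool size of 5 at around 5% prevalence:

```python
def expected_tests_per_person(p, n):
    """Two-stage (Dorfman) pooling: one pool test per n people,
    plus n individual retests whenever the pool comes back positive."""
    return 1 / n + 1 - (1 - p) ** n

def optimal_pool_size(p, n_max=100):
    """Pool size (2..n_max) minimizing expected tests per person."""
    return min(range(2, n_max + 1), key=lambda n: expected_tests_per_person(p, n))

for p in (0.001, 0.01, 0.05, 0.10):
    n = optimal_pool_size(p)
    print(p, n, round(expected_tests_per_person(p, n), 3))
```

The rarer the infection, the larger the optimal pool: roughly n = 1/sqrt(p) for small p.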

I have read that dilution does impose a limit on pool size.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Jones@21:1/5 to Rich Ulrich on Sun Aug 9 23:46:09 2020
Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to All on Tue Aug 11 01:11:08 2020
On Sun, 9 Aug 2020 17:45:45 -0000 (UTC), root <NoEMail@home.org>
wrote:

Rich Ulrich <rich.ulrich@comcast.net> wrote:

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

The size of the pool depends upon the expected frequency of
the infection. A pool size of 10 would be too large if
the expected fraction of infected is 0.5. The optimum size
of the pool depends upon the proposed schedule of testing:
what you do if the pool tests positive. Sequential testing
yields the minimum number of tests. With the observed frequency
of infected and the schedule of using individual tests after
a pool has failed, the optimum pool size is now around 5.

I have read that dilution does impose a limit on pool size.

I can see a potential problem from /relying/ on pooling
to achieve thousands of tests results, like the university
I mentioned.

Since the number to pool depends directly on the Infected
rate, if the rate of infection doubles, suddenly you have to
double the number of lab-tests-performed to get the same
coverage for people-tested.

That becomes a big number.
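A rough check of that scaling, using the two-stage (Dorfman) expected-test formula with a fixed pool size of 10 (both the formula and the pool size are assumptions for illustration): the follow-up retests roughly double when prevalence doubles, while the first-stage pool tests stay fixed, so the total grows fast, though somewhat less than 2x.

```python
def expected_tests_per_person(p, n):
    # Dorfman two-stage pooling: one pool test per n people,
    # plus n individual retests when the pool comes back positive.
    return 1 / n + 1 - (1 - p) ** n

n = 10                              # fixed pool size (hypothetical)
low, high = 0.01, 0.02              # prevalence before and after doubling
t_low = expected_tests_per_person(low, n)
t_high = expected_tests_per_person(high, n)
print(round(t_low, 3), round(t_high, 3), round(t_high / t_low, 2))
```

Keeping the same per-person coverage at double the prevalence thus needs roughly 45% more lab tests at this pool size, and re-optimizing the pool size only softens, not removes, the growth.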

Maybe the standard for the future will be a single lab test,
performed at home, on the combined sample from all
all members of the household. Once a week?

The retail price of tests seems to be fairly high. I think
the US labs are charging \$100 a pop. That British quick-
test in the article cited by David Jones was (IIRC) less
than half that -- though, maybe that was just for the kits
and not for the completed testing.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Tue Aug 11 01:35:57 2020
On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

I've read of 35% non-follow-up in some US cities, from lack of
cooperation. But we also have raging disease, places where there
are too many to follow, and insufferable delays on getting test
results back. The media (finally) have started repeating the
complaints about the delays in testing. The two major companies
that together process for half the hospitals in the country have
reported delays of 5 to 7 days for the low-priority tests (not
in-hospital; not professional sports....).

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

I've seen scattered reports of that. I think a couple of states
are trying that, but I haven't read of great predictive success.

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

Sounds great! I notice that it tests for more than just coronavirus.

I'd say, "Order a billion" except that before they made that many,
we can hope for testing that is cheaper and quicker.

By the way -- The CDC test that went bad this year was a failed
attempt to piggyback three or four other diagnoses on top of the
covid-19 test. I read an accusatory article that said that the CDC
made a similar, lesser error with their Zika test a few years ago.

For Zika, the new test was not a total failure, but it was less
reliable than advertised (and desired). Like with covid, other
people started using their own tests when the tests from the
CDC proved to be unreliable. The director (who was never
called to account for the bad Zika test) also directed the covid
effort.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Fri Aug 28 15:42:40 2020
On Sun, 9 Aug 2020 23:46:09 +0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

Rich Ulrich wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected
individuals among large populations. For multidimensional pooling,
each individual sample is put into several different pools, and
those pools which turn out to test positive can lead to a quick
identification of infected candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus
testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the need for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

In the UK, the recently identified need to recompute the Covid
statistics because of double counting etc. throws some doubt on the
ability of the bureaucracy to cope with pooled testing (needing records
of those in each pool). At least some of the poor performance of the
"test and trace" initiative has been attributed to poor record keeping
and uncooperative response from testees.

On the subject of pooling, in the UK there has been a push for research
on testing of sewage out-falls for early evidence of Covid outbreaks
... https://www.bbc.co.uk/news/science-environment-53635692

Here is a news article on sewage tests. Apparently "dilution" does
not ruin all the possible tests, because sewage is surely dilute.
The University of Arizona may have prevented an outbreak -

https://www.washingtonpost.com/nation/2020/08/28/arizona-coronavirus-wastewater-testing/

<< Researchers around the world have been studying whether wastewater
testing can effectively catch cases early to prevent covid-19
clusters. There are programs in Singapore, China, Spain, Canada and
New Zealand, while in the United States, more than 170 wastewater
facilities across 37 states are being tested. Earlier this month,
officials in Britain announced testing at 44 water treatment
facilities. The Netherlands has been collecting samples at 300 sewage
treatment plants. >>

On the subject of the time taken for outcomes of Covid tests, the UK
has news of tests that take 90 minutes for the result ... https://www.imperial.ac.uk/news/201073/90-minute-covid-19-tests-government-orders/

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to rich.ulrich@comcast.net on Tue Sep 1 01:30:38 2020
On Sun, 09 Aug 2020 12:47:16 -0400, Rich Ulrich
<rich.ulrich@comcast.net> wrote:

On Tue, 14 Jul 2020 08:02:04 -0000 (UTC), root <NoEMail@home.org>
wrote:

David Jones <dajhawkxx@nowherel.com> wrote:

The idea is to do very few tests to identify rare infected individuals
among large populations. For multidimensional pooling, each individual
sample is put into several different pools, and those pools which turn
out to test positive can lead to a quick identification of infected
candidates from very few actual tests.

The results of sequential pooling when the infection is rare are
surprising: For a condition which affects .1% of the population,
the infected people in a population of 1 million can be isolated
with only a few thousand tests. Certainly under 4,000. Were it
not for random variations in the binomial distribution it could
be done in about 1,700 tests.

I am a bit surprised that the whole topic of "pooling" for coronavirus testing made a tiny splash and disappeared, in the media that I
read and view.

On the other hand, the /need/ for pooling seems ever more
apparent, now that doctors and pundits have started repeating
(over and over) that a week between test and results is far too
long. One pundit suggested that "capitalist incentive" would fix the
delays, if no one had to pay for a test that took more than 48 hours.

I've heard exactly one interview about pooling which may have
already started. One university (US) is using its own lab resources,
for what they are doing now (I think) and what they plan for
students (too) when they open. Frequent re-tests mean that
they will need many thousands of test results per day.

I don't know if "dilution" puts a limit on how many samples
can be pooled -- the interviewee talked about combining 10
samples for one lab test. That "10" could have been for illustration,
or based on dilution, or based on expected Positives.

More about pooling -

https://www.nytimes.com/2020/08/18/health/coronavirus-pool-testing.html

<< Experts disagree, for instance, on the cutoff at which pooling
stops being useful. The Centers for Disease Control and Prevention’s coronavirus test, which is used by most public health laboratories in
the United States, stipulates that pooling shouldn’t be used when
positivity rates exceed 10 percent. But at Mayo Clinic, “we’d have to
start to question it once prevalence goes above 2 percent, definitely
above 5 percent,” Dr. Pritt said.

<< And prevalence isn’t the only factor at play. The more individual
samples grouped, the more efficient the process gets. But at some
point, pooling’s perks hit an inflection point: A positive specimen
can only get diluted so much before the coronavirus becomes
undetectable. That means pooling will miss some people who harbor very
low amounts of the virus. >>

Per the article -
Various folks (US) have received permission to officially use
pooling, but not all have started. 25, 10, 7 and 5 are all mentioned
in there as numbers of samples being pooled, in various labs.

One more "factor in play" that is mentioned is the human-
intensive part -- measuring out the test materials to be combined,
and keeping track of what sample is where and what to do
with the results. One sample, one result is obviously simpler.
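The "inflection point" in the quoted article can be located, ignoring dilution and workflow, with the two-stage (Dorfman) expected-test formula. This is an assumption on my part; the article does not say which scheme the labs use. With pools of 5, pooling beats one-test-per-person on raw arithmetic until prevalence gets near 27%, which suggests the practical cutoffs quoted above (2, 5, or 10 percent) are driven mostly by dilution and record-keeping rather than the test-count arithmetic.

```python
def expected_tests_per_person(p, n):
    # Dorfman two-stage pooling: one pool test per n people,
    # plus n individual retests when the pool comes back positive.
    return 1 / n + 1 - (1 - p) ** n

n = 5
# Scan upward to find the prevalence at which pooling no longer
# beats one test per person (ignoring dilution).
p = 0.0
while expected_tests_per_person(p, n) < 1.0:
    p += 0.001
print(round(p, 3))   # break-even prevalence for pools of 5
```

The break-even comes out around 27-28% here, far above any positivity rate at which labs actually abandon pooling.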

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)