• #### Q comparing the two groups in the same or different publications

From Cosine@21:1/5 to All on Sat Oct 9 17:08:53 2021
Hi:

Suppose we did a study in which we tested the effects of drugs A and B, and of a placebo, in treating disease Z. We could use a t-test on the outcomes for A and B to see whether the difference between the two drugs is statistically significant. The formula requires the sample means, the standard errors, and the sample sizes of the two groups (drug A and drug B).

Now, suppose we found another study that tested the effects of drugs C and D, and of a placebo, in treating disease Z. Could we determine whether drug A differs from drug C, and drug A from drug D, in treating disease Z, again using the t-test, given only that summary information and not the raw data?

Thank you,

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From David Duffy@21:1/5 to Cosine on Sun Oct 10 00:50:21 2021
Cosine <asecant@gmail.com> wrote:
Suppose we did a study. In this study, we tested the effects of drugs A and B
Now, suppose we found another study that tested the effects of drugs C and D
See "network meta-analysis".

• From Cosine@21:1/5 to All on Sun Oct 10 05:23:11 2021
What if the purpose is to compare the drug A published in paper 1, drug B in paper 2, and so on?

Could we again use the t-test for comparing the data from different papers?

• From Rich Ulrich@21:1/5 to All on Sun Oct 10 14:14:19 2021
On Sat, 9 Oct 2021 17:08:53 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

Hi:

Suppose we did a study in which we tested the effects of drugs A and B, and of a placebo, in treating disease Z. We could use a t-test on the outcomes for A and B to see whether the difference between the two drugs is statistically significant. The formula requires the sample means, the standard errors, and the sample sizes of the two groups (drug A and drug B).

Now, suppose we found another study that tested the effects of drugs C and D, and of a placebo, in treating disease Z. Could we determine whether drug A differs from drug C, and drug A from drug D, in treating disease Z, again using the t-test, given only that summary information and not the raw data?

Most studies only test ONE drug against placebo. They
care about one drug, and they want all their "power" to
go to that comparison.

For the purpose of your question, comparing A to C
(or to D), you would be looking at the performance
of each drug in comparison to pbo.

Describing the studies as having "two drugs" is a red
herring, or it is a non-informative complication.
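
The idea of comparing A to C through each drug's performance against placebo can be sketched as follows. This is only a minimal illustration, assuming each study reports a mean and standard error per arm and that the two studies are independent; the function names and all numbers are hypothetical, and the normal approximation is used for the p-value. It does nothing about differences between the two study populations.

```python
import math
from statistics import NormalDist


def contrast(mean_drug, se_drug, mean_pbo, se_pbo):
    """Drug-minus-placebo effect within ONE study, with its standard error."""
    effect = mean_drug - mean_pbo
    se = math.sqrt(se_drug**2 + se_pbo**2)
    return effect, se


def indirect_comparison(effect_a, se_a, effect_c, se_c):
    """Compare two placebo-adjusted effects that come from independent studies."""
    diff = effect_a - effect_c
    se_diff = math.sqrt(se_a**2 + se_c**2)
    z = diff / se_diff
    # Two-sided p-value from the normal approximation (an assumption).
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, z, p


# Hypothetical summary statistics, for illustration only.
effect_a, se_a = contrast(mean_drug=5.0, se_drug=0.3, mean_pbo=2.0, se_pbo=0.4)
effect_c, se_c = contrast(mean_drug=4.0, se_drug=0.4, mean_pbo=2.5, se_pbo=0.3)
diff, se_diff, z, p = indirect_comparison(effect_a, se_a, effect_c, se_c)
print(f"A vs C (via placebo): diff={diff:.2f}, SE={se_diff:.3f}, z={z:.2f}, p={p:.3f}")
```

Note that the standard error of the indirect difference combines four arm-level standard errors, so it is wider than either within-study contrast.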

Here is a modern form of your question, of current interest --

If one Covid vaccine shows 95% protection in its main study
and another vaccine shows 90% protection in its study, can
we conclude that the first is better than the second? What

Well, as a mechanical proposition, we certainly can take the
estimates and their SEs and generate a test. But we KNOW
that the samples differed (location; age/sex/ethnicity?). If they
were in a different time frame (or, even if not), maybe they
were tested against a different dominant mutation of the virus.
The instructions for case-ascertainment may have differed.
And so on.
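
As a mechanical proposition, the test could look something like the sketch below. It assumes efficacy is defined as 1 minus the relative risk, uses the usual large-sample standard error of the log relative risk, and treats the two trials as independent. The case counts are invented to give 95% and 90%; they are not taken from any real trial.

```python
import math
from statistics import NormalDist


def log_relative_risk(cases_vax, n_vax, cases_pbo, n_pbo):
    """Log relative risk (vaccine vs placebo) and its large-sample SE."""
    rr = (cases_vax / n_vax) / (cases_pbo / n_pbo)
    se = math.sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_pbo - 1 / n_pbo)
    return math.log(rr), se


def compare_efficacies(trial_1, trial_2):
    """z-test on the difference of two log relative risks from separate trials."""
    log_rr1, se1 = log_relative_risk(*trial_1)
    log_rr2, se2 = log_relative_risk(*trial_2)
    z = (log_rr1 - log_rr2) / math.sqrt(se1**2 + se2**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    efficacy_1 = 1 - math.exp(log_rr1)
    efficacy_2 = 1 - math.exp(log_rr2)
    return efficacy_1, efficacy_2, z, p


# Hypothetical counts: (cases_vax, n_vax, cases_pbo, n_pbo).
trial_1 = (8, 20000, 160, 20000)   # efficacy 95%
trial_2 = (16, 20000, 160, 20000)  # efficacy 90%
ve1, ve2, z, p = compare_efficacies(trial_1, trial_2)
print(f"VE1={ve1:.0%}, VE2={ve2:.0%}, z={z:.2f}, p={p:.3f}")
```

With counts of this size, the handful of cases in the vaccinated arms dominates the standard error, so the difference between the two estimates is far less certain than "95 versus 90" suggests. The test still says nothing about the sampling differences listed above.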

95% vs 90% is based on small enough numbers that, if p < 0.05,
it probably is not p < 0.001 (or better). So that "tested" difference
is unpersuasive. We /know/ that uncontrolled factors /exist/
and thus could be responsible. For establishing that one is better,
a test is necessary but not sufficient. We would have heard more
if one of the vaccines had come in at only (say) 75%, which
a priori, before the studies and judging by flu vaccines, did not seem
like a terrible efficacy.

We want to see an "effect size" large enough that it is unlikely
to have happened by chance. If those "confounding factors"
seem small, or if they exist such that they would bias /against/
the better-performing drug, then a test on their difference
showing a bigger difference can be a bit persuasive. There are
all those (educated) readers whom you have to convince.

For Covid, they seem to use all three obvious criteria --
getting symptoms, getting hospitalized, dying. A vaccine
does look better if it looks better on all three criteria.
Performance in whole populations (states, countries) also
washes out the idiosyncrasies of the original studies.

--
Rich Ulrich

• From Cosine@21:1/5 to All on Sun Oct 10 12:37:51 2021
Let's try the case for developing a new AI algorithm to help screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm could use the medical images as input or use all other relevant information.

Now we would face the question of comparing the performance of different algorithms. As a standard practice, we would need to compare the newly developed algorithm against the state-of-the-art algorithms. We could implement those published algorithms and then compare them with the new one using the same dataset we have. A more convenient alternative is to compare the performance of the new algorithm with the performance reported in the published papers, which used other datasets. Could we take the second approach using the t-test, or what else should we use?

• From Rich Ulrich@21:1/5 to All on Sun Oct 10 19:03:45 2021
On Sun, 10 Oct 2021 12:37:51 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

Let's try the case for developing a new AI algorithm to help screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm
could use the medical images as input or use all other relevant
information.

The picture of the lung is relatively specific. But Covid reportedly
affects a whole slew of systems. I wonder how many of them are
easy to examine and compare.

Now we would face the question of comparing the performances of
different algorithms. As a standard practice, we would need to compare
the newly developed algorithm against the state-of-the-art algorithms.
We could implement those published algorithms and then compare them
with the new one using the same dataset we have.

Yes - I think that any "algorithm" approach will always apply all
algorithms to the same data. There is ENORMOUSLY more power
in doing the "paired" comparisons than comparing to something
derived on some other sets of data, no matter how well defined
their sampling is. Presumably, you look for sensitivity and
specificity, and have to make some judgment on the cases where
two algorithms disagree (which is not possible, for two samplings).
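
One common way to do a "paired" comparison of two classifiers on the same cases is an exact McNemar test, which looks only at the cases where the two algorithms disagree. The sketch below is an illustration under that assumption, not necessarily the procedure intended above; the function names and the toy labels are hypothetical.

```python
from math import comb


def discordant_counts(truth, pred_a, pred_b):
    """Count cases where exactly one of the two algorithms matches the truth."""
    only_a = sum(a == t and b != t for t, a, b in zip(truth, pred_a, pred_b))
    only_b = sum(b == t and a != t for t, a, b in zip(truth, pred_a, pred_b))
    return only_a, only_b


def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value (binomial test on the discordant pairs)."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # the algorithms never disagree, so there is no evidence either way
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Toy labels (1 = diseased according to the gold standard), for illustration only.
truth  = [1, 1, 0, 0]
pred_a = [1, 0, 0, 0]
pred_b = [1, 1, 1, 0]
only_a, only_b = discordant_counts(truth, pred_a, pred_b)
print(only_a, only_b, mcnemar_exact(only_a, only_b))
```

To compare sensitivity alone, the same test can be restricted to the gold-standard positive cases, and to the negative cases for specificity. None of this is possible when the two algorithms were run on different samples, because there are no pairs.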

"Gold standards" of dx may figure in, somewhere.

A more convenient
alternative is to compare the performances of the new one we produced
with those of the published paper using other datasets. Could we
perform the second approach using the t-test or what else should we
use?

What do you imagine comparing, for two different samples and
two different algorithms?
If they come up with different rates of disease, you won't know
why.

--
Rich Ulrich

• From Cosine@21:1/5 to All on Mon Oct 11 08:00:33 2021
Let's clarify some points for the AI algorithms based on the dataset of patient images.

A general pattern in this kind of research is: a new algorithm is proposed and its performance, e.g., sensitivity or specificity, is investigated. This is done by comparing the AI results against the gold standard, e.g., the PCR test or something else. In addition, the paper will also present the results of other published AI algorithms to show that the proposed one is better.

If the paper implemented the published algorithms, then the standard t-test for the difference of the random variables is performed. However, sometimes, the paper chose to compare its own results with the results published in other papers. Apparently,
one cannot directly compare the sensitivity/specificity of the proposed algorithm with those of other published papers. How do we formally do this comparison then?

A sad truth is that, for CoVid-19, the publicly available and large datasets of patient images are still scarce. Maybe this is why some papers chose to compare their own results of the proposed algorithm based on a small to medium dataset with the
results of the published paper based on a large dataset.

• From Rich Ulrich@21:1/5 to All on Tue Oct 12 12:49:08 2021
On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

Let's clarify some points for the AI algorithms based on the dataset of patient images.

A general pattern of this kind of research is: a new algorithm was
proposed and its performance was investigated, e.g., sensitivity or specificity. This was realized by comparing the AI results against the
gold standard, e.g., the PCR test or something else. In addition to
that, the paper will also present the results of other published AI algorithms to show that the proposed one is better.

Sensitivity/specificity go hand in hand. There is a whole curve to
compare. The test that is best at one extreme may not be best
at the other. One Covid-antigen survey in California, mid-2020,
used two different cut-offs for "yes, this person has been infected"
- depending on the base-rate of illness in that region. The final
estimates of disease prevalence made efforts (applied formulas)
to account for false-positives and false-negatives in the raw data.
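
One standard formula of this kind is the Rogan-Gladen correction, which adjusts the raw positive rate using the test's sensitivity and specificity. Whether the California survey used exactly this formula is not stated here, so treat the sketch below as an assumption; the numbers are hypothetical.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the raw positive rate.

    apparent: fraction of the sample that tested positive
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must be better than chance (sens + spec > 1)")
    estimate = (apparent + specificity - 1) / denom
    # The raw formula can fall outside [0, 1] in small samples; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical survey: 5% test positive, sensitivity 80%, specificity 99%.
print(round(corrected_prevalence(0.05, 0.80, 0.99), 4))
```

When the raw positive rate is close to the false-positive rate (1 minus specificity), the corrected estimate becomes very unstable, which is one reason the choice of cut-off depends on the base rate.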

If the paper implemented the published algorithms, then the
standard t-test for the difference of the random variables is
performed.

- paired tests - Good power, and no question about "sample"
differences.

However, sometimes, the paper chose to compare its own
results with the results published in other papers. Apparently, one
cannot directly compare the sensitivity/specificity of the proposed
algorithm with those of other published papers. How do we formally
do this comparison then?

You write, "One cannot directly [do A]... How do we formally [do A]?"

As I wrote last time: You can do the test. Then you have to argue
that your "significant" effect is large enough that it would be robust
against the likely or possible /confounding/ differences between
samples.

Your best chance of that is when the potential replacement is
tested in conditions that provide /lower/ expectations of good
outcome.
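
One rough way to frame the robustness argument is to ask how large a between-sample bias would have to be before the observed difference stopped being "significant". The helper below is a hypothetical illustration of that idea, assuming a normal approximation and a two-sided 5% level; it is not a formal sensitivity analysis.

```python
def bias_needed_to_erase(diff, se_diff, z_crit=1.96):
    """Smallest shift toward zero that would make |diff| non-significant.

    Returns 0.0 if the difference is not significant to begin with.
    """
    margin = abs(diff) - z_crit * se_diff
    return max(0.0, margin)


# Hypothetical: a 10-point difference in sensitivity with SE 3 points.
print(round(bias_needed_to_erase(0.10, 0.03), 4))
```

If a confounding difference of that size between the two samples is plausible, the comparison will not persuade readers. If the new method was tested under conditions expected to work /against/ it, any such bias only strengthens the result.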

A sad truth is that, for CoVid-19, the publicly available and
large datasets of patient images are still scarce. Maybe this is why
some papers chose to compare their own results of the proposed
algorithm based on a small to medium dataset with the results of the published paper based on a large dataset.

Exploratory work. "We think we have a good competitor" because
it is cheaper and uses better science.

--
Rich Ulrich

• From Cosine@21:1/5 to All on Tue Oct 12 11:51:15 2021
Rich Ulrich wrote on Wednesday, October 13, 2021 at 12:49:13 AM [UTC+8]:
On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine
wrote:
....
However, sometimes, the paper chose to compare its own
results with the results published in other papers. Apparently, one
cannot directly compare the sensitivity/specificity of the proposed algorithm with those of other published papers. How do we formally
do this comparison then?
You write, "One cannot directly [do A]... How do we formally [do A]?"

As I wrote last time: You can do the test. Then you have to argue
that your "significant" effect is large enough that it would be robust against the likely or possible /confounding/ differences between
samples.

By "we cannot directly compare ..." I meant that we cannot simply observe mu1 > mu2
and then claim that algorithm-1 performs better. However, if the other paper provided mu2, SE2,
and n2 (the sample size), we should be able to use this information to calculate the statistical
significance of the random variable (mu1-mu2) by using the t-test, since the formula of the t-test
uses only those three quantities from each of the two samples: mu, SE, and n, to form a new random variable.
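
The calculation described here can be sketched as a Welch-type test from summary statistics. This is a minimal illustration that assumes the two samples are independent and that SE means the standard error of the mean (SD divided by the square root of n). The p-value uses a normal approximation, which is reasonable only when the degrees of freedom are large. All inputs are hypothetical.

```python
import math
from statistics import NormalDist


def welch_from_summary(m1, se1, n1, m2, se2, n2):
    """t statistic, Welch-Satterthwaite df, and an approximate two-sided p-value."""
    var_sum = se1**2 + se2**2
    t = (m1 - m2) / math.sqrt(var_sum)
    # n enters only through the degrees of freedom, since SE already reflects n.
    df = var_sum**2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical: our paper (mu1, SE1, n1) vs a published paper (mu2, SE2, n2).
t, df, p = welch_from_summary(0.90, 0.02, 100, 0.85, 0.03, 150)
print(f"t={t:.3f}, df={df:.1f}, p~{p:.3f}")
```

For small samples the p-value should come from the t distribution with `df` degrees of freedom instead. The arithmetic is the same whether the two summaries come from one study or from two papers; what changes is how much the result can be trusted.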

• From Rich Ulrich@21:1/5 to All on Thu Oct 14 13:59:58 2021
On Tue, 12 Oct 2021 11:51:15 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

Rich Ulrich wrote on Wednesday, October 13, 2021 at 12:49:13 AM [UTC+8]:
On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine
wrote:
....
However, sometimes, the paper chose to compare its own
results with the results published in other papers. Apparently, one
cannot directly compare the sensitivity/specificity of the proposed
algorithm with those of other published papers. How do we formally
do this comparison then?
You write, "One cannot directly [do A]... How do we formally [do A]?"

As I wrote last time: You can do the test. Then you have to argue
that your "significant" effect is large enough that it would be robust
against the likely or possible /confounding/ differences between
samples.

By "we cannot directly compare ..." I meant that we cannot simply observe mu1 > mu2
and then claim that algorithm-1 performs better. However, if the other paper provided mu2, SE2,
and n2 (the sample size), we should be able to use this information to calculate the statistical
significance of the random variable (mu1-mu2) by using the t-test, since the formula of the t-test
uses only those three quantities from each of the two samples: mu, SE, and n, to form a new random variable.

Okay, "directly" meant "with no test".

Do keep in mind my warning,
Then you have to argue
that your "significant" effect is large enough that it would be robust against the likely or possible /confounding/ differences between
samples.

--
Rich Ulrich
