Hi:

We can easily find studies in the literature that use more than one
performance metric for a hypothesis test without explicitly and
clearly stating what hypothesis the study aims to test. Often the
paper only states that it intends to test whether a newly developed
object (algorithm, drug, device, technique, etc.) performs better than
some chosen benchmarks. Then the paper presents tables summarizing the
results of many comparisons. Among the tables, the paper picks out
those comparisons that have better values of some performance metric
and show statistical significance. Finally, the paper claims that the
new object is successful since it has some favorable results that are
statistically significant.

This looks odd. Shouldn't we clearly define the hypothesis before
conducting any tests? For example, shouldn't we define the success of
the object as "all the chosen metrics show better results"? Otherwise,
why would we test so many metrics instead of only one?

The aforementioned approach amounts to this: we do not know what will
happen, so let's pick some commonly used metrics and test whether we
can get some of them to show favorable and significant results.

Anyway, what are the correct or rigorous ways to conduct tests with
multiple metrics?
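
As a concrete illustration of this concern, a minimal simulation
sketch (not from the original post; it assumes only numpy): if a study
tests ten independent metrics at the 5% level when the new object is
in fact no better, at least one metric comes out "significant" about
40% of the time.

import numpy as np

rng = np.random.default_rng(0)

n_metrics = 10       # metrics tested per study
n_studies = 100_000  # simulated studies with NO real improvement
alpha = 0.05

# Under the null (new object no better than the benchmark), each
# metric's p-value is Uniform(0, 1).
p = rng.uniform(size=(n_studies, n_metrics))

# Fraction of null studies where at least one metric is "significant".
print((p < alpha).any(axis=1).mean())   # simulated, ~0.40
print(1 - (1 - alpha) ** n_metrics)     # theoretical: 0.4013
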
Cosine wrote:

> Hi:
>
> We can easily find studies in the literature that use more than one
> performance metric for a hypothesis test without explicitly and
> clearly stating what hypothesis the study aims to test. Often the
> paper only states that it intends to test whether a newly developed
> object (algorithm, drug, device, technique, etc.) performs better
> than some chosen benchmarks. Then the paper presents tables
> summarizing the results of many comparisons. Among the tables, the
> paper picks out those comparisons that have better values of some
> performance metric and show statistical significance. Finally, the
> paper claims that the new object is successful since it has some
> favorable results that are statistically significant.
>
> This looks odd. Shouldn't we clearly define the hypothesis before
> conducting any tests? For example, shouldn't we define the success
> of the object as "all the chosen metrics show better results"?
> Otherwise, why would we test so many metrics instead of only one?
>
> The aforementioned approach amounts to this: we do not know what
> will happen, so let's pick some commonly used metrics and test
> whether we can get some of them to show favorable and significant
> results.
>
> Anyway, what are the correct or rigorous ways to conduct tests with
> multiple metrics?

You might want to search for the terms "multiple testing" and
"Bonferroni correction".
On Sat, 18 Mar 2023 01:25:44 -0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

> Cosine wrote:
>
>> Hi:
>>
>> We can easily find studies in the literature that use more than one
>> performance metric for a hypothesis test without explicitly and
>> clearly stating what hypothesis the study aims to test.

That sounds like a journal with reviewers who are not doing their job.

A new method may have better sensitivity or specificity, making it
useful as a second test. If it is cheaper or easier, that virtue might
justify slight inferiority. If it is more expensive, there should be a
gain in accuracy to justify its application (or it deserves further
development).

>> Often the paper only states that it intends to test whether a newly
>> developed object (algorithm, drug, device, technique, etc.)
>> performs better than some chosen benchmarks. Then the paper
>> presents tables summarizing the results of many comparisons. Among
>> the tables, the paper picks out those comparisons that have better
>> values of some performance metric and show statistical
>> significance. Finally, the paper claims that the new object is
>> successful since it has some favorable results that are
>> statistically significant.
>>
>> This looks odd. Shouldn't we clearly define the hypothesis before
>> conducting any tests? For example, shouldn't we define the success
>> of the object as "all the chosen metrics show better results"?
>> Otherwise, why would we test so many metrics instead of only one?
>>
>> The aforementioned approach amounts to this: we do not know what
>> will happen, so let's pick some commonly used metrics and test
>> whether we can get some of them to show favorable and significant
>> results.

I am not comfortable with your use of the word "metrics" -- I like to
think of improving the metric of a scale by taking a power
transformation, like a square root for Poisson counts. Or, your
metric for measuring "size" might be area, volume, weight....

>> Anyway, what are the correct or rigorous ways to conduct tests
>> with multiple metrics?
>
> You might want to search for the terms "multiple testing" and
> "Bonferroni correction".

That answers the final question -- assuming that you do have some
stated hypothesis or goal.
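
To unpack that aside about rescaling: for Poisson counts the variance
equals the mean, so the raw scale gets noisier as counts grow, while
on the square-root scale the variance is roughly constant (about 1/4).
A minimal sketch, assuming numpy:

import numpy as np

rng = np.random.default_rng(1)

# Raw variance tracks the mean; sqrt stabilizes it near 0.25.
for lam in (4, 25, 100):
    x = rng.poisson(lam, size=200_000)
    print(f"lambda={lam:3d}  var(x)={x.var():7.2f}  "
          f"var(sqrt(x))={np.sqrt(x).var():.3f}")
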
Rich Ulrich wrote:

> On Sat, 18 Mar 2023 01:25:44 -0000 (UTC), "David Jones"
> <dajhawkxx@nowherel.com> wrote:
>
>> Cosine wrote:
>>
>>> Anyway, what are the correct or rigorous ways to conduct tests
>>> with multiple metrics?
>>
>> You might want to search for the terms "multiple testing" and
>> "Bonferroni correction".
>
> That answers the final question -- assuming that you do have some
> stated hypothesis or goal.

Not quite. The "Bonferroni correction" is an approximation, and one
needs to think about it more deeply than just the approximation to
1 - (1 - p)^n. That formula is exact and valid if all the test
statistics are statistically independent, and it is conservative if
there is positive dependence (and so "OK"). But, theoretically, it
might be wildly wrong if there is negative dependence.
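
A small numerical check of those regimes (a sketch with made-up
settings, assuming numpy):

import numpy as np

alpha, n = 0.05, 5
sidak = 1 - (1 - alpha) ** (1 / n)  # per-test level that makes
                                    # 1-(1-p)^n exact: ~0.0102
bonf = alpha / n                    # Bonferroni per-test level: 0.01

rng = np.random.default_rng(2)
reps = 200_000

# Independent nulls: i.i.d. Uniform(0, 1) p-values.
p_ind = rng.uniform(size=(reps, n))
print((p_ind < sidak).any(axis=1).mean())  # ~0.050: exact
print((p_ind < bonf).any(axis=1).mean())   # ~0.049: mildly conservative

# Extreme positive dependence: all five tests reuse one p-value.
p_pos = rng.uniform(size=reps)
print((p_pos < sidak).mean())              # ~0.010: very conservative

# Under negative dependence the exactness/conservativeness of
# 1-(1-p)^n is not guaranteed; Bonferroni's union bound
# (FWER <= n * (alpha/n) = alpha) holds under any dependence.
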