Hi:
Suppose we have 5 algorithms: A, B, C, D, and E, and we perform the following two kinds of performance comparison. Each comparison compares two algorithms' values of a given performance metric, M.
Kind-1:
M_A > M_B, M_A > M_C, M_A > M_D, and M_A > M_E
Then we claim that A performs better than the other 4 algorithms.
Kind-2:
M_A > M_B, M_A > M_C, M_A > M_D, M_A > M_E,
M_B > M_C, M_B > M_D, M_B > M_E,
M_C > M_D, M_C > M_E, and
M_D > M_E
Then we claim that A performs best among all 5 algorithms.
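To make the two kinds concrete, here is a minimal Python sketch (the metric values are made up for illustration) that enumerates the comparisons each kind requires:

from itertools import combinations

# Hypothetical metric values for the 5 algorithms (made-up numbers).
M = {"A": 0.95, "B": 0.90, "C": 0.85, "D": 0.80, "E": 0.75}

# Kind-1: compare A against each of the other 4 algorithms.
kind1 = [("A", other) for other in M if other != "A"]

# Kind-2: all C(5,2) = 10 pairwise comparisons.
kind2 = list(combinations(M, 2))

print(len(kind1), "Kind-1 comparisons; A wins all:",
      all(M["A"] > M[b] for _, b in kind1))
print(len(kind2), "Kind-2 comparisons; full order holds:",
      all(M[a] > M[b] for a, b in kind2))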
On Fri, 11 Aug 2023 18:28:50 -0700 (PDT), Cosine wrote:
It seems that you are describing the RESULT of a set of comparisons. The two "kinds" would be Kind-1, A versus each of the others, and Kind-2, all pairwise comparisons among them.
You should say "on these test data" and "better on M than ...", and use "performed" (past tense).
For Kind-2, I would state that A performed better (on M) than the rest, and also that the rest were strictly ordered in how well they performed.
--
Rich Ulrich
On Saturday, August 12, 2023 at 12:10:06 AM (UTC+8), Rich Ulrich wrote:
In other words, if the purpose is only to demonstrate that A performed better on M than the other 4 algorithms,
we only need the first kind of comparison. We do the second kind only if we want to demonstrate the ordering.
By the way, it seems that to reach the desired conclusion, both kinds of comparison involve multiple testing.
The first kind requires 4 (= 5-1) comparisons and the second requires C(5,2) = 10.
Therefore, if we use the Bonferroni correction, the significance level is corrected to alpha/(n-1) and alpha/C(n,2), respectively.
If we use more than one metric, say M_1 to M_m, then we need to further divide the previous alphas by m, right?
But wouldn't the corrected alpha become too small, especially when n and m are large?
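To see how small the corrected threshold gets, here is a minimal Python sketch of the arithmetic (n, m, and alpha here are illustrative numbers, not from any particular study):

from math import comb

alpha = 0.05
n, m = 5, 7  # 5 algorithms, 7 metrics (illustrative)

# Kind-1: n-1 comparisons per metric; Kind-2: C(n,2) comparisons per metric.
alpha_kind1 = alpha / ((n - 1) * m)     # 0.05 / 28  ~ 0.0018
alpha_kind2 = alpha / (comb(n, 2) * m)  # 0.05 / 70  ~ 0.0007

print(alpha_kind1, alpha_kind2)

Even for these modest n and m, the Kind-2 threshold is already below 0.001, which is the concern raised above.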
Hmm, let's start by clarifying the research questions, then.
Many machine learning papers I have read use a set of metrics to show that the developed algorithm performs best compared to a set of benchmarks.
Typically, the authors list metrics such as accuracy, sensitivity, specificity, the area under the receiver operating characteristic curve (AUC), recall, F1-score, and Dice score.
Next, the authors list 4-6 published algorithms as benchmarks. These algorithms have similar designs and serve the same purpose as the developed one, e.g., segmentation, classification, or detection/diagnosis.
Then the authors run the developed algorithm and the benchmarks on the same dataset to obtain the values of each of the listed metrics.
Next, the authors conduct the statistical analysis by comparing the values of the metrics to demonstrate that the developed algorithm is the best and, sometimes, to rank the algorithms (the developed one and all the benchmarks).
Finally, the authors pick out the results showing favorable comparisons and present them as the contribution(s) of the developed algorithm.
It looks to me as if the authors are running statistical tests that compare multiple algorithms on multiple metrics in order to conclude the final (single or multiple) contribution(s) of the developed algorithm.
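A minimal sketch of that evaluation loop, assuming scikit-learn binary classifiers as stand-ins for the developed algorithm and one benchmark (the models and data here are placeholders, not any specific paper's setup):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The "developed" algorithm plus a benchmark, all run on the same dataset.
models = {"developed": LogisticRegression(max_iter=1000),
          "benchmark": RandomForestClassifier(random_state=0)}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    prob = model.predict_proba(X_te)[:, 1]
    print("%s: acc=%.3f recall=%.3f F1=%.3f AUC=%.3f"
          % (name, accuracy_score(y_te, pred), recall_score(y_te, pred),
             f1_score(y_te, pred), roc_auc_score(y_te, prob)))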
Well, let's consider a more classical problem.
Regarding English teaching methods for high school students, suppose we
develop a new method (A1) and want to demonstrate that it performs
better than the other methods (A2, A3, and A4) by comparing the average
scores of experimental classes taught with the different methods. Each
comparison uses a paired t-test. Treating the 3 comparisons (A1 versus
each of the others) as one family, the Bonferroni-corrected significance level is alpha_original/( 4-1 ).
Suppose we further want to investigate whether the developed method (A1) is
better than the other methods (A2, A3, and A4) for English, Spanish, and
German; then the corrected alpha = alpha_original/( 4-1 )/3.
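A minimal sketch of that analysis with SciPy (the scores are simulated stand-ins, not real data; the same 30 students are assumed to be scored under every method, so the paired test applies):

import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Simulated paired scores: the same 30 students under each teaching method.
scores = {m: rng.normal(loc=70 + 2 * i, scale=5, size=30)
          for i, m in enumerate(["A4", "A3", "A2", "A1"])}

alpha = 0.05 / (4 - 1)        # Bonferroni over the 3 comparisons against A1
# alpha = 0.05 / (4 - 1) / 3  # further divided by 3 if we also test 3 languages

for other in ["A2", "A3", "A4"]:
    t, p = ttest_rel(scores["A1"], scores[other])
    print("A1 vs %s: p = %.4f, significant at corrected alpha: %s"
          % (other, p, p < alpha))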