• Q interpretations for different types of comparisons

    From Cosine@21:1/5 to All on Sat Feb 11 08:07:26 2023
    Hi:

    We have a new method, A, and some benchmarks: B1, B2, and B3.

    We compare the performances of the above methods. Each comparison uses a two-sided test.

    Are the first two types of comparisons identical?

    Is the interpretation of type-3 correct?

    Type-1:
    All significant: A > B1, A > B2, and A > B3 => claim: A is superior to all benchmarks, i.e., A is the best of all four methods.

    Type-2:

    All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
    Significant: A > B1 => A is superior to all benchmarks, i.e., A is the best of all four methods.

    Type-3:

    All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
    Non-significant: A > B1 => accepting H0, i.e., the performances of A and B1 do not differ => A is better than B2 and B3.
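    For concreteness, here is one way a single pairwise two-sided comparison of this kind might be run. This is only a sketch: the scores are made up, and a large-sample z approximation (Welch-style standard error, stdlib only) stands in for whatever two-sided test is actually used.

```python
from statistics import NormalDist, mean, stdev

def two_sided_z_p(x, y):
    """Two-sided p-value for the difference in means of two samples,
    using a large-sample z approximation with a Welch-style standard error."""
    se = (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical scores: new method A vs benchmark B1.
a  = [0.83, 0.85, 0.84, 0.86, 0.85, 0.84]
b1 = [0.80, 0.79, 0.81, 0.80, 0.78, 0.80]
print(two_sided_z_p(a, b1))  # small p-value: A vs B1 is 'significant' here
```

    Each of the comparisons in Type-1, Type-2, and Type-3 above is one such pairwise test; the question is what the pattern of significant and non-significant results licenses you to claim.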

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Sat Feb 11 14:43:15 2023
    On Sat, 11 Feb 2023 08:07:26 -0800 (PST), Cosine <asecant@gmail.com>
    wrote:

    > Hi:
    >
    > We have a new method, A, and some benchmarks: B1, B2, and B3.
    >
    > We compare the performances of the above methods. Each comparison uses a two-sided test.
    >
    > Are the first two types of comparisons identical?
    >
    > Is the interpretation of type-3 correct?
    >
    > Type-1:
    > All significant: A > B1, A > B2, and A > B3 => claim: A is superior to all benchmarks, i.e., A is the best of all four methods.

    Clearly, yes.



    > Type-2:
    >
    > All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
    > Significant: A > B1 => A is superior to all benchmarks, i.e., A is the best of all four methods.

    Not entirely CLEARLY. Have you ever drawn lines that underline
    the 'not-different' groups, for post-hoc testing? The basic theory
    assumes that the Ns and the variances are equal, and under those
    assumptions it produces no inconsistencies. Real data can yield
    'weird' results if you look at the separate two-group tests; so
    the recommended algorithms perform two-group tests that use the
    all-group (pooled) variance and treat the group Ns as equal.

    So, this is "True" by an inference which assumes 'nothing weird is
    happening.'
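    A minimal sketch of that recommended approach, with made-up scores: each pairwise t statistic uses the pooled all-group variance (the MSE from a one-way ANOVA) rather than each pair's own variances, in the style of Fisher's LSD. The group names and numbers are hypothetical.

```python
import math
from statistics import mean

def pooled_pairwise_t(groups):
    """Pairwise t statistics that use the pooled all-group variance
    (one-way ANOVA MSE) instead of each pair's separate variances."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    sse = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    mse = sse / (n_total - k)  # pooled within-group variance, df = N - k
    t = {}
    for i in range(k):
        for j in range(i + 1, k):
            se = math.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
            t[(i, j)] = (mean(groups[i]) - mean(groups[j])) / se
    return t

# Hypothetical scores for A, B1, B2 (n = 5 each).
a  = [12, 13, 11, 12, 12]
b1 = [10, 11, 10,  9, 10]
b2 = [ 8,  7,  8,  9,  8]
t_stats = pooled_pairwise_t([a, b1, b2])
print(t_stats)
```

    With df = 15 - 3 = 12, the two-sided 5% critical value is about 2.179, so every |t| here exceeds it. Procedures like Tukey's HSD refine this further by controlling the familywise error rate across all the pairs.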

    I've done testing against a benchmark that entailed paired tests;
    for paired data, 'nothing weird' also assumes that the correlations
    (r's) are not different.


    > Type-3:
    >
    > All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
    > Non-significant: A > B1 => accepting H0, i.e., the performances of A and B1 do not differ => A is better than B2 and B3.

    No. This one doesn't even depend on the variances and Ns.

    It is easy to imagine that B1 is slightly better than A, though not
    significantly; and that the difference is enough that A is not
    significantly 'better' than B2 and B3. This is a common picture
    in post-hoc drawings: (B1, A) underlined together as not-different,
    and (A, B2, B3) underlined together as not-different.
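    That picture is easy to reproduce numerically. In this sketch the means and standard errors are hypothetical, chosen only to produce the pattern, and a simple z-test stands in for whatever two-sided test is actually used: B1 significantly beats B2 and B3, yet A is significantly different from nothing.

```python
import math

def two_sided_p(mean1, mean2, se_diff):
    """Two-sided p-value for a z-test on a difference of two means."""
    z = (mean1 - mean2) / se_diff
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Hypothetical means; assume every mean has standard error 1.0,
# so the standard error of any difference is sqrt(2).
means = {"A": 12.0, "B1": 13.0, "B2": 10.0, "B3": 9.9}
se_diff = math.sqrt(2.0)

for a, b in [("B1", "B2"), ("B1", "B3"), ("A", "B1"), ("A", "B2"), ("A", "B3")]:
    p = two_sided_p(means[a], means[b], se_diff)
    print(f"{a} vs {b}: p = {p:.3f} -> "
          + ("significant" if p < 0.05 else "not significant"))
```

    So "A is not significantly different from the best method B1" does not transfer B1's significant wins over B2 and B3 to A: non-significance is not evidence of equality.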

    --
    Rich Ulrich
