• Q Test how good a new method is

    From Cosine@21:1/5 to All on Fri Sep 13 04:45:09 2019
    Hi:

    Given a method as the reference golden method, how do we statistically show that a new method is better than the reference one?

    One way I could think of is to define a statistical variable, I, and then conduct a hypothesis test to see if avg( I_new - I_ref ) > 0, which means that on average the difference between the performance of the new method and that of the reference one is greater than zero.
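
    A minimal sketch of the paired comparison described above, assuming both methods are scored on the same test cases with one performance number I per case. The scores and the function names are made up for illustration. An exact sign-flip permutation test stands in for a paired t-test here so that only the Python standard library is needed; it is one possible choice, not the only one.

```python
import itertools
import math
import statistics


def paired_t_statistic(new, ref):
    """t statistic for the mean of the paired differences new - ref."""
    diffs = [n - r for n, r in zip(new, ref)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def sign_flip_p_value(new, ref):
    """One-sided exact p-value for H1: mean(new - ref) > 0.

    Under H0 each paired difference is equally likely to be positive or
    negative, so every sign pattern is enumerated (fine for small samples).
    """
    diffs = [n - r for n, r in zip(new, ref)]
    observed = sum(diffs)
    at_least_as_large = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped_sum = sum(s * d for s, d in zip(signs, diffs))
        if flipped_sum >= observed - 1e-12:  # tolerance for float round-off
            at_least_as_large += 1
    return at_least_as_large / total


# Hypothetical performance scores on the same eight test cases.
i_ref = [0.71, 0.64, 0.80, 0.69, 0.75, 0.62, 0.78, 0.70]
i_new = [0.74, 0.69, 0.81, 0.75, 0.79, 0.61, 0.83, 0.76]

print("t statistic:", paired_t_statistic(i_new, i_ref))
print("one-sided p:", sign_flip_p_value(i_new, i_ref))
```

    A small p-value here only says the mean difference in this one variable is unlikely to be zero or negative; whether that variable captures "better" is a separate question.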

    But how do we define one such statistical variable? Also, I heard that sometimes people would conduct hypothesis tests for multiple variables. In that case, how do we know whether the new method is better than the reference one?

    Thank you,

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Sun Sep 15 02:57:20 2019
    On Fri, 13 Sep 2019 04:45:09 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:

    Hi:

    Given a method as the reference golden method, how do we statistically show that a new method is better than the reference one?

    My first thought was something about, "contradiction in terms".

    Pragmatically, though, I have come up with an example based
    on strong measurement assumptions, and an example based
    on retreating to the original sort of data that makes a clinical
    predictor "golden".

    Measurement Assumption: If we assume that the underlying
    variable is smoothly varying across time, then you can compare
    the "jitter" - noise - in two time series of (closely spaced)
    measures.
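
    One way the "jitter" comparison could be sketched, assuming the underlying signal is smooth enough that second differences of closely spaced readings are dominated by independent measurement noise. For independent noise with standard deviation sd, the second difference x[i-1] - 2*x[i] + x[i+1] has variance 6*sd^2. The simulated series and all names below are illustrative assumptions, not data from any real instrument.

```python
import math
import random


def noise_sd_from_second_differences(series):
    """Estimate the noise SD, assuming a smooth signal plus independent noise."""
    second_diffs = [
        series[i - 1] - 2 * series[i] + series[i + 1]
        for i in range(1, len(series) - 1)
    ]
    mean_square = sum(d * d for d in second_diffs) / len(second_diffs)
    # Var(e1 - 2*e2 + e3) = (1 + 4 + 1) * sd^2 for independent noise terms.
    return math.sqrt(mean_square / 6.0)


def simulate(noise_sd, n=500, seed=0):
    """Slow sine wave plus Gaussian noise (a made-up stand-in for real readings)."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * t / 200) + rng.gauss(0, noise_sd)
        for t in range(n)
    ]


series_ref = simulate(noise_sd=0.5, seed=1)  # hypothetical reference method
series_new = simulate(noise_sd=0.1, seed=2)  # hypothetical new method

sd_ref = noise_sd_from_second_differences(series_ref)
sd_new = noise_sd_from_second_differences(series_new)
print("estimated noise SD, reference:", sd_ref)
print("estimated noise SD, new:      ", sd_new)
```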

    Retreat to basic data: A clinical indicator (say) is "golden" if it
    predicts the eventual occurrence of some event (death?).
    An alternative that proves to be more accurate in the long run
    (cross-validation studies) becomes the new gold standard.

    In practice, this may use ROC curves, which balance false
    negatives against false positives. For instance, the TB scratch
    test is "positive" for a reaction larger than some conventional
    and specific size. There is one ROC curve based on size. Do you
    call it a different "method" (I don't) if you use a different cutoff?
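
    A small sketch of how one ROC curve arises from one measurement by sweeping the cutoff, assuming the true status of each subject is known. The reaction sizes and labels are invented for illustration, and the function names are hypothetical.

```python
def roc_points(sizes, has_disease):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    A subject is called "positive" when the reaction size is >= the cutoff.
    """
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    points = [(0.0, 0.0)]  # cutoff above every observed size
    for cutoff in sorted(set(sizes), reverse=True):
        true_pos = sum(1 for s, y in zip(sizes, has_disease) if s >= cutoff and y == 1)
        false_pos = sum(1 for s, y in zip(sizes, has_disease) if s >= cutoff and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up reaction sizes (mm) and true status (1 = disease present).
sizes = [2, 4, 5, 7, 8, 10, 12, 15, 16, 20]
has_disease = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

curve = roc_points(sizes, has_disease)
for fpr, tpr in curve:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
print("AUC:", area_under_curve(curve))
```

    Each cutoff is one point on the same curve, which is why changing the cutoff alone does not give a new curve to compare against.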

    A totally different method would have a different curve, and
    it might reflect a different phenomenon. For instance, detecting
    active TB bacteria is not the same as detecting TB antibodies.
    So: What is the purpose of the test?


    One way I could think of is to define a statistical variable, I, and then conduct a hypothesis test to see if avg( I_new - I_ref ) > 0, which means that on average the difference between the performance of the new method and that of the reference one is greater than zero.

    I don't follow.


    But how do we define one such statistical variable? Also, I heard that sometimes people would conduct hypothesis tests for multiple variables. In that case, how do we know whether the new method is better than the reference one?

    The difficulties raised by "gold standards" are logical, not
    statistical. If you have more than one purpose in mind, it is
    very easy to suggest that there might be more than one
    "gold standard" test to meet the several purposes.

    --
    Rich Ulrich

  • From Cosine@21:1/5 to All on Tue Sep 24 13:29:43 2019
    Let's consider testing the effects of different types of fertilizer on the yield of a given type of crop. We want to know which type of fertilizer would make a given area of land produce the maximum amount of crop.

    Suppose we have 4 types of fertilizer. Does it mean that we need to conduct C(5,2) paired Student's t-tests to determine the best fertilizer?
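
    For scale, a quick sketch of how many pairwise tests are involved and how fast the chance of at least one false positive grows if each test is run at the 5% level. The familywise figure assumes the tests are independent, which pairwise comparisons sharing groups are not, so treat it as a rough guide only. The function names are illustrative.

```python
import math


def pairwise_count(groups):
    """Number of distinct pairs among the groups, C(groups, 2)."""
    return math.comb(groups, 2)


def familywise_error(tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** tests


for groups in (4, 5):
    tests = pairwise_count(groups)
    print(f"{groups} groups: {tests} pairwise tests, "
          f"familywise error about {familywise_error(tests):.2f}")
```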

  • From Rich Ulrich@21:1/5 to All on Wed Sep 25 00:19:08 2019
    On Tue, 24 Sep 2019 13:29:43 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:

    Let's consider testing the effects of different types of fertilizer on the yield of a given type of crop. We want to know which type of fertilizer would make a given area of land produce the maximum amount of crop.

    This looks like a question that should have been given
    its own Subject line. Plus, your first question about a
    "new method" included the notion of a golden standard
    or reference, something to be improved upon.


    Suppose we have 4 types of fertilizer. Does it mean that we need to conduct C(5,2) paired Student's t-tests to determine the best fertilizer?

    You probably want to frame your hypothetical design with
    something other than Farming problems. "Latin Squares"
    are designs from 90 years ago for comparing outcomes in
    farming while controlling for /some/ outside factors; and
    there are other design variations that protect against
    other confounding factors you may get with plots of land.

    If you want to compare 5 samples that have equal
    a-priori chances of being best, you might start with
    an ANOVA with 5 groups. A post-hoc test like Tukey's
    Honestly Significant Difference would let you make
    statements about which ones may be superior to
    which others, and which ones should so far be grouped
    together.

    The HSD compensates for the fact that multiple tests
    are being performed, so it is more conservative than
    a single t-test. Doing the testing in one ANOVA gains
    robustness by estimating the error-variance from all
    five of the groups, instead of only the two used in a
    particular comparison.
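
    A rough sketch of the one-way ANOVA and the HSD comparison described above, done by hand so the pooled error variance is visible. The yields are invented, the group sizes are assumed equal, and Q_CRIT is an approximate studentized-range critical value for alpha = 0.05, 5 groups and 20 error degrees of freedom read from a standard table; check it before relying on it. Recent SciPy versions offer scipy.stats.f_oneway and scipy.stats.tukey_hsd for real work.

```python
import itertools
import math
import statistics

# Hypothetical yields (arbitrary units), five plots per fertilizer.
yields = {
    "A": [20.1, 21.3, 19.8, 20.7, 21.0],
    "B": [22.4, 23.1, 21.9, 22.8, 23.5],
    "C": [20.5, 19.9, 21.2, 20.3, 20.8],
    "D": [24.0, 23.2, 24.6, 23.8, 24.3],
    "E": [21.0, 20.4, 21.7, 20.9, 21.3],
}

k = len(yields)                      # number of groups
n = len(next(iter(yields.values()))) # plots per group (assumed equal)
means = {name: statistics.mean(vals) for name, vals in yields.items()}
grand_mean = statistics.mean(x for vals in yields.values() for x in vals)

# Between-group and within-group sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for m in means.values())
ss_within = sum((x - means[name]) ** 2
                for name, vals in yields.items() for x in vals)

df_between = k - 1
df_within = k * n - k
mse = ss_within / df_within          # error variance pooled over all five groups
f_stat = (ss_between / df_between) / mse

# Assumed table value: studentized range, alpha=0.05, k=5, df=20.
Q_CRIT = 4.23
hsd = Q_CRIT * math.sqrt(mse / n)    # smallest mean difference called significant

significant = set()
for a, b in itertools.combinations(sorted(yields), 2):
    diff = means[a] - means[b]
    if abs(diff) > hsd:
        significant.add((a, b))
    print(f"{a} vs {b}: diff={diff:+.2f}  "
          f"{'differs' if abs(diff) > hsd else 'grouped together'}")

print(f"F={f_stat:.2f}, pooled MSE={mse:.4f}, HSD={hsd:.2f}")
```

    Every pairwise comparison uses the same mse, which is the pooling from all five groups mentioned above; a plain two-sample t-test would estimate the variance from only the two groups being compared.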

    --
    Rich Ulrich
