• Q sources of correlation

    From Cosine@21:1/5 to All on Mon May 3 17:56:41 2021
    Hi:

    What are the sources causing statistical correlation or dependence?

    What are the characteristics/factors of these sources in common?

    More directly, given a particular situation, how do we identify these sources?

    Let's use the human trial as an example.

    A well-known way of eliminating potential sources of correlation when testing the efficacy of a new skin drug is to use the two hands of the same person as the testing and control groups. We then recruit enough persons to form the sample groups.

    This example implies that sources of correlation exist even within the same person.

    Strangely, when we test a drug for another purpose, say, for treating headache, we form the testing and control groups by recruiting different persons to each of the two groups. Why can we be sure that there are no sources of correlation within the same person in this case?

    Thank you,

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Jones@21:1/5 to Cosine on Tue May 4 19:33:04 2021
    Cosine wrote:

    [...]

    In principle, there are three ways of dealing with this ...

    (a) construct the experiment to take account of dependence. One such
    approach is to organise potential candidate samples into matched
    pairs (for example by weight), and do a paired-sample analysis.

    (b) construct the experiment to properly ignore the dependence, by
    incorporating random assignment of treatments to candidate samples. A
    fully encompassing randomisation by definition eliminates the problem
    of dependence, but at the expense of removing information.

    (c) construct the experiment to incorporate the dependence by
    quantifying potentially important dependence effects and including
    these measurements in the analysis as explanatory variables or factors.

    However, any real experiment is likely to be a hybrid to some extent,
    with at least some randomisation involved in assigning which treatments
    are given to the candidate samples.

    A statistically-based book on "design of experiments" would cover this
    better.
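    A minimal simulation in Python of why the paired design helps; the additive model and all numbers here are invented for illustration. Each person's strong baseline is shared by both of that person's measurements, so it cancels in the paired differences.

```python
import random
import statistics

random.seed(0)

# Hypothetical additive model: each person has a strong personal baseline
# (a between-person source of correlation); the drug adds a small effect.
n = 50
baselines = [random.gauss(10.0, 3.0) for _ in range(n)]  # person-to-person variation
effect = 1.0

# Paired design: treated and control measurements on the *same* person.
treated = [b + effect + random.gauss(0.0, 0.5) for b in baselines]
control = [b + random.gauss(0.0, 0.5) for b in baselines]

# The paired differences cancel the shared baseline, so their spread
# reflects only measurement noise.
diffs = [t - c for t, c in zip(treated, control)]
sd_paired = statistics.stdev(diffs)

# An unpaired comparison keeps the baseline variation inside each group,
# inflating the spread against which the effect must be detected.
sd_unpaired = statistics.stdev(treated)

print(sd_paired, sd_unpaired)  # paired spread is several times smaller
```

    The paired analysis detects the same effect against a much smaller background spread, which is the point of approach (i) above.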

  • From Rich Ulrich@21:1/5 to All on Wed May 5 13:06:36 2021
    On Mon, 3 May 2021 17:56:41 -0700 (PDT), Cosine <asecant@gmail.com>
    wrote:

    Hi:

    I'm more than a little baffled at what Cosine is really looking
    for in an answer. David Jones has provided one sort of answer -
    Does that one satisfy?

    Here is a more philosophical approach.


    What are the sources causing statistical correlation or dependence?

    What are the characteristics/factors of these sources in common?

    This is what the sciences are about, finding correlations and
    dependence and trying to describe "causation".


    More directly, given a particular situation, how do we identify these sources?

    There are a whole lot of sciences, which each have their
    own tools. Astrophysicists work rather differently from
    biologists.


    Let's use the human trial as an example.

    A well-known example for eliminating the potential sources of correlation when testing the efficacy of a new drug for skin is to use the two hands of the same person as testing and control groups. Then we recruit enough persons to form the sample groups.

    This example implies that the sources of correlation exist even in the same person.

    It is KNOWN that age and sex are important in many human
    responses, in addition to whatever else might matter as between-
    person differences. Using each person as their own control
    effectively eliminates those sources of separate causation from
    the inference when looking at the quantitative differences of results.

    Strangely, when we test a drug for another purpose, say, for treating headache, we form the testing and control groups by recruiting persons to each of the two groups. Why could we be sure that there are no sources of correlation in the same person
    for this case?


    "...no sources of correlation in the same person" is a phrase that
    eludes my understanding.

    "Crossover designs" do make use of the same person for control
    when looking at the headache remedies you imagine.

    A trial might go a step beyond "randomizing" to use a "stratified-
    random" assignment to groups, if the PIs expect that (say) age and
    sex might matter for outcome. That "matches" the characteristics
    of the groups, to eliminate that source of variation in an ANOVA.

    Lesser factors that are suspected to have a relation to outcome
    might be "controlled for" by including covariates in the analysis.

    Including covariates is often (far) preferable to the use of "matched
    cases" when the matching is not as precise as "same person".
    - I was alarmed by a study that analysed by paired-cases when
    the matching was "within four years of age". That might seem close
    enough as a logical proposition in a classroom, except that the
    disease was "childhood leukemia", age range of maybe 12 years.
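    A minimal sketch of stratified-random assignment in Python; the subject list and strata here are made up for illustration. Shuffling within each stratum and then alternating arms keeps the two groups matched on the stratifying characteristics.

```python
import random
from collections import defaultdict

random.seed(1)

# Invented subject list; sex and age band define the strata the PIs
# expect to matter for outcome.
subjects = [{"id": i,
             "sex": random.choice(["F", "M"]),
             "age_band": random.choice(["<40", "40-60", ">60"])}
            for i in range(40)]

# Group subjects by stratum.
strata = defaultdict(list)
for s in subjects:
    strata[(s["sex"], s["age_band"])].append(s)

# Stratified-random assignment: shuffle within each stratum, then
# alternate arms, so the groups match on sex and age band.
assignment = {}
for members in strata.values():
    random.shuffle(members)
    for k, s in enumerate(members):
        assignment[s["id"]] = "treatment" if k % 2 == 0 else "control"

# Within every stratum the two arms differ in size by at most one subject.
for key, members in sorted(strata.items()):
    arms = [assignment[s["id"]] for s in members]
    print(key, arms.count("treatment"), arms.count("control"))
```

    The shuffle keeps the assignment random within each stratum, while the alternation guarantees the balance that plain randomization only delivers on average.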

    --
    Rich Ulrich

  • From David Jones@21:1/5 to Rich Ulrich on Thu May 6 09:24:11 2021
    Rich Ulrich wrote:

    [...]

    Perhaps it would be useful to think about what happens for small
    experiments, involving extremely small numbers of samples
    (unrealistically small). Even if some form of randomisation is used
    somewhere in the design, there is a chance that the actual outcome of
    the randomisation produces some unfortunate matching that leads to
    misleading results. If you are only doing the one experiment you have
    to be working conditional on the outcome of the randomisation. Thinking
    of marginalising across all possible outcomes of the randomisation may
    only be relevant if you are dealing with a whole set of separate
    experiments where you might find disparities in the results. So perhaps
    one needs to think about how many samples you need for the
    randomisation to have the desired effect, which, in this context, is to
    ensure that there is a good balance of treatments and controls across
    the range of any possible hidden determinants.
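    To make the small-sample worry concrete, here is a toy enumeration in Python; the subjects and their hidden "type" are invented for illustration. With six subjects split three and three, a completely random assignment still has a 2-in-20 chance of perfectly confounding treatment with a hidden binary characteristic.

```python
from itertools import combinations

# Six invented subjects, three of hidden type A and three of type B;
# the experimenter cannot see the type. We treat 3 and keep 3 as controls.
subjects = ["A1", "A2", "A3", "B1", "B2", "B3"]

total = 0
confounded = 0
for treated in combinations(subjects, 3):
    total += 1
    a_count = sum(1 for s in treated if s.startswith("A"))
    # "Unfortunate matching": every treated subject shares one hidden type,
    # so the treatment effect is perfectly confounded with that type.
    if a_count in (0, 3):
        confounded += 1

print(confounded, total)  # 2 of the 20 possible randomisations are confounded
```

    A 10% chance of complete confounding is exactly the kind of conditional-on-the-randomisation risk that shrinks only as the sample grows.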

    Similarly, where you have measurable qualities for use as regression
    variables or factors, the question of sample size arises not only for
    the purposes of estimating regression coefficients and looking for
    interactions, but also to alleviate the possibility that the random
    allocation of treatments or controls might hit upon some unfortunate
    coincidence with any measured or hidden determinants.

  • From Rich Ulrich@21:1/5 to dajhawkxx@nowherel.com on Thu May 6 19:17:40 2021
    On Thu, 6 May 2021 09:24:11 +0000 (UTC), "David Jones"
    <dajhawkxx@nowherel.com> wrote:

    [...]

    Perhaps it would be useful to think about what happens for small
    experiments, involving extremely small numbers of samples

    An acquaintance who experimented on single cells told me that
    his usual /largest/ sample N was 3, while he was looking for huge
    effects. The reason for "3" was replication: a success with just one
    might be pretty convincing, but he had to be sure the cell did not
    have unique features, and he would confirm that he followed
    exactly the same procedure each time, exactly as documented.


    (unrealistically small). Even if some form of randomisation is used
    somewhere in the design, there is a chance that the actual outcome of
    the randomisation produces some unfortunate matching that leads to
    misleading results. If you are only doing the one experiment you have
    to be working conditional on the outcome of the randomisation. Thinking
    of marginalising across all possible outcomes of the randomisation may
    only be relevant if you are dealing with a whole set of separate
    experiments where you might find disparities in the results. So perhaps
    one needs to think about how many samples you need for the
    randomisation to have the desired effect which, in this context, is to
    ensure that there is a good balance of treatment and controls across
    the range of any possible hidden determinants.

    Similarly, where you have measurable qualities for use as regression
    variables or factors, the question of sample size arises not only for
    the purposes of estimating regression coefficients and looking for
    interactions, but also to alleviate the possibility that the random
    allocation of treatments or controls might hit upon some unfortunate
    coincidence with any measured or hidden determinants.

    The prospect of spurious random effects is elevated when the number
    of /hypotheses/ to be tested becomes large (or, /extremely/ large).

    Astronomers use a really tiny p-value for certain things they look
    for when scanning millions of stars.

    On the other hand, I once read a news report about correlations
    of high/low insurance claims, which proposed some link to "living
    within two blocks of a church." That seemed so specific that I
    imagined that the dataset being mined must contain hundreds of
    hypotheses of equal a-priori value. (And the report did not mention
    anything related to that.) I also suspected that the report was using
    the conventional 5% cutoff of the social sciences; and, with a huge N
    (millions of policies), the effect was probably too small to matter.
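    A toy Python illustration of that multiple-testing effect, using purely simulated p-values rather than real data: testing many true-null hypotheses at the 5% level yields roughly 5% spurious "hits", which a Bonferroni-style correction suppresses.

```python
import random

random.seed(2)

# Simulate p-values for many hypotheses that are all truly null:
# under the null a p-value is uniform on (0, 1).
n_hypotheses = 200
alpha = 0.05
p_values = [random.random() for _ in range(n_hypotheses)]

# At the conventional 5% cutoff, about alpha * n_hypotheses of the
# null hypotheses come out "significant" by chance alone.
hits = sum(p < alpha for p in p_values)

# A Bonferroni-style correction divides the cutoff by the number of
# tests, one reason astronomers demand tiny p-values when scanning
# millions of candidates.
bonferroni_hits = sum(p < alpha / n_hypotheses for p in p_values)

print(hits, bonferroni_hits)
```

    With hundreds of hypotheses of equal a-priori value, a handful of 5%-level "findings" like the church-proximity correlation are exactly what chance predicts.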

    --
    Rich Ulrich
