• #### standard error=0 in stratified sampling?

From george huang@21:1/5 to All on Thu Aug 3 06:10:52 2017
Suppose I have a population with 100 events and 900 non-events. Thus, the population’s event rate is 0.1.
When I select 10% from these 100 events and also 10% from the 900 non-events, the sample’s event rate is 0.1.
If I repeat the process 20 times to create 20 samples, each sample’s rate is 0.1. Then the standard error (the square root of the variance of these 20 means) is 0, because all 20 event rates are 0.1.
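
The procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `stratified_rates`, the seed, and the use of the standard-library `random` and `statistics` modules are choices made for the example, not part of the original post. It draws 10% from each outcome stratum 20 times and records each sample's event rate.

```python
import random
import statistics

def stratified_rates(n_events=100, n_nonevents=900, frac=0.10, reps=20, seed=0):
    """Draw `reps` samples, taking `frac` of the events and `frac` of the
    non-events each time, and return the event rate of every sample.
    (Hypothetical helper, written only for this illustration.)"""
    rng = random.Random(seed)
    events = [1] * n_events        # 1 = event
    nonevents = [0] * n_nonevents  # 0 = non-event
    rates = []
    for _ in range(reps):
        sample = (rng.sample(events, round(frac * n_events))
                  + rng.sample(nonevents, round(frac * n_nonevents)))
        rates.append(sum(sample) / len(sample))
    return rates

rates = stratified_rates()
# Every sample holds exactly 10 events and 90 non-events, so the rates
# cannot differ and their standard deviation is zero by construction.
spread = statistics.pstdev(rates)
print(rates[:3], spread)
```

Which individual units are drawn changes from sample to sample, but the count of events in each sample is fixed by the design, so the event rate has no sampling variability left to measure.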

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
• From Rich Ulrich@21:1/5 to poboxabcde@gmail.com on Thu Aug 3 13:26:59 2017
XPost: sci.stat.math, sci.stat.consult

I'm cross-posting this in the 3 groups where I see the identical
message.

On Thu, 3 Aug 2017 06:13:09 -0700 (PDT), poboxabcde@gmail.com wrote:

> Suppose I have a population with 100 events and 900 non-events. Thus, the population’s event rate is 0.1.
> When I select 10% from these 100 events and also 10% from the 900 non-events, the sample’s event rate is 0.1.
> If I repeat the process 20 times to create 20 samples, each sample’s rate is 0.1. Then the standard error (the square root of the variance of these 20 means) is 0, because all 20 event rates are 0.1.

This does not look familiar to me, but you can do a lot of stuff
if you can justify it. What is very clear is that you cannot use
a variance (or SE) for any inference or testing after you have
set it to zero by the design. What are you trying to estimate?

A binomial rate has its own variance based on the mean, so those
variances are ordinarily robust.

So: If you are interested in the variance of the rate of events,
that should not be very problematic unless you are stratifying by
some /other/ variable that has a strong effect on the event rate.
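
For comparison, here is a rough sketch of what the standard error of the event rate looks like when the sample is NOT stratified on the outcome. The sample size of 100, the 2000 repetitions, the seed, and the helper name `srs_rates` are all assumptions made for the illustration. It compares the usual binomial formula sqrt(p(1-p)/n), the same formula with a finite population correction, and a simulation of simple random sampling without replacement.

```python
import math
import random
import statistics

def srs_rates(n_events=100, n_nonevents=900, n=100, reps=2000, seed=1):
    """Event rates from `reps` simple random samples of size `n`, drawn
    without replacement from the whole population.
    (Hypothetical helper, written only for this illustration.)"""
    rng = random.Random(seed)
    population = [1] * n_events + [0] * n_nonevents
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

p, n, N = 0.1, 100, 1000

# Binomial standard error of a rate: depends only on p and n.
se_binomial = math.sqrt(p * (1 - p) / n)

# Sampling without replacement from a finite population shrinks it slightly.
se_fpc = se_binomial * math.sqrt((N - n) / (N - 1))

# Empirical standard deviation of the simulated sample rates.
se_simulated = statistics.stdev(srs_rates())

print(se_binomial, se_fpc, se_simulated)
```

Under these assumptions the simulated value should land close to the finite-population figure (a little under 0.03), which is the kind of nonzero variance one would use for inference about the rate.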

Stratification, jack-knife, bootstrap -- I've never done much with
any of them, but it looks to me like you are confusing the ideas.
Bootstrapping goes after difficult variances, but I don't picture
that problem with a dichotomous outcome. (Jackknife, ditto.)

You seem to have the whole sample in hand, so I don't see why your
stratification is desirable. I see that as a sampling scheme which is
used when resources are inadequate. Or, to avoid huge Ns (which
is less often seen as a problem now, than when computers were
1000 times slower).

Hope this helps,

--
Rich Ulrich
