XPost: sci.stat.math, sci.stat.edu
I'm cross-posting this in the 3 groups where I see the identical
message.
On Thu, 3 Aug 2017 06:13:09 -0700 (PDT),
poboxabcde@gmail.com wrote:
Suppose I have a population with 100 events and 900 non-events. Thus, the population’s event rate is 0.1.
When I select 10% from these 100 events and also 10% from the 900 non-events, the sample’s event rate is 0.1.
If I repeat the process 20 times to create 20 samples, each sample’s rate is 0.1. Then, the standard error (the square root of the variance of these 20 means) is 0 because all 20 event rates are 0.1.
Am I missing something, or is this legitimate stratified sampling? Please help. Thanks.
This does not look familiar to me, but you can do a lot of stuff
if you can justify it. What is very clear is that you cannot use
a variance (or SE) for any inference or testing after you have
set it to zero by the design. What are you trying to estimate?
A binomial rate has its own variance, determined by the mean:
p(1-p)/n for a rate p estimated from n cases. So those
variances are ordinarily robust.
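
To make that concrete, here is a minimal Python sketch of the scheme described in the question, next to a simple random sample of the same size. It is only an illustration under stated assumptions: the function names, the seed, and the choice of 100 cases for the simple random sample are mine, not the original poster's.

```python
import math
import random
import statistics

def stratified_rates(n_events=100, n_nonevents=900, frac=0.10, reps=20, seed=1):
    """Event rate of each sample when the same fraction is drawn
    from the events and from the non-events (strata = the outcome itself)."""
    rng = random.Random(seed)
    events = [1] * n_events
    nonevents = [0] * n_nonevents
    rates = []
    for _ in range(reps):
        sample = (rng.sample(events, round(frac * n_events))
                  + rng.sample(nonevents, round(frac * n_nonevents)))
        rates.append(sum(sample) / len(sample))
    return rates

def srs_rates(n_events=100, n_nonevents=900, n=100, reps=20, seed=1):
    """Event rate of each sample under simple random sampling
    (without replacement) from the pooled population."""
    rng = random.Random(seed)
    population = [1] * n_events + [0] * n_nonevents
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

def binomial_se(p, n):
    """Textbook standard error of a binomial rate: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

strat = stratified_rates()
srs = srs_rates()
print("SD of stratified rates:", statistics.pstdev(strat))  # zero by design
print("SD of simple-random rates:", statistics.stdev(srs))  # not zero
print("Binomial SE for p=0.1, n=100:", binomial_se(0.1, 100))  # about 0.03
```

The zero in the first line of output comes from fixing the number of events in every sample, so it says nothing about the sampling error of the rate. The binomial formula ignores the finite-population correction, which would shrink the figure slightly when 100 of 1000 cases are drawn.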
So: If you are interested in the variance of the rate of events,
that should not be very problematic unless you are stratifying by
some /other/ variable that has a strong effect on the event rate.
Stratification, jackknife, bootstrap -- I've never done much with
any of them, but it looks to me like you are confusing the ideas.
Bootstrapping goes after difficult variances, but I don't picture
that problem with a dichotomous outcome. (Jackknife, ditto.)
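
As a rough check on that, a bootstrap of a 0/1 outcome should land very close to the analytic binomial SE. The sketch below assumes a single sample of 100 cases with 10 events; the function name, the 2000 resamples, and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def bootstrap_se(sample, reps=2000, seed=1):
    """Bootstrap SE of the mean: resample with replacement,
    then take the SD of the resampled means."""
    rng = random.Random(seed)
    n = len(sample)
    means = [sum(rng.choices(sample, k=n)) / n for _ in range(reps)]
    return statistics.stdev(means)

# Hypothetical sample: 10 events among 100 cases.
sample = [1] * 10 + [0] * 90
p_hat = sum(sample) / len(sample)

analytic = math.sqrt(p_hat * (1 - p_hat) / len(sample))  # sqrt(p(1-p)/n)
boot = bootstrap_se(sample)
print("analytic SE:", analytic)
print("bootstrap SE:", boot)
```

The two numbers should agree closely, which is why resampling adds little for a plain dichotomous rate.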
You seem to have the whole sample in hand, so I don't see why your
stratification is desirable. I see that as a sampling scheme which is
used when resources are inadequate, or to avoid huge Ns (which is
less often seen as a problem now than when computers were
1000 times slower).
Hope this helps,
--
Rich Ulrich