Imagine we have a bag. Traditionally such problems are
couched in terms of an urn, but nobody has urns anymore, and the ones that are around
are mostly associated with cremated remains. So we have a bag. Inside
this bag, we know, there are twelve dice. Specifically, cubical, six-sided
dice. Six of these are normal D6s. But the other six are different – each one
has the same number on every face, and between them the six cover the numbers
you’d normally get. So we have one die with one pip on every side, one with
two pips on every side, etc. We’re going to draw one die out of this bag at
random, and then roll it, the same one die, repeatedly. Got the idea?
Okay, let’s talk about statistics now. First of all, we’re
going to assume that these dice are all fair dice, and that we’re throwing them
in a fair way – i.e. any given die has the same probability of
landing on each of its faces. Now, we know that this is, in reality, a pretty
bad assumption. But let’s go with it now. Let’s also assume that we are
actually choosing one of the dice from the bag entirely at random, with each
one having equal probability. Again, this is something we need to be aware that
we’re assuming going in. One of the important things about statistical
analyses, really ANY statistical analyses, is to know what assumptions you are
making. You are always making assumptions. And if those assumptions are
violated, it’s going to mess with your conclusions. But more on this later.
So let’s assume for a second that for some reason, after
pulling the die out of the bag, we can’t measure it at all except to see which
face comes up on the top after each roll. Of course this is absurd, we can at
least see the other sides, but it’s a model here, and there are going to be a
lot of situations where the real data and real actual variables are obfuscated,
and we can only see a little window. Okay fine. Now suppose we roll this die five
times, and each one of these times it comes up ‘6’. How unlikely is this to
have happened?
So there are two major views of probability (actually there
are probably more than this, but I’m looking at these two). A frequentist statistical
method would be to make a Maximum Likelihood Estimate (MLE), a point estimate,
of what we would expect from a die roll, and then use this point to construct a
distribution, and use that distribution to test what we want to. (Actually this is being
unfair to the frequentists, who in this particular instance would probably do the
exact same calculations that I am about to do in the next paragraph.
Nevertheless, this is meant as an example, and in many situations where the
analog of the contents of the bag is not
known, they will do exactly this kind of thing to make their test
distribution.) So in this case, we’d be looking at a 1/6 chance of a 1, a 1/6
chance of a 2, etc. And if we use this, we can see that the chance of 5
consecutive 6s is a mere (1/6)^5, or 1 in 7776, or approximately 0.0129%.
A Bayesian, by contrast, is going to keep an
entire distribution of prior information on what we have. So they will say,
okay there’s a 50% chance of having a normal die, and a 1/12 chance for each of
the special dice. They would then calculate the probability of getting 5 6s
in a row as 0.5*(1/6)^5 + (1/12)*1 = 1297/15552, which is approximately 8.34%, or
just a tiny shade over a 1/12 chance of this occurring. They would further go on
to say that the odds of having picked out the all 6 die are 1296 to 1, with the
one being split evenly between the six normal, ‘fair’ dice (the other five special
dice can never roll a 6, so they are ruled out).
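If you want to check the arithmetic in the last two paragraphs, here is a minimal Python sketch. The names (build_bag, likelihood, evidence_and_posterior) are my own, made up for illustration, and it assumes exactly the setup described above: twelve dice, each equally likely to be drawn, every face equally likely on each roll. It uses exact fractions so nothing gets lost to rounding.

```python
from fractions import Fraction

def build_bag():
    """Six normal D6s plus six dice that show the same number on every face."""
    normal = [tuple(range(1, 7))] * 6
    special = [(k,) * 6 for k in range(1, 7)]
    return normal + special  # the all-6 die ends up last in the list

def likelihood(die_faces, rolls):
    """Probability of seeing `rolls`, in order, from a die with these faces."""
    p = Fraction(1)
    for r in rolls:
        p *= Fraction(die_faces.count(r), len(die_faces))
    return p

def evidence_and_posterior(rolls):
    """Return P(rolls) and the posterior probability of each die in the bag."""
    bag = build_bag()
    prior = Fraction(1, len(bag))  # assumption: uniform draw from the bag
    joint = [prior * likelihood(die, rolls) for die in bag]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]
    return evidence, posterior

# Point-estimate version: treat the die as an ordinary D6.
point_estimate = Fraction(1, 6) ** 5
print(point_estimate)       # 1/7776

# Version that keeps the whole prior over what is in the bag.
evidence, posterior = evidence_and_posterior([6] * 5)
p_all_six = posterior[-1]
print(evidence)             # 1297/15552
print(p_all_six)            # 1296/1297, i.e. odds of 1296 to 1
```

Passing a different list of rolls to evidence_and_posterior shows how quickly (or slowly) the posterior moves for other outcomes.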
The latter approach is simply correct. And actually here, as
I mentioned when describing the frequentist position above, there’s simply no
argument about that.
So why are frequentists around? Well, you need a prior
distribution, which reflects your prior knowledge, in this case what was
already in the bag, in order to do the Bayesian kind of analysis. And when we
don’t know what’s in the bag, that means in order to do this, we have to supply
something – ultimately this is subjective, and the frequentists have a problem with
that. Of course, like I said at the top, you have to know your assumptions. And
when you get down to the nitty gritty of things, the frequentist method of
doing things is no less subjective than the Bayesian one here. Now, why do they
claim it is more objective? Well, their choice of distribution to test against has some
basis in the rules of ‘how you do things’, which they are purporting to be
objective. Well, there is something to this, in some cases anyway, like if you
are looking for an effect and your base null hypothesis is that the difference
is zero. On the other hand, this is not terribly accurate. How often do you
really think that the difference between two groups on ANYTHING is actually
zero? Not very often. In fact, extremely rarely. Statistically speaking, this
is almost surely never true for a continuous distribution. But the difference
may be really small. Essentially, the frequentist position is to do a
particular subset of what a Bayesian would do, with a prescribed set of
prior distributions. Not only is this less accurate than actually putting
in what you think to the best of your knowledge, it’s not any more
objective either. You’re simply shifting the subjectivity from this
particular case to a subjective belief in the system by which you’re
choosing the priors, i.e. the MLEs. Well, there is actually something to be said
for this, but we should at the very least acknowledge it. And most of what
there is to say for it is because typically we have some situation where a
false positive is more damaging than a false negative – though using 5% as a
significance level everywhere without thought to what the particular situation
calls for is also fairly… I’ll say it’s not well thought out. Of course, it’s
also a LOT less work, which is probably the real reason, apart from it being taught
so much more widely, that it is so prevalent. Of course, with technological
advances, this is not so much of a concern anymore…
Anyway, we were talking about assumptions earlier. One of the
big reasons you want to note these is because your tests are going to be
disrupted if the assumptions are wrong. In our example, there’s
really only that 8.34% chance of five 6s in a row assuming
that we’re randomly selecting one die from the bag, and rolling it fairly, and
all that jazz as specified above. And we can talk about how robust our test is
to the various assumptions. It is very sensitive, for example, to the assumption
that we are taking a die from the bag randomly; the less random that draw is, the
bigger the impact on the bottom line. It’s extremely robust to the
fair rolling assumption if we have the all 6s die, but far less robust if
we have one of the so-called fair dice. Now, all of this stuff can be modeled
in the Bayesian method, and we can account for it. The issue is, we aren’t
actually totally eliminating this problem; the best we can ever do is shift it
to new assumptions, and go from there. Which assumptions we end up with is
going to be a product of a few different things. First, they should be based
on what it is that we already know. Second, they are a function of the
particular question we want to ask, so that the answer to this question is as
robust as we can get it to violations of our assumptions. Finally, they are a
function of how much time we want to put in: we can theoretically always push
the assumptions out further, but every time, we are doing something more and
more complicated, and eventually the costs of designing a better test are going to
outweigh the greater resolving power we gain. Especially if we’re pretty sure
the assumptions are good ones (as often they are).
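To make the talk about sensitivity a bit more concrete, here is a rough Python sketch of the five 6s calculation with two of the assumptions turned into knobs. The function and both parameters are hypothetical, invented for this illustration: p_normal is the chance that the die we draw is one of the normal ones (1/2 if the draw really is uniform), and p_six_on_normal is the chance a normal die shows a 6 on a single roll (1/6 if the rolling really is fair). Whatever probability is left over is assumed to be spread evenly across the six same-faced dice.

```python
from fractions import Fraction

def prob_five_sixes(p_normal=Fraction(1, 2),
                    p_six_on_normal=Fraction(1, 6),
                    n=5):
    """P(n sixes in a row) under adjustable draw and rolling assumptions.

    Assumption: the probability not given to the normal dice is split
    evenly among the six same-faced dice, so the all-6 die gets one sixth.
    """
    p_all_six_die = (1 - p_normal) / 6
    # A normal die needs n lucky rolls; the all-6 die shows a 6 every time,
    # no matter how it is thrown; the other special dice never show a 6.
    return p_normal * p_six_on_normal ** n + p_all_six_die

baseline = prob_five_sixes()
biased_draw = prob_five_sixes(p_normal=Fraction(9, 10))
loaded_normal = prob_five_sixes(p_six_on_normal=Fraction(1, 3))

print(baseline, float(baseline))            # 1297/15552
print(biased_draw, float(biased_draw))      # 29/1728
print(loaded_normal, float(loaded_normal))  # 83/972
```

Changing how the die is drawn moves the answer a lot (from about 8.3% down to under 2% in this made-up case), while doubling the chance of a 6 on a normal die barely moves it, because nearly all of the probability comes from the all 6s die in the first place.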