18 January 2013

Frequentism vs Bayesianism


Imagine we have a bag. Traditionally such problems are couched in terms of an urn, but nobody has urns anymore, and the ones that are around are mostly associated with cremated remains. So we have a bag. Inside this bag, we know, there are twelve dice. Specifically, cubical, six-sided dice. Six of these are normal D6s. But the other six are different: each of them has the same number on every face, and between them they cover the numbers you’d normally get. So we have one die with one pip on every side, one with two pips on every side, and so on up to six. We’re going to draw one die out of this bag at random, and then roll it, the same one die, repeatedly. Got the idea?
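For readers who like to see the setup written down, here is a minimal sketch of the bag in Python. The names `BAG`, `NORMAL_DIE` and `draw_and_roll` are made up for this illustration, and the uniform draw and fair rolls are assumptions of the sketch.

```python
import random

# Each die is represented by the tuple of its six faces.
NORMAL_DIE = (1, 2, 3, 4, 5, 6)

# Six normal dice, plus one die per number whose faces all show that number.
BAG = [NORMAL_DIE] * 6 + [(n,) * 6 for n in range(1, 7)]


def draw_and_roll(n_rolls, rng):
    """Draw one die uniformly at random, then roll that same die n_rolls times."""
    die = rng.choice(BAG)
    rolls = [rng.choice(die) for _ in range(n_rolls)]
    return die, rolls


if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed, only for repeatability
    die, rolls = draw_and_roll(5, rng)
    print("faces of the drawn die:", die)
    print("rolls:", rolls)
```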

 

Okay, let’s talk about statistics now. First of all, we’re going to assume that these dice are all fair dice, and that we’re throwing them in a fair way, i.e. any given die has the same probability of landing on each of its faces. Now, we know that this is, in reality, a pretty bad assumption. But let’s go with it for now. Let’s also assume that we are actually choosing one of the dice from the bag entirely at random, with each one having equal probability. Again, this is something we need to be aware that we’re assuming going in. One of the important things about statistical analysis, really ANY statistical analysis, is to know what assumptions you are making. You are always making assumptions. And if those assumptions are violated, it’s going to mess with your conclusions. But more on this later.

So let’s assume for a second that for some reason, after pulling the die out of the bag, we can’t measure it at all except to see which face comes up on top after each roll. Of course this is absurd, since we could at least see the other sides, but it’s a model, and there are going to be a lot of situations where the real data and the real variables are obscured, and we can only see through a little window. Okay, fine. Now suppose we roll this die five times, and every time it comes up ‘6’. How unlikely is this to have happened?

So there are two major views of probability (actually there are probably more than this, but I’m looking at these two). A frequentist statistical method would be to make a Maximum Likelihood Estimate (MLE), a point estimate, of what we would expect from a die roll, then use this point to construct a distribution, and use that distribution to test whatever we want to test. (Actually this is being unfair to the frequentists, who in this particular instance would probably do the exact same calculations that I am about to do in the next paragraph. Nevertheless, this is meant as an example, and in many situations, where the analog of the contents of the bag is not known, they will do exactly this kind of thing to make their test distribution.) So in this case, we’d be looking at a 1/6 chance of a 1, a 1/6 chance of a 2, etc. And if we use this, we can see that the chance of 5 consecutive 6s is a mere (1/6)^5, or 1 in 7776, or approximately 0.0129%.
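The point-estimate arithmetic is easy to check. This is only a sketch: `prob_run` is a made-up helper name, and it assumes independent rolls with a fixed 1/6 chance per face.

```python
from fractions import Fraction


def prob_run(n_rolls, p_face=Fraction(1, 6)):
    """Probability that n_rolls independent rolls all show one given face."""
    return p_face ** n_rolls


if __name__ == "__main__":
    p = prob_run(5)
    print(p)                    # 1/7776
    print(f"{float(p):.4%}")    # 0.0129%
```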

From the Bayesian perspective, we keep an entire distribution of prior information on what we might have. So a Bayesian will say, okay, there’s a 50% chance of having a normal die, and a 1/12 chance for each of the special dice. Only the normal dice and the all-6s die can roll a 6 at all, so they would calculate the probability of getting 5 6s in a row as 0.5*(1/6)^5 + (1/12)*1 = 1297/15552, which is approximately 8.34%, or just a tiny shade over a 1/12 chance of this occurring. They would further go on to say that, having seen those five 6s, the odds of having picked out the all-6s die are 1296 to 1, with the one being split evenly between all of the ‘fair’ dice.
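The same numbers fall out of a direct application of Bayes’ rule. Here is a small sketch with exact fractions. The hypothesis labels and function names are invented for this illustration, and the six normal dice are lumped into a single hypothesis with prior 1/2, since they are indistinguishable.

```python
from fractions import Fraction

NORMAL = (1, 2, 3, 4, 5, 6)

# (label, faces, prior). The six identical normal dice share one entry.
HYPOTHESES = [("normal die", NORMAL, Fraction(1, 2))] + [
    (f"all-{n}s die", (n,) * 6, Fraction(1, 12)) for n in range(1, 7)
]


def likelihood(faces, rolls):
    """Probability of this exact sequence of rolls from a fairly rolled die."""
    p = Fraction(1)
    for r in rolls:
        p *= Fraction(faces.count(r), len(faces))
    return p


def posterior(rolls):
    """Return (marginal probability of the rolls, posterior over hypotheses)."""
    joint = {
        label: prior * likelihood(faces, rolls)
        for label, faces, prior in HYPOTHESES
    }
    marginal = sum(joint.values())
    return marginal, {label: j / marginal for label, j in joint.items()}


if __name__ == "__main__":
    marginal, post = posterior([6] * 5)
    print(marginal)                                   # 1297/15552
    print(post["all-6s die"] / post["normal die"])    # 1296
```

The posterior odds of 1296 to 1 are just the ratio of the two non-zero terms in the sum, and every other special die ends up with posterior probability zero.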

The latter approach is simply correct. And actually here, as I mentioned when describing the frequentist position above, there’s simply no argument about that.

So why are frequentists around? Well, you need a prior distribution, which reflects your prior knowledge (in this case, what was already in the bag), in order to do the Bayesian kind of analysis. And when we don’t know what’s in the bag, we have to supply something ourselves. Ultimately this is subjective, and the frequentists have a problem with that. Of course, like I said at the top, you have to know your assumptions. And when you get down to the nitty gritty of things, the frequentist method of doing things is no less subjective than the Bayesian one here. Now, why do they claim it is? Well, their choice of distribution to test against has some basis in the rules of ‘how you do things’, which they purport to be objective. And there is something to this, in some cases anyway, like if you are looking for an effect and your base null hypothesis is that the difference is zero. On the other hand, this is not terribly accurate. How often do you really think that the difference between two groups on ANYTHING is actually zero? Not very often. In fact, extremely rarely. Statistically speaking, this is almost surely never true for a continuous distribution. But the difference may be really small.

Essentially, the frequentist position is to do a particular subset of what a Bayesian would do, with a prescribed set of prior distributions. Not only is this less accurate than actually putting in what you think to the best of your knowledge, it’s not any more objective either. You’re simply shifting the subjective part from this particular case to a subjective belief in the system by which you’re choosing the priors, i.e. the MLEs. There is actually something to be said for this, but we should at the very least acknowledge it. And most of what there is to be said for it comes from the fact that we typically have some situation where a false positive is more damaging than a false negative, though using 5% as a significance level everywhere, without thought to what the particular situation calls for, is also fairly… I’ll say it’s not well thought out. Of course, it’s also a LOT less work, which is probably the real reason, apart from it being taught so much more widely, that it is so prevalent. Though with technological advances, this is not so much of a concern anymore…

Anyway, we were talking about assumptions earlier. One of the big reasons you want to note these is that your tests are going to be disrupted if the assumptions are wrong. In our example, there’s really an 8.34% chance of five 6s happening only assuming that we’re randomly selecting one die from the bag, and rolling it fairly, and all that jazz as specified above. And we can talk about how robust our test is to the various assumptions. It is very sensitive, for example, to the assumption that we are taking a die from the bag randomly; the less random that is, the bigger the impact on the bottom line. It’s extremely robust to the fair rolling assumption if we have the all-6s die, but far less robust if we have one of the so-called fair dice. Now, all of this stuff can be modeled in the Bayesian method, and we can account for it. The issue is, we aren’t actually eliminating this problem; the best we can ever do is shift it to new assumptions, and go from there.

Which assumptions we end up with is going to be a product of a few different things. First, they should be based on what we already know. Second, they are a function of the particular question we want to ask, so that the answer to this question is as robust as we can get it to violations of our assumptions. Finally, they are a function of how much time we want to put in: we can theoretically always push the assumptions out further, but every time, we are doing something more and more complicated, and eventually the costs of designing a better test are going to outweigh the greater resolving power we gain. Especially if we’re pretty sure the assumptions are good ones (as they often are).
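To get a rough feel for the sensitivity described above, we can turn the assumptions into parameters and nudge them. This is a sketch only: the function name, the parameter names and the alternative values (a draw twice as likely to pick the all-6s die, a normal die that shows 6 a quarter of the time) are made up for illustration.

```python
from fractions import Fraction


def prob_five_sixes(
    p_all_six_die=Fraction(1, 12),   # chance the draw picks the all-6s die
    p_normal_die=Fraction(1, 2),     # chance the draw picks a normal die
    p_six_on_normal=Fraction(1, 6),  # chance a normal die shows a 6 per roll
    n_rolls=5,
):
    """Overall chance of n_rolls sixes; the other special dice never show a 6."""
    from_normal = p_normal_die * p_six_on_normal ** n_rolls
    from_all_six = p_all_six_die * 1  # the all-6s die shows 6 however it is rolled
    return from_normal + from_all_six


if __name__ == "__main__":
    baseline = prob_five_sixes()
    biased_draw = prob_five_sixes(p_all_six_die=Fraction(1, 6))
    unfair_roll = prob_five_sixes(p_six_on_normal=Fraction(1, 4))
    print("baseline:          ", baseline)
    print("biased draw:       ", biased_draw, "ratio", float(biased_draw / baseline))
    print("unfair normal die: ", unfair_roll, "ratio", float(unfair_roll / baseline))
    # Given that we hold a normal die, the same unfairness matters much more:
    print("normal die only:   ", float(Fraction(1, 4) ** 5 / Fraction(1, 6) ** 5))
```

Doubling the chance of drawing the all-6s die roughly doubles the bottom line, while the unfair roll barely moves the overall figure but multiplies the chance of five 6s from a normal die several times over.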
