09 December 2011

Developing the Homogeneous Model Part I: The Home Run Distribution

This is the first part of a series I've written (and am porting here) on the Run-Scoring model I've developed, which I'm calling the Homogeneous model (for reasons which will eventually become evident). I originally called it the Progressive model, which doesn't really make any sense, so if you see 'progressive' somewhere, I'm talking about this. Anyway, a little warning that some math is involved here, but enjoy:

How to determine the absolute value of a Home Run
In a homogeneous environment

First of all, I need to explain what I mean by a homogeneous environment. I mean that a) every pitcher is exactly average, b) every batter in the lineup is exactly the same, and, in this case, c) every batter either hits a home run or strikes out on every trip to the plate. Obviously these are all very bad assumptions to make for an actual game of baseball, but this exercise is meant as a simple building block for a much more complete theoretical model of the game.
With these restrictions, every home run will be a ‘solo shot’ and score the batter’s team a single run. So in any given inning, we only need to figure out how many home runs we expect to hit to know how many runs we expect to score. Now, if we think about it for a second, we can see the correct solution with hardly any work. Think about it this way: if it’s a 50-50 chance whether the batter hits a homer or strikes out, an out is as likely as a homer, so we would expect to hit as many home runs as we record outs. For a single inning, that’s three. If we’re twice as likely to hit a homer as make an out, there should be twice as many homers as outs, so six in an inning. If we’re a third as likely to hit a homer as record an out, we should score runs equal to one-third of the outs we make, so one run per inning. Thus the correct formula is rather obviously

Expected Runs Per Inning (ERI) = 3p/p'

where p denotes the probability of hitting a home run, and p’ denotes “p complement”, 1-p, the probability of making an out. This is indeed the correct formula for this case. But I would like to prove it mathematically, because the same machinery will help us work out other situations later on.
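If you'd like to check the thought experiment empirically before proving it, here's a minimal Monte Carlo sketch in Python. It assumes exactly the homogeneous environment described above: each plate appearance is independently a home run with probability p and an out otherwise, and the inning ends at the third out. The function names (`simulate_inning`, `estimate_eri`, `eri_formula`) are hypothetical, made up for illustration.

```python
import random


def simulate_inning(p, rng):
    """Play one inning in the homogeneous environment.

    Every plate appearance is a home run with probability p and an out
    otherwise; the inning ends at the third out. Requires p < 1, since at
    p = 1 the inning never ends. Returns the runs scored.
    """
    runs = 0
    outs = 0
    while outs < 3:
        if rng.random() < p:
            runs += 1  # every homer is a solo shot worth exactly one run
        else:
            outs += 1
    return runs


def estimate_eri(p, innings=100_000, seed=2011):
    """Average runs per inning over many simulated innings."""
    rng = random.Random(seed)
    return sum(simulate_inning(p, rng) for _ in range(innings)) / innings


def eri_formula(p):
    """ERI = 3p / p', where p' = 1 - p is the probability of an out."""
    return 3 * p / (1 - p)


for p in (0.25, 0.5, 2 / 3):
    print(f"p = {p:.3f}: simulated {estimate_eri(p):.3f}, formula {eri_formula(p):.3f}")
```

The three values of p correspond to being a third as likely, equally likely, and twice as likely to homer as to make an out, so the simulated averages should land close to 1, 3, and 6 runs per inning. They won't match the formula exactly, since the simulation is a random sample.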
To prove the formula, let’s make a probability tree of how many different ways there are to get any particular number of home runs. To get zero, we must have made three outs; that’s the only way. To get exactly one, we must have made three outs AND hit a home run. How many ways are there of arranging that? Well, the homer could have come before the first out, between the first and second outs, or between the second and third outs, but NOT after the third out (else the inning would have ended before the homer was hit). So there are three. With two home runs, the situation is similar: there are several ways of ordering the outcomes, but once again an out must come last. So to count them, we can use the choose function (the binomial coefficient), $\binom{m}{k} = \frac{m!}{(m-k)!\,k!}$. With n home runs, the third out is fixed as the final plate appearance, so only the n home runs and the first two outs can be rearranged. That gives m = n + 2 slots, and we choose k = 2 of them for the outs (choosing the n home-run slots instead gives the same number). The result is $\binom{n+2}{2} = \frac{(n+1)(n+2)}{2}$ orderings, each of which has probability $p^n (p')^3$. Taking a look at a chart for the lowest several numbers of successes, we find:

$$
\begin{array}{c|l}
\text{Successes (= runs)} & \text{Probability} \\
\hline
0 & (1)\,(p')^3 \\
1 & (3)\,p\,(p')^3 \\
2 & (6)\,p^2\,(p')^3 \\
3 & (10)\,p^3\,(p')^3
\end{array}
$$
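The counting argument is easy to double-check by brute force. This sketch (with a hypothetical helper name, `count_innings`) lists every sequence of home runs (H) and outs (O) of the right length, keeps only those with exactly three outs whose final plate appearance is the third out, and compares the count with $\binom{n+2}{2}$ from Python's `math.comb`.

```python
from itertools import product
from math import comb


def count_innings(n_hr):
    """Count orderings of n_hr home runs and three outs in which the third
    out is the last plate appearance (the inning ends on that out)."""
    count = 0
    for seq in product("HO", repeat=n_hr + 3):
        if seq.count("O") == 3 and seq[-1] == "O":
            count += 1
    return count


for n in range(6):
    closed_form = (n + 1) * (n + 2) // 2  # (n+1)(n+2)/2, i.e. C(n+2, 2)
    print(n, count_innings(n), closed_form, comb(n + 2, 2))
```

The first four rows reproduce the 1, 3, 6, 10 coefficients in the chart, and the brute-force count keeps agreeing with the closed form as n grows.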

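So the probability of scoring exactly n runs is $\frac{(n+1)(n+2)}{2}\,p^n (p')^3$, and the contribution of n runs to the run expectancy is n times that probability:

$$\frac{n(n+1)(n+2)}{2}\,p^n (p')^3 = \frac{(n^3+3n^2+2n)\,p^n (p')^3}{2} \qquad (1)$$

Now, to find our overall run expectancy, all we have to do is sum these terms over every value of n from 0 to infinity. So we want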


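$$\text{ERI} = \sum_{n=0}^{\infty} \frac{(n^3+3n^2+2n)\,p^n (p')^3}{2} = \frac{(p')^3}{2}\left(\sum_{n=0}^{\infty} n^3 p^n + 3\sum_{n=0}^{\infty} n^2 p^n + 2\sum_{n=0}^{\infty} n p^n\right) \qquad (2)$$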
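Before solving the sums analytically, we can at least confirm numerically that equation (2) lands on 3p/p'. This sketch truncates the infinite sum after a fixed number of terms (an assumption that is harmless for p < 1, since the terms shrink geometrically) and also checks that the probabilities from the chart sum to 1, as they should for a complete set of outcomes. The helper names (`run_term`, `eri_series`, `total_probability`) are hypothetical.

```python
def run_term(n, p):
    """Equation (1): n runs times the probability of exactly n runs."""
    q = 1 - p  # p', the probability of making an out
    return n * (n + 1) * (n + 2) / 2 * p**n * q**3


def eri_series(p, terms=2000):
    """Equation (2), truncated after `terms` terms (fine for p < 1)."""
    return sum(run_term(n, p) for n in range(terms))


def total_probability(p, terms=2000):
    """Probabilities of exactly 0, 1, 2, ... runs; these should sum to 1."""
    q = 1 - p
    return sum((n + 1) * (n + 2) / 2 * p**n * q**3 for n in range(terms))


for p in (0.1, 0.25, 0.5, 0.9):
    print(p, eri_series(p), 3 * p / (1 - p), total_probability(p))
```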

So the question then becomes how to evaluate these infinite sums. If you recall from calculus, a simple geometric series of the form $A_n = A_0 r^n$ converges to $\frac{A_0}{1-r}$ when $|r| < 1$. That holds here for every probability except the trivial case p = 1, where a team that never makes an out scores an infinite number of runs. So we know that

$$\sum_{n=0}^{\infty} p^n = \frac{1}{1-p} = (1-p)^{-1} \qquad (3)$$

To get what we need, we differentiate both sides with respect to p:

$$\sum_{n=1}^{\infty} n p^{n-1} = (1-p)^{-2}$$

Multiplying both sides by p, we have:

$$\sum_{n=0}^{\infty} n p^n = p(1-p)^{-2} \qquad (4)$$
Differentiating equation (4), we get

$$\sum_{n=1}^{\infty} n^2 p^{n-1} = 2p(1-p)^{-3} + (1-p)^{-2}$$

Multiplying both sides by p and putting everything over $(1-p)^3$:

$$\sum_{n=0}^{\infty} n^2 p^n = 2p^2(1-p)^{-3} + (p-p^2)(1-p)^{-3}$$

$$\sum_{n=0}^{\infty} n^2 p^n = (p^2+p)(1-p)^{-3} \qquad (5)$$


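Finally, differentiating equation (5), we get:

$$\sum_{n=1}^{\infty} n^3 p^{n-1} = (2p+1)(1-p)^{-3} + 3(p^2+p)(1-p)^{-4}$$

And after more algebra (writing $(2p+1)(1-p)^{-3}$ as $(2p+1)(1-p)(1-p)^{-4}$):

$$\sum_{n=1}^{\infty} n^3 p^{n-1} = (p+1-2p^2)(1-p)^{-4} + 3(p^2+p)(1-p)^{-4}$$

$$\sum_{n=1}^{\infty} n^3 p^{n-1} = (p^2+4p+1)(1-p)^{-4}$$

Multiplying both sides by p:

$$\sum_{n=0}^{\infty} n^3 p^n = (p^3+4p^2+p)(1-p)^{-4} \qquad (6)$$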
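Equations (3) through (6) can each be spot-checked numerically by truncating the series and comparing it against the closed form. This sketch is only a numerical check at a few hand-picked values of p (nothing special about them), not a proof; the differentiation argument is the proof. The helper names `partial_sum` and `closed_forms` are hypothetical.

```python
def partial_sum(k, p, terms=5000):
    """Truncated series: sum of n**k * p**n for n = 0 .. terms-1."""
    return sum(n**k * p**n for n in range(terms))


def closed_forms(p):
    """Right-hand sides of equations (3)-(6), keyed by the power k of n."""
    q = 1 - p  # p'
    return {
        0: 1 / q,                            # (3)
        1: p * q**-2,                        # (4)
        2: (p**2 + p) * q**-3,               # (5)
        3: (p**3 + 4 * p**2 + p) * q**-4,    # (6)
    }


for p in (0.2, 0.5, 0.8):
    for k, exact in closed_forms(p).items():
        print(f"p={p}, k={k}: series {partial_sum(k, p):.10f}, closed form {exact:.10f}")
```

Python evaluates `0**0` as 1, so the k = 0 series correctly starts with a 1, matching equation (3).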

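Now, using the solutions from equations 4, 5, and 6, we can solve equation 2. Putting every term over $(1-p)^4$ and noting that $(p')^3 (1-p)^{-4} = (1-p)^{-1}$:

$$\begin{aligned}
\text{ERI} &= \frac{(p')^3}{2}\left(\sum n^3 p^n + 3\sum n^2 p^n + 2\sum n p^n\right) \\
&= \frac{(p')^3}{2}\left((p^3+4p^2+p)(1-p)^{-4} + 3(p^2+p)(1-p)^{-3} + 2p(1-p)^{-2}\right) \\
&= \frac{(p')^3}{2}\left((p^3+4p^2+p)(1-p)^{-4} + 3(-p^3+p)(1-p)^{-4} + 2(p-2p^2+p^3)(1-p)^{-4}\right) \\
&= \frac{1}{2}(1-p)^{-1}\left((p^3+4p^2+p) + (-3p^3+3p) + (2p-4p^2+2p^3)\right) \\
&= \frac{1}{2}(1-p)^{-1}(6p) \\
&= \frac{3p}{1-p} = \frac{3p}{p'}
\end{aligned}$$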
Thus we have successfully proven our formula for ERI from the thought experiment above.
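For an exact check of the final algebra, Python's `fractions.Fraction` lets us evaluate equation (2) with the closed forms (4), (5), and (6) in exact rational arithmetic, with no rounding. The helper names (`eri_from_closed_forms`, `bracket`) are hypothetical. Since the bracketed expression just before it collapses to 6p is a polynomial of degree at most 3, agreeing with 6p at the four distinct points tested here is enough to show the two are identical.

```python
from fractions import Fraction


def eri_from_closed_forms(p):
    """Equation (2) evaluated with closed forms (4), (5), (6), exactly."""
    q = 1 - p                               # p'
    s1 = p / q**2                           # (4): sum of n p^n
    s2 = (p**2 + p) / q**3                  # (5): sum of n^2 p^n
    s3 = (p**3 + 4 * p**2 + p) / q**4       # (6): sum of n^3 p^n
    return q**3 / 2 * (s3 + 3 * s2 + 2 * s1)


def bracket(p):
    """The bracketed polynomial from the step just before it becomes 6p."""
    return (p**3 + 4 * p**2 + p) + (-3 * p**3 + 3 * p) + (2 * p - 4 * p**2 + 2 * p**3)


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)):
    assert bracket(p) == 6 * p                          # collapses to 6p
    assert eri_from_closed_forms(p) == 3 * p / (1 - p)  # ERI = 3p/p'
    print(p, eri_from_closed_forms(p))
```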
