Combining the run distributions
Once we have the run distributions for the individual single-type events (see “How to Determine the Absolute Value of a Home Run”), we can address the biggest flaw of these models: each one captures only a single type of event, with no evident way to combine them.
As a personal aside, I was stuck on this part of the model for some time, trying various ways of combining the distributions with limited success. There was no particular logic in these combinations, at least nothing as clear as the logic behind building the models in the first place. And then it dawned on me that the correct way was quite simple and right in front of my face.
The key insight is that, in an inning with four or more successes, the fourth and every subsequent success is a run, as we can see encapsulated in the walks distribution, and that any additional runs must come from some combination of the last three events. So, starting with the walk distribution, we add, weighted by the probability that there are at least three successes, three runs times the probability that the last success was a homer, plus two runs when the second-to-last success was a homer (multiplied by the appropriate probability), and so on. Continuing to make our same assumptions (everyone in the lineup is exactly the same, the sequencing of events is totally random, no extra bases are taken on outs, and the extra base is ALWAYS taken on a single or double), we eventually get this:
3S adjustment = (1- (1-O)^3*(1+3*O+6*O^2)) * (3*O^2*H+2*O*H*(W+S+D)+2*O*(W+S+D)*D+2*O*D*S+O*D*W+O*S^2+H*W^2+2*H*W*S+W*S^2+W^2*S+W*S*D) /(O^3)
“3S” stands for three successes
O is On-Base-Percentage
W is Walk rate (BB/PA)
S is Singles rate (1B/PA)
D is Doubles rate (2B/PA)
H is Home run rate (HR/PA)
The first factor of the adjustment is the probability that there are at least three successes. The large middle factor is the expected number of runs scored by the last three successes: each way of scoring one, two, or three runs, weighted by the number of runs scored and by the probability of the particular events that lead to that outcome. The final factor, the division by O^3, corrects the overall probabilities for the fact that we already know those three plate appearances were successes.
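To see where the first factor comes from, here is a minimal Python sketch. It assumes every plate appearance is an independent success with probability O and that the inning ends on the third out, so the k successes and the first two outs can be ordered in C(k+2, 2) ways. The function names are mine, chosen for illustration.

```python
from math import comb

def prob_exactly(k, O):
    """Chance of exactly k successes before the third out of an inning."""
    # The third out is always last; the k successes and the other two outs
    # can be arranged in C(k+2, 2) ways.
    return comb(k + 2, 2) * O**k * (1 - O)**3

def prob_at_least_three(O):
    """Complement of having zero, one, or two successes."""
    return 1 - sum(prob_exactly(k, O) for k in range(3))

def first_factor(O):
    """The first factor of the 3S adjustment, as written in the formula."""
    return 1 - (1 - O)**3 * (1 + 3*O + 6*O**2)

if __name__ == "__main__":
    O = 0.32  # illustrative on-base percentage, not taken from the post
    print(prob_at_least_three(O), first_factor(O))
```

The two printed values should agree, since 1, 3*O, and 6*O^2 are just C(2,2), C(3,2)*O, and C(4,2)*O^2.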
We must also add in the run-scoring we get from innings with only one or two successes:
1S adjustment = 3*H*(1-O)^3
2S adjustment = 6*(1-O)^3*(D^2+2*S*D+2*O*H+H*(W+S+D)+W*D)
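One way to sanity-check these two adjustments is to enumerate every possible one- and two-success inning with a tiny base-running routine. The Python sketch below is my own reading of the stated assumptions: a double scores every runner, a single scores runners from second and third and sends a runner from first to third, a walk only forces runners, and O = W+S+D+H (no triples or other ways of reaching base). The helper names and sample rates are illustrative only.

```python
from itertools import product
from math import comb

def advance(bases, event):
    """bases = (first, second, third) occupancy; returns (new_bases, runs)."""
    first, second, third = bases
    if event == "H":   # homer: batter and every runner score
        return (False, False, False), 1 + first + second + third
    if event == "D":   # double: every runner scores, batter on second
        return (False, True, False), first + second + third
    if event == "S":   # single: 2nd and 3rd score, 1st goes to 3rd
        return (True, False, first), second + third
    # walk: runners move only when forced
    if first and second:
        return (True, True, True), int(third)
    if first:
        return (True, True, third), 0
    return (True, second, third), 0

def runs_with_k_successes(k, rates):
    """Expected runs contributed by innings with exactly k successes."""
    O = sum(rates.values())
    total = 0.0
    for seq in product(rates, repeat=k):
        bases, runs, prob = (False, False, False), 0, 1.0
        for event in seq:
            bases, scored = advance(bases, event)
            runs += scored
            prob *= rates[event]
        total += runs * prob
    # Outs move nobody, so only the order of the successes matters.
    return comb(k + 2, 2) * (1 - O)**3 * total

def one_success_adjustment(W, S, D, H):
    O = W + S + D + H
    return 3*H*(1 - O)**3

def two_success_adjustment(W, S, D, H):
    O = W + S + D + H
    return 6*(1 - O)**3*(D**2 + 2*S*D + 2*O*H + H*(W + S + D) + W*D)

if __name__ == "__main__":
    r = {"W": 0.08, "S": 0.15, "D": 0.045, "H": 0.03}  # made-up rates
    print(runs_with_k_successes(1, r), one_success_adjustment(*r.values()))
    print(runs_with_k_successes(2, r), two_success_adjustment(*r.values()))
```

Each printed pair should match, which is a quick way to confirm that the closed forms count every scoring sequence once.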
So overall, we get:
(6*O^6-18*O^5+15*O^4)/(1-O) + 3*H*(1-O)^3 + 6*(1-O)^3*(D^2+2*S*D+2*O*H+H*(W+S+D)+W*D) + (1- (1-O)^3*(1+3*O+6*O^2)) * (3*O^2*H+2*O*H*(W+S+D)+2*O*(W+S+D)*D+2*O*D*S+O*D*W+O*S^2+H*W^2+2*H*W*S+W*S^2+W^2*S+W*S*D) /(O^3)
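For anyone who wants to plug in numbers, here is a hedged Python transcription of the overall expression exactly as written above. It assumes O = W+S+D+H and 0 < O < 1. The series function is my own addition: it sums (n-3) runs over innings with n successes, which is how I read the walks distribution, so that the closed-form first term can be checked against it.

```python
from math import comb

def walk_term(O):
    """First term of the total: expected runs if every success were a walk."""
    return (6*O**6 - 18*O**5 + 15*O**4) / (1 - O)

def walk_term_series(O, max_successes=400):
    """Same quantity summed directly: n successes score n-3 runs for n >= 4."""
    return sum((n - 3) * comb(n + 2, 2) * O**n * (1 - O)**3
               for n in range(4, max_successes + 1))

def expected_runs_per_inning(W, S, D, H):
    """Walk term plus the 1S, 2S, and 3S adjustments, transcribed as written."""
    O = W + S + D + H  # assumption: no other way of reaching base
    one_s = 3*H*(1 - O)**3
    two_s = 6*(1 - O)**3*(D**2 + 2*S*D + 2*O*H + H*(W + S + D) + W*D)
    last_three = (3*O**2*H + 2*O*H*(W + S + D) + 2*O*(W + S + D)*D
                  + 2*O*D*S + O*D*W + O*S**2 + H*W**2 + 2*H*W*S
                  + W*S**2 + W**2*S + W*S*D)
    three_s = (1 - (1 - O)**3*(1 + 3*O + 6*O**2)) * last_three / O**3
    return walk_term(O) + one_s + two_s + three_s

if __name__ == "__main__":
    # Made-up rates for illustration, not taken from the post.
    print(expected_runs_per_inning(W=0.08, S=0.15, D=0.045, H=0.03))
```

With S, D, and H all set to zero, every adjustment vanishes and the function returns the walk term alone, which is a handy check on the transcription.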
This result is in agreement with the Markov-built model constrained to the same assumptions used to build our model.
Note: for a good Markov chain model, I recommend http://tangotiger.net/markov.html