13 December 2011

Developing the Homogeneous Model Part II: Beyond the Home Run

Beyond the Home Run
The whole point of working through such an exercise to reach such an obvious result is to cultivate the methods, so that they can then be used on more complex cases with less obvious results. With that in mind, let's turn to the next-simplest case, that of the double/triple. I call it the double/triple because with doubles, triples, or a mix of the two (but no singles, walks, sacrifices, errors, homers, etc.), the first double/triple scores you nothing, but the second and every subsequent one basically guarantees you a run. Under this assumption, everything is the same as in the home run case until we get to equation 1, where we have to replace the multiplication factor of $n$ (which was, if you recall, based on the assumption that 1 success = 1 run) with a factor of $(n-1)$, which reflects our new assumption. Furthermore, we have to deal with the case where $n = 0$, since we score 0 runs there, not $-1$, so the sums now start at $n = 1$. That gives us a summand of $\frac{(n-1)(n+2)(n+1)\,p^n\,(p')^3}{2}$, or:
$$\frac{(p')^3}{2}\left(\sum_{n=1}^{\infty} n^3 p^n + 2\sum_{n=1}^{\infty} n^2 p^n - \sum_{n=1}^{\infty} n p^n - 2\sum_{n=1}^{\infty} p^n\right)$$

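The easiest way to deal with all of these sums starting at 1 rather than 0 is simply to shift $n$ up by one, which turns the summand into $\frac{n(n+3)(n+2)\,p^{n+1}\,(p')^3}{2} = p\,\frac{(n^3+5n^2+6n)\,p^n\,(p')^3}{2}$ and gives:
$$p\,\frac{(p')^3}{2}\left(\sum_{n=0}^{\infty} n^3 p^n + 5\sum_{n=0}^{\infty} n^2 p^n + 6\sum_{n=0}^{\infty} n p^n\right)$$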
And once again plugging in the information from equations 4, 5, and 6, we get

$$\text{ERI}(\text{Doubles/Triples only}) = \frac{p^4 - 4p^3 + 6p^2}{1-p}.$$
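
As a cross-check that avoids the power-sum identities altogether, note that $n-1$ runs score whenever $n \ge 1$, so the expected value is the mean number of successes, minus one, plus a correction for the inning with no successes at all. The sketch below is a different route from the one taken above, not the derivation itself. It writes $P(n)$ for the probability of $n$ successes before the third out (my notation, not the post's) and assumes the home run result carried over from Part I is $3p/(1-p)$, the mean of that same distribution.

```latex
\begin{aligned}
P(n) &= \frac{(n+2)(n+1)}{2}\,p^n (1-p)^3, \qquad P(0) = (1-p)^3 \\
\text{ERI} &= \sum_{n \ge 1} (n-1)\,P(n)
            = \sum_{n \ge 0} (n-1)\,P(n) + P(0) \\
           &= \frac{3p}{1-p} - 1 + (1-p)^3 \\
           &= \frac{3p - (1-p) + (1-p)^4}{1-p}
            = \frac{p^4 - 4p^3 + 6p^2}{1-p}
\end{aligned}
```

Expanding $(1-p)^4 = 1 - 4p + 6p^2 - 4p^3 + p^4$ in the last line cancels the constant and linear terms, leaving the same numerator.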

Skipping singles for a moment, we do the same process for walks (with much more arithmetic in this case, not shown here) and obtain as our final answer

$$\text{ERI}(\text{Walks only}) = \frac{6p^6 - 18p^5 + 15p^4}{1-p}.$$

Now we go back to singles. Singles are the hardest of all 'normal' plays to model. It's very difficult to tell whether a runner on first will 'take the extra base' and advance to third on a single, and it is similarly difficult to tell whether a runner on second will take the extra base and score on a single. (We have the same problem with runners on first scoring on doubles, but when we examined doubles earlier, we used assumptions under which we never had runners on first, a luxury we cannot give ourselves here.) Taking the extra base happens fairly often, and so does not taking it. However, for the sake of consistency and completeness, let us assume that on every trip around the bags, either the runner on first will advance to third or the runner on second will score; in this way, on the third and every subsequent success (in this case successes = singles), a run scores. With this set of assumptions (which inherently overvalues singles somewhat), we can use the same methodology to obtain

$$\text{ERI}(\text{Singles only}) = \frac{3p^5 - 10p^4 + 10p^3}{1-p}.$$
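
The three closed forms above can be sanity-checked numerically. The sketch below sums the series directly, under the assumption that the number of successes before the third out has probability (n+2)(n+1)/2 · p^n · (1-p)^3 and that the first `stranded` successes in an inning never score (1 for doubles/triples, 2 for singles, 3 for walks). The names `eri_by_summation`, `stranded`, and `CLOSED_FORMS` are mine, chosen for illustration only.

```python
from math import comb

def eri_by_summation(p, stranded, terms=2000):
    """Brute-force expected runs per inning.

    p        : probability that a plate appearance is a success
    stranded : number of successes per inning that never score
    terms    : where to truncate the infinite series (an assumption;
               the tail is negligible for the values of p used here)
    """
    q = 1.0 - p
    total = 0.0
    for n in range(stranded + 1, terms):
        # Probability of exactly n successes before the third out.
        prob = comb(n + 2, 2) * p**n * q**3
        total += (n - stranded) * prob
    return total

# Closed forms quoted in the text, keyed by the number of stranded runners.
CLOSED_FORMS = {
    1: lambda p: (p**4 - 4 * p**3 + 6 * p**2) / (1 - p),          # doubles/triples
    2: lambda p: (3 * p**5 - 10 * p**4 + 10 * p**3) / (1 - p),    # singles
    3: lambda p: (6 * p**6 - 18 * p**5 + 15 * p**4) / (1 - p),    # walks
}

if __name__ == "__main__":
    for p in (0.20, 0.33, 0.482):
        for stranded, closed_form in CLOSED_FORMS.items():
            brute = eri_by_summation(p, stranded)
            print(f"p={p:.3f} stranded={stranded} "
                  f"series={brute:.6f} closed={closed_form(p):.6f}")
```

The two columns should agree to within floating-point error for any p comfortably below 1; as p approaches 1, the truncation point `terms` would need to grow.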

It is interesting to note that what we have just calculated, strictly, is the number of runs you would expect to score in a game similar to baseball, except that you can only walk or strike out, never making an out or advancing in any other way, and the field has only home plate (for the home run ERI), one other base (double/triple), two other bases (single), or the three other bases that we actually have (walks, naturally).
With these four sets of ERIs, we can already do a significant amount of qualitative and semi-quantitative analysis. For instance, we can take Ted Williams' record career on-base percentage of .482, ogle its sheer enormity for a second, plug it into our ERI formula for walks, and find out that, if you took all of The Splendid Splinter's hits and downgraded them from whatever they were to walks, even then, a lineup full of Ted Williamses would still score over .8 runs per inning. For reference, modern major league teams are scoring a touch over half a run per inning on average, and this drained version of Teddy Ballgame would barely score fewer runs than the '31 Yankees, the highest-scoring team of the modern era.
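
The Ted Williams figure is just the walks-only formula evaluated at p = .482. A minimal sketch of that arithmetic follows; the function name `eri_walks_only` is hypothetical and simply wraps the formula quoted earlier.

```python
def eri_walks_only(p):
    """Walks-only expected runs per inning, using the closed form from the text."""
    return (6 * p**6 - 18 * p**5 + 15 * p**4) / (1 - p)

williams_obp = 0.482  # career on-base percentage cited in the text

if __name__ == "__main__":
    # Prints roughly 0.804, i.e. "over .8 runs per inning".
    print(round(eri_walks_only(williams_obp), 3))
```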
You can also compare the ERIs for various values of p. At the low end, it's clear that slugging percentage matters more to scoring than on-base percentage, which makes sense given that successes come so rarely: you need to make sure to score when you get the chance. But at the high end, OBP totally dwarfs any SLUG considerations. (As p rises, the absolute, i.e. subtracted, difference between the expected run-scoring of two different hit types grows, but the leveraged, i.e. divided, difference shrinks dramatically.) And since OBP is p and therefore sets which part of the scale your offense is on, this gives it some extra weight. In fact, in fiddling around with some numbers, you can see that modern Major League Baseball sits in a bit of a sweet spot, a not-so-large range of values where there are real, significant, plausible trade-offs. With a little more potency, on-base would clearly be more important, and slugging would cease to matter nearly as much. And with a bit less, round-trippers would take on extreme importance.
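
One way to do this fiddling is to tabulate the two extreme hit types, home runs and walks, across a range of p and look at both the subtracted and the divided difference. The sketch below is only an illustration: it sums the series directly rather than using the closed forms, it assumes the same success distribution as the rest of the model, and the names `eri` and `hr_vs_walk` and the sample values of p are mine.

```python
from math import comb

def eri(p, stranded, terms=2000):
    """Expected runs per inning when the first `stranded` successes never score.

    stranded=0 is the home run case and stranded=3 is the walk case.
    The series is truncated at `terms`, which is ample for p up to about 0.9.
    """
    q = 1.0 - p
    return sum((n - stranded) * comb(n + 2, 2) * p**n * q**3
               for n in range(stranded + 1, terms))

def hr_vs_walk(p):
    """Return (absolute gap, ratio) between home-run-only and walk-only ERI."""
    hr, walk = eri(p, 0), eri(p, 3)
    return hr - walk, hr / walk

if __name__ == "__main__":
    for p in (0.20, 0.33, 0.50, 0.70):
        gap, ratio = hr_vs_walk(p)
        print(f"p={p:.2f}  gap={gap:.3f}  ratio={ratio:.1f}")
```

Under these assumptions the gap column rises with p while the ratio column falls steeply, which is the pattern described above.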
