Baseball - William & Mary Mathematics

Baseball Statistics
Joseph Mark
October 6, 2009
History of Baseball
• Germans – Schlagball
• English – Rounders
– 1745 referenced as base ball
– Formalized rules in 1884
• Pitched like a softball
• 9 players field, unlimited bat
Baseball in America
• Abner Doubleday (1839)
– Mills report in 1908
• Alexander Cartwright (1845)
– Formalized rules
Stats to know
• At Bats (AB) – batting appearances, not including bases on balls, hit by
pitch, sacrifice hits (bunts), sacrifice flies, and catcher's interference or
obstruction
• Plate Appearances (PA) – number of completed batting appearances, no
matter the result (at-bats + walks + hit-batsmen + sacrifice hits (bunts) +
sacrifice flies + catcher's interference/obstruction)
• Hits (H) – times reached base because of a batted fair ball without an
error by the defense
• Runs (R) – times reached home plate legally and safely
• Total Bases (TB) – 1 * singles + 2 * doubles + 3 * triples + 4 * home runs
• Sacrifice Fly (SF) – number of fly-ball outs that allow a runner to score
• Sacrifice Hit (SH) – a deliberate hit allowing a runner to advance
• Strikeout (K) – number of times put out by recording three strikes
• Walk (BB) – number of times reached base by receiving four balls
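The counting definitions above combine mechanically. The following is a minimal Python sketch, assuming a hypothetical `BattingLine` record; the field names and the sample season line are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One batter's counting stats (field names are illustrative only)."""
    ab: int          # at bats
    bb: int          # walks
    hbp: int         # hit by pitch
    sh: int          # sacrifice hits (bunts)
    sf: int          # sacrifice flies
    ci: int          # catcher's interference / obstruction
    singles: int
    doubles: int
    triples: int
    home_runs: int

    @property
    def plate_appearances(self) -> int:
        # PA = at bats plus every completed appearance that is not an at bat
        return self.ab + self.bb + self.hbp + self.sh + self.sf + self.ci

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # TB = 1 * singles + 2 * doubles + 3 * triples + 4 * home runs
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)


# Made-up season line, for illustration only.
line = BattingLine(ab=500, bb=60, hbp=5, sh=2, sf=4, ci=0,
                   singles=90, doubles=30, triples=3, home_runs=25)
print(line.plate_appearances, line.hits, line.total_bases)
```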
Stats to know
• Innings Pitched (IP) – number of outs recorded pitching / 3
• Earned Run Average (ERA) – earned runs * 9 / innings pitched
• Complete Game (CG) – # of times a pitcher was the only pitcher
for his team
• Shutout (SHO) - # of complete games allowing zero runs
• Save (Sv)
• Win (W) - number of games where pitcher was pitching while his
team took the lead and went on to win
• Walks + Hits per Inning Pitched (WHIP)
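The pitching rate stats above all hang off innings pitched, so it helps to work in outs. A small Python sketch follows, assuming that a value such as 228.2 uses the usual box-score convention (228 innings plus 2 outs, not 228.2 decimal innings). The function names and sample numbers are made up for illustration.

```python
def outs_from_ip_notation(ip_text: str) -> int:
    """Convert box-score innings such as '228.2' (228 innings + 2 outs) to outs."""
    whole, _, frac = ip_text.partition(".")
    extra_outs = int(frac) if frac else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError("the digit after the point must be 0, 1 or 2")
    return int(whole) * 3 + extra_outs


def innings_pitched(outs: int) -> float:
    # IP = number of outs recorded pitching / 3
    return outs / 3


def era(earned_runs: int, outs: int) -> float:
    # ERA = earned runs * 9 / innings pitched
    return earned_runs * 9 / innings_pitched(outs)


def whip(walks: int, hits: int, outs: int) -> float:
    # WHIP = (walks + hits) / innings pitched
    return (walks + hits) / innings_pitched(outs)


# Made-up pitcher: 200 innings (600 outs), 70 earned runs, 50 walks, 150 hits.
print(outs_from_ip_notation("228.2"))      # 686 outs
print(round(era(70, 600), 2))              # 3.15
print(round(whip(50, 150, 600), 3))        # 1.0
```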
Basic Baseball Stats
• Batting Average (AVG) – hits / at bats
• Home Runs (HRs)
• Runs Batted In (RBIs)
• Earned Run Average (ERA) – ER * 9 / IP
• Strikeout (K)
• Wins (W)
Importance of Statistics in Baseball
• Game of individual matchups
– Pitcher vs. Hitter
Nolan Ryan vs.    AB    H
Gates Brown       24    0
Lonnie Smith      24   12
“Stats don’t lie, but they don’t tell
the whole truth”
W    IP      ERA    WHIP    K     CG   SHO   WP
17   228.2   3.15   1.067   214   4    3     6
20   220.1   3.51   1.257   213   0    0     14
Slide callout: "League high"
“Stats don’t lie, but they don’t tell
the whole truth”
Player A -
.251 batting average
199 strikeouts
Player B -
.316 batting average
57 strikeouts
But what we didn’t tell you
• Player A –
48 home runs
81 walks
• Player B –
9 home runs
23 walks
So we have more stats
• On-base percentage (OBP)
– (H + BB + HBP) / (AB + BB + HBP + SF)
• Slugging Percentage (SLG%)
– Total bases / at bats
• On Base Plus Slugging (OPS)
– OBP + SLG
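The three rate stats above can be written directly from their definitions. This is a minimal Python sketch; the function names and the sample season line are illustrative, and a real implementation would also want to guard against zero denominators.

```python
def batting_average(h: int, ab: int) -> float:
    # AVG = hits / at bats
    return h / ab


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(total_bases: int, ab: int) -> float:
    # SLG = total bases / at bats
    return total_bases / ab


def ops(h: int, bb: int, hbp: int, ab: int, sf: int, total_bases: int) -> float:
    # OPS = OBP + SLG
    return (on_base_percentage(h, bb, hbp, ab, sf)
            + slugging_percentage(total_bases, ab))


# Made-up season line, for illustration only.
print(round(batting_average(150, 500), 3))
print(round(on_base_percentage(150, 60, 5, 500, 5), 3))
print(round(slugging_percentage(250, 500), 3))
print(round(ops(150, 60, 5, 500, 5, 250), 3))
```

Keeping each stat as its own function mirrors the slide: OPS is nothing more than the sum of the two stats defined before it.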
Picking a stat is hard to do
How do we pick just one?!
Sabermetrics
• Bill James – Society for American Baseball Research
– Bill James defined sabermetrics as "the search for objective
knowledge about baseball." Thus, sabermetrics attempts to
answer objective questions about baseball, such as "which
player on the Red Sox contributed the most to the team's
offense?" or "How many home runs will Ken Griffey Jr. hit
next year?" It cannot deal with the subjective judgments
which are also important to the game, such as "Who is your
favorite player?" or "That was a great game."
Usefulness of Sabermetrics
• Shortcomings of batting average, home runs,
and RBIs
• Better predictor of future performance
• Runs Created
• Markov Runs per Game
• Win Shares
Runs Created
• “With regard to an offensive player, the first key
question is how many runs have resulted from
what he has done with the bat and on the
basepaths. Willie McCovey hit .270 in his career,
with 353 doubles, 46 triples, 521 home runs and
1,345 walks -- but his job was not to hit doubles,
nor to hit singles, nor to hit triples, nor to draw
walks or even hit home runs, but rather to put
runs on the scoreboard. How many runs
resulted from all of these things?”
SABR stats
• Runs Created
A = On-base factor
B = Advancement factor
C = Opportunity factor
– RC/27: (RC x 3 x LgIP) / (2 x LgG) / (AB – H + SH + SF + CS +
GDP)
• Compare each player’s contribution over 1 game
• Win Share
– Measure of total performance, cumulative
• Markov RPG
– takes into account runs scored, on base (hits + walks + hit by
pitch), total bases, runs batted in, and stolen bases using (stolen
bases * stolen bases) / stolen base attempts
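The slide names the three Runs Created factors but does not show how they combine. The Python sketch below assumes Bill James's basic version, RC = A × B / C with A = H + BB, B = TB and C = AB + BB, which is the simplest member of the family and not necessarily the variant used for the results in this talk. RC/27 follows the formula on the slide, with LgG read as league games counted once each (an assumption). All sample numbers are made up.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (on-base factor * advancement factor) / opportunity factor."""
    a = h + bb        # A: on-base factor
    b = tb            # B: advancement factor
    c = ab + bb       # C: opportunity factor
    return a * b / c


def rc_per_27(rc: float, lg_ip: float, lg_g: int,
              ab: int, h: int, sh: int, sf: int, cs: int, gdp: int) -> float:
    """RC/27 = (RC x 3 x LgIP) / (2 x LgG) / (AB - H + SH + SF + CS + GDP)."""
    outs_per_team_game = 3 * lg_ip / (2 * lg_g)   # close to 27 in practice
    outs_made = ab - h + sh + sf + cs + gdp
    return rc * outs_per_team_game / outs_made


# Made-up batter and league totals, for illustration only.
rc = runs_created_basic(h=150, bb=60, tb=250, ab=500)
print(rc)  # 93.75
print(round(rc_per_27(rc, lg_ip=43740, lg_g=2430,
                      ab=500, h=150, sh=0, sf=5, cs=3, gdp=12), 2))
```

Dividing by the batter's outs and scaling by the outs in a team game is what lets RC/27 compare players with very different amounts of playing time.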
Comparison of Runs Created vs. Actual
Runs Scored by All 30 Major League
teams in 2008
Win Shares, please share
Win Shares Explained
• First, you divide responsibility for a team's wins between the offense (batting and baserunning) and defense
(pitching and fielding). You do this by calculating the team run differential through a method James calls Marginal
Runs. You first calculate the average number of runs scored per team in the league. You next adjust your team's
runs scored and runs allowed for the ballpark in which they played half their games (i.e. home games). Then you
add together two figures: all runs scored over 52% of the league average (credited to the offense), and all runs
allowed less than 152% of the league average (credited to the defense). This is total marginal runs.
• Next, you take the percent of marginal runs contributed by the offense and multiply it by the number of wins times
three. This is the total number of offensive Win Shares. You do the same thing for defensive Win Shares.
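The team-level split described so far is plain arithmetic. This is a rough Python sketch, assuming runs scored and runs allowed have already been park-adjusted; the function name and the sample team are hypothetical.

```python
def team_win_shares(runs_scored: float, runs_allowed: float,
                    league_avg_runs: float, wins: int):
    """Split a team's Win Shares (3 per win) between offense and defense.

    Inputs are assumed to be park-adjusted already; that step is not shown.
    """
    # Offense gets credit for runs scored above 52% of the league average.
    offense_marginal = runs_scored - 0.52 * league_avg_runs
    # Defense gets credit for runs allowed below 152% of the league average.
    defense_marginal = 1.52 * league_avg_runs - runs_allowed
    total_marginal = offense_marginal + defense_marginal

    total_win_shares = 3 * wins
    offense_ws = total_win_shares * offense_marginal / total_marginal
    defense_ws = total_win_shares * defense_marginal / total_marginal
    return offense_ws, defense_ws


# Made-up team: 800 runs scored, 700 allowed, league average 750, 90 wins.
off_ws, def_ws = team_win_shares(800, 700, 750, 90)
print(round(off_ws, 1), round(def_ws, 1))
```

Whatever the split, the two parts always add up to three Win Shares per team win.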
• Next, you attribute offensive Win Shares to individual players. This is done through two key metrics: Runs Created
and Outs Made. Runs Created is a formula built by James and refined over the years. It starts with the basic
equation of OBP times total bases and then adds player credit for other factors, including stolen bases, caught
stealing, grounding into double plays, batting average and home runs with runners in scoring position and the
kitchen sink. Runs Created is calculated for every single batter, including pitchers (if they're in the National
League).
• Next, you subtract the league "background" Runs Created (52% of the league average) from each player's Runs
Created based on the number of Outs Made by that batter, adjust it for ballpark, and credit each player with the
result; essentially individual marginal runs created. Add these up for all players and use each player's percentage
of the whole to allocate offensive Win Shares to each. Note that any player whose Runs Created are less than
52% of the league average runs created per out is credited with no Win Shares. This doesn't happen very often
(except for pitchers).
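The allocation to individual batters can be sketched the same way. The Python below is a simplified illustration only: it skips the ballpark adjustment, uses the league's Runs Created per out as the baseline, and uses made-up players, so it is not James's full procedure.

```python
def allocate_offensive_win_shares(players: dict, league_rc_per_out: float,
                                  offensive_win_shares: float) -> dict:
    """Share out offensive Win Shares by marginal Runs Created.

    `players` maps a name to (runs_created, outs_made).
    The park adjustment described in the text is omitted here.
    """
    marginal = {}
    for name, (runs_created, outs_made) in players.items():
        # Background level: 52% of what a league-average batter creates
        # in the same number of outs.
        background = 0.52 * league_rc_per_out * outs_made
        # Batters below the background level get no credit.
        marginal[name] = max(0.0, runs_created - background)

    total = sum(marginal.values())
    if total == 0:
        return {name: 0.0 for name in players}
    return {name: offensive_win_shares * m / total
            for name, m in marginal.items()}


# Made-up roster, for illustration only.
roster = {"Batter X": (100, 400), "Batter Y": (50, 350), "Pitcher Z": (1, 60)}
shares = allocate_offensive_win_shares(roster, league_rc_per_out=0.18,
                                       offensive_win_shares=30)
for name, ws in shares.items():
    print(name, round(ws, 2))
```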
• That was the easy part. Now you've got to deal with the defense. The first step is to divide defensive Win Shares
between pitching and fielding. This is done through a complicated formula that accounts for FIP elements that can be
attributed only to pitchers (home runs, walks and strikeouts) as well as a team's DER (Defensive Efficiency Ratio,
adjusted for the ballpark) and other fielding statistics such as passed balls, errors and double plays. Typically,
about 70% of defensive Win Shares are credited to pitching, and 30% to fielding. The Win Shares system is bounded
so that pitching is never credited with less than 60%, or more than 75%, of defensive Win Shares.
• Next, you allocate pitching Win Shares to individual pitchers. This is accomplished through an even more
complicated formula that starts with each pitcher's marginal runs not allowed (same approach as team marginal
runs not allowed), wins, losses and saves. Special consideration is given to relievers by estimating the number of
high-leverage innings they pitched (ninth innings with one-run leads are more important than first innings with no
score) and something called "Component ERA" which is essentially ERA re-calculated according to the actual
underlying run elements.
Continued…
• Finally, pitchers are deducted Win Shares if they are absolutely lousy hitters. Call this the "Dean Chance" factor.
All these elements are then mixed together in a complicated formula to allocate pitching Win Shares to individual
pitchers. As in offensive Win Shares, any pitcher who gives up more than 152% of league-average Runs Scored
(adjusted for ballpark) does not receive any credit for pitching Win Shares.
• One note: responsibility for unearned runs is split 50/50 between pitching and fielding.
• Which leads us to the next, most complicated step: allocating fielding Win Shares to fielding positions, and then to
individual fielders. The calculations differ for each position. Essentially, James has selected four defensive
statistics to evaluate positions. Here they are by position, listed in order of importance:
– Catchers: Caught Stealing, Errors, Passed Balls and Sacrifice Hits Allowed
– First Basemen: Plays Made, Errors, Arm Rating and Errors by third basemen and shortstops
– Second Basemen: Double Plays, Assists, Errors and Putouts
– Shortstops: Assists, Double Plays, Errors and Putouts
– Third Basemen: Assists, Errors, Sacrifice Hits Allowed and Double Plays
– Outfielders: Putouts, Team DER, Arm Elements and Assists and Errors
• Lots of things to note about the fielding calculations.
– First, the statistics are adjusted based on the number of innings a lefthander pitches for the team, which has
an impact on which side of the field batters hit the ball to.
– Second, these stats are calculated as a proportion of the team's total, divided by the league-average
proportions of the total. In other words, if a shortstop has 50 assists and his team has 100 assists in total, he
receives just as much credit as the shortstop who has 100 assists and plays on a team with 200 assists in
total. This is important, because it adjusts the fielding stats for the fact that fielders may be playing behind
pitchers with certain tendencies such as giving up more ground balls vs. fly balls.
– Third, double plays are only factored in as a proportion of potential double plays. If teams don't have a lot of
runners on first, they have less of a chance to turn double plays, and Win Shares takes this into account.
– Fourth, team DER is used to credit outfielders with fielding Win Shares because it is James' observation that
outfielders have a much larger impact on DER than infielders. James acknowledges that there is some
"circular logic" here.
– Fifth, there is a final element included in the formula to allocate fielding Win Shares to individual fielders.
This element is called "Range Bonus Play." It particularly impacts outfielders in the following manner: if one
outfielder handles more opportunities per inning played than the other outfielders on the team, he will be
credited with more fielding Win Shares. This especially impacts centerfielders, who typically handle more
chances per inning played than the corner outfielders.
Markov RPG
[Figure: NPERA vs. OPS Against. x-axis: NPERA, 0 to 7; y-axis: OPS Against, 0 to 1.2]
[Figure: NPERA vs ERA. x-axis: NPERA, 0 to 7; y-axis: ERA, 0 to 10]
Correlation between various individual hitter statistics

              OPS     RC      RC/27   Win Shares   Markov RPG
OPS           1
RC            .912    1
RC/27         .967    .937    1
Win Shares    .752    .811    .778    1
Markov RPG    .972    .906    .982    .749         1
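A correlation table like this one is built from pairwise correlations between the stat columns. The slides do not say which coefficient was used; the Python sketch below assumes the ordinary Pearson correlation and uses made-up numbers, with hypothetical function names.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns: dict) -> dict:
    """Lower-triangular table: each stat against itself and the stats listed before it."""
    names = list(columns)
    table = {}
    for i, row in enumerate(names):
        for col in names[: i + 1]:
            table[(row, col)] = pearson(columns[row], columns[col])
    return table


# Made-up values for four hitters, for illustration only.
stats = {
    "OPS": [0.700, 0.780, 0.850, 0.920],
    "RC/27": [3.9, 5.1, 6.0, 7.4],
    "Win Shares": [9, 18, 15, 27],
}
for pair, r in correlation_table(stats).items():
    print(pair, round(r, 3))
```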
Mythbusters: The Contract Year
• It is a commonly held belief that players
perform better during the final year of their
contract in the hopes that a good year will
enable them to sign a lucrative new deal
Difference in Means Testing
Contrasts      Batting Average   HR/PA      OPS        Runs Created/27   Markov RPG
               P-Value           P-Value    P-Value    P-Value           P-Value
All Players    0.502779          0.628555   0.250987   0.144358          0.330044
A - Players    0.842572          0.938286   0.772146   0.034324          0.907109
B - Players    0.333571          0.589378   0.070181   0.000004          0.145831
• RC/27 returns significant results for A players
when tested alone and B players when tested
alone. These results show that the mean for A
players increases, on average, from 2.991 to
3.344 RC/27, whereas B players tend to
decrease, on average, from 1.824 to 1.375. The
fact that these two groups of players have a
tendency to move in opposite directions in this
respect explains why the results are not
statistically significant when compared en
masse.
• OPS and Markov RPG actually increase AFTER
signing a new contract!
Mythbusters 2: Waiting for your Pitch
• Another commonly held perception is that
batters that “wait for their pitch” are more
likely to get a hit and when they do hit the
ball, it will go farther (perhaps resulting in
more home runs)
Regression using Pitches per Plate Appearance to
predict OPS
The regression equation is
OPS = 0.435 + 0.0833 P/PA

Predictor   Coef      SE Coef   T      P
Constant    0.43478   0.07606   5.72   0.000
P/PA        0.08328   0.01988   4.19   0.000

S = 0.101122   R-Sq = 4.5%   R-Sq(adj) = 4.2%
Regression using Pitches per Plate Appearance to
predict RPG
The regression equation is
Markov RPG = 0.513 + 1.17 P/PA

Predictor   Coef     SE Coef   T      P
Constant    0.5125   0.9828    0.52   0.602
P/PA        1.1659   0.2568    4.54   0.000

S = 1.30664   R-Sq = 5.2%   R-Sq(adj) = 5.0%
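The two fitted lines can be evaluated directly, and each T value in the output is the coefficient divided by its standard error. This short Python sketch uses the coefficients reported above; the function names are illustrative. With R-Sq near 5%, any single prediction is very rough.

```python
def predict_ops(pitches_per_pa: float) -> float:
    # OPS = 0.43478 + 0.08328 * P/PA (coefficients from the regression output)
    return 0.43478 + 0.08328 * pitches_per_pa


def predict_markov_rpg(pitches_per_pa: float) -> float:
    # Markov RPG = 0.5125 + 1.1659 * P/PA
    return 0.5125 + 1.1659 * pitches_per_pa


def t_ratio(coef: float, se_coef: float) -> float:
    # T = Coef / SE Coef
    return coef / se_coef


# Recompute the T column from Coef and SE Coef.
print(round(t_ratio(0.43478, 0.07606), 2))   # 5.72
print(round(t_ratio(0.08328, 0.01988), 2))   # 4.19
print(round(t_ratio(0.5125, 0.9828), 2))     # 0.52
print(round(t_ratio(1.1659, 0.2568), 2))     # 4.54

# Predicted gap between a patient (4.1 P/PA) and an impatient (3.5 P/PA) hitter.
print(round(predict_ops(4.1) - predict_ops(3.5), 3))
```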
Correlations of A&B Groups of Players w/
OPS and RPG for 2008 season
Comparison                          Correlation   P-value
A Players PPA vs. A Players OPS     .246          .003
A Players PPA vs. A Players RPG     .260          .001
B Players PPA vs. B Players OPS     .204          .002
B Players PPA vs. B Players RPG     .223          .001
Can We Predict Walks?
Test (1) vs (2)        Mean PPA (1) / (2)   Mean Walks (1) / (2)   Est. Diff in Means (Total Walks)   T      P-Value
Top 1/3 vs Mid 1/3     4.109 / 3.814        49.6 / 41.0            8.55                               2.87   .004
Mid 1/3 vs Bot 1/3     3.814 / 3.530        41.0 / 27.9            13.10                              5.62   .000
Top 1/3 vs Bot 1/3     4.109 / 3.530        49.6 / 27.9            21.65                              8.35   .000
How About Home Runs?
• Divided players into thirds according to number
of pitches seen per at bat.
– Those who saw the most pitches per at bat form the top third,
those who saw the fewest form the bottom third, and the rest
form the middle third.
• Players in the top group hit on average .03055
home runs per plate appearance, slightly higher
than the .02938 of the middle group, and both are
significantly higher than the .02282 of the bottom
group.
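The grouping into thirds can be sketched as a sort followed by two cuts. The Python below is an illustration with made-up players. It assumes any remainder goes to the middle group and that a group's rate is the average of its players' individual HR per plate appearance; the slides do not specify either choice.

```python
def split_into_thirds(players, key):
    """Rank players from high to low by `key` and cut the ranking into three groups."""
    ranked = sorted(players, key=key, reverse=True)
    k = len(ranked) // 3
    # Any remainder lands in the middle group (an assumption).
    return ranked[:k], ranked[k:len(ranked) - k], ranked[len(ranked) - k:]


def mean_hr_rate(group) -> float:
    """Average of each player's home runs per plate appearance."""
    return sum(p["hr"] / p["pa"] for p in group) / len(group)


# Made-up players: pitches per plate appearance, home runs, plate appearances.
players = [
    {"name": "P1", "ppa": 4.20, "hr": 20, "pa": 600},
    {"name": "P2", "ppa": 3.50, "hr": 8, "pa": 500},
    {"name": "P3", "ppa": 3.90, "hr": 15, "pa": 550},
    {"name": "P4", "ppa": 4.05, "hr": 18, "pa": 580},
    {"name": "P5", "ppa": 3.60, "hr": 12, "pa": 520},
    {"name": "P6", "ppa": 3.80, "hr": 16, "pa": 560},
]

top, middle, bottom = split_into_thirds(players, key=lambda p: p["ppa"])
for label, group in (("top", top), ("middle", middle), ("bottom", bottom)):
    print(label, [p["name"] for p in group], round(mean_hr_rate(group), 5))
```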
So should you wait for your pitch?
Summary of Changes 2007-2008
               ∆PPA     ∆Walks   ∆HRs      ∆OPS      ∆RPG
Total          .0326*   -.24     -.675     -.0176*   -.2241*
Increase PPA   .1725*   3.01*    -1.064*   -.019*    -.198
Decrease PPA   -.15*    -4.48*   -.168     -.01615   -.258*
* Indicates significance at 5%
Conclusions
• Changing the number of pitches seen per
plate appearance does not necessarily
increase a player’s raw performance
measures. Rather, a player whose pitches per
plate appearance increase from one year to
the next tends to show a more favorable
change in performance than a player whose
pitches per plate appearance decrease.
Conclusions
• Players' performance is not significantly
better during a contract year; in fact, it may
actually be worse.
• Increasing the number of pitches you see
does not increase performance
– However, you will walk more
– If you see fewer pitches, you are more likely
to do worse
Baseball keeps stats for
EVERYTHING
• Hitting Stats
– Singles, doubles, triples, G/F, GIDP, HBP,
LOB, R, SF, SB, TB
• Pitching Stats
– ERA, WHIP, GF, GS, K/9, BB/9, HLD, IBB, IP,
CG, SHO, SV, SVO, WP
• Fielding Stats
– PO, A, TC, E
References and Works Cited
• Cover, Thomas, and Carroll Keilers, “An Offensive Earned-Run Average for
Baseball,” Operations Research, Vol. 25, No. 5, September-October 1977,
pp. 729-740.
• ESPN MLB Team Stats, ESPN Internet Ventures, 2009,
http://sports.espn.go.com/mlb/stats/aggregate?statType=batting&seasonType=2&group=9&type=reg&sort=&split=0&season=2008
• Free Agent Tracker, ESPN Internet Ventures, 2009,
http://sports.espn.go.com/mlb/features/freeagents?type=ranked&season=2008
• James, Bill, The Bill James Handbook, ACTA Sports, Skokie, Illinois, 2009.
• Krautmann, Anthony C., and Margaret Oppenheimer, “Contract Length and the
Return to Performance in Major League Baseball,” Journal of Sports
Economics, Vol. 3, No. 1, 2002, pp. 6-17.
• Lewis, Michael, Moneyball, W. W. Norton & Company, Inc., New York, New
York, 2004.
• Sagarin, Jeff, Jeff Sagarin MLB Ratings, October 7, 2008,
www.usatoday.com/sports/sagarin/majors08.htm
• Studeman, Dave, Major League Baseball Graphs, May 16, 2004,
http://www.baseballgraphs.com/main/index.php/site/details/#sharecalc
• The Hardball Times, THT Win Shares, October 1, 2008,
http://www.hardballtimes.com/thtstats/main/?view=winshares