Optimization of Batting Order
Frank R. Zheng
A Quick Introduction to Baseball
Two teams alternate batting and fielding.
Batting team tries to score runs.
Runners must advance through first, second and third base in order to
reach home
Runners are advanced by players getting hits, drawing walks, stealing
bases, or errors by the opposing team’s defense
The team with the most runs at the end of the game wins
Batting Order
Before each game, the team’s coach must submit the batting order of
the team
The batting order dictates the order in which players step up to the
plate
Substitutions such as pinch hitters or pinch runners are allowed, but are
relatively rare
The optimal batting order maximizes the expected run production
Batting Order Optimization as a Scheduling Problem
Finding the optimal batting order for a team can be thought of as a
single-machine scheduling problem
Each batter is modeled as a job, and the batting order is a sequence of 9
such jobs
The objective is to maximize the run production of the lineup
This is a complicated function that requires simulation to analyze
Approach to Optimize Batting Order
Each baseball team has a roster of ~15 batters, of which only 9
compose the batting order
Brute-forcing all possible lineups is impractical: we would need to
evaluate 15!/6! permutations (over 1.8 billion unique lineups)
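The count can be checked with a few lines of Python. This is only a sanity check of the arithmetic, assuming a roster of exactly 15 batters and ignoring fielding-position constraints; the constant names are illustrative.

```python
import math

ROSTER_SIZE = 15   # approximate roster size quoted on the slide
LINEUP_SIZE = 9    # slots in the batting order

# Ordered selections of 9 batters out of 15: 15! / (15 - 9)! = 15! / 6!
n_lineups = math.perm(ROSTER_SIZE, LINEUP_SIZE)

# Same number written out with factorials, as on the slide.
assert n_lineups == math.factorial(15) // math.factorial(6)

print(f"{n_lineups:,}")  # 1,816,214,400
```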
Solution is to combine a qualitative “conventional wisdom” approach
with a data-driven quantitative methodology
Batting Order Conventional Wisdom
Over the many decades baseball has been played, coaches have
dedicated much thought to finding the best lineup
Traditional lineups follow this general order
1-2 – batters who get on base a lot
3-5 – batters who get a lot of extra base hits
6-8 – weak batters
9 – pitcher/weak batter/batter who gets on base a lot
Key is to have players with a high realization value (lots of runs batted
in) follow those with a high potential value (getting on base a lot)
i.e., get runners on base so your power hitters can drive them
home
Underlying Causes of Run Production
There is a limited set of events that have the potential to score runs
We refer to these as “Run-Producing Events” or RPEs
RPEs include
Singles (1B)
Doubles (2B)
Triples (3B)
Home Runs (HR)
Bases on Balls/Batter Hit by Pitch (BB+HBP)
Errors (ERR)
Batting Performance
Does the model fully capture differences among player batting
characteristics?
RPE                OUT       1B       BB+HBP   2B       3B       HR       ERR
Regression Value   -0.1040   0.4659   0.3255   0.7613   1.0456   1.4031   0.4340
How to distinguish between ‘table setters’ vs. ‘sluggers/cleanup
hitters’?
Realization Value vs. Potential Value
Realization Value is the expected number of runs each RPE actually
scores
Potential Value is the effect each RPE has on the team’s chances to
score additional runs in the same inning
Differentiating between these two metrics allows us to quantitatively
determine which players create the potential for scoring runs and
which ones are good at bringing those players to home plate
RPE                 OUT       1B       BB+HBP   2B       3B       HR        ERR
Realization Value    0.0000   0.2314   0.0328   0.5120   0.7411    1.7387   0.1000
Potential Value     -0.1040   0.2345   0.2927   0.2493   0.3045   -0.3356   0.3340
Total Value         -0.1040   0.4659   0.3255   0.7613   1.0456    1.4031   0.4340
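The table's internal consistency (Total Value = Realization Value + Potential Value for every event) can be verified directly, and the per-event values can be turned into per-batter scores. The sketch below assumes a batter's R and P are the probability-weighted averages of the event values over that batter's plate appearances. The slides do not spell out this formula, and `example_batter` is a made-up distribution, not a real player.

```python
EVENTS = ["OUT", "1B", "BB+HBP", "2B", "3B", "HR", "ERR"]

# Per-event values copied from the table above.
REALIZATION = dict(zip(EVENTS, [0.0000, 0.2314, 0.0328, 0.5120, 0.7411, 1.7387, 0.1000]))
POTENTIAL = dict(zip(EVENTS, [-0.1040, 0.2345, 0.2927, 0.2493, 0.3045, -0.3356, 0.3340]))
TOTAL = dict(zip(EVENTS, [-0.1040, 0.4659, 0.3255, 0.7613, 1.0456, 1.4031, 0.4340]))

# Total Value is the sum of the two components for every event.
for event in EVENTS:
    assert abs(REALIZATION[event] + POTENTIAL[event] - TOTAL[event]) < 1e-9


def player_values(event_probs):
    """Return (R, P) per plate appearance for one batter.

    Assumption: each value is the expectation of the per-event value
    under the batter's own event distribution.
    """
    r = sum(REALIZATION[e] * event_probs[e] for e in EVENTS)
    p = sum(POTENTIAL[e] * event_probs[e] for e in EVENTS)
    return r, p


# Hypothetical batter: probabilities per plate appearance, summing to 1.
example_batter = {"OUT": 0.66, "1B": 0.16, "BB+HBP": 0.09, "2B": 0.05,
                  "3B": 0.005, "HR": 0.03, "ERR": 0.005}

r_value, p_value = player_values(example_batter)
print(f"R = {r_value:.4f}, P = {p_value:.4f}, total = {r_value + p_value:.4f}")
```

Note the negative Potential Value of a home run: it clears the bases, so it lowers the chance of further runs from runners already on base even though its Realization Value is the largest.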
Differentiating Players
By comparing each individual’s realization value and potential value to
the team’s overall averages, we can group players into one of four
categories
(R+, P+) Strong Hitters – players who bat in a lot of runs but also create
the potential for more runs
(R+, P-) Run Producers – players who bat in a lot of runs
(R-, P+) Table Setters – players who create a lot of potential for more runs
(R-, P-) Weak Hitters – the team’s worst players
This gives us the quantitative data we need to apply the conventional wisdom
discussed earlier
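A minimal sketch of this grouping, assuming each player already has an R and a P score and that the "team average" is the simple mean over the roster (the slides do not say whether the average is weighted by plate appearances). Player names and numbers are placeholders.

```python
from statistics import mean

CATEGORY_NAMES = {
    ("R+", "P+"): "Strong Hitters",
    ("R+", "P-"): "Run Producers",
    ("R-", "P+"): "Table Setters",
    ("R-", "P-"): "Weak Hitters",
}


def classify(players):
    """players: dict mapping name -> (R, P). Returns name -> (set, label)."""
    r_avg = mean(r for r, _ in players.values())
    p_avg = mean(p for _, p in players.values())
    result = {}
    for name, (r, p) in players.items():
        # Compare each score with the team average to pick the quadrant.
        key = ("R+" if r > r_avg else "R-", "P+" if p > p_avg else "P-")
        result[name] = (f"({key[0]}, {key[1]})", CATEGORY_NAMES[key])
    return result


# Hypothetical scores for four placeholder players.
roster = {"A": (0.12, 0.10), "B": (0.12, 0.02), "C": (0.04, 0.10), "D": (0.04, 0.02)}
for name, (group, label) in classify(roster).items():
    print(name, group, label)
```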
Overview of Heuristic
Now we have the tools we need to combine the holistic conventional
wisdom with quantitative data
We adapted this heuristic from the work of Sokol
After determining which players fall into which set, we attempt to
follow the conventional wisdom of placing batters with high realization
values after a group of batters with high potential values
We want to build up potential value and then release it with
realization value
The optimal order of the four sets is
(R-, P+) (R+, P+) (R+, P-) (R-, P-)
Heuristic Steps
Select the two batters with the highest P in the (R-, P+) set and assign
them to the top two slots in the batting order, by order of increasing P
Place all batters in the (R+, P+) group in the next slots, ordered by
decreasing P
Fill as many remaining slots as possible with batters from the (R+, P-)
group, ordered by decreasing P
If there are any remaining slots, fill them with batters in the (R-, P-)
group, ordered by increasing P
For each player left in the (R-, P+) group, replace a (R-, P-) player if
possible, ordering the new (R-, P+) players by increasing P
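The five steps can be sketched as a short function. Several details are assumptions, because the slides leave them open: when a group has more players than open slots, the highest-P players are chosen; replacements in the last step fill the bottom of the order first; and fielding-position constraints (which may be what "if possible" refers to) are not modeled. The roster is hypothetical.

```python
def heuristic_lineup(players, size=9):
    """players: list of (name, P, group), with group one of
    "R-P+", "R+P+", "R+P-", "R-P-". Returns the batting order as names."""

    def group_by_p(group):
        # Members of one group, highest P first.
        members = [pl for pl in players if pl[2] == group]
        return sorted(members, key=lambda pl: pl[1], reverse=True)

    def increasing_p(members):
        return sorted(members, key=lambda pl: pl[1])

    # Step 1: two highest-P table setters lead off, in increasing P.
    setters = group_by_p("R-P+")
    lineup = increasing_p(setters[:2])
    spare_setters = setters[2:]

    # Step 2: all strong hitters, decreasing P.
    lineup += group_by_p("R+P+")[: size - len(lineup)]

    # Step 3: as many run producers as fit, decreasing P.
    lineup += group_by_p("R+P-")[: size - len(lineup)]

    # Step 4: any remaining slots go to weak hitters, increasing P.
    weak = increasing_p(group_by_p("R-P-")[: size - len(lineup)])
    lineup += weak

    # Step 5: leftover table setters replace weak hitters, bottom slots
    # first (assumption), with the newcomers ordered by increasing P.
    n_swap = min(len(spare_setters), len(weak))
    if n_swap:
        lineup[-n_swap:] = increasing_p(spare_setters[:n_swap])

    return [name for name, _, _ in lineup]


# Hypothetical roster: (name, P, group).
roster = [
    ("S1", 0.30, "R-P+"), ("S2", 0.25, "R-P+"), ("S3", 0.20, "R-P+"),
    ("H1", 0.28, "R+P+"),
    ("P1", 0.15, "R+P-"), ("P2", 0.12, "R+P-"), ("P3", 0.10, "R+P-"), ("P4", 0.08, "R+P-"),
    ("W1", 0.05, "R-P-"), ("W2", 0.03, "R-P-"), ("W3", 0.01, "R-P-"),
]
print(heuristic_lineup(roster))
```

With this roster the result follows the pattern table setter, table setter, strong hitter, four run producers, one weak hitter, and a third table setter in the ninth slot.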
Application to 2011 New York Yankees
In order to see the effects of our heuristic, we applied it to the 2011
New York Yankees
First, we placed each player into the appropriate category
[Figure: NYY 2011 - Realization Value vs. Potential Value (Difference from Team Average). Scatter plot with axes Realization Value (RV) and Potential Value (PV), divided into four quadrants: (R-, P+) Table Setters, (R+, P+) Strong Hitters, (R-, P-) Weak Hitters, (R+, P-) Run Producers. Players plotted: Brett Gardner, Derek Jeter, Nick Swisher, Eric Chavez, Jesus Montero, Alex Rodriguez, Eduardo Nunez, Francisco Cervelli, Andruw Jones, Chris Dickerson, Russell Martin, Jorge Posada, Curtis Granderson, Mark Teixeira, Robinson Cano]
Simulation
In order to determine the value of our objective function (the expected
number of runs scored per game) we need to simulate a game of
baseball using the designated lineup
Our simulation follows the structure of a normal game of baseball
At each point in time, the next batter steps up to the plate and either
generates an RPE or makes an out, depending on that player’s distribution
RPEs advance runners according to the rules of baseball or by
probabilistic outcomes determined using data from the 2011 season
The number of outs and runs is recorded for each of 16,200 games
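A simplified Monte Carlo sketch of such a simulation follows. It is not the simulation used for the reported results: runner advancement here is fully deterministic (every runner moves as many bases as the batter, errors are treated like singles, walks only force runners), whereas the slides describe probabilistic advancement fitted to 2011 data. The function names and the batter distribution are illustrative assumptions.

```python
import random

EVENTS = ["OUT", "1B", "BB+HBP", "2B", "3B", "HR", "ERR"]
BASES_GAINED = {"1B": 1, "ERR": 1, "2B": 2, "3B": 3, "HR": 4}


def advance(bases, event):
    """bases: [first, second, third] occupancy. Returns (new_bases, runs)."""
    if event == "BB+HBP":
        first, second, third = bases
        runs = int(first and second and third)        # bases loaded forces a run
        return [True, second or first, third or (first and second)], runs
    n = BASES_GAINED[event]
    new_bases, runs = [False, False, False], 0
    for i, occupied in enumerate(bases):              # i = 0 is first base
        if occupied:
            if i + n >= 3:
                runs += 1                             # runner reaches home
            else:
                new_bases[i + n] = True
    if n >= 4:
        runs += 1                                     # batter scores on a home run
    else:
        new_bases[n - 1] = True
    return new_bases, runs


def simulate_game(lineup, rng, innings=9):
    """lineup: list of dicts mapping each event to its probability."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            probs = lineup[batter % len(lineup)]      # order carries over innings
            event = rng.choices(EVENTS, weights=[probs[e] for e in EVENTS])[0]
            batter += 1
            if event == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def expected_runs(lineup, n_games=16_200, seed=0):
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


# Hypothetical lineup of nine identical batters; fewer games keep the demo quick.
batter = {"OUT": 0.66, "1B": 0.16, "BB+HBP": 0.09, "2B": 0.05,
          "3B": 0.005, "HR": 0.03, "ERR": 0.005}
print(expected_runs([batter] * 9, n_games=1_000))
```

Comparing two batting orders then amounts to calling `expected_runs` on each with the same seed and number of games.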
Results of Analysis
Standard Lineup
Batting Order   Player              Set
1               Derek Jeter         R-, P+
2               Curtis Granderson   R+, P-
3               Robinson Cano       R+, P-
4               Alex Rodriguez      R+, P+
5               Mark Teixeira       R+, P-
6               Nick Swisher        R-, P+
7               Jorge Posada        R-, P-
8               Russell Martin      R-, P-
9               Brett Gardner       R-, P+
This lineup generated an average of 5.68 runs, and is expected to have a
61.3% chance of winning a 5-game series against the Detroit Tigers
Heuristic Lineup
Batting Order   Player              Set
1               Brett Gardner       R-, P+
2               Derek Jeter         R-, P+
3               Alex Rodriguez      R+, P+
4               Robinson Cano       R+, P-
5               Curtis Granderson   R+, P-
6               Andruw Jones        R+, P-
7               Mark Teixeira       R+, P-
8               Russell Martin      R-, P-
9               Nick Swisher        R-, P+
This lineup generated an average of 5.84 runs, with a 64.7% chance of
winning a 5-game series against the Detroit Tigers
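The slides do not say how the series win probabilities were obtained. One simple way to turn a single-game win probability into a series probability, assuming a best-of-five format with independent games and a constant per-game probability, is sketched below. The function name and the inputs are illustrative and are not meant to reproduce the 61.3% or 64.7% figures.

```python
from math import comb


def series_win_probability(p_game, wins_needed=3):
    """Probability of reaching `wins_needed` wins before the opponent does.

    Assumes independent games with a constant win probability `p_game`.
    The series ends with a win after exactly `losses` defeats, where
    `losses` ranges from 0 to wins_needed - 1.
    """
    return sum(
        comb(wins_needed - 1 + losses, losses)
        * p_game ** wins_needed
        * (1 - p_game) ** losses
        for losses in range(wins_needed)
    )


# Hypothetical per-game win probabilities.
for p in (0.50, 0.55, 0.60):
    print(f"p_game = {p:.2f} -> best-of-5 = {series_win_probability(p):.3f}")
```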
Conclusions and Other Applications
The heuristic lineup produced only about a 3% increase in expected runs
(5.84 vs. 5.68)
Since statistical analysis is well established in baseball, the NYY may
already have studied this problem in great detail
Even if the gains in expected run production were minimal, there are
other applications for our methodology
Potential trades or acquisitions of new players can be evaluated by what
effect they would have on the team’s expected run production
A game-theoretic approach can be applied to maximize the expected win rate
by tailoring the distribution of the team’s run production to a specific
opponent