Approximate Dynamic Programming:
Solving the curses of dimensionality
University of Manchester
February, 2009
Warren Powell
CASTLE Laboratory
Princeton University
http://www.castlelab.princeton.edu
© 2008 Warren B. Powell, Princeton University
The fractional jet ownership industry
NetJets Inc.
Schneider National
Planning for a risky world
Weather
• Robust design of emergency response networks.
• Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities.
• Design of sensor networks and communication systems to manage responses to major weather events.
Disease
• Models of disease propagation for response planning.
• Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
• Robust design of supply chains to mitigate the disruption of transportation systems.
Energy management
Energy resource allocation
• What is the right mix of energy technologies?
• How should the use of different energy resources be coordinated over space and time?
• What should my energy R&D portfolio look like?
• Should I invest in nuclear energy?
• What is the impact of a carbon tax?
Energy markets
• How should I hedge energy commodities?
• How do I price energy assets?
• What is the right price for energy futures?
High value spare parts
Electric Power Grid
• PJM oversees an aging investment in high-voltage transformers.
• The replacement strategy needs to anticipate a bulge in retirements and failures.
• Orders have 1-2 year lag times; the typical cost of a replacement is ~$5 million.
• Failures vary widely in terms of economic impact on the network.
Spare parts for business jets
• ADP is used to determine purchasing and allocation strategies for over 400 high-value spare parts.
• The inventory strategy has to determine what to buy, when, and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers).
• Inventories have to meet global targets on level of service and inventory costs.
Challenges
Real-time control
» Scheduling aircraft, pilots, generators, tankers
» Pricing stocks, options
» Electricity resource allocation
Near-term tactical planning
» Can I accept a customer request?
» Should I lease equipment?
» How much energy can I commit with my windmills?
Strategic planning
» What is the right equipment mix?
» What is the value of this contract?
» What is the value of more reliable aircraft?
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
A resource allocation model
Attribute vectors: examples of the attribute vector $a$, from simple to complex:
» (Asset class, Time invested)
» (Type, Location, Age)
» (Location, ETA, Home, Experience, Driving hours)
» (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)
A resource allocation model
Modeling resources:
» The attributes of a single resource: $a \in \mathcal{A}$, where $a$ is the attribute vector and $\mathcal{A}$ is the attribute space.
» The resource state vector: $R_{ta}$ = the number of resources with attribute $a$; $R_t = (R_{ta})_{a \in \mathcal{A}}$.
» The information process: $\hat{R}_{ta}$ = the change in the number of resources with attribute $a$.
A resource allocation model
Modeling demands:
» The attributes of a single demand: $b \in \mathcal{B}$, where $b$ describes a demand to be served and $\mathcal{B}$ is the attribute space.
» The demand state vector: $D_{tb}$ = the number of demands with attribute $b$; $D_t = (D_{tb})_{b \in \mathcal{B}}$.
» The information process: $\hat{D}_{tb}$ = the change in the number of demands with attribute $b$.
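In code, this attribute-vector notation maps naturally onto dictionaries keyed by attribute tuples. A minimal Python sketch (the attribute fields shown are illustrative, not from the slides):

```python
from collections import Counter

# R_t: number of resources with each attribute vector a.
# Attribute fields here are hypothetical, e.g. (location, driver type).
R_t = Counter({("NYC", "solo"): 5, ("CHI", "team"): 2})

# D_t: number of demands with each attribute vector b, e.g. (origin, destination).
D_t = Counter({("NYC", "CHI"): 3})

# R̂_t: exogenous change in the resources (arrivals +, departures -).
R_hat = Counter({("NYC", "solo"): 1, ("CHI", "team"): -1})
R_t.update(R_hat)  # Counter addition implements R_t <- R_t + R̂_t
```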
Energy resource modeling
The system state:
The system state: $S_t = (R_t, D_t, \rho_t)$, where
$R_t$ = resource state (how much capacity, reserves)
$D_t$ = market demands
$\rho_t$ = "system parameters":
• State of the technology (costs, performance)
• Climate, weather (temperature, rainfall, wind)
• Government policies (tax rebates on solar panels)
• Market prices (oil, coal)
Energy resource modeling
The decision variable: $x_t$ = new capacity and retired capacity, for each type, location and technology.
Energy resource modeling
Exogenous information: $W_t$ = new information = $(\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, where
$\hat{R}_t$ = exogenous changes in capacity, reserves
$\hat{D}_t$ = new demands for energy from each source
$\hat{\rho}_t$ = exogenous changes in parameters.
Energy resource modeling
The transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$
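The transition function is just a deterministic mapping from the state, the decision and the new information to the next state. A schematic sketch for the energy model above (the field names and additive updates are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class State:
    R: dict    # resource state (e.g. capacity by technology)
    D: dict    # market demands
    rho: dict  # "system parameters" (prices, policies, ...)

def transition(S, x, W):
    """S^M: compute S_{t+1} from (S_t, x_t, W_{t+1})."""
    R = {k: S.R.get(k, 0) + x.get(k, 0) + W["R_hat"].get(k, 0)
         for k in set(S.R) | set(x)}
    D = {k: S.D.get(k, 0) + W["D_hat"].get(k, 0) for k in S.D}
    rho = {k: S.rho[k] + W["rho_hat"].get(k, 0) for k in S.rho}
    return State(R=R, D=D, rho=rho)
```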
A resource allocation model
[Diagram: resources assigned to demands at a point in time.]
[Diagram: the resource-demand network extended over times t, t+1, t+2.]
[Diagram: optimizing over time (t through t+2) versus optimizing at a single point in time.]
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Introduction to ADP
We just solved Bellman’s equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
» We found the value of being in each state by stepping
backward through the tree.
Introduction to ADP
The challenge of dynamic programming:
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
Problem: Curse of dimensionality
Introduction to ADP
The challenge of dynamic programming:
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
Problem: three curses of dimensionality:
» State space
» Outcome space
» Action space (feasible region)
Introduction to ADP
The computational challenge:
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?
Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of
handling multidimensional state variables
» Start with
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
» Now replace the value function with some sort of approximation:
$V_t(S_t) \approx \bar{V}_t(S_t)$
Introduction to ADP
Approximating the value function:
» We have to exploit the structure of the value function
(e.g. concavity).
» We might approximate the value function using a simple polynomial:
$\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2$
» ... or a complicated one:
$\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t)$
» Sometimes, they get really messy:
[A 22-term approximation $\bar{V}_t(R \mid \theta)$, mixing linear, quadratic and cross-product terms in the resource variables $R_{t,st'}$ and $R_{t,wt'}$ summed over time periods; the full expression is not reproduced here.]
Introduction to ADP
We can write a model of the observed value of being in a
state as:
$\hat{v} = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t)$
This is often written as a generic regression model:
$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4$
The ADP community refers to the independent variables as basis functions:
$Y = \theta_0 \phi_0(S) + \theta_1 \phi_1(S) + \theta_2 \phi_2(S) + \theta_3 \phi_3(S) + \theta_4 \phi_4(S) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S)$
The $\phi_f(S)$ are also known as features.
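Fitting the coefficients $\theta$ by batch least squares is then ordinary linear regression on the basis functions. A sketch using the basis functions named above (the data is synthetic, for illustration only):

```python
import numpy as np

def phi(S):
    """Basis functions (features) evaluated at a scalar state S."""
    return np.array([1.0, S, S**2, np.log(S), np.sin(S)])

# Observed states and sampled values v̂ (synthetic, for illustration)
states = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v_hat = np.array([2.1, 3.9, 5.2, 5.8, 6.1, 6.0])

X = np.vstack([phi(S) for S in states])
theta, *_ = np.linalg.lstsq(X, v_hat, rcond=None)

def V_bar(S, theta=theta):
    """Value function approximation V̄(S|θ) = Σ_f θ_f φ_f(S)."""
    return phi(S) @ theta
```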
Introduction to ADP
Methods for estimating $\theta$:
» Generate observations $\hat{v}^1, \hat{v}^2, \ldots, \hat{v}^N$, and use traditional regression methods to fit $\theta$.
» Stochastic gradient for updating $\theta^n$:
$\theta^n = \theta^{n-1} - \alpha_{n-1}\big(\bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n\big)\,\nabla_\theta \bar{V}^{n-1}(S^n \mid \theta^{n-1})$
$\qquad = \theta^{n-1} - \alpha_{n-1}\underbrace{\big(\bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n\big)}_{\text{error}}\underbrace{\begin{pmatrix}\phi_1(S^n)\\ \phi_2(S^n)\\ \vdots \\ \phi_F(S^n)\end{pmatrix}}_{\text{basis functions}}$
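For a linear-in-the-parameters model, the gradient with respect to $\theta$ is just the vector of basis functions, so the stochastic gradient update is one line. A sketch:

```python
def sgd_update(theta, S, v_hat, phi, alpha):
    """One stochastic gradient step:
    theta^n = theta^{n-1} - alpha * (V̄(S|theta) - v̂) * φ(S)."""
    features = phi(S)
    error = features @ theta - v_hat   # V̄^{n-1}(S|θ) - v̂^n
    return theta - alpha * error * features
```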
Introduction to ADP
Methods for estimating $\theta$:
» Recursive statistics
• Iterative equations avoid a matrix inverse:
$\theta^n = \theta^{n-1} - H^n x^n \hat{\varepsilon}^n$, where $\hat{\varepsilon}^n$ is the error
$H^n = \frac{1}{\gamma^n} B^{n-1}$
$B^n = B^{n-1} - \frac{1}{\gamma^n}\big(B^{n-1} x^n (x^n)^T B^{n-1}\big)$
$\gamma^n = 1 + (x^n)^T B^{n-1} x^n$
• $B^n$ is the $F \times F$ matrix $[(X^n)^T X^n]^{-1}$.
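These recursive equations are a Sherman-Morrison update of $B^n = [(X^n)^T X^n]^{-1}$, so no matrix is ever inverted. A sketch (the initialization $B^0 = \epsilon^{-1} I$ is a standard choice, not from the slides):

```python
import numpy as np

def rls_update(theta, B, x, v_hat):
    """Recursive least squares, maintaining B = (X^T X)^{-1}.

    theta : current coefficient vector (length F)
    B     : current F x F matrix B^{n-1}
    x     : basis-function vector phi(S^n)
    v_hat : new observed value
    """
    gamma = 1.0 + x @ B @ x                  # γ^n = 1 + (x^n)^T B^{n-1} x^n
    error = x @ theta - v_hat                # ε̂^n
    H = B / gamma                            # H^n = B^{n-1} / γ^n
    theta_new = theta - H @ x * error        # θ^n = θ^{n-1} - H^n x^n ε̂^n
    B_new = B - np.outer(B @ x, x @ B) / gamma
    return theta_new, B_new

# Initialize with B^0 = eps^{-1} * I (a standard choice)
F = 5
theta, B = np.zeros(F), np.eye(F) * 1e4
```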
Introduction to ADP
Other statistical methods
» Regression trees
• Combines regression with techniques for discrete variables.
» Data mining
• Good for categorical data
» Neural networks
• Engineers like this for low-dimensional continuous problems
» Kernel/locally polynomial regression
• Approximates portions of the value function locally using simple functions
» Dirichlet mixture models
• Aggregate portions of the function and fit approximations around these aggregations.
Introduction to ADP
What you will struggle with:
» Stepsizes
• Can't live with 'em, can't live without 'em.
• Too small, you think you have converged but you have really just stalled ("apparent convergence").
• Too large, and the system is unstable.
» Stability
• There are two sources of randomness:
– The traditional exogenous randomness
– An evolving policy
» Exploration vs. exploitation
• You sometimes have to choose to visit a state just to collect information about the value of being in a state.
Introduction to ADP
But we are not out of the woods…
» Assume we have an approximate value function.
» We still have to solve a problem that looks like
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + E\Big[\sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1})\Big] \Big)$
» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) containing an expectation.
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
[Decision tree example: schedule or cancel a game under weather uncertainty. At each decision node we choose "schedule game" or "cancel game"; each choice leads to an outcome node with probabilities over rain, clouds and sun. Scheduling pays -$2000 under rain, $1000 under clouds and $5000 under sun, while canceling pays -$200 regardless. Rolling back the tree, each outcome node is valued by its expectation (e.g. $2300, $3500, $2400), each decision node by the maximum over its choices, and weighting by the forecast probabilities (e.g. .3, .6, .1) gives the value at the root (here $2770). A legend distinguishes decision nodes from outcome nodes; Bellman's equation $V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$ is exactly this rollback.]
The post-decision state
New concept:
» The "pre-decision" state variable:
• $S_t$ = the information required to make a decision $x_t$.
• Same as a "decision node" in a decision tree.
» The "post-decision" state variable:
• $S_t^x$ = the state of what we know immediately after we make a decision.
• Same as an "outcome node" in a decision tree.
The post-decision state
An inventory problem:
» Our basic inventory equation:
$R_{t+1} = \max\{0, R_t + x_t - \hat{D}_{t+1}\}$
where
$R_t$ = inventory at time $t$
$x_t$ = amount we ordered
$\hat{D}_{t+1}$ = demand in the next time period.
» Using pre- and post-decision states:
$R_t^x = R_t + x_t$ (post-decision state)
$R_{t+1} = \max\{0, R_t^x - \hat{D}_{t+1}\}$ (pre-decision state)
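In code, the pre/post split simply factors the transition into a deterministic half (the decision) and a stochastic half (the new information). A sketch of the inventory example:

```python
def post_decision(R_t, x_t):
    """R_t^x = R_t + x_t : the deterministic effect of the decision."""
    return R_t + x_t

def next_pre_decision(R_x, D_hat):
    """R_{t+1} = max(0, R_t^x - D̂_{t+1}) : the effect of new information."""
    return max(0, R_x - D_hat)

R_x = post_decision(R_t=3, x_t=2)         # order 2 units on top of 3 in stock
R_next = next_pre_decision(R_x, D_hat=4)  # demand of 4 arrives -> R_{t+1} = 1
```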
The post-decision state
Pre-decision, state-action, and post-decision:
[Diagram comparing the size of the pre-decision state space, the state-action space, and the post-decision state space for a small example.]
The post-decision state
» Pre-decision: resources and demands, $S_t = (R_t, D_t)$.
» Post-decision: $S_t^x = S^{M,x}(S_t, x_t)$.
» The next pre-decision state: $S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$, where $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$.
The post-decision state
Classical form of Bellman's equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \big( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \big)$
Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):
$V_t(S_t) = \max_x \big( C_t(S_t, x_t) + V_t^x(S^{M,x}(S_t, x_t)) \big)$
• Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):
$V_t^x(S_t^x) = E\big[ V_{t+1}(S^{M,W}(S_t^x, W_{t+1})) \mid S_t^x \big]$
The post-decision state
Challenges
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in
$V_t(S_t) = \max_x \big( C_t(S_t, x_t) + V_t^x(S_t^x) \big)$
» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve
$V_t(S_t) = \max_x \big( C_t(S_t, x_t) + \bar{V}_t(S_t^x) \big)$
» So now we face:
• What should the approximation look like?
• How do we estimate it?
The post-decision state
Value function approximations:
» Linear (in the resource state):
$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$
» Piecewise linear, separable:
$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$
» Indexed piecewise linear, separable:
$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R_{ta}^x \mid (\text{features})_t\big)$
The post-decision state
Value function approximations:
» Ridge regression (Klabjan and Adelman):
$\bar{V}_t(R_t^x) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf})$, where $R_{tf} = \sum_{a \in \mathcal{A}_f} \theta_{fa} R_{ta}$
» Benders cuts:
[Diagram: $\bar{V}_t(R_t)$ represented by a family of cuts.]
The post-decision state
Comparison to other methods:
» Classical MDP (value iteration):
$V^n(S) = \max_x \big( C(S, x) + E[V^{n-1}(S_{t+1})] \big)$
» Classical ADP (pre-decision state), where the expectation must be computed explicitly:
$\hat{v}_t^n = \max_x \Big( C_t(S_t^n, x_t) + \sum_{s'} p(s' \mid S_t^n, x_t)\, \bar{V}_{t+1}^{n-1}(s') \Big)$
$\hat{v}_t^n$ updates $\bar{V}_t(S_t)$: $\quad \bar{V}_t^n(S_t^n) = (1 - \alpha_{n-1})\, \bar{V}_t^{n-1}(S_t^n) + \alpha_{n-1}\, \hat{v}_t^n$
» Our method (update $\bar{V}_{t-1}^{x,n-1}$ around the post-decision state):
$\hat{v}_t^n = \max_x \big( C_t(S_t^n, x_t) + \bar{V}_t^{x,n-1}(S^{M,x}(S_t^n, x_t)) \big)$
$\hat{v}_t^n$ updates $\bar{V}_{t-1}(S_{t-1}^x)$: $\quad \bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$
The post-decision state
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization problem using an approximate value function:
$\hat{v}_t^n = \max_x \big( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \big)$
to obtain $x_t^n$.
Step 3: Update the value function approximation using recursive statistics:
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ by simulation and compute the next pre-decision state:
$S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to step 1.
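Putting the five steps together on the scalar inventory problem gives a compact ADP loop. This is only a sketch of the procedure above, not the code behind the slides: the cost structure, the Poisson demand, and the lookup-table approximation indexed by the post-decision inventory are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, R_MAX = 20, 500, 30          # horizon, iterations, inventory cap
price, cost = 6.0, 4.0             # illustrative sale price / order cost

# Lookup-table approximation V̄_t(R^x), indexed by post-decision inventory.
V = np.zeros((T + 1, R_MAX + 1))

for n in range(1, N + 1):
    alpha = 10.0 / (10.0 + n)      # a/(a+n) stepsize rule
    R, D_hat = 0, rng.poisson(5)   # Step 1: initial pre-decision state
    Rx = 0                         # previous post-decision state
    for t in range(T):
        sales = min(R, D_hat)      # demand we can satisfy this period
        # Step 2: deterministic optimization over the order quantity x
        vals = [price * sales - cost * x + V[t, R - sales + x]
                for x in range(R_MAX - (R - sales) + 1)]
        x_n = int(np.argmax(vals))
        v_hat = vals[x_n]
        # Step 3: smooth v̂ into the PREVIOUS post-decision state's value
        if t > 0:
            V[t - 1, Rx] = (1 - alpha) * V[t - 1, Rx] + alpha * v_hat
        Rx = R - sales + x_n       # this period's post-decision state
        # Step 4: Monte Carlo sample of new information -> next pre-decision state
        D_hat = rng.poisson(5)
        R = Rx
    # Step 5: repeat from a fresh sample path
    # (a full implementation would also update the final period's values)
```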
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Blood management
Managing blood inventories
Blood management
Managing blood inventories over time:
[Timeline over weeks 0-3: at each week $t$, the state $S_t$ determines the decision $x_t$, giving the post-decision state $S_t^x$; new donations and demands $(\hat{R}_{t+1}, \hat{D}_{t+1})$ then produce $S_{t+1}$.]
[Network diagrams: the blood supplies $R_{t,(\text{type},\text{age})}$ (e.g. AB+ at ages 0, 1, 2 and O- at ages 0, 1, 2) are either assigned to satisfy demands $\hat{D}_{t,\text{type}}$ or held, producing the post-decision inventory $R_t^x$ in which each held unit is one week older; new donations $\hat{R}_{t+1}$ then yield $R_{t+1}$. Each time period is solved as a linear program with optimal value $F(R_t)$, and the dual variables $\hat{\nu}_{t,(\text{type},\text{age})}$ on the supply constraints give the value of an additional unit of blood of each type and age.]
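Any LP solver returns the dual variables on the supply constraints directly. A toy two-supply, two-demand sketch (not the blood model itself); recent versions of scipy expose the duals of linprog's HiGHS solver as marginals:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: 2 supply nodes, 2 demand types.
# Variables x = [x_11, x_12, x_21, x_22]; maximize reward => minimize -reward.
reward = np.array([10.0, 8.0, 9.0, 7.0])
supply = np.array([5.0, 3.0])   # units available at each supply node
demand = np.array([4.0, 4.0])   # units requested by each demand type

A_ub = np.array([
    [1, 1, 0, 0],   # flow out of supply node 1
    [0, 0, 1, 1],   # flow out of supply node 2
    [1, 0, 1, 0],   # flow into demand type 1
    [0, 1, 0, 1],   # flow into demand type 2
])
b_ub = np.concatenate([supply, demand])

res = linprog(-reward, A_ub=A_ub, b_ub=b_ub, method="highs")
# Duals on the supply rows: marginal value of one more unit at each node
# (negated because we minimized the negative reward).
duals = -res.ineqlin.marginals[:2]
print(res.x, duals)
```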
Updating the value function approximation
» Estimate the gradient at $R_t^n$: the dual variable $\hat{\nu}_{t,(AB+,2)}^n$ gives the slope of $F(R_t)$ at $R_{t,(AB+,2)}^n$.
» Update the value function at $R_{t-1}^{x,n}$: smooth the sampled slope $\hat{\nu}_{t,(AB+,2)}^n$ into the piecewise linear approximation $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ to obtain $\bar{V}_{t-1}^n(R_{t-1}^x)$.
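For a piecewise linear concave approximation stored as a vector of slopes, the update smooths the sampled dual into one slope and then restores concavity by a projection. A sketch (the simple "leveling" projection used here is one of several options in the literature):

```python
import numpy as np

def update_pwl_slopes(v, r, nu_hat, alpha):
    """Smooth a sampled slope nu_hat into a concave PWL approximation.

    v      : slopes; v[r] = marginal value of the (r+1)-st unit of resource
    r      : resource level at which the dual/gradient was observed
    nu_hat : sampled slope (e.g. an LP dual variable)
    alpha  : stepsize
    """
    v = v.copy()
    v[r] = (1 - alpha) * v[r] + alpha * nu_hat
    # Restore concavity (slopes nonincreasing in r) by "leveling":
    # units to the left are worth at least v[r], to the right at most v[r].
    v[:r] = np.maximum(v[:r], v[r])
    v[r + 1:] = np.minimum(v[r + 1:], v[r])
    return v

v = np.full(20, 10.0)   # initial slopes (illustrative)
v = update_pwl_slopes(v, r=7, nu_hat=12.5, alpha=0.1)
```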
Iterative learning
[Diagrams: the value function approximations are updated over successive iterations.]
Approximate dynamic programming
With luck, the objective function will improve steadily:
[Plot: objective function vs. iteration, improving smoothly over roughly 80 iterations.]
Approximate dynamic programming
... but performance can be jagged:
[Plot: objective function vs. iteration over 1000 iterations, ranging between roughly 1.2 and 1.9 million with jagged oscillations.]
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Hierarchical aggregation
Attribute vectors: examples from simple to complex:
» (Asset class, Time invested)
» (Blood type, Age, Frozen?)
» (Location, ETA, Bus. segment, Single/team, Domicile, Drive hours, Days from home)
» (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)
Aggregation: estimating value functions
» Smoothing a new observation into the current estimate, e.g. $\$2050 = (1 - 0.10)\cdot\$2000 + (0.10)\cdot\$2500$.
» Drivers may have very detailed attributes, while the value function approximation may have fewer attributes than the driver.
» Most disaggregate level:
$\bar{v}^n(\text{Location, Fleet, Domicile}) = (1-\alpha)\,\bar{v}^{n-1}(\text{Location, Fleet, Domicile}) + \alpha\,\hat{v}(\text{Location, Fleet, Domicile, DOThrs, DaysFromHome})$
» Middle level of aggregation:
$\bar{v}^n(\text{Location, Fleet}) = (1-\alpha)\,\bar{v}^{n-1}(\text{Location, Fleet}) + \alpha\,\hat{v}(\text{Location, Fleet, Domicile, DOThrs, DaysFromHome})$
» Most aggregate level:
$\bar{v}^n(\text{Location}) = (1-\alpha)\,\bar{v}^{n-1}(\text{Location}) + \alpha\,\hat{v}(\text{Location, Fleet, Domicile, DOThrs, DaysFromHome})$
Hierarchical aggregation
State-dependent weighted aggregation:
» Now we have a linear regression for each attribute:
$\bar{v}_a = \sum_g w_a^{(g)} \bar{v}_a^{(g)}$
» We cannot solve hundreds of thousands of linear regressions. Instead use
$w_a^{(g)} \propto \dfrac{1}{\text{Var}\big[\bar{v}_a^{(g)}\big] + \big(\hat{\beta}_a^{(g)}\big)^2}$, normalized so that $\sum_g w_a^{(g)} = 1$,
where $\text{Var}[\bar{v}_a^{(g)}]$ is an estimate of the variance and $\hat{\beta}_a^{(g)}$ an estimate of the bias at aggregation level $g$.
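The weights are cheap to compute per attribute. A sketch (the variance and bias numbers are illustrative):

```python
import numpy as np

def aggregation_weights(var, bias):
    """Weights on each aggregation level g for one attribute a:
    w^(g) ∝ 1 / (Var[v̄^(g)] + bias^(g)^2), normalized to sum to 1."""
    w = 1.0 / (np.asarray(var) + np.asarray(bias) ** 2)
    return w / w.sum()

# Three levels: disaggregate (high variance, no bias) through
# aggregate (low variance, large bias). Numbers are illustrative.
var = [400.0, 100.0, 25.0]
bias = [0.0, 5.0, 20.0]
w = aggregation_weights(var, bias)
v_bar_levels = np.array([2500.0, 2300.0, 2000.0])  # v̄_a^(g) at each level
v_a = w @ v_bar_levels   # v̄_a = Σ_g w^(g) v̄_a^(g)
```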
Hierarchical aggregation
[Plot: objective function over 1000 iterations. The aggregate approximation improves quickly but levels off; the disaggregate approximation improves slowly.]
[Plot: the weighted combination of aggregation levels outperforms both the pure aggregate and the pure disaggregate approximations.]
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Stepsizes
Stepsizes:
» Fundamental to ADP is an updating equation that looks like:
$\underbrace{\bar{V}_{t-1}^n(S_{t-1}^x)}_{\text{updated estimate}} = (1 - \underbrace{\alpha_{n-1}}_{\text{stepsize}})\,\underbrace{\bar{V}_{t-1}^{n-1}(S_{t-1}^x)}_{\text{old estimate}} + \alpha_{n-1}\,\underbrace{\hat{v}_t^n}_{\text{new observation}}$
» The stepsize is also called a "learning rate" or "smoothing factor".
Stepsizes
» High-noise data: the best stepsize is close to $\alpha_n = 1/n$.
[Plot: noisy observations over 100 iterations with the $1/n$-smoothed estimate.]
» Low noise, changing signal: the best stepsize is close to $\alpha_n = 1$.
[Plot: the estimate must track a drifting signal.]
Stepsizes
Theorem 1 (general knowledge):
» If the data is stationary, then 1/n is the best possible stepsize.
Theorem 2 (e.g. Tsitsiklis 1994):
» For general problems, 1/n is provably convergent.
Theorem 3 (Frazier and Powell):
» 1/n works so badly it should (almost) never be used.
Stepsizes
Lower bound on the value function after n iterations:
[Plot: the lower bound decays extremely slowly; the horizontal axis runs from $10^2$ to $10^{12}$ iterations.]
Stepsizes
Bias-adjusted Kalman filter:
$\alpha_{n-1} = 1 - \dfrac{\sigma^2}{(1 + \lambda^{n-1})\,\sigma^2 + (\beta^n)^2}$
where:
$\lambda^n = (1 - \alpha_{n-1})^2\, \lambda^{n-1} + (\alpha_{n-1})^2$
$\sigma^2$ = estimate of the noise variance, $\beta^n$ = estimate of the bias.
» As the noise $\sigma^2$ increases, the stepsize decreases toward $1/n$.
» As the bias $\beta^n$ increases, the stepsize increases toward 1.
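A sketch of common stepsize rules, including the bias-adjusted Kalman filter built from the formulas above (the smoothing constant eta used to estimate the bias and noise is an implementation choice, not from the slides):

```python
def harmonic(n, a=10.0):
    """Generalized harmonic stepsize a/(a+n); a=1 gives the 1/n rule."""
    return a / (a + n)

class BAKF:
    """Bias-adjusted Kalman filter stepsize (sketch).

    Tracks a smoothed error (bias estimate beta) and a smoothed squared
    error nu; the noise variance is sigma^2 = (nu - beta^2)/(1 + lam), so
    alpha = 1 - sigma^2/((1 + lam)*sigma^2 + beta^2) = 1 - sigma^2/nu.
    """

    def __init__(self, eta=0.1):
        self.eta = eta   # smoothing constant for the statistics (a choice)
        self.beta = 0.0  # bias estimate
        self.nu = 0.0    # smoothed squared error
        self.lam = 0.0   # lambda^n, accumulated from past stepsizes
        self.n = 0

    def stepsize(self, error):
        """error = old estimate minus new observation."""
        self.n += 1
        self.beta = (1 - self.eta) * self.beta + self.eta * error
        self.nu = (1 - self.eta) * self.nu + self.eta * error**2
        if self.n == 1:
            alpha = 1.0
        else:
            sigma2 = (self.nu - self.beta**2) / (1 + self.lam)
            alpha = 1.0 - sigma2 / (self.nu + 1e-12)
        self.lam = (1 - alpha) ** 2 * self.lam + alpha**2
        return alpha
```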
Stepsizes
The bias-adjusted Kalman filter:
[Plot: for noisy observations of a rising signal, the BAKF stepsize rule tracks the observed values while the 1/n rule lags far behind.]
[Plot: a second example with noisier observations; the BAKF stepsize rule still follows the signal.]
Stepsizes
I recommend:
» Start with a constant stepsize
• Vary it, get a sense of what works best
» Next try a deterministic stepsize such as $a/(a+n)$
• Choose $a$ so that it declines to roughly the best constant stepsize at an iteration comparable to where your constant stepsize seems to stabilize
– Might be 50 iterations
– Might be 5000
» Try an (optimal) adaptive stepsize rule
• Can work very well if there is not too much noise
• Adaptive rules work well when there is a need to keep the stepsize from declining too quickly (but you do not know how quickly)
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Schneider National
[Plots: calibrated model vs. history. For each capacity category (US_SOLO, US_IC, US_TEAM), the simulated utilization and revenue per WU fall between the historical minimum and maximum.]
Princeton team: Warren Powell, Belgacem Bouzaiene-Ayari
NS team: Clark Cheng, Ricardo Fiorillo, Junxia Chang, Sourav Das
Convergence and model calibration
[Plots: convergence of train coverage, and model calibration of delay hours/day vs. fleet size against the train delay curves for October 2007 and March 2008.]
Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
Energy resource modeling
Hourly model:
» Decisions at hour t impact hour t+1 through the amount of water held in the reservoir.
» The value function on the reservoir captures the value of holding water for future time periods.
[Diagram: the model steps forward hour by hour (1, 2, 3, 4, ..., 8760) through 2008 and into 2009.]
Annual energy model
[Plot: reservoir level and demand over roughly 800 time periods, optimal solution from a linear program.]
[Plot: the same problem solved by approximate dynamic programming; the ADP reservoir trajectory closely matches the optimal one.]
Annual energy model
ADP vs. optimal reservoir levels for stochastic rainfall:
[Plot: reservoir levels over 800 time periods; the ADP solution at the last iteration closely tracks the optimal solutions computed for individual rainfall scenarios.]
The post-decision state
What happens if we use Bellman's equation?
» The state variable is:
• The supply of each type of blood (8 blood types, 6 ages)
• The demand for each type of blood (8 blood types)
• A 56-dimensional state vector
» The decision variable is how much of each of 8 blood types to supply to 8 demand types.
• A 162-dimensional decision vector
» Random information:
• Blood donations by week (8 types)
• New demands for blood (8 types)
• A 16-dimensional information vector
Blood management
Managing blood inventories over time:
[Timeline over weeks 0-3: at each week $t$, the state $S_t$ determines the decision $x_t$, giving the post-decision state $S_t^x$; new donations and demands $(\hat{R}_{t+1}, \hat{D}_{t+1})$ then produce $S_{t+1}$.]
Approximate dynamic programming
Optimization (e.g. Cplex)
» Strengths: produces optimal decisions; mature technology.
» Weaknesses: cannot handle uncertainty; cannot handle high levels of complexity.
Simulation
» Strengths: extremely flexible; high level of detail; easily handles uncertainty.
» Weaknesses: models decisions using user-specified rules; low solution quality.
Approximate dynamic programming combines simulation and optimization in a rigorous yet flexible framework.
Approximate dynamic programming
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization problem using an approximate value function:
$\hat{v}_t^n = \max_x \big( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \big)$
to obtain $x_t^n$.
Step 3: Update the value function approximation using recursive statistics:
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ by simulation and compute the next pre-decision state:
$S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to step 1.