Approximate Dynamic Programming: Solving the Curses of Dimensionality
University of Manchester, February 2009
Warren Powell, CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2008 Warren B. Powell
The fractional jet ownership industry
NetJets Inc.

Schneider National

Planning for a risky world
Weather
• Robust design of emergency response networks.
• Design of financial instruments to hedge against weather emergencies, to protect individuals, companies and municipalities.
• Design of sensor networks and communication systems to manage responses to major weather events.
Disease
• Models of disease propagation for response planning.
• Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
• Robust design of supply chains to mitigate the disruption of transportation systems.

Energy management
Energy resource allocation
• What is the right mix of energy technologies?
• How should the use of different energy resources be coordinated over space and time?
• What should my energy R&D portfolio look like?
• Should I invest in nuclear energy?
• What is the impact of a carbon tax?
Energy markets
• How should I hedge energy commodities?
• How do I price energy assets?
• What is the right price for energy futures?

High value spare parts
Electric power grid
• PJM oversees an aging investment in high-voltage transformers.
• The replacement strategy needs to anticipate a bulge in retirements and failures.
• There are 1-2 year lag times on orders, and a typical replacement costs about $5 million.
• Failures vary widely in terms of economic impact on the network.
Spare parts for business jets
• ADP is used to determine purchasing and allocation strategies for over 400 high-value spare parts.
• The inventory strategy has to determine what to buy, and when and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers).
• Inventories have to meet global targets on level of service and inventory costs.

Challenges
Real-time control
» Scheduling aircraft, pilots, generators, tankers
» Pricing stocks, options
» Electricity resource allocation
Near-term tactical planning
» Can I accept a customer request?
» Should I lease equipment?
» How much energy can I commit with my windmills?
Strategic planning
» What is the right equipment mix?
» What is the value of this contract?
» What is the value of more reliable aircraft?

Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
A resource allocation model
Attribute vectors — examples for different resource classes:
  a = (Asset class, Time invested)
  a = (Type, Location, Age)
  a = (Location, ETA, Home, Experience, Driving hours)
  a = (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)

A resource allocation model
Modeling resources:
» The attributes of a single resource: $a$ = the attributes of a single resource, $a \in \mathcal{A}$ = the attribute space.
» The resource state vector: $R_{ta}$ = the number of resources with attribute $a$; $R_t = (R_{ta})_{a \in \mathcal{A}}$ = the resource state vector.
» The information process: $\hat{R}_{ta}$ = the change in the number of resources with attribute $a$.

A resource allocation model
Modeling demands:
» The attributes of a single demand: $b$ = the attributes of a demand to be served, $b \in \mathcal{B}$ = the attribute space.
» The demand state vector: $D_{tb}$ = the number of demands with attribute $b$; $D_t = (D_{tb})_{b \in \mathcal{B}}$ = the demand state vector.
» The information process: $\hat{D}_{tb}$ = the change in the number of demands with attribute $b$.

Energy resource modeling
The system state: $S_t = (R_t, D_t, \rho_t)$, where:
  $R_t$ = resource state (how much capacity, reserves)
  $D_t$ = market demands
  $\rho_t$ = "system parameters":
  • State of the technology (costs, performance)
  • Climate, weather (temperature, rainfall, wind)
  • Government policies (tax rebates on solar panels)
  • Market prices (oil, coal)

Energy resource modeling
The decision variable $x_t$: new capacity and retired capacity, for each type, location and technology.

Energy resource modeling
Exogenous information: $W_t$ = new information $= (\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, where:
  $\hat{R}_t$ = exogenous changes in capacity, reserves
  $\hat{D}_t$ = new demands for energy from each source
  $\hat{\rho}_t$ = exogenous changes in parameters.

Energy resource modeling
The transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$

A resource allocation model
[Figures: resources matched to demands at a point in time, and the same matching repeated over times t, t+1, t+2 — contrasting optimizing at a point in time with optimizing over time.]

Introduction to ADP
We just solved Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$
» We found the value of being in each state by stepping backward through the tree.

Introduction to ADP
The challenge of dynamic programming is the curse of dimensionality — in fact, there are three curses:
» The state space
» The outcome space
» The action space (the feasible region)

Introduction to ADP
The computational challenge:
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?

Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with Bellman's equation, then replace the value function with some sort of approximation $\bar{V}_t(S_t) \approx V_t(S_t)$.
A small backward-induction example follows before we turn to approximations.
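To make the backward recursion concrete before we start approximating it, here is a minimal sketch (in Python) of solving Bellman's equation exactly by backward induction on a toy scalar inventory problem. All of the problem data — horizon, prices, costs, demand distribution — are illustrative assumptions, not from the talk.

```python
# Exact backward induction for Bellman's equation
#   V_t(S) = max_x ( C(S, x) + E[ V_{t+1}(S') | S, x ] )
# on a toy inventory problem with illustrative data.

T = 10                              # horizon
states = range(0, 21)               # inventory levels 0..20
actions = range(0, 11)              # order quantities 0..10
demand = {0: 0.3, 1: 0.4, 2: 0.3}   # P(D = d)
price, cost, holding = 5.0, 2.0, 0.1

V = {(T, s): 0.0 for s in states}   # terminal values

for t in reversed(range(T)):
    for s in states:                              # curse 1: state space
        best = float("-inf")
        for x in actions:                         # curse 2: action space
            if s + x > max(states):
                continue
            val = -cost * x
            for d, p in demand.items():           # curse 3: outcome space
                sold = min(s + x, d)
                s_next = s + x - sold
                val += p * (price * sold - holding * s_next
                            + V[(t + 1, s_next)])
            best = max(best, val)
        V[(t, s)] = best

print(V[(0, 5)])    # value of starting with 5 units in inventory
```

The three nested enumerations — over states, actions and outcomes — are exactly the three curses: trivial for a scalar state, but hopeless when $S_t$ contains a vector such as $R_t = (R_{ta})_{a \in \mathcal{A}}$.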
Introduction to ADP
Approximating the value function:
» We have to exploit the structure of the value function (e.g. concavity).
» We might approximate the value function using a simple polynomial:
  $\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2$
» ... or a more complicated one:
  $\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t)$
» Sometimes they get really messy. [The slide shows an approximation $\bar{V}_t(R_t \mid \theta)$ built from roughly two dozen basis functions — sums, products and averages of the resource variables over time and attributes; the expression is garbled in this transcript.]

Introduction to ADP
We can write a model of the observed value of being in a state as:
  $\hat{v} = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t) + \varepsilon$
This is often written as a generic regression model:
  $Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4$
The ADP community refers to the independent variables as basis functions:
  $Y = \theta_0 \phi_0(S) + \theta_1 \phi_1(S) + \theta_2 \phi_2(S) + \theta_3 \phi_3(S) + \theta_4 \phi_4(S) = \sum_{f \in F} \theta_f \phi_f(S)$
The functions $\phi_f(S)$ are also known as features.

Introduction to ADP
Methods for estimating $\theta$:
» Generate observations $\hat{v}^1, \hat{v}^2, \ldots, \hat{v}^N$ and use traditional regression methods to fit $\theta$.
» Stochastic gradient for updating $\theta^n$:
  $\theta^n = \theta^{n-1} - \alpha_{n-1} \left( \bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n \right) \phi(S^n)$
  where the error $\bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n$ multiplies the vector of basis functions $\phi(S^n) = (\phi_1(S^n), \ldots, \phi_F(S^n))^T$.
» Recursive statistics — iterative equations avoid the matrix inverse (see the sketch at the end of this section):
  $\theta^n = \theta^{n-1} - H^n x^n \hat{\varepsilon}^n$, with $H^n = \frac{1}{\gamma^n} B^{n-1}$,
  $B^n = B^{n-1} - \frac{1}{\gamma^n} B^{n-1} x^n (x^n)^T B^{n-1}$, $\gamma^n = 1 + (x^n)^T B^{n-1} x^n$,
  where $B^n$ is an $F \times F$ matrix approximating $[(X^n)^T X^n]^{-1}$, $x^n$ is the vector of basis functions and $\hat{\varepsilon}^n$ is the error.

Introduction to ADP
Other statistical methods:
» Regression trees
  • Combine regression with techniques for discrete variables.
» Data mining
  • Good for categorical data.
» Neural networks
  • Engineers like these for low-dimensional continuous problems.
» Kernel/locally polynomial regression
  • Approximates portions of the value function locally using simple functions.
» Dirichlet mixture models
  • Aggregate portions of the function and fit approximations around these aggregations.

Introduction to ADP
What you will struggle with:
» Stepsizes
  • Can't live with 'em, can't live without 'em.
  • Too small, and you think you have converged when you have really just stalled ("apparent convergence").
  • Too large, and the system is unstable.
» Stability
  • There are two sources of randomness:
    – the traditional exogenous randomness, and
    – an evolving policy.
» Exploration vs. exploitation
  • You sometimes have to choose to visit a state just to collect information about the value of being in that state.

Introduction to ADP
But we are not out of the woods...
» Assume we have an approximate value function. We still have to solve a problem that looks like
  $V_t(S_t) = \max_{x \in \mathcal{X}} \left( C_t(S_t, x_t) + E\left[ \textstyle\sum_{f \in F} \theta_f \phi_f(S_{t+1}) \right] \right)$
» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) with an expectation.
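The recursive updates above fit in a few lines of code. This is a sketch under the assumption that the garbled slide equations follow the standard recursive-least-squares (Sherman–Morrison) form; the initialization $B^0 = \epsilon^{-1} I$ and the test function at the bottom are illustrative choices, not from the talk.

```python
import numpy as np

# Recursive least squares for fitting V(S) ~ sum_f theta_f * phi_f(S)
# without ever forming an explicit matrix inverse.

class RecursiveRegression:
    def __init__(self, n_features, eps=1e-2):
        self.theta = np.zeros(n_features)
        self.B = np.eye(n_features) / eps   # approximates (X^T X)^{-1}

    def update(self, phi, v_hat):
        """phi: basis-function values phi_f(S); v_hat: observed value."""
        error = self.theta @ phi - v_hat            # prediction error
        gamma = 1.0 + phi @ self.B @ phi
        self.theta -= (self.B @ phi) * error / gamma
        self.B -= np.outer(self.B @ phi, phi @ self.B) / gamma

def basis(S):
    # the polynomial basis from the example above
    return np.array([1.0, S, S**2, np.log(S), np.sin(S)])

model = RecursiveRegression(5)
rng = np.random.default_rng(0)
for _ in range(1000):
    S = rng.uniform(1.0, 10.0)
    v_hat = 2 + 0.5 * S - 0.03 * S**2 + rng.normal(scale=0.1)
    model.update(basis(S), v_hat)
```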
ADP and the post-decision state variable

A decision tree example: should we schedule or cancel a game, and should we consult a weather report first?
» Cancelling the game always costs $200. Scheduling it pays -$2000 if it rains, $1000 if it is cloudy, and $5000 if it is sunny.
» Without the weather report, P(rain, clouds, sun) = (.2, .3, .5), so scheduling is worth .2(-$2000) + .3($1000) + .5($5000) = $2400.
» With the weather report, the forecast is rainy, cloudy or sunny with probabilities .1, .3 and .6, and the conditional probabilities of (rain, clouds, sun) become (.8, .2, .0), (.1, .5, .4) and (.1, .2, .7). Rolling back gives scheduling values of -$1400 (so we would cancel, earning -$200), $2300 and $3500.
» Rolling back once more: using the weather report is worth .1(-$200) + .3($2300) + .6($3500) = $2770, versus $2400 without it.
» In the tree, decision nodes are where we choose (use the report or not; schedule or cancel) and outcome nodes are where nature reveals information. Solving the tree is exactly Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$

The post-decision state
New concept:
» The "pre-decision" state variable:
  • $S_t$ = the information required to make a decision $x_t$.
  • Same as a "decision node" in a decision tree.
» The "post-decision" state variable:
  • $S_t^x$ = the state of what we know immediately after we make a decision.
  • Same as an "outcome node" in a decision tree.

The post-decision state
An inventory problem:
» Our basic inventory equation:
  $R_{t+1} = \max\{0, R_t + x_t - \hat{D}_{t+1}\}$
  where $R_t$ = inventory at time $t$, $x_t$ = the amount we ordered, and $\hat{D}_{t+1}$ = the demand in the next time period.
» Using pre- and post-decision states:
  $R_t^x = R_t + x_t$  (post-decision state)
  $R_{t+1} = \max\{0, R_t^x - \hat{D}_{t+1}\}$  (pre-decision state)

The post-decision state
[Figure: pre-decision states, state-action pairs, and post-decision states for a small example, showing that the post-decision state space can be far smaller than the space of state-action pairs.]

The post-decision state
» Pre-decision: resources and demands, $S_t = (R_t, D_t)$.
» The decision takes us to the post-decision state $S_t^x = S^{M,x}(S_t, x_t)$.
» The exogenous information $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$ then takes us to the next pre-decision state $S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$.
A sketch of this pre-/post-decision split appears below.
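A minimal sketch of the pre-/post-decision split for the inventory example above. The demand distribution and the fixed ordering decision are placeholders.

```python
import random

# The pre-/post-decision split for the inventory example:
#   post-decision:  R^x_t   = R_t + x_t                (deterministic)
#   pre-decision:   R_{t+1} = max(0, R^x_t - D_{t+1})  (after new information)

def post_decision(R_t, x_t):
    """State immediately after the decision, before new information."""
    return R_t + x_t

def next_pre_decision(R_x, demand):
    """State after the exogenous information (demand) arrives."""
    return max(0, R_x - demand)

R = 5
for t in range(3):
    x = 4                          # some ordering decision
    R_x = post_decision(R, x)      # outcome node
    D = random.choice([2, 4, 6])   # sample next period's demand
    R = next_pre_decision(R_x, D)  # next decision node
    print(t, R_x, D, R)
```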
The post-decision state
Classical form of Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$
Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x(S^{M,x}(S_t, x_t)) \right)$
  • Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):
  $V_t^x(S_t^x) = E\left[ V_{t+1}(S^{M,W}(S_t^x, W_{t+1})) \mid S_t^x \right]$

The post-decision state
Challenges:
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x(S_t^x) \right)$.
» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + \bar{V}_t(S_t^x) \right)$.
» So now we face:
  • What should the approximation look like?
  • How do we estimate it?

The post-decision state
Value function approximations:
» Linear (in the resource state): $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$
» Piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$
» Indexed piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x \mid \text{features}_t)$

The post-decision state
Value function approximations:
» Ridge regression (Klabjan and Adelman): value functions of linear combinations of the resource variables, roughly $\bar{V}_t(R_t^x) = \sum_{f \in F} \bar{V}_{tf}(R_{tf})$ with $R_{tf} = \sum_{a \in \mathcal{A}} \delta_{fa} R_{ta}$ (the exact expression is garbled in this transcript).
» Benders cuts. [Figure: a piecewise-linear outer approximation of $\bar{V}_t(R_t^x)$ built from cuts.]

The post-decision state
Comparison to other methods:
» Classical MDP (value iteration):
  $V^n(S) = \max_x \left( C(S, x) + E[V^{n-1}(S_{t+1})] \right)$
» Classical ADP (pre-decision state) — the expectation remains inside the maximization:
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \sum_{s'} p(s' \mid S_t^n, x_t)\, \bar{V}_{t+1}^{n-1}(s') \right)$
  $\bar{V}_t^n(S_t^n) = (1 - \alpha_{n-1}) \bar{V}_t^{n-1}(S_t^n) + \alpha_{n-1} \hat{v}_t^n$   ($\hat{v}_t$ updates $\bar{V}_t(S_t)$)
» Our method — update $\bar{V}^{x,n-1}_{t-1}$ around the post-decision state:
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{x,n-1}(S^{M,x}(S_t^n, x_t)) \right)$
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$   ($\hat{v}_t$ updates $\bar{V}_{t-1}(S_{t-1}^x)$)

The post-decision state
The resulting algorithm:
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization using an approximate value function,
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \right)$,
  to obtain $x_t^n$.
Step 3: Update the value function approximation (recursive statistics):
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ (simulation) and compute the next pre-decision state:
  $S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to Step 1.
A code sketch of this loop follows.
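Here is a sketch of the five-step loop on a scalar inventory problem, using a lookup-table approximation $\bar{V}_t(R^x)$ over post-decision states. How the one-period contribution is booked (revenue when the observed demand is served at the start of a period) and all of the problem data are assumptions made for this sketch, not from the talk.

```python
import random

# ADP with post-decision states on a scalar inventory problem.
# Pre-decision state: (R, D) = inventory plus the demand just observed.
# Post-decision state: Rx = inventory after serving demand and ordering.

T, N = 20, 1000
max_inv = 20
actions = range(0, 11)
price, cost = 5.0, 2.0
demands = [2, 4, 6]

# lookup-table VFA over post-decision states, one row per time period
V = [[0.0] * (max_inv + 1) for _ in range(T + 1)]

for n in range(1, N + 1):
    alpha = 0.05                          # constant stepsize (see next section)
    R, D = 5, random.choice(demands)      # Step 1: initial pre-decision state
    Rx_prev = None
    for t in range(T):
        sales = min(R, D)
        # Step 2: deterministic optimization with the approximate VFA
        v_hat, Rx_best = float("-inf"), R - sales
        for x in actions:
            Rx = R - sales + x
            if Rx > max_inv:
                continue
            val = price * sales - cost * x + V[t][Rx]
            if val > v_hat:
                v_hat, Rx_best = val, Rx
        # Step 3: v_hat updates the value of the PREVIOUS post-decision state
        if Rx_prev is not None:
            V[t - 1][Rx_prev] = (1 - alpha) * V[t - 1][Rx_prev] + alpha * v_hat
        # Step 4: sample new information, form the next pre-decision state
        Rx_prev = Rx_best
        R, D = Rx_best, random.choice(demands)
```

Note that Step 2 contains no expectation — it is a deterministic optimization — and that $\hat{v}_t^n$ updates the value of the previous post-decision state, exactly as in the comparison above.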
A blood management example
Managing blood inventories over time:
» In each week $t = 0, 1, 2, 3, \ldots$ we make allocation decisions $x_t$, moving from the pre-decision state $S_t$ to the post-decision state $S_t^x$; new donations and demands $(\hat{R}_{t+1}, \hat{D}_{t+1})$ then arrive, giving $S_{t+1}$.

Blood management
» The state is $S_t = (R_t, D_t)$: supplies of blood by attribute — blood type and age, e.g. $R_{t,(AB+,0)}, R_{t,(AB+,1)}, R_{t,(AB+,2)}, \ldots, R_{t,(O-,0)}, R_{t,(O-,1)}, R_{t,(O-,2)}$ — and demands $\hat{D}_{t,b}$ by blood type.
» Each decision either satisfies a demand (respecting the blood-type substitution rules) or holds blood in inventory, producing the post-decision supplies $R_t^x$; held blood ages by one week, and new donations $\hat{R}_{t+1}$ arrive to form $R_{t+1}$.
» Solve each week's allocation problem $F(R_t)$ as a linear program. The dual variables $\hat{\lambda}_{t,a}$ (e.g. $\hat{\lambda}_{t,(AB+,0)}, \hat{\lambda}_{t,(AB+,1)}, \hat{\lambda}_{t,(AB+,2)}$) give the value of an additional unit of blood with attribute $a$.

Updating the value function approximation
» Estimate the gradient at $R_t^n$: the dual $\hat{\lambda}_{t,(AB+,2)}^n$ estimates the slope of $F(R_t)$ at $R_{t,(AB+,2)}^n$.
» Use it to update the piecewise-linear approximation $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ at the previous post-decision resource level $R_{t-1}^{x,n}$, producing $\bar{V}_{t-1}^n(R_{t-1}^x)$. A code sketch follows below.

Iterative learning
[Figures: over the iterations, the value function approximations and the resulting allocation behavior evolve together.]

Approximate dynamic programming
» With luck, the objective function will improve steadily over the iterations...
» ...but performance can be jagged.
[Figures: objective function versus iteration for two problems — one improving smoothly, one oscillating.]
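A sketch of the dual-based update of a separable, piecewise-linear value function. The slides show only the smoothing update of the slope at the observed resource level; the averaging projection used here to restore concavity is one common choice and should be read as an assumption of this sketch, not as the method on the slides.

```python
# Update a concave piecewise-linear VFA from a dual variable: lambda_hat
# estimates the marginal value of one more unit of resource with a given
# attribute, observed at resource level R_n.

def update_pwl(slopes, R_n, lambda_hat, alpha):
    """slopes[r] = estimated value of the (r+1)-st unit of resource."""
    slopes[R_n] = (1 - alpha) * slopes[R_n] + alpha * lambda_hat
    # restore concavity: slopes must be nonincreasing in r
    for r in range(R_n, 0, -1):
        if slopes[r] > slopes[r - 1]:
            slopes[r] = slopes[r - 1] = 0.5 * (slopes[r] + slopes[r - 1])
    for r in range(R_n, len(slopes) - 1):
        if slopes[r + 1] > slopes[r]:
            slopes[r + 1] = slopes[r] = 0.5 * (slopes[r + 1] + slopes[r])

slopes = [10.0, 8.0, 6.0, 4.0, 2.0]   # concave: decreasing marginal values
update_pwl(slopes, R_n=2, lambda_hat=9.0, alpha=0.5)
print(slopes)   # slope at R=2 pulled up toward the dual, concavity kept
```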
Hierarchical aggregation
Attribute vectors can be very detailed, for example:
  a = (Asset class, Time invested)
  a = (Blood type, Age, Frozen?)
  a = (Location, ETA, Bus. segment, Single/team, Domicile, Drive hours, Days from home)
  a = (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)

Hierarchical aggregation
Estimating value functions by smoothing:
  $\bar{v}^n(a) = (1 - \alpha)\, \bar{v}^{n-1}(a) + \alpha\, \hat{v}$
  e.g. $\$2050 = (1 - 0.10) \cdot \$2000 + (0.10) \cdot \$2500$.
Drivers may have very detailed attributes, but the value function approximation may use fewer attributes than the driver. For a driver with attributes (Location, Fleet, Domicile, DOThrs, DaysFromHome), an observation $\hat{v}$ can update estimates at several levels:
» Most disaggregate level: smooth the estimate indexed by (Location, Fleet, Domicile).
» Middle level of aggregation: smooth the estimate indexed by (Location, Fleet).
» Most aggregate level: smooth the estimate indexed by (Location) alone.

Hierarchical aggregation
State-dependent weighted aggregation:
» We now have an estimate at each aggregation level $g$, combined by a linear regression for each attribute:
  $\bar{v}_a = \sum_g w_a^{(g)} \bar{v}_a^{(g)}$
» We cannot solve hundreds of thousands of linear regressions. Instead, use weights inversely proportional to an estimate of the variance plus the square of an estimate of the bias,
  $w_a^{(g)} \propto \left( \mathrm{Var}\!\left[\bar{v}_a^{(g)}\right] + \left(\bar{\mu}_a^{(g)}\right)^2 \right)^{-1}$, normalized so that $\sum_g w_a^{(g)} = 1$.
A sketch of this weighting appears after this section.

Hierarchical aggregation
[Figures: objective function versus iteration. A purely aggregate approximation learns quickly but levels off; a purely disaggregate approximation learns slowly; the weighted combination outperforms both.]
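A sketch of the weighted combination across aggregation levels. The inverse-(variance + bias²) weighting follows the formula above; the per-level variance and bias numbers here are illustrative inputs, and the careful recursions for maintaining them are omitted.

```python
import numpy as np

# Weighted hierarchical aggregation: combine estimates of the same value
# at several aggregation levels g, weighting each level by the inverse of
# (estimated variance + estimated bias^2), normalized to sum to 1.

def weighted_estimate(v_bar, var, bias):
    """v_bar[g], var[g], bias[g]: per-level estimate, variance, bias."""
    raw = 1.0 / (var + bias ** 2)
    w = raw / raw.sum()           # weights sum to 1
    return w @ v_bar, w

# e.g. a driver's value estimated at three levels of attribute detail:
v_bar = np.array([2500.0, 2400.0, 2300.0])   # disaggregate -> aggregate
var   = np.array([900.0, 200.0, 50.0])       # noisier when disaggregate
bias  = np.array([0.0, 60.0, 150.0])         # more biased when aggregate
value, w = weighted_estimate(v_bar, var, bias)
```

Early on, the aggregate levels (low variance) dominate the weights; as observations accumulate and the disaggregate variance falls, the weight shifts toward the detailed estimates — which is exactly the behavior in the comparison plots above.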
Stepsizes
Fundamental to ADP is an updating equation that looks like:
  $\bar{V}_{t-1}^n(S_{t-1}^x) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^x) + \alpha_{n-1}\, \hat{v}_t^n$
  (updated estimate) = (1 - stepsize) × (old estimate) + (stepsize) × (new observation)
The stepsize $\alpha_{n-1}$ is also called the "learning rate" or "smoothing factor".

Stepsizes
[Figures: with high-noise data, the best stepsize is $\alpha_n = 1/n$; with low noise and a changing signal, the best stepsize is $\alpha_n = 1$.]

Stepsizes
Theorem 1 (general knowledge): if the data is stationary, then $1/n$ is the best possible stepsize.
Theorem 2 (e.g. Tsitsiklis 1994): for general problems, $1/n$ is provably convergent.
Theorem 3 (Frazier and Powell): $1/n$ works so badly it should (almost) never be used.
[Figure: a lower bound on the value function after n iterations, plotted over many orders of magnitude of n.]

Stepsizes
The bias-adjusted Kalman filter (BAKF):
  $\alpha_{n-1} = 1 - \dfrac{\bar{\sigma}^2}{(1 + \lambda^{n-1})\,\bar{\sigma}^2 + (\bar{\beta}^n)^2}$, where $\lambda^n = (1 - \alpha_{n-1})^2 \lambda^{n-1} + (\alpha_{n-1})^2$,
  $\bar{\sigma}^2$ is an estimate of the noise variance and $\bar{\beta}^n$ is an estimate of the bias.
» As the noise $\bar{\sigma}^2$ increases, the stepsize decreases toward $1/n$.
» As the bias $\bar{\beta}^n$ increases, the stepsize increases toward 1.
[Figures: on noisy, stationary observations the BAKF stepsize declines like the $1/n$ rule; on a rising signal it stays large while the $1/n$ rule falls behind.]

Stepsizes
I recommend:
» Start with a constant stepsize.
  • Vary it, and get a sense of what works best.
» Next try a deterministic stepsize such as $a/(a+n)$.
  • Choose $a$ so that the stepsize declines to roughly your best constant stepsize at an iteration comparable to where your constant-stepsize run seems to stabilize.
    – Might be 50 iterations; might be 5000.
» Then try an (optimal) adaptive stepsize rule.
  • These can work very well if there is not too much noise.
  • Adaptive rules work well when there is a need to keep the stepsize from declining too quickly (but you do not know how quickly).
A sketch of these rules follows.
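The rules above, written as simple Python generators. The constant and deterministic rules are exactly as stated; the adaptive rule follows the BAKF form reconstructed above, so its details (initialization, the $\lambda$ recursion, how $\bar{\sigma}^2$ and $\bar{\beta}$ are estimated) should be treated as assumptions of this sketch rather than a definitive implementation.

```python
# Stepsize rules as infinite generators: next(rule) gives alpha_n.

def constant(c=0.1):
    while True:
        yield c

def one_over_n():                 # optimal for stationary data
    n = 0
    while True:
        n += 1
        yield 1.0 / n

def harmonic(a=20.0):             # a/(a+n): declines more slowly than 1/n
    n = 0
    while True:
        n += 1
        yield a / (a + n)

def bakf(sigma2, bias):
    """Bias-adjusted Kalman filter. sigma2, bias: callables returning the
    current noise-variance and bias estimates (estimated in practice)."""
    lam, n = 0.0, 0
    while True:
        n += 1
        if n == 1:
            alpha = 1.0
        else:
            alpha = 1.0 - sigma2() / ((1.0 + lam) * sigma2() + bias() ** 2)
        lam = (1 - alpha) ** 2 * lam + alpha ** 2
        yield alpha
```

A quick sanity check on the BAKF rule: with zero bias it yields 1, 1/2, 1/3, 1/4, ... (the $1/n$ rule), and with a large bias it stays near 1 — matching the two limiting behaviors noted above.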
Some applications: transportation

Schneider National
[Figures: calibrating the model against history. For each capacity category (US_SOLO, US_IC, US_TEAM), the simulated revenue per work unit (WU) and utilization fall between the historical minimum and maximum.]

A rail application (with NS)
Princeton team: Warren Powell, Belgacem Bouzaiene-Ayari
NS team: Clark Cheng, Ricardo Fiorillo, Junxia Chang, Sourav Das
[Figures: convergence, train coverage, and model calibration — train delay curves (delay hours/day versus fleet size) for October 2007 and March 2008.]

Some applications: energy

Energy resource modeling
Hourly model:
» Decisions at hour t impact hour t+1 through the amount of water held in the reservoir.
» We capture this with a value function approximation: the value of holding water in the reservoir for future time periods.
» Stepping hour by hour (1, 2, 3, ..., 8760) carries the model from 2008 into 2009.

Annual energy model
[Figures: reservoir level and demand over roughly 800 time periods. The ADP solution closely tracks the optimal solution from a linear program, and for stochastic rainfall the ADP reservoir trajectory at the last iteration lies within the band of optimal solutions for the individual scenarios.]

The post-decision state
What happens if we use Bellman's equation directly on the blood problem?
» The state variable is:
  • the supply of each type of blood — 8 blood types and 6 ages — plus
  • the demand for each type of blood — 8 blood types,
  giving a 56-dimensional state vector.
» The decision variable is how much of each of 8 blood types to supply to each of 8 demand types: a 162-dimensional decision vector.
» The random information is blood donations by week (8 types) and new demands for blood (8 types): a 16-dimensional information vector.
Yet the ADP algorithm, working around post-decision states, simply steps through the weekly timeline (decisions $x_t$, post-decision states $S_t^x$, new information $(\hat{R}_{t+1}, \hat{D}_{t+1})$) without ever enumerating this state space.

Approximate dynamic programming
Optimization (e.g. Cplex):
» Strengths: produces optimal decisions; a mature technology.
» Weaknesses: cannot handle uncertainty; cannot handle high levels of complexity.
Simulation:
» Strengths: extremely flexible; high level of detail; easily handles uncertainty.
» Weaknesses: models decisions using user-specified rules; low solution quality.
Approximate dynamic programming combines simulation and optimization in a rigorous yet flexible framework:
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization using an approximate value function,
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \right)$,
  to obtain $x_t^n$.
Step 3: Update the value function approximation (recursive statistics):
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ (simulation) and compute the next pre-decision state:
  $S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to Step 1.