Approximate Dynamic Programming: Solving the Curses of Dimensionality
University of Manchester, February 2009
Warren Powell, CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2008 Warren B. Powell
The fractional jet ownership industry
NetJets Inc.

Schneider National

Planning for a risky world
Weather
• Robust design of emergency response networks.
• Design of financial instruments to hedge against weather emergencies, to protect individuals, companies and municipalities.
• Design of sensor networks and communication systems to manage responses to major weather events.
Disease
• Models of disease propagation for response planning.
• Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
• Robust design of supply chains to mitigate the disruption of transportation systems.

Energy management
Energy resource allocation
• What is the right mix of energy technologies?
• How should the use of different energy resources be coordinated over space and time?
• What should my energy R&D portfolio look like?
• Should I invest in nuclear energy?
• What is the impact of a carbon tax?
Energy markets
• How should I hedge energy commodities?
• How do I price energy assets?
• What is the right price for energy futures?

High value spare parts
Electric power grid
• PJM oversees an aging investment in high-voltage transformers.
• The replacement strategy needs to anticipate a bulge in retirements and failures.
• There are 1-2 year lag times on orders, and a typical replacement costs about $5 million.
• Failures vary widely in terms of economic impact on the network.
Spare parts for business jets
• ADP is used to determine purchasing and allocation strategies for over 400 high-value spare parts.
• The inventory strategy has to determine what to buy, and when and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers).
• Inventories have to meet global targets on level of service and inventory costs.

Challenges
Real-time control
» Scheduling aircraft, pilots, generators, tankers
» Pricing stocks, options
» Electricity resource allocation
Near-term tactical planning
» Can I accept a customer request?
» Should I lease equipment?
» How much energy can I commit with my windmills?
Strategic planning
» What is the right equipment mix?
» What is the value of this contract?
» What is the value of more reliable aircraft?

Outline
A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
Some applications
» Transportation
» Energy
A resource allocation model
Attribute vectors — examples for different resource classes:
  a = (Asset class, Time invested)
  a = (Type, Location, Age)
  a = (Location, ETA, Home, Experience, Driving hours)
  a = (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)

A resource allocation model
Modeling resources:
» The attributes of a single resource: $a$ = the attributes of a single resource, $a \in \mathcal{A}$ = the attribute space.
» The resource state vector: $R_{ta}$ = the number of resources with attribute $a$; $R_t = (R_{ta})_{a \in \mathcal{A}}$ = the resource state vector.
» The information process: $\hat{R}_{ta}$ = the change in the number of resources with attribute $a$.

A resource allocation model
Modeling demands:
» The attributes of a single demand: $b$ = the attributes of a demand to be served, $b \in \mathcal{B}$ = the attribute space.
» The demand state vector: $D_{tb}$ = the number of demands with attribute $b$; $D_t = (D_{tb})_{b \in \mathcal{B}}$ = the demand state vector.
» The information process: $\hat{D}_{tb}$ = the change in the number of demands with attribute $b$.

Energy resource modeling
The system state: $S_t = (R_t, D_t, \rho_t)$, where:
  $R_t$ = resource state (how much capacity, reserves)
  $D_t$ = market demands
  $\rho_t$ = "system parameters":
  • State of the technology (costs, performance)
  • Climate, weather (temperature, rainfall, wind)
  • Government policies (tax rebates on solar panels)
  • Market prices (oil, coal)

Energy resource modeling
The decision variable $x_t$: new capacity and retired capacity, for each type, location and technology.

Energy resource modeling
Exogenous information: $W_t$ = new information $= (\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, where:
  $\hat{R}_t$ = exogenous changes in capacity, reserves
  $\hat{D}_t$ = new demands for energy from each source
  $\hat{\rho}_t$ = exogenous changes in parameters.

Energy resource modeling
The transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$

A resource allocation model
[Figures: resources matched to demands at a point in time, and the same matching repeated over times t, t+1, t+2 — contrasting optimizing at a point in time with optimizing over time.]

Introduction to ADP
We just solved Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$
» We found the value of being in each state by stepping backward through the tree.

Introduction to ADP
The challenge of dynamic programming is the curse of dimensionality — in fact, there are three curses:
» The state space
» The outcome space
» The action space (the feasible region)

Introduction to ADP
The computational challenge:
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?

Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with Bellman's equation, then replace the value function with some sort of approximation $\bar{V}_t(S_t) \approx V_t(S_t)$.
A small backward-induction example follows before we turn to approximations.
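To make the backward recursion concrete before we start approximating it, here is a minimal sketch (in Python) of solving Bellman's equation exactly by backward induction on a toy scalar inventory problem. All of the problem data — horizon, prices, costs, demand distribution — are illustrative assumptions, not from the talk.

```python
# Exact backward induction for Bellman's equation
#   V_t(S) = max_x ( C(S, x) + E[ V_{t+1}(S') | S, x ] )
# on a toy inventory problem with illustrative data.

T = 10                              # horizon
states = range(0, 21)               # inventory levels 0..20
actions = range(0, 11)              # order quantities 0..10
demand = {0: 0.3, 1: 0.4, 2: 0.3}   # P(D = d)
price, cost, holding = 5.0, 2.0, 0.1

V = {(T, s): 0.0 for s in states}   # terminal values

for t in reversed(range(T)):
    for s in states:                              # curse 1: state space
        best = float("-inf")
        for x in actions:                         # curse 2: action space
            if s + x > max(states):
                continue
            val = -cost * x
            for d, p in demand.items():           # curse 3: outcome space
                sold = min(s + x, d)
                s_next = s + x - sold
                val += p * (price * sold - holding * s_next
                            + V[(t + 1, s_next)])
            best = max(best, val)
        V[(t, s)] = best

print(V[(0, 5)])    # value of starting with 5 units in inventory
```

The three nested enumerations — over states, actions and outcomes — are exactly the three curses: trivial for a scalar state, but hopeless when $S_t$ contains a vector such as $R_t = (R_{ta})_{a \in \mathcal{A}}$.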
Introduction to ADP
Approximating the value function:
» We have to exploit the structure of the value function (e.g. concavity).
» We might approximate the value function using a simple polynomial:
  $\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2$
» ... or a more complicated one:
  $\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t)$
» Sometimes they get really messy. [The slide shows an approximation $\bar{V}_t(R_t \mid \theta)$ built from roughly two dozen basis functions — sums, products and averages of the resource variables over time and attributes; the expression is garbled in this transcript.]

Introduction to ADP
We can write a model of the observed value of being in a state as:
  $\hat{v} = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t) + \varepsilon$
This is often written as a generic regression model:
  $Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4$
The ADP community refers to the independent variables as basis functions:
  $Y = \theta_0 \phi_0(S) + \theta_1 \phi_1(S) + \theta_2 \phi_2(S) + \theta_3 \phi_3(S) + \theta_4 \phi_4(S) = \sum_{f \in F} \theta_f \phi_f(S)$
The functions $\phi_f(S)$ are also known as features.

Introduction to ADP
Methods for estimating $\theta$:
» Generate observations $\hat{v}^1, \hat{v}^2, \ldots, \hat{v}^N$ and use traditional regression methods to fit $\theta$.
» Stochastic gradient for updating $\theta^n$:
  $\theta^n = \theta^{n-1} - \alpha_{n-1} \left( \bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n \right) \phi(S^n)$
  where the error $\bar{V}^{n-1}(S^n \mid \theta^{n-1}) - \hat{v}^n$ multiplies the vector of basis functions $\phi(S^n) = (\phi_1(S^n), \ldots, \phi_F(S^n))^T$.
» Recursive statistics — iterative equations avoid the matrix inverse (see the sketch at the end of this section):
  $\theta^n = \theta^{n-1} - H^n x^n \hat{\varepsilon}^n$, with $H^n = \frac{1}{\gamma^n} B^{n-1}$,
  $B^n = B^{n-1} - \frac{1}{\gamma^n} B^{n-1} x^n (x^n)^T B^{n-1}$, $\gamma^n = 1 + (x^n)^T B^{n-1} x^n$,
  where $B^n$ is an $F \times F$ matrix approximating $[(X^n)^T X^n]^{-1}$, $x^n$ is the vector of basis functions and $\hat{\varepsilon}^n$ is the error.

Introduction to ADP
Other statistical methods:
» Regression trees
  • Combine regression with techniques for discrete variables.
» Data mining
  • Good for categorical data.
» Neural networks
  • Engineers like these for low-dimensional continuous problems.
» Kernel/locally polynomial regression
  • Approximates portions of the value function locally using simple functions.
» Dirichlet mixture models
  • Aggregate portions of the function and fit approximations around these aggregations.

Introduction to ADP
What you will struggle with:
» Stepsizes
  • Can't live with 'em, can't live without 'em.
  • Too small, and you think you have converged when you have really just stalled ("apparent convergence").
  • Too large, and the system is unstable.
» Stability
  • There are two sources of randomness:
    – the traditional exogenous randomness, and
    – an evolving policy.
» Exploration vs. exploitation
  • You sometimes have to choose to visit a state just to collect information about the value of being in that state.

Introduction to ADP
But we are not out of the woods...
» Assume we have an approximate value function. We still have to solve a problem that looks like
  $V_t(S_t) = \max_{x \in \mathcal{X}} \left( C_t(S_t, x_t) + E\left[ \textstyle\sum_{f \in F} \theta_f \phi_f(S_{t+1}) \right] \right)$
» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) with an expectation.
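The recursive updates above fit in a few lines of code. This is a sketch under the assumption that the garbled slide equations follow the standard recursive-least-squares (Sherman–Morrison) form; the initialization $B^0 = \epsilon^{-1} I$ and the test function at the bottom are illustrative choices, not from the talk.

```python
import numpy as np

# Recursive least squares for fitting V(S) ~ sum_f theta_f * phi_f(S)
# without ever forming an explicit matrix inverse.

class RecursiveRegression:
    def __init__(self, n_features, eps=1e-2):
        self.theta = np.zeros(n_features)
        self.B = np.eye(n_features) / eps   # approximates (X^T X)^{-1}

    def update(self, phi, v_hat):
        """phi: basis-function values phi_f(S); v_hat: observed value."""
        error = self.theta @ phi - v_hat            # prediction error
        gamma = 1.0 + phi @ self.B @ phi
        self.theta -= (self.B @ phi) * error / gamma
        self.B -= np.outer(self.B @ phi, phi @ self.B) / gamma

def basis(S):
    # the polynomial basis from the example above
    return np.array([1.0, S, S**2, np.log(S), np.sin(S)])

model = RecursiveRegression(5)
rng = np.random.default_rng(0)
for _ in range(1000):
    S = rng.uniform(1.0, 10.0)
    v_hat = 2 + 0.5 * S - 0.03 * S**2 + rng.normal(scale=0.1)
    model.update(basis(S), v_hat)
```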
ADP and the post-decision state variable

A decision tree example: should we schedule or cancel a game, and should we consult a weather report first?
» Cancelling the game always costs $200. Scheduling it pays -$2000 if it rains, $1000 if it is cloudy, and $5000 if it is sunny.
» Without the weather report, P(rain, clouds, sun) = (.2, .3, .5), so scheduling is worth .2(-$2000) + .3($1000) + .5($5000) = $2400.
» With the weather report, the forecast is rainy, cloudy or sunny with probabilities .1, .3 and .6, and the conditional probabilities of (rain, clouds, sun) become (.8, .2, .0), (.1, .5, .4) and (.1, .2, .7). Rolling back gives scheduling values of -$1400 (so we would cancel, earning -$200), $2300 and $3500.
» Rolling back once more: using the weather report is worth .1(-$200) + .3($2300) + .6($3500) = $2770, versus $2400 without it.
» In the tree, decision nodes are where we choose (use the report or not; schedule or cancel) and outcome nodes are where nature reveals information. Solving the tree is exactly Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$

The post-decision state
New concept:
» The "pre-decision" state variable:
  • $S_t$ = the information required to make a decision $x_t$.
  • Same as a "decision node" in a decision tree.
» The "post-decision" state variable:
  • $S_t^x$ = the state of what we know immediately after we make a decision.
  • Same as an "outcome node" in a decision tree.

The post-decision state
An inventory problem:
» Our basic inventory equation:
  $R_{t+1} = \max\{0, R_t + x_t - \hat{D}_{t+1}\}$
  where $R_t$ = inventory at time $t$, $x_t$ = the amount we ordered, and $\hat{D}_{t+1}$ = the demand in the next time period.
» Using pre- and post-decision states:
  $R_t^x = R_t + x_t$  (post-decision state)
  $R_{t+1} = \max\{0, R_t^x - \hat{D}_{t+1}\}$  (pre-decision state)

The post-decision state
[Figure: pre-decision states, state-action pairs, and post-decision states for a small example, showing that the post-decision state space can be far smaller than the space of state-action pairs.]

The post-decision state
» Pre-decision: resources and demands, $S_t = (R_t, D_t)$.
» The decision takes us to the post-decision state $S_t^x = S^{M,x}(S_t, x_t)$.
» The exogenous information $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$ then takes us to the next pre-decision state $S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$.
A sketch of this pre-/post-decision split appears below.
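A minimal sketch of the pre-/post-decision split for the inventory example above. The demand distribution and the fixed ordering decision are placeholders.

```python
import random

# The pre-/post-decision split for the inventory example:
#   post-decision:  R^x_t   = R_t + x_t                (deterministic)
#   pre-decision:   R_{t+1} = max(0, R^x_t - D_{t+1})  (after new information)

def post_decision(R_t, x_t):
    """State immediately after the decision, before new information."""
    return R_t + x_t

def next_pre_decision(R_x, demand):
    """State after the exogenous information (demand) arrives."""
    return max(0, R_x - demand)

R = 5
for t in range(3):
    x = 4                          # some ordering decision
    R_x = post_decision(R, x)      # outcome node
    D = random.choice([2, 4, 6])   # sample next period's demand
    R = next_pre_decision(R_x, D)  # next decision node
    print(t, R_x, D, R)
```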
The post-decision state
Classical form of Bellman's equation:
  $V_t(S_t) = \max_{x_t \in \mathcal{X}} \left( C_t(S_t, x_t) + E[V_{t+1}(S_{t+1}) \mid S_t] \right)$
Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x(S^{M,x}(S_t, x_t)) \right)$
  • Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):
  $V_t^x(S_t^x) = E\left[ V_{t+1}(S^{M,W}(S_t^x, W_{t+1})) \mid S_t^x \right]$

The post-decision state
Challenges:
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x(S_t^x) \right)$.
» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve
  $V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + \bar{V}_t(S_t^x) \right)$.
» So now we face:
  • What should the approximation look like?
  • How do we estimate it?

The post-decision state
Value function approximations:
» Linear (in the resource state): $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$
» Piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$
» Indexed piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x \mid \text{features}_t)$

The post-decision state
Value function approximations:
» Ridge regression (Klabjan and Adelman): value functions of linear combinations of the resource variables, roughly $\bar{V}_t(R_t^x) = \sum_{f \in F} \bar{V}_{tf}(R_{tf})$ with $R_{tf} = \sum_{a \in \mathcal{A}} \delta_{fa} R_{ta}$ (the exact expression is garbled in this transcript).
» Benders cuts. [Figure: a piecewise-linear outer approximation of $\bar{V}_t(R_t^x)$ built from cuts.]

The post-decision state
Comparison to other methods:
» Classical MDP (value iteration):
  $V^n(S) = \max_x \left( C(S, x) + E[V^{n-1}(S_{t+1})] \right)$
» Classical ADP (pre-decision state) — the expectation remains inside the maximization:
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \sum_{s'} p(s' \mid S_t^n, x_t)\, \bar{V}_{t+1}^{n-1}(s') \right)$
  $\bar{V}_t^n(S_t^n) = (1 - \alpha_{n-1}) \bar{V}_t^{n-1}(S_t^n) + \alpha_{n-1} \hat{v}_t^n$   ($\hat{v}_t$ updates $\bar{V}_t(S_t)$)
» Our method — update $\bar{V}^{x,n-1}_{t-1}$ around the post-decision state:
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{x,n-1}(S^{M,x}(S_t^n, x_t)) \right)$
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$   ($\hat{v}_t$ updates $\bar{V}_{t-1}(S_{t-1}^x)$)

The post-decision state
The resulting algorithm:
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization using an approximate value function,
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \right)$,
  to obtain $x_t^n$.
Step 3: Update the value function approximation (recursive statistics):
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ (simulation) and compute the next pre-decision state:
  $S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to Step 1.
A code sketch of this loop follows.
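Here is a sketch of the five-step loop on a scalar inventory problem, using a lookup-table approximation $\bar{V}_t(R^x)$ over post-decision states. How the one-period contribution is booked (revenue when the observed demand is served at the start of a period) and all of the problem data are assumptions made for this sketch, not from the talk.

```python
import random

# ADP with post-decision states on a scalar inventory problem.
# Pre-decision state: (R, D) = inventory plus the demand just observed.
# Post-decision state: Rx = inventory after serving demand and ordering.

T, N = 20, 1000
max_inv = 20
actions = range(0, 11)
price, cost = 5.0, 2.0
demands = [2, 4, 6]

# lookup-table VFA over post-decision states, one row per time period
V = [[0.0] * (max_inv + 1) for _ in range(T + 1)]

for n in range(1, N + 1):
    alpha = 0.05                          # constant stepsize (see next section)
    R, D = 5, random.choice(demands)      # Step 1: initial pre-decision state
    Rx_prev = None
    for t in range(T):
        sales = min(R, D)
        # Step 2: deterministic optimization with the approximate VFA
        v_hat, Rx_best = float("-inf"), R - sales
        for x in actions:
            Rx = R - sales + x
            if Rx > max_inv:
                continue
            val = price * sales - cost * x + V[t][Rx]
            if val > v_hat:
                v_hat, Rx_best = val, Rx
        # Step 3: v_hat updates the value of the PREVIOUS post-decision state
        if Rx_prev is not None:
            V[t - 1][Rx_prev] = (1 - alpha) * V[t - 1][Rx_prev] + alpha * v_hat
        # Step 4: sample new information, form the next pre-decision state
        Rx_prev = Rx_best
        R, D = Rx_best, random.choice(demands)
```

Note that Step 2 contains no expectation — it is a deterministic optimization — and that $\hat{v}_t^n$ updates the value of the previous post-decision state, exactly as in the comparison above.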
A blood management example
Managing blood inventories over time:
» In each week $t = 0, 1, 2, 3, \ldots$ we make allocation decisions $x_t$, moving from the pre-decision state $S_t$ to the post-decision state $S_t^x$; new donations and demands $(\hat{R}_{t+1}, \hat{D}_{t+1})$ then arrive, giving $S_{t+1}$.

Blood management
» The state is $S_t = (R_t, D_t)$: supplies of blood by attribute — blood type and age, e.g. $R_{t,(AB+,0)}, R_{t,(AB+,1)}, R_{t,(AB+,2)}, \ldots, R_{t,(O-,0)}, R_{t,(O-,1)}, R_{t,(O-,2)}$ — and demands $\hat{D}_{t,b}$ by blood type.
» Each decision either satisfies a demand (respecting the blood-type substitution rules) or holds blood in inventory, producing the post-decision supplies $R_t^x$; held blood ages by one week, and new donations $\hat{R}_{t+1}$ arrive to form $R_{t+1}$.
» Solve each week's allocation problem $F(R_t)$ as a linear program. The dual variables $\hat{\lambda}_{t,a}$ (e.g. $\hat{\lambda}_{t,(AB+,0)}, \hat{\lambda}_{t,(AB+,1)}, \hat{\lambda}_{t,(AB+,2)}$) give the value of an additional unit of blood with attribute $a$.

Updating the value function approximation
» Estimate the gradient at $R_t^n$: the dual $\hat{\lambda}_{t,(AB+,2)}^n$ estimates the slope of $F(R_t)$ at $R_{t,(AB+,2)}^n$.
» Use it to update the piecewise-linear approximation $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ at the previous post-decision resource level $R_{t-1}^{x,n}$, producing $\bar{V}_{t-1}^n(R_{t-1}^x)$. A code sketch follows below.

Iterative learning
[Figures: over the iterations, the value function approximations and the resulting allocation behavior evolve together.]

Approximate dynamic programming
» With luck, the objective function will improve steadily over the iterations...
» ...but performance can be jagged.
[Figures: objective function versus iteration for two problems — one improving smoothly, one oscillating.]
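A sketch of the dual-based update of a separable, piecewise-linear value function. The slides show only the smoothing update of the slope at the observed resource level; the averaging projection used here to restore concavity is one common choice and should be read as an assumption of this sketch, not as the method on the slides.

```python
# Update a concave piecewise-linear VFA from a dual variable: lambda_hat
# estimates the marginal value of one more unit of resource with a given
# attribute, observed at resource level R_n.

def update_pwl(slopes, R_n, lambda_hat, alpha):
    """slopes[r] = estimated value of the (r+1)-st unit of resource."""
    slopes[R_n] = (1 - alpha) * slopes[R_n] + alpha * lambda_hat
    # restore concavity: slopes must be nonincreasing in r
    for r in range(R_n, 0, -1):
        if slopes[r] > slopes[r - 1]:
            slopes[r] = slopes[r - 1] = 0.5 * (slopes[r] + slopes[r - 1])
    for r in range(R_n, len(slopes) - 1):
        if slopes[r + 1] > slopes[r]:
            slopes[r + 1] = slopes[r] = 0.5 * (slopes[r + 1] + slopes[r])

slopes = [10.0, 8.0, 6.0, 4.0, 2.0]   # concave: decreasing marginal values
update_pwl(slopes, R_n=2, lambda_hat=9.0, alpha=0.5)
print(slopes)   # slope at R=2 pulled up toward the dual, concavity kept
```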
Hierarchical aggregation
Attribute vectors can be very detailed, for example:
  a = (Asset class, Time invested)
  a = (Blood type, Age, Frozen?)
  a = (Location, ETA, Bus. segment, Single/team, Domicile, Drive hours, Days from home)
  a = (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)

Hierarchical aggregation
Estimating value functions by smoothing:
  $\bar{v}^n(a) = (1 - \alpha)\, \bar{v}^{n-1}(a) + \alpha\, \hat{v}$
  e.g. $\$2050 = (1 - 0.10) \cdot \$2000 + (0.10) \cdot \$2500$.
Drivers may have very detailed attributes, but the value function approximation may use fewer attributes than the driver. For a driver with attributes (Location, Fleet, Domicile, DOThrs, DaysFromHome), an observation $\hat{v}$ can update estimates at several levels:
» Most disaggregate level: smooth the estimate indexed by (Location, Fleet, Domicile).
» Middle level of aggregation: smooth the estimate indexed by (Location, Fleet).
» Most aggregate level: smooth the estimate indexed by (Location) alone.

Hierarchical aggregation
State-dependent weighted aggregation:
» We now have an estimate at each aggregation level $g$, combined by a linear regression for each attribute:
  $\bar{v}_a = \sum_g w_a^{(g)} \bar{v}_a^{(g)}$
» We cannot solve hundreds of thousands of linear regressions. Instead, use weights inversely proportional to an estimate of the variance plus the square of an estimate of the bias,
  $w_a^{(g)} \propto \left( \mathrm{Var}\!\left[\bar{v}_a^{(g)}\right] + \left(\bar{\mu}_a^{(g)}\right)^2 \right)^{-1}$, normalized so that $\sum_g w_a^{(g)} = 1$.
A sketch of this weighting appears after this section.

Hierarchical aggregation
[Figures: objective function versus iteration. A purely aggregate approximation learns quickly but levels off; a purely disaggregate approximation learns slowly; the weighted combination outperforms both.]
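A sketch of the weighted combination across aggregation levels. The inverse-(variance + bias²) weighting follows the formula above; the per-level variance and bias numbers here are illustrative inputs, and the careful recursions for maintaining them are omitted.

```python
import numpy as np

# Weighted hierarchical aggregation: combine estimates of the same value
# at several aggregation levels g, weighting each level by the inverse of
# (estimated variance + estimated bias^2), normalized to sum to 1.

def weighted_estimate(v_bar, var, bias):
    """v_bar[g], var[g], bias[g]: per-level estimate, variance, bias."""
    raw = 1.0 / (var + bias ** 2)
    w = raw / raw.sum()           # weights sum to 1
    return w @ v_bar, w

# e.g. a driver's value estimated at three levels of attribute detail:
v_bar = np.array([2500.0, 2400.0, 2300.0])   # disaggregate -> aggregate
var   = np.array([900.0, 200.0, 50.0])       # noisier when disaggregate
bias  = np.array([0.0, 60.0, 150.0])         # more biased when aggregate
value, w = weighted_estimate(v_bar, var, bias)
```

Early on, the aggregate levels (low variance) dominate the weights; as observations accumulate and the disaggregate variance falls, the weight shifts toward the detailed estimates — which is exactly the behavior in the comparison plots above.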
Stepsizes
Fundamental to ADP is an updating equation that looks like:
  $\bar{V}_{t-1}^n(S_{t-1}^x) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^x) + \alpha_{n-1}\, \hat{v}_t^n$
  (updated estimate) = (1 - stepsize) × (old estimate) + (stepsize) × (new observation)
The stepsize $\alpha_{n-1}$ is also called the "learning rate" or "smoothing factor".

Stepsizes
[Figures: with high-noise data, the best stepsize is $\alpha_n = 1/n$; with low noise and a changing signal, the best stepsize is $\alpha_n = 1$.]

Stepsizes
Theorem 1 (general knowledge): if the data is stationary, then $1/n$ is the best possible stepsize.
Theorem 2 (e.g. Tsitsiklis 1994): for general problems, $1/n$ is provably convergent.
Theorem 3 (Frazier and Powell): $1/n$ works so badly it should (almost) never be used.
[Figure: a lower bound on the value function after n iterations, plotted over many orders of magnitude of n.]

Stepsizes
The bias-adjusted Kalman filter (BAKF):
  $\alpha_{n-1} = 1 - \dfrac{\bar{\sigma}^2}{(1 + \lambda^{n-1})\,\bar{\sigma}^2 + (\bar{\beta}^n)^2}$, where $\lambda^n = (1 - \alpha_{n-1})^2 \lambda^{n-1} + (\alpha_{n-1})^2$,
  $\bar{\sigma}^2$ is an estimate of the noise variance and $\bar{\beta}^n$ is an estimate of the bias.
» As the noise $\bar{\sigma}^2$ increases, the stepsize decreases toward $1/n$.
» As the bias $\bar{\beta}^n$ increases, the stepsize increases toward 1.
[Figures: on noisy, stationary observations the BAKF stepsize declines like the $1/n$ rule; on a rising signal it stays large while the $1/n$ rule falls behind.]

Stepsizes
I recommend:
» Start with a constant stepsize.
  • Vary it, and get a sense of what works best.
» Next try a deterministic stepsize such as $a/(a+n)$.
  • Choose $a$ so that the stepsize declines to roughly your best constant stepsize at an iteration comparable to where your constant-stepsize run seems to stabilize.
    – Might be 50 iterations; might be 5000.
» Then try an (optimal) adaptive stepsize rule.
  • These can work very well if there is not too much noise.
  • Adaptive rules work well when there is a need to keep the stepsize from declining too quickly (but you do not know how quickly).
A sketch of these rules follows.
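The rules above, written as simple Python generators. The constant and deterministic rules are exactly as stated; the adaptive rule follows the BAKF form reconstructed above, so its details (initialization, the $\lambda$ recursion, how $\bar{\sigma}^2$ and $\bar{\beta}$ are estimated) should be treated as assumptions of this sketch rather than a definitive implementation.

```python
# Stepsize rules as infinite generators: next(rule) gives alpha_n.

def constant(c=0.1):
    while True:
        yield c

def one_over_n():                 # optimal for stationary data
    n = 0
    while True:
        n += 1
        yield 1.0 / n

def harmonic(a=20.0):             # a/(a+n): declines more slowly than 1/n
    n = 0
    while True:
        n += 1
        yield a / (a + n)

def bakf(sigma2, bias):
    """Bias-adjusted Kalman filter. sigma2, bias: callables returning the
    current noise-variance and bias estimates (estimated in practice)."""
    lam, n = 0.0, 0
    while True:
        n += 1
        if n == 1:
            alpha = 1.0
        else:
            alpha = 1.0 - sigma2() / ((1.0 + lam) * sigma2() + bias() ** 2)
        lam = (1 - alpha) ** 2 * lam + alpha ** 2
        yield alpha
```

A quick sanity check on the BAKF rule: with zero bias it yields 1, 1/2, 1/3, 1/4, ... (the $1/n$ rule), and with a large bias it stays near 1 — matching the two limiting behaviors noted above.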
Some applications: transportation

Schneider National
[Figures: calibrating the model against history. For each capacity category (US_SOLO, US_IC, US_TEAM), the simulated revenue per work unit (WU) and utilization fall between the historical minimum and maximum.]

A rail application (with NS)
Princeton team: Warren Powell, Belgacem Bouzaiene-Ayari
NS team: Clark Cheng, Ricardo Fiorillo, Junxia Chang, Sourav Das
[Figures: convergence, train coverage, and model calibration — train delay curves (delay hours/day versus fleet size) for October 2007 and March 2008.]

Some applications: energy

Energy resource modeling
Hourly model:
» Decisions at hour t impact hour t+1 through the amount of water held in the reservoir.
» We capture this with a value function approximation: the value of holding water in the reservoir for future time periods.
» Stepping hour by hour (1, 2, 3, ..., 8760) carries the model from 2008 into 2009.

Annual energy model
[Figures: reservoir level and demand over roughly 800 time periods. The ADP solution closely tracks the optimal solution from a linear program, and for stochastic rainfall the ADP reservoir trajectory at the last iteration lies within the band of optimal solutions for the individual scenarios.]

The post-decision state
What happens if we use Bellman's equation directly on the blood problem?
» The state variable is:
  • the supply of each type of blood — 8 blood types and 6 ages — plus
  • the demand for each type of blood — 8 blood types,
  giving a 56-dimensional state vector.
» The decision variable is how much of each of 8 blood types to supply to each of 8 demand types: a 162-dimensional decision vector.
» The random information is blood donations by week (8 types) and new demands for blood (8 types): a 16-dimensional information vector.
Yet the ADP algorithm, working around post-decision states, simply steps through the weekly timeline (decisions $x_t$, post-decision states $S_t^x$, new information $(\hat{R}_{t+1}, \hat{D}_{t+1})$) without ever enumerating this state space.

Approximate dynamic programming
Optimization (e.g. Cplex):
» Strengths: produces optimal decisions; a mature technology.
» Weaknesses: cannot handle uncertainty; cannot handle high levels of complexity.
Simulation:
» Strengths: extremely flexible; high level of detail; easily handles uncertainty.
» Weaknesses: models decisions using user-specified rules; low solution quality.
Approximate dynamic programming combines simulation and optimization in a rigorous yet flexible framework:
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization using an approximate value function,
  $\hat{v}_t^n = \max_x \left( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}(S^{M,x}(S_t^n, x_t)) \right)$,
  to obtain $x_t^n$.
Step 3: Update the value function approximation (recursive statistics):
  $\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$
Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ (simulation) and compute the next pre-decision state:
  $S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$
Step 5: Return to Step 1.