Slide 1: Approximate Dynamic Programming for High-Dimensional Problems in Energy Modeling
Ohio State University, October 7, 2009
Warren Powell, CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2009 Warren B. Powell, Princeton University

Slide 2: Goals for an energy policy model
Potential questions:
» Policy questions
• How do we design policies to achieve energy goals (e.g. 20% renewables by 2015) with a given probability?
• How does the imposition of a carbon tax change the likelihood of meeting this goal?
• What might happen if ethanol subsidies are reduced or eliminated?
• What is the impact of a breakthrough in batteries?
» Energy economics
• What is the best mix of energy generation technologies?
• How is the economic value of wind affected by the presence of storage?
• What is the best mix of storage technologies?
• How would climate change affect our ability to use hydroelectric reservoirs as a regulating source of power?

Slide 3: Goals for an energy policy model
Designing energy supply and storage portfolios to work with wind:
» The marginal value of wind and solar farms depends on the ability to work with intermittent supply.
» The impact of intermittent supply will be mitigated by the use of storage.
» Different storage technologies (batteries, flywheels, compressed air, pumped hydro) are each designed to serve different types of variations in supply and demand.
» The need for storage (and the value of wind and solar) depends on the entire portfolio of energy-producing technologies.

Slide 4: Intermittent energy sources
(Figure: wind speed and solar energy time series)

Slide 5: Wind
(Figure: wind power plotted over 30 days and over 1 year)

Slide 6: Storage
(Figure: hydroelectric, batteries, flywheels, ultracapacitors)

Slide 7: Long-term uncertainties
(Figure: 2010-2030 timeline of uncertainties in tax policy, solar panels, batteries, the price of oil, carbon capture and sequestration, and climate change)

Slide 8: Goals for an energy policy model
Model capabilities we are looking for:
» Multiscale
• Multiple time scales (hourly, daily, seasonal, annual, decade)
• Multiple spatial scales
• Multiple technologies (different coal-burning technologies, new wind turbines, ...)
• Multiple markets
– Transportation (commercial, commuter, home activities)
– Electricity use (heavy industrial, light industrial, business, residential)
» Stochastic (handles uncertainty)
• Hourly fluctuations in wind, solar and demands
• Daily variations in prices and rainfall
• Seasonal changes in weather
• Yearly changes in supplies, technologies and policies

Slide 9: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model

Slide 10: A resource allocation model
(Figure: sample attribute vectors $a$, e.g. a financial asset: asset class, time invested, type; a piece of equipment: location, age; a driver: location, ETA, home, experience, driving hours; an aircraft: location, ETA, A/C type, fuel level, home shop; a crew: equipment Eqpt1, ..., Eqpt100)

Slide 11: A resource allocation model
Modeling resources (a small data-structure sketch follows):
» The attributes of a single resource: $a \in \mathcal{A}$, where $a$ is the attribute vector and $\mathcal{A}$ is the attribute space.
» The resource state vector: $R_{ta}$ = the number of resources with attribute $a$, so $R_t = (R_{ta})_{a \in \mathcal{A}}$.
» The information process: $\hat{R}_{ta}$ = the change in the number of resources with attribute $a$.
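To make this notation concrete, here is a minimal Python sketch (an editorial illustration, not from the talk) of the resource state vector as a counter over hashable attribute vectors; the attribute fields shown are hypothetical.

```python
from collections import Counter

# Each resource is described by an attribute vector a, here a hashable
# tuple such as (location, technology, age). These field names are
# hypothetical placeholders.
fleet = [
    ("princeton", "battery", 2),
    ("princeton", "battery", 2),
    ("trenton", "flywheel", 0),
]

# The resource state vector R_t: R_t[a] = number of resources with attribute a.
R_t = Counter(fleet)
print(R_t[("princeton", "battery", 2)])   # -> 2

# The information process R_hat applies exogenous changes additively,
# e.g. one new flywheel arriving:
R_hat = Counter({("trenton", "flywheel", 0): 1})
R_t.update(R_hat)
print(R_t[("trenton", "flywheel", 0)])    # -> 2
```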
Slide 12: A resource allocation model
Modeling demands:
» The attributes of a single demand to be served: $b \in \mathcal{B}$, where $b$ is the attribute vector and $\mathcal{B}$ is the attribute space.
» The demand state vector: $D_{tb}$ = the number of demands with attribute $b$, so $D_t = (D_{tb})_{b \in \mathcal{B}}$.
» The information process: $\hat{D}_{tb}$ = the change in the number of demands with attribute $b$.

Slide 13: Energy resource modeling
The system state: $S_t = (R_t, D_t, \rho_t)$, where
» $R_t$ = resource state (how much capacity, reserves)
» $D_t$ = market demands
» $\rho_t$ = "system parameters": the state of the technology (costs, performance); climate and weather (temperature, rainfall, wind); government policies (tax rebates on solar panels); market prices (oil, coal)

Slide 14: Energy resource modeling
The decision variables:
» $x_t^{cap}$: new capacity, retired capacity and storage capacity, for each type, location and technology.
» $x_t^{disp}$: dispatch flows, from resource to conversion, conversion to storage, storage to grid, conversion to grid, grid to intermediate uses, and grid to final demand.

Slide 15: Energy resource modeling
Exogenous information: $W_t$ = new information = $(\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, where
» $\hat{R}_t$ = exogenous changes in capacity, reserves
» $\hat{D}_t$ = new demands for energy by type
» $\hat{\rho}_t$ = exogenous changes in parameters

Slide 16: Energy resource modeling
The transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$
Known as the "transition function," "transfer function," "system model," "plant model," or simply "the model."

Slide 17: Energy resource modeling
(Figure: network of resources and demands)

Slide 18: Energy resource modeling
(Figure: the resource network repeated across times t, t+1, t+2)

Slide 19: Energy resource modeling
(Figure: optimizing at a point in time versus optimizing over time across t, t+1, t+2)

Slide 20: Energy resource modeling
The objective function:
$\max_\pi \; \mathbb{E} \Big\{ \sum_t \gamma^t C_t\big(S_t, X_t^\pi(S_t)\big) \Big\}$
where the expectation is over all random outcomes, $S_t$ is the state variable, $C_t$ is the contribution function and $X_t^\pi$ is the decision function (policy).
How do we find the best policy?
• Myopic policies
• Rolling horizon policies
• Simulation-optimization
• Dynamic programming

Slide 21: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model

Slide 22: Introduction to dynamic programming
Bellman's optimality equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$
We compute this for each state $S_t$, assuming $V_{t+1}$ is known.

Slide 23: Introduction to dynamic programming
The same equation, and the problem: the curse of dimensionality is really three curses, in the state space, the outcome space, and the action space (the feasible region).

Slide 24: Introduction to dynamic programming
The computational challenges:
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?

Slide 25: Introduction to ADP
Classical ADP:
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$
» Now replace the value function with some sort of approximation, such as
$\bar{V}_{t+1}(S_{t+1}) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1})$
» We may draw from the entire field of statistics/machine learning (a small sketch follows).
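As an illustration of this last step, here is a minimal sketch (my own, with hypothetical features and data, not the talk's model) of fitting the coefficients $\theta_f$ of a basis-function approximation by least squares.

```python
import numpy as np

def phi(S):
    """Basis functions phi_f(S) for a toy state S = (reservoir_level, demand):
    a constant, the two components, and an interaction term."""
    r, d = S
    return np.array([1.0, r, d, r * d])

# Hypothetical sampled states and observed values of being in them.
states = [(10.0, 3.0), (20.0, 5.0), (15.0, 4.0), (30.0, 2.0)]
v_hat = np.array([40.0, 90.0, 65.0, 95.0])

# Fit theta in V_bar(S) = sum_f theta_f * phi_f(S) by least squares.
Phi = np.vstack([phi(S) for S in states])
theta, *_ = np.linalg.lstsq(Phi, v_hat, rcond=None)

def V_bar(S):
    return phi(S) @ theta

print(V_bar((18.0, 4.0)))
```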
Slide 26: Introduction to ADP
Other statistical methods:
» Regression trees: combine regression with techniques for discrete variables.
» Data mining: good for categorical data.
» Neural networks: engineers like these for low-dimensional continuous problems.
» Kernel/locally polynomial regression: approximates portions of the value function locally using simple functions.
» Dirichlet mixture models: aggregate portions of the function and fit approximations around these aggregations.

Slide 27: Introduction to ADP
But this does not solve our problem:
» Assume we have an approximate value function.
» We still have to solve a problem that looks like
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \mathbb{E}\Big[ \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1}) \Big] \Big)$
» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) that contains an expectation.

Slide 28: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model

Slide 29: (Figure: a decision tree alternating decision nodes and outcome nodes; each outcome node branches on Rain/Clouds/Sun with probabilities such as .2/.3/.5 and payoffs such as Rain -$2000, Clouds $1000, Sun $5000, or a flat -$200 along other branches)

Slide 30: The post-decision state
New concept:
» The "pre-decision" state variable $S_t$: the information required to make a decision $x_t$. Same as a decision node in a decision tree.
» The "post-decision" state variable $S_t^x$: the state of what we know immediately after we make a decision. Same as an outcome node in a decision tree.

Slide 31: The post-decision state
An inventory problem:
» Our basic inventory equation:
$R_{t+1} = \max\{0, R_t + x_t - \hat{D}_{t+1}\}$
where $R_t$ = resources at time $t$, $x_t$ = order quantity at time $t$, and $\hat{D}_{t+1}$ = random demand.
» Using pre- and post-decision states:
$R_t^x = R_t + x_t$ (from pre- to post-decision)
$R_{t+1} = \max\{0, R_t^x - \hat{D}_{t+1}\}$ (from post- to pre-decision)

Slide 32: The post-decision state
(Figure: pre-decision states, state-action pairs and post-decision states for a small example)

Slide 33: The post-decision state
Pre-decision: resources and demands, $S_t = (R_t, D_t)$. (Figure)

Slide 34: The post-decision state
$S_t^x = S^{M,x}(S_t, x_t)$. (Figure)

Slide 35: The post-decision state
$S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$, where $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$. (Figure)

Slide 36: The post-decision state
$S_{t+1}$. (Figure)

Slide 37: The post-decision state
Classical form of Bellman's equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$
Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):
$V_t(S_t) = \max_x \Big( C_t(S_t, x_t) + V_t^x\big(S^{M,x}(S_t, x_t)\big) \Big)$
Note: this problem is deterministic!
» Expectation problem (incorporating uncertainty):
$V_t^x(S_t^x) = \mathbb{E}\big[ V_{t+1}\big(S^{M,W}(S_t^x, W_{t+1})\big) \mid S_t^x \big]$
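A toy sketch of this two-step split, applied to the inventory problem of slide 31. Everything here is a made-up placeholder (the downstream value V_next, the order cost, the uniform demand); the point is only that the decision problem becomes deterministic once the expectation is buried inside the post-decision value function.

```python
import random

COST = 4.0   # hypothetical per-unit order cost

def V_next(R):
    """Placeholder for V_{t+1}(R_{t+1}); any concave stand-in will do."""
    return 60.0 * (1.0 - 0.9 ** R)

def V_post(R_x, samples=1000):
    """Expectation problem around the post-decision state:
    V_t^x(R^x) = E[ V_{t+1}(max(0, R^x - D)) ], here by brute-force
    Monte Carlo with a hypothetical uniform demand."""
    return sum(V_next(max(0, R_x - random.randint(0, 10)))
               for _ in range(samples)) / samples

def decide(R_t, x_max=20):
    """Optimization problem: max_x  -COST*x + V_t^x(R_t + x).
    Deterministic given V_t^x -- no expectation appears here."""
    return max(range(x_max + 1), key=lambda x: -COST * x + V_post(R_t + x))

print(decide(3))   # order quantity chosen from pre-decision state R_t = 3
```

In ADP, V_post would itself be replaced by a statistical approximation rather than computed by Monte Carlo at decision time, which is exactly the step taken on the next slide.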
Slide 38: Introduction to ADP
We first write the value function around the post-decision state variable, removing the expectation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + V_t^x\big(S_t^x(S_t, x_t)\big) \Big)$
We then replace the value function with an approximation that we estimate using machine learning techniques:
$V_t(S_t) \approx \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \bar{V}_t\big(S_t^x(S_t, x_t)\big) \Big)$

Slide 39: The post-decision state
Value function approximations:
» Linear (in the resource state): $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$
» Piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$
» Indexed piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R_{ta}^x \mid \text{features}_t\big)$

Slide 40: The post-decision state
Value function approximations:
» Ridge regression (Klabjan and Adelman): $\bar{V}_t(R_t^x) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf})$, where $R_{tf} = \sum_{a \in \mathcal{A}} \theta_{fa} R_{ta}$
» Benders cuts (Figure: the value function represented as the lower envelope of cuts between points $x^0$ and $x^1$)

Slides 41-44: Making decisions
(Figures: following an ADP policy, simulating decisions forward through the network over successive time periods)

Slide 45: Approximate dynamic programming
With luck, the objective function will improve steadily. (Figure: objective function rising from roughly 1.6 million to about 1.9 million over 1,000 iterations)

Slide 46: The post-decision state
Comparison to other methods:
» Classical MDP (value iteration):
$V^n(S) = \max_x \Big( C(S, x) + \mathbb{E}\, V^{n-1}\big(S^M(S, x, W)\big) \Big)$
» Classical ADP (pre-decision state), which still requires an expectation:
$\hat{v}_t^n = \max_x \Big( C_t(S_t^n, x_t) + \sum_{s'} p(s' \mid S_t^n, x_t)\, \bar{V}_{t+1}^{n-1}(s') \Big)$
and $\hat{v}_t^n$ updates $\bar{V}_t(S_t)$:
$\bar{V}_t^n(S_t^n) = (1 - \alpha_{n-1})\, \bar{V}_t^{n-1}(S_t^n) + \alpha_{n-1} \hat{v}_t^n$
» Updating around the post-decision state, with no expectation:
$\hat{v}_t^n = \max_x \Big( C_t(S_t^n, x_t) + \bar{V}_t^{x,n-1}\big(S^{M,x}(S_t^n, x_t)\big) \Big)$
and $\hat{v}_t^n$ updates $\bar{V}_{t-1}(S_{t-1}^x)$:
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$

Slides 47-48: Approximate dynamic programming
The algorithm (a code sketch of this loop follows):
Step 1: Start with a pre-decision state $S_t^n$.
Step 2 (deterministic optimization): solve
$\hat{v}_t^n = \max_x \Big( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}\big(S^{M,x}(S_t^n, x_t)\big) \Big)$
to obtain $x_t^n$.
Step 3 (recursive statistics): update the value function approximation
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$
Step 4 (simulation): obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ and compute the next pre-decision state:
$S_{t+1}^n = S^M\big(S_t^n, x_t^n, W_{t+1}(\omega^n)\big)$
Step 5: Return to step 1.
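Here is a compact sketch of that five-step loop for a toy storage-arbitrage problem (buy or sell energy against a noisy hourly price, capacity 100). It is an editorial illustration, not the SMART code: the lookup-table approximation, the price model and the 1/n stepsize are all hypothetical.

```python
import random

T, N, CAP = 24, 200, 100
V_bar = [[0.0] * (CAP + 1) for _ in range(T)]   # V_bar[t][R_x], post-decision
mean_price = [2.0 + (1.5 if 8 <= h < 20 else 0.0) for h in range(T)]  # peak hours

for n in range(1, N + 1):
    alpha = 1.0 / n                       # stepsize for Step 3
    R = 50                                # Step 1: initial pre-decision state
    R_x_prev = None
    for t in range(T):
        p = mean_price[t] + random.uniform(-0.5, 0.5)   # price is part of S_t
        # Step 2: deterministic optimization with the current approximation.
        # x > 0 buys, x < 0 sells; post-decision storage is R + x.
        def obj(x):
            return -p * x + V_bar[t][R + x]
        x = max(range(-R, CAP - R + 1), key=obj)
        v_hat = obj(x)
        # Step 3: smooth v_hat into the value of the PREVIOUS post-decision state.
        if R_x_prev is not None:
            V_bar[t - 1][R_x_prev] = ((1 - alpha) * V_bar[t - 1][R_x_prev]
                                      + alpha * v_hat)
        R_x_prev = R + x                  # the post-decision state
        # Step 4: Monte Carlo sample of W_{t+1} (a little random leakage),
        # giving the next pre-decision state; Step 5: loop.
        R = max(0, R_x_prev - random.randint(0, 2))
```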
Slide 49: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model

Slide 50: Blood management
Managing blood inventories. (Figure)

Slide 51: Blood management
Managing blood inventories over time. (Figure: weeks 0 through 3; at each week $t$, the state $S_t$ leads to a decision $x_t$ and the post-decision state $S_t^x$, and new information $\hat{R}_{t+1}, \hat{D}_{t+1}$ produces $S_{t+1}$)

Slides 52-56: (Figures: the weekly blood allocation network. The state is $S_t = (R_t, \hat{D}_t)$. Supply nodes $R_{t,(AB+,0)}, R_{t,(AB+,1)}, \ldots, R_{t,(O-,2)}$, indexed by blood type and age, either satisfy a demand node $\hat{D}_{t,AB+}, \ldots, \hat{D}_{t,O-}$ or are held, producing the post-decision vector $R_t^x$; held blood ages by one week and combines with new donations $\hat{R}_{t+1}$ to form $R_{t+1}$. Each week's allocation problem $F(R_t)$ is solved as a linear program, and the dual variables $\hat{\nu}_{t,(AB+,0)}, \hat{\nu}_{t,(AB+,1)}, \ldots$ give the value of an additional unit of blood of each type and age.)

Slide 57: Updating the value function approximation
Estimate the gradient at $R_t^n$:
$\hat{\nu}^n_{t,(AB+,2)} = \dfrac{\partial F(R_t)}{\partial R^n_{t,(AB+,2)}}$

Slide 58: Updating the value function approximation
Update the value function at $R_{t-1}^{x,n}$ using the sampled slope $\hat{\nu}^n_{t,(AB+,2)}$. (Figure: the approximation $\bar{V}_{t-1}(R_{t-1}^x)$ with the observed slope at $R_{t-1}^{x,n}$)

Slide 59: Updating the value function approximation
(Figure: the sampled slope smoothed into the piecewise-linear approximation at $R_{t-1}^{x,n}$)

Slide 60: Updating the value function approximation
(Figure: the updated approximation $\bar{V}_{t-1}^n(R_{t-1}^x)$; a code sketch of this update follows)
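A minimal sketch of this kind of update for one attribute's piecewise-linear value function, stored as a vector of slopes (the marginal value of the r-th unit). The stepsize and numbers are illustrative, and the concavity repair below is a simple nearest-neighbor projection; the algorithm in the talk uses a closely related projection/averaging step.

```python
def update_pwl(v, R_n, nu_hat, alpha):
    """Smooth the observed dual price nu_hat into the slope at resource
    level R_n, then restore concavity (non-increasing slopes)."""
    v[R_n] = (1 - alpha) * v[R_n] + alpha * nu_hat
    for r in range(R_n, len(v) - 1):      # push any violation to the right
        if v[r + 1] > v[r]:
            v[r + 1] = v[r]
    for r in range(R_n, 0, -1):           # and to the left
        if v[r] > v[r - 1]:
            v[r - 1] = v[r]

slopes = [10.0, 8.0, 6.0, 4.0, 2.0]       # current concave approximation
update_pwl(slopes, 2, 9.0, alpha=0.5)      # dual of 9.0 observed at level 2
print(slopes)                              # -> [10.0, 8.0, 7.5, 4.0, 2.0]
```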
Slide 61: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model

Slide 62: SMART: a Stochastic, Multiscale Allocation model for energy Resources, Technology and policy
» Stochastic: able to handle different types of uncertainty:
• Fine-grained: daily fluctuations in wind, solar, demand, prices, ...
• Coarse-grained: major climate variations, new government policies, technology breakthroughs
» Multiscale: able to handle different levels of detail:
• Time scales: hourly to yearly
• Spatial scales: aggregate to fine-grained disaggregate
• Activities: different types of demand patterns
» Decisions:
• Hourly dispatch decisions
• Yearly investment decisions
• Takes as input parameters characterizing government policies, the performance of technologies, and assumptions about climate

Slide 63: The annual investment problem
(Figure: yearly resource states and investment decisions $(R_t, x_t)$ for oil, wind, natural gas and coal, advanced from 2008 to 2009 by new information $\hat{R}_t, \hat{D}_t, \hat{\rho}_t$ for each technology)

Slide 64: The hourly dispatch problem
Hourly electricity "dispatch" problem. (Figure)

Slide 65: The hourly dispatch problem
Hourly model: decisions at time t impact t+1 through the amount of water held in the reservoir. (Figure: hour t linked to hour t+1)

Slide 66: The hourly dispatch problem
The link is the value of holding water in the reservoir for future time periods. (Figure)

Slides 67-69: The hourly dispatch problem
(Figures: hourly dispatch problems chained across hours 1, 2, 3, 4, ..., 8760 of 2008 and into 2009)

Slides 70-71: SMART: stochastic, multiscale model
(Figures: the 2008 to 2009 annual investment problem coupled to the hourly dispatch problems, broken out by technology: oil, wind, natural gas, coal)

Slides 72-74: SMART: stochastic, multiscale model
(Figures: the years 2008 through 2038 solved in sequence, at roughly 5 seconds per year)

Slide 75: SMART: stochastic, multiscale model
Use statistical methods to learn the value $\bar{V}_t(R_t)$ of resources in the future. Resources may be:
» Stored energy
» Storage capacity (batteries, flywheels, compressed air, hydro, ...)
» Energy transmission capacity (transmission lines, gas lines, shipping capacity)
» Energy production sources (windmills, solar panels, nuclear power plants)
(Figure: concave value as a function of the amount of resource)

Slide 76: SMART: stochastic, multiscale model
Approximating continuous functions: the algorithm performs a very fine discretization over the small range of the function that is visited most often. (A sketch of how such an approximation is represented and used follows.)
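To show how such a piecewise-linear approximation is used, here is a small sketch (illustrative numbers, not SMART's implementation) of evaluating the value of stored energy from its slopes and using the marginal slope in a dispatch decision.

```python
import itertools

# Non-increasing slopes: the marginal value of each successive unit held.
slopes = [9.0, 7.0, 4.0, 1.5, 0.5]

# With concave slopes, V(R) is just the sum of the first R of them.
V = [0.0] + list(itertools.accumulate(slopes))   # V[R] for R = 0..5

# A marginal comparison drives the dispatch decision: release a unit of
# water now if the current price beats the slope at the current level.
R, price_now = 3, 5.0
release = price_now > slopes[R - 1]              # 5.0 > 4.0 -> True
print(V, release)
```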
Slide 77: SMART: stochastic, multiscale model
Benchmarking:
» Compare ADP to the optimal LP for a deterministic problem:
• Annual model: 8,760 hours over a single year; focus on the ability to match hydro storage decisions
• 20-year model: 24-hour time increments over 20 years; focus on investment decisions
» Comparisons on the stochastic model:
• Stochastic rainfall analysis: how does the ADP solution compare to the LP?
• Carbon tax policy analysis: demonstrate nonanticipativity

Slide 78: Benchmarking on hourly dispatch
(Figure: ADP objective function relative to the optimal LP; the percentage error falls from about 2.5% to 0.06% over optimal within 500 iterations)

Slides 79-82: Benchmarking on hourly dispatch
(Figures: reservoir level, rainfall and demand trajectories; the ADP solution closely matches the optimal solution from the linear program)

Slide 83: Multidecade energy model
(Figure: optimal LP vs. ADP for the daily model over 20 years; the gap falls to 0.24% over optimal within 600 iterations)

Slide 84: Energy policy modeling
Traditional optimization models tend to produce all-or-nothing solutions. (Figure: investment in IGCC as a function of the cost differential between IGCC and pulverized coal; the traditional optimization jumps from no investment when pulverized coal is cheaper to full investment when IGCC is cheaper, while approximate dynamic programming produces a smooth transition)

Slide 85: Stochastic rainfall
(Figure: sample paths of precipitation over 800 time periods)

Slide 86: Stochastic rainfall
(Figure: reservoir levels over 800 time periods; the single ADP policy tracks the optimal solutions computed for the individual scenarios)

Slide 87: Energy policy modeling
Following sample paths of demands, prices, weather, technology, policies, ...: $W_t = (\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$.
(Figure: sample paths of a metric, e.g. % renewable, out to 2030; the goal is achieved with probability 0.70)
We need to consider both fine-grained noise (wind, rain, demand, prices, ...) and coarse-grained noise (technology, policy, climate, ...).

Slide 88: Energy policy modeling
Policy study: carbon tax. What is the effect of a potential (but uncertain) carbon tax in year 8? (Figure: timeline over years 0 through 9)

Slide 89: Energy policy modeling
(Figure: installed capacity over 20 years with no carbon tax; carbon-based technologies dominate renewable technologies)

Slide 90: Energy policy modeling
(Figure: installed capacity with a carbon tax, split into the period when the carbon tax policy is unknown and the period after it is determined)

Slide 91: Energy policy modeling
(Figure: installed capacity with the carbon tax; renewable technologies displace carbon-based technologies)

Slide 92: Conclusions
Capabilities:
» SMART can handle problems with over 300,000 time periods, so it can model hourly variations in a long-term energy investment model.
» It can simulate virtually any form of uncertainty, either provided through an exogenous scenario file or sampled from a probability distribution.
» Accurate modeling of climate, technology and markets requires access to exogenously provided scenarios.
» It properly models storage processes over time.
» Current tests are on an aggregate model, but the modeling framework (and library) is set up for spatially disaggregate problems.

Slide 93: Conclusions
Limitations:
» More research is needed to test the ability of the model to use multiple storage technologies.
» Extension to a spatially disaggregate model will require significant engineering and data.
» Run times will start to become an issue for a spatially disaggregate model.
» Value function approximations capture the resource state vector, but are limited to very simple exogenous state variations.

Slide 94: Outline
» Modeling stochastic resource allocation problems
» An introduction to ADP
» ADP and the post-decision state variable
» A blood management example
» The SMART energy policy model
» Merging machine learning and optimization

Slide 95: Merging machine learning and optimization
The challenge of coarse-grained uncertainty:
» Fine-grained uncertainty can generally be modeled as memoryless (even if it is not).
» Coarse-grained uncertainty affects what might be called the "state of the world."
» The value of a resource depends on the "state of the world":
• Is there a carbon tax?
• What is the state of battery research?
• Have there been major new oil discoveries?
• What is the price of oil?
• Did the international community adopt strict limits on carbon emissions?
• Have there been advances in our understanding of climate change?

Slide 96: Merging machine learning and optimization
Modeling the "state of the world":
» Instead of $\bar{V}_t(R_t)$, we have $\bar{V}_t(R_t \mid S_t^W)$, where $S_t^W$ captures major exogenous variables.
• Instead of one piecewise-linear value function for each resource and time period...
• ...we need one for each state of the world, and there can be thousands of these (one way to organize this is sketched below).
» We can use powerful machine learning algorithms to overcome these new curses of dimensionality.
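One simple way to organize this, sketched below as an editorial illustration (not the talk's implementation): index the piecewise-linear approximations by (time, world state) and create them lazily, since only a small fraction of the thousands of world states are ever visited.

```python
from collections import defaultdict

def new_pwl():
    """Optimistic initial slopes for a freshly visited (t, S^W) pair."""
    return [5.0, 4.0, 3.0, 2.0, 1.0]

# V_bar[(t, S_W)] -> slope vector; entries appear only when first visited.
V_bar = defaultdict(new_pwl)

# Hypothetical categorical world state: (carbon policy, battery technology).
S_W = ("carbon_tax", "cheap_batteries")
slopes = V_bar[(12, S_W)]          # created lazily on first access
print(len(V_bar), slopes[0])       # -> 1 5.0
```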
Slide 97: Merging machine learning and optimization
Strategy 1: Locally polynomial regression
» Widely used in statistics.
» Approximates complex functions locally using simple functions.
» The estimate of the function is a weighted sum of these local approximations.
» But it cannot handle categorical variables.

Slide 98: Merging machine learning and optimization
Strategy 2: Dirichlet process mixtures of generalized linear models. (Figure)

Slide 99: Merging machine learning and optimization
Strategy 3: Hierarchical learning models
» Estimate piecewise constant functions at different levels of aggregation. (Figure)

Slide 100: Merging machine learning and optimization
Next steps:
» We need to transition these machine learning techniques into an ADP setting:
• Can they be adapted to work within a linear or nonlinear optimization algorithm?
• All three methods are asymptotically unbiased, but this depends on unbiased observations. In an ADP algorithm, observations are biased.
• We need to design an effective exploration strategy so that the solution does not become stuck.
» Other issues:
• Will the methods provide fast, robust solutions for effective policy analysis?

Slide 101: © 2009 Warren B. Powell

Slides 102-103: (Figures: the decision tree for the game-scheduling example: decide whether to use the weather report, then schedule or cancel the game; forecasts of rain, cloudy or sunny arrive with probabilities (.3, .6, .1 in the figure), and each outcome node branches on Rain/Clouds/Sun with payoffs such as Rain .2 -$2000, Clouds .3 $1000, Sun .5 $5000 if the game is scheduled, or -$200 if it is cancelled. The tree is annotated with Bellman's equation $V_t(S_t) = \max_{x \in \mathcal{X}} \big( C(S_t, x) + \mathbb{E}[V_{t+1}(S_{t+1}) \mid S_t] \big)$, with decision nodes and outcome nodes marked.)

Slide 104: Demand modeling
(Figure: commercial electric demand over 7 days)