Chapter 13 Stochastic Optimal Control
The state of the system is represented by a controlled
stochastic process. Section 13.2 formulates a stochastic
optimal control problem. We shall consider stochastic
differential equations of a type known as Itô equations,
which are perturbed by Markov diffusion processes,
and our goal will be to synthesize optimal feedback
controls for systems subject to Itô equations.
In Section 13.3, we shall extend the production
planning model of Chapter 6.
In Section 13.4, we solve an optimal stochastic
advertising problem.
In Section 13.5, we will introduce investment decisions
in the consumption model of Example 1.3, and
consider both risk-free and risky investments.
In Section 13.6, we will conclude the chapter by
mentioning stochastic optimal control problems
involving jump Markov processes, which are treated in
Sethi and Zhang (1994a, 1994c).
13.2 Stochastic Optimal Control
We assume the state variables to be observable, and
we use the dynamic programming or Hamilton-Jacobi-Bellman framework rather than the stochastic
maximum principle.
Maximize
subject to the Itô stochastic differential equation
The value function V(x,t) satisfies
By Taylor’s expansion, we have
From (3.29), we can formally write
The multiplication rules of the stochastic calculus are:
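The steps above can be sketched in standard notation. The symbols $F$ (profit rate), $\Phi$ (salvage value), $f$ (drift), and $G$ (diffusion coefficient) are assumptions chosen for illustration, as is the scalar-state setting. The sketch follows the usual dynamic programming argument and is not tied to any numbered equation of the chapter.

```latex
\begin{align*}
% Problem statement (assumed notation)
&\max_{U}\; E\left[\int_0^T F(X_t,U_t,t)\,dt + \Phi(X_T,T)\right]
\quad\text{s.t.}\quad
dX_t = f(X_t,U_t,t)\,dt + G(X_t,U_t,t)\,dz_t,\quad X_0 = x_0, \\[4pt]
% Principle of optimality over a short interval dt
&V(x,t) = \max_{u}\; E\left[F(x,u,t)\,dt + V(x+dX_t,\,t+dt)\right], \\[4pt]
% Taylor expansion of the value function
&V(x+dX_t,t+dt) = V + V_t\,dt + V_x\,dX_t
  + \tfrac12 V_{xx}(dX_t)^2 + \tfrac12 V_{tt}(dt)^2 + V_{xt}\,dX_t\,dt + \cdots, \\[4pt]
% Multiplication rules of the stochastic calculus
&(dz_t)^2 = dt,\qquad dz_t\,dt = 0,\qquad (dt)^2 = 0
\;\;\Longrightarrow\;\; (dX_t)^2 = G^2\,dt, \\[4pt]
% Taking expectations (E[dz_t] = 0), cancelling V, and dividing by dt
&0 = \max_{u}\left[F + V_t + V_x f + \tfrac12 V_{xx} G^2\right],
\qquad V(x,T) = \Phi(x,T).
\end{align*}
```

The second-order term $\tfrac12 V_{xx}G^2$ is what distinguishes this equation from its deterministic counterpart: it survives because $(dz_t)^2$ is of order $dt$.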
13.3 A Stochastic Production Planning Model
Xt = the inventory level at time t (state variable),
Ut = the production rate at time t (control variable),
S = the constant demand rate; S > 0,
T = the length of planning period,
x̂ = the factory-optimal inventory level,
û = the factory-optimal production level,
x0 = the initial inventory level,
h = the inventory holding cost coefficient,
c = the production cost coefficient,
B = the salvage value per unit of inventory at time T,
zt = the standard Wiener process,
σ = the constant diffusion coefficient.
Let V(x,t) be the value function. It satisfies the
Hamilton-Jacobi-Bellman (HJB) equation
Remark 13.1: If production rate were restricted to be
nonnegative, then (13.44) would be changed to
13.3.1 Solution for the Production Planning Problem
Since (13.51) must hold for any value of x, we must
have
where the boundary conditions are obtained by
comparing (13.47) with the boundary condition
V(x,T) = Bx of (13.43).
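The coefficient matching described above can be sketched as follows. Several ingredients are assumptions, since the corresponding displayed equations are not reproduced here: inventory dynamics $dX_t = (U_t - S)\,dt + \sigma\,dz_t$, a running cost $cU_t^2 + hX_t^2$ (factory-optimal levels normalized to zero), and a quadratic guess for the value function with illustrative coefficient names $Q$, $R$, $M$.

```latex
\begin{align*}
% HJB equation under the assumed model
&0 = \max_{u}\left[-(cu^2 + hx^2) + V_t + V_x\,(u - S) + \tfrac12\sigma^2 V_{xx}\right],
\qquad V(x,T) = Bx, \\[4pt]
% First-order condition; with u >= 0 imposed the maximizer is truncated at zero
&u^* = \frac{V_x}{2c}
\qquad\bigl(\text{or } u^* = \max\{0,\; V_x/(2c)\} \text{ if } u \ge 0 \text{ is required}\bigr), \\[4pt]
% Reduced equation after substituting the unconstrained u^*
&0 = \frac{V_x^2}{4c} - hx^2 + V_t - S\,V_x + \tfrac12\sigma^2 V_{xx}, \\[4pt]
% Quadratic guess
&V(x,t) = Q(t)\,x^2 + R(t)\,x + M(t), \\[4pt]
% Matching coefficients of x^2, x, and 1
&\dot Q = h - \frac{Q^2}{c},\quad Q(T) = 0; \qquad
 \dot R = 2SQ - \frac{RQ}{c},\quad R(T) = B; \qquad
 \dot M = RS - \frac{R^2}{4c} - \sigma^2 Q,\quad M(T) = 0.
\end{align*}
```

Only the equation for $M$ involves $\sigma$, so under these assumptions the noise level changes the value of the problem but not the feedback rule $u^* = (2Qx + R)/(2c)$.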
To solve (13.52), we expand by partial fractions to obtain
where
Since S is assumed to be a constant, we can reduce
(13.53) to
By the change of variable defined by
the solution is given by
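A numerical check can make the closed-form solution easier to trust. The sketch below assumes the coefficient equations are Q' = h - Q²/c with Q(T) = 0 and R' = 2SQ - RQ/c with R(T) = B. It also assumes the candidate solutions Q = sqrt(hc)(y-1)/(y+1) and R = 2cS + 2(B-2cS)sqrt(y)/(y+1), where y = exp(2 sqrt(h/c)(t-T)). These forms are reconstructed for illustration and are not quoted from the chapter. The function names are hypothetical.

```python
import math

def y_of(t, T, h, c):
    # Assumed change of variable: y = exp(2*sqrt(h/c)*(t - T)), so y = 1 at t = T.
    return math.exp(2.0 * math.sqrt(h / c) * (t - T))

def Q_closed(t, T, h, c):
    # Candidate closed form for the quadratic coefficient.
    y = y_of(t, T, h, c)
    return math.sqrt(h * c) * (y - 1.0) / (y + 1.0)

def R_closed(t, T, h, c, S, B):
    # Candidate closed form for the linear coefficient.
    y = y_of(t, T, h, c)
    return 2.0 * c * S + 2.0 * (B - 2.0 * c * S) * math.sqrt(y) / (y + 1.0)

def integrate_backward(t, T, h, c, S, B, steps=2000):
    # Classical RK4 on the assumed ODE pair, started at the terminal
    # conditions Q(T) = 0, R(T) = B and stepped backward to time t.
    def rhs(q, r):
        return h - q * q / c, 2.0 * S * q - r * q / c

    dt = (t - T) / steps  # negative step size
    q, r = 0.0, B
    for _ in range(steps):
        k1q, k1r = rhs(q, r)
        k2q, k2r = rhs(q + 0.5 * dt * k1q, r + 0.5 * dt * k1r)
        k3q, k3r = rhs(q + 0.5 * dt * k2q, r + 0.5 * dt * k2r)
        k4q, k4r = rhs(q + dt * k3q, r + dt * k3r)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        r += dt * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
    return q, r
```

Comparing `integrate_backward(0.0, 2.0, 1.0, 2.0, 3.0, 1.0)` with `Q_closed` and `R_closed` at the same arguments gives agreement to many digits, which is consistent with the closed forms solving the assumed equations.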
Remark 13.2: The optimal production rate in (13.59)
equals the demand rate plus a correction term that
depends on the inventory level and the time remaining
until the horizon T. Since (y-1) < 0 for t < T, it is
clear that for lower values of x, the optimal production
rate is likely to be positive. However, if x is very high,
the correction term will become smaller than -S, and
the optimal control will be negative. In other words, if
the inventory level is too high, the factory can save money
by disposing of part of the inventory, which lowers
holding costs.
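The sign behavior described in this remark can be illustrated numerically. The sketch assumes the feedback rule has the form u*(x,t) = S + [sqrt(hc)(y-1)x + (B-2cS)sqrt(y)] / [c(y+1)] with y = exp(2 sqrt(h/c)(t-T)), and that inventory follows dX = (U - S)dt + σ dz. Both are reconstructions for illustration, and the function names and parameter values are hypothetical.

```python
import math
import random

def u_star(x, t, T, h, c, S, B):
    # Assumed feedback rule: demand S plus a correction that depends on x and T - t.
    y = math.exp(2.0 * math.sqrt(h / c) * (t - T))
    correction = (math.sqrt(h * c) * (y - 1.0) * x
                  + (B - 2.0 * c * S) * math.sqrt(y)) / (c * (y + 1.0))
    return S + correction

def simulate(x0, T, h, c, S, B, sigma, steps=1000, seed=0):
    # Euler-Maruyama simulation of the assumed dynamics dX = (U - S) dt + sigma dz
    # under the feedback rule above; returns the sampled inventory path.
    rng = random.Random(seed)
    dt = T / steps
    x, path = x0, [x0]
    for k in range(steps):
        u = u_star(x, k * dt, T, h, c, S, B)
        x += (u - S) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# Illustrative parameter values (not taken from the text).
low = u_star(0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0)     # low inventory
high = u_star(100.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0)  # very high inventory
```

With these values `low` is positive and `high` is negative, matching the remark: the coefficient of x is negative for t < T, so a large enough inventory drives the prescribed production rate below zero.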
Figure 13.2: A Sample Path of Xt with X0 = x0 > 0 and B > 0
13.4 A Stochastic Advertising Problem
The model is:
where Xt is the market share and Ut is the rate of
advertising at time t .
V(x) is the expected value of the discounted profits
from time t to infinity. Since T = ∞, the future looks the
same from any time t, and therefore the value function
does not depend on t.
We can write the HJB equation as
We obtain the explicit formula for the optimal feedback
control as
Eventually, the market share process hovers around
the equilibrium level
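A small numerical sketch can show how a feedback rule and an equilibrium share fit together. It rests on an assumed model of the Sethi advertising type, which is not displayed above: dynamics dX = (r U sqrt(1-X) - δX)dt + σ(X)dz and discounted profit rate πX - U² at discount rate ρ. Under that assumption a linear value function V(x) = λx + λ²r²/(4ρ) solves the HJB equation, and because V_xx = 0 the diffusion term drops out. All symbol and function names here are hypothetical.

```python
import math

def lambda_bar(margin, rho, delta, r):
    # Positive root of (r^2/4)*lam^2 + (rho + delta)*lam - margin = 0.
    return (math.sqrt((rho + delta) ** 2 + r * r * margin) - (rho + delta)) / (r * r / 2.0)

def u_star(x, margin, rho, delta, r):
    # Feedback advertising rate under the assumed model: (r * lam / 2) * sqrt(1 - x).
    return 0.5 * r * lambda_bar(margin, rho, delta, r) * math.sqrt(1.0 - x)

def drift(x, margin, rho, delta, r):
    # Drift of the market share when the feedback rule is applied.
    return r * u_star(x, margin, rho, delta, r) * math.sqrt(1.0 - x) - delta * x

def x_bar(margin, rho, delta, r):
    # Share at which the controlled drift vanishes.
    a = 0.5 * r * r * lambda_bar(margin, rho, delta, r)
    return a / (a + delta)

def hjb_residual(x, margin, rho, delta, r):
    # rho*V minus the maximized right-hand side; V_xx = 0 removes the noise term.
    lam = lambda_bar(margin, rho, delta, r)
    value = lam * x + lam * lam * r * r / (4.0 * rho)
    u = u_star(x, margin, rho, delta, r)
    return rho * value - (margin * x - u * u + lam * (r * u * math.sqrt(1.0 - x) - delta * x))
```

For illustrative values such as `margin=1.0, rho=0.1, delta=0.2, r=0.5`, the residual is zero up to rounding and `x_bar` lies strictly between 0 and 1. Advertising falls as the share rises, which is what pulls the share back toward its equilibrium level.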
13.5 An Optimal Consumption-Investment Problem
Consider investing a part of Rich’s wealth in a
risky security or stock that earns an expected rate of
return α > r. The problem of Rich, known
now as Rich Investor, is to optimally allocate his
wealth between the risk-free savings account and the
risky stock over time and also consume over time so
as to maximize his total utility of consumption.
The savings account is easy to model.
Modeling the stock is more complicated.
where α is the average rate of return on stock, σ is
the standard deviation associated with the return, and
zt is a standard Wiener process.
The price process Pt given by (13.74) is often referred
to as a logarithmic Brownian Motion.
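A short simulation can illustrate why the name fits. The sketch assumes the price equation has the usual form dP/P = α dt + σ dz, with α the average rate of return and σ the standard deviation; the equation itself is not displayed above. By Itô's lemma, ln P then follows a Brownian motion with drift α - σ²/2, so sampling in logs keeps every simulated price positive. The function name and parameter values are hypothetical.

```python
import math
import random

def simulate_price(p0, alpha, sigma, T, steps, seed=0):
    # Sample the assumed price process on a uniform grid by updating ln P,
    # which has independent Gaussian increments with mean (alpha - sigma^2/2)*dt
    # and standard deviation sigma*sqrt(dt).
    rng = random.Random(seed)
    dt = T / steps
    log_p = math.log(p0)
    path = [p0]
    for _ in range(steps):
        log_p += (alpha - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_p))
    return path

# Illustrative values only: 10% average return, 20% volatility, one year.
prices = simulate_price(100.0, 0.10, 0.20, 1.0, 250)
```

Setting `sigma=0.0` reduces the path to deterministic growth at rate α, the same form as a savings account that grows at a constant rate.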
Notation:
Wt = the wealth at time t ,
Ct = the consumption rate at time t ,
Qt = the fraction of the wealth invested in stock at
time t,
1-Qt= the fraction of the wealth kept in the savings
account at time t ,
U(c) = the utility of consumption when consumption is
at the rate c; the function U(c) is assumed to be
increasing and concave,
ρ = the rate of discount applied to consumption
utility,
B = the bankruptcy parameter to be explained later.
The term αQtWt dt represents the expected return from
the risky investment of QtWt dollars during the period
from t to t+dt. The term σQtWt dzt represents the risk
involved in investing QtWt dollars in stock. The term
(1-Qt)rWt dt is the amount of interest earned on the
balance of (1-Qt)Wt dollars in the savings account.
Finally, Ct dt represents the amount of consumption
during the interval from t to t+dt.
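Collecting the four terms described above gives the wealth dynamics. This is a sketch, writing $\alpha$ for the stock's expected rate of return and $\sigma$ for its standard deviation; both symbols are assumed here.

```latex
\begin{align*}
dW_t &= \alpha Q_t W_t\,dt + \sigma Q_t W_t\,dz_t + (1 - Q_t)\,r W_t\,dt - C_t\,dt \\
     &= (\alpha - r)\,Q_t W_t\,dt + (r W_t - C_t)\,dt + \sigma Q_t W_t\,dz_t .
\end{align*}
```

The second line separates the excess return earned on the stock position from the growth that the whole of wealth would earn at the risk-free rate, net of consumption.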
We shall say that Rich goes bankrupt at time T when
his wealth falls to zero at that time. T is a random
variable, called a stopping time, since it is observed
exactly at the instant when wealth falls to zero.
Rich’s objective function is:
See Sethi (1997a) for a detailed discussion of the
bankruptcy parameter B.
Simplify the problem by assuming:
We also assume B = -∞. The condition (13.81) together
with B = -∞ implies a strictly positive consumption level
at all times and no bankruptcy.
Karatzas, Lehoczky, Sethi, and Shreve (1986) assumed
that the value function is strictly concave and, therefore,
Vx is monotonically decreasing in x . This means that
The function c(•) defined in (13.83) has an inverse X(•)
such that (13.84) can be written as
Note that c(X(c))=c, c’(X)X’(c)=1, and therefore,
c’(X)= 1/ X’(c).
Differentiation with respect to c yields the intended
second-order, linear ordinary differential equation
This problem and its many extensions have been
studied in great detail. See, e.g., Sethi (1997a).
13.6 Concluding Remarks
For applications of stochastic optimal control to
manufacturing problems, see Sethi and Zhang (1994a)
and Yin and Zhang (1997). For applications to
problems in finance, see Sethi (1997a) and Karatzas
and Shreve (1998). For applications in marketing, see
Tapiero (1988). For applications in economics, including
the economics of natural resources, see Derzko and
Sethi (1981a), and Malliaris and Brock (1982).