Introduction to Semi-Markov Decision Process (SMDP)


Introduction to Discrete Time
Semi Markov Process
Nur Aini Masruroh
Recall: Discrete Time Markov Process
• In the DTMC…
▫ Whenever the process enters a state i, we imagine
that it determines the next state j to which it will
move instantaneously, according to the
transition probability pij
Discrete time semi Markov Process
• In a semi-Markov process, after state j has been selected, but before
making the transition from state i to state j, the process “holds” for
a time tij in state i.
• The holding times tij are positive, integer-valued random variables,
each governed by a probability mass function hij(·) called the
holding time mass function for the transition from state i to state j
• After holding in state i for the holding time tij, the process makes
the transition to state j and immediately selects a new destination
state k using the transition probabilities pjk
• It next chooses a holding time tjk in state j according to the mass
function hjk(·) and makes its next transition at time tjk after entering
state j
• The process continues in this way
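The dynamics just described can be sketched as a short simulation. This is an illustrative sketch added here, not part of the lecture; the function names (`simulate_smp`, `sample_geometric_holding`) and the choice of geometric holding-time pmfs are assumptions.

```python
import math
import random

def sample_geometric_holding(a):
    # Inverse-transform sample of P(t = m) = (1 - a) * a**(m - 1), m = 1, 2, ...
    u = 1.0 - random.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(a)) + 1

def simulate_smp(p, a, start, n_transitions):
    """Simulate a discrete-time semi-Markov process.

    p[i][j]: transition probability from state i to state j
    a[i][j]: geometric parameter of the holding time pmf h_ij
    Returns a list of (state, holding_time) pairs.
    """
    path, i = [], start
    for _ in range(n_transitions):
        # choose the successor state j according to the row p[i] ...
        j = random.choices(range(len(p[i])), weights=p[i])[0]
        # ... then hold in state i for t_ij time units before moving to j
        t = sample_geometric_holding(a[i][j])
        path.append((i, t))
        i = j
    return path
```

Each entry of the returned path records a visited state and how long the process held there before its next transition.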
Discrete time semi Markov Process
(cont’d)
• To describe a semi-Markov process completely, we need to define n² holding
time mass functions in addition to the transition probabilities
• The cumulative probability distribution of tij, written ≤hij(·), is defined
as

$$ {}^{\le}h_{ij}(n) = \sum_{m=0}^{n} h_{ij}(m) = P(t_{ij} \le n) $$

and the complementary cumulative distribution is

$$ {}^{>}h_{ij}(n) = \sum_{m=n+1}^{\infty} h_{ij}(m) = 1 - {}^{\le}h_{ij}(n) = P(t_{ij} > n) $$

• Suppose we know that the process has entered state i, but we do not know
which successor it chose. The pmf assigned to the time ti
spent in state i is defined as

$$ w_i(m) = \sum_{j=1}^{N} p_{ij}\, h_{ij}(m) = P(t_i = m) $$

wi(m): probability that the system will spend m time units in state i
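This definition can be sketched in a few lines. The helper name `waiting_time_pmf` and the two geometric holding pmfs are illustrative choices, not from the lecture:

```python
def waiting_time_pmf(p_row, h_row, m):
    # w_i(m) = sum_j p_ij * h_ij(m): time spent in state i, destination unknown
    return sum(p_ij * h_ij(m) for p_ij, h_ij in zip(p_row, h_row))

# Illustrative state with two destinations and geometric holding pmfs
h_i1 = lambda m: (1/2) * (1/2) ** (m - 1)
h_i2 = lambda m: (1/4) * (3/4) ** (m - 1)
w_i = [waiting_time_pmf([0.9, 0.1], [h_i1, h_i2], m) for m in range(1, 400)]
# Like any pmf, w_i(m) sums to 1 over all holding times m
```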
Discrete time semi Markov Process
(cont’d)
• So ti: waiting time in state i, and wi(·): waiting time pmf
▫ A waiting time is a holding time that is unconditional on the
destination state
▫ The mean waiting time $\bar{t}_i$ is related to the mean holding times $\bar{t}_{ij}$ by

$$ \bar{t}_i = \sum_{j=1}^{N} p_{ij}\, \bar{t}_{ij} $$

We compute the second moment of the waiting time ti from the second
moments of the holding times tij using

$$ \overline{t_i^2} = \sum_{j=1}^{N} p_{ij}\, \overline{t_{ij}^2} $$

Variance of the waiting time:

$$ \tilde{v}_{t_i} = \overline{t_i^2} - \bar{t}_i^{\,2} $$
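These three formulas can be combined into one small helper (a hypothetical function; the numbers in the usage line anticipate the car rental example that follows):

```python
def waiting_time_moments(p_row, mean_hold, second_hold):
    """Mean, second moment, and variance of the waiting time in a state,
    from the per-destination holding-time moments."""
    mean = sum(p * t for p, t in zip(p_row, mean_hold))        # sum_j p_ij * mean_ij
    second = sum(p * t2 for p, t2 in zip(p_row, second_hold))  # sum_j p_ij * second_ij
    return mean, second, second - mean ** 2                    # variance

# Town 1 of the car rental example: p = (0.8, 0.2), means (3, 6), second moments (15, 66)
mean1, second1, var1 = waiting_time_moments([0.8, 0.2], [3, 6], [15, 66])
# mean1 is 3.6, agreeing with the waiting-time slide later in the example
```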
Car rental example
• A car rental company rents cars at two locations, town 1 and town 2. The
company's experience shows that a car rented in town 1 has a
0.8 probability of being returned to town 1 and a 0.2
probability of being returned to town 2. When a car is rented
in town 2, there is a 0.7 probability that it will be returned to town 2
and a 0.3 probability that it will be returned to town 1. We assume
that there are always many customers available at both towns and
that cars are always rented at the town to which they were last
returned
• Because of the nature of the trips involved, the length of time a car
will be rented depends on both where it is rented and where it is
returned. The holding time tij is thus the length of time a car will be
rented if it was rented at town i and returned to town j. From the
company records, the holding time pmfs follow geometric
distributions with the following expressions:
▫ h11(m) = (1/3)(2/3)^(m−1)
▫ h21(m) = (1/4)(3/4)^(m−1)
▫ h12(m) = (1/6)(5/6)^(m−1)
▫ h22(m) = (1/12)(11/12)^(m−1)
Car rental example: solution
Transition probability matrix:

$$ P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix} $$

The holding time distributions are all geometric.
The general term of a geometric distribution is (1−a)a^(m−1), with mean 1/(1−a),
second moment (1+a)/(1−a)², and variance a/(1−a)²
Therefore the moments of four holding times are:
$$ \bar{t}_{11} = 3, \quad \overline{t_{11}^2} = 15, \quad \tilde{v}_{t_{11}} = 6 $$
$$ \bar{t}_{21} = 4, \quad \overline{t_{21}^2} = 28, \quad \tilde{v}_{t_{21}} = 12 $$
$$ \bar{t}_{12} = 6, \quad \overline{t_{12}^2} = 66, \quad \tilde{v}_{t_{12}} = 30 $$
$$ \bar{t}_{22} = 12, \quad \overline{t_{22}^2} = 276, \quad \tilde{v}_{t_{22}} = 132 $$
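These four rows follow directly from the geometric-moment formulas above; a quick check (hypothetical helper name):

```python
def geometric_moments(a):
    # For the pmf (1 - a) * a**(m - 1), m = 1, 2, ...:
    mean = 1 / (1 - a)                # 1/(1-a)
    second = (1 + a) / (1 - a) ** 2   # (1+a)/(1-a)^2
    variance = a / (1 - a) ** 2       # a/(1-a)^2
    return mean, second, variance

# a = 2/3 reproduces the h11 row: mean 3, second moment 15, variance 6
```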
These numbers indicate that people renting cars at town 2 and
returning them to town 2 often have long rental periods
Car rental example: solution
A complete description of the semi-Markov process (state-transition diagram for states 1 and 2):
▫ p11 = 0.8, h11(m) = (1/3)(2/3)^(m−1), $\bar{t}_{11}$ = 3
▫ p12 = 0.2, h12(m) = (1/6)(5/6)^(m−1), $\bar{t}_{12}$ = 6
▫ p21 = 0.3, h21(m) = (1/4)(3/4)^(m−1), $\bar{t}_{21}$ = 4
▫ p22 = 0.7, h22(m) = (1/12)(11/12)^(m−1), $\bar{t}_{22}$ = 12
Car rental example: solution
If hij(m) is the geometric distribution (1−a)a^(m−1), m = 1, 2, 3, …, then the
cumulative and complementary cumulative distributions ≤hij(n) and >hij(n) are

$$ {}^{\le}h_{ij}(n) = \sum_{m=0}^{n} h_{ij}(m) = \sum_{m=1}^{n} (1-a)a^{m-1} = 1 - a^n, \quad n = 0, 1, 2, \ldots $$

and

$$ {}^{>}h_{ij}(n) = \sum_{m=n+1}^{\infty} h_{ij}(m) = \sum_{m=n+1}^{\infty} (1-a)a^{m-1} = a^n, \quad n = 0, 1, 2, \ldots $$
Therefore, the matrix forms of these distributions for the example are

$$ {}^{\le}H(n) = \begin{pmatrix} 1-(2/3)^n & 1-(5/6)^n \\ 1-(3/4)^n & 1-(11/12)^n \end{pmatrix}, \quad n = 0, 1, 2, \ldots $$

$$ {}^{>}H(n) = \begin{pmatrix} (2/3)^n & (5/6)^n \\ (3/4)^n & (11/12)^n \end{pmatrix}, \quad n = 0, 1, 2, \ldots $$
The results show, for example, that the chance that a car rented in
town 1 and returned to town 2 will be rented for n or fewer time periods
is 1 − (5/6)^n. A car rented in town 2 and returned to town 1 has a
chance (3/4)^n of being rented for more than n periods
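The closed forms can be checked against direct summation of the geometric pmf (an illustrative sketch; `cum_by_summation` is a hypothetical helper):

```python
def cum_by_summation(a, n):
    # sum of (1 - a) * a**(m - 1) for m = 1..n; should equal 1 - a**n
    return sum((1 - a) * a ** (m - 1) for m in range(1, n + 1))

# Car rented in town 1 and returned to town 2 (a = 5/6), n = 10 periods
direct = cum_by_summation(5/6, 10)
closed = 1 - (5/6) ** 10
```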
Car rental example: solution
Waiting time
$$ \bar{t}_1 = p_{11}\bar{t}_{11} + p_{12}\bar{t}_{12} = 0.8(3) + 0.2(6) = 3.6 $$
$$ \bar{t}_2 = p_{21}\bar{t}_{21} + p_{22}\bar{t}_{22} = 0.3(4) + 0.7(12) = 9.6 $$

The mean time that a car rented in town 1 will be rented, destination
unknown, is 3.6 periods. If the car is rented in town 2, the mean is 9.6 periods.
The waiting time pmf (probability that a car rented in each town will
be rented for m periods, destination unknown) is:

$$ w_1(m) = p_{11} h_{11}(m) + p_{12} h_{12}(m) = 0.8(1/3)(2/3)^{m-1} + 0.2(1/6)(5/6)^{m-1} $$
$$ w_2(m) = p_{21} h_{21}(m) + p_{22} h_{22}(m) = 0.3(1/4)(3/4)^{m-1} + 0.7(1/12)(11/12)^{m-1} $$
Car rental example: solution
Cumulative and complementary cumulative distributions of waiting time:
$$ {}^{\le}w_1(n) = 1 - 0.8(2/3)^n - 0.2(5/6)^n $$
$$ {}^{\le}w_2(n) = 1 - 0.3(3/4)^n - 0.7(11/12)^n $$

and

$$ {}^{>}w_1(n) = 0.8(2/3)^n + 0.2(5/6)^n $$
$$ {}^{>}w_2(n) = 0.3(3/4)^n + 0.7(11/12)^n $$
The expression for >w2(n), for example, shows the probability that a
car rented in town 2 will be rented for more than n periods if its
destination is unknown
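This can be checked numerically by summing the town-2 waiting-time pmf over m > n (an illustrative sketch; the truncation point 3000 is an arbitrary cutoff at which the geometric tails are negligible):

```python
def w2(m):
    # waiting time pmf for town 2: p21*h21(m) + p22*h22(m)
    return 0.3 * (1/4) * (3/4) ** (m - 1) + 0.7 * (1/12) * (11/12) ** (m - 1)

n = 5
tail = sum(w2(m) for m in range(n + 1, 3000))   # P(t_2 > n), by summation
closed = 0.3 * (3/4) ** n + 0.7 * (11/12) ** n  # the closed form above
```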
Interval transition probabilities, Φij(n)
• Corresponds to multistep transition
probabilities for the Markov process
• Φij(n): probability that a discrete-time semi
Markov process will be in state j at time n given
that it entered state i at time zero
→ interval transition probability from state i to
state j in the interval (0, n)
▫ Note that an essential part of the definition is that the
system entered state i at time zero as opposed to its
simply being in state i at time zero
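The definition can be made concrete for the car rental example. The recursion below, which conditions on the first transition out of state i, is a standard way to compute interval transition probabilities; it is a sketch added here, not part of the lecture.

```python
from functools import lru_cache

P = [[0.8, 0.2], [0.3, 0.7]]    # imbedded transition probabilities
A = [[2/3, 5/6], [3/4, 11/12]]  # geometric parameters a_ij of h_ij

def h(i, j, m):
    a = A[i][j]
    return (1 - a) * a ** (m - 1)

def waiting_tail(i, n):
    # P(waiting time in i > n) = sum_j p_ij * a_ij**n
    return sum(P[i][j] * A[i][j] ** n for j in range(2))

@lru_cache(maxsize=None)
def phi(i, j, n):
    # Probability of being in j at time n, given the process ENTERED i at time 0:
    # either it is still holding in i, or it first moved to some k at time m <= n.
    if n == 0:
        return 1.0 if i == j else 0.0
    still_holding = waiting_tail(i, n) if i == j else 0.0
    moved = sum(P[i][k] * h(i, k, m) * phi(k, j, n - m)
                for k in range(2) for m in range(1, n + 1))
    return still_holding + moved
```

Each row of Φ(n) sums to 1, and for this example the rows settle toward fixed values as n grows, which motivates the limiting behavior discussed next.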
Limiting behavior
• The chain structure of semi-Markov process is
the same as that of its imbedded Markov process
• We deal with monodesmic semi-Markov processes
▫ Monodesmic process: a Markov process whose Φ has equal
rows
▫ A process is monodesmic when there exists only one subset of states
that must be occupied after infinitely many transitions
(i.e., a single recurrent chain)
Limiting behavior (cont’d)
• Limiting interval probabilities φij for a monodesmic semi-Markov
process:

$$ \phi_{ij} = \phi_j = \frac{\pi_j \bar{\tau}_j}{\sum_{j=1}^{N} \pi_j \bar{\tau}_j} $$

With:
πj: limiting state probability of the imbedded Markov process for state j
$\bar{\tau}_j$: mean waiting time for state j
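A direct transcription of this formula (the function name is hypothetical):

```python
def limiting_interval_probs(pi, tau_bar):
    # phi_j = pi_j * tau_j / sum_k (pi_k * tau_k)
    total = sum(p * t for p, t in zip(pi, tau_bar))
    return [p * t / total for p, t in zip(pi, tau_bar)]
```

With the car rental numbers π = (0.6, 0.4) and mean waiting times (3.6, 9.6), this gives φ = (0.36, 0.64), matching the worked example that follows.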
Consider the car rental example:
• Transition probability matrix:

$$ P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix} $$

• Limiting state probabilities of the imbedded chain: π1 = 0.6, π2 = 0.4
• Mean waiting times: $\bar{\tau}_1 = 3.6$, $\bar{\tau}_2 = 9.6$

$$ \phi_1 = \frac{\pi_1 \bar{\tau}_1}{\pi_1 \bar{\tau}_1 + \pi_2 \bar{\tau}_2} = \frac{0.6(3.6)}{0.6(3.6) + 0.4(9.6)} = 0.36 $$

$$ \phi_2 = \frac{\pi_2 \bar{\tau}_2}{\pi_1 \bar{\tau}_1 + \pi_2 \bar{\tau}_2} = \frac{0.4(9.6)}{0.6(3.6) + 0.4(9.6)} = 0.64 $$
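The whole computation, including solving for the limiting probabilities π of the imbedded two-state chain, can be reproduced in a few lines (an illustrative sketch):

```python
# Imbedded chain: pi solves pi = pi * P together with pi1 + pi2 = 1.
# For a two-state chain this reduces to pi1 = p21 / (p12 + p21).
p12, p21 = 0.2, 0.3
pi1 = p21 / (p12 + p21)   # 0.6
pi2 = 1.0 - pi1           # 0.4

tau1, tau2 = 3.6, 9.6     # mean waiting times from the example
norm = pi1 * tau1 + pi2 * tau2
phi1 = pi1 * tau1 / norm  # 0.36
phi2 = pi2 * tau2 / norm  # 0.64
```

Note that although the imbedded chain spends 60% of its *transitions* in town 1, the long mean waiting time in town 2 shifts the limiting *time* fractions to 0.36 and 0.64.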