Discrete Random Variate Generation


Generating Random Variates
• Use Trace Driven Simulations
• Use Empirical Distributions
• Use Parametric Distributions
Ref: L. Devroye, Non-Uniform Random Variate Generation
Trace-driven Simulations
+ Strong validity argument
+ Easy to model systems
- Expensive
- Validity (is the data representative?)
- Sensitivity analysis difficult
- Results cannot be generalized to other systems (restrictive)
- Slow
- No rare-data information
Empirical Distributions (histograms)
+ Strong validity argument
+ Easy to model systems
+ Replicate
- Data may not be representative
- Sensitivity analysis is difficult
- Results cannot be generalized to other systems
- Difficult to model dependent data
- Need a lot of data (must do conditional sampling)
- Outliers and rare-data problems
Parametric Probability Model
+ Results can be generalized
+ Can do sensitivity analysis
+ Replicate
- Data may not be representative
- Probability distribution selection error
- Parametric fitting error
Random Variate Generation:
General Methods
• Inverse Transform
• Composition
• Acceptance/Rejection
• Special Properties
Criteria for Comparing Algorithms
• Mathematical validity – does it give what it is
supposed to?
• Numerical stability – do some “seeds” cause
problems?
• Speed
• Memory requirements – are they excessively
large?
• Implementation (portability)
• Parametric stability – is it uniformly fast for all input parameters (e.g., will generating a Poisson process take longer as the rate increases)?
Inverse Transform
• Consider a random variable X with a (cumulative) distribution function F_X
• Algorithm to generate X ~ F_X:
  1. Generate U ~ U(0,1)
  2. Return X = F_X^{-1}(U)
• F_X^{-1}(U) will always be defined because 0 ≤ U ≤ 1 and the range of F_X is [0,1] (if F_X is monotonic)
• What if F_X is discrete or has "atoms"? (F_X is right-continuous)
Consider a discrete random variate X with P{X = k} = P_k (say, taking values k = -4, 1, 2).
Since Σ_k P_k = 1, lay the P_k's out along the unit interval:
[Figure: the interval (0,1) partitioned into segments of lengths P_{-4}, P_1, P_2.]
A random number U ~ U(0,1) falls in interval k with probability P_k.
Return the k corresponding to that interval (use an array or table).
This is equivalent to "inverting" the CDF.
[Figure: the PMF P{X = k} at k = -4, 1, 2 and the corresponding CDF F_X(k) = P{X ≤ k}; uniforms U1 and U2 on the vertical axis invert to X1 = 2 and X2 = -4.]
Generating Continuous Random Variates
[Figure: a continuous CDF F_X(x); uniforms U1 and U2 on the vertical axis map to variates X1 and X2 on the horizontal axis.]
Proof Algorithm Works
• Must show that the X generated by the algorithm is, in fact, from F_X: P{X ≤ x} = F_X(x)
• P{X ≤ x} = P{F_X^{-1}(U) ≤ x} = P{U ≤ F_X(x)} = F_X(x)
• The first equality is the conjecture X = F_X^{-1}(U); the second applies F_X (monotone non-decreasing) to both sides; the last is the definition of U(0,1).
• Note: the "=" in the conjecture should really be equality in distribution, meaning the two RVs have the same distribution.
e.g. Exponential R.V.
• X is an exponential random variable with mean β:
  F(x) = 1 - e^{-x/β} if x ≥ 0, and 0 otherwise
• To generate values of X, set U = F(X) and solve for the (random) X:
  X = -β ln(1-U)   (or X = -β ln(U), since 1-U is also U(0,1))
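As a concrete illustration, here is a minimal Python sketch of this inversion (the function and parameter names are mine, not from the slides):

```python
import math
import random

def exponential(beta):
    """Exponential variate with mean beta via inverse transform: X = -beta*ln(1-U)."""
    u = random.random()                   # U ~ U(0,1)
    return -beta * math.log(1.0 - u)      # -beta*ln(U) works too, since 1-U is also U(0,1)

print([round(exponential(2.0), 3) for _ in range(5)])
```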
Weibull Random Variable (with shape parameter α and scale parameter β)
PDF: f(x) = α β^{-α} x^{α-1} e^{-(x/β)^α} for x ≥ 0, and 0 otherwise
CDF: F(x) = 1 - e^{-(x/β)^α} for x ≥ 0
Setting F(x) = u ~ U(0,1) leads to x = β(-ln(1-u))^{1/α}.
Triangular Distribution
— Used when a crude, single-mode distribution is needed.
[Figure: the triangular density f(x) with endpoints a and c and mode b.]
— The PDF is
  f(x) = 2(x-a)/[(b-a)(c-a)] for a ≤ x ≤ b
  f(x) = 2(c-x)/[(c-a)(c-b)] for b ≤ x ≤ c
  f(x) = 0 otherwise
  where a and c are the boundaries, and b is the mode.
— The CDF is
  F(v) = (v-a)²/[(b-a)(c-a)] for a ≤ v ≤ b
  F(v) = 1 - (c-v)²/[(c-a)(c-b)] for b ≤ v ≤ c
— By the inversion method,
  v = a + [u(b-a)(c-a)]^{1/2} for 0 ≤ u ≤ (b-a)/(c-a)
  v = c - [(1-u)(c-a)(c-b)]^{1/2} for (b-a)/(c-a) ≤ u ≤ 1
  where F(b) = (b-a)/(c-a)
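A hedged Python sketch of this two-piece inversion (illustrative only; the helper name is mine):

```python
import math
import random

def triangular(a, b, c):
    """Triangular variate by inversion; a and c are the endpoints, b is the mode."""
    u = random.random()
    cut = (b - a) / (c - a)                         # F(b), the CDF value at the mode
    if u <= cut:
        return a + math.sqrt(u * (b - a) * (c - a))
    return c - math.sqrt((1.0 - u) * (c - a) * (c - b))

print(triangular(0.0, 2.0, 5.0))
```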
Geometric Random Variable (with parameter p)
- Number of trials until the first success (success probability p)
- Probability mass function (PMF): P{X = j} = p(1-p)^{j-1}, j = 1, 2, …
- Cumulative distribution function (CDF): F(k) = Σ_{j≤k} p(1-p)^{j-1} = 1 - (1-p)^k
If u = F(k) with u ~ U(0,1), then:
  u = 1 - (1-p)^k
  1 - u = (1-p)^k
  k = ln(1-u)/ln(1-p)
(use the floor function to discretize: k = ⌊ln(1-u)/ln(1-p)⌋ + 1)
Inverse Transform: Advantages
• Intuitive
• Can be very fast
• Accurate
• One random variate generated per U(0,1) random
number
• Allows variance-reduction techniques to be
applied (later)
• Truncation easy
• Order statistics are easy
Inverse Transform:
Disadvantages
• May be hard to calculate F^{-1}(U) in terms of computer operations
• F^{-1}(U) may not even exist
• If use power-series expansions, stopping
rule?
• For discrete distributions, must perform
search
Conditional Distributions - want to generate X
conditioned that it is between a and b (truncated)
[Figure: the CDF F_X(x), with F_X(a) and F_X(b) marked on the vertical axis; U' lies between them.]
Generate U' between F_X(a) and F_X(b) (how?):
U' = F_X(a) + RND·(F_X(b) - F_X(a))
X = F_X^{-1}(U')
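A small Python sketch of this conditional (truncated) sampling, using the exponential CDF from earlier as the example F (the function names are mine):

```python
import math
import random

def truncated_exponential(beta, a, b):
    """Exponential(mean beta) conditioned to lie in (a, b): restrict U' to (F(a), F(b))."""
    F = lambda x: 1.0 - math.exp(-x / beta)          # CDF
    F_inv = lambda u: -beta * math.log(1.0 - u)      # inverse CDF
    u = F(a) + random.random() * (F(b) - F(a))       # U' ~ U(F(a), F(b))
    return F_inv(u)

print(truncated_exponential(1.0, 0.5, 2.0))
```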
Inverse Transform for Discrete
Random Variates
• Inverse transform can be used for discrete
random variables, also
• E.g., can use empirical distribution function
(see next slide)
• If using known probabilities, replace 1/n
step size with p(xi) step size
Variate Generation Techniques:
Empirical Distribution
[Figure: the empirical CDF, a step function rising by 1/n at each of the sorted data points x1, x2, …, xn.]
How to sample from a discrete empirical distribution:
• Determine {(x1, F(x1)), (x2, F(x2)), …, (xn, F(xn))}
• Generate u
• Search for the smallest i with u ≤ F(xi)
• Report xi
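A minimal Python sketch of this table/search approach (the example values reuse -4, 1, 2 from the earlier slide; the probabilities here are made up for illustration):

```python
import bisect
import random

def empirical_discrete(values, probs):
    """Sample from a discrete (empirical) distribution by searching the CDF."""
    cdf, total = [], 0.0
    for p in probs:
        total += p
        cdf.append(total)                  # running F(x_i)
    u = random.random()
    i = bisect.bisect_left(cdf, u)         # smallest i with u <= F(x_i)
    return values[i]

print([empirical_discrete([-4, 1, 2], [0.3, 0.5, 0.2]) for _ in range(10)])
```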
Order Statistics
• The ith order statistic is the ith smallest of n observations
• Generate n iid observations x1, x2,…,xn
• Order the n observations
• x[1], x[2],…,x[n]
• x[1] describes the failure time in a serial system
• x[n] describes the failure time in a parallel system
• How can we generate x[1] and x[n] using one
U(0,1) variate?
Order Statistics
• Serial system: [Figure: components 1, 2, …, n in series.]
• Parallel system: [Figure: components 1, 2, …, n in parallel.]
Order Statistics
• F_{[n]}(a) = P{x_{[n]} ≤ a} = P{max{x_i} ≤ a}
  = P{x_1 ≤ a, x_2 ≤ a, …, x_n ≤ a}
  = P{x_1 ≤ a} P{x_2 ≤ a} … P{x_n ≤ a}   (independent)
  = F^n(a)   (identically distributed)
  This is the CDF of the failure time of a parallel system in terms of the CDF of the failure time of an individual component.
• Inversion: F^n(a) = U implies a = F^{-1}(U^{1/n})
Order Statistics
• F_{[1]}(a) = P{x_{[1]} ≤ a} = 1 - P{x_{[1]} > a}
  = 1 - P{min{x_i} > a}
  = 1 - P{x_1 > a, x_2 > a, …, x_n > a}
  = 1 - P{x_1 > a} P{x_2 > a} … P{x_n > a}
  = 1 – (1 - F(a))^n
  This is the CDF of the failure time of a serial system in terms of the CDF of the failure time of an individual component.
Order Statistics
• Inversion: 1 – (1 – F(a))^n = u implies a = F^{-1}(1 – (1 – u)^{1/n})
• Finding the ith order statistic, e.g., the 2nd: first find X_{[1]}, then sample the minimum of the remaining n-1 observations from U[F(X_{[1]}), 1] ⟹ 2 uniforms are needed to generate X_{[2]}
NOTE
— As n → +∞, u^{1/n} → 1 and 1 - (1 - u)^{1/n} → 0 for u ~ U(0, 1)
— Once the CDFs of the order statistics are known, the densities (PDFs) can be obtained by differentiating.
[Figure: a uniform u mapped through F^{-1}(u^{1/n}) for the maximum and through F^{-1}(1 - (1 - u)^{1/n}) for the minimum, compared with the ordinary F^{-1}(u).]
Example
Let X be exponential with mean 1/5, u ~ U(0, 1), and n = 100.
Then x_{[100]} = -(1/5) ln(1 - u^{1/100}) and
x_{[1]} = -(1/5) ln(1 - (1 - (1 - u)^{1/100})) = -(1/5) ln((1 - u)^{1/100})
For u = .2: x_{[100]} = .827 and x_{[1]} = .0004
For u = .5: x_{[100]} = .995 and x_{[1]} = .0014
For u = .8: x_{[100]} = 1.22 and x_{[1]} = .0032
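A Python sketch reproducing this example (exponential with mean 1/5, n = 100), generating both the minimum and the maximum from a single uniform:

```python
import math
import random

def exp_inv(u, beta=0.2):
    """Inverse CDF of an Exponential with mean beta (= 1/5 here)."""
    return -beta * math.log(1.0 - u)

n = 100
u = random.random()
x_max = exp_inv(u ** (1.0 / n))                   # X[n] = F^{-1}(U^{1/n})
x_min = exp_inv(1.0 - (1.0 - u) ** (1.0 / n))     # X[1] = F^{-1}(1 - (1-U)^{1/n})
print(u, x_min, x_max)
```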
Composition
• Can be used when F can be expressed as a convex combination of other distributions F_i, where we hope to be able to sample from the F_i more easily than from F directly:
  F(x) = Σ_{i=1}^{∞} p_i F_i(x)   and   f(x) = Σ_{i=1}^{∞} p_i f_i(x)
• p_i is the probability of generating from F_i, with Σ_{i=1}^{∞} p_i = 1
Composition: Graphical Example
[Figure: a density f(x) split into two pieces, each with area 0.5.]
Composition: Algorithm
1. Generate a positive random integer I such that P{I = i} = p_i for i = 1, 2, …
2. Return X with distribution function F_I
Think of Step 1 as generating I with mass function {p_i}. (The inverse transform can be used.)
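A minimal Python sketch of this two-step composition algorithm, using a mixture of two exponentials as an assumed example (the weights and means are made up):

```python
import math
import random

def hyperexponential(p, means):
    """Composition: pick component i with probability p[i], then sample Exponential(means[i])."""
    u = random.random()
    acc = 0.0
    for i, pi in enumerate(p):             # Step 1: inverse transform on the mass function {p_i}
        acc += pi
        if u <= acc:
            break
    return -means[i] * math.log(1.0 - random.random())   # Step 2: sample from F_i

print(hyperexponential([0.3, 0.7], [1.0, 5.0]))
```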
Composition: Another Example
[Figure: a density f(x) on (0,1) made of a uniform piece with area a and a right-triangular piece with area 1-a, reaching total height 2-a at x = 1.]
1. Generate U1
2. If U1 ≤ a, generate and return U2
3. Else generate and return X from the right-triangular distribution
Acceptance/Rejection
• X has density f(x) with bounded support
• If F is hard (or impossible) to invert, too
messy for composition... what to do?
 Generate Y from a more manageable
distribution and accept as coming from f
with a certain probability
Acceptance/Rejection Intuition:
The density f(x) is really ugly... say, orange!
[Figure: an ugly density f(x) under a "nice" majorizing function M'(x), say a uniform (a rectangle).]
Intuition: throw darts at the rectangle under M'(x) until one lands under f(x). A dart above f(x) is a miss ("missed again!") and is rejected; the first dart below f(x) gives the accepted X – done.
Why it works: P{accept X} is proportional to the height of f(X).
Acceptance/Rejection
• Create a majorizing function M(x) ≥ f(x) for all x; normalize M(·) to be a density (area = 1): r(x) = M(x)/c, where c is the area under M(x)
Algorithm:
1. Generate X from r(x)
2. Generate Y from U(0, M(X)) (independent of X)
3. If Y ≤ f(X), return X; else go to 1 and repeat
Generalized Acceptance/Rejection: f(x) ≤ g(x) for all x
1) Generate x with pdf g(x)/A_g, where A_g is the area under g(x).
2) Generate y ~ U[0, g(x)]
3) If y ≤ f(x), accept x; otherwise, go to 1)
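A hedged Python sketch of acceptance/rejection with a flat majorizing function; the target density Beta(2,2), f(x) = 6x(1-x) on (0,1), is an assumed example, with M(x) = 1.5 ≥ f(x) everywhere:

```python
import random

def f(x):
    """Assumed target density: Beta(2,2), f(x) = 6x(1-x) on (0,1); its maximum is 1.5."""
    return 6.0 * x * (1.0 - x)

def accept_reject():
    """Acceptance/rejection with the constant majorizer M(x) = 1.5 on (0,1)."""
    while True:
        x = random.random()                # X from r(x) = M(x)/c, here simply U(0,1)
        y = random.uniform(0.0, 1.5)       # Y ~ U(0, M(X)), independent of X
        if y <= f(x):
            return x                       # accept; c = 1.5, so about 1.5 iterations per variate

print(accept_reject())
```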
THEOREM: P{X ≤ t | Y ≤ f(X)} = F(t) = ∫_{-∞}^{t} f(z) dz
Proof:
P{X ≤ t | Y ≤ f(X)} = P{X ≤ t, Y ≤ f(X)} / P{Y ≤ f(X)}
First,
P{X ≤ t, Y ≤ f(X)} = ∫_{-∞}^{t} ∫_{0}^{f(x)} (1/g(x)) (g(x)/A_g) dy dx
  (Y is uniform on (0, g(x)); X has density g(x)/A_g; they are independent)
= ∫_{-∞}^{t} ∫_{0}^{f(x)} (1/A_g) dy dx
= ∫_{-∞}^{t} (f(x)/A_g) dx
Second,
P{Y ≤ f(X)} = ∫_{-∞}^{∞} ∫_{0}^{f(x)} (1/g(x)) (g(x)/A_g) dy dx
= ∫_{-∞}^{∞} ∫_{0}^{f(x)} (1/A_g) dy dx
= ∫_{-∞}^{∞} (f(x)/A_g) dx = 1/A_g
Therefore, P{X ≤ t | Y ≤ f(X)} = ∫_{-∞}^{t} f(x) dx
Performance of Algorithm
X is accepted with probability equal to (area under f(x)) / (area under M'(x)).
If c = area under M'(·), the probability of acceptance is 1/c – so we want c to be small.
What is the expected number of U's needed to generate one X? 2c. Why?
Each iteration is a coin flip (Bernoulli trial), so the number of iterations until the first success is Geometric, G(p), with E[G] = 1/p; here p = 1/c, and each iteration uses 2 U's.
Increase P{accept}, and stop sooner, with a tighter majorizing function M(x).
[Figure: f(x) under a tight majorizer M(x) and a looser M'(x); accept when U ≤ f(X)/M(X).]
What if f(x) is hard to evaluate?
Use a minorizing function m(x) ≤ f(x).
[Figure: m(x) ≤ f(x) ≤ M(x).]
Acceptance/Rejection
Final algorithm:
1. Generate X from r(x)
2. Generate U from U(0,1) (independent of X)
3. If U ≤ m(X)/M(X), return X and stop
4. Else if U ≤ f(X)/M(X), return X and stop
5. Else go to 1 and try again
Biggest Problems
— Choosing majorizing and minorizing functions such
that it is easy to generate points under the majorizing
function and the minorizing function is easy to compute
(f(x) may be difficult to compute)
— We want it to be easy to sample under g(x)
AND close to f(x) (contradictory constraints)
Result
— As the dimension of x → +∞, (area under f(x)) / (area of the enclosing "cube") → 0
Special Properties
• Methods that make use of "special properties" of distributions to generate them
• E.g., χ²(n) = Σ_{i=1}^{n} X_i², where the X_i are iid N(0,1) random variables
• If Z1 and Z2 are χ² with k1 and k2 degrees of freedom, then (Z1/k1)/(Z2/k2) is F(k1, k2)
Special Properties
• An Erlang is a sum of exponentials:
  ERL(r, β) = Σ_{i=1}^{r} (-β ln(u_i)) = -β ln(Π_{i=1}^{r} u_i)
  (the Gamma allows r to be a non-negative real)
• If X is Gamma(α, 1) and Y = βX, then Y is Gamma(α, β). If α = 1, then X is Exp(1).
• A Beta is a ratio of Gammas:
  If X1 ~ Gamma(α1, β) and X2 ~ Gamma(α2, β) are independent, then X1/(X1 + X2) ~ Beta(α1, α2).
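A short Python sketch of the Erlang-as-a-log-of-a-product property (the names are mine):

```python
import math
import random

def erlang(r, beta):
    """Erlang(r, beta) = sum of r Exponential(mean beta) = -beta * ln(product of r uniforms)."""
    prod = 1.0
    for _ in range(r):
        prod *= 1.0 - random.random()     # 1-U is U(0,1) in (0,1], avoids ln(0)
    return -beta * math.log(prod)

print(erlang(3, 2.0))
```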
Binomial Random Variable Binomial(n,p)
(Bernoulli Approach)
0) Set X = 0, i = 1
1) Generate U ~ U(0, 1)
2) If U ≤ p, X ← X + 1; if U > p, X is unchanged.
   If i < n, set i ← i + 1 and go to 1); if i = n, stop with X distributed Binomial(n, p)
Geometric Random Variable Geometric (p)
(Bernoulli Approach) (already saw inversion approach)
0) Set X = 1
1) Generate U ~ U(0, 1)
2) If U ≤ p, stop: X is Geometric(p); if U > p, set X ← X + 1 and go to 1)
• A Negative Binomial(r, p) is the sum of r iid Geometric(p) random variables
Example of Special Properties:
A Poisson RV, P, counts the exponentials that fit in the unit interval.
[Figure: E(1/λ) interarrival times laid end to end in (0, 1]; three arrivals by time 1 has the same probability as a Poisson(λ) value of 3.]
Generate each E(1/λ) as -(1/λ) ln{U_i}   (-L*LN{RND} in Sigma)

P = k  ⟺  Σ_{i=1}^{k} -(1/λ) ln{U_i} ≤ 1 < Σ_{i=1}^{k+1} -(1/λ) ln{U_i}
       ⟺  Σ_{i=1}^{k} ln{U_i} ≥ -λ > Σ_{i=1}^{k+1} ln{U_i}        (multiply through by -λ)
       ⟺  ln{Π_{i=1}^{k} U_i} ≥ -λ > ln{Π_{i=1}^{k+1} U_i}        (a sum of logs is the log of the product)
       ⟺  Π_{i=1}^{k} U_i ≥ e^{-λ} > Π_{i=1}^{k+1} U_i            (take exp of everything – monotonic)

Algorithm: to generate a Poisson (mean = λ), multiply iid Uniforms (RND) until the product first drops below e^{-λ}; the number of complete factors before that point, k, is the Poisson variate.
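A direct Python sketch of this multiply-uniforms recipe (the function name is mine):

```python
import math
import random

def poisson(lam):
    """Poisson(mean lam): multiply uniforms until the product falls below e^{-lam}."""
    threshold = math.exp(-lam)
    k, prod = 0, random.random()
    while prod >= threshold:               # k exponentials fit in (0,1] while the product stays >= e^{-lam}
        k += 1
        prod *= random.random()
    return k

print([poisson(3.0) for _ in range(10)])
```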
Special Properties:
Assume we have census data only, for Poisson arrivals (e.g., hospitals, deli queues, etc.).
Use the fact that, given K Poisson events occur in (0, T), the event times are distributed as K Uniforms on (0, T).
First arrival time: generate T1 = T·U[1:K] = smallest of K Uniforms on (0, T) (the min order statistic)
= T·(1 – U^{1/K})   (beware of rounding errors on the computer)
Next interarrival time = smallest of K-1 Uniforms on (0, T-T1): generate T2 = (T-T1)·(1 – U^{1/(K-1)})
and so on, until there are K events in T.
Special Properties
• Given X ~ N(0,1), we can generate X' ~ N(μ, σ²) as X' = μ + σX
• So it is sufficient to be able to generate X ~ N(0,1)
• Inversion is not possible (no closed-form inverse CDF)
• Acceptance/rejection is an option
• But better methods have been developed
Normal Random Variables
• Need only generate N(0, 1):
  — If X is N(0, 1), then Y = σX + μ is N(μ, σ²)
1) Crude (Central Limit Theorem) Method
• U ~ U(0, 1), hence E(U) = .5 and V(U) = 1/12
• Set Y = Σ_{i=1}^{n} U_i, hence E(Y) = n/2 and V(Y) = n/12
• By the CLT, (Y - n/2)/(n/12)^{1/2} →_D N(0, 1) as n → +∞
• Consider Σ_{i=1}^{12} U_i - 6 (i.e., n = 12). However, n = 12 may be too small!
Box-Muller Method
• Generate U1, U2 ~ U(0,1) random numbers
• Set X1 = (-2 ln U1)^{1/2} cos(2π U2) and X2 = (-2 ln U1)^{1/2} sin(2π U2)
• X1 and X2 will be iid N(0,1) random variables
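A minimal Python version of Box-Muller (an illustrative sketch):

```python
import math
import random

def box_muller():
    """Return two iid N(0,1) variates from two U(0,1) variates."""
    u1 = 1.0 - random.random()             # in (0,1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

print(box_muller())
```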
Box-Muller Explanation
• If X1 and X2 are iid N(0,1), then D² = X1² + X2² is χ²(2), which is the SAME as an Exponential(2)
  ⟹ D = (-2 ln U1)^{1/2}
• X1 = D cos θ and X2 = D sin θ, where θ = 2π U2
[Figure: the point (X1, X2) in polar coordinates, with radius D and angle θ.]
Polar Method
1. Let U1, U2 be U(0,1); define V_i = 2U_i – 1, so V_i is U(-1,1)
2. Define S = V1² + V2²
3. If S > 1, go to 1.
4. If S ≤ 1, let Y = (-2 ln S / S)^{1/2}
5. X1 = V1·Y and X2 = V2·Y are iid N(0,1)
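A Python sketch of the polar method (note the rejection step when S > 1):

```python
import math
import random

def polar_method():
    """Two iid N(0,1) variates by the polar method (no sin/cos calls)."""
    while True:
        v1 = 2.0 * random.random() - 1.0   # V1 ~ U(-1,1)
        v2 = 2.0 * random.random() - 1.0   # V2 ~ U(-1,1)
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:                 # reject points outside the unit circle
            y = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * y, v2 * y

print(polar_method())
```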
Polar Method Graphically
[Figure: the point (V1, V2) in the square (-1,1)×(-1,1); S is its squared distance from the origin and θ its angle. Points outside the unit circle are rejected.]
Let’s Look at some Normal Random Variates...
Notes
— X1² + X2² is distributed χ²(2), i.e., Exponential(2); equivalently (V1² + V2²)·Y² = -2 ln(S).
— This holds provided S ~ U(0,1) given S ≤ 1.
Theorem: S ~ U(0,1) given S ≤ 1.
Proof: For 0 ≤ z ≤ 1,
P{S ≤ z | S ≤ 1} = P{S ≤ z, S ≤ 1} / P{S ≤ 1} = P{S ≤ z} / P{S ≤ 1}
= P{V1² + V2² ≤ z} / P{S ≤ 1}     (since V1² + V2² = S)
= P{-(z - V2²)^{1/2} ≤ V1 ≤ (z - V2²)^{1/2}} / P{S ≤ 1}
= ∫_{-z^{1/2}}^{z^{1/2}} 2(z - v²)^{1/2} (1/4) dv / (π/4)
= ∫_{-π/2}^{π/2} (z cos²θ / 2) dθ / (π/4) = z     (substituting v = z^{1/2} sin θ)
If S > 1, go back to step 1. Rejection occurs with probability P{S > 1} = 1 - π/4.
Proof:
P{S ≤ 1} = P{V1² + V2² ≤ 1}
= P{-(1 - V2²)^{1/2} ≤ V1 ≤ (1 - V2²)^{1/2}}
= ∫_{-1}^{1} (1 - v²)^{1/2} dv / 2
= ∫_{-π/2}^{π/2} (cos²θ / 2) dθ = π/4
Scatter Plot of X1 and X2 from NOR{} in Sigma
[Figure: scatter plot of X2 vs. X1 for two successive normals from Sigma.]
Histograms of X1 and X2 from NOR{} in Sigma
[Figure: histograms (Count vs. X1 and Count vs. X2) of two successive normals from Sigma.]
Autocorrelations of X1 and X2 from NOR{} in Sigma
[Figure: autocorrelations of X1 and X2 vs. lag (0-10) for two successive normals from Sigma.]
Histograms of X1 and X2 from Exact Box Muller Algorithm
[Figure: histograms (Count vs. X1 and Count vs. X2) from BOXMULLR.OUT.]
Autocorrelations of X1 and X2 - Exact Box Muller Algorithm
[Figure: autocorrelations of X1 and X2 vs. lag (0-10) from BOXMULLR.OUT.]
Scatter plot of X1 and X2 - Exact Box Muller Algorithm
[Figure: scatter plot of X2 vs. X1 from BOXMULLR.OUT.]
This is weird! WHY? – Marsaglia’s Theorem in polar coordinates
Discrete Random Variate Generation
Discrete random variables can be tough to generate, e.g., a Binomial(N, p) with N large (say, the yield from a machine); check the web for updated (approximate) methods.
If a large number of discrete observations are needed, how can they be generated efficiently?
Discrete Random Variate
Generation
• Crude method: inversion
  ⟹ requires searching (sort the mass function first)
• Continuous approximation
  e.g., "a Geometric is the greatest integer < an Exponential"
• Alias method; ref: Kronmal & Peterson, The American Statistician 33.4, pp. 214-218
Alias Method
• Use when
— there are a large number of discrete values.
— you want to generate many variates from this distribution.
• Requires only one U(0, 1) variate.
• Transforms a discrete random variable into a discrete
uniform random variable with aliases at each value
(using conditional probability).
Example:
p1 = .2, p2 = .4, p3 = .35, p4 = .05
If the distribution were uniform, p_i would be .25 for i = 1, 2, 3, 4.
[Figure: four cells of width .25 over the values 1, 2, 3, 4.]
Define Q_i = probability that i is actually chosen given that i is first selected = P{i chosen | i selected}.
A_i is where to move (the alias) if i is not chosen.
One valid table:
A1 = 2, Q1 = .8
A2 = 3, Q2 = .6
A3 = 3, Q3 = 1.0
A4 = 2, Q4 = .2
Laid out on the unit interval: cell (0, .25) returns 1 on (0, .2) and 2 on (.2, .25); cell (.25, .5) returns 2 on (.25, .4) and 3 on (.4, .5); cell (.5, .75) returns 3; cell (.75, 1.0) returns 4 on (.75, .80) and 2 on (.80, 1.0).

Other possible alias combinations:
A1 = 3, Q1 = .8; A2 = 2, Q2 = 1.0; A3 = 2, Q3 = .4; A4 = 3, Q4 = .2
A1 = 3, Q1 = .8; A2 = 3, Q2 = .8; A3 = 3, Q3 = 1.0; A4 = 2, Q4 = .2
Alias Table Generation Algorithm
For i = 1, 2, …, n, do: Q_i = n·p_i
G = {i: Q_i > 1}   (cells with excess probability above 1/n to give away)
H = {i: Q_i < 1 and A_i has not been assigned}   (cells that need an alias to absorb the excess)
While H is nonempty, do:
  j: any member of H
  k: any member of G
  A_j = k
  Q_k = Q_k - (1 - Q_j)
  If Q_k < 1 then:
    G = G \ {k}
    H = H ∪ {k}
  end
  H = H \ {j}
end
(A code sketch of the construction and of sampling follows.)
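A Python sketch of the table construction and of sampling with a single uniform; the arbitrary choices of j and k mean it may produce a different, but equally valid, alias table than the worked example that follows:

```python
import random

def build_alias_table(p):
    """Construct (Q, A) for the alias method (Kronmal & Peterson style)."""
    n = len(p)
    Q = [n * pi for pi in p]
    A = list(range(n))                          # default alias: the value itself
    G = [i for i in range(n) if Q[i] > 1.0]     # cells with excess probability
    H = [i for i in range(n) if Q[i] < 1.0]     # cells that need an alias
    while H and G:
        j = H.pop()                             # any member of H
        k = G[-1]                               # any member of G
        A[j] = k                                # the rest of cell j points to k
        Q[k] -= (1.0 - Q[j])
        if Q[k] < 1.0:
            G.pop()
            H.append(k)
    return Q, A

def alias_sample(Q, A):
    """One draw from one U(0,1): the integer part picks the cell, the fraction picks value vs. alias."""
    u = len(Q) * random.random()
    i = int(u)
    return i if (u - i) < Q[i] else A[i]

Q, A = build_alias_table([0.210, 0.278, 0.089, 0.189, 0.234])
print([alias_sample(Q, A) for _ in range(10)])
```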
Example
i:    1      2      3      4      5
p_i:  .210   .278   .089   .189   .234
Q_i: 1.050  1.390   .445   .945  1.170
G:     X      X                    X
H:                   X      X

j = 3, k = 1:
i:    1      2      3      4      5
Q_i:  .495  1.390   .445   .945  1.170
G:            X                    X
H:     X                    X
A_i:                 1

j = 1, k = 2:
i:    1      2      3      4      5
Q_i:  .495   .885   .445   .945  1.170
G:                                 X
H:            X             X
A_i:   2             1

j = 2, k = 5:
i:    1      2      3      4      5
Q_i:  .495   .885   .445   .945  1.055
G:                                 X
H:                          X
A_i:   2      5      1

j = 4, k = 5:
i:    1      2      3      4      5
Q_i:  .495   .885   .445   .945  1.000
A_i:   2      5      1      5

To verify that the table is correct, use
P{i chosen} = Σ_j P{i chosen | j selected} P{j selected} = (Q_i/n) + Σ_{j: A_j = i} (1 - Q_j)(1/n)
p1 = (1/5)(.495 + .555) = .210
p2 = (1/5)(.885 + .505) = .278
p3 = (1/5)(.445) = .089
p4 = (1/5)(.945) = .189
p5 = (1/5)(1.000 + .115 + .055) = .234
Using the Alias Table
• Suppose u = .67. Therefore, 5u=3.35, which gives i = 4.
Since .67 is .35 of the way between .6 and .8, and .35<.945,
we get i = 4.
• Suppose u = .39. Therefore, 5u=1.95, which gives i = 2.
Since .39 is .95 of the way between .2 and .4, and .885<.95,
we get (the alias) i = 5.
Marsaglia Tables
• For discrete random variables
• Must have probabilities with denominator as a power of 2
• Use when
— there are a large number of discrete values
(shifts work from n values to log2 (n) values)
— you want to generate many values from this distribution
• Requires 2 U(0, 1) variates (actually only need one)
Example
Value   Prob    Binary     .5     .25    .125   .0625   .03125
0       7/32    .00111                    x      x       x
1      10/32    .01010             x             x
2      10/32    .01010             x             x
3       3/32    .00011                           x       x
4       2/32    .00010                           x
q_i:                               .5     .125   .3125   .0625

Urns: the .25 urn holds {1, 2}; the .125 urn holds {0}; the .0625 urn holds {0, 1, 2, 3, 4}; the .03125 urn holds {0, 3}. The urn probability q_i is (number of values in the urn) × (binary place value); the .5 column is empty.

Algorithm:
1) Pick an urn (column) with probability q_i
2) Pick a value from that urn with a discrete uniform
NOTE: At most log2(m) values of q_i are needed (= number of columns), where m is the power-of-2 common denominator of the probabilities (here m = 32, so at most 5 columns).

Check with the Law of Total Probability:
p0 = 0 + (.125)(1) + (.3125)(1/5) + (.0625)(1/2) = 7/32
p1 = (.5)(1/2) + 0 + (.3125)(1/5) + 0 = 10/32
p2 = (.5)(1/2) + 0 + (.3125)(1/5) + 0 = 10/32
p3 = 0 + 0 + (.3125)(1/5) + (.0625)(1/2) = 3/32
p4 = 0 + 0 + (.3125)(1/5) + 0 = 2/32
Poisson Process
[Figure: the counting process N(t), stepping up by 1 at each arrival; the gaps between arrivals are E(1/λ) interarrival times.]
Key: times between arrivals are exponentially distributed
Nonhomogeneous PP
• Rate λ is no longer constant: λ(t), with cumulative rate Λ(t) = ∫_0^t λ(y) dy
• Tempting: make the time between t_i and t_{i+1} exponentially distributed with rate λ(t_i)
• BAD IDEA
Nonhomogeneous PP
Generating at rate λ(t_i) might cause an arrival during a closed period!
[Figure: λ(t) drops to zero (a closed period) shortly after t_i.]
NHPP
Use thinning – the same idea as acceptance/rejection:
1. Generate arrivals from a PP with rate λ_max
2. Accept each arrival at time t with probability λ(t)/λ_max
[Figure: λ(t) under the constant rate λ_max; candidate arrivals falling above λ(t) are rejected, those below are accepted.]
Inhomogeneous Poisson Process (Thinning)
m = max_t λ(t)
[Figure: candidate interarrival times are Exponential with rate m; each candidate at time t is accepted with probability λ(t)/m, otherwise rejected.]
* Generate a homogeneous Poisson process with rate m.
* Accept arrivals with probability λ(t)/m
see Sigma model NONPOIS.MOD
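A hedged Python sketch of thinning; the rate function used here is an assumed example, and `rate_max` must dominate it everywhere:

```python
import math
import random

def nhpp_thinning(rate, rate_max, horizon):
    """Arrival times of a nonhomogeneous Poisson process on (0, horizon) by thinning."""
    t, arrivals = 0.0, []
    while True:
        t += -math.log(1.0 - random.random()) / rate_max   # candidate from a PP with rate rate_max
        if t > horizon:
            return arrivals
        if random.random() <= rate(t) / rate_max:           # accept with probability lambda(t)/rate_max
            arrivals.append(t)

print(nhpp_thinning(lambda t: 5.0 + 5.0 * math.sin(t), 10.0, 10.0))
```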
NHPP: Inversion
[Figure: generate a unit-rate (Exp(1) interarrival) Poisson process on the Λ(t) axis and map the arrival times back through Λ^{-1} to the time axis.]
What about the fact that we don't really know λ(t)?
Generating Dependent Data
• Many ways to generate dependent data
• AR(p) process (autoregressive): Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + … + a_p Y_{t-p} + ε_t
• MA(q) process (moving average): Y_t = ε_t + b_1 ε_{t-1} + … + b_q ε_{t-q}
• EAR process (exponential autoregressive): Yt = R*Yt-1 + M*ERL{1}*(RND>R)   (in Sigma)
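The Sigma expression above translates into the following Python sketch of an EAR(1) path (R is the autocorrelation parameter, M the exponential mean; the names are mine):

```python
import math
import random

def ear1_path(R, M, n):
    """EAR(1): Y_t = R*Y_{t-1}, plus an Exponential(mean M) innovation with probability 1-R."""
    y = -M * math.log(1.0 - random.random())      # start from the Exponential(M) marginal
    path = []
    for _ in range(n):
        bump = -M * math.log(1.0 - random.random()) if random.random() > R else 0.0
        y = R * y + bump
        path.append(y)
    return path

print([round(v, 3) for v in ear1_path(0.8, 1.0, 10)])
```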
Autocorrelation
The AR(1) autocorrelation at lag k, ρ_k, satisfies the first-order difference equation
ρ_k = α·ρ_{k-1}, k > 0, with boundary condition ρ_0 = 1.
Solution: ρ_k = α^k
EAR Process
• EAR_Q.MOD
• Histograms of the interarrival and service
times will appear exponentially distributed.
• Plots of the values over time will look very
different
• Output from the model will be very
different as R is varied
demo earQ.MOD
Driving Processes
Problem: Real World Stochastic Processes are not
independent nor identically distributed...not even
stationary.
• Serial dependencies: yield drift
• Cross dependencies: maintenance
• Spikes: hard and soft failures
• Nonstationarity: trends, cycles (rush hours)
Driving Processes
Example: a machine subject to unplanned downtime; the TTR (time-to-repair) distribution is held constant but the repair times are not independent: EAR(1).
[Figure: histograms (Count vs. T) of TTR under iid sampling and under 90% soft/hard (correlated) sampling.]
Driving Processes
Example: WIP charts
[Figure: WIP vs. time under iid TTR and under 90% soft/hard TTR; average WIP values of 4.9 and 44.3.]
Models for Dependent Discrete-Valued Processes
Ref: Edward McKenzie, "Time Series Analysis in Water Resources," Water Resources Bulletin 21.4, pp. 645-650.
Motivation: wish to model realistic situation where random process has
serial correlation. This is different from time-dependency (say, for
modeling rush-hour traffic) in that value of process depends on its own
history, maybe in addition to being dependent on its index (time).
Criteria: (P.A.W.Lewis: Multivariate Analysis - V. (pp: 151-166) North Holland)
1. Models specified by marginal distributions and correlation structure.
2. Few parameters that can be easily interpreted.
3. Structure is linear in parameters, making them easy to fit and generate
AR(order 1) (Markov Models)
A continuous autoregressive order-1 sequence {X_n} satisfies the difference equation
(X_n - μ) = α(X_{n-1} - μ) + ε_n
or, removing the means,
X_n = α·X_{n-1} + ε_n
with {ε_n} a sequence of iid random variables and α a positive fraction: the process retains the fraction α of its previous value.
Note: If X_n is discrete, then ε_n must depend on X_{n-1}; we want to reduce X_{n-1} by the "right" amount so that the marginal distribution is preserved.
Models for Dependent Discrete-Valued Processes
McKenzie's idea: generate each "unit" in the integer X_{n-1} separately, and "keep" it with probability α.
Replace α·X_{n-1} with α*X_{n-1}, defined as
α*X_{n-1} = Σ_{i=1}^{X_{n-1}} B_i(α)
where {B_i(α)} is an iid sequence of Bernoulli trials with P{B = 1} = α. This "reduces" X_{n-1} by the same (expected) amount as in the continuous autoregression.
Applications
Poisson random variables (e.g., the number arriving in an interval at a bus stop):
X_n = α*X_{n-1} + ε_n
With ε_n Poisson with mean λ(1-α): if X_0 is Poisson with mean λ, then so are all the X_n.
The correlation at lag k is α^k.
This process is time-reversible.
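A Python sketch of this binomial-thinning Poisson AR(1); the Poisson generator reuses the multiply-uniforms trick from the special-properties slides:

```python
import math
import random

def poisson_variate(mean):
    """Poisson variate by multiplying uniforms until the product drops below e^{-mean}."""
    threshold, k, prod = math.exp(-mean), 0, random.random()
    while prod >= threshold:
        k += 1
        prod *= random.random()
    return k

def poisson_ar1_path(alpha, lam, n):
    """X_n = alpha*X_{n-1} (binomial thinning) + Poisson(lam*(1-alpha)); marginal stays Poisson(lam)."""
    x = poisson_variate(lam)                                           # X_0 ~ Poisson(lam)
    path = []
    for _ in range(n):
        kept = sum(1 for _ in range(x) if random.random() < alpha)     # thin each unit with prob alpha
        x = kept + poisson_variate(lam * (1.0 - alpha))
        path.append(x)
    return path

print(poisson_ar1_path(0.7, 4.0, 10))
```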
Negative Binomial
(Note: check the reference on this.)
(Number of trials until β successes, where the probability of success on each trial is π.)
X_n = α*X_{n-1} + ε_n
where ε_n is NB(β, π) and the α_i (the binomial thinning probability in the term α*X_{n-1}) is Beta with parameters α and (1-α).
This has the same AR(1) correlation structure as the other models, ρ_k = α^k, and the process is also time-reversible.
Geometric
(A special case of the Negative Binomial.)
X_n = α*X_{n-1} + B_n·G_n
with B_n Bernoulli with P{B = 1} = 1 - α and G_n Geometric with parameter π.
This is the discrete analog of the EAR process.
McKenzie also discusses Binomial and Bernoulli processes, as well as adding time dependencies (seasonality, trends, etc.).
Summary: Generating Random Variates
• Could Use Trace Driven Simulations
• Could Use Empirical Distributions
• Could Use Parametric Distributions
Know the advantages and disadvantages of
each approach...
Summary: General Methods
• Inverse Transform
• Composition
• Acceptance/Rejection
• Special Properties
Look at the data! Scatter plots, histograms,
autocorrelations, etc.
Data Gathering
• Needing more data is a common, but usually invalid, excuse. Why? (sensitivity?)
• Timing devices – RFID chips
• Benefits vs. hassle of tracking customers
• Collect queue lengths. Why? L/λ = W (Little's law)
• Coordinating between observers
• Hawthorne effect
Distributions and BestFit or
ExpertFit or Stat::fit or...
• Gamma
• Exponential (i.e., Gamma with shape parameter 1)
• Erlang (i.e., Gamma with integer shape parameter)
• Beta (→ Gamma in the limit)
• Log-Logistic
• Lognormal *** ln, not log10 ***
5 Dastardly D’s of Data
(updated from Reader)
Data may be
• Distorted
  – Material move times include (un)load ⇒ underestimate the value of AGVs (loaders?)
  – Want demand but get backorders (only those willing to wait) ⇒ overestimate service levels (resources?)
5 Dastardly D’s of Data, cont.
• Dependent
  – Get means or histograms but not correlations, cycles, or trends ⇒ underestimate congestion (capacity?)
  – Fail to get cross dependencies ⇒ skill levels of operators, shift-change effects
5 Dastardly D’s of Data, cont.
• Deleted
  – Data is censored (accounting) ⇒ model not valid with new data
• Damaged
  – Data entry errors, collection errors (observer effect)
• Dated
  – Last month's data, different product mix ⇒ valid models later fail validation
6th and 7th Dastardly D's of Data...
• Doctored – well intentioned people tried to
clean it up...
• Deceptive - any of the other problems might
be intentional!
Data often used for performance evaluations,
or thought to be...
Data Collection: General
Concepts
• Relevance: degree of control; sensitivity
• When: need to observe ⇒ early in the study; need sensitivities ⇒ late in the study
• Cost: setup (including training and P.R.); sample size (sensitivity)
• Accuracy: technique, skill, motivation, training, timing (Monday a.m.??), …
Data Collection, cont.
• Precision: technique, etc.; sample size; sample interval (dependencies); error control
• Analysis: verification; dependencies within a data set; dependencies between data sets
What to do? - recommendations
• Don’t wait for data before you build model
• Run sensitivity experiments on current
system design to see what matters.
• Collect data for validation, if required.
• Remember: you are ultimately making a
forecast not predicting the past!... so
• Do sensitivity analysis on new systems
• Present output using Interval Estimators!!
Simulation study
[Flowchart boxes: Start; Anticipate system and human behavior; Formulate questions; Characterize answers; Enough time and money?; Identify prejudices; Design experiments; (Re)plan study and (re)secure resources; Adjust expectations; Develop, verify, refine model as necessary; Code, re-code until there are 'no known errors'; Do sensitivity analysis; If necessary, collect data; Redesign and run experiments; Analyze, advise, and caution.]