Analysis of Simulation Experiments
Outline

- Introduction
- Classification of Outputs
- DIDO vs. RIRO Simulation
- Analysis of One System
- Terminating vs. Steady-State Simulations
- Analysis of Terminating Simulations
- Obtaining a Specified Precision
- Analysis of Steady-State Simulations
- Method of Moving Average for Removing the Initial Bias
- Method of Batch Means
- Multiple Measures of Performance
- Analysis of Several Systems
- Comparison of Two Alternative Systems
- Comparison of More than Two Systems
- Ranking and Selection
Introduction

The greatest disadvantage of simulation:
- You don't get exact answers; results are only estimates.

Careful design and analysis are needed to:
- Make these estimates as valid and precise as possible
- Interpret their meanings properly

Statistical methods are used to analyze the results of simulation experiments.
What Outputs to Watch?

You need to think ahead about what you want to get out of the simulation:
- Average, and worst (longest), time in system
- Average, and worst, time in queue(s)
- Average hourly production
- Standard deviation of hourly production
- Proportion of time a machine is up, idle, or down
- Maximum queue length
- Average number of parts in system
Classification of Outputs

There are typically two types of dynamic processes:

Discrete-time process: There is a natural "first" observation, "second" observation, etc., but you can only observe them when they "happen". For example, if Wi = time in system for the ith part produced (i = 1, 2, ..., N), and there are N parts produced during the simulation, the Wi form a discrete-time process.

[Figure: Wi plotted against part index i = 1, 2, 3, ..., N]
Classification of Outputs

Typical discrete-time output performance measures:

- Average time in system: W̄(N) = (1/N) Σ_{i=1}^{N} W_i
- Maximum time in system
- Proportion of parts that were in the system for more than 1 hour
- Delay of the ith customer in queue
- Throughput during the ith hour
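As a minimal sketch (the data below are invented for illustration, not from the slides), the discrete-time measures above can be computed directly from the list of recorded W_i values:

```python
# Hypothetical times in system (hours) for N = 6 parts
w = [0.5, 1.2, 0.8, 2.1, 0.9, 0.7]

avg_time = sum(w) / len(w)                             # average time in system, W-bar(N)
max_time = max(w)                                      # maximum time in system
prop_over_1h = sum(1 for wi in w if wi > 1) / len(w)   # proportion in system > 1 hour

print(avg_time, max_time, prop_over_1h)
```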
Classification of Outputs

Continuous-time process: You can jump into the system at any point in (real, continuous) time and take a "snapshot" of something; there is no natural first or second observation. For example, let Q(t) = number of parts in a particular queue at time t, for t in [0, T], where the simulation is run for T units of simulated time.

[Figure: step function Q(t) over [0, T], taking values 0, 1, 2, 3]
Classification of Outputs

Typical continuous-time output performance measures:

- Time-average length of queue: Q̄(T) = (1/T) ∫₀ᵀ Q(t) dt
- Server utilization (proportion of time the server is busy): ū(T) = (1/T) ∫₀ᵀ B(t) dt, where B(t) = 1 if the server is busy at time t and 0 otherwise

[Figure: B(t) as a 0/1 step function over [0, T]]
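Since Q(t) is piecewise constant between events, the time-average integral above reduces to a weighted sum of values times holding intervals. A small sketch (the event times and values below are made up for illustration):

```python
def time_average(times, values, T):
    """Time average of a piecewise-constant process.

    times[i] is when the process jumps to values[i]; times[0] == 0.
    The last value is held until the end of the run at time T.
    """
    total = 0.0
    for i, (t, v) in enumerate(zip(times, values)):
        t_next = times[i + 1] if i + 1 < len(times) else T
        total += v * (t_next - t)  # area under the step function
    return total / T

# Q(t): 0 on [0,2), 2 on [2,5), 1 on [5,7), 3 on [7,10]
print(time_average([0, 2, 5, 7], [0, 2, 1, 3], 10.0))  # -> 1.7
```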
Classification of Outputs

Other continuous-time performance measures:
- Number of parts in the system at time t
- Number of machines down at time t
- Proportion of time that there were more than n parts in the queue
DIDO vs. RIRO Simulation

[Diagram, shown once for each case: Inputs (cycle times, interarrival times, batch sizes) → Simulation Model → Outputs (hourly production, machine utilization)]

DIDO (deterministic in, deterministic out): fixed, deterministic inputs produce fixed, deterministic outputs.

RIRO (random in, random out): random inputs produce outputs that are themselves random.
Analysis of One System

[Figure: A single-server (M/M/1) queue replicated 10 times. Plots against replication number 1-10 of the average number in queue and average delay in queue (scale 0-8) and of the server utilization (scale 0.5-1.0); the outputs vary noticeably from replication to replication.]
Analysis of One System

CAUTION: Because of the autocorrelation that exists in the output of virtually all simulation models, "classical" statistical methods don't work directly within a simulation run.

Time in system for individual jobs: Y₁, Y₂, Y₃, ..., Yₙ
μ = E(average time in system)

Sample mean: Ȳ(n) = (1/n) Σ_{i=1}^{n} Y_i

is an unbiased estimator for μ, but how close is this sample mean to μ? We need to estimate Var(Ȳ(n)) to get confidence intervals on μ.
Analysis of One System

Problem: Because of positive autocorrelation between Y_i and Y_{i+1} (Corr(Y_i, Y_{i+1}) > 0), the sample variance is no longer an unbiased estimator of the population variance (unbiasedness of variance estimators can only be achieved if Y₁, Y₂, ..., Yₙ are independent). As a result, the estimated variance of the mean,

S²(n)/n = Σ_{i=1}^{n} [Y_i − Ȳ(n)]² / (n(n − 1)),

may be severely biased for Var[Ȳ(n)]. In fact, usually

E[S²(n)/n] < Var[Ȳ(n)]

Implication: Understating variances causes us to have too much faith in our point estimates and to believe the results too much.
Types of Simulations with Regard to Output Analysis

- Terminating: A simulation where a specific starting and stopping condition is part of the model.
- Steady-state: A simulation with no specific starting and ending conditions, where we are interested in the steady-state behavior of the system.

"The type of analysis depends on the goal of the study."
Examples of Terminating Simulations

- A retail/commercial establishment (e.g., a bank) that operates from 9 to 5 daily and starts empty and idle at the beginning of each day. The output of interest may be the average wait time of the first 50 customers in the system.
- A military confrontation between a blue force and a red force. The output of interest may be the probability that the red force loses half of its strength before the blue force loses half of its strength.
Examples of Steady-State Simulations

- A manufacturing company that operates 16 hours a day. The system here is a continuous process where the ending condition for one day is the initial condition for the next day. The output of interest may be the expected long-run daily production.
- A communication system where service must be provided continuously.
Analysis for Terminating Simulations

Objective: Obtain a point estimate and confidence interval for some parameter μ.

Examples:
- μ = E(average time in system for n customers)
- μ = E(machine utilization)
- μ = E(work-in-process)

Reminder: We cannot use classical statistical methods within a simulation run because observations from one run are not independent and identically distributed (i.i.d.).
Analysis for Terminating Simulations

- Make n independent replications of the model.
- Let Y_i be the performance measure from the ith replication, e.g., Y_i = average time in system, Y_i = work-in-process, or Y_i = utilization of a critical facility.
- Performance measures from different replications, Y₁, Y₂, ..., Yₙ, are i.i.d.
- But only one sample is obtained from each replication.
- Apply classical statistics to the Y_i's, not to observations within a run.
- Select a confidence level 1 − α (0.90, 0.95, etc.).
Analysis for Terminating Simulations

Approximate 100(1 − α)% confidence interval for μ:

Ȳ(n) = (1/n) Σ_{i=1}^{n} Y_i   (unbiased estimator of μ)

S²(n) = Σ_{i=1}^{n} [Y_i − Ȳ(n)]² / (n − 1)   (unbiased estimator of Var(Y_i))

Ȳ(n) ± t_{n−1,1−α/2} √(S²(n)/n)   covers μ with approximate probability 1 − α.

δ(n, α) = t_{n−1,1−α/2} √(S²(n)/n) is the half-width expression.
Example

Consider a single-server (M/M/1) queue. The objective is to calculate a confidence interval for the delay of customers in the queue.

- n = 10 replications of the single-server queue
- Y_i = average delay in queue from the ith replication
- Y_i's: 2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23
- For a 90% confidence interval, α = 0.10
- Ȳ(10) = 2.64, S²(10) = 3.96, t_{9,0.95} = 1.833
- Approximate 90% confidence interval: 2.64 ± 1.15, or [1.49, 3.79]
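The example's numbers can be reproduced in a few lines of Python (the t quantile t_{9,0.95} = 1.833 is taken from a t table, as in the slide):

```python
import math

y = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
n = len(y)

ybar = sum(y) / n                                   # sample mean Y-bar(n)
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)    # sample variance S^2(n)
hw = 1.833 * math.sqrt(s2 / n)                      # half-width, t_{9,0.95} = 1.833

print(f"{ybar:.2f} +/- {hw:.2f}")  # -> 2.64 +/- 1.15
```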
Analysis for Terminating Simulations

Interpretation: 100(1 − α)% of the time, a confidence interval formed in this way covers μ (which is unknown).

Wrong interpretation: "I am 90% confident that μ is between 1.49 and 3.79."
Issue 1

- This confidence-interval method assumes the Y_i's are normally distributed. In real life, this is almost never true.
- Because of the central-limit theorem, as the number of replications n grows, the coverage probability approaches 1 − α.
- In general, if the Y_i's are averages of something, their distribution tends not to be too asymmetric, and the confidence-interval method shown above has reasonably good coverage.
Issue 2

The confidence interval may be too wide. In the M/M/1 queue example, the approximate 90% C.I. was 2.64 ± 1.15, or [1.49, 3.79]. The half-width is 1.15, which is 44% of the mean (1.15/2.64). That means the C.I. is 2.64 ± 44%, which is not very precise.

To decrease the half-width: increase n until the half-width δ(α, n) is small enough (this is called sequential sampling).

There are two ways of defining the precision in the estimate Ȳ:
- Absolute precision
- Relative precision
Obtaining a Specified Precision

Absolute precision:

- We want to make n large enough that δ(α, n) ≤ β, where δ(α, n) is the half-width and β > 0.
- Make n₀ replications of the simulation model and compute Ȳ(n), S²(n), and the half-width δ(α, n).
- Assuming that the estimate of the variance, S²(n), does not change appreciably, an approximate expression for the required number of replications to achieve an absolute error of β is

n*_a(β) = min{ i ≥ n : t_{i−1,1−α/2} √(S²(n)/i) ≤ β }
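A sketch of the n*_a(β) search, not from the slides: the t quantile is passed in as a function, and in the demo it is approximated by the fixed normal quantile z_{0.95} = 1.645, an assumption that is only reasonable once i is fairly large.

```python
import math

def n_abs(beta, s2, n0, t_quantile):
    """Smallest i >= n0 with t_{i-1,1-alpha/2} * sqrt(s2/i) <= beta,
    holding the variance estimate s2 fixed (as the slide assumes)."""
    i = n0
    while t_quantile(i - 1) * math.sqrt(s2 / i) > beta:
        i += 1
    return i

# Demo: S^2(10) = 3.96 from the M/M/1 example, target half-width beta = 0.5
print(n_abs(0.5, 3.96, 10, lambda df: 1.645))  # -> 43
```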
Obtaining a Specified Precision

Relative precision:

- We want to make n large enough that δ(α, n)/|Ȳ(n)| ≤ γ, where 0 < γ < 1.
- Make n₀ replications of the simulation model and compute Ȳ(n), S²(n), and the half-width δ(α, n).
- Assuming that the estimates of both the population mean, Ȳ(n), and the population variance, S²(n), do not change appreciably, an approximate expression for the required number of replications to achieve a relative error of γ is

n*_r(γ) = min{ i ≥ n : t_{i−1,1−α/2} √(S²(n)/i) / |Ȳ(n)| ≤ γ }
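The relative-precision rule differs only in dividing the half-width by |Ȳ(n)|; a matching sketch under the same fixed-quantile assumption as before:

```python
import math

def n_rel(gamma, ybar, s2, n0, t_quantile):
    """Smallest i >= n0 with half-width / |ybar| <= gamma,
    holding ybar and s2 fixed between iterations."""
    i = n0
    while t_quantile(i - 1) * math.sqrt(s2 / i) / abs(ybar) > gamma:
        i += 1
    return i

# Demo: Y-bar = 2.64 and S^2 = 3.96 from the example; want a 15% relative half-width
print(n_rel(0.15, 2.64, 3.96, 10, lambda df: 1.645))  # -> 69
```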
Analysis for Steady-State Simulations

Objective: Estimate the steady-state mean ν = lim_{i→∞} E(Y_i).

Basic question: Should you do many short runs or one long run?

[Figure: "Many short runs" shows replications X₁, ..., X₅, each restarted from the initial state; "one long run" shows a single replication X₁.]
Analysis for Steady-State Simulations

Advantages:
- Many short runs: simple analysis, similar to the analysis for terminating systems; the data from different replications are i.i.d.
- One long run: less initial bias; no restarts.

Disadvantages:
- Many short runs: initial bias is introduced several times.
- One long run: a sample of size 1; difficult to get a good estimate of the variance.
Analysis for Steady-State Simulations

- Make many short runs: The analysis is exactly the same as for terminating systems. The 100(1 − α)% C.I. is computed as before.
- Problem: Because of initial bias, Ȳ(n) may no longer be an unbiased estimator for the steady-state mean, ν.
- Solution: Remove the initial portion of the data (the warm-up period), beyond which observations are in steady state. Specifically, pick l (the warm-up period) and n (the number of observations in one run) such that

E[ Σ_{i=l+1}^{n} Y_i / (n − l) ] ≈ ν
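Deleting the first l observations and averaging the rest is a one-liner; a sketch with made-up data whose first two observations clearly show initial bias:

```python
def truncated_mean(y, l):
    """Average of y[l], ..., y[n-1]: the warm-up period of length l is deleted."""
    return sum(y[l:]) / (len(y) - l)

y = [9.0, 6.0, 2.1, 1.9, 2.0, 2.0]   # hypothetical run; first two values are biased high
print(truncated_mean(y, 2))  # -> 2.0
```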
Method of Moving Average for Removing the Initial Bias

Welch's method for determining the warm-up period, l:

- Make n replications of the model (n > 5), each of length m, where m is large. Let Y_ji be the ith observation from the jth replication (j = 1, 2, ..., n; i = 1, 2, ..., m).
- Let Ȳ_i = Σ_{j=1}^{n} Y_ji / n for i = 1, 2, ..., m.
- To smooth out the high-frequency oscillations in Ȳ₁, Ȳ₂, ..., Ȳ_m, define the moving average Ȳ_i(w) as follows, where the window w is a positive integer with w ≤ m/2:

Ȳ_i(w) = Σ_{s=−w}^{w} Ȳ_{i+s} / (2w + 1)          if i = w+1, ..., m−w
Ȳ_i(w) = Σ_{s=−(i−1)}^{i−1} Ȳ_{i+s} / (2i − 1)    if i = 1, ..., w
Method of Moving Average for Removing the Initial Bias

- Plot Ȳ_i(w) and choose l to be the value of i beyond which Ȳ₁(w), Ȳ₂(w), ... seem to have converged.
- Note: Perform this procedure for several values of w and choose the smallest w for which the plot of Ȳ_i(w) looks reasonably smooth.
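A sketch of the moving-average formula applied to the across-replication averages Ȳ_i (the data below are invented solely to exercise both branches of the definition):

```python
def welch_moving_average(ybar, w):
    """Welch moving average, defined for i = 1, ..., m - w (1-based).

    For i <= w a shorter, symmetric window of 2i - 1 points is used;
    for i > w the full window of 2w + 1 points centered at i is used.
    """
    m = len(ybar)
    out = []
    for i in range(1, m - w + 1):           # 1-based index i
        if i <= w:
            window = ybar[:2 * i - 1]       # points 1 .. 2i-1
            out.append(sum(window) / (2 * i - 1))
        else:
            window = ybar[i - 1 - w:i + w]  # points i-w .. i+w
            out.append(sum(window) / (2 * w + 1))
    return out

print(welch_moving_average([4.0, 2.0, 2.0, 1.0, 1.0, 1.0], w=1))
```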
Analysis for Steady-State Simulations

Make one long run: Make just one long replication so that the initial bias is introduced only once. This way, you will not be "throwing out" a lot of data.

Problem: How do you estimate the variance when there is only one run?

Solution: Several methods exist to estimate the variance:
- Batch means (the only approach discussed here)
- Time-series models
- Spectral analysis
- Standardized time series
Method of Batch Means

- Divide a run of length m into n adjacent "batches" of length k, where m = nk.
- Let Ȳ_j be the sample (batch) mean of the jth batch.

[Figure: the m observations Y_i divided into n consecutive batches of k observations each.]

- The grand sample mean, denoted here Ȳ, is computed as

Ȳ = Σ_{j=1}^{n} Ȳ_j / n = Σ_{i=1}^{m} Y_i / m
Method of Batch Means

The sample variance of the batch means is computed as

S²_Ȳ(n) = Σ_{j=1}^{n} (Ȳ_j − Ȳ)² / (n − 1)

The approximate 100(1 − α)% confidence interval for ν is

Ȳ ± t_{n−1,1−α/2} S_Ȳ(n) / √n
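A sketch of the batch-means computation; the t quantile is again passed in, and the demo data are invented purely to check the arithmetic (a real run would use steady-state output after warm-up deletion):

```python
import math

def batch_means_ci(y, n_batches, t_quantile):
    """Grand mean and CI half-width from n adjacent batches of equal size."""
    k = len(y) // n_batches                 # batch size; assumes len(y) = n * k
    means = [sum(y[j * k:(j + 1) * k]) / k for j in range(n_batches)]
    grand = sum(means) / n_batches          # grand sample mean
    s2 = sum((b - grand) ** 2 for b in means) / (n_batches - 1)
    hw = t_quantile(n_batches - 1) * math.sqrt(s2 / n_batches)
    return grand, hw

# Demo: 20 observations, 4 batches of 5; t_{3,0.95} = 2.353 from a table
grand, hw = batch_means_ci(list(range(20)), 4, lambda df: 2.353)
print(grand, hw)
```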
Method of Batch Means

Two important issues:

Issue 1: How do we choose the batch size k?
- Choose the batch size k large enough that the batch means, the Ȳ_j's, are approximately uncorrelated. Otherwise, the variance estimate S²_Ȳ(n) will be biased low, and the confidence interval will be too small, meaning it will cover the mean with a probability lower than the desired 1 − α.
Method of Batch Means

Issue 2: How many batches n?
- Due to autocorrelation, splitting the run into a larger number of smaller batches degrades the quality of each individual batch. Therefore, 20 to 30 batches are sufficient.
Multiple Measures of Performance

- In most real-world simulation models, several measures of performance are considered simultaneously. Examples include throughput, average length of queue, utilization, and average time in system.
- Each performance measure is typically estimated with a confidence interval.
- Any of the intervals could "miss" its expected performance measure.
- We must be careful about overall statements of coverage (i.e., that all intervals contain their expected performance measures simultaneously).
Multiple Measures of Performance

Suppose we have k performance measures, and the confidence interval for performance measure s (s = 1, 2, ..., k) is at confidence level 1 − α_s. Then the probability that all k confidence intervals simultaneously contain their respective true measures satisfies

P(all k intervals contain their respective performance measures) ≥ 1 − Σ_{s=1}^{k} α_s

This is referred to as the Bonferroni inequality.
Multiple Measures of Performance

To ensure that the overall probability (of all k confidence intervals simultaneously containing their respective true means) is at least 100(1 − α) percent, choose the α_s's such that

Σ_{s=1}^{k} α_s = α

Can select α_s = α/k for all s, or pick the α_s's differently, with smaller α_s's for the more important performance measures.
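The equal-split allocation is just arithmetic; a tiny sketch for the k = 2 case discussed next:

```python
# Overall target: at least 90% joint confidence over k = 2 measures.
k, alpha = 2, 0.10
alpha_s = alpha / k          # equal split: each interval gets alpha/k
level_each = 1 - alpha_s     # so each individual interval is at level 0.95
print(level_each)
```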
Multiple Measures of Performance

- Example: If k = 2 and we want the desired overall confidence level to be at least 90%, we can construct two 95% confidence intervals.
- Difficulty: If there are a large number of performance measures and we want a reasonable overall confidence level (e.g., 90%), the individual α_s's could become small, making the corresponding confidence intervals very wide. Therefore, it is recommended that the number of performance measures not exceed 10.
Analysis of Several Systems

Most simulation projects involve comparison of two or more systems or configurations, for example:
- Change the number of machines in some workcenters
- Evaluate various job-dispatch policies (FIFO, SPT, etc.)

With two alternative systems, the goal may be to:
- test hypotheses such as H₀: μ₁ = μ₂ or H₀: μ₁ ≤ μ₂
- build a confidence interval for μ₁ − μ₂

With k > 2 alternatives, the objective may be to:
- build simultaneous confidence intervals for various differences μ_{i₁} − μ_{i₂}
- select the "best" of the k alternatives
- select a subset of size m < k that contains the "best" alternative
- select the m "best" (unranked) of the alternatives
Analysis of Several Systems

To illustrate the danger of making only one run and eyeballing the results when comparing alternatives, consider the following example. Compare:

- Alternative 1: an M/M/1 queue with mean interarrival time 1 min. and one "fast" machine with mean service time 0.9 min., vs.
- Alternative 2: an M/M/2 queue with mean interarrival time 1 min. and two "slow" machines with mean service time 1.8 min. each.
Analysis of Several Systems

If the performance measure of interest is the expected average delay in queue of the first 100 customers with empty-and-idle initial conditions, then using queueing analysis, the true expected average delays are

μ₁ = 4.13 > 3.70 = μ₂

Therefore, system 2 is "better."

If we run each model just once, calculate the average delay Y_i from each alternative, and select the system with the smaller Y_i, then

P(selecting system 1 (the wrong answer)) = 0.52

Reason: randomness in the output.
Analysis of Several Systems

Solution:
- Replicate each alternative n times.
- Let Y_ij = average delay from the jth replication of alternative i.
- Compute the average over all replications for alternative i:

Ȳ_i = Σ_{j=1}^{n} Y_ij / n

- Select the alternative with the lowest Ȳ_i.

If we conduct this experiment many times, the following results are obtained:

n      P(wrong answer)
1      0.52
5      0.43
10     0.38
20     0.34
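The selection step itself is tiny; a sketch using made-up replication outputs (the alternative labels and delay values below are hypothetical):

```python
def select_best(reps):
    """Pick the alternative whose replication average is smallest."""
    means = {alt: sum(v) / len(v) for alt, v in reps.items()}
    return min(means, key=means.get)

# Hypothetical average delays from 5 replications of each alternative
reps = {1: [4.2, 3.9, 4.5, 4.0, 4.1], 2: [3.8, 3.6, 3.9, 3.5, 3.7]}
print(select_best(reps))  # -> 2
```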
Comparison of Two Alternative Systems

- Form a confidence interval for the difference between the performance measures of the two systems (i.e., ζ = μ₁ − μ₂).
- If the interval misses 0, there is a statistically significant difference between the two systems.
- Confidence intervals are better than hypothesis tests because if a difference exists, the confidence interval measures its magnitude, while a hypothesis test does not.
- There are two slightly different ways of constructing the confidence intervals: paired-t and two-sample-t.
Paired-t Confidence Interval

- Make n replications of the two systems.
- Let Y_ij be the jth observation from system i (i = 1, 2).
- Pair Y_1j with Y_2j and define Z_j = Y_1j − Y_2j for j = 1, 2, ..., n.
- Then the Z_j's are i.i.d. random variables and E(Z_j) = ζ, the quantity for which we want to construct a confidence interval.
- Let

Z̄(n) = Σ_{j=1}^{n} Z_j / n

and

Vâr[Z̄(n)] = Σ_{j=1}^{n} [Z_j − Z̄(n)]² / (n(n − 1))

Then the approximate 100(1 − α)% C.I. is

Z̄(n) ± t_{n−1,1−α/2} √(Vâr[Z̄(n)])
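A sketch of the paired-t interval (illustrative data; the t quantile is passed in, e.g., t_{3,0.975} = 3.182 from a table for n = 4, α = 0.05):

```python
import math

def paired_t_ci(y1, y2, t_quantile):
    """Point estimate and half-width for zeta = mu1 - mu2 via paired differences."""
    n = len(y1)
    z = [a - b for a, b in zip(y1, y2)]            # Z_j = Y_1j - Y_2j
    zbar = sum(z) / n
    var_zbar = sum((zj - zbar) ** 2 for zj in z) / (n * (n - 1))
    hw = t_quantile(n - 1) * math.sqrt(var_zbar)
    return zbar, hw

zbar, hw = paired_t_ci([3.0, 5.0, 4.0, 6.0], [2.0, 3.0, 5.0, 4.0],
                       lambda df: 3.182)
print(zbar, hw)
```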
Two-Sample-t Confidence Interval

- Make n₁ replications of system 1 and n₂ replications of system 2; n₁ need not equal n₂.
- Again, for system i = 1, 2, let

Ȳ_i(n_i) = Σ_{j=1}^{n_i} Y_ij / n_i

and

S_i²(n_i) = Σ_{j=1}^{n_i} [Y_ij − Ȳ_i(n_i)]² / (n_i − 1)

- Estimate the degrees of freedom as

f̂ = [S₁²(n₁)/n₁ + S₂²(n₂)/n₂]² / { [S₁²(n₁)/n₁]²/(n₁ − 1) + [S₂²(n₂)/n₂]²/(n₂ − 1) }

Then the approximate 100(1 − α)% C.I. is

Ȳ₁(n₁) − Ȳ₂(n₂) ± t_{f̂,1−α/2} √(S₁²(n₁)/n₁ + S₂²(n₂)/n₂)
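A sketch of the two-sample-t computation, including the estimated degrees of freedom f̂ (illustrative data; a real t table or library would supply the quantile t_{f̂,1−α/2} to complete the interval):

```python
import math

def welch_ci_parts(y1, y2):
    """Difference of means, estimated df f-hat, and the standard-error term."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    v1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)   # S_1^2(n1)
    v2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1)   # S_2^2(n2)
    a, b = v1 / n1, v2 / n2
    f_hat = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return m1 - m2, f_hat, math.sqrt(a + b)          # CI: diff +/- t_{f,1-alpha/2} * se

diff, f_hat, se = welch_ci_parts([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
print(diff, f_hat, se)
```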
Contrasting the Two Methods

- The two-sample-t approach requires independence of the Y_1j's and Y_2j's, whereas in the paired-t approach the Y_1j's and Y_2j's do not have to be independent.
- Therefore, in the paired-t approach, common random numbers can be used to induce positive correlation between the observations on the different systems to reduce the variance.
- In the paired-t approach, n₁ = n₂, whereas in the two-sample-t method n₁ and n₂ may differ.
Confidence Intervals for Comparing More Than Two Systems

In the case of more than two alternative systems, there are two ways to construct confidence intervals on selected differences μ_{i₁} − μ_{i₂}:
- Comparison with a standard, and
- All pairwise comparisons

NOTE: Since we are making c > 1 confidence intervals, in order to have an overall confidence level of 1 − α, we must make each interval at level 1 − α/c (Bonferroni).
Comparison with a Standard

- In this case, one of the systems (perhaps the existing system or policy) is a "standard." If system 1 is the standard and we want to compare systems 2, 3, ..., k to system 1, then k − 1 confidence intervals must be constructed for the k − 1 differences μ₂ − μ₁, μ₃ − μ₁, ..., μ_k − μ₁.
- In order to achieve an overall confidence level of at least 1 − α, each of the k − 1 confidence intervals must be constructed at level 1 − α/(k − 1).
- Can use the paired-t or two-sample-t methods described in the previous section to make the individual intervals.
All Pairwise Comparisons

- In this case, each system is compared to every other system to detect and quantify any significant differences. Therefore, for k systems, we construct k(k − 1)/2 confidence intervals for the k(k − 1)/2 differences:

μ₂ − μ₁
μ₃ − μ₁, μ₃ − μ₂
...
μ_k − μ₁, μ_k − μ₂, ..., μ_k − μ_{k−1}

- Each of the confidence intervals must be constructed at level 1 − α/[k(k − 1)/2], so that an overall confidence of at least 1 − α can be achieved.
- Again, we can use the paired-t or two-sample-t methods to make the individual confidence intervals.
Ranking and Selection

The goals of ranking and selection are different from, and more ambitious than, simply making a comparison between several alternative systems. Here, the goal may be to:
- Select the best of k systems
- Select a subset of size m containing the best of k systems
- Select the m best of k systems
Ranking and Selection

1. Selecting the best of k systems:
- We want to select one of the k alternatives as the best.
- Because of the inherent randomness in simulation modeling, we can't be sure that the selected system is the one with the smallest μ_i (assuming small is good). Therefore, we specify a correct-selection probability P* (such as 0.90 or 0.95).
- We also specify an indifference zone d*, meaning that if the best mean and the next-best mean differ by more than d*, we select the best one with probability at least P*.
- As an example, suppose we have 5 alternative configurations and we want to identify the best system with probability at least 95%.
Ranking and Selection

2. Selecting a subset of size m containing the best of k systems:
- We want to select a subset of size m (< k) that contains the best system with probability at least P*.
- This approach is useful in the initial screening of alternatives to eliminate the inferior options.
- For example, suppose we have 10 alternative configurations and we want to identify a subset of 3 alternatives that contains the best system with probability at least 95%.
Ranking and Selection

3. Selecting the m best of k systems:
- We want to select the m best (unranked) of the k systems so that, with probability at least P*, the expected responses of the selected subset are the m smallest expected responses.
- This situation is useful when we want to identify several good options, in case the best one is unacceptable for some reason.
- For example, suppose we have 5 alternative configurations, we want to select the 3 best alternatives, and we want the probability of correct selection to be at least 90%.