Quick & Simple Simulation in Excel with Clinical Trials Applications Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph.D. www.Dennis-Sweitzer.com.

Download Report

Transcript Quick & Simple Simulation in Excel with Clinical Trials Applications Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph.D. www.Dennis-Sweitzer.com.

Quick & Simple
Simulation in Excel
with Clinical Trials Applications
Presented to the
Delaware Chapter of the American Statistical Association
20 October 2011
Dennis Sweitzer, Ph.D.
www.Dennis-Sweitzer.com
Background
• Occasional need for simulations
• Excel is convenient, but
– does not explicitly support simulations
– Simulation usually requires VBA programming
(so why not use R or SAS instead)
– Or Add-in commercial programs (eg., @Risk)
– Or some academic add-ins
• Does have iterative calculations, Solver
• Why not simulation?
Simulate what?
• Stochastic Models
– Unknown parameters? Guestimate a distribution
– Optimizing policy? Test each with simulations
• Sensitivity Analysis
– Variations in Inputs
 Variations in Outputs
– 2 parameters: use a table
– >2 parameters: simulate & compare variation
Excel: Pros
Common Language / Common Tools
• Most people understand Excel
MEGO
• Many tools available in Excel
Transparency: Modeling assumptions can be:
Specified -- Graphed -- Debated
What you see is what you get!
More hands on deck, more eyes on the prize….:
Statistician
Team Member
Initial Model
Explores & breaks model
Repair & enhance
…Repeat until satisfied
Excel Cons
Slower than in SAS, S+, R, etc
Lacks some statistical/probability functions
• Latest versions are a little better
• Still need to add some VBA code
• Known bugs in statistical routines (often fixed)
Tradeoffs:
• Quicker modifications
vs slower execution
Simple Solution: Data Tables
Excel Data Tables
•Creates a table of values of a function
– (ie, Random Variables)
•Leftmost column is used as an argument
– (which is ignored in a simulation)
•Data Table repeats calculations for each row
– (Each row is an iteration of the simulation)
1. Create Simulation
Create Random Variables using Inverse Probability Method:
For Random Variable X with distribution function F(x),
F(x): → [0,1]
If Random Uniform U
X = F-1(U)
(Excel: U=Rand() )
2. Align Random Variables
• Calculations can be
anywhere in
Spreadsheet
• Reference the
Variables in a row
• Is best to label
variables in same way
3. Select Data Table
• Select table region
– 1st row is Rand Vars
– 1st column is not used
(can label iterations)
• From toolbar:
– Data>Data Table
4. Create Simulation Table
• Column input cell =
Upper left hand corner
of table
• Row input cell = ignore
• OK  Populates the
table
• (may have to manually
recalcule)
5. Execute Simulation
Iterative development
•Simulation can be changed
•Add reporting variables
•Recalculate to rerun
– (no need to use Data Table
again, unless expanding)
•Hint: debug with short table,
expand for final run
The End
(of the key concepts)
But still more….
• Why use inverse probability distributions
(instead of random variables)?
• When not to use a spreadsheet for simulation?
• Tools:
– Macros to set up a simulation
– VBA functions for common simulation distributions
Inverse Probability Function
• Most systems directly generate random
variables with the desired distribution
• Why use Inverse Probability Functions?
– Which are (probably) slower?
Personal opinion
• Testing & Debugging
• Verification  Calculates correctly
• Validation  Calculations answer Problem
• Sensitivity  Input vs Output variability
As Mapping function
U
Probability Distribution:
Random Uniform:
Inverse PDF:
⟼
F-1
F(x): → [0,1]
U
X = F-1(U)
For Continuous (or monotone) F-1
Small changes in u∈U  small changes in F-1 (u)
Mapping
2 Random Uniform Var
As input to
Deterministic Function
Mapping
Random numbers in
(should)
Map to outputs in
Example #1
Simple model,
function of 2 RV
A Max value looks high.
Is it a bug? If not, how often?
Saved random U[0,1]
For each iteration
Check u∈U[0,1]
That generated high value
u=0.983…  random high
 Rarely happens
Saving {Ui}:
•Verify
•Replicate
•Quantify
Example #1 (Sensitivity)
Sort by U1, U2
Sensitive
to U1
Insensitive
to U2
Spreadsheet limitations
• Only simple data structures are available
– Rows & columns, no lists & trees
– Discrete event simulations
• Complex algorithms: difficult
– Eg, While or for loops
– Can improvise (cumbersome, slow, buggy)
• Speed: slow
• Data Storage: what-you-see-is-all-you-get
Tools: Excel Simulation Template
• Adds some missing random functions
• Adds some set-up macros
Macro SimulateSampler
To start a new simulation when you don't
remember the names & parameters of
common random variables used in simulation:
•Run the Macro SimulationSample
•Copy, delete, and edit as needed.
•Make sure all random values are referenced
in the first row of the data table at the bottom.
Macro SimulationSampler
• Creates a simulation with
each of common
simulation functions
Macro SimulationSampler
………
•Sets up header
row for data table
•Sets up a place
for statistics
Macro Simulate
• Highlight the row of random variables
– (1st row of simulation table)
• Run macro "Simulate”
– Prompts for which will ask for the number of
simulation iterations,
– The default number of iterations is 100
– Debug & develop (manually recalculate)
– Final run with >1000 iterations
– Visual Basic code is computationally intensive,
Macro Simulate
Note bene
• Run Simulate right after SimulationSampler
– Risk of “Ref!” error
• SimTemplate,Plot,Sampler contains
– The sampler
– A distribution plot of all random variable
• Crude, but handy for quick comparisons
– Ready to edit
SimTemplate,Plot,Sampler
• Crude
Distribution plot
of ALL variables
• Uses Percentile Ranks
to save space
• Good for Continuous Var.
• Bad for Discrete Var.
1. Copy
2. Delete unwanted variables
3. Make it pretty
Excel Random Variables
Rand() --Random Uniform [0,1]
NormSInv() – Inverse Standard Normal Distribution
CriticalBinomial() – Inverse Binomial Distribution
LogNormInv() - Inverse Log Normal Distribution
Caveat: parameters are mean, SD after the Log transformation
Erlang Distribution
How long do you wait until you get a
predetermined number of arrivals?
•Interarrival times are distributed IID
exponential
•Erlang is Gamma with integer parameter
Beta Distribution
Can use as
• Distribution of a Binomial probability
• Range = [0,1]
• Generic bounded hump (vs Normal as generic unbounded hump)
Example#2, Problem
Client: “Here’s our plan….”
•Simple spreadsheet calculation
– But only the expected value,
– but not variability
Example #2, Simulation
• Time to 100th
patient
• Patients arrive
IID Exponential
Summary Statistics of Simulated values
(below)
Interpretation: under the assumptions,
90% of simulations required more than 4.4
months
Added VBA Functions
Inverse Functions Needed for Simulation
•Poisson, Negative Binomial
Interpolation from Table
•Interpolate: 1 or 2 dimensional interpolation
Convenience
• Beta with Mean, SD as parameters
• Beta with Hi, Low, and Mode used for
parameters (often used for PERT/CPM charts)
•Log Normal with mean, SD as parameters
Missing Statistical Functions
• InvPoisson :: Poisson Distribution
• InvPascal :: Integer valued Negative
Binomial
– (how many failures before k successes)
Negative Binomial is continuous valued
distribution; discrete version is often
denoted Pascal distribution
Example#3,
Patients to Screen
Expected Enrollment rate
= 75% ± 5%
~ Beta Distribution
# Screen Failures
~ Negative Binomial (Pascal)
– Depends on Enrollment
Rate
Beta Distribution (2)
For Convenience
•Beta distribution given Mean, SD
•Beta distribution given Mean, SD, upper, lower bounds
•Beta distribution given Mode, Upper, Lower bounds
– Sometimes used for PERT/Critical Path Analysis
• 3 estimates for tasks: Optimistic, Pessimistic, Most Likely
• Beta distributed time for each task
– Assumes SD = 1/6 of the interval [low, high]
Simulation from a Table
Simulate arbitrary distribution:
•Top Row: values in [0,1]
•Bottom Row: Quantiles
•Result: interpolated value of U from table
Or a function: y=f(x)
•X is found in top row, y is interpolated from bottom row
Table Simulation Uses
•Polygonal distributions (like Triangular)
•Survival curve (for time to event)
–Est. K-M curve from data, simulate rest of trial
•Arbitrary empirical distributions
•Distribution from observations
Simulation from a 2-dimensional table
Here:
• Rows are quartiles of a random function
• Left column is value of a parameter
• A family of distributions which vary with the parameter
• Parameter y=75% (can be random)
• Generate random numbers from the interpolated distribution.
Example #4: Interim Review
• After 2 months, review randomization rates
• Continue to Randomize to 100 patients
• How long?
Example#4: Interim Review (Simulation)
Y= # Patients at 2 mos
~ Poisson
Time to Randomize
(100-Y) additional pts
~ Erlang (Gamma)
80% CI:; (2.5, 3.7)
months
Clinical Trials Applications
• Simulations for planning
• Prototyping larger simulation
• Checking assumptions/validation
Why Simulate?
Expected Trial Performance
• Usually not of interest -- already done w/o simulation
Variability of Trial Performance
• Important for Risk Management: “What’s the earliest,
the latest, the most, the least, etc”
• 80% CIs
Structural Problems
• Interactions of parameters may doom the trial before it
even starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )
Prototyping
Prototyping:
• Toy simulation with hands-on teamwork
• Development model
• Get team buy-in on assumptions
• Processing speed not important
• Rapid modifications are important
Ideal?
• Develop a prototype in an 1 hour meeting
• Check for errors later
• Run large simulations later for precise estimates
Checking planning assumptions
• H0 = Simulation assumptions
• Observed: a value X
• {xi} = corresponding values in simulation
• Rank of X in {xi} ≈ p-value
Stored Values: Use Function Percent Rank
Descriptive Statistics: Use Frequency Count
Use to:
• Test assumptions, validate model, +??
• If an observed value of X is rare in the simulation,
question assumptions!
Checking Assumptions (2)
Example:
• A trial is designed based on a non-trivial simulation.
• The model predicts a completion rate of 65%
with 95% C.I.= (55%, 75%)
• 4 months into the trial, a 50% completion rate is
observed.
• How significant is this discrepancy?
Resimulate:
• {xi} = simulated completion rates (1/iteration)
• Rank of observed 50% in {xi} ≈ p-value
• “How likely is the observation, under the modeled
assumptions?”
Example #5: Simulating a 30 patient trial
• Each patient is a random variable
• Survival times are interpolated
• Estimated survival curves have confidence intervals
•All 30 patients in an iteration use the same random conf.level
•Conf. Level is updated each iteration
Example #5: Testing Assumptions
Statistics on the
patients are the
simulation random
variables
• Assume the trial was
carried out
• 70% of patients
complete
Q: is this consistent
with the simulations?
A: Yes, but… Only
6.1% of simulations
had >70% completion
Macro Management
VBA Editor:
Alt-F11 (or find the menu)
• Some versions of Excel
• Copy Module between sheets
• Copy code from .xls sheet &
insert into VBA editor
• Open & save as new sheet
Macro Management (newer)
Later versions:
In Visual Basic
From the
Tool Bar
•File > Export File
– Export VBA code
(module: “SweitzerSimulationCoreCode”)
•File > Import File
– Imports VBA code (into a module)
Further resources
Commercial and Free software packages
Provide:
•More rigorous algorithms
•More functions
– Resampling, multivariate, etc
•More support
Commercial Add-Ins
@RISK
www.palisade.com
Crystal Ball
www.decisioneering.com
Free Add-Ins
PopTools
www.cse.csiro.au/poptools
SimTools.xla
http://home.uchicago.edu/~rmyerson/addins.htm
Caveat: Licensing
•Free for non-commercial (eg, education)
•Not clear for other uses
Semi-Commercial
Low-cost Excel simulation add-in:
•RiskSim by Michael Middleton
www.treeplan.com/
–Currently 1 output var with lots of graphs
– also: decision trees, sensitivity analysis
– on-line text-book:
http://www.treeplan.com/chapters.htm
Additional Reading
INTRODUCTION TO MODELING AND GENERATING
PROBABILISTIC INPUT PROCESSES FOR SIMULATION
www.informs-sim.org/wsc07papers/008.pdf
Spreadsheet Simulation (Seila, 2006)
www.informs-sim.org/wsc06papers/002.pdf
Work Smarter, Not Harder: Guidelines for
Designing Simulation Experiments
www.informs-sim.org/wsc06papers/005.pdf
Tips for the Successful Practice of Simulation
www.informs-sim.org/wsc06papers/007.pdf
The End
(Actual – not simulated)