Factorial Design of Experiments Kevin Leyton-Brown Terminology: variables • response variable – outcome of an experiment.

Download Report

Transcript Factorial Design of Experiments Kevin Leyton-Brown Terminology: variables • response variable – outcome of an experiment.

Factorial Design of Experiments
Kevin Leyton-Brown
Terminology: variables
• response variable
– outcome of an experiment. The thing being measured
• factor
– a variable upon which the experimenter believes that one or more response
variables may depend, and which the experimenter can control
• the design of the experiment will largely consist of a policy for determining how to
set the factors in each experimental trial
– possible values of a factor are called levels.
• in other literature, the word “version” is used for a qualitative (not quantitative) level
• experimental region / factor space
– all possible levels of the factors which are candidates for inclusion in the
experiment
• covariates
– these are like factors which are not under the control of the experimenter.
• Like factors, it is believed that they might affect the value of the response.
• Unlike factors, they cannot be controlled, though they can be observed.
Experimental Design
2
Terminology: experiments
• test run
– a single factor/level combination for which an observation of the response is
recorded
• repeat tests
– two or more test runs. Every effort is made to ensure that the individual test
runs take place under identical circumstances
• replications
– two or more repetitions of an experiment or a portion of an experiment under
potentially different conditions
• experimental unit
– a measurement (or a material upon which a measurement is made)
– it is important that the experimental units be homogeneous
• i.e., they do not exhibit systematic variation
Experimental Design
3
Terminology: Experimental Design
• confounding
– if two factors are varied simultaneously, we can say the effect of the factors is
confounded. Likewise, bad experimental design can lead to the effect of one or
more factors being confounded with one or more covariates.
• blocking
– if experimental units can be partitioned into groups (blocks) so that
experimental units within groups are more homogeneous than experimental
units across groups, the experiment can be designed differently to minimize
the effects of the non-homogeneity of experimental units. For example, the
experiment could be replicated in each block.
• experimental design / experimental layout
–
–
–
–
choice of the factor-level combinations to be examined
number of repeat tests, replications; blocking
assignment of factor-level combinations to experimental units
sequencing of test runs
• effect of factors on response
– the change in the average response under two or more factor-level
combinations
Experimental Design
4
Examples
1. Suspended-Particulate study
– measure three factors
– consider all combination of factor levels (possible because each factor takes
on only two or three values)
– random sequencing
– temperature measured as a covariate
2. Soil treatment study
– effect of two factors (fertilizer; plowing) on soybean yield
– three different fields must be used, so blocking is done
– thus, each factor-level combination will be tested on each field, with the
experiment replicated three times
Experimental Design
5
Common Design Problems
1. Masking factor effects
– if variation in test results is on the same order of magnitude as the factor
effects, the latter may go undetected.
•
This can be addressed through appropriate choice of sample size.
– unmeasured covariates (e.g., the effect of time passing, as an instrument
degrades) can also lead to variation that masks factor effects
2. Uncontrolled factors
– it’s pretty obvious that all variables of interest should be included as factors
– sometimes, though, it can be tricky to choose the appropriate level of model
granularity.
•
•
Sometimes a “high-level feature” (i.e., some function of the levels of many factors)
could be chosen in place of the many factors that influence it
this can make it difficult to vary the factor appropriately, and can limit analysis of the
experimental results.
3. One-factor-at-a-time testing
– it can be tempting to vary each factor value independently, holding the others
constant. However, this does not explore the factor space very effectively.
•
It would be a terrible local search strategy!
– Looking at the same point in another way, it neglects the possibility that
interactions between factors could be important
Experimental Design
6
Experimental Design: Overview
1. Consideration of Objectives
– consider the sorts of hypotheses that should be tested by the experiment
– determine which variables are observable
2. Factor Effects
– select factors, covariates
– guess at whether factors act independently of each other or not
3. Precision and Efficiency
– Precision: the least precise definition of this term that could possibly be given.
My best guess: “variability in a statistic of interest”
•
•
variance in measured response values given identical repeat tests
variance in measured feature levels under homogeneous conditions
– ways of improving precision: blocking, repeat tests, replication, accounting for
covariates
•
that is, either increasing sample size or accounting for previously unconsidered
sources of variation
– Efficiency: for efficiency reasons, the authors leave this one out!
4. Randomization
– …in the sequence of test runs or the assignment of factor-level combinations
can help protect against systematic bias. Do it.
Experimental Design
7
Factorial Experiments in Completely Randomized Designs
• A (complete) factorial experiment includes all possible factor-level
combinations
• One of the most straightforward experimental designs to implement in this
setting is a completely randomized design
– in such a design, factor-level combinations are sequenced / assigned to
experimental units uniformly at random
• Randomization can prevent all kinds of problems, most of which we
probably don’t have in a computational setting:
– “instrument drift, power surges, equipment malfunctions, technician or operator
errors, …”
– note, of course, that randomization doesn’t prevent these problems from
occurring. It just eliminates bias
• While many of these problems don’t affect us in empirical algorithmic
experiments, some still can, and in any case randomization should be very
easy for us to do.
– Amazingly enough, they suggest that we obtain random numbers by consulting
a table in the appendix! 
Experimental Design
8
Repeat Tests
• Experiment designs should include repeat test runs
– that is, runs where all the parameters are exactly the same
– can be used to estimate variance due from uncontrolled / random causes
• while we often do this when studying randomized algorithms, we probably
do it very rarely when studying non-randomized algorithms.
– Is it a good idea?
• balance – having each factorial-level combination repeated an equal
number of times is desirable
– when this is not possible or appropriate, a good approach is to randomly
(without replacement) select factor-level combinations to repeat once
– simply repeating some factor-level combinations many times while not
repeating others is a bad idea
– for some reason, they state that six repeated tests is a good amount for
estimating variance. In practice, I’m sure it depends on dimensionality of factor
space, degree of interaction between factors, etc.
Experimental Design
9
Interactions between Factors
• Here we investigate situations where joint factor effects are present
– in other words, the value of the response variable does not depend only on a
function of this kind:
– instead, the response variable depends on a function of this kind:
– how do we calculate the magnitudes of the interaction effects?
Experimental Design
10
Calculation of Factor Effects: Notation
• factors are denoted using uppercase Latin letters: A, B, C, …
• an individual response is a lowercase letter, usually subscripted
– one subscript for each factor; designates factor-level combination used to
generate response. Possibly other subscripts to denote which repeat trial.
• example: 3 factors, A, B, C.
– Domain of A is {1,2}, subscripted by i; domain of B is {1,2}, subscripted by j.
– Domain of C is {1,2,3,4}, subscripted by k.
– r repeat trials were conducted; the repeat trial is denoted by a fourth subscript
l, with domain {1, …, r}
– responses would be written as yijkl
• To denote averaging over one or more of the subscripts, we replace the
subscript by a dot. In our example (where n = 2 ¢ 2 ¢ 4 ¢ r):
Experimental Design
11
Calculation of Factor Effects
• Effects representation:
– in the case of two-level factors, encode the domain of each factor as {-1,1}
– the AB column is then the product of A and B
• Main effect for factor A (two factors, two levels each):
• Interaction between two factors A and B:
Experimental Design
12
Calculation of Factor Effects
• Based on these definitions, it seems pretty easy to calculate the factor
effects by summing appropriately. However, they give us:
– Algorithm for calculating effects of two-level factors:
• construct the effects representation for each main effect and each interaction
• calculate linear combinations of the responses, using the signs in the effects column
for ech main effect and for each interaction
• Divide the results from (2) by 2k-1 where k is the total number of factors
• Then, we can compare the calculated effects for each interaction with the
standard error. If an effect is much larger (e.g., more than twice as large)
the authors seem to treat it as significant.
– they calculate standard error as 2se/ n1/2, where se is the estimated standard
deviation of the uncontrolled experimental error, and n is the total number of
test runs (including repeat tests)
• What if we have more than two factor levels? Basically, things get messy.
One solution (if numbers of factor levels are powers of two) is to represent
each bit of the factor level using a new, derived two-level factor. Now we
need interactions between these simpler factors just to get the main effects
of the original factor. However, everything else works as before.
Experimental Design
13