Meadowfoam Example Continuation


Indicator Variables
• Often, a data set will contain categorical variables which are
potential predictor variables.
• To include these categorical variables in the model we define
dummy variables.
• A dummy variable takes only two values, 0 and 1.
• For a categorical variable with j categories we need j-1 indicator
(dummy) variables.
STA302/1001 - week 12
Meadowfoam Example
• Meadowfoam is a small plant found in the US Pacific Northwest.
Its seed oil is unique among vegetable oils for its long carbon
strings, and it is nongreasy and highly stable. A study was conducted
to find out how to elevate meadowfoam production to a profitable
crop. In a growth chamber, plants were grown under 6 light
intensities (in micromol/m^2/sec) and two timings of the onset of
the light treatment, either late (coded 0) or early (coded 1). The
response variable is the average number of flowers per plant for 10
seedlings grown under each of the 12 treatment conditions.
• This is an example of an experiment in which we can make causal
conclusions.
• There are two explanatory variables, light intensity and timing.
• There are 24 data points, 2 at each treatment combination.
Questions of Interest
• What is the effect of timing on the seedling growth?
• What are the effects of the different light intensities?
• Does the effect of intensity depend on timing?
Indicator Variables in Meadowfoam Example
• To include the variable timing in the model we define a dummy
variable that takes the value 1 for early timing and the value 0 for
late timing.
• The variable intensity has 6 levels (150, 300, 450, 600, 750, 900).
We will treat these levels as 6 categories.
• It is useful to do so if we expect a complex relationship between
the response variable and intensity and if the goal is to determine
which intensity level is “best”.
• The cost of using dummy variables is degrees of freedom: a variable
with 6 categories needs 5 dummy variables, and each one uses up a
degree of freedom.
• We define the dummy variables as follows….
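The coding described on this slide can be sketched in a few lines of Python. This is only one possible coding: the choice of 150 as the reference category and the helper names (`intensity_dummies`, `timing_dummy`, `design_row`) are assumptions made for illustration, not the definitions used in the course.

```python
# Light intensity levels and timing codes as given in the example.
INTENSITY_LEVELS = [150, 300, 450, 600, 750, 900]
BASELINE = 150  # assumption: the lowest level is the reference category


def intensity_dummies(level):
    """Return the 5 indicator values (6 categories - 1) for one intensity level."""
    if level not in INTENSITY_LEVELS:
        raise ValueError(f"unknown intensity level: {level}")
    return [1 if level == other else 0
            for other in INTENSITY_LEVELS if other != BASELINE]


def timing_dummy(timing):
    """Return 1 for early timing and 0 for late timing."""
    return {"late": 0, "early": 1}[timing]


def design_row(level, timing):
    """Intercept, timing indicator, then the five intensity indicators."""
    return [1, timing_dummy(timing)] + intensity_dummies(level)


print(design_row(150, "late"))   # baseline intensity: all indicators are 0
print(design_row(450, "early"))  # only the 450 indicator is switched on
```

With the baseline absorbed into the intercept, each intensity coefficient is read as a difference from the 150 level at the same timing.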
Partial F-test
• The partial F-test is designed to test whether a subset of the β’s
are simultaneously 0.
• The approach has two steps. First we fit a model with all predictor
variables. We call this model the “full model”.
• Then we fit a model without the predictor variables whose
coefficients we are interested in testing. We call this model the
“reduced model”.
• We then compare the SSReg and RSS in these two models….
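The two-step approach above can be sketched with NumPy. The data here are synthetic (made up for illustration, not the meadowfoam measurements), and the names `fit_rss`, `x1`, `x2`, `x3` are hypothetical. The sketch fits a full and a reduced model and checks that the drop in RSS equals the gain in SSReg.

```python
import numpy as np


def fit_rss(X, y):
    """Least-squares fit; returns the residual sum of squares (RSS)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Synthetic illustration data (assumption: not from the study).
rng = np.random.default_rng(0)
n = 24
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])  # step 1: all predictors
X_red = np.column_stack([ones, x1])           # step 2: drop x2 and x3

rss_full = fit_rss(X_full, y)
rss_red = fit_rss(X_red, y)

# Total sum of squares is the same for both models, so SSReg = SST - RSS.
sst = float(((y - y.mean()) ** 2).sum())
ssreg_full = sst - rss_full
ssreg_red = sst - rss_red

print("decrease in RSS:   ", rss_red - rss_full)
print("increase in SSReg: ", ssreg_full - ssreg_red)
```

Because the reduced model is nested in the full model, the full model's RSS can never be larger; the size of the decrease is what the test evaluates.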
Test Statistic for Partial F-test
• To test whether some of the coefficients of the explanatory variables
are all 0 we use the following test statistic:
$$F_{stat} = \frac{\text{Extra SS} / \text{Extra df}}{MSE_{full}}$$
where Extra SS = $RSS_{red} - RSS_{full}$, and
Extra df = number of parameters being tested.
• To get the Extra SS in SAS we can simply fit two regressions
(reduced and full) or we can look at the Type I SS, which are also
called Sequential Sums of Squares.
• The Sequential SS gives the additional contribution to SSReg that
each variable makes over and above the variables previously listed.
• The Sequential SS depends on the order in which variables are listed
in the MODEL statement; the variables whose coefficients we want to
test must be listed last.
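As a minimal sketch of the arithmetic, the test statistic can be computed from the two residual sums of squares. The function name `partial_f` and the numbers below are hypothetical, chosen only to make the calculation easy to follow. The sketch assumes MSE_full is RSS_full divided by the residual degrees of freedom of the full model (number of observations minus number of coefficients in the full model).

```python
def partial_f(rss_reduced, rss_full, extra_df, resid_df_full):
    """Partial F statistic: (Extra SS / Extra df) / MSE_full."""
    if extra_df <= 0 or resid_df_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    extra_ss = rss_reduced - rss_full      # Extra SS = RSS_red - RSS_full
    mse_full = rss_full / resid_df_full    # MSE of the full model
    return (extra_ss / extra_df) / mse_full


# Hypothetical values, not taken from the meadowfoam data.
f_stat = partial_f(rss_reduced=150.0, rss_full=100.0,
                   extra_df=2, resid_df_full=20)
print(f_stat)  # (50 / 2) / (100 / 20) = 5.0
```

The statistic is then compared with an F distribution on (Extra df, residual df of the full model) degrees of freedom.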
Meadowfoam Example Continuation
• Suppose now we treat the variable light intensity as a quantitative
variable.
• There are three possible models to look at the relationship between
seedling growth and the two predictor variables…
• If we want to know whether the effect of light intensity on number
of flowers per plant depends on timing we need to include in the
model an interaction term….
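One common way to write a model with an interaction term is sketched below. The symbols are assumed notation, not taken from the slides: $light$ is the quantitative intensity and $early$ is the 0/1 timing indicator.

```latex
\mu\{Y \mid light, early\}
  = \beta_0 + \beta_1\, light + \beta_2\, early
    + \beta_3\,(light \times early)

% Late timing (early = 0):  slope in light is \beta_1
% Early timing (early = 1): slope in light is \beta_1 + \beta_3
```

Under this parameterization the effect of light intensity depends on timing only if $\beta_3 \neq 0$, so the question reduces to a test of $\beta_3 = 0$.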
Meadowfoam Example – Summary of Findings
• There is no evidence that the effect of light intensity on flowers
depends on timing (P-value = 0.91). That means that the interaction
effect is not significant.
• If an interaction did exist, it would be difficult to talk about the
effect of light intensity on Y, since that effect would vary with timing.
• Since the interaction was not significant, we remove it from the model.
• For the same timing, increasing light intensity by 100 micromol/m^2/sec
decreases the mean number of flowers per plant by 4.0.
95% CI: (-5.1, -3)
• For the same light intensity, beginning the light treatment early
increases the mean number of flowers per plant by 12.2.
95% CI: (6.7, 17.6)