Analysis of Variance in Matrix Form
Multiple Regression
• A multiple regression model is a model that has more than one explanatory variable in it.
• Some of the reasons to use multiple regression models are:
  - Multiple X's often arise naturally from a study.
  - We want to control for some X's.
  - We want to fit a polynomial.
  - We want to compare regression lines for two or more groups.
Multiple Linear Regression Model
• In a multiple linear regression model there are p predictor variables.
• The model is
$$ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n. $$
• This model is linear in the β's. The variables may be non-linear, e.g., log(X₁), X₁·X₂, etc.
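For instance, a model such as
$$ Y_i = \beta_0 + \beta_1 \log(X_{i1}) + \beta_2 X_{i1} X_{i2} + \varepsilon_i $$
is still a linear regression model, because it is linear in the β's even though it involves non-linear functions of the X's.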
• We need to estimate p + 1 β's and σ².
• There are p + 2 parameters in this model, so we need at least that many observations to be able to estimate them, i.e., we need n ≥ p + 2.
Multiple Regression Model in Matrix Form
• In matrix notation the multiple regression model is:
$$ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$
where
$$ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}. $$
• Note, Y and ε are n × 1 vectors, β is a (p + 1) × 1 vector, and X is an n × (p + 1) matrix. The matrix X is called the 'design matrix'.
• The Gauss-Markov assumptions are: E(ε | X) = 0, Var(ε | X) = σ²I.
• These result in E(Y | X) = Xβ, Var(Y | X) = σ²I.
• The least-squares estimate of β is $b = (X'X)^{-1}X'Y$.
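As a quick illustration, here is a minimal sketch in Python (assuming numpy; the data are simulated for illustration, not the course data set) of building a design matrix and computing b = (X'X)⁻¹X'Y:

```python
import numpy as np

# Simulated example: n = 30 observations, p = 2 predictors
rng = np.random.default_rng(0)
n, p = 30, 2
predictors = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), predictors])   # design matrix: column of 1's plus the X's
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)          # Y = X beta + epsilon

# Least-squares estimate b = (X'X)^{-1} X'Y
# (solving the normal equations is numerically preferable to forming the inverse)
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)
```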
Estimate of σ²
• The estimate of σ² is:
$$ s^2 = MSE = \frac{SSE}{df_{\text{error}}} = \frac{\sum_{i=1}^{n} e_i^2}{n - p - 1} = \frac{\mathbf{e}'\mathbf{e}}{n - p - 1}. $$
• It has n − p − 1 degrees of freedom because p + 1 parameters (the β's) are estimated from the n observations.
• Claim: s² is an unbiased estimator of σ².
Proof:
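Numerically, a small sketch (continuing the simulated numpy example above) of computing the residuals and s²; this illustrates the computation rather than the unbiasedness argument:

```python
# Continuing the earlier sketch: X, y, b, n, p are already defined
e = y - X @ b                 # residual vector
SSE = e @ e                   # e'e, the residual sum of squares
s2 = SSE / (n - p - 1)        # MSE, the estimate of sigma^2
print(s2)
```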
General Comments about Multiple Regression
• The regression equation gives the mean response for each combination of explanatory variables.
• The regression equation will not be useful if it is very complicated or a function of a large number of explanatory variables.
• We generally want a "parsimonious" model, that is, a model that is as simple as possible while still adequately describing the response variable.
• It is unwise to think that there is some exact, discoverable equation.
• Many possible models are available.
• One or two models may adequately approximate the mean of the response variable.
Example – House Prices in Chicago
• Data on 26 house sales in Chicago were collected (clearly collected some time ago). The variables in the data set are:
  price - selling price in $1000's
  bdr - number of bedrooms
  flr - floor space in square feet
  fp - number of fireplaces
  rms - number of rooms
  st - storm windows (1 if present, 0 if absent)
  lot - lot size (frontage) in feet
  bth - number of bathrooms
  gar - garage size (0 = no garage, 1 = one-car garage, etc.)
Interpreting Regression Coefficients
• In general, in multiple regression we interpret the coefficient of the jth predictor variable (β_j or b_j) as the change in Y associated with a change of one unit in X_j, with all the other variables held constant.
• Note that it may be impossible to hold all other variables constant.
• Example: in the house price data above, for 100 extra square feet of floor space (everything else held constant), the price goes up by $1,760 on average. For one more room (everything else held constant), the price goes up by $3,900 on average. For one more bedroom (everything else held constant), the price goes down by $7,700 on average.
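Since price is recorded in $1000's and flr in square feet, the floor-space figure corresponds to a coefficient of roughly b_flr ≈ 1.76/100 = 0.0176, because 100 × 0.0176 × $1000 ≈ $1,760; similarly, the room and bedroom figures correspond to coefficients of about 3.9 and −7.7, respectively.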
Inference for Regression Coefficients
• As in simple linear regression, we are interested in testing H₀: β_j = 0 versus Hₐ: β_j ≠ 0.
• The test statistic is
$$ t_{\text{stat}} = \frac{b_j}{S.E.(b_j)}. $$
It has a t-distribution with n − p − 1 degrees of freedom.
• We can calculate the P-value from the t-table with n − p − 1 df.
• This test gives an indication of whether or not the jth predictor variable contributes significantly to the prediction of the response variable over and above all the other predictor variables.
• A (1 − α) confidence interval for β_j is $b_j \pm t_{n-p-1;\,\alpha/2} \, S.E.(b_j)$.
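A minimal sketch of the t-test and confidence interval in Python (assuming scipy; the coefficient and standard error below are hypothetical placeholders, while n = 26 and p = 8 match the house price example):

```python
from scipy import stats

# Hypothetical coefficient estimate and standard error, for illustration only
b_j, se_bj = 1.76, 0.55
n, p = 26, 8                   # sample size and number of predictors
df = n - p - 1

t_stat = b_j / se_bj
p_value = 2 * stats.t.sf(abs(t_stat), df)     # two-sided P-value
t_crit = stats.t.ppf(1 - 0.05 / 2, df)        # t_{n-p-1; alpha/2} for a 95% CI
ci = (b_j - t_crit * se_bj, b_j + t_crit * se_bj)
print(t_stat, p_value, ci)
```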
ANOVA Table
• The ANOVA table in the multiple regression model is given by…
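In its usual form, with the notation used here (SSReg + SSE = SST), the table is laid out as:

  Source       df           SS       MS                       F
  Regression   p            SSReg    MSReg = SSReg/p          MSReg/MSE
  Error        n − p − 1    SSE      MSE = SSE/(n − p − 1)
  Total        n − 1        SST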
Coefficient of Multiple Determination – R²
• As in the simple linear regression model, R² = SSReg/SST.
• In multiple regression this is called the “coefficient of multiple determination”; it is not the square of a correlation coefficient.
• In multiple regression, we need to be cautious when judging a model by R², because R² always goes up when more predictor variables are added to the model, regardless of whether the predictor variables are useful for predicting Y.
Adjusted R²
• An attempt to make R² more useful is to calculate Adjusted R² ("Adj R-Sq" in SAS).
• Adjusted R² is adjusted for the number of predictor variables in the model.
• It can actually go down when more predictors are added.
• It can be used for choosing the best model.
• It is defined as
$$ \text{Adj } R^2 = 1 - \frac{MSE}{SST/(n-1)} = 1 - \frac{n-1}{n-p-1}\cdot\frac{SSE}{SST}. $$
• Note that Adjusted R² will increase only if MSE decreases.
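A short sketch (continuing the simulated numpy example from earlier) of computing R² and Adjusted R²:

```python
# Continuing the earlier sketches: y, SSE, n, p are already defined
SST = ((y - y.mean()) ** 2).sum()    # total sum of squares
SSReg = SST - SSE                    # regression sum of squares
R2 = SSReg / SST
adj_R2 = 1 - (n - 1) / (n - p - 1) * SSE / SST
print(R2, adj_R2)
```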
ANOVA F Test in Multiple Regression
• In multiple regression, the ANOVA F test is designed to test the following hypotheses:
$$ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{versus} \quad H_a: \text{at least one of } \beta_1, \beta_2, \ldots, \beta_p \text{ is not } 0. $$
• This test aims to assess whether or not the model has any predictive ability.
• The test statistic is
$$ F_{\text{stat}} = \frac{MSReg}{MSE}. $$
• If H₀ is true, the above test statistic has an F distribution with (p, n − p − 1) degrees of freedom.
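A minimal sketch of the F-test (assuming scipy and continuing the simulated example; MSReg and MSE would ordinarily be read off the ANOVA table):

```python
from scipy import stats

# Continuing the earlier sketches: SSReg, SSE, n, p are already defined
MSReg = SSReg / p
MSE = SSE / (n - p - 1)
F_stat = MSReg / MSE
p_value = stats.f.sf(F_stat, p, n - p - 1)   # P(F_{p, n-p-1} > F_stat)
print(F_stat, p_value)
```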
F-Test versus t-Tests in Multiple Regression
• In multiple regression, the F-test is designed to test the overall model, while the t-tests are designed to test individual coefficients.
• If the F-test is significant and all or some of the t-tests are significant, then there are some useful explanatory variables for predicting Y.
• If the F-test is not significant (large P-value) and all the t-tests are not significant, it means that no explanatory variable contributes to the prediction of Y.
• If the F-test is significant and all the t-tests are not significant, then it is an indication of "multicollinearity", i.e., correlated X's. It means that individual X's don't contribute to the prediction of Y over and above the other X's.
• If the F-test is not significant and some of the t-tests are significant, it is an indication of one of two things:
  - The model has no predictive ability, but if there are many predictors, a few may have small P-values (Type I errors in the t-tests).
  - The predictors were chosen poorly. If one useful predictor is added to many that are unrelated to the outcome, its contribution may not be enough for the model to be significant (F-test).