Inference for the Regression Coefficient
• Recall, $b_0$ and $b_1$ are the estimates of the intercept $\beta_0$ and slope $\beta_1$ of the population regression line.
• We can show that $b_0$ and $b_1$ are unbiased estimates of $\beta_0$ and $\beta_1$, and furthermore that $b_0$ and $b_1$ are Normally distributed, with means and standard deviations that can be estimated from the data.
• We use the above facts to obtain confidence intervals and conduct hypothesis tests about $\beta_0$ and $\beta_1$.
STA 286 week 13 1
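The unbiasedness and Normality of $b_0$ and $b_1$ can be illustrated with a short simulation. The true line, error standard deviation, and design points below are made-up illustrative values, not from the course data:

```python
import numpy as np

# Hypothetical true line y = 2 + 0.5x with Normal errors (sigma = 1);
# these parameter values are illustrative, not from the course notes.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, 20)

b0s, b1s = [], []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    # Least-squares estimates for this simulated sample
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    b1s.append(b1)
    b0s.append(b0)

# Averaged over many samples, b0 and b1 sit near beta0 and beta1,
# consistent with unbiasedness; their histograms look Normal.
print(round(float(np.mean(b0s)), 2), round(float(np.mean(b1s)), 2))
```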
CI for Regression Slope and Intercept
• A level 100(1−α)% confidence interval for the intercept $\beta_0$ is $b_0 \pm t_{n-2;\,\alpha/2}\, SE(b_0)$, where the standard error of the intercept is
$SE(b_0) = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$
• A level 100(1−α)% confidence interval for the slope $\beta_1$ is $b_1 \pm t_{n-2;\,\alpha/2}\, SE(b_1)$, where the standard error of the slope is
$SE(b_1) = \dfrac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}$
• Example ….
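A minimal Python sketch of these intervals, on a small made-up data set (the numbers are illustrative, not the course example):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

# Least-squares estimates
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# Residual standard deviation s, with n - 2 degrees of freedom
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

# Standard errors from the formulas on this slide
se_b0 = s * np.sqrt(1.0 / n + x.mean() ** 2 / Sxx)
se_b1 = s / np.sqrt(Sxx)

# 95% confidence intervals: estimate ± t_{n-2; alpha/2} * SE
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

`scipy.stats.linregress` reports the same quantities as `stderr` and `intercept_stderr`, so the hand computation can be cross-checked against it.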
Significance Tests for Regression Slope
• To test the null hypothesis H$_0$: $\beta_1 = 0$ we compute the test statistic
$t = \dfrac{b_1}{SE(b_1)}$
• The above test statistic has a $t$ distribution with $n-2$ degrees of freedom. We can use this distribution to obtain the P-value for the various possible alternative hypotheses.
• Note: testing the null hypothesis H$_0$: $\beta_1 = 0$ is equivalent to testing the null hypothesis H$_0$: $\rho = 0$, where $\rho$ is the population correlation.
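As a sketch, the test statistic and its two-sided P-value can be computed directly (made-up illustrative data again):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
se_b1 = s / np.sqrt(Sxx)

# Test statistic t = b1 / SE(b1) and its two-sided P-value from t(n-2)
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```

Because the test of $\beta_1 = 0$ is equivalent to the test of $\rho = 0$, this P-value matches the one returned by `scipy.stats.pearsonr`.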
Example
• Refer to the heart rate and oxygen example….
Confidence Interval for the Mean Response
• For any specific value of $x$, say $x_0$, the mean of the response $y$ in this subpopulation is given by $\mu_y = \beta_0 + \beta_1 x_0$.
• We can estimate this mean from the sample by substituting the least-squares estimates of $\beta_0$ and $\beta_1$: $\hat{y} = b_0 + b_1 x_0$.
• A 100(1−α)% level confidence interval for the mean response $\mu_y$ when $x$ takes the value $x_0$ is $\hat{y} \pm t_{n-2;\,\alpha/2}\, SE(\hat{\mu})$, where
$SE(\hat{\mu}) = s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
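A sketch of the mean-response interval at a hypothetical value $x_0$ (made-up data, and $x_0 = 6$ chosen only for illustration):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 6.0                      # hypothetical value of interest
y_hat = b0 + b1 * x0          # estimated mean response at x0
se_mu = s * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_mu = (y_hat - t_crit * se_mu, y_hat + t_crit * se_mu)
```

Note that the interval is narrowest at $x_0 = \bar{x}$, where the standard error reduces to $s/\sqrt{n}$, and widens as $x_0$ moves away from $\bar{x}$.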
Example
• Data on the wages and length of service (LOS) in months for 60 women who work in Indiana banks.
• We are interested in how LOS relates to wages. The Minitab output and commands are given in a separate file.
Prediction Interval
• The predicted response $\hat{y}$ for an individual case with a specific value $x_0$ of the explanatory variable $x$ is $\hat{y} = b_0 + b_1 x_0$.
• A useful prediction should include a margin of error to indicate its accuracy.
• The interval used to predict a future observation is called a prediction interval.
• A 100(1−α)% level prediction interval for a future observation on the response variable $y$ from the subpopulation corresponding to $x_0$ is $\hat{y} \pm t_{n-2;\,\alpha/2}\, SE(\hat{y})$, where
$SE(\hat{y}) = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
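A sketch of the prediction interval on made-up data; the extra "1 +" inside the square root, compared with the mean-response interval, reflects the variability of a single future observation:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 6.0                      # hypothetical value of interest
y_hat = b0 + b1 * x0

# Note the extra "1 +" relative to the mean-response standard error
se_pred = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
```

The prediction interval is always wider than the mean-response interval at the same $x_0$, since $SE(\hat{y})^2 = s^2 + SE(\hat{\mu})^2$.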
Example
• Calculate a 95% PI for the wage of an employee with 3 years of service (i.e. LOS = 36).
• Calculate a 90% PI for the wage of an employee with 3 years of service (i.e. LOS = 36).
Analysis of Variance for Regression
• Analysis of variance, ANOVA, is essential for multiple regression and for comparing several means.
• ANOVA summarizes information about the sources of variation in the data. It is based on the framework DATA = FIT + RESIDUAL.
• The total variation in the response is expressed by the deviations $y_i - \bar{y}$.
• The overall deviation of any $y$ observation from the mean of the $y$'s can be split into two main sources of variation and expressed as
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
Sum of Squares
• Sums of squares (SS) represent the variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources.
• The total variation in the data is expressed as SST = SSM + SSE.
• SST stands for the total sum of squares; it is given by $SST = \sum_i (y_i - \bar{y})^2$.
• SSM stands for the sum of squares for the model; it is given by $SSM = \sum_i (\hat{y}_i - \bar{y})^2$.
• SSE stands for the sum of squares for error; it is given by $SSE = \sum_i (y_i - \hat{y}_i)^2$.
• Each of the above SS has degrees of freedom associated with it: $n-1$ for SST, 1 for SSM, and $n-2$ for SSE.
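The partition SST = SSM + SSE is easy to verify numerically on any data set; here with made-up numbers:

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)      # total SS, n - 1 df
SSM = np.sum((y_hat - y.mean()) ** 2)  # model SS, 1 df
SSE = np.sum((y - y_hat) ** 2)         # error SS, n - 2 df
```

The degrees of freedom partition the same way: $(n-1) = 1 + (n-2)$.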
Coefficient of Determination R²
• The coefficient of determination $R^2$ is the fraction of variation in the values of $y$ that is explained by the least-squares regression. The SS make this interpretation precise.
• We can show that
$R^2 = \dfrac{SSM}{SST} = 1 - \dfrac{SSE}{SST}$
• This equation is the precise statement of the fact that $R^2$ is the fraction of variation in $y$ explained by $x$.
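On the same kind of made-up data, $R^2$ computed from the sums of squares satisfies both forms of the identity (and, in simple linear regression, equals the squared sample correlation):

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSM = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

R2 = SSM / SST   # fraction of variation in y explained by x
```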
Mean Square
• For each source, the ratio of the SS to its degrees of freedom is called the mean square (MS).
• To calculate a mean square, use the formula
$MS = \dfrac{\text{sum of squares}}{\text{degrees of freedom}}$
ANOVA Table and F Test
• In the simple linear regression model, the hypotheses H$_0$: $\beta_1 = 0$ vs H$_1$: $\beta_1 \neq 0$ are tested by the $F$ statistic.
• The $F$ statistic is given by
$F = \dfrac{MSM}{MSE}$
• The $F$ statistic has an $F(1, n-2)$ distribution, which we can use to find the P-value.
• Example…
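A sketch of the ANOVA F test on made-up data; in simple linear regression $F = t^2$, so it agrees with the t test for the slope:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSM = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

MSM = SSM / 1          # model mean square (1 df)
MSE = SSE / (n - 2)    # error mean square (n - 2 df)

F = MSM / MSE
p_value = stats.f.sf(F, 1, n - 2)   # upper tail of F(1, n-2)
```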
Residual Analysis
• We will use residuals for examining the following six types of departures from the model:
  – The regression is nonlinear
  – The error terms do not have constant variance
  – The error terms are not independent
  – The model fits but there are some outliers
  – The error terms are not normally distributed
  – One or more important variables have been omitted from the model
Residual Plots
• We will use residual plots to examine the aforementioned types of departures. The plots that we will use are:
  – Residuals versus the fitted values
  – Residuals versus time (when the data are obtained in a time sequence) or other variables
  – Normal probability plot of the residuals
  – Histograms, stemplots and boxplots of the residuals
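A sketch of the first and third plots with matplotlib, on made-up data (the GPA/SAT data from the example are not reproduced here):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # draw off-screen, no display needed
import matplotlib.pyplot as plt
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(fitted, residuals)                 # residuals vs fitted values
ax1.axhline(0.0, linestyle="--", color="gray")
ax1.set(xlabel="Fitted values", ylabel="Residuals")
stats.probplot(residuals, plot=ax2)            # normal probability plot
fig.savefig("residual_plots.png")
```

A patternless residuals-vs-fitted plot and a roughly straight normal probability plot are consistent with the model assumptions.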
Example
• Below are the residual plots from the model predicting GPA based on SAT scores….