Inference for the Regression Coefficient
• Recall, $b_0$ and $b_1$ are the estimates of the intercept $\beta_0$ and slope $\beta_1$ of the population regression line.
• We can show that $b_0$ and $b_1$ are unbiased estimates of $\beta_0$ and $\beta_1$, and furthermore that $b_0$ and $b_1$ are Normally distributed, with means and standard deviations that can be estimated from the data.
• We use the above facts to obtain confidence intervals and conduct hypothesis tests about $\beta_0$ and $\beta_1$.
STA 286 week 13 1
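The unbiasedness and Normality of $b_0$ and $b_1$ can be illustrated with a short simulation. The true line, error standard deviation, and design points below are made-up illustrative values, not from the course data:

```python
import numpy as np

# Hypothetical true line y = 2 + 0.5x with Normal errors (sigma = 1);
# these parameter values are illustrative, not from the course notes.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, 20)

b0s, b1s = [], []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    # Least-squares estimates for this simulated sample
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    b1s.append(b1)
    b0s.append(b0)

# Averaged over many samples, b0 and b1 sit near beta0 and beta1,
# consistent with unbiasedness; their histograms look Normal.
print(round(float(np.mean(b0s)), 2), round(float(np.mean(b1s)), 2))
```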
CI for Regression Slope and Intercept
• A level 100(1−α)% confidence interval for the intercept $\beta_0$ is $b_0 \pm t_{n-2;\,\alpha/2}\, SE(b_0)$, where the standard error of the intercept is
$SE(b_0) = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$
• A level 100(1−α)% confidence interval for the slope $\beta_1$ is $b_1 \pm t_{n-2;\,\alpha/2}\, SE(b_1)$, where the standard error of the slope is
$SE(b_1) = \dfrac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}$
• Example ….
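A minimal Python sketch of these intervals, on a small made-up data set (the numbers are illustrative, not the course example):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

# Least-squares estimates
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# Residual standard deviation s, with n - 2 degrees of freedom
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

# Standard errors from the formulas on this slide
se_b0 = s * np.sqrt(1.0 / n + x.mean() ** 2 / Sxx)
se_b1 = s / np.sqrt(Sxx)

# 95% confidence intervals: estimate ± t_{n-2; alpha/2} * SE
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

`scipy.stats.linregress` reports the same quantities as `stderr` and `intercept_stderr`, so the hand computation can be cross-checked against it.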
Significance Tests for Regression Slope
• To test the null hypothesis H$_0$: $\beta_1 = 0$ we compute the test statistic
$t = \dfrac{b_1}{SE(b_1)}$
• The above test statistic has a $t$ distribution with $n-2$ degrees of freedom. We can use this distribution to obtain the P-value for the various possible alternative hypotheses.
• Note: testing the null hypothesis H$_0$: $\beta_1 = 0$ is equivalent to testing the null hypothesis H$_0$: $\rho = 0$, where $\rho$ is the population correlation.
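As a sketch, the test statistic and its two-sided P-value can be computed directly (made-up illustrative data again):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
se_b1 = s / np.sqrt(Sxx)

# Test statistic t = b1 / SE(b1) and its two-sided P-value from t(n-2)
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```

Because the test of $\beta_1 = 0$ is equivalent to the test of $\rho = 0$, this P-value matches the one returned by `scipy.stats.pearsonr`.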
Example
• Refer to the heart rate and oxygen example….
Confidence Interval for the Mean Response
• For any specific value of $x$, say $x_0$, the mean of the response $y$ in this subpopulation is given by $\mu_y = \beta_0 + \beta_1 x_0$.
• We can estimate this mean from the sample by substituting the least-squares estimates of $\beta_0$ and $\beta_1$: $\hat{y} = b_0 + b_1 x_0$.
• A 100(1−α)% level confidence interval for the mean response $\mu_y$ when $x$ takes the value $x_0$ is $\hat{y} \pm t_{n-2;\,\alpha/2}\, SE(\hat{\mu})$, where
$SE(\hat{\mu}) = s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
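A sketch of the mean-response interval at a hypothetical value $x_0$ (made-up data, and $x_0 = 6$ chosen only for illustration):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 6.0                      # hypothetical value of interest
y_hat = b0 + b1 * x0          # estimated mean response at x0
se_mu = s * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_mu = (y_hat - t_crit * se_mu, y_hat + t_crit * se_mu)
```

Note that the interval is narrowest at $x_0 = \bar{x}$, where the standard error reduces to $s/\sqrt{n}$, and widens as $x_0$ moves away from $\bar{x}$.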
Example
• Data on the wages and length of service (LOS) in months for 60 women who work in Indiana banks.
• We are interested in how LOS relates to wages. The Minitab output and commands are given in a separate file.
Prediction Interval
• The predicted response $\hat{y}$ for an individual case with a specific value $x_0$ of the explanatory variable $x$ is $\hat{y} = b_0 + b_1 x_0$.
• A useful prediction should include a margin of error to indicate its accuracy.
• The interval used to predict a future observation is called a prediction interval.
• A 100(1−α)% level prediction interval for a future observation on the response variable $y$ from the subpopulation corresponding to $x_0$ is $\hat{y} \pm t_{n-2;\,\alpha/2}\, SE(\hat{y})$, where
$SE(\hat{y}) = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
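A sketch of the prediction interval on made-up data; the extra "1 +" inside the square root, compared with the mean-response interval, reflects the variability of a single future observation:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 6.0                      # hypothetical value of interest
y_hat = b0 + b1 * x0

# Note the extra "1 +" relative to the mean-response standard error
se_pred = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
```

The prediction interval is always wider than the mean-response interval at the same $x_0$, since $SE(\hat{y})^2 = s^2 + SE(\hat{\mu})^2$.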
Example
• Calculate a 95% PI for the wage of an employee with 3 years of service (i.e. LOS = 36).
• Calculate a 90% PI for the wage of an employee with 3 years of service (i.e. LOS = 36).
Analysis of Variance for Regression
• Analysis of variance, ANOVA, is essential for multiple regression and for comparing several means.
• ANOVA summarizes information about the sources of variation in the data. It is based on the framework DATA = FIT + RESIDUAL.
• The total variation in the response is expressed by the deviations $y_i - \bar{y}$.
• The overall deviation of any $y$ observation from the mean of the $y$'s can be split into two main sources of variation and expressed as
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
Sum of Squares
• Sums of squares (SS) represent the variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources.
• The total variation in the data is expressed as SST = SSM + SSE.
• SST stands for the total sum of squares; it is given by $SST = \sum_i (y_i - \bar{y})^2$.
• SSM stands for the sum of squares for the model; it is given by $SSM = \sum_i (\hat{y}_i - \bar{y})^2$.
• SSE stands for the sum of squares for error; it is given by $SSE = \sum_i (y_i - \hat{y}_i)^2$.
• Each of the above SS has degrees of freedom associated with it: $n-1$ for SST, 1 for SSM, and $n-2$ for SSE.
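The partition SST = SSM + SSE is easy to verify numerically on any data set; here with made-up numbers:

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)      # total SS, n - 1 df
SSM = np.sum((y_hat - y.mean()) ** 2)  # model SS, 1 df
SSE = np.sum((y - y_hat) ** 2)         # error SS, n - 2 df
```

The degrees of freedom partition the same way: $(n-1) = 1 + (n-2)$.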
Coefficient of Determination R²
• The coefficient of determination $R^2$ is the fraction of variation in the values of $y$ that is explained by the least-squares regression. The SS make this interpretation precise.
• We can show that
$R^2 = \dfrac{SSM}{SST} = 1 - \dfrac{SSE}{SST}$
• This equation is the precise statement of the fact that $R^2$ is the fraction of variation in $y$ explained by $x$.
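On the same kind of made-up data, $R^2$ computed from the sums of squares satisfies both forms of the identity (and, in simple linear regression, equals the squared sample correlation):

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSM = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

R2 = SSM / SST   # fraction of variation in y explained by x
```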
Mean Square
• For each source, the ratio of the SS to its degrees of freedom is called the mean square (MS).
• To calculate a mean square, use the formula
$MS = \dfrac{\text{sum of squares}}{\text{degrees of freedom}}$
ANOVA Table and F Test
• In the simple linear regression model, the hypotheses H$_0$: $\beta_1 = 0$ vs H$_1$: $\beta_1 \neq 0$ are tested by the $F$ statistic.
• The $F$ statistic is given by
$F = \dfrac{MSM}{MSE}$
• The $F$ statistic has an $F(1, n-2)$ distribution, which we can use to find the P-value.
• Example…
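A sketch of the ANOVA F test on made-up data; in simple linear regression $F = t^2$, so it agrees with the t test for the slope:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSM = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

MSM = SSM / 1          # model mean square (1 df)
MSE = SSE / (n - 2)    # error mean square (n - 2 df)

F = MSM / MSE
p_value = stats.f.sf(F, 1, n - 2)   # upper tail of F(1, n-2)
```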
Residual Analysis
• We will use residuals for examining the following six types of departures from the model:
  – The regression is nonlinear
  – The error terms do not have constant variance
  – The error terms are not independent
  – The model fits but there are some outliers
  – The error terms are not normally distributed
  – One or more important variables have been omitted from the model
Residual Plots
• We will use residual plots to examine the aforementioned types of departures. The plots that we will use are:
  – Residuals versus the fitted values
  – Residuals versus time (when the data are obtained in a time sequence) or other variables
  – Normal probability plot of the residuals
  – Histograms, stemplots and boxplots of the residuals
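A sketch of the first and third plots with matplotlib, on made-up data (the GPA/SAT data from the example are not reproduced here):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # draw off-screen, no display needed
import matplotlib.pyplot as plt
from scipy import stats

# Made-up illustrative data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 60.0, 71.0, 75.0, 80.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(fitted, residuals)                 # residuals vs fitted values
ax1.axhline(0.0, linestyle="--", color="gray")
ax1.set(xlabel="Fitted values", ylabel="Residuals")
stats.probplot(residuals, plot=ax2)            # normal probability plot
fig.savefig("residual_plots.png")
```

A patternless residuals-vs-fitted plot and a roughly straight normal probability plot are consistent with the model assumptions.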
Example
• Below are the residual plots from the model predicting GPA based on SAT scores….