Transcript Chapter 11

Chapter Eleven
Simple Linear
Regression Analysis
McGraw-Hill/Irwin
Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Simple Linear Regression
11.1 The Simple Linear Regression Model
11.2 The Least Squares Point Estimates
11.3 Model Assumptions, Mean Squared Error, Standard Error
11.4 Testing Significance of Slope and y-Intercept
11.5 Confidence Intervals and Prediction Intervals
11.6 The Coefficient of Determination and Correlation
11.7 An F Test for the Simple Linear Regression Model
*11.8 Checking Regression Assumptions by Residuals
*11.9 Some Shortcut Formulas
11.1 The Simple Linear Regression Model

  y = μ_y|x + ε = β0 + β1x + ε

The Fuel Consumption Case data:

        Average Hourly          Weekly Fuel
Week    Temperature x (deg F)   Consumption y (MMcf)
 1          28.0                    12.4
 2          28.0                    11.7
 3          32.5                    12.4
 4          39.0                    10.8
 5          45.9                     9.4
 6          57.8                     9.5
 7          58.1                     8.0
 8          62.5                     7.5

μ_y|x = β0 + β1x is the mean value of the dependent variable y when
the value of the independent variable is x.
β0 is the y-intercept, the mean of y when x is 0.
β1 is the slope, the change in the mean of y per unit change in x.
ε is an error term that describes the effect on y of all factors other than x.
The Simple Linear Regression Model Illustrated
11.2 The Least Squares Point Estimates

Estimation/Prediction Equation:  ŷ = b0 + b1x

Least squares point estimate of the slope β1:

  b1 = SSxy / SSxx

where

  SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n
  SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n

Least squares point estimate of the y-intercept β0:

  b0 = ȳ − b1x̄,  where ȳ = Σyi/n and x̄ = Σxi/n
Example: The Least Squares Point Estimates

    x        y         x²           xy
   28.0     12.4      784.00       347.20
   28.0     11.7      784.00       327.60
   32.5     12.4     1056.25       403.00
   39.0     10.8     1521.00       421.20
   45.9      9.4     2106.81       431.46
   57.8      9.5     3340.84       549.10
   58.1      8.0     3375.61       464.80
   62.5      7.5     3906.25       468.75
  351.8     81.7    16874.76      3413.11

Slope b1:

  SSxy = Σxiyi − (Σxi)(Σyi)/n = 3413.11 − (351.8)(81.7)/8 = −179.6475
  SSxx = Σxi² − (Σxi)²/n = 16874.76 − (351.8)²/8 = 1404.355
  b1 = SSxy / SSxx = −179.6475 / 1404.355 = −0.1279

y-Intercept b0:

  ȳ = Σyi/n = 81.7/8 = 10.2125
  x̄ = Σxi/n = 351.8/8 = 43.98
  b0 = ȳ − b1x̄ = 10.2125 − (−0.1279)(43.98) = 15.84

Prediction (x = 40):  ŷ = b0 + b1x = 15.84 − 0.1279(40) = 10.72 MMcf of gas
11.3 The Regression Model Assumptions

Model:  y = μ_y|x + ε = β0 + β1x + ε

Assumptions about the model error terms, the ε's:

Mean Zero          The mean of the error terms is equal to 0.
Constant Variance  The variance of the error terms, σ², is the same for all values of x.
Normality          The error terms follow a normal distribution for all values of x.
Independence       The values of the error terms are statistically independent of each other.
Regression Model Assumptions Illustrated
Mean Square Error and Standard Error

  SSE = Σei² = Σ(yi − ŷi)²         Sum of Squared Errors

  s² = MSE = SSE/(n − 2)           Mean Square Error, point estimate of the error variance σ²

  s = √MSE = √(SSE/(n − 2))        Standard Error, point estimate of the error standard deviation σ
Example 11.6 The Fuel Consumption Case

    x        y       pred      y − pred    (y − pred)²
   28.0     12.4    12.2588     0.1412      0.019937
   28.0     11.7    12.2588    −0.5588      0.312257
   32.5     12.4    11.6833     0.7168      0.513731
   39.0     10.8    10.8519    −0.0519      0.002694
   45.9      9.4     9.9694    −0.5694      0.324205
   57.8      9.5     8.4474     1.0526      1.108009
   58.1      8.0     8.4090    −0.4090      0.167289
   62.5      7.5     7.8463    −0.3462      0.119889
                                SSE =       2.568011

  s² = MSE = SSE/(n − 2) = 2.568/6 = 0.428
  s = √s² = √0.428 = 0.6542
11.4 Significance Test and Estimation for Slope

If the regression assumptions hold, we can reject H0: β1 = 0 at the α
level of significance (probability of Type I error equal to α) if and only
if the appropriate rejection point condition holds or, equivalently, if the
corresponding p-value is less than α.

Alternative     Reject H0 if:                   p-value
Ha: β1 > 0      t > tα                          Area under t distribution to the right of t
Ha: β1 < 0      t < −tα                         Area under t distribution to the left of t
Ha: β1 ≠ 0      |t| > tα/2, that is,            Twice the area under t distribution
                t > tα/2 or t < −tα/2           to the right of |t|

Test Statistic:  t = b1 / sb1, where sb1 = s / √SSxx

100(1 − α)% Confidence Interval for β1:  [b1 ± tα/2 sb1]

tα, tα/2 and p-values are based on n − 2 degrees of freedom.
Significance Test and Estimation for y-Intercept

If the regression assumptions hold, we can reject H0: β0 = 0 at the α
level of significance (probability of Type I error equal to α) if and only if
the appropriate rejection point condition holds or, equivalently, if the
corresponding p-value is less than α.

Alternative     Reject H0 if:                   p-value
Ha: β0 > 0      t > tα                          Area under t distribution to the right of t
Ha: β0 < 0      t < −tα                         Area under t distribution to the left of t
Ha: β0 ≠ 0      |t| > tα/2, that is,            Twice the area under t distribution
                t > tα/2 or t < −tα/2           to the right of |t|

Test Statistic:  t = b0 / sb0, where sb0 = s √(1/n + x̄²/SSxx)

100(1 − α)% Confidence Interval for β0:  [b0 ± tα/2 sb0]

tα, tα/2 and p-values are based on n − 2 degrees of freedom.
Example: Inferences About Slope and y-Intercept

Example 11.7 The Fuel Consumption Case — Excel Output

Regression Statistics
Multiple R           0.948413871
R Square             0.899488871
Adjusted R Square    0.882737016
Standard Error       0.654208646
Observations         8

ANOVA
             df    SS          MS          F           Significance F
Regression    1    22.980816   22.980816   53.694882   0.000330052
Residual      6     2.567934    0.427989
Total         7    25.548750

Tests
           Coefficients    Standard Error   t Stat          P-value
Intercept   15.83785741    0.801773385       19.75353349    0.000001092
Temp        -0.127921715   0.01745733        -7.327679169   0.000330052

Intervals
           Coefficients    Standard Error   Lower 95%       Upper 95%
Intercept   15.83785741    0.801773385       13.87598718    17.79972765
Temp        -0.127921715   0.01745733        -0.170638294   -0.085205136
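The slope t statistic in the Excel output can be verified from t = b1/sb1 with sb1 = s/√SSxx. A standard-library sketch:

```python
# t statistic for testing H0: slope = 0 in the fuel consumption regression.
import math

temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, fuel))
s = math.sqrt(sse / (n - 2))                # standard error
sb1 = s / math.sqrt(ss_xx)                  # standard error of the slope
t = b1 / sb1                                # t statistic, df = n - 2
print(round(sb1, 4), round(t, 2))           # 0.0175 -7.33
```

Since |t| = 7.33 exceeds t.025 = 2.447 with 6 degrees of freedom, H0: β1 = 0 is rejected at α = 0.05, matching the tiny p-value (0.00033) in the output.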
11.5 Confidence and Prediction Intervals

Distance value = 1/n + (x0 − x̄)²/SSxx

If the regression assumptions hold,

Prediction (x = x0):  ŷ = b0 + b1x0

100(1 − α)% confidence interval for the mean value of y, μ_y|x0:
  [ŷ ± tα/2 s √(distance value)]

100(1 − α)% prediction interval for an individual value of y:
  [ŷ ± tα/2 s √(1 + distance value)]

tα/2 is based on n − 2 degrees of freedom.
Example: Confidence and Prediction Intervals

Example 11.7 The Fuel Consumption Case
Minitab Output (predicted FuelCons when Temp, x = 40)

Predicted Values
Fit      StDev Fit   95.0% CI            95.0% PI
10.721   0.241       (10.130, 11.312)    (9.014, 12.428)
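The Minitab intervals above can be reproduced from the formulas of Section 11.5. A standard-library sketch; the critical value t.025 with 6 degrees of freedom is taken from a t table as 2.447:

```python
# 95% confidence and prediction intervals at x0 = 40 deg F.
import math

temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel)) / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, fuel)) / (n - 2))

x0 = 40.0
y_hat = b0 + b1 * x0
dist = 1 / n + (x0 - x_bar) ** 2 / ss_xx     # distance value
t_025 = 2.447                                # t.025, n - 2 = 6 df (from a t table)

half_ci = t_025 * s * math.sqrt(dist)
half_pi = t_025 * s * math.sqrt(1 + dist)
ci = (y_hat - half_ci, y_hat + half_ci)      # CI for the mean of y at x0
pi = (y_hat - half_pi, y_hat + half_pi)      # PI for an individual y at x0
print(round(y_hat, 3), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Note the prediction interval is wider than the confidence interval by exactly the extra "1 +" under the square root, reflecting the additional variability of a single observation around its mean.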
11.6 The Simple Coefficient of Determination

The simple coefficient of determination r² is

  r² = Explained variation / Total variation

r² is the proportion of the total variation in y explained by
the simple linear regression model.

  Total variation = Explained variation + Unexplained variation

  Total variation       = Σ(yi − ȳ)²     Total Sum of Squares (SSTO)
  Explained variation   = Σ(ŷi − ȳ)²     Regression Sum of Squares (SSR)
  Unexplained variation = Σ(yi − ŷi)²    Error Sum of Squares (SSE)
The Simple Correlation Coefficient

The simple correlation coefficient measures the strength of the
linear relationship between y and x and is denoted by r:

  r = +√r²  if b1 is positive, and
  r = −√r²  if b1 is negative,

where b1 is the slope of the least squares line.
Example 11.15 Fuel Consumption — Excel Output

Regression Statistics
Multiple R           0.948413871
R Square             0.899488871
Adjusted R Square    0.882737016
Standard Error       0.654208646
Observations         8

ANOVA
             df    SS          MS          F           Significance F
Regression    1    22.980816   22.980816   53.694882   0.000330052
Residual      6     2.567934    0.427989
Total         7    25.548750

  r² = 22.980816 / 25.548750 = 0.899489
  r = −√0.899489 = −0.948414  (negative because b1 is negative)
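The variation decomposition gives r² and r directly; a standard-library sketch on the fuel consumption data:

```python
# Coefficient of determination r² and correlation coefficient r.
import math

temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel)) / ss_xx
b0 = y_bar - b1 * x_bar

preds = [b0 + b1 * x for x in temps]
ssto = sum((y - y_bar) ** 2 for y in fuel)      # total variation
ssr  = sum((p - y_bar) ** 2 for p in preds)     # explained variation
r2 = ssr / ssto                                 # coefficient of determination
r = math.copysign(math.sqrt(r2), b1)            # sign follows the slope
print(round(r2, 4), round(r, 4))                # 0.8995 -0.9484
```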
Different Values of the Correlation Coefficient
11.7 F Test for Simple Linear Regression Model

To test H0: β1 = 0 versus Ha: β1 ≠ 0 at the α level of significance

Test Statistic:

  F(model) = Explained variation / [(Unexplained variation)/(n − 2)]

Reject H0 if F(model) > Fα or p-value < α.

Fα is based on 1 numerator and n − 2 denominator degrees of freedom.
Example: F Test for Simple Linear Regression

Example 11.17 The Fuel Consumption Case — Excel Output

ANOVA
             df    SS          MS          F           Significance F
Regression    1    22.980816   22.980816   53.694882   0.000330052
Residual      6     2.567934    0.427989
Total         7    25.548750

F-test at α = 0.05 level of significance

Test Statistic:

  F(model) = Explained variation / [(Unexplained variation)/(n − 2)]
           = 22.980816 / [2.567934/(8 − 2)] = 53.695

Reject H0 at the α level of significance, since

  F(model) = 53.695 > 5.99 = F.05 and
  p-value = 0.00033 < 0.05 = α

F is based on 1 numerator and 6 denominator degrees of freedom.
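The model F statistic can be recomputed from the sums of squares; in simple linear regression it also equals the square of the slope t statistic. A standard-library sketch:

```python
# F statistic for the overall model test in the fuel consumption regression.
temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel)) / ss_xx
b0 = y_bar - b1 * x_bar

preds = [b0 + b1 * x for x in temps]
ssr = sum((p - y_bar) ** 2 for p in preds)            # explained variation
sse = sum((y - p) ** 2 for y, p in zip(fuel, preds))  # unexplained variation

f_model = ssr / (sse / (n - 2))
print(round(f_model, 3))                              # compare with F.05 = 5.99
```

With one predictor, f_model is the square of the slope t statistic (7.3277² ≈ 53.695), so the F test and the two-sided t test for the slope are equivalent here.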
*11.8 Checking the Regression Assumptions by Residual Analysis

For an observed value of y, the residual is

  e = y − ŷ = (observed y − predicted y)

where the predicted value of y is calculated as ŷ = b0 + b1x.

If the regression assumptions hold, the residuals should look like a
random sample from a normal distribution with mean 0 and variance σ².

Residual Plots
Residuals versus independent variables
Residuals versus predicted y's
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals
Checking the Constant Variance Assumption
Example 11.18: The QHIC Case
Plot: Residual versus x and predicted responses
Checking the Normality Assumption
Example 11.18: The QHIC Case
Plots: Histogram and Normal Plot of Residuals
Checking the Independence Assumption
Plots: Residuals versus Fits (to check for functional form, not shown)
Residuals versus Time Order
Combination Residual Plots

Example 11.18: The QHIC Case
Minitab Output

Plots: Histogram and Normal Plot of Residuals, Residuals
versus Order (I Chart), Residuals versus Fit.

[Figure: four-panel Residual Model Diagnostics — Normal Plot of Residuals
(normal score vs. residual), I Chart of Residuals (control limits at
3.0SL = ±396.3, center line X̄ = 0.000), Histogram of Residuals, and
Residuals vs. Fits.]
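The QHIC data are not included in this transcript, so as an illustration the residual computation behind these diagnostic plots can be sketched on the fuel consumption data from earlier in the chapter (standard library only):

```python
# Residuals for the fuel consumption fit -- the raw material for the
# histogram, normal plot, time-order plot, and residuals-vs-fits plot.
temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel)) / ss_xx
b0 = y_bar - b1 * x_bar

fits = [b0 + b1 * x for x in temps]
residuals = [y - f for y, f in zip(fuel, fits)]

# When an intercept is fit by least squares, the residuals sum to
# (essentially) zero; a clearly nonzero mean would signal a computation error.
mean_resid = sum(residuals) / n
print(abs(mean_resid) < 1e-9)
# One would then plot residuals vs. temps, vs. fits, in time order, and as a
# histogram (e.g. with matplotlib) to check the model assumptions visually.
```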
*11.9 Some Shortcut Formulas

  Total variation       = SSTO = SSyy
  Explained variation   = SSR = SSxy² / SSxx
  Unexplained variation = SSE = SSyy − SSxy² / SSxx

where

  SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n
  SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n
  SSyy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n
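The shortcut decomposition can be checked numerically on the fuel consumption data: SSR = SSxy²/SSxx and SSE = SSyy − SSR should recover the ANOVA sums of squares seen earlier. A standard-library sketch:

```python
# Shortcut formulas: SSTO = SSyy, SSR = SSxy^2/SSxx, SSE = SSyy - SSR.
temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temps)

ss_xy = sum(x * y for x, y in zip(temps, fuel)) - sum(temps) * sum(fuel) / n
ss_xx = sum(x * x for x in temps) - sum(temps) ** 2 / n
ss_yy = sum(y * y for y in fuel) - sum(fuel) ** 2 / n    # = SSTO

ssr = ss_xy ** 2 / ss_xx          # explained variation
sse = ss_yy - ssr                 # unexplained variation
print(round(ss_yy, 5), round(ssr, 4), round(sse, 4))
```

The three values match the ANOVA table from the Excel output (Total 25.54875, Regression 22.980816, Residual 2.567934), confirming SSTO = SSR + SSE.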
Simple Linear Regression

Summary:
11.1  The Simple Linear Regression Model
11.2  The Least Squares Point Estimates
11.3  Model Assumptions, Mean Squared Error, Std. Error
11.4  Testing Significance of Slope and y-Intercept
11.5  Confidence Intervals and Prediction Intervals
11.6  The Coefficient of Determination and Correlation
11.7  An F Test for the Simple Linear Regression Model
*11.8 Checking Regression Assumptions by Residuals
*11.9 Some Shortcut Formulas