Testing the Significance of the y

Chapter 13
Simple Linear Regression
Analysis
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Simple Linear Regression
13.1 The Simple Linear Regression Model and the Least Squares Point Estimates
13.3 Testing the Significance of the Slope and y-Intercept
The Simple Linear Regression Model
and the Least Squares Point Estimates
• The dependent (or response) variable
is the variable we wish to understand or
predict
• The independent (or predictor) variable
is the variable we will use to understand
or predict the dependent variable
• Regression analysis is a statistical
technique that uses observed data to
relate the dependent variable to one or
more independent variables
Objective of Regression Analysis
The objective of regression analysis is
to build a regression model (or
predictive equation) that can be used to
describe, predict and control the
dependent variable on the basis of the
independent variable
Example 13.1: Fuel Consumption Case #1

Week   Average Hourly Temperature, x (deg F)   Weekly Fuel Consumption, y (MMcf)
1      28.0                                    12.4
2      28.0                                    11.7
3      32.5                                    12.4
4      39.0                                    10.8
5      45.9                                     9.4
6      57.8                                     9.5
7      58.1                                     8.0
8      62.5                                     7.5
Example 13.1: Fuel Consumption
Case #4
• The values of β0 and β1 determine the value
of the mean weekly fuel consumption μy|x
• Because we do not know the true values of β0
and β1, we cannot actually calculate the
mean weekly fuel consumptions
• We will learn how to estimate β0 and β1 in the
next section
• For now, when we say that μy|x is related to x
by a straight line, we mean that the mean weekly
fuel consumptions corresponding to different
average hourly temperatures lie on a straight line
Form of The Simple Linear Regression
Model
• y = β0 + β1 x + ε
• μy|x = β0 + β1x is the mean value of the
dependent variable y when the value of the
independent variable is x
• β0 is the y-intercept; the mean of y when x is
0
• β1 is the slope; the change in the mean of y
per unit change in x
• ε is an error term that describes the effect on
y of all factors other than x
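The role of the error term can be illustrated with a small simulation. This is only a sketch: the parameter values below are hypothetical (the true β0 and β1 are unknown), and the error term is assumed to be normal with mean 0, which the slide itself does not state.

```python
import random

# Hypothetical parameter values chosen only for illustration;
# the true beta0 and beta1 are unknown in practice.
BETA0 = 15.8    # assumed y-intercept
BETA1 = -0.13   # assumed slope
SIGMA = 0.65    # assumed standard deviation of the error term


def simulate_y(x, rng):
    """Draw one observation y = beta0 + beta1*x + epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)  # effect on y of all factors other than x
    return BETA0 + BETA1 * x + epsilon


rng = random.Random(0)  # fixed seed so the run is repeatable
x = 40.0
draws = [simulate_y(x, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
model_mean = BETA0 + BETA1 * x  # the mean of y at this x under the model

print(f"mean of simulated y at x = {x}: {sample_mean:.3f}")
print(f"beta0 + beta1*x:                {model_mean:.3f}")
```

Individual draws scatter around the line, while their average settles close to β0 + β1x, which is what the model means by the mean value of y at a given x.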
Regression Terms
• β0 and β1 are called regression
parameters
• β0 is the y-intercept and β1 is the slope
• We do not know the true values of these
parameters
• So, we must use sample data to
estimate them
• b0 is the estimate of β0 and b1 is the
estimate of β1
The Simple Linear Regression Model
Illustrated
The Least Squares Estimates, and
Point Estimation and Prediction
• The true values of β0 and β1 are
unknown
• Therefore, we must use observed data
to compute statistics that estimate these
parameters
• We will compute b0 to estimate β0 and b1 to
estimate β1
The Least Squares Point Estimates
• Estimation/prediction equation
ŷ = b0 + b1x
• Least squares point estimate of the
slope β1
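$b_1 = \dfrac{SS_{xy}}{SS_{xx}}$
where
$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$
$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}$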
The Least Squares Point Estimates
Continued
• Least squares point estimate of the y-intercept β0
$b_0 = \bar{y} - b_1 \bar{x}$
where
$\bar{y} = \dfrac{\sum y_i}{n}$ and $\bar{x} = \dfrac{\sum x_i}{n}$
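The least squares formulas can be sketched in a few lines of Python. The function name `least_squares` is hypothetical, and the sketch uses the deviation form of SSxy and SSxx. The data are the eight weeks of the fuel consumption case.

```python
def least_squares(x, y):
    """Return (b0, b1) for the prediction equation y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviation forms of SSxy and SSxx
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # slope estimate
    b0 = y_bar - b1 * x_bar   # y-intercept estimate
    return b0, b1


# Fuel consumption data: average hourly temperature (deg F), fuel use (MMcf)
temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

b0, b1 = least_squares(temps, fuel)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The slope comes out negative, as expected: weekly fuel consumption falls as the average hourly temperature rises.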
Example 13.3: Fuel Consumption Case
#1
         y       x       x²          xy
         12.4    28.0    784.00      347.20
         11.7    28.0    784.00      327.60
         12.4    32.5    1056.25     403.00
         10.8    39.0    1521.00     421.20
          9.4    45.9    2106.81     431.46
          9.5    57.8    3340.84     549.10
          8.0    58.1    3375.61     464.80
          7.5    62.5    3906.25     468.75
Total    81.7    351.8   16874.76    3413.11
Example 13.3: Fuel Consumption Case
#2
• From last slide,
– Σyi = 81.7
– Σxi = 351.8
– Σxi² = 16,874.76
– Σxiyi = 3,413.11
• Once we have these values, we no
longer need the raw data
• Calculation of b0 and b1 uses these
totals
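A minimal sketch of the point that the four totals are enough: it uses the computational forms of SSxy and SSxx and never touches the raw observations. The variable names are illustrative only.

```python
# Totals for the fuel consumption case; n = 8 weeks of data.
n = 8
sum_y = 81.7
sum_x = 351.8
sum_x2 = 16874.76
sum_xy = 3413.11

# Computational forms of SSxy and SSxx, built from the totals alone
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                 # slope estimate
b0 = sum_y / n - b1 * (sum_x / n)  # y-intercept estimate

print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```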
Example 13.3: Fuel Consumption Case
#3 (Slope b1)
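$SS_{xy} = \sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 3413.11 - \dfrac{(351.8)(81.7)}{8} = -179.6475$
$SS_{xx} = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n} = 16874.76 - \dfrac{(351.8)^2}{8} = 1404.355$
$b_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{-179.6475}{1404.355} = -0.1279$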
Example 13.3: Fuel Consumption Case
#4 (y-Intercept b0)
$\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{81.7}{8} = 10.2125$
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{351.8}{8} = 43.98$
$b_0 = \bar{y} - b_1 \bar{x} = 10.2125 - (-0.1279)(43.98) = 15.84$
Example 13.3: Fuel Consumption Case
#5
• Prediction (x = 28)
• ŷ = b0 + b1x = 15.84 + (-0.1279)(28)
• ŷ = 12.2588 MMcf of gas
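The prediction step can be sketched as a small helper, using the rounded estimates and the temperature of 28 from the calculation above. The function name `predict` and the range check are illustrative assumptions. The check flags temperatures outside the observed range of 28.0 to 62.5 deg F, where the fitted line amounts to extrapolation.

```python
B0 = 15.84     # rounded least squares estimate of the y-intercept
B1 = -0.1279   # rounded least squares estimate of the slope
X_MIN, X_MAX = 28.0, 62.5  # smallest and largest observed temperatures


def predict(x):
    """Return (y_hat, inside_region) for an average hourly temperature x."""
    y_hat = B0 + B1 * x
    inside_region = X_MIN <= x <= X_MAX  # False means extrapolation
    return y_hat, inside_region


y_hat, inside = predict(28.0)
print(f"predicted fuel consumption: {y_hat:.4f} MMcf (inside region: {inside})")
```

A call such as `predict(-10.0)` still returns a number, but the flag comes back `False`, a reminder that the data say nothing about how y behaves that far from the observed temperatures.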
Example 13.3: The Danger of Extrapolation
Outside The Experimental Region
Testing the Significance of the Slope
• A regression model is not likely to be useful
unless there is a significant relationship
between x and y
• To test significance, we use the null
hypothesis:
H0: β1 = 0
• Versus the alternative hypothesis:
Ha: β1 ≠ 0
Testing the Significance of the Slope #2
If the regression assumptions hold, we
can reject H0: β1 = 0 at the α level of
significance (probability of Type I error
equal to α) if and only if the appropriate
rejection point condition holds or,
equivalently, if the corresponding p-value
is less than α
Testing the Significance of the Slope #3
Alternative    Reject H0 If    p-Value
Ha: β1 > 0     t > tα          Area under t distribution right of t
Ha: β1 < 0     t < –tα         Area under t distribution left of t
Ha: β1 ≠ 0     |t| > tα/2*     Twice area under t distribution right of |t|
* That is, t > tα/2 or t < –tα/2
Testing the Significance of the Slope #4
• Test statistic
$t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{SS_{xx}}}$
• 100(1-α)% confidence interval for β1
[b1 ± tα/2 sb1]
• t, tα/2 and p-values are based on n–2
degrees of freedom
Example 13.6: MINITAB Output of
Regression on Fuel Consumption Data
Example 13.6: Excel Output of
Regression on Fuel Consumption Data
Example 13.6: Fuel Consumption
Case
• The p-value for testing H0 versus Ha is
twice the area to the right of |t|=7.33
with n-2=6 degrees of freedom
• In this case, the p-value is 0.0003
• We can reject H0 in favor of Ha at level
of significance 0.05, 0.01, or 0.001
• We therefore have strong evidence that
x is significantly related to y and that the
regression model is significant
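The p-value above can be checked without statistical tables. The sketch below integrates the Student's t density numerically with Simpson's rule. It is an illustrative stand-in for the MINITAB or Excel calculation, not a description of it, and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Twice the area to the right of |t_stat| (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    # Simpson's rule for the area under the density between 0 and |t|
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric, so the area to the right of |t| is 0.5 - area
    return 2 * (0.5 - area_0_to_t)


p_value = two_sided_p_value(7.33, 6)
print(f"p-value = {p_value:.4f}")
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: reject H0 -> {p_value < alpha}")
```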
A Confidence Interval for the Slope
• If the regression assumptions hold, a
100(1-α) percent confidence interval for
the true slope β1 is
– [b1 ± tα/2 sb1]
• Here tα/2 is based on n - 2 degrees of
freedom
Example 13.7: Fuel Consumption
Case
• An earlier printout tells us:
– b1 = -0.12792
– sb1 = 0.01746
• We have n-2=6 degrees of freedom
– That gives us a t-value of 2.447 for a 95
percent confidence interval
• [b1 ± t0.025 · sb1] = [-0.12792 ± 2.447(0.01746)]
= [-0.12792 ± 0.04272]
= [-0.1706, -0.0852]
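The interval arithmetic can be reproduced in a few lines, taking b1, sb1, and the t-value from the example as given. The variable names are illustrative.

```python
b1 = -0.12792    # slope estimate from the printout
s_b1 = 0.01746   # standard error of the slope estimate
t_crit = 2.447   # t_{0.025} with 6 degrees of freedom

margin = t_crit * s_b1  # half-width of the 95 percent interval
lower = b1 - margin
upper = b1 + margin

print(f"margin = {margin:.5f}")
print(f"95% confidence interval for the slope: [{lower:.4f}, {upper:.4f}]")
```

The interval lies entirely below zero, which agrees with rejecting H0: β1 = 0 at the 0.05 level of significance.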
Testing the Significance of the
y-Intercept
If the regression assumptions hold, we
can reject H0: β0 = 0 at the α level of
significance (probability of Type I error
equal to α) if and only if the appropriate
rejection point condition holds or,
equivalently, if the corresponding p-value
is less than α
Testing the Significance of the
y-Intercept #2
Alternative    Reject H0 If    p-Value
Ha: β0 > 0     t > tα          Area under t distribution right of t
Ha: β0 < 0     t < –tα         Area under t distribution left of t
Ha: β0 ≠ 0     |t| > tα/2*     Twice area under t distribution right of |t|
* That is, t > tα/2 or t < –tα/2
Testing the Significance of the
y-Intercept #3
Test statistic
$t = \dfrac{b_0}{s_{b_0}}$ where $s_{b_0} = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}$
100(1-α)% confidence interval for β0
[b0 ± tα/2 sb0]
t, tα/2 and p-values are based on n–2 degrees of
freedom
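A sketch of the y-intercept test on the fuel consumption data follows the same pattern as the slope test. As before, s is assumed to be sqrt(SSE/(n-2)), a definition the slides do not spell out, and 2.447 is t0.025 with 6 degrees of freedom.

```python
import math

temps = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(fuel) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fuel))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Assumed definition: s = sqrt(SSE / (n - 2)), SSE = sum of squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, fuel))
s = math.sqrt(sse / (n - 2))

# Standard error of b0 and the test statistic for H0: beta0 = 0
s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
t_stat = b0 / s_b0

T_CRIT = 2.447  # t_{0.025} with n - 2 = 6 degrees of freedom
reject_h0 = abs(t_stat) > T_CRIT            # two-sided test at alpha = 0.05
interval = (b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0)  # 95% interval for beta0

print(f"s_b0 = {s_b0:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% confidence interval for the y-intercept: [{interval[0]:.2f}, {interval[1]:.2f}]")
```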