Chapter 12. Simple Linear Regression and Correlation
12.1 The Simple Linear Regression Model
12.2 Fitting the Regression Line
12.3 Inferences on the Slope Parameter β1
12.4 Inferences on the Regression Line
12.5 Prediction Intervals for Future Response Values
12.6 The Analysis of Variance Table
12.7 Residual Analysis
12.8 Variable Transformations
12.9 Correlation Analysis
12.10 Supplementary Problems
12.1 The Simple Linear Regression Model
12.1.1 Model Definition and Assumptions(1/5)
• With the simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
the observed value of the dependent variable $y_i$ is composed of a linear function $\beta_0 + \beta_1 x_i$ of the explanatory variable $x_i$, together with an error term $\epsilon_i$. The error terms $\epsilon_1, \ldots, \epsilon_n$ are generally taken to be independent observations from a $N(0, \sigma^2)$ distribution, for some error variance $\sigma^2$. This implies that the values $y_1, \ldots, y_n$ are observations from the independent random variables
$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2),$$
as illustrated in Figure 12.1.
12.1.1 Model Definition and Assumptions(2/5)
[Figure 12.1: the distributions $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ of the response values about the regression line.]
12.1.1 Model Definition and Assumptions(3/5)
• The parameter $\beta_0$ is known as the intercept parameter, and the parameter $\beta_1$ is known as the slope parameter. A third unknown parameter, the error variance $\sigma^2$, can also be estimated from the data set. As illustrated in Figure 12.2, the data values $(x_i, y_i)$ lie closer to the line
$$y = \beta_0 + \beta_1 x$$
as the error variance $\sigma^2$ decreases.
12.1.1 Model Definition and Assumptions(4/5)
• The slope parameter $\beta_1$ is of particular interest since it indicates how the expected value of the dependent variable depends upon the explanatory variable $x$, as shown in Figure 12.3.
• The data set shown in Figure 12.4 exhibits a quadratic (or at least nonlinear) relationship between the two variables, and it would make no sense to fit a straight line to the data set.
12.1.1 Model Definition and Assumptions(5/5)
• Simple Linear Regression Model
The simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
fits a straight line through a set of paired data observations $(x_1, y_1), \ldots, (x_n, y_n)$. The error terms $\epsilon_1, \ldots, \epsilon_n$ are taken to be independent observations from a $N(0, \sigma^2)$ distribution. The three unknown parameters, the intercept parameter $\beta_0$, the slope parameter $\beta_1$, and the error variance $\sigma^2$, are estimated from the data set.
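To make the model concrete, here is a minimal sketch that simulates observations from a simple linear regression model with numpy; the parameter values and sample size are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters: intercept, slope, error standard deviation
beta0, beta1, sigma = 0.4, 0.5, 0.17
n = 12

x = rng.uniform(4.0, 6.0, size=n)      # explanatory variable values
eps = rng.normal(0.0, sigma, size=n)   # error terms, independent N(0, sigma^2)
y = beta0 + beta1 * x + eps            # each y_i ~ N(beta0 + beta1 * x_i, sigma^2)
```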
12.1.2 Examples(1/2)
• Example 3 : Car Plant Electricity Usage
The manager of a car plant wishes to investigate how the plant’s electricity usage depends upon the plant’s production. The linear model
$$y = \beta_0 + \beta_1 x$$
will allow a month’s electrical usage to be estimated as a function of the month’s production.
12.1.2 Examples(2/2)
12.2 Fitting the Regression Line
12.2.1 Parameter Estimation(1/4)
The regression line $y = \beta_0 + \beta_1 x$ is fitted to the data points $(x_1, y_1), \ldots, (x_n, y_n)$ by finding the line that is "closest" to the data points in some sense. As Figure 12.14 illustrates, the fitted line is chosen to be the line that minimizes the sum of the squares of these vertical deviations,
$$Q = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2,$$
and this is referred to as the least squares fit.
12.2.1 Parameter Estimation(2/4)
With normally distributed error terms, $\hat\beta_0$ and $\hat\beta_1$ are maximum likelihood estimates. The joint density of the error terms $\epsilon_1, \ldots, \epsilon_n$ is
$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum_{i=1}^{n} \epsilon_i^2/(2\sigma^2)}.$$
This likelihood is maximized by minimizing
$$\sum \epsilon_i^2 = \sum (y_i - (\beta_0 + \beta_1 x_i))^2 = Q.$$
Setting the partial derivatives
$$\frac{\partial Q}{\partial \beta_0} = -\sum_{i=1}^{n} 2(y_i - (\beta_0 + \beta_1 x_i)) \quad \text{and} \quad \frac{\partial Q}{\partial \beta_1} = -\sum_{i=1}^{n} 2x_i(y_i - (\beta_0 + \beta_1 x_i))$$
equal to zero gives the normal equations
$$\sum_{i=1}^{n} y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2.$$
12.2.1 Parameter Estimation(3/4)
n in1 xi yi  (in1 xi )(in1 yi ) S XY
1 

n
n
n i=1
xi2  (i=1
xi ) 2
S XX
and then
in1 yi
in1 xi
0 
 1
 y   1x
n
n
where
S XX  in1 ( xi  x ) 2  in1 xi2  nx 2

n
i 1
(in1 xi ) 2
x 
n
2
i
and
(in1 xi )( in1 yi )
S XY   ( xi  x )( yi  y )   xi yi  nxy   xi yi 
n
For a specific value of the explanatory variable x* , this equation
n
i 1
n
i 1
n
i 1
provides a fitted value yˆ |x*   0   1 x* for the dependent variable y , as
illustrated in Figure 12.15.
NIPRL
11
12.2.1 Parameter Estimation(4/4)
The error variance $\sigma^2$ can be estimated by considering the deviations between the observed data values $y_i$ and their fitted values $\hat{y}_i$. Specifically, the sum of squares for error SSE is defined to be the sum of the squares of these deviations,
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (\hat\beta_0 + \hat\beta_1 x_i))^2 = \sum_{i=1}^{n} y_i^2 - \hat\beta_0 \sum_{i=1}^{n} y_i - \hat\beta_1 \sum_{i=1}^{n} x_i y_i,$$
and the estimate of the error variance is
$$\hat\sigma^2 = \frac{\mathrm{SSE}}{n-2}.$$
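These formulas translate directly into code. The following is a minimal sketch with numpy (the function name and data arrays are assumptions for illustration):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least squares fit of y = beta0 + beta1 * x.

    Returns (beta0_hat, beta1_hat, sigma2_hat)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    Sxy = np.sum((x - xbar) * (y - ybar))
    b1 = Sxy / Sxx                          # slope estimate S_XY / S_XX
    b0 = ybar - b1 * xbar                   # intercept estimate
    sse = np.sum((y - (b0 + b1 * x)) ** 2)  # sum of squares for error
    return b0, b1, sse / (n - 2)            # sigma^2 estimated by SSE/(n-2)
```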
12.2.2 Examples(1/5)
• Example 3 : Car Plant Electricity Usage
For this example $n = 12$ and
$$\sum_{i=1}^{12} x_i = 4.51 + \cdots + 4.20 = 58.62, \qquad \sum_{i=1}^{12} y_i = 2.48 + \cdots + 2.53 = 34.15,$$
$$\sum_{i=1}^{12} x_i^2 = 4.51^2 + \cdots + 4.20^2 = 291.2310, \qquad \sum_{i=1}^{12} y_i^2 = 2.48^2 + \cdots + 2.53^2 = 98.6967,$$
$$\sum_{i=1}^{12} x_i y_i = (4.51 \times 2.48) + \cdots + (4.20 \times 2.53) = 169.2532.$$
12.2.2 Examples(2/5)
12.2.2 Examples(3/5)
The estimates of the slope parameter and the intercept parameter are
$$\hat\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{(12 \times 169.2532) - (58.62 \times 34.15)}{(12 \times 291.2310) - 58.62^2} = 0.49883$$
and
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = \frac{34.15}{12} - \left(0.49883 \times \frac{58.62}{12}\right) = 0.4090.$$
The fitted regression line is
$$y = \hat\beta_0 + \hat\beta_1 x = 0.409 + 0.499x,$$
so that, for example, the fitted value at $x = 5.5$ is
$$\hat{y}|_{5.5} = 0.409 + (0.499 \times 5.5) = 3.1535.$$
12.2.2 Examples(4/5)
Using the model for production values $x$ outside this range is known as extrapolation and may give inaccurate results.
12.2.2 Examples(5/5)
The estimate of the error variance is
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} y_i^2 - \hat\beta_0 \sum_{i=1}^{n} y_i - \hat\beta_1 \sum_{i=1}^{n} x_i y_i}{n-2} = \frac{98.6967 - (0.4090 \times 34.15) - (0.49883 \times 169.2532)}{10} = 0.0299,$$
so that
$$\hat\sigma = \sqrt{0.0299} = 0.1729.$$
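As an arithmetic check, the summary statistics reported in this example reproduce the fitted line in a few lines of Python (a sketch using only the sums given above):

```python
n = 12
sx, sy = 58.62, 34.15           # sum of x_i and sum of y_i
sxx2, syy2 = 291.2310, 98.6967  # sum of x_i^2 and sum of y_i^2
sxy = 169.2532                  # sum of x_i * y_i

b1 = (n * sxy - sx * sy) / (n * sxx2 - sx ** 2)
b0 = sy / n - b1 * sx / n
sigma2 = (syy2 - b0 * sy - b1 * sxy) / (n - 2)

print(b1, b0, sigma2)  # approximately 0.49883, 0.4090, 0.0299
```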
12.3 Inferences on the Slope Parameter β1
12.3.1 Inference Procedures(1/4)
Inferences on the Slope Parameter β1
The slope estimate is distributed as
$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{XX}}\right).$$
A two-sided confidence interval with a confidence level $1-\alpha$ for the slope parameter in a simple linear regression model is
$$\beta_1 \in \left(\hat\beta_1 - t_{\alpha/2,n-2} \times \mathrm{s.e.}(\hat\beta_1),\ \hat\beta_1 + t_{\alpha/2,n-2} \times \mathrm{s.e.}(\hat\beta_1)\right),$$
which is
$$\beta_1 \in \left(\hat\beta_1 - \frac{\hat\sigma\, t_{\alpha/2,n-2}}{\sqrt{S_{XX}}},\ \hat\beta_1 + \frac{\hat\sigma\, t_{\alpha/2,n-2}}{\sqrt{S_{XX}}}\right).$$
One-sided $1-\alpha$ confidence level confidence intervals are
$$\beta_1 \in \left(-\infty,\ \hat\beta_1 + \frac{\hat\sigma\, t_{\alpha,n-2}}{\sqrt{S_{XX}}}\right) \quad \text{and} \quad \beta_1 \in \left(\hat\beta_1 - \frac{\hat\sigma\, t_{\alpha,n-2}}{\sqrt{S_{XX}}},\ \infty\right).$$
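A minimal sketch of this confidence interval in Python with scipy (the inputs b1, sigma, Sxx, and n are assumed to come from a fit such as the one sketched earlier):

```python
from math import sqrt
from scipy.stats import t

def slope_confidence_interval(b1, sigma, Sxx, n, alpha=0.05):
    """Two-sided 1 - alpha confidence interval for the slope beta1."""
    se = sigma / sqrt(Sxx)                 # s.e.(beta1_hat) = sigma_hat / sqrt(S_XX)
    crit = t.ppf(1 - alpha / 2, df=n - 2)  # critical point t_{alpha/2, n-2}
    return b1 - crit * se, b1 + crit * se
```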
12.3.1 Inference Procedures(2/4)
The two-sided hypotheses
$$H_0: \beta_1 = b_1 \quad \text{versus} \quad H_A: \beta_1 \neq b_1$$
for a fixed value $b_1$ of interest are tested with the $t$-statistic
$$t = \frac{\hat\beta_1 - b_1}{\mathrm{s.e.}(\hat\beta_1)} = \frac{\hat\beta_1 - b_1}{\hat\sigma/\sqrt{S_{XX}}}.$$
The $p$-value is
$$p\text{-value} = 2 \times P(X \geq |t|),$$
where the random variable $X$ has a $t$-distribution with $n-2$ degrees of freedom. A size $\alpha$ test rejects the null hypothesis if $|t| > t_{\alpha/2,n-2}$.
12.3.1 Inference Procedures(3/4)
The one-sided hypotheses
$$H_0: \beta_1 \geq b_1 \quad \text{versus} \quad H_A: \beta_1 < b_1$$
have a $p$-value
$$p\text{-value} = P(X \leq t),$$
and a size $\alpha$ test rejects the null hypothesis if $t < -t_{\alpha,n-2}$. The one-sided hypotheses
$$H_0: \beta_1 \leq b_1 \quad \text{versus} \quad H_A: \beta_1 > b_1$$
have a $p$-value
$$p\text{-value} = P(X \geq t),$$
and a size $\alpha$ test rejects the null hypothesis if $t > t_{\alpha,n-2}$.
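A sketch of the two-sided test of $H_0: \beta_1 = b_1$ (again with assumed inputs):

```python
from math import sqrt
from scipy.stats import t

def slope_t_test(b1_hat, b1_null, sigma, Sxx, n):
    """Two-sided t-test of H0: beta1 = b1_null; returns (t_stat, p_value)."""
    t_stat = (b1_hat - b1_null) / (sigma / sqrt(Sxx))
    p_value = 2 * t.sf(abs(t_stat), df=n - 2)  # 2 * P(X >= |t|)
    return t_stat, p_value
```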
12.3.1 Inference Procedures(4/4)
• An interesting point to notice is that for a fixed value of the error variance $\sigma^2$, the variance of the slope parameter estimate decreases as the value of $S_{XX}$ increases. This happens as the values of the explanatory variable $x_i$ become more spread out, as illustrated in Figure 12.30. This result is intuitively reasonable since a greater spread in the values $x_i$ provides a greater “leverage” for fitting the regression line, and therefore the slope parameter estimate $\hat\beta_1$ should be more accurate.
12.3.2 Examples(1/2)
• Example 3 : Car Plant Electricity Usage
$$S_{XX} = \sum_{i=1}^{12} x_i^2 - \frac{\left(\sum_{i=1}^{12} x_i\right)^2}{12} = 291.2310 - \frac{58.62^2}{12} = 4.8723$$
$$\mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{XX}}} = \frac{0.1729}{\sqrt{4.8723}} = 0.0783$$
The $t$-statistic for testing $H_0: \beta_1 = 0$ is
$$t = \frac{\hat\beta_1}{\mathrm{s.e.}(\hat\beta_1)} = \frac{0.49883}{0.0783} = 6.37,$$
and the two-sided $p$-value is
$$p\text{-value} = 2 \times P(X \geq 6.37) \approx 0.$$
12.3.2 Examples(2/2)
With $t_{0.005,10} = 3.169$, a 99% two-sided confidence interval for the slope parameter is
$$\beta_1 \in \left(\hat\beta_1 - \text{critical point} \times \mathrm{s.e.}(\hat\beta_1),\ \hat\beta_1 + \text{critical point} \times \mathrm{s.e.}(\hat\beta_1)\right)$$
$$= (0.49883 - 3.169 \times 0.0783,\ 0.49883 + 3.169 \times 0.0783) = (0.251, 0.747).$$
12.4 Inferences on the Regression Line
12.4.1 Inference Procedures(1/2)
Inferences on the Expected Value of the Dependent Variable
A $1-\alpha$ confidence level two-sided confidence interval for $\beta_0 + \beta_1 x^*$, the expected value of the dependent variable for a particular value $x^*$ of the explanatory variable, is
$$\beta_0 + \beta_1 x^* \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha/2,n-2} \times \mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*),\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha/2,n-2} \times \mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)\right),$$
where
$$\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}.$$
12.4.1 Inference Procedures(2/2)
One-sided confidence intervals are
$$\beta_0 + \beta_1 x^* \in \left(-\infty,\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha,n-2} \times \mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)\right)$$
and
$$\beta_0 + \beta_1 x^* \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha,n-2} \times \mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*),\ \infty\right).$$
Hypothesis tests on $\beta_0 + \beta_1 x^*$ can be performed by comparing the $t$-statistic
$$t = \frac{(\hat\beta_0 + \hat\beta_1 x^*) - (\beta_0 + \beta_1 x^*)}{\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)}$$
with a $t$-distribution with $n-2$ degrees of freedom.
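A sketch of this interval for the expected response (inputs assumed from an earlier fit):

```python
from math import sqrt
from scipy.stats import t

def mean_response_ci(b0, b1, sigma, Sxx, n, xbar, x_star, alpha=0.05):
    """Two-sided 1 - alpha confidence interval for beta0 + beta1 * x_star."""
    fit = b0 + b1 * x_star
    se = sigma * sqrt(1.0 / n + (x_star - xbar) ** 2 / Sxx)
    crit = t.ppf(1 - alpha / 2, df=n - 2)
    return fit - crit * se, fit + crit * se
```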
12.4.2 Examples(1/2)
• Example 3 : Car Plant Electricity Usage
Here
$$\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}} = 0.1729 \times \sqrt{\frac{1}{12} + \frac{(x^* - 4.885)^2}{4.8723}}.$$
With $t_{0.025,10} = 2.228$, a 95% confidence interval for $\beta_0 + \beta_1 x^*$ is
$$\beta_0 + \beta_1 x^* \in \left(0.409 + 0.499x^* - 2.228 \times 0.1729 \times \sqrt{\frac{1}{12} + \frac{(x^* - 4.885)^2}{4.8723}},\right.$$
$$\left.\quad 0.409 + 0.499x^* + 2.228 \times 0.1729 \times \sqrt{\frac{1}{12} + \frac{(x^* - 4.885)^2}{4.8723}}\right).$$
At $x^* = 5$ this gives
$$\beta_0 + 5\beta_1 \in (0.409 + (0.499 \times 5) - 0.113,\ 0.409 + (0.499 \times 5) + 0.113) = (2.79, 3.02).$$
12.4.2 Examples(2/2)
12.5 Prediction Intervals for Future Response Values
12.5.1 Inference Procedures(1/2)
• Prediction Intervals for Future Response Values
A $1-\alpha$ confidence level two-sided prediction interval for $y|_{x^*}$, a future value of the dependent variable for a particular value $x^*$ of the explanatory variable, is
$$y|_{x^*} \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha/2,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}},\right.$$
$$\left.\quad \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha/2,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right).$$
12.5.1 Inference Procedures(2/2)
One-sided prediction intervals are
$$y|_{x^*} \in \left(-\infty,\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right)$$
and
$$y|_{x^*} \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}},\ \infty\right).$$
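The corresponding sketch for a prediction interval differs from the mean-response interval only by the extra 1 under the square root:

```python
from math import sqrt
from scipy.stats import t

def prediction_interval(b0, b1, sigma, Sxx, n, xbar, x_star, alpha=0.05):
    """Two-sided 1 - alpha prediction interval for a future y at x_star."""
    fit = b0 + b1 * x_star
    se = sigma * sqrt(1.0 + 1.0 / n + (x_star - xbar) ** 2 / Sxx)
    crit = t.ppf(1 - alpha / 2, df=n - 2)
    return fit - crit * se, fit + crit * se
```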
12.5.2 Examples(1/2)
• Example 3 : Car Plant Electricity Usage
With $t_{0.025,10} = 2.228$, a 95% prediction interval for $y|_{x^*}$ is
$$y|_{x^*} \in \left(0.409 + 0.499x^* - 2.228 \times 0.1729 \times \sqrt{\frac{13}{12} + \frac{(x^* - 4.885)^2}{4.8723}},\right.$$
$$\left.\quad 0.409 + 0.499x^* + 2.228 \times 0.1729 \times \sqrt{\frac{13}{12} + \frac{(x^* - 4.885)^2}{4.8723}}\right).$$
At $x^* = 5$ this gives
$$y|_5 \in (0.409 + (0.499 \times 5) - 0.401,\ 0.409 + (0.499 \times 5) + 0.401) = (2.50, 3.30).$$
12.5.2 Examples(2/2)
12.6 The Analysis of Variance Table
12.6.1 Sum of Squares Decomposition(1/5)
12.6.1 Sum of Squares Decomposition(2/5)
12.6.1 Sum of Squares Decomposition(3/5)
| Source     | Degrees of freedom | Sum of squares | Mean squares                        | F-statistic | p-value            |
|------------|--------------------|----------------|-------------------------------------|-------------|--------------------|
| Regression | 1                  | SSR            | MSR = SSR                           | F = MSR/MSE | $P(F_{1,n-2} > F)$ |
| Error      | n − 2              | SSE            | $\hat\sigma^2$ = MSE = SSE/(n − 2)  |             |                    |
| Total      | n − 1              | SST            |                                     |             |                    |

Figure 12.41: Analysis of variance table for simple linear regression analysis.
12.6.1 Sum of Squares Decomposition(4/5)
12.6.1 Sum of Squares Decomposition(5/5)
Coefficient of Determination $R^2$
The total variability in the dependent variable, the total sum of squares
$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$
can be partitioned into the variability explained by the regression line, the regression sum of squares
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,$$
and the variability about the regression line, the error sum of squares
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
The proportion of the total variability accounted for by the regression line is the coefficient of determination
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = \frac{1}{1 + \mathrm{SSE}/\mathrm{SSR}},$$
which takes a value between zero and one.
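A sketch computing these ANOVA quantities from data (numpy/scipy, with fitted values y_hat from a fit such as the one sketched earlier):

```python
import numpy as np
from scipy.stats import f

def anova_table(y, y_hat):
    """Return (SSR, SSE, SST, F, p_value, R2) for a simple linear regression."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    sse = np.sum((y - y_hat) ** 2)     # error sum of squares
    ssr = sst - sse                    # regression sum of squares
    F = ssr / (sse / (n - 2))          # MSR / MSE with MSR = SSR / 1
    p_value = f.sf(F, 1, n - 2)        # P(F_{1,n-2} > F)
    return ssr, sse, sst, F, p_value, ssr / sst
```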
12.6.2 Examples(1/1)
• Example 3 : Car Plant Electricity Usage
$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{1.2124}{0.0299} = 40.53$$
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{1.2124}{1.5115} = 0.802$$
12.7 Residual Analysis
12.7.1 Residual Analysis Methods(1/7)
• The residuals are defined to be
$$e_i = y_i - \hat{y}_i, \quad 1 \leq i \leq n,$$
so that they are the differences between the observed values of the dependent variable $y_i$ and the corresponding fitted values $\hat{y}_i$.
• A property of the residuals:
$$\sum_{i=1}^{n} e_i = 0$$
• Residual analysis (see the sketch after this list) can be used to
– Identify data points that are outliers,
– Check whether the fitted model is appropriate,
– Check whether the error variance is constant, and
– Check whether the error terms are normally distributed.
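A minimal sketch of a residual plot with numpy and matplotlib (x, y, and the fitted coefficients b0, b1 are assumed from an earlier fit):

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_plot(x, y, b0, b1):
    """Plot the residuals e_i = y_i - y_hat_i against x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    e = y - (b0 + b1 * x)  # residuals; these sum to (numerically) zero
    plt.scatter(x, e)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel("x")
    plt.ylabel("residual")
    plt.show()
```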
12.7.1 Residual Analysis Methods(2/7)
• A nice random scatter plot, such as the one in Figure 12.45, indicates that there are no problems with the regression analysis.
• Any patterns in the residual plot, or any residuals with a large absolute value, alert the experimenter to possible problems with the fitted regression model.
12.7.1 Residual Analysis Methods(3/7)
• A data point $(x_i, y_i)$ can be considered to be an outlier if it is not predicted well by the fitted model. Residuals of outliers have a large absolute value, as indicated in Figure 12.46. Note in the figure that $e_i/\hat\sigma$ is used instead of $e_i$.
• [For your interest only]
$$\mathrm{Var}(e_i) = \left(1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{XX}}\right)\sigma^2.$$
12.7.1 Residual Analysis Methods(4/7)
• If the residual plot shows positive and negative residuals grouped together, as in Figure 12.47, then a linear model is not appropriate. As Figure 12.47 indicates, a nonlinear model is needed for such a data set.
12.7.1 Residual Analysis Methods(5/7)
• If the residual plot shows a “funnel shape”, as in Figure 12.48, so that the size of the residuals depends upon the value of the explanatory variable $x$, then the assumption of a constant error variance $\sigma^2$ is not valid.
12.7.1 Residual Analysis Methods(6/7)
• A normal probability plot (a normal score plot) of the residuals checks whether the error terms $\epsilon_i$ appear to be normally distributed.
• The normal score of the $i$th smallest residual is
$$\Phi^{-1}\left(\frac{i - \frac{3}{8}}{n + \frac{1}{4}}\right).$$
• If the main body of the points in a normal probability plot lies approximately on a straight line, as in Figure 12.49, the normality assumption is reasonable.
• A form such as that in Figure 12.50 indicates that the distribution is not normal.
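A sketch of a normal probability plot of the residuals using this normal score formula (numpy, scipy, and matplotlib):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def normal_probability_plot(e):
    """Plot the sorted residuals against their normal scores."""
    e = np.sort(np.asarray(e, float))
    n = len(e)
    i = np.arange(1, n + 1)
    scores = norm.ppf((i - 3.0 / 8.0) / (n + 1.0 / 4.0))  # normal scores
    plt.scatter(scores, e)
    plt.xlabel("normal score")
    plt.ylabel("residual")
    plt.show()
```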
12.7.1 Residual Analysis Methods(7/7)
12.7.2 Examples(1/2)
• Example : Nile River Flowrate
12.7.2 Examples(2/2)
At $x = 3.88$:
$$\hat{y}|_{3.88} = -0.470 + (0.836 \times 3.88) = 2.77$$
$$e_i = y_i - \hat{y}_i = 4.01 - 2.77 = 1.24, \qquad \frac{e_i}{\sqrt{0.1092}} = \frac{1.24}{0.3304} = 3.75$$
At $x = 6.13$:
$$e_i = y_i - \hat{y}_i = 5.67 - (-0.470 + (0.836 \times 6.13)) = 1.02, \qquad \frac{e_i}{\sqrt{0.1092}} = \frac{1.02}{0.3304} = 3.07$$
These two large standardized residuals identify the corresponding data points as potential outliers.
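A short check of these standardized residuals in Python (the fitted line and the residual variance estimate 0.1092 are taken from the example; treating the variance as common to both points is an assumption):

```python
from math import sqrt

b0, b1 = -0.470, 0.836  # fitted Nile River flowrate line (from the example)
var_e = 0.1092          # estimated residual variance (from the example)

for x, y in [(3.88, 4.01), (6.13, 5.67)]:
    e = y - (b0 + b1 * x)                          # residual
    print(round(e, 2), round(e / sqrt(var_e), 2))  # 1.24 3.74 and 1.02 3.07
                                                   # (3.74 vs. the text's 3.75 is rounding)
```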
12.8 Variable Transformations
12.8.1 Intrinsically Linear Models
[Slides (1/4)–(4/4): figures illustrating intrinsically linear models and their linearizing variable transformations; not reproduced.]
12.8.2 Examples
• Example : Roadway Base Aggregates
[Slides (1/5)–(5/5): figures for the roadway base aggregates example; not reproduced.]
12.9 Correlation Analysis
12.9.1 The Sample Correlation Coefficient
Sample Correlation Coefficient
The sample correlation coefficient $r$ for a set of paired data observations $(x_i, y_i)$ is
$$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}.$$
It measures the strength of linear association between two variables and can be thought of as an estimate of the correlation $\rho$ between the two associated random variables $X$ and $Y$.
Under the assumption that the $X$ and $Y$ random variables have a bivariate normal distribution, a test of the null hypothesis
$$H_0: \rho = 0$$
can be performed by comparing the $t$-statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
with a $t$-distribution with $n-2$ degrees of freedom. In a regression framework, this test is equivalent to testing $H_0: \beta_1 = 0$.
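A sketch of the sample correlation coefficient and this $t$-test (numpy/scipy):

```python
import numpy as np
from scipy.stats import t

def correlation_test(x, y):
    """Sample correlation r and two-sided p-value for H0: rho = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]  # sample correlation coefficient
    t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p_value = 2 * t.sf(abs(t_stat), df=n - 2)
    return r, t_stat, p_value
```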
12.9.2 Examples(1/1)
• Example : Nile River Flowrate
Since the fitted slope is positive,
$$r = \sqrt{R^2} = \sqrt{0.871} = 0.933.$$