Transcript Chapter 12

Business Statistics, 3e
by Ken Black
Chapter 12
Simple Regression & Correlation Analysis
Business Statistics: Contemporary Decision Making, 3e, by Black. © 2001 South-Western/Thomson Learning
Learning Objectives
• Compute the equation of a simple regression line from a
sample of data, and interpret the slope and intercept of the
equation.
• Understand the usefulness of residual analysis in testing the
assumptions underlying regression analysis and in
examining the fit of the regression line to the data.
• Compute a standard error of the estimate and interpret its
meaning.
• Compute a coefficient of determination and interpret it.
• Test hypotheses about the slope of the regression model and
interpret the results.
• Estimate values of Y using the regression model.
• Compute a coefficient of correlation and interpret it.
Correlation and Regression
• Correlation is a measure of the degree of
relatedness of two variables.
• Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable by another variable.
Simple Regression Analysis
• Bivariate (two-variable) linear regression: the most elementary regression model
– dependent variable, the variable to be
predicted, usually called Y
– independent variable, the predictor or
explanatory variable, usually called X
Airline Cost Data
Number of Passengers (X)   Cost ($1,000) (Y)
61                         4.28
63                         4.08
67                         4.42
69                         4.17
70                         4.48
74                         4.30
76                         4.82
81                         4.70
86                         5.11
91                         5.13
95                         5.64
97                         5.56
Scatter Plot of Airline Cost Data
[Figure: Cost ($1000), 0 to 6, plotted against Number of Passengers, 0 to 120]
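Regression Models
• Deterministic regression model: $Y = \beta_0 + \beta_1 X$
• Probabilistic regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is the random error term
$\beta_0$ and $\beta_1$ are population parameters.
$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.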
Equation of the Simple Regression Line
$$\hat{Y} = b_0 + b_1 X$$
where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
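Least Squares Analysis
$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$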
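Least Squares Analysis
$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$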
Solving for b1 and b0 of the Regression
Line for the Airline Cost Example (Part 1)
Number of Passengers (X)   Cost ($1,000) (Y)   X²        XY
61                         4.28                3,721     261.08
63                         4.08                3,969     257.04
67                         4.42                4,489     296.14
69                         4.17                4,761     287.73
70                         4.48                4,900     313.60
74                         4.30                5,476     318.20
76                         4.82                5,776     366.32
81                         4.70                6,561     380.70
86                         5.11                7,396     439.46
91                         5.13                8,281     466.83
95                         5.64                9,025     535.80
97                         5.56                9,409     539.32
ΣX = 930                   ΣY = 56.69          ΣX² = 73,764   ΣXY = 4,462.22
Solving for b1 and b0 of the Regression
Line for the Airline Cost Example (Part 2)
$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1{,}689$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1{,}689} = .0407$$
$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$
$$\hat{Y} = 1.57 + .0407X$$
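A minimal Python sketch of the same computation, assuming only the twelve (X, Y) pairs from the airline cost table; `least_squares` is an illustrative helper name, not something defined in the text. Because it keeps full precision instead of rounding b1 to .0407 before computing b0, it returns b0 ≈ 1.5698 and b1 ≈ 0.040702, the unrounded coefficients shown in the MINITAB output for this example.

```python
# Airline cost data from the slides: X = number of passengers, Y = cost ($1,000)
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def least_squares(x, y):
    """Illustrative helper (not from the text): return (b0, b1, ss_xy, ss_xx)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n   # SSXY = ΣXY - (ΣX)(ΣY)/n
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSXX = ΣX² - (ΣX)²/n
    b1 = ss_xy / ss_xx                   # sample slope
    b0 = sum_y / n - b1 * sum_x / n      # sample intercept: Ȳ - b1·X̄
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = least_squares(X, Y)
print(f"SSXY = {ss_xy:.3f}, SSXX = {ss_xx:.0f}")
print(f"Y-hat = {b0:.4f} + {b1:.6f}X")
```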
Graph of Regression Line
for the Airline Cost Example
[Figure: regression line for the airline data; Cost ($1000), 0 to 6, against Number of Passengers, 0 to 120]
Residual Analysis:
Airline Cost Example
Number of Passengers (X)   Cost ($1,000) (Y)   Predicted Value Ŷ   Residual Y - Ŷ
61                         4.28                4.053               .227
63                         4.08                4.134               -.054
67                         4.42                4.297               .123
69                         4.17                4.378               -.208
70                         4.48                4.419               .061
74                         4.30                4.582               -.282
76                         4.82                4.663               .157
81                         4.70                4.867               -.167
86                         5.11                5.070               .040
91                         5.13                5.274               -.144
95                         5.64                5.436               .204
97                         5.56                5.518               .042
Σ(Y - Ŷ) = -.001 (nonzero only because each residual is rounded to three decimals)
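The residual table can be reproduced with the short Python sketch below (data copied from the slides; variable names are illustrative). It also illustrates why the residual column totals -.001 rather than 0: least squares residuals from a model with an intercept sum to zero, and the discrepancy comes only from rounding each residual to three decimals.

```python
# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Least squares fit at full precision, using the deviation form of the formulas
x_bar, y_bar = sum(X) / n, sum(Y) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
ss_xx = sum((x - x_bar) ** 2 for x in X)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]                       # predicted values Y-hat
residuals = [y - y_hat for y, y_hat in zip(Y, fitted)]  # Y - Y-hat

for x, y, y_hat, e in zip(X, Y, fitted, residuals):
    print(f"{x:3d}  {y:.2f}  {y_hat:.3f}  {e:+.3f}")

# Unrounded least squares residuals sum to zero up to floating-point error
print(f"sum of residuals (unrounded): {sum(residuals):.1e}")
# Rounding each residual to three decimals first gives the table's -.001
print(f"sum of rounded residuals: {sum(round(e, 3) for e in residuals):.3f}")
```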
Excel Graph of Residuals
for the Airline Cost Example
[Figure: Residual, -0.3 to 0.2, plotted against Number of Passengers, 60 to 100]
Nonlinear Residual Plot
[Figure: residuals plotted against X about the zero line]
Nonconstant Error Variance
[Figure: two panels of residuals plotted against X about the zero line]
Graphs of Nonindependent Error Terms
[Figure: two panels of residuals plotted against X about the zero line]
Healthy Residual Plot
[Figure: residuals plotted against X about the zero line]
Standard Error of the Estimate
Sum of squares error:
$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$
Standard error of the estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
Determining SSE
for the Airline Cost Example
Number of Passengers (X)   Cost ($1,000) (Y)   Residual Y - Ŷ   (Y - Ŷ)²
61                         4.28                .227             .05153
63                         4.08                -.054            .00292
67                         4.42                .123             .01513
69                         4.17                -.208            .04326
70                         4.48                .061             .00372
74                         4.30                -.282            .07952
76                         4.82                .157             .02465
81                         4.70                -.167            .02789
86                         5.11                .040             .00160
91                         5.13                -.144            .02074
95                         5.64                .204             .04162
97                         5.56                .042             .00176
                                               Σ(Y - Ŷ) = -.001   Σ(Y - Ŷ)² = .31434
Sum of squares of error = SSE = .31434
Standard Error of the Estimate
for the Airline Cost Example
Sum of squares error:
$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$
Standard error of the estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$
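A Python sketch of SSE and S_e for the airline data, computing SSE both from the residuals and from the shortcut ΣY² - b0ΣY - b1ΣXY given earlier in the chapter. It works at full precision, so it should give SSE ≈ 0.3141 and S_e ≈ 0.1772, the values MINITAB reports for this example, rather than the .31434 and .1773 obtained from residuals rounded to three decimals. Variable names are illustrative.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Unrounded least squares coefficients
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# SSE two ways: directly from the residuals, and via ΣY² - b0·ΣY - b1·ΣXY
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate, n - 2 = 10 df
print(f"SSE = {sse:.4f} (shortcut: {sse_shortcut:.4f}), Se = {s_e:.4f}")
```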
Coefficient of Determination
$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$
$SS_{YY}$ = explained variation + unexplained variation
$$SS_{YY} = SSR + SSE$$
$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$
$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$
$$0 \le r^2 \le 1$$
Coefficient of Determination
for the Airline Cost Example
$$SSE = 0.31434$$
$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$
$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$
89.9% of the variability
of the cost of flying a
Boeing 737 is accounted for
by the number of passengers.
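A minimal Python sketch of the decomposition SSYY = SSR + SSE and of r² for the airline data. The variable names are illustrative, and the coefficients are recomputed at full precision rather than taken from the rounded slide values.

```python
# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
ss_xx = sum((x - x_bar) ** 2 for x in X)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

ss_yy = sum(y * y for y in Y) - sum(Y) ** 2 / n             # total variation: ΣY² - (ΣY)²/n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))  # unexplained variation
ssr = ss_yy - sse                                            # explained variation

r_squared = 1 - sse / ss_yy   # identical to ssr / ss_yy
print(f"SSYY = {ss_yy:.5f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, r^2 = {r_squared:.3f}")
```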
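Hypothesis Tests for the Slope
of the Regression Model
$$t = \frac{b_1 - \beta_1}{S_b}$$
where:
$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}, \qquad S_e = \sqrt{\frac{SSE}{n - 2}}, \qquad SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
$\beta_1$ = the hypothesized slope
df = n - 2
Possible hypotheses:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$ (two-tailed)
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$ (one-tailed)
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 < 0$ (one-tailed)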
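A Python sketch of the slope test for the airline data. The choice of a two-tailed test at α = .05 is an assumption made for illustration; the critical value t(.025, 10) = 2.228 is the one the chapter uses for its 95% intervals with n - 2 = 10 degrees of freedom. The result should agree with the T = 9.44 printed for the slope in the MINITAB output.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
ss_xx = sum((x - x_bar) ** 2 for x in X)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b = s_e / math.sqrt(ss_xx)     # standard error of the slope

beta1_hyp = 0.0                  # hypothesized slope under H0
t_stat = (b1 - beta1_hyp) / s_b  # df = n - 2 = 10

# Assumed for illustration: two-tailed test at alpha = .05, t(.025, 10) = 2.228
t_crit = 2.228
reject_h0 = abs(t_stat) > t_crit
print(f"S_b = {s_b:.6f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```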
Point Estimation
for the Airline Cost Example
$$\hat{Y} = 1.57 + 0.0407X$$
For $X = 73$:
$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411, \text{ or } \$4{,}541.10$$
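The point estimate is a single evaluation of the fitted line. In this Python sketch, `predict` is an illustrative helper name and its defaults are the rounded coefficients from the slide.

```python
def predict(x, b0=1.57, b1=0.0407):
    """Illustrative helper: point estimate Y-hat = b0 + b1*X ($1,000 units)."""
    return b0 + b1 * x


y_hat = predict(73)
print(f"Y-hat at X = 73: {y_hat:.4f} thousand dollars (${y_hat * 1000:,.2f})")
```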
Confidence Interval to Estimate Y :
Airline Cost Example


X
1
X
0

n
SS XX
2
where : X 0  a part icularvalue of X
Yˆ  t  , n  2 S e

2
 X


2
SS XX =  X
2
n
For X 0  73 and a 95% confidencelevel,
73  77.5
930
73,764
2
4.5411 2.2280.1773
1

12
2
12
 4.5411 1220
4.4191 E Y 73   4.6631
Confidence Interval to Estimate the
Average Value of Y for Some Values of X:
Airline Cost Example
X     Confidence Interval
62    4.0934 ± .1876    3.9058 to 4.2810
68    4.3376 ± .1461    4.1915 to 4.4837
73    4.5411 ± .1220    4.4191 to 4.6631
85    5.0295 ± .1349    4.8946 to 5.1644
90    5.2330 ± .1656    5.0674 to 5.3986
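A Python sketch that regenerates the interval table from the confidence-interval formula, using the slide values b0 = 1.57, b1 = .0407, S_e = .1773, X̄ = 77.5, SSXX = 1,689, and t(.025, 10) = 2.228. The helper name `mean_response_ci` is illustrative. The half-width grows as X0 moves away from X̄ = 77.5, which is why the interval at X = 73 is the narrowest of the five.

```python
import math

# Slide values for the airline cost example
n = 12
x_bar = 930 / n                   # mean number of passengers, 77.5
ss_xx = 73764 - 930 ** 2 / n      # SSXX = 1689
b0, b1 = 1.57, 0.0407             # rounded regression coefficients
s_e = 0.1773                      # standard error of the estimate
t_crit = 2.228                    # t(.025, 10) for a 95% confidence level


def mean_response_ci(x0):
    """Illustrative helper: (Y-hat, half-width) of the 95% CI for E(Y) at X = x0."""
    y_hat = b0 + b1 * x0
    half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, half_width


for x0 in (62, 68, 73, 85, 90):
    y_hat, h = mean_response_ci(x0)
    print(f"X = {x0}: {y_hat:.4f} ± {h:.4f}  ->  {y_hat - h:.4f} to {y_hat + h:.4f}")
```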
Prediction Interval to Estimate Y
for a Given Value of X
$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$
where: $X_0$ = a particular value of X, and
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
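A Python sketch comparing the two interval formulas at X0 = 73, reusing the slide values (b0 = 1.57, b1 = .0407, S_e = .1773, t(.025, 10) = 2.228). The helper name is illustrative. The only difference is the extra 1 under the square root, which accounts for the scatter of an individual Y about its mean, so the prediction interval is always wider than the confidence interval for E(Y) at the same X0.

```python
import math

# Slide values for the airline cost example
n = 12
x_bar = 930 / n                   # 77.5
ss_xx = 73764 - 930 ** 2 / n      # 1689
b0, b1 = 1.57, 0.0407
s_e = 0.1773                      # standard error of the estimate
t_crit = 2.228                    # t(.025, 10), 95% level


def interval_half_widths(x0):
    """Illustrative helper: (CI half-width for E(Y), PI half-width for one Y) at X = x0."""
    core = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s_e * math.sqrt(core)      # mean response
    pi_half = t_crit * s_e * math.sqrt(1 + core)  # extra 1: spread of a single Y
    return ci_half, pi_half


x0 = 73
y_hat = b0 + b1 * x0
ci_half, pi_half = interval_half_widths(x0)
print(f"95% CI for E(Y) at X = {x0}: {y_hat - ci_half:.4f} to {y_hat + ci_half:.4f}")
print(f"95% PI for Y at X = {x0}:    {y_hat - pi_half:.4f} to {y_hat + pi_half:.4f}")
```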
Confidence Intervals for Estimation
Regression Plot
[Figure: Cost, 4 to 6, against Number of Passengers, 60 to 100, showing the regression line with 95% CI and 95% PI bands]
MINITAB Regression Analysis of
the Airline Cost Example
The regression equation is
Cost = 1.57 + 0.0407 Number of Passengers

Predictor        Coef      StDev      T      P
Constant       1.5698     0.3381   4.64  0.001
Number o     0.040702   0.004312   9.44  0.000

S = 0.1772      R-Sq = 89.9%      R-Sq(adj) = 88.9%

Analysis of Variance

Source           DF       SS       MS       F      P
Regression        1   2.7980   2.7980   89.09  0.000
Residual Error   10   0.3141   0.0314
Total            11   3.1121

Obs  Number o     Cost      Fit  StDev Fit  Residual  St Resid
  1      61.0   4.2800   4.0526     0.0876    0.2274      1.48
  2      63.0   4.0800   4.1340     0.0808   -0.0540     -0.34
  3      67.0   4.4200   4.2968     0.0683    0.1232      0.75
  4      69.0   4.1700   4.3782     0.0629   -0.2082     -1.26
  5      70.0   4.4800   4.4189     0.0605    0.0611      0.37
  6      74.0   4.3000   4.5817     0.0533   -0.2817     -1.67
  7      76.0   4.8200   4.6631     0.0516    0.1569      0.93
  8      81.0   4.7000   4.8666     0.0533   -0.1666     -0.99
  9      86.0   5.1100   5.0701     0.0629    0.0399      0.24
 10      91.0   5.1300   5.2736     0.0775   -0.1436     -0.90
 11      95.0   5.6400   5.4364     0.0912    0.2036      1.34
 12      97.0   5.5600   5.5178     0.0984    0.0422      0.29
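Most of the MINITAB output can be rebuilt from the raw data, as in this Python sketch (variable names are illustrative). It also checks that, in simple regression, the ANOVA F equals the square of the slope's t statistic. The per-observation columns rely on formulas the text does not give, which are assumed here: leverage h = 1/n + (X - X̄)²/SSXX, StDev Fit = S√h, and St Resid = residual / (S√(1 - h)); they reproduce the printed values, for example 0.0876 and 1.48 for the first observation.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
ss_xx = sum((x - x_bar) ** 2 for x in X)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
ss_yy = sum((y - y_bar) ** 2 for y in Y)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Analysis of variance
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
sse = sum(e * e for e in residuals)
ssr = ss_yy - sse
df_reg, df_err = 1, n - 2
mse = sse / df_err
s = math.sqrt(mse)                     # "S" in the output
f_stat = (ssr / df_reg) / mse
r_sq = ssr / ss_yy
r_sq_adj = 1 - mse / (ss_yy / (n - 1))
t_slope = b1 / (s / math.sqrt(ss_xx))  # "T" for the slope

# Assumed per-observation formulas (not stated in the text)
leverage = [1 / n + (x - x_bar) ** 2 / ss_xx for x in X]
stdev_fit = [s * math.sqrt(h) for h in leverage]
st_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

print(f"S = {s:.4f}   R-Sq = {r_sq:.1%}   R-Sq(adj) = {r_sq_adj:.1%}")
print(f"Regression      {df_reg:2d}  {ssr:.4f}  {ssr / df_reg:.4f}  F = {f_stat:.2f}")
print(f"Residual Error  {df_err:2d}  {sse:.4f}  {mse:.4f}")
print(f"Total           {n - 1:2d}  {ss_yy:.4f}")
print(f"t = {t_slope:.2f}, t^2 = {t_slope ** 2:.2f} (equal to F)")
for obs, (x, sf, sr) in enumerate(zip(X, stdev_fit, st_resid), start=1):
    print(f"{obs:3d}  {x:5.1f}  StDev Fit = {sf:.4f}  St Resid = {sr:+.2f}")
```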
Pearson Product-Moment Correlation Coefficient
$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$
$$-1 \le r \le 1$$
Three Degrees of Correlation
[Figure: three scatter plots, one each for r < 0, r > 0, and r = 0]
Computation of r for
the Economics Example (Part 1)
Day          Interest (X)   Futures Index (Y)   X²        Y²         XY
1            7.43           221                 55.205    48,841     1,642.03
2            7.48           222                 55.950    49,284     1,660.56
3            8.00           226                 64.000    51,076     1,808.00
4            7.75           225                 60.063    50,625     1,743.75
5            7.60           224                 57.760    50,176     1,702.40
6            7.63           223                 58.217    49,729     1,701.49
7            7.68           223                 58.982    49,729     1,712.64
8            7.67           226                 58.829    51,076     1,733.42
9            7.59           226                 57.608    51,076     1,715.34
10           8.07           235                 65.125    55,225     1,896.45
11           8.03           233                 64.481    54,289     1,870.99
12           8.00           241                 64.000    58,081     1,928.00
Summations   92.93          2,725               720.220   619,207    21,115.07
Computation of r
for the Economics Example (Part 2)
$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$
$$r = \frac{21{,}115.07 - \frac{(92.93)(2{,}725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2{,}725)^2}{12}\right]}} = .815$$
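A minimal Python sketch of the computational formula applied to the economics data (copied from the table above); `pearson_r` is an illustrative helper name. It gives r ≈ 0.8153, in line with the .815 above and the 0.815254 in the correlation matrix for this example.

```python
import math

# Economics example data from the slides
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]  # X
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]               # Y


def pearson_r(x, y):
    """Illustrative helper: r = SSXY / sqrt(SSXX * SSYY), computational form."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    ss_yy = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(interest, futures)
print(f"r = {r:.3f}")
```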
Scatter Plot and Correlation Matrix
for the Economics Example
[Figure: Futures Index, 220 to 245, plotted against Interest, 7.40 to 8.20]
Correlation matrix:
                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
Covariance
$$\sigma_{XY} = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{N} = \frac{SS_{XY}}{N}$$
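A Python sketch of covariance for the economics data; the function name and its `sample` flag are illustrative. The covariance formula divides SSXY by N, the population form. Dividing by n - 1 instead gives the sample covariance, and that version is the one that reproduces the 1.11053 and the diagonal entries (0.050408 and 36.81060606) of the covariance matrix for this example.

```python
# Economics example data from the slides
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]


def covariance(x, y, sample=False):
    """Illustrative helper: SSXY / N (population form), or SSXY / (n - 1) if sample=True."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / (n - 1 if sample else n)


print(f"population covariance (divide by N):  {covariance(interest, futures):.6f}")
print(f"sample covariance (divide by n - 1):  {covariance(interest, futures, sample=True):.5f}")
# The covariance of a variable with itself is its variance (the matrix diagonal)
print(f"sample variance of Interest:          {covariance(interest, interest, sample=True):.6f}")
```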
Covariance Matrix and Descriptive
Statistics for the Economics Example
Covariance matrix:
                 Interest    Futures Index
Interest         0.050408
Futures Index    1.11053     36.81060606

Descriptive statistics:
                          Interest      Futures Index
Mean                      7.74416667    227.08
Standard Error            0.06481276    1.7514
Median                    7.675         225.5
Mode                      8             226
Standard Deviation        0.224518      6.0672
Sample Variance           0.05040833    36.811
Kurtosis                  -1.4077097    1.2427
Skewness                  0.3197374     1.3988
Range                     0.64          20
Minimum                   7.43          221
Maximum                   8.07          241
Sum                       92.93         2725
Count                     12            12
Confidence Level(95.0%)   0.14265201    3.8549
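Most rows of the descriptive statistics table can be checked with Python's standard `statistics` module, whose `stdev` and `variance` divide by n - 1, as the table's Sample Variance row does. This sketch leaves out kurtosis, skewness, and the 95% confidence level, which need additional formulas or a t table; `describe` is an illustrative helper name.

```python
import math
import statistics

# Economics example data from the slides
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]


def describe(data):
    """Illustrative helper mirroring several rows of the descriptive statistics table."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for name, data in (("Interest", interest), ("Futures Index", futures)):
    print(name)
    for label, value in describe(data).items():
        print(f"  {label:<20}{value:>14.4f}")
```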