Transcript Econ 3780: Business and Economics Statistics
Econ 3790: Statistics Business and Economics
Instructor: Yogesh Uppal Email: [email protected]
Chapter 14
Covariance and Simple Correlation Coefficient Simple Linear Regression
Covariance
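The covariance between x and y is a measure of the linear relationship between x and y: it is positive when the two tend to move together and negative when they tend to move in opposite directions.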
$\mathrm{cov}(x, y) = \dfrac{SS_{xy}}{n - 1} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$
Covariance Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Covariance Example: Reed Auto Sales
Number of TV Ads (x)      1    3    2    1    3
Number of Cars Sold (y)   14   24   18   17   27
Covariance
x            y             $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})(y - \bar{y})$
1            14            -1              -6              6
3            24            1               4               4
2            18            0               -2              0
1            17            -1              -3              3
3            27            1               7               7
Total = 10   Total = 100                                   $SS_{xy} = 20$

$\mathrm{cov}(x, y) = \dfrac{SS_{xy}}{n - 1} = \dfrac{20}{5 - 1} = 5$
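The covariance calculation for the Reed Auto data can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the names `ads`, `cars`, and `sample_covariance` are illustrative and do not come from the slides.

```python
# Reed Auto sample from the slides: TV ads (x) and cars sold (y).
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def sample_covariance(x, y):
    """Return SS_xy / (n - 1) for two equal-length samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SS_xy: sum of cross-products of deviations from the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / (n - 1)


cov_xy = sample_covariance(ads, cars)
print(cov_xy)  # 5.0
```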
Simple Correlation Coefficient
• Simple Population Correlation Coefficient
$\rho = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}, \quad -1 \le \rho \le 1$
• If $\rho < 0$, there is a negative relationship between x and y.
• If $\rho > 0$, there is a positive relationship between x and y.
Simple Correlation Coefficient
Since the population standard deviations of x and y are not known, we use their sample estimates to compute an estimate of $\rho$.
$r = \dfrac{\mathrm{cov}(x, y)}{s_x s_y}, \quad -1 \le r \le 1$
Simple Correlation Coefficient
Example: Reed Auto Sales
x            y             $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})^2$   $(y - \bar{y})^2$
1            14            -1              -6              1                   36
3            24            1               4               1                   16
2            18            0               -2              0                   4
1            17            -1              -3              1                   9
3            27            1               7               1                   49
Total = 10   Total = 100                                   $SS_x = 4$          $SS_y = 114$
Simple Correlation Coefficient
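$s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{4}{4}} = 1$
$s_y = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\dfrac{114}{4}} = 5.34$
$r_{xy} = \dfrac{\mathrm{cov}(x, y)}{s_x s_y} = \dfrac{5}{1 \times 5.34} = 0.936$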
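As a rough check on the correlation calculation, the same steps can be written out in Python. This is a sketch under the assumption that the sample standard deviations use the $n - 1$ divisor, as on the slides; the function name `sample_correlation` is hypothetical.

```python
import math

# Reed Auto sample from the slides.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def sample_correlation(x, y):
    """Return r = cov(x, y) / (s_x * s_y) using n - 1 divisors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    cov = ss_xy / (n - 1)
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    return cov / (s_x * s_y)


r_xy = sample_correlation(ads, cars)
print(round(r_xy, 4))
```

Without intermediate rounding the result is about 0.9366; the slide value of 0.936 comes from rounding $s_y$ to 5.34 before dividing.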
Chapter 14 Simple Linear Regression
• Simple Linear Regression Model
• Residual Analysis
• Coefficient of Determination
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
$y = \beta_0 + \beta_1 x + \varepsilon$
where $\beta_0$ and $\beta_1$ are called parameters of the model, and $\varepsilon$ is a random variable called the error term.
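Simple Linear Regression Equation: Positive Linear Relationship
[Figure: regression line with y on the vertical axis and x on the horizontal axis; intercept $\beta_0$, slope $\beta_1$ is positive.]
Simple Linear Regression Equation: Negative Linear Relationship
[Figure: regression line with y on the vertical axis and x on the horizontal axis; intercept $\beta_0$, slope $\beta_1$ is negative.]
Simple Linear Regression Equation: No Relationship
[Figure: regression line with y on the vertical axis and x on the horizontal axis; intercept $\beta_0$, slope $\beta_1$ is 0.]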
Interpretation of $\beta_0$ and $\beta_1$
$\beta_0$ (intercept parameter) is the expected value of y when x = 0.
$\beta_1$ (slope parameter) is the change in the expected value of y when x increases by 1 unit.
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$
• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given value of x.
Estimation Process
• Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
• Regression Equation: $E(y|x) = \beta_0 + \beta_1 x$
• Unknown Parameters: $\beta_0$, $\beta_1$
• Sample Data: $(x_1, y_1), \ldots, (x_n, y_n)$
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
• $b_0$ and $b_1$ provide point estimates of $\beta_0$ and $\beta_1$.
Least Squares Method
Slope for the Estimated Regression Equation
$b_1 = \dfrac{SS_{xy}}{SS_x}$
where $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $SS_x = \sum (x - \bar{x})^2$.
Least Squares Method
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$
Estimated Regression Equation
Example: Reed Auto Sales
Slope for the Estimated Regression Equation
$b_1 = \dfrac{SS_{xy}}{SS_x} = \dfrac{20}{4} = 5$
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
Estimated Regression Equation
$\hat{y} = 10 + 5x$
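The least squares slope and intercept for the Reed Auto data can be reproduced with a short Python sketch. The helper name `fit_line` is an assumption made for illustration; it simply applies $b_1 = SS_{xy}/SS_x$ and $b_0 = \bar{y} - b_1 \bar{x}$.

```python
# Reed Auto sample from the slides.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def fit_line(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x          # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_line(ads, cars)
print(b0, b1)  # 10.0 5.0
```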
Scatter Diagram and Regression Line
[Figure: scatter diagram of the Reed Auto data with number of ads on the horizontal axis, together with the estimated regression line $\hat{y} = 10 + 5x$.]
Estimate of Residuals
x    y    $\hat{y}$   $e = y - \hat{y}$
1    14   15          -1.0
3    24   25          -1.0
2    18   20          -2.0
1    17   15          2.0
3    27   25          2.0
Decomposition of total sum of squares
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Decomposition of total sum of squares
$\hat{y}$   $e = y - \hat{y}$   $(y - \hat{y})^2$   $\hat{y} - \bar{y}$   $(\hat{y} - \bar{y})^2$
15          -1                  1                   -5                    25
25          -1                  1                   5                     25
20          -2                  4                   0                     0
15          2                   4                   -5                    25
25          2                   4                   5                     25
                                SSE = 14                                  SSR = 100

Check that SST = SSR + SSE: 114 = 100 + 14.
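The decomposition can be verified numerically. The sketch below takes the fitted line $\hat{y} = 10 + 5x$ from the slides as given and computes the residuals and the three sums of squares; the variable names are illustrative only.

```python
# Reed Auto sample and the fitted line from the slides.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]                  # y-hat values
residuals = [y - yh for y, yh in zip(cars, fitted)]  # e = y - y-hat

sse = sum(e ** 2 for e in residuals)                 # error sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in fitted)        # regression sum of squares
sst = sum((y - y_bar) ** 2 for y in cars)            # total sum of squares

print(residuals)      # [-1.0, -1.0, -2.0, 2.0, 2.0]
print(sse, ssr, sst)  # 14.0 100.0 114.0
```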
Coefficient of Determination
The coefficient of determination is:
$r^2 = SSR/SST$
$r^2 = SSR/SST = 100/114 = 0.8772$
• The regression relationship is very strong; about 88% of the variability in the number of cars sold can be explained by the number of TV ads.
• The coefficient of determination ($r^2$) is also the square of the correlation coefficient ($r$).
Sample Correlation Coefficient
$r = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r = (\text{sign of } b_1)\sqrt{r^2}$
$r = +\sqrt{0.8772} = +0.936$
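The link between $r^2$ and $r$ can be sketched in Python by taking SSR, SST, and the slope $b_1$ from the Reed Auto example as inputs. Using `math.copysign` to attach the sign of $b_1$ is one possible way to write this, not something taken from the slides.

```python
import math

# Values from the Reed Auto example on the slides.
ssr, sst = 100.0, 114.0
b1 = 5.0

r_squared = ssr / sst
# r carries the sign of the slope estimate b1.
r = math.copysign(math.sqrt(r_squared), b1)

print(round(r_squared, 4))  # 0.8772
```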
Sampling Distribution of $b_1$
Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$.
$s^2 = MSE = SSE/(n - 2)$
where: $SSE = \sum (y_i - \hat{y}_i)^2$
Interval Estimate of $\beta_1$:
$b_1 \pm t_{\alpha/2}\, SE(b_1)$
Example: Reed Auto Sales
$5 \pm 3.182(1.08) = 5 \pm 3.44$
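The interval for the slope can be reproduced step by step. The sketch below assumes the usual formula $SE(b_1) = s / \sqrt{SS_x}$ with $s = \sqrt{MSE}$, which is not written out on the slides, and takes the critical value 3.182 (3 degrees of freedom) from the example rather than computing it.

```python
import math

# Quantities from the Reed Auto example.
sse = 14.0      # error sum of squares
ss_x = 4.0      # sum of squared deviations of x
n = 5           # sample size
b1 = 5.0        # slope estimate
t_crit = 3.182  # critical t value with n - 2 = 3 df, taken from the slides

mse = sse / (n - 2)          # estimate of sigma^2
s = math.sqrt(mse)           # standard error of the estimate
se_b1 = s / math.sqrt(ss_x)  # assumed formula for SE(b1)

margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(round(se_b1, 2))   # 1.08
print(round(margin, 2))  # 3.44
```

Under these assumptions the interval runs from roughly 1.56 to 8.44, which does not contain 0.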
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
Test Statistic
$t = \dfrac{b_1 - 0}{SE(b_1)}$
where $b_1$ is the slope estimate and $SE(b_1)$ is the standard error of $b_1$.
Testing for Significance: t Test
Rejection Rule
Reject $H_0$ if p-value $< \alpha$, or $t < -t_{\alpha/2}$, or $t > t_{\alpha/2}$
where: $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.
Testing for Significance: t Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
2. Specify the level of significance.
$\alpha = .05$
3. Select the test statistic.
$t = \dfrac{b_1}{SE(b_1)}$
4. State the rejection rule.
Reject $H_0$ if p-value < .05, or t ≤ -3.182, or t ≥ 3.182.
Testing for Significance: t Test
5. Compute the value of the test statistic.
$t = \dfrac{b_1}{SE(b_1)} = \dfrac{5}{1.08} = 4.63$
6. Determine whether to reject $H_0$.
$t = 4.63 > t_{\alpha/2} = 3.182$. We can reject $H_0$.
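Steps 5 and 6 amount to one division and one comparison. A minimal Python sketch follows, with the slope estimate, its standard error, and the critical value all taken from the example; the function name `t_statistic` is hypothetical.

```python
# Values from the Reed Auto example on the slides.
b1 = 5.0        # slope estimate
se_b1 = 1.08    # standard error of b1
t_crit = 3.182  # critical t value for alpha = .05 with 3 df


def t_statistic(estimate, std_error, hypothesized=0.0):
    """Return (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


t_stat = t_statistic(b1, se_b1)
# Two-tailed rule: reject H0 when |t| is at least the critical value.
reject_h0 = abs(t_stat) >= t_crit

print(round(t_stat, 2), reject_h0)  # 4.63 True
```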
Some Cautions about the Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.