Lesson 10:
Regressions Part I
Ka-fu Wong © 2007
ECON1003: Analysis of Economic Data
Outline
Correlation Analysis
Regression Analysis
Standard error of estimate
Confidence interval and prediction interval
Inference about the regression slope
Cautions about the interpretation of significance
Evaluating the model
Correlation Analysis
• Correlation Analysis is a group of statistical techniques used to measure the strength of the association between two variables.
• A Scatter Diagram is a chart that portrays the relationship between the two variables.
• The Dependent Variable is the variable being predicted or estimated.
• The Independent Variable provides the basis for estimation. It is the predictor variable.

Example
• Suppose a university administrator wishes to determine whether any relationship exists between a student's score on an entrance examination and that student's cumulative GPA. A sample of eight students is taken. The results are shown below.

Student   Exam Score   GPA
A         74           2.6
B         69           2.2
C         85           3.4
D         63           2.3
E         82           3.1
F         60           2.1
G         79           3.2
H         91           3.8

[Figure: Scatter Diagram: GPA vs. Exam Score. Horizontal axis: Exam Score (50 to 95). Vertical axis: Cumulative GPA (2.00 to 4.00).]
We would like to know whether there is a strong linear relationship between the two variables.

The Coefficient of Correlation, r
• The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.
• It can range from -1.00 to 1.00.
  - Values of -1.00 or 1.00 indicate perfect and strong correlation.
  - Values close to 0.0 indicate weak correlation.
  - Negative values indicate an inverse relationship and positive values indicate a direct relationship.

Formula for r
We calculate the coefficient of correlation from the following formulas.

$$ r = \frac{s_{xy}}{s_x s_y} = \frac{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2\right]}} $$

Here s_xy is the sample covariance between x and y, s_x is the sample standard deviation of x, and s_y is the sample standard deviation of y.

Coefficient of Determination
• The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
• It is the square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the relationship between the variables.
• Special cases:
  - No correlation: r = 0, r² = 0.
  - Perfect negative correlation: r = -1, r² = 1.
  - Perfect positive correlation: r = +1, r² = 1.

EXAMPLE 1
• Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.

Book                 Page   Price ($)
Intro to History     500    84
Basic Algebra        700    75
Intro to Psyc        800    99
Intro to Sociology   600    72
Bus. Mgt.            400    69
Intro to Biology     500    81
Fund. of Jazz        600    63
Princ. Of Nursing    800    93

Example 1 continued
[Figure: Scatter Diagram of Number of Pages and Selling Price of Text. Horizontal axis: Page (400 to 800). Vertical axis: Price ($) (60 to 100).]

Example 1 continued

Book                 Page (X)   Price ($) (Y)
Intro to History     500        84
Basic Algebra        700        75
Intro to Psyc        800        99
Intro to Sociology   600        72
Bus. Mgt.            400        69
Intro to Biology     500        81
Fund. of Jazz        600        63
Princ. Of Nursing    800        93
Total                4,900      636

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2\right]}} $$

The correlation between the number of pages and the selling price of the book is 0.614. This indicates a moderate association between the variables.

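The correlation quoted for Example 1 can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `correlation` is an illustrative choice, and the data are the eight textbooks from the table.

```python
import math

# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]


def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    # The (n - 1) divisors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)


r = correlation(pages, price)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The printed r should agree with the 0.614 reported on the slide, up to rounding.
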
EXAMPLE 1 continued
Is there a linear relation between number of pages and price of books?
• Test the hypothesis that there is no correlation in the population. Use a .02 significance level.
• Under the null hypothesis that there is no correlation in the population, the statistic

$$ t = \frac{r}{\sqrt{(1-r^2)/(n-2)}} $$

follows Student's t-distribution with (n - 2) degrees of freedom.

EXAMPLE 1 continued
• Step 1:
  H0: The correlation in the population is zero.
  H1: The correlation in the population is not zero.
• Step 2: H0 is rejected if t > 3.143 or if t < -3.143. There are 6 degrees of freedom, found by n - 2 = 8 - 2 = 6.
• Step 3: To find the value of the test statistic we use:

$$ t = \frac{r}{\sqrt{(1-r^2)/(n-2)}} = \frac{.614\sqrt{8-2}}{\sqrt{1-(.614)^2}} = 1.905 $$

• Step 4: H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.

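The four steps above can be sketched in a few lines. This is only an illustration: the critical value 3.143 is copied from Step 2 rather than computed, and the variable names are assumptions.

```python
import math

r = 0.614            # sample correlation from Example 1
n = 8                # number of textbooks in the sample
t_critical = 3.143   # two-tailed critical value quoted in Step 2 (6 df, .02 level)

# Test statistic for H0: the population correlation is zero.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed decision rule: reject if t > 3.143 or t < -3.143.
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```
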
Regression Analysis
• In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).
• The relationship between the variables is linear.

Simple Linear Regression Model
• Relationship Between Variables Is a Linear Function

$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$

  - Y_i: dependent (response) variable
  - X_i: independent (explanatory) variable
  - β0: Y intercept
  - β1: slope (= Rise/Run)
  - ε_i: random error
• β0 and β1 are unknown and, therefore, are estimated from the data.
[Figure: a straight line in the (x, y) plane with intercept β0 and slope β1 = Rise/Run.]

Finance Application: Market Model
• One of the most important applications of linear regression is the market model.
• It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market (Rm).

$$ R = \beta_0 + \beta_1 R_m + \varepsilon $$

  - R: rate of return on a particular stock
  - Rm: rate of return on some major stock index
• The beta coefficient (β1) measures how sensitive the stock's rate of return is to changes in the level of the overall market.

Estimation
Method of moments
• Given a set of data (x1,y1),…,(xn,yn), how should we estimate the parameters β0 and β1?
• Start from the simple case: suppose β1 is known to be 0. That is, we have only one parameter to estimate.
  - yi = β0 + εi
• Possible assumption #1
  - E(ε) = 5. That is, E(y) = β0 + E(ε) = β0 + 5
• Possible assumption #2 (better assumption!)
  - E(ε) = 0. That is, E(y) = β0 + E(ε) = β0
  - β0 has the interpretation of mean of y.

Estimation of β0
Method of moments
• Given a set of data (x1,y1),…,(xn,yn), how should we estimate the parameter β0?
  - yi = β0 + εi
• Assumption: E(ε) = 0. That is, E(y) = β0 + E(ε) = β0
  - β0 has the interpretation of mean of y.
• Use the sample analog #1:
  - E(ε) = 0 implies β0 = E(y)
  - Take the estimate as b0 = (y1 + y2 + … + yn)/n = ∑ yi /n
• Equivalently, apply the sample analog to E(ε) = 0 in the population, where ei = yi - b0 is the deviation of yi from the assumed value of β0:
  - Set (e1 + e2 + … + en)/n = 0
  - (y1 - b0) + (y2 - b0) + … + (yn - b0) = 0
  - (y1 + y2 + … + yn) - n b0 = 0
  - b0 = (y1 + y2 + … + yn)/n = ∑ yi /n

Estimation of β1
Method of moments
• Given a set of data (x1,y1),…,(xn,yn), how should we estimate the parameters β0 and β1?
• Start from another simple case: suppose β0 is known to be 0. That is, we have only one parameter to estimate.
  - yi = β1xi + εi
• Possible assumption #1
  - E(ε) = 0. That is, E(y) = β1E(x) + E(ε) = β1E(x) and β1 = E(y)/E(x)
• Possible assumption #2
  - E(εx) = 0. That is, E(yx) = β1E(x²) + E(εx) = β1E(x²) and β1 = E(yx)/E(x²)

Estimation of β0 and β1
Method of moments
• Given a set of data (x1,y1),…,(xn,yn), how should we estimate the parameters β0 and β1?
  - yi = β0 + β1xi + εi
• Assumption #1
  - E(ε) = 0.
  - If x is known and non-random: E(y) = β0 + β1x + E(ε) = β0 + β1x
  - If x is unknown and random: E(y) = β0 + β1E(x) + E(ε) = β0 + β1E(x)
  - This assumption gives us only one equation. Need an extra equation…
• Assumption #2…

Estimation of β0 and β1
Method of moments
• Given a set of data (x1,y1),…,(xn,yn), how should we estimate the parameters β0 and β1?
  - yi = β0 + β1xi + εi
• Assumption #1
  - E(ε) = 0 implies E(y) - β0 - β1E(x) = 0
• Assumption #2
  - How about E(εx) = 0?
  - Since Cov(ε, x) = E(εx) - E(ε)E(x) = E(εx), the assumption really implies that ε and x are uncorrelated.
  - E(εx) = 0 implies E[(y - β0 - β1x)x] = 0
  - E[yx - β0x - β1x²] = E(yx) - β0E(x) - β1E(x²) = 0
• Two equations are adequate to solve for two unknowns.

Estimation of β0 and β1
Method of moments
Two equations are adequate to solve for two unknowns.
• E(y) - β0 - β1E(x) = 0
  Multiplying through by E(x): E(y)E(x) - β0E(x) - β1E(x)² = 0
• E(yx) - β0E(x) - β1E(x²) = 0
• Subtracting the first (multiplied) equation from the second:
  E(yx) - E(y)E(x) - β1E(x²) + β1E(x)² = 0
  Cov(x,y) - β1Var(x) = 0
  β1 = Cov(x,y)/Var(x)
• E(y) - β0 - β1E(x) = 0
  β0 = E(y) - β1E(x)

Estimation of β0 and β1
Method of moments
• The two assumptions:
  - Assumption #1: E(ε) = 0
  - Assumption #2: E(εx) = 0
  imply β1 = Cov(x,y)/Var(x) and β0 = E(y) - β1E(x)
• To estimate the parameters, use the sample analog:
  - It suffices to compute the
    sample covariance between x and y, Sxy,
    sample variance of x, Sxx,
    sample mean of y, my,
    sample mean of x, mx,
    and use them to replace the corresponding population quantities.

Estimation of β0 and β1
Method of moments
• The two assumptions:
  - Assumption #1: E(ε) = 0
  - Assumption #2: E(εx) = 0
• To estimate the parameters, we can also use the sample analog of the two assumptions directly:
  - Sample analog of E(ε) = 0:
    ∑ (yi - b0 - b1xi) = 0
    ∑ yi - n b0 - b1 ∑ xi = 0
  - Sample analog of E(εx) = 0:
    ∑ (yi - b0 - b1xi) xi = 0
    ∑ (yi xi) - b0 ∑ xi - b1 ∑ xi² = 0

Estimation of β0 and β1
Which way is better?
• Matching the moments of the two assumptions directly:
  - Assumption #1: E(ε) = 0
  - Assumption #2: E(εx) = 0
• Or the implied parameters in terms of the moments:
  - β1 = Cov(x,y)/Var(x) and β0 = E(y) - β1E(x)
• It is no surprise that both approaches yield the same estimator.
  - b1 = Sxy / Sxx and b0 = my - b1 mx

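As a rough check that both routes lead to the same numbers, the sketch below computes b1 = Sxy / Sxx and b0 = my - b1 mx for the textbook data of Example 1 and then verifies that the two sample moment conditions hold. The variable names are illustrative assumptions, not notation from the slides.

```python
# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

# Sample moments that replace the population quantities.
m_x = sum(pages) / n
m_y = sum(price) / n
s_xy = sum((x - m_x) * (y - m_y) for x, y in zip(pages, price)) / (n - 1)
s_xx = sum((x - m_x) ** 2 for x in pages) / (n - 1)

# Method-of-moments estimators.
b1 = s_xy / s_xx
b0 = m_y - b1 * m_x

# Sample analogs of E(e) = 0 and E(e x) = 0, evaluated at (b0, b1).
residuals = [y - b0 - b1 * x for x, y in zip(pages, price)]
moment_1 = sum(residuals)
moment_2 = sum(e * x for e, x in zip(residuals, pages))

print(f"b0 = {b0:.2f}, b1 = {b1:.5f}")
print(f"sum e = {moment_1:.2e}, sum e*x = {moment_2:.2e}")
```

Both sums should be zero up to floating-point rounding, which is the sense in which the two approaches coincide.
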
E(ε|x) = 0: one assumption implies two
• By the law of iterated expectations:
  - E(ε|x) = 0 implies E[E(ε|x)] = E[ε] = 0 (assumption #1)
  - E(ε|x) = 0 implies E(εx|x) = 0, which implies E[E(εx|x)] = E[εx] = 0 (assumption #2)
• What does E(ε|x) = 0 mean?
  - E(ε|x) = 0
  - E[(y - β0 - β1x) | x] = 0
  - E(y|x) - β0 - β1x = 0
  - E(y|x) = β0 + β1x
  - That is, the expectation of y conditional on x is β0 + β1x.

What if the assumption E(εx) = 0 fails?
• Still have:
  - E[ε] = 0 (assumption #1)
• Need to find another moment condition, say another variable z such that E[εz] = 0 (assumption #2'). Such a variable z is an instrumental variable.

Maximum likelihood
• Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution from a given data set. That is to say, you have a sample of data x1, x2, …, xn and you want to infer the distribution of the random variable x.
• Commonly, one assumes the data are independent and identically distributed (iid) draws from a particular distribution with unknown parameters, and uses the MLE technique to create estimators for the unknown parameters that define the distribution.

Maximum likelihood
• Given a sample of data x1, x2, …, xn, we would like to find the distribution P0 such that, for all feasible distributions P, the observed sample is at least as probable under P0 as under P:
  Prob(x1,x2,…,xn | P0) ≥ Prob(x1,x2,…,xn | P)
• For example, the distribution P0 = N(μ0, σ0²)

An example: Maximum likelihood
• Consider tossing an unfair coin 80 times (i.e., we sample something like x1=H, x2=T, ..., x80=T, and count the number of HEADS "H" observed). Call the probability of tossing a HEAD p, and the probability of tossing TAILS 1-p (so here p is the unknown parameter).
• Suppose we toss 49 HEADS and 31 TAILS, and suppose the coin was taken from a box containing three coins:
  - one which gives HEADS with probability p=1/3,
  - one which gives HEADS with probability p=1/2 and
  - another which gives HEADS with probability p=2/3.
• The coins have lost their labels, so we don't know which one it was. Using maximum likelihood estimation we can calculate which coin has the largest likelihood, given the data that we observed.

An example: Maximum likelihood
• The likelihood function (defined below) takes one of three values:
  - Pr(H=49 | p=1/3) = C(80,49) (1/3)^49 (1-1/3)^31 ≈ 0.000
  - Pr(H=49 | p=1/2) = C(80,49) (1/2)^49 (1-1/2)^31 ≈ 0.012
  - Pr(H=49 | p=2/3) = C(80,49) (2/3)^49 (1-2/3)^31 ≈ 0.054
• Which coin is more likely?
  - The one with p=2/3, because among the three possibilities, the observation of the sample data has the highest probability when p=2/3.

An example: Maximum likelihood
• Now suppose we had only one coin but its p could have been any value 0 ≤ p ≤ 1. We must maximize the likelihood function
  L(p) = C(80,49) p^49 (1-p)^31
  over all possible values of 0 ≤ p ≤ 1.
• To find the maximum, take the first derivative of L(p) with respect to p and set it to zero:
  C(80,49) p^48 (1-p)^30 (49 - 80p) = 0
• Possible solutions of p are:
  - p=0, p=1 and p=49/80.
• L(0) = L(1) = 0 and L(49/80) > 0. Hence the maximum likelihood estimator for p is 49/80.

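The coin example can be reproduced numerically. The sketch below first evaluates the likelihood for the three labelled coins and then searches a grid of candidate p values; the grid spacing of 0.0005 and the use of the log-likelihood (which has the same maximizer as L(p)) are choices made here for illustration.

```python
import math

heads, tails = 49, 31
n = heads + tails


def likelihood(p):
    """Binomial likelihood L(p) = C(80,49) p^49 (1-p)^31."""
    return math.comb(n, heads) * p ** heads * (1 - p) ** tails


# Case 1: only three coins are possible.
three_coins = {p: likelihood(p) for p in (1 / 3, 1 / 2, 2 / 3)}
best_coin = max(three_coins, key=three_coins.get)


def log_likelihood(p):
    """Log of L(p) without the constant C(80,49); same maximizer as L(p)."""
    return heads * math.log(p) + tails * math.log(1 - p)


# Case 2: p may be any value strictly between 0 and 1 (searched on a grid).
grid = [i / 2000 for i in range(1, 2000)]
p_hat = max(grid, key=log_likelihood)

print(f"best of the three coins: p = {best_coin:.4f}")
print(f"grid maximizer: p = {p_hat:.4f}")
```

The grid maximizer should coincide with the calculus solution p = 49/80, because that value happens to lie on the grid.
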
[Figure: plot of L(p) = C(80,49) p^49 (1-p)^31 over all possible values of 0 ≤ p ≤ 1.]

To estimate β0 and β1 using ML
• Assume the εi are independent and identically distributed normal with zero mean and variance σ². Denote the normal density of εi by
  - f(εi) = f(yi - β0 - β1xi), where εi = yi - β0 - β1xi and f is the normal density (which has a rather ugly formula).
• The joint likelihood of observing ε1, ε2, …, εn:
  - L = f(ε1)*f(ε2)*…*f(εn)

To estimate β0 and β1 using ML (Computer)
• We do not know β0 and β1. Nor do we know εi. In fact, our objective is to estimate β0 and β1.
• The procedure of ML:
  1. Assume a combination of β0 and β1, call it b0 and b1. Compute the implied ei = yi-b0-b1xi and f(ei) = f(yi-b0-b1xi).
  2. Compute the joint likelihood conditional on the assumed values of b0 and b1:
     L(b0,b1) = f(e1)*f(e2)*…*f(en)
• Assume many more combinations of β0 and β1, and repeat the above two steps, using a computer program (such as Excel).
• Choose the b0 and b1 that yield the largest joint likelihood.

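The trial-and-error procedure above can be sketched as a grid search. The code below is an illustration only: it uses the textbook data of Example 1, assumes a fixed σ = 10 (the maximizing b0 and b1 do not depend on this choice), and the grid ranges and step sizes are arbitrary assumptions. It works with the log of the joint likelihood, which has the same maximizer.

```python
import math

# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

SIGMA = 10.0  # assumed error standard deviation, for illustration only


def log_likelihood(b0, b1, sigma=SIGMA):
    """Log of L(b0,b1) = f(e1)*...*f(en) with a normal density f."""
    total = 0.0
    for x, y in zip(pages, price):
        e = y - b0 - b1 * x  # implied error for this (b0, b1)
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - e ** 2 / (2 * sigma ** 2)
    return total


# Step through many combinations of b0 and b1 (assumed grid).
b0_grid = [40 + 0.5 * i for i in range(33)]         # 40.0, 40.5, ..., 56.0
b1_grid = [0.040 + 0.0005 * j for j in range(41)]   # 0.0400, ..., 0.0600
candidates = [(b0, b1) for b0 in b0_grid for b1 in b1_grid]

# Choose the combination with the largest joint likelihood.
best = max(candidates, key=lambda c: log_likelihood(*c))
print(f"grid ML estimate: b0 = {best[0]:.1f}, b1 = {best[1]:.4f}")
```

A finer grid moves the result closer to the exact solution; the grid answer can never have a higher likelihood than the exact maximizer.
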
To estimate β0 and β1 using ML (Calculus)
• The procedure of ML:
  1. Assume a combination of β0 and β1, call it b0 and b1. Compute the implied ei = yi-b0-b1xi and f(ei) = f(yi-b0-b1xi).
  2. Compute the joint likelihood conditional on the assumed values of b0 and b1:
     L(b0,b1) = f(e1)*f(e2)*…*f(en)
• Choose b0 and b1 to maximize the likelihood function L(b0,b1), using calculus.
  - Take the first derivative of L(b0,b1) with respect to b0, set it to zero.
  - Take the first derivative of L(b0,b1) with respect to b1, set it to zero.
  - Solve for b0 and b1 using the two equations.

MM versus ML in the estimation of β0 and β1
• It turns out that in this special case, the estimators b0 and b1 are the same as the ones obtained by the method of moments and OLS (to be discussed later).

Alternative assumption
X and Y are bivariate normal
Suppose (X1, X2) ~ BVN(μ1, μ2, σ1², σ2², σ12).
• The marginal distribution of X1 is normal:
  - X1 ~ N(μ1, σ1²)
• The conditional distribution of X2 given x1 is normal:
  - X2|x1 ~ N(α + βx1, σ²), so that E(X2|X1) = α + βX1,
    where α = μ2 - βμ1, β = ρσ2/σ1 = σ12/σ1²,
    σ² = σ2²(1 - ρ²) = σ2² - β²σ1²
Reference: Goldberger, Arthur S. (1991): A Course in Econometrics, Harvard University Press. See page 75.

Estimation
Ordinary least squares
• For each value of X, there is a group of Y values, and these Y values are normally distributed.
  Yi ~ N(E(Y|X), σi²), i = 1,2,…,n
• The means of these normal distributions of Y values all lie on the straight line of regression.
  E(Y|X) = β0 + β1X
• The standard deviations of these normal distributions are equal (i.e., homoskedasticity).
  σi² = σ², i = 1,2,…,n

Assumptions Underlying Linear Regression
• yi and yk are independently drawn from the population, say, as in sampling with replacement.
  - Cov(εi, εj) = 0 for all i ≠ j
  - Note that independence implies much more than zero covariance.
  - Two discrete random variables are independent if P(x,y) = P(x)P(y).

Choosing the line that fits best
The question is: Which straight line fits best?
[Figure: scatter of data points in the (x, y) plane.]

Choosing the line that fits best
The best line is the one that minimizes the sum of squared vertical differences between the points and the line.
Let us compare two lines for the points (1,2), (2,4), (3,1.5) and (4,3.2). The second line is horizontal.
• First line: Sum of squared differences = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
• Second line: Sum of squared differences = (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99
[Figure: the four points and the two candidate lines, with both axes running from 1 to 4.]
The smaller the sum of squared differences, the better the fit of the line to the data. That is, the line with the least sum of squares (of differences) will fit the data best.

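The comparison of the two lines can be written as a short sketch. The first line is taken here to be y = x, which is inferred from the fitted values 1, 2, 3, 4 used in the sum above rather than stated on the slide; the function names are illustrative.

```python
# The four data points from the slide.
points = [(1, 2.0), (2, 4.0), (3, 1.5), (4, 3.2)]


def sum_squared_differences(line, data):
    """Sum of squared vertical differences between the points and the line."""
    return sum((y - line(x)) ** 2 for x, y in data)


def first_line(x):
    # Assumed from the fitted values 1, 2, 3, 4 in the slide's sum: y = x.
    return float(x)


def second_line(x):
    # The horizontal line y = 2.5.
    return 2.5


ss_first = sum_squared_differences(first_line, points)
ss_second = sum_squared_differences(second_line, points)
print(f"first line: {ss_first:.2f}, second line: {ss_second:.2f}")
```

Under this criterion the horizontal line fits these four points better than the first line.
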
Choosing the line that fits best
Ordinary Least Squares (OLS) Principle
• Straight lines can be described generally by
  Y = b0 + b1X
• Finding the best line with the smallest sum of squared differences is the same as
  Min S(b0,b1) = Σ[yi - (b0 + b1xi)]²
• Let b0* and b1* be the solution of the above problem.
  Y* = b0* + b1*X
  is known as the "average predicted value" (or simply "predicted value") of y for any X.

Coefficient estimates from the ordinary least squares (OLS) principle
• Solving the minimization problem implies the first order conditions:
  S(b0,b1) = Σ[yi - (b0 + b1xi)]²
  ∂S(b0,b1)/∂b0 = Σ 2[yi - (b0 + b1xi)](-1) = 0
    Σ[yi - (b0 + b1xi)] = 0
    Σ ei = 0
  ∂S(b0,b1)/∂b1 = Σ 2[yi - (b0 + b1xi)](-xi) = 0
    Σ[yi - (b0 + b1xi)] xi = 0
    Σ ei xi = 0
• These are the same as what we had earlier in the discussion of the method of moments.

The consistency of b0 and b1 as estimators of β0 and β1
• The same equations imply the same estimators:
  - b1 = Sxy / Sxx and b0 = my - b1 mx
• The estimators are consistent as long as Sxy, Sxx, my and mx are consistent estimators of the corresponding population quantities.
  - Sxy converges to Cov(x,y) as sample size increases.
  - Sxx converges to Var(x) as sample size increases.
  - my converges to E(y) as sample size increases.
  - mx converges to E(x) as sample size increases.

The b0 and b1 are linear in yi

$$ b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})y_i - \bar{y}\sum_{i=1}^{n}(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sum_{i=1}^{n}\left[\frac{x_i-\bar{x}}{\sum_{j=1}^{n}(x_j-\bar{x})^2}\right]y_i = \sum_{i=1}^{n}c_i y_i $$

Similarly, it can be shown that b0 is also a linear combination of yi.

The unbiasedness of b0 and b1 as estimators of β0 and β1
• The estimator:

$$ b_1 = \sum_{i=1}^{n} c_i y_i, \qquad c_i = \frac{x_i-\bar{x}}{\sum_{j=1}^{n}(x_j-\bar{x})^2} $$

• Its conditional expectation:

$$ E(b_1 \mid x) = \sum c_i E(y_i) = \sum c_i E(\beta_0+\beta_1 x_i+\varepsilon_i) = \beta_0\sum c_i + \beta_1\sum c_i x_i + \sum c_i E(\varepsilon_i) = \beta_0\cdot 0 + \beta_1\cdot 1 + 0 = \beta_1 $$

because

$$ \sum_{i=1}^{n} c_i = \frac{\sum_{i=1}^{n}(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = 0 $$

$$ \sum_{i=1}^{n} c_i x_i = \frac{\sum_{i=1}^{n}(x_i-\bar{x})x_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})x_i - \sum_{i=1}^{n}(x_i-\bar{x})\bar{x}}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = 1 $$

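The two properties of the weights ci used in the unbiasedness argument (they sum to zero, and Σ ci xi = 1) are easy to confirm on the textbook data of Example 1. This is a small numerical sketch, with variable names chosen here for illustration.

```python
# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

x_bar = sum(pages) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in pages)

# Weights c_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2; they depend on x only.
c = [(x - x_bar) / sum_sq_dev for x in pages]

sum_c = sum(c)                                   # should be 0
sum_cx = sum(ci * x for ci, x in zip(c, pages))  # should be 1
b1 = sum(ci * y for ci, y in zip(c, price))      # slope as a linear combination of y_i

print(f"sum c = {sum_c:.2e}, sum c*x = {sum_cx:.6f}, b1 = {b1:.5f}")
```
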
Best estimator
• It can also be shown that the estimators b0 and b1 have the smallest variance among all unbiased estimators that are linear combinations of yi.
• That is, if there is another unbiased estimator b1' that is a linear combination of yi, say, b1' = Σ ci' yi, we must have
  Var(b1) ≤ Var(b1')

BLUE
• Best: smallest variance
• Linear: linear combination of yi
• Unbiased: E(b0) = β0, E(b1) = β1
• Estimator

Interpretation of Coefficients
yi = b0 + b1xi + ei
1. Slope (b1)
   • Estimated Y changes by b1 for each 1 unit increase in X
     y* + Δy* = b0 + b1(x+1)
     Δy* = b1
     More generally,
     y* + Δy* = b0 + b1(x+Δx)
     Δy*/Δx = b1
2. Y-Intercept (b0)
   • Estimated value of Y when X = 0

EXAMPLE 2 continued from Example 1
• Develop a regression equation for the information given in EXAMPLE 1. The information there can be used to estimate the selling price based on the number of pages.
The regression equation is:
Y* = 48.0 + .05143X
• The equation crosses the Y-axis at $48. A book with no pages would cost $48.
• The slope of the line is .05143. Each additional page costs about $0.05 or five cents.
Note: the sign of the b value and the sign of r will always be the same.

Result part 1

            Coefficients   Standard Error   t Stat        P-value
Intercept   48             16.94157037      2.833267456   0.029829
X           0.051428571    0.026998733      1.904851306   0.105458

• The coefficient of X is b1 and its standard error is Sb1.
• t Stat for X = (b1 - 0)/Sb1 = 0.051428571/0.026998733, the statistic for testing H0: β1 = 0.
• P-value = Pr(|t| > |t Stat|), the two-tailed probability.

Example 2 continued from Example 1
We can use the regression equation to estimate values of Y.
• The estimated selling price of an 800 page book is $89.14, found by
  Y* = 48.0 + .05143X
     = 48.0 + .05143(800) = 89.14

Standard Error of Estimate (denoted se or Sy.x)
• Additional assumption: Var(εi) = Var(εj) = Var(ε) = σ² for all i,j
• Recall yi = β0 + β1xi + εi
• Var(ε) = 0 implies y is perfectly linear in x.

Scatter Around the Regression Line
[Figure: two scatter plots around a regression line.]
• Small Var(ε): more accurate estimator of the X, Y relationship
• Large Var(ε): less accurate estimator of the X, Y relationship

Standard Error of Estimate (denoted se or Sy.x)
• Additional assumption: Var(εi) = Var(εj) = σ² for all i,j.
• se² is an estimate of σ².
  - Hence, it may be interpreted as
    a measure of the reliability of the estimating equation, and
    a measure of dispersion.
  - It measures the variability, or scatter, of the observed values around the regression line.

$$ s_e = s_{y.x} = \sqrt{\frac{\sum_{i=1}^{n}(y_i-y_i^*)^2}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i-b_0-b_1x_i)^2}{n-2}} $$

Interpreting the Standard Error of the Estimate
• Assumptions:
  - Observed Y values are normally distributed around each estimated value of Y*
  - Constant variance
• se measures the dispersion of the points around the regression line
  - If se = 0, equation is a "perfect" estimator
• se may be used to compute confidence intervals of the estimated value

Variation of Errors Around the Regression Line
[Figure: error densities f(e) drawn at X1 and X2 along the regression line.]
• y values are normally distributed around the regression line.
• For each x value, the "spread" or variance around the regression line is the same.

Variation of Errors Around the Regression Line
[Figure: the line E(y|x) = β0 + β1x with the distribution of y shown at a given x.]
• y is distributed normal with mean E(y|x) = β0 + β1x and variance σ².
• Confidence intervals may be constructed in the usual fashion.

Scatter around the Regression Line
[Figure: the regression line Y = b0 + b1X with parallel bands. Horizontal axis: Independent Variable (X). Vertical axis: Dependent Variable (Y).]
• Y = b0 + b1X ± 1se: 68% of the observations lie in this region
• Y = b0 + b1X ± 2se: 95.5% of the observations lie in this region

Example 3 continued from Example 1 and 2.
Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.

$$ s_e = \sqrt{\frac{\sum_{i=1}^{n}(y_i-b_0-b_1x_i)^2}{n-2}} = \sqrt{\frac{51{,}606 - 48(636) - 0.05143(397{,}200)}{8-2}} = 10.408 $$

The numerator uses the shortcut Σ(yi - b0 - b1xi)² = Σyi² - b0 Σyi - b1 Σxiyi, with Σyi² = 51,606, Σyi = 636 and Σxiyi = 397,200.

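The standard error of estimate can be recomputed directly from the residuals. The sketch below uses unrounded coefficients, so it gives about 10.41 rather than exactly the 10.408 obtained on the slide with the rounded slope 0.05143; it also checks the shortcut formula for the sum of squared residuals. Variable names are illustrative.

```python
import math

# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

x_bar = sum(pages) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
      / sum((x - x_bar) ** 2 for x in pages))
b0 = y_bar - b1 * x_bar

# Sum of squared residuals, computed two ways.
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(pages, price))
sse_shortcut = (sum(y * y for y in price)
                - b0 * sum(price)
                - b1 * sum(x * y for x, y in zip(pages, price)))

# Standard error of estimate with n - 2 degrees of freedom.
se = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.3f}, se = {se:.3f}")
```
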
Equations for the Interval Estimates
• Confidence Interval for the Mean of y (y* = b0 + b1x):

$$ y^* \pm t\, s_e \sqrt{h} $$

• Prediction Interval for y (y = b0 + b1x + e):

$$ y^* \pm t\, s_e \sqrt{1+h} $$

where

$$ h = \frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} $$

h is minimized when x = x̄.

Confidence Interval Estimate for Mean Response
[Figure: confidence band around the fitted line y* = b0 + b1xi, plotted against X.]
The following factors influence the width of the interval: Std Error, Sample Size, X Value.

Confidence Interval continued from Example 1, 2 and 3.
• For books 800 pages long, what is the 95% confidence interval for the mean price?
• This calls for a confidence interval on the average price of books 800 pages long.

$$ y^* \pm t\, s_e \sqrt{h} = y^* \pm t\, s_e \sqrt{\frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} $$

$$ = 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800-612.5)^2}{3{,}150{,}000 - \frac{(4900)^2}{8}}} = 89.14 \pm 15.31 $$

Prediction Interval continued from Example 1, 2 and 3.
• For a book 800 pages long, what is the 95% prediction interval for its price?
• This calls for a prediction interval on the price of an individual book 800 pages long.

$$ y^* \pm t\, s_e \sqrt{1+h} = y^* \pm t\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} $$

$$ = 89.14 \pm 2.447(10.408)\sqrt{1 + \frac{1}{8} + \frac{(800-612.5)^2}{3{,}150{,}000 - \frac{(4900)^2}{8}}} = 89.14 \pm 29.72 $$

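Both intervals for an 800-page book can be reproduced with a short sketch. The critical value 2.447 is taken from the slides (6 degrees of freedom, 95%) rather than computed, and the half-widths differ from the slide values in the second decimal because the code does not round the slope or se first.

```python
import math

# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

T_CRITICAL = 2.447  # t value used on the slides for 6 df and 95% confidence
x_new = 800         # number of pages of interest

x_bar = sum(pages) / n
y_bar = sum(price) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in pages)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price)) / sum_sq_dev
b0 = y_bar - b1 * x_bar
se = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(pages, price)) / (n - 2))

# h grows as x_new moves away from the sample mean of x.
h = 1 / n + (x_new - x_bar) ** 2 / sum_sq_dev
y_star = b0 + b1 * x_new

ci_half_width = T_CRITICAL * se * math.sqrt(h)      # mean price of 800-page books
pi_half_width = T_CRITICAL * se * math.sqrt(1 + h)  # price of one 800-page book

print(f"y* = {y_star:.2f}")
print(f"confidence interval: +/- {ci_half_width:.2f}")
print(f"prediction interval: +/- {pi_half_width:.2f}")
```

The prediction interval is always wider than the confidence interval because of the extra 1 under the square root.
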
Test of Slope Coefficient (b1)
1. Tests if there is a linear relationship between X and Y
2. Involves population slope β1
3. Hypotheses
   • H0: β1 = 0 (no linear relationship)
   • H1: β1 ≠ 0 (linear relationship)
4. Theoretical basis is sampling distribution of slopes

Sampling Distribution of the Least Squares Coefficient Estimator

$$ b_1 = \sum_{i=1}^{n}\left[\frac{x_i-\bar{x}}{\sum_{j=1}^{n}(x_j-\bar{x})^2}\right]y_i = \sum_{i=1}^{n}c_i y_i $$

If the standard least squares assumptions hold, then b1 is an unbiased estimator of β1 and has a population variance

$$ \sigma_{b_1}^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sigma^2}{(n-1)s_x^2} $$

and an unbiased sample variance estimator

$$ s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{s_e^2}{(n-1)s_x^2} $$

Basis for Inference About the Population Regression Slope
• Let β1 be a population regression slope and b1 its least squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors εi are normally distributed, the random variable

$$ t = \frac{b_1-\beta_1}{s_{b_1}} $$

is distributed as Student's t with (n - 2) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.

Tests of the Population Regression Slope
• If the regression errors εi are normally distributed and the standard least squares assumptions hold (or if the distribution of b1 is approximately normal), the following tests have significance level α:
1. To test either null hypothesis
   H0: β1 = β1* or H0: β1 ≤ β1*
   against the alternative
   H1: β1 > β1*
   the decision rule is to reject H0 if

$$ t = \frac{b_1-\beta_1^*}{s_{b_1}} \ge t_{(n-2),\alpha} $$

Tests of the Population Regression Slope
2. To test either null hypothesis
   H0: β1 = β1* or H0: β1 ≥ β1*
   against the alternative
   H1: β1 < β1*
   the decision rule is to reject H0 if

$$ t = \frac{b_1-\beta_1^*}{s_{b_1}} \le -t_{(n-2),\alpha} $$

Tests of the Population Regression Slope
3. To test the null hypothesis
   H0: β1 = β1*
   against the alternative
   H1: β1 ≠ β1*
   the decision rule is to reject H0 if

$$ t = \frac{b_1-\beta_1^*}{s_{b_1}} \ge t_{(n-2),\alpha/2} \quad\text{or}\quad t = \frac{b_1-\beta_1^*}{s_{b_1}} \le -t_{(n-2),\alpha/2} $$

   Equivalently,

$$ |t| = \left|\frac{b_1-\beta_1^*}{s_{b_1}}\right| \ge t_{(n-2),\alpha/2} $$

Confidence Intervals for the Population Regression Slope β1
• If the regression errors εi are normally distributed and the standard regression assumptions hold, a 100(1 - α)% confidence interval for the population regression slope β1 is given by

$$ b_1 - t_{(n-2),\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{(n-2),\alpha/2}\, s_{b_1} $$

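For the textbook data, the slope's standard error, its t statistic for H0: β1 = 0, and a 95% confidence interval can be sketched as follows. The critical value 2.447 (6 degrees of freedom, α/2 = .025) is the one used in the slides' interval examples, not computed here, and the variable names are illustrative.

```python
import math

# Data from Example 1: number of pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

T_CRITICAL = 2.447  # t value for 6 df and alpha/2 = .025, as used on the slides

x_bar = sum(pages) / n
y_bar = sum(price) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in pages)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price)) / sum_sq_dev
b0 = y_bar - b1 * x_bar
se = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(pages, price)) / (n - 2))

# Estimated standard error of the slope: s_b1 = se / sqrt(sum (x_i - x_bar)^2).
s_b1 = se / math.sqrt(sum_sq_dev)

# t statistic for H0: beta1 = 0.
t_stat = (b1 - 0) / s_b1

# 95% confidence interval for beta1.
ci_low = b1 - T_CRITICAL * s_b1
ci_high = b1 + T_CRITICAL * s_b1

print(f"s_b1 = {s_b1:.6f}, t = {t_stat:.3f}")
print(f"95% CI for the slope: ({ci_low:.4f}, {ci_high:.4f})")
```

The interval contains zero, which matches the two-tailed P-value above 0.05 in the regression output.
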
Some cautions about the interpretation of significance tests
• Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
• Causation requires:
  - Association
  - Accurate time sequence
  - Ruling out other explanations for the correlation
Correlation ≠ Causation

Some cautions about the interpretation of significance tests
• Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
• Linear relationships are a very small subset of the possible relationships among variables.
• A test of linear versus nonlinear relationship requires another batch of analysis.

Evaluating the Model
yi* = b0 + b1xi
• Variation Measures
  - Coeff. of Determination
  - Standard Error of Estimate
• Test Coefficients for Significance

Variation Measures
[Figure: for an observation Yi at Xi, the deviation from the mean Ȳ is split at the fitted line yi* = b0 + b1xi.]
• Total Sum of Squares (SST): Σ(Yi - Ȳ)²
• Unexplained Sum of Squares (SSE): Σ(Yi - Yi*)²
• Explained Sum of Squares (SSR): Σ(Yi* - Ȳ)²

Measures of Variation in Regression
• Total Sum of Squares (SST)
  - Measures variation of the observed Yi around the mean, Ȳ
• Explained Variation (SSR)
  - Variation due to the relationship between X and Y
• Unexplained Variation (SSE)
  - Variation due to other factors
• SST = SSR + SSE

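The decomposition can be checked numerically. The sketch below uses the entrance-exam and GPA sample from the start of this lesson and assumes the usual least-squares formulas b1 = Sxy / Sxx and b0 = ȳ - b1 x̄; the variable names are illustrative only.

```python
# Exam score (x) and cumulative GPA (y) for the eight students A to H.
exam = [74, 69, 85, 63, 82, 60, 79, 91]
gpa = [2.6, 2.2, 3.4, 2.3, 3.1, 2.1, 3.2, 3.8]

n = len(exam)
x_bar = sum(exam) / n
y_bar = sum(gpa) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exam, gpa))
s_xx = sum((x - x_bar) ** 2 for x in exam)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in exam]

# The three measures of variation.
sst = sum((y - y_bar) ** 2 for y in gpa)                # total
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained
sse = sum((y - f) ** 2 for y, f in zip(gpa, fitted))    # unexplained

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```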
Variation in y (SST) = SSR + SSE
$$
\begin{aligned}
SST &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
    &= \sum_{i=1}^{n} (y_i - y_i^* + y_i^* - \bar{y})^2 \\
    &= \sum_{i=1}^{n} \left[ (y_i - y_i^*)^2 + (y_i^* - \bar{y})^2 + 2 (y_i - y_i^*)(y_i^* - \bar{y}) \right] \\
    &= \underbrace{\sum_{i=1}^{n} (y_i - y_i^*)^2}_{SSE} + \underbrace{\sum_{i=1}^{n} (y_i^* - \bar{y})^2}_{SSR} + 2 \sum_{i=1}^{n} (y_i - y_i^*)(y_i^* - \bar{y})
\end{aligned}
$$
The last sum equals 0, as imposed in the estimation by E(εx) = 0: the least-squares
residuals yi - yi* sum to zero and are uncorrelated with x, while yi* - ȳ = b1 (xi - x̄).

Variation in y (SST) = SSR + SSE
• R² (= r², the coefficient of determination) measures the
  proportion of the variation in y that is explained by the
  variation in x.
$$ R^2 = 1 - \frac{SSE}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - SSE}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SSR}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
• R² takes on any value between zero and one.
• R² = 1: Perfect match between the line and the data points.
• R² = 0: There is no linear relationship between x and y.

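To illustrate the claim that R² equals r² in simple regression, the following sketch computes both quantities separately for the exam-score and GPA sample from the start of this lesson. The helper name r_and_r_squared is hypothetical, and the least-squares and correlation formulas are the standard ones, assumed here.

```python
import math

def r_and_r_squared(xs, ys):
    """Return (r, R^2): r from the correlation formula, R^2 as 1 - SSE/SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Coefficient of correlation.
    r = s_xy / math.sqrt(s_xx * s_yy)

    # Coefficient of determination from the fitted line.
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / s_yy   # s_yy is SST
    return r, r_squared

exam = [74, 69, 85, 63, 82, 60, 79, 91]
gpa = [2.6, 2.2, 3.4, 2.3, 3.1, 2.1, 3.2, 3.8]

r, r_squared = r_and_r_squared(exam, gpa)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```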
Summarizing the Example's results (Examples 1, 2 and 3)
• The estimated selling price for a book with 800 pages is
  $89.14.
• The standard error of estimate is $10.41.
• The 95 percent confidence interval for the mean price of all
  books with 800 pages is $89.14 ± $15.31. This means the limits
  are between $73.83 and $104.45.
• The 95 percent prediction interval for a particular book with
  800 pages is $89.14 ± $29.72. This means the limits are
  between $59.42 and $118.86.
• These results appear in the following output.

Example 3 continued
Regression Analysis: Price versus Pages
The regression equation is
Price = 48.0 + 0.0514 Pages

Predictor    Coef       SE Coef    T       P
Constant     48.00      16.94      2.83    0.030
Pages        0.05143    0.02700    1.90    0.105

S = 10.41    R-Sq = 37.7%    R-Sq(adj) = 27.3%

Analysis of Variance
Source            DF    SS        MS       F       P
Regression         1    393.4     393.4    3.63    0.105
Residual Error     6    650.6     108.4
Total              7    1044.0

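Most of the summary figures in this output can be recovered from the sums of squares and the coefficient table. The sketch below does that arithmetic; the adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - 2), is the standard definition for one predictor and is an assumption here, since the slides do not state it.

```python
import math

# Figures copied from the regression output.
ssr, sse, sst = 393.4, 650.6, 1044.0    # sums of squares
n = 8                                   # total DF is n - 1 = 7
b0, b1, s_b1 = 48.00, 0.05143, 0.02700  # coefficients and SE of the slope

mse = sse / (n - 2)                     # mean square error
s = math.sqrt(mse)                      # standard error of estimate (S)
r_sq = ssr / sst                        # R-Sq
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)   # R-Sq(adj), assumed formula
f_stat = (ssr / 1) / mse                # global F statistic
t_stat = b1 / s_b1                      # t statistic for the slope
price_800 = b0 + b1 * 800               # point estimate for an 800-page book

print(f"S = {s:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print(f"Estimated price at 800 pages = {price_800:.2f}")
```

The small gap between F and t² comes from rounding in the printed coefficients.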
Testing for Linearity
Key Argument:
• If the value of y does not change linearly with the value of x,
  then the mean value of y is the best predictor of the
  actual value of y. This implies that y = ȳ is preferable.
• If the value of y does change linearly with the value of x,
  then the regression model gives a better prediction
  of the value of y than the mean of y. This implies that
  y = y* is preferable.

Three Tests for Linearity
• Testing the Coefficient of Correlation
  H0: ρ = 0  There is no linear relationship between x and y.
  H1: ρ ≠ 0  There is a linear relationship between x and y.
  Test Statistic:
$$ t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} $$
• Testing the Slope of the Regression Line
  H0: β1 = 0  There is no linear relationship between x and y.
  H1: β1 ≠ 0  There is a linear relationship between x and y.
  Test Statistic:
$$ t = \frac{b_1}{s_{b_1}} $$

Three Tests for Linearity
• The Global F-test
  H0: There is no linear relationship between x and y.
  H1: There is a linear relationship between x and y.
  Test Statistic:
$$ F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} = \frac{\sum_{i=1}^{n} (y_i^* - \bar{y})^2}{\sum_{i=1}^{n} (y_i - y_i^*)^2 / (n-2)} $$
[Variation in y] = SSR + SSE. A large F results from an SSR that is large relative to SSE.
Then much of the variation in y is explained by the regression model, and the null
hypothesis should be rejected; thus, the model is valid.
Note: In simple linear regression, the global F-test is equivalent to the t-test on β1
(F = t²). In regression analysis with multiple independent variables, the global F-test
takes on a distinct role.

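The equivalence of the three tests in simple regression can be checked numerically. This sketch uses the exam-score and GPA sample from the start of the lesson and assumes the usual formula s_b1 = s_e / sqrt(Sxx) for the standard error of the slope.

```python
import math

exam = [74, 69, 85, 63, 82, 60, 79, 91]
gpa = [2.6, 2.2, 3.4, 2.3, 3.1, 2.1, 3.2, 3.8]

n = len(exam)
x_bar = sum(exam) / n
y_bar = sum(gpa) / n
s_xx = sum((x - x_bar) ** 2 for x in exam)
s_yy = sum((y - y_bar) ** 2 for y in gpa)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exam, gpa))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(exam, gpa))
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in exam)

# Test 1: t statistic from the coefficient of correlation.
r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))

# Test 2: t statistic from the slope (assumed formula for s_b1).
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(s_xx)
t_from_slope = b1 / s_b1

# Test 3: global F statistic.
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"t from r     = {t_from_r:.4f}")
print(f"t from slope = {t_from_slope:.4f}")
print(f"F = {f_stat:.4f}, t^2 = {t_from_slope ** 2:.4f}")
```

The two t statistics coincide and F equals t², so in simple regression the three tests lead to the same decision.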
Residual Analysis
• Purposes
  - Examine linearity
  - Evaluate violations of assumptions
• Graphical Analysis of Residuals
  - Plot residuals versus Xi values
    (residual: difference between actual Yi and predicted Yi*)
  - Studentized residuals:
    allow consideration of the magnitude of the residuals

Residual Analysis for Linearity
[Figure: residuals e plotted against X in two panels, "Not Linear" and "Linear" (marked OK).]
For example, if the truth is
y = β0 + β1 x + β2 x² + ε
the estimated residuals from a straight-line fit are likely to behave like e = β2 x² + ε.

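A small sketch of this effect: fit a straight line to data generated from a quadratic and look at the residuals. The data are hypothetical (y = 2 + 3x + 0.5x² with no noise), chosen only to make the pattern easy to see.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data: the true relationship is quadratic, not linear.
xs = list(range(9))
ys = [2 + 3 * x + 0.5 * x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: e = {e:+.2f}")
```

The residuals are positive at both ends of the x range and negative in the middle, which is the kind of systematic curve that signals a nonlinear relationship.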
Residual Analysis for Homoscedasticity
• When the requirement of a constant variance
  (homoscedasticity) is violated, we have heteroscedasticity.
[Figure: standardized residuals (SR = e/sε) plotted against X in two panels, "Homoscedasticity" (marked OK) and "Heteroscedasticity".]
For example, under heteroscedasticity we may have Var(εi|xi) > Var(εj|xj) for xi > xj.

Residual Analysis for Independence
[Figure: standardized residuals (SR = e/sε) plotted against X in two panels, "Independent" (marked OK) and "Not Independent".]

Non-independence of error variables
• Data collected over time constitute a time series.
• When the residuals are examined over time, no pattern should be
  observed if the errors are independent.
• When a pattern is detected, the errors are said to be
  autocorrelated.
• Autocorrelation can be detected by graphing the residuals
  against time.

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.
[Figure: two plots of residuals against time, each with a horizontal line at 0.
  First panel: note the runs of positive residuals, replaced by runs of negative residuals.
  Second panel: note the oscillating behavior of the residuals around zero.]

The Durbin-Watson Statistic
• Used when data are collected over time to detect
  autocorrelation (residuals in one time period are
  related to residuals in another period)
• Measures violation of the independence assumption
$$ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} $$
• D should be close to 2. If not, examine the model for autocorrelation.
• Intuition: If x and y are independent, Var(x - y) = Var(x) + Var(y). With independent
  errors of equal variance, each squared difference in the numerator is therefore
  about twice a squared residual in the denominator.

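The statistic is simple to compute directly from a list of residuals in time order. Below is a minimal sketch; the function name durbin_watson and the two residual series are made up for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series, listed in time order.
runs = [1, 1, -1, -1]         # a run of positives followed by a run of negatives
oscillating = [1, -1, 1, -1]  # sign flips every period

print(durbin_watson(runs))         # below 2
print(durbin_watson(oscillating))  # above 2
```

Runs of same-signed residuals pull D below 2, while residuals that flip sign every period push it above 2; both are departures from the value expected under independence.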
Outliers
• An outlier is an observation that is unusually small or large.
• Several possibilities need to be investigated when an outlier
  is observed:
  - There was an error in recording the value.
  - The point does not belong in the sample.
  - The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect an observation is an outlier if its
  |standardized residual| > 2.

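The rule of thumb above can be sketched as follows, taking the standardized residual to be e/sε as in the residual-analysis slides. The data are hypothetical: points on the line y = 2x with one value shifted upward to mimic a recording error.

```python
import math

def standardized_residuals(xs, ys):
    """Residuals from a least-squares line, each divided by the standard error of estimate."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s_e for e in residuals]

# Hypothetical data: y = 2x, except the fifth observation is 8 units too high.
xs = list(range(1, 11))
ys = [2.0 * x for x in xs]
ys[4] += 8.0

sr = standardized_residuals(xs, ys)
suspects = [i for i, value in enumerate(sr) if abs(value) > 2]

print([round(value, 2) for value in sr])
print("Suspected outliers at positions:", suspects)
```

Only the shifted observation is flagged. Whether it is a recording error, a point that does not belong, or a valid observation still has to be investigated separately.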
[Figure: two scatter plots, titled "An outlier" and "An influential observation". Annotations: "... but, some outliers may be very influential" and "The outlier causes a shift in the regression line".]

Remedying violations of the required conditions
• Nonnormality or heteroscedasticity can be remedied using
  transformations on the y variable.
• The transformations can improve the linear relationship
  between the dependent variable and the independent
  variables.
• Many computer software systems allow us to make the
  transformations easily.

A brief list of transformations
• y' = log y (for y > 0)
  - Use when sε increases with y, or
  - Use when the error distribution is positively skewed
• y' = y²
  - Use when sε² is proportional to E(y), or
  - Use when the error distribution is negatively skewed
• y' = y^(1/2) (for y > 0)
  - Use when sε² is proportional to E(y)
• y' = 1/y
  - Use when sε² increases significantly when y increases beyond
    some value.

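As one illustration of how a transformation can improve linearity, the sketch below applies y' = log y to hypothetical data that grow exponentially in x (y = exp(0.5 + 0.3x), no noise) and compares straight-line fits before and after. The helper names are illustrative only.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def sum_squared_errors(xs, ys, b0, b1):
    """Unexplained sum of squares for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: y grows exponentially with x, so y > 0 everywhere.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]

# Straight-line fit to the raw y values.
raw_b0, raw_b1 = fit_line(xs, ys)
sse_raw = sum_squared_errors(xs, ys, raw_b0, raw_b1)

# Straight-line fit after the transformation y' = log y.
log_ys = [math.log(y) for y in ys]
log_b0, log_b1 = fit_line(xs, log_ys)
sse_log = sum_squared_errors(xs, log_ys, log_b0, log_b1)

print(f"raw fit: SSE = {sse_raw:.4f}")
print(f"log fit: b0 = {log_b0:.3f}, b1 = {log_b1:.3f}, SSE = {sse_log:.2e}")
```

The two SSE values are on different scales (y versus log y), so they are shown only to indicate that the line fits the transformed data essentially exactly, not as a formal comparison.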
Example: Transformation to get linearity
[Figure: residuals e plotted against X in two panels, "Not Linear" and "Linear" (marked OK). Models shown: Yi = b0 + b1 Xi + ei and Yi = b0 + b1 Xi + b2 Xi² + ei.]

Lesson 10:
Regressions Part I
- END -