Simple Regression

Major Questions
• Given an economic model involving a
relationship between two economic
variables, how do we go about
specifying the corresponding statistical
model?
• Given the statistical model and a
sample of data on two economic
variables, how do we use this
information?
Main Points
• Identify relationships between economic
variables
• Answer questions like: If one variable
changes in some way, by how much does
another variable change?
• Move from studying one economic
variable to studying two
Specific Example
• Extend the household food expenditure example
• Let population of interest be all
households with three members, no
matter what income level
• Can now look at what happens to food
expenditure as income rises or falls
Economic Model
• y : Household expenditure on food
• x : Household income
• Economic Model (general form)
y = f(x)
• Specifies that household expenditure
on food is some function of household
income
Relationship between y and x
• Need to quantify the change in food
expenditure that occurs when income
changes
• Must be more precise about the nature of
the relationship between x and y
• Many possible forms – sometimes theory
provides some guide
• Simplest form: y = a + bx (linear)
Statistical Model: Error Terms
• Economic model is an approximation
• Need to account for other factors that affect
the relationship between economic variables
• Add an unobservable error term (e)
y = a + bx + e
• The error term captures:
1. The combined effect of other influences on y
2. Approximation error from functional form
3. Elements of random behavior by individuals
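To make the role of the error term concrete, here is a minimal Python sketch that generates data from y = a + bx + e. The parameter values, the incomes, and the choice of normally distributed errors are all assumptions made for illustration; none of them come from the slides.

```python
import random

def simulate_households(a, b, incomes, error_sd, seed=0):
    """Generate (x, y) pairs with y = a + b*x + e, where e is a random error."""
    rng = random.Random(seed)
    data = []
    for x in incomes:
        e = rng.gauss(0.0, error_sd)  # unobservable error term (assumed normal)
        data.append((x, a + b * x + e))
    return data

# Hypothetical values, chosen only for illustration
incomes = [10, 20, 30, 40, 50]
sample = simulate_households(a=40.0, b=0.13, incomes=incomes, error_sd=5.0)
for x, y in sample:
    systematic = 40.0 + 0.13 * x
    print(f"income={x:5.1f}  expenditure={y:6.2f}  systematic part={systematic:6.2f}")
```

Each simulated expenditure differs from the systematic part a + bx by the error e, which is what the statistical model adds to the economic model.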
Adding Data
• Suppose we have observations of y and x
from i = 1,2,…, n households
y_i = a + b x_i + e_i
• y : Dependent or Response variable
• x : Explanatory variable
• Level of household expenditure on food is
related to the level of household income
Method of Least Squares
• The parameters a and b tell us about the
relationship between y and x
• We need a rule to tell us how to make use of
sample data to estimate the parameters
• We use the Least Squares Method: find a
line so that the sum of the squares of the
vertical distances from each point to the line
is as small as possible
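The least squares criterion can be sketched in a few lines of Python. The data and the candidate lines below are made up for illustration (they are not from the slides); the function simply evaluates the sum of squared vertical distances for a given a and b.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# A few candidate lines (a, b); least squares picks the one with the smallest sum
candidates = [(2.0, 0.5), (2.2, 0.6), (2.5, 0.6), (1.0, 1.0)]
for a, b in candidates:
    ssr = sum_squared_residuals(a, b, xs, ys)
    print(f"a={a:.1f}  b={b:.1f}  sum of squares={ssr:.4f}")
```

For this data the line with a = 2.2 and b = 0.6 gives a sum of squares of about 2.4, which is smaller than for the other candidates; it is in fact the least squares line for these five points.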
Residual Errors
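[Figure: scatter plot with the fitted line y = a + bx. At each x_i, the residual e_i is the vertical distance between the observed y_i and the expected y_i on the line.]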
Residual Errors
• There will be n of these
• Depend on the fitted line as
defined by the specific values a
and b
• Squares can be summed:
$$\sum e_i^2 = \sum (y_i - a - b x_i)^2$$
Normal equations
$$\sum y = na + b \sum x$$
$$\sum xy = a \sum x + b \sum x^2$$
• Two equations in two unknowns
• They can be solved for a unique solution
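As a sketch of how the two normal equations can be solved, the Python function below applies Cramer's rule to the 2×2 system in a and b. The function name and the data are hypothetical; the solution is unique only when the x values are not all identical, which the code checks.

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(y)  = n*a      + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)   for (a, b) by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # determinant of the 2x2 coefficient matrix
    if det == 0:
        raise ValueError("all x values are identical; no unique solution")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = solve_normal_equations(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```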
Formulae for the regression coefficients
$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$
$$a = \frac{\sum y - b \sum x}{n}$$
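The closed-form formulae can be applied directly. The following Python sketch computes the slope b from the sums and sample means, then the intercept a from b; the function name and the data are assumptions for illustration only.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Slope: (sum xy - n * x_bar * y_bar) / (sum x^2 - n * x_bar^2)
    b = (sxy - n * x_bar * y_bar) / (sxx - n * x_bar ** 2)
    # Intercept: (sum y - b * sum x) / n
    a = (sum(ys) - b * sum(xs)) / n
    return a, b

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_coefficients(xs, ys)
print(f"fitted line: y = {a:.2f} + {b:.2f}x")
```

Because a = ȳ - b x̄, the fitted line always passes through the point of sample means (x̄, ȳ), which gives a quick sanity check on any hand calculation.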
Correlation
The correlation between two random variables X and Y measures the
strength of the linear relationship between them.
Coefficient of determination
The coefficient of determination is a
statistic which measures the extent to
which the variation in Y is explained by
the regression line of Y on X.
It is denoted by r2.
Coefficient of determination
[Figure: the fitted regression line y = a + bx plotted together with the horizontal line y = \bar{y}.]
Coefficient of determination
The quantity
$$\sum (y - \hat{y})^2$$
is known as the unexplained variation since the deviations $(y - \hat{y})$
are completely random and thus unpredictable.
Coefficient of determination
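The coefficient of determination is given by
$$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$$
where numerator = explained variation
denominator = total variation
Here $\hat{y} = a + bx$ is the value on the regression line and $\bar{y}$ is the sample mean of y.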
Coefficient of determination
The coefficient of determination is given by
$$r^2 = \frac{a \sum y + b \sum xy - n\bar{y}^2}{\sum y^2 - n\bar{y}^2}$$
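The two expressions for the coefficient of determination (explained over total variation, and the computational form in terms of sums) should give the same number for a least squares line. The Python sketch below computes both for made-up data; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
        sum(x * x for x in xs) - n * x_bar ** 2
    )
    return y_bar - b * x_bar, b

def r_squared_both_ways(xs, ys):
    """Return r^2 from the definition and from the computational formula."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / n
    fitted = [a + b * x for x in xs]
    explained = sum((f - y_bar) ** 2 for f in fitted)  # explained variation
    total = sum((y - y_bar) ** 2 for y in ys)          # total variation
    by_definition = explained / total
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    by_formula = (a * sum(ys) + b * sxy - n * y_bar ** 2) / (syy - n * y_bar ** 2)
    return by_definition, by_formula

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2_def, r2_formula = r_squared_both_ways(xs, ys)
print(f"r^2 (definition) = {r2_def:.4f}, r^2 (formula) = {r2_formula:.4f}")
```

For this data both routes give r² of about 0.6, meaning roughly 60% of the variation in y is explained by the regression line.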
Correlation coefficient
The linear product-moment correlation coefficient is given by
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
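A short Python sketch of the product-moment formula follows, using only sums of x, y, xy, x² and y². The function name and data are assumptions for illustration. In simple regression, squaring r should reproduce the coefficient of determination.

```python
import math

def correlation_coefficient(xs, ys):
    """Linear product-moment correlation coefficient r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_coefficient(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The sign of r matches the sign of the slope b, and r always lies between -1 and 1; the function is undefined (division by zero) if either variable has no variation.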