What is the MPC?

Learning Objectives
1. Use linear regression to establish the
relationship between two variables
2. Show that the line is the line of best fit in a
precise sense
3. Show that the line links the conditional
expectations of the variables
4. A more formal approach to hypothesis
testing
Consumption Function
• Keynesian Consumption function
• Y: income today, C: consumption today
• C=a+b*Y
• Econometrics : quantify economic
relationships
– What are “a” and “b”? (b is the marginal propensity to consume, the MPC)
Look at some data
• Look at individual level data: individual.dta
• Stata: scatter cons nmwage
• This gives a scatter plot with the first variable
(cons) on the vertical axis and the second variable
(nmwage) on the horizontal axis
Look at data
[Figure: scatter plot of consumption (vertical axis, -1000 to 3000) against net monthly wage (horizontal axis, 0 to 2000)]
Two Obvious facts
1. Observe many households at different
income levels
– There is clearly a positive relationship
2. cons depends on income but households
with same income will not have same
consumption
– other factors influence consumption
How do we Calculate the MPC?
• Draw a line
• Many possible lines
• Intuition tells us that an “average” line would
be a better estimate
– We will show why this intuition is correct later
• Any line we draw (even the “best”) will not go
through all the points
– There will be deviations from the line
Conditional Expectation
• As an alternative to the line we could follow the logic
of the gender example from the previous section and
look at conditional expectations
• Recall we answered the question of gender
discrimination by comparing the average wage of two
groups
– The expected wage conditional on being a man or a woman
– We used the “summ if” command
• Formally
– E(hwage|gender==1)=6.701875
– E(hwage|gender==2)= 5.451302
Conditional Expectation
• We can apply the same logic to the
consumption function.
• Divide in two groups
– Rich: nmwage>1000
– Poor: nmwage<1000
– generate rich=(nmwage>1000)
• Compare the average consumption of each
using summ if
Conditional Expectation
• We get average consumption conditional on
being rich or poor
– E(Cons|Rich)= 1024.11
– E(Cons|Poor)= 534.33
• We can measure the marginal propensity to
consume by also taking the average income of
each group
– E(nmwage|Rich)= 1282.42
– E(nmwage|Poor)= 621.14
Conditional Expectation
• As you move from “poor” to “rich” your
income rises by:
– 1282-621=661
– And consumption rises by: 1024-534=490
• So an estimate of the MPC would be 490/661
which is 0.74
• This is a simple and intuitive method that
builds on the logic of the gender example
• But…..
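The two-group calculation can be reproduced as plain arithmetic. The sketch below is only an illustration in Python (the slides themselves use Stata); the function name two_group_mpc is hypothetical, and the four group means are the ones reported on the slides.

```python
# Group means reported on the slides (obtained there with Stata's "summ ... if").
cons_rich, cons_poor = 1024.11, 534.33   # E(Cons|Rich), E(Cons|Poor)
wage_rich, wage_poor = 1282.42, 621.14   # E(nmwage|Rich), E(nmwage|Poor)


def two_group_mpc(c_hi, c_lo, y_hi, y_lo):
    """Change in mean consumption divided by change in mean income."""
    return (c_hi - c_lo) / (y_hi - y_lo)


mpc = two_group_mpc(cons_rich, cons_poor, wage_rich, wage_poor)
print(f"Two-group MPC estimate: {mpc:.2f}")
```

Using the unrounded means gives the same 0.74 as the rounded 490/661 on the slide.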
Obvious Problem
• The division between rich and poor was entirely
arbitrary
– Not natural like gender
• We throw away information by forcing individuals
into one group or another
• Why not have 3 groups, or any number of groups
you like?
• Intuitively the more the better
– 10 group example
• But large numbers of groups would make
calculations tedious and would always leave out
some information
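A minimal sketch of the "more groups" idea, assuming synthetic data rather than individual.dta: the wage range, the coefficients 60 and 0.75, and the noise level are made up for illustration, and group_means is a hypothetical helper. It sorts households by wage, cuts them into k equal-sized groups and reports the slope between the lowest and highest group means.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for the course data (NOT individual.dta):
# wages uniform on [0, 2000], consumption = 60 + 0.75 * wage + noise.
wages = [random.uniform(0, 2000) for _ in range(1330)]
cons = [60 + 0.75 * w + random.gauss(0, 400) for w in wages]


def group_means(x, y, k):
    """Sort by x, cut into k equal-sized groups, return (mean x, mean y) per group."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // k
    means = []
    for g in range(k):
        # The last group absorbs any leftover observations.
        chunk = pairs[g * size:] if g == k - 1 else pairs[g * size:(g + 1) * size]
        means.append((statistics.fmean([p[0] for p in chunk]),
                      statistics.fmean([p[1] for p in chunk])))
    return means


for k in (2, 10):
    m = group_means(wages, cons, k)
    # Slope between the lowest and highest group: rise over run.
    slope = (m[-1][1] - m[0][1]) / (m[-1][0] - m[0][0])
    print(k, "groups -> slope between extreme groups:", round(slope, 3))
```

Even with ten groups, a slope taken from two of the group means still ignores the other eight, which is the loss of information the slide refers to.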
10 Income Groups
[Figure: mean consumption (group_c, vertical axis) against mean net monthly wage (group_w, horizontal axis) for 10 income groups]
Compromise
• Imagine there are an infinity of groups but the
conditional means are all related
• Specifically they have a linear relationship
– E(cons|nmwage)=a+b*nmwage
• From now on we will write in more general
notation
– E(Y|X)=b1+b2X
Comment
• Note this is a restriction and it may not be true in
the real world
• We impose it on the model
– Looks reasonable in the consumption example
• If it isn't true then there might be a problem
– The line is then only a linear approximation
– GIGO (garbage in, garbage out)
• Relationship doesn’t have to be linear but it does
have to be parametric
– We will see more on this later
So to Recap…
• We have data that appears to illustrate a
relationship between two variables
• Intuitively we will put a line through the data
that represents the data in some way
• What way? Two ways:
1. the line links all the conditional means
2. We choose the particular line that is closest to
the data in a defined way
• These turn out to be the same
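Draw a line to represent the data
Show three data points for illustration
[Figure: the line E(Y|X)=b1+b2X with intercept b1 and three data points (X1,Y1), (X2,Y2), (X3,Y3); u1, u2, u3 are the vertical deviations of the points from the line]
b2 = ΔE(Y|X)/ΔX : slope coefficient, the change in E(Y|X) for a one-unit change in X.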
An Explanation
• Change in notation to be more general
– Y is the LHS or dependent variable
– X is the RHS or independent variable
• E(Y|Xi) = conditional mean i.e. does not describe
every observation
– Yi = E(Y|Xi) + ui
– ui represents the deviation of each individual
observation from the conditional mean
• Yi = E(Y|Xi) + ui
• Yi = b1+b2 Xi + ui
What is Ui?
• Any factor other than income (X) which
influences consumption (Y)
– individual tastes and unpredictability
• approximation error because of assumption of
linear relationship
• Later we will model this as a random variable
• Perhaps with a normal distribution
– Remember our warnings about the bell curve
OLS Estimation
• Find line of “best fit”
• Method of Ordinary Least Squares (OLS) to
estimate b1 b2
• Objective: find estimates of b1 b2 that
minimise the distance between the
regression line and the actual data points, i.e.
make the deviations ui as small as possible
• Minimise the sum of squared deviations, i.e.
the sum of ui² over all observations
– Aside: why not absolute deviation or others?
Algebra of OLS
• min $\sum_i u_i^2$, i.e. min $(u_1^2 + u_2^2 + u_3^2 + \dots + u_N^2)$
• $Y_i = b_1 + b_2 X_i + u_i \Rightarrow u_i = Y_i - b_1 - b_2 X_i$
• $\sum_i u_i^2 = \sum_i (Y_i - b_1 - b_2 X_i)^2 = S(b_1, b_2)$
• => the sum of squared errors is a function of b1, b2
• min $S(b_1, b_2)$ = min $\sum_i (Y_i - b_1 - b_2 X_i)^2$
• To find the minimum of any function: differentiate
with respect to the arguments and set the
derivative = 0, i.e. find the point where the
slope with respect to each argument = 0.

$$S(b_1, b_2) = \sum_{i=1}^{N} (Y_i - b_1 - b_2 X_i)^2$$

$$\frac{\partial S(b_1, b_2)}{\partial b_1} = -2 \sum_{i=1}^{N} (Y_i - b_1 - b_2 X_i)$$

$$\frac{\partial S(b_1, b_2)}{\partial b_2} = -2 \sum_{i=1}^{N} X_i (Y_i - b_1 - b_2 X_i)$$

To find the minimum set these equal to zero.
b1, b2 are the solutions to these equations when
they are set = 0:

$$-2 \sum_{i=1}^{N} (Y_i - b_1 - b_2 X_i) = 0$$

$$-2 \sum_{i=1}^{N} X_i (Y_i - b_1 - b_2 X_i) = 0$$

$$b_1 = \bar{Y} - b_2 \bar{X}$$

$$b_2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$
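The step from the two first-order conditions to these formulas is not spelled out on the slide. One way to sketch it, writing bars for sample means:

```latex
\begin{align*}
% First condition: divide by -2 and by N
\sum_{i=1}^{N} (Y_i - b_1 - b_2 X_i) = 0
  &\;\Rightarrow\; \bar{Y} - b_1 - b_2 \bar{X} = 0
   \;\Rightarrow\; b_1 = \bar{Y} - b_2 \bar{X} \\
% Second condition: divide by -2 and substitute b_1
\sum_{i=1}^{N} X_i \bigl( Y_i - \bar{Y} - b_2 (X_i - \bar{X}) \bigr) = 0
  &\;\Rightarrow\; b_2 = \frac{\sum_{i=1}^{N} X_i (Y_i - \bar{Y})}
                               {\sum_{i=1}^{N} X_i (X_i - \bar{X})} \\
% Deviations from the mean sum to zero, so X_i can be replaced by (X_i - \bar{X})
\sum_{i=1}^{N} (Y_i - \bar{Y}) = 0, \quad \sum_{i=1}^{N} (X_i - \bar{X}) = 0
  &\;\Rightarrow\; b_2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
                               {\sum_{i=1}^{N} (X_i - \bar{X})^2}
\end{align*}
```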
An Explanation
• The b1, b2 computed from the sample are the
Ordinary Least Squares (OLS) estimators of the
true (unknown) population parameters b1, b2.
• b2 is the estimator of the slope coefficient: the
slope coefficient measures the effect on the
expected value of Y of a one-unit change in X
• b1 is the estimator of the intercept: the expected
value of Y when X=0
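A minimal sketch of the closed-form estimators in Python, on a made-up three-point data set (the slides use Stata and the course data instead). The function name ols_fit is hypothetical. The two sums printed at the end are the first-order conditions, which should be zero up to rounding at the OLS solution.

```python
def ols_fit(x, y):
    """Return (b1, b2) minimising sum((y_i - b1 - b2 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx           # slope: co-variation of X and Y over variation of X
    b1 = y_bar - b2 * x_bar    # intercept: the line passes through the means
    return b1, b2


# Made-up illustration data, not from the course data set.
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0]

b1, b2 = ols_fit(x, y)
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

print("b1 =", round(b1, 4), "b2 =", round(b2, 4))
print("sum of residuals:", round(sum(residuals), 10))
print("sum of x * residual:", round(sum(xi * u for xi, u in zip(x, residuals)), 10))
```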
OLS in Stata
regress cons nmwage

      Source |       SS       df       MS              Number of obs =    1330
-------------+------------------------------           F(  1,  1328) =  605.97
       Model |  98124170.1     1  98124170.1           Prob > F      =  0.0000
    Residual |   215041332  1328  161928.714           R-squared     =  0.3133
-------------+------------------------------           Adj R-squared =  0.3128
       Total |   313165502  1329  235639.956           Root MSE      =   402.4

------------------------------------------------------------------------------
        cons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nmwage |   .7562304   .0307205    24.62   0.000     .6959644    .8164964
       _cons |   62.47876    25.9165     2.41   0.016     11.63701    113.3205
------------------------------------------------------------------------------

Residual SS: the minimised sum of squared residuals, $\sum_i u_i^2$
Coef.: the estimated coefficients
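Several entries in the regression output are linked by standard identities, which gives a quick way to read the table. The sketch below recomputes them in Python from the printed figures; small differences from the table can arise because the inputs are themselves rounded.

```python
import math

# Values copied from the Stata output on the slide.
model_ss, resid_ss, total_ss = 98124170.1, 215041332, 313165502
resid_df = 1328
coef, std_err = 0.7562304, 0.0307205

r_squared = model_ss / total_ss             # share of the variation in cons explained
root_mse = math.sqrt(resid_ss / resid_df)   # square root of the residual mean square
t_stat = coef / std_err                     # t statistic for nmwage
f_stat = t_stat ** 2                        # with a single regressor, F equals t squared

print(round(r_squared, 4), round(root_mse, 1), round(t_stat, 2), round(f_stat, 2))
```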
The Answer
• The regression gives us a measure of the MPC
• The OLS estimate of the MPC is 0.756
• What use is this?
– Prediction
– Causation
– Statistical inference
Prediction
• We can use this to make predictions
• What would consumption be if income were
2500?
• Cons= 62.47876 + 0.7562304*2500
– This is equal to 1953
• Be careful this is the predicted conditional mean
– It is the next point on the line
– What people with 2500 would consume on average
– What they actually will consume is unknown because
we don’t observe their Ui
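The prediction is a single evaluation of the fitted line. A small Python sketch, using the coefficients from the regression output (the function name predict_cons is hypothetical):

```python
# Estimated coefficients from the regression output on the slides.
b1_hat = 62.47876    # _cons (intercept)
b2_hat = 0.7562304   # nmwage (slope, the estimated MPC)


def predict_cons(income):
    """Predicted conditional mean of consumption at a given net monthly wage."""
    return b1_hat + b2_hat * income


print(round(predict_cons(2500)))
```

Note that the scatter plot's wage axis runs from 0 to 2000, so a prediction at 2500 extends the line beyond the range shown there.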
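Predicted Consumption
[Figure: the line E(Y|X)=b1+b2X with intercept b1, data points Y1 to Y4 at X1 to X4 and their deviations u1 to u4; at X4 the point on the line is predicted consumption, Y4 is actual consumption, and the gap between them is u4]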
Causation
• Remember all this only really identifies
variables that move together
• It doesn’t show causation
• Need theory for that
• Obvious in the gender example (wages don’t
cause changes in gender)
• Not obvious here: causation can run both ways
Statistical Inference
• This estimate is generated from a sample
• Recall that the issue is whether we can use
this fact about the sample to make statements
about the world (“population”)
• The same issues of statistical inference arise in
context of regression
– OLS estimates are sample statistics just like the
sample average wages in the gender example
More on the Residual (Ui)
• The residual is the difference between the line
(conditional expectation) and the actual data
• Think of every individual's consumption as being
made up of two bits
– Conditional expectation
– Residual
• The conditional expectation is the same for
everyone with the same X (income)
• Residual is potentially different even for those
with same income
Random Variable
• Residual is unknown in advance so we model
it as a random variable
• Think of consumption being determined by
systematic bit plus a roll of a dice
• See diagram
– Actual consumption (expectation+residual) is
distributed around the mean
– All the means are linked
Each distribution is a slice in the data
[Figure: scatter plot of consumption (vertical axis) against net monthly wage (horizontal axis)]
Distribution of Y for two different “slices” of X
[Figure: probability distributions of expenditure given income = 900 and income = 1200, f(Y|X=900) and f(Y|X=1200), plotted against consumption]
Empirical Distribution
• We can use the hist command in Stata to look at
this
• Just as we got distribution of hwage for men and
women
• hist cons, by(rich) norm
• We could do the same for any income group
– hist cons if nmwage<1100 & nmwage>900, norm
• All OLS does is draw a line through all the means
• Imagine laying all these distributions side by side
[Figure: histograms of monthly consumption with a fitted normal density, one panel per value of rich (graphs by rich)]
The “Slice” Around nmwage=1000
[Figure: histogram of monthly consumption, with a fitted normal density, for the slice of households around nmwage=1000]
Distribution of Y
[Figure: conditional densities f(Y|X) at X=600, X=900 and X=1200, with their means on the line E(Y|X)]
Putting it all together
• We usually assume that the residual is a
normal random variable
• Seems reasonable in this case
– But remember our concerns about the normal distribution
• So the full model is
– Yi = b1+b2 Xi + ui
– Where E(Y|Xi)= b1+b2 Xi
– And ui ~ N(0, σ²)
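To see the full model at work, one can simulate data from it and check that OLS recovers the parameters. This is a sketch under made-up assumptions: the "true" values B1, B2 and SIGMA and the uniform distribution of X are chosen only for illustration, and simulate and ols_fit are hypothetical helper names.

```python
import random

random.seed(1)

# Hypothetical "true" parameters, chosen for illustration only.
B1, B2, SIGMA = 60.0, 0.75, 400.0


def simulate(n):
    """Draw n observations from Y_i = B1 + B2 * X_i + u_i with u_i ~ N(0, SIGMA^2)."""
    x = [random.uniform(0, 2000) for _ in range(n)]
    y = [B1 + B2 * xi + random.gauss(0, SIGMA) for xi in x]
    return x, y


def ols_fit(x, y):
    """Closed-form OLS estimates (b1, b2) for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


x, y = simulate(5000)
b1_hat, b2_hat = ols_fit(x, y)
print("true b1, b2:", B1, B2)
print("estimated b1, b2:", round(b1_hat, 2), round(b2_hat, 4))
```

The estimates differ from the true values because each sample contains a different draw of the ui, which is the reason statistical inference is needed.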