Economics of the Government 政府经济学

Analysis of Cross Section and Panel Data

Yan Zhang

School of Economics, Fudan University

CCER, Fudan University

Introductory Econometrics A Modern Approach

Analysis of Cross Section and Panel Data

Part 1. Regression Analysis on Cross Sectional Data

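Chap 4. Multiple Regression Analysis: Inference

Expected value, variance, and BLUE are not enough for inference: we need the full sampling distribution of the OLS estimators

Assumption 6 (Normality): the population error u is independent of the explanatory variables and is normally distributed with zero mean and variance $\sigma^2$: $u \sim \mathrm{Normal}(0, \sigma^2)$

Assumption 6 implies Assumptions 3 and 5 (zero conditional mean; homoskedasticity)

Classical Linear Model (CLM) Assumptions (the classical assumptions): Assumptions 1 through 6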

The Efficiency of the OLS estimators



The Efficiency of the OLS estimators under different assumptions:



Gauss-Markov Assumptions: BLUE, the minimum variance linear unbiased estimators

CLM Assumptions: minimum variance unbiased estimators (among all unbiased estimators, not only those that are linear in the y)



The Population Assumptions of the CLM:



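Under Assumption 6, given x, the distribution of y is normal: $y \mid x \sim \mathrm{Normal}(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \sigma^2)$

This is clearly wrong in some cases, e.g. narr86; prate. How?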

The Normality of the OLS estimators



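Normality of u implies normal sampling distributions of the OLS estimators (conditional on the sample values of the independent variables): $\hat\beta_j \sim \mathrm{Normal}(\beta_j, \mathrm{Var}(\hat\beta_j))$

Therefore, we obtain a standard normal random v.: $(\hat\beta_j - \beta_j)/\mathrm{sd}(\hat\beta_j) \sim \mathrm{Normal}(0, 1)$

More results:

any linear combination of the $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ is also normally distributed, and any subset of the $\hat\beta_j$ has a joint normal distribution.

Testing Hypotheses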

Whether Normality of u Can Be Assumed?

  

Empirical matter: a transformation can help, e.g. log(price)

CLT (central limit theorem): non-normality of the errors is not a serious problem with large sample sizes.

Even though the y are not from a normal distribution, the OLS estimators are approximately normally distributed, at least in large sample sizes (Chap. 5). This requires only the Gauss-Markov assumptions: finite variance, homoskedasticity, zero conditional mean.


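The R-Squared Form of the F Statistic

F statistic: $F = \dfrac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$, where $R^2_{ur}$ and $R^2_r$ are the R-squareds of the unrestricted and restricted models, q is the number of restrictions, and n - k - 1 is the degrees of freedom of the unrestricted model.

P value: $p = P(\mathcal{F} > F)$, where $\mathcal{F}$ is an F random variable with (q, n - k - 1) degrees of freedom and F is the computed value of the statistic.

The F Statistic for Overall Significance of a Regression

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$. The restricted model contains only an intercept, so its SSR equals SST and its R-squared is zero: $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$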
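As a minimal sketch of the R-squared form of the F statistic, the following Python functions compute it from R-squared values. The function names and the numbers in the example are illustrative assumptions, not results from the lecture.

```python
def f_stat_r2(r2_ur, r2_r, n, k, q):
    """R-squared form of the F statistic.

    r2_ur: R-squared of the unrestricted model (k regressors plus intercept)
    r2_r:  R-squared of the restricted model
    n:     number of observations
    q:     number of restrictions being tested
    """
    if not 0.0 <= r2_r <= r2_ur < 1.0:
        raise ValueError("need 0 <= r2_r <= r2_ur < 1")
    df_ur = n - k - 1  # degrees of freedom of the unrestricted model
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)


def f_stat_overall(r2, n, k):
    """Overall significance: the restricted model has only an intercept (R-squared = 0)."""
    return f_stat_r2(r2, 0.0, n, k, q=k)


# Hypothetical numbers, for illustration only.
print(round(f_stat_r2(0.40, 0.30, n=100, k=5, q=2), 4))
print(round(f_stat_overall(0.20, n=50, k=4), 4))
```

The R-squared form is only valid when both models have the same dependent variable; otherwise the SSR form of the statistic is needed.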

Testing General Linear Restrictions

4.5 Reporting Regression Results

 

How to report multiple regression results for relatively complicated empirical projects. Include:



the estimated OLS coefficients (interpret the estimated coefficients of the key variables; the units of measurement)



the standard errors (reporting standard errors is preferred to reporting only t statistics)



The R-squared



The number of observations

 

Reporting the SSR and the standard error of the regression is sometimes a good idea, but it is not crucial.

Results can be summarized in equation form, or in one or more tables.

Example 4.10: Salary-Pension Tradeoff of Teachers



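Economic model (standard wage model): total compensation is salary plus benefits, so $\log(totcomp) = \log(salary) + \log(1 + b/s) \approx \log(salary) + b/s$

Econometric model: $\log(salary) = \beta_0 + \beta_1 (b/s) + \text{other factors}$

Key variable: b/s (the benefits-salary ratio)

Null hypothesis: $H_0: \beta_1 = -1$ against $H_1: \beta_1 \neq -1$

Controlling v.: enroll, staff, droprate, gradrate

Regression and reports

Interpretations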

Analysis of Cross Section and Panel Data

Part 1. Regression Analysis on Cross Sectional Data

Chap 6. Multiple Regression Analysis: Further Issues

6.1 Effects of Data Scaling on OLS Statistics



Units of measurement affect the estimators and the test statistics: coefficients, standard errors, t statistics, F statistics, and confidence intervals.

When variables are rescaled, the coefficients, standard errors, confidence intervals, t statistics, and F statistics change in ways that preserve all measured effects and testing outcomes.

When a variable appears in log form, changing its units of measurement changes only the intercept; it does not change the slope coefficients, the $R^2$, or the test results.
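The effect of rescaling can be checked numerically. The sketch below, using made-up data and a hypothetical helper `simple_ols`, regresses y on x measured in thousands and then on x measured in units: the slope shrinks by the factor 1000 while the intercept and the t statistic are unchanged.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope, se_slope, t_slope) for a regression of y on one x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = (ssr / (n - 2) / sxx) ** 0.5
    return intercept, slope, se_slope, slope / se_slope


# Made-up data: the same regressor in thousands and in single units.
x_thousands = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
x_units = [1000.0 * xi for xi in x_thousands]

b0_a, b1_a, se_a, t_a = simple_ols(x_thousands, y)
b0_b, b1_b, se_b, t_b = simple_ols(x_units, y)

print(b1_a, b1_b * 1000.0)  # same slope once the units are aligned
print(t_a, t_b)             # t statistic is unaffected by the rescaling
```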

6.1 Effects of Data Scaling on OLS Statistics

Beta Coefficients

 

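E.g. test scores

Standardized variable: $z = (x - \bar{x})/\mathrm{sd}(x)$, i.e. subtract the mean and divide by the standard deviation (sd)

Advantages:

The scale of the regressors becomes irrelevant, so the importance of each v. can be judged by comparing the magnitudes of the beta coefficients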
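A small sketch of a beta coefficient in the one-regressor case, with made-up data and hypothetical helper names: regressing standardized y on standardized x gives the same number as rescaling the ordinary slope by sd(x)/sd(y).

```python
from statistics import mean, stdev


def slope(x, y):
    """OLS slope from a regression of y on a single regressor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def standardize(v):
    """z-scores: subtract the sample mean, divide by the sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]


# Made-up data for illustration.
x = [12.0, 15.0, 9.0, 20.0, 17.0, 11.0]
y = [48.0, 61.0, 40.0, 75.0, 66.0, 50.0]

b1 = slope(x, y)
beta_coef = slope(standardize(x), standardize(y))

# Both expressions give the beta (standardized) coefficient.
print(beta_coef, b1 * stdev(x) / stdev(y))
```

With several regressors the same relation holds coefficient by coefficient, but the regression itself then needs a multiple-regression routine.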

 

6.2 More on Functional Form



Logarithmic Functional Forms



Models with Quadratics



Models with Interaction Terms

6.2.1 More on Using Logarithmic Functional Forms

   

In a model with log(y) as the dependent v.: approximate, $\%\Delta y \approx 100 \cdot \beta \Delta x$; accurate, $\%\Delta y = 100 \cdot [\exp(\beta \Delta x) - 1]$

Small % change: the difference is not crucial. Large change: use the accurate calculation.

Advantages of using logs:

leads to coefficients with appealing interpretations

when y>0, models using log(y) as the dependent v. often satisfy the CLM assumptions more closely than models using the level of y; taking logs can mitigate or eliminate heteroskedastic or skewed distributions

taking logs usually narrows the range of the v., in some cases by a considerable amount. This makes estimates less sensitive to outlying (or extreme) observations on the dependent or independent v.
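To see when the approximate percentage interpretation of a log(y) model is good enough, the sketch below compares it with the exact calculation. The function names and the coefficient values are illustrative assumptions.

```python
import math


def approx_pct_change(beta, dx=1.0):
    """Approximate % change in y when x changes by dx and log(y) is the dependent variable."""
    return 100.0 * beta * dx


def exact_pct_change(beta, dx=1.0):
    """Exact % change in y implied by the same log(y) model."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


for beta in (0.05, 0.50):  # hypothetical coefficients
    print(beta, approx_pct_change(beta), round(exact_pct_change(beta), 3))
```

For a coefficient of 0.05 the two numbers are close (5 versus about 5.13), while for 0.50 they differ substantially (50 versus about 64.87).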

Some Standard Rules of Thumb for Taking Logs

  

The log is often taken when the v. is a positive dollar amount (wages, salaries, firm sales, and firm market value) or takes large integer values (population, total number of employees, and school enrollment)

Variables that are measured in years, such as education, experience, tenure, age, and so on, usually appear in their original form.

A variable that is a proportion or a percent, such as the unemployment rate, the participation rate in a pension plan, the percentage of students passing a standardized exam, or the arrest rate on reported crimes, can appear in either original or logarithmic form, although there is a tendency to use them in level forms.

Distinguish a percentage change from a percentage point change

More on Taking Logs



If a variable takes on zero or negative values, its log cannot be taken; when y is nonnegative but can equal zero, log(1+y) is sometimes used:

The percentage change interpretations are often closely preserved, except for changes beginning at y=0 (where the percentage change is not even defined)

One drawback to using a dependent v. in logarithmic form is that it is more difficult to predict the original v.

it is not legitimate to compare $R^2$ from models where y is the dependent v. in one case and log(y) is the dependent v. in the other (see Section 6.4).

6.2.2 Models with Quadratics



Decreasing or increasing marginal effects: quadratic functions, $\hat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$

parabolic shape and U-shape

Turning point: $x^* = |\hat\beta_1 / (2\hat\beta_2)|$

If this turning point is beyond all but a small percentage of the observations in the sample, then this is not of much concern
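A short sketch of the turning-point calculation for a fitted quadratic. The coefficients below are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def marginal_effect(b1, b2, x):
    """Slope of b0 + b1*x + b2*x**2 at the point x."""
    return b1 + 2.0 * b2 * x


def turning_point(b1, b2):
    """Value of x at which the quadratic b0 + b1*x + b2*x**2 reaches its maximum or minimum."""
    if b2 == 0:
        raise ValueError("no turning point when the quadratic term is zero")
    return -b1 / (2.0 * b2)


# Hypothetical estimates: positive linear term, negative quadratic term.
b1, b2 = 0.298, -0.0061
x_star = turning_point(b1, b2)
print(round(x_star, 2))
print(marginal_effect(b1, b2, 1.0), marginal_effect(b1, b2, x_star))
```

Comparing `x_star` with the range of x in the sample shows whether the turning point is of practical concern.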



Other forms:



using quadratics along with logarithms.



some care is needed in making a useful interpretation with a dependent v. in log form and an explanatory v. entering as a quadratic



a nonconstant elasticity: double-log and quadratics



other polynomial terms: a cubic and even a quartic term

6.2.3 Models with Interaction Terms



Interaction Effects:



Sometimes it is natural for the partial effect, elasticity, or semi-elasticity of the dependent v. with respect to an explanatory v. to depend on the magnitude of yet another explanatory v.

Example 6.3: Effects of Attendance on Final Exam Performance

Dependent v.: final exam performance, measured as the standardized outcome on a final exam (stndfnl)

It is easier to interpret a student's performance relative to the rest of the class

Explanatory v.-s:

Percentage of classes attended (atndrte)

prior college grade point average (priGPA), and ACT score

Functional form:

Quadratics in priGPA and ACT

class attendance might have a different effect for students who have performed differently in the past, as measured by priGPA, so the model includes an interaction between priGPA and atndrte.

Econometric model: $stndfnl = \beta_0 + \beta_1 atndrte + \beta_2 priGPA + \beta_3 ACT + \beta_4 priGPA^2 + \beta_5 ACT^2 + \beta_6 priGPA \cdot atndrte + u$

Example 6.3: Regression Results, Inference and Interpretations



Sample regression function:

Joint hypothesis

What is the partial effect of atndrte on stndfnl, given priGPA?

The partial effect is the coefficient on atndrte plus the coefficient on the interaction term times priGPA, so we must plug in interesting values of priGPA to obtain it.

At the mean of priGPA (2.59), the effect of atndrte on stndfnl is -.0067 + .0056(2.59) = .0078.

Meaning: a 10 percentage point increase in atndrte increases stndfnl by .078 standard deviations from the mean final exam score.

Statistical significance: run a new regression that replaces priGPA·atndrte with (priGPA - 2.59)·atndrte. The coefficient on atndrte is then the partial effect at priGPA = 2.59, and this regression gives its standard error, which yields t = .0078/.0026 = 3.
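The arithmetic in this example can be reproduced directly. The sketch below uses only the numbers quoted above (-.0067, .0056, the mean priGPA of 2.59, and the standard error .0026); the variable and function names are illustrative.

```python
# Estimates quoted in the example.
b_atndrte = -0.0067      # coefficient on atndrte
b_interaction = 0.0056   # coefficient on priGPA * atndrte
mean_prigpa = 2.59       # sample mean of priGPA
se_at_mean = 0.0026      # standard error from the re-centred regression


def partial_effect(prigpa):
    """Partial effect of atndrte on stndfnl at a given value of priGPA."""
    return b_atndrte + b_interaction * prigpa


effect = partial_effect(mean_prigpa)
print(round(effect, 4))                      # effect of one percentage point of attendance
print(round(10 * effect, 3))                 # effect of ten percentage points
print(round(round(effect, 4) / se_at_mean))  # t statistic, using the rounded effect
```

Plugging other values of priGPA into `partial_effect` shows how the attendance effect varies with prior performance.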

6.3 More on Goodness-of-fit and Selection of Regressors



R-squared and adjusted R-squared: $\bar{R}^2 = 1 - \dfrac{SSR/(n-k-1)}{SST/(n-1)}$

when a new independent v. is added to a regression, SSR and (n-k-1) both decrease, so $\bar{R}^2$ can go up or down.

$\bar{R}^2$ increases if, and only if, the t statistic on the new v. is greater than one in absolute value. (An extension of this is that $\bar{R}^2$ increases when a group of v.-s is added to a regression if, and only if, the F statistic for joint significance of the new v.-s is greater than unity.)

A negative $\bar{R}^2$ indicates a very poor model fit relative to the number of degrees of freedom.

Note that it is $R^2$, not $\bar{R}^2$, that appears in the F statistic
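A minimal sketch of the adjusted R-squared computed from the ordinary R-squared, the sample size n, and the number of regressors k. The function name and the example numbers are assumptions for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n - k - 1 > 0")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical cases.
print(round(adjusted_r2(0.30, n=101, k=5), 4))  # slightly below the ordinary R-squared
print(round(adjusted_r2(0.02, n=20, k=5), 4))   # negative: poor fit relative to degrees of freedom
```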

Using Adjusted R-Squared to Choose Between Nonnested Models

To choose a model without redundant independent v.:

Different functional forms (different explanatory v.)

Limitation in using $\bar{R}^2$ to choose between nonnested models: we cannot use it to choose between different functional forms for the dependent variable, because the models are fitting two separate dependent variables.

Controlling for Too Many Factors in Regression Analysis



Overemphasizing goodness-of-fit:

e.g. regressing log(price) on log(assess), log(lotsize), log(sqrft), and bdrms

Adding Regressors to Reduce the Error Variance

we should always include independent v. that affect y and are uncorrelated with all of the independent v.-s of interest.

6.4 Prediction and Residual Analysis



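Confidence intervals for a prediction from the OLS regression line.

The parameter to estimate: $\theta_0 = \beta_0 + \beta_1 c_1 + \dots + \beta_k c_k = E(y \mid x_1 = c_1, \dots, x_k = c_k)$

The estimator of $\theta_0$: $\hat\theta_0 = \hat\beta_0 + \hat\beta_1 c_1 + \dots + \hat\beta_k c_k$

The standard error of $\hat\theta_0$: run the regression of y on $(x_1 - c_1), \dots, (x_k - c_k)$. The intercept term and its standard error are $\hat\theta_0$ and $\mathrm{se}(\hat\theta_0)$

Unobserved error; prediction error: $\hat{e}^0 = y^0 - \hat{y}^0$, with $\mathrm{se}(\hat{e}^0) = \{[\mathrm{se}(\hat{y}^0)]^2 + \hat\sigma^2\}^{1/2}$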
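The difference between a confidence interval for the expected value and a prediction interval for a new outcome comes from the unobserved error. The sketch below assumes the standard error of the fitted value and the standard error of the regression are already available; the function names, the default critical value, and the numbers are illustrative assumptions.

```python
import math


def prediction_se(se_fit, sigma_hat):
    """Standard error of the prediction error: sqrt(se(y_hat)^2 + sigma_hat^2)."""
    return math.sqrt(se_fit ** 2 + sigma_hat ** 2)


def prediction_interval(y_hat, se_fit, sigma_hat, crit=1.96):
    """Interval y_hat +/- crit * se(prediction error).

    crit is the critical value supplied by the user (a t quantile in small samples).
    """
    half_width = crit * prediction_se(se_fit, sigma_hat)
    return y_hat - half_width, y_hat + half_width


# Hypothetical inputs.
print(prediction_se(3.0, 4.0))
print(prediction_interval(10.0, 3.0, 4.0, crit=2.0))
```

Because the estimated error variance usually dominates, the prediction interval is typically much wider than the confidence interval for the expected value.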

Residual Analysis



Residual analysis: it is sometimes useful to see whether the actual value of the dependent v. is above or below the predicted value; that is, to examine the residuals for the individual observations.

Predicting y When log( y) Is the Dependent Variable

The Goodness-of-fit Measure in the y Model and the log( y) Model



it is not legitimate to compare $R^2$ from models where y is the dependent v. in one case and log(y) is the dependent v. in the other.

Solution:
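The slide's own solution did not survive extraction. One common approach, offered here as an assumption about what was intended, is to turn the fitted values of log(y) into predictions of y and compute the squared correlation between y and those predictions; that number is comparable with the R-squared from the level model. The function names and data below are made up.

```python
import math


def squared_correlation(a, b):
    """Squared sample correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)


def r2_for_y_from_log_model(y, log_y_fitted):
    """Goodness-of-fit for y itself when the model was estimated with log(y).

    Any positive scale factor applied to exp(fitted) leaves the correlation
    unchanged, so it is omitted here.
    """
    y_pred = [math.exp(v) for v in log_y_fitted]
    return squared_correlation(y, y_pred)


# Made-up outcomes and made-up fitted values of log(y).
y = [10.0, 20.0, 35.0, 50.0]
log_y_fitted = [2.4, 2.9, 3.5, 4.0]
r2_log_model = r2_for_y_from_log_model(y, log_y_fitted)
print(round(r2_log_model, 4))
```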

Analysis of Cross Section and Panel Data

Part 1. Regression Analysis on Cross Sectional Data

Chap 7. Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables

7.1 Dummy Explanatory Variables

 

Describing Dummy (Binary) Variables

Assigned values: 0 and 1

Meaning, e.g. $wage = \beta_0 + \delta_0 female + \beta_1 educ + u$

$\delta_0$: is there discrimination against women? It is the difference in hourly wage between females and males, given the same amount of education (and the same error term u).

Intercept shift

Dummy variable trap:

keep track of which group is the base (benchmark) group.

we will always include an overall intercept for the base group.



Nothing changes about the mechanics of OLS or the statistical theory when some of the independent v.-s are defined as dummy variables. The only difference with what we have done up until now is in the interpretation of the coefficient on the dummy variable.

Interpreting Coefficients on Dummy Explanatory V. When the Dependent V. Is log(y)

Percentage interpretation

Example:

7.2 The Binary Dependent Variables: The Linear Probability Model

 

Until now the dependent variable y has had quantitative meaning. What happens when y is binary? The linear probability model (LPM).

Interpretation of the OLS coefficients:

In the LPM, $\beta_j$ measures the change in the probability of success when $x_j$ changes, holding other factors fixed: $\Delta P(y = 1 \mid x) = \beta_j \Delta x_j$

Example: the coefficient on educ means that, everything else in (7.29) held fixed, another year of education increases the probability of labor force participation by .038.

Limitations of the LPM



Some Shortcomings:



predictions can be either less than zero or greater than one.

A linear model implies a constant marginal effect, which is unrealistic: e.g. subsequent children should have a smaller marginal effect on the probability that a woman works

It usually works well for values of the independent variables that are near the averages in the sample.

Heteroskedasticity: the OLS estimators remain unbiased, but the usual test statistics are not valid
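Two of these shortcomings are easy to illustrate. In the sketch below the slope .038 on educ is the value quoted above, while the intercept 0.5 and the function names are hypothetical choices made only for illustration.

```python
def lpm_predict(intercept, slopes, x):
    """Fitted 'probability' from a linear probability model."""
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


def bernoulli_variance(p):
    """Var(y | x) for a binary y with P(y = 1 | x) = p; it varies with x through p."""
    return p * (1.0 - p)


intercept = 0.5    # hypothetical
slopes = [0.038]   # coefficient on educ quoted in the example

p_low = lpm_predict(intercept, slopes, [8])
p_high = lpm_predict(intercept, slopes, [16])
print(round(p_low, 3), round(p_high, 3))  # the second fitted value exceeds one

# The error variance depends on the regressors: heteroskedasticity.
print(bernoulli_variance(0.5), round(bernoulli_variance(0.9), 2))
```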



Dummy Dependent and Explanatory Variables:



The coefficient measures the predicted difference in probability when the dummy v. goes from zero to one.

7.3 Using Dummy V. in Multiple Categories: Interpretation



Adding a dummy v. married

Same “marriage premium” for men and women: two dummy variables

(0,1); (1,1); (1,0); (0,0)

Different “marriage premium”: three dummy variables?

marrmale, marrfem, and singfem

(1,0,0); (0,1,0); (0,0,1); (0,0,0)

Ordinal variable:

define a dummy variable for each value or each category of the ordinal information
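A minimal sketch of how the three group dummies can be built from two binary indicators, with single men as the base group. The function name and the 0/1 coding of `female` and `married` are assumptions.

```python
def group_dummies(female, married):
    """Map (female, married) indicators to marrmale, marrfem, singfem.

    Single men are the base group, so all three dummies are zero for them.
    """
    return {
        "marrmale": int(married == 1 and female == 0),
        "marrfem": int(married == 1 and female == 1),
        "singfem": int(married == 0 and female == 1),
    }


for female in (0, 1):
    for married in (0, 1):
        print(female, married, group_dummies(female, married))
```

Including a fourth dummy for single men together with an intercept would create perfect collinearity, which is the dummy variable trap.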

Example 7.6 (7.1,7.5) The Determination of log Hourly Wage:

    

Explanatory variables: educ, exper, tenure, marriage, gender

Dummy variables:

Same “marriage premium”: (0,1); (1,1); (1,0); (0,0)

Different “marriage premium”: (1,0,0); (0,1,0); (0,0,1); (0,0,0)

Adding an interaction term

Regression:

Inference:

we can use this equation to obtain the estimated difference between any two groups.

Unfortunately, we cannot use it for testing whether the estimated difference between single and married women is statistically significant. The easiest way to do this is to choose one of these groups to be the base group and to reestimate the equation.

Interpretation:

7.4 Interaction Effects Involving Dummy Variables



Adding Interaction Term



The marriage premium depends on gender



the rest of the regression is necessarily identical to (7.11).



Equation (7.14) is just a different way of finding wage differentials across all gender-marital status combinations. It has no real advantages over (7.11); in fact, equation (7.11) makes it easier to test for differentials between any group and the base group of single men.

Interaction Effects: Differences in Slopes



Adding an interaction term: differences in slopes

The return to education depends on gender

Hypothesis tests:

the return to education is the same for women and men.

average wages are identical for men and women who have the same levels of education: F test

7.4.2 Testing for Differences in Regression Functions Across Groups



$H_0$: two populations or groups follow the same regression function, against the alternative that one or more of the slopes differ across the groups.

Chow statistic: $F = \dfrac{SSR_P - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \dfrac{n - 2(k+1)}{k+1}$, where $SSR_P$ comes from the pooled regression and $SSR_1$, $SSR_2$ come from the separate regressions for the two groups.

Caution: there is no simple R-squared form of the test if separate regressions have been estimated for each group; the R-squared form of the test can be used only if interactions have been included to create the unrestricted model.

One important limitation of the Chow test, regardless of the method used to implement it, is that the null hypothesis allows for no differences at all between the groups.
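A sketch of the Chow statistic computed from sums of squared residuals. It assumes the pooled regression and the two group regressions each have k regressors plus an intercept; the function name and the numbers are illustrative.

```python
def chow_statistic(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic for equality of regression functions across two groups.

    ssr_pooled: SSR from the regression pooling both groups (restricted model)
    ssr_1, ssr_2: SSRs from the separate group regressions (unrestricted model)
    n: total number of observations; k: number of regressors (excluding intercept)
    """
    ssr_ur = ssr_1 + ssr_2
    df_num = k + 1            # number of restrictions
    df_den = n - 2 * (k + 1)  # degrees of freedom of the unrestricted model
    if df_den <= 0:
        raise ValueError("not enough observations")
    return ((ssr_pooled - ssr_ur) / ssr_ur) * (df_den / df_num)


# Hypothetical sums of squared residuals.
print(round(chow_statistic(100.0, 40.0, 45.0, n=200, k=3), 4))
```

The statistic is compared with an F distribution with (k + 1, n - 2(k + 1)) degrees of freedom; it is valid under homoskedasticity, including equal error variances across the two groups.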

7.5 Policy Analysis and Program Evaluation with Dummy Variables

   

Policy analysis; Program evaluation



Control group; experimental (treatment) group

be careful to include factors that might be systematically related to the binary independent variable of interest.

Self-selection problems:

The term is used generally when a binary indicator of participation might be systematically related to unobserved factors.

another way that an explanatory variable can be endogenous.

Solutions:



Data



more advanced methods

Example: the effect of the job training grants on worker productivity



Consider again the Holzer et al. (1993) study, where we are now interested in the effect of the job training grants on worker productivity (as opposed to amount of job training, example 7.3).

References



Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chap 4

－
