No Slide Title


Regression Analysis: Outline

• Review of Regression Analysis
• Regression with Categorical Explanatory Variables
• Pooled Regression: Fixed Effect and Random Effect Models

Regression Analysis in the overall context of Research

• Research Purpose

– Research questions, objectives, hypotheses

• Methodology

– Type of study
– Sampling plan and sample size determination
– Data collection methods
– Data analysis plan

• Execution

– Data collection and data analysis
– Discussion and conclusion
– Research evaluation

Regression Analysis: Review

• What is regression?
– A dependence measure ~ it estimates the overall relationship between the dependent and the independent variables
– Examples of dependent and independent variables?
• Regression and causality (~ experiment, theory)
• Regression (~ predicts the dependent variable) and correlation (~ linear association)
• Uses of regression
– Descriptive ~ describes the relationship and how strong it is
– Inference ~ which variables are most important/significant?
– Predictive ~ forecasting
• Hypothesis testing
• Sample size

Type of Variables in Regression Analysis

• Independent
• Dependent
• Moderating
• Mediating
• Moderation-mediation

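Moderating Variables

Testing Moderation
• Y = b0 + b1*X + b2*Z + b3*XZ + e
• Rearranged: Y = [b1 + b3*Z]*X + [b0 + b2*Z] + e
• The slope of X is b1 + b3*Z, so it changes with Z; Z moderates the effect of X when b3 is different from zero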
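The moderation test can be illustrated with a small simulation. This is only a sketch: the data are simulated, the coefficient values in `b_true` are hypothetical, and the helper `slope_of_x` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)

# Hypothetical true coefficients b0, b1, b2, b3 (illustration only).
b_true = np.array([1.0, 2.0, 0.5, 1.5])

# Design matrix with a constant, X, Z and the interaction column X*Z.
X = np.column_stack([np.ones(n), x, z, x * z])
y = X @ b_true + rng.normal(scale=0.1, size=n)

# Least squares estimates of b0..b3.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from s^2 * (X'X)^-1, then the t-ratio of b3.
resid = y - X @ b_hat
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_b3 = b_hat[3] / se[3]


def slope_of_x(z_value):
    """Estimated slope of X at a given value of Z: b1 + b3*Z."""
    return b_hat[1] + b_hat[3] * z_value


print("estimates:", b_hat)
print("t-ratio of b3:", t_b3)
print("slope of X at Z = 1:", slope_of_x(1.0))
```

A t-ratio for b3 that is large in absolute value suggests that the effect of X on Y depends on Z.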

Mediator Variables

[Figure: path diagram with the nodes Attitude, BI and B, and paths labelled a, b and c]

Multivariate Research Methods: Regression Analysis: Review

• How does it work?
• Formalization of the regression model
– y = b0 + b1*x1 + b2*x2 + … + bk*xk + error
– Systematic part: b0 + b1*x1 + … + bk*xk; unsystematic part: error
– Intercept, slopes, error
– Examples??
• What do we observe? Y and the X's; we estimate the b's
• Which variables to include?
– Theory, prior research, common sense
– If you don't have any idea?
» Statistical criteria: stepwise, forward and backward selection (in cases of only metric data??)
• Moderator effects ~ interaction variables
• How to obtain estimates?

– Least squares method of regression
– Any straight line you fit will have some error
– The objective is to minimize those errors, measured as the sum of squared differences between Y and Y-predicted
– That is, minimize the sum of squared errors
– Y = a + b*X + e leads to e = Y - a - b*X
– e^2 = (Y - a - b*X)^2 ~ minimize the sum of e^2
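As a sketch of the least squares idea for one explanatory variable, the snippet below computes a and b in closed form and shows that moving away from them increases the sum of squared errors. The data values are hypothetical and `sse` is a helper name introduced here.

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def sse(a, b):
    """Sum of squared errors, sum of (Y - a - b*X)^2, for a candidate line."""
    e = y - a - b * x
    return float(e @ e)


# Closed-form least squares solution for a single explanatory variable.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print("a =", a, "b =", b)
print("SSE at the least squares line:", sse(a, b))
print("SSE with a shifted intercept: ", sse(a + 0.1, b))
print("SSE with a shifted slope:     ", sse(a, b - 0.1))
```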

Multivariate Research Methods: Regression Analysis: Review

• Interpretation of parameter estimates?
• Intercept
– Mean of the dependent variable when the values of all independent variables are zero
– Mean of the dependent variable when all slopes are zero
– Not always meaningful
• Slopes
– Change in Y for a one-unit change in X
– Zero slope? X does not affect Y
– b1, b2, …, bk: partial regression coefficients
– e.g. b1 = change in the value of Y if X1 is changed by one unit while all other explanatory variables (X2 … Xk) are kept constant

Multivariate Research Methods: Regression Analysis: Review

• Interpretation of parameter estimates?
• Size of the regression coefficient
– Depends on the scale of the explanatory variable
– So the size of a coefficient is not a good indicator of whether a variable is a good explanatory variable
– Scale of the independent variables ~ within 10 times
• Beta coefficients (standardized coefficients)
– Provide relative importance
• Elasticity: measures the percentage change in the dependent variable for a 1% change in the independent variable

$$\text{elasticity} = b \cdot \frac{\bar{X}}{\bar{Y}}$$
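The scale dependence of raw coefficients can be sketched as follows. The data are hypothetical and noise-free, the standardized coefficient is taken as b times sd(X)/sd(Y), and the elasticity is evaluated at the sample means; these are common conventions assumed here, not taken from the slides.

```python
import numpy as np

# Hypothetical noise-free data: x1 on a small scale, x2 on a large scale.
i = np.arange(100)
x1 = (i % 10).astype(float)           # values 0..9
x2 = (i // 10).astype(float) * 100.0  # values 0..900
y = 10.0 + 2.0 * x1 + 0.03 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

slopes = b[1:]
x_cols = X[:, 1:]

# Standardized (beta) coefficients: b_j * sd(x_j) / sd(y).
beta = slopes * x_cols.std(axis=0) / y.std()

# Elasticities at the means: b_j * mean(x_j) / mean(y).
elasticity = slopes * x_cols.mean(axis=0) / y.mean()

print("raw slopes:  ", slopes)
print("beta coeffs: ", beta)
print("elasticities:", elasticity)
```

In this example x1 has the larger raw slope, yet x2 has the larger beta coefficient, which is why raw coefficient size alone says little about relative importance.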

Multivariate Research Methods: Regression Analysis: Review

• Is the regression coefficient significant?
• Is the regression significant?
• Overall goodness of fit? $r^2$

$$0 \le r^2 \le 1$$

$$r^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

• r ~ coefficient of multiple correlation
• Adjusted $r^2$

[Figure: Y plotted against X with the fitted line Y = b0 + bX; TSS is split into ESS and RSS (error)]
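The split of TSS into ESS and RSS can be checked numerically. This is a sketch on hypothetical data; the adjusted value uses the usual formula 1 - (1 - r^2)(n - 1)/(n - k - 1), with k the number of explanatory variables.

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual (error) sum of squares

r2 = ess / tss
n, k = len(y), X.shape[1] - 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("TSS, ESS + RSS:", tss, ess + rss)
print("r^2:", r2, "adjusted r^2:", adj_r2)
```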

Multivariate Research Methods: Regression Analysis: Review

Major assumptions

• The variance of the error term is constant (otherwise: heteroscedasticity)
• There is no autocorrelation in the error term (otherwise: autocorrelation)
• There is no exact linear relationship in the independent variables (otherwise: multicollinearity)
• There must be variability in the independent variables
• The regression model is correctly specified
• The regression model is linear in parameters
• The mean value of the error term is zero
• No covariation between errors and independent variables
• The error term is normally distributed

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?
• Heteroscedasticity
– Error variances are not the same
– Arises when errors are related to either the dependent or the independent variables
– e.g. more stable saving (or consumption) among lower-income families; larger variances among brand switchers than among brand-loyal customers
– Remedy?? If we know the nature of the heteroscedasticity, we can use weighted least squares (WLS)
– Volatility ~ finance??

[Figure: Saving plotted against Income; label: Variance]
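A minimal sketch of WLS, assuming the nature of the heteroscedasticity is known: here the error standard deviation is assumed to be proportional to x, so each observation gets weight 1/x^2. The data are simulated and the coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)

# Assumed form of heteroscedasticity: error sd proportional to x.
y = 2.0 + 0.5 * x + rng.normal(scale=0.1 * x)

X = np.column_stack([np.ones(n), x])
w = 1.0 / x**2  # weight = 1 / error variance, up to a constant

# WLS is OLS after scaling every row by sqrt(weight).
sw = np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Same estimate from the weighted normal equations (X'WX) b = X'Wy.
b_check = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

print("WLS estimates:", b_wls)
```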

Regression Analysis: Review

• Detecting problems with the assumptions?
• Autocorrelation ~ more of a time-series problem
– Arises when the errors of consecutive observations are correlated
• Reasons?
– Omitted variables
– Model mis-specification
• Detection
– Graphical methods
– Durbin-Watson ~ DW = 2(1 - r), where r is the lag-1 autocorrelation of the residuals; DW varies between 0 and 4, and the ideal value is 2
• Problem?
– Overestimates the coefficient of determination and underestimates the standard errors

[Figure: Y plotted against X, and e_t plotted against e_(t-1), for positive and negative autocorrelation]
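The Durbin-Watson statistic and its approximation 2(1 - r) can be sketched directly from a residual series. The two residual series below are artificial, and the function names are introduced here for illustration.

```python
import numpy as np


def durbin_watson(e):
    """DW = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def lag1_autocorr(e):
    """Lag-1 autocorrelation r of a residual series with mean near zero."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))


# Artificial residuals: sign flips every period (negative autocorrelation)
# and a slow smooth wave (positive autocorrelation).
e_neg = np.tile([1.0, -1.0], 50)
e_pos = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))

for name, e in [("negative", e_neg), ("positive", e_pos)]:
    print(name, durbin_watson(e), 2 * (1 - lag1_autocorr(e)))
```

Values near 4 point to negative autocorrelation, values near 0 to positive autocorrelation, and values near 2 to none.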

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?
• Multicollinearity
– Presence of very high interrelations among the explanatory variables (does not violate any assumption)
– Symptoms: the standard errors are likely to be high; estimates are not reliable?
• Detection
– Bivariate correlation
– Variance Inflation Factor (VIF) ~ rule-of-thumb cutoff of 10
– Tolerance = 1/VIF
• Remedies
– Drop variables
– Composite variables, e.g. family life cycle, social status
– Factor analysis

$$VIF_i = \frac{1}{1 - r_i^2}$$

where $r_i^2$ is the $r^2$ from regressing $X_i$ on the other explanatory variables.

[Figure: two diagrams relating X1, X2 and Y]
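The VIF can be computed by regressing each explanatory variable on the others. The sketch below uses artificial data in which x3 is almost the sum of x1 and x2; the function name `vif` is introduced here for illustration.

```python
import numpy as np


def vif(X):
    """VIF of each column of X (explanatory variables only, no constant)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on a constant plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Artificial data: x1 and x2 are uncorrelated by construction,
# x3 is nearly an exact linear combination of them.
i = np.arange(100)
x1 = (i % 10).astype(float)
x2 = (i // 10).astype(float)
rng = np.random.default_rng(2)
x3 = x1 + x2 + rng.normal(scale=0.1, size=100)

vif_ok = vif(np.column_stack([x1, x2]))
vif_bad = vif(np.column_stack([x1, x2, x3]))
print("VIF without x3:", vif_ok)
print("VIF with x3:   ", vif_bad)
```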

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?
• Linear in parameters
– Y = a + b*X^2 + e ~ linear in parameters but non-linear in variables
– Y = a + b^2*X1 + b*X2 + e ~ non-linear in parameters: non-linear regression
• The regression model is correctly specified
– Functional form, e.g. new consumer durable sales
– Influential observations
– Outliers
– Whether one or a few observations??

Regression Analysis: Review

• Outliers: In linear regression, an outlier is an observation with a large residual. Problem with the dependent variable??
• Leverage: An observation with an extreme value on an independent variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an unusually large effect on the estimates of the regression coefficients.
• Influence: An observation is said to be influential if removing it substantially changes the estimates of the coefficients.
• Detection
– Residual check
» Standardized residual
» Studentized residual
» Problem approx.: absolute value > 2

$$e_i^* = \frac{e_i}{s\sqrt{1 - h_i}}, \qquad e_i^* = \frac{e_i}{s_{(i)}\sqrt{1 - h_i}}$$

where $s$ is the residual standard deviation, $s_{(i)}$ is the same quantity with observation $i$ deleted, and $h_i$ is the leverage of observation $i$.
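A sketch of the residual check, on hypothetical data with one planted outlier. Naming differs across software packages; here "standardized" divides by the overall residual standard deviation and "studentized" by the one computed with the observation deleted, which is an assumption about the intended definitions.

```python
import numpy as np

# Hypothetical data: a straight line plus small noise and one planted outlier.
x = np.arange(10, dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1])
y = 1.0 + 2.0 * x + noise
y[5] += 10.0  # planted outlier at index 5

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# Leverage h_i: diagonal of the hat matrix X (X'X)^-1 X'.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Standardized residuals use the overall residual variance s^2.
s2 = e @ e / (n - p)
standardized = e / np.sqrt(s2 * (1 - h))

# Studentized residuals use s_(i)^2, the variance with observation i deleted.
s2_del = (e @ e - e**2 / (1 - h)) / (n - p - 1)
studentized = e / np.sqrt(s2_del * (1 - h))

flagged = np.where(np.abs(studentized) > 2)[0]
print("flagged observations:", flagged)
```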

Regression Analysis: Review

• Transformation of variables
– The dependent variable should be normally distributed, have constant variance, etc.
– e.g. GNP per capita, Log(Price), etc.
– Retransformation??
• Forecasting
– Model fit versus forecasting
– Forecasting independent variables
• Model selection / comparing models
– Adjusted R-sq
• Model validation
– Cross-validation
– Jackknife validation

Multivariate Research Methods: Regression Analysis: Limitations

• Nominal independent variables ~ dummy variable regression
– Gender, income groups, ethnicity, region, race, etc.
• Measurement error ~ structural equation models
– X_true = X_obs + e_x
– Y = b0 + b1*X_true + e_Y
– Y = b0 + b1*(X_obs + e_x) + e_Y
– Y = b0 + b1*X_obs + b1*e_x + e_Y
– The error term (b1*e_x + e_Y) is correlated with the x-variable ~ this violates the regression assumptions
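The consequence of measurement error can be sketched with a simulation. The assumptions here are not from the slides: the measurement error e_x is independent of X_true and has the same variance, and the coefficient values are hypothetical. Under these assumptions the slope on X_obs is pulled toward zero, to roughly half of b1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
b0, b1 = 1.0, 2.0  # hypothetical true coefficients

x_true = rng.normal(size=n)
e_x = rng.normal(size=n)  # measurement error (assumed variance 1)
x_obs = x_true - e_x      # so that X_true = X_obs + e_x
e_y = rng.normal(scale=0.5, size=n)
y = b0 + b1 * x_true + e_y


def ols_slope(x, y):
    """Slope of the simple regression of y on x."""
    return float(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))


slope_true = ols_slope(x_true, y)
slope_obs = ols_slope(x_obs, y)

# Composite error of the regression on X_obs and its correlation with X_obs.
composite_error = b1 * e_x + e_y
corr = float(np.corrcoef(x_obs, composite_error)[0, 1])

print("slope using X_true:", slope_true)
print("slope using X_obs: ", slope_obs)
print("corr(X_obs, composite error):", corr)
```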

Regression Analysis: Limitations

• Limited dependent variable
– Censored dependent variable ~ lots of zeros: Tobit regression
» Expenditures in home buying
» Demand in a supply-restricted situation
» Vacation expenditures
– Truncated dependent variable ~ duration analysis, available in LIMDEP
» Interpurchase times
» Duration of unemployment

[Figure: censored dependent variable plotted against X (e.g. income)]

Regression with Categorical Explanatory Variables

• Some modeling problems
– Is gender important in determining the level of expenditure on medical expenses?
– Do Nescafe’s supermarket coffee sales vary by state?
– How would you model the impact of local crime on housing prices if the crime rate were rated none, moderate or high?
– How do I include income as a determinant of cigarette demand when data have only been collected by income class?
• Examples
– Medical expenditure = intercept + b1*Gender + b2*Age group + error
– Sales = intercept + b1*Provinces + error

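Interpretation of regression coefficients: Binary Coding

• Midterm exam scores by sex

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad Y_i = \text{score}$$

$$D_i = \begin{cases} 1, & \text{if male} \\ 0, & \text{if female} \end{cases}$$

$$\bar{Y}_{\text{female}} = \beta_0, \qquad \bar{Y}_{\text{male}} = \beta_0 + \beta_1$$

[Figure: score plotted against D, with 0 = female and 1 = male]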
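A sketch of binary (0/1) coding on hypothetical midterm scores: the intercept reproduces the female average and the intercept plus the slope reproduces the male average.

```python
import numpy as np

# Hypothetical midterm scores (illustration only).
female = np.array([70.0, 80.0, 90.0])
male = np.array([60.0, 70.0, 80.0, 90.0])

y = np.concatenate([female, male])
# Binary coding: D = 0 for female, D = 1 for male.
d = np.concatenate([np.zeros(len(female)), np.ones(len(male))])

X = np.column_stack([np.ones_like(d), d])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

print("b0 =", b0, "(female mean:", female.mean(), ")")
print("b0 + b1 =", b0 + b1, "(male mean:", male.mean(), ")")
```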

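Interpretation of regression coefficients: Effect Coding

• Midterm exam scores by sex

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad Y_i = \text{score}$$

$$D_i = \begin{cases} 1, & \text{if male} \\ -1, & \text{if female} \end{cases}$$

$$\bar{Y}_{\text{male}} = \beta_0 + \beta_1, \qquad \bar{Y}_{\text{female}} = \beta_0 - \beta_1$$

$$\beta_0 = \frac{\bar{Y}_{\text{male}} + \bar{Y}_{\text{female}}}{2}, \qquad \beta_1 = \frac{\bar{Y}_{\text{male}} - \bar{Y}_{\text{female}}}{2}$$

• Note: the coefficients do not directly estimate the average scores of female and male students; β0 is the overall average of the two group means

[Figure: score plotted against D, with -1 = female and 1 = male]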
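A sketch of effect (+1/-1) coding on hypothetical midterm scores: the intercept is the average of the two group means, and the group means are recovered by adding or subtracting the slope.

```python
import numpy as np

# Hypothetical midterm scores (illustration only).
female = np.array([70.0, 80.0, 90.0])
male = np.array([60.0, 70.0, 80.0, 90.0])

y = np.concatenate([female, male])
# Effect coding: D = -1 for female, D = +1 for male.
d = np.concatenate([-np.ones(len(female)), np.ones(len(male))])

X = np.column_stack([np.ones_like(d), d])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

print("b0 =", b0, "(average of the two group means)")
print("b0 + b1 =", b0 + b1, "(male mean)")
print("b0 - b1 =", b0 - b1, "(female mean)")
```

With unequal group sizes, as here, b0 is the unweighted average of the two group means and differs from the mean of all scores.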

Regression Analysis: Non-Linear Regression

• Example: sales and price dynamics of a new product

[Figure: first-purchase Sales plotted against Time, and Sales and Price plotted against Time]

Pooled Regression: Fixed Effect and Random Effect models

• Panel data
– Cross-sectional time-series data
– Observations on “n” individuals (or countries, firms, etc.), each measured at T points in time (T can be different for each measuring unit)
– Observations are not independent
– Use the panel structure to get better parameter estimates
– Control for fixed or random individual differences
• Example of Data Setup….
• Software: LIMDEP (also SAS…)
• Example: cross-sectional survey, 50% female participation in the labor force??

Pooled Regression: Fixed Effect and Random Effect models

• Fixed Effect – individual intercepts are different, shifted by a “fixed” amount

$$y_{it} = \alpha + \mu_i + X'_{it}\beta + e_{it}$$

$$y_{it} = \alpha_i + X'_{it}\beta + e_{it}, \qquad \alpha_i = \alpha + \mu_i$$

• Random Effect – individual differences are random rather than fixed: random intercept terms. Each individual's intercept is the mean intercept plus a random error $u_i$

$$y_{it} = \alpha + X'_{it}\beta + (e_{it} + u_i)$$

– $u_i$ is unobserved heterogeneity that is stable over time
– This $u_i$ is uncorrelated with the X’s
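One common way to estimate the fixed effect model is the within transformation: subtract each individual's own mean from y and x, then run least squares on the demeaned data. The sketch below uses a tiny hypothetical, noise-free panel in which the individual intercepts are correlated with x, so the pooled slope is off while the within estimate recovers the true slope.

```python
import numpy as np

# Hypothetical balanced panel: 3 individuals, 4 time periods each.
n_units, T = 3, 4
unit = np.repeat(np.arange(n_units), T)  # 0,0,0,0,1,1,1,1,2,2,2,2
t = np.tile(np.arange(T), n_units).astype(float)

base = np.array([0.0, 5.0, 10.0])
alpha_i = 2.0 * base  # individual intercepts, correlated with x
beta = 2.0            # hypothetical true slope
x = base[unit] + t
y = alpha_i[unit] + beta * x  # noise-free for clarity


def slope(x, y):
    """Simple regression slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


# Pooled regression ignores the individual intercepts.
beta_pooled = slope(x, y)

# Within transformation: subtract each individual's own mean.
counts = np.bincount(unit)
x_mean = np.bincount(unit, weights=x) / counts
y_mean = np.bincount(unit, weights=y) / counts
beta_fe = slope(x - x_mean[unit], y - y_mean[unit])

print("pooled slope:", beta_pooled)
print("fixed effect (within) slope:", beta_fe)
```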

Pooled Regression: Fixed Effect and Random Effect models

The Hausman Test:

• Model Selection – Fixed Effect vs Random Effect

– H0: the random effects estimator is consistent and efficient, versus
– H1: the random effects estimator is inconsistent
– The test statistic follows a chi-square distribution under H0
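In its usual form the Hausman statistic is the quadratic form (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^-1 (b_FE - b_RE), with degrees of freedom equal to the number of coefficients compared. The sketch below only evaluates that formula; the function name `hausman` and all input numbers are hypothetical, and in practice the estimates and variances would come from fitted fixed effect and random effect models.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Return the Hausman statistic and its degrees of freedom."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    stat = float(d @ np.linalg.solve(v, d))
    return stat, d.size


# Hypothetical estimates and variances for a single slope coefficient.
stat, df = hausman(b_fe=[2.0], b_re=[2.5], v_fe=[[0.04]], v_re=[[0.03]])
print("Hausman statistic:", stat, "with", df, "degree(s) of freedom")
```

With one degree of freedom the 5% chi-square critical value is about 3.84, so a statistic this large would reject H0 and favor the fixed effect model.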