Quantile Regression - College of Business


Advanced Topics in Regression
• Quantile Regression
• Analysis of Causality
• Mediation Analysis
• Hierarchical Linear Modeling

Compiled by Nick Evangelopoulos, 2013
Part 1: Quantile Regression
Motivation for Quantile Regression
Problem
ANOVA and regression provide information only about the
conditional mean.
More knowledge about the distribution of the statistic may
be important.
The covariates may shift not only the location or scale of the
distribution, they may affect the shape as well.
Solution
Quantile regression models the relationship between X and
the conditional quantiles of Y given X = x
Quantile Definition
• Definition: Let p ∈ [0, 1]. A pth quantile of a random variable Z is any number ζ_p such that Pr(Z < ζ_p) ≤ p ≤ Pr(Z ≤ ζ_p). A solution always exists, but need not be unique.
Ex: Suppose Z takes each of the eight values {3, 4, 7, 9, 9, 11, 17, 21} with equal probability, and p = 0.5. Then
Pr(Z < 9) = 3/8 ≤ 1/2 ≤ Pr(Z ≤ 9) = 5/8
So the 50th percentile is equal to 9.
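The definition can be checked mechanically. The sketch below is only an illustration: the helper name `is_quantile` is made up here, and it assumes Z is drawn uniformly from the listed sample. It tests the two inequalities with exact fractions.

```python
from fractions import Fraction


def is_quantile(candidate, sample, p):
    """True if Pr(Z < candidate) <= p <= Pr(Z <= candidate),
    where Z is assumed to be drawn uniformly from `sample`."""
    n = len(sample)
    below = Fraction(sum(1 for z in sample if z < candidate), n)         # Pr(Z < c)
    at_or_below = Fraction(sum(1 for z in sample if z <= candidate), n)  # Pr(Z <= c)
    return below <= p <= at_or_below


sample = [3, 4, 7, 9, 9, 11, 17, 21]

# Sample values that qualify as a 0.5 quantile (median).
medians = [z for z in sorted(set(sample)) if is_quantile(z, sample, Fraction(1, 2))]
print(medians)  # [9]

# Non-uniqueness: at p = 0.25 both 4 and 7 qualify, as does any number between them.
quartiles = [z for z in sorted(set(sample)) if is_quantile(z, sample, Fraction(1, 4))]
print(quartiles)  # [4, 7]
```

At p = 0.25 the whole interval [4, 7] satisfies the definition, which is the non-uniqueness the slide mentions.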
Quantile Regression
• A family of conditional quantiles of Y given X=x.
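• The median regression line is the least absolute deviations (LAD) line. In general it differs from the OLS regression line, which models the conditional mean; the two target the same line only when the conditional mean and the conditional median coincide, for example under symmetric errors. Every quantile function, the median included, is the solution to a linear programming problem.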
[Figure: conditional quantile lines of Y against x at the 10%, 25%, 50%, 75%, and 90% levels]
Quantile Regression
[Figure: Daily High Temperature. A scatter of daily high temperature in Sydney, with today's temperature on the vertical axis and yesterday's temperature on the horizontal axis. The red line is the 45-degree line.]
Quantile Regression
[Figure: Cool Yesterday (n=259). Histogram of Temperature Today (horizontal axis) against Frequency (vertical axis).]
Quantile Regression
[Figure: Hot Yesterday (n=259). Histogram of Temperature Today (horizontal axis) against Frequency (vertical axis).]
Quantile Regression
Fitted quantiles at 0.90, 0.75, 0.50, 0.25, and 0.10. Given yesterday's temperature, the conditional distribution of today's temperature is asymmetric.
[Figure: Temperature Quantiles. Today's temperature (vertical axis) against yesterday's temperature (horizontal axis), with the fitted quantile lines.]
Quantile Regression
Estimation
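• The quantile regression coefficients are the solution to

$$\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\operatorname{sgn}\!\left(y_i - x_i'\beta\right) \right] \left(y_i - x_i'\beta\right) \tag{1}$$

• The k first order conditions are

$$\frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\operatorname{sgn}\!\left(y_i - x_i'\hat{\beta}_p\right) \right] x_i = 0 \tag{2}$$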
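To make the objective concrete, here is a minimal sketch for the simplest case only: an intercept-only model (x_i = 1), so β is a single constant c. The function names are illustrative, not taken from any package, and a real quantile regression with covariates would be solved by linear programming rather than by this search. Because the average loss is piecewise linear in c with kinks at the data points, a minimiser can be found among the observed values.

```python
def sgn(u):
    """Sign function: -1, 0, or 1."""
    return (u > 0) - (u < 0)


def check_loss(u, p):
    """Summand of the objective: (p - 1/2 + 1/2 sgn(u)) * u.
    Equals p*u for u > 0 and (1 - p)*|u| for u < 0."""
    return (p - 0.5 + 0.5 * sgn(u)) * u


def objective(c, y, p):
    """Average check loss for the intercept-only model x_i = 1, beta = c."""
    return sum(check_loss(yi - c, p) for yi in y) / len(y)


def fit_constant(y, p):
    """Search the observed values for the constant minimising the objective."""
    return min(sorted(set(y)), key=lambda c: objective(c, y, p))


y = [3, 4, 7, 9, 9, 11, 17, 21]
print(fit_constant(y, 0.5))  # 9
print(fit_constant(y, 0.9))  # 21
```

With p = 0.5 the loss is half the absolute deviation, so the minimiser is the sample median; with p = 0.9, residuals above the fit are penalised nine times as heavily as those below, which pushes the fit up to 21.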
Quantile Regression
Coefficient Interpretation

$$\frac{\partial Q_\theta(y_i \mid x_i)}{\partial x_{ij}}$$

• The marginal change in the θth conditional quantile due to a marginal change in the jth element of x. There is no guarantee that the ith person will remain in the same quantile after her x is changed.
Quantile Regression
Bibliography
• Koenker and Hallock (2001), “Quantile Regression,” Journal of Economic Perspectives, Vol. 15, pp. 143-156.
• Buchinsky (1998), “Recent Advances in Quantile Regression Models,” Journal of Human Resources, Vol. 33, pp. 88-126.
• www.econ.uiuc.edu/~roger
• http://Lib.stat.cmu.edu/R/CRAN
Quantile Regression in SAS
Optional Reading:
Colin (Lin) Chen, An Introduction to Quantile
Regression and the QUANTREG Procedure, SUGI30,
Paper 213-30
Part 2: Analysis of Causality

For more information: BUSI 6280

The material presented here is based on a paper by
Josef Brüderl (University of Mannheim, Germany)
Panel Data
• Methods for the analysis of causality exploit multi-dimensional longitudinal data, a structure usually described in the statistics and econometrics literature as panel data.
• Panel data combine cross-section data, where one or more variables are collected at the same point in time, with time-series data, where data are collected at regular time intervals.
• Analysis of panel data will be performed using the TSCSREG procedure in the statistical package SAS (Allison 2005; Mohd Nor & Maarof 2007) and the xtreg command in the statistical package Stata (Brüderl 2005).
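As an illustration of the data structure only, the sketch below stores a tiny panel in "long" format, one record per unit and period. The firms, years, and sales figures are invented for the example and do not come from any source. Slicing by year gives a cross-section; slicing by unit gives a time series.

```python
# Hypothetical long-format panel: one record per (firm, year) observation.
panel = [
    {"firm": firm, "year": year, "sales": sales}
    for firm, series in {
        "A": [10.0, 11.5, 12.1],
        "B": [7.2, 7.0, 8.3],
        "C": [15.4, 16.0, 15.1],
    }.items()
    for year, sales in zip([2010, 2011, 2012], series)
]


def cross_section(panel, year):
    """All units observed at the same point in time."""
    return [row for row in panel if row["year"] == year]


def time_series(panel, firm):
    """One unit observed at regular intervals, ordered by time."""
    return sorted((row for row in panel if row["firm"] == firm),
                  key=lambda row: row["year"])


print([row["firm"] for row in cross_section(panel, 2011)])  # ['A', 'B', 'C']
print([row["year"] for row in time_series(panel, "B")])     # [2010, 2011, 2012]
```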







References
Allison, P.D. (2005). Fixed Effects Regression Methods for Longitudinal Data
Using SAS. SAS Press.
Brüderl, J. (2005). Panel Data Analysis. University of Mannheim,
http://www2.sowi.uni-mannheim.de/lsssm/veranst/Panelanalyse.pdf (accessed
October 15, 2012)
Mohd Nor, A. H. S., & Maarof, F. (2007). “Panel Data Analysis Using SAS”.
Proceedings of the 21st Annual SAS Malaysia Forum, 5th September 2007, Kuala
Lumpur.
Halaby, C. (2004). Panel Models in Sociological Research. Annual Review of
Sociology, 30: 507-544.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data.
MIT Press.
Wooldridge, J. (2003). Introductory Econometrics: A Modern Approach. Thomson.
Chapters 13, 14.
Part 3: Mediation Analysis

For more information: BUSI 6280, EPSY 6270

The material presented here is based on Wikipedia
Mediation Models
• Mediation is a hypothesized causal chain in which one variable affects a second variable that, in turn, affects a third variable. The intervening variable, M, is the mediator. It “mediates” the relationship between a predictor, X, and an outcome, Y.
• a and b: direct effects of X on M and of M on Y, respectively
• c’: direct effect of X on Y after accounting for M
[Diagram: X → M (path a), M → Y (path b), X → Y (path c’)]
Baron and Kenny steps
• The Baron and Kenny (1986) approach is no longer considered the best option, but many researchers still use it.
• STEP 1: Conduct a simple regression analysis with X predicting Y to test path c alone.
• c is the total effect of X on Y, estimated without taking M into account. This is not the same as c’ on the previous slide!
[Diagram: X → Y (path c); M is not in the model]
Baron and Kenny steps
• STEP 2: Conduct a simple regression analysis with X predicting M to test the significance of path a alone.
[Diagram: X → M (path a)]
Baron and Kenny steps
• STEP 3: Conduct a simple regression analysis with M predicting Y to test the significance of path b alone.
• The purpose of Steps 1-3 is to establish that zero-order relationships among the variables exist. If one or more of these relationships are non-significant, researchers usually conclude that mediation is not possible or likely.
• Assuming there are significant relationships from Steps 1 through 3, proceed to Step 4.
[Diagram: M → Y (path b)]
Baron and Kenny steps
• STEP 4: Conduct a multiple regression analysis with X and M predicting Y.
• In Step 4, some form of mediation is supported if the effect of M (path b) remains significant after controlling for X. If X is no longer significant when M is controlled, the finding supports full mediation. If X is still significant, the finding supports partial mediation.
[Diagram: X → Y (path c’) and M → Y (path b), estimated in the same regression]
Sobel steps
• STEP 1: Conduct a multiple regression analysis with X and M predicting Y:
  $Y = b_0 + b_1 X + b_2 M + e$, where $b_1$ estimates path c’ and $b_2$ estimates path b
• STEP 2: Conduct a simple regression analysis with X predicting M:
  $M = b_3 + b_4 X + u$, where $b_4$ estimates path a
• STEP 3: Compute the indirect effect as $b_{indirect} = (b_2)(b_4)$
• Significance is best determined using bootstrapping
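The three steps and the bootstrap can be sketched in plain Python. Everything here is an assumption made for illustration: the data are simulated (with true paths a = 0.5, b = 0.7, c’ = 0.2), the helper names are invented, and the percentile bootstrap is only one of several bootstrap intervals in use. In practice a statistics package would do the regressions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def paths(x, m, y):
    _, c_prime, b = ols([x, m], y)  # Step 1: Y = b0 + b1 X + b2 M + e
    _, a = ols([x], m)              # Step 2: M = b3 + b4 X + u
    return a, b, c_prime


def bootstrap_ci(x, m, y, reps=500, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect a*b (resampling cases)."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b, _ = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        draws.append(a * b)
    draws.sort()
    return draws[int((1 - level) / 2 * reps)], draws[int((1 + level) / 2 * reps) - 1]


# Simulated data; the true paths are assumptions of this example.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.7 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime = paths(x, m, y)
indirect = a * b                    # Step 3: b_indirect = (b2)(b4)
low, high = bootstrap_ci(x, m, y)
print(f"indirect = {indirect:.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

If the bootstrap interval excludes zero, the indirect effect would usually be called significant at the chosen level. For OLS fitted on the same sample, the total effect from regressing Y on X alone equals c’ plus the indirect effect, which gives a quick check on the arithmetic.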
SEM approach
• The Structural Equation Modeling (SEM) approach is considered the best for testing mediation effects. In SEM, the whole mediation model is tested as a single model.
• Full mediation and partial mediation models can be compared by fitting both as alternative models. The model with the better fit statistics is the more appropriate.
[Diagrams: the full mediation model contains only paths a (X → M) and b (M → Y); the partial mediation model adds the direct path c’ (X → Y)]
References
• Baron, R.M. & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
• MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
• Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological Methodology (pp. 290-312). Washington DC: American Sociological Association.
Part 4: Hierarchical Linear Modeling

For more information: BUSI 6480, EPSY 6230
(EPSY offered at the UNT College of Education)
Multilevel Models
• Multilevel models are particularly appropriate for research designs where the data for participants are organized at more than one level
• Analysis of Covariance (ANCOVA) includes nested designs, such as:
  - Individuals nested within groups
  - Companies nested within industries