
Otis Dudley Duncan
Memorial Lecture:
THE RESURRECTION OF
DUNCANISM
Judea Pearl
University of California
Los Angeles
(www.cs.ucla.edu/~judea/)
OUTLINE
1. Duncanism = Causally Assertive SEM
2. History: Oppression, Distortion, and Resurrection
3. The Old-New Logic of SEM
4. New Tools
   4.1 Local testing
   4.2 Non-parametric identification
   4.3 Logic of counterfactuals
   4.4 Non-linear mediation analysis
[Images: Duncan's book]
FINDING INSTRUMENTAL VARIABLES
Can you find an instrument for identifying b34? (Duncan, 1975)
[Duncan's (1975) path diagram]
By inspection: X2 d-separates X1 from V.
Therefore X1 is a valid instrument.
DUNCANISM = ASSERTIVE SEM
SEM – A tool for deriving causal conclusions from data and assumptions:
• [y = bx + u] may be read as "a change in x or u produces a change in y" ... or "x and u are the causes of y" (Duncan, 1975, p. 1)
• "[the disturbance] u stands for all other sources of variation in y" (ibid)
• "doing the model consists largely in thinking about what kind of model one wants and can justify" (ibid, p. viii)
• "Assuming a model for sake of argument, we can express its properties in terms of correlations and (sometimes) find one or more conditions that must hold if the model is true" (ibid, p. 20)
HISTORY: BIRTH, OPPRESSION, DISTORTION, RESURRECTION
• Birth and Development (1920-1980):
  Sewell Wright (1921), Haavelmo (1943), Simon (1950), Marschak (1950), Koopmans (1953), Wold and Strotz (1963), Goldberger (1973), Blalock (1964), and Duncan (1969, 1975)
• The regressional assault (1970-1990):
  Causality is meaningless; therefore, to be meaningful, SEM must be a regression technique, not "causal modeling."
  Richard (1980), Cliff (1983), Dempster (1990), Wermuth (1992), and Muthen (1987)
• The potential-outcome assault (1985-present):
  Causality is meaningful but, since SEM is a regression technique, it could not possibly have a causal interpretation.
  Rubin (2004, 2010), Holland (1986), Imbens (2009), and Sobel (1996, 2008)
REGRESSION VS. STRUCTURAL EQUATIONS
(THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable):
  y = ax + εY
Structural (empirical, falsifiable):
  y = bx + uY
Claim (regardless of distributions):
  E(Y | do(x)) = E(Y | do(x), do(z)) = bx
The mothers of all questions:
Q. When would b equal a?
A. When (uY ⊥ X), read from the diagram.
Q. When is b a partial regression coefficient? b = βYX·Z
A. Shown in the diagram, Slide 40.
THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT)
• "I am speaking, of course, about the equation: {y = a + bx + ε}. What does it mean? The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of {y} given {x}." (Holland 1995)
• "The use of complicated causal-modeling software [read SEM] rarely yields any results that have any interpretation as causal effects." (Wilkinson and the Task Force on Statistical Methods in Psychology Journals 1999, Guidelines and Explanations)
THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT) (Cont.)
• "In general (even in randomized studies), the structural and causal parameters are not equal, implying that the structural parameters should not be interpreted as effects." (Sobel 2008)
• "Using the observed outcome notation entangles the science... Bad! Yet this is exactly what regression approaches, path analyses, directed acyclic graphs, and so forth essentially compel one to do." (Rubin 2010)
WHY SEM INTERPRETATION IS "SELF-CONTRADICTORY"
D. Freedman, JES (1987), p. 114, Fig. 3:
  X = aZ + bW + U    (7.1)
  Y = cX + dZ + V    (7.2)
"Now try the direct effect of Z on Y: We intervene by fixing W and X but increasing Z by one unit; this should increase Y by d units. However, this hypothetical intervention is self-contradictory, because fixing W and increasing Z causes an increase in X."
The oversight: Fixing X DISABLES equation (7.1).
SEM REACTION TO FREEDMAN'S CRITIQUE
Total surrender:
• "It would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect." (Muthen, 1987)
• "Causal modeling" is an outdated misnomer (Kelloway, 1998).
• Causality-free, politically-safe vocabulary: "covariance structure," "regression analysis," or "simultaneous equations."
• "[Causal modeling] may be somewhat dated, however, as it seems to appear less often in the literature nowadays." (Kline, 2004, p. 9)
SEM REACTION TO THE STRUCTURE-PHOBIC ASSAULT
NONE TO SPEAK OF!
• Galles, Pearl, Halpern (1998) – logical equivalence
• Heckman-Sobel
• Morgan and Winship (1997)
• Gelman's blog
THE RESURRECTION
Why a non-parametric perspective?
  y = βx + u
What are the parameters all about, and why do we labor to identify them? Can we do without them?
Consider:
  y = f(x, u)
Only he who lost the parameters and needs to find substitutes can begin to ask: Do I really need them? What do they really mean? What role do they play?
THE LOGIC OF SEM
[Flowchart: causal assumptions A define a causal model MA. Causal inference translates queries of interest Q into identified estimands Q(P), and derives A*, the logical implications of A, together with T(MA), the testable implications of the model. Statistical inference then uses data D to produce estimates of Q(P), stated as conditional claims Q(Q | D, A), while g(T) supports model testing and goodness of fit.]
TRADITIONAL STATISTICAL INFERENCE PARADIGM
[Diagram: Data → joint distribution P → Q(P) (aspects of P), via inference]
e.g., infer whether customers who bought product A would also buy product B:
  Q = P(B | A)
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
Probability and statistics deal with static relations.
[Diagram: Data → joint distribution P → change → new joint distribution P′ → Q(P′)]
What happens when P changes?
e.g., infer whether customers who bought product A would still buy A if we were to double the price.
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
What remains invariant when P changes, say, to satisfy P(price = 2) = 1?
[Diagram: Data → joint distribution P → change → new joint distribution P′ → Q(P′)]
Note: P(v) ≠ P(v | price = 2).
P does not tell us how it ought to change.
Causal knowledge: what remains invariant.
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.

   CAUSAL                               STATISTICAL
   Spurious correlation                 Regression
   Randomization / Intervention         Association / Independence
   Confounding / Effect                 "Controlling for" / Conditioning
   Instrumental variable                Odds and risk ratios
   Exogeneity / Ignorability            Collapsibility / Granger causality
   Mediation                            Propensity score
FROM STATISTICAL TO CAUSAL ANALYSIS:
2. MENTAL BARRIERS
1. Causal and statistical concepts do not mix (see the table above).
2. No causes in – no causes out (Cartwright, 1989):
   causal assumptions + statistical assumptions + data ⇒ causal conclusions
3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.
4. Non-standard mathematics:
   a) Structural equation models (Wright, 1920; Simon, 1960)
   b) Counterfactuals (Neyman-Rubin (Yx), Lewis (x □→ Y))
THE STRUCTURAL MODEL PARADIGM
[Diagram: a data-generating model M produces the joint distribution, which produces the data; inference targets Q(M), aspects of M]
M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.
• "Think Nature, not experiment!"
STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where
• V = {V1,...,Vn} are endogenous variables
• U = {U1,...,Um} are background variables
• F = {f1,..., fn} are functions determining V,
    vi = fi(v, u),  e.g., y = α + βx + uY
• P(u) is a distribution over U.
P(u) and F induce a distribution P(v) over the observable variables.
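The definition can be made concrete in a few lines of code. Below is a minimal Python sketch, assuming a toy linear model y = α + βx + uY with illustrative parameter values (α = 1, β = 2) and standard-normal background variables; none of these choices come from the lecture.

```python
import numpy as np

# Sketch of the 4-tuple <V, U, F, P(u)> for a hypothetical linear model.
rng = np.random.default_rng(0)

def f_X(u):                            # x = f_X(u_X)
    return u["U_X"]

def f_Y(x, u, alpha=1.0, beta=2.0):    # y = alpha + beta*x + u_Y
    return alpha + beta * x + u["U_Y"]

def draw_u():                          # P(u): background variables
    return {"U_X": rng.normal(), "U_Y": rng.normal()}

def solve(u):                          # F and P(u) induce P(v) over observables
    x = f_X(u)
    return {"X": x, "Y": f_Y(x, u)}

samples = [solve(draw_u()) for _ in range(5)]
print(samples)
```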
CAUSAL MODELS AND COUNTERFACTUALS
Definition:
The sentence "Y would be y (in unit u), had X been x," denoted Yx(u) = y, means: the solution for Y in the mutilated model Mx (i.e., the equation for X replaced by X = x), with input U = u, is equal to y.
[Diagram: model M determines X(u) and Y(u); in the mutilated model Mx, X is set to x, yielding YX(u)]
CAUSAL MODELS AND COUNTERFACTUALS (Cont.)
The Fundamental Equation of Counterfactuals:
  Yx(u) = YMx(u)
CAUSAL MODELS AND COUNTERFACTUALS (Cont.)
• Joint probabilities of counterfactuals:
  P(Yx = y, Zw = z) = Σ{u: Yx(u) = y, Zw(u) = z} P(u)
In particular:
  P(y | do(x)) = P(Yx = y) = Σ{u: Yx(u) = y} P(u)
  PN(Yx′ = y′ | x, y) = Σ{u: Yx′(u) = y′} P(u | x, y)
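These summations can be computed by direct enumeration when U is discrete. The sketch below assumes a hypothetical binary model X = U1, Y = max(X, U2) with uniform P(u), and evaluates P(Y0 = 1) and the probability-of-necessity-style query PN from the slide.

```python
from itertools import product

# Hypothetical binary SCM: X = U1; Y = max(X, U2); P(u) uniform.
def Y_x(x, u1, u2):          # solution for Y in the mutilated model M_x
    return max(x, u2)        # equation for X replaced by X = x

p_u = {(u1, u2): 0.25 for u1, u2 in product([0, 1], repeat=2)}

# P(Y_0 = 1) = sum of P(u) over {u : Y_0(u) = 1}
p = sum(pu for (u1, u2), pu in p_u.items() if Y_x(0, u1, u2) == 1)
print("P(Y_0 = 1) =", p)     # 0.5 in this toy model

# PN-style query: P(Y_0 = 0 | X = 1, Y = 1)
joint = {(u1, u2): pu for (u1, u2), pu in p_u.items()
         if u1 == 1 and max(1, u2) == 1}          # condition on X=1, Y=1
z = sum(joint.values())
pn = sum(pu for (u1, u2), pu in joint.items() if Y_x(0, u1, u2) == 0) / z
print("PN =", pn)            # 0.5: a counterfactual, not interventional, quantity
```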
3-LEVEL HIERARCHY OF CAUSAL MODELS
1. Probabilistic knowledge: P(y | x)
   Bayesian networks, graphical models
2. Interventional knowledge: P(y | do(x))
   Causal Bayesian Networks (CBN) (agnostic graphs, manipulation graphs)
3. Counterfactual knowledge: P(Yx = y, Yx′ = y′)
   Structural equation models, physics, functional graphs, "treatment assignment mechanism"
TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: P(X, Y, Z, ...)
Conclusions needed: P(Yx = y), P(Xy = x | Z = z), ...
How do we connect the observables X, Y, Z, ... to the counterfactuals Yx, Xz, Zy, ...?

N-R model: counterfactuals are primitives, new variables; a super-distribution P*(X, Y, ..., Yx, Xz, ...); X, Y, Z constrain Yx, Zy, ...
Structural model: counterfactuals are derived quantities; subscripts modify the model and distribution; P(Yx = y) = PMx(Y = y).
ARE THE TWO PARADIGMS EQUIVALENT?
• Yes (Galles and Pearl, 1998; Halpern, 1998).
• In the N-R paradigm, Yx is defined by consistency:
  Y = xY1 + (1 − x)Y0
• In SCM, consistency is a theorem.
• Moreover, a theorem in one approach is a theorem in the other.
THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS
Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M.
Assume:   Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test:     Test the testable implications of A (if any).
THE FIVE NECESSARY STEPS FOR EFFECT ESTIMATION
Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M:
  ATE = E(Y | do(x1)) − E(Y | do(x0))
Assume:   Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test:     Test the testable implications of A (if any).
FORMULATING ASSUMPTIONS: THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)
2. Counterfactuals:
   Zx(u) = Zyx(u),
   Xy(u) = Xzy(u) = Xz(u) = X(u),
   Yz(u) = Yzx(u),
   Zx ⊥ {Yz, X}
   Not too friendly: consistent? complete? redundant? arguable?
3. Structural:
   [Diagram: X → Z → Y, with an unobserved U affecting both X and Y]
IDENTIFYING CAUSAL EFFECTS IN THE POTENTIAL-OUTCOME FRAMEWORK
Define:   Express the target quantity Q as a counterfactual formula, e.g., E(Y(1) − Y(0)).
Assume:   Formulate causal assumptions using the distribution
  P* = P(X, Y, Z, Y(1), Y(0))
Identify: Determine if Q is identifiable using P* and Y = xY(1) + (1 − x)Y(0).
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
GRAPHICAL – COUNTERFACTUALS SYMBIOSIS
Every causal graph expresses counterfactual assumptions, e.g., for a graph over X, Y, Z:
1. Missing arrows (e.g., Z → Y):  Yx,z(u) = Yx(u)
2. Missing arcs (Y ↔ Z):  Yx ⊥ Zy
These assumptions are consistent, and readable from the graph.
• Express assumptions in graphs
• Derive estimands by graphical or algebraic methods
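Reading independencies off a graph is mechanical. Below is a minimal d-separation checker using the moralized-ancestral-graph criterion of Lauritzen et al. (this is a sketch, not the lecture's software); the example chain X → Y → Z is illustrative.

```python
from itertools import combinations

# `dag` maps each node to the set of its parents.
def ancestors(dag, nodes):
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(dag.get(n, ()))
    return result

def d_separated(dag, xs, ys, zs):
    an = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in an}                 # moral graph (undirected)
    for child in an:
        parents = [p for p in dag.get(child, ()) if p in an]
        for p in parents:                        # keep parent-child edges
            adj[p].add(child); adj[child].add(p)
        for p, q in combinations(parents, 2):    # "marry" co-parents
            adj[p].add(q); adj[q].add(p)
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:                                 # reachability avoiding Z
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(m for m in adj[n] if m not in zs)
    return not (seen & set(ys))

# Chain X -> Y -> Z: the missing arrow X -> Z gives X _||_ Z | Y.
dag = {"Y": {"X"}, "Z": {"Y"}}
print(d_separated(dag, {"X"}, {"Z"}, {"Y"}))   # True
print(d_separated(dag, {"X"}, {"Z"}, set()))   # False
```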
IDENTIFICATION IN SCM
Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in G, where Z1,..., Zk are auxiliary variables.
[Graph G over X, Y, and Z1,..., Z6]
Can P(y | do(x)) be estimated if only a subset, Z, can be measured?
ELIMINATING CONFOUNDING BIAS:
THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of variables such that Z d-separates X from Y in Gx, the subgraph obtained by deleting the arrows emanating from X.
[Diagrams: G and Gx over X, Y, and Z1,..., Z6, with an admissible set Z highlighted]
Moreover,
  P(y | do(x)) = Σz P(y | x, z) P(z) = Σz P(x, y, z) / P(x | z)
("adjusting" for Z) ≡ Ignorability
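A sketch of the adjustment estimator, assuming discrete data and a set Z already known to satisfy the back-door criterion; the data-generating numbers at the end are invented purely to check the estimate.

```python
from collections import Counter
import random

# Estimate P(y | do(x)) = sum_z P(y | x, z) P(z) from discrete samples;
# `data` is a list of (x, y, z) tuples, names illustrative.
def backdoor_estimate(data, x, y):
    n = len(data)
    cz = Counter(z for _, _, z in data)              # counts of z
    cxz = Counter((xi, zi) for xi, _, zi in data)    # counts of (x, z)
    cxyz = Counter(data)                             # counts of (x, y, z)
    total = 0.0
    for z, nz in cz.items():
        if cxz[(x, z)] == 0:
            continue                                 # positivity violation
        total += cxyz[(x, y, z)] / cxz[(x, z)] * nz / n
    return total

# Tiny synthetic check: Z confounds X and Y.
random.seed(1)
data = []
for _ in range(100_000):
    z = random.random() < 0.5
    x = random.random() < (0.8 if z else 0.2)
    y = random.random() < (0.7 if x else 0.3) + (0.1 if z else 0.0)
    data.append((x, y, z))
print(backdoor_estimate(data, True, True))   # ~0.75 = 0.5*0.7 + 0.5*0.8
```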
IDENTIFYING TESTABLE IMPLICATIONS
The assumptions advertised in the missing edges are Z1 – Z2 and Z1 – Y.
[Graph over Z1, Z2, Z3, W1, W2, W3, X, Y]
Implying:
  Z1 ⊥ Z2
  Z1 ⊥ Y | {X, Z2, Z3}
  Z2 ⊥ X | {Z1, Z3}
or, in regression form:
  Z1 = aZ2 + ε
  Z1 = b1Y + b2X + b3Z2 + b4Z3 + ε′
  Z2 = c1X + c3Z1 + c4Z3 + ε′′
The missing edges imply a = 0, b1 = 0, and c1 = 0.
Software routines for the automatic detection of all such tests are reported in Kyono (2010).
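A sketch of one such test: checking the vanishing coefficient b1 on data simulated from a hypothetical linear model in which Z1 ⊥ Y | {X, Z2, Z3} holds (this is not Kyono's routine; the model and coefficients are invented).

```python
import numpy as np

# Hypothetical linear model satisfying Z1 _||_ Y | {X, Z2, Z3}:
# Z1, Z2 exogenous; Z3 <- Z1, Z2; X <- Z1, Z3; Y <- X, Z3 (no Z1 -> Y).
rng = np.random.default_rng(0)
n = 100_000
Z1, Z2 = rng.normal(size=n), rng.normal(size=n)      # Z1 _||_ Z2
Z3 = Z1 + Z2 + rng.normal(size=n)
X = Z1 + Z3 + rng.normal(size=n)
Y = X + Z3 + rng.normal(size=n)

# Regress Z1 on [Y, X, Z2, Z3]; the coefficient on Y plays the role of b1.
A = np.column_stack([Y, X, Z2, Z3, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, Z1, rcond=None)
print("b1 =", round(coef[0], 3))                     # ~0, as the model implies
```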
SEPARATION EQUIVALENCE ⇒ MODEL EQUIVALENCE
[Diagrams of separation-equivalent models]
FINDING INSTRUMENTAL VARIABLES
Can you find an instrument for identifying b34? (Duncan, 1975)
[Duncan's (1975) path diagram]
By inspection: X2 d-separates X1 from V.
Therefore X1 is a valid instrument.
CONFOUNDING EQUIVALENCE:
WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Graph over X, Y, W1,..., W4, V1, V2, L, with candidate adjustment sets Z and T: is Z ~ T?]
CONFOUNDING EQUIVALENCE:
WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
Definition: T and Z are c-equivalent if
  Σt P(y | x, t) P(t) = Σz P(y | x, z) P(z)   for all x, y
Definition (Markov boundary): The Markov boundary Sm of S (relative to X) is the minimal subset of S that d-separates X from all other members of S.
Theorem (Pearl and Paz, 2009): Z and T are c-equivalent iff
1. Zm = Tm, or
2. Z and T are admissible (i.e., satisfy the back-door condition).
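The defining equality is easy to check numerically. The sketch below uses a hypothetical model U → Z → X, U → Y, X → Y, in which both Z and T = U are admissible, so the theorem predicts the two adjustments agree.

```python
from collections import Counter
import random

# Hypothetical chain U -> Z -> X with U -> Y and X -> Y: both Z and
# T = U block the back-door path, hence Z and T are c-equivalent.
random.seed(0)
rows = []
for _ in range(300_000):
    u = random.random() < 0.5
    z = u if random.random() < 0.8 else not u
    x = random.random() < (0.7 if z else 0.3)
    y = random.random() < (0.6 if x else 0.1) + (0.3 if u else 0.0)
    rows.append((x, y, z, u))

def adjust(rows, sel):       # sum_s P(y=1 | x=1, s) P(s), s = sel(row)
    cs, cxs, cxys = Counter(), Counter(), Counter()
    for r in rows:
        s = sel(r)
        cs[s] += 1
        if r[0]:
            cxs[s] += 1
            cxys[s] += r[1]
    n = len(rows)
    return sum(cxys[s] / cxs[s] * cs[s] / n for s in cs if cxs[s])

print(adjust(rows, lambda r: r[2]))   # adjust for Z      (~0.75)
print(adjust(rows, lambda r: r[3]))   # adjust for T = U  (~0.75, agrees)
```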
CONFOUNDING EQUIVALENCE:
WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Three further example graphs over X, Y, W1,..., W4, V1, V2, each asking whether the candidate adjustment sets Z and T are c-equivalent (Z ~ T?)]
BIAS AMPLIFICATION BY INSTRUMENTAL VARIABLES
[Diagram: W1, W2 → X; unobserved U → X and U → Y; X → Y. Compare adjusting for W1 vs. {W1, W2}]
• Adding W2 to the propensity score increases bias (if such exists) (Wooldridge, 2009)
• In linear systems – always
• In non-linear systems – almost always (Pearl, 2010)
• Outcome predictors are safer than treatment predictors
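A simulation sketch of the amplification effect in a linear model; the coefficients (instrument strength 2, confounding strength 1, true effect 1) are illustrative.

```python
import numpy as np

# W -> X (instrument), U -> X and U -> Y (unobserved confounder),
# X -> Y with true effect 1; all coefficients invented for illustration.
rng = np.random.default_rng(0)
n = 500_000
W = rng.normal(size=n)                 # instrument
U = rng.normal(size=n)                 # unobserved confounder
X = 2.0 * W + 1.0 * U + rng.normal(size=n)
Y = 1.0 * X + 1.0 * U + rng.normal(size=n)

def ols(y, cols):
    A = np.column_stack(cols + [np.ones(n)])
    return np.linalg.lstsq(A, y, rcond=None)[0][0]   # coefficient on X

print("unadjusted:   ", round(ols(Y, [X]), 3))       # biased (~1.17)
print("adjust for W: ", round(ols(Y, [X, W]), 3))    # bias amplified (~1.5)
```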
EFFECT DECOMPOSITION (DIRECT VS. INDIRECT EFFECTS)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?
WHY DECOMPOSE EFFECTS?
1. To understand how Nature works
2. To comply with legal requirements
3. To predict the effects of new types of interventions: signal routing, rather than variable fixing
LEGAL IMPLICATIONS OF DIRECT EFFECT
Can data prove an employer guilty of hiring discrimination?
[Diagram: X (Gender) → Z (Qualifications) → Y (Hiring), with a direct arrow X → Y]
What is the direct effect of X on Y?
  E(Y | do(x1), do(z)) − E(Y | do(x0), do(z))
(averaged over z) Adjust for Z? No! No!
NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS
Robins and Greenland (1992) – "Pure"
Model: z = f(x, u), y = g(x, z, u), with mediator Z between X and Y.
Natural Direct Effect of X on Y, DE(x0, x1; Y):
The expected change in Y, when we change X from x0 to x1 and, for each u, we keep Z constant at whatever value it attained before the change:
  DE = E[Yx1,Zx0 − Yx0]
In linear models, DE = Controlled Direct Effect = β(x1 − x0).
DEFINITION OF INDIRECT EFFECTS
Model: z = f(x, u), y = g(x, z, u), with mediator Z between X and Y.
Indirect Effect of X on Y, IE(x0, x1; Y):
The expected change in Y when we keep X constant, say at x0, and let Z change to whatever value it would have attained had X changed to x1:
  IE = E[Yx0,Zx1 − Yx0]
In linear models, IE = TE − DE.
WHY TE ≠ DE + IE
[Diagram: X → Z → Y with path coefficients m1 (X → Z) and m2 (Z → Y), plus a direct path X → Y with coefficient β]
In general, TE − DE = −IE(rev), where IE(rev) is the indirect effect of the reverse transition.
In linear systems, IE(rev) = −IE, and
  TE = β + m1m2
  DE = β
  IE = m1m2 = TE − DE
IE = effect sustained by mediation alone (disabling the direct path),
which is NOT equal to
TE − DE = effect prevented by disabling mediation.
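The inequality is easy to reproduce by computing the counterfactuals directly. The sketch below assumes a hypothetical nonlinear model with an x·z interaction; TE comes out 1 while DE = IE = 0.

```python
import numpy as np

# Hypothetical nonlinear model z = f(x,u), y = g(x,z,u) illustrating
# that in general TE != DE + IE (only TE - DE = -IE(rev) holds).
rng = np.random.default_rng(0)
u = rng.normal(size=1_000_000)

f = lambda x, u: x + u                  # mediator equation
g = lambda x, z, u: x * z + u           # outcome with x*z interaction

x0, x1 = 0.0, 1.0
Y   = lambda x: g(x, f(x, u), u)        # Y_x(u)
Yxz = lambda x, zx: g(x, f(zx, u), u)   # Y_{x, Z_{x'}}(u)

TE = np.mean(Y(x1) - Y(x0))
DE = np.mean(Yxz(x1, x0) - Y(x0))       # E[Y_{x1, Z_{x0}} - Y_{x0}]
IE = np.mean(Yxz(x0, x1) - Y(x0))       # E[Y_{x0, Z_{x1}} - Y_{x0}]
print(TE, DE, IE, DE + IE)              # ~1, ~0, ~0, ~0: TE != DE + IE
```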
POLICY IMPLICATIONS OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of Gender on Hiring if sex discrimination is eliminated.
[Diagram: X (Gender) → Z (Qualification) → Y (Hiring); the direct X → Y link is ignored]
Deactivating a link – a new type of intervention.
MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. and are given by:
  DE = Σz [E(Y | do(x1, z)) − E(Y | do(x0, z))] P(z | do(x0))
  IE = Σz E(Y | do(x0, z)) [P(z | do(x1)) − P(z | do(x0))]
3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.
MEDIATION FORMULAS IN UNCONFOUNDED MODELS
Model: X → Z → Y with a direct path X → Y (no confounding).
  DE = Σz [E(Y | x1, z) − E(Y | x0, z)] P(z | x0)
  IE = Σz E(Y | x0, z) [P(z | x1) − P(z | x0)]
  TE = E(Y | x1) − E(Y | x0)
IE = fraction of responses explained by mediation
TE − DE = fraction of responses owed to mediation
COMPUTING THE MEDIATION FORMULA
Model: X → Z → Y with a direct path X → Y; X, Z, Y binary, with cell counts:

  X  Z  Y  count
  0  0  0  n1
  0  0  1  n2
  0  1  0  n3
  0  1  1  n4
  1  0  0  n5
  1  0  1  n6
  1  1  0  n7
  1  1  1  n8

With E(Y | x, z) = gxz and E(Z | x) = hx:
  g00 = n2 / (n1 + n2)          h0 = (n3 + n4) / (n1 + n2 + n3 + n4)
  g01 = n4 / (n3 + n4)
  g10 = n6 / (n5 + n6)          h1 = (n7 + n8) / (n5 + n6 + n7 + n8)
  g11 = n8 / (n7 + n8)

  DE = (g10 − g00)(1 − h0) + (g11 − g01) h0
  IE = (h1 − h0)(g01 − g00)
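The same computation in code, with hypothetical counts n1,..., n8:

```python
# The slide's computation; the counts are invented for illustration.
n = {(0,0,0): 50, (0,0,1): 10, (0,1,0): 20, (0,1,1): 20,
     (1,0,0): 10, (1,0,1): 10, (1,1,0): 20, (1,1,1): 60}

def g(x, z):   # E(Y | x, z)
    return n[(x, z, 1)] / (n[(x, z, 0)] + n[(x, z, 1)])

def h(x):      # E(Z | x)
    nx = sum(c for (xi, _, _), c in n.items() if xi == x)
    return sum(c for (xi, zi, _), c in n.items() if xi == x and zi == 1) / nx

h0, h1 = h(0), h(1)
DE = (g(1, 0) - g(0, 0)) * (1 - h0) + (g(1, 1) - g(0, 1)) * h0
IE = (h1 - h0) * (g(0, 1) - g(0, 0))
print(f"DE = {DE:.3f}, IE = {IE:.3f}")
```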
RAMIFICATIONS OF THE MEDIATION FORMULA
• DE should be averaged over mediator levels; IE should NOT be averaged over exposure levels.
• TE − DE need not equal IE:
  TE − DE = proportion for whom mediation is necessary
  IE = proportion for whom mediation is sufficient
• TE − DE informs interventions on indirect pathways; IE informs interventions on direct pathways.
MEASUREMENT BIAS AND EFFECT RESTORATION
[Diagram: unobserved Z confounds X → Y; W is a proxy measurement of Z]
P(y | do(x)) is identifiable from measurement of W, if P(w | z) is given (Selen, 1986; Greenland & Lash, 2008).
  P(w | x, y, z) = P(w | z)   (local independence)
  P(x, y, w) = Σz P(x, y, z, w)
             = Σz P(w | x, y, z) P(x, y, z)
             = Σz P(w | z) P(x, y, z)
Inverting this relation:
  P(x, y, z) = Σw I(z, w) P(x, y, w)
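Here I(z, w) is the (z, w) entry of the inverse of the matrix P(w | z). A numerical sketch, with an invented misclassification matrix and a random ground-truth joint distribution:

```python
import numpy as np

# Hypothetical misclassification matrix M[w, z] = P(w | z).
M = np.array([[0.9, 0.2],     # P(w=0|z=0), P(w=0|z=1)
              [0.1, 0.8]])    # P(w=1|z=0), P(w=1|z=1)
Iinv = np.linalg.inv(M)       # Iinv[z, w] = I(z, w) of the slide

rng = np.random.default_rng(0)
p_xyz_true = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # ground truth
p_xyw = np.einsum('xyz,wz->xyw', p_xyz_true, M)           # what we observe

# Restoration: P(x, y, z) = sum_w I(z, w) P(x, y, w)
p_xyz = np.einsum('xyw,zw->xyz', p_xyw, Iinv)
print(np.allclose(p_xyz, p_xyz_true))    # True: joint restored exactly

# Back-door adjustment with the restored Z:
pz = p_xyz.sum(axis=(0, 1))
p_y_do_x1 = sum(p_xyz[1, 1, z] / p_xyz[1, :, z].sum() * pz[z] for z in (0, 1))
print("P(y=1 | do(x=1)) =", round(p_y_do_x1, 3))
```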
EFFECT RESTORATION IN BINARY MODELS
[Diagram and derivation: with misclassification probabilities ε = P(W = 0 | Z = 1) and δ = P(W = 1 | Z = 0), the weight of each observed cell (x, y, W = w) is redistributed to the cells (x, y, Z = 0) and (x, y, Z = 1), giving P(z1 | x, y) and a closed-form expression for P(y | do(x)) in terms of P(x, y, w1), P(w1 | x, y), P(x | w1), δ, and ε]
EFFECT RESTORATION IN LINEAR MODELS
[Diagrams: (a) unobserved Z with Z → X (c1), Z → Y (c2), X → Y (c0), and a proxy W of Z (c3); (b) the same model with a second proxy V of Z (c4)]
Single proxy, with k = c3² σzz:
  c0 = (σxy − σxw σyw / k) / (σxx − σ²xw / k)
Correlated proxies (Cai & Kuroki, 2008):
  c0 = [cov(XY) cov(WV) − cov(YW) cov(XV)] / [var(X) cov(WV) − cov(XW) cov(XV)]
CONCLUSIONS
"He is wise who bases causal inference on an explicit structure that is defensible on scientific grounds."
(Aristotle, 384-322 B.C.; from Charlie Poole)
• Formal basis for causal and counterfactual inference (complete)
• Unification of the graphical, potential-outcome and structural equation approaches
• Friendly and formal solutions to century-old problems and confusions
I TOLD YOU CAUSALITY IS SIMPLE
QUESTIONS???
They will be answered