Transcript X - UCLA

Otis Dudley Duncan Memorial Lecture

THE RESURRECTION OF DUNCANISM

Judea Pearl, University of California, Los Angeles (www.cs.ucla.edu/~judea/)


OUTLINE

1. Duncanism = Causally Assertive SEM
2. History: Oppression, Distortion, and Resurrection
3. The Old-New Logic of SEM
4. New Tools
   4.1 Local testing
   4.2 Non-parametric identification
   4.3 Logic of counterfactuals
   4.4 Non-linear mediation analysis

DUNCAN BOOK


DUNCANISM = ASSERTIVE SEM

SEM – a tool for deriving causal conclusions from data and assumptions:

• [y = bx + u] may be read as "a change in x produces a change in y" ... or "x and u are the causes of y" (Duncan, 1975, p. 1).
• "[the disturbance] u stands for all other sources of variation in y" (ibid).
• "doing the model consists largely in thinking about what kind of model one wants and can justify" (ibid, p. viii).
• "Assuming a model for the sake of argument, we can express its properties in terms of correlations and (sometimes) find one or more conditions that must hold if the model is true" (ibid, p. 20).

HISTORY: BIRTH, OPPRESSION, DISTORTION, RESURRECTION

• Birth and development (1920-1980): Sewall Wright (1921), Haavelmo (1943), Simon (1950), Marschak (1950), Koopmans (1953), Wold and Strotz (1963), Goldberger (1973), Blalock (1964), and Duncan (1969, 1975).
• The regressional assault (1970-1990): causality is meaningless; therefore, to be meaningful, SEM must be a regression technique, not "causal modeling." Richard (1980), Cliff (1983), Dempster (1990), Wermuth (1992), and Muthen (1987).
• The potential-outcome assault (1985-present): causality is meaningful but, since SEM is a regression technique, it could not possibly have a causal interpretation. Rubin (2004, 2010), Holland (1986), Imbens (2009), and Sobel (1996, 2008).

REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY)

Regression (claimless, nonfalsifiable):  Y = ax + ε
Structural (empirical, falsifiable):     Y := bx + u   (":=" denotes assignment)

Claim (regardless of distributions):

E(Y | do(x)) = E(Y | do(x), do(z)) = bx

The mother of all questions:
Q. When would b equal a?
A. When u is independent of X, as read from the diagram.
Q. When is b a partial regression coefficient, b = β_YX·Z?
A. Shown in the diagram, Slide 36.
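A quick numerical illustration of the question above, sketched under assumed numbers (the linear model, its coefficients, and the confounder Z below are illustrative, not from the lecture): when Y's disturbance is correlated with X through a common cause Z, the regression slope a differs from the structural b, while the partial regression coefficient given Z recovers b.

```python
import numpy as np

# Hypothetical structural model:  x = z + e_x,  y = b*x + z + e_y,  with b = 2.
# z is a common cause, so y's total disturbance (z + e_y) is correlated with x.
rng = np.random.default_rng(0)
n = 200_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
b = 2.0
y = b * x + z + rng.normal(size=n)

# Regression slope a = cov(y, x) / var(x): biased for b, since u correlates with x.
a = np.cov(y, x)[0, 1] / np.var(x)

# Partial regression coefficient of y on x given z recovers b here,
# because conditioning on z leaves the residual disturbance independent of x.
X = np.column_stack([x, z, np.ones(n)])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(round(a, 2), round(b_hat, 2))  # a is inflated; b_hat is close to 2.0
```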

THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT)

• SEM = regressional strawman: "I am speaking, of course, about the equation: {y = a + bx + ε}. What does it mean? The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of {y} given {x}." (Holland 1995)
• "The use of complicated causal-modeling software [read SEM] rarely yields any results that have any interpretation as causal effects." (Wilkinson and Task Force 1999 on Statistical Methods in Psychology Journals: Guidelines and Explanations)

THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT) (Cont)

• "In general (even in randomized studies), the structural and causal parameters are not equal, implying that the structural parameters should not be interpreted as effects." (Sobel 2008)
• "Using the observed outcome notation entangles the science... Bad! Yet this is exactly what regression approaches, path analyses, directed acyclic graphs, and so forth essentially compel one to do." (Rubin 2010)

SEM REACTION TO THE STRUCTURE-PHOBIC ASSAULT

NONE TO SPEAK OF!

Exceptions: Galles, Pearl, Halpern (1998 - logical equivalence); Heckman-Sobel; Morgan and Winship (1997); Gelman's blog.

WHY SEM INTERPRETATION IS "SELF-CONTRADICTORY"

D. Freedman, JES (1987), p. 114, Fig. 3:

X = aZ + bW + U   (7.1)
Y = cX + dZ + V   (7.2)

"Now try the direct effect of Z on Y: We intervene by fixing W and X but increasing Z by one unit; this should increase Y by d units. However, this hypothetical intervention is self-contradictory, because fixing W and increasing Z causes an increase in X."

The oversight: fixing X DISABLES equation (7.1).

SEM REACTION TO FREEDMAN'S CRITIQUE

• Total surrender: "It would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect" (Muthen, 1987).
• "Causal modeling" is an outdated misnomer (Kelloway, 1998).
• Causality-free, politically safe vocabulary: "covariance structure," "regression analysis," or "simultaneous equations."
• "[Causal modeling] may be somewhat dated, however, as it seems to appear less often in the literature nowadays" (Kline, 2004, p. 9).

THE RESURRECTION

Why a non-parametric perspective? What are the parameters all about, and why do we labor to identify them? Can we do without them?

Consider, in place of Y = βx + u, the non-parametric form:

Y = f(x, u)

Only he who lost the parameters and needs to find substitutes can begin to ask: Do I really need them? What do they really mean? What role do they play?

THE LOGIC OF SEM

[Flowchart, rendered as a list:]
A - causal assumptions, encoded in a model M_A
A* - logical implications of A
Q - queries of interest
Q(P) - identified estimands (causal inference)
D - data
T(M_A) - testable implications
Q̂ - estimates of Q(P) (statistical inference)
Q(Q̂ | D, A) - conditional claims
g(T) - goodness of fit (model testing)

TRADITIONAL STATISTICAL INFERENCE PARADIGM

Data → P (joint distribution) → Q(P) (aspects of P)

e.g., infer whether customers who bought product A would also buy product B:  Q = P(B | A)

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES

Probability and statistics deal with static relations.

Data → P (joint distribution) → P' (changed joint distribution) → Q(P') (aspects of P')

What happens when P changes? e.g., infer whether customers who bought product A would still buy A if we were to double the price.

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)

What remains invariant when P changes, say, to satisfy P'(price = 2) = 1?

Data → P (joint distribution) → P' (changed joint distribution) → Q(P') (aspects of P')

Note: P'(v) ≠ P(v | price = 2). P does not tell us how it ought to change. Causal knowledge: what remains invariant.

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)

1. Causal and statistical concepts do not mix.

   CAUSAL                          STATISTICAL
   Spurious correlation            Regression
   Randomization / Intervention    Association / Independence
   Confounding / Effect            "Controlling for" / Conditioning
   Instrumental variable           Odds and risk ratios
   Exogeneity / Ignorability       Collapsibility / Granger causality
   Mediation                       Propensity score

FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS

1. Causal and statistical concepts do not mix.

   CAUSAL                          STATISTICAL
   Spurious correlation            Regression
   Randomization / Intervention    Association / Independence
   Confounding / Effect            "Controlling for" / Conditioning
   Instrumental variable           Odds and risk ratios
   Exogeneity / Ignorability       Collapsibility / Granger causality
   Mediation                       Propensity score

2. No causes in - no causes out (Cartwright, 1989):
   statistical assumptions + data + causal assumptions ⇒ causal conclusions

3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.

FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS (CONT)

4. Non-standard mathematics:
   a) Structural equation models (Wright, 1920; Simon, 1960)
   b) Counterfactuals (Neyman-Rubin: Y_x; Lewis: x □→ Y)

THE STRUCTURAL MODEL PARADIGM

Data-generating model M → joint distribution → data;  Q(M) (aspects of M)

M - invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.

STRUCTURAL CAUSAL MODELS

Definition: A structural causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where
• V = {V_1, ..., V_n} are endogenous variables;
• U = {U_1, ..., U_m} are background variables;
• F = {f_1, ..., f_n} are functions determining V, with v_i = f_i(v, u), e.g., y = α + βx + u_Y;
• P(u) is a distribution over U.

P(u) and F induce a distribution P(v) over the observable variables.
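The 4-tuple definition can be made concrete with a toy simulation; every function and probability below is an assumption for illustration, not from the lecture. Sampling u from P(u) and solving the equations F yields the induced distribution P(v).

```python
import random

# A minimal SCM <V, U, F, P(u)>:
# U = {U1, U2}, V = {X, Y}, with f_X(u) = u1 and f_Y(x, u) = x XOR u2.
def P_u():
    """Sample the background variables u = (u1, u2)."""
    return random.random() < 0.5, random.random() < 0.1

def f_X(u1):
    return u1

def f_Y(x, u2):
    return x != u2  # x XOR u2

# P(u) and F induce a distribution P(v) over the endogenous variables:
random.seed(0)
samples = []
for _ in range(100_000):
    u1, u2 = P_u()
    x = f_X(u1)
    y = f_Y(x, u2)
    samples.append((x, y))

p_y1 = sum(y for _, y in samples) / len(samples)
print(round(p_y1, 2))  # P(Y=1) under the induced distribution P(v)
```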

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence "Y would be y (in unit u), had X been x," denoted Y_x(u) = y, means: the solution for Y in the mutilated model M_x (i.e., the equation for X replaced by X = x), with input U = u, is equal to y.

[Diagrams: model M, with background variables U_X, U_Y determining X(u) and Y(u); mutilated model M_x, with X = x and solution Y_X(u).]

CAUSAL MODELS AND COUNTERFACTUALS (CONT)

The Fundamental Equation of Counterfactuals:

Y_x(u) ≜ Y_{M_x}(u)
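The mutilation operation in the definition above can be sketched in a few lines of code. The two-equation model here is a hypothetical toy, not one from the lecture: solving the original model gives the factual Y(u), and replacing X's equation by a constant gives the counterfactual Y_x(u).

```python
# Counterfactual Y_x(u) by model mutilation.
# Model M:  x = u1,  y = x OR u2.  Mutilated M_x: X's equation replaced by X = x.

def solve(u, x_intervention=None):
    """Solve the model for (X, Y) given background u = (u1, u2).
    If x_intervention is not None, solve the mutilated model M_x instead."""
    u1, u2 = u
    x = u1 if x_intervention is None else x_intervention
    y = x or u2
    return x, y

u = (1, 0)                               # one particular unit
_, y_factual = solve(u)                  # Y(u): what actually happens
_, y_x0 = solve(u, x_intervention=0)     # Y_{x=0}(u): "had X been 0"
print(y_factual, y_x0)                   # the unit responds to the intervention
```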

CAUSAL MODELS AND COUNTERFACTUALS (CONT)

• Joint probabilities of counterfactuals:

  P(Y_x = y, Z_w = z) = Σ_{u: Y_x(u) = y, Z_w(u) = z} P(u)

• In particular:

  P(y | do(x)) = P(Y_x = y) = Σ_{u: Y_x(u) = y} P(u)

  PN = P(Y_{x'} = y' | x, y) = Σ_{u: Y_{x'}(u) = y'} P(u | x, y)

3-LEVEL HIERARCHY OF CAUSAL MODELS

1. Probabilistic knowledge: P(y | x) - Bayesian networks, covariance structure
2. Interventional knowledge: P(y | do(x)) - causal Bayesian networks (CBN) (agnostic graphs, manipulation graphs)
3. Counterfactual knowledge: P(Y_x = y, Y_{x'} = y') - structural equation models, physics, functional graphs

TWO PARADIGMS FOR CAUSAL INFERENCE

Observed: P(X, Y, Z, ...)
Conclusions needed: P(Y_x = y), P(X_y = x | Z = z), ...

How do we connect observables X, Y, Z, ... to counterfactuals Y_x, X_z, Z_y, ...?

N-R model: counterfactuals are primitives, new variables; a super-distribution P*(X, Y, ..., Y_x, X_z, ...); X, Y, Z constrain Y_x, Z_y, ...

Structural model: counterfactuals are derived quantities; subscripts modify the model and the distribution:

P(Y_x = y) ≜ P_{M_x}(Y = y)

ARE THE TWO PARADIGMS EQUIVALENT?

• Yes (Galles and Pearl, 1998; Halpern, 1998).
• In the N-R paradigm, Y_x is defined by consistency:

  Y = xY_1 + (1 - x)Y_0

• In SEM, consistency is a theorem.
• Moreover, a theorem in one approach is a theorem in the other.

THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS

Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M.
Assume:   Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test:     Test the testable implications of A (if any).

THE FIVE NECESSARY STEPS FOR EFFECT ESTIMATION

Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M:

          ATE = E(Y | do(x_1)) - E(Y | do(x_0))

Assume:   Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test:     Test the testable implications of A (if any).

FORMULATING ASSUMPTIONS: THREE LANGUAGES

1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)

2. Counterfactuals:
   Z_x(u) = Z_{yx}(u),
   X_y(u) = X_{zy}(u) = X_z(u) = X(u),
   Y_z(u) = Y_{zx}(u),
   Z_x ⊥ {Y_z, X}
   Not too friendly: consistent? complete? redundant? arguable?

3. Structural: the graph X → Z → Y, with a latent genotype U affecting both X and Y.

IDENTIFYING CAUSAL EFFECTS IN THE POTENTIAL-OUTCOME FRAMEWORK

Define:   Express the target quantity Q as a counterfactual formula, e.g., E(Y(1) - Y(0)).
Assume:   Formulate causal assumptions as constraints on the super-distribution P* = P(X, Y, Z, Y(1), Y(0)).
Identify: Determine if Q is identifiable using P* and the consistency rule Y = xY(1) + (1 - x)Y(0).
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.

GRAPHICAL – COUNTERFACTUALS SYMBIOSIS

Every causal graph expresses counterfactual assumptions, e.g., for a graph over X, Y, Z:

1. Missing arrows (e.g., no arrow Z → Y):  Y_{x,z}(u) = Y_x(u)
2. Missing arcs (no bidirected arc Y ↔ Z):  Y_x ⊥ Z_y

These assumptions are consistent, and readable from the graph.
• Express assumptions in graphs.
• Derive estimands by graphical or algebraic methods.

NON-PARAMETRIC IDENTIFIABILITY

Definition: Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions. Q is identifiable relative to A iff

P(M_1) = P(M_2)  ⟹  Q(M_1) = Q(M_2)

for all M_1, M_2 that satisfy A.

In other words, Q can be determined uniquely from the probability distribution P(v) of the endogenous variables V and the assumptions A; here, A is displayed in a graph G.

IDENTIFICATION IN SEM

Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in G, where Z_1, ..., Z_k are auxiliary variables.

[Graph G over Z_1, ..., Z_6, X, and Y.]

Can P(y | do(x)) be estimated if only a subset, Z, can be measured?

NON-PARAMETRIC IDENTIFICATION USING THE BACK-DOOR CRITERION

P(y | do(x)) is estimable if there is a set Z of variables such that Z d-separates X from Y in G_X (the graph with all arrows emanating from X removed).

[Graphs G and G_X, with Z = {Z_3, Z_4}.]

Moreover ("adjusting" for Z; ignorability):

P(y | do(x)) = Σ_z P(y | x, z) P(z) = Σ_z P(x, y, z) / P(x | z)
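The adjustment formula above can be checked by simulation. A minimal sketch, assuming a toy binary model with a single back-door variable Z (all probabilities are illustrative, not from the lecture): the naive conditional probability is confounded, while adjusting for Z recovers the interventional quantity.

```python
import numpy as np

# Toy model with one back-door path:  Z -> X,  Z -> Y,  X -> Y.
rng = np.random.default_rng(1)
n = 500_000
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)    # P(X=1 | z)
y = rng.random(n) < 0.3 + 0.4 * x + 0.2 * z  # P(Y=1 | x, z); true P(y|do(X=1)) = 0.8

# Naive conditional probability P(y | x=1): confounded by Z.
naive = y[x].mean()

# Back-door adjustment:  P(y | do(x)) = sum_z P(y | x, z) P(z)
adj = sum(y[x & (z == v)].mean() * (z == v).mean() for v in (True, False))

print(round(naive, 2), round(adj, 2))  # naive overshoots; adjustment is near 0.8
```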

THE MOTHER OF ALL QUESTIONS

Question: Under what conditions can a path coefficient β be estimated as a regression coefficient, and what variables should serve as the regressors?

Answer: It is all in the diagram (Pearl 2009, p. 180).

The single-door criterion:  β = r_{YX·S} = ∂/∂x E(Y | x, s)  if
1. no member of S is a descendant of Y, and
2. S blocks all paths from X to Y, except the direct path X → Y.

SEARCHING FOR THE TESTABLE


FINDING THE TESTABLES BY INSPECTION

[Graph over Z_1, Z_2, Z_3, W_1, W_2, W_3, X, Y.]

Missing edges imply conditional independencies and vanishing regression coefficients, e.g.:

Z_1 ⊥ Y | {X, Z_2, Z_3}
Z_2 ⊥ X | {Z_1, Z_3}

Equivalently, the corresponding partial regression coefficients vanish: the coefficient of Z_1 in the regression of Y on {X, Z_1, Z_2, Z_3} is zero, and the coefficient of Z_2 in the regression of X on {Z_1, Z_2, Z_3} is zero.

• Software for automatic detection of all tests (Kyono, 2010).
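One such test can be run by simulation. The chain model below is an assumed toy example (not the graph on the slide): the missing edge Z_1 → Y implies Z_1 ⊥ Y | X, so the partial regression coefficient of Z_1 should vanish once X is included among the regressors.

```python
import numpy as np

# Toy linear chain  Z1 -> X -> Y  with no direct Z1 -> Y edge.
rng = np.random.default_rng(2)
n = 200_000
z1 = rng.normal(size=n)
x = 0.7 * z1 + rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)   # Z1 influences Y only through X

# Regress Y on {X, Z1}; the Z1 coefficient tests the missing-edge implication.
A = np.column_stack([x, z1, np.ones(n)])
coef_x, coef_z1, _ = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(coef_x, 2), round(coef_z1, 2))  # Z1's coefficient is near zero
```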

FINDING INSTRUMENTAL VARIABLES

Find an instrument for identifying b_34 (Duncan, 1975): find a variable Z that, conditioned on S, is independent of the disturbance v.

By inspection: X_1 is d-separated from V; therefore, X_1 is a valid instrument.

FIND EQUIVALENT MODELS

• Model equivalence implies d-separation equivalence.
• d-separation tests should replace the Lee & Hershberger rules.

[Diagrams (a) and (b) over Z, X, W, Y.]

Reason: conditioned on Y, the path from Z to W through Y and X is open in (a) and blocked in (b).

CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE

[Graph over W_1, ..., W_4, V_1, V_2, Z, T, X, Y.]

∂/∂x E(Y | x, z) = ∂/∂x E(Y | x, t)

Adjusting for Z and adjusting for T yield the same bias; the two measurements are equally valuable.

FROM IDENTIFICATION TO ESTIMATION

Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M:  Q = P(y | do(x)).
Assume:   Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form.
Identify: Determine if Q is identifiable.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.

PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983)

P(y | do(x)) = ?

[Graph over Z_1, ..., Z_6, L, X, Y.]

Theorem: Let L(z_1, ..., z_5) = P(X = 1 | z_1, ..., z_5). Then

Σ_z P(y | z, x) P(z) = Σ_l P(y | L = l, x) P(L = l)

Adjustment for L replaces adjustment for Z.

WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW

L(z) = P(X = 1 | Z = z),   Σ_z P(y | z, x) P(z) = Σ_l P(y | l, x) P(l)

1. The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for the same Z).
2. Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others.
3. In particular, instrumental variables tend to amplify bias.
4. Choosing a sufficient set for PS requires knowledge of the model.
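Point 1, that adjusting for the propensity score L(z) targets exactly the same estimand as adjusting for Z itself, can be illustrated by stratifying on both. A sketch under an assumed toy model (the covariate, probabilities, and sample size are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
z = rng.integers(0, 3, size=n)                 # a 3-valued covariate Z
p_x = np.array([0.2, 0.5, 0.8])[z]             # L(z) = P(X=1 | z)
x = rng.random(n) < p_x
y = rng.random(n) < 0.2 + 0.3 * x + 0.1 * z    # P(Y=1 | x, z)

# Adjustment for Z:  sum_z P(y | z, x=1) P(z)
adj_z = sum(y[x & (z == v)].mean() * (z == v).mean() for v in range(3))

# Adjustment for the propensity score L (here L takes three values):
adj_l = sum(y[x & (p_x == l)].mean() * (p_x == l).mean() for l in (0.2, 0.5, 0.8))

print(round(adj_z, 3), round(adj_l, 3))  # the two estimates coincide
```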

RETROSPECTIVE COUNTERFACTUALS: ETT – EFFECT OF TREATMENT ON THE TREATED

1. Regret: I took a pill to fall asleep. Perhaps I should not have?
2. Program evaluation: What would terminating a program do to those enrolled?

ETT = P(Y_x = y | x')

COUNTERFACTUAL DEFINITION: EFFECT OF TREATMENT ON THE TREATED

Define:   Express the target quantity Q as a function Q(M) that can be computed from any model M:

          ETT = P(Y_x = y | X = x')

Assume:   Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form.
Identify: Determine if Q is identifiable.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.

ETT - IDENTIFICATION

Theorem (Shpitser-Pearl, 2009): ETT is identifiable in G iff P(y | do(x), w) is identifiable in G'.

[Graph G': W → X → Y.]

Moreover,

ETT = P(Y_x = y | x') = P(y | do(x), w) |_{w ← x'}

A complete graphical criterion.

EFFECT DECOMPOSITION (direct vs. indirect effects)

1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?

WHY DECOMPOSE EFFECTS?

1. To understand how Nature works.
2. To comply with legal requirements.
3. To predict the effects of new types of interventions: signal routing, rather than variable fixing.

LEGAL IMPLICATIONS OF DIRECT EFFECT

Can data prove an employer guilty of hiring discrimination?

[Graph: X (Gender) → Z (Qualifications) → Y (Hiring), with a direct arrow X → Y.]

What is the direct effect of X on Y?

E(Y | do(x_1), do(z)) - E(Y | do(x_0), do(z))   (averaged over z)

Adjust for Z? No! No!

NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS

Robins and Greenland (1992) – "pure" direct effect.

Model:  z = f(x, u),  y = g(x, z, u);  X → Z → Y, with a direct path X → Y.

Natural direct effect of X on Y, DE(x_0, x_1; Y): the expected change in Y when we change X from x_0 to x_1 and, for each u, we keep Z constant at whatever value it attained before the change:

DE(x_0, x_1; Y) = E[Y_{x_1, Z_{x_0}} - Y_{x_0}]

In linear models, DE = controlled direct effect = β(x_1 - x_0).

DEFINITION OF INDIRECT EFFECTS

Model:  z = f(x, u),  y = g(x, z, u);  X → Z → Y, with a direct path X → Y.

Indirect effect of X on Y, IE(x_0, x_1; Y): the expected change in Y when we keep X constant, say at x_0, and let Z change to whatever value it would have attained had X changed to x_1:

IE(x_0, x_1; Y) = E[Y_{x_0, Z_{x_1}} - Y_{x_0}]

In linear models, IE = TE - DE, but not in general.
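The nested counterfactual Y_{x_0, Z_{x_1}} in this definition can be computed directly from a structural model, since each u fixes both Z_x(u) and Y_{x,z}(u). A sketch with an assumed linear SCM (coefficients are illustrative, not from the lecture):

```python
import numpy as np

# Assumed linear SCM:  z = 0.8*x + u_z,  y = 1.0*x + 0.5*z + u_y
rng = np.random.default_rng(4)
n = 200_000
u_z = rng.normal(size=n)
u_y = rng.normal(size=n)

def z_of(x):
    return 0.8 * x + u_z               # Z_x(u)

def y_of(x, z):
    return 1.0 * x + 0.5 * z + u_y     # Y_{x,z}(u)

x0, x1 = 0.0, 1.0
# IE = E[Y_{x0, Z_{x1}} - Y_{x0}]: hold X at x0, let Z take its value under x1.
ie = np.mean(y_of(x0, z_of(x1)) - y_of(x0, z_of(x0)))
print(round(ie, 2))  # in this linear model, IE = 0.8 * 0.5 = 0.4
```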

POLICY IMPLICATIONS OF DIRECT EFFECTS

What is the direct effect of X on Y? The residual hiring disparity if skills inequality is eliminated.

[Graph: X (GENDER) → Y (HIRING), with the X → Z (QUALIFICATION) link disabled (f).]

Deactivating a link – a new type of intervention.

POLICY IMPLICATIONS OF INDIRECT EFFECTS

What is the indirect effect of X on Y? The residual hiring disparity if employers' prejudices are eliminated.

[Graph: X (GENDER) → Z (QUALIFICATION) → Y (HIRING), with the direct X → Y link ignored (f).]

Deactivating a link – a new type of intervention.

WHY TE ≠ DE + IE

[Graph: X → Z with coefficient m_1, Z → Y with coefficient m_2, and a direct path X → Y.]

In linear systems:
TE = DE + IE
IE = m_1 m_2 = TE - DE
IE(rev) = -m_1 m_2 = -IE

In general:
IE      – effect sustained by mediation alone (disabling the direct path)
TE - DE – effect prevented by disabling mediation

MEDIATION FORMULAS

1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. and are given by:

   DE = Σ_z [E(Y | do(x_1, z)) - E(Y | do(x_0, z))] P(z | do(x_0))
   IE = Σ_z E(Y | do(x_0, z)) [P(z | do(x_1)) - P(z | do(x_0))]

3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.

MEDIATION FORMULAS IN UNCONFOUNDED MODELS

DE = Σ_z [E(Y | x_1, z) - E(Y | x_0, z)] P(z | x_0)
IE = Σ_z E(Y | x_0, z) [P(z | x_1) - P(z | x_0)]
TE = E(Y | x_1) - E(Y | x_0)

IE      – fraction of responses owed to mediation
TE - DE – fraction of responses explained by mediation
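These unconfounded formulas are directly estimable from data by stratifying on the mediator. A minimal sketch under an assumed binary toy model (all probabilities are illustrative, not from the lecture):

```python
import numpy as np

# Assumed unconfounded binary model:  X -> Z -> Y, with a direct X -> Y path.
rng = np.random.default_rng(5)
n = 1_000_000
x = rng.random(n) < 0.5
z = rng.random(n) < 0.2 + 0.5 * x            # P(z | x)
y = rng.random(n) < 0.1 + 0.3 * x + 0.4 * z  # E(Y | x, z)

def E_y(xv, zv):
    return y[(x == xv) & (z == zv)].mean()

def P_z(zv, xv):
    return (z[x == xv] == zv).mean()

# DE = sum_z [E(Y|x1,z) - E(Y|x0,z)] P(z|x0)
DE = sum((E_y(1, zv) - E_y(0, zv)) * P_z(zv, 0) for zv in (0, 1))
# IE = sum_z E(Y|x0,z) [P(z|x1) - P(z|x0)]
IE = sum(E_y(0, zv) * (P_z(zv, 1) - P_z(zv, 0)) for zv in (0, 1))
# TE = E(Y|x1) - E(Y|x0)
TE = y[x].mean() - y[~x].mean()
print(round(DE, 2), round(IE, 2), round(TE, 2))
```

In this interaction-free model TE = DE + IE; adding an X-Z interaction to E(Y | x, z) would break that equality, as the slides emphasize.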

MODERATING MEDIATORS

Model (X → Z → Y with an interaction term; disturbances omitted):

z = m_1 x
y = b_0 + m_0 x + m_2 z + c_12 xz

What portion of the effect can be explained by / attributed to mediation? Taking x_0 = 0, x_1 = 1, the Mediation Formula yields:

DE = m_0
IE = m_1 m_2
TE - DE = m_1 m_2 + m_1 c_12

Effect explained (by mediation):  TE - DE = m_1 (m_2 + c_12)
Effect attributed (to mediation): IE = m_1 m_2

MEDIATION FORMULA FOR BINARY VARIABLES

[Table over the eight cell counts n_1, ..., n_8, classifying units by binary X, Z, and Y.]

RAMIFICATIONS OF THE MEDIATION FORMULA

• DE should be averaged over mediator levels; IE should NOT be averaged over exposure levels.
• TE - DE need not equal IE:
  TE - DE = proportion for whom mediation is necessary
  IE      = proportion for whom mediation is sufficient
• TE - DE informs interventions on indirect pathways; IE informs interventions on direct pathways.

CONCLUSIONS

• A formal basis for causal and counterfactual analysis, unifying the potential-outcome and structural equation approaches, resting on an explicit causal structure that is defensible on scientific grounds.
• Friendly and formal solutions to century-old problems and confusions.
• No other method can do better (a theorem).

QUESTIONS???

They will be answered
