Bayesian factor and structural equation models in spatial applications: specification, identification and model assessment, with case study illustrations
Peter Congdon, Queen Mary University of London
Dept of Geography & Centre for Statistics
Outline
• Background: Bayesian approaches to LV models, advantages & disadvantages
• Computational options including WINBUGS
• Wider application contexts of Bayesian LV & SEM models
• Spatial priors; common spatial factors
Outline (continued)
• Different sorts of spatial factor model (depending on form of manifest variables) and possible identification issues
• Assessing models, model fit & model choice. Possible variable/model choice approaches
• Case studies
Case Studies
• Social capital & mental health, multilevel model using Health Survey for England (HSE)
• Multilevel model for joint prevalence of obesity & diabetes, BRFSS respondents nested within US counties & states (CDC Behavioral Risk Factor Surveillance System)
• Suicide & self-harm, ecological study for small areas (wards) in Eastern England
Background
• SEM and factor models originated in, and are still most widely used in, psychological, educational & behavioural applications.
• Recent Bayesian applications to psychological & educational testing data include SEM (e.g. Lee & Song, 2003), LCA, item analysis, and factor analysis per se (e.g. Aitkin & Aitkin, 2005; Press & Shigemasu, 1998).
• Also some work on automated Bayesian model choice in the normal linear factor model
Advantages of Bayesian Approach
• Straightforward to depart from standard assumptions often built into classical estimation methods (e.g. factor scores multivariate normal & independent over subjects)
• Advantage in generalizations such as nonlinear factor effects, multiplicative factor schemes
Advantages of Bayesian Approach (continued)
• Random effect models (of which factor/SEM models are a subclass) can be fitted without relying on numerical methods to integrate out random effects
• Potential for Bayesian model choice procedures (e.g. stochastic search variable selection) in factor/SEM models
Disadvantages of Bayesian Approach
• Identification issues (re “naming” of factors): can have label switching for latent constructs during MCMC updating if there aren’t constraints to ensure consistent labelling.
• Slow convergence of model parameters or global model fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects)
Disadvantages of Bayesian Approach (continued)
• Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications
• Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix)
• Bayesian approach may need sensible priors when applied to factor models (“diffuseness” not necessarily suitable)
Bayesian Computing
• Many Bayesian applications to SEM and factor analysis facilitated by WINBUGS package.
• See Congdon (Applied Bayesian Modelling, 2003); Lee (Structural Equation Modeling: a Bayesian Approach, 2007)
• Alternative is R…more programming involved
• BayesX can’t model common factors
WINBUGS
• Despite acronym, WINBUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling
• Program code is essentially a description of the priors & likelihood, but can also monitor model-related quantities of interest
Computing Illustration: a Normal SEM
• Wheaton study: 3 latent variables, each measured by two indicators. Alienation67 is measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale).
• Alienation71 is measured in the same way, but using the 1971 scales.
• Third latent variable, SES (socio-economic status), is measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.
Structural model relates alienation in 1971 ($F_2$) to alienation in 1967 ($F_1$) and SES ($G$)
$F_{2i} = \beta F_{1i} + \gamma_2 G_i + u_{2i}$
$F_{1i} = \gamma_1 G_i + u_{1i}$
Measurement model for alienation
$y_{ji} = \alpha_j + \lambda_j F_{1i}$, $j = 1, 2$
$y_{ji} = \alpha_j + \lambda_j F_{2i}$, $j = 3, 4$
Measurement model for SES
$x_{ji} = \delta_j + \kappa_j G_i$, $j = 1, 2$
WINBUGS for Wheaton study
model {
   # structural model
   for (i in 1:n) {
      F2[i] ~ dnorm(mu.F2[i], 1)
      mu.F2[i] <- beta * F1[i] + gam[2] * G[i]
      F1[i] ~ dnorm(mu.F1[i], 1)
      mu.F1[i] <- gam[1] * G[i]
   }
   # priors (normal uses inverse variance)
   for (j in 1:2) { gam[j] ~ dnorm(0, 0.001) }
   beta ~ dnorm(0, 0.001)
   # measurement equations for alienation
   for (i in 1:n) {
      for (j in 1:4) { y[i,j] ~ dnorm(mu[i,j], tau[j]) }
      mu[i,1] <- alph[1] + lam[1] * F1[i]
      mu[i,2] <- alph[2] + lam[2] * F1[i]
      mu[i,3] <- alph[3] + lam[3] * F2[i]
      mu[i,4] <- alph[4] + lam[4] * F2[i]
   }
   # PRIORS
   for (j in 1:4) {
      alph[j] ~ dnorm(0, 0.001)
      # gamma prior on precisions
      tau[j] ~ dgamma(1, 0.001)
      # alternative prior starts with s.d. of residuals
      # sd.y[j] ~ dunif(0,100); tau[j] <- 1/(sd.y[j]*sd.y[j])
      # identifiability constraint on loadings to ensure
      # positive alienation measure
      lam[j] ~ dnorm(1, 1) I(0,)
   }
   # measurement of SES (G[i])
   for (i in 1:n) {
      G[i] ~ dnorm(0, 1)
      for (j in 1:2) { x[i,j] ~ dnorm(mu.x[i,j], tau.x[j]) }
      mu.x[i,1] <- del[1] + kappa[1] * G[i]
      mu.x[i,2] <- del[2] + kappa[2] * G[i]
   }
   for (j in 1:2) {
      del[j] ~ dnorm(0, 0.001)
      # gamma prior on precisions
      tau.x[j] ~ dgamma(1, 0.001)
      # identifying constraint ensures +ve SES scale
      kappa[j] ~ dnorm(1, 1) I(0,)
   }
}
Monitoring model related quantities
• Suppose one were interested in posterior probabilities that $F_{2i} > F_{1i}$ (alienation increasing for the ith subject)
• Add code
for (i in 1:n) {delF[i] <- step(F2[i]-F1[i])}
• Then posterior means of delF provide the required probabilities
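The same quantity can be recomputed outside WINBUGS from saved draws. The following is a minimal Python sketch, assuming the sampled factor scores have been exported as lists of per-iteration vectors; the names `f1_draws`, `f2_draws` and the toy numbers are hypothetical and are not taken from the Wheaton analysis.

```python
def prob_increase(f1_draws, f2_draws):
    """Posterior Pr(F2[i] >= F1[i]) for each subject i.

    Each argument is a list of MCMC iterations; each iteration is a list
    holding one sampled factor score per subject. This mirrors the BUGS
    step() function, which returns 1 when its argument is >= 0.
    """
    n_iter = len(f1_draws)
    n_subj = len(f1_draws[0])
    probs = []
    for i in range(n_subj):
        hits = sum(1 for t in range(n_iter)
                   if f2_draws[t][i] - f1_draws[t][i] >= 0)
        probs.append(hits / n_iter)
    return probs


# Toy draws: 4 iterations, 2 subjects (illustrative values only)
f1_draws = [[0.1, 0.5], [0.0, 0.7], [0.2, 0.4], [-0.1, 0.6]]
f2_draws = [[0.3, 0.2], [0.4, 0.9], [0.1, 0.1], [0.5, 0.3]]
probs = prob_increase(f1_draws, f2_draws)
print(probs)
```

Averaging a 0/1 indicator over iterations is exactly what monitoring delF does inside the sampler, so the two routes should agree up to Monte Carlo error.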
Widening Applications of Latent Variable Methods
• In particular: application contexts of Bayes SEM/factor models now include ecological (area level) studies of health variations.
• Usually no longer valid to assume units (i.e. areas) are independent.
• Instead spatial correlation in latent variable(s) (common spatial factors) over the areas should be considered
Multi-Level Latent Variable Models
• Latent variable methods also more widely applied in multilevel health studies
• Such models consider joint impact of individual level and area level risk factors on health status
• With several outcomes (data both multivariate & multilevel) can model area effects using common factor(s)
SOME SPATIAL PRIORS: THE BASIS FOR COMMON SPATIAL FACTORS
Priors incorporating spatial structure: basis for common spatial factors
• May be specified over continuous space (geostatistical models often used for “kriging”)
• OR for discrete sets of areas with irregular boundaries (“lattices” or “polygons”)
• Major classes: Simultaneous Autoregressive (SAR) or Conditional Autoregressive (CAR) priors
Spatial Priors
• My focus: CAR priors for “lattices” (e.g. administrative areas)
• These are priors for “structured” effects (where labels of area units are important) as opposed to unstructured effects (unaffected by, or exchangeable over, different labelling schemes for areas)
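To make "structured" concrete, here is a minimal Python sketch of the conditional form of an intrinsic CAR prior with 0/1 adjacency weights: each area effect, given the others, is normal with mean equal to the average of its neighbours and variance inversely proportional to the number of neighbours. The slides do not say which CAR variant is used, so the intrinsic form, the function name `icar_conditional` and the four-area map are assumptions for illustration.

```python
def icar_conditional(s, neighbours, i, sigma2):
    """Conditional mean and variance of s[i] given all other effects,
    under an intrinsic CAR prior with binary adjacency weights (assumed)."""
    nb = neighbours[i]
    mean = sum(s[j] for j in nb) / len(nb)   # average of neighbouring effects
    var = sigma2 / len(nb)                   # more neighbours -> tighter prior
    return mean, var


# Hypothetical map: four areas on a line, 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
s = [0.4, 0.1, -0.2, -0.3]
mean1, var1 = icar_conditional(s, neighbours, 1, sigma2=1.0)
print(mean1, var1)
```

Relabelling the areas without also permuting the adjacency lists changes these conditionals, which is the sense in which the area labels matter; an exchangeable (unstructured) prior has no such dependence.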
Substantive Basis
• Generally taken to represent unmeasured area level risk factors for health that vary relatively smoothly over space (regardless of arbitrary administrative boundaries that may define units of analysis)
• Substantive grounding: increased recognition of genuine spatial effects on health (“contextual” effects)
DIFFERENT TYPES OF COMMON SPATIAL FACTOR
(A) Manifest health variables
• Manifest variables are health outcomes $y_{ij}$ (areas $i$, variable $j$)
• Common residual factor $s_i$ expresses spatial clustering recurring over several outcomes $j$
• Interpretable as index of common health risks over outcomes
• Example: Wang & Wall 2003
(B) Census Indicator Confirmatory Model
• Common spatial socioeconomic factor or factors (deprivation, rurality, etc) based on relevant indicators $Z_{ik}$ ($k = 1, \ldots, K$) such as unemployment, low income etc.
• Often census indicators form bulk of manifest variables
• Example: Hogan & Tchernis JASA 2004
(C) Two Classes of Manifest Variable
• Common factor(s) used to explain variations in observed Y variables (health outcomes).
• But factors mainly measured by socioeconomic indicators Z (e.g. census data)
• Example: my Eastern region suicide study
• Partly confirmatory, partly exploratory
MANIFEST VARIABLES: AREA HEALTH VARIABLES
(A) Shared Spatial Residual Effects
• Unobserved area effects common to several health outcomes modelled by shared spatial effect
• Typical scenario: area counts $y_{ij}$ for areas $i$ and outcomes $j$. Poisson or binomial likelihood
Types of Event
• May be deaths, hospitalizations, incidence counts for different cancer types, prevalence counts, etc
• Expected events (offset) $E_{ij}$ based on standard age rates applied to area populations: $y_{ij} \sim \text{Poisson}(E_{ij} r_{ij})$
• Can also have populations at risk: $y_{ij} \sim \text{Poisson}(N_i r_{ij})$ or $y_{ij} \sim \text{Bin}(N_i, p_{ij})$
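The expected-events offset can be illustrated with a short Python sketch of indirect standardization: standard age-specific rates are applied to one area's age-specific populations and summed. The rates, populations and observed count below are invented for illustration, and `expected_events` is a hypothetical helper name.

```python
def expected_events(area_pop_by_age, standard_rates):
    """Expected count for one area and one outcome: sum over age bands of
    (area population in band) x (standard rate for that band)."""
    return sum(pop * rate for pop, rate in zip(area_pop_by_age, standard_rates))


# Hypothetical standard rates (events per person) for three age bands
standard_rates = [0.001, 0.005, 0.02]
# Hypothetical populations of one area in the same age bands
area_pop = [1000, 800, 200]

E = expected_events(area_pop, standard_rates)   # offset in the Poisson model
observed = 12
crude_relative_risk = observed / E              # unsmoothed estimate of r
print(E, crude_relative_risk)
```

The crude ratio is the maximum likelihood estimate of the relative risk for a single area; the spatial models in these slides replace it with a smoothed, model-based estimate.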
Multivariate Spatial Effects
• One option for such data: no reduction
• Multivariate residual effects $\log(r_{ij}) = \alpha_j + s_{ij}$ (or $\log(r_{ij}) = \alpha_j + \beta_j x_i + s_{ij}$)
• For $s_{ij}$ could use multivariate version of conditional autoregressive prior
Multivariate Spatial Effects (continued)
• Multivariate normal CAR prior is an example of a Markov Random Field (Rue & Held, 2005).
• Easily applied in WINBUGS using mv.car prior.
• May fit well but proliferation of parameters (more parameters than data points)
Alternative: common spatial factor
• $\log(r_{ij}) = \alpha_j + \lambda_j s_i$
• Parsimonious and provides interpretable summary measure of health risk
• $s_i$ is univariate CAR (or some other prior with spatial dependence)
• Correlation between outcomes within areas modelled via loadings $\lambda_j$.
Identification: Location & Scale
• Need $\sum_i s_i = 0$ for location identification. Centre effects at each MCMC iteration.
• Scale identifiability: EITHER set var(s) = 1 and all $\lambda_j$ are free loadings (fixed scale), OR leave var(s) unknown and constrain a loading, e.g. $\lambda_1 = 1.0$ (anchoring constraint)
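Why a scale constraint is needed can be checked numerically. The Python sketch below centres a set of factor scores, then shows that multiplying every loading by a constant and dividing every score by the same constant leaves all log relative risks unchanged, so the data alone cannot separate the two. All numbers and function names are hypothetical.

```python
import math


def centre(s):
    """Impose the location constraint sum_i s_i = 0."""
    m = sum(s) / len(s)
    return [v - m for v in s]


def log_rates(alpha, lam, s):
    """Matrix of alpha_j + lam_j * s_i over areas i (rows) and outcomes j."""
    return [[a + l * v for a, l in zip(alpha, lam)] for v in s]


alpha = [0.1, -0.2]                    # outcome intercepts (hypothetical)
lam = [1.0, 0.5]                       # loadings, first anchored at 1.0
s = centre([0.3, -0.1, 0.8, -0.4])     # centred factor scores

c = 2.0                                # arbitrary rescaling constant
base = log_rates(alpha, lam, s)
rescaled = log_rates(alpha, [l * c for l in lam], [v / c for v in s])

same = all(math.isclose(x, y, abs_tol=1e-12)
           for row_b, row_r in zip(base, rescaled)
           for x, y in zip(row_b, row_r))
print(same)
```

Fixing var(s) = 1 or anchoring one loading at 1.0 removes the freedom to choose `c`, which is what each of the two options on the slide achieves.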
Labelling Problems in Repeated Sampling
• Even in simple model, labelling may be an issue.
• Consider fixed variance identification option, var(s) = 1, loadings all unknown.
• Suppose diffuse priors are taken on loadings in $\log(r_{ij}) = \alpha_j + \lambda_j s_i$ without directional constraint.
Labelling Problems (continued)
• Then can have:
a) $\lambda_j$ all positive combined with $s_i$ acting as positive measure of health risk (higher $s_i$ in areas with higher cancer rates)
OR
b) $\lambda_j$ all negative combined with $s_i$ acting as negative measure of health risk ($s_i$ higher in areas with lower cancer rates)
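The two labellings fit equally well because a joint sign change leaves every linear predictor unchanged. As a worked identity, assuming the prior on the area effects is symmetric about zero (as for a zero-centred normal CAR prior) and the loading priors do not favour either sign:

```latex
\log(r_{ij}) \;=\; \alpha_j + \lambda_j s_i \;=\; \alpha_j + (-\lambda_j)(-s_i)
\qquad \text{for all } i, j .
```

Under those assumptions the likelihood and prior are both unchanged by the flip, so an MCMC chain can move between the two configurations and posterior means of the loadings and scores may average towards zero.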
Identifying constraints for consistent labelling
• For unambiguous labelling advisable to constrain one or more $\lambda_j$ to be positive (e.g. truncated normal or gamma prior)
• Note that anchoring constraint with var(s) unknown, and preset loading (e.g. $\lambda_1 = 1.0$), may be intrinsically better identified: it steers remaining unknown $\lambda$ coefficients to consistent labelling
Loadings and Labellings
• May not be sufficient just to rely on constraining one loading (e.g. assume +ve) to ensure consistent labelling
• Sometimes said that constraining direction on one loading ensures consistent identification…
• What if indicator chosen for constrained loading (e.g. constrained to be positive) is a poor measure for the construct?
Loadings and Labellings (continued)
• If twenty indicators are measuring a construct, the 19 unconstrained loadings may “fit” a different label (e.g. deprivation) to that implied by the remaining constrained loading (e.g. affluence)
• Personal view: much depends on suitable selection of manifest indicators and which (and how many, maybe >1) are chosen to have constrained loadings
WINBUGS Code for manifest
variable scenario A
Extensions of Spatial Common Factors
• Product schemes. Consider health outcomes arranged by area $i$ and age $x$. Populations at risk $N_{ix}$
$y_{ix} \sim \text{Poisson}(N_{ix} n_{ix})$
$\log(n_{ix}) = \alpha_x + \xi_x s_i$
• $\xi_x$ show which age groups are most sensitive to spatial variations in risk represented by $s_i$
• Variation on Lee-Carter (JASA 1992) mortality forecasting model
Random Effect Loadings
• $\xi_x$ potentially random, rather than fixed effects.
• Identified using sum to 1 or averaging to 1 constraint, e.g. $\xi_x$ multinomial, or $\xi_x \sim \text{Gamma}(h, h)$
Nonlinear effects of common factor
• One possibility: just take powers of $s_i$, e.g. $\log(r_{ij}) = \alpha_j + \lambda_j s_i + \kappa_j s_i^2$
• Or: spline for nonlinear effects in common factor score $s_i$.
• e.g. under fixed variance var(s) = 1 option, locate knots $w_k$ at selected quantiles of the cumulative standard normal.
Linear Spline
• Then linear spline $\log(r_{ij}) = \alpha_j + \lambda_j s_i + \sum_k b_{jk} (s_i - w_k)_+$
• $b_{jk}$ might be random effects, but raises identification issues…?
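The spline construction can be sketched in a few lines of Python: knots are placed at chosen quantiles of the standard normal (appropriate when var(s) is fixed at 1), and each knot contributes a truncated linear term that is zero below the knot. The quantile levels, coefficient values and function names here are illustrative assumptions.

```python
from statistics import NormalDist


def knots_at_quantiles(probs):
    """Knots w_k at the given quantiles of the standard normal."""
    nd = NormalDist()  # mean 0, s.d. 1
    return [nd.inv_cdf(p) for p in probs]


def truncated_linear_basis(s_i, knots):
    """(s_i - w_k)_+ for each knot: zero below the knot, linear above it."""
    return [max(s_i - w, 0.0) for w in knots]


def log_rate(alpha_j, lam_j, b_j, s_i, knots):
    """alpha_j + lam_j * s_i + sum_k b_jk * (s_i - w_k)_+"""
    basis = truncated_linear_basis(s_i, knots)
    return alpha_j + lam_j * s_i + sum(b * z for b, z in zip(b_j, basis))


knots = knots_at_quantiles([0.25, 0.5, 0.75])       # quartiles (assumed choice)
basis_at_one = truncated_linear_basis(1.0, knots)
eta = log_rate(0.0, 0.8, [0.1, -0.2, 0.3], 1.0, knots)
print(knots, basis_at_one, eta)
```

With all spline coefficients set to zero the function reduces to the linear common factor model, which is why selection or shrinkage on the coefficients is one way to test for nonlinearity.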
INDICATOR BASED SPATIAL CONSTRUCTS
(B) Indicator Based Spatial Constructs
• Many studies use latent constructs to analyze population health variations.
• Such constructs (e.g. deprivation) not directly observed
• Instead derived from a collection of relevant indicator variables that are observed, using multivariate techniques or other “composite variable” methods
• Many health outcomes show “deprivation gradient”
Latent Constructs in Population Health
• Example: Townsend deprivation score based on summing standardized census area values for 4 input variables (sum of “z scores”)
• % unemployed, % with no car, % households overcrowded, % households not owner occupiers
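A sum-of-z-scores composite of this kind can be sketched in Python as follows. The three areas and their percentages are invented, the helper names are hypothetical, and the sketch standardizes the raw percentages directly; any transformation of the inputs before standardizing, as some versions of the index apply, is omitted.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one indicator across areas."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def summed_z_score(indicators):
    """indicators: dict of name -> list of area values.
    Returns one composite score per area (sum of z scores)."""
    z = [z_scores(v) for v in indicators.values()]
    return [sum(col) for col in zip(*z)]


# Hypothetical census percentages for three areas
indicators = {
    "pct_unemployed":  [4.0, 8.0, 12.0],
    "pct_no_car":      [10.0, 30.0, 50.0],
    "pct_overcrowded": [2.0, 4.0, 6.0],
    "pct_not_owner":   [20.0, 40.0, 60.0],
}
scores = summed_z_score(indicators)
print(scores)
```

A composite like this weights every indicator equally, whereas the factor models discussed in these slides estimate the loadings and allow the score to be spatially correlated.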
Other area constructs
• Other examples of latent constructs relevant to area health variations: rurality/urbanicity, social fragmentation
• Social fragmentation scores used to analyze variations in area suicide rates and psychiatric hospitalization rates
Confirmatory Indicator Based Model
• Confirmatory model: indicators k=1,..,K are established proxies for latent construct
• e.g. area unemployment rates, welfare recipients, social housing rates as indicators of area deprivation
• Census rates $r_{ik} = z_{ik}/D_{ik}$ where $z_{ik}$ are counts (e.g. unemployed), $D_{ik}$ are relevant denominators (e.g. economically active populations).
One option for confirmatory model
• Use Gaussian approximation to binomial (Hogan & Tchernis JASA 2004) with variance stabilizing transformation: $R_{ik} = \sqrt{r_{ik}}$, $\text{var}(R_{ik}) = \phi_k / D_{ik}$.
→ normal measurement equations
$R_{ik} \sim N(\delta_k + \kappa_k F_i, \; \phi_k / D_{ik})$
where $F_i$ scores follow spatial CAR prior
Or use relevant Exponential Family links in deriving common spatial factor
• $P(z_{ik} \mid \theta_{ik}) = \exp([z_{ik}\theta_{ik} - b(\theta_{ik})]/\phi + c(z_{ik}, \phi))$
• e.g. $z_{ik}$ binomial with populations $N_i$, $z_{ik} \sim \text{Bin}(N_i, p_{ik})$
• Logit link, plus overdispersion effects $w_{ik}$: $\text{logit}(p_{ik}) = \delta_k + \kappa_k F_i + w_{ik}$
• $w_{ik}$: normal and uncorrelated over indicators $k$.
For other indicators transform to normality
• For intrinsic proportions (e.g. proportion of area that is green space as indicator of rurality) take logit transform to approximate normality
• for population density take log transform
• etc
TWO CLASSES OF MANIFEST VARIABLE
(C) Spatial Factors in Model with 2 classes of manifest variable
• Health outcomes $Y_{ij}$ ($j = 1, \ldots, J$); e.g. mortality or incidence counts
• Social indicators $Z_{ik}$ ($k = 1, \ldots, K$); e.g. census rates of unemployment
• Typical scenario: multiple common spatial factors ($F_{1i}, \ldots, F_{Qi}$) primarily measured by Z variables (indicators established as relevant).
2 class model
• But factors F also act to potentially explain area variations in health outcomes Y.
• Z to F links confirmatory, Y to F links exploratory
Example
• Four Poisson health outcomes Y1-Y4. Eight indicators: Z1-Z4 measure F1; Z5-Z8 measure F2; both F1 and F2 may explain Y
Yij ~ Po(Eijrij)
log(rij)=aj+j1F1i+j2F2i
Zik ~ EF(mik )
g(mi1)= d1+k11F1i+wi1
……
g(mi5)= d5+k52F2i+wi5
………
MODEL CHOICE
Formal Choice or Not
• Formal Bayes model criteria (e.g. marginal likelihood/Bayes factor) difficult to derive; also change with priors
• Popular alternative (AIC analogue): Deviance Information Criterion (DIC).
• Average deviance Dev.bar + effective parameter count $d_e$: DIC = Dev.bar + $d_e$
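As a worked illustration of the DIC formula, the Python sketch below computes it from posterior draws for a toy Poisson model with a single common mean. It assumes the usual definition of the effective parameter count as the average deviance minus the deviance at the posterior mean; the data, the draws and the function names are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """Deviance -2 log L for independent Poisson counts with common mean mu."""
    loglik = sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)
    return -2.0 * loglik


def dic(y, mu_draws):
    """Return (Dev.bar, d_e, DIC) from posterior draws of mu."""
    dev_bar = sum(poisson_deviance(y, m) for m in mu_draws) / len(mu_draws)
    mu_bar = sum(mu_draws) / len(mu_draws)
    d_e = dev_bar - poisson_deviance(y, mu_bar)   # effective parameter count
    return dev_bar, d_e, dev_bar + d_e


y = [3, 5, 4, 6, 2]                               # toy counts
mu_draws = [3.2, 4.1, 4.6, 3.8, 4.4, 3.9]         # hypothetical MCMC draws
dev_bar, d_e, dic_value = dic(y, mu_draws)
print(dev_bar, d_e, dic_value)
```

Both Dev.bar and the plug-in deviance are Monte Carlo estimates, so with many random effects they inherit the slow convergence of the underlying chains.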
Model Fit in Realistic Applications
• Multilevel applications to health survey data may involve thousands of subjects (e.g. HSE study).
• Ecological applications may involve hundreds of small areas (Eastern region suicide study)
Model Fit in Realistic Applications (continued)
• Convergence of DIC and $d_e$ typically slow in models with many random effects (such as factor scores)
• Slow convergence also applies to other measures of fit, e.g. Monte Carlo estimates of conditional predictive ordinates
• Model selection alternatives…
Model Choice using Variable Selection
• Model selection potentially for both loadings and factor variance/covariance structure.
• Don’t necessarily apply selection for all elements in any particular application (e.g. depending whether exploratory or confirmatory)
• Apply to selected aspects of spatial SEM models, e.g. loadings only or correlations between factors only
Selection in 2 manifest variable SEM
• Spatial factor models with 2 types of manifest variable (health outcomes Yj + socioeconomic indices Zk)
• Apply selection to loadings (indexed jq) linking Yj to Fq (exploratory part of model)
• But don’t apply selection to Z on F loadings (confirmatory sub-model based on extensive prior knowledge)
Mixture Priors for Selecting Loadings
Random Effects Selection
• Selection procedures for random effects and/or their variance/covariance structure, e.g. Cai and Dunson (2008), Tüchler & Frühwirth-Schnatter (2008)
• These extend to factor and SEM models as factors are shared random effects
RE Selection: Multivariate Spatial Prior
• Q>1 for shared common spatial factors
• Within area covariance matrix in MCAR prior denoted F
Cholesky Decomposition of Covariance Matrix F
Selection on variances and/or covariances
• Suppose investigator sure about number of factors (confirmatory model based on substantial evidence)
• BUT not sure whether correlations between factors are needed
• Selection can be applied to relevant parameters in decomposition of F → mixture prior selection on gqr parameters to decide whether correlations needed
CASE STUDIES
• Social capital and mental health, multilevel model using Health Survey for England (HSE)
• Multilevel model, joint prevalence of obesity & diabetes, BRFSS subjects nested within US counties & states
• Suicide & self-harm, ecological (area) study for wards in Eastern England
Case Study 1, Mental Health & Social Capital, Health Survey for England
• Y is observed mental health status (binary): Y=1 if subject’s GHQ12 score is 4 or more, Y=0 otherwise.
• Pr(Y=1) related to known socioeconomic risk factors X at individual subject level
• Pr(Y=1) also related to known indicators of geographic context, G (e.g. micro-area deprivation quintile, region of residence, urban-rural residence). Micro-areas (32K in England) are called Super Output Areas
Latent Risks
• Finally Pr(Y=1) also related to latent subject level risks, $\{F_{1i}, F_{2i}, \ldots, F_{Qi}\}$
• Examples: social capital, perceived stress.
• Structural model: $Y \sim f(Y \mid X, G, F, \beta, \gamma, \lambda)$
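Health Outcome Sub-Model
• Regression involves 9065 adult subjects. $Y_i \sim \text{Bin}(1, p_i)$.
• Use log-link (→ relative risk interpretation).
• Q=1 for single latent risk factor (social capital)
• $\log(p_i) = \beta X_i + \gamma G_i + \lambda F_i = \beta_0 + \beta_{1,\text{gend}[i]} + \beta_{2,\text{age}[i]} + \beta_{3,\text{eth}[i]} + \beta_{4,\text{oph}[i]} + \beta_{5,\text{own}[i]} + \beta_{6,\text{noqual}[i]} + \gamma_{1,\text{reg}[i]} + \gamma_{2,\text{dep}[i]} + \gamma_{3,\text{urb}[i]} + \lambda F_i$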
Multiple Indicators for Social Capital
• Social capital measured by a battery of K survey “items” (e.g. questions about neighbourhood perceptions, organisational memberships etc), $\{Z_1, \ldots, Z_K\}$: $Z \sim g(Z \mid F, \kappa)$
• e.g. with binary questions, link probability of positive response $p_k = \Pr(Z_k = 1)$ to latent construct via $\text{logit}(p_k) = \delta_k + \kappa_k F$
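The logit measurement equation for a binary item can be evaluated directly. This Python sketch uses an invented intercept and loading for one item and shows how the probability of a positive response rises with the latent score when the loading is positive; `item_probability` is a hypothetical helper name.

```python
import math


def item_probability(delta_k, kappa_k, f):
    """Inverse logit of delta_k + kappa_k * f: Pr(Z_k = 1 | F = f)."""
    return 1.0 / (1.0 + math.exp(-(delta_k + kappa_k * f)))


# Hypothetical item with intercept 0 and positive loading 1.2
p_low = item_probability(0.0, 1.2, -1.0)    # low social capital
p_mid = item_probability(0.0, 1.2, 0.0)     # average social capital
p_high = item_probability(0.0, 1.2, 1.0)    # high social capital
print(p_low, p_mid, p_high)
```

A positive loading is what allows the factor to be read as social capital rather than its converse, which connects this sub-model to the labelling constraints discussed earlier.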
Indicators of Social Capital
• Social Support Score (Z1)
• 5 binary items (Z2-Z6) relate to neighbourhood perceptions (e.g. can people be trusted?; do people try to be helpful?; this area is a place I enjoy living in; etc)
• Final item (Z7) relates to membership of organisations or groups.
Multiple Causes of Social Capital
• Social capital varies by demographic groups and geographic context (urban status, region, small area deprivation category, etc).
• So have multiple causes of F as well as multiple indicators: $F \sim h(F \mid X^*, G^*, \phi)$
• X* and G* are individual and contextual variables relevant to “causing” social capital variations
Multiple Cause Sub-Model
• $F_i \sim N(\mu_i, 1)$, $\mu_i = \phi_{1,\text{gend}[i]} + \phi_{2,\text{eth}[i]} + \phi_{3,\text{noqual}[i]} + \phi_{4,\text{urb}[i]} + \phi_{5,\text{reg}[i]} + \phi_{6,\text{dep}[i]}$.
• φ: fixed effects parameters with reference category (zero coefficient) for identification
• Only small number of regions in HSE
• If had finer spatial detail could take area φ effects as spatially random (but weak identification…?)
Effect of F on Y
• Social capital has significant effect in reducing the chances of psychiatric caseness.
• The effect of social capital is apparent in the relative risk of 0.35 of psychiatric morbidity for high capital individuals (with score F=+1) as compared to low capital individuals (with F=-1).
• Obtained as exp(-0.525)/exp(0.525)
• $\lambda = -0.525$ is the coefficient for the social capital effect.
Geographic Context: Micro-area Deprivation Gradient
from Multiple Cause Model
Case Study 2: Diabetes & Obesity in US
• Data from 2007 Behavioral Risk Factor Surveillance System (BRFSS)
• Multinomial outcome (J=6 categories) defined by diabetic status and weight category (obese, overweight, normal).
Multinomial Categories
• The reference category is subjects with neither condition. All other categories are “ill” relative to the reference category
Multilevel multicategory regression
• Regression includes:
  o subject level risk factors (age, ethnicity, gender, education),
  o known geographic effects (e.g. county poverty),
  o county and state random effects to model unknown geographic influences (e.g. unknown environmental exposures).
Regression & Likelihood
Model Form
• Model includes known subject risk factors and contextual variables (e.g. county poverty)
• Unknown contextual risks: assume county and state latent effects, shared over categories j=1,..,J-1.
• Illustrates nested latent spatial effects
County & State Effects
• Take county effects $v_c$ (c=1,..,3142) to be spatially correlated CAR
• But $u_s$ (state effects, s=1,..,51) taken to be unstructured.
• Avoids confounding of two spatially structured effects
Regression Terms for j=1,..J-1
Case Study 3, Suicide & Self Harm: Eastern Region Wards in England
• Two classes of manifest variables
• Y1-Y4: suicide totals in small areas
• Z1-Z14: fourteen small area social indicators
• Q=3 latent constructs (F1 fragmentation, F2 deprivation, F3 urbanicity). Converse of F3 is “rurality”. Common spatial factors.
Local Authority Map: Eastern England
Geographic Framework
• N=1118 small areas (called wards, subdivisions of local authorities).
• Small area focus beneficial: people with similar socio-demographic characteristics tend to cluster in relatively small areas, so greater homogeneity in risk factors related to social status
• On other hand, health events may be rare…
Confirmatory Sub-Model
• Confirmatory Z-on-F model
• Each indicator Zk loads only on one construct Fq.
• Most indicators binomial. A few taken as normal after transformation. Mostly 2001 Census, a few non-census (service access score, proportion greenspace)
Exponential Family Model for modelling Z-on-F effects
• For indicator k = 1,..,14, Gk ∈ {1,2,3} denotes which construct it loads on.
• Regression with link g allows for overdispersion via “unique” w effects
g(mik)= dk+k[k,Gk]F[Gk,i]+wik
Expected Direction of Confirmatory Model Loadings
Health Outcome Sub-Model (Y-on-F effects)
• Model for Y-on-F effects
Yij ~ Po(Eijrij) j=1,..,4
log(rij)=aj+j1F1i+j2F2i+j3F3i+uij
 Coefficient selection on jq using relatively
informative priors under “retain” option
when Jjq=1. Using diffuse priors means null
model tends to be selected
Redundant Coefficients
• Some coefficients (e.g. urbanicity on male and female suicide, deprivation on female suicide) not retained under model selection
• Four coefficients in the Y-on-F model were set to zero in at least some MCMC iterations → averaging over 2^4 = 16 Y-on-F models
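How such model averaging is summarized can be sketched in Python. With four selectable coefficients there are 2^4 = 16 possible retain/exclude patterns; given the binary selection indicators sampled at each iteration, the posterior retention probability of each coefficient and the posterior probability of each visited model are simple proportions. The indicator draws below are invented for illustration and do not come from the suicide study.

```python
from collections import Counter
from itertools import product

n_selectable = 4
all_models = list(product([0, 1], repeat=n_selectable))   # every 0/1 pattern

# Hypothetical sampled selection indicators, one tuple per MCMC iteration
indicator_draws = [
    (1, 0, 1, 1), (1, 0, 1, 0), (1, 1, 1, 1), (1, 0, 1, 1),
    (0, 0, 1, 1), (1, 0, 1, 1), (1, 0, 0, 1), (1, 0, 1, 1),
]
n = len(indicator_draws)

# Posterior probability that each coefficient is retained
inclusion = [sum(d[q] for d in indicator_draws) / n for q in range(n_selectable)]

# Posterior probability of each model actually visited by the sampler
model_probs = {m: c / n for m, c in Counter(indicator_draws).items()}

print(len(all_models), inclusion, model_probs[(1, 0, 1, 1)])
```

Posterior summaries of the retained coefficients are then averages over whichever of the 16 models the sampler visits, weighted by how often each is visited.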
Future Directions in Spatial Factor Modelling
• Extend model selection to interactions between factors, nonlinear effects etc
• In England, model area socioeconomic structure (and maybe some health outcomes) at “neighbourhood” level (32000 “Super Output Areas” with mean population 1500).
• In US, similar scope for modelling SES structure in relation to health events for Zip Code Tabulation Areas or ZCTAs (around 31K across US, on average about 10K population)
More generally
• Bayesian software options for latent variable and SEM applications more widely available
• Potentialities of WINBUGS in this context not always appreciated
• Scope for dedicated Bayesian factor analysis package