A model for spatially varying crime rates in English

Measuring spatial clustering in
disease patterns.
Peter Congdon, Queen Mary University of London
http://www.geog.qmul.ac.uk/staff/congdonp.html
http://webspace.qmul.ac.uk/pcongdon/

Background: spatial correlation
• Tobler’s First Law of Geography: “All places are related, but nearby places are more related than distant places”
• Spatial correlation: values in nearby spatial units are more similar than values in more distant units
• Common feature of geographically configured datasets (spatial econometrics, area health, political science, etc.)
• Can be positive or negative, but positive correlation is most common
• Spatial correlation indices measure correlation but also account for distance between (or contiguity of) spatial units
• Reference (null) pattern: spatial randomness. Values observed at one location do not depend on values observed at neighbouring locations

Background: spatial heterogeneity
• Michael Goodchild, in “Challenges in geographical information science” (Proc RSA, 2011), mentions a second principle of spatial data: spatial heterogeneity.
• An example of such heterogeneity is local variation in the degree of spatial dependence, which leads to local indices of spatial association (LISA measures)

Background: observation types
• My focus is on spatial lattice data: N areal subdivisions (e.g. administrative areas) which, taken together, constitute the entire study region.
• This is unlike point data (e.g. mineral readings in geostatistics), where the major focus is on interpolating a response between observed locations.

Global Indices of Spatial Association
• Moran Index (for N areas, and continuous centred data Zi)

Spatial Weights
Possible options for spatial weights W=[wij]:
• Adjacency/contiguity: if area j is adjacent to area i, then wij=1; otherwise wij=0.
• Distance-based: wij is a distance-based weight such as the inverse distance between locations i and j: wij=1/dij
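
As an illustration of how a weight matrix W enters a global index, the Moran index is commonly written as I = (N/S0) ∑i∑j wij Zi Zj / ∑i Zi², with S0 the sum of all weights. The sketch below is a minimal pure-Python version of that standard formula; the function name `morans_i` and the four-area example are hypothetical, not taken from the slides.

```python
def morans_i(values, weights):
    """Global Moran's I for N area values and an N x N spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # centred data Z_i
    s0 = sum(sum(row) for row in weights)     # S0: sum of all weights w_ij
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical example: four areas in a row (1-2-3-4), binary contiguity weights.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)       # neighbours alike: positive I
alternating = morans_i([1.0, 4.0, 1.0, 4.0], W)  # neighbours unlike: negative I
print(smooth, alternating)
```

Swapping the 0/1 entries of `W` for inverse distances 1/dij gives the distance-based variant without changing the function.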

Global Indices of Spatial Association: Binary data
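
For binary data the usual global indices are the BB, WW and BW join counts that this deck refers to later. A minimal sketch, assuming the standard definitions BB = ½∑∑ wij bi bj, WW = ½∑∑ wij (1-bi)(1-bj) and BW = ½∑∑ wij (bi-bj)²; the function name and the example data are hypothetical.

```python
def global_join_counts(b, weights):
    """Return (BB, WW, BW) join counts for binary indicators b and weights w_ij."""
    n = len(b)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    # Each join appears twice in a symmetric weight matrix, hence the factor 0.5.
    bb = 0.5 * sum(weights[i][j] * b[i] * b[j] for i, j in pairs)
    ww = 0.5 * sum(weights[i][j] * (1 - b[i]) * (1 - b[j]) for i, j in pairs)
    bw = 0.5 * sum(weights[i][j] * (b[i] - b[j]) ** 2 for i, j in pairs)
    return bb, ww, bw


# Hypothetical example: four areas in a row, the first two coded 1 ("black").
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

bb, ww, bw = global_join_counts([1, 1, 0, 0], W)
print(bb, ww, bw)  # one join of each type along the row
```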

Background: Area health data and spatial correlation
• Health data with full population coverage (as opposed to survey data) are often only available for geographic aggregates.
• These may be small neighbourhoods, such as English lower super output areas (LSOAs), with an average population of 1500-2000.
• Small area units (with relatively homogeneous social structure, physical environment and other exposures) are preferable for epidemiological inferences in terms of reducing ecological bias

Background: Area health data and spatial correlation
• Examples of area health data (e.g. for electoral wards, LSOAs): mortality data by cause, cancer incidence data, health prevalence data
• Spatial correlation in area health outcomes reflects clustering in risk factors (observed and unobserved), such as deprivation/affluence, health behaviours, environmental factors, neighbourhood social capital, etc.

Bayesian Relative Risk Models for Area Spatial Data
• Bayesian models for area disease risks are now widely applied (to detect a smooth underlying risk surface over a region, etc.).
• Assume observed disease counts yi are Poisson distributed: yi ~ Po(ei ri), where ei are expected counts
• Relative risks ri have average 1 when sum(expected)=sum(observed). Expected counts (in the demographic sense) are based on applying region-wide disease rates to each small area population
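
The expected-count construction can be sketched as follows. This is a simplified illustration that applies a single region-wide crude rate to each area population; in practice the rates would usually be age (and sex) specific. The function name and the three-area data are hypothetical.

```python
def expected_counts(cases, populations):
    """Expected cases per area from one region-wide rate (no age structure)."""
    region_rate = sum(cases) / sum(populations)
    return [region_rate * pop for pop in populations]


# Hypothetical data for three small areas.
cases = [12, 30, 18]
populations = [1000, 2000, 3000]

e = expected_counts(cases, populations)
crude_rr = [y / ei for y, ei in zip(cases, e)]   # crude relative risks y_i / e_i

# With this construction sum(expected) equals sum(observed), so the
# e-weighted average of the crude relative risks is 1.
weighted_mean_rr = sum(ei * rr for ei, rr in zip(e, crude_rr)) / sum(e)
print(e, crude_rr, weighted_mean_rr)
```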

Bayesian Relative Risk Models for Area Spatial Data
• One option for modelling area relative risks is the convolution scheme (Besag et al, 1991):
  log(ri) = α + si + ui
• Spatial error: si ~ Conditional Autoregressive (CAR) prior
• Heterogeneity/overdispersion error: ui ~ unstructured normal white noise

Neighbourhood Clustering in Elevated Risk
• Consider binary risk measures: bi=1 if relative risk ri>1, bi=0 otherwise.
• These binary indicators are latent (unknown) because the ri are latent.
• Other thresholds can be used (e.g. ri>1.5)
• Interest is often in posterior exceedance probabilities of elevated disease risk, Ei=Pr(ri>1|y)=Pr(bi=1|y), in each area separately.
• Possible rules: area i is a hotspot if Ei>0.9 or if Ei>0.8. The threshold that Ei must exceed may depend on data frequency (higher thresholds can be set for more frequent data)
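
In an MCMC setting the exceedance probability Ei is typically estimated as the share of iterations in which ri exceeds the threshold. A minimal sketch, assuming the sampled relative risks are stored as one list per iteration; the function name and the draws are hypothetical.

```python
def exceedance_probability(draws, threshold=1.0):
    """Estimate E_i = Pr(r_i > threshold | y) as a proportion of MCMC draws."""
    n_iter = len(draws)
    n_areas = len(draws[0])
    return [sum(1 for d in draws if d[i] > threshold) / n_iter
            for i in range(n_areas)]


# Hypothetical sampled relative risks: 4 iterations x 3 areas.
draws = [[1.4, 0.90, 1.10],
         [1.6, 0.80, 0.95],
         [1.3, 1.05, 0.90],
         [1.5, 0.85, 1.20]]

E = exceedance_probability(draws)
hotspots = [i for i, p in enumerate(E) if p > 0.9]   # rule: hotspot if E_i > 0.9
print(E, hotspots)
```

A real run would use thousands of iterations; the 0.9 cut-off is one of the rules mentioned above and can be lowered for sparse data.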

Neighbourhood Clustering in Elevated Risk
• “Hotspot” detection does not measure broader local clustering in relative risks. There can be high risk and low risk clusters.
• High risk cluster centre: area i is embedded in a high risk cluster if both area i and all surrounding areas j have elevated risk (Ei and Ej both high).
• By contrast, high risk outlier: area i is high risk (Ei high), but all adjacent areas j are low risk (Ej low)
• Cluster edge area: area i is high risk (Ei high), but adjacent areas j are a mix of high and low risk

Neighbourhood Clustering in Elevated Risk
• Low risk cluster centre: area i is embedded in a low risk cluster, i.e. both area i and surrounding areas j have low risk (Ei and Ej both low)
• By contrast, low risk outlier: area i is low risk (Ei low), but all adjacent areas are high risk

Spatial Scan Clusters
• The best known approach to spatial clustering of lattice data is based on the spatial scan method, which produces lists of areas in a cluster at a given significance level, e.g. under a Poisson model for the data
• Spatial scan: a circle (or ellipse) of varying size systematically scans the study region (moving window).
• Each geographic unit (e.g. census tract, LSOA) is a potential cluster centre.
• Clusters are reported for those circles (or other area shapes) where total observed values within the circle are greater than expected values.

Stochastic Approach to Measuring Clustering in Elevated Risk
• The method to be described provides a measure of cluster status for each area in the situation where relative health risks ri (and binary health status bi) are unknowns
• Can be considered a method of cluster detection, included in MCMC updating
• Includes high risk and low risk clustering in a single perspective, and also encompasses outliers (isolated high or low risk hotspots)

Synthetic Data
• Known adjacency structure: 113 middle level super output areas (MSOAs) in Outer NE London
• 15 out of 113 areas have high RR (ri circa 1.75). The remainder have below average RR (ri circa 0.9).
• High risk areas are located in three high risk clusters
• Known yi and ei, and hence known crude relative risks (yi/ei), but whether the (latent) RRs are significantly elevated or not depends on the amount of information in the data (data frequency)

Synthetic Data
• Assess Ei and bi (using the Besag et al convolution model) under different expected case counts: ei=20.39, or ei=58.77.
• For ei=20.39, yi are either 18 or 36 (to ensure that the sums of observed and expected cases are the same)
• For ei=58.77, yi are either 52 or 103
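
The arithmetic behind these synthetic counts can be checked directly. The sketch below assumes that the 15 high risk areas receive the larger count and the other 98 areas the smaller one, which is how the figures above appear to be constructed; the variable names are hypothetical.

```python
N_AREAS = 113
N_HIGH = 15                  # high risk areas
N_LOW = N_AREAS - N_HIGH     # remaining 98 areas

# expected count per area -> (count in high risk areas, count elsewhere)
scenarios = {20.39: (36, 18), 58.77: (103, 52)}

summary = {}
for e, (y_high, y_low) in scenarios.items():
    summary[e] = {
        "observed": N_HIGH * y_high + N_LOW * y_low,   # total observed cases
        "expected": N_AREAS * e,                       # total expected cases
        "rr_high": round(y_high / e, 2),               # crude RR, high risk areas
        "rr_low": round(y_low / e, 2),                 # crude RR, other areas
    }

for e, row in summary.items():
    print(e, row)
```

Under both scenarios the observed and expected totals agree to within rounding, and the crude relative risks stay close to the stated 1.75 and 0.9.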

[Figure: Synthetic Data. Average e=20.39, Known RRs]

Local Join-Counts to Detect Clustering in Relative Disease Risk
• As mentioned above, global join counts (BB, WW, BW) measure global spatial clustering in binary risk indicators bi (note that the BW statistic combines two types of discrepancy)
• To detect local clustering in risk (or outlier status), use local versions of the global join-count statistics.

Local Join-Counts to describe local clustering
• Local version of the BB statistic: summation only over neighbours of area i (not a double summation)
  J11i = bi ∑j wij bj
• wij is either distance based or contiguity based (wij=1 if areas i and j are adjacent, wij=0 otherwise)
• J11i measures high risk “cluster embeddedness”, or high risk cluster centre status. J11i will be high for areas surrounded by other high risk areas

Local Join-Counts to describe local clustering
• Local version of the BW statistic:
  J10i = bi ∑j wij (1-bj)
• Measures high risk outlier status: area i has elevated risk, but all neighbours have low risk
• Also tends to increase for high risk cluster edges: area i has elevated risk, but many neighbours have low risk


Local Join-Counts for low risk clustering
• Local version of the WW statistic:
  J00i = (1-bi) ∑j wij (1-bj)
  This is high when area i and its neighbours both have low risk
• Finally, the local WB statistic measures the situation of a low risk area that is discrepant from its neighbours:
  J01i = (1-bi) ∑j wij bj

Local Join-Counts under Binary Spatial Weights
• Consider binary weights wij
• Denote the areas adjacent to area i as its “neighbourhood” Ni
• Li = number of areas adjacent to area i (number of areas in neighbourhood Ni)
• The common high risk joins formula (local BB count) is now J11i = bi ∑j∈Ni bj
• Local BW count: J10i = bi ∑j∈Ni (1-bj)
• Also: J01i = (1-bi) ∑j∈Ni bj and J00i = (1-bi) ∑j∈Ni (1-bj)

Local Join-Counts under Binary Spatial Weights
• Simple to show (and self-evident) that Li = J11i + J10i + J01i + J00i
• Multinomial sampling: denominators Li are known, but {J11i, J10i, J01i, J00i} are unknowns in a modelling situation where relative disease risks ri and risk indicators bi are unknowns.
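
For a single realisation of the indicators bi, the four local join counts under binary weights can be sketched as below. The adjacency list, the indicator values and the function name are hypothetical; only the formulas for J11i, J10i, J01i and J00i come from the slides.

```python
def local_join_counts(b, neighbours):
    """Local join counts per area; neighbours[i] lists the areas adjacent to i."""
    counts = []
    for i, nbrs in enumerate(neighbours):
        high = sum(b[j] for j in nbrs)   # number of high risk neighbours
        low = len(nbrs) - high           # number of low risk neighbours
        counts.append({
            "J11": b[i] * high,          # high risk area, high risk neighbour
            "J10": b[i] * low,           # high risk area, low risk neighbour
            "J01": (1 - b[i]) * high,    # low risk area, high risk neighbour
            "J00": (1 - b[i]) * low,     # low risk area, low risk neighbour
        })
    return counts


# Hypothetical map of five areas: 0, 1, 2 form a triangle; 3 hangs off 1; 4 off 3.
neighbours = [[1, 2], [0, 2, 3], [0, 1], [1, 4], [3]]
b = [1, 1, 1, 0, 0]

J = local_join_counts(b, neighbours)
for i, row in enumerate(J):
    print(i, row, "L_i =", len(neighbours[i]))
```

Here area 0 has only (1,1) joins, as a cluster centre would, while area 1 also has one (1,0) join, as a cluster edge would. In every area the four counts add up to Li.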

Probabilities of Local Clustering
• Proportion π11i of joins representing joint high risk, defined by E(J11i) = Li π11i
• Estimate during the MCMC run (J11i and bi varying by iteration) as π11i = J11i/Li = bi ∑j∈Ni bj / Li
• π11i estimates the probability that area i is a member of a high risk cluster. As π11i → Ei, area i is likely to be a cluster centre
• The term ∑j∈Ni bj / Li → 1 when all adjacent areas have definitive high risk


Probabilities of Local Clustering
• Proportion π10i of local joins that are (1,0) pairs, defined by E(J10i) = Li π10i
• Estimates the probability that area i is a high risk local outlier
• Estimate during the MCMC run: π10i = J10i/Li = bi ∑j∈Ni (1-bj) / Li


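
Decomposition of Exceedance Probability
• Can show that Ei = Pr(ri>1|y) = π11i + π10i
• Have J11i + J10i = bi ∑j∈Ni bj + bi ∑j∈Ni (1-bj) = bi Li, so that E(J11i) + E(J10i) = E(bi) Li = Ei Li
• Also, by definition, E(J11i) + E(J10i) = Li π11i + Li π10i
• Equating the two expressions and dividing by Li gives Ei = π11i + π10i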
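
The decomposition can be verified numerically by averaging J11i/Li and J10i/Li over sampled indicator vectors. A minimal sketch with hypothetical draws of bi on a hypothetical three-area map; in a real analysis the draws would come from the MCMC output.

```python
def clustering_probabilities(b_draws, neighbours):
    """Posterior means of b_i, J11_i/L_i and J10_i/L_i over sampled indicators."""
    n_iter = len(b_draws)
    n = len(neighbours)
    exceed = [0.0] * n
    pi11 = [0.0] * n
    pi10 = [0.0] * n
    for b in b_draws:
        for i, nbrs in enumerate(neighbours):
            share_high = sum(b[j] for j in nbrs) / len(nbrs)
            exceed[i] += b[i] / n_iter
            pi11[i] += b[i] * share_high / n_iter
            pi10[i] += b[i] * (1 - share_high) / n_iter
    return exceed, pi11, pi10


# Hypothetical example: three areas in a row, four sampled indicator vectors.
neighbours = [[1], [0, 2], [1]]
b_draws = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]

E, pi11, pi10 = clustering_probabilities(b_draws, neighbours)
for i in range(len(neighbours)):
    print(i, E[i], pi11[i], pi10[i], pi11[i] + pi10[i])
```

Because the identity holds within every iteration, it also holds for the averages, whatever the draws.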

Synthetic Data Example: Cluster Focus
• Area 25 is a cluster centre. So also is area 23, in terms of having only high risk neighbours
• Areas 27 and 28 are cluster edges (they have as many background risk neighbours as high risk neighbours)
• Areas 22, 23, 25, 27 and 28 all have a true RR of 1.77; surrounding areas have an RR of 0.88.

Cluster Focus (simulation with average ei=20.39, and bi=1 if ri>1)

Area ID   ri (posterior mean)   Pr(bi=1)=Ei   π11i   π10i
Cluster
22        1.45                  1.00          0.89   0.11
23        1.55                  1.00          0.99   0.00
25        1.48                  1.00          1.00   0.00
27        1.39                  1.00          0.74   0.26
28        1.35                  0.99          0.67   0.33
Background Risk
24        1.04                  0.58          0.37   0.21
29        0.98                  0.40          0.23   0.17
31        0.99                  0.46          0.26   0.21
33        1.00                  0.47          0.26   0.21
39        0.97                  0.39          0.22   0.17
40        0.97                  0.37          0.18   0.19

Cluster Focus (simulation with average ei=58.77, and bi=1 if ri>1)

Area ID   ri (posterior mean)   Pr(bi=1)=Ei   π11i   π10i
Cluster
22        1.64                  1.00          0.85   0.15
23        1.70                  1.00          1.00   0.00
25        1.67                  1.00          1.00   0.00
27        1.58                  1.00          0.65   0.35
28        1.56                  1.00          0.58   0.42
Background Risk
24        0.98                  0.39          0.21   0.18
29        0.93                  0.24          0.10   0.14
31        0.94                  0.30          0.13   0.17
33        0.95                  0.29          0.13   0.16
39        0.93                  0.23          0.11   0.13
40        0.93                  0.22          0.08   0.15

Cluster Centres and Edges
• Cluster centre status verified: π11i ≈ Ei for areas 25 and 23.
• Cluster edge status becomes clearer with more frequent data (for areas 27 and 28)

[Figure: Cluster Focus (simulation with average ei=20.39). Map of High Risk Cluster Probabilities π11i]
[Figure: Cluster Focus (simulation with average ei=58.77). Map of High Risk Cluster Probabilities π11i]

Another simulation where clustering pattern known: cluster centre status under uneven risk scenario
• Performance of π11i for measuring cluster centre status in contrasting situations
• (1) EVEN RISK. High risk characterises all neighbours surrounding area i (so area i is a cluster centre), and risk is evenly distributed among neighbours
• (2) UNEVEN RISK. High risk is not common to all neighbours, but unevenly concentrated among a few neighbours, so area i is no longer a cluster centre, and is possibly a cluster edge.

[Figure: Even risk vs uneven risk scenarios]

WinBUGS code

model {
  for (i in 1:N) {
    # Poisson likelihood; convolution prior on the log relative risk
    y[i] ~ dpois(mu[i]); mu[i] <- e[i]*r[i]
    log(r[i]) <- alph+s[i]+u[i]; u[i] ~ dnorm(0,tau.u)
    # latent high risk indicator (step(x)=1 when x>=0)
    b[i] <- step(r[i]-1)
    # joins between area i and its neighbours, then local join counts
    for (j in C[i]+1:C[i+1]) {
      j11[i,j] <- b[i]*b.map[j]
      j10[i,j] <- b[i]*(1-b.map[j])
      j01[i,j] <- (1-b[i])*b.map[j]
      j00[i,j] <- (1-b[i])*(1-b.map[j])
    }
    J11[i] <- sum(j11[i,C[i]+1 : C[i+1]]); J10[i] <- sum(j10[i,C[i]+1 : C[i+1]])
    J01[i] <- sum(j01[i,C[i]+1 : C[i+1]]); J00[i] <- sum(j00[i,C[i]+1 : C[i+1]])
    # proportions of joins of each type (pi11, pi10, pi01, pi00)
    pi.L[1,i] <- J11[i]/L[i]; pi.L[2,i] <- J10[i]/L[i]
    pi.L[3,i] <- J01[i]/L[i]; pi.L[4,i] <- J00[i]/L[i]
  }
  # neighbourhood vector of risks and indicators
  for (i in 1:NN) {
    wt[i] <- 1; r.map[i] <- r[map[i]]; b.map[i] <- b[map[i]]
  }
  # priors
  alph ~ dflat(); tau.s ~ dgamma(1,0.001); rho ~ dexp(1); tau.u <- rho*tau.s
  s[1:N] ~ car.normal(map[], wt[], L[], tau.s)
}

Real Example: Suicide in North West England
• Suicide counts {yi,ei} for 922 small areas (middle level super output areas, MSOAs) in NW England over 5 years (2006-10).
• Model: yi ~ Po(ei ri), with relative risks ri averaging 1:
  log(ri) = α + si + ui, si ~ CAR, ui ~ WN
  o Overdispersion: ui needed as well as the spatial term
• Monitor exceedance and high risk clustering with bi=1 if ri>1, bi=0 otherwise.
• Spatial interactions wij are binary, based on adjacency

Smoothed Suicide Risk
Note the small expected values ei (average 3.5): this impedes strong inferences about elevated risk, and so also about clustering

Real Example: Suicide in North West England
• Flexscan (developed by Toshiro Tango) detects five significant clusters (p value under 0.05): the most likely cluster (albeit of irregular shape) consists of 9 areas in Blackpool.
1.Census areas included .: 587, 588, 590, 591, 593, 594, 595, 597, 599
Maximum distance.......: 5823.08 (areas: 587 to 599)
Number of cases .......: 68
(Expected number of cases: 31.0964)
Overall relative risk .: 2.18675
Statistic value .......: 16.5159
Monte Carlo rank ......: 6/1000
P-value ...............: 0.006
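
Two of the reported figures can be reproduced from the others in this output. The sketch below only re-does that arithmetic: the reported relative risk is consistent with observed over expected cases, and the P-value with the Monte Carlo rank divided by the number of ranked statistics. It does not recompute the scan statistic itself, and the variable names are hypothetical.

```python
cases = 68                 # "Number of cases"
expected = 31.0964         # "Expected number of cases"
rank, n_ranked = 6, 1000   # "Monte Carlo rank": 6/1000

relative_risk = cases / expected   # compare with the reported 2.18675
p_value = rank / n_ranked          # compare with the reported 0.006

print(round(relative_risk, 5), p_value)
```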

[Figure: High Suicide Risk Cluster, Blackpool and Surrounds]

Real Example: Suicide in North West England, Areas within the Flexscan cluster

E_i = exceedance prob; pi11_i = high risk cluster prob; pi10_i = high risk outlier prob

ID_all_922   ARCMAP ID   y_i   e_i    E_i    pi11_i   pi10_i
587          2           6     3.6    0.84   0.62     0.22
588          3           6     3.7    0.80   0.52     0.28
590          5           11    3.4    0.99   0.75     0.24
591          6           6     2.5    0.89   0.54     0.34
593          8           7     3.8    0.87   0.54     0.33
594          9           10    3.8    0.98   0.87     0.11
595          10          7     3.5    0.91   0.77     0.14
597          12          9     3.1    0.97   0.75     0.21
599          14          6     3.6    0.86   0.65     0.21
Total                    68    31.1

Possible Questions
• What is the most plausible cluster centre (if any)?
• Which areas are more likely to be cluster edges?
• Of the two areas inside the doughnut, area 7 has the higher exceedance probability (E7=0.72, E4=0.48).
• Area 9 has E9=0.98, and five of its 6 neighbours have Ej>0.8. The other neighbour has Ej=0.72. Area 9 has the highest π11i, namely 0.87.
• Area 6 has four neighbours, only two with Ej>0.8 and two with Ej below 0.5 (E4=0.48, E41=0.26). It has π11i=0.54, π10i=0.34 ⇒ cluster edge

[Figure: Exceedance Probs for Blackpool Suicide Cluster (ARCMAP area IDs)]

Local Join-Counts for Bivariate Clustering
• Local BB statistic for two outcomes A, B with event counts yAi, yBi. Binary indicators: bABi=1 if both rAi>1 and rBi>1; bABi=0 otherwise
• Bivariate high risk clustering is assessed using the local bivariate join count J11ABi = bABi ∑j wij bABj

Local Join-Counts for Bivariate Clustering
• J11ABi is high in a bivariate high risk cluster, when area i and the neighbours j of area i both have high risk on both outcomes.
• Bivariate high risk clustering probability π11ABi, the proportion of joins that are joint high risk, defined by E(J11ABi) = Li π11ABi
• Estimate during the MCMC run via π11ABi = J11ABi/Li
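
For one realisation of the two sets of indicators, the bivariate join count and the within-iteration ratio J11ABi/Li can be sketched as follows, assuming binary adjacency weights. The function name, the map and the indicator values are hypothetical; averaging the ratio over MCMC iterations would give π11ABi.

```python
def bivariate_high_risk_joins(b_a, b_b, neighbours):
    """Joint indicators b_AB, local join counts J11AB_i and ratios J11AB_i / L_i."""
    b_ab = [a * b for a, b in zip(b_a, b_b)]   # 1 only if high risk on both outcomes
    j11 = [b_ab[i] * sum(b_ab[j] for j in nbrs)
           for i, nbrs in enumerate(neighbours)]
    ratio = [j / len(nbrs) for j, nbrs in zip(j11, neighbours)]
    return b_ab, j11, ratio


# Hypothetical example: four areas in a row.
neighbours = [[1], [0, 2], [1, 3], [2]]
b_a = [1, 1, 1, 0]   # elevated risk on outcome A
b_b = [1, 1, 0, 0]   # elevated risk on outcome B

b_ab, j11_ab, ratio_ab = bivariate_high_risk_joins(b_a, b_b, neighbours)
print(b_ab, j11_ab, ratio_ab)
```

Area 2 is high risk on outcome A only, so it contributes nothing to the joint count, and area 1 is left with one joint high risk join out of two.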

Two outcomes: Likelihood and Prior
• NW England, MSOAs, yA suicide deaths, yB self-harm hospitalisations
• Self-harm is much more frequent than suicide: average ei is 93.
• Likelihood: yAi ~ Po(eAi rAi), yBi ~ Po(eBi rBi)
• Assume correlated spatial effects:
  log(rAi) = αA + sAi + uAi; log(rBi) = αB + sBi + uBi,
  uAi ~ WN, uBi ~ WN,
  (sAi, sBi) ~ BVCAR

[Figure: Example: suicide mortality and self-harm hospitalisations in North West England. Smoothed suicide risk rAi, Wigan and adjacent boroughs]
[Figure: Example: suicide mortality and self-harm hospitalisations in North West England. Smoothed self-harm risk rBi, Wigan and adjacent boroughs]
[Figure: Bivariate clustering: suicide and self-harm, Wigan and surrounds. Probabilities π11ABi of joint outcome high risk cluster status]

Another Bivariate Example: Pre-Primary Obesity (yA) and End-Primary Child Obesity (yB) in NE London. Map is of RRs in Pre-Primary Obesity
[Figure: MSOAs, Relative Risks (Pre-primary obesity). Legend classes: 0.68 - 0.83, 0.84 - 0.93, 0.94 - 1.02, 1.03 - 1.13, 1.14 - 1.26]

RRs for End-Primary Child Obesity (yB). Relative risks in this outcome show negative skew
[Figure: MSOAs, Relative Risk (end-primary obesity). Legend classes: 0.65 - 0.88, 0.89 - 0.97, 0.98 - 1.05, 1.06 - 1.10, 1.11 - 1.19]

Probabilities of Joint High Risk Clustering
[Figure: MSOAs, High Risk Clusters (pi11_AB). Legend classes: 0.00 - 0.25, 0.25 - 0.75, Over 0.75]

Probabilities of Joint Low Risk Clustering
[Figure: MSOAs, Joint Low Risk Probs, pi00_AB. Legend classes: Under 0.2, 0.2 - 0.8, Over 0.8]

Final Remarks
• The cluster status approach can be embedded within different models (including model averaging or covariate impacts). Clustering (as well as exceedance) inferences can be compared, so the approach provides “model based clustering”
• Provides an alternative perspective to the “list of areas” approach, and additional insights with regard to
  o cluster centres vs edges,
  o low risk clustering as well as high risk clustering in an integrated perspective,
  o high/low risk outliers
• Can also apply the bivariate method when outcome A is a disease and outcome B is a risk factor. This detects varying strength of association between disease and risk factor