GIS in Spatial Epidemiology:
small area studies of exposure-outcome relationships
Robert Haining
Department of Geography
University of Cambridge
1. Spatial epidemiology:
• Some definitions
• Geographical correlation studies
• Framework for analysis
• Problems with small area analysis
• Reasons for conducting small area analysis
• Good practice
• Regression models
2. Reference to a case study:
• Data issues
• Statistical modelling
• Spatial epidemiology is concerned with
describing and understanding spatial
variation in disease risk.
• Individual level data;
• Counts for small areas.
• Recent developments owe much to:
  – Geo-referenced health and population data;
  – Computing advances;
  – Development of GIS;
  – Statistical methodology.
Geographical correlation studies
These studies typically examine
geographical variations in exposure to
environmental variables (air, water, soil, etc.)
and their association with health outcomes,
whilst controlling for other relevant factors
using regression.
Framework for analysis:
• Population is unevenly distributed
geographically;
• People move around (day-to-day
movements; longer term movements
including migration);
• People possess relevant individual
characteristics (age, sex, genetic make-up,
lifestyle, etc.);
• People live in communities.
Problems with small area analyses
• Frequency and quality of population data (e.g.
Census every 10 years);
• Spatial compatibility of different data sets;
• Availability of data on population movements;
• Measuring population exposure to the
environmental variable;
• Environmental impacts are often likely to be quite
small (relative to, for example, lifestyle effects)
and there may be serious confounding effects;
• Cannot estimate strength of an association;
• Ecological (or aggregation) bias.
Reasons for conducting small
area analysis
• Provides a qualitative answer about the
existence of an association (e.g. between
environmental variable and health
outcome);
• May provide evidence that can be followed
up in other ways.
Good practice (Richardson 1992)
• Allow for heterogeneity of exposure;
• Use “well defined” population groups;
• Use survey data to help obtain good
exposure data;
• Allow for latency times;
• Allow for population movement effects
Regression model specification:
$O_i$ denotes the number of cases in area $i$, $i = 1, \ldots, n$.
If the outcome is rare, it is typically assumed that $O_i$ is Poisson
distributed with parameter $\mu_i$.
The expected value of $O_i$ is written:
$$E[O_i] = \mu_i = E_i r_i, \quad i = 1, \ldots, n,$$
where $r_i$ is the unknown area-specific relative risk in area $i$, and $E_i$
is the expected number of cases in area $i$ given the size of the
population and its age and sex composition.
$$\ln[\mu_i] = \ln[E_i] + \ln[r_i]$$
$$\ln[\mu_i] = \ln[E_i] + \alpha + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \ldots + \beta_k X_{k,i} + \gamma Z_i$$
• This defines a Poisson regression model where $\alpha$ is the intercept
parameter, and $\beta_1, \beta_2, \ldots, \beta_k$ and $\gamma$ are regression parameters. $\ln[E_i]$ is an
offset.
• The area-specific relative risk at $i$ is associated with attributes of the
population $X_1, \ldots, X_k$ and the environmental exposure $Z$ at $i$.
• Adjustment for overdispersion is necessary because of population
heterogeneity at the scale of the individual small areas (see, for example,
Manton and Stallard 1981).
• Allowance for data uncertainty arising from the use of sample data.
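A minimal sketch of how a cut-down version of this model (intercept plus a single exposure term, with ln[E_i] as the offset and no population covariates) could be fitted by Newton-Raphson. The function name, the parameter names alpha and gamma, and the area data are illustrative assumptions; they are not the case-study data, and the sketch makes no adjustment for overdispersion.

```python
import math


def fit_poisson_offset(observed, expected, exposure, iterations=25):
    """Fit ln(mu_i) = ln(E_i) + alpha + gamma * Z_i by Newton-Raphson.

    observed: case counts O_i; expected: expected counts E_i (the offset);
    exposure: environmental exposure Z_i. Returns (alpha, gamma).
    """
    alpha, gamma = 0.0, 0.0
    for _ in range(iterations):
        # Fitted means under the current parameter values.
        mu = [e * math.exp(alpha + gamma * z)
              for e, z in zip(expected, exposure)]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(o - m for o, m in zip(observed, mu))
        g1 = sum((o - m) * z for o, m, z in zip(observed, mu, exposure))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(m * z for m, z in zip(mu, exposure))
        h11 = sum(m * z * z for m, z in zip(mu, exposure))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information * step = score).
        alpha += (h11 * g0 - h01 * g1) / det
        gamma += (h00 * g1 - h01 * g0) / det
    return alpha, gamma


# Hypothetical data for five small areas (not from the case study).
cases = [12, 25, 21, 44, 41]
expected_cases = [10.0, 20.0, 15.0, 30.0, 25.0]
exposure_level = [0.0, 0.5, 1.0, 1.5, 2.0]

alpha_hat, gamma_hat = fit_poisson_offset(cases, expected_cases, exposure_level)
# exp(gamma) is the multiplicative change in relative risk per unit of Z.
print("alpha:", alpha_hat, "gamma:", gamma_hat, "exp(gamma):", math.exp(gamma_hat))
```

In practice a statistical package would be used, and the full model would also include the population covariates X_1, ..., X_k.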
A short case study: I. Data issues and GIS
• Demographic, social and economic data:
  – Pre-2001 Census:
    » Enumeration Districts (EDs);
    » Wards.
  – 2001 Census:
    » Output Areas (OAs);
    » Super Output Areas (SOAs).
• Health data (heart disease and stroke: mortality and admissions):
  – Individual records geo-referenced to ED;
  – Postcoded counts.
• Environmental data (NOx; PM10; CO):
  – Gridded.
• Problem: obtain a measure of air pollution
exposure at the ED level.
Step 1: Measuring NOx exposure. The
Indic-Airviro model:
Average annual mean pollution levels 1994-1999
(excluding 1998): a) NOx (µg/m3); b) PM10 (µg/m3)
Comparing modelled and monitored values
for NOx.
[Figure: scatter plot of modelled (vertical axis) against monitored (horizontal axis) NOx values; both axes run from 50 to 350. Fitted line: Modelled = -80.43 + 3.66 * Monitored; R-Square = 0.73]
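The fitted line and R-square reported for this comparison are the kind of output a simple least-squares regression of modelled on monitored values produces. A minimal sketch follows; the function name and the sample values are hypothetical, since the monitoring data themselves are not reproduced here.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical monitored and modelled NOx values (ug/m3) at five sites.
monitored = [40.0, 55.0, 60.0, 75.0, 90.0]
modelled = [70.0, 110.0, 150.0, 190.0, 260.0]
intercept, slope, r_squared = fit_line(monitored, modelled)
print(intercept, slope, r_squared)
```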
Step 2: Transferring the gridded data to the
ED framework. Areal interpolation: (i) area
weighting
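Area weighting can be sketched as follows, assuming the area of overlap between the ED and each grid square has already been computed in the GIS. The function name and the overlap figures are hypothetical.

```python
def area_weighted_exposure(overlaps):
    """Area-weighted mean of grid values for one ED.

    overlaps: list of (overlap_area, grid_value) pairs, one per grid square
    that intersects the ED.
    """
    total_area = sum(area for area, _ in overlaps)
    if total_area <= 0:
        raise ValueError("ED does not overlap any grid square")
    return sum(area * value for area, value in overlaps) / total_area


# Hypothetical ED covering parts of three grid squares (areas in m2).
ed_overlaps = [(120000.0, 16.95), (60000.0, 18.29), (20000.0, 16.72)]
print(area_weighted_exposure(ed_overlaps))
```

Each grid square contributes in proportion to the share of the ED it covers, regardless of where within the ED people actually live.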
Areal interpolation (from grid to
EDs): (ii) point in polygon – ED
centroid
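A minimal sketch of the ED centroid approach, assuming a regular square grid with a known origin and cell size, so that the point-in-polygon test reduces to an index lookup. The function name, grid layout, cell size and coordinates are all hypothetical.

```python
import math


def centroid_exposure(centroid, grid, origin=(0.0, 0.0), cell_size=1000.0):
    """Return the value of the grid square that contains the ED centroid.

    grid[row][col] holds the pollution value; row 0 starts at origin y,
    col 0 starts at origin x, and cells are cell_size metres square.
    """
    x, y = centroid
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[row])):
        raise ValueError("centroid lies outside the grid")
    return grid[row][col]


# Hypothetical 2 x 2 grid of pollution values and one ED centroid.
pollution_grid = [[16.95, 18.29],
                  [16.72, 16.97]]
print(centroid_exposure((1500.0, 400.0), pollution_grid))
```

The whole ED is assigned the value of a single grid square, which is why this method can differ from area weighting when an ED straddles several squares.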
Areal interpolation (from grid to
EDs): (iii) point in polygon –
weighted PostPoint
ID        Domestic properties   Pollution for grid   Dom*Poll
1         13                    16.95                220.38
2         3                     18.29                54.88
3         33                    16.72                551.63
4         31                    16.97                526.19
5         19                    16.97                322.51
6         3                     17.40                52.20
7         33                    17.02                561.80
8         20                    17.72                354.44
9         7                     18.72                131.04
Sum       162                                        2775.06
Average                         17.42
Weighted average = 17.13
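The weighted PostPoint calculation in the table can be reproduced with a few lines of code: each point's grid pollution value is weighted by its number of domestic properties. The variable names are illustrative; the values are those shown in the table.

```python
# Number of domestic properties and grid pollution value for points 1-9.
domestic = [13, 3, 33, 31, 19, 3, 33, 20, 7]
pollution = [16.95, 18.29, 16.72, 16.97, 16.97, 17.40, 17.02, 17.72, 18.72]

# Unweighted mean: every point counts equally.
simple_average = sum(pollution) / len(pollution)

# Weighted mean: points with more domestic properties count for more.
weighted_average = (
    sum(d * p for d, p in zip(domestic, pollution)) / sum(domestic)
)

print(round(simple_average, 2))    # 17.42
print(round(weighted_average, 2))  # 17.13
```

The products computed from the rounded pollution values differ slightly from the table's Dom*Poll column, presumably because the table was built from unrounded values; the weighted average agrees to two decimal places.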
Weighted PostPoint and ED centroid exposure
measures are very similar; areal weighting
differs
Weighted PostPoint differs from both ED
centroid and areal weighting
Where all three methods will give the same or
similar results
Step 3: Making allowance for population
movements
1. Long term population movement:
• Sheffield Health and Illness Prevalence study:
  – 12,239 representative individuals aged 18-94 tracked from
    1994 to 2002;
  – 1,491 died; 1,572 left Sheffield.
  – Of the 9,176 remaining:
    » 70% did not move;
    » 23% made 1 move;
    » 5% made 2 moves;
    » Just over 1% made 3 moves;
    » Under 1% made 4 or more moves.
• => significant risk of misclassification of
exposure level.
2. Short term population movement
Spatially smoothed CO average of the annual mean pollution
levels (1994-1999, excluding 1998) for Sheffield enumeration
districts (µg/m3)
(i) 1km; (ii) 2km; (iii) 4km
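One simple way to produce smoothed surfaces of this kind is sketched below. It assumes that the three distances are smoothing radii around each ED centroid and that smoothing is an unweighted mean over all EDs whose centroids fall within that radius; the method actually used for the maps is not stated here, and the function name and data are hypothetical.

```python
import math


def smooth_at_radius(areas, radius_m):
    """Replace each ED value by the mean over EDs within radius_m of it.

    areas: list of (x, y, value) tuples, with x and y the ED centroid
    coordinates in metres. Each ED is always its own neighbour.
    """
    smoothed = []
    for x0, y0, _ in areas:
        neighbours = [value for x, y, value in areas
                      if math.hypot(x - x0, y - y0) <= radius_m]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


# Hypothetical ED centroids (metres) with CO values.
eds = [(0.0, 0.0, 10.0), (500.0, 0.0, 20.0), (3000.0, 0.0, 40.0)]
for radius in (1000.0, 2000.0, 4000.0):
    print(radius, smooth_at_radius(eds, radius))
```

Larger radii pull each ED's value towards the city-wide mean, which is a crude stand-in for people spending part of their day away from their home ED.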
Comparing indoor and outdoor air pollution
exposure
• People spend between 75% and 90% of
their time indoors.
• Indoor pollution levels depend not only on
outdoor emissions but on housing
conditions (cooking, heating, ventilation
etc).
• Evidence on relationship between indoor
and outdoor pollution levels
Pollutant   Source                 Indoor mean (µg/m3)   Outdoor mean (µg/m3)   Indoor/outdoor ratio
CO          Valerio et al (1997)   7.8                   9.55                   0.82
NO          Drakou et al (1998)    56.53                 70.66                  0.80
NO2         Drakou et al (1998)    67.14                 88.04                  0.76
NOx         Drakou et al (1998)    126.98                187.74                 0.68
O3          Drakou et al (1998)    17.54                 37.14                  0.47
PM2.5       Lee et al (1997)       25.3                  26.3                   0.96
PM2.5       Fischer et al (2000)   19.5                  23                     0.85
PM10        Lee and Chang (2000)   120                   134                    0.90
PM10        Fischer et al (2000)   29.5                  39.5                   0.75
SO2         Lee et al (1997)       6.18                  11.1                   0.56
Statistical modelling issues
$$\ln[\mu_i] = \ln[E_i] + \alpha + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \ldots + \beta_k X_{k,i} + \gamma Z_i$$
1. Overdispersion linked to spatially correlated missing
covariates.
2. Sampling errors where data are based on surveys (e.g.
lifestyle data).
• Fitted spatially structured random effects models in
WinBUGS (MCMC estimation) to handle
overdispersion;
• Used posterior densities for some of the lifestyle
covariates (e.g. smoking prevalence);
• WinBUGS output sent to GIS to map model output (e.g.
area-specific risks).
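One commonly used form of spatially structured random effects model is sketched below as an illustration. It is an assumption, since the exact prior used in the case study is not stated here: the convolution form adds an unstructured effect $v_i$ and a spatially structured effect $u_i$ with an intrinsic conditional autoregressive prior to the Poisson regression. Here $\mu_i$ is the Poisson mean, $\alpha$ the intercept, $\beta_j$ the covariate coefficients and $\gamma$ the exposure coefficient; $\partial_i$ (the set of neighbours of area $i$), $n_i$ (their number), $\sigma_u^2$ and $\sigma_v^2$ are symbols introduced for the sketch.

```latex
\begin{align*}
\ln[\mu_i] &= \ln[E_i] + \alpha + \sum_{j=1}^{k} \beta_j X_{j,i} + \gamma Z_i + u_i + v_i \\
v_i &\sim N\left(0, \sigma_v^2\right) \\
u_i \mid u_{j \neq i} &\sim N\left(\frac{1}{n_i} \sum_{j \in \partial_i} u_j,\; \frac{\sigma_u^2}{n_i}\right)
\end{align*}
```

The structured term lets neighbouring areas share information, which is one way of absorbing overdispersion caused by spatially correlated missing covariates.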
Map of excess relative risks of coronary heart disease. An area (i) is
considered to have excess relative risk when 97.5% of the simulated
values of relative risk of area i (ri) are greater than 1.
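The mapping rule can be sketched as a check on the simulated (MCMC) values of the relative risk for each area. The function name, the default arguments and the sample values are illustrative assumptions; "97.5% of the simulated values" is read here as "at least 97.5%".

```python
def has_excess_risk(samples, threshold=1.0, required_share=0.975):
    """True when at least required_share of the simulated relative risks exceed threshold."""
    share_above = sum(1 for r in samples if r > threshold) / len(samples)
    return share_above >= required_share


# Hypothetical simulated relative risks for two areas (100 draws each).
area_a = [1.2] * 99 + [0.9]        # 99% of draws above 1
area_b = [1.2] * 90 + [0.9] * 10   # 90% of draws above 1
print(has_excess_risk(area_a), has_excess_risk(area_b))
```

Areas that pass the check would be the ones shaded on the map of excess relative risk.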
References:
• P. Brindley, R. Maheswaran, T. Pearson, S. Wise and
R. Haining (2004) “Using modelled outdoor air
pollution data for health surveillance.” In
R. Maheswaran and M. Craglia (eds) GIS in Public
Health Practice. Taylor and Francis, London, pp. 125-149.
• P. Brindley, S. Wise, R. Maheswaran and R. Haining
(2005) “The effect of alternative representations of
population location on the areal interpolation of air
pollution exposure.” Computers, Environment and
Urban Systems, Vol. 29, pp. 455-469.
• R. Maheswaran, R. Haining, P. Brindley, J. Law,
T. Pearson and N. Best (2006) “Outdoor NOx and stroke
mortality – adjusting for small area level smoking
prevalence using a Bayesian approach.” Statistical
Methods in Medical Research, Vol. 15, pp. 499-516.