Combining prevalence estimates from multiple sources

Julian Flowers
The problem (1)...
• No systematic way of monitoring health
behaviours at small area level in England
• => Have smoking targets but don’t know
smoking prevalence for PCTs/ districts
• But multiple potential sources of data
– Surveys
– Commercial datasets
– GP data
– Synthetic estimates
The problem (2)...
• Tend to use “favourite” data sources
• Different datasets give different
answers
• But all may have useful information
about smoking
• Question...what is the best estimate of
smoking prevalence given the data we
have...?
7 Datasets about districts...
• Synthetic estimates (from DH) for
districts based on Health Survey for
England 2003-5
• Estimates based on commercial data
about household tobacco expenditure
at small-area level (actually a synthetic
estimate)
• 3 years of commercial data based on
responses to market research data
• Separate analysis of HSE by ASH
7 datasets...
• All biased in some way: some
estimates looked too low; some were not
well correlated. Which one(s) to believe?
• Could/should they be combined? If
so, how (heptangulation...)?
[Figure: matrix of pairwise correlations between the district-level estimates. X axis: HP02, Acx03, Acx04, Acx05, CACI05, ASH02. Y axis: Acx03, Acx04, Acx05, CACI05, ASH02, HP05. The 21 correlation coefficients shown range from 0.49 to 0.89.]
Motivation for combining estimates
The situation in the East of England: different
estimates from different sources
Basildon: pooled smoking prevalence estimates
Proportion meta-analysis plot [random effects]
Source: proportion (95% confidence interval)
CACI 2005: 0.24 (0.24, 0.24)
Acxiom: 0.17 (0.17, 0.18)
HDA 2003: 0.33 (0.32, 0.33)
Synthetic estimates: 0.27 (0.27, 0.27)
ASH estimates: 0.30 (0.30, 0.31)
Combined: 0.26 (0.21, 0.32)
08/01/2008
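The pooled Basildon figure can be approximated with a standard random-effects calculation. The sketch below is an illustration under stated assumptions, not the analysis behind the slide. It uses the DerSimonian-Laird estimator on the untransformed proportion scale (the slide does not say which estimator or transformation was used) and backs out standard errors from the printed 95% limits. Because several intervals are printed with zero width after rounding, it floors each half-width at 0.005, the rounding resolution. The function name `pool_random_effects` is hypothetical.

```python
import math

def pool_random_effects(estimates, min_half_width=0.005):
    """DerSimonian-Laird pooling of (point, lower, upper) triples."""
    y = [p for p, _, _ in estimates]
    # Standard error from the 95% limits. The floor is an assumption that
    # guards against intervals printed with zero width after rounding.
    se = [max((hi - lo) / 2, min_half_width) / 1.96 for _, lo, hi in estimates]
    w = [1 / s**2 for s in se]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the method-of-moments between-source variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # Random-effects weights add tau^2 to every source's variance.
    w_star = [1 / (s**2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    pooled_se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

# Point estimates and 95% limits as printed on the Basildon slide.
basildon = [
    (0.24, 0.24, 0.24),  # CACI 2005
    (0.17, 0.17, 0.18),  # Acxiom
    (0.33, 0.32, 0.33),  # HDA 2003
    (0.27, 0.27, 0.27),  # Synthetic estimates
    (0.30, 0.30, 0.31),  # ASH estimates
]
pooled, lower, upper = pool_random_effects(basildon)
print(f"{pooled:.2f} ({lower:.2f}, {upper:.2f})")
```

With these inputs the between-source variance dwarfs the within-source variances, so the pooled value sits close to the unweighted mean of the five estimates and the interval is wide. After rounding, this agrees with the slide's combined 0.26 (0.21, 0.32).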
Bayesian modelling
• Work with MRC Biostatistics Unit
• Based on work looking at bias
adjusted meta-analysis
• The idea is that a meta-analysis should
include all studies that contain relevant
information, but weight them according
to their bias
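One way to read "weight them according to bias" is as an additive bias adjustment: each source's estimate is shifted by an assumed mean bias and its variance is inflated by an assumed bias variance, before ordinary inverse-variance pooling. The sketch below illustrates that idea only. The function name `bias_adjusted_pool` and every number in it are hypothetical; in practice the bias distributions would be elicited or estimated, as in the hierarchical model described later.

```python
def bias_adjusted_pool(sources):
    """Pool estimates after an additive bias adjustment.

    sources: list of (estimate, sampling_var, bias_mean, bias_var).
    Returns (pooled_estimate, pooled_variance).
    """
    # Subtract the assumed mean bias; add the bias variance to the
    # sampling variance so that doubtful sources get less weight.
    adjusted = [(y - b_mean, v + b_var) for y, v, b_mean, b_var in sources]
    weights = [1 / var for _, var in adjusted]
    pooled = sum(w * y for w, (y, _) in zip(weights, adjusted)) / sum(weights)
    return pooled, 1 / sum(weights)

# Hypothetical inputs: three sources with equal sampling variance but
# different beliefs about their bias.
sources = [
    (0.24, 0.0001, 0.00, 0.0004),   # believed unbiased, fairly certain
    (0.17, 0.0001, -0.06, 0.0025),  # believed to underestimate
    (0.33, 0.0001, 0.05, 0.0025),   # believed to overestimate
]
pooled, pooled_var = bias_adjusted_pool(sources)
print(round(pooled, 3), round(pooled_var, 6))
```

The source with the smallest bias variance dominates the pooled value, while the other two still contribute after correction instead of being discarded.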
Model
The basic model

Bayesian hierarchical model structure.

Developed in WinBUGS.

Allows for additive bias (Turner et al. 2007, Spiegelhalter and Best 2003).

The model assumes that the biases affecting the smoking prevalence (SP) estimates vary
between data sources.

Let $y_{ij}$ be the SP estimate obtained from data source $j$ ($j = 1, \dots, 7$) for LA $i$ ($i = 1, \dots, 48$ for
the East of England), $\sigma_{ij}^2$ be the corresponding sampling variance (obtained from the
95% confidence limits and assumed known) and $\delta_{ij}$ the corresponding biases,
assumed exchangeable within data sources. Then the SP estimates are believed to be
generated by a normal distribution with mean $\theta_i + \delta_{ij}$ and variance $\sigma_{ij}^2$, where $\theta_i$ is the
true SP for the $i$-th LA.
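
Written out, the sampling model described above takes the following form. The symbols are a reconstruction: $\theta_i$ denotes the true prevalence in area $i$, $\delta_{ij}$ the bias of source $j$ in area $i$, $\sigma_{ij}^2$ the known sampling variance, and $l_{ij}$, $u_{ij}$ the lower and upper 95% confidence limits. The normal form of the bias distribution, with source-specific mean $\mu_j$ and variance $\tau_j^2$, is an assumption used here to express "exchangeable within data sources"; the slide does not state the distribution or its priors.

```latex
\begin{align*}
y_{ij} \mid \theta_i, \delta_{ij} &\sim \mathcal{N}\!\left(\theta_i + \delta_{ij},\ \sigma_{ij}^2\right), \\
\sigma_{ij} &= \frac{u_{ij} - l_{ij}}{2 \times 1.96}, \\
\delta_{ij} \mid \mu_j, \tau_j^2 &\sim \mathcal{N}\!\left(\mu_j,\ \tau_j^2\right) \quad \text{(assumed form)}.
\end{align*}
```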

A constraint is needed: our choice is an overall 23% smoking prevalence for the East of
England.
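
Why a constraint is needed can be seen numerically. The true prevalence and the bias enter the mean only through their sum, so adding a constant to every area's prevalence and subtracting it from every bias leaves the expected data unchanged. The sketch below uses made-up parameter values and only the standard library. All names and numbers are hypothetical, and recentring on an unweighted mean of 0.23 is an assumption about how the regional constraint might be applied.

```python
import random

random.seed(1)
n_areas, n_sources = 48, 7

# Hypothetical true prevalences and source-level mean biases.
theta = [random.uniform(0.18, 0.30) for _ in range(n_areas)]
mu = [random.uniform(-0.05, 0.05) for _ in range(n_sources)]
delta = [[random.gauss(mu[j], 0.01) for j in range(n_sources)]
         for _ in range(n_areas)]

def expected_estimates(theta, delta):
    """Mean of each estimate under the additive-bias model: theta_i + delta_ij."""
    return [[t + d for d in row] for t, row in zip(theta, delta)]

# Shift every prevalence up by c and every bias down by c.
c = 0.04
theta_shifted = [t + c for t in theta]
delta_shifted = [[d - c for d in row] for row in delta]

a = expected_estimates(theta, delta)
b = expected_estimates(theta_shifted, delta_shifted)
max_gap = max(abs(x - z) for ra, rb in zip(a, b) for x, z in zip(ra, rb))
print(max_gap)  # zero up to floating-point rounding

# Fixing the regional level removes the ambiguity: recentre the prevalences
# so that their (unweighted) mean is 0.23.
shift = 0.23 - sum(theta) / n_areas
theta_constrained = [t + shift for t in theta]
print(sum(theta_constrained) / n_areas)
```

Both parameter sets imply the same expected estimates, so the data alone cannot separate a region-wide level of prevalence from a region-wide level of bias; pinning the regional prevalence resolves this.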

Several variants of this model (including a multivariate model aiming to detect
correlation among data sources) have been fitted, with no significant differences in the results.
Statistical literature
Synthetic + classical + recent approaches

Multilevel synthetic estimation (Twigg et al. 2000): using a multilevel modelling approach and
nesting individuals within postcode sectors within health authorities, multilevel-derived synthetic
estimates are obtained by means of ecological and individual variables associated with the
phenomenon of interest. Prevalence estimates can be combined directly from surveys.

Multiple-frame estimation (Lohr and Rao, 2000; 2006): different sampling frames (not necessarily
non-overlapping) whose union covers the whole population are considered and probability samples
are drawn independently from each frame. Samples are then properly combined to obtain optimal
linear estimators of population quantities. The survey database is needed.

Statistical matching (Rodgers, 1984; Moriarity and Scheuren, 2001) considers records of subjects
having “similar profiles” from different data sources, and puts together different information from
them. The survey database is needed.

Scoring method (Elliott and Davis, 2005): this method adjusts the survey weights so that
the complementary strengths of each survey, in terms of sample size or unbiasedness, are
exchanged. The surveys are scored accordingly. The survey database is needed.

Bayesian hierarchical methods: recent work by Raghunathan et al. (2007) addresses the
problem of combining prevalence rates from two surveys by means of a hierarchical Bayesian
approach. One of the two surveys is believed to be less biased in terms of coverage and records
whether respondents have a telephone line at home. Its respondents are therefore divided into
two groups, depending on whether or not they have a telephone at home. The other survey is based
on telephone interviews only and for this reason is believed to be more biased, but it is larger. The
hierarchical Bayesian model links the larger survey to the telephone information provided by
the less biased survey. Prevalence estimates can be combined directly from surveys.
Modelled estimates with CIs
Comparison with 2008 survey
Conclusions
• Bayesian hierarchical models can be used to pool prevalence
estimates from different sources adjusting for measured bias
in each source. This is a type of formal triangulation of data.
• This method can be used when direct estimates are not
available.
• It could be applied to any life-style or prevalence data where
multiple sources are available
• Further work is needed to compare modelled estimates with
direct estimates, and to apply the method to other life-style behaviours
• Further work is needed to implement the modelling in
conventional statistical packages
• Local surveys can help to recalibrate the models on a regular
basis
Modelling bias in combining small-area prevalence estimates
from multiple surveys
Giancarlo Manzi1,*, David J Spiegelhalter1, Rebecca M Turner1,
Julian Flowers2, Simon G Thompson1
1 MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
2 Eastern Region Public Health Observatory, Institute of Public Health,
Cambridge, UK
* Current address: Department of Economics, Business and Statistics, University of Milan, Italy.
