www.publichealth.ie
Synthetic estimators in Ireland
Anthony Staines
DCU
What are synthetic estimators?
Estimates of something you haven't got
Typically estimates for a small area of
something
Making maximum use of what you have
Example
Lung cancer risk
Smoking is a key explanation
Suppose you want to study the geography of
lung cancer
What you have
Smoking data from a national survey by age
and sex
Small area level data on population and cancer
incidence by age and sex
What you can do at once
Estimate prevalence for small areas included
in the study
Using the sample in the study
What's wrong with this?
The areas you need may not be included
The estimates will be very imprecise
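A rough illustration of why direct small-area estimates are imprecise: the sketch below computes a prevalence estimate and a simple Wald interval for a large national sample and for a single small area. The counts are hypothetical, and the formula assumes simple random sampling, which a real clustered survey would not satisfy.

```python
import math

def direct_estimate(smokers, sample_size, z=1.96):
    """Direct prevalence estimate with a Wald interval (assumes simple random sampling)."""
    p = smokers / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p, se, (max(0.0, p - z * se), min(1.0, p + z * se))

# Hypothetical counts: the same 30% prevalence in a national sample and in one small area
for label, smokers, n in [("national", 3000, 10000), ("small area", 6, 20)]:
    p, se, (lo, hi) = direct_estimate(smokers, n)
    print(f"{label}: p={p:.2f}, se={se:.3f}, interval=({lo:.2f}, {hi:.2f})")
```

With only 20 respondents in an area, the interval spans a large part of the plausible range, which is the imprecision the slide refers to.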
You can do better
In some obvious ways
And some not so obvious
What you assume
National age and sex specific rates apply in
each small area
And so
From these you calculate small area specific
prevalence estimates
This is indirect standardisation
Can be done smarter
by requiring aggregation properties to hold
by adding in area-level covariates (urban/rural etc.)
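The calculation described above can be sketched as follows: national age- and sex-specific rates are applied to each small area's population, and the expected number of smokers is divided by the area total. All rates, strata and area names here are hypothetical values chosen for illustration.

```python
# National age- and sex-specific smoking prevalence (hypothetical values)
national_rates = {
    ("M", "15-44"): 0.30, ("M", "45+"): 0.25,
    ("F", "15-44"): 0.28, ("F", "45+"): 0.20,
}

# Small-area populations in the same strata (hypothetical counts)
area_population = {
    "Area A": {("M", "15-44"): 400, ("M", "45+"): 100,
               ("F", "15-44"): 400, ("F", "45+"): 100},
    "Area B": {("M", "15-44"): 100, ("M", "45+"): 400,
               ("F", "15-44"): 100, ("F", "45+"): 400},
}

def synthetic_prevalence(population, rates):
    """Expected smokers under national rates, divided by the area population."""
    expected = sum(rates[stratum] * n for stratum, n in population.items())
    return expected / sum(population.values())

estimates = {area: synthetic_prevalence(pop, national_rates)
             for area, pop in area_population.items()}
for area, value in estimates.items():
    print(f"{area}: {value:.3f}")
```

The two areas differ only in age structure, so any difference between their estimates comes entirely from demography; this is the assumption that national rates apply everywhere, made explicit.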
Can you do better?
Yes
How?
Model based estimators
These have a long history
Many diverse applications
Combine survey data and some kind of 'census
data'
'Census data' here means data available for every area of
interest
Roughly
Use the survey data to estimate relationships
at the relevant level
between survey covariates
and the census data
Then
Assume the same relationship applies in the
other areas
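A minimal sketch of this idea, assuming an area-level linear model fitted by ordinary least squares with NumPy. The covariates (proportion urban, proportion unemployed), the area labels and all values are hypothetical, and the survey values are constructed to follow an exact linear relationship so the result is easy to check. A real application would more likely use individual-level or multi-level models.

```python
import numpy as np

# 'Census data': covariates known for every area (hypothetical values)
# (proportion urban, proportion unemployed)
census = {
    "A": (0.9, 0.12), "B": (0.2, 0.05), "C": (0.6, 0.10),
    "D": (0.4, 0.07), "E": (0.8, 0.15), "F": (0.1, 0.04),
}
# Survey prevalence estimates exist only for the areas that were sampled
survey_prevalence = {"A": 0.30, "B": 0.195, "C": 0.26, "D": 0.225}

def design_matrix(areas):
    """Intercept plus census covariates, one row per area."""
    return np.array([[1.0, *census[a]] for a in areas])

# Step 1: estimate the relationship in the surveyed areas
sampled = sorted(survey_prevalence)
y = np.array([survey_prevalence[a] for a in sampled])
beta, *_ = np.linalg.lstsq(design_matrix(sampled), y, rcond=None)

# Step 2: assume the same relationship holds everywhere and predict
all_areas = sorted(census)
predicted = dict(zip(all_areas, design_matrix(all_areas) @ beta))
for area in all_areas:
    print(f"{area}: {predicted[area]:.3f}")
```

Areas E and F receive estimates despite having no survey respondents, which is only as good as the assumption that the fitted relationship transfers to them.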
Issues
Modelling can be hard
Remember these are predictive models, not
explanatory models
Data not easy to get at the right small area
level
Models
models using individual-level covariates only
models using area-level covariates only
models combining individual and area-level
covariates
Limits
Available data
Confidentiality
Complexity of methods, esp. multi-level
methods
Validation
Spatial data limits
Have to be able to link survey and census to
the same set of small areas
Given the primitive systems in the UK and the
nearly non-existent systems in the Republic
this is a lot of work
Errors here will lead to biased estimates
Confidentiality
Need to respect confidentiality of survey
respondents
May limit the data available for these purposes
May need to design survey and survey consent
process carefully to get good estimates
Modelling
Can become very complex
Clustered survey designs
Survey weights
Variable selection
Model diagnostics
What and where to model
Data may exist at many different geographies
Multi-level models with individual, household,
local and regional effects can be considered
GIS might be very useful here for data
handling
Not advisable to aggregate covariates at
different spatial levels
This is just making a bad embedded synthetic
estimator
Validation
Not easy to do, but essential
How do you validate your synthetic estimates?
Cross-validation?
Another survey?
?
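One way the cross-validation option might look, as a sketch only: leave each surveyed area out in turn, refit a simple area-level model on the rest, and compare the prediction with the held-out direct estimate. The data, the single covariate and the linear model are all assumptions for illustration. The direct estimates are themselves noisy, so the error measured this way mixes model error with sampling error.

```python
import numpy as np

# Hypothetical area-level data: proportion urban and direct survey estimate
urban = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
direct = np.array([0.20, 0.23, 0.24, 0.28, 0.30])

def leave_one_area_out_errors(x, y):
    """Prediction error for each area when the model is fitted without it."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X = np.column_stack([np.ones(keep.sum()), x[keep]])
        beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        errors.append(beta[0] + beta[1] * x[i] - y[i])
    return np.array(errors)

errors = leave_one_area_out_errors(urban, direct)
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"leave-one-area-out RMSE: {rmse:.4f}")
```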
Options
How about
Health Atlas Ireland?
This is a system built for the HSE (led by Howard
Johnson) to plan health services
It already has
Maps
Census
HIPE
Mortality data
Census output options
Recently they have developed a very flexible
census output system
Uses census data at ED level
Locations of houses
Assumes that all the houses in an ED are
exchangeable
Census output options
Allocates census data to any given area
Directly weighted by using the number of
households and the ED composition of the
desired area
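One possible reading of this allocation, sketched below: each ED contributes its census count in proportion to the share of its households that fall inside the desired area, which is where the exchangeability assumption comes in. This is an interpretation of the slide, not the Health Atlas Ireland implementation; the ED names, counts and the `allocate` function are hypothetical.

```python
# Census count per ED, e.g. persons aged 65 and over (hypothetical)
ed_count = {"ED1": 200, "ED2": 500, "ED3": 120}
# Total households in each ED, from house locations (hypothetical)
ed_households = {"ED1": 100, "ED2": 250, "ED3": 60}
# Households of each ED that lie inside the desired area (hypothetical)
households_in_area = {"ED1": 100, "ED2": 50, "ED3": 0}

def allocate(ed_count, ed_households, households_in_area):
    """Weight each ED's count by the share of its households inside the area."""
    total = 0.0
    for ed, count in ed_count.items():
        share = households_in_area.get(ed, 0) / ed_households[ed]
        total += count * share
    return total

estimate = allocate(ed_count, ed_households, households_in_area)
print(f"Estimated count in desired area: {estimate:.0f}")
```

Here ED1 lies wholly inside the area, ED2 contributes a fifth of its count, and ED3 contributes nothing.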
Futures?
Modern design of surveys
Could readily be extended to do SA from
almost any survey data where the necessary
geographical data have been collected
Greatly improves value for money of large
scale surveys