www.publichealth.ie

Synthetic estimators in Ireland
Anthony Staines
DCU
What are synthetic estimators?



Estimates of something you haven't got
Typically estimates for a small area of something
Making maximum use of what you have
Example

Lung cancer risk

Smoking is a key explanation

Suppose you want to study the geography of lung cancer
What you have


Smoking data from a national survey, by age and sex
Small-area-level data on population and cancer incidence, by age and sex
What you can do at once


Estimate prevalence directly for the small areas included in the survey
Using the survey sample from each area
What's wrong with this?

The areas you need may not be included

The estimates will be very imprecise
You can do better

In some obvious ways

And some not so obvious
What you assume

National age- and sex-specific rates apply in each small area
And so

From these you calculate small-area-specific prevalence estimates

This is indirect standardisation

Can be done smarter

requiring aggregation properties to hold

Adding in area level covariates (urban/rural etc.)
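
The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming two age bands and made-up numbers: the rates in `national_rates` and the counts in `area_populations` are hypothetical, not taken from any survey or census.

```python
# Hypothetical national smoking prevalence by (age band, sex),
# as might be estimated from a national survey.
national_rates = {
    ("18-44", "M"): 0.30,
    ("18-44", "F"): 0.25,
    ("45+", "M"): 0.20,
    ("45+", "F"): 0.15,
}

# Hypothetical small area populations by the same age and sex strata.
area_populations = {
    "Area A": {("18-44", "M"): 100, ("18-44", "F"): 100,
               ("45+", "M"): 50, ("45+", "F"): 50},
    "Area B": {("18-44", "M"): 20, ("18-44", "F"): 20,
               ("45+", "M"): 80, ("45+", "F"): 80},
}


def synthetic_prevalence(population, rates):
    """Apply national stratum-specific rates to a local population."""
    expected_smokers = sum(count * rates[stratum]
                           for stratum, count in population.items())
    return expected_smokers / sum(population.values())


estimates = {area: synthetic_prevalence(pop, national_rates)
             for area, pop in area_populations.items()}

for area, prevalence in estimates.items():
    print(f"{area}: {prevalence:.3f}")
```

The two areas differ only in their age and sex structure, so any difference between their estimates comes entirely from that structure. This is the limitation the area level covariates are meant to address.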
Can you do better?

Yes
How?
Model based estimators

These have a long history

Many diverse applications


Combine survey data and some kind of 'census data'
'Census data' here means data available for every area of interest
Roughly

Use the survey data to estimate relationships

at the relevant level

between survey covariates

and the census data
Then

Assume the same relationship applies in the other areas
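
A rough sketch of this two-step idea, assuming the simplest possible case: one area level census covariate and a straight-line model fitted by ordinary least squares. The area names, the covariate and all values are hypothetical. A real application would more likely use a logistic or multi-level model with survey weights.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical surveyed areas: (census covariate, survey prevalence).
# The covariate could be, say, the proportion unemployed in the census.
surveyed = {
    "Area A": (0.10, 0.20),
    "Area B": (0.20, 0.24),
    "Area C": (0.30, 0.31),
    "Area D": (0.40, 0.33),
}

# Hypothetical areas with census data but no survey respondents.
unsurveyed = {"Area E": 0.50, "Area F": 0.15}

# Step 1: estimate the relationship where survey data exist.
intercept, slope = fit_line([x for x, _ in surveyed.values()],
                            [y for _, y in surveyed.values()])

# Step 2: assume the same relationship holds in the other areas.
# Predictions are clipped to [0, 1] because they are proportions.
predictions = {area: min(1.0, max(0.0, intercept + slope * x))
               for area, x in unsurveyed.items()}

for area, prevalence in predictions.items():
    print(f"{area}: {prevalence:.3f}")
```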
Issues

Modelling can be hard


Remember these are predictive models, not explanatory models
Data are not easy to get at the right small area level
Models

models using individual level covariates only

models using area level covariates only

models combining individual and area-level covariates
Limits

Available data

Confidentiality


Complexity of methods, especially multi-level methods
Validation
Spatial data limits



Have to be able to link survey and census to the same set of small areas
Given the primitive systems in the UK and the nearly non-existent systems in the Republic, this is a lot of work
Errors here will lead to biased estimates
Confidentiality



Need to respect confidentiality of survey respondents
May limit the data available for these purposes
May need to design survey and survey consent process carefully to get good estimates
Modelling

Can become very complex

Clustered survey designs

Survey weights

Variable selection

Model diagnostics
What and where to model




Data may exist at many different geographies
Multi-level models with individual, household, local and regional effects can be considered
GIS might be very useful here for data handling
Not advisable to aggregate covariates at different spatial levels

This is just making a bad embedded synthetic estimator
Validation

Not easy to do, but essential

How do you validate your synthetic estimates?

Cross-validation?

Another survey?

?
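
One way the cross-validation option might look is leave-one-area-out: hold back each surveyed area in turn, fit the model on the rest, and compare the prediction with the held-back direct estimate. The sketch below assumes a simple straight-line model on one area level covariate, and the data points are hypothetical. It also ignores sampling error in the direct estimates, which a real validation would have to account for.

```python
def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_area_out_error(points):
    """Mean absolute error when each area is predicted from the others."""
    errors = []
    for i, (x_held, y_held) in enumerate(points):
        training = points[:i] + points[i + 1:]
        intercept, slope = fit_line(training)
        errors.append(abs(intercept + slope * x_held - y_held))
    return sum(errors) / len(errors)


# Hypothetical (census covariate, direct survey estimate) pairs, one per area.
surveyed_areas = [(0.10, 0.20), (0.20, 0.24), (0.30, 0.31), (0.40, 0.33)]

print(f"Mean absolute error: {leave_one_area_out_error(surveyed_areas):.3f}")
```

At least three areas with differing covariate values are needed, since each fit uses one area fewer than the full set.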
Options

How about

Health Atlas Ireland?


This is a system built for HSE (led by Howard Johnson) to plan health services
It already has

Maps

Census

HIPE

Mortality data
Census output options

Recently they have developed a very flexible census output system

Uses census data at ED level

Locations of houses

Assumes that all the houses in an ED are exchangeable
Census output options


Allocates census data to any given area
Directly weighted by using the number of households and the ED composition of the desired area
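
The weighting described here might be sketched as follows. This is an illustration of the general idea only, not the Health Atlas Ireland implementation: the ED names, counts and the function `allocate_to_area` are all hypothetical. It relies on the exchangeability assumption from the previous slide, so each household in an ED carries an equal share of that ED's census count.

```python
# Hypothetical ED-level census data: a census count and total households.
ed_census = {
    "ED 1": {"population": 1000, "households": 400},
    "ED 2": {"population": 600, "households": 200},
}

# Hypothetical ED composition of the desired area:
# how many households from each ED fall inside it.
households_in_area = {"ED 1": 100, "ED 2": 200}


def allocate_to_area(ed_data, area_households, variable):
    """Allocate an ED-level count to an area in proportion to households."""
    total = 0.0
    for ed, n_households in area_households.items():
        share = n_households / ed_data[ed]["households"]
        total += share * ed_data[ed][variable]
    return total


area_population = allocate_to_area(ed_census, households_in_area, "population")
print(f"Estimated population of the desired area: {area_population:.0f}")
```

Here a quarter of the households in ED 1 and all of those in ED 2 lie in the desired area, so it receives a quarter of ED 1's count plus all of ED 2's.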
Futures?



Modern design of surveys
Could readily be extended to do SA from almost any survey data where the necessary geographical data have been collected
Greatly improves value for money of large-scale surveys