Disclosure Control in the
UK Census
Keith Spicer
11 January 2005
Contents
National Statistics Code of Practice
Background
2001 Census Disclosure Control – tables
2001 Samples of Anonymised Records
Summary and lessons learnt
“The information you provide is protected by law and
treated in strict confidence”
2001 Census form
“Precautions will be taken so that published
tabulations and abstracts of statistical data do not
reveal any information about identifiable individuals
or households”
2001 Census White Paper Cm4523, para 120
National Statistics Code of Practice
“The National Statistician will set standards for
protecting confidentiality, including a guarantee that
no statistics will be produced that are likely to
identify an individual unless specifically agreed with
them”
“It would take a disproportionate amount of time,
effort and expertise for an intruder to identify a
statistical unit to others, or to reveal information
about that unit not already in the public domain”
National Statistics Code of Practice
The purpose of disclosure control is to ensure that no
unauthorised individual, technically competent with
public data and private information could:
identify information on an individual that has been
supplied in confidence to ONS (such as in census or
survey returns) with a reasonable degree of confidence
National Statistics Code of Practice
Identity Disclosure – the association of a
respondent’s identity with a disseminated data
record
Attribute Disclosure – the association of a
respondent with an attribute value in the
disseminated data (or an estimated attribute
value based on the disseminated data)
Background
Disclosure Example 1
For widowed males aged 45-59, COB=not UK
Area A

                   LLTI   No LLTI   TOTAL
Econ Active           4        16      20
Not Econ Active      12         1      13
TOTAL                16        17      33

The table is disclosive because:
(1) The person who is Not Econ Active and not LLTI can be identified in the table, both by themselves and by others who know all the information (Identity Disclosure)
(2) Any of these could then deduce that any other widowed male 45-59, COB=not UK and not Econ Active, has LLTI.
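The check behind point (1) can be sketched in a few lines of Python. This is only an illustration using the Area A counts from the example; the dictionary layout and the function name `unique_cells` are assumptions made for the sketch, not part of any ONS tool.

```python
# Area A counts from Disclosure Example 1
# (rows: economic activity, columns: limiting long-term illness).
AREA_A = {
    "Econ Active": {"LLTI": 4, "No LLTI": 16},
    "Not Econ Active": {"LLTI": 12, "No LLTI": 1},
}


def unique_cells(table):
    """Return (row, column) pairs whose count is exactly 1."""
    return [
        (row, col)
        for row, cols in table.items()
        for col, count in cols.items()
        if count == 1
    ]


print(unique_cells(AREA_A))  # [('Not Econ Active', 'No LLTI')]
```

A cell of 1 is only the simplest warning sign; the sketch does not attempt to model what an intruder already knows.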
Background
Disclosure Example 2
Area B

               2+ Cars   1 Car   0 Cars   TOTAL
Single               4      19        8      31
Married             14       8        5      27
Sep/Div/Wid          0       6        0       6
TOTAL               18      33       13      64

The table is disclosive because:
If you know someone who is Separated, Widowed or Divorced in Area B, you can deduce they have 1 Car.
Information being disclosed (Attribute Disclosure)
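Attribute disclosure of this kind arises whenever every member of a row falls in a single column. A minimal sketch of that test, using the Area B counts, might look as follows; the function name `attribute_disclosures` and the data layout are illustrative assumptions.

```python
# Area B counts from Disclosure Example 2
# (rows: marital status, columns: number of cars).
AREA_B = {
    "Single": {"2+ Cars": 4, "1 Car": 19, "0 Cars": 8},
    "Married": {"2+ Cars": 14, "1 Car": 8, "0 Cars": 5},
    "Sep/Div/Wid": {"2+ Cars": 0, "1 Car": 6, "0 Cars": 0},
}


def attribute_disclosures(table):
    """Return {row: column} for non-empty rows concentrated in one column."""
    found = {}
    for row, cols in table.items():
        nonzero = [col for col, count in cols.items() if count > 0]
        if len(nonzero) == 1:
            found[row] = nonzero[0]
    return found


print(attribute_disclosures(AREA_B))  # {'Sep/Div/Wid': '1 Car'}
```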
Background
Disclosure Example 3
Area C (contains two smaller areas D and E)

Area C       LLTI   No LLTI   TOTAL
Qual           12       165     177
No Qual        14       108     122
TOTAL          26       273     299

Area D       LLTI   No LLTI   TOTAL
Qual           11       105     116
No Qual         8        73      81
TOTAL          19       178     197

The tables are disclosive because:
Though each table is not disclosive by itself, they are in combination – we can ascertain a similar table for Area E
The Area E table would have a 1 for LLTI – Qual cell
Disclosure by Differencing.
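The differencing step is a cell-by-cell subtraction of the Area D table from the Area C table. The sketch below reproduces it with the counts from the example; the function name `difference` is an assumption made for illustration.

```python
# Published counts for Area C and its sub-area D (Disclosure Example 3).
AREA_C = {
    "Qual": {"LLTI": 12, "No LLTI": 165},
    "No Qual": {"LLTI": 14, "No LLTI": 108},
}
AREA_D = {
    "Qual": {"LLTI": 11, "No LLTI": 105},
    "No Qual": {"LLTI": 8, "No LLTI": 73},
}


def difference(outer, inner):
    """Subtract the inner area's table from the outer area's, cell by cell."""
    return {
        row: {col: outer[row][col] - inner[row][col] for col in outer[row]}
        for row in outer
    }


AREA_E = difference(AREA_C, AREA_D)
print(AREA_E)
# {'Qual': {'LLTI': 1, 'No LLTI': 60}, 'No Qual': {'LLTI': 6, 'No LLTI': 35}}
```

The derived Area E table contains the 1 in the LLTI and Qual cell, even though neither published table contains a small count there.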
Background
1991 Census
Barnardisation: adjustment of cells in tables by -1, 0
or +1, so that an observed 1 is not certain to be a true 1
However, there was still a good chance that an observed 1
was a ‘true’ 1
A degree of uncertainty about the accuracy of
information apparently disclosed about an individual
does not ensure that confidentiality has been
completely protected
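A rough sketch of this kind of cell adjustment is given below. The perturbation probability `q`, the rule that counts are never pushed below zero, and the function name `barnardise` are all assumptions made for illustration; the slides do not give the parameters used in 1991.

```python
import random


def barnardise(cells, q=0.1, rng=None):
    """Add -1, 0 or +1 to each count with probabilities q, 1 - 2q, q.

    q is an assumed value, not the 1991 Census parameter.
    """
    rng = rng or random.Random()
    adjusted = []
    for count in cells:
        delta = rng.choices([-1, 0, 1], weights=[q, 1 - 2 * q, q])[0]
        # Assumption: a count is never pushed below zero.
        adjusted.append(max(0, count + delta))
    return adjusted


print(barnardise([1, 0, 5, 12], rng=random.Random(2001)))
```

When `q` is small most cells are left unchanged, which is why an observed 1 still has a good chance of being a true 1.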
Background
Since 1991:
Increased risk of disclosure in 2001:
• 2001 Census results more widely accessible,
allowing Census data to be downloaded more freely
• Electronic storage of other data sets now much
easier – increased risk of Census data being
matched with other sources
Background
• More detail in 2001 Census outputs, as smaller
areas and more flexible boundaries were desired by
users. Data were provided for considerably smaller
geographic areas than the lowest level provided in 1991
• Changing attitudes to the trust in which public agencies
are held
• 2001 Census data 100% coded, as opposed to
10% (for some) in 1991 – the 10% sample added a level of
uncertainty to published results
2001 Census Disclosure Control
PRE-TABULATION
Changes made to data records prior to preparing tables. 2001 Census the
first to consider pre-tabulation methods as part of disclosure control.
Record swapping
• Entire household record, except geographic variables, swapped
with another in neighbouring area (paired on number, sex and
grouped age of persons)
• Within LA - does not affect stats at LA or above
• No need for additional edit checks
• Statistical differences less than volume of changes
• Creates uncertainty about accuracy of identity
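The swapping idea can be sketched as below. This is not the ONS procedure: the field names `la`, `area` and `key`, the swap probability `rate`, and the simplification that any other area in the same LA counts as "neighbouring" are all assumptions. Swapping the area codes of two matched households is equivalent to swapping every non-geographic variable between them.

```python
import random
from collections import defaultdict


def swap_records(households, rate=0.5, rng=None):
    """Swap area codes between matched household pairs within the same LA.

    `key` stands for the matching profile (number, sex and grouped age of
    persons); `rate` is a hypothetical swap probability.
    """
    rng = rng or random.Random()
    groups = defaultdict(list)
    for i, h in enumerate(households):
        groups[(h["la"], h["key"])].append(i)
    swapped = [dict(h) for h in households]  # leave the input untouched
    for idx in groups.values():
        rng.shuffle(idx)
        for a, b in zip(idx[::2], idx[1::2]):
            different_area = swapped[a]["area"] != swapped[b]["area"]
            if different_area and rng.random() < rate:
                swapped[a]["area"], swapped[b]["area"] = (
                    swapped[b]["area"],
                    swapped[a]["area"],
                )
    return swapped


households = [
    {"la": "LA1", "area": "A", "key": (2, "MF", "30-44"), "cars": 0},
    {"la": "LA1", "area": "B", "key": (2, "MF", "30-44"), "cars": 2},
    {"la": "LA1", "area": "A", "key": (1, "F", "75+"), "cars": 1},
]
print(swap_records(households, rate=1.0, rng=random.Random(1)))
```

Because pairs share the same LA and the same matching profile, LA-level figures and area-level counts of the matching variables are unchanged; only the other variables move between areas.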
2001 Census Disclosure Control
POST-TABULATION
Changes made subsequent to preparing tables. Generally time-consuming as each output has to be checked.
Small Cell Adjustment
• Only cells containing small counts are adjusted, so level of
adjustment considerably less than that imposed under rounding
• Adjustment usually has little impact on the conclusions that can
be validly drawn from the data
• Each table internally additive, though some totals from different
breakdowns may be different
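A simplified sketch of small cell adjustment follows. The slides do not give the adjustment rule, so the one used here (counts of 1 and 2 are randomly moved to 0 or 3 so that the expected value is preserved) is an assumption, as are the function names. Totals are recomputed from the adjusted cells, which is what keeps each table internally additive while allowing totals to differ between tables.

```python
import random

# Area A counts reused from Disclosure Example 1.
AREA_A = {
    "Econ Active": {"LLTI": 4, "No LLTI": 16},
    "Not Econ Active": {"LLTI": 12, "No LLTI": 1},
}


def small_cell_adjust(table, rng=None):
    """Move counts of 1 or 2 to 0 or 3 (assumed rule); leave other cells alone."""
    rng = rng or random.Random()
    adjusted = {}
    for row, cols in table.items():
        adjusted[row] = {}
        for col, count in cols.items():
            if count in (1, 2):
                # Round up to 3 with probability count/3, otherwise down to 0.
                count = 3 if rng.random() < count / 3 else 0
            adjusted[row][col] = count
    return adjusted


def row_totals(table):
    """Totals recomputed from the adjusted cells, so each table still adds up."""
    return {row: sum(cols.values()) for row, cols in table.items()}


protected = small_cell_adjust(AREA_A, rng=random.Random(2001))
print(protected, row_totals(protected))
```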
2001 Census Disclosure Control
2001 Census disclosure control used:
• Record swapping – to introduce a degree of uncertainty into
identity without affecting figures at LA and above
• Small cell adjustment – in addition, so that highly unusual
people and households significantly less visible in the
outputs
• Thresholds for Output Areas – minimum 40 households,
100 persons (recommended size 125 households);
Standard Tables minimum 400 households, 1000 persons
• Use of Output Areas as building blocks
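The thresholds above amount to a simple check before an area's tables are released. In the sketch below the names are illustrative, and it is assumed that both the household and the person minimum must be met.

```python
# Minimum sizes quoted above.
OA_MIN = {"households": 40, "persons": 100}
STANDARD_TABLE_MIN = {"households": 400, "persons": 1000}


def meets_threshold(households, persons, minimum):
    """True if an area meets both minimums (assumption: both are required)."""
    return households >= minimum["households"] and persons >= minimum["persons"]


print(meets_threshold(125, 300, OA_MIN))              # True
print(meets_threshold(125, 300, STANDARD_TABLE_MIN))  # False
```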
2001 Census Disclosure Control
Effects:
• Small cells in tables will not necessarily be ‘true’
figures
• Each table internally additive, but totals may
appear inconsistent between different tables
• Time consuming for ONS to check each set of
tables produced – particularly for Commissioned
Output, for small areas; possibility of disclosure by
differencing
2001 Census Disclosure Control
Advice for users
• Use highest level of geography with fewest
breakdowns and fewest number of cells summed
• Sources of error not only in disclosure control but
in coverage error, respondent error and other
processing error, e.g. One Number Census
adjustment, data capture and coding, edit and
imputation
Samples of Anonymised Records
Licensed Samples of Anonymised Records (SARs)
from 2001 Census
• 3% sample of individual records to Regional level
(Version 1 available October 04)
• 1% sample of household records to Country level
(due to be available Spring 05)
• Version 2 of individual SAR due to be available
February 05
Samples of Anonymised Records
• Licensed Individual SAR – available through CCSR
• All researchers must sign agreement not to attempt
to identify any individual from the SAR
• Disclosure may occur inadvertently through differencing
between a number of tables
Samples of Anonymised Records
• Initial approach to restrict sample uniques by recoding
• Version 1 Individual SAR –
– grouped age: individual years to 15, 16-18, 8 bands 18-74,
individual year 75+,
– grouped ethnic group variable to 5 categories,
– occupation group to 25 categories,
– country of birth E, W, S, NI, Rep Ire, EU, Other
• Post-Randomisation (PRAMming) – perturbation of some
variables, normally by one category, only on a percentage
of ‘risky’ records
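PRAMming as described above can be sketched as follows. The perturbation probability `p`, the `is_risky` rule and the function name `pram` are hypothetical; the slides do not state how risky records were chosen or which rates were applied.

```python
import random


def pram(records, variable, categories, is_risky, p=0.2, rng=None):
    """Move `variable` to an adjacent category for a share of risky records.

    `categories` is the ordered category list; `p` is an assumed probability.
    """
    rng = rng or random.Random()
    out = []
    for rec in records:
        rec = dict(rec)  # leave the input record untouched
        if is_risky(rec) and rng.random() < p:
            i = categories.index(rec[variable])
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(categories)]
            if neighbours:
                rec[variable] = categories[rng.choice(neighbours)]
        out.append(rec)
    return out


AGE_BANDS = ["16-18", "19-24", "25-29"]  # illustrative bands only
records = [
    {"age": "16-18", "risky": True},
    {"age": "25-29", "risky": False},
]
print(pram(records, "age", AGE_BANDS, lambda r: r["risky"], p=1.0))
```

Records not flagged as risky are returned unchanged, so the perturbation touches only part of the file.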
Samples of Anonymised Records
• Any observed ‘1’ in a SAR table is unlikely to be a
real population ‘1’:
– The 1 is 1 from a 3% sample (members unknown)
– PRAMming will have the effect of ‘moving’ members into
/ out of cells
• Version 2 Individual SAR will have:
– 81 occupational categories (25 in Version 1)
– the full 16 ethnic group categories (5 in Version 1)
– breakdown of country of birth to 16 categories (7 in Version 1)
Due February 05
Samples of Anonymised Records
• In-house Controlled Access SARs with full detail
on 3% individuals
• Labs in Titchfield and London
• Access through application, form available through
ONS – applications assessed by Census Research
Access Board (CRAB)
• All lab outputs assessed for disclosure (normally
within one week)
Summary and lessons learnt
• Tables protected by both pre-tabulation (record
swapping) and post-tabulation (small cell
adjustment)
• SARs available for bespoke analysis
– Licensed through CCSR
– Controlled access through ONS data lab
Lessons learnt
• Protection of confidentiality of individual details
becomes more difficult with each Census
• Disclosure risk assessment should have been
carried out earlier to allow earlier consultation and
more time to conduct research and develop
different options
Lessons learnt
• Need to provide users with information about the
measurement and other errors that exist within
Census data
• Review of 2001 disclosure control in preparation
for 2011
Contact details
Keith Spicer
Office for National Statistics
Segensworth Road
Titchfield
Fareham PO15 5RR
01329 813062