New and easier ways of working with aggregate data and geographies from UK censuses
Justin Hayes
UK Data Service Census Support
Overview
• The UK Data Service
• UK Data Service Census Support
• Census background
• Aggregate data
• Geographies
• Dissemination challenges
• InFuse
• Data model
• Geography model
• Live demonstration!
The UK Data Service
• An ESRC initiative integrating
several previous resources
• A single, comprehensive and
integrated point of access to a
wide range of social science data
• Access beyond traditional
academic audience where
possible
• Support, training and guidance
• ukdataservice.ac.uk
UK Data Service Census Support (CS)
• A specialist unit of the UK Data
Service
• Access to, and support for use of
data from the last five UK
censuses (1971 – 2011)
• Add value by making census
outputs easy to find, understand
and use
• Extend audience beyond ‘experts’
• Long history of innovation
• census.ukdataservice.ac.uk
CS@Mimas
• Aggregate component of census outputs
Justin Hayes
Richard Wiseman
Rob Dymond-Green
UK Censuses
• Decennial questionnaire surveys
• Entire UK population every ten years* since 1801
• Questions about people and households
• 2011 Census cost ~ £500m
• Primary evidence for government policy and spending
• Wide range of high quality demographic and socio-economic
characteristics
• What? - Detailed combinations of characteristics
• Where? - Small areas
• When? - Long history
• Rich secondary source of information
• Open Government Licence!
UK 2011 Census
• 27 March 2011
• Three UK census agencies (ONS, NRS, NISRA)
• New questions and variables
• Targeted enumeration
• Online and postal completion
• Sophisticated quality assurance
• Best census ever!
Census aggregate outputs
• Counts of people and households* with particular
combinations of characteristics for particular
geographical areas
• Females aged 16-74 in employment in associate professional
and technical occupations and usually resident in wards in the
County of Devon
• Derived from unit-level questionnaire responses
• Variables and categories
• Sex - Male and Female and All
• Age - single and multiple years
• Ethnicity and Occupation – standard classifications
• Traditionally specified by tables combining one or
more variables
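As a rough illustration of how aggregate counts are derived from unit-level responses, the sketch below counts records by area and by combination of categories. The records, field names and area names are invented for illustration, and this is not the census agencies' actual processing, which also includes steps such as disclosure control.

```python
from collections import Counter

# Hypothetical unit-level records: one dict per person (illustrative values only).
records = [
    {"area": "Ward A", "sex": "Female", "age": 34,
     "occupation": "Associate professional and technical"},
    {"area": "Ward A", "sex": "Female", "age": 51,
     "occupation": "Associate professional and technical"},
    {"area": "Ward A", "sex": "Male", "age": 40,
     "occupation": "Associate professional and technical"},
    {"area": "Ward B", "sex": "Female", "age": 80,
     "occupation": "Associate professional and technical"},
]


def age_band(age):
    """Map a single year of age onto a multi-year category."""
    return "Age 16 to 74" if 16 <= age <= 74 else "Other ages"


def aggregate(unit_records):
    """Count persons for each (area, sex, age band, occupation) combination."""
    counts = Counter()
    for r in unit_records:
        key = (r["area"], r["sex"], age_band(r["age"]), r["occupation"])
        counts[key] += 1
    return counts


counts = aggregate(records)
```

Each key in `counts` corresponds to one cell of an aggregate output: a combination of categories for one area.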
Aggregate specification tables
Census aggregate data
Age : Age 16 to 74 - Economic activity : in employment the week
before the census - Occupation : 3. Associate professional and
technical occupations - Sex : Female - Unit : Persons
Age : Age 16 to 74 - Economic activity : in employment the week
before the census - Occupation : All categories\ Occupation - Sex :
Female - Unit : Persons
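Labels of this form can be split back into variable and category pairs. The following is a minimal sketch that assumes the separators are exactly " - " between pairs and " : " within a pair, as in the examples above. `parse_label` is a hypothetical helper, not part of InFuse, and a category name that itself contains " - " would break it.

```python
def parse_label(label):
    """Split 'Variable : Category - Variable : Category ...' into a dict."""
    result = {}
    for part in label.split(" - "):
        variable, sep, category = part.partition(" : ")
        if not sep:
            raise ValueError(f"no ' : ' separator in {part!r}")
        result[variable.strip()] = category.strip()
    return result


label = (
    "Age : Age 16 to 74 - Economic activity : in employment the week "
    "before the census - Occupation : 3. Associate professional and "
    "technical occupations - Sex : Female - Unit : Persons"
)
parsed = parse_label(label)
```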
Census aggregate data
Aggregate data
2011 Census geographies
• Subdivisions of the UK into smaller areas
• Sets of similar areas called geographies
• Functional and statistical geographies
• Local government districts
• Wards and electoral divisions
• Expecting around 100 different geographies
• Hierarchies of geographies with nesting areas
• Administrative
• Statistical
• Health, Electoral, Postcode, etc
UK administrative geographies
UK statistical geographies
Dissemination challenges
• Size and complexity of planned outputs
• Ongoing releases from three different agencies
• Inconsistencies in definitions
• Categorisation differences within and between countries
• Table universes
• Inconsistent labelling
• Incomplete geographical availability of data
• Disclosure control
• Lower Threshold (LT), Higher Threshold (HT) and other data
• Thousands of separate datasets
• Restricted global operation and understanding
InFuse
• Live service with 2001 census data since 2012
• 2011 data since 2013
• Tip of the iceberg!
• Data model
• Geography model
InFuse data model
• Single multidimensional dataset
• Deconstruction, rationalisation and re-integration of
variables and categories
• All UK table specifications processed
• Integration of table universes as variables
• Enforce consistency across dataset
• Library of variables and categories to describe all counts
• Re-insertion of counts into model
• Retain original cell identifiers
• Attachment of metadata
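One way to picture the single multidimensional dataset is as a collection of cells, each described by a set of (variable, category) pairs rather than by its position in a table. The sketch below is an assumption about the general shape only, not the actual InFuse implementation. The class, the cell identifiers and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    categories: frozenset  # {(variable, category), ...}, universe included
    area: str
    count: int
    source_cell_id: str    # original table cell identifier, kept as metadata


def make_cell(categories, area, count, source_cell_id):
    """Build a cell from a dict of variable -> category."""
    return Cell(frozenset(categories.items()), area, count, source_cell_id)


# Hypothetical cells taken from two different source tables.
cells = [
    make_cell({"Sex": "Female", "Age": "Age 16 to 74",
               "Occupation": "3. Associate professional and technical occupations",
               "Unit": "Persons"}, "Ward A", 120, "TABLE_X_0042"),
    make_cell({"Sex": "Female", "Age": "Age 16 to 74",
               "Unit": "Persons"}, "Ward A", 900, "TABLE_Y_0007"),
]


def find(all_cells, area, wanted):
    """Return cells for an area whose category combination matches exactly."""
    target = frozenset(wanted.items())
    return [c for c in all_cells if c.area == area and c.categories == target]


matches = find(cells, "Ward A",
               {"Sex": "Female", "Age": "Age 16 to 74", "Unit": "Persons"})
```

Because the table universe ("Unit") is stored as just another variable, a search by variable combination does not need to know which table a count originally came from.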
2011 census variables
• 97 variables and counting!
InFuse geography model(s)
• Raw geography model
• All original geographies and their areas
• Direct and indirect hierarchical relationships
• Simplified geography model
• Combinations of equivalent geographies into geography sets
with UK coverage where possible
• Condensed standard/merged geographies in England
• Selections of areas across the UK
• Multiple geographies in one operation
• Geography jumps in interface
• Currently administrative and statistical geographies
• More to follow
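The direct and indirect hierarchical relationships can be sketched as a parent lookup: direct relationships are stored, and indirect ones are found by walking up the hierarchy. This is a simplified illustration with a single parent per area. The ward names are invented, and the real model covers many more geographies and relationships.

```python
# Direct relationships: each area maps to the area it nests within.
parent = {
    "Ward E1": "Exeter",
    "Ward E2": "Exeter",
    "Ward D1": "East Devon",
    "Exeter": "Devon",
    "East Devon": "Devon",
    "Devon": "England",
}

# The geography (set of similar areas) that each area belongs to.
geography_of = {
    "Ward E1": "Ward",
    "Ward E2": "Ward",
    "Ward D1": "Ward",
    "Exeter": "District",
    "East Devon": "District",
    "Devon": "County",
    "England": "Country",
}


def ancestors(area):
    """Return all areas containing `area`, nearest first."""
    found = []
    while area in parent:
        area = parent[area]
        found.append(area)
    return found


def areas_within(container, geography):
    """Select every area of one geography nested inside `container`."""
    return sorted(
        a for a, g in geography_of.items()
        if g == geography and container in ancestors(a)
    )


wards_in_devon = areas_within("Devon", "Ward")
```

Selecting wards within a county is an indirect relationship here, since wards nest in districts and districts nest in the county.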
Raw geography entities and relationships
InFuse administrative and statistical geographies
InFuse features
• Open access
• All data is open via Open Government Licence
• Global search across entire UK dataset
• Variable combinations
• No tables!
• Guide users to find data
• Populated variable combinations
• Available geographies
• More data for more geographies
• All LT and HT data available for all areas above LT
• Improved contextual information
• No data fast!
InFuse 2011 demonstration
• http://infuse.mimas.ac.uk/
What’s next?
• Big data release imminent!
• Progressive release of UK 2011 outputs
• Scotland and Northern Ireland
• Integrated boundary data in GIS formats
• Interface design and features
• More contextual information
After that?
• Integration of multiple censuses
• Non-census data
• External access to API for application development
• Development of data and geography models
• Continued engagement with NSIs
• Data production using multidimensional approach
• Automated disclosure control
• No all or nothing table constraints
• Use InFuse!
• Let us know what you think
• [email protected]