Making Sense of Census Data - University of Alabama at


Making Sense of Census Data
Robert Matthews
University of Alabama at Birmingham
Introduction



Our cohort consists of a 5% sample of the entire U.S. Medicare population from 1999-2006
Zip+4 (9-digit) information is available for 99.9% of beneficiaries and providers
The task was to link our cohort with the census data to obtain demographic variables: educational attainment, median household income, and total population
Hierarchical Relationships of Census Geographic Structures
Source: U.S. Census Bureau, Summary File 3 documentation
Summary Files

“Short Form”



Summary File 1 (SF 1) – data from the Short Form questions
Summary File 2 (SF 2) – data from the Short Form questions, repeated for 249 population groups
Redistricting Data – used for congressional and state redistricting
Summary Files

“Long Form”
Only asked of a sample of the U.S. population (1 in 6 households)


Summary File 3 (SF 3) – comprehensive results from the Long Form
Summary File 4 (SF 4) – comprehensive results from the Long Form, repeated for 335 population groups
Summary File 3 components

53 sets of files:
50 U.S. states
District of Columbia (D.C.)
Puerto Rico
All states combined

77 files per set (76 data files plus the Geographic Identifier file)
53 x 77 = 4,081 files
Linking Census and Medicare data



Census Block Group is used to link Census and Medicare data
Census block is a 4-character variable; the block group is identified by the value in the first position
We obtained a database containing 66 million Zip+4 codes from Melissa Data so that we could get the census tract and block for each Zip+4
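The block-group rule above can be sketched in a few lines. This is only an illustration in Python (the slides mention SAS for the actual work); the lookup layout, the function names, and the sample values are made up, and the key layout assumes the usual 2-digit state, 3-digit county, and 6-digit tract codes.

```python
def block_group_of(block: str) -> str:
    """Return the block group: the first character of a 4-character census block."""
    if len(block) != 4:
        raise ValueError(f"expected a 4-character block code, got {block!r}")
    return block[0]


def block_group_key(state: str, county: str, tract: str, block: str) -> str:
    """Build a state + county + tract + block-group key (assumed 2+3+6+1 layout)."""
    return f"{state.zfill(2)}{county.zfill(3)}{tract.zfill(6)}{block_group_of(block)}"


# Hypothetical Zip+4 lookup row (values invented for illustration):
# 9-digit zip -> (state, county, tract, block)
zip4_lookup = {"123456789": ("01", "073", "004500", "2013")}

state, county, tract, block = zip4_lookup["123456789"]
key = block_group_key(state, county, tract, block)
print(key)
```

A key built this way could then be attached to each beneficiary or provider record and used to join against block-group-level census rows.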
Variables of interest

Variable description
Educational attainment (Table P37)
Median household income (Table P53)
Total population (Table P1)


Tables mapped to File Segmentation Table
P37 -> File 04
P53 -> File 06
P1 -> File 01
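The table-to-file mapping listed above amounts to a small lookup. A minimal Python sketch follows; the dictionary only repeats the three entries from this slide, and the names `FILE_SEGMENT_FOR_TABLE` and `files_for` are hypothetical.

```python
# Table number -> file segment, as listed on the slide
FILE_SEGMENT_FOR_TABLE = {"P37": 4, "P53": 6, "P1": 1}


def files_for(tables):
    """Return the sorted, de-duplicated file segments needed for the given tables."""
    missing = [t for t in tables if t not in FILE_SEGMENT_FOR_TABLE]
    if missing:
        raise KeyError(f"no file segment known for: {missing}")
    return sorted({FILE_SEGMENT_FOR_TABLE[t] for t in tables})


needed = files_for(["P37", "P53", "P1"])
print([f"File {n:02d}" for n in needed])
```

Only these file segments then need to be read for each of the sets of files, rather than all 76 data files.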

Summary File 3 components




Geographic Identifier file (GeoID)
76 data files containing different sets of variables
GeoID file is linked to each of the 76 data files by a variable named LogRecNo
The Summary Level must be selected from the GeoID file to extract the desired stratification level. This identifies the specific area being tabulated.
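The link between the GeoID file and a data file can be pictured as a keyed join followed by a filter on summary level. The Python sketch below uses small in-memory rows instead of the real files; apart from LogRecNo, every field name (`SumLev`, `GeoComp`, `median_hh_income`) and every value is an assumption made for illustration.

```python
# Hypothetical GeoID rows: one per tabulated area
geo_rows = [
    {"LogRecNo": "0000001", "SumLev": "040", "GeoComp": "00"},
    {"LogRecNo": "0000002", "SumLev": "140", "GeoComp": "00"},
    {"LogRecNo": "0000003", "SumLev": "150", "GeoComp": "00"},
    {"LogRecNo": "0000004", "SumLev": "150", "GeoComp": "00"},
]

# Hypothetical rows from one data file, keyed by the same LogRecNo
data_rows = [
    {"LogRecNo": "0000001", "median_hh_income": 41000},
    {"LogRecNo": "0000002", "median_hh_income": 39000},
    {"LogRecNo": "0000003", "median_hh_income": 36500},
    {"LogRecNo": "0000004", "median_hh_income": 44200},
]


def select_level(geo_rows, data_rows, sumlev, geocomp="00"):
    """Join data rows to GeoID rows on LogRecNo, keeping one summary level."""
    wanted = {
        g["LogRecNo"]: g
        for g in geo_rows
        if g["SumLev"] == sumlev and g["GeoComp"] == geocomp
    }
    return [{**wanted[d["LogRecNo"]], **d} for d in data_rows if d["LogRecNo"] in wanted]


# Summary level 150 = State-County-Census Tract-Block Group
block_groups = select_level(geo_rows, data_rows, sumlev="150")
print(block_groups)
```

Filtering on the GeoID side first keeps the join small, since only the rows at the desired level are ever matched against the data file.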
Summary Level Sequence Chart
(partial listing)
Geographic component   Summary Level
00, 01-49, 52-95       040 State
00, 01, 43, 49         050 State-County
00                     060 State-County-County Subdivision
00                     070 State-County-County Subdivision-Place/Remainder
00                     080 State-County-County Subdivision-Place/Remainder-Census Tract
00                     085 State-County-County Subdivision-Place/Remainder-Census Tract-Urban/Rural
00                     090 State-County-County Subdivision-Place/Remainder-Census Tract-Urban/Rural-Block Group
00                     067 State [Puerto Rico Only]-County-County Subdivision-Subbarrio
00                     140 State-County-Census Tract
00                     144 State-County-Census Tract-American Indian Area/Alaska Native Area/Hawaiian Home Land
00                     150 State-County-Census Tract-Block Group
… more levels …
Subject Locator



Index designed to quickly identify tables in the summary file for particular subjects or topics of interest
Arranged alphabetically by name of subject
Each row contains the type of entry and the relevant table number for the data source
Subject Locator Index
(partial listing)
Subject Description                                               Subject Table Numbers
Median Income (dollars)
  Families                                                        P77
    by Family Type by Presence of Own Children Under 18 Years     PCT40
    by Presence of Own Children Under 18 Years                    PCT39
  Households                                                      P53
    by Age of Householder                                         P56
  Nonfamily Households                                            P80
    by Sex of Householder by Living Alone by Age of Householder   PCT42
  Occupied Housing Units
    by Tenure                                                     HCT12
  Population 15 Years and Over With Income
    by Sex by Work Experience                                     PCT45
… more subjects …
Summary of steps for identifying variables and merging with cohort
1. Use Subject Locator to identify variables of interest and their corresponding table numbers
2. Use File Segmentation Table to identify specific data file(s) for each table number
3. Use the Summary Level Sequence Chart to locate the desired stratification level
4. Identify SAS input statements to read each file
5. Merge census variables with existing cohort data
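The final merge step can be sketched as a left join of the cohort onto block-group-level census rows. The slides describe doing this in SAS; the Python below is only an illustration, and the record layouts, the `bg_key` field, and all values are invented for the example.

```python
# Hypothetical census rows already reduced to block-group level,
# keyed by a state + county + tract + block-group string
census_by_bg = {
    "010730045002": {"median_hh_income": 38000, "total_pop": 1200},
}

# Hypothetical cohort records carrying the same key (e.g. derived from Zip+4)
cohort = [
    {"bene_id": "A1", "bg_key": "010730045002"},
    {"bene_id": "B2", "bg_key": "999999999999"},  # no census match
]


def attach_census(cohort, census_by_bg):
    """Left-join census variables onto cohort records; unmatched records are kept."""
    merged = []
    for person in cohort:
        census = census_by_bg.get(person["bg_key"], {})
        merged.append({**person, **census, "matched": person["bg_key"] in census_by_bg})
    return merged


merged = attach_census(cohort, census_by_bg)
for row in merged:
    print(row)
```

Keeping unmatched records with a `matched` flag, rather than dropping them, makes it easy to count how many cohort members could not be linked.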
Conclusion




Daunting task due to the large volume of Census data and documentation
Well organized into a manageable set of distinct components
Flexibility comes at the cost of thousands of variables and data files
Process of extracting variables from Census data becomes much easier once all the pieces are in place
Contact information
Robert Matthews
University of Alabama at Birmingham
Department of Epidemiology
1665 University Blvd. RPHB 517
Birmingham, AL 35294-0022
Email: [email protected]
Web: www.epi.soph.uab.edu/rsm/