Transcript Access to Data Collected by the Census Bureau
Accessing Data Collected by the Census Bureau
THURSDAY, April 26, 2012
Quick Review
• 2010 decennial data is “short-form” only – limited demographic characteristics; ACS is now the source of “long-form” type data
• Census data is released in two “flavors”:
– Aggregate data
– Microdata
• A third type of data product identifies geographic boundaries
• Aggregate data is released in a variety of products, differing in content, geographic specificity and temporal coverage
• Microdata has the flexibility of individual-level information, but balances this with only gross geographic detail
Access and Resources
• Aggregate data resources
• Microdata resources
• Geography resources
• Local resources
Access and Resources
• Aggregate resources: American Factfinder 2, Social Explorer, DataFerrett, Uexplore/Dexter, NHGIS, Historical Census Browser, Geolytics NCDB
• Microdata resources:
– Online Analysis: SDA & IPUMS
– Extract/Download: IPUMS, ICPSR, NBER, DataFerrett, Census, Unicon
– Restricted Use: California Census Research Data Center
• Geographic: Census, MABLE/Geocorr, IPUMS, NHGIS
• Documentation: IPUMS, AFF2, ICPSR
• Visualization: Social Explorer, Historical Census Browser, AFF2
• Local Resources: DOF/DRU, SDCs, UC DATA, DataLab, CCRDC
Resources: Aggregate Census Data
American FactFinder II
– Datasets: Decennial, ACS, Economic Census, Pop Est., ++
– Temporal Coverage: 2000 to Current
– Geographic Grain: Block to National
– Ease of Use: Mixed
– Context: Glossaries, Links to Tech Doc
– Notes: First point of release; broad array of datasets; multiple ways to narrow search; “Deep Linking”; limited historic data

Social Explorer
– Datasets: Decennial, ACS
– Temporal Coverage: 1790-2010
– Geographic Grain: Tract to National
– Ease of Use: Very Good
– Context: Links under “Data”
– Notes: Both Map and Data interface; “Canned” reports; UCB Library (5 concurrent, proxy)

DataFerrett
– Datasets: Decennial (old), ACS, SAIPE, CBP, ++
– Temporal Coverage: 1990-2010 (varies by dataset)
– Geographic Grain: Block to Nation
– Ease of Use: Mixed
– Context: Limited Metadata
– Notes: Mixed in terms of currency; can be very slow; both Aggregate and Microdata

Uexplore/Dexter
– Datasets: Decennial, ACS, PopEst, SAIPE, BLS, CBP
– Temporal Coverage: 1980-2010
– Geographic Grain: Block to Nation
– Ease of Use: Non Intuitive
– Context: Very nice guides on Geography
– Notes: Great if familiar with interface, but steep learning curve

NHGIS
– Temporal Coverage: 1790-2010
– Geographic Grain: Only US, State, and County until 1910; 1910 onward larger lists
– Ease of Use: Clunky
– Context: Limited
– Notes: Register for account (free); my “go-to” site for historical geography (now partners with Social Explorer)

Historical Census Browser
– Temporal Coverage: 1790-1960
– Geographic Grain: State, County (no US totals)
– Ease of Use: Easy
– Context: Limited
– Notes: Quick comparison over decades
Just When You thought it was safe…..
American Factfinder II
The “new” American Factfinder
Data Search Strategy
Specifications act as a “sieve” – eliminating non-conforming tables, data, geographies, etc.
So… pick your most limiting conditions first.
– Need very detailed geography? Pick that first.
– Know your base dataset? Pick that early.
– Need data for 2010? Limit your search from the start.
Think about “bookmarking” for geographies or items you’ll return to
Alternative to AFF: FTP Full Files
AFF 2: Deep Linking
Deep Linking in AFF2
http://factfinder2.census.gov/legacy/AFF_deep_linking_guide.pdf
factfinder2.census.gov/bkmk/table/version/lang/program/dataset/product[/geo_id[|geo_id]*][/codetype~code[|code]*]*
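The pattern above can be read as a simple string-assembly rule. The following is a minimal sketch of that rule only; the function name `build_deep_link` and its parameter names are made up for illustration, and the code builds the address without contacting the site.

```python
def build_deep_link(version, lang, program, dataset, product,
                    geo_ids=(), codes=None):
    """Assemble a deep-link URL following the bkmk/table pattern.

    geo_ids: optional sequence of geography identifiers, joined with "|".
    codes:   optional dict mapping a code type to a list of codes; each
             entry becomes one "/codetype~code|code" segment.
    """
    base = "http://factfinder2.census.gov/bkmk/table"
    parts = [base, version, lang, program, dataset, product]
    if geo_ids:
        parts.append("|".join(geo_ids))
    for codetype, values in (codes or {}).items():
        parts.append(codetype + "~" + "|".join(values))
    return "/".join(parts)


# Example: one ACS table for two census tracts.
link = build_deep_link(
    "1.0", "en", "ACS", "10_5YR", "B01001A",
    geo_ids=["1400000US06001406100", "1400000US06001406201"],
)
print(link)
```

Keeping the geographies as a list makes it easy to reuse the same table request for a different set of tracts.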
Deep Linking
Same FTP Options for ACS
Historical Census Data Browser
Example: Class Assignment
Describe, in broad terms, the demographics of the Fruitvale community: population size, SES, race, ethnicity, nativity, age, education, occupation, etc. Over time?
Questions:
– What is Fruitvale? A place? A CDP? A neighborhood?
– How will we define our geography? Does this limit anything?
– What data sets are available for evaluation?
– What data items do we want?
http://www.acphd.org/media/53462/fruitvale.pdf
Tracts 4061-4063, 4065-4066, 4070-4072
Deep Linking approach
http://factfinder2.census.gov/bkmk/table/1.0/en/ACS/10_5YR/B01001A/1400000US06001406100|1400000US06001406201|1400000US06001406202|1400000US06001406300|1400000US06001406400|1400000US06001406500|1400000US06001406601|1400000US06001406602|1400000US06001407000|1400000US06001407101|1400000US06001407102|1400000US06001407200
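Each geography in the link above is a tract identifier: the prefix 1400000US, then state code 06, county code 001, and a six-digit tract code (so tract 4062.01 becomes 406201). A short sketch of how such a list could be generated rather than typed by hand follows; the helper name `tract_geo_id` is hypothetical, and tract numbers are passed as strings to avoid floating-point surprises.

```python
def tract_geo_id(state_fips, county_fips, tract):
    """Return the tract identifier used in the link, e.g. '4062.01' -> '...406201'."""
    whole, _, suffix = str(tract).partition(".")
    # Four-digit base number plus a two-digit suffix ("00" when there is none).
    tract_code = whole.zfill(4) + suffix.ljust(2, "0")
    return "1400000US" + state_fips + county_fips + tract_code


# The twelve tracts that appear in the example link, written as tract numbers.
tracts = ["4061", "4062.01", "4062.02", "4063", "4064", "4065",
          "4066.01", "4066.02", "4070", "4071.01", "4071.02", "4072"]

geo_ids = [tract_geo_id("06", "001", t) for t in tracts]
url = ("http://factfinder2.census.gov/bkmk/table/1.0/en/ACS/10_5YR/B01001A/"
       + "|".join(geo_ids))
print(url)
```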
Micro-data Resources
Survey Documentation and Analysis (SDA) and the Integrated Public Use Microdata Series (IPUMS)
The Integrated Public Use Microdata Series
www.ipums.org
at the Minnesota Population Center
• IPUMS-USA: Harmonized data on people in the U.S. census and American Community Survey, from 1850 to the present.
• IPUMS-CPS: Harmonized data on people in the Current Population Survey, every March from 1962 to the present.
Important!
• Harmonized: Questions asked change over time. How to make data comparable?
• Integrated: Multiple data collections & surveys simultaneously available.
• Microdata: The underlying individual-level data is available, not just pre-defined tables.
The American Community Survey and the Current Population Survey
CPS – Long-running monthly survey (dating back to the 1940s) focused on labor force characteristics (unemployment, earnings, hours worked).
• ~55,000 sample HHs, multiple interviews, personal
• In addition to the basic monthly questions, additional modules are “piggy-backed” onto the survey to provide more depth on particular topics.
• The most widely used supplement is the Annual Social and Economic Supplement (ASEC), aka the Annual Demographic Survey or the March Files (~100,000 HHs).
• In-depth survey: lots of detail about sources of income, work, occupation, hours, etc. (as well as core demographic information on race/ethnicity, nativity, age, sex, education)
The American Community Survey and the Current Population Survey
ACS – “New” continuous survey that replaces the long form of the decennial census; first fully implemented in 2005 (non-institutionalized) and 2006 (institutionalized).
• ~2,000,000 HHs annually, mixed mail-in/personal interviews
• Substantial overlapping content with CPS
• Broader range of content, somewhat less detail
• Larger sample sizes allow for greater geographic detail
The American Community Survey and the Current Population Survey
Aggregate vs. Microdata
The Integrated Public Use Microdata Series
www.ipums.org
at the Minnesota Population Center
Strengths:
• Tremendous centralized documentation
• Many “value-added” data items
• Wonderful extraction engine (if downloading data)
• Multiple statistical packages supported
• Online analysis also possible
The Integrated Public Use Microdata Series
Online Analysis Links
The Basics of SDA
What is SDA?
What can you do with SDA?
The parts of the SDA interface
• Menu
• Variable List
• Active variables
• Analysis Specification
Part II. Working with SDA
1. Parts of the SDA interface
2. Finding data/variables/subjects
- search
- documentation
3. Analysis
Components - rows, columns, selection, controls
Procedures - crosstabs, means, correlations
4. Aids in Analysis
- Recoding
- Saving new variables
- Downloading
The Basics of SDA
What is SDA?
SDA (Survey Documentation and Analysis) is a set of programs for the documentation and Web-based analysis of survey data.
It was developed and is maintained by the Computer-assisted Survey Methods Program (CSM) at UC Berkeley.
It was developed as a companion program to CASES (Computer-Assisted Survey Execution System), a package for collecting survey data based on structured questionnaires, using a variety of modes of data collection.
It operates on a transposed file structure, which makes analysis of datasets, especially large datasets, extremely fast.
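To see why a transposed layout helps, here is a toy illustration of the general idea: store one list per variable instead of one record per respondent. This is not SDA's actual file format, and the records and the `transpose` helper are invented for the example.

```python
from collections import Counter

# Hypothetical toy survey, row-oriented: one dict per respondent.
rows = [
    {"age": 34, "sex": "F", "educ": 16},
    {"age": 51, "sex": "M", "educ": 12},
    {"age": 29, "sex": "F", "educ": 12},
]


def transpose(records):
    """Turn per-respondent records into one list of values per variable."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = transpose(rows)

# A frequency table for one variable now reads only that variable's list,
# no matter how many other variables the dataset carries.
sex_counts = Counter(columns["sex"])
print(sex_counts)
```

With hundreds of variables per respondent, a table that uses two or three of them only has to read those two or three lists.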
Part I. The Basics of SDA
What data is available in SDA?
LOTS!
Many popular social science datasets (e.g. the GSS, the ANES, the PUMS from the Decennial Census, the ACS, the CPS Annual Demographic Files, …) can be found in SDA format.
Many archives (ICPSR, IPUMS, CPANDA, Roper, SDA, UCDATA….) provide at least some of their holdings in SDA format.
Multiple Census Samples at IPUMS
(http://usa.ipums.org/usa/sda/)
And CPS (March files) data, as well
(http://cps.ipums.org/cps/sda/)
The Basics of SDA
What can you do with SDA?
SDA can be used to:
• learn about a dataset (metadata, paradata)
• search for variables of interest
• investigate sample sizes and variable distributions
• perform statistical analyses
• transform, manipulate and create variables for each unit
• extract and download subsets or full datasets
Part I. The Basics of SDA
The four parts of the SDA interface
• Action Menu
• Variable List
• Active Variable
• Analysis Specification
[Screenshot labels: Action Menu, Collapsed Variable Tree, Active Variables, Analysis Specification]
2. Finding data/variables/subjects
• Online SDA codebook
• IPUMS detailed documentation
Working with SDA
Analysis
Components - rows, columns, selection, controls
Procedures - crosstabs, means, correlations
Screens will vary depending upon what procedure you are using.
Start with exploratory – frequencies, cross-tabulations
• The variables you are interested in
• Who to include in the table
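Outside the web interface, the same exploratory step (pick a row variable, a column variable, and a rule for who to include) can be sketched in a few lines. The records, variable names, and the `crosstab` helper below are all hypothetical and only illustrate the logic; they are not SDA code.

```python
from collections import Counter

# Hypothetical person records.
people = [
    {"sex": "F", "educ": "HS", "age": 17},
    {"sex": "F", "educ": "BA", "age": 30},
    {"sex": "M", "educ": "HS", "age": 45},
    {"sex": "M", "educ": "HS", "age": 22},
    {"sex": "F", "educ": "HS", "age": 60},
]


def crosstab(records, row, column, include=lambda r: True):
    """Count records in each (row value, column value) cell.

    `include` plays the role of the selection filter: only records for
    which it returns True are tabulated.
    """
    counts = Counter()
    for r in records:
        if include(r):
            counts[(r[row], r[column])] += 1
    return counts


# Education by sex, adults only.
table = crosstab(people, "educ", "sex", include=lambda r: r["age"] >= 18)
for (educ, sex), n in sorted(table.items()):
    print(educ, sex, n)
```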
Part II. Working with SDA
Aids in Analysis
• Recoding
• Saving new variables
• Downloading
Recoding variables – on the fly
Can be used in row, column, control (Crosstabs)
age (5-18)      Selects, but does not collapse
age (r: 5-18)   Selects AND collapses
age (d: 5-18)   Collapses, but does not select
age (c:st,w)    Collapses into categories of width w, starting with value st, e.g. age (c:13,5)
Recoding variables – Web interface
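The difference between selecting and collapsing is easier to see on a small list of ages. The sketch below imitates three of the recode ideas in plain Python: select only, select and collapse, and fixed-width categories. It is not SDA's implementation; the function names are invented, ages are assumed to be whole numbers, and dropping values below the starting point is an assumption of this example.

```python
def select(values, low, high):
    """Keep only values in [low, high]; each value stays its own category."""
    return [v for v in values if low <= v <= high]


def select_and_collapse(values, low, high):
    """Keep only values in [low, high] and label them as one category."""
    label = f"{low}-{high}"
    return [label for v in values if low <= v <= high]


def collapse_by_width(values, start, width):
    """Group values at or above `start` into categories `width` wide."""
    labels = []
    for v in values:
        if v < start:
            continue  # assumption: values below the starting point are dropped
        lo = start + ((v - start) // width) * width
        labels.append(f"{lo}-{lo + width - 1}")
    return labels


ages = [4, 5, 12, 13, 17, 18, 19, 40]

print(select(ages, 5, 18))               # individual ages between 5 and 18
print(select_and_collapse(ages, 5, 18))  # the same cases, one "5-18" category
print(collapse_by_width(ages, 13, 5))    # 5-year groups starting at age 13
```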
Question 1: Use the CPS or ACS?
Question 2: What is the desired level of analysis (person, family, household)?
Question 3: Who should be excluded?
(How to limit to family households, or only particular age groups, or …?)
DataFerrett Content
Geographic Resources
Selected Data Resources at Berkeley
Library Data Lab: http://sunsite3.berkeley.edu/wikis/datalab/
SDA (Survey Documentation & Analysis): http://sda.berkeley.edu/
Statewide Database: http://swdb.berkeley.edu/
California Census Research Data Center: http://www.ccrdc.ucla.edu/
The Econometrics lab: http://emlab.berkeley.edu/data2.shtml
Thomas J. Long Business & Economics Library: http://www.lib.berkeley.edu/BUSI/electres.html
Questions/Comments
email me at: [email protected]
http://ucdata.berkeley.edu