Public Use Microdata Samples Using PDQ Explore Software Grace York University of Michigan Library May 2004

Public Use Microdata
Samples
Using PDQ Explore Software
Grace York
University of Michigan Library
May 2004
2000 Census Data
Tabulations
• Summary Files 1-4, Equal Employment
Opportunity, School District Data, and
Work Flow data are TABULATED data
• American FactFinder EXTRACTS the
tabulated data
Public Use Microdata
Samples
• Copies of the original questionnaires
with identifying information edited
out
• Create your own cross tabulations
of census data
Typical PUMS Questions
• Single years of age by sex for teachers in
Michigan (e.g. when will they retire?)
• Race of those with Arab ancestry (no, they are
not all white)
• Demographic characteristics of immigrants from
Senegal (age, sex, education, occupation,
income, citizenship for a social survey)
• Age, race and sex of automotive industry
employees (campaign for organ donations)
PUMS Software Programs
• FTP data from Census Bureau (and manipulate
with SAS or SPSS)
http://www.census.gov/PressRelease/www/2003/PUMS5.html
• Census Bureau CD-ROMS (Beyond 20/20
software)
http://www.census.gov/mp/www/Tempcat/PUMS.html
• SDA Software for Michigan (UMich Only)
http://nds.umdl.umich.edu/n/nds/
• PDQ Explore
http://www.pdq.com
PDQ Explore Software
• Easy interface to
– Public Use Microdata Samples, 1 and 5%,
1980-2000
– IPUMS, edited PUMS, 1850-1880, 1900-1920, 1940-1990
– Current Population Survey, 1991+
– Mortality Schedules
• Permits users to tabulate their own
variables
Access to PDQ
• Librarians may request free IDs, passwords, and
software from PDQ
• Send e-mail to [email protected]
– You are a librarian who talked to Grace York
– Requesting ID and password for using PDQ
Explore
– Want to download software for the PDQ
Toolbox, Expert Edition
http://www.pdq.com
Software
• Download the software per
instructions to your hard drive
• To begin searching, open the icon on
your desktop
Before Beginning …
Choose File
Two PUMS files – 1% and 5% sample
• 1% has data for the nation, states,
MSAs and super-Pumas (areas of
400,000)
• 5% has data for the nation, states,
MSAs and Pumas (areas of 100,000)
Before Beginning…
Define the data you want in terms of a
spreadsheet. The longer part should be
defined as rows rather than columns.
I want single years of age by sex for all
Vietnam-era veterans in the United
States
Universe = Vietnam-era veterans in the U.S.
Column=sex (not very wide)
Row=single years of age (could be long)
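The universe/row/column layout can also be sketched outside PDQ Explore. The Python fragment below is only an illustration, not PDQ's own method: the records are invented toy values, the field names simply mirror the codebook variables AGE, SEX and VPS5, and the helper name crosstab is hypothetical.

```python
from collections import Counter

# Toy person records (invented for illustration; not real census data).
# vps5 == 1 marks a Vietnam-era veteran, as in the codebook example.
records = [
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 2, "vps5": 1},
    {"age": 55, "sex": 1, "vps5": 1},
    {"age": 55, "sex": 1, "vps5": 0},
    {"age": 60, "sex": 1, "vps5": 1},
]

def crosstab(rows, universe, row_var, col_var):
    """Count records in the universe by (row value, column value)."""
    counts = Counter()
    for rec in rows:
        if universe(rec):
            counts[(rec[row_var], rec[col_var])] += 1
    return counts

table = crosstab(records, lambda r: r["vps5"] == 1, "age", "sex")

# One spreadsheet row per age, one column per sex code.
cols = sorted({c for _, c in table})
for age in sorted({a for a, _ in table}):
    print(age, [table[(age, c)] for c in cols])
```

Putting the long variable (age) on the rows keeps the printed table narrow, which is the same reasoning as the spreadsheet advice above.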
Before Beginning…
Consult Chapter 7 of the PUMS codebook if you
want to check the possible variables and the
appendices for place/language/ancestry and
occupation codes
http://www.census.gov/prod/cen2000/doc/pums.pdf
Chapter 7 is also available on the University of Michigan web site at:
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf
Before Beginning…
Housing Record
All geographic codes (state, MSA, PUMA)
All housing records
Some population records
Population Record
All population variables
Ok to combine with geographic codes in housing
Ask for help for other population/housing
combinations at: [email protected]
Before Beginning…
Variable Codes for the Question
in the Technical Documentation Data Dictionary
AGE     Single Years of Age
SEX     Male or Female
VPS5    Veteran’s Period of Service 5: On active duty during the
        Vietnam Era (Aug. 1964 to Apr. 1975)
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf
Logging On
Enter the subscriber name and password that
you were given by the PDQ staff
Logging On
Press OK to close the message of the
day
Defining Workspace
• To conduct a new search, create a new
workspace
• Press Finish or return twice
Defining Workspace
Name your file on your hard drive and save.
Defining Workspace
At the next screen, use the top menu to choose
Workspace; then Add a Data Set
Defining Workspace
Browse data sets; highlight ipums, pums, cps, or
mortality file; Open
Defining Variables
• Once you choose a data set, its codebook will open up
• Click on the plus button to get a list of variables, their
alphabetic symbols, and any numeric values
Defining Variables
• Determine the alphanumeric variables you want
(e.g. Vietnam-era veteran: yes is VPS5=1)
• Use Top Menu to Choose Query/Setup New Expert Query
(Access the codebook later through a tab on the desktop toolbar)
Expert Query Form
1. Make sure you have the correct data set
2. Determine if you want a tabulation (counts or numbers)
3. Name your file
Expert Query Form
Enter the code for UNIVERSE (what you’re counting)
in the Universe box (e.g. vps5=1 are Vietnam-era veterans
for the entire U.S.)
Expert Query Form
• Enter the code for the variables in the ROW box
(age = single years of age; age/5 would be five-year age groups)
• Enter the code for the variables in the COLUMN box (e.g. sex)
• Press RESULTS to run the query
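The age/5 grouping can be pictured as follows. This is a sketch under the assumption that age/5 truncates to a whole number, so that ages 50 through 54 land in the same group; the labels PDQ Explore actually shows may differ, and the function name five_year_group is hypothetical.

```python
from collections import Counter

def five_year_group(age):
    """Label the five-year band containing age (assumes age/5 truncates)."""
    start = (age // 5) * 5
    return f"{start}-{start + 4}"

ages = [50, 52, 54, 55, 59, 60]  # invented toy values
bands = Counter(five_year_group(a) for a in ages)
print(bands)
```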
Search Results
Search results appear in spreadsheet format
Saving Results
• Click on File/Export Query Results
• You can save as CSV, tab delimited, and several other formats.
CSV (WYSIWYG) recommended for use with Excel
• Use SETUP button to return to query or icon at bottom to review
the codebook
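An exported CSV file can also be read by a script instead of Excel. The sketch below uses Python's standard csv module; the layout shown (a header line of column codes, then one line per row value) is an assumption about the export rather than something confirmed by PDQ, and the counts are invented.

```python
import csv
import io

# Stand-in for an exported file; layout and numbers are assumptions.
exported = io.StringIO(
    "age,1,2\n"
    "52,10,3\n"
    "53,12,4\n"
)

reader = csv.reader(exported)
header = next(reader)  # first line holds the column labels
table = {row[0]: [int(v) for v in row[1:]] for row in reader}

# Row totals, similar in spirit to adding a total column in a spreadsheet.
totals = {age: sum(counts) for age, counts in table.items()}
print(totals)
```

For a real file, the io.StringIO stand-in would be replaced by open() on whatever file name was chosen at export time.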
Geographic Codes
• Geographic codes are found in the Housing documentation
• Limit files to Michigan with the code state=26
• Click on Query/New Expert Query to continue
Narrowing the Universe
Narrow the universe by using & newcode (e.g.
vps5=1 & state=26)
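In other words, & keeps only records that satisfy both conditions. A minimal Python sketch of the same idea, using invented records (state 26 is Michigan, as above; 39 is used here only as a second, different state code):

```python
# Toy records (invented for illustration).
records = [
    {"vps5": 1, "state": 26},
    {"vps5": 1, "state": 39},
    {"vps5": 0, "state": 26},
    {"vps5": 1, "state": 26},
]

def in_universe(rec):
    # Mirrors vps5=1 & state=26: both conditions must hold.
    return rec["vps5"] == 1 and rec["state"] == 26

selected = [r for r in records if in_universe(r)]
print(len(selected))
```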
Logical Operators in PDQ
http://www.lib.umich.edu/govdocs/census2/pdqop.pdf
& is one of numerous operators used in PDQ
Operator    Name                    Example/Comment
X:a..b      range                   age:15..44
unary +     plus                    sex=+1 (never needed)
unary -     minus                   income4<=-1000
*           multiply                73*income1/100
/           divide                  rhhinc/persons
%           modulo                  subsample%10
+           add                     income1+income2
-           subtract                rhhinc-rearning
<           less than               age<65
>           greater than            age>64
<=          less than or equal      age<=65
>=          greater than or equal   age>=65
= or ==     equal                   age=23
!= or <>    not equal               income!=0
& or &&     and                     race=2 & looking=1
^           exclusive or            bit-wise--use with caution
| or ||     or                      age<18 | age>=65
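For readers more used to a scripting language, a few of the operators above translate roughly as shown below. This is a hedged comparison, not a description of how PDQ evaluates expressions: the record is invented, and the range operator is assumed to include both end points.

```python
# Hypothetical record; variable names come from the examples above,
# and the values are invented.
rec = {"age": 30, "race": 2, "looking": 1,
       "income1": 20000, "income2": 500, "subsample": 47}

in_range = 15 <= rec["age"] <= 44                  # age:15..44 (assumed inclusive)
either = rec["age"] < 18 or rec["age"] >= 65       # age<18 | age>=65
both = rec["race"] == 2 and rec["looking"] == 1    # race=2 & looking=1
total_inc = rec["income1"] + rec["income2"]        # income1+income2
last_digit = rec["subsample"] % 10                 # subsample%10

print(in_range, either, both, total_inc, last_digit)
```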
Altering the Spreadsheet
Tabulations
Once you have a spreadsheet, click on Options to
create totals or percentages for tables or columns
Adding More Parameters
Expand the table detail by repeating the row and column
data for another parameter (e.g. race) as shown in
Dimension 3
Altering Spreadsheet
Appearance
• The default shows separate tables for each of the values in the
third dimension (e.g. separate spreadsheets for white and black)
• Change the Axis3 tab to FOREACH to put everything on the same spreadsheet
Calculating Means or Averages
• Calculate averages by changing the query type to summary
statistics (e.g. mean or average) at the top
• Fill in the new Describe Expression box at the bottom with a
variable code (e.g. age, income)
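Conceptually, a summary-statistics query replaces the count in each cell with a statistic of the described variable. A rough Python sketch, with invented records and a hypothetical helper name mean_by (the field name income is a placeholder, not a codebook variable):

```python
from collections import defaultdict
from statistics import mean

# Toy records (invented for illustration).
records = [
    {"age": 52, "vps5": 1, "income": 30000},
    {"age": 52, "vps5": 1, "income": 50000},
    {"age": 55, "vps5": 1, "income": 20000},
    {"age": 55, "vps5": 0, "income": 90000},
]

def mean_by(rows, universe, row_var, describe_var):
    """Mean of describe_var for each value of row_var, within the universe."""
    groups = defaultdict(list)
    for rec in rows:
        if universe(rec):
            groups[rec[row_var]].append(rec[describe_var])
    return {key: mean(values) for key, values in groups.items()}

means = mean_by(records, lambda r: r["vps5"] == 1, "age", "income")
print(means)
```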
Complex Table
Mean income of white male Vietnam-era veterans in Michigan
by age, whether or not they have earnings
You can respecify the universe to include only veterans with earnings
Altering Mean Income
Add & incws > 0 to universe to count only Vietnam-era
veterans who are earning more than $0
Complex Table
Mean income is higher when data limited to wage-earning
veterans
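The arithmetic behind this is simple: records with zero earnings pull the mean down, so dropping them raises it. A small illustration with invented incws values:

```python
from statistics import mean

# Invented incws values for four veterans; two have no earnings.
incws = [0, 0, 30000, 50000]

mean_all = mean(incws)                            # universe without incws > 0
mean_earners = mean([v for v in incws if v > 0])  # universe with & incws > 0
print(mean_all, mean_earners)
```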
Small Area Geography
• Data from the PUMS 5% file is available for
states, metropolitan areas, and Public Use
Microdata Areas (PUMAS) of 100,000
• You can identify a PUMA or group of PUMAs
using
– Maps in American FactFinder (http://factfinder.census.gov/)
– PDF maps on the Census Bureau web site
(http://www.census.gov/geo/www/maps/puma5pct.htm)
– Mable/Geocorr Search Engine
(http://mcdc2.missouri.edu/websas/geocorr2k.html)
Small Area Geography
This map shows Detroit as PUMAs 3701-3708
PUMA Codes for Michigan
Ann Arbor       3200
Detroit         3701-3708
Flint           2200
Grand Rapids    1300
Lansing         1800
PUMA to Place
http://www.lib.umich.edu/govdocs/census2/pumapl00.txt
Place to PUMA
http://www.lib.umich.edu/govdocs/census2/plpuma00.txt
Codebook and PUMAS
The Explore Codebook shows PUMA5 as term for
5% PUMA boundaries
Small Area Geography and
Ranges
When creating data sets for PUMAS, be sure to
include the correct state as the universe (e.g.
state=26)
Small Area Geography and
Ranges
puma5:3701..3708 will list the data for each individual area
Small Area Geography and
Ranges
Search result for each individual PUMA
Small Area Geography for Ranges
To get the total for the area, list it in the universe as
puma5 >3700 & puma5 <3709 & state=26
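The difference between listing each PUMA and totalling the whole area can be sketched as below. The records are invented; the out-of-state record is included on the assumption that PUMA codes are only unique within a state, which would explain why state=26 belongs in the universe.

```python
from collections import Counter

# Toy records (invented). The last one shares a PUMA code but is in another state.
records = [
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3705},
    {"state": 26, "puma5": 3200},
    {"state": 39, "puma5": 3701},
]

# Range on an axis: one count per individual PUMA.
per_puma = Counter(
    r["puma5"] for r in records
    if r["state"] == 26 and 3701 <= r["puma5"] <= 3708
)

# Comparisons in the universe: a single total for the whole area.
area_total = sum(
    1 for r in records
    if r["puma5"] > 3700 and r["puma5"] < 3709 and r["state"] == 26
)

print(dict(per_puma), area_total)
```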
Small Area Geography for Ranges
To get a listing of single years of age between 65 and 85,
list column as age: 65..85
Calculating Totals
• To calculate the most spoken languages for 65-85 year
olds as a group, click on Options/Total Options/Row
Complex Result
Spanish and Polish are two most popular
languages spoken by seniors 65-85 in Detroit
Contacts for Research
Assistance
Initial Queries
Grace York, Documents Center, 203 Hatcher,
[email protected] or 936-2378
JoAnn Dionne, Numeric and Spatial Data Services, 825
Hatcher, [email protected],
763-9408
Complex Data Sets
Lisa Neidert, Population Studies Center, 426 Thompson,
[email protected], 763-2163
PDQ Staff, 310 Depot Street, Suite C, Ann Arbor
48104, [email protected]