Public Use Microdata Samples Using PDQ Explore Software Grace York University of Michigan Library May 2004
2000 Census Data
Tabulations
• Summary Files 1-4, Equal Employment Opportunity, School District Data, and Work Flow data are TABULATED data
• American Factfinder EXTRACTS the tabulated data
Public Use Microdata Samples
• Copies of the original questionnaires with identifying information edited out
• Create your own cross tabulations of census data

Typical PUMS Questions
• Single years of age by sex for teachers in Michigan (e.g. when will they retire?)
• Race of those with Arab ancestry (no, they are not all white)
• Demographic characteristics of immigrants from Senegal (age, sex, education, occupation, income, citizenship for a social survey)
• Age, race and sex of automotive industry employees (campaign for organ donations)

PUMS Software Programs
• FTP data from Census Bureau (and manipulate with SAS or SPSS) http://www.census.gov/PressRelease/www/2003/PUMS5.html
• Census Bureau CD-ROMS (Beyond 20/20 software) http://www.census.gov/mp/www/Tempcat/PUMS.html
• SDA Software for Michigan (UMich Only) http://nds.umdl.umich.edu/n/nds/
• PDQ Explore http://www.pdq.com

PDQ Explore Software
• Easy interface to
– Public Use Microdata Samples, 1 and 5%, 1980-2000
– IPUMS, edited PUMS, 1850-1880, 1900-1920, 1940-1990
– Current Population Survey, 1991+
– Mortality Schedules
• Permits users to tabulate their own variables

Access to PDQ
• Librarians may request free IDs, passwords, and software from PDQ
• Send e-mail to [email protected]
– You are a librarian who talked to Grace York
– Requesting ID and password for using PDQ Explore
– Want to download software for the PDQ Toolbox, Expert Edition
http://www.pdq.com

Software
• Download the software per instructions to your hard drive
• To begin searching, open the icon on your desktop

Before Beginning … Choose File
Two PUMS files – 1% and 5% sample
• 1% has data for the nation, states, MSAs and super-Pumas (areas of
400,000)
• 5% has data for the nation, states, MSAs and Pumas (areas of 100,000)

Before Beginning…
Define the data you want in terms of a spreadsheet. The longer part should be defined as rows rather than columns.
I want single years of age by sex for all Vietnam-era veterans in the United States
Universe = Vietnam-era veterans in the U.S.
Column = sex (not very wide)
Row = single years of age (could be long)

Before Beginning…
Consult Chapter 7 of the PUMS codebook if you want to check the possible variables and the appendices for place/language/ancestry and occupation codes
http://www.census.gov/prod/cen2000/doc/pums.pdf
Chapter 7 is also available on the University of Michigan web site at:
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf

Before Beginning…
Housing Record
• All geographic codes (state, MSA, PUMA)
• All housing records
• Some population records
Population Record
• All population variables
• Ok to combine with geographic codes in housing
Ask for help for other population/housing combinations at: [email protected]

Before Beginning…
Variable Codes for the Question in the Technical Documentation Data Dictionary
AGE   Single Years of Age
SEX   Male or Female
VPS5  Veteran’s Period of Service 5: On active duty during the Vietnam Era (Aug. 1964 to Apr. 1975)
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf

Logging On
Enter the subscriber name and password that you were given by the PDQ staff

Logging On
Press OK to close the message of the day

Defining Workspace
• To conduct a new search, create a new workspace
• Press Finish or return twice

Defining Workspace
Name your file on your hard drive and save.
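
The universe/row/column layout described in the "Before Beginning" slides can be sketched outside PDQ as a small Python tabulation. This is only an illustration of the idea, not PDQ Explore's implementation: the records are invented, the function name tabulate is hypothetical, the sex coding (1 = male, 2 = female) is an assumption, and the counts are raw record counts with no sample weighting.

```python
from collections import Counter

# Hypothetical person records; every value is invented for illustration.
# Assumed coding: sex 1 = male, 2 = female; vps5 = 1 means Vietnam-era veteran.
records = [
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 2, "vps5": 1},
    {"age": 55, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 30, "sex": 1, "vps5": 0},  # not in the universe
]


def tabulate(rows, universe, row_var, col_var):
    """Count the records inside the universe by (row value, column value)."""
    counts = Counter()
    for rec in rows:
        if universe(rec):
            counts[(rec[row_var], rec[col_var])] += 1
    return counts


# Universe = vps5=1, Row = age, Column = sex
table = tabulate(records, lambda r: r["vps5"] == 1, "age", "sex")

for (age, sex), n in sorted(table.items()):
    print(age, sex, n)
```

Putting age on the rows mirrors the advice above: the variable with many values runs down the page, and the two-valued variable runs across.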
Defining Workspace
At the next screen, use the top menu to choose Workspace; then Add a Data Set

Defining Workspace
Browse data sets; highlight ipums, pums, cps, or mortality file; Open

Defining Variables
• Once you choose a data set, its codebook will open up
• Click on the plus button to get a list of variables, their alphabetic symbols, and any numeric values

Defining Variables
• Determine the alphanumeric variables you want (e.g. Vietnam-era veteran: yes is VPS5=1)
• Use Top Menu to Choose Query/Setup New Expert Query
(Access the codebook later through a tab on the desktop toolbar)

Expert Query Form
1. Make sure you have the correct data set
2. Determine if you want a tabulation (counts or numbers)
3. Name your file

Expert Query Form
Enter the code for UNIVERSE (what you’re counting) in the Universe box (e.g. vps5=1 are Vietnam-era veterans for the entire U.S.)

Expert Query Form
• Enter the code for the variables in the ROW box (age = single years of age; age/5 would be five year age groups)
• Enter the code for the variables in the COLUMN box (e.g. sex)
• Press RESULTS to run the query

Search Results
Search results appear in spreadsheet format

Saving Results
• Click on File/Export Query Results
• You can save as CSV, tab delimited and several other formats. CSV (WYSIWYG) recommended for use with Excel
• Use SETUP button to return to query or icon at bottom to review the codebook

Geographic Codes
• Geographic codes are found in the Housing documentation
• Limit files to Michigan with the code state=26
• Click on Query/New Expert Query to continue

Narrowing the Universe
Narrow the universe by using & newcode (e.g.
vps5=1 & state=26)

Logical Operators in PDQ
http://www.lib.umich.edu/govdocs/census2/pdqop.pdf
& is one of numerous operators used in PDQ

Operator    Name                   Example/Comment
X:a..b      range                  age:15..44
unary +     plus                   sex=+1 (never needed)
unary -     minus                  income4<=-1000
*           multiply               73*income1/100
/           divide                 rhhinc/persons
%           modulo                 subsample%10
+           add                    income1+income2
-           subtract               rhhinc-rearning
<           less than              age<65
>           greater than           age>64
<=          less than or equal     age<=65
>=          greater than or equal  age>=65
= or ==     equal                  age=23
!= or <>    not equal              income!=0
& or &&     and                    race=2 & looking=1
^           exclusive or           bit-wise; use with caution
| or ||     or                     age<18 | age>=65

Altering the Spreadsheet Tabulations
Once you have a spreadsheet, click on Options to create totals or percentages for tables or columns

Adding More Parameters
Expand the table detail by repeating the row and column data for another parameter (e.g. race) as shown in Dimension 3

Altering Spreadsheet Appearance
• The default shows separate tables for each of the values in the third dimension (e.g. separate spreadsheets for white and black)
• Change Axis3 tab to FOREACH everything on same spreadsheet

Calculating Means or Averages
• Calculate averages by changing the query type to summary statistics (e.g. mean or average) at the top
• Fill in the new Describe Expression box at the bottom with a variable code (e.g.
age, income)

Complex Table
Mean income of white male Vietnam-era veterans in Michigan by age, whether or not they have earnings
You can respecify only veterans with earnings

Altering Mean Income
Add & incws > 0 to universe to count only Vietnam-era veterans who are earning more than $0

Complex Table
Mean income is higher when data limited to wage-earning veterans

Small Area Geography
• Data from the PUMS 5% file is available for states, metropolitan areas, and Public Use Microdata Areas (PUMAS) of 100,000
• You can identify a PUMA or group of PUMAs using
– Maps in American Factfinder (http://factfinder.census.gov/)
– PDF maps on the Census Bureau web site (http://www.census.gov/geo/www/maps/puma5pct.htm)
– Mable/Geocorr Search Engine (http://mcdc2.missouri.edu/websas/geocorr2k.html)

Small Area Geography
This map shows Detroit as PUMAs 3701-3708

PUMA Codes for Michigan
Ann Arbor     3200
Detroit       3701-3708
Flint         2200
Grand Rapids  1300
Lansing       1800
PUMA to Place http://www.lib.umich.edu/govdocs/census2/pumapl00.txt
Place to PUMA http://www.lib.umich.edu/govdocs/census2/plpuma00.txt

Codebook and PUMAS
The Explore Codebook shows PUMA5 as term for 5% PUMA boundaries

Small Area Geography and Ranges
When creating data sets for PUMAS, be sure to include the correct state as the universe (e.g.
state=26)

Small Area Geography and Ranges
Puma5: 3701..3708 will list the data for each individual area

Small Area Geography and Ranges
Search result for each individual PUMA

Small Area Geography for Ranges
To get the total for the area, list it in the universe as puma5 >3700 & puma5 <3709 & state=26

Small Area Geography for Ranges
To get a listing of single years of age between 65 and 85, list column as age: 65..85

Calculating Totals
• To calculate the most spoken languages by 65-85 year olds as a group
• Click on Options/Total Options/Row

Complex Result
Spanish and Polish are two most popular languages spoken by seniors 65-85 in Detroit

Access to PDQ
• Librarians may request free Ids, passwords, and software from PDQ
• Send e-mail to [email protected]
– You are a librarian who talked to Grace York
– Requesting ID and password for using PDQ Explore
– Want to download software for the PDQ Toolbox, Expert Edition
http://www.pdq.com

Contacts for Research Assistance
Initial Queries
Grace York, Documents Center, 203 Hatcher [email protected] or 936-2378
JoAnn Dionne, Numeric and Spatial Data Services, 825 Hatcher, [email protected], 763-9408
Complex Data Sets
Lisa Neidert, Population Studies Center, 426 Thompson, [email protected], 763-2163
PDQ Staff, 310 Depot Street, Suite C, Ann Arbor 48104, [email protected]
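
Supplementary sketch: the difference between listing each PUMA and totaling a PUMA range, as described in the Small Area Geography slides, can be illustrated in Python. This is a rough analogy and not how PDQ Explore evaluates queries: the records and the field name lang are invented, the function in_universe is hypothetical, and the age range is folded into the filter for brevity even though the slides place it on the column axis.

```python
from collections import Counter

# Hypothetical records; every value below is invented for illustration.
records = [
    {"state": 26, "puma5": 3701, "age": 70, "lang": "Spanish"},
    {"state": 26, "puma5": 3704, "age": 66, "lang": "Spanish"},
    {"state": 26, "puma5": 3708, "age": 85, "lang": "Polish"},
    {"state": 26, "puma5": 3200, "age": 72, "lang": "Polish"},   # outside the PUMA range
    {"state": 26, "puma5": 3702, "age": 40, "lang": "Arabic"},   # outside the age range
    {"state": 39, "puma5": 3703, "age": 75, "lang": "Spanish"},  # a different state
]


def in_universe(rec):
    """Mirror puma5 >3700 & puma5 <3709 & state=26, limited to ages 65..85."""
    return (
        rec["state"] == 26
        and 3700 < rec["puma5"] < 3709
        and 65 <= rec["age"] <= 85  # a..b ranges include both endpoints
    )


selected = [r for r in records if in_universe(r)]

# Like a puma5:3701..3708 axis: one count per individual PUMA.
by_puma = Counter(r["puma5"] for r in selected)

# Like putting the range in the universe: one total for the whole area.
by_lang = Counter(r["lang"] for r in selected)

print(sorted(by_puma.items()))
print(by_lang.most_common(2))
```

Including state in the filter matters because PUMA codes are only meaningful within a state, which is why the slides insist on state=26 in the universe.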