
Lab 1: Background on the IPUMS and SPSS

IPUMS

The Integrated Public Use Microdata Series database

www.ipums.org

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data

Census Samples Included in the IPUMS

Sample     Description                Released  Density   HH (000s)  Persons (000s)  Vars  Size (Mb)
1850 PUMS  Free population            1994*     1 in 100  37         198             92    79
1860 PUMS  Free and slave population  2002*     1 in 100  66         354             94    141
1870 PUMS  General sample             2002      1 in 100  80         428             94    170
1880 PUMS  General sample             1994      1 in 100  107        503             123   204
1900 PUMS  General sample             2002*     1 in 200  208        870             94    361
1910 PUMS  General sample             1989*     1 in 250  113        480             125   198
1920 PUMS  General sample             1998      1 in 100  257        1037            122   433
1940 PUMS  General sample             1984      1 in 100  391        1351            174   584
1950 PUMS  General sample             1984      1 in 100  461        1922            170   798
1960 PUMS  General sample             1971*     1 in 100  579        1780            141   790
1970 PUMS  Form 1 State sample        1972      1 in 100  744        2030            206   929
1970 PUMS  Form 2 State sample        1972      1 in 100  744        2030            210   929
1970 PUMS  Form 1 Metro sample        1972      1 in 100  744        2030            203   929
1970 PUMS  Form 2 Metro sample        1972      1 in 100  744        2030            207   929
1970 PUMS  Form 1 Neighborhood        1972      1 in 100  744        2030            260   1016
1970 PUMS  Form 2 Neighborhood        1972      1 in 100  744        2030            264   1016
1980 PUMS  5% State sample            1983      1 in 20   4711       11337           276   5376
1980 PUMS  1% Metro sample            1983      1 in 100  942        2267            276   1075
1980 PUMS  1% Urban/rural sample      1983      1 in 100  942        2267            266   1075
1990 PUMS  5% State sample            1992      1 in 20   5528       12500           252   6039
1990 PUMS  1% Metro sample            1992      1 in 100  1106       2500            252   1208
1990 PUMS  1% Unweighted state        1995      1 in 100  1106       2500            252   1208
2000 C2SS  0.13% Sample               2003      1 in 750  158        372             258   207
2000 PUMS  1% Sample                  2003      1 in 100  1237       2819            256   1675

HH = household records; Persons = person records; Vars = number of variables.
* Preliminary sample available; larger sample currently being constructed.

TOTAL 27,372 Mb

WHAT ARE MICRODATA?

Individual-level data
• every record represents a separate person
• all of their individual characteristics are recorded
• users must manipulate the data themselves

Different from aggregate/summary/tabular data, for example:
• a disability table from www.factfinder.census.gov
• an occupation table from a published census volume from the library

1930 Census Population Schedule, made public April 2002

Raw Census Microdata from IPUMS

H9101000000030982025200090000001324101001000071000000008800000000
P9101000000030102520252120000000002109730111020010103212001182000
P9101000000030202520252120000000001109730111020020103622001181080
P9101000000030302520252120201010100009000199996030101122006990000
P9101000000030402520252120201010100009000199996030100912006990000
P9101000000030502520252120201010100009000199996030100712006990000
P9101000000030602520252120201010100009000199996030100612006990000
P9101000000030702520252120201010100009000199996030100422006990000
P9101000000030802520252120201010100009000199996030100322006990000
P9101000000030902520252120201010100009000199996030100222006990000
H9101000000040360025200030000001324101001000071000000008800000000
P9101000000040102520252120000000002103110101010010103011001021000
P9101000000040202520252120000000001103110101010020102121001021020
P9101000000040302520252120201010100003000199990030100111006990000
H9101000000050338025200030000001324101001000071000000008800000000
P9101000000050102520251200000000021031001070700101045120010520000
P9101000000050202520252120000000001103100107070020102522001051020
P9101000000050302520252120201010100003000199990030100722006990000
H9101000000060416025200040000001324101001000071000000008800000000
P9101000000060102520252120000000002104200119150010104912001192000
P9101000000060202520252120000000001104200119150020104922001192040
P9101000000060302520252120201010100004000199991030101922006990000
P9101000000060402520252120201010100004000199991030101522006990000

Condensed records with labeled columns (Relationship, Age, Sex, Race, Birthplace, Mother’s birthplace, Occupation):
H910000240000000088001001000220100
P910000020101032120010010010011504
P910000010201036220010010010011999
P910201000301011220060010010011999
P910201000301009120060010010011999
P910201000301007120060010010011999
P910201000301006120060010010011999
P910201000301004220060010010011999
P910201000301003220060010010011999
P910201000301002220060010010011999
H910000240000000088001001000110100
P910000020101030110010290510511310
P910000010201021210010290290171999
P910201000301001110060010290291999
H910000240000000088001001000220100
P910000020101045120010010010011100
P910000010201025220010010010011820
P910201000301007220060010010011999
H910000240000000088001001000220100
P910000020101049120010010010011100
P910000010201049220010010010011820
P910201000301019220060010010011820
P910201000301015220060010010012820

IPUMS Data Structure

A household record (shaded) is followed by a person record for each member of the household. For each type of record, specific columns correspond to different variables.
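A hierarchical file like this can be read with a short program. The Python sketch below is only an illustration: the column positions in `HOUSEHOLD_LAYOUT` and `PERSON_LAYOUT` are hypothetical (invented for the example, not taken from IPUMS documentation), and the toy records are made up. It shows the two ideas on the slide: person records belong to the household record that precedes them, and fixed columns are cut into variables.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# HYPOTHETICAL layouts for illustration only. (start, end) pairs are 1-based,
# inclusive column positions; real positions come from your extract's codebook.
HOUSEHOLD_LAYOUT = {"rectype": (1, 1), "serial": (2, 4)}
PERSON_LAYOUT = {"rectype": (1, 1), "pernum": (2, 3), "age": (4, 6), "sex": (7, 7)}


def slice_fields(line: str, layout: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Cut one fixed-width record into named string fields."""
    return {name: line[start - 1:end] for name, (start, end) in layout.items()}


@dataclass
class Household:
    fields: dict[str, str]
    persons: list[dict[str, str]] = field(default_factory=list)


def read_hierarchical(lines) -> list[Household]:
    """Attach each P record to the most recent H record before it."""
    households: list[Household] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rectype = line[0]
        if rectype == "H":
            households.append(Household(slice_fields(line, HOUSEHOLD_LAYOUT)))
        elif rectype == "P":
            if not households:
                raise ValueError("person record found before any household record")
            households[-1].persons.append(slice_fields(line, PERSON_LAYOUT))
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    return households


# Made-up records: household 001 has two persons, household 002 has one.
sample = ["H001", "P010342", "P020311", "H002", "P010291"]
for hh in read_hierarchical(sample):
    print(hh.fields["serial"], [p["age"] for p in hh.persons])
```

Keeping the raw strings (rather than converting to integers) preserves leading zeros in codes, which matters when codes such as "001" and "1" mean different things in different samples.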

The Advantages of Microdata

Combination of all of a person’s characteristics

Characteristics of everyone with whom a person lived

Freedom to make any table you need

Freedom to make models to look at multivariate relationships
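To illustrate the first three advantages, the Python sketch below uses a handful of invented person records linked by a household identifier (`serial`, a name assumed for this example). It attaches a characteristic of everyone a person lived with (household size) to each person, then builds a custom cross-tabulation that a published volume would be unlikely to offer.

```python
from collections import Counter

# Toy person-level records, invented for illustration. "serial" links each
# person to a household, mirroring the household/person record structure.
persons = [
    {"serial": 1, "sex": "male", "age": 34},
    {"serial": 1, "sex": "female", "age": 31},
    {"serial": 1, "sex": "female", "age": 5},
    {"serial": 2, "sex": "male", "age": 70},
    {"serial": 3, "sex": "female", "age": 45},
    {"serial": 3, "sex": "male", "age": 19},
]

# 1. Household context: count the members of each household and copy the
#    size onto every person in it.
hh_size = Counter(p["serial"] for p in persons)
for p in persons:
    p["hh_size"] = hh_size[p["serial"]]

# 2. A custom table: adults (18+) by sex and whether they live alone.
table = Counter(
    (p["sex"], "alone" if p["hh_size"] == 1 else "with others")
    for p in persons
    if p["age"] >= 18
)
for (sex, living), n in sorted(table.items()):
    print(f"{sex:6} {living:11} {n}")
```

These counts are unweighted; for samples that carry weights, you would sum each record's weight instead of counting records.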

INTEGRATION

What the IPUMS actually does to the original census samples

IPUMS Translation Table for RACE

Column location of RACE in the original samples (record type P, first-last column):
1850 P 18-18; 1860 P 17-18; 1870 P 17-18; 1880 P 10-10; 1900 P 12-12; 1910 P 20-21; 1920 P 18-19; 1940 P 16-16; 1950 P 22-22; 1960 P 7-7; 1970 P 7-7; 1980 P 12-13; 1990 P 12-14

IPUMS-assigned codes and original codes by year (rows with fewer entries lack a code in some years):
Category | IPUMS | 1880 1900 1910 1940 1950 1960 1970 1980 1990
White | 1 00 | 0 0 0 1 1 0 0 01 001
Spanish write-in | 1 10 | 12 *
Black/Negro | 2 00 | 1 1 1 2 2 1 1 02 002
Mulatto | 2 10 | 2 2
American Indian | 3 00 | 3 2 3 3 3 2 2 03
Alaskan Athabaskan | 3 01 | 301
Apache | 3 02 | 302
Blackfoot | 3 03 | 303
Cherokee | 3 04 | 304
Cheyenne | 3 05 | 305
Chickasaw | 3 06 | 306
Chippewa | 3 07 | 307
Choctaw | 3 08 | 308
Comanche | 3 09 | 309
Creek | 3 10 | 310
Aleut | 3 30 | * 005
Eskimo | 3 40 | * 004
Chinese | 4 00 | 4 4 5 5 5 4 4 05 006
Taiwanese | 4 10 | 007
Japanese | 5 00 | 3 4 4 4 3 3 04 009
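The RACE translation table amounts to a year-specific lookup: read the RACE field at that year's column location, then map the original code to the IPUMS code. The Python sketch below copies only the White, Black/Negro and Chinese rows, and only for five years whose column widths match the printed codes. It is a partial illustration of the idea, not the IPUMS recoding program; the padded sample records are made up, and IPUMS codes are written without the space ("1 00" becomes "100").

```python
# Partial copy of the RACE translation table (three categories, five years).
RACE_COLUMNS = {  # year -> (first, last) 1-based column in the person record
    1880: (10, 10),
    1940: (16, 16),
    1960: (7, 7),
    1980: (12, 13),
    1990: (12, 14),
}

RACE_CODES = {  # year -> {original code: IPUMS code}
    1880: {"0": "100", "1": "200", "4": "400"},
    1940: {"1": "100", "2": "200", "5": "400"},
    1960: {"0": "100", "1": "200", "4": "400"},
    1980: {"01": "100", "02": "200", "05": "400"},
    1990: {"001": "100", "002": "200", "006": "400"},
}

IPUMS_LABELS = {"100": "White", "200": "Black/Negro", "400": "Chinese"}


def harmonize_race(year: int, record: str) -> str:
    """Read the original RACE field from a person record and return the IPUMS code."""
    first, last = RACE_COLUMNS[year]
    original = record[first - 1:last]
    try:
        return RACE_CODES[year][original]
    except KeyError:
        raise KeyError(f"{year} code {original!r} is not in this partial table") from None


# Made-up person records, padded with "x" so each code sits in the right columns.
rec_1880 = "x" * 9 + "1"       # original 1880 code 1 in column 10
rec_1990 = "x" * 11 + "002xx"  # original 1990 code 002 in columns 12-14
print(IPUMS_LABELS[harmonize_race(1880, rec_1880)], IPUMS_LABELS[harmonize_race(1990, rec_1990)])
```

Different original codes ("1" in 1880, "002" in 1990) come out as the same IPUMS code, which is what lets one tabulation run across census years.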

IPUMS Translation Table for RELATIONSHIP

Column location of RELATIONSHIP in the original samples (record type P, first-last column):
1880 P 21-23; 1900 P 09-11; 1910 P 14-16; 1920 P 11-14; 1940 P 11-14; 1950 P 16-20; 1960 P 01-02; 1970 P 01-02; 1980 P 02-04; 1990 P 09-10

Category | IPUMS | 1880 1900 1910 1940 1950 1960 1970 1980 1990
HEAD & RELATIVES:
Head/Householder | 01 01 | 100 100 100 019901999 0- 0- 000 00
 | 01 01 | 00 00
Spouse | 02 01 | 120 120 120 029902999 1- 1- 010 01
Husband, not Head | 02 01 | 140 140
2nd/3rd Wife (PG) | 02 02 | 121 129
Child | 03 01 | 130 130 130 039903999 2- 2- 020 02
Incl Adopted, Step | 03 01 | 20 20
(1970 error) | 03 01 | 22
(1970 error) | 03 01 | 26
Adopted Child | 03 02 | 132 132 132
Stepchild | 03 03 | 131 131 131 049904999 03
Adopted, ns | 03 04 | 280
Child-in-law | 04 01 | 133 133 133 059905999 30 30 051 *
Step Child-in-law | 04 02 | 134 134
Parent | 05 01 | 210 210 210 079907999 32 32 040 05
Stepparent | 05 02 | 211 211 211
Parent-in-Law | 06 01 | 213 213 213 089908999 33 33 053 *
Stepparent-in-law | 06 02 | 214 214
Sibling | 07 01 | 220 220 220 109910999 34 34 030 04
Step/Half/Adopted | 07 02 | 221 221 221
 | 07 02 | 222
 | 07 02 | 223
Sibling-in-Law | 08 01 | 223 223 224 119911999 35 35 054 *
Step/Half Sib-in-law | 08 02 | 225
 | 08 02 | 222 226
Grandchild | 09 01 | 270 270 270 069906999 31 31 052 06
Adopted Grandchild | 09 02 | 272 272 272
Step Grandchild | 09 03 | 271 271 271

IPUMS Documentation: Farm Status Variable FARM — H 78 Farm status

Availability:

1850, 1860, 1870, 1880, 1900, 1910, 1920, 1940, 1950, 1960, 1970, 1980, 1990 (available in every year)

Universe: All households and group quarters.

Description/Comparability: FARM identifies farm households. Only units sampled as (non-vacant) households are eligible to be coded as farms (see GQ, p. 1.12.1). Census methods for defining and identifying farms have changed several times. A year-by-year discussion follows:

For 1850-1880, the IPUMS constructs FARM from occupational data. Any household containing a person with the occupation “farmer” is coded as a farm.

For 1900, the census counted a household as a farm if a member of the household operated a farm. It is not possible to tell whether or not the household actually lived on or owned the farm they operated in 1900 (or 1850-1880).

For 1910 and 1920, enumerators identified farms using the following criteria: any household on a tract of 3 or more acres used for any agricultural operations, regardless of the amount of labor or produce involved; or any household on a tract of fewer than 3 acres that either yielded $250+ in produce sales in the previous year or employed at least one full-time farmer or agricultural laborer.

For 1940 and 1950, enumerators simply asked the respondent whether or not the house in which they lived was located on a farm.

For 1960 and 1970, a farm was either 1) a household on 10+ acres that yielded $50+ in produce, or 2) a household on fewer than 10 acres that yielded $250+ in produce. For 1970, vacant units and dwellings in city lots could not be farms.

For 1980 and 1990, a farm was any household on 1+ acres that yielded $1000+ in produce. Tenant families that paid cash rent were considered farm households if the parcel of land they farmed (their “yard”) met these criteria. Those that paid no cash rent were enumerated in the same way as owner-occupied farms. For both years, vacant units and those on urban lots could not be farms. 1980 also excluded households on suburban lots, and 1990 excluded multiple-unit dwellings.
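The year-by-year definitions can be written as one rule per census year. This Python sketch encodes the criteria described above under stated simplifications: the `Dwelling` fields are hypothetical stand-ins (they are not IPUMS variable names), a single `city_lot` flag covers both the 1970 city-lot and the 1980-1990 urban-lot exclusions, and the 1980-1990 tenant cash-rent rule is left out.

```python
from dataclasses import dataclass


@dataclass
class Dwelling:
    # HYPOTHETICAL fields for illustration; these are not IPUMS variable names.
    vacant: bool = False
    anyone_farmer: bool = False        # 1850-1880: a member's occupation is "farmer"
    operates_farm: bool = False        # 1900: a member operated a farm
    acres: float = 0.0
    agricultural_use: bool = False     # 1910-1920: tract used for any agricultural operation
    produce_sales: float = 0.0         # dollars of produce sold in the previous year
    employs_farm_worker: bool = False  # 1910-1920: full-time farmer or agricultural laborer
    says_on_farm: bool = False         # 1940-1950: respondent's answer
    city_lot: bool = False             # 1970 city lot / 1980-1990 urban lot
    suburban_lot: bool = False         # excluded in 1980
    multi_unit: bool = False           # excluded in 1990


def is_farm(year: int, d: Dwelling) -> bool:
    """Apply the census year's farm definition to one dwelling."""
    if d.vacant:
        return False  # only non-vacant households can be coded as farms
    if year in (1850, 1860, 1870, 1880):
        return d.anyone_farmer
    if year == 1900:
        return d.operates_farm
    if year in (1910, 1920):
        if d.acres >= 3:
            return d.agricultural_use
        return d.produce_sales >= 250 or d.employs_farm_worker
    if year in (1940, 1950):
        return d.says_on_farm
    if year in (1960, 1970):
        if year == 1970 and d.city_lot:
            return False
        threshold = 50 if d.acres >= 10 else 250
        return d.produce_sales >= threshold
    if year in (1980, 1990):
        if d.city_lot:
            return False
        if (year == 1980 and d.suburban_lot) or (year == 1990 and d.multi_unit):
            return False
        return d.acres >= 1 and d.produce_sales >= 1000
    raise ValueError(f"no farm definition for {year}")


print(is_farm(1960, Dwelling(acres=12, produce_sales=60)))  # 10+ acres and $50+
print(is_farm(1990, Dwelling(acres=12, produce_sales=60)))  # under the $1000 minimum
```

Running the same dwelling through two years shows why FARM is not fully comparable over time: an identical place can be a farm in 1960 and a non-farm in 1990.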

Flags: QFARM, QACREPRO (1970), QFARMPRO (1970-1990)

Codes and Frequencies (code 1 = Non-farm, code 2 = Farm):

Year   Non-farm (1)   Farm (2)
1850   19,420         17,674
1860, 1870, 1880 (values in source order): 5,992  11,960  5,398  66,601  6,668  40,475
1900   19,825         7,458
1910   64,458         24,356
1920   95,491         33,761
1940   318,264        72,770
1950   389,486        71,644
1960   536,861        42,351
1970   719,372        25,057
1980   923,614        18,600
1990   1,088,581      17,002
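Because the frequencies are unweighted sample counts, a farm share is simply farm households divided by all sampled households in that year. The sketch copies three years from the FARM frequencies. Keep the definitional changes in mind: part of the decline reflects changing census rules (1850 uses the occupation-based rule, 1990 requires $1000+ in sales), not only the decline of farming.

```python
# Unweighted sample counts copied from the FARM frequencies
# (code 1 = non-farm, code 2 = farm), for three years only.
FREQ = {
    1850: (19420, 17674),
    1940: (318264, 72770),
    1990: (1088581, 17002),
}


def farm_share(year: int) -> float:
    """Share of sampled households coded as farms in a given year."""
    nonfarm, farm = FREQ[year]
    return farm / (nonfarm + farm)


for year in sorted(FREQ):
    print(f"{year}: {farm_share(year):.1%} of sampled households coded as farms")
```

This prints about 47.6% for 1850, 18.6% for 1940 and 1.5% for 1990.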

Additional ways in which IPUMS improves the original samples

• Additional documentation, including all enumeration forms and instructions
• Consistent occupation/industry classifications
• Consistent metropolitan classifications
• Constructed family variables
• Locator variables for spouse and parents


Quantity of IPUMS Data Downloaded

[Figure: Number of IPUMS-USA extract requests by month, September 2001 to May 2003; vertical axis 0 to 2,000 requests]

Who uses the data?

Profile of IPUMS users

Approximately 9,000 registered users

About 90% are affiliated with universities

Among those, 40% are economists and 25% are sociologists

Most other academics are from the social sciences

Other main users include journalists and policy-makers

How do people get IPUMS data?

• 85% make “extracts” using the online interface
  --Choose the variables you want
  --We provide customized data and command files
• 15% download complete datasets
  --1850-1970 datasets are less than 1 GB each
  --1980-2000 datasets are about 5 GB each
  --We provide raw data and command files
• ?? Go to data redistributors
  --Querylogic (www.querylogic.com)
  --PDQ (www.pdq.com)


Four Key Strengths of the Census Microdata Samples

Large

• Have more cases than any comparable dataset
• Enable study of relatively small populations

National in scope

• Results aren’t subject to local peculiarities
• Moreover, they provide context for local studies

Long-term

• Provide historical depth

Microdata

• Can make your own tabulations
• Apply multivariate techniques

Limitations of the Census Microdata Samples

Geographic detail
• Confidentiality restrictions, 1940-2000
1-in-100 samples (1-in-20 for 1970-2000)
• Too small to answer some questions
Decennial
• Any historical analysis must use 10-year gaps
Cross-sectional data
• Not longitudinal
Need knowledge of a statistical package

What type of question is IPUMS best suited for?

Studies that do not need to identify geographic areas with fewer than 100,000 people after 1940 (e.g., you cannot identify Clemson, SC, but you can identify a group of several counties that includes Clemson).


Subjects that are likely to involve at least 10,000 people, preferably more. In a 1-in-100 sample, 10,000 individuals will generate about 100 cases in IPUMS. Anything less than this is probably too small a sample for useful analysis.

Any analysis of a census-related question that cannot be answered from the published census volumes or summary files.
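The 10,000-person rule of thumb is a simple ratio: a group of N people yields about N divided by the sampling denominator (100 for a 1-in-100 sample, 20 for a 1-in-20 sample). The sketch below also gives a rough binomial standard error for that count. That error formula assumes simple random sampling of persons, which is our simplification; census samples are generally drawn by household, so the true uncertainty is usually somewhat larger.

```python
import math


def expected_cases(population: int, density: int) -> float:
    """Expected number of sample records for a group, in a 1-in-`density` sample."""
    return population / density


def count_standard_error(population: int, density: int) -> float:
    """Binomial standard error of that count, assuming simple random sampling of persons."""
    p = 1 / density
    return math.sqrt(population * p * (1 - p))


def estimate_population(cases: int, density: int) -> int:
    """Invert the calculation: how many people a number of sample cases represents."""
    return cases * density


for density in (100, 20):
    n = expected_cases(10_000, density)
    se = count_standard_error(10_000, density)
    print(f"1-in-{density}: about {n:.0f} cases, standard error about {se:.1f}")
```

Read in reverse, 90 cases in a 1-in-100 sample stand for roughly 9,000 people, just under the 10,000 guideline.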

An example: Southern migrants in the North 1870-1970

Published census volumes can tell you:
--How many southern-born persons of each race lived in each state in 1900, 1920, 1930, and 1960
--The occupations of all African-Americans in the North
But you’re also interested in:
--The jobs held by actual migrants
--How their jobs compared to those of people who stayed home
--How their jobs compared to those of northern-born blacks
--How their settlement changed from 1870 onward

An example: Why this analysis works

The numbers are very large
--over 500,000 southerners lived in the North in every decade from 1870 on
I don’t need to know particular towns
--state of residence is available in every census
--a sub-state designation known as the State Economic Area (SEA) is even available for every census
Data not available anywhere else
--and so it is worth the trouble

An example: What you can’t do with the IPUMS

How did the southerners do in Pittsburgh?

--IPUMS has data on 90 employed southern black men in Pittsburgh in 1970, fewer in previous years.

Were the migrants segregated in the north?

--you don’t know their street, tract, or ward
--all you know is their city, and only if it was a pretty big one (>100K for 1940-50 and 1980-90; >250K for 1960-70; >100K in 2000).

Did migrants’ jobs improve over time?

--The census samples are cross-sectional databases, not longitudinal ones
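The city-size limits quoted above for identifying a migrant's city can be expressed as a lookup by census year. This sketch assumes the thresholds refer to the city's total population and treats ">" as strictly greater than; for example, a city of 200,000 would be identifiable in 1950 or 1980 but not in 1960 or 1970.

```python
# Minimum city size for a city to be identified, as quoted on the slide:
# ">100K for 1940-50 and 1980-90; >250K for 1960-70; >100K in 2000".
CITY_THRESHOLD = {
    1940: 100_000,
    1950: 100_000,
    1960: 250_000,
    1970: 250_000,
    1980: 100_000,
    1990: 100_000,
    2000: 100_000,
}


def city_identifiable(year: int, city_population: int) -> bool:
    """True if a city this large would be identified in that year's public sample."""
    if year not in CITY_THRESHOLD:
        raise ValueError(f"the slide gives no threshold for {year}")
    return city_population > CITY_THRESHOLD[year]


for year in sorted(CITY_THRESHOLD):
    print(year, city_identifiable(year, 200_000))
```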


Ongoing data projects at the MPC

New high-density public use files
1880:
• 100% data for selected variables
• 20% sample for minorities (all variables)
• 10% sample for entire population (all variables)
1900: 10% sample
1930: 5% sample
1960: 5% sample

Ongoing data projects at the MPC

[Figure: New high-density public use files: number of person records in each file (0 to 20,000,000), by census year, 1850-2000, distinguishing existing samples from samples planned and in progress]

Ongoing data projects at the MPC

New harmonized intercensal series
American Community Survey
• 2001-2002 data available on the main IPUMS site
• 2003 data will be available in the fall of 2004
March Current Population Survey
• Spans 1962-2003
• Available at http://beta.ipums.org/cps
• Includes special questions on labor markets

Ongoing data projects at the MPC

IPUMS International
• Currently contains 22 samples from 6 countries
• About 80 variables currently available
IPUMS Latin America
• 15-country project
• Got underway this year
IPUMS Europe
• 18-country project
• Got underway this year
