IPUMS Work Process - University of Minnesota

Integrated Public Use Microdata Series
IPUMS
www.ipums.org
Matt Sobek
Minnesota Population Center
[email protected]
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples
Brief History
IPUMS-USA, 1991 (Steve Ruggles): all existing samples of the US census
Data extraction system, 1998
IPUMS-International, 2001 (Bob McCaa)
IPUMS-Latin America, 2004
IPUMS-Europe, 2005
NSF expansion, 2005: world’s largest collection of census data; 30 samples per year for the next 3 years
Datasets in IPUMS
Belarus, Brazil, Cambodia, Chile, China, Colombia, Costa Rica, Ecuador, France, Greece, Kenya, Mexico, Philippines, Romania, South Africa, Spain, Uganda, United States, Venezuela, Vietnam
[Chart: census sample years held for each country, 1960 to 2002]
IPUMS Census Sample Holdings and Release Dates
Release dates: June 2007; December 2007; June 2008 to June 2009
Argentina: 1970, 1980, 1991, 2001
Armenia: 2001
Austria: 1971, 1981, 1991, 2001
Bolivia: 1976, 1992, 2001
Canada: 1971, 1981, 1991
Colombia: 2005
Czech Republic: 1991, 2001
Dominican Republic: 1960, 1970, 1981
Egypt: 1986, 1996
El Salvador: 1992
Fiji: 1966, 1986, 1996
France: 1999
Guatemala: 1964, 1973, 1981, 1994, 2002
Guinea: 1983, 1996
Honduras: 1961, 1974, 1988
Hungary: 1970, 1980, 1990, 2001
Indonesia: 1971, 1976, 1980, 1990, 1995
Iraq: 1997
Israel: 1961, 1972, 1983, 1995
Italy: 1991
Madagascar: 1993
Malawi: 1987, 1998
Malaysia: 1970, 1980, 1991, 2000
Mali: 1987, 1998
Mongolia: 2002
Netherlands: 1960, 1970, 2001
Nicaragua: 1971
Pakistan: 1973, 1981, 1998
Palestine: 1997
Panama: 1960, 1970, 1980, 1990, 2000
Paraguay: 1962, 1972, 1982, 1992, 2002
Peru: 1993
Portugal: 1981, 1991, 2001
Puerto Rico: 1970, 1980, 1990, 2000
Rwanda: 1991, 2001
Sudan: 1983, 1993
Turkmenistan: 1995
United Kingdom: 1991
United States: 2005
Uruguay: 1963, 1975, 1985, 1996
Venezuela: 2001
IPUMS Global Coverage
Dark green = disseminating
Medium green = data held by IPUMS
Light green = negotiating
Yellow = not negotiating
Selected Variable Availability -- PERSON
[Table: availability by country of relationship to household head; age; sex; marital status; children ever born; children surviving; date of last birth; country of birth; nativity; religion; school attendance; educational attainment; years of schooling; literacy; employment status; class of worker; occupation; industry; income; migration, previous country; migration, internal; year of migration; disability]
X = available in all samples for that country
x = available in only some samples for that country
. = not available for that country
BR=Brazil; CL=Chile; CN=China; CO=Colombia; CR=Costa Rica; EC=Ecuador; FR=France; KE=Kenya
MX=Mexico; ZA=South Africa; UG = Uganda; US=United States; VE=Venezuela; VN=Vietnam
Selected Variable Availability -- HOUSEHOLD
[Table: availability by country of region; state/province; district/county/municipality; metropolitan area; urban-rural status; electricity; water; sewage; toilet; home ownership]
X = available in all samples for that country
x = available in only some samples for that country
. = not available for that country
BR=Brazil; CL=Chile; CN=China; CO=Colombia; CR=Costa Rica; EC=Ecuador; FR=France; KE=Kenya
MX=Mexico; ZA=South Africa; UG = Uganda; US=United States; VE=Venezuela; VN=Vietnam
What Are Microdata?
Individual-level data
• every record represents a separate person
• all of their individual characteristics are recorded
• users must manipulate the data themselves
Different from aggregate/summary/tabular data, for example:
• a count of persons by municipality
• an employment status table by sex from a published census volume
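To make the distinction concrete, here is a minimal Python sketch that derives both kinds of aggregate table from person records. The records, keys (`municipality`, `sex`, `empstat`), and values are invented for illustration; an actual extract would use the variable names and codes from its codebook.

```python
from collections import Counter

# Toy person-level records, one dict per person. Keys and values are
# hypothetical, invented for illustration only.
persons = [
    {"municipality": "A", "sex": "male", "empstat": "employed"},
    {"municipality": "A", "sex": "female", "empstat": "inactive"},
    {"municipality": "B", "sex": "female", "empstat": "employed"},
    {"municipality": "B", "sex": "male", "empstat": "unemployed"},
    {"municipality": "B", "sex": "female", "empstat": "employed"},
]


def count_by(records, *keys):
    """Aggregate microdata into a frequency table keyed by the given variables."""
    return Counter(tuple(r[k] for k in keys) for r in records)


# A count of persons by municipality.
by_municipality = count_by(persons, "municipality")
# An employment status table by sex.
empstat_by_sex = count_by(persons, "sex", "empstat")

for (muni,), n in sorted(by_municipality.items()):
    print(muni, n)  # prints "A 2", then "B 3"
```

The reverse is not possible: once persons have been counted into a published table, the individual combinations of characteristics cannot be recovered from it.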
Kenya 1999 Census Questionnaire
Raw Census Microdata from IPUMS
H9101000000030982025200090000001324101001000071000000008800000000
P9101000000030102520252120000000002109730111020010103212001182000
P9101000000030202520252120000000001109730111020020103622001181080
P9101000000030302520252120201010100009000199996030101122006990000
P9101000000030402520252120201010100009000199996030100912006990000
P9101000000030502520252120201010100009000199996030100712006990000
P9101000000030602520252120201010100009000199996030100612006990000
P9101000000030702520252120201010100009000199996030100422006990000
P9101000000030802520252120201010100009000199996030100322006990000
P9101000000030902520252120201010100009000199996030100222006990000
H9101000000040360025200030000001324101001000071000000008800000000
P9101000000040102520252120000000002103110101010010103011001021000
P9101000000040202520252120000000001103110101010020102121001021020
P9101000000040302520252120201010100003000199990030100111006990000
H9101000000050338025200030000001324101001000071000000008800000000
P9101000000050102520251200000000021031001070700101045120010520000
P9101000000050202520252120000000001103100107070020102522001051020
P9101000000050302520252120201010100003000199990030100722006990000
H9101000000060416025200040000001324101001000071000000008800000000
P9101000000060102520252120000000002104200119150010104912001192000
P9101000000060202520252120000000001104200119150020104922001192040
P9101000000060302520252120201010100004000199991030101922006990000
P9101000000060402520252120201010100004000199991030101522006990000
Labeled variables: Age, Sex, Relationship, Race, Birthplace, Mother’s birthplace, Occupation
H910000240000000088001001000220100
P910000020101032120010010010011504
P910000010201036220010010010011999
P910201000301011220060010010011999
P910201000301009120060010010011999
P910201000301007120060010010011999
P910201000301006120060010010011999
P910201000301004220060010010011999
P910201000301003220060010010011999
P910201000301002220060010010011999
H910000240000000088001001000110100
P910000020101030110010290510511310
P910000010201021210010290290171999
P910201000301001110060010290291999
H910000240000000088001001000220100
P910000020101045120010010010011100
P910000010201025220010010010011820
P910201000301007220060010010011999
H910000240000000088001001000220100
P910000020101049120010010010011100
P910000010201049220010010010011820
P910201000301019220060010010011820
P910201000301015220060010010012820
IPUMS Data Structure
• A household record (beginning with H) is followed by a person record (beginning with P) for each member of the household.
• For each type of record, columns correspond to specific variables.
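As a minimal sketch of reading this hierarchical layout, the Python code below groups records by their first character: an H record opens a household and each following P record is attached to it. The column positions in `PERSON_LAYOUT` are hypothetical; real positions come from the codebook for each extract. The sample lines are two households copied from the raw records shown earlier.

```python
from typing import Dict, List, Tuple

# Hypothetical column layout: variable -> (start, end) slice, 0-based.
# Assumed for illustration; real positions come from the extract's codebook.
PERSON_LAYOUT: Dict[str, Tuple[int, int]] = {"pernum": (13, 15)}


def group_households(lines: List[str]) -> List[dict]:
    """Group a hierarchical file: each 'H' line opens a household and the
    'P' lines that follow it belong to that household."""
    households: List[dict] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        rectype = line[0]
        if rectype == "H":
            households.append({"household": line, "persons": []})
        elif rectype == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            households[-1]["persons"].append(line)
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    return households


def extract(record: str, layout: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Slice fixed-width fields out of one record using a column layout."""
    return {name: record[start:end] for name, (start, end) in layout.items()}


sample = """\
H9101000000040360025200030000001324101001000071000000008800000000
P9101000000040102520252120000000002103110101010010103011001021000
P9101000000040202520252120000000001103110101010020102121001021020
P9101000000040302520252120201010100003000199990030100111006990000
H9101000000050338025200030000001324101001000071000000008800000000
P9101000000050102520251200000000021031001070700101045120010520000
P9101000000050202520252120000000001103100107070020102522001051020
P9101000000050302520252120201010100003000199990030100722006990000
"""

households = group_households(sample.splitlines())
print([len(hh["persons"]) for hh in households])  # prints [3, 3]
print(extract(households[0]["persons"][0], PERSON_LAYOUT))
```

Keeping each person attached to its household record is what makes the household-level context of every individual available for analysis.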
The Advantages of Microdata
• Combination of all of a person’s characteristics
• Characteristics of everyone with whom a person lived
• Freedom to make any table you need
• Freedom to make models examining multivariate relationships
• You are limited only by the questions asked in the particular census
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples
Translation Table – Marital Status
(IPUMS-International)
MARST: Marital Status
Source samples: China 1982 (CN82A403), Colombia 1973 (CO73A411), Kenya 1989 (KN89A413), Mexico 1970 (MX70A402), U.S.A. 1990 (US90A425)

Code  Label                     Source codes translated to this category
100   SINGLE/NEVER MARRIED      1=never married; 4=single; 1=single; 9=single; 6=never married
200   MARRIED/IN UNION
210   Married (not specified)   2=married; 3=monogamous; 2=married; 1=married
211   Civil                     3=only civil
212   Religious                 4=only religious
213   Civil and religious       2=civil and religious
214   Polygamous                3=polygamous
220   Consensual union          1=free union; 5=free union
300   SEPARATED/DIVORCED        3=sep. or divorced
310   Separated                 6=separated; 8=separated; 3=separated
321   Legally separated
322   De facto separated
330   Divorced                  5=divorced; 7=divorced; 4=divorced; 4=divorced
400   WIDOWED                   5=widowed; 3=widowed; 5=widowed; 4=widowed; 6=widowed
999   UNKNOWN/MISSING           0=missing; 6=unknown; B=blank; 1=unknown
Translation Table – Marital Status
General Codes
MARST: Marital Status
The detailed codes of the translation table are grouped under one-digit general codes:
gen  code      label
1    100       SINGLE/NEVER MARRIED
2    200-220   MARRIED/IN UNION
3    300-330   SEPARATED/DIVORCED
4    400       WIDOWED
9    999       UNKNOWN/MISSING
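A translation table like this is essentially a lookup from each sample's own codes to the IPUMS detailed codes, and the general code is the first digit of the detailed code (100 becomes 1, 220 becomes 2, 999 becomes 9). The Python sketch below assumes a hypothetical source coding that is not taken from any of the five samples, and it fails loudly on codes the table does not cover instead of silently recoding them.

```python
# Hypothetical source coding for a single census sample, invented for
# illustration. Keys are source values; values are detailed IPUMS MARST codes.
SOURCE_TO_MARST = {
    "1": 100,  # single/never married
    "2": 210,  # married (not specified)
    "3": 220,  # consensual union
    "4": 310,  # separated
    "5": 330,  # divorced
    "6": 400,  # widowed
}
# Hypothetical source values meaning "unknown" (blank, or 9).
MISSING_SOURCE_VALUES = {"", "9"}
UNKNOWN = 999


def harmonize(source_value: str) -> int:
    """Translate one sample-specific code into the detailed IPUMS code."""
    value = source_value.strip()
    if value in MISSING_SOURCE_VALUES:
        return UNKNOWN
    try:
        return SOURCE_TO_MARST[value]
    except KeyError:
        # Unexpected codes are reported so they can be added to the table.
        raise ValueError(f"unmapped source code {value!r}") from None


def general_code(detailed: int) -> int:
    """Collapse a three-digit detailed code to its one-digit general code."""
    return detailed // 100


for raw in ["1", "3", "5", " "]:
    detailed = harmonize(raw)
    print(raw.strip() or "blank", detailed, general_code(detailed))
```

Because every sample is translated into the same detailed codes, a user who needs comparability across all samples can work with the general code, while one studying a single country can keep the detail (for example, civil versus religious marriage).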
Variable Description: Literacy
(International)
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples
IPUMS “Pointer” Variables
(Simple household)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse’s Location
1       head    46   male    married  n/a     2
2       spouse  44   female  married  3       1
3       aunt    77   female  widow    7       0
4       child   15   female  single   0       0
5       child   13   female  single   n/a     0
6       child   11   male    single   n/a     0

Pernum  Relate  Age  Sex     Marst    Chborn  Mother’s Location  Father’s Location
1       head    46   male    married  n/a     0                  0
2       spouse  44   female  married  3       0                  0
3       aunt    77   female  widow    7       0                  0
4       child   15   female  single   0       2                  1
5       child   13   female  single   n/a     2                  1
6       child   11   male    single   n/a     2                  1
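The pointer columns hold the Pernum of the linked person within the same household, with 0 meaning that person is not present. A short Python sketch, using the spouse's location column from the simple household, shows how a pointer attaches a spouse's characteristics (here, age) to each person. The field name `sploc` is an illustrative label, and the code assumes pointers always refer to a Pernum in the same household.

```python
from typing import Dict, List, NamedTuple, Optional


class Person(NamedTuple):
    pernum: int
    relate: str
    age: int
    sploc: int  # Pernum of spouse in the same household; 0 = no spouse present


# The simple household from the slide (spouse's location column).
household: List[Person] = [
    Person(1, "head", 46, 2),
    Person(2, "spouse", 44, 1),
    Person(3, "aunt", 77, 0),
    Person(4, "child", 15, 0),
    Person(5, "child", 13, 0),
    Person(6, "child", 11, 0),
]


def spouse_age(persons: List[Person]) -> Dict[int, Optional[int]]:
    """Attach the spouse's age to each person by following the pointer."""
    age_by_pernum = {p.pernum: p.age for p in persons}
    return {p.pernum: age_by_pernum[p.sploc] if p.sploc else None for p in persons}


for pernum, age in spouse_age(household).items():
    print(pernum, age)
```

The same two steps (index the household by Pernum, then follow the pointer) work for any characteristic of the linked person, such as education or occupation.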
IPUMS “Pointer” Variables
(Complex household)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse’s Location  Mother’s Location  Father’s Location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  1                  0
3       child         22   male    single     n/a     0                  1                  0
4       child         21   male    single     n/a     0                  1                  0
5       child         25   female  married    2       6                  1                  0
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  5                  6
8       grandchild    1    male    single     n/a     0                  5                  6
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  9                  0
11      non-relative  5    female  single     n/a     0                  9                  0
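The same lookup works for parents in the complex household. The Python sketch below counts each person's co-resident own children through the mother's and father's location columns and attaches the mother's age to each child; the field names `momloc` and `poploc` are illustrative labels. The pointers also link the non-relative mother (person 9) to her children (persons 10 and 11), a connection the relationship column alone would not reveal.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class Person(NamedTuple):
    pernum: int
    relationship: str
    age: int
    momloc: int  # Pernum of mother in the household; 0 = not present
    poploc: int  # Pernum of father in the household; 0 = not present


# The complex household from the slide (mother's and father's location).
household: List[Person] = [
    Person(1, "head", 53, 0, 0),
    Person(2, "child", 28, 1, 0),
    Person(3, "child", 22, 1, 0),
    Person(4, "child", 21, 1, 0),
    Person(5, "child", 25, 1, 0),
    Person(6, "child-in-law", 28, 0, 0),
    Person(7, "grandchild", 3, 5, 6),
    Person(8, "grandchild", 1, 5, 6),
    Person(9, "non-relative", 32, 0, 0),
    Person(10, "non-relative", 10, 9, 0),
    Person(11, "non-relative", 5, 9, 0),
]


def own_children_present(persons: List[Person]) -> Dict[int, int]:
    """Count each person's co-resident own children via both parent pointers."""
    counts: Counter = Counter()
    for p in persons:
        for parent in (p.momloc, p.poploc):
            if parent:
                counts[parent] += 1
    return {p.pernum: counts[p.pernum] for p in persons}


def mothers_age(persons: List[Person]) -> Dict[int, Optional[int]]:
    """Attach the co-resident mother's age to each person, or None."""
    age_by_pernum = {p.pernum: p.age for p in persons}
    return {p.pernum: age_by_pernum[p.momloc] if p.momloc else None for p in persons}


print(own_children_present(household))
print(mothers_age(household))
```

Person 1 reports 6 children ever born but only 4 live with her, so a co-resident count like this measures household composition, not fertility.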
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Dissemination
IPUMS Access
• Restricted access
• Scholarly and educational purposes
• Conditions of use: the key condition is not to redistribute the data
• Serious vetting of applicants
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples
4 Key Strengths of the Census Microdata Samples
• Large
  More cases than any comparable dataset
  Enable study of relatively small populations
• National in scope
  Results not subject to local peculiarities
  Provide context for local studies
• Temporal depth
  Provide historical perspective
• Microdata
  Can make your own tabulations
  Apply multivariate techniques
Limitations of the Microdata Samples
Confidentiality
• Samples
  Too small to answer some questions
• Geography
  Only areas with a population of 20,000 or larger are identified
• Sensitive variables, swapping, etc.
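One simple way to enforce a geographic threshold like this is to recode every unit below 20,000 people into a residual category. The Python sketch below assumes that approach and uses invented municipality codes and populations; actual disclosure procedures, such as merging neighbouring units, vary by country.

```python
from typing import Dict

# Threshold from the slide: identified geographic units have 20,000+ people.
MIN_POPULATION = 20_000


def recode_small_units(unit_population: Dict[str, int],
                       residual: str = "OTHER") -> Dict[str, str]:
    """Map each unit code to itself if large enough, else to a residual code.

    Populations should be full-count census totals, not sample counts.
    """
    return {
        unit: unit if pop >= MIN_POPULATION else residual
        for unit, pop in unit_population.items()
    }


def residual_population(unit_population: Dict[str, int],
                        mapping: Dict[str, str],
                        residual: str = "OTHER") -> int:
    """Total population of all units recoded to the residual category."""
    return sum(pop for unit, pop in unit_population.items()
               if mapping[unit] == residual)


# Hypothetical municipalities and populations, for illustration only.
populations = {"M01": 150_000, "M02": 12_500, "M03": 20_000, "M04": 7_800}
mapping = recode_small_units(populations)
print(mapping)
print(residual_population(populations, mapping))
```

The residual category must itself meet the threshold; here the two small units together total 20,300, but with a single small unit the residual would still identify it.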
Other Issues and Limitations
• Not annual
  Any historical analysis will have gaps
• Cross-sectional data
  Not longitudinal
• Very large extracts
• Need knowledge of a statistical package
• User burden
  Information overload; culturally specific knowledge
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Users and Access
5. Strengths and Limitations
6. Research examples
IPUMS-International Research Topics
• Child labor outside the household in Mexico and Colombia
• Effect of NAFTA on educational attainment and school enrollment by region within Mexico
• Concentration of mortality within families in Kenya
• Life course patterns of co-residence among Mexicans in Mexico, Mexicans in the U.S., and Mexican Americans
• Brain drain from developing countries
• How language diversity is affected by migration and economic factors
Married Female Labor Force Participation in Latin America (age 18 to 65)
[Chart: percent in labor force, 1960 to 2005, for Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, and Venezuela]
Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65)
[Chart: percent in labor force, 1920 to 2010, for the United States and Latin America]
Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65)
[Chart: percent in labor force, 1920 to 2010, for the United States, Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, and Venezuela. Annotation: compare Latin America to the U.S. 40 years ago]
Married Female Labor Force Participation: Mexican-born Women, 1970-2000
[Chart: percent in labor force, 1970 to 2000, for Mexican-born women in the United States and women in Mexico]
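Each point in these charts is a rate computed from microdata: married women aged 18 to 65 in the labor force, divided by all married women aged 18 to 65. A minimal Python sketch follows, assuming hypothetical fields (`sex`, `age`, `married`, `in_labor_force`) and an optional sample weight; the toy records are invented.

```python
from typing import Iterable, Mapping, Optional


def married_female_lfp(persons: Iterable[Mapping], min_age: int = 18,
                       max_age: int = 65) -> Optional[float]:
    """Percent of married women aged min_age..max_age (inclusive) in the labor force.

    Each person is a mapping with hypothetical keys 'sex', 'age', 'married',
    'in_labor_force', and an optional sample 'weight' (default 1.0).
    Returns None when no one falls in the universe.
    """
    in_lf = universe = 0.0
    for p in persons:
        if p["sex"] != "female" or not p["married"]:
            continue
        if not min_age <= p["age"] <= max_age:
            continue
        w = p.get("weight", 1.0)
        universe += w
        if p["in_labor_force"]:
            in_lf += w
    return 100.0 * in_lf / universe if universe else None


# Toy records, invented for illustration.
women = [
    {"sex": "female", "age": 30, "married": True, "in_labor_force": True, "weight": 3.0},
    {"sex": "female", "age": 40, "married": True, "in_labor_force": False, "weight": 1.0},
    {"sex": "female", "age": 70, "married": True, "in_labor_force": True},   # too old
    {"sex": "male", "age": 35, "married": True, "in_labor_force": True},     # not female
    {"sex": "female", "age": 25, "married": False, "in_labor_force": True},  # not married
]
print(married_female_lfp(women))  # prints 75.0
```

Whether consensual unions (MARST 220, grouped under MARRIED/IN UNION in the translation table) count as married is an analytic choice that the sketch leaves to whoever sets the `married` field.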
Working-Age Population in the Labor Force, by Sex
[Chart: percent of the working-age population (persons age 16 to 65) in the labor force, males and females, for Brazil 1960, 1970, 1980, 1991, 2000; Chile 1960, 1970, 1982, 1992, 2002; Colombia 1964, 1973, 1985, 1993; Costa Rica 1963, 1973, 1984, 2000; Ecuador 1962, 1974, 1982, 1990, 2001; Mexico 1970, 1990, 2000; Venezuela 1971, 1981, 1990; China 1982; Vietnam 1989, 1999; Kenya 1989, 1999; South Africa 1996, 2001; France 1962, 1968, 1975, 1982, 1990; United States 1960, 1970, 1980, 1990, 2000]
Persons with Completed Secondary Education: National Populations Versus Migrants to the United States
[Chart: percent with completed secondary education, in the home country ca. 2000 versus migrants to the U.S. 1995-2000, for Brazil, Chile, Costa Rica, Ecuador, Mexico, Vietnam, Kenya, and South Africa]
Population Residing with an Elderly Person
[Chart: percent of total population, by census sample, for elderly persons (age 65+) and for non-elderly persons residing with an elderly person; Brazil, Colombia, Mexico, Kenya, South Africa, China, Vietnam, France, and the United States]
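Both series in this chart are shares of the total population, and the second one needs household context: a non-elderly person counts if anyone aged 65 or older lives in the same household. Here is a minimal, unweighted Python sketch that takes each household as a list of member ages. The input combines the simple household from the pointer-variable example with two invented households.

```python
from typing import Sequence, Tuple


def elderly_coresidence(households: Sequence[Sequence[int]],
                        elderly_age: int = 65) -> Tuple[float, float]:
    """Return (percent elderly, percent non-elderly living with an elderly person),
    both as a percent of the total population (unweighted sketch)."""
    total = elderly = nonelderly_with_elderly = 0
    for ages in households:
        n_elderly = sum(1 for a in ages if a >= elderly_age)
        total += len(ages)
        elderly += n_elderly
        if n_elderly:
            # Everyone else in a household with an elderly member counts.
            nonelderly_with_elderly += len(ages) - n_elderly
    if total == 0:
        raise ValueError("no persons in input")
    return 100.0 * elderly / total, 100.0 * nonelderly_with_elderly / total


households = [[46, 44, 77, 15, 13, 11], [30, 28, 2], [70]]
print(elderly_coresidence(households))  # prints (20.0, 50.0)
```

With real extracts, each person would be weighted by the sample weight rather than counted once.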
End
[email protected]