Census in Global Perspective

The census in global perspective and the
coming census microdata revolution
***
Robert McCaa & Steven Ruggles
Minnesota Population Center
http://www.ipums.org
IPUMS International
funded by
National Science Foundation
Nordic Demography Symposium, Tjøme 2001
Subtext:
Why should Nordic countries participate in a project to preserve the world’s census microdata and help make them usable?
» Longest historical series of census microdata in the world
» Cross-national research on a global scale requires representation of all cultural regions
» Intriguing demographic, historical laboratory
» Large pool of scientific talent with global concerns
» Persisting cultural, scientific ties with Minnesota (would, for example, U. of Texas be as interested?)
Globalization of the census
& the coming census microdata revolution
» 1. Introduction: census & census microdata
» 2. The population census goes global: coverage, periodicity, and content
» 3. Liberating census microdata: preservation, anonymization, integration, & dissemination
» 4. Statistical confidentiality and census samples: a 36-year-long perfect record
» 5. International norms of statistical confidentiality
» 6. Harmonizing and disseminating scientifically anonymized census samples: IPUMSi
1. Introduction
The census: what is it?
Census microdata: what are they?
How can they be made usable?
Why should we care?
16th c. “census” of Mexico (Nahuatl, 1530s).
“Here is the home of one...”
(from Museum of Anthropology, Mexico City)
Panels: original ms.; transcribed; translated; digitized
Fields: no., name, householder, household, sex, baptized, relationship to head, age, marital status

no.  name                sex     baptized  relationship to head             age             marital status
263  Cuilol              male    not       head                             x               marr
264  Xilotl              female  not       spouse                           x               wife
265  Matapach            male    not       child                            7 years old     x
266  Ilhuicacihuatl      female  not       child                            born last year  x
267  Xilotl              female  x         mother                           x               wido
268  Matlalihuitl        male    not       uncle                            x               marr
269  Magdalena Ollacatl  female  not       spouse of uncle                  x               marr
270  Necahual            female  not       sister-in-law of uncle           x               wido
271  Coatl               male    not       child of sister-in-law of uncle  15 years old    x

Here is his tribute. Every 80 days he delivers one quarter-len
Here there are eight [sic, nine] included in one house.
When is a census a census? Goyer (1986):
1. National legal authority
2. Defined enumeration area
3. Complete coverage
4. Simultaneous enumeration
5. Individual enumeration
6. Periodic enumeration
7. Publication of results
8. Dissemination of results
An Aztec extended family, 1530
5 conjugal units, 4 generations, 3 married brothers
[Figure: kinship diagram of the household. Labels: married male, head of house; married males and married females (two couples marked "1 yr. ago"); "simply an old widow"; female, 20, not yet married; male, 10 years old, not married.]
450 years later: an example of a patrilateral household from rural Morelos, 1990
5 conjugal unions, 3 generations
[Figure: kinship diagram of the household. Members are labeled by relationship to the head, age, and union type (married or free union), from the married head, 50, and spouse, 48, down to infant grandchildren; one member is marked "(not kin)".]
Examples to percentages:
Have there been changes in 4 1/2 centuries?

Relationship to head   1540   1990
Head                   13%    20%
Spouse                 13%    16%
Child                  24%    54%
Kin                    50%     6%
Non-kin                 1%     4%

1540: the extended family predominates (Nahua families of rural villages of Morelos)
1990: the nuclear family predominates (rural families of the state of Morelos)
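The step from individual examples to percentages can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it takes the relationship-to-head values of the nine-person 1530s household listed earlier (1 head, 1 spouse, 2 children, 5 other kin) and computes each category's share. The function name `relationship_shares` and the lowercase category labels are assumptions made for this sketch.

```python
from collections import Counter

def relationship_shares(relationships):
    """Return each relationship category's share of persons, in whole percent."""
    counts = Counter(relationships)
    total = sum(counts.values())
    return {rel: round(100 * n / total) for rel, n in counts.items()}

# Relationship to head for household members 263-271,
# grouped into the categories used on the slide.
household = ["head", "spouse", "child", "child",
             "kin", "kin", "kin", "kin", "kin"]

shares = relationship_shares(household)
print(shares)
```

Pooling every household of a census before counting would give a distribution of the same kind as the 1540 and 1990 columns.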
Census microdata of the late 20th century:
What are they?
Who bears preservation responsibility?
Who will make them usable?

12100102600700720000011210000104
22200202600700720000011210000104
32300100600700720000012123000000
42300200400700000000000000000000
52300200200700000000000000000000
62300200000700000000000000000000

Census microdata:
Censuses are costly
Public goods should be democratized
Where microdata are available, they are used
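Records like these are fixed-width strings of digits, and they only become usable once a codebook says which columns hold which variable. As a hedged sketch, the Python below slices one record according to a column layout. The layout (`LAYOUT`, with fields named `record_no`, `block_a`, and `block_b`) is entirely hypothetical: the slide does not give the codebook for these records.

```python
# HYPOTHETICAL layout: field name -> (start, end) character offsets.
# The real codebook for the records on the slide is not given.
LAYOUT = {
    "record_no": (0, 1),
    "block_a": (1, 16),
    "block_b": (16, 32),
}

def parse_record(line, layout=LAYOUT):
    """Split one fixed-width record into named string fields."""
    return {name: line[start:end] for name, (start, end) in layout.items()}

record = "12100102600700720000011210000104"
fields = parse_record(record)
print(fields)
```

Keeping the fields as strings avoids losing leading zeros, which are usually meaningful in coded census variables.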
2. The population census goes global.
Coverage becomes universal
(thanks to A.N. Kiær, Statistics Norway, who promoted globalization of the census at the beginning of the 20th c.)
Content becomes uniform
Decennial censuses become the norm
Population censuses became universal in the 20th century.
Will census microdata ... in the 21st?

Table 1. Population censuses became universal in the last half of the 20th century; will census microdata become universal in the first half of the 21st?

Census round       No. of countries      % of world's population   No. of countries offering census
(centered on 0)    conducting a census   enumerated                microdata samples to researchers
1950s               86                   79                         1
1960s              117                   91                        27
1970s              124                   71                        44
1980s              135                   94                        54
1990s              134                   94                        54
2000s              146                   97                         ?

153 countries with 1 million + pop. in 2000
2000 round figures are provisional
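One way to read Table 1 is to ask what share of census-taking countries also offered microdata samples in each round. The short Python sketch below does that arithmetic with the table's own figures; the names `TABLE_1` and `microdata_share` are introduced here for illustration, and the 2000s round is left out because its microdata count is unknown.

```python
# Table 1 figures: round -> (countries conducting a census,
#                            countries offering microdata samples).
TABLE_1 = {
    "1950s": (86, 1),
    "1960s": (117, 27),
    "1970s": (124, 44),
    "1980s": (135, 54),
    "1990s": (134, 54),
}

def microdata_share(conducting, offering):
    """Percent of census-taking countries that offer microdata samples."""
    return round(100 * offering / conducting, 1)

shares = {rnd: microdata_share(c, o) for rnd, (c, o) in TABLE_1.items()}
for rnd, pct in shares.items():
    print(f"{rnd}: {pct}% of census-taking countries offer microdata samples")
```

On these figures the share rises from about 1% in the 1950s round to about 40% in the 1980s and 1990s rounds.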
Content ... increasingly uniform, principal source on population information.
social variables:
Table 2. Topics in Censuses of 23 selected Asian Countries, 1950-1980

Census Topic            1950  1960  1970  1980
Countries Enumerated      16    21    20    19
Social
  Sex                     16    21    20    19
  Age                     16    21    20    19
  Marital Status          15    19    20    18
  Family relationship     12    18    20    17
  Language                 8     9     9     8
  Citizenship             10    12    14    12
  Ethnicity/race           9    11    10     9
  Religion                10    15    14    10
Content ... increasingly uniform
education and migration variables:
Table 2. Topics in Censuses of 23 selected Asian Countries, 1950-1980

Census Topic                  1950  1960  1970  1980
Countries Enumerated            16    21    20    19
Education
  Literacy                      13    18    19    14
  Years of schooling            14    21    20    19
  School attendance              8     9    13    14
  Educational qualifications     2     5     9     7
Migration
  Birthplace                    10    15    14    10
  Residence                     11    14    16    15
  Duration of residence          5     7    15    10
  Prior residence                2     5    15    11
  Urban-rural                   11    18    18    17
Content ... increasingly uniform
demographic and economic variables:
Table 2. Topics in Censuses of 23 selected Asian Countries, 1950-1980

Census Topic            1950  1960  1970  1980
Countries Enumerated      16    21    20    19
Demographic
  Children ever born       6    12    17    15
  Children living          2     7    12    13
Economic
  Activity status         14    17    18    18
  Occupation              14    20    19    17
  Industry                12    19    19    17
  Employment status       11    19    19    14
  Housing                  6    14    15    15
  Income                   3     3     4     2
Decennial censuses are the rule (1945-2004).
Of 153 countries with 1 million + pop., totaling 6 billion people in 2000:
» At least one census per decade: 66 countries, 50% of world’s population
» Missed a single decennial enumeration: 43 countries, 38% of world’s population
» Missed 2 or 3 enumerations: 32 countries, 10% of world’s population
» Fewer than 3 enumerations: 12 countries, 2% of world’s population
On a millennial scale,
censuses and census microdata survive
for only a short, but significant period
…official statistics that meet the test of
practical utility are to be compiled and
made available on an impartial basis by
official statistical agencies to honor
citizens’ entitlement to public information.
-- UN Statistical Commission, 1994
IPUMSi helps five ways:
» 1. Inventory the world’s census microdata
» 2. Preserve endangered microdata and documentation
***
» 3. Anonymize census microdata to preserve statistical confidentiality, using highest standards (Stat. Nether.)
» 4. Integrate datasets of selected countries using UN, Eurostat and other standards
» 5. Disseminate database free with complete copies to all partners
Integrated Public Use Microdata Series International
IPUMSi INVENTORIES
» Microdata...for any population or administrative division: nation, province, district, city, ethnic group, etc.
» Example: Latin America
- 20 countries
- 67 censuses inventoried
- 1% - 100% sample densities
- 100,000 to 150 million cases
- 19th century: 2 censuses; 1960s: 14; 1970s: 17; 1980s: 16; 1990s: 17
» Found: complete census data for Colombia 1973 and 16 other countries
IPUMSi PRESERVES
UN Demographic Center for Latin America (CELADE, Santiago, Chile)
~3000 microdata tapes to be preserved and metadata (documentation)
Preserve against accident, deterioration and
technological obsolescence
» Microdata:
- transfer to stable media
- use standard data storage protocols
- entrust copies with at least two depositories
» Metadata: collect, catalogue, and reproduce
- Enumeration forms (preserve all versions used)
- Enumerator and data processing instructions
- Codebooks (photocopies and scanned images)
- Technical studies, evaluations, reports
UN Stat. Div.: entire archive deposited, to be scanned
How anonymized census samples
became a standard statistical product:
» US Census Bureau:
- 1960 census 0.1% “public use microdata series”
- 1970 census: six 1% samples harmonized with 1960
- 1984: 1940, 1950 1% samples
- 1980, 1990 samples varying densities, contents
» CELADE: Latin America
- 1960s: 16 countries, densities 1-5%
- 1970s: 19 countries, 1-10%
How anonymized census samples
became a standard statistical product:
» Canada:
- 1971, 1976, 1981, 1986, 1991, 1996: varying designs, densities
- 1996: Data Liberation Initiative led to an explosion of usage in research and teaching
» UK:
- 1991: 2% individuals, 0.5% households; hundreds of publications, thousands of users
- 2001: double the densities because confidentiality assessments were too conservative.
Risk assessment
of statistical confidentiality:
» Take into account error, coding variability, and changes in personal characteristics over time
» Dale and Elliott, JRSS-A (forthcoming):
“For a user of an outside database, attempting this sort
of match with no opportunity for verification would
prove fruitless. In the first place, the small degree of
expected overlap would be a considerable deterrent to an
intruder. However, if a match between the two files was
attempted the large number of apparent matches would
be highly confusing as an intruder would have no way of
checking correct identification.”
Statistical confidentiality in the USA:
a brief history
» Before 1954:
- 1850: “exclusively for the use of the government, and
not to be used...to the gratification of curiosity...”
- 1920s: deny access to data on individuals
- 1942: refused to supply War Dept. w/ addresses of
Japanese-Americans
» After 1954:
- census microdata do not reveal identities of individuals
- basic geographical identifiers, low sample densities, masking, swapping, top-coding, re-coding
» In practice, not a single breach or allegation of a breach!
Heightened concerns about confidentiality
in USA
» Assault on privacy by businesses
» Distrust of “government”
» Never a question of use of census microdata. Yet must
avoid any possible perception of mis-use to retain
confidence and cooperation of citizens.
» Pro-active strategy:
- Publicize confidentiality safe-guards
- Offer a variety of microdata products: higher risks,
higher security
- Data enclaves: expensive, low usage, exceedingly
detailed microdata
‘statistical confidentiality’ shall mean the
protection of data related to single
statistical units which are obtained
directly for statistical purposes or
indirectly from administrative or other
sources against any breach of the right
to confidentiality. It implies the
prevention of non-statistical utilization
of the data obtained and unlawful
disclosure.
--COUNCIL REGULATION (EC)
No 322/97 of 17 February 1997
Statistical confidentiality standards in
Eurostat Countries
(* = in IPUMSi consortium)
» Norway: Statistics Norway is prohibited from publishing or disclosing data from which information about individual persons or firms can be derived. Researchers may be given access to such information under strict rules and conditions. Guidelines provided by the Norwegian Data Inspectorate form the framework for internal management of data security.
» Other countries with strict provisions: *Austria, Canada, Denmark, Finland, *France, Germany, Ireland, Netherlands, Sweden
Anonymized census microdata sample
availability for European countries
(* = in IPUMSi consortium, * = negotiating)
» 15 countries available via PAU, 1990 round (3 in IPUMSi):
Belgium, Czech Republic, Estonia, Finland, *Hungary, *Italy, Latvia, Lithuania, Norway, Poland, *Spain, Sweden, Switzerland, Turkey, *UK
» 11 countries not available via PAU (2 in IPUMSi):
*Austria, Croatia, Denmark, *France, Germany, Iceland, Ireland, Netherlands, Portugal, Slovak Republic, Slovenia
EUROSTAT statistical anonymity standards
(Thorogood, 1999)
--all accepted by IPUMSi
» 1. small sample size
» 2. limited geographical detail
» 3. top and bottom coding of unique categories
» 4. signed non-disclosure agreement
» 5. prohibit redistribution of datasets to third parties
» 6. prohibit attempts to identify individuals or the making of any claim to that effect
» 7. require users to provide copies of publications
EUROSTAT statistical anonymity standards
(Thorogood, 1999)
--all accepted by IPUMSi and more
» 8. Age (constructed, where necessary)
» 9. Never identify date of birth
» 10. Never identify place of birth
» 11. Migration: timing and place not identified in detail
» 12. Place of residence identified by major civil division (pop>60k, 120k, 250k, 1 million--national rule)
» 13. Sensitivity analysis of variables by national experts
» 14. Confidentiality assessment by national experts
International Monetary Fund’s
General Data Dissemination System
52 countries with uniform standards
» All embrace strict standards of statistical confidentiality
» Prohibit disclosure of information which may identify
individuals or entities
» 37 countries distribute anonymized census microdata
samples
IPUMSi
Making the data usable
... and used.
IPUMSi, 1999-2004
~20 countries, 1850-2000
IPUMSi PAYS
National experts in each country are contracted to:
» Assemble microdata and documentation
» Develop samples to minimize confidentiality risks and maximize robustness
» Design national integration plan: census-by-census, concept-by-concept, code-by-code
» Write integrated documentation
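Developing a sample of a given density can be illustrated with a systematic draw of households. The Python below is a minimal sketch under stated assumptions, not the design any national expert actually used: it picks every k-th household after a random start, where k is the inverse of the target density. The function name `systematic_household_sample`, the `seed` parameter, and the made-up household identifiers are all hypothetical.

```python
import random

def systematic_household_sample(household_ids, density, seed=0):
    """Select about `density` (e.g. 0.02 for 2%) of households at a fixed interval."""
    interval = round(1 / density)
    # Random start within the first interval; seeded so the draw is repeatable.
    start = random.Random(seed).randrange(interval)
    return household_ids[start::interval]

# Hypothetical frame of 1,000 household identifiers.
households = list(range(1, 1001))
sample = systematic_household_sample(households, density=0.02)
print(len(sample), sample[:3])
```

Sampling whole households rather than individuals keeps family members together, which is what makes household-composition analysis possible in the resulting file.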
IPUMSi INTEGRATES
Census documentation compiled for Colombian microdata
Standard: UN/Eurostat Principles & Recs...
Photos from Colombia integration project, February-March, 2000:
4 experts from DANE (census office) + 7 academics (3 universities)
IPUMSi integration principles
» 1. Respect absolute anonymity
» 2. Preserve all original data, except adjustments to ensure privacy (top codes, blurring, masking, reordering, etc.)
» 3. Harmonize codes for countries
- occupation: ISCO, HISCO (detailed, general)
- education: ISCED
- family: IPUMS, etc.
» 4. Enhance with constructed variables
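Principles 2 and 3 together imply that a harmonized code is added alongside the original rather than written over it. The Python below is a rough sketch of that idea. The crosswalk (`CROSSWALK`), the country labels, the national codes, and the harmonized categories are all invented for illustration; they are not real ISCED, ISCO, or IPUMS codes.

```python
# HYPOTHETICAL crosswalk: national education code -> harmonized category.
# Neither the national codes nor the categories are real ISCED values.
CROSSWALK = {
    "country_a": {"1": "primary", "2": "secondary", "3": "tertiary"},
    "country_b": {"A": "primary", "B": "primary", "C": "secondary", "D": "tertiary"},
}

def harmonize(record, crosswalk=CROSSWALK):
    """Add a harmonized education code while keeping the original code."""
    national = crosswalk[record["country"]]
    out = dict(record)  # copy, so the source record stays untouched
    out["educ_harmonized"] = national.get(record["educ_original"], "unknown")
    return out

rows = [
    {"country": "country_a", "educ_original": "2"},
    {"country": "country_b", "educ_original": "B"},
    {"country": "country_b", "educ_original": "Z"},  # code missing from the crosswalk
]
harmonized = [harmonize(r) for r in rows]
for row in harmonized:
    print(row)
```

Codes that the crosswalk does not cover are flagged as "unknown" instead of being dropped, so the gap stays visible in the integrated file.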
IPUMSi INTEGRATES
10 projects started
USA: 1850-1880, 1900-2000
France: 1962, 1968, 1975, 1982, 1990
Norway: 1801, 1865, 1875, 1900; negotiating: 1960, 1970, 1980, 1990, 2001
Canada: 1871, 1881, 1901; negotiating: 1961-2001
United Kingdom: (1851, 1881), 1991; negotiating: 1961, 1971, 1981, 2001
Argentina: 1869, 1895
Colombia: 1964, 1973, 1985, 1993, 2003
Vietnam: 1989, 1999
Hungary: 1970, 1980, 1990, 2000
IPUMSi INTEGRATES
5 projects planned
Mexico: 1960, 1970, 1980, 1990, 2000
Spain: 1981, 1991, 2001
Brazil: 1960, 1970, 1980, 1991, 2001
China: 1982, 1990, 2000
Kenya: 1989, 1999
3 negotiations underway
Ghana: 1984, 2000
Italy: 1981, 1991, 2001
Austria: 1971, 1981, 1991, 2001
IPUMSi ??
7 future possibilities
Country   Census microdata
a.        1860, 1870, 1880, 1950, 1960, 1970, 1980, 1990, 2000
b.        1961, 1971, 1981, 1991, 2001
c.        1961, 1971, 1976, 1981, 1986, 1991, 1996
d.        1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995
e.        1960, 1966, 1970, 1975, 1980, 1985, 1990, 1995
f.        1971, 1981, 1991, 2001
g.        1970, 1980, 1990, 2000
and .... ???
IPUMSi ANONYMIZES
Using the highest standards currently available:
technical (Statistics Netherlands)
administrative (license agreement)
Imagine a new statistical product: a scientifically anonymized census microdata sample made up of unidentifiable individuals...
IPUMSi preserves statistical confidentiality
(in addition to NSO safe-guards):
» 1. Construct small samples
» 2. Suppress geographical detail (minor civil divisions and others with less than 100,000 population), date of birth, 3-4 digit occupational codes, etc.
» 3. Blur codes for sensitive variables where identity might be compromised (income)
» 4. Top-code income, education, etc.
» 5. Swap a small fraction of records
» 6. Assess confidentiality risks for unique records for all defined geographical areas (“ARGUS”, Statistics Netherlands)
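Three of the safeguards above (suppressing date of birth and small-area geography, top-coding, and checking for unique records) can be sketched in Python as follows. This is a simplified illustration and does not reproduce ARGUS or any actual IPUMSi routine. The income ceiling `TOP_CODE_INCOME`, the field names, and the sample records are assumptions; only the 100,000 population threshold comes from the list above. Blurring and swapping are not shown.

```python
from collections import Counter

TOP_CODE_INCOME = 100_000       # hypothetical ceiling, chosen for illustration
MIN_AREA_POPULATION = 100_000   # threshold named in safeguard 2

def anonymize(record, area_population):
    """Return a copy with date of birth dropped, income top-coded, small areas suppressed."""
    out = dict(record)
    out.pop("date_of_birth", None)
    out["income"] = min(out["income"], TOP_CODE_INCOME)
    if area_population[out["area"]] < MIN_AREA_POPULATION:
        out["area"] = "suppressed"
    return out

def unique_records(records, keys):
    """Return records whose combination of key variables occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]

# Made-up areas and persons.
area_population = {"A": 250_000, "B": 40_000}
raw = [
    {"area": "A", "sex": "f", "income": 900_000, "date_of_birth": "1950-01-01"},
    {"area": "A", "sex": "f", "income": 20_000, "date_of_birth": "1960-02-02"},
    {"area": "B", "sex": "m", "income": 15_000, "date_of_birth": "1970-03-03"},
]
safe = [anonymize(r, area_population) for r in raw]
at_risk = unique_records(safe, keys=("area", "sex"))
print(safe)
print(len(at_risk), "record(s) unique on area and sex")
```

Records that remain unique on the chosen keys are the ones a risk assessment would examine further, for example by coarsening a variable or swapping the record.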
Repositories of anonymized census
microdata samples for scientific research
» ICPSR, University of Michigan
» ACAP, University of Pennsylvania
» CELADE, Centro Latino Americano de Demografía, Santiago, Chile
» ECE/PAU, Population Affairs Unit, Geneva, Switzerland
» EWC, East-West Center, U. of Hawaii
» IPUMSi, University of Minnesota
Will others (a Nordic institution?) join the effort?
IPUMSi DISSEMINATES
» International web-based access system
» End-user license agreement: protects privacy and confidentiality, assures proper use
» User selects countries, cases, variables, and samples--makes cross-national research possible
» Open architecture software and mirror sites available to all partners
Why should Nordic countries participate now?
» Legal and scientific foundations in place: EUROSTAT, France, Austria, UK, etc.
» The project is 18 months into its 5-year term; if resources are required, budget planning must begin soon.
» Historical census microdata projects are well advanced: 1801, 1865 (100% club), 1875, 1900. Time to turn to contemporary census microdata.
additional information at:
http://www.ipums.org
******
Thank you
Work plan, part II:
make census microdata usable
» 3. Integrate: March 2000-
- Integrate phase I countries using UN/Eurostat Principles & Recommendations
- Analyze all concepts, variables and codes of census schedules for 30 target countries
National partners:
- help to design prototype
- help to implement for phase I and II countries
» 4. Disseminate: -October 2004
- Design international data access engine
- Implement with phase I and II countries