www.lisdatacenter.org Joint World Bank-LIS Workshop on database creation and survey harmonization Thursday, June 6, 2013


www.lisdatacenter.org
Joint World Bank-LIS Workshop
on database creation and survey harmonization
Thursday, June 6, 2013
LIS: an overview
LIS: Cross-National Data Center
• parent organization
• located in Luxembourg
• independent, chartered non-profit organization
• cross-national, participatory governance
• acquires, harmonizes, and disseminates data for research
• venue for research, conferences, and user training
• staff: approximately 10 persons
LIS Center @ CUNY
• satellite office
• located at the Graduate Center of the City University of New York
• administrative, managerial, development support to parent office
• venue for research, teaching, and graduate student supervision
• staff: approximately 10 persons (mostly part-time PhD students)
History
• LIS was founded in 1983 by two US academics (Tim Smeeding and Lee
Rainwater) and a team of multi-disciplinary researchers in Europe. It
began as a “study”, which later grew and was institutionalized as “LIS”.
• For nearly 20 years, LIS was part of a local research institute, CEPS
(Centre d'Etudes de Populations, de Pauvreté et de Politiques
Socio-Economiques). In 2002, LIS became an independent non-profit institution.
• LIS is supported by the Luxembourg government, by the national science
foundations and other funders in many of the participating countries, and
by several supranational organizations.
• We are building a growing partnership with the new University of
Luxembourg.
Our mission
To enable, facilitate, promote, and conduct cross-national
comparative research on socio-economic outcomes and on
the institutional factors that shape those outcomes.
What we do
Step 1. We identify appropriate datasets.
Data must be neutral, reliable, and high-quality.
Step 2. We negotiate with each data provider.
Step 3. We collect, harmonize and document the data.
LIS’ data experts harmonize the data into a common,
cross-national template, and create comprehensive
documentation.
Teresa will discuss
Step 4. We double-check the harmonized data.
Step 5. We make the data available to researchers via remote execution,
and other user-friendly pathways.
Thierry will discuss
LIS and LWS Databases
Luxembourg Income Study Database (LIS)
• First and largest available database of harmonized income data, available at the
household and person levels
• In existence since 1983
• Data mostly start in 1980, some go back to the 1960s (recollected every 3-5 years)
• 45 countries
• 205 datasets
• Used to study: poverty; income inequality; labor market outcomes; policy effects
Luxembourg Wealth Study Database (LWS)
• First available database of harmonized wealth data, available at the household level
• In existence since 2007
• Data going back to 1994
• 12 countries
• 20 datasets (planned expansion underway)
• Used to study: household assets, debt, and expenditures; wealth portfolios; policy
effects
Pathways to the data
Remote-execution system
(“LISSY”)
This is the primary means of access; it uses
a software system that was designed
specifically for LIS.
Researchers write programs (in SPSS, SAS,
or Stata) and send them to the LIS server;
results are returned to the researcher, with
an average processing time of under two
minutes.
Two other pathways
to the LIS data
Web-based tabulator (“the WebTab”)
LIS Key Figures (no registration needed)
Current coverage:
62% of world population
84% of world GDP
Current axis of growth: middle-income countries
(now 17 out of 47 countries)
Australia
Austria
Belgium
Brazil
Canada
Chile *
China
Colombia
Cyprus
Czech Republic
Denmark
Dominican Republic *
Egypt *
Estonia
Finland
France
Germany
Greece
Guatemala
Hungary
India
Ireland
Israel
Italy
Japan
Luxembourg
Mexico
Netherlands
Norway
Panama *
Paraguay *
Peru
Poland
Romania
Russia
Serbia *
Slovak Republic
Slovenia
South Africa
South Korea
Spain
Sweden
Switzerland
Taiwan
United Kingdom
United States
Uruguay
Our leadership
Janet Gornick
Director of LIS | Director of LIS Center (CUNY)
Professor of Political Science and Sociology
Graduate Center, City University of New York.
Markus Jäntti
Research Director of LIS
Professor of Economics, Stockholm University
Tony Atkinson
President of LIS Board
Economist at Nuffield College, Oxford University
Serge Allegrezza
President of LIS Local Advisory Board
Director of Luxembourg National Statistical Office
We are governed by an elected Executive Committee and an international Board,
comprising representatives from our funders and data providers.
LIS’ partners
Our partners include data providers, data users, and funders, in more
than 40 countries …
and in major supranational organizations, including:
Financial contributors:
The World Bank (WB)
The Organization for Economic Cooperation and Development (OECD)
The International Monetary Fund (IMF)
The United Nations Development Program (UNDP)
Dataset exchange; joint research projects; joint fundraising:
The European Central Bank (ECB)
The United Nations Children’s Fund (UNICEF)
EUROMOD
Harvard Population Center
Users, products, services
Thousands of data users - and growing
• remote execution enables use around the world
• free access for students in all countries
• free access for data providers and their staffs
Pedagogical activities
• annual training workshops in Luxembourg
• local workshops
• self-teaching lessons online
Research activities and support
• visiting scholar program
• working paper series (600+)
• research conferences
• edited books (new one coming in July!)
Research
using the LIS and LWS data:
some highlights
LIS provides evidence for
comparative research
on socio-economic outcomes
• assessing income inequality
• measuring poverty
• comparing employment outcomes
• analyzing assets and debt
• researching policy impacts
Assessing Income Inequality
Inequality Across Households
Income inequality in the US is the highest among
25 high-income countries included in the LIS Database.
[Figure: bar chart of the Gini index by country. Vertical axis: Inequality Indicator: Gini Index, from 0.00 to 0.40.]
Source: Luxembourg Income Study Key Figures (publicly available online – www.lisdatacenter.org).
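To make the indicator on this slide concrete, here is a minimal sketch of a weighted Gini index computed from the Lorenz curve. The function name and the sample incomes are illustrative assumptions, not LIS code or LIS data, and published indicators usually rest on further conventions (for example equivalence scales) that are not shown here.

```python
def gini(incomes, weights=None):
    """Weighted Gini index: 1 minus twice the area under the Lorenz curve."""
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))          # order units from poorest to richest
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    if total_weight <= 0 or total_income <= 0:
        raise ValueError("need positive total weight and total income")
    cumulative_income = 0.0
    twice_area = 0.0
    for income, weight in pairs:
        previous = cumulative_income
        cumulative_income += income * weight
        # Trapezoid between two consecutive Lorenz points, counted twice.
        twice_area += (weight / total_weight) * (previous + cumulative_income) / total_income
    return 1.0 - twice_area


# Illustrative values only (not LIS data).
print(gini([1, 1, 1, 1]))   # perfect equality
print(gini([0, 0, 0, 1]))   # one unit holds all income
```

With unit weights, perfect equality gives 0 and full concentration among n units gives (n - 1) / n, which is why the second call does not reach 1.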
Measuring Poverty - I
Household Poverty Rates
The poverty rate in the US is the highest among
25 high-income countries included in the LIS Database.
[Figure: bar chart of household poverty rates by country. Vertical axis: Poverty Rate (50% of median disposable household income), from 0 to 18.]
Source: Luxembourg Income Study Key Figures (publicly available online – www.lisdatacenter.org).
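The poverty line used on this slide is relative: 50% of median disposable household income. A minimal sketch of that calculation follows. The function names and sample incomes are assumptions made for illustration, and the sketch skips steps such as equivalisation that a production indicator may apply.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total."""
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("no observations")


def poverty_rate(incomes, weights=None, share_of_median=0.5):
    """Weighted share of units whose income is strictly below the relative line."""
    if weights is None:
        weights = [1.0] * len(incomes)
    line = share_of_median * weighted_median(incomes, weights)
    poor_weight = sum(w for y, w in zip(incomes, weights) if y < line)
    return poor_weight / sum(weights)


# Illustrative incomes only: the median is 30, so the poverty line is 15.
print(poverty_rate([10, 20, 30, 40, 100]))
```

Because the line moves with the median, this measure tracks the shape of the lower half of the distribution rather than an absolute living standard.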
Measuring Poverty - II
“Real Income Levels” of Children
US children: the rich are richer, and the poor are poorer.
[Figure: two bar charts comparing children's real income levels across countries (Australia, Belgium, Canada, Denmark, Finland, France, Germany, Netherlands, Norway, Sweden, Switzerland, United Kingdom, United States). One panel shows low-income children, As Percent of Low US Child Income; the other shows high-income children, As Percent of High US Child Income.]
Source: Timothy Smeeding and Lee Rainwater. 2002. Comparing Living Standards Across Nations: Real Incomes at the Top, the Bottom and the Middle, LIS Working Paper 266.
Comparing Employment Outcomes
Earnings Equality between Women and Men
Earnings equality between working men and women ranks 18th
among 25 high-income countries in the LIS Database.
[Figure: bar chart by country. Vertical axis: Ratio of Women’s Earnings to Men’s Earnings, from 0.0 to 1.0.]
Source: Luxembourg Income Study Key Figures (publicly available online – www.lisdatacenter.org).
Analyzing Assets and Debt
Older Women’s Income and Asset Poverty
In the US, 27% of older women are both income poor and asset poor
– a higher share than among older women in several other countries.
[Figure: stacked bar chart (0% to 100%) for the United States, Finland, Germany, the United Kingdom, Italy, and Sweden, dividing older women into four groups: Neither Income nor Asset Poor; Income Poor, NOT Asset Poor; Income Poor AND Asset Poor; Asset Poor, NOT Income Poor. Each bar is also labelled with the total share Income Poor and the total share Asset Poor.]
Source: Gornick, Janet C., et al. 2009. “The Income and Wealth Packages of Older Women in Cross-National Perspective.” Journal of Gerontology: Social Sciences 64B(3): 402-414.
Researching Policy Impacts
Income Inequality and Redistribution
The US government does less than other rich countries to
reduce income inequality.
[Figure: paired bars per country showing Gini indices of income before taxes and transfers (upper bars, Gini index of market income) and after taxes and transfers (lower bars, Gini index of disposable income).]
Reduction in Gini Index through taxes and transfers:
United States: 23%
Israel: 33%
United Kingdom: 33%
Australia: 34%
Canada: 28%
Taiwan: 9%
Poland: 41%
Switzerland: 22%
Romania: 27%
Germany: 43%
Czech Rep.: 41%
Sweden: 45%
Norway: 39%
Netherlands: 36%
Finland: 36%
Denmark: 47%
Source: Andrea Brandolini et al, 2007, Inequality in Western Democracies: Cross-Country Differences and Time Changes, LIS Working Paper 458.
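The "reduction in Gini index" on this slide can be read as a relative difference between the market-income Gini and the disposable-income Gini. The sketch below assumes that definition; the function name and the input values are hypothetical and are not taken from the chart.

```python
def gini_reduction(market_gini, disposable_gini):
    """Relative reduction in the Gini index achieved by taxes and transfers."""
    if market_gini <= 0:
        raise ValueError("market Gini must be positive")
    return (market_gini - disposable_gini) / market_gini


# Hypothetical example: a market-income Gini of 50 and a disposable-income Gini of 30.
reduction = gini_reduction(50, 30)
print(f"{reduction:.0%}")
```

Expressing the change relative to the market-income Gini makes countries with very different starting levels of inequality comparable on a single scale.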
Linking LIS Data with Other Data
Income Inequality and Earnings Mobility
Countries with higher levels of income inequality have lower levels of
intergenerational economic mobility.
[Figure: chart relating income inequality (from LIS) to intergenerational earnings mobility, by country.]
Source: OECD 2008. Growing Unequal: Income Distribution and Poverty in OECD Countries. Paris: OECD.
Harmonisation
Data harmonisation at LIS: an overview
• The origins of the LIS data
• The harmonisation process
• The final output: LIS data
Harmonisation process in 5 steps:
1. Data acquisition
Get the original data and documentation
2. Opening of the original data
Understand the original data and concepts
3. Data harmonisation
– Conceptual: map original variables into LIS variables
– Technical: create uniform file structure and variables
4. Checking of the LIS data
Check final LIS files for consistency
5. Creation of LIS metadata
Create harmonised user documentation of the LIS files
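The conceptual and technical parts of the harmonisation step, plus the consistency check, can be sketched as follows. Everything named here (the two surveys, their variable names, the target template, the check rules) is hypothetical and chosen for illustration; it is not the LIS template or the LIS production code.

```python
# Hypothetical mapping tables: original variable name -> harmonised variable name.
MAPPINGS = {
    "survey_a": {"hhid": "household_id", "netinc": "disposable_income", "nmemb": "household_size"},
    "survey_b": {"HH_ID": "household_id", "INC_DISP": "disposable_income", "SIZE": "household_size"},
}

# Hypothetical common template: every harmonised file carries the same variables.
TEMPLATE = ("household_id", "disposable_income", "household_size")


def harmonise(record, survey):
    """Conceptual step: map original variables; technical step: uniform structure."""
    harmonised = {name: None for name in TEMPLATE}   # same keys for every survey
    for original_name, target_name in MAPPINGS[survey].items():
        if original_name in record:
            harmonised[target_name] = record[original_name]
    return harmonised


def check(row):
    """Consistency check on one harmonised row; returns a list of problems."""
    problems = []
    if row["household_id"] is None:
        problems.append("missing household_id")
    size = row["household_size"]
    if size is not None and size < 1:
        problems.append("household_size below 1")
    return problems


row = harmonise({"HH_ID": 7, "INC_DISP": 21000, "SIZE": 0}, "survey_b")
print(row, check(row))
```

Keeping the mapping in a table rather than in code means each survey's conceptual decisions stay visible and can feed directly into the user documentation.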
The challenges of harmonisation
Make comparable original data that are:
• from various countries
→ different institutional / societal setups
• over time
→ changes in institutions and original surveys
• household / individual level data
→ confidentiality issues
• from various existing datasets
→ output (or ex-post) harmonisation
The challenges of ex-post harmonisation
• Different types/purposes of original collection instrument
– Survey versus administrative data (coverage and contents)
– Cross-sections versus panels (sample selection)
• The concepts used in the original data collection are different
– Different definitions (employment definition)
– Different universes and reference periods
– Country-specific classifications (education, occupation, industry,
social security benefits)
• The level of detail of information collected differs
– Labor market (e.g., LFS type of survey)
– Incomes / wealth (detailed breakdown vs. overall questions)
• Different statistical techniques
– Different sampling procedures (e.g., oversampling of the rich)
– Weighting procedures (self-weighted, sampling weights, etc.)
– Treatment of missing values, imputation methods
The challenges of harmonising income data
• Income sources included in total household disposable income
(irregular payments, non-cash incomes, imputed rents, non-taxable
incomes, “informal” incomes)
• Current versus annual
• Net versus gross (or in between...)
• Top- and bottom-coding
• Level of detail (e.g., total pensions) and different aggregation (e.g.,
pensions by type of system versus by function)
• Classification of incomes:
– Public versus private
– Social insurance versus universal versus social assistance
systems
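Of the items above, top- and bottom-coding is the most mechanical: values beyond a threshold are replaced by the threshold itself. A minimal sketch, with arbitrary thresholds chosen only for illustration (original datasets set their own, which is part of what makes comparison hard):

```python
def top_bottom_code(values, lower, upper):
    """Replace values below `lower` with `lower` and values above `upper` with `upper`."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    return [min(max(value, lower), upper) for value in values]


# Arbitrary illustrative thresholds: floor at 0, ceiling at 1000.
print(top_bottom_code([-50, 100, 9000], lower=0, upper=1000))  # [0, 100, 1000]
```

Two datasets coded with different ceilings will show different top-end dispersion even if the underlying incomes are identical, so the coding rule needs to be documented alongside the data.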
The challenges of harmonising data from
middle-income countries
• Urban versus rural (sample composition, population coverage)
• Household membership and treatment of incomes (live-in domestic servants,
family members temporarily absent)
• Complex households (multigenerational households, definition of head, polygamy)
• Employment definition and labour market characteristics (informal employment,
child labour, multiple jobs, status in employment)
• Education (attended versus completed, highest level versus highest qualification)
• Enlargement of income concept to in-kind incomes (consumption from own
production, in-kind individual public goods, subsidies)
• Classification of income:
– Employer-provided pensions and benefits (labour income, social security)
– Social insurance versus assistance versus universal benefits
– Treatment of taxes
LIS golden rules for harmonisation
• Set clear definitions for LIS variables
– Maximise comparability by setting clear definitions for each
variable (and trying to stick to them as much as possible)
– Document very well any deviation from the general definition
• Complement ease of use with flexibility of use
– Enhance user-friendliness by providing fully standardised
variables (standard variables, recodes, dummies, aggregate
variables)
– Allow users the flexibility to create other concepts by leaving a
large amount of detailed information
• Adapt the LIS template to the changing environment (over
time and space)
– The 2011 template
– Backwards rerun
Overall guiding principle: COMPARABILITY
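The second rule, ease of use combined with flexibility, can be illustrated with a small sketch that adds a recode, a dummy, and an aggregate variable while leaving every detailed variable in place. All variable names, categories, and the employment rule below are hypothetical and do not reflect the actual LIS template.

```python
# Hypothetical recode table: detailed education codes -> three standard levels.
EDUCATION_RECODE = {
    "primary": "low",
    "lower_secondary": "low",
    "upper_secondary": "medium",
    "tertiary": "high",
}


def standardise(person):
    """Return a copy of the record with standardised variables added."""
    out = dict(person)  # detailed variables stay available for flexible use
    # Recode: collapse a detailed classification into a standard one.
    out["education_3cat"] = EDUCATION_RECODE.get(person.get("education"))
    # Dummy: 1 if any hours worked (an illustrative rule, not an official definition).
    out["employed"] = 1 if person.get("hours_worked", 0) > 0 else 0
    # Aggregate: sum of detailed labour income components.
    out["labour_income"] = person.get("wages", 0) + person.get("self_employment_income", 0)
    return out


print(standardise({"education": "tertiary", "hours_worked": 38, "wages": 30000}))
```

Users who disagree with a standard variable can rebuild their own from the detailed components, which is the flexibility the rule asks for.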
Remote Execution System
Primary Pathway
Output
• LISSY System: programming, any advanced statistics
• Web Tabulator: cross-national descriptive tables
• Key Figures: ready-made indicators
Accessibility
• Publicly available
• Researchers only, registration required
The LISSY system
Remote Execution System (Version 8)
• Fully automated, running 24 hours/day and 7 days/week
• Researchers analyse microdata at their own place of work
• Statistical programs (e.g., Stata, R) are processed automatically, and results are
automatically sent back
• Restricted to social science research purposes only
• Micro-databases cannot be downloaded and no direct access to the
data is permitted
• Users must register with LIS. LIS grants access to the databases for a
limited period (1 year), renewable annually
• Over 4,500 users from 55 countries have registered to date
• In 2012, 1,015 applications (new and renewed)
Working with LISSY
• Write, submit and view requests
• Track status of job requests
• Access and manage history of all jobs you ever submitted
Security and confidentiality
Data providers’ legal constraints
Researchers’ needs
Technical implementation
• 55,000 jobs per year to monitor
• Security settings defined for an automatic scan of each incoming request
• Suspicious jobs are sent to a review queue for manual review
• All incoming jobs and outputs are stored, making it possible to trace back
researchers’ job history
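The screening flow described above (automatic scan, review queue, stored job history) might look roughly like the sketch below. The patterns, function name, and data structures are invented for illustration; the slides do not describe LISSY's actual security settings, and this is not its implementation.

```python
import re
from collections import deque

# Hypothetical security settings: commands that could print or export individual records.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\blist\b", re.IGNORECASE),
    re.compile(r"\bexport\b", re.IGNORECASE),
]

review_queue = deque()   # jobs waiting for a manual review
job_history = []         # every incoming job is stored for later tracing


def handle_job(user, program_text):
    """Scan one incoming job and route it to 'run' or 'review'."""
    job_history.append((user, program_text))
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(program_text)]
    if hits:
        review_queue.append((user, program_text, hits))
        return "review"
    return "run"


print(handle_job("user_1", "summarize income"))
print(handle_job("user_2", "list income in 1/10"))
```

Logging before scanning ensures that even rejected or queued jobs remain part of the traceable history.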
Ancillary support services
Extensive documentation is available on the LIS website
• Detailed information on original surveys, LIS variables’ content and
availability, etc., allowing users to understand the context in which LIS
outcomes should be analysed
• Information on how to access and work with microdata:
– Data accreditation (access, confidentiality rules…)
– Data access system (how-to and FAQ sections)
– Learning materials (self-teaching packages …)
Support
• Support facilities as a means to improve researchers’ ability to work with
LISSY and to reduce the risk of breaching confidentiality rules
• User support (500 emails per year) and training sessions through
workshops
Challenges still to face
• Challenges to face include revising the LIS databases’ documentation
system by supplying a new metadata system that will allow LIS users
to create tailored documentation extracts fitted to their individual needs
• The key objective to work on: constantly adjusting the microdata
access services to fulfill researchers’ needs while maintaining the same
level of security and communication
Ideas for afternoon discussion
Possible collaborative activities:
• Exchange of information and expertise
regarding dataset selection/acquisition;
harmonisation; micro-simulation/imputation;
design and construction of metadata (etc.)
• Joint data harmonisation opportunities?
• Joint research opportunities?
• Joint fundraising opportunities?
• Any other possibilities that arise!
Thank You
Janet Gornick, Teresa Munzi, Thierry Kruten
www.lisdatacenter.org