Working with EU-SILC using the hierarchical data
structure, matching & aggregating data
Practical computing session I – Part 2
Heike Wirth
GESIS – Leibniz Institut für Sozialwissenschaften
DwB-Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology
University of Bucharest, Romania
Introduction
• EU-SILC data has a hierarchical structure
  • more than one level of analysis is possible
  • household & individual levels are represented by separate files
  • data are stored in multiple data files
Example of household level data
Example 1: Household

| record # | Year of survey (HB010) | Country (HB020) | HH-ID (HB030) | Dwelling type (HH010) | Total disposable HHLD income (HY020) | Ability to make ends meet (HS120) |
|---|---|---|---|---|---|---|
| 1 | 2010 | AT | 1 | apartment or flat in … | 15,271 | with great difficulty |
| 2 | 2010 | AT | 2 | detached house | 30,081 | fairly easily |
| … | … | … | … | … | … | … |
| 1500 | 2010 | RO | 1 | detached house | 2,243 | fairly easily |
| 1501 | 2010 | RO | 2 | detached house | 2,409 | with difficulty |
| … | … | … | … | … | … | … |

1 observation = 1 Household
Please note: HHLD-ID does not differentiate between countries
To be on the safe side, use HHLD-ID together with country & year of survey
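The note on household IDs can be checked directly. The following is a minimal SPSS sketch, using four made-up records rather than the training data, that counts how often each household ID occurs on its own and in combination with year of survey and country. The variable names n_id_only and n_full_key are illustrative and are not EU-SILC variables.

```spss
* Four made-up household records, two countries with the same HH-IDs.
DATA LIST LIST / HB010 (F4) HB020 (A2) HB030 (F8).
BEGIN DATA
2010 AT 1
2010 AT 2
2010 RO 1
2010 RO 2
END DATA.

* Number of records sharing the same household ID alone.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB030
  /n_id_only = N.

* Number of records sharing year, country and household ID.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB010 HB020 HB030
  /n_full_key = N.

LIST.
```

With these four records, n_id_only is 2 in every row, because each household ID occurs once in AT and once in RO, whereas n_full_key is 1 in every row. A value above 1 for the full key would indicate duplicate records.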
Example of individual level data
Example 2: Individual data

| record # | Year of survey (PB010) | Country (PB020) | HH-ID (PX030) | Person-ID (PB030) | Marital status (PB190) | Gross monthly earnings (PY0200G) | Highest ISCED level attained (PE040) |
|---|---|---|---|---|---|---|---|
| 1 | 2010 | AT | 1 | 11 | married | 3500 | (upper) secondary |
| 2 | 2010 | AT | 1 | 12 | married | 1400 | lower secondary |
| 3 | 2010 | AT | 1 | 13 | never married | 1450 | (upper) secondary |
| 4 | 2010 | AT | 1 | 14 | never married | 2307 | lower secondary |
| … | … | … | … | … | … | … | … |
| 30001 | 2010 | RO | 1 | 11 | married | 1500 | (upper) secondary |
| 30002 | 2010 | RO | 1 | 12 | married | 750 | lower secondary |
| 30003 | 2010 | RO | 1 | 13 | never married | 250 | (upper) secondary |
| … | … | … | … | … | … | … | … |

1 observation = 1 Person
Person-ID sequential within household
Working with this kind of data requires
• Decision on the appropriate unit of analysis for your research question, e.g.
  • research interest in households or persons?
  • % of households/persons/men/women/children who live in poverty?
  • % of households with only 1 person or % of persons who live alone?
• Knowledge of procedures for manipulating the data
Types of Matching
• One-to-one matching
  • Household Register to Household Data;
  • Personal Register to Personal Data
• One-to-many matching
  • Household variables to Individual data
• Many-to-one matching (‘aggregation’)
  • e.g. adding information from the individual data to the household data
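EU-SILC – Types of matching
[Diagram: the four files and the links between them]
• Household Register File (D) to Household Data File (H): 1:1
• Personal Register File (R) to Personal Data File (P): 1:1
• Household files (D, H) to personal files (R, P): 1:n
• Personal files (R, P) to household files (D, H): n:1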
Linking EU-SILC files (cross-sectional)
• Key variables provide links between the related records
  • between household files
  • between individual files
  • between household and individual files
• Key variables (depending on the files) are
  • household id (DB030; HB030; RX030; PX030)
  • personal id (RB030; PB030)
• to be on the safe side: Use key variables always with
  • ‘year of survey’ (DB010; HB010; RB010; PB010) &
  • ‘country’ (DB020; HB020; RB020; PB020)
Example 1: one-to-one
• Attach household register information (D-File) to the household data file (H-File)
• e.g. ‘Degree of urbanisation’ (DB100) is only included in the household register; it can be useful to have this information in the household data, too.
One-to-One Match, e.g. household information

Household Register (separate file)

| DB010 | DB020 | DB030 | DB075 | (…) | DB100 |
|---|---|---|---|---|---|
| 2010 | AT | 2 | 3 | (…) | intermediate area |
| 2010 | AT | 12 | 2 | (…) | thinly populated area |
| 2010 | AT | 13 | 3 | (…) | thinly populated area |
| 2010 | AT | 19 | 2 | (…) | thinly populated area |
| 2010 | AT | 26 | 3 | (…) | thinly populated area |
| 2010 | AT | 59 | 4 | (…) | densely populated area |

Household Data (separate file)

| HB010 | HB020 | HB030 | HS090 | HS120 | (…) | HX060 |
|---|---|---|---|---|---|---|
| 2010 | AT | 2 | no - cannot afford | with great difficulty | (…) | One person household |
| 2010 | AT | 12 | yes | with difficulty | (…) | Other hhlds without dep. children |
| 2010 | AT | 13 | no - other reason | fairly easily | (…) | One person household |
| 2010 | AT | 19 | yes | fairly easily | (…) | Other hhlds without dep. children |
| 2010 | AT | 26 | yes | easily | (…) | Other hhlds without dep. children |
| 2010 | AT | 59 | yes | with some difficulty | (…) | One person household |
Result: Combined Household File
Household Data (combined file)

| HB010 | HB020 | HB030 | HS090 | HS120 | (…) | HX060 | DB100 |
|---|---|---|---|---|---|---|---|
| 2010 | AT | 2 | no - cannot afford | with great difficulty | (…) | One person household | intermediate area |
| 2010 | AT | 12 | yes | with difficulty | (…) | Other households without dependent children | thinly populated area |
| 2010 | AT | 13 | no - other reason | fairly easily | (…) | One person household | thinly populated area |
| 2010 | AT | 19 | yes | fairly easily | (…) | Other households without dependent children | thinly populated area |
| 2010 | AT | 26 | yes | easily | (…) | Other households without dependent children | thinly populated area |
| 2010 | AT | 59 | yes | with some difficulty | (…) | One person household | densely populated area |
Example 2: one-to-many
• Attach household register information (D-File) to personal data file (P-File)
• Attach ‘Degree of urbanisation’ (again) to the personal data file
Attaching household data to personal data (1:n)

Household Register (separate file)

| DB010 | DB020 | DB030 | DB075 | (…) | DB100 |
|---|---|---|---|---|---|
| 2010 | AT | 2 | 3 | (…) | intermediate area |
| 2010 | AT | 12 | 2 | (…) | thinly populated area |
| 2010 | AT | 26 | 3 | (…) | thinly populated area |

Personal Data (combined)

| PB010 | PB020 | PX030 | PB030 | PH010 | PH020 | PH030 | PX020 | DB100 |
|---|---|---|---|---|---|---|---|---|
| 2010 | AT | 2 | 201 | fair | yes | yes, limited | 71 | intermediate area |
| 2010 | AT | 12 | 1201 | fair | no | no, not limited | 32 | thinly populated area |
| 2010 | AT | 12 | 1202 | fair | yes | yes, limited | 31 | thinly populated area |
| 2010 | AT | 12 | 1203 | good | no | no, not limited | 30 | thinly populated area |
| 2010 | AT | 12 | 1204 | fair | no | no, not limited | 26 | thinly populated area |
| (…) | | | | | | | | |
Example 3: many-to-one
• e.g. number of persons in a household who are
  • unemployed,
  • full-time employed
  • self-employed?
• such information is not included in the data
  => own computation
Matching: many-to-one (summarizing information)

Personal Data with summarized variables

| PB010 | PB020 | PX030 | PB030 | PL031 | # unempl | # employed full time | # self employed |
|---|---|---|---|---|---|---|---|
| 2010 | AT | 2 | 201 | Unemployed (5) | 1 | 0 | 0 |
| 2010 | AT | 12 | 1201 | Empl. full time (1) | 0 | 2 | 1 |
| 2010 | AT | 12 | 1202 | Empl. full time (1) | 0 | 2 | 1 |
| 2010 | AT | 12 | 1203 | Empl. part time (2) | 0 | 2 | 1 |
| 2010 | AT | 12 | 1204 | Self-employed (3) | 0 | 2 | 1 |
| (…) | | | | | | | |

Household Data (combined file)

| HB010 | HB020 | HB030 | # unempl | # employed | # self employed |
|---|---|---|---|---|---|
| 2010 | AT | 2 | 1 | 0 | 0 |
| 2010 | AT | 12 | 0 | 2 | 1 |
| 2010 | AT | 26 | … | … | … |
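The summarizing step illustrated above can be sketched with the AGGREGATE command. This is a minimal example under stated assumptions, not the course solution: the five records are typed in from the example table, and the names unempl, ftempl, selfempl, n_pers, n_unempl, n_ftempl and n_selfempl are illustrative.

```spss
* The five example persons, with PL031 entered as numeric codes.
DATA LIST LIST / PB010 (F4) PB020 (A2) PX030 (F8) PB030 (F8) PL031 (F2).
BEGIN DATA
2010 AT 2 201 5
2010 AT 12 1201 1
2010 AT 12 1202 1
2010 AT 12 1203 2
2010 AT 12 1204 3
END DATA.

* Person-level 0/1 indicators, codes as shown in the example table.
COMPUTE unempl = (PL031 = 5).
COMPUTE ftempl = (PL031 = 1).
COMPUTE selfempl = (PL031 = 3).
EXECUTE.

* Collapse to one record per household, identified by year, country and HH-ID.
AGGREGATE /OUTFILE=*
  /BREAK=PB010 PB020 PX030
  /n_pers = N
  /n_unempl = SUM(unempl)
  /n_ftempl = SUM(ftempl)
  /n_selfempl = SUM(selfempl).

LIST.
```

For these five persons the result has one record for household 2 (1 person, 1 unemployed) and one for household 12 (4 persons, 2 full-time employed, 1 self-employed), which agrees with the example table. After renaming the key variables, such a file could be attached to the household data with a one-to-one match.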
Hands on – matching 1:1
• Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File)
• Open the EU-SILC training dataset (D-File).
• Check the variables you are interested in.
• Sort your data according to the key variables used for linkage.
• Names of key variables in the files to be matched must be identical
  => Create new key variables (ID010, ID020, ID_HH) in such a way that
  DB010 = ID010
  DB020 = ID020
  DB030 = ID_HH
• Create a new file with only the key variables & the variable(s) you are interested in
• Name the new file DB100.sav
SPSS–Matching: one-to-one
* **** Before you start ************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset – D-File.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
CROSSTABS DB020 BY DB100.
SPSS–Matching: one-to-one (continued)
* Step 1 - Sort your data according to the key variables used for linkage.
sort cases by DB010 DB020 DB030.
* Step 2 - Names of key variables in the files to be matched must be identical.
rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).
* Step 3 - Create a new file with the key variables & the variable(s) you are interested in.
save outfile = 'mydata_path/DB100.sav'
  /keep ID010 ID020 ID_HH DB100.
SPSS–Matching: one-to-one (continued)
* open the EU-SILC training dataset – H-File.
GET FILE='data_path/udb_c10H_silc_course.sav'.
sort cases by HB010 HB020 HB030.
* Key variables.
* either rename (like before) or, better, generate new variables.
* HB020 is a string variable, so ID020 has to be declared as a string first.
STRING ID020 (A2).
compute ID010 = HB010.
compute ID020 = HB020.
compute ID_HH = HB030.
MATCH FILES FILE=*
  /FILE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* check whether it worked.
CROSSTABS HB020 BY DB100.
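Besides the crosstab, one way to check whether a one-to-one match worked is to flag which input file each record came from, using the IN subcommand of MATCH FILES. The sketch below uses small made-up files instead of the training data; the dataset names hdata and ddata, the flag variables in_h and in_d, and all data values are illustrative assumptions.

```spss
* Made-up household data (H-file side), keys already renamed.
DATA LIST LIST / ID010 (F4) ID020 (A2) ID_HH (F8) HS120 (F1).
BEGIN DATA
2010 AT 2 1
2010 AT 12 2
2010 AT 13 4
END DATA.
DATASET NAME hdata.
SORT CASES BY ID010 ID020 ID_HH.

* Made-up register extract (D-file side), keys already renamed.
DATA LIST LIST / ID010 (F4) ID020 (A2) ID_HH (F8) DB100 (F1).
BEGIN DATA
2010 AT 2 2
2010 AT 12 3
2010 AT 59 1
END DATA.
DATASET NAME ddata.
SORT CASES BY ID010 ID020 ID_HH.

* in_h and in_d are 1 if the record was present in the respective file.
MATCH FILES /FILE=hdata /IN=in_h
  /FILE=ddata /IN=in_d
  /BY ID010 ID020 ID_HH.
EXECUTE.

* Records found in only one of the two files show up off the (1,1) cell.
CROSSTABS in_h BY in_d.
```

In this made-up example households 2 and 12 are found in both files, household 13 only in the household data, and household 59 only in the register extract. In a clean one-to-one match all records should fall into the cell where both flags are 1.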
SPSS–Matching: One-to-many Match (1:n)
Example 2: Combining household and personal data
E.g. attach ‘Degree of Urbanisation’ (DB100) to the personal data.
GET FILE='data_path/udb_c10p_silc_course.sav'.
* Sort by the key variables used for linkage.
sort cases by PB010 PB020 PX030.
* PB020 = string variable - create a new string variable ID020 (or use the rename command).
STRING ID020 (A2).
compute ID010 = PB010.
compute ID020 = PB020.
compute ID_HH = PX030.
SPSS–Matching: One-to-many Match (1:n) (continued)
MATCH FILES FILE=*
  /TABLE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* Check whether it worked.
CROSSTABS PB020 BY DB100.
save outfile = 'mydata_path/personal_data.sav'.
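To see what the TABLE subcommand does without access to the training data, the one-to-many logic can be reproduced on a few made-up records. This is only a sketch: the dataset names hh_lookup and persons and all data values are illustrative assumptions.

```spss
* Made-up register extract, one record per household.
DATA LIST LIST / ID010 (F4) ID020 (A2) ID_HH (F8) DB100 (F1).
BEGIN DATA
2010 AT 2 2
2010 AT 12 3
END DATA.
DATASET NAME hh_lookup.
SORT CASES BY ID010 ID020 ID_HH.

* Made-up personal data, several persons per household.
DATA LIST LIST / ID010 (F4) ID020 (A2) ID_HH (F8) PB030 (F8).
BEGIN DATA
2010 AT 2 201
2010 AT 12 1201
2010 AT 12 1202
END DATA.
DATASET NAME persons.
SORT CASES BY ID010 ID020 ID_HH PB030.

* TABLE treats hh_lookup as a lookup file whose values are
* copied to every person record with the same key.
MATCH FILES /FILE=persons
  /TABLE=hh_lookup
  /BY ID010 ID020 ID_HH.
EXECUTE.

LIST.
```

The result keeps the three person records. Both persons in household 12 receive the same DB100 value from the household record, and no extra records are created for households that appear only in the lookup file.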
Matching: many-to-one (n : 1)
• Create new summary variables for personal data (P-File)
  • number of persons living in the same household
  • number of unemployed persons living in a household
  • number of full-time employed persons living in a household
  • number of part-time employed persons living in a household
  • number of self-employed persons living in a household
  • sum of ‘pensions from individual private plans’ (PY080G)
*********************************************************.
* many-to-one (n:1)
* Personal Data
* example 1
* number of persons living in the same household
* number of unemployed persons living in a household
*********************************************************.

* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path / NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.

* specify the path where you want to save your data.
FILE HANDLE mydata_path / NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.

* open the EU-SILC training dataset.
GET FILE='data_path/udb_c10p_silc_course.sav'.