
Training Session Part 2
Alexander Mack
GESIS – German Microdata Lab
Exercise 1
• For this exercise we will be using the
personal register (udb_c10r_silc_course)
• The personal register is the only dataset
which contains information on persons under
16 years of age
• We will be generating household level
indicators on the basis of person level data
via aggregation
Exercise 1
Generate a frequency table comparing the
number of persons under 18 in the
Household (HH) between countries.
What has to be taken into account?
Exercise 1
Solution:
• Step 1: Generate a Dummy which identifies
persons under 18
• Step 2: Count the number of Persons under
18 in households
• But how can we display HH level variables in
a person level dataset?
Exercise 1
• A possible solution is to generate a HH
counter
• Then display frequencies for only the first
person in the HH
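The two steps and the household counter can be sketched as follows. This is a minimal illustration in plain Python, not the course's own solution. The field names `country`, `hh_id` and `age` and the toy records are assumptions standing in for the corresponding variables of the personal register, and household IDs are assumed to be unique only within a country.

```python
from collections import Counter, defaultdict

# Toy person-level records (hypothetical field names and values).
persons = [
    {"country": "AT", "hh_id": 1, "age": 31},
    {"country": "AT", "hh_id": 1, "age": 5},
    {"country": "AT", "hh_id": 1, "age": 2},
    {"country": "AT", "hh_id": 2, "age": 44},
    {"country": "BE", "hh_id": 3, "age": 17},
    {"country": "BE", "hh_id": 3, "age": 40},
]

# Step 1: dummy identifying persons under 18.
for p in persons:
    p["under18"] = 1 if p["age"] < 18 else 0

# Step 2: count persons under 18 per household.
# Assumption: the household ID must be combined with the country to be unique.
n_under18 = defaultdict(int)
for p in persons:
    n_under18[(p["country"], p["hh_id"])] += p["under18"]

# Household counter: number the persons within each household so the
# frequency table can be restricted to the first person per household.
seen = defaultdict(int)
for p in persons:
    key = (p["country"], p["hh_id"])
    seen[key] += 1
    p["hh_counter"] = seen[key]
    p["n_under18"] = n_under18[key]

# Frequency table by country, counting every household exactly once.
freq = Counter(
    (p["country"], p["n_under18"]) for p in persons if p["hh_counter"] == 1
)
for (country, n), households in sorted(freq.items()):
    print(country, n, households)
```

Without the restriction to `hh_counter == 1`, a household with four members would enter the table four times.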
Exercise 1
Generate a categorical variable “Age of the
youngest child in HH” with the following
values:
1 “no children under 18 in HH”
2 “under 3”
3 “between 3 and 5”
4 “between 6 and 17”
Exercise 1
Solution:
• Step 1: Identify the youngest person in the
HH for HHs with children
• Step 2: Recode that person's age into a HH
level variable
• Step 3: Build a categorical variable with
information from Step 2 and “Number of
children under 18”
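The three steps above can be sketched as below. The field names `hh_id` and `age` and the toy records are assumptions for illustration. The sketch reads "between 3 and 5" as ages 3, 4 and 5 in completed years.

```python
from collections import defaultdict

# Toy person-level records (hypothetical field names and values).
persons = [
    {"hh_id": 1, "age": 31}, {"hh_id": 1, "age": 5}, {"hh_id": 1, "age": 2},
    {"hh_id": 2, "age": 44},
    {"hh_id": 3, "age": 40}, {"hh_id": 3, "age": 17},
]

LABELS = {
    1: "no children under 18 in HH",
    2: "under 3",
    3: "between 3 and 5",
    4: "between 6 and 17",
}

# Step 1: collect the ages of all children (under 18) per household.
child_ages = defaultdict(list)
for p in persons:
    if p["age"] < 18:
        child_ages[p["hh_id"]].append(p["age"])


def youngest_child_category(ages):
    """Steps 2 and 3: map the age of the youngest child to a category."""
    if not ages:
        return 1  # no children under 18 in the household
    youngest = min(ages)
    if youngest < 3:
        return 2
    if youngest <= 5:
        return 3
    return 4


# Attach the household-level category to every person in the household.
for p in persons:
    p["youngest_cat"] = youngest_child_category(child_ages.get(p["hh_id"], []))

for p in persons:
    print(p["hh_id"], p["age"], LABELS[p["youngest_cat"]])
```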
Exercise 1
Identify all households in which at least one child
lives together with a single parent. Note that to do
so you must examine each child in a household
separately.
Generate a dummy variable which identifies
whether an individual is a single parent and a
dummy which identifies households with at least
one single parent. Compare the prevalence of
single mothers and fathers throughout Europe.
Exercise 1
• Step 1: Generate a rank variable for children in HH
• Step 2: Identify parents by matching their IDs with each
child's Father/Mother ID and generate a dummy for
moms and dads
• Step 3: Generate a dummy identifying whether a
person is single (via partner/spouse ID)
• Step 4: Combine single and mom/dad dummies to
identify single parents
• Step 5: Generate a HH variable which identifies
whether a single parent lives in the HH
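A compact sketch of the same logic in plain Python is given below. Instead of ranking the children and spreading their parent IDs across the household (Steps 1 and 2), it collects the parent IDs of all children in sets, which gives the same mom and dad dummies. The field names, the toy households and the definition of a child as a person under 18 are assumptions.

```python
# Toy records (hypothetical field names; None marks a missing ID).
persons = [
    # Household 1: couple with one child.
    {"hh_id": 1, "pid": 11, "age": 31, "partner": 12, "mother": None, "father": None},
    {"hh_id": 1, "pid": 12, "age": 29, "partner": 11, "mother": None, "father": None},
    {"hh_id": 1, "pid": 13, "age": 5, "partner": None, "mother": 12, "father": 11},
    # Household 2: mother without a partner and one child.
    {"hh_id": 2, "pid": 21, "age": 35, "partner": None, "mother": None, "father": None},
    {"hh_id": 2, "pid": 22, "age": 3, "partner": None, "mother": 21, "father": None},
]

# Steps 1 and 2 (set-based variant): examine each child separately and
# remember who is named as its mother or father.
children = [p for p in persons if p["age"] < 18]  # assumed definition of "child"
mother_ids = {(c["hh_id"], c["mother"]) for c in children if c["mother"] is not None}
father_ids = {(c["hh_id"], c["father"]) for c in children if c["father"] is not None}

for p in persons:
    key = (p["hh_id"], p["pid"])
    p["mom"] = int(key in mother_ids)
    p["dad"] = int(key in father_ids)
    # Step 3: a person is single if no partner/spouse ID is recorded.
    p["single"] = int(p["partner"] is None)
    # Step 4: single parents are single and a mom or a dad.
    p["singleparent"] = int(p["single"] == 1 and (p["mom"] == 1 or p["dad"] == 1))

# Step 5: flag every member of a household that contains a single parent.
single_parent_hhs = {p["hh_id"] for p in persons if p["singleparent"] == 1}
for p in persons:
    p["singleparentHH"] = int(p["hh_id"] in single_parent_hhs)
```

Children without a partner ID also receive `single = 1`, so the single dummy only becomes meaningful in combination with the mom and dad dummies.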
Exercise 1
Input variables and Step 1 (Childrank):

HH ID  Person ID  Sex  Age  Partner ID  Mother ID  Father ID  Childrank
1      11         1    31   12          .          .          .
1      12         2    29   11          .          .          .
1      13         2    5    .           12         .          1
1      14         1    2    .           12         11         2
2      21         1    41   22          .          .          .
2      22         2    43   21          .          .          .
2      23         2    17   .           21         22         .
2      24         1    0    .           23         .          1

Step 2 (parent IDs of each child and mom/dad dummies):

HH ID  Person ID  momchild1  dadchild1  momchild2  dadchild2  mom  dad
1      11         12         .          12         11         0    1
1      12         12         .          12         11         1    0
1      13         12         .          12         11         0    0
1      14         12         .          12         11         0    0
2      21         21         22         23         .          0    1
2      22         21         22         23         .          1    0
2      23         21         22         23         .          1    0
2      24         21         22         23         .          0    0

Steps 3 to 5 (single, singleparent, singleparentHH):

HH ID  Person ID  single  singleparent  singleparentHH
1      11         0       0             0
1      12         0       0             0
1      13         1       0             0
1      14         1       0             0
2      21         0       0             1
2      22         0       0             1
2      23         1       1             1
2      24         0       0             1
Exercise 1 – Bonus
For children under 3 and aged 3 to 5 examine
their use of childcare facilities.
Generate a composite indicator which
measures how many hours a week a child
uses any form of institutionalized child care or
preschool (RL010, RL030, RL040, RL050).
Compare how much time on average children
in different countries spend in childcare for
the two mentioned age groups.
Exercise 1 - Bonus
Step 1: Generate an additive indicator of all
childcare items in question (In order to sum
them up missings must be recoded to 0)
Step 2: Generate dummies identifying kids
in the respective age groups
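The two steps, followed by the country comparison, might look like this in plain Python. The item names RL010, RL030, RL040 and RL050 come from the exercise. The field names `country` and `age` and all values are made up for illustration.

```python
from collections import defaultdict

CHILDCARE_ITEMS = ["RL010", "RL030", "RL040", "RL050"]

# Toy child records (hypothetical values; None marks a missing item).
children = [
    {"country": "AT", "age": 1, "RL010": None, "RL030": None, "RL040": 20, "RL050": None},
    {"country": "AT", "age": 2, "RL010": None, "RL030": None, "RL040": None, "RL050": None},
    {"country": "AT", "age": 4, "RL010": 25, "RL030": 5, "RL040": None, "RL050": None},
    {"country": "BE", "age": 5, "RL010": 30, "RL030": None, "RL040": None, "RL050": None},
    {"country": "AT", "age": 9, "RL010": None, "RL030": 10, "RL040": None, "RL050": None},
]


def age_group(age):
    """Step 2: assign the two age groups of interest."""
    if age < 3:
        return "under 3"
    if age <= 5:
        return "3 to 5"
    return None  # outside the two groups


# Step 1: additive indicator, with missing items recoded to 0 before summing.
for c in children:
    c["care_hours"] = sum(c[item] or 0 for item in CHILDCARE_ITEMS)
    c["age_group"] = age_group(c["age"])

# Average weekly hours by country and age group.
hours = defaultdict(list)
for c in children:
    if c["age_group"] is not None:
        hours[(c["country"], c["age_group"])].append(c["care_hours"])

mean_hours = {key: sum(v) / len(v) for key, v in hours.items()}
for key in sorted(mean_hours):
    print(key, mean_hours[key])
```

Because missing items count as 0 hours, children who use no childcare at all pull the averages down, which is intended here but worth keeping in mind when interpreting the results.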
Exercise 2
• Load the personal data file
(udb_c10p_silc_course)
• Generate a 3 category education variable
on the basis of ISCED (PE040) which
combines categories 0, 1 and 2 to low;
categories 3 and 4 to medium and
category 5 to high.
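As a minimal sketch, the recode can be written as a small function. The function name and the labels are illustrative. Values that the exercise does not mention are left unclassified rather than guessed.

```python
def education_group(pe040):
    """Collapse ISCED (PE040) into low / medium / high."""
    if pe040 in (0, 1, 2):
        return "low"
    if pe040 in (3, 4):
        return "medium"
    if pe040 == 5:
        return "high"
    return None  # missing or not covered by the exercise


for value in [0, 2, 3, 4, 5, None]:
    print(value, education_group(value))
```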
Exercise 2
• Generate a variable defining a respondent's
working status with the following values:
1 “Economically inactive”
2 “Working 30 hours or less”
3 “Working more than 30 hours”
• Compare this variable for the different educational
categories and men and women across countries.
• Generate Harmonized IDs
• Save your Personal data file.
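One way to build the working status variable is sketched below. The inputs `is_working` and `weekly_hours` are hypothetical and would have to be derived from the activity status and working hours variables of the personal file. Treating everyone who is not working as category 1 is an assumption of this sketch.

```python
def working_status(is_working, weekly_hours):
    """Return 1, 2 or 3 as defined in the exercise, or None if unknown."""
    if is_working is None:
        return None  # activity status missing
    if not is_working:
        return 1  # economically inactive (assumed to cover all non-workers)
    if weekly_hours is None:
        return None  # working, but hours unknown
    return 2 if weekly_hours <= 30 else 3


print(working_status(False, None))  # not working
print(working_status(True, 30))     # working 30 hours or less
print(working_status(True, 38.5))   # working more than 30 hours
```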
Exercise 2 – Bonus
• Using the retrospective information available in the
EU-SILC cross sectional dataset examine
transitions into unemployment.
• Generate a variable which shows you whether an
individual has experienced a transition into
unemployment in the last year (use PLE211A-L).
• Examine the prevalence of transitions into
unemployment in Europe and across educational
groups.
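A possible implementation of the transition indicator is sketched here. It assumes that the twelve variables PLE211A to PLE211L hold the main activity for twelve consecutive months in calendar order and that the code 5 stands for unemployment. Both points should be checked against the variable documentation.

```python
MONTH_SUFFIXES = "ABCDEFGHIJKL"
UNEMPLOYED = 5  # assumed code for "unemployed"


def entered_unemployment(person, unemployed_code=UNEMPLOYED):
    """Return 1 if any month of unemployment follows a month in another status."""
    statuses = [person.get("PLE211" + m) for m in MONTH_SUFFIXES]
    for previous, current in zip(statuses, statuses[1:]):
        if (
            previous is not None
            and previous != unemployed_code
            and current == unemployed_code
        ):
            return 1
    return 0


# Toy respondents (hypothetical status codes).
employed_all_year = {"PLE211" + m: 1 for m in MONTH_SUFFIXES}
lost_job_in_july = {
    "PLE211" + m: (1 if i < 6 else 5) for i, m in enumerate(MONTH_SUFFIXES)
}
unemployed_all_year = {"PLE211" + m: 5 for m in MONTH_SUFFIXES}

print(entered_unemployment(employed_all_year))
print(entered_unemployment(lost_job_in_july))
print(entered_unemployment(unemployed_all_year))
```

A person who is unemployed in all twelve months is not counted, because no transition is observed within the reference period.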
Exercise 3
• Merge your personal data and personal
register files using the unique person ID.
• Note that by doing so persons under the
age of 16 will be excluded.
• Save your person level data file.
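The effect of this merge can be illustrated with a small inner join in plain Python. The field names and records are invented, and the person ID is assumed to be unique only in combination with the country.

```python
# Toy personal register: contains every household member.
register = [
    {"country": "AT", "pid": 101, "age": 42},
    {"country": "AT", "pid": 102, "age": 12},
    {"country": "BE", "pid": 101, "age": 30},
]

# Toy personal data file: contains respondents aged 16 and over only.
personal = [
    {"country": "AT", "pid": 101, "educ": "high"},
    {"country": "BE", "pid": 101, "educ": "low"},
]

# Index the personal data by the (assumed) unique key.
personal_by_id = {(p["country"], p["pid"]): p for p in personal}

# Inner join: keep only register records with a match in the personal data.
merged = []
for r in register:
    match = personal_by_id.get((r["country"], r["pid"]))
    if match is not None:
        merged.append({**r, **match})

for row in merged:
    print(row)
```

The 12-year-old from the register has no record in the personal data file and therefore drops out of the merged file.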
Exercise 3
• Run a multinomial logit model for working-age
(18 to 65) women not in education, with working status as
the dependent variable and the following
independent variables:
• age of the youngest child (categorical)
• single mothers (dummy)
• age
• education (categorical)
• Country dummies (use RB020_num)
Exercise 4
• Merge your person level and household level
datasets via the household identifier (1 to many
merge).
• Examine how individuals’ characteristics relate to the
households’ ability to make ends meet (HS120).
Compare the ability to make ends meet between
the educational groups generated above. Are
educational differences in making ends meet
consistent across countries?
• Save the merged dataset
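A one-to-many merge followed by a simple cross-tabulation could be sketched like this. HS120 is the variable named in the exercise. All other field names and all values are illustrative assumptions.

```python
from collections import Counter

# Toy household file: one record per household.
households = [
    {"country": "AT", "hh_id": 1, "HS120": 2},
    {"country": "AT", "hh_id": 2, "HS120": 5},
]

# Toy person file: several records per household.
persons = [
    {"country": "AT", "hh_id": 1, "pid": 11, "educ": "low"},
    {"country": "AT", "hh_id": 1, "pid": 12, "educ": "medium"},
    {"country": "AT", "hh_id": 2, "pid": 21, "educ": "high"},
]

# One-to-many merge: every person receives the values of his or her household.
# A person whose household is missing from the household file raises a KeyError.
hh_lookup = {(h["country"], h["hh_id"]): h for h in households}
merged = [
    {**p, "HS120": hh_lookup[(p["country"], p["hh_id"])]["HS120"]}
    for p in persons
]

# Cross-tabulate ability to make ends meet by education and country.
crosstab = Counter((m["country"], m["educ"], m["HS120"]) for m in merged)
for key in sorted(crosstab):
    print(key, crosstab[key])
```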
Exercise 4 - Bonus
• Generate a dummy variable from HS120 and
run a logistic regression at the level of
households.
• Examine the effect of the number of persons
working at least 30 hours a week in the
household, the highest education obtained in
the household and the number of children.
Control for country level variation via fixed
effects.
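The household-level predictors and the dependent dummy could be prepared as follows before the regression is run in a statistics package. This is only a sketch: the field names are hypothetical, children are taken to be persons under 18, and the cut-off for the HS120 dummy (values 1 to 3 coded as difficulty) is an assumption.

```python
from collections import defaultdict

EDUC_ORDER = {"low": 1, "medium": 2, "high": 3}

# Toy person-level records (hypothetical field names and values).
persons = [
    {"hh_id": 1, "age": 40, "hours": 40, "educ": "medium"},
    {"hh_id": 1, "age": 38, "hours": 20, "educ": "high"},
    {"hh_id": 1, "age": 6, "hours": None, "educ": None},
    {"hh_id": 2, "age": 55, "hours": 30, "educ": "low"},
]

# Toy household-level records keyed by household ID.
households = {1: {"HS120": 2}, 2: {"HS120": 5}}

# Aggregate person-level information to the household level.
agg = defaultdict(lambda: {"n_work30": 0, "max_educ": 0, "n_children": 0})
for p in persons:
    a = agg[p["hh_id"]]
    if p["hours"] is not None and p["hours"] >= 30:
        a["n_work30"] += 1  # persons working at least 30 hours a week
    if p["educ"] is not None:
        a["max_educ"] = max(a["max_educ"], EDUC_ORDER[p["educ"]])
    if p["age"] < 18:
        a["n_children"] += 1

# Attach the aggregates and build the dummy from HS120.
for hh_id, hh in households.items():
    hh.update(agg[hh_id])
    hh["difficulty"] = int(hh["HS120"] <= 3)  # assumed cut-off

for hh_id in sorted(households):
    print(hh_id, households[hh_id])
```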
Exploring the data
Calculate the equivalized HH income
according to the old OECD scale, where a
weight of 1 is assigned to the first adult, a
weight of 0.7 is assigned to each additional
adult and a weight of 0.5 is assigned to
each child (age<14) in the HH.
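The scale can be sketched as a pair of small functions. The function names and the example household are illustrative. Everyone aged 14 or over is treated as an adult, and the sketch assumes at least one such person per household.

```python
def old_oecd_weight(ages):
    """Equivalence weight of a household given the ages of its members."""
    n_adults = sum(1 for age in ages if age >= 14)
    n_children = sum(1 for age in ages if age < 14)
    if n_adults == 0:
        raise ValueError("household without a member aged 14 or over")
    # 1 for the first adult, 0.7 per additional adult, 0.5 per child.
    return 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children


def equivalized_income(hh_income, ages):
    """Household income divided by the household's equivalence weight."""
    return hh_income / old_oecd_weight(ages)


# Two adults and two children under 14: weight 1 + 0.7 + 2 * 0.5 = 2.7.
ages = [41, 39, 9, 4]
print(old_oecd_weight(ages))
print(equivalized_income(27000, ages))
```

The resulting equivalized income is a household-level value that is then assigned to every member of the household.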
Exploring the Data
Run the regression from 2 c) but additionally
consider the childcare use of the youngest
child in the HH. Think about which datasets
you will need to draw the information from
and how to merge it to your existing person
level data file.