Training Session Part 2
Alexander Mack
GESIS – German Microdata Lab

Exercise 1
• For this exercise we will be using the personal register (udb_c10r_silc_course)
• The personal register is the only dataset which contains information on persons under 16 years of age
• We will be generating household-level indicators on the basis of person-level data via aggregation

Exercise 1
Generate a frequency table comparing the number of persons under 18 in the household (HH) between countries. What has to be taken into account?

Exercise 1
Solution:
• Step 1: Generate a dummy which identifies persons under 18
• Step 2: Count the number of persons under 18 in each household
• But how can we display HH-level variables in a person-level dataset?

Exercise 1
• A possible solution is to generate a HH counter
• Then display frequencies for only the first person in the HH

Exercise 1
Generate a categorical variable “Age of the youngest child in HH” with the following values:
1 “no children under 18 in HH”
2 “under 3”
3 “between 3 and 5”
4 “between 6 and 17”

Exercise 1
Solution:
• Step 1: Identify the youngest person in the HH for HHs with children
• Step 2: Recode that person's age into a HH-level variable
• Step 3: Build a categorical variable with information from Step 2 and “Number of children under 18”

Exercise 1
Identify all households in which at least one child lives together with a single parent. Note that to do so you must examine each child in a household separately. Generate a dummy variable which identifies whether an individual is a single parent and a dummy which identifies households with at least one single parent. Compare the prevalence of single mothers and fathers throughout Europe.
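For the two aggregation tasks earlier in Exercise 1 (number of persons under 18 per household, age of the youngest child), the logic can be sketched outside the course's statistical software. The following is a minimal Python illustration, not the course solution: the records and the field names (hh_id, person_id, country, age) are hypothetical stand-ins for the corresponding variables of the personal register.

```python
from collections import Counter, defaultdict

# Hypothetical person-level records; field names are illustrative only.
persons = [
    {"hh_id": 1, "person_id": 11, "country": "DE", "age": 34},
    {"hh_id": 1, "person_id": 12, "country": "DE", "age": 33},
    {"hh_id": 1, "person_id": 13, "country": "DE", "age": 7},
    {"hh_id": 1, "person_id": 14, "country": "DE", "age": 2},
    {"hh_id": 2, "person_id": 21, "country": "DE", "age": 45},
    {"hh_id": 2, "person_id": 22, "country": "DE", "age": 17},
    {"hh_id": 3, "person_id": 31, "country": "FR", "age": 60},
]


def youngest_child_category(n_under18, youngest_age):
    """Map a household to the four categories listed in the exercise."""
    if n_under18 == 0:
        return 1  # no children under 18 in HH
    if youngest_age < 3:
        return 2  # under 3
    if youngest_age <= 5:
        return 3  # between 3 and 5
    return 4      # between 6 and 17


def add_household_indicators(records):
    """Attach household-level indicators to every person record."""
    by_hh = defaultdict(list)
    for person in records:
        by_hh[person["hh_id"]].append(person)

    for members in by_hh.values():
        child_ages = [m["age"] for m in members if m["age"] < 18]
        n_under18 = len(child_ages)
        youngest = min(child_ages) if child_ages else None
        category = youngest_child_category(n_under18, youngest)
        for counter, member in enumerate(members, start=1):
            member["under18"] = int(member["age"] < 18)  # Step 1: dummy
            member["n_under18"] = n_under18              # Step 2: HH count
            member["youngest_cat"] = category
            member["hh_counter"] = counter               # 1 = first person in HH
    return records


add_household_indicators(persons)

# Keep one row per household so HH-level values are not counted once per member.
first_persons = [p for p in persons if p["hh_counter"] == 1]
freq = Counter((p["country"], p["n_under18"]) for p in first_persons)
```

Restricting the frequency table to rows with hh_counter equal to 1 is what prevents large households from being over-represented.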
Exercise 1
• Step 1: Generate a rank variable for children in the HH
• Step 2: Identify parents by matching their IDs with each child's father/mother ID and generate a dummy for moms and dads
• Step 3: Generate a dummy identifying whether a person is single (via partner/spouse ID)
• Step 4: Combine single and mom/dad dummies to identify single parents
• Step 5: Generate a HH variable which identifies whether a single parent lives in the HH

Exercise 1
Step 1 (Childrank):

HH ID  Person ID  Sex  Age  Partner ID  Mother ID  Father ID  Childrank
1      11         1    31   12          .          .          .
1      12         2    29   11          .          .          .
1      13         2    5    .           12         .          1
1      14         1    2    .           12         11         2
2      21         1    41   22          .          .          .
2      22         2    43   21          .          .          .
2      23         2    17   .           21         22         .
2      24         1    0    .           23         .          1

Step 2 (momchild1 to dad), Step 3 (single), Step 4 (singleparent), Step 5 (singleparentHH):

Person ID  momchild1  dadchild1  momchild2  dadchild2  mom  dad  single  singleparent  singleparentHH
11         12         .          12         11         0    1    0       0             0
12         12         .          12         11         1    0    0       0             0
13         12         .          12         11         0    0    1       0             0
14         12         .          12         11         0    0    1       0             0
21         21         22         23         .          0    1    0       0             1
22         21         22         23         .          1    0    0       0             1
23         21         22         23         .          1    0    1       1             1
24         21         22         23         .          0    0    0       0             1

Exercise 1 – Bonus
For children under 3 and aged 3 to 5, examine their use of childcare facilities. Generate a composite indicator which measures how many hours a week a child uses any form of institutionalized childcare or preschool (RL010, RL030, RL040, RL050). Compare how much time on average children in different countries spend in childcare for the two age groups.

Exercise 1 – Bonus
Step 1: Generate an additive indicator of all childcare items in question (in order to sum them up, missings must be recoded to 0)
Step 2: Generate dummies identifying kids in the respective age groups

Exercise 2
• Load the personal data file (udb_c10p_silc_course)
• Generate a 3-category education variable on the basis of ISCED (PE040) which combines categories 0, 1 and 2 into low, categories 3 and 4 into medium, and category 5 into high.
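The education recode just described is a plain lookup. A minimal Python sketch follows, assuming PE040 arrives as the integers 0 to 5 and that any other value (including a missing one) should stay missing; the function name and the sample values are hypothetical.

```python
# Mapping taken from the exercise text: 0-2 low, 3-4 medium, 5 high.
ISCED_TO_EDU3 = {
    0: "low",
    1: "low",
    2: "low",
    3: "medium",
    4: "medium",
    5: "high",
}


def recode_education(pe040):
    """Return the 3-category education label, or None if the code is unknown."""
    return ISCED_TO_EDU3.get(pe040)


# Hypothetical PE040 values, including one missing value.
sample = [0, 2, 3, 4, 5, None]
recoded = [recode_education(value) for value in sample]
```

Using a dictionary with .get keeps unexpected codes from being silently assigned to one of the three groups.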
Exercise 2
• Generate a variable defining a respondent's working status with the following values:
1 “Economically inactive”
2 “Working 30 hours or less”
3 “Working more than 30 hours”
• Compare this variable for the different educational categories and for men and women across countries.
• Generate harmonized IDs
• Save your personal data file.

Exercise 2 – Bonus
• Using the retrospective information available in the EU-SILC cross-sectional dataset, examine transitions into unemployment.
• Generate a variable which shows whether an individual has experienced a transition into unemployment in the last year (use PLE211A-L).
• Examine the prevalence of transitions into unemployment in Europe and across educational groups.

Exercise 3
• Merge your personal data and personal register files using the unique person ID.
• Note that by doing so persons under the age of 16 will be excluded.
• Save your person-level data file.

Exercise 3
• Run a multinomial logit model for working-age (18–65) women not in education, with working status as the dependent variable and the following independent variables:
  • age of the youngest child (categorical)
  • single mothers (dummy)
  • age
  • education (categorical)
  • country dummies (use RB020_num)

Exercise 4
• Merge your person-level and household-level datasets via the household identifier (1-to-many merge).
• Examine how individuals' characteristics relate to the household's ability to make ends meet (HS120). Compare the ability to make ends meet between the educational groups generated above. Are educational differences in making ends meet consistent across countries?
• Save the merged dataset

Exercise 4 – Bonus
• Generate a dummy variable from HS120 and run a logistic regression at the level of households.
• Examine the effect of the number of persons working at least 30 hours a week in the household, the highest education obtained in the household and the number of children. Control for country-level variation via fixed effects.
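The household-level predictors named in the Exercise 4 bonus can be built by aggregating the merged person-level file to one row per household. The Python sketch below is only an illustration under stated assumptions: the field names (hh_id, age, hours, edu3, hs120) are hypothetical, children are taken to be persons under 18 as in Exercise 1, edu3 is coded 1 low, 2 medium, 3 high, and the cutoff for the HS120 dummy (values up to 3 count as difficulty, assuming lower values mean greater difficulty) is a choice made here, not one given in the exercise.

```python
from collections import defaultdict

# Hypothetical merged records: person-level rows carrying the household's HS120.
persons = [
    {"hh_id": 1, "age": 40, "hours": 40, "edu3": 2, "hs120": 2},
    {"hh_id": 1, "age": 38, "hours": 20, "edu3": 3, "hs120": 2},
    {"hh_id": 1, "age": 6, "hours": None, "edu3": None, "hs120": 2},
    {"hh_id": 2, "age": 55, "hours": 35, "edu3": 1, "hs120": 5},
    {"hh_id": 2, "age": 53, "hours": 30, "edu3": 1, "hs120": 5},
]


def household_predictors(records, difficulty_cutoff=3):
    """Collapse person-level rows into one row of predictors per household."""
    grouped = defaultdict(list)
    for person in records:
        grouped[person["hh_id"]].append(person)

    households = []
    for hh_id, members in grouped.items():
        edu_levels = [m["edu3"] for m in members if m["edu3"] is not None]
        households.append({
            "hh_id": hh_id,
            # Persons working at least 30 hours a week; missing hours count as 0.
            "n_work30": sum(
                1 for m in members if m["hours"] is not None and m["hours"] >= 30
            ),
            # Highest education obtained in the household.
            "max_edu": max(edu_levels) if edu_levels else None,
            # Number of children, assumed here to mean persons under 18.
            "n_children": sum(1 for m in members if m["age"] < 18),
            # HS120 is constant within a household, so the first member suffices.
            "difficulty": int(members[0]["hs120"] <= difficulty_cutoff),
        })
    return households


households = household_predictors(persons)
```

The resulting list has one entry per household, which is the unit the logistic regression in the bonus task operates on.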
Exploring the Data
Calculate the equivalized HH income according to the old OECD scale, where a weight of 1 is assigned to the first adult, a weight of .7 to each additional adult and a weight of .5 to each child (age < 14) in the HH.

Exploring the Data
Run the regression from 2 c) but additionally consider the childcare use of the youngest child in the HH. Think about which datasets you will need to draw the information from and how to merge it into your existing person-level data file.
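The old OECD scale described above reduces to a short calculation per household. The Python sketch below uses the weights given in the task (1, .7, .5, children being persons under 14); the function names and the sample household are hypothetical, and it assumes every household contains at least one person aged 14 or over.

```python
def oecd_old_scale_size(ages, child_age_limit=14):
    """Equivalence size of a household under the old OECD scale.

    Assumes at least one household member is aged 14 or over.
    """
    n_children = sum(1 for age in ages if age < child_age_limit)
    n_adults = len(ages) - n_children
    additional_adults = max(n_adults - 1, 0)
    # First adult counts 1, each further adult 0.7, each child 0.5.
    return 1.0 + 0.7 * additional_adults + 0.5 * n_children


def equivalized_income(hh_income, ages):
    """Divide household income by the household's equivalence size."""
    return hh_income / oecd_old_scale_size(ages)


# Hypothetical household: two adults and two children under 14.
ages = [41, 39, 9, 4]
size = oecd_old_scale_size(ages)
income = equivalized_income(27000, ages)
```

For the sample household the equivalence size is 1 + 0.7 + 2 × 0.5 = 2.7, so a household income of 27,000 corresponds to an equivalized income of 10,000 per person.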