SESADP Structure of Earnings Survey – Administrative Data

National Employment Survey Unit
Methodology Division, CSO
Project Team:
Kevin McCormack, Dr. Mary Smyth, Sinead Phelan, Ann O’Dwyer
Overview
Structure of Earnings Survey
 EU Regulation – 4 years
→ met by National Employment Survey (NES)
Microdata: 60,000 employees; annual & hourly earnings and hours worked, by:
Age
Gender
Education
Occupation
NACE
Full/part-time
Nationality
Length of Service
 EU Annual Earnings, GPG
 National Earnings Statistics
 RMFs
NES Publication
Example of Tables
Mean hourly earnings (€) in October 2007 by educational attainment, full/part-time status and sex

Level of Educational Attainment   Male                    Female                  Total
                                  Full-time   Part-time   Full-time   Part-time   Full-time   Part-time
Primary or Lower Secondary        17.62       13.15       14.78       13.06       16.88       13.08
Higher Secondary                  18.68       12.44       16.36       14.56       17.78       14.15
Post Leaving Cert                 20.00       13.31       15.91       15.11       18.89       14.80
Third level non-degree            23.06       14.75       19.37       17.20       21.02       16.90
Degree or higher                  31.44       20.06       27.20       23.07       29.18       22.47
Total                             21.69       14.11       20.42       15.69       21.17       15.40
SES - ADP
 Structure of Earnings Survey - Administrative Data Project
 Project Goal:
 2011 & 2012 Annual Earnings Data required - EU & Nationally
 Administrative Data
 Response Burden, Cost Effective, Quality, Representative
 NES Annual Publication
 Roll-out Infrastructure: 2013
 SES 2014
5 Modules
 1) Research & Identify Potential Sources – ADS
 2) Linking Data Sources
 3) Modelling non-available characteristics
 4) Construction of the SESADS
 5) Publish Results
(M1) Research & Identify ADS
 7 Administrative Data Sources
 2 External
• Revenue P35L
• Dept. Social Protection
 5 CSO
• Census
• EHECS
• SILC
• CBR
• QNHS
Fig. 1: SESADS primary data sources (DSP, P35L, CBR, QNHS, EHECS, SILC and COP feeding into the SESADS)
(M2) Linking Data Sources
An analysis was undertaken of the data fields contained
within the SESADS sources.
 Unique Identifiers:
 Per_IdNo. (anonymised PPS No.) - employees
 Ent_nbr (unique Enterprise Number) - employers
These are the most suitable unique identifiers (UIs) to link:
• the CSO's data sources,
• DSP and
• Revenue Commissioners P35L data files
Fig. 2: Construction of the SESADS
Sources feeding the SESADS, with their linking keys and the fields they contribute:
• COP/QNHS/SILC (Per_IdNo. & ICA): Occupation, NACE, Demographics, Education, Earnings
• DSP (Per_IdNo.): Demographics
• CBR/EHECS (Per_IdNo., Ent_nbr): Enterprise location, Size and NACE
• P35L (Per_IdNo., Ent_nbr): Gross annual earnings, Weeks worked
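The linkage on Per_IdNo. and Ent_nbr can be sketched roughly as follows. This is a minimal illustration only, not the CSO's implementation: the record layouts, the field names other than the two identifiers, and the function `link_sources` are hypothetical, and the trailing dot of Per_IdNo. is dropped in the dictionary keys for convenience.

```python
def link_sources(p35l_rows, dsp_rows, cbr_rows):
    """Attach DSP demographics and CBR enterprise details to each P35L record."""
    # Index the lookup sources by their unique identifier (hypothetical layouts).
    dsp_by_person = {row["Per_IdNo"]: row for row in dsp_rows}
    cbr_by_enterprise = {row["Ent_nbr"]: row for row in cbr_rows}

    linked = []
    for row in p35l_rows:
        person = dsp_by_person.get(row["Per_IdNo"])
        enterprise = cbr_by_enterprise.get(row["Ent_nbr"])
        if person is None or enterprise is None:
            continue  # leave unmatched records out of the linked set
        linked.append({**row, **person, **enterprise})
    return linked


# Invented sample records, for illustration only.
p35l = [
    {"Per_IdNo": "A1", "Ent_nbr": "E9", "gross_annual_earnings": 35000, "weeks_worked": 52},
    {"Per_IdNo": "B2", "Ent_nbr": "E7", "gross_annual_earnings": 18000, "weeks_worked": 30},
]
dsp = [{"Per_IdNo": "A1", "gender": "F", "nationality": "IE"}]
cbr = [{"Ent_nbr": "E9", "nace": "85", "size_group": "50-249"}]

linked = link_sources(p35l, dsp, cbr)
```

Only the first P35L record finds a match in both lookup sources here, so `linked` holds a single combined record.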
Identity Correlation Approach (1)
Census
 No Unique Identifier
 Linking social data sources (e.g. Census) is a greater challenge for the CSO:
• They contain no unique identifiers (UIs), such as a PPS No.
• UIs were developed by following an identity correlation approach (ICA), e.g. combining date of birth, gender, county of residence and NACE.
 E.g. 29101990|F|CORK|85|

This identity correlation approach enabled the social data sources to be linked.
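A key in the layout of the example above could be built along these lines. This is a sketch under assumptions: the function name `ica_key` and its parameters are hypothetical, and the normalisation (upper-casing, trimming) is an illustrative choice rather than a documented CSO rule.

```python
from datetime import date


def ica_key(date_of_birth, gender, county, nace_division):
    """Build a pseudo-identifier in the DDMMYYYY|GENDER|COUNTY|NACE| layout."""
    parts = [
        date_of_birth.strftime("%d%m%Y"),  # e.g. 29 Oct 1990 -> 29101990
        gender.strip().upper(),
        county.strip().upper(),
        str(nace_division),
    ]
    return "|".join(parts) + "|"


key = ica_key(date(1990, 10, 29), "f", "Cork", 85)
print(key)  # 29101990|F|CORK|85|
```

Records from two sources that produce the same key can then be treated as candidate matches for the same person.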
• SESADS
 Currently contains 1 million of the approx. 1.3 million full-time/part-time employees in the State
 800,000 records have been quality checked
 Representative of the NACE sectors
Identity Correlation Approach (2)
Approximate number of people sharing a key after each successive component:
 Annual births (per year of birth) = 63,000
 Date of birth: 63,000 ÷ 365 days = 173
 Gender: ÷ 2 = 86
 NACE: ÷ 14 = 6 (17)
 County: ÷ 26 (3) = 1 (5)
E.g. 29101990|F|85|CORK|
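The arithmetic on this slide can be reproduced with a short calculation. It is a rough sketch that assumes people are spread evenly across days of birth, genders, NACE groups and counties, which real data will not satisfy exactly; the function name `expected_matches` is hypothetical.

```python
def expected_matches(annual_births, splits):
    """Return the expected number of people sharing a key after each split."""
    remaining = float(annual_births)
    steps = []
    for name, categories in splits:
        remaining /= categories  # assumes an even spread across categories
        steps.append((name, remaining))
    return steps


steps = expected_matches(
    63_000,
    [("date of birth", 365), ("gender", 2), ("NACE", 14), ("county", 26)],
)
for name, value in steps:
    print(f"{name}: {value:.1f}")
```

Under these assumptions the first three steps round to the 173, 86 and 6 shown on the slide, and the final figure falls below one person per key, which is what makes the combined key usable as an identifier.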
On completion of Module 2, the SESADS will contain all employees in the State, with gross annual/weekly earnings classified by:

Variables                                    Sources
NACE, Gender, Enterprise size group,         P35L, CBR, EHECS, DSP
Public/Private sector, Weeks worked
------------------------------------------------------------------
Occupation, Area of residence, Education,    COP, QNHS, SILC
Age, Nationality
Module 3: Modelling of non-available
characteristics
 Employee characteristics to be modelled are:
 (1) Hours worked
 (2) Annual bonuses
 (3) BIK (benefit in kind)
 (4) Full/part-time employment status
 A multiple imputation methodology will be employed to carry out this stage of the Project.
 The EHECS, QNHS and SILC data sources will be used to provide the base information.
 Once this modelling is completed, the SESADS will fulfil both the annual and four-yearly Eurostat SES earnings requirements.
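To give a feel for what multiple imputation involves, the sketch below fills missing hours worked several times from donors in the same class and pools the resulting means with Rubin's rules. It is a simplified hot-deck variant chosen for illustration, not the model used in the Project; the class variable, the sample values and all function names are hypothetical, and here the donors come from the same file rather than from EHECS, QNHS or SILC.

```python
import random
import statistics


def impute_once(records, donors_by_class, rng):
    """Fill each missing value with a random donor value from the same class."""
    completed = []
    for cls, hours in records:
        if hours is None:
            hours = rng.choice(donors_by_class[cls])
        completed.append(hours)
    return completed


def multiple_imputation_mean(records, m=5, seed=0):
    """Pool the mean of m imputed datasets using Rubin's rules."""
    rng = random.Random(seed)
    donors = {}
    for cls, hours in records:
        if hours is not None:
            donors.setdefault(cls, []).append(hours)

    estimates, variances = [], []
    for _ in range(m):
        # Resample the donor pools first (approximate Bayesian bootstrap) so
        # the imputations also reflect uncertainty about the donors themselves.
        boot = {cls: [rng.choice(vals) for _ in vals] for cls, vals in donors.items()}
        completed = impute_once(records, boot, rng)
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))

    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance


# Invented records: (class, weekly hours worked); None marks a missing value.
records = [("A", 39.0), ("A", 35.0), ("A", None), ("B", 20.0), ("B", 18.0), ("B", None)]
pooled, total_variance = multiple_imputation_mean(records, m=5, seed=1)
```

The point of repeating the imputation is that `total_variance` includes a between-imputation term, so the uncertainty introduced by the modelled values is not ignored.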
Module 4: Construction of the SESADS
 The SESADS will be constructed in the CSO’s
Administrative Data Centre (ADC)
 Structures (known as layers) consistent with those outlined in the
ESSnet on microdata linking and data warehousing in statistical
production.
 SES – EU microdata format
Module 5: Publication of Results
 The first set of SES statistics for 2011 and 2012 (gender pay
gap and average earnings) was submitted to Eurostat in
November 2013.
 Finalised datasets with more detail will be available mid-2015
 NES Publication
Timetable
 SESADP – signed off 2015
• SES 2011 & 2012 Data
• NES Publication
 Roll out Project infrastructure
• for 2013 & 2014 data
• Assess by end 2015
 SES 2014
• Microdata submitted to Eurostat mid-2016
Cost Benefit Analysis
              Business Survey        vs   SESADP
Staff         15 persons                  1 FTE (3.5)
Cost          € 1.5 million               € 0.1 m (€ 0.2 m)
Timeliness    T + 18 months               T + 10 months
Quality       Data edits                  Revenue data
Sample        70K                         1 million
Burden        10K Ents, 70K ees           None
Thanks to:
 CSO Divisions: Cork
Dublin
STS – cross division support
EHECS
ADC
CENSUS
Earnings Analysis
CBR
QNHS
SILC
IT
Etc.