Transcript methodology

Netherlands Employment Report Documentation
Automatic Data Processing, LLC (ADP)
Table of Contents
A.1 Introduction
A.2 Data description and analysis
A.3 VAR model
A.4 Results
A.1 Introduction
This document describes the modelling approach used to generate monthly forecasts of non-farm employment in the Netherlands based on ADP aggregate payroll data. While the official non-farm payroll employment data are published by Statistics Netherlands (CBS) and Eurostat on a quarterly basis, the ADP forecasts for the Netherlands will be generated and reported monthly, prior to the release of the country's official monthly unemployment rate. The ADP data cover the 9 main industry groups reported by CBS each quarter. The CBS methodology is consistent with Eurostat's, and this consistency is verified regularly, typically once a year, by Eurostat missions to the country. However, we report employment only for those industries for which we obtain the best statistical model and the closest fit to the CBS data.
The data used for this exercise are panel data covering an average of 13,000 firms per month, for which the number of active employees on a given day of the month is reported. The sample covers about 7% of all jobs in the Netherlands. The data form an unbalanced panel for the period from April 2008 to May 2016. The panel is unbalanced because firms enter and exit the sample due to changes in operating conditions or in their business relationship with ADP. A large part of the forecast generation process is devoted to data processing and cleaning. This includes removing duplicate observations and outliers, and applying interpolation where possible to fill one- or two-month gaps in the data or to compensate for changes in monthly sample sizes.
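The duplicate removal and gap filling described above can be sketched for a single firm as follows. This is a minimal illustration under stated assumptions, not ADP's production code: the function name `clean_firm_series`, the input layout (a list of `(month_index, employees)` pairs) and the use of straight-line interpolation are all hypothetical choices.

```python
def clean_firm_series(obs, max_gap=2):
    """Drop exact duplicates and fill short gaps in one firm's monthly series.

    obs: list of (month, employees) pairs, month being an integer index
    (hypothetical layout). Gaps of 1..max_gap missing months are filled by
    straight-line interpolation (so filled values need not be whole numbers);
    longer gaps are left open.
    """
    series = {}
    for month, n in dict.fromkeys(obs):  # dict.fromkeys drops exact duplicates
        if month in series:
            raise ValueError(f"conflicting records for month {month}")
        series[month] = float(n)
    filled = dict(series)
    months = sorted(series)
    for left, right in zip(months, months[1:]):
        gap = right - left - 1  # number of missing months between two reports
        if 1 <= gap <= max_gap:
            step = (series[right] - series[left]) / (gap + 1)
            for k in range(1, gap + 1):
                filled[left + k] = series[left] + k * step
    return dict(sorted(filled.items()))


print(clean_firm_series([(1, 10), (1, 10), (2, 12), (5, 18), (10, 20)]))
# {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0, 5: 18.0, 10: 20.0}
```

Gaps longer than `max_gap` months stay open, so a firm that is absent for a longer period simply drops out of the matched pairs for those months.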
After the data are cleaned at the individual firm level, matched pairs are created. A matched pair is identified when the employment record for a given firm appears in two adjacent months. In the next stage, the number of active employees is summed across matched pairs for each month and each industry to generate monthly time series of employment levels by industry.
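A minimal sketch of the matched-pair aggregation, assuming already-cleaned data in a hypothetical record layout `(firm_id, industry, month_index, employees)`. For each industry and month t it returns the employment totals in months t-1 and t over firms that report in both months; the function name is illustrative only.

```python
from collections import defaultdict


def matched_pair_totals(records):
    """Employment totals over matched pairs, by industry and month.

    records: iterable of (firm_id, industry, month, employees), with month an
    integer index and one record per firm and month (after cleaning).
    A firm forms a matched pair for month t if it also reports in month t - 1.
    Returns {industry: {t: (total in t - 1, total in t)}}.
    """
    employees = {}    # (firm_id, month) -> active employees
    industry_of = {}  # firm_id -> industry (the last record seen wins)
    for firm, industry, month, n in records:
        employees[(firm, month)] = n
        industry_of[firm] = industry
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for (firm, month), n in employees.items():
        prev = employees.get((firm, month - 1))
        if prev is not None:  # the firm appears in two adjacent months
            pair = totals[industry_of[firm]][month]
            pair[0] += prev
            pair[1] += n
    return {ind: {t: tuple(v) for t, v in sorted(by_month.items())}
            for ind, by_month in totals.items()}
```

Each pair of totals yields a matched-sample growth rate (current total divided by previous total) that is unaffected by firms entering or leaving the panel between those two months.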
We use a vector autoregressive (VAR) model for estimation and forecasting. ADP industry series are used as endogenous variables, while official sectoral employment data and selected leading indicators are used as exogenous drivers. Because a number of industries in the ADP and CBS samples contain distinct trends that may move in different directions, we control for these differences by removing deterministic trends and modelling the data as percentage deviations from trend. Detrending can be justified on several grounds. First, we are interested in short-term fluctuations rather than long-term trends. Second, changes in the ADP sample are likely driven by factors specific to ADP, since we do not know why a particular firm exited the sample. Sector-specific trends in the official data, which draw on lower-frequency census data, are therefore a more reliable measure of long-term movements, and we use that information to improve the model's performance.
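The detrending step can be illustrated with a linear deterministic trend fitted by least squares. The document does not state the functional form of the trend, so the linear form and the helper names `linear_trend` and `percent_gap` are assumptions; the gap is the percentage deviation of each observation from its fitted trend value.

```python
def linear_trend(y):
    """Fitted values of an OLS regression of y on a constant and a time index."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n)]


def percent_gap(y):
    """Percentage deviation of each observation from its fitted linear trend."""
    return [100.0 * (v / tr - 1.0) for v, tr in zip(y, linear_trend(y))]
```

A series that lies exactly on a straight line has a gap of zero in every month; only departures from the trend survive into the VAR.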
A.2 Data description and analysis
In this section, the ADP data are analyzed and compared with the CBS data. The data processing and cleaning
procedures are then described.
ADP data consist of monthly employment data for about 13,000 firms of different sizes and industries over the period from April 2008 to March 2016. The sample covers about 7% of jobs in the Netherlands. The codes and names of the industries selected for the model are presented in Table 2.1 below. The breakdown includes all sectors of the economy except agriculture.
Since we use a VAR model to forecast changes in employment, using a smaller number of groups helps to avoid over-parameterization of the model and to preserve degrees of freedom.
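The degrees-of-freedom argument can be made concrete by counting coefficients. In an unrestricted VAR(p) with K endogenous series, each equation has K·p lag coefficients plus one coefficient per deterministic or exogenous term (m of them), so the system has K(Kp + m) coefficients. The sketch below uses a hypothetical helper name and an illustrative lag order of 2 to compare 9 industry groups with 4; with fewer than 100 monthly observations in the sample, the difference is substantial.

```python
def var_parameter_count(k, p, m=1):
    """Number of coefficients in an unrestricted VAR(p) with k endogenous series.

    Each of the k equations has k * p lag coefficients plus m coefficients on
    deterministic or exogenous terms (m=1 means a constant only).
    """
    return k * (k * p + m)


for k in (9, 4):
    print(f"K={k}, p=2: {var_parameter_count(k, 2)} coefficients, {k * 2 + 1} per equation")
# K=9, p=2: 171 coefficients, 19 per equation
# K=4, p=2: 36 coefficients, 9 per equation
```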
For each firm, the number of active employees is reported as of a specific day of the month, defined as the date of the most recent payroll process. Although data starting in 2007 are also available, we restrict our sample to begin in April 2008 because the earlier data contain a large number of duplicates and the resulting aggregates are volatile.
Table 2.1: Industry Groups
Code  Description
BDE   Mining and quarrying; electricity and gas; water supply and waste management
C     Manufacturing
F     Construction
GI    Trade, transport, hotels, catering
J     Information and communication
K     Financial institutions
LN    Business services
OQ    Government and care
RU    Culture, recreation, other services
The ADP firm-level data record the number of jobs at each firm and are consistent with the payroll jobs data reported in the labor accounts of the national accounts system. Given that the Netherlands has the largest share of part-time workers in the European Union, the distinction between employment (persons) and jobs may be important, and we emphasize payroll employment, i.e. what employers report. We use the terms employment and jobs interchangeably throughout the text, but the implied concept is closer to the number of jobs. We also exclude self-employed individuals from the official numbers when comparing them to their ADP counterparts, since the latter contain only payroll jobs.
In the first stage of data cleaning, we remove exact duplicates. The second stage focuses on removing outliers to smooth out monthly percentage changes. In the third stage, interpolation is applied at the individual firm level where necessary. In the final stage, we remove outliers at the aggregate industry level in certain months. After completing these processing steps, we sum the number of active employees across matched pairs for each industry and month, arriving at monthly time series that can then be used for estimation. Most industries display strong seasonality, so we seasonally adjust each series using the X-13ARIMA-SEATS procedure available from the U.S. Census Bureau.
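The document does not specify the outlier rule used in the second stage. As a hypothetical illustration only, the sketch below flags one-month spikes: an observation whose percentage change from the previous month exceeds a threshold and is reversed by a similarly large change in the following month. The 50% threshold and the name `spike_outliers` are assumptions.

```python
def spike_outliers(series, max_abs_pct=50.0):
    """Indices of one-month spikes in a firm's consecutive monthly counts.

    Month t is flagged when its % change from t-1 exceeds max_abs_pct in
    absolute value and the change from t to t+1 is also that large but in
    the opposite direction. The threshold is a hypothetical choice.
    """
    flagged = []
    for t in range(1, len(series) - 1):
        prev, cur, nxt = series[t - 1], series[t], series[t + 1]
        if prev <= 0 or cur <= 0:
            continue  # percentage changes are undefined from zero
        move = 100.0 * (cur / prev - 1.0)
        back = 100.0 * (nxt / cur - 1.0)
        if abs(move) > max_abs_pct and abs(back) > max_abs_pct and move * back < 0:
            flagged.append(t)
    return flagged


print(spike_outliers([100, 101, 300, 102, 103]))  # [2]
```

Flagged months can then be treated as missing and filled by the third-stage interpolation. A persistent level shift, such as an acquisition, is not flagged by this rule because it does not reverse.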
The CBS also reports quarterly levels of employment, quarter-to-quarter absolute changes and quarter-to-quarter percentage changes for the industries listed in Table 2.1. Quarterly employment levels correspond to the number of persons employed on the last day of the quarter. The numbers are based on national accounts methodology and must be consistent with other indicators such as compensation or income. The CBS classifies businesses by main activity, following the Standard Industrial Classification (SIC). Businesses classified in a given industry may also engage in other activities, called subsidiary activities. The first estimate is released 45 days after the end of the quarter, and the second estimate is published 90 days after the end of the quarter. With the release of the second estimate for the fourth quarter, the data for the previous three quarters of the year are revised.
The charts below plot the quarter-to-quarter percentage changes in ADP and CBS employment for selected industries. Although there are clear signs of short-term co-movement, the long-term trends differ, as indicated by the mismatch in levels and scales between the left-hand and right-hand axes.
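To compare the monthly ADP series with the quarterly CBS data, one plausible alignment (assumed here, not taken from the document) is to take the ADP level in the last month of each quarter, mirroring the CBS convention of measuring employment on the last day of the quarter, and then compute quarter-to-quarter percentage changes. The dictionary layout keyed by `(year, month)` and the function name are hypothetical.

```python
def quarterly_pct_change(monthly):
    """Quarter-on-quarter % changes from monthly levels keyed by (year, month).

    Each quarter is represented by its last month (March, June, September,
    December), mirroring the CBS end-of-quarter convention. Quarters whose
    preceding quarter is missing are skipped.
    """
    q_levels = {(y, m // 3): v for (y, m), v in monthly.items() if m % 3 == 0}
    changes = {}
    for (y, q), level in sorted(q_levels.items()):
        prev = (y, q - 1) if q > 1 else (y - 1, 4)
        if prev in q_levels:
            changes[(y, q)] = 100.0 * (level / q_levels[prev] - 1.0)
    return changes
```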
[Chart: ADP C (LHS) and CBS C (RHS), Mar 2010 to Nov 2015]
Figure 2.1: ADP and CBS employment - manufacturing (% change, q/q)
[Chart: ADP GI (LHS) and CBS GI (RHS), Mar 2010 to Nov 2015]
Figure 2.2: ADP and CBS employment - trade, hotels and catering (% change, q/q)
[Chart: ADP LN (LHS) and CBS LN (RHS), Mar 2010 to Nov 2015]
Figure 2.3: ADP and CBS employment – business services (% change, q/q)
[Chart: ADP K (LHS) and CBS K (RHS), Mar 2010 to Nov 2015]
Figure 2.4: ADP and CBS employment – financial services (% change, q/q)
A.3 VAR model
We employ a monthly vector autoregressive (VAR) model to fit and forecast the computed employment series for each industry. VAR models are widely used for analyzing multivariate time series. Typical VAR models treat all variables as endogenous, following Sims’ (1980) critique of the ad hoc exogeneity assumptions in macroeconomic models. VAR models can nevertheless incorporate restrictions, including the exogeneity of some of the variables, and can be extended to include deterministic trends and exogenous variables. VAR models are typically used to summarize the dynamic properties of economic and financial time series and to generate forecasts for them. They can also be used for structural inference and policy analysis. The basic VAR model of order p is defined by the following equation:
$$y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t,$$
where $y_t$ is a (K × 1) vector of endogenous variables, the $A_i$ (i = 1, …, p) are (K × K) coefficient matrices, and $u_t$ is a K-dimensional white noise process with $E(u_t) = 0$ and $E(u_t u_t') = \Sigma_u$. If it is stable, the VAR(p) process is stationary, with time-invariant means, variances and covariance structure. Besides the lags of the endogenous variables, the VAR model can also include exogenous regressors on the right-hand side, in which case it takes the form
$$y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 X_t + \dots + B_s X_{t-s} + u_t,$$
where $X_t$ represents the vector of exogenous variables and trends.
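To make the indexing in the equation concrete, the sketch below computes point forecasts from a VAR with exogenous regressors by iterating the equation forward with future shocks set to their zero mean. Plain Python lists stand in for matrices; the names `varx_forecast` and `mat_vec` are hypothetical, and the future path of the exogenous variables must be supplied by the user (for example, extrapolated official data).

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def varx_forecast(A, B, y_hist, x_path, steps):
    """Iterate y_t = A_1 y_{t-1} + ... + A_p y_{t-p} + B_0 x_t + ... + B_s x_{t-s}
    forward, with future u_t set to its mean of zero.

    A: [A_1, ..., A_p], each K x K; B: [B_0, ..., B_s], each K x M.
    y_hist: observed y vectors, oldest first.
    x_path: x vectors on the same time axis as y_hist, extended by `steps`
            future periods, so len(x_path) == len(y_hist) + steps.
    Returns the forecasts for the next `steps` periods.
    """
    n_obs = len(y_hist)
    if len(x_path) != n_obs + steps:
        raise ValueError("x_path must cover the sample plus the forecast horizon")
    if n_obs < max(len(A), len(B) - 1):
        raise ValueError("not enough history for the requested lags")
    y = [list(v) for v in y_hist]
    for t in range(n_obs, n_obs + steps):
        pred = [0.0] * len(y[0])
        for i, A_i in enumerate(A, start=1):  # endogenous lags y_{t-i}
            pred = [u + w for u, w in zip(pred, mat_vec(A_i, y[t - i]))]
        for j, B_j in enumerate(B):  # exogenous terms x_{t-j}, j = 0..s
            pred = [u + w for u, w in zip(pred, mat_vec(B_j, x_path[t - j]))]
        y.append(pred)  # forecasts feed back in as lags for later steps
    return y[n_obs:]
```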
The coefficients of the VAR(p) process can be estimated by applying OLS to each equation separately. The OLS estimator is identical to the GLS estimator if there are no restrictions on the parameters. If the process is Gaussian, i.e. $u_t \sim N(0, \Sigma_u)$, the OLS estimator is also identical to the conditional maximum likelihood (ML) estimator. The usual statistical inference procedures can be applied if the process is stable. If some variables are integrated, so that $y_t \sim I(1)$, the process may be cointegrated, and a vector error correction model may be more appropriate. The OLS / ML method can still be applied to estimate the model parameters, but the usual t-statistics and F-statistics can lead to misleading conclusions when used for hypothesis testing.
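Equation-by-equation OLS can be sketched in plain Python as follows. Because every equation of an unrestricted VAR has the same regressors (a constant and p lags of all K variables), the cross-product matrix is built once and reused for each equation; this shared design is also why OLS and GLS coincide in the unrestricted case. The helper names and the noise-free demonstration system are illustrative assumptions, and exogenous regressors are omitted for brevity.

```python
def solve(m, b):
    """Solve the square system m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]  # copy so m is left unchanged
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_var_ols(y, p):
    """Estimate a VAR(p) with a constant by OLS, one equation at a time.

    y: list of K-dimensional observations, oldest first.
    Returns K coefficient rows; row k is [c_k, A_1[k, :], ..., A_p[k, :]].
    """
    k = len(y[0])
    # Shared regressor rows z_t = (1, y_{t-1}', ..., y_{t-p}') for t = p, ..., T-1.
    z = []
    for t in range(p, len(y)):
        row = [1.0]
        for lag in range(1, p + 1):
            row.extend(y[t - lag])
        z.append(row)
    n_reg = len(z[0])
    ztz = [[sum(r[a] * r[b] for r in z) for b in range(n_reg)] for a in range(n_reg)]
    coefs = []
    for eq in range(k):
        target = [y[t][eq] for t in range(p, len(y))]
        zty = [sum(r[a] * v for r, v in zip(z, target)) for a in range(n_reg)]
        coefs.append(solve(ztz, zty))  # normal equations for equation eq
    return coefs


# Noise-free data from a known VAR(1); OLS should recover c and A up to rounding.
A = [[0.5, 0.1], [0.2, 0.3]]
c = [1.0, 2.0]
y = [[0.0, 0.0]]
for _ in range(20):
    prev = y[-1]
    y.append([c[i] + sum(A[i][j] * prev[j] for j in range(2)) for i in range(2)])
print(fit_var_ols(y, p=1))  # close to [[1.0, 0.5, 0.1], [2.0, 0.2, 0.3]]
```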
As exogenous variables, we use quarterly official employment for each industry, converted to monthly frequency and extrapolated with a quadratic spline, together with the monthly business confidence indices for manufacturing, services, construction and trade published by the OECD. We also use the output gap and the industrial production gap to control for the business cycle. We impose restrictions so that each equation includes only the relevant variables. For example, construction business confidence is included only in the equations for the construction and real estate sectors.
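The conversion of quarterly official employment to monthly frequency can be sketched with a quadratic spline. The document does not give the spline's boundary condition or knot placement, so the following are assumptions: knots sit at the end-of-quarter months (matching the CBS end-of-quarter convention), the first segment is linear, and extrapolation simply extends the last quadratic segment. The function name is hypothetical.

```python
def quadratic_spline_monthly(quarterly, extra_months=0):
    """Monthly path through end-of-quarter levels using a C1 quadratic spline.

    quarterly: levels at the ends of consecutive quarters (knots 3 months apart).
    On each segment, s(d) = y_i + b_i*d + c_i*d**2 with d = months since knot i.
    Values and slopes match at every knot; b_0 is the first secant slope, which
    makes the first segment linear (an assumed boundary condition).
    Returns 3*(n-1) + 1 monthly values, plus `extra_months` values obtained by
    extending the last segment beyond the final knot.
    """
    if len(quarterly) < 2:
        raise ValueError("need at least two quarterly observations")
    h = 3.0  # months between knots
    b = (quarterly[1] - quarterly[0]) / h
    segments = []
    for y0, y1 in zip(quarterly, quarterly[1:]):
        c = ((y1 - y0) / h - b) / h  # forces s(h) = y1
        segments.append((y0, b, c))
        b = b + 2.0 * c * h  # slope at the next knot, keeping s' continuous
    monthly = []
    for y0, bi, ci in segments:
        monthly.extend(y0 + bi * d + ci * d * d for d in range(3))
    y0, bi, ci = segments[-1]
    # d = 3 reproduces the final knot; larger d extrapolates the last segment.
    monthly.extend(y0 + bi * d + ci * d * d for d in range(3, 4 + extra_months))
    return monthly
```

With inputs `[0, 9, 9]` the monthly path is `[0, 3, 6, 9, 11, 11, 9]`: the spline overshoots the flat second quarter, a known property of quadratic splines that is worth keeping in mind when extrapolating.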
Once the fitted values and forecasts are obtained from the model, we use the official sector-specific trends to benchmark the levels of the ADP sector variables. We apply the estimated gaps (percentage deviations from trend) of the ADP variables to the trends in the official variables for each sector. The magnitude of the gaps is further adjusted to match the relative volatility of changes in the official and ADP series.
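The benchmarking step can be sketched as follows, assuming that the ADP gaps are percentage deviations from trend and that the volatility adjustment is a single scale factor equal to the ratio of the standard deviations of the official and ADP changes. Both the scaling rule and the function name are assumptions; the document does not give the exact formula.

```python
from statistics import pstdev


def benchmark_levels(official_trend, adp_gap_pct, adp_changes, official_changes):
    """Apply volatility-scaled ADP gaps to the official trend.

    official_trend: trend levels of the official series for each month.
    adp_gap_pct: fitted or forecast ADP gaps, in % deviation from trend.
    adp_changes, official_changes: historical changes used only to compute
    the (assumed) scale factor, the ratio of their standard deviations.
    """
    scale = pstdev(official_changes) / pstdev(adp_changes)
    return [trend * (1.0 + scale * gap / 100.0)
            for trend, gap in zip(official_trend, adp_gap_pct)]
```

The official series thus fixes the long-run level, while the ADP data supply only the cyclical deviation around it.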
A.4 Results
In the charts below we plot fitted values of quarter-to-quarter changes in ADP employment for selected industries
together with the changes in CBS employment.
[Chart: ADP and CBS, Mar 2009 to Nov 2015]
Figure 3.1: ADP and CBS employment – total non-farm (ths.)
[Chart: ADP C and CBS C, Mar 2009 to Nov 2015]
Figure 3.2: ADP and CBS employment – manufacturing (ths.)
[Chart: ADP GI and CBS GI, Mar 2009 to Nov 2015]
Figure 3.3: ADP and CBS employment – trade, hotels and catering (ths.)
[Chart: ADP LN and CBS LN, Mar 2009 to Nov 2015]
Figure 3.4: ADP and CBS employment – business services (ths.)
[Chart: ADP K and CBS K, Mar 2009 to Nov 2015]
Figure 3.5: ADP and CBS employment – financial services (ths.)