Transcript methodology
Netherlands Employment Report Documentation
Automatic Data Processing, LLC (ADP)

Table of Contents
A.1 Introduction
A.2 Data description and analysis
A.3 VAR model
A.4 Results

A.1 Introduction

This document describes the modelling approach used to generate monthly forecasts of nonfarm employment in the Netherlands based on ADP aggregate payroll data. While the official nonfarm payroll employment data are published by Statistics Netherlands (CBS) and Eurostat on a quarterly basis, the ADP forecasts for the Netherlands are generated and reported monthly, prior to the release of the official monthly unemployment rate for the country.

The ADP data contain the 9 main industry groups reported by CBS each quarter. The methodology is consistent with Eurostat's methodology, and this consistency is verified on a regular basis, typically once per year, by the missions Eurostat sends to the country. However, we report employment only for those industries for which we can obtain the best statistical model and the closest fit to the CBS data.

The data used for this exercise are panel data covering an average of 13,000 firms per month, for each of which the number of active employees on a certain day of the month is reported. The sample covers about 7% of all jobs in the Netherlands. The data form an unbalanced panel for the period from April 2008 to May 2016.
The panel is unbalanced because firms enter and exit the sample due to changes in operating conditions or in their business relationship with ADP.

A large part of the forecast generation process is devoted to data processing and cleaning. This includes removing duplicate observations and outliers, and applying interpolation where possible to fill one-month or two-month gaps in the data or to compensate for changes in monthly sample sizes. After the data are cleaned at the individual firm level, matched pairs are created. A matched pair is identified when the employment record for a given firm appears in two adjacent months. In the next stage, the number of active employees is summed across matched pairs for each month and each industry to generate a time series of monthly employment levels for each industry.

We use a vector autoregressive (VAR) model for estimation and forecasting. ADP industry series are used as endogenous variables, while official sectoral employment data and selected leading indicators are used as exogenous drivers. Because a number of industries in the ADP and CBS samples contain distinct trends that may move in different directions, we control for these differences by removing deterministic trends and modelling the data as percentage deviations from the trends.

Detrending the data can be justified on two grounds. First, we are interested in short-term fluctuations rather than long-term trends. Second, changes in the ADP sample are likely affected by factors specific to ADP, since we do not know why a particular firm exited the sample. The sector-specific trends in the official data, which draw on lower-frequency census data, are therefore a more reliable measure of long-term movements, and we use that information to improve the model's performance.

A.2 Data description and analysis

In this section, the ADP data are analyzed and compared with the CBS data. The data processing and cleaning procedures are then described.
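As a rough illustration of the matched-pair step outlined in the introduction, the Python sketch below sums employment over firms that report in two adjacent months, by industry. The function name `matched_pair_totals`, the record layout, the integer month index and the sample figures are all hypothetical, and deriving a month-over-month growth rate from the two sums is an assumption about how the matched totals might be used, not a description of the production procedure.

```python
from collections import defaultdict


def matched_pair_totals(records):
    """Sum employment over firms observed in two adjacent months.

    records: iterable of (firm_id, industry, month_index, employees),
    where month_index is an integer that increases by 1 each month.
    Returns {(industry, month_index): (prev_total, curr_total)}.
    """
    by_firm = {}
    for firm, industry, month, employees in records:
        by_firm[(firm, industry, month)] = employees

    totals = defaultdict(lambda: [0, 0])
    for (firm, industry, month), employees in by_firm.items():
        prev = by_firm.get((firm, industry, month - 1))
        if prev is not None:  # the firm reported in both months: a matched pair
            totals[(industry, month)][0] += prev
            totals[(industry, month)][1] += employees
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical records for industry "C" (manufacturing)
records = [
    ("f1", "C", 0, 100), ("f1", "C", 1, 110),
    ("f2", "C", 0, 50),                        # exits after month 0, unmatched
    ("f3", "C", 1, 40),                        # enters in month 1, unmatched
    ("f4", "C", 0, 200), ("f4", "C", 1, 196),
]
totals = matched_pair_totals(records)
prev_total, curr_total = totals[("C", 1)]
# Assumed use of the matched sums: percentage change between adjacent months
growth_pct = 100.0 * (curr_total / prev_total - 1.0)
```

Restricting each comparison to firms present in both months keeps entries and exits from the sample from being read as employment changes, which is the reason the text gives for the panel being unbalanced.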
ADP data consist of monthly employment data for 13,000 firms of different sizes and in different industries over the period from April 2008 to March 2016. The sample covers about 7% of jobs in the Netherlands. The codes and names of the industries selected for the model are presented in Table 2.1 below. The breakdown includes all sectors of the economy excluding agriculture. Since we use a VAR model to forecast changes in employment, using a smaller number of groups helps to avoid overparameterization of the model and to preserve degrees of freedom.

For each firm, the number of active employees is reported as of a specific day of the month, defined as the date of the most recent payroll processing. Although data starting in 2007 are also available, we restrict our sample to begin in April 2008 because the earlier data contain a large number of duplicates and produce a volatile aggregate.

Table 2.1: Industry Groups

Code   Description
BDE    Mining and quarrying; electricity and gas; water supply and waste management
C      Manufacturing
F      Construction
GI     Trade, transport, hotels, catering
J      Information and communication
K      Financial institutions
LN     Business services
OQ     Government and care
RU     Culture, recreation, other services

The ADP firm-level data record the number of jobs at each firm and are consistent with the payroll jobs data reported in the labor accounts of the national accounts system. Given that the Netherlands has the largest share of part-time workers in the European Union, the distinction between employment and jobs may be important, and we emphasize the use of payroll employment, i.e. what employers report. We use the terms employment and jobs interchangeably throughout the text, but the implied concept is close to the number of jobs. We also exclude self-employed individuals from the official numbers when comparing them with their ADP counterparts, since the latter contain only payroll jobs.

In the first stage of data cleaning, we remove exact duplicates.
The second stage of data cleaning focuses on removing outliers to smooth out monthly percentage changes. In the third stage, interpolation is applied at the individual firm level where necessary. In the final stage, we remove outliers at the aggregate industry level in certain months.

After completing the data processing steps described above, we sum the number of active employees across matched pairs for each industry and month, arriving at monthly time series that can then be used for estimation. Most industries display strong seasonality, and we adjust each series using the X-13ARIMA-SEATS seasonal adjustment procedure available from the U.S. Census Bureau.

The CBS reports quarterly levels of employment, quarter-to-quarter absolute changes and quarter-to-quarter percentage changes for the industries listed in Table 2.1. Quarterly employment levels correspond to the number of persons who are employed on the last day of the quarter. The numbers are based on national accounts methodology and have to be in line with other indicators such as compensation or income. The CBS uses the classification by main activity, or Standard Industrial Classification (SIC). Businesses in a sector of industry or branch may also be engaged in other activities, called subsidiary activities. The first estimate is released 45 days after the quarter ends, while the second estimate is published 90 days after the end of the quarter. With the release of the second estimate for the fourth quarter, data for the previous three quarters of the year are revised.

The charts below plot the quarter-to-quarter percent changes in ADP and CBS employment for selected industries. Although there are clear signs of short-term co-movement, the long-term trends are somewhat shifted, as indicated by the mismatch in the values and scale of the left-hand-side and right-hand-side axes.
Figure 2.1: ADP and CBS employment - manufacturing (% change, q/q). Series: ADP C (LHS), CBS C (RHS).

Figure 2.2: ADP and CBS employment - trade, hotels and catering (% change, q/q). Series: ADP GI (LHS), CBS GI (RHS).

Figure 2.3: ADP and CBS employment - business services (% change, q/q). Series: ADP LN (LHS), CBS LN (RHS).

Figure 2.4: ADP and CBS employment - financial services (% change, q/q). Series: ADP K (LHS), CBS K (RHS).

A.3 VAR model

We employ a monthly vector autoregressive (VAR) model to fit and forecast the computed employment series for each industry. VAR models are widely used for analyzing multivariate time series. Typical VAR models treat all variables as endogenous, following Sims's (1980) critique of the ad hoc exogeneity assumptions of macroeconomic models. VAR models can nonetheless incorporate restrictions, including the exogeneity of some of the variables, and can be extended to include deterministic trends and exogenous variables. They are typically used to summarize the dynamic properties of economic and financial time series and to generate forecasts for them. They can also be used for structural inference and policy analysis.
The basic VAR model of order p is defined by the following equation:

$$y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t,$$

where $y_t$ is a $(K \times 1)$ vector of endogenous variables, the $A_i$ are $(K \times K)$ coefficient matrices, and $u_t$ is a $K$-dimensional white noise process with $E(u_t) = 0$ and $E(u_t u_t') = \Sigma_u$. Typically the VAR(p) model defines a stationary process with time-invariant mean, variance and covariance structure.

Besides the lags of the endogenous variables, the VAR model can also include exogenous regressors on the right-hand side, in which case it is represented by the following equation:

$$y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 X_t + \dots + B_q X_{t-q} + u_t,$$

where $X_t$ represents the vector of exogenous variables and trends, and the $B_j$ are the corresponding coefficient matrices.

The coefficients of the VAR(p) process can be estimated by applying OLS to each of the equations. The OLS estimator is identical to the GLS estimator if there are no restrictions on the parameters. For a Gaussian process with $u_t \sim N(0, \Sigma_u)$, the OLS estimator is also identical to the maximum likelihood (ML) estimator. The usual statistical inference procedures can be applied if the process is stable. If there are integrated variables, so that $y_t \sim I(1)$, the process may be cointegrated and a vector error correction model may be more appropriate. The OLS/ML method can still be applied to estimate the model parameters, but the usual t-statistics and F-statistics can lead to misleading conclusions when used for hypothesis testing.

We use extrapolated (quadratic spline) quarterly official employment for each industry, together with the monthly business confidence indices for manufacturing, services, construction and trade published by the OECD, as exogenous variables. We also use the output gap and the industrial production gap to control for the business cycle. We impose restrictions so that only relevant variables enter each equation. For example, construction business confidence is only included in the construction and real estate sectors.
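The equation-by-equation OLS estimation mentioned above can be sketched in Python with NumPy. This is a minimal, unrestricted illustration and not the production model: the function name `fit_varx_ols` is hypothetical, only contemporaneous exogenous regressors are included (a single coefficient matrix `B` multiplying the exogenous vector at time t), and the per-equation exclusion restrictions described in the text are not imposed.

```python
import numpy as np


def fit_varx_ols(y, x, p=1):
    """OLS for y_t = A_1 y_{t-1} + ... + A_p y_{t-p} + B x_t + u_t.

    y: (T, K) array of endogenous series.
    x: (T, M) array of exogenous series.
    Returns (A, B, resid): A is a list of p (K, K) matrices,
    B has shape (K, M), resid has shape (T - p, K).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T, K = y.shape
    # Regressor row for each t = p..T-1: [y_{t-1}, ..., y_{t-p}, x_t]
    lags = [y[p - i:T - i] for i in range(1, p + 1)]
    Z = np.hstack(lags + [x[p:]])
    Y = y[p:]
    # Without restrictions every equation shares the same regressors, so one
    # least-squares call solves all K equations; column k of coef is equation k.
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    A = [coef[i * K:(i + 1) * K].T for i in range(p)]
    B = coef[p * K:].T
    resid = Y - Z @ coef
    return A, B, resid
```

With exclusion restrictions such as those described above, the regressor set differs across equations, so each equation would be fitted separately on its own columns of `Z`, and OLS would no longer coincide with GLS in general.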
Once the fitted values and forecasts are obtained from the model, we use official sector-specific trends to benchmark the levels of ADP sector variables. We simply apply the estimated gaps for ADP variables to the trends in the official variables for each sector. The magnitude of the gaps is further adjusted to match the volatility of the changes in the official and ADP series.

A.4 Results

In the charts below we plot fitted values of quarter-to-quarter changes in ADP employment for selected industries together with the changes in CBS employment.

Figure 3.1: ADP and CBS employment - total non-farm (ths.). Series: ADP, CBS.

Figure 3.2: ADP and CBS employment - manufacturing (ths.). Series: ADP C, CBS C.

Figure 3.3: ADP and CBS employment - trade, hotels and catering (ths.). Series: ADP GI, CBS GI.

Figure 3.1: ADP and CBS employment - business services (ths.). Series: ADP LN, CBS LN.

Figure 3.4: ADP and CBS employment - financial services (ths.). Series: ADP K, CBS K.
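The benchmarking step described at the end of section A.3 can be illustrated with a short Python sketch. It assumes that gaps are expressed as percentage deviations from trend, so that a level is recovered as trend times (1 + gap / 100), and that the volatility adjustment is a simple ratio of sample standard deviations. Both function names and the scaling rule are assumptions made for illustration; the document does not specify the exact adjustment.

```python
import statistics


def volatility_scale(official_changes, adp_changes):
    """Assumed adjustment factor: ratio of sample standard deviations."""
    return statistics.stdev(official_changes) / statistics.stdev(adp_changes)


def benchmark_levels(official_trend, adp_gap_pct, scale=1.0):
    """Apply (optionally rescaled) percentage gaps to the official trend.

    level_t = trend_t * (1 + scale * gap_t / 100)
    """
    if len(official_trend) != len(adp_gap_pct):
        raise ValueError("trend and gap series must have the same length")
    return [
        trend * (1.0 + scale * gap / 100.0)
        for trend, gap in zip(official_trend, adp_gap_pct)
    ]
```

For example, a fitted gap of 1% applied to an official trend level of 1,000 thousand jobs gives a benchmarked level of about 1,010 thousand when `scale` is 1.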