Presentation - 15th TRB National Transportation Planning


Who’s Employed?

An In-Depth Comparison of Employment Data Sources

Gregory Giaimo, PE
Samuel Granato, PE
Andrew Hurst
The Ohio Department of Transportation, Division of Planning
Presented at the 14th Transportation Planning Applications Conference, May 6, 2013

Overview

Motivation

Macro View-QCEW vs. BEA Control Totals for Data Expansion

Micro View-QCEW vs. Purchased Data for Possible Replacement

Motivation

For Travel Modeling Want Employment Data With:

• Accuracy (correct employment/employers)
• Completeness (all employment/employers)
• Spatial Precision (geocodable address of individual employers at actual place of business activity)
• Temporal Consistency (no defunct businesses; contains new businesses extant on the supposed date of the dataset)
• Categorization (correct NAICS or similar)
• Disaggregate (individual employer records allow data checking and finer TAZ disaggregation; future travel demand models, particularly freight, will include disaggregate attraction-end modeling, including business synthesizers similar to current household synthesizers)

There Are a Number of Potential Employment Data Sources

Motivation

QCEW (Quarterly Census of Employment and Wages)

• Regulatory dataset for Federal unemployment insurance
• Pros: cheap; regulatory basis implies it is complete and temporally consistent for covered sectors
• Cons: confidentiality restrictions; uncovered sectors for those exempt from Federal unemployment insurance laws (sole proprietors, small farms, railroads, military, small non-profits, student workers, elected officials, etc.); sub-county location must be geocoded by the user from mailing addresses (regulations only require the correct county and the ability to mail a bill); single-site reporting for multi-site businesses; government particularly poor

BEA (Bureau of Economic Analysis)

• Dataset maintained by Federal Government for Macro-Economic Analysis
• Pros: based on QCEW but enhanced with other administrative sources such as income tax data to provide complete and temporally consistent data
• Cons: only aggregate county-level data available

Motivation

LEHD (Longitudinal Employer-Household Dynamics)

• Census Bureau product based on QCEW and linked with ACS data
• Pros: same pros as other QCEW-based sources; no confidentiality restrictions or costs; in addition, the dataset provides linkages between employee residences and employer locations
• Cons: same cons as other QCEW-based sources, plus no employer records, only aggregate employment; Census Bureau masking (a PUMS-like product for employment would alleviate some of this constraint)

Private Sources (InfoGroup’s InfoUSA/ReferenceUSA, Dun & Bradstreet’s Global Commercial Database etc.)

• Several firms assemble employment data, primarily for resale for business marketing purposes; they use phone directories and other publicly available sources and then enhance and verify it with their staff
• Pros: good spatial precision, few of the multi-site problems in QCEW, reasonably complete
• Cons: cost; lack of regulatory basis means incompleteness is ill-defined; temporal consistency is poor because the primary purpose of the dataset makes it more likely that defunct businesses are retained

Motivation

• Since 2000 ODOT has utilized QCEW as its primary source of employment data. Confidentiality requirements mean model employment data can’t be given out freely, creating some logistical issues with the models and consultant contracts; the latest confidentiality agreement also includes stricter personal liability, making some hesitant to sign
• Ohio library system has a license for Infogroup’s ReferenceUSA, allowing state agencies to query 50 records at a time; based on this data, ODOT also received a small area sample of their InfoUSA database for this study
• ODOT Economic Development and Planning Offices also recently purchased two separate versions of the Dun and Bradstreet database for their own purposes (largely due to QCEW confidentiality limits)
• Taken with the public availability of LEHD and BEA data, this provided an opportunity and need for ODOT to compare and contrast data sources

Macro-View

• Macro-View will focus on QCEW vs. BEA
• Expand QCEW to BEA to account for:
  1. Ungeocoded QCEW (records do travel modelers no good if not located)
  2. Uncovered employment sectors
  3. Sole proprietors (most important)
  4. Difference between 1st Qtr. QCEW and annual average BEA
• Important to expand by county and industry as will be shown

Total Employment     Employees   Percent
QCEW Geocoded          4765940       74%
QCEW Total             4909538       76%
BEA Wage               5199216       81%
BEA Total              6451236      100%

[Chart: Ohio Employment Sources. Series: Geocoded Employees, Ungeocoded, Extra BEA Wage, BEA Proprietors]
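The idea of expanding geocoded QCEW employment to a BEA control total can be sketched in a few lines of Python. This is only an illustration: the function name `expansion_factor` and the 1,200-employee zone are hypothetical, and a single statewide factor is used here for simplicity even though the presentation stresses expanding by county and industry. The two totals are the statewide figures quoted in the slides.

```python
def expansion_factor(control_total, observed_total):
    """Ratio used to scale observed employment up to a control total."""
    if observed_total <= 0:
        raise ValueError("observed_total must be positive")
    return control_total / observed_total


# Statewide totals quoted in the slides.
BEA_TOTAL = 6451236
QCEW_GEOCODED = 4765940

statewide_factor = expansion_factor(BEA_TOTAL, QCEW_GEOCODED)

# Hypothetical zone with 1,200 geocoded QCEW employees (illustrative only).
expanded_zone_employment = 1200 * statewide_factor
```

In practice the same ratio would be computed per county and industry cell, since the slides show that the gap between QCEW and BEA varies widely across both.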

Industry Level QCEW vs. BEA

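Row labels in lower case and cells marked .. did not survive in the source; the labels are inferred from standard NAICS sector order.

QCEW geocoding by industry

Industry              Employers  Employers  Employers  Employees  Employees  Employees
                       geocoded ungeocoded % geocoded   geocoded ungeocoded % geocoded
Agriculture                  ..         47        96%      11770        128        99%
MINING                      709         83        90%       9885        462        96%
UTILITIES                   894         86        91%      29659       1946        94%
Construction                 ..       2235        91%     150915       6822        96%
Manufacturing                ..        524        97%     608488       2580       100%
WHOLESALE                 15815       7228        69%     193657      21674        90%
RETAIL                    35467       1080        97%     536292       4922        99%
Transportation               ..        763        91%     183774       3288        98%
INFORMATION                3730        913        80%      86949       5673        94%
FINANCE/INS               16390       1292        93%     203054       6198        97%
Real estate                  ..        696        93%      55617       1679        97%
Professional                 ..       4983        83%     227422      16112        93%
MGMT SERVICES              1531        215        88%     106652       1344        99%
Admin services               ..       2470        85%     248063      17312        93%
EDUCATION                  6419        324        95%     456385       5389        99%
Health                       ..        858        97%     805857      14069        98%
ARTS/REC                   3739        300        93%      56763       2282        96%
Accommodation/food           ..        529        98%     413534       3468        99%
Other services               ..       1390        94%     146197       3370        98%
PUBLIC ADMIN               6850       1153        86%     234043      24569        90%
UNCLASSIFIED                547        309        64%        964        311        76%
Total                    260139      27478        90%    4765940     143598        97%

BEA employment by industry

Industry                    BEA     County          %  QCEW as %
                          total  allocated  allocated     of BEA
Agriculture               91078      84038        92%        13%
MINING                    27895      19410        70%        37%
UTILITIES                 20765      17853        86%       152%
Construction             296852     291608        98%        53%
Manufacturing            648564     647290       100%        94%
WHOLESALE                236906     226113        95%        91%
RETAIL                   671615     671615       100%        81%
Transportation           215452     196664        91%        87%
INFORMATION               93023      92724       100%       100%
FINANCE/INS              331883     331377       100%        63%
Real estate              234520     233849       100%        24%
Professional             367974     355874        97%        66%
MGMT SERVICES            113014     110997        98%        96%
Admin services           387132     383296        99%        69%
EDUCATION                147691     137663        93%       313%
Health                   830432     778222        94%        99%
ARTS/REC                 119530     119412       100%        49%
Accommodation/food       443910     443303       100%        94%
Other services           338268     337561       100%        44%
PUBLIC ADMIN             834732     834732       100%        31%
UNCLASSIFIED                  0          0          0         ..
Total                   6451236    6451236       100%        76%

QCEW vs. BEA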

• There are significant differences so it’s worth delving a bit deeper

QCEW vs. BEA

QCEW Geocoding

• Mostly automated but manual passes on large employers (hence while only 90% of employers are geocoded, 97% of employment is)
• Geocoding not even across industry categories or counties
• ODOT spent a lot of time fixing multi-site employers, especially school districts, which now appear in Ohio’s official file

[Chart: QCEW Geocoding Percentages. Series: Employers, Employees]

BEA Characteristics

• While BEA industry and county marginal totals add up, the joint distribution values do not, due to limitations in the sources BEA uses to fill in QCEW gaps
• Hence if you are expanding to industry/county totals you need to use an Iterative Proportional Fitting routine (i.e. Fratar) to account for the unallocated employment (not all industries/counties are equal in this regard)
• BEA data has a different (and much higher) sole proprietor rate for farm than for other types
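Iterative Proportional Fitting can be sketched as alternating row and column scaling until both sets of marginal totals are matched. The code below is a minimal illustration, not ODOT's routine: the function name `ipf`, the 2x2 seed matrix (standing in for employment by county and industry), and the marginal totals are all hypothetical values chosen so that the row and column targets sum to the same grand total.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a seed matrix so its row and column sums match the targets."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column to its target total.
        max_change = 0.0
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        # Stop once the column step no longer changes anything.
        if max_change < tol:
            break
    return table


# Hypothetical seed: rows are counties, columns are industries.
seed = [[100.0, 50.0], [200.0, 25.0]]
county_totals = [180.0, 270.0]    # assumed control totals by county
industry_totals = [330.0, 120.0]  # assumed control totals by industry

fitted = ipf(seed, county_totals, industry_totals)
```

Cells that start at zero stay at zero under this scaling, which is one reason the choice of seed matters when some county/industry combinations are missing from the source data.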

[Chart: BEA Percent Allocated to Counties]

BEA Proprietor Rates

Farm         83%
Private      21%
Government    0%

Comparing QCEW/BEA

• BEA adds many commission-only employees in NAICS 50-series categories, particularly real estate, so you should expect high expansion factors here
• ODOT uses Q1 QCEW, so we get high expansion factors in seasonal industries (construction and arts/recreation)

[Chart: Percent Total QCEW to Total BEA]

• Note similarity to previous map

Comparing QCEW/BEA

• Tiny representation of agriculture in QCEW renders direct expansion sub-optimal
• ODOT allocates the BEA farm proprietors based on agricultural acreage instead
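Allocating a control total in proportion to acreage is simple arithmetic, sketched below. The function name `allocate_by_acreage`, the zone identifiers, the acreage values, and the county farm total are all hypothetical; the slides do not describe the exact procedure ODOT uses.

```python
def allocate_by_acreage(farm_total, acreage_by_zone):
    """Split a farm employment total across zones in proportion to acreage."""
    total_acres = sum(acreage_by_zone.values())
    if total_acres <= 0:
        raise ValueError("total agricultural acreage must be positive")
    return {
        zone: farm_total * acres / total_acres
        for zone, acres in acreage_by_zone.items()
    }


# Hypothetical zones and acreages for one county (illustrative only).
acreage = {"zone_a": 500.0, "zone_b": 1500.0, "zone_c": 0.0}
farm_allocation = allocate_by_acreage(400.0, acreage)
```

Zones with no agricultural acreage receive no farm employment, and the allocated values sum back to the control total.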

[Chart: Agricultural Employment From ES202 vs. Distributed Proportionally to Ag. Acreage. Series: es202, farm]

Comparing QCEW/BEA

While of minor importance, we decided to allocate some of the missing transportation employment to rail terminals prior to expansion

Macro-View Wrap Up

• As mentioned previously, ODOT evaluated other sources beyond QCEW
• At a macro level, there are significant differences
• These are more difficult to understand at this level, so ODOT conducted some micro analysis at several locations

Micro-View

• This presentation will focus on one location for clarity
• A relatively recent and growing commercial/industrial area in the western suburbs of Columbus
• Contains diverse mix of employment types
• However, because the study area is small, results shown here should not be generalized; consider them illustrative only

Micro-View

• The same area looks a bit different depending on the source • RefUSA data only obtained for a subarea • D&B data only obtained for 4+ employee employers

Comparison Methodology

• Obtained data for (mostly) the same area
• Compared the employment records by address since there is no other common unique identifier
• Combined this with detailed local knowledge and aerial imagery (study areas were selected based on analyst knowledge)
• Necessary to determine when duplicate addresses are valid (office parks, suites, corporate vs. franchise, and subsidiaries often have employees at the same address) or when multiple occupants from different years are in the data
• Theoretical maximum employment for an address taken as the maximum valid employment from any of the sources (this is not necessarily the true value since that source may have overstated the number)
• LEHD not included in most comparisons since it is aggregate data
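The address-based comparison can be sketched as follows. This is an assumption-laden illustration rather than the analysts' actual workflow: the helper names `normalize_address` and `max_valid_employment` are hypothetical, the normalization is deliberately crude, the sample records are invented, and the code assumes every record has already been screened as a valid, distinct employer, so employment is summed within a source before the maximum is taken across sources.

```python
import re
from collections import defaultdict


def normalize_address(address):
    """Crude normalization: uppercase, strip punctuation, collapse spaces."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    return " ".join(cleaned.split())


def max_valid_employment(records):
    """Return the largest per-source employment total at each address.

    records: iterable of (source, address, employees) tuples, assumed to be
    already screened so that each one is a valid, distinct employer.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for source, address, employees in records:
        totals[normalize_address(address)][source] += employees
    return {addr: max(by_source.values()) for addr, by_source in totals.items()}


# Hypothetical records, for illustration only.
sample = [
    ("QCEW", "100 Main St.", 40),
    ("REFUSA", "100 main st", 30),
    ("REFUSA", "100 MAIN ST", 25),
    ("D&B", "200 Oak Ave", 12),
]
address_max = max_valid_employment(sample)
```

Real address strings need far more care (suite numbers, abbreviations, misspellings), which is why the comparison also relied on local knowledge and aerial imagery.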

Comparison Methodology

• Purchased data sources contain many duplicate businesses, which need to be removed prior to comparison
• More problematic for smaller employers
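A first automated pass at duplicate removal might key each record on a normalized business name and address, as sketched below. The function name `drop_duplicates`, the record layout, and the sample businesses are hypothetical, and exact-match keys like this will miss duplicates that differ by spelling or abbreviation.

```python
def drop_duplicates(records):
    """Keep the first record for each (name, address) pair, ignoring case and spacing."""
    seen = set()
    unique = []
    for rec in records:
        key = (
            " ".join(rec["name"].upper().split()),
            " ".join(rec["address"].upper().split()),
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical purchased-data records (illustrative only).
purchased = [
    {"name": "Acme Supply", "address": "100 Main St", "employees": 8},
    {"name": "ACME  SUPPLY", "address": "100 main st", "employees": 8},
    {"name": "Oak Dental", "address": "200 Oak Ave", "employees": 5},
]
deduplicated = drop_duplicates(purchased)
```

Records that share an address but have different names are kept, since several businesses can legitimately occupy the same address.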

Comparisons

• After removal of duplicates, REFUSA and QCEW performed similarly for large employers; REFUSA had better coverage of small employers (includes some sole proprietors and commission employees not in QCEW)
• D&B didn’t perform as well in this study area. Harris, one of the two versions of the D&B data purchased by ODOT, only had 20+ employee employers

Combining Datasets

• Employers included in purchased data and QCEW were nearly statistically independent
• Given the 75% and 92% employer coverage in QCEW and Reference USA, one would expect 98% coverage by combining the sources (analyst could not identify any missing employers, which implies 100% was obtained, but there is certainly some margin of error)
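The 98% figure follows from the independence assumption: an employer is missed by the combined data only if both sources miss it, so combined coverage is 1 - (1 - 0.75)(1 - 0.92) = 0.98. A short sketch of that arithmetic, with a hypothetical function name `combined_coverage`:

```python
def combined_coverage(*coverages):
    """Coverage from pooling independent sources: 1 minus the product of miss rates."""
    missed = 1.0
    for coverage in coverages:
        missed *= 1.0 - coverage
    return 1.0 - missed


# Coverage rates quoted in the slides for QCEW and Reference USA.
expected_coverage = combined_coverage(0.75, 0.92)
```

If the sources tended to miss the same employers (positive dependence), the combined coverage would be lower than this estimate.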

[Chart: Number of Employers (4+ employees) by Source. Categories: Q, QR, QD, QRD, R, RD, D]

[Chart: Number of Employers if Only Use These Sources. Categories: QCEW, QCEW/REFUSA, QCEW/D&B, REFUSA, D&B]

Categorization

• Categorization by industry was similar (89% same for same employers)

Future Direction

Given these results and the desire to produce model datasets not subject to confidentiality constraints ODOT will purchase employment data and develop a process to:

1. Geocode
2. Remove duplicates
3. Cross match with previous year’s data
4. Cross match with QCEW
5. Develop an employment estimate for employers identified by QCEW rather than using value directly