Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S.
Improving the Quality of Tax Statistics:
Recent Innovations in Editing and
Imputation Techniques at the Statistics of
Income Division of the U.S. Internal
Revenue Service
Scott Hollenbeck – [email protected]
Barry Johnson – [email protected]
Melissa Ludlum – [email protected]
Today’s Presentation
Overview of Statistics of Income (SOI)
Dealing with Missing Data
Recent Innovations
Future Plans
What Does SOI Do?
Primary source of U.S. tax data
Data from 110 tax returns and information documents
Test and correct data collected during administrative
processing (IRS Masterfile)
Collect extensive additional data from forms, schedules
and attachments
Most projects collect data from samples
Products
Micro data files for U.S. Treasury Department & Congress
Public-use files
Tables and analysis (www.irs.gov/taxstats)
SOI Data Collection Systems
Maintains computer network separate from
main IRS processing
Data collection takes place in IRS
Submissions Processing Centers
Graphical User Interface (GUI) systems
based in ORACLE
Data tested for internal consistency
Post-edit processing overseen by
headquarters’ staff
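As an illustration of what an internal consistency test can look like, here is a minimal sketch. The field names (`wages`, `interest`, `other_income`, `total_income`, `tax`) and the one-dollar tolerance are hypothetical and are not taken from any SOI system; the slides do not describe the actual tests.

```python
from typing import Dict, List


def consistency_errors(record: Dict[str, float], tolerance: float = 1.0) -> List[str]:
    """Return a description of each failed check for one return record.

    Field names and the tolerance are illustrative assumptions only.
    """
    errors: List[str] = []

    # Check 1: reported components should add up to the reported total.
    components = record["wages"] + record["interest"] + record["other_income"]
    if abs(components - record["total_income"]) > tolerance:
        errors.append("income components do not sum to total_income")

    # Check 2: a tax amount should never be negative.
    if record["tax"] < 0:
        errors.append("tax is negative")

    return errors
```

A record that returns an empty list passes both checks; any other result would be a candidate for review by an editor.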
Three Major SOI Programs
Individual Income Tax
Filed by individuals and married couples to report most
forms of personal income
133 million returns filed in 2006
Corporation Income Tax
Filed by incorporated businesses to report income from
parent corporation and subsidiaries
2.5 million returns filed in 2006
Tax-exempt Organizations
Annual information returns report assets, income,
expenses
833,000 returns filed in 2006
Missing Data – Unit Nonresponse
Causes
Extensions/late-filed returns
Tax evasion
Strategies
Update values from prior year using survey
responses
Utilize records for recent prior years filed
during the selection period
Missing Data – Item Nonresponse
Causes
Taxpayer neglects to provide attachments
Paper return is being used by another IRS
function
Strategies
Use IRS Masterfile data for key values
Impute values based on existing data and
information provided on prior and/or subsequent
return
Surveys and direct contact with preparers
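The strategies above can be read as a fallback chain for a single missing item. The sketch below assumes one possible ordering (reported value, then Masterfile, then prior-year return, then follow-up contact); the ordering, the function name `impute_item` and the dictionary layout are assumptions for illustration, not SOI's documented procedure.

```python
from typing import Dict, Optional, Tuple

Record = Dict[str, Optional[float]]


def impute_item(
    item: str,
    current: Record,
    masterfile: Record,
    prior_year: Record,
) -> Tuple[Optional[float], str]:
    """Return (value, source) for one item, trying each source in turn.

    The source order here is a hypothetical choice made for this sketch.
    """
    sources = (
        ("reported", current),
        ("masterfile", masterfile),
        ("prior_year", prior_year),
    )
    for label, record in sources:
        value = record.get(item)
        if value is not None:
            return value, label
    # Nothing usable on file: flag the item for a survey or preparer contact.
    return None, "follow_up"
```

Returning the source label alongside the value keeps a record of which items were imputed and from where, which matters when the quality of the final file is evaluated.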
What’s New?
Digital images of tax returns
Electronic filing
Automated error correction/imputation routines
Digital Return Images
In 1998 SOI began scanning operations
Images stored in Tagged Image File Format (TIFF)
In 2006, imaged more than 71.5 million pages
from 30 different tax and information returns
Many users:
SOI headquarters staff
SOI edit operations
IRS Functions
General Public (tax-exempt organizations only)
Split-Screen Edit Systems
Combines scanned image and GUI edit
system on a single 24 inch wide-aspect
monitor
Image displayed using Adobe Acrobat or
specially adapted ORACLE programs
Image and edit systems are synchronized
Online access to instructions, dictionaries,
other tools
Split-Screen Edit Systems
Positive feedback from editors
Slight overall improvement in productivity and
quality
Images available to geographically dispersed
work force
Reduced storage of paper documents
Reduced impact on other IRS functions
Electronic Filing of Tax Returns
2004 Modernized electronic filing (MeF) began
Uses Extensible Markup Language (XML) to
capture:
Numeric and character strings supplied by
taxpayer
Information tags
2005 mandatory e-file for large business and
tax-exempt organizations
20.5% SOI sample of corporate income taxes
13.5% SOI sample of tax-exempt organizations
SOI Use of MeF Data
In 2006, SOI developed programs to render
digital images from XML data
Edit returns using split-screen applications
In 2007, will populate ORACLE data tables
directly with XML data
Editors will validate data, supply codes and
allocate certain data items
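To show the general idea of moving tagged XML values into a table row, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`Return`, `FormType`, `TotalAssets`, `BusinessName`) are invented for the example and do not reflect the actual MeF schema; the row is a plain dictionary standing in for a database record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified return; real MeF documents use a different schema.
SAMPLE = """<Return>
  <FormType>1120</FormType>
  <TotalAssets>1500000</TotalAssets>
  <BusinessName>Example Corp</BusinessName>
</Return>"""

# Tags whose text should be stored as numbers rather than character strings.
NUMERIC_TAGS = {"TotalAssets"}


def xml_to_row(xml_text: str) -> dict:
    """Flatten the top-level child elements of one return into a row."""
    root = ET.fromstring(xml_text)
    row = {}
    for child in root:
        text = (child.text or "").strip()
        row[child.tag] = int(text) if child.tag in NUMERIC_TAGS else text
    return row
```

Because each value arrives with its tag, no transcription step is needed; an editor's effort can go to validating and coding the row instead.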
Electronic Filing of Tax Returns
Individual income tax returns
1986 – E-file through paid preparers
1992 – E-file from home computers allowed
1994 – 98% of all filers eligible to e-file
2006 – 73 million returns, or 54%, e-filed
Data stored in Tax Return Database (TRDB)
ASCII data, not tagged XML
2010 – Scheduled for conversion to MeF
SOI Individual Income Tax Program
Sample of returns processed differently
depending on certain criteria
Edited returns
“Missing returns”
Forced closed returns
Individual Processing Programs
Online editing system – editors transcribe,
code and review any potential data
discrepancies
Post Edit Reconciliation Process (PERP) –
automated computer program which validates
and adjusts data
Edited Returns
Edited returns are processed through the online
editing system by an editor, then reviewed
using the PERP program
Prior to Tax Year 2004, all sampled returns
which were not “missing” were manually edited
Currently only paper returns and electronically
filed returns with specific characteristics are
edited through the online system
“Missing Returns”
Each year, approximately 250 paper returns
selected for the sample are not located
Limited IRS Masterfile data available
PERP program used to impute missing
details of forms and schedules
Forced Closed Returns
Automated processing of certain E-filed
returns in the SOI sample
Bypass the online editing system and are
processed through the PERP program
Returns with possible discrepancies are
reviewed by a National Office analyst
Returns that pass all tests are considered
“forced closed” and added to final data file
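The routing described above can be sketched as a small decision function. The three boolean inputs and the returned labels are hypothetical names chosen for this example; the actual eligibility criteria and PERP tests are not given in the slides.

```python
def route_return(is_efiled: bool, meets_auto_criteria: bool, passes_all_tests: bool) -> str:
    """Decide how a sampled return is handled (illustrative labels only)."""
    # Paper returns, and e-filed returns outside the automated criteria,
    # go to an editor in the online editing system.
    if not (is_efiled and meets_auto_criteria):
        return "online_edit"
    # Eligible e-filed returns skip the editor and are tested automatically.
    if passes_all_tests:
        return "forced_closed"
    # A possible discrepancy sends the return to an analyst for review.
    return "analyst_review"
```

For example, `route_return(True, True, False)` yields `"analyst_review"`, matching the case of an eligible e-filed return that fails a test.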
Results from Forced Closing Returns
Tax Year 2004 – First year using automated
closing of selected electronically filed returns
Total sample size – 200,295 returns
Electronically filed – 64,670 returns
“Forced Closed” – 18,193 returns
Editing hours saved – 1,400 hours
Results from Forced Closing Returns
Tax Year 2005 – Second year of program,
expanded criteria for returns eligible to be
“forced closed”
Total sample size – 292,837 returns
Electronically filed – 114,897 returns
“Forced Closed” – 47,753 returns
Editing hours saved – 4,100 hours
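The two years of results can be compared with a little arithmetic. The sketch below uses only the figures on the slides; the derived measures (share of e-filed returns forced closed, share of the whole sample, and minutes saved per forced-closed return) are this example's own framing, not statistics reported by SOI.

```python
# Figures copied from the Tax Year 2004 and 2005 results above.
RESULTS = {
    2004: {"sample": 200_295, "efiled": 64_670, "forced_closed": 18_193, "hours_saved": 1_400},
    2005: {"sample": 292_837, "efiled": 114_897, "forced_closed": 47_753, "hours_saved": 4_100},
}


def summarize(year: int) -> dict:
    """Derive simple ratios from one year's reported counts."""
    r = RESULTS[year]
    return {
        "share_of_efiled": r["forced_closed"] / r["efiled"],
        "share_of_sample": r["forced_closed"] / r["sample"],
        "minutes_saved_per_return": 60 * r["hours_saved"] / r["forced_closed"],
    }


for year in sorted(RESULTS):
    s = summarize(year)
    print(
        f"{year}: {s['share_of_efiled']:.1%} of e-filed, "
        f"{s['share_of_sample']:.1%} of sample, "
        f"{s['minutes_saved_per_return']:.1f} min saved per return"
    )
```

On these figures, the share of sampled e-filed returns that were forced closed rose from roughly 28 percent to roughly 42 percent after the criteria were expanded.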
The Future – Data
More returns and information documents will
be filed electronically
Optical Character Recognition or Intelligent
Character Recognition will be used to capture
data from paper-filed returns
Data will be available in real time
Enable larger sample sizes and increased
use of population files
The Future – Field Operations
Increased resources dedicated to resolving
data inconsistencies as opposed to data
transcription
Paperless environment – use of electronic
data or digital images created from paper
returns
Increased use of prior year data to identify
and correct data anomalies
The Future – Products
Improvements in technology and increased
use of electronic filing will allow SOI to
produce more data, more quickly and more
efficiently
Increased sample sizes will allow small area
estimates
Population files will allow for creation of ad
hoc panels, linkage of data items across tax
form types and research on infrequent data
items