ACS Data Issues


Working with the data
http://www.edthefed.com/presentations/ACS_Data_Issues.ppt
Where to begin?
Have you come across
any ACS data issues in
your work?
1. Sample Error (90% Confidence)
2. Collapsing
3. Period Estimates
4. Reliability
5. Dollar Values
6. Trend Analysis
7. Weighing Change
8. Light Rail
9. Reweighting
10. CTPP Issues
11. Block Group data
You must do Statistical Significance Tests
Sampling Error
To avoid false
statements like
Commutes increase for all modes
“Based upon data from the 2000 Census (CTPP) and
the 2005-2007 ACS, the total number of workers who live
in Flagstaff increased along with the number who took
transit to work. During the same time, the number of
people who worked at home increased along with
those who drove alone and carpooled.” The World Gazette
How do you do a Significance Test?
It is simpler than it looks and there are a lot of guides
1. Get the Margin of Error (MOE) from ACS
2. Calculate the Standard Error (SE)
[SE = MOE / 1.645]
3. Solve for Z where A and B are the two
estimates
[Z = (A - B) / sqrt(SE(A)^2 + SE(B)^2)]
4. If Z < -1.645 or Z > 1.645
Difference is Significant at 90% confidence
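The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not an official Census Bureau tool: the function names and the sample numbers are made up for the example, and it assumes both MOEs are stated at the 90% confidence level.

```python
import math

Z_90 = 1.645  # critical value for 90% confidence, as used in steps 2 and 4


def standard_error(moe, z=Z_90):
    """Step 2: convert a 90% margin of error to a standard error."""
    return moe / z


def z_score(est_a, moe_a, est_b, moe_b):
    """Step 3: Z statistic for the difference between estimates A and B."""
    se_a = standard_error(moe_a)
    se_b = standard_error(moe_b)
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_significant(est_a, moe_a, est_b, moe_b, z=Z_90):
    """Step 4: significant if Z < -1.645 or Z > 1.645."""
    return abs(z_score(est_a, moe_a, est_b, moe_b)) > z


# Hypothetical estimates and MOEs (not taken from the presentation)
print(is_significant(1200, 300, 900, 250))
print(is_significant(5000, 400, 3000, 350))
```

With these made-up inputs the first difference (300) is smaller than the combined uncertainty and is not significant, while the second (2,000) is.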
Some things to keep in mind
Obtaining Standard Errors is the Key
Formulas vary depending on the comparison
• Sum or Difference of Estimates
• Proportions and Percents
• Means and Other Ratios
Working with 2000 data will be a little
more involved
There are resources to help
The ACS compass handbooks
A Compass for
Understanding And Using
ACS Data
• Set of user-specific handbooks
• Train-the-trainer materials
• E-learning ACS Tutorial
• Annotated Presentations
http://www.census.gov/acs/www/guidance_for_data_users/compass_products/
NY State Data Center Calculator
http://sdcclearinghouse.wordpress.com/2009/03/03/spreadsheet-to-calculate-acs-margins-of-error-and-statisticalsignificance-for-sums-proportions-and-ratios/
But what if I am using 2000 non-ACS Data?
You will need to Estimate the MOE and know the
Survey Design Factor
The CUTR Guide has you covered
There’s a Report
http://www.nctr.usf.edu/pdf/77802.pdf
and a Spreadsheet
Calculator
http://www.nctr.usf.edu/spreadsheet/77802.xls
http://www.nctr.usf.edu/abstracts/abs77802.htm
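As a rough sketch of what "estimating the MOE" for 2000 data involves: the approximation commonly given in the Census 2000 long-form documentation computes an unadjusted standard error for an estimated total and then multiplies it by the survey design factor. The code below assumes that approximation; the function names and all input values are hypothetical, and the CUTR report and spreadsheet should be treated as the authority on the exact procedure and the design factor to use.

```python
import math


def census2000_se(estimate, universe, design_factor):
    """Approximate SE of a long-form estimated total.

    Assumes the unadjusted SE is sqrt(5 * Y * (1 - Y / N)), where Y is the
    estimate and N is the size of the universe (area total), then scales it
    by the published design factor for the characteristic.
    """
    unadjusted = math.sqrt(5 * estimate * (1 - estimate / universe))
    return unadjusted * design_factor


def census2000_moe(estimate, universe, design_factor, z=1.645):
    """Convert the estimated SE to a 90% margin of error."""
    return z * census2000_se(estimate, universe, design_factor)


# Hypothetical values: 2,000 transit commuters out of 10,000 workers,
# with an assumed design factor of 1.2
print(round(census2000_moe(2000, 10000, 1.2)))
```

Once the 2000 estimate has an MOE, it can go through the same significance test as an ACS estimate.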
Transportation resources
http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_rpt_588.pdf
Understanding the MOE
Part 1, Profile 1 (Resident data)
Using the MOE
We know the number of workers
has changed, but what is the
range of that change?
A. 5,744?
B. 5,072 to 6,416?
C. 3,888 to 7,600?
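One way to think about the question: the change itself is only a point estimate, and its range is the change plus or minus the MOE of the difference. The sketch below assumes the usual approximation for the MOE of a difference (square root of the sum of the squared MOEs); the function names and inputs are hypothetical and are not the Flagstaff figures.

```python
import math


def moe_of_difference(moe_a, moe_b):
    """Approximate 90% MOE of the difference between two estimates."""
    return math.sqrt(moe_a ** 2 + moe_b ** 2)


def change_range(est_new, moe_new, est_old, moe_old):
    """Return (low, high) bounds for the change between two estimates."""
    change = est_new - est_old
    moe = moe_of_difference(moe_new, moe_old)
    return change - moe, change + moe


# Hypothetical values: 30,000 +/- 400 workers now, 25,000 +/- 300 before
low, high = change_range(30000, 400, 25000, 300)
print(low, high)
```

If the resulting range includes zero, the change is not statistically significant at the 90% level.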
Another Flagstaff point
Part 1, Profile 1 (Resident data)
Part 2, Profile 1 (Workplace data)
Between the two
reference periods,
what has the
number of people
who took transit to
work in Flagstaff
done?
A. Gone Up?
B. Gone Down?
C. No significant
Change
Which Table would
you use and why?
Two types of Collapsing
Collapsed table
C08301. MEANS OF TRANSPORTATION TO WORK Universe: WORKERS 16 YEARS AND OVER
Data Set: 2007-2009 American Community Survey 3-Year Estimates
Full table not
available
Sometimes
neither table
exists
And MOEs are
greater than
the estimate
Population = 26,566
“B” and “C” Tables
B08006 C08006 Means of Transportation
“B” and “C” Tables
Full and collapsed table
What do you
notice about
the Table?
Some things to be aware of
What year is the
data?
Period Estimate
Reliability/Currency
What data is more reliable?
Which is more current?
Dollar Values and Income tables
ACS asks-- What was your
income during the last 12
months?
Single Year Estimates
12 different periods
Each adjusted to single
period (Jan to Dec)
Multiyear Estimates
Each year adjusted to
current year
About Trend Analysis
Trend analysis
(overlapping syndrome)
If you are doing trend analysis with multi-year
estimates, you cannot compare successive period
estimates because their middle years overlap.
Also, you cannot compare a 3-year estimate with a
5-year estimate
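The two rules above are easy to check mechanically. The sketch below is illustrative only: it represents each period estimate as a hypothetical (start_year, end_year) pair and flags a comparison as acceptable only when the periods have the same length and share no years.

```python
def periods_overlap(period_a, period_b):
    """True if two (start_year, end_year) periods share at least one year."""
    start_a, end_a = period_a
    start_b, end_b = period_b
    return start_a <= end_b and start_b <= end_a


def comparable(period_a, period_b):
    """True only for same-length, non-overlapping period estimates."""
    length_a = period_a[1] - period_a[0] + 1
    length_b = period_b[1] - period_b[0] + 1
    return length_a == length_b and not periods_overlap(period_a, period_b)


print(comparable((2005, 2007), (2006, 2008)))  # successive 3-year periods
print(comparable((2005, 2007), (2008, 2010)))  # non-overlapping 3-year periods
print(comparable((2005, 2009), (2010, 2012)))  # 5-year versus 3-year
```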
Change in Weighting
In 2009, the weighting
changed to
using subcounty totals,
as opposed
to just
county totals
Change in Weighting
Detroit Example
“Detroit is the poster child for odd looking data”
Change in Weighting (Analysis)
In 2009, the weighting
changed to
using subcounty totals, as
opposed to just
county totals
Light Rail Conundrum
Impact of New “Light
Rail” systems might
not be showing up
Source: 2000 CTPP and 2007ACS3, CTPP Data Profile 1
One more thing on Pop Estimates
The older estimates get revised every year but the ACS
does not get reweighted
Maricopa County Population Estimates
Now let’s focus on the CTPP data
But First a word on Disclosure - 3 year tables
DRB Said… “Too many variables” crossed with
Means of Transportation (Mode)
…makes for micro data record
…and with a micro data record you
could identify an individual
The Battle Ensued
We Said…
No, You can’t identify an individual
-- Hired a statistical consultant < 0.01%
-- Had a hearing with DRB Bosses
-- Made every argument possible
Census Said…
Tough Luck
--Compress your Modes and
improve your chances of passing
our rules
-- Chop your cross tabs to 5
variables
What we ended up with – for 3 year Tables
Five (5) Variables crossed with
Means of Transportation to work (MOT)
…and
A boat load of collapsing of the Modes
…and
Disclosure Rules
Rule 7 was the killer
7. For Worker Flows
Must have 3 unweighted records for each
O-D pair
Does not apply to Total Workers or
Workers by Mode to Work (all 18 modes)
(means of transportation)
For the 5-year CTPP
So What Did We Do?
NCHRP Web Report 180 ($550K)
Producing Transportation Data Products from
the ACS that Comply With Disclosure Rules
5-year CTPP will have two types of tables
Tables that passed Census Rules
Tables with Perturbation done to them
Privacy Protection
http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_w180.pdf
Table Summary using 5-year Table list
TAZ/BG Tract TAD Place County PUMA State
Part      Regular   Perturbed
Part 1    111       77
Part 2    50        65
Part 3    2         38
Tables Using Perturbed Data Set
• Means of transportation
• Aggregate Vehicles Used
• Aggregate Travel Time
• Mean HH Income
• Aggregate HH Income
• Aggregate Carpools
• Almost all Part 3 Tables
Still left with some Disclosure Rules
For All tables Regular (A) + Perturbed (B)
1. All Tables Rounded
0 = 0, 1-7 = 4, 8 or more = nearest multiple of 5
2. Any number that ends in 5 or 0 stays as is
3. Aggregate dollar values rounded to nearest 100
4. Aggregate minutes to work and aggregate
vehicles use standard rounding
5. Totals Rounded independently of cells
6. Medians or quintiles not subject to rounding
7. Percentages and rates calculated after rounding
8. Medians and aggregates must be based on 3 or
more values
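Rules 1 and 2 are worth seeing in code, because they explain why small CTPP cells so often read 4. The sketch below is one reading of those two rules, not the Census Bureau's implementation: the function name is hypothetical, and it assumes rule 2 takes precedence, so a count of exactly 5 stays 5 rather than becoming 4.

```python
def ctpp_round(count):
    """Round a non-negative integer count per rounding rules 1 and 2."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count % 5 == 0:
        # Rule 2: numbers ending in 5 or 0 stay as is (this also covers 0).
        # Assumption: this applies to a count of exactly 5 as well.
        return count
    if count <= 7:
        # Rule 1: counts from 1 to 7 become 4
        return 4
    # Rule 1: 8 or more goes to the nearest multiple of 5.
    # No ties are possible here because multiples of 5 were handled above.
    return (count + 2) // 5 * 5


for value in (0, 3, 7, 8, 12, 13, 864, 982):
    print(value, ctpp_round(value))
```

Because totals are rounded independently of cells (rule 5), rounded cells will not necessarily add up to the rounded total.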
Still left with some Disclosure Rules
For Regular (A) Tables Only
1. Cell Suppression: For Tables 101106 (unweighted
sample count of the population), 101107 (percent of
population in sample), 110101 (total housing units
sampled), and 110103 (percent of housing units
sampled), there must be 0 or at least 3
occupied housing units in sample to show the table
2. Table Suppression: Aggregates and Means must
have at least 3 unweighted cases to be shown. The
policy of the ACS program is that if any one cell in a
table is suppressed, the whole table is suppressed
Some early issues with the 5-year ACS?
Standard Data Products
Some Very Large MOEs
Block Group data only in
download area
Reliability of tract estimates is
much lower than in the 2000 long form (LF)
NO Workplace Tables!
The Census Bureau says: BG data should ONLY be used to build up
larger geographic areas because the Margins of Error (MOEs) are
too large otherwise
(JSM Conference August 2010)
Ken Hodges, Nielsen (Claritas)
ACS 5-Year Data: A First Look at the First Release (4.5 MB, ppt)
http://www.copafs.org/UserFiles/file/HodgesMarch2011.pptx
Let’s talk about Block Group Data for a moment
Source: Tract Data-Missouri State Data Center, Block Group Data-AFF
AFF all 21 Modes, MSDC all 21 but also collapsed with Total
Commuters Added
MSDC put a value to the MOEs.
First: Let’s consider MOEs
What do you notice?
Don’t forget if this was CTPP data it would be Rounded too
Now let’s fill in the table
CB does not give you Total Commuters but you
like that. Can we talk about that for a moment?
Now let’s fill in the table
How would we get
Total Commuters and
more importantly the
MOEs?
For the Estimate totals, just add the relevant
estimates. But for MOEs you have some decisions
to make
Now let’s fill in the table
Two different MOE approaches
available
488
1. Calculate the 90% margin of
error of the sum of more than
two estimates
2. Calculate the 90% margin of
error of the sum or difference
between two estimated values
(What two values would you use?)
1. Gives you an MOE of either 245 when including the
MOE for ‘Other Means’ or 214 without it
2. Gives you an MOE of 209
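Both approaches rest on the same approximation: the MOE of a sum (or difference) is the square root of the sum of the squared component MOEs. The sketch below shows why the result depends on which components you include. The function name and the mode-level numbers are hypothetical and are not the block group values from the slide.

```python
import math


def moe_of_sum(moes):
    """Approximate 90% MOE for a sum of estimates, given each estimate's MOE."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Hypothetical mode estimates and MOEs for one block group
estimates = [300, 120, 60]
moes = [120, 90, 80]

total = sum(estimates)
print(total, moe_of_sum(moes))   # all three components
print(moe_of_sum(moes[:2]))      # dropping the last component's MOE
```

Every component with a large MOE pushes the MOE of the total up, which is why including or excluding a category such as ‘Other Means’ changes the answer.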
What data should I use?
Travel Times
for the 6 counties in
NE Illinois
1. To compare with 1970, ‘80, ‘90 and 2000 Travel Times?
2. To compare with my town of 52K people?
3. To validate my 2008 vintage travel demand model?
Learn how to do the Coefficient of Variation Test
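For reference, the coefficient of variation (CV) expresses the standard error as a percentage of the estimate, so a lower CV means a more reliable estimate. The sketch below assumes a 90% MOE as published by the ACS; the function name and inputs are hypothetical, and any cutoff for what counts as "reliable enough" is a judgment call that the slides do not specify.

```python
def coefficient_of_variation(estimate, moe, z=1.645):
    """Return the CV in percent: (SE / estimate) * 100, with SE = MOE / z."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    standard_error = moe / z
    return standard_error / estimate * 100


# Hypothetical values: an estimate of 1,000 with a 90% MOE of 164.5
print(round(coefficient_of_variation(1000, 164.5), 1))
```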
The Upside - Data Evolution
Once you know all the
data issues it is
possible to use it
intelligently
It’s ignorance that kills
you
Slides available at: edthefed.com
http://www.edthefed.com/presentations/ACS_Data_Issues.ppt