How to Lie with Statistics
Chad Orzel
Physics and Astronomy
10/5/04
What’s This All About?
Statistics are commonly used to deceive
Technically true, but deceptive
Preys on fear of numbers
“Math is hard!” --Barbie
False impression of accuracy
“Figures never lie, but liars figure.”
Need to know how to lie with statistics,
to keep from being lied to with statistics.
“There are three kinds of lies:
Lies, Damned Lies, and Statistics.”
--attributed to Benjamin Disraeli
Ways to Lie to Voters
0) Fabrication
Just make things up…
Can be very effective:
Lyndon Johnson:
“Make the son of a bitch deny it.”
Swift Boat Veterans for “Truth”
Not what we’re talking about today
Talking about ways to say things that are true, but misleading…
Example:
A typical person in this class:
1) Is Male
2) Plans to Vote for Kerry
3) Has two siblings
4) Is 26 years old
5) Made $18,000 last year
All true statements, based on survey results!
Ways to Lie to Voters
1) Omission → Leave Things Out
Previous slide: What does “typical” mean?
Specify what kind of average you’re using:
Mean: Add ‘em up, divide by total number
Median: value in middle (half higher, half lower)
Not the same
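A minimal sketch of the two definitions above, using Python's standard `statistics` module. The income list is hypothetical (it is not the class survey data); it is chosen only to show how one extreme value separates the two averages.

```python
from statistics import mean, median

# Hypothetical yearly incomes in thousands of dollars (NOT survey data):
# seven modest earners and one very large outlier.
incomes = [12, 15, 18, 18, 20, 22, 25, 1000]

mean_income = mean(incomes)      # add 'em up, divide by total number
median_income = median(incomes)  # value in the middle of the sorted list

print(f"mean   = {mean_income}")    # pulled far up by the single outlier
print(f"median = {median_income}")  # stays near what most people earn
```

Both numbers are a legitimate "average" of the same list, which is why the kind of average has to be stated.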
Mean and Median
Nearly identical for random variables
“Normal Distribution”
“Bell Curve”
Very different for skewed data:
Mean affected by extreme values
Diverse populations
Median less sensitive to extremes
→ Usually better for economic data
[Figure: “Physics Data” histogram, # Measurements vs. Height (186 to 194); Mean: 190.1, Median: 190]
Example 1: Siblings
Most people have 0,1,2
Limited range
→ Can’t have < 0 siblings
Few people with huge families
→ Pull mean up
[Figure: “Sibling Distribution” histogram, # Respondents vs. Number of Siblings (0 to 10), with Median and Mean marked]
Example 2: Age
Diverse Population Problem
Students, mostly 19-22
(Much) older faculty
Nobody at mean age
Very bad description
[Figure: “Age Distribution” histogram, # of Respondents vs. Age (20 to 60), with Median and Mean marked]
Example 3: Income
Sort of silly, really…
Usually where this lie comes up:
“The average family will save $2,000 under my tax plan…”
What kind of average?
Remember:
The mean includes Bill Gates…
[Figure: “Income Distribution” histogram, Number of Respondents vs. Income ($1,000's) (0 to 100), with Median and Mean marked]
Campaign Examples
Bush Tax Cut
“111 million taxpayers will save, on average, $1,586 off their taxes.”
Facts:
1) 25% receive NO cut
(drops mean to $1,217)
2) Median cut: $470
Half of all taxpayers get $470 or less
(http://www.factcheck.org/article.aspx?docID=145)
Kerry’s $9,000
“We're told that jobs that pay $9,000 less than the jobs that have been lost is the best that we can do.”
Fact:
Based on comparison of broad categories
Lost: Manufacturing jobs
Gained: “Service” jobs
→ Includes burger flippers
(http://www.factcheck.org/article.aspx?docID=228)
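A rough consistency check on the tax cut figures quoted above. This sketch assumes the $1,586 mean is taken only over filers who get a cut and the $1,217 mean is taken over everyone; that reading is an assumption, not something the slide states. Under it, the ratio of the two means gives the share of filers who get any cut at all.

```python
# Figures quoted on the slide.
mean_cut_recipients_only = 1586  # dollars, the campaign's "on average"
mean_cut_everyone = 1217         # dollars, once filers with no cut are counted
median_cut = 470                 # dollars

# Adding people with a $0 cut lowers the mean in proportion to their share,
# so (under the assumption above) the ratio of the means is the share of
# filers who receive a cut.
share_with_cut = mean_cut_everyone / mean_cut_recipients_only
share_without_cut = 1 - share_with_cut

# How many times larger the advertised mean is than the median cut.
mean_to_median = mean_cut_recipients_only / median_cut

print(f"implied share with no cut: {share_without_cut:.0%}")
print(f"advertised mean is {mean_to_median:.1f}x the median cut")
```

The implied share with no cut comes out a little under a quarter, in line with the rounded 25% above.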
Ways to Lie to Voters
1) Omission (Continued)
The Fifth Dentist Problem
“Four out of five dentists surveyed…”
How many dentists total?
5 total: not a good sample
Leave out the sample size, and you can
prove just about anything…
“Four out of five cards
drawn from this deck were black!”
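To see why a sample of five proves very little, here is a short sketch of how often an ordinary deck produces the "four out of five black" result by chance. It assumes a standard, well-shuffled 52-card deck (26 black, 26 red) and five cards drawn without replacement; the function name is just for illustration.

```python
from math import comb

def prob_black(k, draws=5, black=26, deck=52):
    """Probability of exactly k black cards when drawing `draws` cards
    without replacement from a deck with `black` black cards."""
    return comb(black, k) * comb(deck - black, draws - k) / comb(deck, draws)

p_exactly_four = prob_black(4)
p_four_or_more = prob_black(4) + prob_black(5)

print(f"exactly 4 of 5 black:  {p_exactly_four:.3f}")
print(f"at least 4 of 5 black: {p_four_or_more:.3f}")
```

A perfectly fair deck gives the headline result roughly one time in seven, so the claim says almost nothing about the deck.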
Campaign Example
“And that's what people are seeing now is happening in Afghanistan. Ten million citizens
have registered to vote. It's a phenomenal statistic. That if given a chance to be free they
will show up at the polls. Forty-one percent of those 10 million are women.”
--G.W. Bush, 1st Presidential Debate
• Ratio of men registered to women registered: 58.6 to 41.4 percent
• Estimated eligible voting population in Afghanistan: 9.8 million
• Registered voters in Afghanistan, as of August 21: 10.3 million
• Reported number of registration cards a single Afghan has been able to obtain: from 2 to 40
• Percent of the estimated eligible male population that is now registered to vote: 120 percent
• Number of provinces that are over-registered: 13 (out of 30)
• Number of provinces in which registered voters exceed the population by 40% or more: 4
(http://www.tcf.org/afghanistanwatch/main.htm#voterregistrationfraud)
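A back-of-the-envelope check of the registration figures listed above. All inputs are the numbers on the slide; the implied eligible male population is derived from them here and is not a figure from the source.

```python
# Figures from the slide.
registered_total = 10.3e6       # registered voters as of August 21
eligible_total = 9.8e6          # estimated eligible voting population
male_share_registered = 0.586   # share of registrations that are men
male_registration_rate = 1.20   # registered men / eligible men

# More registrations than eligible voters overall.
overall_rate = registered_total / eligible_total

# Derived (not from the source): the eligible male population that the
# 120 percent figure implies.
registered_men = male_share_registered * registered_total
implied_eligible_men = registered_men / male_registration_rate

print(f"overall registration rate: {overall_rate:.0%}")
print(f"registered men:            {registered_men / 1e6:.2f} million")
print(f"implied eligible men:      {implied_eligible_men / 1e6:.2f} million")
```

A registration rate above 100 percent is the missing context: the "10 million" counts registration cards, not people.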
Ways to Lie to Voters
2) Exaggeration → Make Something of Nothing
Fear of big numbers:
“My opponent wants to spend $2 million on
[something]…”
Sounds bad…
$2 million = 1/1,000,000th of the budget
= chump change
Need to put big numbers in context
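One way to put the number in context is to scale it down to a household budget. The sketch below uses the one-in-a-million ratio from the slide; the $50,000 household income is a hypothetical figure picked for illustration.

```python
spending = 2_000_000        # dollars, the "scary" number
budget_multiple = 1_000_000 # slide: spending is 1/1,000,000th of the budget

# Budget size implied by the slide's ratio.
implied_budget = spending * budget_multiple

# Hypothetical household income, used only for scale.
household_income = 50_000
household_equivalent = household_income / budget_multiple

print(f"implied budget: ${implied_budget:,}")
print(f"same share of a ${household_income:,} income: ${household_equivalent:.2f}")
```

On that scale, the alarming expenditure is the household equivalent of a nickel.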
Example: Guys Rule!
More Survey Data…
Nothing false in graph
Creates false impression
Scale axes to blow up small differences
[Figure: “Gender Distribution” bar chart, % of Respondents for Male and Female, vertical axis truncated to 44 to 56]
Example: Guys Rule!
Honest presentation:
Full scale shown
Bars same width, color
Slightly more male students
Not that big a difference
[Figure: “Gender Distribution” bar chart, % of Respondents for Male and Female, vertical axis 0 to 100]
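The effect of the axis choice in the two charts can be put in numbers: the eye compares the drawn bar heights, which are measured from wherever the axis starts. The percentages below are hypothetical stand-ins (the survey values are not given in the text), and the function name is illustrative.

```python
def drawn_height_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn heights of two bars with values a and b
    when the vertical axis starts at axis_min instead of zero."""
    return (a - axis_min) / (b - axis_min)

# Hypothetical percentages, NOT the actual survey results.
male, female = 54.0, 46.0

truncated = drawn_height_ratio(male, female, axis_min=44.0)
full_scale = drawn_height_ratio(male, female, axis_min=0.0)

print(f"axis starting at 44: male bar looks {truncated:.1f}x as tall")
print(f"axis starting at 0:  male bar looks {full_scale:.2f}x as tall")
```

The data are identical in both cases; only the baseline changes the impression.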
Example
(http://www.pollkatz.homestead.com/)
Campaign Example
“According to the first post-debate poll, from Newsweek, John Kerry leads President Bush
by a margin of 49% to 46%. Put Nader in the mix and Kerry's margin drops from 3 to 2.”
--Josh Marshall, Talking Points Memo (weblog)
“In the first national telephone poll using a fresh sample, NEWSWEEK found the race now
statistically tied among all registered voters, 47 percent of whom say they would vote for
Kerry and 45 percent for George W. Bush in a three-way race.” --MSNBC
(1,013 voters surveyed, Margin of Error +/- 4%)
What does margin of error really mean?
(http://www.washingtonmonthly.com/archives/individual/2004_08/004536.php)
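As a rough answer to the question above, here is a sketch of the textbook margin of error for a poll of this size. It assumes simple random sampling and a 95% confidence level (z = 1.96), which may not match the pollster's method; the reported +/- 4% is larger than this idealized figure, and the text does not say why.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one candidate's share,
    assuming a simple random sample of n respondents."""
    return z * sqrt(p * (1 - p) / n)

n = 1013                   # voters surveyed
moe = margin_of_error(n)   # idealized value, as a fraction

kerry, bush = 0.47, 0.45   # three-way race figures from the MSNBC quote
lead = kerry - bush
reported_moe = 0.04        # margin of error as reported

# A lead no bigger than the margin of error cannot be told apart from a tie.
within_margin = lead <= reported_moe

print(f"idealized margin of error: +/- {moe:.1%}")
print(f"lead of {lead:.0%} within reported margin: {within_margin}")
```

The margin applies to each candidate's share separately, so the uncertainty on the gap between two candidates is larger still. That is why one outlet can honestly report a "statistical tie" while another reports a lead from the same poll.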
Other Ways to Lie
3) Misdirection → True, but Irrelevant
Quote impressive statistics about side issues
Creates false impression of real support
4) False Correlation → Post Hoc Fallacy
Homicide rates peak in summer
Ice cream sales peak in summer
Therefore, ice cream leads to murder?
Correlation is not Causation
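A small simulation of how the ice cream example can arise. Everything here is synthetic: temperature is modeled as a yearly cycle peaking in summer, and both series are made to depend on temperature plus random noise, with no link between them. The coefficients and noise levels are arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Ten years of synthetic monthly temperatures, peaking in July (month 6).
temps = [50 + 25 * math.sin(2 * math.pi * (m - 3) / 12) for m in range(120)]

# Both series depend ONLY on temperature plus independent noise.
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
homicides = [0.5 * t + rng.gauss(0, 2) for t in temps]

r = pearson(ice_cream, homicides)
print(f"correlation between ice cream and homicides: {r:.2f}")
```

The two series are strongly correlated even though neither influences the other; the shared cause (summer heat) does all the work.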
What to Do?
Questions to ask about any statistic:
1) Who created it?
Do they have an agenda?
2) Why was it created?
Research or politics?
3) How was it created?
Methodology
What to Do? (continued)
Questions to ask about any statistic:
4) What’s missing?
Is there hidden context?
5) Is it relevant?
Avoid misdirection
6) Does it make sense?
If it sounds ridiculous, it probably is…