FOOLING BY STATISTICS - Middle East Technical University


FOOLING BY STATISTICS
5 Ways to Avoid Being Fooled By Statistics
by Jiafeng Li on August 8, 2013 in Market Research
http://www.iacquire.com/blog/5-ways-to-avoid-being-fooled-by-statistics
and
http://www.webmechanix.com/data-misrepresentation-issues-marketing-agencies
If the validity of our data is compromised, the decisions we base on it will be affected accordingly.
• Statistics can go wrong in many ways, because they are derived from data. Turning raw data into statistics involves several stages:
  - data collection
  - data entry
  - data analysis
  - data reporting
  - data visualization
• Each stage is an opportunity for error or malpractice. For example, data collection may be biased; errors may creep in during data entry; the analysis may be flawed or misrepresented; the results may be misinterpreted during reporting; and the visualization may be misleading.
How Data Misrepresentation Can Cost You Thousands (Or More!)
Graphical Misrepresentation of Data
• One of the easiest ways to make sense of large data sets is with a visual aid, such as a graph or chart. While helpful for reporting, visual representations of data can be very misleading if used improperly.
• The example below uses the number of “leads” generated over 10 weeks. NOTE: Assume week 8 is simply an anomaly. No extra marketing effort was made; it was just one great, random week.
Leads Generated
Week:   1   2   3   4   5   6   7   8   9   10
Leads:  20  20  30  10  10  10  10  80  10  10
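The original chart of these figures is not reproduced here. As a hedged illustration of why the week 8 anomaly matters, the sketch below compares the mean and median of the ten weekly counts with and without week 8, plus the week-over-week jump a headline might quote. Using the median as a robust summary is our choice for illustration, not something the original slide states; the helper name `summarize` is hypothetical.

```python
from statistics import mean, median

# Weekly lead counts from the table above (weeks 1-10).
leads = [20, 20, 30, 10, 10, 10, 10, 80, 10, 10]

def summarize(values):
    """Return (mean, median) of a list of counts."""
    return mean(values), median(values)

all_weeks = summarize(leads)
# Drop week 8 (index 7), the one-off anomaly described above.
without_week8 = summarize(leads[:7] + leads[8:])

# Week-over-week change into week 8, as a chart or headline might show it.
jump_pct = (leads[7] - leads[6]) / leads[6] * 100

print(f"mean/median, all weeks:     {all_weeks[0]:.1f} / {all_weeks[1]:.1f}")
print(f"mean/median, without wk 8:  {without_week8[0]:.1f} / {without_week8[1]:.1f}")
print(f"week 7 -> week 8 change:    +{jump_pct:.0f}%")
```

One random week lifts the mean from about 14.4 to 21 and produces a "+700%" spike, while the median stays at 10 leads per week either way.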
A screenshot from Fox News in 2009
What?! The percentages in the pie chart add up to 167%? Aren’t a pie chart’s slices supposed to add up to 100%? If you see a chart like this, don’t try to guess what it means; just discard it!
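This check is easy to automate. The sketch below assumes the chart's slice values are available as percentages; the function name `pie_is_consistent`, the tolerance, and both example lists are hypothetical (the actual figures from the Fox News chart are not given here), chosen only so that one list totals 167%.

```python
import math

def pie_is_consistent(percentages, tolerance=0.5):
    """Return True if pie-chart slices add up to 100% (within a rounding tolerance).

    A pie chart divides ONE whole, so its slices must sum to 100%.
    `tolerance` allows for values that were rounded before being displayed.
    """
    return abs(math.fsum(percentages) - 100.0) <= tolerance

# Hypothetical slice values (not the actual Fox News figures) totalling 167%.
suspicious = [60, 57, 50]
# Hypothetical rounded values that legitimately total 99.9%.
plausible = [33.3, 33.3, 33.3]

for name, slices in [("suspicious", suspicious), ("plausible", plausible)]:
    total = math.fsum(slices)
    print(f"{name}: total = {total:.1f}%, consistent = {pie_is_consistent(slices)}")
```

If a survey let respondents pick more than one answer, totals above 100% can be legitimate, but then a pie chart is the wrong chart type for the data.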
• Did you catch it? There are no labels on the vertical axis, so we have no clue where it starts. Yet how scary it looks: the line zooms up from some unknown point at the bottom to 9.9%. Oh my god! Prices are going up and times are bad.
• This is a new trend. The presenter wants to show that the sales of Brand X have doubled, so the height of the second image is double that of the first. So what’s wrong?
The flaw is that when the height is doubled, the width is doubled too. Even though the label says 40 million, only double the first figure, the second image covers 4 times the area of the first. Hence, to the eye and the mind, the growth 'looks' much larger than it actually is.
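The arithmetic behind this distortion is simple: scaling both the height and the width of an image by a factor k multiplies its area by k². A minimal sketch, assuming the pictogram behaves like a plain rectangle (real icons may not), shows the distorted area ratio and the side scaling that would keep the area proportional to the data. Both function names are hypothetical.

```python
import math

def area_ratio_when_both_sides_scaled(value_ratio):
    """Area ratio produced when height AND width are both scaled by value_ratio."""
    return value_ratio ** 2

def honest_side_scale(value_ratio):
    """Per-side scale factor that makes the image's AREA grow by value_ratio."""
    return math.sqrt(value_ratio)

sales_ratio = 2  # sales doubled, as in the Brand X example
print(area_ratio_when_both_sides_scaled(sales_ratio))  # the area grows 4x, not 2x
side = honest_side_scale(sales_ratio)
print(f"scale each side by {side:.3f} to show a {side ** 2:.0f}x area")
```

A simpler fix is to keep the width fixed and scale only the height, as an ordinary bar chart does; the area then grows in proportion to the data.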
Advertisements like to manipulate consumers’ minds with statistics
• Look at the AT&T advertisement below. Really, don’t believe it unless you are given a detailed report behind it. You don’t know how, where, or from whom the data were collected, and each of these factors can produce very different results. Statistics like these are not convincing either.
• The last figure of 9.01% looks like a big jump from 8.06%. Inflation has shot up! Wait, where does the vertical (y) axis start? At 7. Should it not start at 0? That is what we were taught in school.
Here lies the trick: to make the jump look significant, start the axis at 7. If the axis started at zero, the jump would not look high, would not be a 'saleable story', and would never make the front page.
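The effect of starting the axis at 7 can be quantified. The sketch below, assuming the two figures are drawn as bars rising from the axis baseline, compares how much taller the 9.01% bar looks than the 8.06% bar when the axis starts at 0 versus at 7. The function name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(old, new, baseline):
    """How many times taller the `new` bar looks than the `old` bar
    when both bars are drawn up from `baseline`."""
    return (new - baseline) / (old - baseline)

old, new = 8.06, 9.01  # inflation figures from the chart, in percent

true_change = (new - old) / old * 100
print(f"actual relative increase: {true_change:.1f}%")
for baseline in (0, 7):
    ratio = apparent_ratio(old, new, baseline)
    print(f"axis from {baseline}: new bar looks {ratio:.2f}x the old one")
```

With the axis at 0 the new bar is about 1.12 times as tall as the old one, matching the roughly 12% increase; with the axis at 7 it is about 1.90 times as tall, so a modest rise is drawn as a near doubling.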
A statistic without a source is useless. If a source is provided, always check how authoritative it is.
• Credible statistics look like these:
Sampling Bias
• If samples are not representative, the statistics will be biased. It is also good practice to check the sample size: if it is too small, the results can swing widely by chance and are unreliable.
• Sampling bias can creep in during data collection, for example through unrepresentative demographics or unrepresentative geographic locations. Results affected by sampling bias have little or no value, because they can be quite different from what the real world is like.
• Remember the 1936 presidential election between Roosevelt and Landon? The Literary Digest, one of the most respected magazines of the time, predicted that Landon would win by a large margin, but in reality Roosevelt won in a landslide. The cause was sampling bias. The Literary Digest polled over 10 million people and received 2.4 million responses. Those who responded were mostly upper-class people, who were more likely to vote for the Republican candidate, Landon.
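The Literary Digest episode shows that a huge sample cannot fix a biased one. The simulation below is a sketch with made-up numbers, not the actual 1936 data: a hypothetical population in which an easy-to-reach minority supports "candidate A" less than everyone else does. It contrasts a small random poll (unbiased but noisy), a large random poll, and a large poll drawn only from the easy-to-reach group. The group names, shares, and helper functions are all hypothetical; the margin of error uses the standard normal approximation for a simple random sample.

```python
import math
import random

# Hypothetical population (made-up numbers, NOT the real 1936 electorate):
# name -> (share of the population, probability of supporting candidate A)
GROUPS = {
    "easy_to_reach": (0.2, 0.40),  # e.g. people on the pollster's mailing lists
    "hard_to_reach": (0.8, 0.65),  # everyone else
}

def true_support():
    """Population-wide support for candidate A (a weighted average)."""
    return sum(share * p for share, p in GROUPS.values())

def poll(n, reachable, rng):
    """Ask n random respondents drawn only from the `reachable` groups
    (in proportion to their population shares); return the share saying yes."""
    weights = [GROUPS[g][0] for g in reachable]
    yes = 0
    for _ in range(n):
        group = rng.choices(reachable, weights=weights)[0]
        yes += rng.random() < GROUPS[group][1]
    return yes / n

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a simple random sample.
    It measures only random noise; it says nothing about bias."""
    return z * math.sqrt(p * (1 - p) / n)

rng = random.Random(1936)  # fixed seed so the run is reproducible
everyone = list(GROUPS)
print(f"true support:           {true_support():.3f}")
for n in (50, 10_000):
    est = poll(n, everyone, rng)
    print(f"random poll, n={n:>6}: {est:.3f} +/- {margin_of_error(est, n):.3f}")
est = poll(10_000, ["easy_to_reach"], rng)
print(f"biased poll, n= 10000: {est:.3f} +/- {margin_of_error(est, 10_000):.3f}")
```

The small random poll lands near the truth but with a wide margin; the large biased poll reports a tiny margin of error around the wrong answer (about 0.40 instead of 0.60). A bigger sample shrinks random noise, but it cannot remove bias.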
Statistics That Are Skewed Purposely
• Even when the data are correct, statistics can be misinterpreted: wrong conclusions get drawn from accurate analysis results. Other statistics are deliberately skewed or visually exaggerated to serve the author’s purposes. This part addresses the problems that arise at the “data reporting” and “data visualization” stages.
GAS PRICES
• Fox Chart Showed Gas Prices Were Consistently Rising. On February 20, Fox News displayed a graphic that used just three arbitrarily chosen data points: the national average gas price on the day the graphic aired, and the prices from the previous week and the previous year. From Fox News' America's Newsroom:
• In Reality, Fox Cherry-Picked Data to Hide the Fact That Fluctuating Gas Prices Had Fallen From High Points. An accurate representation of gas prices over the 12-month period starting in February 2011 shows that gas prices in February 2012, the highest point on Fox's graphic, were actually down from their high in April-May 2011. From AAA:
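Cherry-picking a few points can turn a series that peaked and fell into one that seems to rise steadily. The sketch below uses an invented monthly price series (illustrative only, NOT the AAA data) and picks points the way the graphic did; with monthly data, "the previous month" stands in for "the previous week". The helper `is_increasing` is hypothetical.

```python
# Hypothetical monthly average prices in $/gallon, oldest first
# (illustrative only, NOT AAA data): a spring peak, a decline, then a rebound.
prices = [3.19, 3.55, 3.85, 3.96, 3.78, 3.60, 3.62, 3.58, 3.45, 3.38, 3.28, 3.38, 3.55]

def is_increasing(values):
    """True if every value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Mimic the graphic: a year ago, the previous period, and today.
picked = [prices[0], prices[-2], prices[-1]]

print("cherry-picked points:", picked, "rising?", is_increasing(picked))
print("full series rising?", is_increasing(prices))
print("today vs. 12-month high:", prices[-1], "vs", max(prices))
```

The three picked points rise monotonically, while the full series shows that today's price is below the earlier high. Plotting the whole series, or at least reporting its high point, exposes what the three points hide.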
Misinterpretation and Logical Fallacies
• Here is a conversation I overheard between a couple:
• Boyfriend: You’re cool when you’re drunk.
Girlfriend: So I am not cool when I am not drunk?!
Boyfriend: WTF??
• This is a typical logical fallacy: treating one statement as if it ruled out every alternative, when the alternatives are not collectively exhaustive. Two propositions are collectively exhaustive when one of them must hold and no other possibility exists. “Cool when drunk” says nothing about being sober, so “not cool when not drunk” is not the only remaining option: “cool when not drunk” is also possible. The “girlfriend” simply eliminated the “cool when not drunk” possibility. (In formal logic, inferring “if not P, then not Q” from “if P, then Q” is known as denying the antecedent.)
• People make the same kind of fallacy when interpreting data. Given that 37% of New York City residents have been to Central Park exactly once, the conclusion “this indicates 63% of NYC residents have never been to Central Park” is incorrect. Zero visits and one visit are not collectively exhaustive: people may have visited two, three, or more times. So the remaining 63% includes not only those who have never been to Central Park, but also those who have been there multiple times. Whenever you see an interpretation like this, watch out for this logical fallacy.
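Both examples can be checked mechanically. The sketch below first enumerates the truth table for the rule "if drunk, then cool" to show that it permits both outcomes when not drunk, then tallies a hypothetical survey of Central Park visits (made-up counts chosen so that exactly 37% report one visit) to show what the other 63% actually contains. All variable and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Part 1: "cool when drunk" read as the rule drunk -> cool.
# Keep every (drunk, cool) situation that does not violate the rule.
allowed = [(drunk, cool) for drunk, cool in product([True, False], repeat=2)
           if not drunk or cool]
# When not drunk, the rule permits BOTH cool and not cool,
# so "not cool when not drunk" does not follow from it.
sober_cases = sorted({cool for drunk, cool in allowed if not drunk})
print("when not drunk, cool can be:", sober_cases)

# Part 2: hypothetical survey of 100 residents (number of Central Park visits each).
visits = [0] * 25 + [1] * 37 + [2] * 18 + [3] * 12 + [5] * 8
counts = Counter(visits)

def share(predicate):
    """Percentage of respondents whose visit count satisfies `predicate`."""
    return 100 * sum(c for v, c in counts.items() if predicate(v)) / len(visits)

exactly_once = share(lambda v: v == 1)
never = share(lambda v: v == 0)
more_than_once = share(lambda v: v >= 2)
print(f"exactly once: {exactly_once:.0f}%, never: {never:.0f}%, "
      f"more than once: {more_than_once:.0f}%")
```

In this made-up survey, "not exactly once" is 63%, but only 25% have never been; the other 38% have been more than once.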
In this video, Oxford mathematician Peter Donnelly reveals the common mistakes humans make in interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.
Let’s watch the video:
https://www.youtube.com/watch?v=kLmzxmRcUTo