What on earth does it mean?


What on earth is a p value, a process sigma, Cronbach’s alpha, the
Black-Scholes formula, a priority in AHP, or the Sunday Times score for
Portsmouth University? On the interpretability of measurements based on
mathematical models.
Michael Wood
June 2011
http://userweb.port.ac.uk/~woodm/presentations.htm
Management makes use of many measurements based on mathematical models,
but these are often difficult to interpret sensibly. This talk will look
at some examples of such measurements, and at the consequences of the
problems of their interpretation – including the employment of
unnecessary academics to teach what should be obvious, and support for
the bad decisions which led to the recent financial crash. I will then
discuss how these, and other, measurements could be redesigned to make
them more useful and user-friendly.
I’ll look at four examples:
1. Six sigma and the process sigma
measurement
2. Null hypothesis significance tests and p
values
3. University league tables
4. Risk measurements and the normal
(Gaussian) distribution
Four examples … with some imaginary dialogues between the expert and a naive user ...
Process sigma – the measurement
linked to the Six Sigma philosophy
• The process sigma for this process is 4.833
• What on earth does this mean?
• It means there are 430 dpmo (defects per million opportunities). Use
the iSixSigma (2011) Sigma calculator
• So why not just say 430 dpmo? Keep it simple!
• But this would be dumbing down. Life is difficult and we
mustn’t join the modern trend of trying to make it easier.
• Why not? The complicated version adds nothing except
confusing the uninitiated. (Similar comments apply to
Cpk.)
• ... which must be a good thing!
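For the curious, the conversion behind the calculator is just the upper tail of the normal distribution, shifted by the conventional 1.5 sigma to allow for long-term drift. A minimal Python sketch (the 1.5 shift is the usual Six Sigma convention; scipy is my choice for the illustration):

    from scipy.stats import norm

    def dpmo_from_process_sigma(process_sigma, shift=1.5):
        # dpmo = normal tail probability beyond (process sigma - 1.5),
        # scaled to a million opportunities
        return 1e6 * norm.sf(process_sigma - shift)

    def process_sigma_from_dpmo(dpmo, shift=1.5):
        # Inverse: recover the process sigma from a dpmo figure
        return norm.isf(dpmo / 1e6) + shift

    print(round(dpmo_from_process_sigma(4.833)))  # about 430 dpmo

Which rather proves the naive user’s point: the “process sigma” is a one-line re-expression of 430 dpmo.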
p values
• We’ve done a survey and found that women are
more intelligent than men. p value is 0.004.
• What does the p value mean?
• It tells us how sure we can be about our results
taking sampling error into account.
• 0.004 is very small. Not very impressive!
• It’s a bit difficult to explain p values to someone
like you, but smaller is better. Less than 5%
means you can be fairly sure women are cleverer
than men; less than 1% is almost conclusive.
• Sounds like you’re trying to confuse me …
… p values
• I’m told that if the p value is 0.004 this means
that we can be 99.8% confident that women
really are more intelligent based on this data.
Isn’t that a better way to put it?
• No, that’s a common misunderstanding ... you
need to go on a course, although I’m not sure
you’ll take it in ...
• There are lots of common misunderstandings,
but I’m sure about the 99.8% confident ...
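To make the dialogue concrete, here is a minimal sketch of where such a p value comes from, using invented data (the scores, sample sizes and the two-sample t-test are illustrative assumptions, not the survey in the talk):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    women = rng.normal(102, 15, 500)  # hypothetical scores, not real data
    men = rng.normal(100, 15, 500)

    t_stat, p = stats.ttest_ind(women, men)
    # p answers: "if the two populations really had equal means, how often
    # would sampling error alone give a difference at least this large?"
    # It is a statement about the data given the hypothesis, NOT a "1 - p
    # confidence" that women are cleverer - the misunderstanding above.
    print(f"t = {t_stat:.2f}, p = {p:.4f}")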
University League tables
• The Sunday Times score for Portsmouth
University is 599.
• What does that mean?
• Well … e.g. Southampton got 783 points so
Southampton is obviously a better place to study
• What are the points based on?
• Lots of things: e.g. Student satisfaction,
Research quality
• So do Southampton do better on these two? ...
... University League tables
• Actually Portsmouth do a little better on student
satisfaction (174 vs 169, out of 250), but Southampton
do better on research quality (136 vs 112, out of 200)
• But student satisfaction is more important to
students than research quality ...
• You’ve got to balance the two. The experts at
the Sunday Times have done this.
• But different people may want different things ...
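The league-table score is just a weighted sum, so the “best” university depends on whose weights you use. A sketch with the two subscores from this dialogue (the weights are invented for illustration; the Sunday Times’ actual weighting scheme is not reproduced):

    # Subscores from the dialogue: satisfaction out of 250, research out of 200
    scores = {
        "Portsmouth":  {"satisfaction": 174 / 250, "research": 112 / 200},
        "Southampton": {"satisfaction": 169 / 250, "research": 136 / 200},
    }

    def composite(uni, w_sat, w_res):
        # Weighted sum of the rescaled subscores
        return (w_sat * scores[uni]["satisfaction"]
                + w_res * scores[uni]["research"])

    for w_sat, w_res in [(0.5, 0.5), (0.9, 0.1)]:  # two hypothetical students
        best = max(scores, key=lambda u: composite(u, w_sat, w_res))
        print(f"weights {w_sat}/{w_res}: best choice is {best}")

Equal weights give Southampton; a student who weights satisfaction at 90% gets Portsmouth. Same data, different “winner”.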
Measurements of risk
• Muddled Michael has a habit of losing his car keys when he goes on
holiday. He reckons he has a 25% chance of losing his keys. He
decides to consult an expert on risk …
• Easy! If he takes 9 spare keys with him, then the probability of
losing all 10 keys is 0.25^10, which is about one chance in a million …
which seems an acceptable risk.
• Michael puts all 10 keys on the same key ring (he doesn’t want to
confuse himself by putting them in different places) and goes on
holiday.
• The problem here is that the maths assumes that losing each key is
an independent event. In fact, if he loses one key he will probably
lose the rest as well, so a more realistic estimate of the probability
of losing all his keys is 25%!
• There are similar assumptions underlying most risk calculations –
but if the calculations are more complicated it is easy not to notice.
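The independence assumption is easy to expose by simulation. A minimal sketch (the 25% loss probability is from the story; everything else is illustrative):

    import random

    random.seed(0)
    P_LOSE, N_KEYS, TRIALS = 0.25, 10, 1_000_000

    # Independent model: each key is lost (or not) on its own
    independent = sum(
        all(random.random() < P_LOSE for _ in range(N_KEYS))
        for _ in range(TRIALS)
    )
    # One key ring: the ten keys are lost together or not at all
    one_ring = sum(random.random() < P_LOSE for _ in range(TRIALS))

    print(f"independent: {independent / TRIALS:.6f}  (theory 0.25^10 ≈ 1e-6)")
    print(f"one ring:    {one_ring / TRIALS:.3f}  (theory 0.25)")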
Risk and the weather
• The probability of more than 1 mm of rain falling in
Southampton in one day is 31.5%
(Estimated from Met Office graph based on 1971-2000 data.)
• Then, theoretically, the probability of a week when it
rains every day is 0.315^7, which suggests that this
happens about every 9 years.
– Two weeks with rain every day is a “once in 29,000 years” event.
• Almost certainly happens more often – the last time was
20-30 November 2009, and the time before was 10-16 of
the same month
(Southampton Weather website)
• The theory is wrong because the assumptions are
wrong!
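The arithmetic is worth writing out, because the wrong assumption is then visible on the page (the daily probability is from the Met Office figure above; independence of successive days is the assumption that fails):

    p_rain = 0.315  # P(>1 mm of rain on a given day), Met Office 1971-2000

    p_wet_week = p_rain ** 7          # assumes days are independent
    p_wet_fortnight = p_rain ** 14

    print(f"wet week:      p = {p_wet_week:.5f}, "
          f"roughly every {1 / p_wet_week / 365:.0f} years")
    print(f"wet fortnight: p = {p_wet_fortnight:.1e}, "
          f"roughly every {1 / p_wet_fortnight / 365:,.0f} years")

In reality rainy days cluster, so wet fortnights are far more common than the independence model predicts.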
Risk and the normal distribution
• Very similar assumptions underlie the normal (Gaussian) distribution.
This assumes that the variable depends on a large number of small
independent factors. If not, the predictions can be misleading,
especially for rare events.
• Many finance measurements depend on the normal distribution and
similar assumptions – e.g. the Black-Scholes formula. OK in normal
times, but they tend to seriously underestimate the probability of big
falls.
• “If the Dow Jones Industrial Average moved in accordance with a
normal distribution, it would have moved by 4.5% or more on only
six days between 1916 and 2003 …. In reality … 366 times”
(Mandelbrot, cited by Buckley, 2011, p. 140).
• Black Monday (1987) was a 20 sd event – a once in a million years
event, experienced several times by people much younger than a
million years old (Buckley, 2011, p. 141).
• Measures are “understood” but not their assumptions … trust in a
misunderstood version …
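To see how much the tail assumption matters, compare the normal model with a fat-tailed alternative. This sketch treats a 4.5% daily move as roughly a 4.5 standard deviation event, and uses a Student-t with 3 degrees of freedom (rescaled to unit variance) as the fat-tailed stand-in – both are my illustrative assumptions, as is the 250 trading days per year:

    import math
    from scipy.stats import norm, t

    df = 3
    sd_t = math.sqrt(df / (df - 2))  # sd of a raw t(3) variable

    # Two-sided probability of a daily move of at least 4.5 sd
    p_normal = 2 * norm.sf(4.5)
    p_fat = 2 * t.sf(4.5 * sd_t, df)  # same threshold in sd units

    for name, p in [("normal", p_normal), ("t(3)", p_fat)]:
        print(f"{name:7s}: p = {p:.1e}, "
              f"about once every {1 / (p * 250):,.1f} trading years")

The normal model makes such a move a once-in-centuries event; the fat-tailed model expects it roughly once a year – the same sort of contrast Mandelbrot found in the Dow data.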
What can go wrong?
1. Unnecessary time and effort expended
– E.g. 50% of time spent on stats courses
could be saved by redesigning concepts? Big
savings in time and effort possible!
2. Failure to understand
a) Complete
b) Subtleties
3. Misunderstanding
a) Of basic concept
b) Of assumptions leading to misleading uses
... for example ...
• P values
– Massive amount of wasted time and energy (think of
all those journal articles), general confusion,
misinterpretations like significant=important
• University league tables
– scores taken too seriously, specific requirements
ignored, creates uniformity because everyone thinks
the same; rational world would be more varied
• Risk
– ignoring unrealistic assumptions led to over-confidence in
mathematical measures, which helped cause the financial crash ...
Principles for designing
measurements for understanding
• Remember most measurements are determined by historical
accident – therefore they can probably be improved for
current users and uses. Design, not discovery.
• The name should reflect the meaning of the result, not the
method used to get there.
• Make sure the direction is intuitive, and use units and
percentages as appropriate.
• There must be an accurate description of the meaning of the
measurement in users’ language.
• Users must understand the key assumptions (which are not
irrelevant technicalities). If possible, users should follow the
general idea of the derivation.
Reasons for the persistence of
strange measurements
• Aim often ticking a box, not understanding
– Users don’t see problem
• Interests of experts and teachers
– Mystification is good for business! Some
measurements (e.g. process sigma) invented solely
for this purpose?
• The dumbing down myth
– Increased user-friendliness should lead to more, not
less, powerful use of measurements
– We need to dumb up so that even the dumb won’t do
dumb things
References
• Buckley, Adrian (2011). Financial Crisis: Causes, Context
and Consequences. Harlow: Pearson Education.
• iSixSigma (2011). Sigma calculator, available at
http://www.isixsigma.com
• Met Office graph of daily rainfall probabilities (1971-2000 data)
• Southampton Weather website