The most dangerous equation - University of Pennsylvania

The most dangerous equation
Howard Wainer
National Board of Medical Examiners
The danger equation
P(Y) = P(Y|x=1) P(x=1) + P(Y|x=0) P(x=0)
Where
Y = The event of looking like an idiot,
x = 1 The event of knowing the equation in question
x = 0 The event of not knowing the equation.
This equation makes explicit the two aspects of what constitutes a
dangerous equation.
There are two obvious pieces:
• Some equations are most dangerous if
you know them (the first term on the
right side),
• Others are dangerous if you do not
(the second term).
I will not explore the dangers an equation might hold because
the secrets within its bounds open doors behind which lie
terrible peril.
Few would disagree that the obvious winner in this contest
would be Einstein’s iconic equation
$E = mc^2$    (1)
for it provides a measure of the enormous energy hidden
within ordinary matter.
This is not the direction I wish to pursue.
Instead I am interested in equations that
unleash their danger, not when we know
about them, but rather when we do not;
Equations that allow us to understand things
clearly, but their absence leaves us
dangerously ignorant.
Graphically, we can place all equations on a scale like
that depicted below:
[Figure: scale of equations. Vertical axis: Probability of Looking Like an Idiot (0.0 to 1.0); horizontal axis: Dangerousness]
There are many plausible nominees for the
title of “Most Dangerous”
For example, economists might,
with ample evidence,
pick Kelley’s equation
$\hat{\tau} = \rho x + (1 - \rho)\mu$    (2)

See for example, Baumol et al (1989), Carnevale
(1999), King (1934), Mills (1924), Secrist (1933),
Sharpe (1985), and Williamson (1991) who
interpreted regression toward the mean as having
economic causes rather than merely reflecting the
uncertainty of prediction.
For more details of its misuse see Friedman (1992),
Hotelling (1933), Stigler (1997), and Wainer &
Brown (2007).
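A minimal sketch of how Kelley’s equation produces regression toward the mean, assuming its usual reading: the estimated true score is the observed score x pulled toward the group mean μ, with the reliability ρ as the weight. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def kelley_estimate(observed, group_mean, reliability):
    """Estimated true score: the observed score shrunk toward the group mean."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * observed + (1.0 - reliability) * group_mean


# Hypothetical numbers: observed score 130, group mean 100, reliability 0.8.
estimate = kelley_estimate(130.0, 100.0, 0.8)
print(f"estimated true score: {estimate:.1f}")

# With perfect reliability there is no shrinkage; with none, only the mean survives.
no_shrinkage = kelley_estimate(130.0, 100.0, 1.0)
full_shrinkage = kelley_estimate(130.0, 100.0, 0.0)
```

The shrinkage here comes purely from the uncertainty of the measurement; nothing in the calculation represents an economic or causal force.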
Others might suggest that even more have been led astray
through their ignorance of Bayes’ Theorem
P(X|Y) = P(Y|X) P(X)/P(Y)
(3)
See the Nobel Prize-winning work of Kahneman &
Tversky (e.g. Kahneman, 2002) who provide
empirical evidence of how human decision-making
can go wildly wrong because this equation is not
built into human intuition.
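A short worked sketch of Bayes’ Theorem shows how far intuition can drift. The scenario (a rare condition and an imperfect test) and all three input numbers are hypothetical, as is the function name; P(Y) is expanded with the same total-probability form as the danger equation.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(X|Y) from Bayes' Theorem, with P(Y) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical inputs: P(X) = 0.01, P(Y|X) = 0.90, P(Y|not X) = 0.05.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(X|Y) = {p:.3f}")
```

Under these assumed inputs the posterior stays well below one half even though the test is right 90% of the time when the condition is present, because the condition itself is rare.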
Others (Mosteller & Tukey, 1977) have suggested that
the linear model
Y = b1x1 + b2x2 + … + bnxn    (4)
might be considered a prime candidate because:
1) It suggests that b1 tells about the relationship between y and
x1, which is not the case, yet many interpret it in that way.
2) It encourages fallacious causal interpretation (if I change x1
by 1 unit then y will change by b1 units).
3) It encourages fallacious interpretation even by those who
think they are being careful. ("I can't know the value of the
coefficient, but surely its sign tells whether increasing x1 will
increase or decrease y.")
4) It is badly non-robust, but rarely diagnosed appropriately, so
many models are misleading.
5) It assumes linearity at a high-dimensional level, which is hard to
check. Even at the bivariate and trivariate levels the data are
often nonlinear, yet this goes unchecked.
6) When applied to observational data (as it almost always is), it is
difficult to know whether an appropriate set of predictors
has been selected -- and if we have an inappropriate set, our
interpretations are questionable.
7) It collapses with little warning when computations are degenerate.
It is dangerous, ironically, because it can be the most useful model for
the widest variety of data when wielded with caution, wisdom, and
much interaction between the analyst and the computer program.
Although these are all worthy
competitors I believe that the
championship title must go to De
Moivre’s (1730) equation:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$    (5)
I arrived at this conclusion for three reasons
related to:
(i) The extreme length of time that ignorance of
it has caused confusion,
(ii) The wide breadth of areas of application that
have been misled, and
(iii) The seriousness of the consequences that such
ignorance has caused.
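De Moivre’s equation says that the standard deviation of a mean of n independent observations equals the standard deviation of a single observation divided by the square root of n. The sketch below compares the formula with a small simulation; the normal distribution, the value of sigma, the sample sizes, and the function names are assumptions made only for illustration.

```python
import math
import random


def standard_error(sigma, n):
    """De Moivre's equation: SD of the mean of n independent draws."""
    return sigma / math.sqrt(n)


def simulated_sd_of_means(sigma, n, trials, rng):
    """Empirical SD of sample means, for comparison with the formula."""
    means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    grand_mean = sum(means) / trials
    variance = sum((m - grand_mean) ** 2 for m in means) / (trials - 1)
    return math.sqrt(variance)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for n in (4, 25, 100):
    formula = standard_error(10.0, n)
    empirical = simulated_sd_of_means(10.0, n, 2000, rng)
    print(f"n={n:3d}  formula={formula:.2f}  simulated={empirical:.2f}")
```

The practical consequence is that small samples produce means that swing much more widely than large ones, which is the thread running through every example that follows.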
In the balance of this talk I will describe five
very different situations in which ignorance of
De Moivre’s equation has led to billions of
dollars of loss over centuries, yielding untold
hardship. These are but a small sampling; there
are many more.
Example 1.
The Trial of the Pyx: six centuries of
misunderstanding
In 1150, a century after the Battle of Hastings, it was recognized that the king
could not just print money and assign it to have any value he chose. Instead
the coinage’s value must be intrinsic, based on the amount of precious
materials in its make-up. And so standards were set for the weight of gold
in coins – a guinea should weigh 128 grains (there are 360 grains in an
ounce).
It was recognized, even then, that coinage methods were too imprecise to
insist that all coins be exactly equal in weight, so instead the king and the
barons, who supplied the London Mint (an independent organization) with
gold, insisted that coins when tested in the aggregate (say 100 at a time)
conform to the regulated size plus or minus some allowance for variability
(1/400th of the weight, which for one guinea would be 0.32 grains and so,
for the aggregate of 100 coins, 32 grains). Obviously, they assumed that
variability increased proportionally to the number of coins and not to its
square root.
This deeper understanding lay almost 600 years in the future with De Moivre’s
(1730) exploration of the binomial distribution (Stigler, 1999, 367-368).
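The size of the error can be sketched with the numbers given above. This assumes the 0.32-grain allowance is read as a measure of the variability of a single coin and that coin weights are independent, so that the variability of the total weight of n coins grows with the square root of n; the variable names are illustrative.

```python
import math

per_coin_allowance = 0.32  # grains: 1/400 of a 128-grain guinea (from the text)
n_coins = 100              # coins weighed together (from the text)

# What the rule assumed: variability proportional to the number of coins.
linear_allowance = per_coin_allowance * n_coins

# What De Moivre's equation implies for the total weight of n independent coins.
sqrt_allowance = per_coin_allowance * math.sqrt(n_coins)

excess_factor = linear_allowance / sqrt_allowance
print(f"assumed allowance:      {linear_allowance:.1f} grains")
print(f"square-root allowance:  {sqrt_allowance:.1f} grains")
print(f"allowance too generous by a factor of {excess_factor:.0f}")
```

Under these assumptions the tolerance applied to 100 coins was ten times wider than the variability of the coins justified.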
The costs of making errors are of two types.
If the average of all the coins was too light the barons were being
cheated, for there would be extra gold left over after minting the
agreed number of coins.
This kind of error is easily detected and, if found, the director of the
Mint would suffer grievous punishment.
But if the variability were too great it would mean that there would be
an unacceptably large number of too heavy coins produced that
could be melted down and recast with the extra gold going into the
pockets of the minter.
Erroneously allowing too much variability meant that the Mint
could stay within the bounds specified and still make extra money
by collecting the heavier-than-average coins and reprocessing
them.
The fact that this error was able to continue for almost 600 years
provides strong support for De Moivre’s equation to be considered
a strong candidate for the title of most dangerous equation.
Example 2: Country living: A Bane or a Blessing?
Figure 3.
Age adjusted kidney cancer rates for all US counties
in 1980-1989 shown as a function of the log of the
county population.
[Figure: scatter plot. Vertical axis: Age Adjusted Cancer Rate (5 to 20); horizontal axis: Log(population) (3.5 to 6.5)]
Example 3.
The small schools movement: Billions for increasing variance
In the late 1990s The Bill and Melinda Gates Foundation began supporting small
schools on a broad-ranging, intensive, national basis. By 2001, the
Foundation had given grants to education projects totaling approximately
$1.7 billion. They have since been joined in support for smaller schools by
the Annenberg Foundation, the Carnegie Corporation, the Center for
Collaborative Education, the Center for School Change, Harvard’s Change
Leadership Group, Open Society Institute, Pew Charitable Trusts, and the
U.S. Department of Education’s Smaller Learning Communities Program.
The availability of such large amounts of money to implement a smaller schools
policy yielded a concomitant increase in the pressure to do so, with
programs to splinter large schools into smaller ones being proposed and
implemented broadly (e.g. New York City, Los Angeles, Chicago and Seattle).
Obviously the hope was that smaller schools would yield
higher achievement, or
E(achievement|small) > E(achievement|big)
But what was the supporting evidence?
It was found that when one looks at high performing schools one is apt to see
an unrepresentatively large proportion of smaller schools.
Or, stated mathematically, that
P(small|high achievement)
is greater than anticipated by the number of small schools.
Unfortunately, the latter does not imply the former.
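A small simulation illustrates why the latter does not imply the former. Every student in it is drawn from the same score distribution, so school size has no effect on achievement by construction. The score scale, the enrollments, and the number of schools are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
STUDENT_MEAN, STUDENT_SD = 1250.0, 200.0  # hypothetical score scale


def school_mean(enrollment):
    """Mean score of a school whose students all share one distribution."""
    total = sum(rng.gauss(STUDENT_MEAN, STUDENT_SD) for _ in range(enrollment))
    return total / enrollment


# Hypothetical population: 500 small schools (30 tested) and 500 big ones (300).
schools = [("small", school_mean(30)) for _ in range(500)]
schools += [("big", school_mean(300)) for _ in range(500)]

ranked = sorted(schools, key=lambda s: s[1], reverse=True)
top, bottom = ranked[:50], ranked[-50:]
share_small_top = sum(kind == "small" for kind, _ in top) / len(top)
share_small_bottom = sum(kind == "small" for kind, _ in bottom) / len(bottom)

small_avg = sum(m for kind, m in schools if kind == "small") / 500
big_avg = sum(m for kind, m in schools if kind == "big") / 500

print(f"small schools among the top 50:    {share_small_top:.0%}")
print(f"small schools among the bottom 50: {share_small_bottom:.0%}")
print(f"average score, small vs big:       {small_avg:.1f} vs {big_avg:.1f}")
```

In this sketch small schools make up half of all schools yet dominate both the best and the worst performers, while the two group averages are essentially equal. Looking only at the top of the list would suggest a benefit of smallness that does not exist.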
5th grade math scores by school size in Pennsylvania in 2005.
Small schools are overabundant at both extremes.
[Figure: scatter plot. Vertical axis: 5th grade math score (750 to 1500); horizontal axis: Enrollment (300 to 1200)]
In these data from Pennsylvania we see that 12% of the highest
performing schools are from the 3% smallest
(a fourfold over-representation).
But alas there is a similar over-representation among the worst
performing schools.
The regression line is flat – in 5th grade at least, school size
has no effect.
In high school the story is different.
There is still an overrepresentation of small schools at both
extremes, but now the regression line has a strong positive slope.
Small schools perform worse!
[Figure: scatter plot. Vertical axis: 11th Grade Math Scores (1000 to 1600); horizontal axis: Enrollment, spaced on a log scale (100 to 1580)]
On October 26, 2005, after expenditures of over $1.7 billion,
Lynn Thompson, in an article in The Seattle Times reported that:
“The Gates Foundation announced last week it is moving
away from its emphasis on converting large high schools
into smaller ones and instead giving grants to specially
selected school districts with a track record of academic
improvement and effective leadership.”
This point of view was amplified in a study by Schneider, Wyse
& Keesler that was reported in an article by Debra Viadero in
Education Week (June 7, 2006), in which, after a careful
analysis of matched students in schools of varying sizes, Ms.
Schneider concluded,
“I’m afraid we have done a terrible
disservice to kids.”
Example 4. The Safest Cities to Drive in
In the June 18, 2006 issue of the New York Times (News of the
Week in Review, page 2) there was a short article that listed the
ten safest US cities and the ten most unsafe based on an
automobile insurance company statistic “average number of years
between accidents”.
The cities were drawn from the 200 largest cities in the US.
It should come as no surprise that neither the list of the ten safest
cities nor the list of the ten most dangerous cities overlaps with the
ten largest cities.
Table 1. Information on automobile accident rates in 20 cities
(data from the NY Times and www.allstate.com/media/newsheadlines)

City               State           Population Rank   Population   Number of Years Between Accidents

Ten safest
Sioux Falls        South Dakota    170               133,834      14.3
Fort Collins       Colorado        182               125,740      13.2
Cedar Rapids       Iowa            190               122,542      13.2
Huntsville         Alabama         129               164,237      12.8
Chattanooga        Tennessee       138               154,887      12.7
Knoxville          Tennessee       124               173,278      12.6
Des Moines         Iowa            103               196,093      12.6
Milwaukee          Wisconsin       19                586,941      12.5
Colorado Springs   Colorado        48                370,448      12.3
Warren             Michigan        169               136,016      12.3

Ten least safe
Newark             New Jersey      64                277,911      5.0
Washington         DC              25                563,384      5.1
Elizabeth          New Jersey      189               123,215      5.4
Alexandria         Virginia        174               128,923      5.7
Arlington          Virginia        114               187,873      6.0
Glendale           California      92                200,499      6.1
Jersey City        New Jersey      74                239,097      6.2
Paterson           New Jersey      148               150,782      6.5
San Francisco      California      14                751,682      6.5
Baltimore          Maryland        18                628,670      6.5

Ten biggest
New York           New York        1                 8,085,742
Los Angeles        California      2                 3,819,951
Chicago            Illinois        3                 2,869,121
Houston            Texas           4                 2,009,690
Philadelphia       Pennsylvania    5                 1,479,339
Phoenix            Arizona         6                 1,388,416
San Diego          California      7                 1,266,753
San Antonio        Texas           8                 1,214,725
Dallas             Texas           9                 1,208,318
Detroit            Michigan        10                911,402
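The no-overlap claim can be checked directly against the population ranks listed in Table 1. This is a minimal sketch; the ranks are copied from the table and the variable names are illustrative.

```python
# Population ranks copied from Table 1 (1 = largest US city).
safest_ranks = [170, 182, 190, 129, 138, 124, 103, 19, 48, 169]
least_safe_ranks = [64, 25, 189, 174, 114, 92, 74, 148, 14, 18]
ten_biggest_ranks = set(range(1, 11))

extreme_ranks = safest_ranks + least_safe_ranks
overlap = ten_biggest_ranks & set(extreme_ranks)
largest_extreme_city_rank = min(extreme_ranks)

print(f"cities on both an extreme list and the ten-biggest list: {len(overlap)}")
print(f"largest city on either extreme list has rank {largest_extreme_city_rank}")
```

Every city on either extreme list ranks below the ten biggest, consistent with the idea that a statistic based on fewer drivers varies more and so reaches the extremes more easily.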
Example 5. Sex differences
“It does appear that on many, many, different human
attributes – height, weight, propensity for criminality,
overall IQ, mathematical ability, scientific ability – there
is relatively clear evidence that whatever the difference
in means – which can be debated – there is a difference
in standard deviation/variability of a male and female
population. And it is true with respect to attributes that
are and are not plausibly, culturally determined.”
Lawrence Summers (2005)
Table 2. Summary of some outcomes, by sex, from the National Assessment of
Educational Progress
8th Grade NAEP National Results

                      Mean Scale Scores    Standard Deviations    Male/Female
Subject      Year     Male     Female      Male     Female        SD Ratio
Math         1990     263      262         37       35            1.06
             1992     268      269         37       36            1.03
             1996     271      269         38       37            1.03
             2000     274      272         39       37            1.05
             2003     278      277         37       35            1.06
             2005     280      278         37       35            1.06
Science      1996     150      148         36       33            1.09
             2000     153      146         37       35            1.06
             2005     150      147         36       34            1.06
Reading      1992     254      267         36       35            1.03
             1994     252      267         37       35            1.06
             1998     256      270         36       33            1.09
             2002     260      269         34       33            1.03
             2003     258      269         36       34            1.06
             2005     257      267         35       34            1.03
Geography    1994     262      258         35       34            1.03
             2001     264      260         34       32            1.06
US History   1994     259      259         33       31            1.06
             2001     264      261         33       31            1.06

Source: http://nces.ed.gov/nationsreportcard/nde/
Note that the ratio of standard deviations
was 1.10 in Project Talent
(a 1960 study of 73,000 15-year-olds).
The reduction in the difference in variability over
the intervening 40 years may be progress or may
just reflect a difference in the character of the
tests.
In discussing Lawrence Summers’ remarks Christiane
Nüsslein-Volhard, the 1995 Nobel Laureate in
Physiology/Medicine, said,
“He missed the point. In mathematics and science, there is no
difference in the intelligence of men and women. The
difference in genes between men and women is simply the Y
chromosome, which has nothing to do with intelligence.”
(Dreifus, July 4, 2006)
Is Professor Nüsslein-Volhard’s syllogism:
1. There are no differences in intelligence between men and women.
2. Men have a Y and women don't.
3. Therefore nothing about intelligence can be carried on Y.
Or is it:
1. There is nothing about intelligence carried on Y
(what evidence is there on this?).
2. The only genetic difference between the sexes is that men have a Y and
women do not.
3. Therefore there are no differences in intelligence between men and
women.
The first argument seems circular, so let us assume
that she meant the latter and I am just ignorant
of the evidence she implicitly refers to in
statement (1).
If so perhaps it is Professor Nüsslein-Volhard who
missed the point here.
The Y chromosome is not the only difference
between the sexes!
Summers’ point was that when we look at either extreme of an
ability distribution we will see more of the group that has
greater variation.
Any mental trait that is conveyed on the x-chromosome will
have larger variability among males than females, for
females have two x-chromosomes to only one for males.
Thus, from De Moivre’s equation, we would expect, ceteris paribus,
about 40% more variability among males than females.
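The 40% figure can be reproduced with a short calculation. It rests on a simplifying assumption: a trait value for females behaves like the average of contributions from two independent x-chromosomes, while males carry a single one, so De Moivre’s equation applies with n = 2 and n = 1. The function and variable names are illustrative.

```python
import math


def sd_of_mean(sigma, n):
    """De Moivre's equation: SD of the mean of n independent contributions."""
    return sigma / math.sqrt(n)


sigma = 1.0                       # arbitrary unit; it cancels in the ratio
male_sd = sd_of_mean(sigma, 1)    # one x-chromosome
female_sd = sd_of_mean(sigma, 2)  # average of two, assumed independent
ratio = male_sd / female_sd

print(f"male/female SD ratio: {ratio:.3f}")
print(f"extra male variability: {(ratio - 1.0):.0%}")
```

The ratio is the square root of 2, roughly 1.41, which is where the figure of about 40% comes from.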
The fact that we see less than ten percent greater variation in NAEP demands
the existence of other modes of transmission.
Obviously there must be major causes of high-level
performance that are not carried on the x-chromosome,
and indeed some causes are not genetic.
But it suggests that for some skills between 10% and
25% of the increased variability is likely to have
had its genesis on the x-chromosome.
This view gained further support in studies by Arthur Arnold and Eric
Vilain of UCLA that were reported by Nicholas Wade of the New
York Times on April 10, 2007.
He wrote,
“It so happens that an unusually large number of brain-related genes are situated on the X chromosome. The
sudden emergence of the X and Y chromosomes in
brain function has caught the attention of evolutionary
biologists. Since men have only one X chromosome,
natural selection can speedily promote any
advantageous mutation that arises in one of the X’s
genes. So if those picky women should be looking for
smartness in prospective male partners, that might
explain why so many brain-related genes ended up on
the X.”
He goes on to conclude,
“Greater male variance means that although average IQ
is identical in men and women, there are fewer average
men and more at both extremes. Women’s care in
selecting mates, combined with the fast selection made
possible by men’s lack of backup copies of X-related
genes, may have driven the divergence between male
and female brains.”
Conclusions
Humans don’t fully comprehend the effect that variation, and
especially differential variation, has on what we observe.
Daniel Kahneman’s 2002 Nobel Prize was for his studies on
intuitive judgment; he showed that humans don’t intuitively
“know” that smaller hospitals would have greater variability
in the proportion of male to female births.
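The hospital result can be sketched with an exact binomial calculation. The daily birth counts (15 and 45) and the 60% threshold follow the commonly cited version of the hospital problem and are assumptions here, as is a probability of one half for a boy; the function name is illustrative.

```python
import math
from fractions import Fraction


def prob_boy_share_exceeds(n_births, threshold=Fraction(3, 5)):
    """Exact probability that boys exceed `threshold` of n_births, P(boy) = 1/2."""
    favourable = sum(
        math.comb(n_births, k)
        for k in range(n_births + 1)
        if Fraction(k, n_births) > threshold  # exact comparison, no rounding
    )
    return favourable / 2 ** n_births


small_hospital = prob_boy_share_exceeds(15)  # assumed 15 births per day
large_hospital = prob_boy_share_exceeds(45)  # assumed 45 births per day

print(f"P(more than 60% boys), 15 births: {small_hospital:.3f}")
print(f"P(more than 60% boys), 45 births: {large_hospital:.3f}")
```

Under these assumptions the smaller hospital records such a lopsided day noticeably more often than the larger one, purely because its daily proportion is an average over fewer births.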
But such inability is not limited to humans making
judgments in psychology experiments.
• Routinely small hospitals are singled out for special
accolades because of their exemplary performance only to
slip toward average in subsequent years. Explanations
typically abound that discuss how their notoriety has
overloaded their capacity.
• Similarly small mutual funds are recommended, post hoc, by
Wall Street analysts only to have their subsequent
performance disappoint investors.
The list goes on and on, adding evidence
and support to my nomination of De
Moivre’s equation as the most
dangerous of them all.