#### Transcript Chapter 3 - Utah State University

```
Chapter 3
Sequential Decisions
“Life must be understood backward,
but … it must be lived forward.”
- Soren Kierkegaard
Terminology
• tree
• terminal node (leaf)
• backward graph (edges reversed)
• decision graph – each arc represents a choice. Must be acyclic
• Payoff can be at terminal node or along
edges
• Theorem: any subpath of an optimal path
is optimal
Games of Chicken
• A monopolist faces a potential entrant
• Monopolist can accommodate or fight
• Potential entrant can enter or stay out
Payoffs (Potential Entrant , Monopolist):
                          Monopolist
                          Accommodate    Fight
Potential Entrant   In    50 , 50        -50 , -50
                    Out   0 , 100        0 , 100
Equilibrium
• Use best response method to find equilibria
Payoffs (Potential Entrant , Monopolist):
                          Monopolist
                          Accommodate    Fight
Potential Entrant   In    50 , 50        -50 , -50
                    Out   0 , 100        0 , 100
Importance of Order
• Two equilibria exist
• ( In, Accommodate )
• ( Out, Fight )
• Only one makes temporal sense
• Fight is a threat, but not credible – because once
I decide to enter, you lose if you fight!
• Not sequentially rational
• Simultaneous outcomes may not make
sense for sequential games.
Sequential Games
The Extensive Form
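[Game tree: Entrant (E) moves first. Out ends the game at 0 , 100. In leads to the Monopolist (M): Fight gives -50 , -50; Accommodate gives 50 , 50]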
Looking Forward…
• Entrant makes the first move:
• Must consider how monopolist will respond
• If enter: Monopolist (M) chooses between Fight (-50 , -50) and Accommodate (50 , 50)
• Monopolist accommodates
… And Reasoning Back
• Now consider entrant’s move
[Game tree: Entrant (E) chooses Out (0 , 100) or In, after which the Monopolist (M) accommodates (50 , 50)]
• Only ( In, Accommodate ) is sequentially rational
Sequential Rationality
COMMANDMENT
Look forward and reason back.
Anticipate today what your rivals will do tomorrow.
Solving Sequential Games
• Start with the player who moves last
• Determine what that player will do
• Trim the tree
• Eliminate the dominated strategies
• This results in a simpler game
• Repeat the procedure – called roll back.
Example 3.9
[Figure: decision graph rooted at R with a cost on each edge; node labels A, B, C, D, E, F, G, K, L, M, N, P, Q, S, T]
Pick nodes one up from leaves and select best choice, reduce graph.
Example 3.9
[Figure: graph after the first reduction; remaining node labels A, B, C, D, E, F, G, K, L, N, R, S, T]
Pick nodes at last choice point and select best (lowest) choice, reduce graph.
Cost is sum along edges of path.
Example 3.9
[Figure: graph after the second reduction; remaining node labels A, B, C, D, F, G, K, L, R, S]
Repeat: Pick nodes at last choice point and select best (lowest) choice, reduce graph.
Cost is sum along edges of path.
Example 3.9 – backwards induction (also called roll back)
[Figure: fully reduced graph; remaining labels, in order: R, 2, C, 1, G, 2, S]
Repeat: Pick nodes at last choice point and select best (lowest) choice, reduce graph.
Cost is sum along edges of path.
Voting
• Majority rule results – no conclusion:
• Voter preferences: B>G>R ; G>R>B ; R>B>G
• B beats G ; G beats R ; R beats B
• What if you want “R” to win?
• B vs. G (B wins), then winner vs. R ⇒ R
• Problem:
• Everyone knows you want “R”
• B vs. G then winner vs. R? Good luck!
• Better chance: R vs. G, then winner versus B
Interesting how voting order makes a winner in a no-winner case!
Extensive Form
• Voter preferences: B>G>R ; G>R>B ; R>B>G
[Figure: voting game tree; node labels include B vs. G, B vs. R, R vs. B, and “R wins”; branch labels B, G, R]
Looking Forward
• B vs. R: a majority prefers R to B
• B vs. G: a majority prefers B to G
Trim The Tree
[Figure: trimmed voting tree; node labels B vs. G, B vs. R, R vs. B; branch labels R, B]
Rollback in Voting and
“Being Political”
• Not necessarily good to vote your true preferences
• Amendments to make bad bills worse
• Crossing over in open primaries
• “Centrist” voting in primaries
• STILL – Outcome predetermined
• AGENDA SETTING!
Predatory Pricing

• An incumbent firm operates in three markets, and faces entry in each
• Market 1 in year 1, Market 2 in year 2, etc.
• Each time, I can slash prices, or accommodate the new entry
• What should I do the first year?
Predatory Pricing
[Figure: three-year game tree with entrants E1, E2, E3 and monopolist M]
Predatory Pricing

• The end of the tree: year 3
[Game tree: E3 chooses Out (0 , 100 + previous) or In, after which M chooses Fight (-50 , -50 + previous) or Accommodate (50 , 50 + previous)]
• In year 3: ( In, Accommodate )
Predatory Pricing
• Since the Incumbent will not fight Entrant 3, he will not
fight Entrant 2
• Same for Entrant 1
• Only one “Rollback Equilibrium”
• All entrants play In
• Incumbent plays Accommodate
• Why do we see predatory pricing?
• predatory pricing : An anti-competitive measure
employed by a dominant company to protect market
share from new or existing competitors. Predatory
pricing involves temporarily pricing a product low
enough to end a competitive threat.
Sophie’s choice
• Sophie has $100 and a long boring holiday
without exciting University lectures
• She can watch videos or play Nintendo
games
• Videos are $4 each
• Nintendo games are $5 each
• She has $100
• What is Sophie’s choice?
Standard price taker budget set
First find Sophie’s choice set and budget line.
[Figure: budget line; axes Qvideos (intercept 25) and Qgames (intercept 20)]
Convex, smooth preferences
Then show her preferences. Note that, from Sophie’s perspective, both videos and Nintendo games are ‘goods’ (desirables).
Utility function is U
[Figure: indifference curves U=80, U=100, U=120, U=140; axes Qvideos and Qgames]
Put them together
First – note that as both videos and games are goods and there is nothing else for Sophie to spend her money on, she will consume on her budget line. The budget line is the solution space.
[Figure: budget line (intercepts 25 videos, 20 games) drawn over indifference curves U=80, U=100, U=120, U=140]
Where on the budget line?
Start off with 25 videos and 0 games. This is a bundle on her budget line. But can she do better? Yes! If she buys fewer videos and uses some money to buy games, she moves to a higher indifference curve, so she is better off.
[Figure: budget line (intercepts 25 videos, 20 games) drawn over indifference curves U=80, U=100, U=120, U=140]
Where on the budget line?
What about 20 games and no videos? Can Sophie make a better choice for herself? Yes! If she buys fewer games and uses some of her money to buy videos she moves to higher indifference curves.
[Figure: budget line (intercepts 25 videos, 20 games) drawn over indifference curves U=85, U=90, U=95, U=100]
So she prefers a mixture of videos and games. But what mix is best?
The best bundle for Sophie will be where her indifference curve is just tangent to her budget line. Here that is where she has 10 videos and 12 games (10 × $4 + 12 × $5 = $100).
[Figure: budget line (intercepts 25 videos, 20 games) with the tangency point at 10 videos and 12 games]
Tangency condition
To see this, let’s magnify her budget line and indifference curves around the tangency point.
[Figure: budget line (intercepts 25 videos, 20 games) with the tangency point at 10 videos and 12 games]
Tangency condition
Here is the magnified version. Notice that she can move anywhere on her budget line. But if Sophie stops before she reaches the tangency bundle then she is not maximising her utility.
Tangency condition
Only when she reaches her ‘tangency’ bundle is she on her highest indifference curve (U=95).
Tangency condition
Further, she cannot do better than this bundle. For example, she cannot reach the U=95.5 indifference curve. She doesn’t have enough money.
Summary so far
• So:
• Sophie will choose her optimal bundle where
her indifference curve is just tangent to her
budget line.
• This gets her on her highest possible
indifference curve given her budget.
• But why does this make economic sense?
Founders of Probability
Theory
Blaise Pascal (1623-1662, France)
Pierre Fermat (1601-1665, France)
They laid the foundations of probability theory in a correspondence on a dice game.
Prior, Joint and
Conditional Probabilities
P(A) = prior probability of A
P(B) = prior probability of B
P(A, B) = joint probability of A and B
P(A | B) = conditional (posterior) probability of
A given B
P(B | A) = conditional (posterior) probability of
B given A
Probability Rules
Product rule:
P(A, B) = P(A | B) P(B)
or equivalently
P(A, B) = P(B | A) P(A)
Sum rule:
P(A) = ΣB P(A, B) = ΣB P(A | B) P(B)
if A is conditionalized on B, then the total
probability of A is the sum of its joint
probabilities with all B
Statistical Independence
Two random variables A and B are independent iff:
• P(A, B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
Knowing the value of one variable does not yield any information about the other.
Statistical Dependence Bayes
Thomas Bayes
(1702-1761, England)
“Essay towards solving a problem in the doctrine of
chances” published in the Philosophical Transactions
of the Royal Society of London in 1764.
Bayes Theorem
P(A|B) = P(A ∩ B) / P(B)
P(B|A) = P(A ∩ B) / P(A)
=> P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
=> P(A|B) = P(B|A) P(A) / P(B)
Bayes Theorem Causality
P(A|B) = P(B|A) P(A) / P(B)
Diagnostic:
P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
Pattern Recognition:
P(Class|Feature) = P(Feature|Class) P(Class) / P(Feature)
Bayes Formula and
Classification
p( C | X ) = p( X | C ) p( C ) / p( X )
• p( C | X ): posterior probability of the class after seeing the data
• p( X | C ): conditional likelihood of the data given the class
• p( C ): prior probability of the class before seeing anything
• p( X ): unconditional probability of the data
Medical Example
• Probability you have a disease is .002
• If you have the disease, the probability
that the test is positive is .97
• If you don’t have the disease, the
probability that the test is positive is .04
• What is the probability of a positive test?
• p(+test) = .002*.97 + .998*.04
Medical example
p(+disease) = 0.002
p(+test | +disease) = 0.97
p(+test | -disease) = 0.04
p(+test) = p(+test | +disease) * p(+disease) + p(+test | -disease) * p(-disease)
= 0.97 * 0.002 + 0.04 * 0.998 = 0.00194 + 0.03992 = 0.04186
p(+disease | +test) = p(+test | +disease) * p(+disease) / p(+test)
= 0.97 * 0.002 / 0.04186 = 0.00194 / 0.04186 = 0.046
p(-disease | +test) = p(+test | -disease) * p(-disease) / p(+test)
= 0.04 * 0.998 / 0.04186 = 0.03992 / 0.04186 = 0.954
Bayesian Decision Theory cont.
• Fish Example:
• Each fish is in one of 2 states: sea bass or salmon
• Let w denote the state of nature (the type of fish)
  • w = w1 for sea bass
  • w = w2 for salmon
Bayesian Decision Theory cont.
• The State of nature is unpredictable.
• w is a variable that must be described probabilistically.
• If the catch produced as much salmon as sea bass, the next fish is equally likely to be sea bass or salmon.
• a priori: before the event
• ex post: after the event
• Define
  • P(w1): a priori probability that the next fish is sea bass
  • P(w2): a priori probability that the next fish is salmon.
Bayesian Decision Theory cont.
• If other types of fish are irrelevant:
P( w1 ) + P( w2 ) = 1.
• Prior probabilities reflect our prior knowledge
(e.g. time of year, fishing area, …)
• Simple decision rule:
  • Make a decision (about the next fish caught) without seeing the fish.
  • Decide w1 if P( w1 ) > P( w2 ); w2 otherwise.
  • OK if deciding for one fish
  • If several fish, all are assigned to the same class.
• If we knew something about the fish (like how
light it looked), could we make a better decision?
Bayesian Decision Theory cont.
• In general, we will have some features we can use to help us predict.
• Feature: lightness reading = x
• Different fish yield different lightness readings (x is a random variable)
Bayesian Decision Theory cont.
• Define
p(x|w1) = Class-Conditional Probability Density
Probability density function for x given that the state of nature is w1
(the probability density of lightness reading x when the fish is w1)
• The difference between p(x|w1 ) and p(x|w2 )
describes the difference in lightness between sea
bass and salmon.
Bayesian Decision Theory cont.
[Figure: hypothetical class-conditional probability density functions p(x|w1) and p(x|w2)]
Density functions are normalized (area under each curve is 1.0)
Bayesian Decision Theory cont.
• Suppose that we know
The prior probabilities P(w1 ) and P(w2 ),
The conditional densities p( x | w1 ) and p( x | w2 )
Measure lightness of a fish = x.
• What is the category of the fish given the lightness? We want P(wj | x).
Bayes' formula
P(wj | x) = p(x | wj) P(wj) / p(x),
where
p(x) = Σ (j = 1 to 2) p(x | wj) P(wj)
Posterior = Likelihood × Prior / Evidence
• P(A|B) = P(A ∩ B) / P(B)
Bayes' formula cont.
• p(x|wj ) is called the likelihood of wj with
respect to x.
(the wj category for which p(x|wj ) is large
is more "likely" to be the true category)
• P(wj) is the prior probability that wj is true
• p(x) is the evidence: how frequently we will measure a pattern with feature value x. It is the scale factor that guarantees that the posterior probabilities sum to 1.
Bayes' formula cont.
Posterior probabilities for the particular priors P(w1)=2/3 and P(w2)=1/3.
At every x the posteriors sum to 1.
Error
P(error | x) = P(w1 | x) if we decide w2
P(error | x) = P(w2 | x) if we decide w1
For a given x, we can minimize the probability of error by deciding w1 if P(w1|x) > P(w2|x) and w2 otherwise.
Bayes' Decision Rule
(Minimizes the probability of error)
w1 : if P(w1|x) > P(w2|x)
w2 : otherwise
or
w1 : if p(x|w1) P(w1) > p(x|w2) P(w2)
w2 : otherwise
and P(Error|x) = min [P(w1|x) , P(w2|x)]
This means
P(Error|x) = min [P(w1|x) , P(w2|x)]
If the conditional probabilities are both ½, our chance of error is ½.
If the conditional probabilities differ [1/4, 3/4], our chance of error is only ¼.
We always pick the more likely case, so every instance of the less likely case will be diagnosed incorrectly.
Why many spam filter vendors have
implemented Bayesian filtering
• Most spam filtering products currently on the market are
keyword/keyphrase based filters.
• These filters were fairly effective in stopping spam two
years ago, although they have always exhibited an
unacceptably high false-positive rate.
• However, spammers have been busy developing custom
software to generate their spam, which hides these
keywords and phrases in increasingly sophisticated
ways.
• To make matters worse, the spamming community
actually publishes these keywords on the Internet, so
that spammers can avoid their use. This has resulted in
keyword/keyphrase filters becoming virtually ineffective
in stopping spam.
• Confronted with the harsh reality that their products'
entire infrastructure is built around an outdated,
ineffective paradigm, the keyword spam filter vendors
decided they would hook their wagon to a small portion
of the Bayesian theory. They theorized that by applying a
score to their existing keywords and then aggregating
that score based on hits for that keyword, their keyword
filters could prolong the life of their failing products.
• Advantages to their pseudo-Bayesian approach
1) Lower false positive rates than keyword filters alone (it takes more keyword hits to classify as spam)
2) Slightly increased spam identification rate over keywords alone.
Problems with their pseudo
bayesian approach
1) Significantly increased system resource usage (what used to take
one pass, now takes as many as 10-15 passes), to aggregate the
total point value necessary to identify a message as spam, or to
clear a message as ok.
2) Can't identify cloaked spam (which is generally the most vile
spam), such as "v*i(a)g-r-a" or bogus HTML tags, as well as more
sophisticated cloaking.
3) Still based on and dependent upon, having clearly visible and
obvious keyword/keyphrases.
4) No method of determining why a particular message was caught by
the filter - making it impossible to subsequently, intelligently tune the
filter for optimal spam recognition.
5) Blind "training" and retraining of the Bayesian filter usually produces
unpredictable results and often negatively impacts the filter's ability
to correctly identify future spam.
Example
• A certain disease is fatal 40% of the time
• 45% of those cured took radiation
• 20% of the people who did not survive took radiation.
• Let A: cured, B: took radiation
• Want to find P(A|B)
• P(A) = .60 P(Ac) = .4
• P(B|A) = .45
• P(B|Ac) = .2
• P(A|B) = .45*.6/(.45*.6 + .2*.4) = .27/.35 = .7714
Example
• For a particular year, forty-five of seventy-four athletes graduated, so P(G|A) = 45/74 ≈ .61
• Roughly 15% of all students are athletes.
• Suppose the graduation rate for the university is
45%.
• At graduation, if you meet someone, what is the
probability he/she is an athlete?
P(A|G) = P(G|A)*P(A)/P(G)
= .61*.15/.45 = .20
Deductive Reasoning
Consider the propositions: A = (The Sprinklers are on), B = (The Grass is wet)
Major premise: If A is TRUE, then B is TRUE
Minor premise: A is TRUE
Conclusion: Therefore, B is TRUE
Major premise: If A is TRUE, then B is TRUE
Minor premise: B is FALSE
Conclusion: Therefore, A is FALSE
Aristotle, ~ 350 BC
Deductive Reasoning - ii
Consider the propositions: A = (The Sprinklers are on), B = (The Grass is wet)
Major premise: If A is TRUE, then B is TRUE
Minor premise: A is FALSE
Conclusion: Therefore, B is ?
Major premise: If A is TRUE, then B is TRUE
Minor premise: B is TRUE
Conclusion: Therefore, A is ?
Inductive Reasoning
Consider the propositions: A = (The Sprinklers are on), B = (The Grass is wet)
Major premise: If A is TRUE, then B is TRUE
Minor premise: B is TRUE
Conclusion: Therefore, A is more plausible
Major premise: If A is TRUE, then B is TRUE
Minor premise: A is FALSE
Conclusion: Therefore, B is less plausible
Yes!
Bayes(1763), Laplace(1774), Boole(1854),
Jeffreys(1939), Cox(1946), Polya(1946),
Jaynes(1957)
In 1946, the physicist Richard Cox showed that inductive
reasoning follows rules that are isomorphic to those of
probability theory
Probability
[Figure: Venn diagram of A and B with overlap AB]
Conditional Probability
P(A | B) = P(AB) / P(B)
P(B | A) = P(AB) / P(A)
A theorem
P(A ∪ B) = P(A) + P(B) − P(AB)
Probability - ii
Product Rule: P(AB) = P(B | A) P(A) = P(A | B) P(B)
Sum Rule: P(A | B) + P(not A | B) = 1
Bayes’ Theorem: P(B | A) = P(A | B) P(B) / P(A)
These rules together with Boolean algebra
are the foundation of Bayesian Probability Theory
Bayes’ Theorem
P(Ci Dj | A) = P(A | Ci Dj) P(Ci Dj) / P(A)
If the Ci Dj are exhaustive propositions, i.e., Σ (over i, j) P(Ci Dj | A) = 1,
then we can write Bayes' Theorem as
P(Ci Dj | A) = P(A | Ci Dj) P(Ci Dj) / Σ (over i, j) P(A | Ci Dj) P(Ci Dj)
Marginalization: we can sum over propositions that are of no interest
P(Ci | A) = Σ (over j) P(Ci Dj | A)
Bayes’ Theorem: Example 1
• Signal/Background Discrimination
– S = Signal
– B = Background
P(S | Data) = P(Data | S) P(S) / [ P(Data | S) P(S) + P(Data | B) P(B) ]
• The probability P(S|Data), of an event being a
signal given some event Data, can be
approximated in several ways, for example, with
a feed-forward neural network
Black and blue taxis
• Consider the witness problem in law courts. Witness reports are
notoriously unreliable, which does not stop people being locked
away on the basis of little more.
• Consider a commonly cited scenario.
• A town has two taxi companies, one runs blue taxi-cabs and the
other uses black taxi-cabs. It is known that Blue Company has 15
taxis and the Black Cab Company has 85 vehicles. Late one night,
there is a hit-and-run accident involving a taxi. It is assumed that all
100 taxis were on the streets at the time.
• A witness sees the accident and claims that a blue taxi was
involved. At the request of the defence, the witness undergoes a
vision test under conditions similar to those on the night in question.
Presented repeatedly with a blue taxi and a black taxi, in ‘random’
order, the witness shows he can successfully identify the colour of
the taxi 4 times out of 5 (80% of the time). The rest or 1/5 of the
time, he misidentifies a blue taxi as black or a black taxi as blue.
• Bayesian probability theory asks the following question, “If the
witness reports seeing a blue taxi, how likely is it that he has the
colour correct?”
• As the witness is correct 80% of the time (that is, 4 times in 5), he is
also incorrect 1 time in 5, on average.
• For the 15 blue taxis, he would (correctly) identify 80% of them as
being blue, namely 12, and misidentify the other 3 blue taxis as
being black.
• For the 85 black taxis, he would also incorrectly identify 20% of
them as being blue, namely 17.
• Thus, in all, he would have misidentified the colour of 20 of the taxis.
Also, he would have called 29 of the taxis blue where there are only
15 blue taxis in the town!
• In the situation in question, the witness is telling us that the taxi was blue.
• But he would have identified 29 of the taxis as being blue. That is, he has
called 12 blue taxis ‘blue’, and 17 black taxis he has also called ‘blue’.
• Therefore, in the test the witness has said that 29 taxis are blue and only
been correct 12 times!
• Thus, the probability that a taxi the witness claimed to be blue actually
is blue, given the witness's identification ability, is 12/29, i.e. 0.41.
• When the witness said the taxi was blue, he was therefore incorrect nearly 3
times out of every 5. The test showed the witness to be correct less
than half the time.
• Bayesian probability takes account of the real distribution of taxis in the
town. It takes account, not just of the ability of a witness to identify blue taxis
correctly (80%), but also the witness’s ability to identify the colour of blue
taxis among all the taxis in town. In other words, Bayesian probability takes
account of the witness’s propensity to misidentify black taxis as well. In the
trade, these are called ‘false positives’.
• The ‘false negatives’ were the blue taxis that the witness
misidentified as black. Bayesian probability statistics
(BPS) becomes most important when attempting to
calculate comparatively small risks. BPS becomes
important in situations where distributions are not
random, as in this case where there were far more black
taxis than blue ones.
• Had the witness called the offending taxi as black, the
calculation would have been {the 68 taxis the witness
correctly named as black} over {the 71 taxis the witness
thought were black}. That is, 68/71 (the difference being
the 3 blue taxis the witness thought were black); or
nearly 96% of the time, when the witness thought the
taxi was black, it was indeed black.
• Unfortunately, most people untrained in the analysis of
probability tend to intuit, from the 80% accuracy of the
witness, that the witness can identify blue cars among
many others with an 80% rate of accuracy.
• I hope the example above will convince you that this is a
very unsafe belief.
• Thus, in a court trial, it is not the ability of the person to
identify a person among 8 (with a 1/8th, or 12.5%,
chance of guessing ‘right’ by luck!) in a pre-arranged line
up that matters, but their ability to recognise them in a
crowded street or a darkened alleyway in conditions of
stress.
Testing for rare conditions
• Virtually every lab-conducted test involves sources of error. Test
samples can be contaminated, or one sample can be confused with
another. The report on a test you receive from your doctor just may
belong to someone else, or be sloppily performed. When the
supposed results are bad, such tests can produce fear. But let us
assume the laboratory has done its work well, and the medic is not
currently drunk and incapable.
• The problem of false positives is still a considerable difficulty.
Virtually every medical test designed to detect a disease or medical
condition has a built-in margin of error. The margin of error size
varies from one test procedure to another, but it is often in the range
of 1-5%, although sometimes it can be much greater than this. Error
here means that the test will sometimes indicate the presence of the
disease, even when there is no disease present.
• Suppose a lab is using a test for a rare condition, a test that has a
2% false-positive rate. This means that the test will indicate the
disease in 2% of people who do not have the condition.
• Among 1,000 people tested for the disease who do not have it, the test
will suggest that about 20 persons do have it. If, as we are
supposing, the disease is rare (say it occurs in 0.1% of the
population, 1 in 1000), it follows that the majority (here, 95%, 19 in
20) of the people whom the tests report to have the disease will be
misdiagnosed!
• Consider a concrete example. Suppose that a woman (let us
suppose her to be a white female, who has not recently had a blood
transfusion and who does not take drugs and doesn’t have sex with
intravenous drug users or bisexuals) goes to her doctor and
requests an HIV test. Given her demographic profile, her risk of
being HIV-positive is about 1 in 100,000. Even if the HIV test was so
good that it had a false-positive rate as low as 0.1% (and it is
nothing like that good), this means that approximately 100 women
among 100,000 similar women will test positive for HIV, even though
only one of them is actually infected with HIV.
• When considering both the traumatising effects of such
reports on people and the effects on future insurability,
employability and the like, it becomes clear that the
false-positive problem is much more than just an
interesting technical flaw.
• If your medic ever reports that you tested positive for
some rare disorder, you should be extremely skeptical.
There is a considerable likelihood the diagnosis itself is
mistaken. Knowing this, intelligent physicians are very
careful in their use of test results and in their subsequent
discussion with patients. But not all doctors have the
time or the ability to treat test results with the skepticism
that they often deserve.
• In general:
• The more rare a condition and the less precise the test
(or judgement), then the more likely (frequent) the error.
• Consider the HIV test above. Many such tests are wrong
5%, or more, of the time. Remember that the real risk for
our heterosexual white woman was around 1 in 100,000,
but the test would indicate positive for 5000 of every
100,000 tested! Thus, if applied to a low risk group like
white heterosexual females (who did not inject drugs,
and did not have sex with a member of a high-risk group
like bisexuals, or haemophiliacs, or drug injectors) then
the HIV test would be incorrect 4999 times out of 5000!
• In general, if the risk were even less and the test method
still had a 5% error rate, the share of positive results that are
false would be even greater. That share would also
increase if the test accuracy were lower.
```
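
The "look forward and reason back" procedure from the Sequential Games slides can be sketched in a few lines. This is only an illustration, not the lecturer's code: the `Decision` class and the `rollback` function are hypothetical names, and payoff pairs are assumed to be ordered (entrant, monopolist) as in the entry game above.

```python
# Minimal backward-induction (rollback) sketch for the entry game.
# Payoff tuples are assumed to be ordered (entrant, monopolist).

ENTRANT, MONOPOLIST = 0, 1


class Decision:
    """A decision node: `player` picks one of the labelled `moves`."""

    def __init__(self, player, moves):
        self.player = player
        self.moves = moves  # dict: action label -> Decision or payoff tuple


def rollback(node):
    """Return (payoffs, actions) reached when every mover best-responds."""
    if not isinstance(node, Decision):  # a leaf is just a payoff tuple
        return node, []
    best = None
    for action, child in node.moves.items():
        payoffs, actions = rollback(child)
        # Keep the move with the highest payoff for the player moving here
        # (ties keep the first move listed).
        if best is None or payoffs[node.player] > best[0][node.player]:
            best = (payoffs, [action] + actions)
    return best


entry_game = Decision(ENTRANT, {
    "Out": (0, 100),
    "In": Decision(MONOPOLIST, {
        "Accommodate": (50, 50),
        "Fight": (-50, -50),
    }),
})

payoffs, actions = rollback(entry_game)
print(actions, payoffs)  # ['In', 'Accommodate'] (50, 50)
```

The threat to fight never appears in the result because the monopolist's own payoff from fighting (-50) is below its payoff from accommodating (50) once entry has happened.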
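
The agenda-setting point in the Voting slides can be checked the same way. The sketch below assumes three voters with the rankings listed there and compares sincere voting with voters who look forward to the final round; the function names are hypothetical and chosen only for this illustration.

```python
# Two-round agenda: `first` vs `second`, then the winner vs `last`.
# Rankings are the three preference orders from the slides (best first).
RANKINGS = [["B", "G", "R"], ["G", "R", "B"], ["R", "B", "G"]]


def majority_winner(x, y, rankings=RANKINGS):
    """Return whichever of x, y a strict majority of voters ranks higher."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return x if votes_for_x > len(rankings) / 2 else y


def sincere_outcome(first, second, last):
    """Everyone votes their true preference in both rounds."""
    return majority_winner(majority_winner(first, second), last)


def sophisticated_outcome(first, second, last):
    """Voters look forward to the final round and reason back."""
    if_first_wins = majority_winner(first, last)
    if_second_wins = majority_winner(second, last)
    if if_first_wins == if_second_wins:
        return if_first_wins
    # Round one is really a choice between the two final outcomes.
    return majority_winner(if_first_wins, if_second_wins)


print(sincere_outcome("B", "G", "R"))        # R
print(sophisticated_outcome("B", "G", "R"))  # G
print(sophisticated_outcome("R", "G", "B"))  # R
```

Under these assumptions, holding R until last only works against sincere voters; against forward-looking voters the agenda "R vs. G, then winner versus B" is the one that elects R.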
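
The arithmetic in the Medical Example can be reproduced with a small helper. This is a sketch under the numbers given in the transcript (prior 0.002, true-positive rate 0.97, false-positive rate 0.04); the function name `posterior_given_positive` is an assumption made for this example.

```python
def posterior_given_positive(prior, p_pos_if_present, p_pos_if_absent):
    """Return (P(condition | +test), P(+test)) using Bayes' theorem."""
    # Sum rule: total probability of a positive test.
    p_pos = p_pos_if_present * prior + p_pos_if_absent * (1 - prior)
    # Bayes' theorem: likelihood * prior / evidence.
    return p_pos_if_present * prior / p_pos, p_pos


p_disease_given_pos, p_pos = posterior_given_positive(0.002, 0.97, 0.04)
print(round(p_pos, 5))                    # 0.04186
print(round(p_disease_given_pos, 3))      # 0.046
print(round(1 - p_disease_given_pos, 3))  # 0.954
```

Even with a 97% detection rate, a positive result leaves the probability of disease under 5% here, because the prior is so small relative to the false-positive rate.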
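
The counting argument in the black and blue taxi example can also be written out directly. The sketch below assumes, as the transcript does, that the witness is equally accurate for both colours; `witness_table` and its dictionary keys are hypothetical names, and `Fraction` is used only to keep the counts exact.

```python
from fractions import Fraction


def witness_table(n_blue, n_black, accuracy):
    """Count how a witness with the given accuracy would label every taxi."""
    blue_called_blue = n_blue * accuracy
    blue_called_black = n_blue - blue_called_blue
    black_called_black = n_black * accuracy
    black_called_blue = n_black - black_called_black
    called_blue = blue_called_blue + black_called_blue
    called_black = black_called_black + blue_called_black
    return {
        "called_blue": called_blue,
        "called_black": called_black,
        "p_blue_given_called_blue": blue_called_blue / called_blue,
        "p_black_given_called_black": black_called_black / called_black,
    }


table = witness_table(15, 85, Fraction(4, 5))
print(table["called_blue"], table["called_black"])          # 29 71
print(round(float(table["p_blue_given_called_blue"]), 2))   # 0.41
print(round(float(table["p_black_given_called_black"]), 2)) # 0.96
```

The same 80% accuracy yields 12/29 when the witness says "blue" and 68/71 when the witness says "black", which is the asymmetry the example attributes to the unequal fleet sizes.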