Bayesian Decision Theory and Multi-choice Decision


Sequential Hypothesis Testing
under Stochastic Deadlines
Peter Frazier, Angela Yu
Princeton University
Summary
• We consider the sequential hypothesis
testing problem and generalize the
sequential probability ratio test (SPRT)
to the case with stochastic deadlines.
• This causes reaction times for correct
responses to be faster than for errors,
as seen in behavioral studies.
• Both decreasing the deadline’s mean
and increasing its variance cause
more response urgency.
• Results extend to the general case
with convex continuation cost.
1. Sequential Probability Ratio Test
Sequential Hypothesis Testing
[Figure: decision sequence in which, at each step, the subject chooses A, chooses B, or waits.]
At each time, the subject decides whether to
act (A or B), or collect more information. This
requires balancing speed vs. accuracy.
• We observe a sequence of i.i.d. samples
x1,x2,... from some density.
• The underlying density is unknown, but is
known to equal either f0 or f1.
• We begin with a prior belief about whether
f0 or f1 is the true density, which we update
through time based on the samples.
• We want to maximize accuracy
• Let θ be the index of the true distribution.
• Let p0 be the initial belief, P{θ=1}.
• Let pt := P{θ=1 | x1,...,xt}.
• Let c be a cost paid per-sample.
• Let d be a cost paid to violate the deadline
(used later).
• Let τ be the time-index of the last sample
collected.
• Let δ be the guessed hypothesis.
Posterior probabilities may be
calculated via Bayes' Rule:
\[ p_t = \frac{f_1(x_t)\, p_{t-1}}{f_1(x_t)\, p_{t-1} + f_0(x_t)\,(1 - p_{t-1})} \]
[Figure: sample path of the posterior; axes: Time (t), Probability (pt).]
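The Bayes update can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the two unit-variance Gaussian densities with means -0.5 and +0.5, and the names `normal_pdf` and `update_posterior`, are assumptions chosen for the example.

```python
import math

def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution (illustrative choice for f0 and f1)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def update_posterior(p_prev, x, f0, f1):
    """One Bayes step: p_prev = P{theta=1 | x_1..x_{t-1}} -> P{theta=1 | x_1..x_t}."""
    num = f1(x) * p_prev
    den = num + f0(x) * (1.0 - p_prev)
    return num / den

# Hypothetical densities: f0 = N(-0.5, 1), f1 = N(+0.5, 1).
def f0(x):
    return normal_pdf(x, -0.5)

def f1(x):
    return normal_pdf(x, 0.5)

p = 0.5  # initial belief p0
for x in [0.3, 1.1, -0.2, 0.8]:  # made-up samples
    p = update_posterior(p, x, f0, f1)
    print(f"x = {x:+.1f}  ->  p = {p:.3f}")
```

A sample that is equally likely under both densities (here x = 0) leaves the belief unchanged, while samples closer to the mean of f1 push pt toward 1.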
Objective Function
The objective function is:
\[ \inf_{\tau,\delta}\; P\{\delta \neq \theta\} + c\,\langle \tau \rangle \]
The first term is the probability of error; the second is the time delay penalty.
where we require that the decisions τ and δ
are “non-anticipative”, that is, whether τ ≤ t is
entirely determined by the samples x1,...,xt,
and δ is entirely determined by the samples
x1,...,xτ.
Optimal Policy (SPRT)
Wald & Wolfowitz (1948) showed that the
optimal policy is to stop as soon as pt exits an
interval [A,B], and to choose the hypothesis
that appears more likely at this time.
[Figure: posterior path stopped at time τ, when it leaves the band between thresholds A and B; axes: Time (t), Probability (pt).]
This policy is called the Sequential Probability
Ratio Test or SPRT.
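A minimal simulation sketch of this stopping rule follows. The Gaussian observation model (means ±mu, unit variance), the threshold values 0.1 and 0.9, and the function name `run_sprt` are assumptions made for illustration; they are not taken from the poster.

```python
import math
import random

def run_sprt(theta, lower=0.1, upper=0.9, p0=0.5, mu=0.5, rng=random):
    """Sample until the posterior p_t leaves [lower, upper].

    theta is the true hypothesis (0 or 1). Observations are N(+mu, 1) under
    f1 and N(-mu, 1) under f0 (an assumed model). Returns (tau, delta).
    """
    p, t = p0, 0
    while lower <= p <= upper:
        x = rng.gauss(mu if theta == 1 else -mu, 1.0)
        lr = math.exp(2.0 * mu * x)           # f1(x) / f0(x) for this model
        p = lr * p / (lr * p + (1.0 - p))     # Bayes update of P{theta=1}
        t += 1
    return t, (1 if p > upper else 0)         # guess the more likely hypothesis

rng = random.Random(0)
trials = [(theta, *run_sprt(theta, rng=rng)) for theta in (0, 1) for _ in range(1000)]
accuracy = sum(theta == delta for theta, _, delta in trials) / len(trials)
mean_rt = sum(tau for _, tau, _ in trials) / len(trials)
print(f"accuracy = {accuracy:.3f}, mean number of samples = {mean_rt:.1f}")
```

Moving the two thresholds further apart raises accuracy at the price of more samples, which is the speed vs. accuracy balance described earlier.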
2. Models for Behavior
• A classic sequential hypothesis testing task is
detecting coherent motion in random dots.
• One hypothesis is that monkeys and people
behave optimally and according to the SPRT.
Broadly speaking, the model based on the
classic SPRT fits experimental behavior well.
[Figure: Accuracy vs. Coherence and Reaction Time vs. Coherence (Roitman & Shadlen, 2002).]
There is one caveat, however…
SPRT fails to predict the difference in response time
distributions between correct and error responses.
• Correct responses are more rapid in experiments.
• SPRT predicts they should be identically distributed.
[Figure: panels Accuracy, Mean RT, and RT Distributions. Data from Roitman & Shadlen, 2002; analysis from Ditterich, 2007.]
3. Generalizing to Stochastic Deadlines
Monkeys occasionally abort trials without
responding, but it is always better to guess than
to abort under the assumed objective function.
(Data from Roitman & Shadlen, 2002)
(Analysis from Ditterich, 2006)
To explain the discrepancy, we hypothesize a
limit on the length of time that monkeys can
fixate the target.
Objective Function
Hypothesizing a decision deadline D leads to a
new objective function:
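\[ \inf_{\tau,\delta}\; P\{\delta \neq \theta,\ \tau < D\} + c\,\langle \tau \rangle + d\,P\{\tau \ge D\} \]
The three terms are, in order, the error penalty, the time penalty, and the deadline penalty.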
We will assume that D has a non-decreasing failure
rate, i.e. P{D=t+1 | D>t} is non-decreasing in t.
This assumption is met by deterministic, normal,
gamma, and exponential deadlines, and others.
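For a discrete-time deadline, the failure-rate condition can be checked directly from the probability mass function. The sketch below is illustrative only; the helper names `hazard` and `is_nondecreasing` and the three example deadlines are assumptions, not part of the poster.

```python
def hazard(pmf):
    """Failure rate h[t] = P{D = t+1 | D > t}, given pmf[k] = P{D = k}, k = 0..K."""
    h = []
    survival = 1.0 - pmf[0]              # P{D > 0}
    for t in range(len(pmf) - 1):
        if survival <= 1e-12:            # deadline has surely passed
            break
        h.append(pmf[t + 1] / survival)
        survival -= pmf[t + 1]           # now P{D > t+1}
    return h

def is_nondecreasing(h, tol=1e-9):
    return all(a <= b + tol for a, b in zip(h, h[1:]))

# Deterministic deadline at t = 4: failure rate jumps from 0 to 1.
deterministic = [0.0, 0.0, 0.0, 0.0, 1.0]
# Geometric deadline (discrete analogue of the exponential): constant failure rate.
q = 0.2
geometric = [0.0] + [q * (1 - q) ** (k - 1) for k in range(1, 31)]
# Two-point deadline at t = 1 or t = 4: failure rate drops, so the assumption fails.
two_point = [0.0, 0.5, 0.0, 0.0, 0.5]

for name, pmf in [("deterministic", deterministic),
                  ("geometric", geometric),
                  ("two-point", two_point)]:
    print(name, is_nondecreasing(hazard(pmf)))
```

The two-point example shows the kind of deadline the assumption rules out: once the early deadline has passed, the chance of an imminent deadline falls.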
Optimal Policy
The resulting optimal policy is to stop as soon
as pt exits a region that narrows with time.
\[ \tau = \inf\{\, t \ge 0 \mid p_t \notin C_t \,\} \]
[Figure: continuation regions of the Generalized SPRT and the Classic SPRT, with the deadline marked; axes: Time (t), Probability (pt).]
Response Times
Under this policy, correct responses are
generally faster than error responses.
[Figure: reaction time distributions for Correct Responses and Error Responses; axes: Reaction Time, Frequency of Occurrence.]
Influence of the Parameters
[Figure: four panels titled Deadline Uncertainty, Deadline Mean, Time Penalty, and Deadline Penalty.]
Plots of the continuation region Ct (blue), and the probability of a
correct response P{δ=θ | τ=t} (red). D was gamma distributed, and
the default settings were c=.001, d=2, mean(D)=40, std(D)=1. In
each plot we varied one parameter while keeping the others fixed.
Theorem: The continuation region at time t
for the optimal policy, Ct, is either empty or a
closed interval, and it shrinks with time
(Ct+1 ⊆ Ct).
Proposition: If P{D<∞} = 1 then there exists a
T < ∞ such that CT = ∅. That is, the optimal
reaction time is bounded above by T.
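The theorem and proposition can be checked numerically in a toy setting by backward induction. The sketch below is not the authors' computation; it rests on several assumptions chosen to keep the dynamic program exact and small: binary observations with P{x=1} = q under f1 and 1-q under f0 (so beliefs live on a lattice indexed by an integer k), a deadline uniform on {1,...,T} and independent of the samples, and a trial that ends with penalty d as soon as the deadline arrives. The names `belief` and `continuation_regions` are hypothetical.

```python
def belief(k, q):
    """P{theta=1} on the lattice: prior odds 1, multiplied by (q/(1-q))**k."""
    r = q / (1.0 - q)
    return 1.0 / (1.0 + r ** (-k))

def continuation_regions(hazard, c, d, q, K=25):
    """Backward induction on cost-to-go, conditional on the deadline not yet passed.

    hazard[t] = P{D = t+1 | D > t}; hazard[-1] must be 1 (bounded deadline).
    Returns, for each t, the (lowest, highest) lattice belief at which
    continuing is strictly cheaper than stopping, or None if C_t is empty.
    """
    ks = range(-K, K + 1)
    stop = {k: min(belief(k, q), 1.0 - belief(k, q)) for k in ks}
    # At |k| = K stopping must already be optimal, so truncating the lattice is harmless.
    assert stop[K] < c, "K too small for this c and q"
    V_next = dict(stop)   # placeholder for time T; gets weight 0 since hazard[-1] == 1
    regions = []
    for t in range(len(hazard) - 1, -1, -1):
        V, inside = {}, []
        for k in ks:
            if abs(k) == K:
                V[k] = stop[k]
                continue
            p = belief(k, q)
            p_one = p * q + (1.0 - p) * (1.0 - q)   # P{next sample = 1 | p}
            future = p_one * V_next[k + 1] + (1.0 - p_one) * V_next[k - 1]
            cont = c + hazard[t] * d + (1.0 - hazard[t]) * future
            V[k] = min(stop[k], cont)
            if cont < stop[k]:
                inside.append(p)
        regions.append((min(inside), max(inside)) if inside else None)
        V_next = V
    regions.reverse()
    return regions

T = 30
uniform_hazard = [1.0 / (T - t) for t in range(T)]   # D uniform on {1, ..., T}
regions = continuation_regions(uniform_hazard, c=0.001, d=2.0, q=0.6)
for t in (0, 10, 20, 25, 29):
    print(t, regions[t])
```

In this toy model the printed intervals narrow as t grows and eventually become empty, in line with the theorem and proposition; the exact numbers depend entirely on the assumed parameters.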
Proof Sketch
Define Q(t,pt) to be the conditional loss given pt of
continuing once from time t and then behaving optimally.
Lemma 1: The continuation cost of the optimal
policy, Q(t,p), is concave as a function of p.
Lemmas 2 and 3: Wasting a time period incurs an
opportunity cost in addition to its immediate cost c.
\[ Q(t-1,\, p_{t-1}) \le Q(t,\, p_t) - c \]
Lemma 4: If we are certain which hypothesis is
correct (p=0 or p=1), then the optimal policy is
to stop as soon as possible. Its value is:
\[ Q(t,0) = Q(t,1) = c\,(t+1) + d\,P\{D = t+1 \mid D > t\} \]
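As a quick consistency check (an added worked step, not taken from the poster), the closed form in Lemma 4 satisfies the inequality of Lemmas 2 and 3 at p = 0 and p = 1. Write h_t for the failure rate P{D = t+1 | D > t}, and assume d ≥ 0:

```latex
\begin{align*}
Q(t,0) - Q(t-1,0)
  &= \bigl[c\,(t+1) + d\,h_t\bigr] - \bigl[c\,t + d\,h_{t-1}\bigr] \\
  &= c + d\,(h_t - h_{t-1}) \;\ge\; c .
\end{align*}
```

The last step uses h_t ≥ h_{t-1}, which is the non-decreasing failure rate assumption. Hence Q(t-1,0) ≤ Q(t,0) - c, and the same argument applies at p = 1.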
[Figure: Expected Loss plotted against p from 0 to 1, with curves Q(t+1,p)-c, Q(t,p), and min(p,1-p), and the regions Ct+1 and Ct marked on the p axis.]
References
1. Anderson, T W (1960). Ann. Math. Statist. 31: 165-97.
2. Bogacz, R et al. (2006). Psychol. Rev. 113: 700-65.
3. Ditterich, J (2006). Neural Netw. 19(8):981-1012.
4. Luce, R D (1986). Response Times: Their Role in Inferring
Elementary Mental Org. Oxford Univ. Press.
5. Mozer et al (2004). Proc. Twenty Sixth Annual Conference of
the Cognitive Science Society. 981-86.
6. Poor, H V (1994). An Introduction to Signal Detection and
Estimation. Springer-Verlag.
7. Ratcliff, R & Rouder, J N (1998). Psychol. Sci. 9: 347-56.
8. Roitman, J D & Shadlen, M N (2002). J. Neurosci. 22: 9475-9489.
9. Siegmund, D (1985). Sequential Analysis. Springer.
10. Wald, A & Wolfowitz, J (1948). Ann. Math. Statist. 19: 326-39.