Decision Rules

Decision Theory
In the final part of the course we’ve been
studying decision theory, the science of how to
make rational decisions.
So far we’ve been concerned with decisions
under ignorance: when we don’t know with any
certainty what the probabilities of the states
are. We’ll continue with this today.
Problem Specification
Solving a decision problem begins with a
problem specification, breaking down the
problem into three components:
1. Acts: the various (relevant) actions you can
take in the situation.
2. States: the different ways that things might
turn out (coin lands heads, coin lands tails).
3. Outcomes: what results from the various
acts in the different states.
Homework 11
In HW 11, I asked you to
1. Describe a decision you yourself have had to
make recently.
2. Conduct a problem specification: analyze the
decision into acts, states, and outcomes.
3. Make a decision table.
1. Recent Decision
Recently, I had to decide whether I should go
with some friends of mine to Clockenflap, the
HK music festival.
2. Problem Specification
Acts: I could either buy tickets and go to
Clockenflap, or not buy tickets and stay home to
work.
States: the music might be good, and my friends
might be good to hang out with, but also the
music might be terrible, and my friends might be
annoying (dancing awkwardly, doing stupid
drunken things).
2. Problem Specification
Outcome 1: Enjoy music, enjoy socializing, -$300
Outcome 2: Enjoy music, don’t enjoy socializing,
-$300
Outcome 3: Don’t enjoy music, do enjoy
socializing, -$300
Outcome 4: Don’t enjoy either music or
socializing, -$300
Outcome 5: More time for work, save money
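As a side note (my own sketch, not part of the lecture), the problem specification above could be encoded in Python roughly as follows; the variable names acts, states, and outcomes are just illustrative choices:

```python
# A minimal sketch of a problem specification: acts, states, and an
# outcomes table mapping (act, state) pairs to outcome descriptions.
acts = ["go to Clockenflap", "stay home"]
states = ["good music, good friends", "good music, bad friends",
          "bad music, good friends", "bad music, bad friends"]

outcomes = {
    ("go to Clockenflap", "good music, good friends"): "enjoy music, enjoy friends, -$300",
    ("go to Clockenflap", "good music, bad friends"):  "enjoy music, -$300",
    ("go to Clockenflap", "bad music, good friends"):  "enjoy friends, -$300",
    ("go to Clockenflap", "bad music, bad friends"):   "-$300",
}
# Staying home leads to the same outcome in every state.
for state in states:
    outcomes[("stay home", state)] = "more time for work, save money"

for (act, state), outcome in outcomes.items():
    print(f"{act} / {state}: {outcome}")
```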
3. Decision Table
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | Enjoy music, enjoy friends, lose $300 | Enjoy music, lose $300 | Enjoy friends, lose $300 | Lose $300
Stay home | More time and money | More time and money | More time and money | More time and money
Utilities
Last time we learned that when people have
rational preferences regarding the outcomes in a
decision table, we can order indifference classes
of those outcomes. Then we can replace the
outcomes with numbers that represent them
(called utilities) so that our decision tables will
reflect not only the problem specification, but
also people’s preferences.
Preferences
Let’s suppose that I don’t care that much about
the music. If I have a fun time with my friends,
that’s all that’s really important to me.
Preferences
• I’m indifferent between: [Good music + good
friends] and [Bad music + good friends]
• I prefer [Bad music + good friends] to [work at
home]
• I prefer [work at home] to [Good music + bad
friends]
• I’m indifferent between [Good music + bad
friends] and [bad music + bad friends]
Indifference Classes
3. [Good music + good friends], [Bad music +
good friends]
2. [work at home]
1. [Good music + bad friends], [bad music + bad
friends]
Decision Table + Utilities
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | 3 | 1 | 3 | 1
Stay home | 2 | 2 | 2 | 2
Maximin
We also learned about the maximin principle.
According to the maximin principle, the way to
solve a decision under ignorance problem is to
find the worst possible outcome of each act, and
then choose the act that has the best worst
outcome: the act that maximizes the minimum
value of the outcome.
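As an aside (a sketch of mine, not something from the lecture), the maximin calculation is easy to write down in Python; the utilities are the Clockenflap ones from the table that follows, and the function name maximin is just an illustrative choice:

```python
# A sketch of the maximin principle: for each act, find its worst (minimum)
# utility across the states, then choose the act whose worst case is best.
utilities = {
    "go to Clockenflap": [3, 1, 3, 1],  # one utility per state
    "stay home":         [2, 2, 2, 2],
}

def maximin(table):
    # The act that maximizes the minimum value of the outcome.
    return max(table, key=lambda act: min(table[act]))

print(maximin(utilities))  # "stay home" (worst case 2 beats worst case 1)
```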
Maximin Principle
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | 3 | 1* | 3 | 1*
Stay home** | 2* | 2* | 2* | 2*
(* marks the worst outcome for each act; ** marks the act the maximin principle recommends.)
Problems with Maximin
We criticized the maximin principle on the
following grounds: it’s too conservative.
Sometimes it’s worth risking a little more if the
rewards are very great.
Problems with Maximin
Suppose that I really value fun times with my
friends. If I had to put a price tag on it, I’d pay
$5000 out of my own pocket to take them out to
dinner and drinks and have a good time. I also
value extra time to work, but I value an extra
day of working at only about $100. Even when
my friends are annoying, I value their company
at, let’s say, $250.
Maximin Principle
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $4,700 | -$50* | $4,700 | -$50*
Stay home** | $100* | $100* | $100* | $100*
(* marks the worst outcome for each act; ** marks the act the maximin principle recommends.)
Problems with Maximin
Here, the maximin principle says that because
the worst possible outcome of attending
Clockenflap is that I lose $50 ($250 for how
much I value my annoying friends’ company,
minus $300 for the price of a ticket), I should
take the conservative option and stay home.
This is so even though the possible rewards of
attending the festival are much, much higher
than the value of staying home: $100 << $4,700.
Missed Opportunities
The problems with the maximin principle
seem to stem from the fact that it focuses on
avoiding the worst possible outcomes, rather
than avoiding the worst missed opportunities.
When I stay at home instead of going to
Clockenflap, I miss out on the opportunity to
have a great time with my friends (worth
$5,000 to me, minus the ticket price).
Regret
We can measure the amount of a missed
opportunity in terms of regret. If I choose the
act “stay home” instead of the act “go to
Clockenflap” when the state is “good friends,” I
get a value worth $100 to me, but miss out on a
value worth $4,700 to me. The regret I feel is
the $4,700 – $100 = $4,600 of value, over and
above the $100 I actually received, that I missed
out on by making that choice.
However, if I choose to stay at home when the
state is “bad friends,” my regret is $0: I couldn’t
have benefited at all by choosing a different act.
Let’s consider a new example.
Decision Table
   | S1 | S2 | S3
A1 | 5 | -2 | 10
A2 | -1 | -1 | 20
A3 | -3 | -1 | 5
A4 | 0 | -4 | 1
Regret Numbers
We can calculate a regret number R
corresponding to each utility number U in the
table with the following equation:
R = MAX – U
where MAX is the maximum value in U’s column
(the best utility achievable in that state).
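As a rough sketch (mine, not the lecture’s), here is how the regret numbers R = MAX – U could be computed for the decision table above; the variable names are illustrative:

```python
# A sketch of computing regret numbers: R = MAX - U, where MAX is the best
# utility achievable in that state (the maximum of the state's column).
utilities = {              # rows = acts, columns = states S1, S2, S3
    "A1": [5, -2, 10],
    "A2": [-1, -1, 20],
    "A3": [-3, -1, 5],
    "A4": [0, -4, 1],
}

# Best utility achievable in each state (the column maxima).
col_max = [max(row[i] for row in utilities.values()) for i in range(3)]

# Regret table: how far each act falls short of the best act in each state.
regrets = {act: [col_max[i] - row[i] for i in range(3)]
           for act, row in utilities.items()}

print(regrets)
# {'A1': [0, 1, 10], 'A2': [6, 0, 0], 'A3': [8, 0, 15], 'A4': [5, 3, 19]}
```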
Regret Table
   | S1 | S2 | S3
A1 | 5 – 5 = 0 | -1 – (-2) = 1 | 20 – 10 = 10
A2 | 5 – (-1) = 6 | -1 – (-1) = 0 | 20 – 20 = 0
A3 | 5 – (-3) = 8 | -1 – (-1) = 0 | 20 – 5 = 15
A4 | 5 – 0 = 5 | -1 – (-4) = 3 | 20 – 1 = 19
Regret Table
   | S1 | S2 | S3
A1 | 0 | 1 | 10
A2 | 6 | 0 | 0
A3 | 8 | 0 | 15
A4 | 5 | 3 | 19
Minimax Regret Rule
The minimax regret rule tells us to minimize the
maximum amount of regret. That is, since we
have three states, each act will result in three
possible amounts of regret. The highest of
these numbers is the maximum regret of the
act: it’s the maximum amount of “missed
opportunity” you could feel if you took that act
and the state didn’t go your way. Minimax regret
says to take the act with the smallest (minimum)
maximum regret.
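Continuing the sketch from above (again mine, not the lecture’s), the minimax regret rule then just picks the act whose maximum regret is smallest:

```python
# A sketch of the minimax regret rule: take the act with the smallest
# (minimum) maximum regret.
regrets = {                # the regret table computed earlier (R = MAX - U)
    "A1": [0, 1, 10],
    "A2": [6, 0, 0],
    "A3": [8, 0, 15],
    "A4": [5, 3, 19],
}

def minimax_regret(regret_table):
    return min(regret_table, key=lambda act: max(regret_table[act]))

print(minimax_regret(regrets))  # "A2" (maximum regret 6, vs. 10, 15, and 19)
```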
Minimax Regret Rule
     | S1 | S2 | S3
A1   | 0 | 1 | 10*
A2** | 6* | 0 | 0
A3   | 8 | 0 | 15*
A4   | 5 | 3 | 19*
(* marks the maximum regret for each act; ** marks the act the minimax regret rule recommends.)
Minimax Regret: Clockenflap
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $4,700 | -$50 | $4,700 | -$50
Stay home | $100 | $100 | $100 | $100
Regret Table
                  | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $0 | $150 | $0 | $150
Stay home | $4,600 | $0 | $4,600 | $0
Minimax Regret Rule
                    | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap** | $0 | $150* | $0 | $150*
Stay home | $4,600* | $0 | $4,600* | $0
(* marks the maximum regret for each act; ** marks the act the minimax regret rule recommends.)
Clockenflap
So the minimax regret rule gets the intuitively
correct result: I should go to the music festival,
because I value good times with friends so
much that not going would be a waste of a
tremendous opportunity.
Problems with Minimax Regret Rule
There are still problems with the minimax regret
rule.
One of them is this: sometimes it doesn’t
capture what we naturally think of as “amount
of missed opportunity.”
Example
   | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9
A1 | $0 | $99 | $99 | $99 | $99 | $99 | $99 | $99 | $99
A2 | $100 | $0 | $0 | $0 | $0 | $0 | $0 | $0 | $0
Incorrect Recommendation
Here the maximum regret for action A1 is
$100 (if state S1 obtains) and the maximum
regret for A2 is $99 (if any of the states S2–S9
obtains). So the minimax regret rule tells us to
select action A2, which has the minimum
maximum regret. But most people faced with
this choice would pick A1.
Maximax Rule
Let’s try a different strategy. Our original
maximin rule was problematic because it was
too conservative and pessimistic. It only looked
at the worst case scenarios, and said you should
pick the action with the best worst case
scenario.
What about a maximax rule? This rule would
say: look at the best possible outcomes for each
of your actions, and choose the one that has the
best best possible outcome.
Unfortunately, this rule is not appealing to
anyone.
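For comparison (my sketch, not the lecture’s), maximax is just maximin with the minimum swapped for the maximum; the dollar figures below are made-up stand-ins for the outcomes in the table that follows:

```python
# A sketch of the maximax rule: for each act, look only at its best possible
# outcome, and choose the act with the best best case.
payoffs = {
    "invest in start-up":  [200_000_000, -500_000],  # assumed illustrative numbers
    "conservative stocks": [1_000_000, 1_000_000],
}

def maximax(table):
    return max(table, key=lambda act: max(table[act]))

print(maximax(payoffs))  # "invest in start-up" -- risk it all for the big payoff
```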
Maximax Rule
 | S1: The start-up is wildly successful. | S2: The start-up fails when Google engineers find a way to do everything it does, but better.
A1: Invest your life savings in a promising, but unproven start-up company.** | You make hundreds of millions of dollars.* | You lose your life savings.
A2: Play it safe, and invest in a conservative stock portfolio with a modest, but guaranteed payout. | You pay for your retirement.* | You pay for your retirement.*
(* marks the best outcome for each act; ** marks the act the maximax rule recommends.)
The Optimism-Pessimism Rule
The maximax rule is problematic because it
suggests that you should always “risk it all” for
the chance of big payoffs. Most human beings
are conservative, and want a modest sure thing
rather than an extravagant long shot.
Maybe the best rule is one that balances both
optimism and pessimism.
Optimism vs. Pessimism
The idea is that we ask each person how much
they care about the best possible outcome vs.
the worst possible outcome.
Maybe 20% of your concern is directed at the
best possible outcome, and 80% at the worst. Or
maybe you’re split 50-50 and care about them
equally. Or maybe you’re a risk taker who cares
primarily about big payoffs.
Optimism Index
Let’s have a number O for your optimism index.
It’s just how much you care about the best
possible outcome.
We don’t need a special number for your
pessimism index, because it will clearly be (1 –
O). For example, if you care 20% about the best
outcome, you’ll care (1 – 20%) = 80% about the
worst outcome.
Optimism-Pessimism Numbers
Thus we can calculate a new number, the
optimism-pessimism number (OPN), for each
act. It is the best outcome for that act
“weighted” by how much you care about the
best outcome, plus the worst outcome for that
act weighted by how much you care about the
worst outcome:
OPN = [O x MAX] + [(1 – O) x MIN]
Decision Table
   | S1 | S2 | S3
A1 | 10 | 4 | 0
A2 | 2 | 6 | 6
OPN for A1 (O = 50%)
If O = 50%, then the OPN for A1 is:
OPN for A1
= [50% x MAX] + [(1 – 50%) x MIN]
= [50% x MAX] + [50% x MIN]
= [50% x 10] + [50% x 0]
= 5
OPN for A2 (O = 50%)
OPN for A2
= [50% x MAX] + [(1 – 50%) x MIN]
= [50% x MAX] + [50% x MIN]
= [50% x 6] + [50% x 2]
= 4
Since 5 > 4, the optimism-pessimism rule
recommends A1 (when O = 50%).
Recommendation Depends on O
However, the recommendation changes if O is
set to 20%, instead:
OPN for A1 = 20% x 10 + 80% x 0 = 2
OPN for A2 = 20% x 6 + 80% x 2 = 2.8
So if you are more pessimistic, you should
choose action A2.
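The same calculations can be checked with a short sketch (mine, not the lecture’s):

```python
# A sketch of the optimism-pessimism rule: OPN = O * MAX + (1 - O) * MIN,
# where O is the optimism index and MAX/MIN are an act's best/worst utilities.
utilities = {"A1": [10, 4, 0], "A2": [2, 6, 6]}

def opn(row, o):
    return o * max(row) + (1 - o) * min(row)

for o in (0.5, 0.2):
    scores = {act: opn(row, o) for act, row in utilities.items()}
    best = max(scores, key=scores.get)
    print(f"O = {o}: {scores} -> choose {best}")
# At O = 0.5, A1 scores 5.0 and A2 scores 4.0, so A1 wins.
# At O = 0.2, A1 scores 2.0 and A2 scores about 2.8, so A2 wins.
```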
Problems with the O-P Rule
Is the optimism-pessimism rule the right one?
There are two main issues with it. First, decision
theory is supposed to be a guide to how to act
rationally. But “act this way if you’re pessimistic,
act this other way if you’re optimistic” isn’t
much of a guide: it doesn’t tell us what to do!
Problems with the O-P Rule
Second, if the optimism-pessimism rule is
correct, people can make excuses for their bad
decisions by saying things like “this decision was
actually very good; I was super-optimistic when I
made it” or “this decision is good; I was very
pessimistic when I made it.” Whether a decision
is good or bad doesn’t seem to depend on the
optimism of the person making it.
Next Time
Next time we’ll consider one more rule, look at a
philosophical problem, and briefly overview
decisions under risk.