Repeated Games - University of Minnesota


Repeated Games
APEC 8205: Applied Game Theory
Fall 2007
Objectives
• Understand the Class of Repeated Games
• Understand Conditions Under Which Non-Nash Play Can be Sustained as a
Subgame Perfect Nash Equilibrium when a Game is Repeated
– Multiple Nash Equilibria
– Infinite Repetition
Why study repeated games?
• Many interactions in life are repeated.
– Large retailers compete on a daily basis for customers.
– Dana and I compete on a daily basis to decide who will make dinner and who
will pick up around the house.
– Mason and Spencer compete on a daily basis to see who gets to watch TV and
who gets to play X-Box.
• What is of interest in these types of repeated interactions?
– Can players achieve better results than might occur in a single shot game?
– Can players use the history of play to their advantage?
Some Terminology
• G: Stage game (usually thought of in normal form).
– Players: i = 1,…,N
– a_i ∈ A_i: Strategy space for player i.
– a = (a_1,…,a_N) ∈ A = ×_{i=1}^N A_i: Strategy profile for all players.
– u_i(a): Player i’s payoff for strategy profile a.
– u(a) = (u_1(a),…,u_N(a)): Vector of player payoffs for strategy profile a.
• T: Number of times the stage game is repeated (could be infinite).
• a_i^t ∈ A_i: Player i’s strategy choice at time t.
• a^t = (a_1^t,…,a_N^t) ∈ A = ×_{i=1}^N A_i: Strategy profile for all players at time t.
• h^t = (a^1,…,a^{t-1}) ∈ A^t = ×_{t'=1}^{t-1} A: History of play at time t.
• s_i^t(h^t) ∈ A_i: History dependent strategy.
• s^t(h^t) = (s_1^t(h^t),…,s_N^t(h^t)) ∈ A: History dependent strategy profile.
• U_i(s^1(h^1),…,s^T(h^T)) = Σ_{t=1}^T w_i^t u_i(s^t(h^t)): Player i’s payoff from the game, where w_i^t is the weight player i places on the stage-t payoff.
• U(s^1(h^1),…,s^T(h^T)) = (U_1(s^1(h^1),…,s^T(h^T)),…,U_N(s^1(h^1),…,s^T(h^T))): Payoffs for all players.
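To make this notation concrete, here is a minimal Python sketch (not from the original slides; the grim-trigger strategy and two-period horizon are illustrative assumptions) that computes U_i for a repeated Prisoner's Dilemma with history-dependent strategies and stage weights w_i^t.

    # Minimal sketch: payoffs in a repeated game with history-dependent strategies.
    # The stage game and strategies below are illustrative, not from the slides.

    # Stage-game payoffs u(a) for the Prisoner's Dilemma used later in the lecture.
    u = {
        ("C", "C"): (2, 2), ("C", "D"): (0, 3),
        ("D", "C"): (3, 0), ("D", "D"): (1, 1),
    }

    def grim(history, player):
        """Play C until the opponent has ever played D, then play D."""
        opponent = 1 - player
        return "D" if any(a[opponent] == "D" for a in history) else "C"

    def repeated_payoffs(strategies, T, w=lambda i, t: 1.0):
        """U_i = sum_t w_i^t * u_i(s^t(h^t)) for each player i."""
        history = []                       # h^t = (a^1, ..., a^{t-1})
        U = [0.0, 0.0]
        for t in range(1, T + 1):
            a_t = tuple(strategies[i](history, i) for i in range(2))
            for i in range(2):
                U[i] += w(i, t) * u[a_t][i]
            history.append(a_t)
        return tuple(U)

    print(repeated_payoffs([grim, grim], T=2))   # (4.0, 4.0): both cooperate in both periods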
Consider an Example
                  Player 2
                  C         D
    Player 1  C   2, 2      0, 3
              D   3, 0      1, 1
(Payoffs listed as Player 1’s payoff, Player 2’s payoff.)

Suppose this Prisoner’s Dilemma game is played twice
and that w_i^t = 1 for i = 1,2 and t = 1,2.
Two Period Prisoner’s Dilemma Example
In Extensive Form
[Game tree omitted: Player 1 and Player 2 simultaneously choose C or D in stage 1; after observing the stage-1 outcome, they simultaneously choose C or D again in stage 2. Each terminal node’s payoffs are the sum of the two stage-game payoffs, ranging from (2,2) after (D,D),(D,D) to (4,4) after (C,C),(C,C), with asymmetric outcomes such as (0,6) when Player 1 cooperates and Player 2 defects in both stages.]
Two Period Prisoner’s Dilemma Example
After Solving Stage 2 Subgames
[Reduced game tree omitted: in every stage-2 subgame the unique Nash equilibrium (D,D) is played, adding (1,1) to the stage-1 payoffs. The stage-1 choices therefore lead to total payoffs (3,3) after (C,C), (1,4) after (C,D), (4,1) after (D,C), and (2,2) after (D,D).]
Two Period Prisoner’s Dilemma Example
After Solving Game As Whole
[Reduced game omitted: with these continuation payoffs, D strictly dominates C for both players in stage 1, so both choose D, giving total payoffs (2,2).]
Therefore, the subgame perfect strategies are
(strategy choice in stage 1,
strategy choice in stage 2 given (D,D) in stage 1,
strategy choice in stage 2 given (D,C) in stage 1,
strategy choice in stage 2 given (C,D) in stage 1,
strategy choice in stage 2 given (C,C) in stage 1)
=
(D,D,D,D,D)
for both players.
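As a computational check on this backward-induction argument (an illustrative sketch, not part of the original slides), the code below solves every stage-2 subgame of the twice-repeated Prisoner's Dilemma, folds the continuation payoffs into stage 1, and confirms that (D,D) is played everywhere.

    # Brute-force backward induction for the twice-repeated Prisoner's Dilemma (illustrative sketch).
    from itertools import product

    u = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
         ("D", "C"): (3, 0), ("D", "D"): (1, 1)}
    actions = ["C", "D"]

    def pure_nash(payoffs):
        """Return all pure-strategy Nash equilibria of a 2-player game given as {(a1,a2): (u1,u2)}."""
        eq = []
        for a1, a2 in payoffs:
            best1 = all(payoffs[(a1, a2)][0] >= payoffs[(d, a2)][0] for d in actions)
            best2 = all(payoffs[(a1, a2)][1] >= payoffs[(a1, d)][1] for d in actions)
            if best1 and best2:
                eq.append((a1, a2))
        return eq

    # Stage 2: the unique Nash equilibrium (D,D) is played after every history.
    stage2_eq = {h: pure_nash(u)[0] for h in product(actions, actions)}

    # Stage 1: add the stage-2 continuation payoff to each stage-1 outcome.
    reduced = {h: tuple(u[h][i] + u[stage2_eq[h]][i] for i in range(2)) for h in stage2_eq}

    print(pure_nash(reduced))   # [('D', 'D')]: defect in stage 1 (and in every stage-2 subgame)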
So, what is the point?
• If the stage game of a finitely repeated game has a unique Nash
equilibrium, then there is a unique subgame perfect equilibrium where that
Nash equilibrium is played in every stage of the game!
• But what can happen if there is not a unique equilibrium?
• Or what if the stage game can be infinitely repeated?
What about multiple equilibria?
                      Player 2
                  L         C         R
              U   4, 4      2, 5      0, 0
    Player 1  M   5, 2      3, 3      0, 0
              D   0, 0      0, 0      1, 1

Consider this modified version of the Prisoner’s Dilemma
and assume T = 2 and w_i^t = 1 for i = 1,2 and t = 1,2.
Starting with Period 2
• There are 9 possible histories for the 2nd period of this game:
– (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), and (D,R).
• For any subgame starting from one of these histories, there are two
potential Nash equilibria: (M,C) or (D,R).
• Therefore, for an equilibrium strategy to be subgame perfect, it must specify
(M,C) or (D,R) in response to each first-period history (x, y) for x = U, M, D and
y = L, C, R.
Now Period 1
Consider the strategies
s_1^2(h^2) = M if h^2 = (U,L) and D otherwise
&
s_2^2(h^2) = C if h^2 = (U,L) and R otherwise.
With these strategies the players’ payoffs for the game starting in period 1 are:
                      Player 2
                  L         C         R
              U   7, 7      3, 6      1, 1
    Player 1  M   6, 3      4, 4      1, 1
              D   1, 1      1, 1      2, 2
which yields a subgame perfect equilibrium with cooperative,
Non-Nash stage game play in period 1!
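Here is a quick computational check of this construction (an illustrative sketch using the reconstructed payoff table above, not from the slides): it verifies that (M,C) and (D,R) are the only stage-game equilibria and that (U,L) becomes a Nash equilibrium once the history-dependent continuation payoffs are added.

    # Illustrative check that the history-dependent strategies support (U,L) in period 1.
    rows, cols = ["U", "M", "D"], ["L", "C", "R"]
    stage = {("U","L"): (4,4), ("U","C"): (2,5), ("U","R"): (0,0),
             ("M","L"): (5,2), ("M","C"): (3,3), ("M","R"): (0,0),
             ("D","L"): (0,0), ("D","C"): (0,0), ("D","R"): (1,1)}

    def pure_nash(payoffs):
        eq = []
        for (r, c), (u1, u2) in payoffs.items():
            if all(u1 >= payoffs[(r2, c)][0] for r2 in rows) and \
               all(u2 >= payoffs[(r, c2)][1] for c2 in cols):
                eq.append((r, c))
        return eq

    print(pure_nash(stage))                       # [('M', 'C'), ('D', 'R')]

    # Continuation: (M,C) after (U,L), (D,R) after anything else.
    continuation = {h: stage[("M","C")] if h == ("U","L") else stage[("D","R")]
                    for h in stage}
    period1 = {h: (stage[h][0] + continuation[h][0], stage[h][1] + continuation[h][1])
               for h in stage}

    print(("U", "L") in pure_nash(period1))       # True: cooperative play is supported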
What about infinite repetition?
• First, two definitions:
– Feasible Payoff: Any convex combination of the pure strategy profile payoffs.
• π_i is feasible if π_i = Σ_{s∈A} λ_s u_i(s), where λ_s ≥ 0 for all s ∈ A and Σ_{s∈A} λ_s = 1.
– Average Payoff: (1 - δ) Σ_{t=1}^∞ δ^{t-1} u_i(a^t), where 1 > δ ≥ 0 is the discount factor.
• Theorem (Friedman 1971): Let G be a finite, static game of complete
information. Let (e_1,…,e_N) denote the payoffs from a Nash equilibrium of
G, and let (x_1,…,x_N) denote any other feasible payoffs from G. If x_i > e_i for
every i and if the discount factor δ is sufficiently close to one, then there exists a
subgame perfect Nash equilibrium of the infinitely repeated game G that achieves
(x_1,…,x_N) as the average payoff.
• Often referred to as the Folk Theorem, but there are now lots of different
versions of this Folk Theorem.
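To connect the average-payoff definition with the theorem, here is a small illustrative computation (not from the slides; the payoff streams are hypothetical): the average payoff of a constant stream equals the per-period payoff, which is the sense in which the theorem achieves (x_1,…,x_N).

    # Average payoff (1 - delta) * sum_{t>=1} delta^(t-1) * u_t of a payoff stream, truncated numerically.
    def average_payoff(stream, delta, T=10_000):
        return (1 - delta) * sum(delta ** (t - 1) * stream(t) for t in range(1, T + 1))

    delta = 0.9
    print(average_payoff(lambda t: 2, delta))                    # ~2.0: constant cooperation in the PD
    print(average_payoff(lambda t: 3 if t == 1 else 1, delta))   # ~1.2: defect once, then punished forever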
What does this result mean?
• In infinitely repeated games, we can get lots of subgame perfect equilibria.
• These equilibria can include actions in a stage game that are not Nash
equilibrium actions for that stage game.
• You can get cooperative behavior in a Prisoner’s Dilemma!
Let’s see what I mean.
Consider the Prisoner’s Dilemma
                  Player 2
                  C         D
    Player 1  C   2, 2      0, 3
              D   3, 0      1, 1

Consider the strategy:
Play C in period 1,
Play C in period t > 1 if a^{t'} = (C, C) for all t' < t,
Otherwise play D.
Can we find a discount factor such that this strategy is subgame perfect
for this Prisoner’s Dilemma if it is repeated infinitely?
The answer to this question is yes!
• Suppose Player j is playing this type of strategy. At any point in time,
Player j has either chosen D in the past in response to i’s choice of D or he
has always chosen C because i has always chosen C. So, we must consider
whether the strategy above is a best response for player i under both of
these circumstances.
• If D has been chosen in the past, player j will always choose D in the
future. What is optimal for i now will be optimal for i in the future due to
infinite repetition.
– Let V_C & V_D be the current value of playing strategy C & D.
– If C is optimal, i’s payoff from here on out will be V_C = 0 + δV_C such that V_C = 0.
– If D is optimal, i’s payoff from here on out will be V_D = 1 + δV_D such that V_D = 1/(1 - δ).
– V_D > V_C, so D is optimal.
• If D has not been chosen in the past, player j will choose C in the
immediate future and will continue to do so as long as i does. But if i
chooses D, j will follow suit from here on out. Again, what is optimal for i
now will be optimal for i in the future due to infinite repetition.
– If C is optimal, i’s payoff from here on out will be V_C = 2 + δV_C such that V_C = 2/(1 - δ).
– If D is optimal, i’s payoff from here on out will be V_D = 3 + δ/(1 - δ).
– V_C >/=/< V_D when δ >/=/< ½, since 2/(1 - δ) ≥ 3 + δ/(1 - δ) ⟺ 2 ≥ 3(1 - δ) + δ ⟺ 2δ ≥ 1.
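As a numerical illustration of these value comparisons (a sketch using the payoff values above, not part of the original slides), the following evaluates V_C and V_D on the cooperative path for a few discount factors and recovers the ½ threshold.

    # Compare the value of cooperating (C) versus defecting (D) against a grim-trigger opponent.
    def values_on_path(delta):
        """Opponent has always seen cooperation: V_C from mutual cooperation, V_D from defecting now."""
        v_cooperate = 2 / (1 - delta)            # 2 every period
        v_defect = 3 + delta * 1 / (1 - delta)   # 3 today, then mutual defection (1) forever
        return v_cooperate, v_defect

    for delta in [0.3, 0.5, 0.7]:
        v_c, v_d = values_on_path(delta)
        print(f"delta={delta}: V_C={v_c:.2f}, V_D={v_d:.2f}, cooperate? {v_c >= v_d}")
    # delta=0.3: cooperate? False; delta=0.5: V_C = V_D = 4.0; delta=0.7: cooperate? True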
To summarize
• As long as δ > ½, this strategy will constitute a subgame perfect Nash
equilibrium for the infinitely repeated Prisoner’s Dilemma.
• This type of strategy is often referred to as a trigger strategy.
– Bad behavior on the part of one player triggers bad behavior on the part of his
opponent from then on.
Are there other trigger strategies that can work?
YES!
General Trigger Strategy
• Define
– π_i^*: equilibrium payoff (per stage)
– π_i^D: defection payoff
– π_i^P: punishment payoff (Nash equilibrium payoff per stage)
• Assume π_i^D > π_i^* > π_i^P
• Cheating doesn’t pay when:

π_i^*/(1 - δ) ≥ π_i^D + δπ_i^P/(1 - δ),  or  1 > δ ≥ (π_i^D - π_i^*)/(π_i^D - π_i^P).
Are there other types of strategies that can work?
YES! LOTS MORE!
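Returning to the general trigger condition above, here is a small helper (an illustrative sketch, not from the slides) that computes the critical discount factor; plugging in the Prisoner's Dilemma values π^D = 3, π^* = 2, π^P = 1 recovers the ½ threshold.

    # Critical discount factor for a general trigger strategy: cheating does not pay
    # when delta >= (pi_D - pi_star) / (pi_D - pi_P).
    def critical_delta(pi_star, pi_D, pi_P):
        assert pi_D > pi_star > pi_P, "requires defection > equilibrium > punishment payoff"
        return (pi_D - pi_star) / (pi_D - pi_P)

    print(critical_delta(pi_star=2, pi_D=3, pi_P=1))   # 0.5, matching the Prisoner's Dilemma threshold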
So what are we to make of all this?
• It does provide an explanation for cooperation in games where cooperation
seems unlikely.
• However, the explanation tells us that almost anything is possible.
– So, what type of behavior can we expect?
– The theory provides few answers.
• There has been a lot of research on repeated Prisoner’s Dilemma games to
understand the best way to play as well as how people actually play. Of
particular interest is Axelrod (1984). Axelrod had researchers submit
various strategies and had computers play them against each other to see which
ones performed the best. Tit-for-Tat strategies tended to perform the best.
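For a flavor of Axelrod's exercise, here is a toy round-robin tournament (an illustrative sketch with a made-up strategy set and horizon, not a reproduction of Axelrod's study) that pits Tit-for-Tat against two simple rivals in the Prisoner's Dilemma from earlier.

    # Toy Axelrod-style round-robin: each strategy plays every other for T periods (no discounting).
    from itertools import combinations

    u = {("C","C"): (2,2), ("C","D"): (0,3), ("D","C"): (3,0), ("D","D"): (1,1)}

    def tit_for_tat(my_hist, their_hist):      # cooperate first, then copy the opponent's last move
        return "C" if not their_hist else their_hist[-1]

    def always_defect(my_hist, their_hist):
        return "D"

    def grim_trigger(my_hist, their_hist):     # cooperate until the opponent ever defects
        return "D" if "D" in their_hist else "C"

    def match(s1, s2, T=50):
        h1, h2, score1, score2 = [], [], 0, 0
        for _ in range(T):
            a1, a2 = s1(h1, h2), s2(h2, h1)
            score1 += u[(a1, a2)][0]
            score2 += u[(a1, a2)][1]
            h1.append(a1); h2.append(a2)
        return score1, score2

    strategies = {"Tit-for-Tat": tit_for_tat, "Always Defect": always_defect, "Grim Trigger": grim_trigger}
    totals = {name: 0 for name in strategies}
    for (n1, s1), (n2, s2) in combinations(strategies.items(), 2):
        p1, p2 = match(s1, s2)
        totals[n1] += p1
        totals[n2] += p2
    print(totals)   # cooperative strategies rack up points against each other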
Application: Cournot Duopoly with Repeated Play
• Who are the players?
– Two Firms
• Who can do what when?
– Firms Choose Output Each Period (q_i^t for i = 1,2) to Infinity & Beyond
• Who knows what when?
– Firms Know Output Choices for all Previous Periods
• How are players rewarded based on what they do?
– Π_i^t = Σ_{s=t}^∞ δ^{s-t} (a - q_i^s - q_j^s) q_i^s, where inverse demand is P = a - q_1 - q_2 and marginal cost is normalized to zero.
Stage Game Output & Profit
• Cournot Nash Equilibrium
– Output
• q_1^C = q_2^C = q^C = a/3
– Profit
• π_1^C = π_2^C = π^C = a²/9
• Collusive Monopoly Outcome
– Output
• q_1^M = q_2^M = q^M = a/4
– Profit
• π_1^M = π_2^M = π^M = a²/8
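These benchmarks are easy to verify numerically; the sketch below (illustrative, assuming inverse demand P = a - q_1 - q_2 with zero marginal cost and a = 1) finds the Cournot equilibrium by best-response iteration and the collusive output by maximizing joint profit.

    # Verify the Cournot and collusive benchmarks for P = a - q1 - q2 with zero marginal cost.
    a = 1.0

    def best_response(q_other):
        # max_q (a - q - q_other) * q  =>  q = (a - q_other) / 2
        return (a - q_other) / 2

    # Best-response iteration converges to the Cournot equilibrium q^C = a/3.
    q = 0.0
    for _ in range(100):
        q = best_response(q)
    print(q, a / 3)                      # both ~0.3333; per-firm profit (a - 2q)*q = a^2/9

    # Collusion: maximize total profit (a - Q)*Q over Q, split equally: q^M = a/4 each.
    Q = max((i / 10000 for i in range(10001)), key=lambda Q: (a - Q) * Q)
    print(Q / 2, a / 4)                  # both ~0.25; per-firm profit = a^2/8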
Is it possible to sustain the collusive Monopoly outcome as a
subgame perfect Nash equilibrium with infinite repetition?
Consider the Strategy
• Period 1: q_i^1 = q^M
• Period t > 1:
– q_i^t = q^M if q_i^{t'} = q_j^{t'} = q^M for all t' < t
– q_i^t = q^C otherwise
Let’s check to see if this proposed strategy is a subgame
perfect Nash equilibrium.
• To accomplish this, we need to show that the strategy is a Nash equilibrium
in all possible subgames.
• Our task is simplified here by the fact that there are only two distinct types
of subgames:
– q_i^{t'} ≠ q^M or q_j^{t'} ≠ q^M for some t' < t
– q_i^{t'} = q_j^{t'} = q^M for all t' < t
First consider q_i^{t'} ≠ q^M or q_j^{t'} ≠ q^M for some t' < t
• With this history, the proposed strategy says both players should choose q^C.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will
always choose q^C.

Π_i^t = Σ_{s=t}^∞ δ^{s-t} (a - q_i^s - q^C) q_i^s

Because this subgame looks the same in every period, Firm i’s optimal output is the same in every period, so its value satisfies

V = (a - q_i^t - q^C) q_i^t + δ Σ_{s=t+1}^∞ δ^{s-(t+1)} (a - q_i^s - q^C) q_i^s = (a - q_i^t - q^C) q_i^t + δV,

which gives

V = (a - q_i^t - q^C) q_i^t / (1 - δ).

Maximizing over q_i^t,

max_{q_i^t} V = (a - q_i^t - q^C) q_i^t / (1 - δ)

has first-order condition

(a - 2q_i^t - q^C) / (1 - δ) = 0,

and since a = 3q^C,

(3q^C - 2q_i^t - q^C) / (1 - δ) = 0  ⟹  q_i^t = q^C.
Firm i’s optimal strategy is to choose the Cournot output
just like the proposed strategy says!
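A quick numerical check of this step (an illustrative sketch with a = 1 and δ = 0.9 as assumed values): grid-search firm i's stationary output against an opponent fixed at q^C and confirm the maximizer is q^C = a/3.

    # Punishment phase: firm j plays q^C = a/3 forever; firm i's best stationary output is also a/3.
    a, delta = 1.0, 0.9
    qC = a / 3

    def punishment_value(q_i):
        # Stationary per-period profit against q^C, discounted forever.
        return (a - q_i - qC) * q_i / (1 - delta)

    best_q = max((i / 10000 for i in range(10001)), key=punishment_value)
    print(best_q, qC)   # both ~0.3333: deviating from the Cournot output never helps here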
Now consider q_i^{t'} = q_j^{t'} = q^M for all t' < t
• With this history, the proposed strategy says both players should choose q^M.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will
always choose q^M as long as Firm i chooses q^M.
First, suppose that Firm i chooses q^M in period t and
forever after.

Π_i^t = Σ_{s=t}^∞ δ^{s-t} (a - q^M - q^M) q^M

V = Σ_{s=0}^∞ δ^s (a - q^M - q^M) q^M = Σ_{s=0}^∞ δ^s (4q^M - q^M - q^M) q^M = 2(q^M)²/(1 - δ),

using a = 4q^M.
Now, suppose Firm i chooses something other than q^M in period t.

Π_i^t = (a - q_i^t - q^M) q_i^t + Σ_{s=t+1}^∞ δ^{s-t} (a - q_i^s - q^C) q_i^s

V = (a - q_i^t - q^M) q_i^t + δ Σ_{s=0}^∞ δ^s (a - q_i^s - q^C) q_i^s

Recall that we have already solved the optimization problem for

Σ_{s=0}^∞ δ^s (a - q_i^s - q^C) q_i^s,

which implies q_i^s = q^C for all s > t and

Σ_{s=0}^∞ δ^s (a - q_i^s - q^C) q_i^s = (q^C)²/(1 - δ).

The deviation problem is therefore

max_{q_i^t} V = (a - q_i^t - q^M) q_i^t + δ(q^C)²/(1 - δ),

with first-order condition

a - 2q_i^t - q^M = 0  ⟹  q_i^t = 3q^M/2  (using a = 4q^M),

such that

V = (3q^M/2)² + (δ/(1 - δ))(q^C)² = (9/4)(q^M)² + (δ/(1 - δ))(16/9)(q^M)²,

since (q^C)² = (a/3)² = (16/9)(q^M)².
Finally, Firm i will prefer to choose the Monopoly
output forever after if
2(q^M)²/(1 - δ) ≥ (9/4)(q^M)² + (δ/(1 - δ))(16/9)(q^M)²,

or

δ ≥ 9/17.

Therefore, if the discount factor is high enough, the proposed strategy will constitute
a subgame perfect Nash equilibrium in this infinitely repeated game!
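As a sanity check on this threshold (an illustrative sketch with a = 1), the code below compares the value of colluding forever with the value of the best one-shot deviation followed by Cournot reversion, across a few discount factors.

    # Collude forever vs. best deviation followed by Cournot reversion (a = 1, zero cost).
    a = 1.0
    qM, qC = a / 4, a / 3          # collusive and Cournot outputs
    piM, piC = 2 * qM**2, qC**2    # per-period profits: a^2/8 and a^2/9
    piDev = (9 / 4) * qM**2        # best one-shot deviation profit: (a - 3qM/2 - qM) * 3qM/2

    def collude_value(delta):
        return piM / (1 - delta)

    def deviate_value(delta):
        return piDev + delta * piC / (1 - delta)

    for delta in [0.50, 9 / 17, 0.60]:
        print(f"delta={delta:.3f}: collude={collude_value(delta):.4f}, "
              f"deviate={deviate_value(delta):.4f}, sustainable? {collude_value(delta) >= deviate_value(delta)}")
    # The switch happens exactly at delta = 9/17 ~ 0.529.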
Is this the only subgame perfect Nash equilibrium?
• Hardly!
• One criticism of trigger strategies like our proposed strategy is that they do
not permit cooperation to be reestablished.
• It is possible to find subgame perfect Nash equilibrium strategies that allow
cooperation to be reestablished:
– Period 1: q_i^1 = q^M.
– Period t > 1:
• q_i^t = q^M if q_j^{t-1} = q^M or q_j^{t-1} = x
• q_i^t = x otherwise
• Though defining such strategies and proving that they are subgame perfect can be an
arduous task!