Transcript Document

Fictitious Play
The Theory of Learning in Games
D. Fudenberg and D. Levine
Speaker: Tzur Sayag
03/06/2003

Do you believe that PM Sharon is serious about the peace process?
A voter has to decide if he should support PM Sharon.
• Belief: Sharon will never evacuate settlements.
  Action: Vote against the new economics revolution.
May 24: Sharon announces "occupation is no good".
• Belief: Sharon will probably never evacuate settlements.
  Action: Vote against the new economics revolution.
Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state.
• Belief: Seems like Sharon might evacuate the settlements after all.
  Action: Vote for the new economics revolution.
Roadmap

Introduction to the common models of learning in games
Cournot adjustment
Fictitious play and Nash equilibria
• Motivation
• Definitions
• Results
Generalizations of fictitious play – if we have time
Notations

P1 gets a1 and P2 gets b1 if they play (Action1, Action1) respectively.

                         Player 2
                    Action1     Action2
Player 1  Action1   (a1,b1)     (a2,b2)
          Action2   (a3,b3)     (a4,b4)
Learning in Games - 1

Repeated games – same or related fixed-player model.
A player can teach the opponent to play a best response to a particular action by repeating it over and over.
Being Sophisticated – Example

D is dominant for Bob.
If Alice learns Bob only plays D, the game converges to <D,L>.
Bob's payoff at <D,L> is 2.
If Bob is patient, he can play U always and just wait for a while.
If Bob always plays U:
• Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only the better choice when Bob actually played D).
• So Bob plays constant U, which leads Alice to play constant R with payoff 2 > 1; in this case Bob gets 3, which is better. Bingo!

                Alice
              L        R
Bob   U    (1,0)    (3,2)
      D    (2,1)    (4,0)

(payoffs are (Bob, Alice))
Being Sophisticated – Abstracting

Most learning theories rely on models in which the incentive to alter the future play of the opponent is small:
• Locked in for 2 periods
• Large anonymous population

Embed a two-player game by pairing players randomly from a large population.
Models of Embedding

Single-pair model
• a single pair is drawn at random; actions are revealed to everyone
Aggregate statistic model
• all players are randomly matched; aggregate outcomes are revealed to everyone
Random-matching model
• all players are randomly matched; each player sees only his own game outcome
Three Common Models of Learning

Fictitious play
• Players observe only their own matches and play a best response to the observed frequencies.
Partial best-response
• A fixed portion of players switches each period from its current action to a BR to the aggregate statistics from the previous period.
Replicator dynamics
• The share of the population using each strategy grows proportionally to that strategy's current payoff.
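The replicator dynamic in particular is easy to state in code. A minimal Python sketch; the 2x2 payoff matrix and the initial population shares are made up for illustration:

import numpy as np

def replicator_step(shares, payoff_matrix, dt=0.01):
    # One discrete step of the replicator dynamic.
    # shares: current population share of each pure strategy (sums to 1).
    # payoff_matrix[i, j]: payoff to strategy i against strategy j.
    # Each share grows in proportion to how much its payoff exceeds
    # the population-average payoff.
    fitness = payoff_matrix @ shares     # expected payoff of each strategy
    average = shares @ fitness           # population-average payoff
    return shares + dt * shares * (fitness - average)

# Example: a 2x2 coordination game; play starts slightly biased toward A.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([0.55, 0.45])
for _ in range(2000):
    x = replicator_step(x, A)
print(x)   # shares drift toward the pure profile A, i.e. roughly (1, 0)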
Cournot Adjustment – a flavor of analysis

Two firms, 1 and 2.
Strategy: each firm i chooses a quantity si є [0,∞).
A strategy profile is <si, s-i> є S.
Utility for i is ui(<si, s-i>).
Assume ui(<·, s-i>) is strictly concave.
BR(s-i) = argmax_{x є S} ui(<x, s-i>)
BR is unique: since ui is strictly concave, the relevant u'' is negative, so u' is a monotone decreasing function, which means it has at most one zero, which means, yes, you guessed it right, ui has only one extreme point and the max is therefore unique.
Cournot Adjustment Model

Time periods t = 0,1,2,…, discrete.
Initial state profile θ0 є S.
In each period each player chooses a pure strategy that is a BR to the previous period.
Formally, i chooses s_t^i = BR_i(s_{t-1}^{-i}).
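A small simulation sketch of the Cournot adjustment model in Python. The slides leave ui abstract, so the linear-demand payoff ui = si·(a − si − s-i − c) and the parameter values below are illustrative assumptions:

def best_response(s_other, a=10.0, c=1.0):
    # Assumed payoff u_i = s_i * (a - s_i - s_other - c) is strictly concave
    # in s_i, so the best response is unique (as argued above):
    # u' = a - 2*s_i - s_other - c = 0  =>  s_i = (a - c - s_other) / 2
    return max(0.0, (a - c - s_other) / 2)

# Cournot adjustment: each period both firms best-respond to last period.
s1, s2 = 0.0, 8.0   # arbitrary initial state profile theta_0
for t in range(40):
    s1, s2 = best_response(s2), best_response(s1)
print(s1, s2)   # both approach (a - c)/3 = 3.0, the unique steady state (a Nash)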
Cournot Dynamics

[Figure: reaction curves BR1 and BR2 in the (θ1, θ2) plane. For every θ2, the BR1 curve gives player 1's best response against it; the value for player 1 is the "height" at point θ2. Starting from θt = (θt1, θt2), the next profile θt+1 is the new BR if 2 plays θt2. Can you convince yourself the intersection point is a Nash?]
Cournot Dynamics

A movement between profiles such that θ_{t+1} = f(θ_t), where f_i(θ_t) = BR_i(θ_t^{-i}).
A steady state is θ_s s.t. θ_s = f(θ_s).
Once θ_t = θ_s the system remains there.
Claim (simple): θ_s is a NASH.
Proof: by definition, for every player i, θ_s^i = BR_i(θ_s^{-i}), so no player wants to move.
SO: EVERY STEADY STATE IS A NASH EQUILIBRIUM.
Cournot Dynamics – oblivious to linear transformations

Proposition 1.1: Suppose u'_i(s) = a·u_i(s) + v_i(s^{-i}) for all players i, with a > 0. Then u' and u are best-response equivalent.
Proof:
• v_i(s^{-i}) depends only on the opponent's play, so it does not change the ordering of my actions.
• Multiplying all payoffs by the same positive constant a has no effect on the ordering.
So a transformation that leaves preferences, and consequently best responses, unchanged will give rise to the same dynamic learning process.
Cournot Dynamics and Zero-Sum Games

Recall: payoffs in a zero-sum game (ZSG) add to zero.
Proposition 1.2: every 2x2 game for which the best-response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game.
Proof: given G, a 2x2 game with a unique interior intersection,
• w.l.o.g. assume
  1) A is a BR for player 1 against A
  2) B is a BR for player 2 against A
• If A were also a BR for player 2, then <A,A> would be an intersection of the BR correspondences at a pure profile, which contradicts our assumption.
Cournot Dynamics and Zero-Sum Games 2

Proof outline: given G, the 2x2 game with a unique interior intersection, we build a zero-sum game that has the same best responses. Observe the following ZSG:

        A         B
A    (1,-1)    (0,0)
B    (a,-a)    (b,-b)

If a < 1 then BR1(A) = A,
since u1(A,A) = 1 but u1(B,A) is "only" a.
BR2(A) = B,
since u2(A,B) = 0 but u2(A,A) is "only" -1.
Denote by σi player i's probability of playing A.
Claim 1: player 1 is indifferent between A and B iff σ2 = a·σ2 + b·(1-σ2).
Claim 2: player 2 is indifferent between A and B iff σ1 + a·(1-σ1) = b·(1-σ1).
Proof of: player 1 is indifferent between A and B iff σ2 = a·σ2 + b·(1-σ2)

Assume σ2 = a·σ2 + b·(1-σ2). (*)
(1) If player 1 plays A he gets:
• u1(A,·) = σ2·u1(A,A) + (1-σ2)·u1(A,B)   (σ2 is the prob. that 2 plays A)
          = σ2·(1) + (1-σ2)·(0) = σ2      (by the game table)
(2) If player 1 plays B he gets:
• u1(B,·) = σ2·u1(B,A) + (1-σ2)·u1(B,B)
          = σ2·(a) + (1-σ2)·b             (by the game table)
So if (1) = (2) he does not care which to choose:
(1) = (2)  ⟺  σ2 = σ2·(a) + (1-σ2)·b, as required.
The proof of claim 2 regarding 2's indifference follows the same path.
Mental note: σi = Pr[player i plays A]
Proof cont.: Building the ZSG

1 is indifferent between A and B iff σ2 = a·σ2 + b·(1-σ2).
2 is indifferent between A and B iff σ1 + a·(1-σ1) = b·(1-σ1).
Fixing the interior intersection point (σ1, σ2), we can solve for the unknown payoffs a, b:
a = (σ2 – σ1) / (1 – σ1),    b = a + σ1 / (1 – σ1)
Notice that σ2 – σ1 < 1 – σ1 since σ2 < 1 (and σi > 0, otherwise i never plays A…), which implies a < 1. Q.E.D.
We already showed that when a < 1 we get the same best responses we had in the original game G: A for player 1 against A, B for player 2 against A.

To sum up: it should have been obvious that (σ1, σ2) is a Nash; the point was to find a 2x2 ZSG which has the same best responses as the original game.
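A quick numeric sanity check of this construction in Python; the interior values σ1 = 1/2, σ2 = 3/4 are arbitrary:

def zsg_payoffs(sigma1, sigma2):
    # Solve the two indifference conditions for the unknown payoffs a, b:
    #   player 1:  sigma2 = a*sigma2 + b*(1 - sigma2)
    #   player 2:  sigma1 + a*(1 - sigma1) = b*(1 - sigma1)
    # sigma1, sigma2 are assumed strictly between 0 and 1 (interior point).
    a = (sigma2 - sigma1) / (1 - sigma1)
    b = a + sigma1 / (1 - sigma1)
    return a, b

s1, s2 = 0.5, 0.75
a, b = zsg_payoffs(s1, s2)
print(a, b)                                                # a = 0.5 < 1, b = 1.5
print(abs(s2 - (a * s2 + b * (1 - s2))) < 1e-12)           # player 1 indifferent
print(abs((s1 + a * (1 - s1)) - b * (1 - s1)) < 1e-12)     # player 2 indifferent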
Strategic-Form Games

Finite actions.
One-shot simultaneous-move games.
{Players, strategy space, payoff functions} is the strategic form of a game.
Nash and Correlated Nash

A game can have several Nash equilibria, e.g.
<A,A>, <B,B>, <(1/2,1/2), (1/2,1/2)>,
but the payoffs may differ:
<A,A> gets 2 for each,
<(1/2,1/2), (1/2,1/2)> gets 1 for each.

Let's question the robustness of the mixed-strategy Nash point.
Intuitively, at the mixed point the players are indifferent ("in real life"): play A, play B, whatever… So one may come to believe that the other plays A with slightly more probability. He then wants to switch to pure A, so the robustness of this Nash seems questionable.

        A        B
A    (2,2)    (0,0)
B    (0,0)    (2,2)
For example, if player 1 plays (1/2,1/2) while player 2 tilts to (1/4,3/4):

u1((1/2,1/2), (1/4,3/4)) = u1(1/2·<1,0> + 1/2·<0,1>, 1/4·<1,0> + 3/4·<0,1>)

u1(<1,0>, 1/4·<1,0> + 3/4·<0,1>) = 1/4·u1(A,A) + 3/4·u1(A,B) = 1/4·2 + 3/4·0 = 1/2
u1(<0,1>, 1/4·<1,0> + 3/4·<0,1>) = 1/4·u1(B,A) + 3/4·u1(B,B) = 1/4·0 + 3/4·2 = 3/2

so u1((1/2,1/2), (1/4,3/4)) = 1/2·(1/2) + 1/2·(3/2) = 2/8 + 0 + 0 + 6/8 = 8/8 = 1
Nash and Correlated Nash

A Nash is strict if for each player i, si is the unique best response to s-i.
Only pure strategies can be strict, since if a mixed strategy is a BR then so is every pure strategy in the mixed strategy's support (otherwise there would be no point in including it).
Recall: the support of a mixed strategy is the set of pure strategies that participate with positive probability.
Some Questions in the Theory of Games

When and why should we expect play to correspond to a Nash equilibrium?
If there are several Nash equilibria, which one should we expect to occur?
In the previous example, in the absence of coordination, we face the possibility that player 1 expects NE1 = <A,A> so he plays A, while the opponent expects NE2 = <B,B> and plays B, with the result being the non-equilibrium outcome profile <A,B>.
The Idea of Learning-Based Explanation of Equilibrium

Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria.
Typically, learning models predict that this coordination will eventually occur, with the determination of which of the two equilibria arises left to initial conditions or to random chance.
The Idea of Learning-Based Explanation of Equilibrium

For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players; of course, there is no presumption that this is always the case.
Perhaps, rather than going to a Nash, players wander around the strategy space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nash equilibria?
The Idea of Learning-Based Explanation of Equilibrium

For the simple coordination game (symmetric <2,2>, <0,0>) there is no reason to think that any learning process will prefer one Nash over the other.
What if we alter it so that there is a better Nash? Will the players learn to play the <A,A> Nash?

Altered game:
        A         B
A    (2,2)    (-a,0)
B    (0,-a)   (1,1)
Correlated Nash (Aumann 74)

Suppose the players have access to randomizing devices that are privately viewed.
If each player chooses a strategy according to his own randomizing device, the result is a probability distribution over strategy profiles, denoted μ є Δ(S).
Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.
Correlated Nash – Jordan's Matching Pennies

3 players.
Each chooses H or T.
Payoffs are +1 or -1 only:
1 wins if he matches 2,
2 wins if he matches 3,
3 wins by not matching 1.
This game has a unique NE: each plays (1/2,1/2).
However, it has many correlated NE.

Player 3 plays H (rows: player 1, columns: player 2; payoffs (u1,u2,u3)):
         H             T
H    (1, 1,-1)    (-1,-1,-1)
T    (-1, 1, 1)   (1,-1, 1)

Player 3 plays T:
         H             T
H    (1,-1, 1)    (-1, 1, 1)
T    (-1,-1,-1)   (1, 1,-1)
Correlated Nash – Jordan's Matching Pennies

A correlated NE: the uniform distribution over these 6 profiles:
(H,H,H) (H,H,T) (H,T,T) (T,T,T) (T,T,H) (T,H,H)
Each player has probability 50% of playing H.
No weight is placed on profiles like (H,T,H), so the play of the players is not independent (it is correlated).
For player 1: when he plays H, he faces a 1/3 chance each that his opponents play (H,H), (H,T), (T,T). Since his goal is to match 2, he wins 2/3 of the time by playing H and only 1/3 if he plays T.
Similarly, if he plays T his opponents can only be playing (T,T), (T,H), (H,H); now T wins 2/3 of the time, as against H, which wins only 1/3 of the time.
So following the recommendation is a best response either way: he is at a (correlated) equilibrium.

(Payoff tables as on the previous slide.)
Why is Correlated Nash of Significance?

Hint: cycles create correlation between the players' strategies.
Informally, a cycle is a finite sequence of profiles of length k such that s0 = sk.
Cournot play can exhibit cycles – an example follows.
So cycles => correlation => correlated Nash.
Cournot Cycle – Matching Pennies

3 players (heads, tails).
1 wants to match 2.
2 wants to match 3.
3 wants to un-match 1.
[Cournot:] each player assumes his opponents play the same as in the last step.

Current profile (s1,s2,s3)    Remarks
H,H,H    P3: H->T (switch); P2: H->H (stay); P1: H->H (stay)
H,H,T    P2: H->T
H,T,T    P1: H->T
T,T,T    P3: T->H
T,T,H    P2: T->H
T,H,H    P1: T->H
H,H,H    P3: H->T …
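A small Python sketch that reproduces the cycle in the table above; the encoding of actions as 'H'/'T' strings is just for illustration:

def best_response(player, profile):
    # Cournot adjustment in Jordan's matching pennies: each player
    # best-responds to the others' last-period actions.
    # 1 wants to match 2, 2 wants to match 3, 3 wants to NOT match 1.
    s1, s2, s3 = profile
    if player == 1:
        return s2                       # match player 2
    if player == 2:
        return s3                       # match player 3
    return 'T' if s1 == 'H' else 'H'    # player 3 mismatches player 1

profile = ('H', 'H', 'H')
for t in range(7):
    print(t, profile)
    profile = tuple(best_response(i, profile) for i in (1, 2, 3))
# Output cycles through the six profiles (H,H,H) -> (H,H,T) -> (H,T,T)
# -> (T,T,T) -> (T,T,H) -> (T,H,H) -> (H,H,H) and never settles.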
Roadmap

Introduction to the common models of learning
Cournot adjustment
Fictitious play and Nash equilibria
• Motivation
• Definitions
• Results
Generalizations of fictitious play
Fictitious Play – Introduction

Motivation…
Repeated game, with a stationarity assumption….
Each player forms a belief about his opponent's "strategy" by looking at what has happened so far.
Each player plays a best response according to his/her belief.
Two-Player Fictitious Play – Notations

S1 and S2 are finite action spaces for players one and two respectively.
• S1 = {■,●,▲}
• S2 = {♥,●,♦}
u1, u2 – the players' payoff functions, e.g.:
• u1(■, ♥) = 15
• For mixed strategies we take expectations, e.g. u1(■, <1/4, 3/4>) = 1/4·u1(■,♥) + 3/4·u1(■,♦), and u1(<1/2,1/2>, <1/4,3/4>) expands the same way over player 1's own mixture.
Player is i, opponent is -i, i є {1,2}.
Two-Player Fictitious Play

Notion of belief
• A prediction of the opponent's action distribution, e.g. the degree to which 1 believes 2 will play ●.
Assume players choose their actions each period to maximize their expected payoff with respect to their belief for the current period.
Two-Player Fictitious Play – Forming Beliefs

Player i starts with a weight function K0i:
K0i : S-i → ℝ+
• For example (for player 2):
• K02 : {▲,■,●} → ℝ+
• K02(■) = 4
As the game is iteratively repeated, K is updated.
Two-Player Fictitious Play – Belief Update

If some action, say ■, was played (by the opponent!) last period, we add 1 to its count. Generally:

K_t^i(s^{-i}) = K_{t-1}^i(s^{-i}) + 1    if s_{t-1}^{-i} = s^{-i}
K_t^i(s^{-i}) = K_{t-1}^i(s^{-i})        otherwise

That's a complicated way of saying that K(s) simply counts the number of times the opponent played s (on top of the initial weight).
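A minimal Python sketch of this bookkeeping; the class name and the prior weights are made up for illustration, and the frequencies method anticipates the normalization on the next slide:

from collections import Counter

class FictitiousBelief:
    # The weight function kappa from the slides: kappa_t(s) counts how many
    # times the opponent has played s, starting from strictly positive
    # initial weights (the "fictitious sample").

    def __init__(self, initial_weights):
        self.kappa = Counter(initial_weights)

    def update(self, opponent_action):
        self.kappa[opponent_action] += 1   # add 1 to last period's action

    def frequencies(self):
        total = sum(self.kappa.values())
        return {s: w / total for s, w in self.kappa.items()}

belief = FictitiousBelief({'H': 1.5, 'T': 2.0})   # arbitrary prior weights
belief.update('H')
print(belief.frequencies())   # {'H': 2.5/4.5, 'T': 2.0/4.5}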
Two-Player Fictitious Play – Using Frequencies to Form Beliefs

Given K, each player forms a probability vector γ over his opponent's actions — the frequency vector:

γ_t^i(s^{-i}) = K_t^i(s^{-i}) / Σ_{s̃^{-i} є S^{-i}} K_t^i(s̃^{-i})

Reads: the belief player i holds at time t regarding the probability that his opponent plays s^{-i} at time t.
It is a simple normalization: Pr[opponent plays ■] = K_t(■) / #steps.
His belief can be said to be that the opponent plays according to γ.
Two-Player Fictitious Play – Using Frequencies 2

We now have a belief of how the opponent plays.
[Callout: My belief is that my opponent plays ♥ with probability 1/2, ● with probability 1/4 and ♦ with 1/4; looking at my payoff table, by playing ■ I can maximize my utility.]
A fictitious play is any rule ρi(γ_t^i) which assigns a best-response action to the belief γ_t^i.
Example:
• ρ1(<1/2,1/4,1/4>) = ■ (extends naturally to mixed strategies)
• This implies that u1(■, <1/2,1/4,1/4>) is "better" for player 1 than any other action against <1/2,1/4,1/4>.
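A sketch of one possible rule ρ in Python. The payoff table u1 below is hypothetical, and ties are broken by sorted action order, echoing the "least index" rule on the next slide:

def best_response_to_belief(payoffs, belief):
    # Play the action whose expected payoff against the belief is maximal;
    # max() over the sorted actions breaks ties toward the first action.
    # payoffs[(my_action, their_action)] -> my payoff.
    # belief: dict mapping the opponent's actions to probabilities.
    my_actions = {a for a, _ in payoffs}
    def expected(a):
        return sum(p * payoffs[(a, s)] for s, p in belief.items())
    return max(sorted(my_actions), key=expected)

# Hypothetical payoff table over S1 = {square, circle} vs S2 = {heart, diamond}.
u1 = {('square', 'heart'): 15, ('square', 'diamond'): 0,
      ('circle', 'heart'): 2,  ('circle', 'diamond'): 6}
print(best_response_to_belief(u1, {'heart': 0.5, 'diamond': 0.5}))  # 'square'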
Two-Player Fictitious Play – Remarks

Many best responses are possible for a given belief.
Examples of such rules ρ may be:
• Always prefer a pure action over a mixed action.
• Pick the best response for which your action index is least (that's the limit of my creativity).
• (Both, of course, must still be best responses.)
Two-Player Fictitious Play – Interpretation (pages 31-32)

Bayesian inference:
Player i believes his opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution.
Player i's prior over that unknown distribution takes the form of a Dirichlet distribution.
i's prior and posterior beliefs correspond to a distribution over the set Δ(S^{-i}) of probability distributions over S^{-i}.
The assessment γ_t^i over the opponent's strategies is the induced marginal distribution over pure strategies.
If beliefs over Δ(S^{-i}) are denoted μ_t^i, then we have:

γ_t^i(s^{-i}) = ∫_{Δ(S^{-i})} σ^{-i}(s^{-i}) μ_t^i[dσ^{-i}]
Two-Player Fictitious Play – Interpretation

Denote the marginal empirical distribution as:

d_t^j(s^j) = (K_t^j(s^j) − K_0^j(s^j)) / t

The assessment γ is not the same as d because of the influence of i's prior belief.
The prior has the form of a "fictitious sample" observed before the game started.
As observations are incorporated into γ, it will converge to d (the empirical distribution).
Two-Player Fictitious Play – Interpretation

Notes:
• As long as the initial weights are positive, the assessments stay positive.
• The belief reflects the conviction that the opponent's strategy is constant and unknown.
• This conviction may be wrong if the process cycles.
• Still, any finite sequence of what looks like a cycle is consistent with the assumption that the world is constant and those observations are a fluke.
• If cycles persist, we might expect i to notice; but in any case his beliefs will not be falsified in the first few periods, as they would be in the Cournot process.
Asymptotic Behavior – Does Play Converge?

Sufficient conditions:
Proposition 2.1
• (1) If s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates.
• (2) Any pure-strategy steady state of FP must be a Nash.
If ŝ is a strict Nash and is played at time t, then ŝ is played at all subsequent dates

Proof:
• Suppose the beliefs γ_t^i are such that the players' actions form the strict Nash ŝ.
• [Believe me that:] when profile ŝ is played at time t, each player's belief at t+1 is a convex combination of γ_t^i and a point mass on ŝ^{-i}: γ_{t+1}^i = (1-α_t)·γ_t^i + α_t·δ(ŝ^{-i}).
• By linearity of expected utility we get:

u^i(ŝ^i, γ_{t+1}^i) = u^i(ŝ^i, (1-α_t)·γ_t^i + α_t·δ(ŝ^{-i}))
                    = (1-α_t)·u^i(ŝ^i, γ_t^i) + α_t·u^i(ŝ^i, δ(ŝ^{-i}))
If ŝ is a strict Nash and is played at time t, then ŝ is played at all subsequent dates

We want to show that ŝ^i still does better against γ_{t+1}^i than any other action. For any action s^i:

u^i(s^i, γ_{t+1}^i) = (1-α_t)·u^i(s^i, γ_t^i) + α_t·u^i(s^i, δ(ŝ^{-i}))

• For the first term this should be obvious: ŝ^i was a strict BR to γ_t^i by assumption.
• For the second term, note that against the point mass it is obvious that ŝ^i is best, because the profile <ŝ^i, ŝ^{-i}> is a strict Nash, which was our assumption.
So, what is a point mass on ŝ^{-i}, and why is γ_{t+1}^i a convex combination of γ_t^i and of it?

• I need to show you that γ_{t+1}^i = (1-α_t)·γ_t^i + α_t·δ(ŝ^{-i}).
• For clarity, let's say t = 10 and there are 2 players, with S^{-i} = S^2 = {C,D}.
• Recall γ_10^1(C) = K_10^1(C) / 10 (ignore the prior, it matters not).
• Suppose that at time 10 player 2 actually played C (if he plays a mixed strategy, say one that plays C 1/4 of the time, the same bookkeeping applies to the realized action).
• Then:

γ_11^1(C) = (K_10^1(C) + 1) / 11 = (10/11)·(K_10^1(C)/10) + (1/11)·1
          = (1-α_t)·γ_10^1(C) + α_t·δ_C(C),    with α_t = 1/11.
2nd part: Any pure-strategy steady state of FP must be a Nash

• A steady state is a strategy profile that is played in every step after perhaps some finite time T.
• Ideas?
• If play remains at a pure-strategy profile, then eventually the assessments become concentrated at this profile.
• If the profile were not a Nash for one of the players, then playing what he has been playing would not be a BR — a contradiction to how FP works,
• since all players always play a BR according to their beliefs.
• Food for thought: why does this not work for a mixed-strategy profile?
To Conclude

• We wanted to show that "if s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates."
• We showed it by looking at what happens to the players' beliefs and proving that the same actions, given the new beliefs, are still strict BRs.
• This means the system is at a steady state.
• We also showed that a pure-strategy steady state is a Nash.
No Pure Nash => FP Can't Converge to a Pure Profile

Matching pennies:

        H        T
H    (1,-1)   (-1,1)
T    (-1,1)   (1,-1)

For example:
At time 3, player I believes that II prefers Tails, so he plays Tails to match.
But II plays Heads, so I adds one to Heads.
After another round, Heads > Tails, and I, having convinced himself that II will play Heads, switches to H.
The game cycles and never converges to the Nash profile.

Round   Profile   I's weights (H,T)   II's weights (H,T)
1       Initial   (1.5, 2)            (2, 1.5)
2       (T,T)     (1.5, 3)            (2, 2.5)
3       (T,H)     (2.5, 3)            (2, 3.5)
4       (T,H)     (3.5, 3)            (3, 3.5)
5       (H,H)     (4.5, 3)            (4, 3.5)
No Pure Nash => FP Can't Converge to a Pure Profile

If the game did "converge", it would be to a steady state that is pure and not a Nash (since matching pennies has no pure Nash); but we showed that any pure steady state must be a Nash.
It is fine, then, that the game does not converge.
Interestingly, the empirical distributions over each player's strategies do converge to (1/2, 1/2), and their product {(1/2,1/2), (1/2,1/2)} is a Nash.
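A short Python simulation sketch of FP in matching pennies, starting from the initial weights in the table above; play keeps cycling, yet each player's empirical frequency of H approaches 1/2:

def fp_matching_pennies(T=10000):
    # Player I wants to match II; player II wants to mismatch I.
    w1 = {'H': 1.5, 'T': 2.0}   # I's weights over II's actions
    w2 = {'H': 2.0, 'T': 1.5}   # II's weights over I's actions
    played1 = played2 = 0       # how often each player plays H

    for t in range(T):
        # The 0.5 offsets in the priors prevent ties in the comparisons.
        a1 = 'H' if w1['H'] > w1['T'] else 'T'   # I matches his belief
        a2 = 'T' if w2['H'] > w2['T'] else 'H'   # II plays against his belief
        played1 += a1 == 'H'
        played2 += a2 == 'H'
        w1[a2] += 1                              # I observes II's action
        w2[a1] += 1                              # II observes I's action

    return played1 / T, played2 / T

print(fp_matching_pennies())   # both frequencies approach 0.5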
Asymptotic Behavior

Proposition 2.2:
• If the empirical distribution over each player's choices converges, the strategy profile corresponding to the product of these distributions is a Nash.
Proof:
• Intuitively, if the empirical distribution converges, then the beliefs converge to the same thing; hence, if it were not a Nash, players would move away from it.
• In fact, for this it is enough that the beliefs are "asymptotically empirical"; the process need not be FP.
Asymptotic Behavior

More results (proofs omitted):
The empirical distribution converges if:
• (1) the game is 2x2 with generic payoffs,
• (2) the game is zero sum,
• (3) the game is solvable by iterated strict dominance.
The empirical distribution, however, need not converge! Two examples follow.
Example – Shapley (1964)

        L        M        R
T    (0,0)    (1,0)    (0,1)
M    (0,1)    (0,0)    (1,0)
D    (1,0)    (0,1)    (0,0)

The Nash is at (1/3, 1/3, 1/3) for both players.
If the initial weights lead to <T,M>, play cycles:
<T,M> -> <T,R> -> <M,R> -> <M,L> -> <D,L> -> <D,M> -> <T,M> -> …
The diagonal profiles (<T,L>, <M,M>, <D,R>) are never played.
The number of consecutive periods each profile is played increases sufficiently fast that the empirical distributions never converge.
Example – due to Jordan

Increase the diagonal payoffs to (1/2, 1/2).
We get rock-paper-scissors – a zero-sum game with the same Nash.
Here the empirical distributions do converge; there are still cycles, but the number of repetitions of each profile within a cycle grows slowly.
Note that this does not conflict with our previous statements, namely:
• Here is a ZSG whose empirical distributions converge but whose play is not steady (it cycles).
• Had the empirical distributions converged to a pure-strategy Nash, the fact that play cycled would have been a conflict with <LINK>.
Payoffs in Fictitious Play

The question we deal with here: if FP "learns" the distributions, then it should asymptotically yield the same utility that would be achieved if the frequency distribution were known in advance.
Here we allow more than 2 players; each player's assessments track the joint distribution of his opponents' strategies.
Payoffs in Fictitious Play – Notations

Empirical joint distribution: D_t^{-i}
Best payoff against the empirical: Û_t^i = max_{σ^i} u^i(σ^i, D_t^{-i})
Time average of realized payoffs: U_t^i = (1/t) Σ_{τ=1}^{t} u^i(s_τ^i, s_τ^{-i})
Definition:
• Fictitious play is ε-consistent along a history if there is a T such that for any t ≥ T:
Û_t^i − ε ≤ U_t^i
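A sketch of how one could measure the gap Û_t^i − U_t^i from a history in Python; the payoff matrix and the short history below are illustrative:

import numpy as np

def consistency_gap(payoff_matrix, my_actions, opp_actions):
    # Gap Uhat_t - U_t from the definition: the best payoff against the
    # empirical distribution of opponent play, minus the time average of
    # the payoffs actually realized. Play is epsilon-consistent when this
    # gap stays below epsilon from some time T on.
    t = len(my_actions)
    realized = np.mean([payoff_matrix[a, b]
                        for a, b in zip(my_actions, opp_actions)])
    empirical = np.bincount(opp_actions,
                            minlength=payoff_matrix.shape[1]) / t
    best_vs_empirical = (payoff_matrix @ empirical).max()
    return best_vs_empirical - realized

# Matching pennies for the row player; a made-up short history.
u = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
print(consistency_gap(u, my_actions=[0, 0, 1, 1], opp_actions=[0, 1, 1, 0]))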
Consistency

We don't look at how well the player does globally, but how well he does in comparison to his expectations, which are built upon his beliefs.
If FP were not consistent it would be a much less interesting model; it would be as if someone were simply playing a different game. So, we want consistency.
Consistency

Consistency can also be put to use: if, after some period of time, a player sees that his expectations are not fulfilled (by comparing the expected payoff with the actual payoff), he can deduce that something is wrong in his model of the world.
Consistency – Main Result

We relate the frequency with which a player switches strategies to the consistency of his play.
We define the frequency of switches η_t^i to be the fraction of periods τ ≤ t for which s_τ^i ≠ s_{τ+1}^i.
Fictitious Play and BR Dynamics

Definition
• Fictitious play exhibits infrequent switches along a history if for every ε > 0 there is a T such that for any t ≥ T, η_t^i ≤ ε for all i.
Proposition (Fudenberg and Levine 94)
• If FP exhibits infrequent switches along a history, then it is ε-consistent along that history for every ε > 0.
Infrequent Switches → ε-Consistency

Intuition:
• Once the prior loses its influence, at each date player i plays a BR to the empirical distribution (the observations) through date t-1.
• On the other hand, if i is not doing on average as well as the best response to the empirical distribution, there must be a non-negligible fraction of dates t at which i is not playing a BR; but in this case player i must switch at date t+1. Conversely, infrequent switches imply that most of the time, i's date-t action is a best response to the empirical distribution at the end of date t.
Proof – Notations

k = length of the initial (fictitious) history, kept as the prior belief
γ_t^i = player i's belief at date t
σ̂_t^i = player i's best-response strategy to γ_t^i
Player i's expected date-t payoff is:
U_t^{i*} = max_{σ^i} u^i(σ^i, γ_t^i)
Proof – Summary

We showed that if there are not many switches along a player's history, his play is consistent.
This means that the actual payoffs do not flounder below the expectations!
Again, one can use this to check whether something is wrong with one's model of the environment.
Roadmap

Crash introduction to the common models of learning
Cournot adjustment
Fictitious play and Nash equilibria
• Motivation
• Definitions
• Results
Generalizations of fictitious play
Generalizations of Fictitious Play

Mainly generalize the update rule for the belief.
Example: exponential weighting, where recent observations get more weight (λ є (0,1]):

γ_t^i(s^j) = Σ_{τ=1}^{t} [ λ^{t-τ} / Σ_{τ'=1}^{t} λ^{t-τ'} ] · I(s_τ^j = s^j)

As long as beliefs are asymptotically empirical, what we showed for FP holds.
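A minimal sketch of the exponentially weighted belief in Python; λ = 0.9 and the sample history are arbitrary:

def geometric_belief(history, action, lam=0.9):
    # Exponentially weighted belief: recent observations count more.
    # history: opponent's past actions s_1, ..., s_t (most recent last).
    # With lam = 1 this reduces to the plain empirical frequencies of
    # standard fictitious play.
    t = len(history)
    weights = [lam ** (t - tau) for tau in range(1, t + 1)]
    total = sum(weights)
    return sum(w for w, s in zip(weights, history) if s == action) / total

print(geometric_belief(['H', 'H', 'T', 'T', 'T'], 'T'))  # recent T's dominate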
Summary

Dynamics of games and a flavor of analysis.
Although different from Nash analysis, Nash is still an important point; we showed that it is still a point FP can converge to.
FP is a consistent play (given infrequent switches).
Have a nice summer vacation & thanks for listening.