avivzoharjan09.ppt

Learning Equilibria in
Repeated Congestion Games
Moshe Tennenholtz, Aviv Zohar
Motivation
 The Nash equilibrium is an important concept
 Exists for many reasonable games.
 Provides a good recommendation for a group of players.
 It assumes that all players are fully aware of the game.
 Given each other’s strategies, they can compute a best
response.
Motivation
 What about games with missing information?
 We show the existence of an equilibrium even if
players learn the game while playing it, for a broad
family of games – repeated symmetric congestion
games.
 Our equilibrium will be in pure strategies (rare even for
the regular Nash eq.)
 It will be very efficient – the total cost of all players will
be minimal.
 The repeated game must be long enough.
Congestion Games
A congestion game [Rosenthal] is defined by:
 A set of players N={1…n}
 A set of resources R
 A set of bundles each player i can pick
 A cost function for each resource r, that depends only
on the number of players that have picked this
resource
 This cost is applied to all players that use this resource.
 The strategy of a player is to pick a bundle.
 The cost to a player: the sum of all costs for his bundle.
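The definition above can be sketched in a few lines of Python. This is only an illustration: the names `loads`, `player_cost`, the two resources "a" and "b" and their cost values are hypothetical and do not come from the talk.

```python
from collections import Counter

def loads(profile):
    """Count how many players use each resource (profile = one bundle per player)."""
    return Counter(r for bundle in profile for r in bundle)

def player_cost(profile, i, cost):
    """Cost to player i: the sum, over his bundle, of each resource's cost at its current load."""
    load = loads(profile)
    return sum(cost[r][load[r] - 1] for r in profile[i])

# Hypothetical example: cost[r][k-1] is the cost of resource r when k players use it.
cost = {"a": [1, 3], "b": [2, 2]}
profile = [("a",), ("a", "b")]          # player 0 picks {a}, player 1 picks {a, b}
costs = [player_cost(profile, i, cost) for i in range(len(profile))]
```

Both players pay the load-2 cost of "a"; only player 1 additionally pays for "b".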
Example of a Congestion Game
 Number of players: n.
 Resources: the edges in a graph.
 Allowed bundles: All simple paths connecting S to T
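 Costs per edge (each label is a cost triple, presumably one value per load):
[Figure: graph from S to T; edges labelled 0/3/2, 0/1/2, 0/0/1, 2/2/2, 0/4/4, 2/2/0]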
Congestion Games
 Theorem: Every congestion game is a potential game
and thus has a pure Nash equilibrium.
 I.e., there exists a pure strategy profile (each player
deterministically picks a bundle) such that no player
wishes to change his bundle given the choices of the
others.
 Def: A symmetric congestion game is a game where
all players can pick the same set of bundles.
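A minimal sketch of why a pure Nash equilibrium exists: every strictly improving move by one player lowers Rosenthal's potential by exactly that player's gain, so best-response dynamics must stop, and it stops at a pure equilibrium. The function names and the three-resource example are assumptions made for illustration.

```python
from collections import Counter

def loads(profile):
    return Counter(r for bundle in profile for r in bundle)

def player_cost(profile, i, cost):
    load = loads(profile)
    return sum(cost[r][load[r] - 1] for r in profile[i])

def potential(profile, cost):
    """Rosenthal's potential: for each resource, add up its costs at loads 1..current load."""
    return sum(sum(cost[r][:k]) for r, k in loads(profile).items())

def best_response_dynamics(profile, bundles, cost):
    """Let players switch to strictly cheaper bundles until nobody can improve."""
    profile = list(profile)
    improved = True
    while improved:
        improved = False
        for i in range(len(profile)):
            for b in bundles:
                trial = profile[:i] + [b] + profile[i + 1:]
                if player_cost(trial, i, cost) < player_cost(profile, i, cost):
                    profile, improved = trial, True
    return profile

# Hypothetical symmetric game: 3 players, each picks one of three resources.
cost = {"a": [1, 2, 4], "b": [2, 3, 5], "c": [3, 3, 3]}
bundles = [("a",), ("b",), ("c",)]
start = [("a",)] * 3
eq = best_response_dynamics(start, bundles, cost)
```

The loop terminates because the potential takes finitely many values and strictly decreases with every accepted move.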
Resource Selection Games
 Def: A resource selection game is a symmetric
congestion game where all possible bundles are
of size 1.
 Example: identical processes running on machines.
 Every process chooses the machine to run on
 Running time depends on the number of processes on
that machine.
Our Setting
 Assume that players repeatedly play some congestion
game.
 The number of rounds T is finite.
 The cost of every resource is unknown initially.
 At every round, everyone picks a bundle and observes:
 Their own payment for each resource,
 The actions of all others.
 The total payment is the average over the rounds of the
game.
 Is there an equilibrium of some sort?
 How efficient can we be?
Previous (slightly related) work
 Lots of work in learning that tries to converge to a
Nash equilibrium of the game while learning it.
 The problem: The converging strategies themselves are
not in equilibrium
 Work on equilibria in fully known repeated games.
 Folk theorems guarantee a wide variety of equilibria
 However, it is less realistic to expect players to fully
know the game.
The Learning Equilibrium
 Due to Brafman and Tennenholtz
 A form of ex-post equilibrium
 There is an unknown state of the world S (that affects
the payments in the game)
 Def: Given an unknown repeated game, a strategy
profile for the players is a learning equilibrium if no
player wishes to change strategy, even if it knows the
game.
Previous (more related) work:
 Repeated 2 player games have a (mixed) learning
equilibrium (if you can see the payment of the other
player) [Brafman, Tennenholtz]
 All repeated symmetric 2 player games have a (mixed)
learning equilibrium
 All repeated monotonic resource selection games have
a (mixed) learning equilibrium [Ashlagi, Monderer,
Tennenholtz]
 Our result: A pure equilibrium in all repeated
symmetric congestion games (no limit on number of
players).
Communication between agents
 For the remainder of the talk, I will assume agents can
communicate (Cheap talk) to coordinate through
some channel.
 This assumption is not a must. Agents can
communicate through the repeated game via the
actions they take.
 Such signaling
 makes the proofs a bit more complex
 but has little effect on the game (provided that the game
is long enough, and we communicate only a finite
amount of data)
What can we hope to achieve?
 The cooperative solution is the best we can hope for.
 Denote its total cost by OPT.
 The cooperative solution:
 Players play all combinations of bundles, and learn the
cost of each resource for any load.
 Then compute the optimal joint action
 Play the joint action while taking turns playing the
different roles in it.
 Each player gets OPT/n cost if the game is long
enough.
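The cooperative solution can be sketched as follows, assuming the costs have already been learned. The brute-force search, the rotation rule (player i takes role (i + t) mod n in round t) and the cost values are illustrative assumptions, not the talk's construction.

```python
from collections import Counter
from itertools import combinations_with_replacement

def loads(profile):
    return Counter(r for bundle in profile for r in bundle)

def player_cost(profile, i, cost):
    load = loads(profile)
    return sum(cost[r][load[r] - 1] for r in profile[i])

def total_cost(profile, cost):
    return sum(player_cost(profile, i, cost) for i in range(len(profile)))

def optimal_joint_action(bundles, n, cost):
    # By symmetry only the multiset of chosen bundles matters, not who plays which.
    return min(combinations_with_replacement(bundles, n),
               key=lambda p: total_cost(p, cost))

def rotated(roles, t):
    """Round t of the rotation: player i plays role (i + t) mod n."""
    n = len(roles)
    return [roles[(i + t) % n] for i in range(n)]

# Hypothetical learned costs for a 3-player resource selection game.
cost = {"a": [1, 5, 9], "b": [2, 6, 9], "c": [4, 6, 9]}
bundles, n = [("a",), ("b",), ("c",)], 3
roles = optimal_joint_action(bundles, n, cost)
opt = total_cost(roles, cost)
per_player_total = [sum(player_cost(rotated(roles, t), i, cost) for t in range(n))
                    for i in range(n)]
```

Over one full cycle of n rounds every player occupies every role once, so each pays OPT in total, i.e. OPT/n per round on average.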
Our Main Result – Exact Statement
 For any symmetric congestion game G with n players,
and for every ε>0
there exists a number (of rounds) T such that
the repeated game that has T rounds in which we play
G in every round, has an ε-equilibrium in which each
player suffers a cost of at most (OPT/n)+ε.
 An equivalent statement could be made about infinite
games with discounted payments. (Discuss)
Proof
 The equilibrium strategy will consist of 3 behaviors.
1. Cooperative learning:
Players play all combinations of bundles to learn the
game. If some player deviates, start punishing;
otherwise start playing optimally.
2. Playing optimally:
Players play optimally, taking turns in the different roles.
If someone deviates, start punishing.
3. Punishment of a deviator:
This is the tricky part.
How to Punish
 Let us start by assuming that the deviation occurs after
the game has been learned.
 Assume w.l.o.g. that player n has deviated.
 A punishment strategy in this case:
 All n-1 honest players compute a Nash equilibrium for
the congestion game G, while ignoring the n’th player.
 I.e., they consider the game as having only n-1 players.
 Note that the equilibrium always exists.
Effective Punishment
 Lemma: if all other players play a Nash eq. for n-1
players, the deviating player has a cost no lower than
any other player.
 Proof: Assume, for contradiction, that some honest player i pays
strictly more than the deviator. Fix all other players.
[Figure: the bundle played by player i and the bundle played by the deviator]
 Total cost of player i was greater.
 The difference must come from resources that are not
shared.
 Due to symmetry, player i could have picked the
deviator’s bundle, and would gain by it (in the game of
n-1 players in which player n does not exist).
 This contradicts the fact that player i is playing a Nash
eq. for n-1 players.
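The cost comparison behind this proof can be written out explicitly. The notation is introduced here for illustration: B_i and B_d are the bundles of player i and of the deviator, \ell_r is the load on resource r in the n-player profile, c_r is its cost function, C denotes costs in the n-player game and \hat{C} costs in the game of n-1 players.

```latex
\begin{align*}
C_i - C_d
  &= \sum_{r \in B_i \setminus B_d} c_r(\ell_r)
   - \sum_{r \in B_d \setminus B_i} c_r(\ell_r)
  && \text{(shared resources cancel)} \\
\hat{C}_i(B_i)
  &= \sum_{r \in B_i \cap B_d} c_r(\ell_r - 1)
   + \sum_{r \in B_i \setminus B_d} c_r(\ell_r)
  && \text{(deviator removed)} \\
\hat{C}_i(B_d)
  &= \sum_{r \in B_i \cap B_d} c_r(\ell_r - 1)
   + \sum_{r \in B_d \setminus B_i} c_r(\ell_r)
  && \text{(player $i$ takes the deviator's place)} \\
\hat{C}_i(B_i) - \hat{C}_i(B_d)
  &= C_i - C_d > 0
\end{align*}
```

So if player i paid strictly more than the deviator, switching to B_d would strictly lower his cost in the game of n-1 players, which is impossible at a Nash equilibrium of that game.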
Punishing when information is missing
 Our proof so far relied on full knowledge of the game.
 If the game is unknown, we cannot compute the Nash
eq. of n-1 players.
 In case of missing knowledge, we will optimistically
under-evaluate the costs.
 Now, players will use these under-evaluated costs
when they try to punish a deviator.
 They will compute the Nash equilibrium of the
congestion game with n-1 players whose resource
costs are the under-evaluated ones (observed values where
known, optimistic under-estimates elsewhere),
and they will repeatedly play this equilibrium.
 Now, again, assume that during some round the
deviator pays less than some player i.
 The difference must come from resources they did not
have in common.
 But then, why didn’t player i switch to the bundle the
deviator has? There can be only one reason:
He under-evaluated his own bundle.
 Therefore, he must have observed something new.
The modified Lemma
At every round of punishment, at least one of two things
must happen:
1. One of the players learns a previously unknown value,
2. or the deviator has a cost no lower than that of any other
player.
Once a player learns a new value, he will broadcast it to
the other honest players.
This way, they have common knowledge of the values
found and can continue to compute the Nash eq.
strategy.
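The modified lemma can be checked on a small simulation. This is a sketch under stated assumptions, not the construction from the talk: it is restricted to a resource selection game, it under-evaluates every unknown cost as 0 (which requires true costs to be non-negative), the deviator picks resources at random, and all names and cost values are hypothetical.

```python
import random
from collections import Counter

def optimistic(known, r, k):
    """Under-evaluated cost: the observed value if known, otherwise 0 (assumes true costs >= 0)."""
    return known.get((r, k), 0)

def nash_for_honest(resources, m, known):
    """Pure Nash eq. of the m-player game under optimistic costs, via best-response dynamics."""
    choice = [resources[0]] * m
    improved = True
    while improved:
        improved = False
        for i in range(m):
            for r in resources:
                load = Counter(choice)
                here = optimistic(known, choice[i], load[choice[i]])
                if r != choice[i] and optimistic(known, r, load[r] + 1) < here:
                    choice[i] = r
                    improved = True
    return choice

def punishment_round(resources, true_cost, known, n, deviator_choice):
    honest = nash_for_honest(resources, n - 1, known)
    load = Counter(honest + [deviator_choice])
    learned = False
    for r in honest:
        if (r, load[r]) not in known:
            known[(r, load[r])] = true_cost[r][load[r] - 1]  # observed, then broadcast
            learned = True
    honest_costs = [true_cost[r][load[r] - 1] for r in honest]
    deviator_cost = true_cost[deviator_choice][load[deviator_choice] - 1]
    return learned, deviator_cost, honest_costs

# Hypothetical true costs: true_cost[r][k-1] is the cost of r under load k.
true_cost = {"x": [1, 2, 6, 9], "y": [2, 3, 5, 8], "z": [4, 4, 4, 7]}
resources, n, known = list(true_cost), 4, {}
rng = random.Random(0)
violations = learning_rounds = 0
for _ in range(200):
    learned, dev_cost, honest_costs = punishment_round(
        resources, true_cost, known, n, rng.choice(resources))
    learning_rounds += learned
    if not learned and dev_cost < max(honest_costs):
        violations += 1
```

Rounds in which something is learned are bounded by the number of (resource, load) pairs, so in a long game the deviator pays at least as much as every honest player in all but finitely many punishment rounds.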
Proving the Theorem
 If no one deviates, players spend a finite amount of
time learning the game, and then play optimally.
 If the game is long enough, each will incur an average cost of at most
OPT/n+ε/2.
 If one player deviates, he can only do better than the
other players for a finite number of rounds.
For the rest of the game his cost is no lower than that of any
other player, hence at least the average cost, which is at least OPT/n.
So if the game is long enough, his gains in the finite
number of rounds are dwarfed by this high cost. He
gains at best some small ε.
Imperfect monitoring
 What players can observe during the game is critical.
 The theorem also holds for weaker levels of
monitoring.
 E.g., let us now assume that players see the actions of
other players only on the resources that they
themselves have selected.
 Can we still detect deviations, punish and coordinate?
 One of the main problems is communication. Players
can still signal, but no longer broadcast to all others at
the same time (unless they are on the same resource).
Imperfect Monitoring
 Assume some honest player observes some other
player deviating from the proposed strategy.
 He has to bring this to the attention of the other players.
 He does so by deviating himself, thereby notifying some of
the others.
 They in turn deviate and notify others, etc.
 After every player has seen some other player deviate,
we have to find out whom to punish, and how.
Blaming others.
 Each player will signal which
other player he has seen
deviating, and when this
deviation occurred.
 Everyone suspects the player who
has been reported as the deviator
in the earliest round.
[Figure: players labelled with the round in which each was seen deviating: the actual deviator at T, others at T+1, T+1, T+2]
Blaming others.
 But the deviator may also lie.
 To throw off the blame, he can
try to say that he saw someone
else deviate in an earlier round.
 So the players must suspect:
 The earliest reported deviator
 The player that reported him
[Figure: the actual deviator at T, other players at T+1, T+1, T+2, and one player reported at T-1]
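The suspicion rule can be sketched as a small function. The `Report` record, the player numbering and the concrete rounds are hypothetical, and ties for the earliest round are resolved arbitrarily here, which the talk does not specify.

```python
from typing import NamedTuple

class Report(NamedTuple):
    reporter: int  # player sending the signal
    accused: int   # player he claims to have seen deviating
    when: int      # round in which the claimed deviation occurred

def suspects(reports):
    """Suspect the earliest reported deviator and the player who reported him."""
    earliest = min(reports, key=lambda rep: rep.when)
    return {earliest.accused, earliest.reporter}

# Hypothetical run with T = 10: player 0 really deviates at round T.
T = 10
honest_reports = [
    Report(reporter=1, accused=0, when=T),      # saw the actual deviation
    Report(reporter=2, accused=1, when=T + 1),  # saw player 1's notifying deviation
    Report(reporter=3, accused=1, when=T + 1),
    Report(reporter=4, accused=2, when=T + 2),
]
lie = Report(reporter=0, accused=2, when=T - 1)  # the deviator blames player 2

without_lie = suspects(honest_reports)
with_lie = suspects(honest_reports + [lie])
```

In both cases the actual deviator is among the two suspects: either he is the earliest accused, or he is the one who filed the earliest (false) report.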
How to Punish
 So how can we punish in this case?
 Note that the identity of the deviator is important.
 All other players need to compute a Nash eq. for n-1
players, and play it.
 Each bundle in the Nash eq. has to be picked by one of
the players.
 Solution: tell both suspect players to pick the same
bundle (that is part of the Nash eq.). At least one of
them is honest, and will play that strategy.
 The other player must have a high cost.
Non-Detailed monitoring
 Another way to restrict the level of monitoring is to
allow players to see only their total cost, without
details regarding each resource.
 If all players see enough combinations of profiles, they
will be able to deduce all the needed information
about the cost of resources.
 The problem: There is no way to under-evaluate the
costs of resources in a way that can be used to punish
the deviator.
Example
 Let us look at a game with 2 players that has 3 bundles
A,B,C.
 Assume that player 1 has observed
the costs in the table below
Player 1’s action | Player 2’s action | Cost
A                 | C                 | C(A)=1
B                 | A                 | C(B)=1
C                 | B                 | C(C)=1
 The scenario is completely symmetric
 A possible assignment of costs:
All 3 symmetric assignments are
also possible
 So in fact, any resource can be valued at cost 0 (when
one player visits it)
 There is in fact no sure way to punish the deviating
player with a constant pure strategy.
 If player 1 picks bundle A,
the deviator can pick C and gain
without revealing new information
 A pure strategy that does punish (or learn):
Select A, B, C in sequence
 Conjecture: There is a pure strategy learning
equilibrium even in the non-detailed monitoring case.
 Theorem: There is a mixed strategy equilibrium in the
case of non-detailed monitoring.
 The equilibrium strategy punishes by playing the Nash
equilibrium of the known part of the game, and with
some small probability does a random exploratory
action.
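The punishing behavior described above could look roughly like this. The function name, the exploration probability and the bundle labels are placeholders, and how the Nash action of the known part of the game is computed is left out.

```python
import random

def punishing_action(nash_action, bundles, explore_prob, rng):
    """With probability explore_prob play a uniformly random bundle (exploration);
    otherwise play the Nash action computed from the known part of the game."""
    if rng.random() < explore_prob:
        return rng.choice(bundles)
    return nash_action

# Hypothetical usage: bundles A, B, C, Nash action "A", small exploration probability.
rng = random.Random(0)
bundles = ["A", "B", "C"]
actions = [punishing_action("A", bundles, 0.05, rng) for _ in range(1000)]
```

Randomizing the exploratory rounds means the deviator cannot predict and exploit them, which is why this equilibrium is in mixed strategies.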
Asymmetric Congestion Games
 Theorem: There exists an asymmetric congestion game
with no learning equilibrium (not even mixed).
 In fact, it is even an asymmetric resource selection
game.
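[Figure: three resources. Player 1 may pick his private resource (cost 0.5 or 1000) or the shared resource (cost 0/1); player 2 may pick his private resource (cost 0.5 or 1000) or the same shared resource.]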
 If some player has a cost of 1000 on his private
resource, his best option is to select the shared
resource all the time.
 If the other player has 0.5 on his resource, his best
choice is to play that resource all the time.
 If both players have 0.5, at least one of them pays more
than 0.25
 He can pretend to have 1000 on his resource, and play
the shared resource, to get a cost of 0.
 This is a very interesting game.
 It is quite unclear how to play it rationally.
Strong Equilibria in repeated congestion games
 There exists a (symmetric) resource selection game
that has no strong equilibrium.
 Assumption: the deviators can correlate their action.
 Observe the following game with 3 players:
[Figure: the resources of the game, each labelled with cost 1/2/2]
 The total cost to all players is at least 5 in any profile.
 Any pair of players has a total cost of at least 3.
 In any strategy profile there exist 2 players that each
have a cost of 1.5 or more, and at least one pays strictly
more.
 These 2 players can deviate, play on different
resources, and get a payment of 1.5 each.
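The numbers on this slide can be checked by enumeration. The sketch assumes the game has two identical resources and reads the label 1/2/2 as the cost at loads 1, 2 and 3; both are assumptions, chosen because they reproduce the bounds stated above.

```python
from collections import Counter
from itertools import combinations, product

COST = [1, 2, 2]          # assumed: cost of a resource at load 1, 2, 3
RESOURCES = ["r1", "r2"]  # assumed: two identical resources
PROFILES = list(product(RESOURCES, repeat=3))  # one resource per player, 3 players

def costs(profile):
    """Per-player costs in a single-round profile."""
    load = Counter(profile)
    return [COST[load[r] - 1] for r in profile]

min_total = min(sum(costs(p)) for p in PROFILES)
min_pair = min(costs(p)[i] + costs(p)[j]
               for p in PROFILES for i, j in combinations(range(3), 2))
# Joint cost of players 0 and 1 when they sit on different resources,
# whatever the third player does.
split_pair_costs = {costs(p)[0] + costs(p)[1] for p in PROFILES if p[0] != p[1]}
```

When two players split across the resources their joint cost is always 3, so by alternating they can each average 1.5 per round regardless of the third player's choice.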