Lecture V: Game Theory
Zhixin Liu
Complex Systems Research Center,
Academy of Mathematics and Systems Sciences, CAS
In the last two lectures, we talked about Multi-Agent Systems:
Analysis
Intervention

In this lecture, we will talk about
Game theory: complex interactions between people
Start With A Game
Rock-paper-scissors (rows: player A, columns: player B; each cell lists A's payoff, B's payoff)

                B: rock    B: paper    B: scissors
A: rock          0,0        -1,1         1,-1
A: paper         1,-1        0,0        -1,1
A: scissors     -1,1         1,-1        0,0
Other games: poker, go, chess, bridge, basketball, football,…
From Games To Game Theory
Some hints from the games:
Rules
Results (payoff)
Strategies
Interactions between strategies and payoff
Games are everywhere:
Economic systems: oligopoly, monopoly, market, trade …
Political systems: voting, presidential election, international relations …
Military systems: war, negotiation, …
Game theory
The study of the strategic interactions among rational agents.
Rationality implies that each player tries to maximize his/her payoff, not to beat the other players.
History of Game Theory
1928, John von Neumann proved the minimax theorem
1944, John von Neumann & Oskar Morgenstern, "Theory of Games and Economic Behavior"
1950s, John Nash, Nash Equilibrium
1970s, John Maynard Smith, Evolutionarily stable strategy
Eight game theorists have won the Nobel Prize in economics
Elements of A Game
Player: Who is interacting? $N = \{1, 2, \ldots, n\}$
Actions/Moves: What can the players do?
Action set: $A_i = \{a_{i1}, a_{i2}, \ldots, a_{il_i}\}$
Payoff: What can the players get from the game?
$u_i : \prod_{j=1}^{n} A_j \to \mathbb{R}$
Strategy
Strategy: a complete plan of actions
Mixed strategy: a probability distribution over the pure strategies
Pure strategy: a special kind of mixed strategy
$$S_i = \Big\{ s_i \;\Big|\; s_i = (s_{i1}, s_{i2}, \ldots, s_{il_i}),\ s_{ij} \ge 0,\ \sum_{j=1}^{l_i} s_{ij} = 1 \Big\}$$
Payoff:
$$u_\alpha(s_1, s_2) = \sum_i \sum_j s_{1i}\, s_{2j}\, u_\alpha(a_{1i}, a_{2j}), \quad \alpha = 1, 2.$$
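An Example: Rock-paper-scissors
Players: A and B
Actions/Moves: {rock, paper, scissors}
Payoff: e.g., $u_1(\text{rock}, \text{scissors}) = 1$, $u_2(\text{scissors}, \text{paper}) = -1$
Mixed strategies (rows: A, columns: B; each cell lists A's payoff, B's payoff):

                B: rock    B: paper    B: scissors
A: rock          0,0        -1,1         1,-1
A: paper         1,-1        0,0        -1,1
A: scissors     -1,1         1,-1        0,0

$s_1 = (1/3, 1/3, 1/3)$, $s_2 = (0, 1/2, 1/2)$, probabilities in the order (rock, paper, scissors)
$$u_1(s_1, s_2) = \tfrac13\big(0\cdot 0 + \tfrac12\cdot(-1) + \tfrac12\cdot 1\big) + \tfrac13\big(0\cdot 1 + \tfrac12\cdot 0 + \tfrac12\cdot(-1)\big) + \tfrac13\big(0\cdot(-1) + \tfrac12\cdot 1 + \tfrac12\cdot 0\big) = 0$$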
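The payoff formula can be checked numerically for this example. The following Python sketch assumes the payoff matrix is stored as a nested list (rows: A's actions, columns: B's actions, both in the order rock, paper, scissors); the helper name `expected_payoff` is illustrative, not from the lecture. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as F

# A's payoffs in rock-paper-scissors; rows are A's actions, columns are B's,
# both ordered (rock, paper, scissors). B's payoff is the negative (zero-sum).
U1 = [[0, -1, 1],
      [1, 0, -1],
      [-1, 1, 0]]

def expected_payoff(U, s1, s2):
    """u(s1, s2) = sum_i sum_j s1[i] * s2[j] * U[i][j] for mixed strategies s1, s2."""
    return sum(s1[i] * s2[j] * U[i][j]
               for i in range(len(s1))
               for j in range(len(s2)))

s1 = [F(1, 3), F(1, 3), F(1, 3)]
s2 = [F(0), F(1, 2), F(1, 2)]
print(expected_payoff(U1, s1, s2))  # 0
```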
Classifications of Games
Cooperative and non-cooperative games
Cooperative game: players are able to form binding commitments.
Non-cooperative game: players make decisions independently.
Zero-sum and non-zero-sum games
Zero-sum game: the total payoff to all players is zero. E.g., poker, go, …
Non-zero-sum game: e.g., the prisoner’s dilemma
Finite and infinite games
Finite game: the numbers of players and actions are finite.
Simultaneous and sequential (dynamic) games
Simultaneous game: players move simultaneously, or, if they do not move simultaneously, the later players are unaware of the earlier players' actions.
Sequential game: later players have some knowledge about earlier actions.
Perfect information and imperfect information games
Perfect information game: all players know the moves previously made by all other players. E.g., chess, go, …
Complete information game: every player knows the strategies and payoffs of the other players, but not necessarily their actions.
Perfect information ≠ Complete information
We will first focus on games that are:
Simultaneous
Complete information
Non-cooperative
Finite
What is the solution of the game?
Assumption
Assume that each player:
knows the structure of the game;
attempts to maximize his payoff;
attempts to predict the moves of his opponents;
knows that all of this is common knowledge among the players.
Dominated Strategy
A strategy is dominated if, regardless of what the other players do, it earns the player a smaller payoff than some other strategy.
$S_{-i}$: the set of strategy profiles of all players except player i
Strategy $s'$ of player i is called a strictly dominated strategy if there exists a strategy $s^*$ such that
$$u_i(s^*, s_{-i}) > u_i(s', s_{-i}), \quad \forall s_{-i} \in S_{-i}.$$
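Elimination of Dominated Strategies
Example (rows: player 1, columns: player 2; each cell lists player 1's payoff, player 2's payoff):

        L      M      R
U      4,3    5,1    6,2
M      2,1    8,4    3,6
D      3,0    9,6    2,8

For player 2, column M is strictly dominated by R (1 < 2, 4 < 6, 6 < 8), so M is eliminated:

        L      R
U      4,3    6,2
M      2,1    3,6
D      3,0    2,8

For player 1, rows M and D are now strictly dominated by U, leaving:

        L      R
U      4,3    6,2

Finally, player 2 prefers L (3 > 2), so R is eliminated.
(U,L) is the solution of the game.
A dominant strategy may not exist!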
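The elimination procedure above can be automated. A minimal Python sketch, assuming the payoffs are stored in a dictionary keyed by (row action, column action); it repeatedly deletes any strictly dominated row or column until nothing changes. The names `row_dominated`, `col_dominated` and `iterated_elimination` are illustrative.

```python
# Bimatrix game from the example: rows U, M, D (player 1), columns L, M, R
# (player 2); each value is (player 1's payoff, player 2's payoff).
ROWS = ["U", "M", "D"]
COLS = ["L", "M", "R"]
PAYOFF = {
    ("U", "L"): (4, 3), ("U", "M"): (5, 1), ("U", "R"): (6, 2),
    ("M", "L"): (2, 1), ("M", "M"): (8, 4), ("M", "R"): (3, 6),
    ("D", "L"): (3, 0), ("D", "M"): (9, 6), ("D", "R"): (2, 8),
}

def row_dominated(r, rows, cols):
    """True if another remaining row beats r against every remaining column."""
    return any(all(PAYOFF[(r2, c)][0] > PAYOFF[(r, c)][0] for c in cols)
               for r2 in rows if r2 != r)

def col_dominated(c, rows, cols):
    """True if another remaining column beats c against every remaining row."""
    return any(all(PAYOFF[(r, c2)][1] > PAYOFF[(r, c)][1] for r in rows)
               for c2 in cols if c2 != c)

def iterated_elimination(rows, cols):
    """Delete strictly dominated rows/columns until no more can be removed."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            if row_dominated(r, rows, cols):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if col_dominated(c, rows, cols):
                cols.remove(c)
                changed = True
    return rows, cols

print(iterated_elimination(ROWS, COLS))  # (['U'], ['L'])
```

On this matrix the loop removes column M first, then rows M and D, then column R, the same sequence as the slide. For strict dominance in finite games the surviving strategies do not depend on the elimination order.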
Definition of Nash Equilibrium

Nash Equilibrium (NE): a solution concept of a game
$(N, S, u)$: a game
$S_i$: strategy set for player i
$S = S_1 \times S_2 \times \cdots \times S_n$: set of strategy profiles
$u = (u_1, u_2, \ldots, u_n)$: payoff function
$s_{-i}$: strategy profile of all players except player i
A strategy profile $s^*$ is called a Nash equilibrium if
$$u_i(s_i^*, s_{-i}^*) \ge u_i(\sigma_i, s_{-i}^*), \quad \forall i,$$
where $\sigma_i$ is any pure strategy of player i.
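Because $\sigma_i$ ranges over pure strategies, checking the definition for a two-player finite game only requires comparing each player's payoff with his or her best pure deviation. A Python sketch under that reading; the matrices `U1`, `U2` and the function names are illustrative, and rock-paper-scissors is used as test data.

```python
from fractions import Fraction as F

def payoff(U, p, q):
    """Expected payoff sum_i sum_j p[i] * q[j] * U[i][j]."""
    return sum(p[i] * q[j] * U[i][j] for i in range(len(p)) for j in range(len(q)))

def pure(k, n):
    """The k-th pure strategy, written as a mixed strategy over n actions."""
    return [1 if i == k else 0 for i in range(n)]

def is_nash(U1, U2, p, q):
    """True if no player gains by deviating to any pure strategy sigma_i.

    U1[i][j] and U2[i][j] are the payoffs of players 1 and 2 when
    player 1 uses action i and player 2 uses action j.
    """
    n, m = len(p), len(q)
    best_dev_1 = max(payoff(U1, pure(k, n), q) for k in range(n))
    best_dev_2 = max(payoff(U2, p, pure(k, m)) for k in range(m))
    return payoff(U1, p, q) >= best_dev_1 and payoff(U2, p, q) >= best_dev_2

# Rock-paper-scissors, actions ordered (rock, paper, scissors).
U1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
U2 = [[-x for x in row] for row in U1]
uniform = [F(1, 3)] * 3
print(is_nash(U1, U2, uniform, uniform))                   # True
print(is_nash(U1, U2, uniform, [F(0), F(1, 2), F(1, 2)]))  # False
```

The second profile fails because A can switch to pure scissors and earn 1/2 instead of 0.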
Remarks on Nash Equilibrium
A set of strategies, one for each player, such that each player’s strategy is a best response to the others’ strategies
Best response: the strategy that maximizes a player's payoff given the others’ strategies.
No player can do better by unilaterally changing his or her strategy
A profile of dominant strategies is a NE

Example
Players: Smith and Louis
Actions: {Advertise, Do Not Advertise}
Payoffs: companies’ profits
Each firm earns $50 million from its customers
Advertising costs a firm $20 million
Advertising captures $30 million from the competitor
How to represent this game?
Strategic Interactions (rows: Louis, columns: Smith; profits in $ million, Louis's listed first)

                 Smith: No Ad    Smith: Ad
Louis: No Ad       (50,50)        (20,60)
Louis: Ad          (60,20)        (30,30)
Best Responses
Best response for Louis:
If Smith advertises: advertise
If Smith does not advertise: advertise
The best response for Smith is the same.
Ad is a dominant strategy for both firms!
(Ad, Ad) is a NE!
This is another Prisoner’s Dilemma!
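Finding pure-strategy NE through best responses, as done above, can be sketched in Python. The sketch assumes rows are Louis's actions and columns are Smith's, with each cell holding (Louis's profit, Smith's profit) in $ million; since the game is symmetric, swapping the roles does not change the result. Function names are illustrative.

```python
ACTIONS = ["No Ad", "Ad"]
# (Louis's action, Smith's action) -> (Louis's profit, Smith's profit).
PROFIT = {
    ("No Ad", "No Ad"): (50, 50), ("No Ad", "Ad"): (20, 60),
    ("Ad", "No Ad"): (60, 20),    ("Ad", "Ad"): (30, 30),
}

def best_responses_row(col):
    """Louis's best responses to Smith's action `col`."""
    best = max(PROFIT[(r, col)][0] for r in ACTIONS)
    return {r for r in ACTIONS if PROFIT[(r, col)][0] == best}

def best_responses_col(row):
    """Smith's best responses to Louis's action `row`."""
    best = max(PROFIT[(row, c)][1] for c in ACTIONS)
    return {c for c in ACTIONS if PROFIT[(row, c)][1] == best}

# A pure profile is a NE when each action is a best response to the other.
pure_ne = [(r, c) for r in ACTIONS for c in ACTIONS
           if r in best_responses_row(c) and c in best_responses_col(r)]
print(pure_ne)  # [('Ad', 'Ad')]
```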
Nash Equilibrium


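NE may be a pair of mixed strategies.
Example: Matching Pennies (rows: A, columns: B; each cell lists A's payoff, B's payoff)

            B: head    B: tail
A: head     (1,-1)     (-1,1)
A: tail     (-1,1)     (1,-1)

There is no NE in pure strategies. The Nash Equilibrium is ((1/2,1/2), (1/2,1/2)): each player plays head and tail with probability 1/2.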
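In a 2×2 game with a fully mixed NE, each player mixes so that the opponent is indifferent between his or her two pure strategies. A Python sketch of that calculation; the function name and matrix layout are illustrative, and it assumes a fully mixed equilibrium exists, so neither denominator is zero.

```python
from fractions import Fraction as F

def mixed_ne_2x2(U1, U2):
    """Fully mixed NE of a 2x2 game from the indifference conditions.

    p = probability that player 1 (rows) plays row 0,
    q = probability that player 2 (columns) plays column 0.
    """
    # q makes player 1 indifferent between rows 0 and 1:
    # q*U1[0][0] + (1-q)*U1[0][1] == q*U1[1][0] + (1-q)*U1[1][1]
    q = F(U1[1][1] - U1[0][1], U1[0][0] - U1[0][1] - U1[1][0] + U1[1][1])
    # p makes player 2 indifferent between columns 0 and 1:
    # p*U2[0][0] + (1-p)*U2[1][0] == p*U2[0][1] + (1-p)*U2[1][1]
    p = F(U2[1][1] - U2[1][0], U2[0][0] - U2[1][0] - U2[0][1] + U2[1][1])
    return (p, 1 - p), (q, 1 - q)

# Matching pennies: rows are A's (head, tail), columns are B's (head, tail).
U1 = [[1, -1], [-1, 1]]   # A's payoffs
U2 = [[-1, 1], [1, -1]]   # B's payoffs
print(mixed_ne_2x2(U1, U2))  # both players mix (1/2, 1/2)
```

The same function applied to the Battle of the Sexes payoffs in this lecture returns ((2/3, 1/3), (1/3, 2/3)).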
Existence of NE

Theorem (J. Nash, 1950s)
For a finite game, there exists at least one
Nash Equilibrium (Pure strategy, or mixed
strategy).
Nash Equilibrium

NE may not be a good solution of the game; it can differ from the optimal solution.
E.g., in the advertising game the NE (Ad, Ad) gives each firm 30, whereas (No Ad, No Ad) would give each firm 50.
Nash Equilibrium

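A game may have more than one NE.
E.g., the Battle of the Sexes (rows: Wife, columns: Husband; each cell lists the wife's payoff, the husband's payoff)

                  Husband: opera    Husband: football
Wife: opera           (2,1)              (0,0)
Wife: football        (0,0)              (1,2)

NE: (opera, opera), (football, football), ((2/3,1/3), (1/3,2/3))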
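The mixed equilibrium follows from the indifference conditions. Let the wife play opera with probability p and the husband play opera with probability q, reading the first entry of each cell as the wife's payoff and the second as the husband's. Each player's mix must leave the other indifferent between opera and football:

```latex
\begin{aligned}
\text{Husband indifferent:}\quad & p \cdot 1 + (1-p)\cdot 0 \;=\; p\cdot 0 + (1-p)\cdot 2
  \;\Longrightarrow\; p = \tfrac{2}{3},\\
\text{Wife indifferent:}\quad & q \cdot 2 + (1-q)\cdot 0 \;=\; q\cdot 0 + (1-q)\cdot 1
  \;\Longrightarrow\; q = \tfrac{1}{3}.
\end{aligned}
```

So the wife plays (2/3, 1/3) and the husband plays (1/3, 2/3), which is the third NE listed.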
Nash Equilibrium

Zero-sum games (two-person):
A saddle point is a solution
$$u(s_1^*, s_2^*) = \max_{s_1 \in S_1} \min_{s_2 \in S_2} u(s_1, s_2) = \min_{s_2 \in S_2} \max_{s_1 \in S_1} u(s_1, s_2)$$
$$s_1^* = \arg\max_{s_1 \in S_1} \min_{s_2 \in S_2} u(s_1, s_2)$$
$$s_2^* = \arg\min_{s_2 \in S_2} \max_{s_1 \in S_1} u(s_1, s_2)$$
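In pure strategies maximin ≤ minimax always holds, and a pure saddle point exists exactly when the two are equal; when they differ (as in matching pennies), the equality above requires mixed strategies, which von Neumann's minimax theorem guarantees. A Python sketch of the pure-strategy check; the function name and the 2×3 matrix are made-up illustrations, not from the lecture.

```python
def security_levels(U):
    """Pure-strategy maximin and minimax of a two-person zero-sum game.

    U[i][j] is what the row player (maximizer) wins and the column player
    (minimizer) loses when row i meets column j. Returns
    (maximin, minimax, saddle), where saddle is (i, j) or None.
    """
    row_mins = [min(row) for row in U]
    col_maxs = [max(row[j] for row in U) for j in range(len(U[0]))]
    maximin, minimax = max(row_mins), min(col_maxs)
    saddle = None
    if maximin == minimax:
        # This entry is the smallest in its row and the largest in its column.
        saddle = (row_mins.index(maximin), col_maxs.index(minimax))
    return maximin, minimax, saddle

print(security_levels([[4, 2, 3], [1, 0, 5]]))  # (2, 2, (0, 1))
print(security_levels([[1, -1], [-1, 1]]))      # (-1, 1, None): matching pennies
```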
Nash Equilibrium

Many varieties of NE: Refined NE, Bayesian NE, Subgame Perfect NE, Perfect Bayesian NE …
Finding NEs is very difficult in general.
NE only tells us that if the game reaches such a state, no player has an incentive to change his or her strategy unilaterally. But NE cannot tell us how to reach such a state.
Iterated Prisoner’s Dilemma
Cooperation
Groups of organisms:
Mutual cooperation benefits all agents
Lack of cooperation is harmful to them
Another type of cooperation:
Cooperating agents do well
Any one of them would do better by failing to cooperate
The Prisoner’s Dilemma is an elegant embodiment of this type
Prisoner’s Dilemma

The story of the prisoner’s dilemma
Players: two prisoners
Actions: {Cooperate (C), Defect (D)}
Payoff matrix (rows: Prisoner A, columns: Prisoner B; each cell lists A's payoff, B's payoff)

               B: C     B: D
A: C          (3,3)    (0,5)
A: D          (5,0)    (1,1)
Prisoner’s Dilemma



No matter what the other does, the best choice is “D”.
(D,D) is a Nash Equilibrium.
But if both choose “D”, both do worse than if both select “C”.
Iterated Prisoner’s Dilemma

The individuals:
Meet many times
Can recognize a previous interactant
Remember the prior outcome
Strategy: specifies the probability of cooperating and of defecting based on the history
$P(C) = f_1(\text{History})$
$P(D) = f_2(\text{History})$
Strategies

Tit For Tat - cooperate on the first move, then repeat the opponent's last choice.
In this example, Player A plays Tit For Tat:
Player A: C D D C C C C C D D D D C…
Player B: D D C C C C C D D D D C…
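Player A's moves in this example are exactly Tit For Tat's replies to Player B's moves. A short Python sketch that reproduces them; the function name `tit_for_tat` is illustrative.

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first move, then repeat the opponent's last choice."""
    return "C" if not opponent_history else opponent_history[-1]

# Player B's moves from the example (rounds 1-12).
b_moves = "D D C C C C C D D D D C".split()
# Player A's reply in each round k sees only B's moves from rounds 1..k-1.
a_moves = [tit_for_tat(b_moves[:k]) for k in range(len(b_moves) + 1)]
print(" ".join(a_moves))  # C D D C C C C C D D D D C
```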
Strategies









Tit For Tat - cooperate on the first move, then repeat the opponent's last choice.
Tit For Tat and Random - Repeat opponent's last choice skewed by random
setting.*
Tit For Two Tats and Random - Like Tit For Tat except that opponent must
make the same choice twice in a row before it is reciprocated. Choice is skewed
by random setting.*
Tit For Two Tats - Like Tit For Tat except that the opponent must make the same choice twice in a row before it is reciprocated.
Naive Prober (Tit For Tat with Random Defection) - Repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating.*
Remorseful Prober (Tit For Tat with Random Defection) - Repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating. If the opponent defects in response to probing, show remorse by cooperating once.*
Naive Peace Maker (Tit For Tat with Random Co-operation) - Repeat the opponent's last choice (i.e., Tit For Tat), but sometimes make peace by cooperating in lieu of defecting.*
True Peace Maker (hybrid of Tit For Tat and Tit For Two Tats with Random
Cooperation) - Cooperate unless opponent defects twice in a row, then defect
once, but sometimes make peace by cooperating in lieu of defecting.*
Random - always set at 50% probability.
Strategies











Always Defect
Always Cooperate
Grudger (Co-operate, but only be a sucker once) - Cooperate until the
opponent defects. Then always defect unforgivingly.
Pavlov (repeat last choice if good outcome) - If 5 or 3 points scored in the last
round then repeat last choice.
Pavlov / Random (repeat last choice if good outcome and Random) - If 5 or 3
points scored in the last round then repeat last choice, but sometimes make
random choices.*
Adaptive - Starts with c,c,c,c,c,c,d,d,d,d,d and then takes choices which have
given the best average score re-calculated after every move.
Gradual - Cooperates until the opponent defects; in that case it defects as many times as the opponent has defected so far in the game, followed by two cooperations.
Suspicious Tit For Tat - As for Tit For Tat, except that it begins by defecting.
Soft Grudger - Cooperates until the opponent defects; in that case the opponent is punished with d,d,d,d,c,c.
Customised strategy 1 - default setting is T=1, P=1, R=1, S=0, B=1; always co-operate unless sucker (i.e., 0 points scored).
Customised strategy 2 - default setting is T=1, P=1, R=0, S=0, B=0; always play alternating defect/cooperate.
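Several of the deterministic strategies above can be written as functions of the two move histories. A Python sketch; the function names are illustrative, the random variants (marked *) are omitted, and Pavlov's opening move (cooperate) and its behavior after a 0- or 1-point round (switch) are assumptions the list does not spell out.

```python
# Each strategy maps (my_history, opponent_history), two lists of "C"/"D",
# to the next move "C" or "D".

def always_defect(me, opp):
    return "D"

def always_cooperate(me, opp):
    return "C"

def grudger(me, opp):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opp else "C"

def suspicious_tit_for_tat(me, opp):
    """Tit For Tat, except that the first move is a defection."""
    return opp[-1] if opp else "D"

def tit_for_two_tats(me, opp):
    """Copy the opponent only after it has made the same choice twice in a row."""
    if len(opp) >= 2 and opp[-1] == opp[-2]:
        return opp[-1]
    return me[-1] if me else "C"

# Points I score for (my move, opponent's move).
MY_POINTS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def pavlov(me, opp):
    """Repeat the last choice after scoring 5 or 3 points; otherwise switch."""
    if not me:
        return "C"  # assumed opening move
    if MY_POINTS[(me[-1], opp[-1])] in (3, 5):
        return me[-1]
    return "D" if me[-1] == "C" else "C"
```

Each function is stateless, so the same object can be reused in any number of matches.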
Iterated Prisoner’s Dilemma


The same players repeat the prisoner’s dilemma many times.
After ten rounds:
The best possible income is 50 (defecting every round against an opponent who always cooperates).
If both players always cooperate, each gets 30.
At the other extreme, if both players always defect, each gets 10.
The most likely case is that each player plays a mix of “defect” and “cooperate”.
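The ten-round totals can be reproduced with a small match loop. A Python sketch using the Prisoner's Dilemma payoff matrix; `play` and the two strategy functions are illustrative names.

```python
# (A's move, B's move) -> (A's points, B's points), from the payoff matrix.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_cooperate(me, opp):
    return "C"

def always_defect(me, opp):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Total scores after `rounds` rounds; each strategy sees (own, opponent's) history."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        points_a, points_b = PAYOFF[(a, b)]
        score_a += points_a
        score_b += points_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(always_defect, always_cooperate))     # (50, 0)
print(play(always_cooperate, always_cooperate))  # (30, 30)
print(play(always_defect, always_defect))        # (10, 10)
```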
Iterated Prisoner’s Dilemma

Which strategy can thrive, i.e., what is a good strategy?
Robert Axelrod, 1980s
A computer round-robin tournament
AXELROD R. 1987. The evolution of strategies in the iterated Prisoners' Dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.
The first round
Strategies: 14 entries + a random strategy
Including Markov processes and Bayesian inference
Every pair of strategies meets, for 15 × 15 runs in total; each pair plays the game 200 times
Payoff of strategy S: $\frac{1}{15}\sum_{S'} U(S, S')$
Tit For Tat wins (cooperation based on reciprocity)
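A round-robin tournament of this kind can be sketched in Python. This is not Axelrod's code or entry list: the pool below holds only four strategies from this lecture, each pair (including each strategy against itself) plays 200 rounds, and a strategy's score is its average over all opponents in the pool, mirroring the payoff formula above.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(me, opp):
    return opp[-1] if opp else "C"

def always_cooperate(me, opp):
    return "C"

def always_defect(me, opp):
    return "D"

def grudger(me, opp):
    return "D" if "D" in opp else "C"

def play(strategy_a, strategy_b, rounds):
    """Scores of A and B after `rounds` rounds of the Prisoner's Dilemma."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)][0]
        score_b += PAYOFF[(a, b)][1]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

def round_robin(pool, rounds=200):
    """Average score of each strategy S over all S' in the pool (S itself included)."""
    return {name: sum(play(s, t, rounds)[0] for t in pool.values()) / len(pool)
            for name, s in pool.items()}

POOL = {"Tit For Tat": tit_for_tat, "Always Cooperate": always_cooperate,
        "Always Defect": always_defect, "Grudger": grudger}
for name, score in sorted(round_robin(POOL).items(), key=lambda kv: -kv[1]):
    print(f"{name:16} {score:.2f}")
```

With this small pool, Tit For Tat and Grudger tie at 499.75, ahead of Always Cooperate (450) and Always Defect (402); the ranking depends on which strategies are in the pool.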
The first round
Characters of “good” strategies
Goodness: never defect first (TFT vs. Naive Prober)
Forgiveness: may retaliate, but the memory is short (TFT vs. Grudger)
Naive Prober - Repeat the opponent's last choice, but sometimes probe by defecting in lieu of cooperating
Grudger - Cooperate until the opponent defects. Then always defect unforgivingly
Winning Vs. High Scores




This is not a zero-sum game; there is a banker.
TFT never wins a single game. The best it can do is to get the same score as its opponent.
“Winning the game” is a kind of jealousy; it does not work well.
“Cooperation” can arise in a “selfish” group.
The second round



Strategies: 62 entries + a random strategy
“Goodness” strategies
“Wiliness” strategies
Tit For Tat wins again
Whether a strategy wins or loses depends on the circumstances.
Characters of “good” strategies

Goodness: never defect first
First round: the top eight strategies all have “goodness”
Second round: fourteen of the top fifteen strategies have “goodness”
Forgiveness: may retaliate, but the memory is short.
“Grudger” is not a strategy with “forgiveness”
“Goodness” and “forgiveness” are a kind of collective behavior.
For a single agent, defecting is the best strategy.
Evolution of the Strategies

Evolve “good” strategies by genetic algorithm (GA)
What is a “good” strategy?
Is TFT a good strategy?
Tit For Two Tats may be the best strategy in the first round, but it is not a good strategy in the second round.
A “good” strategy depends on the environment.
Tit For Two Tats - Like Tit For Tat except that the opponent must make the same choice twice in a row before it is reciprocated.
Evolutionarily stable strategy
Evolutionarily stable strategy (ESS)
Introduced by John Maynard Smith and George R. Price in 1973
An ESS is “a strategy such that, if all members of a population adopt it, then no mutant strategy could invade the population under the influence of natural selection.”
John Maynard Smith, “Evolution and the Theory of Games”
An ESS is robust to evolution: it cannot be invaded by mutants.
Definition of ESS
A strategy x is an ESS if, for every y ≠ x,
$$x^{T} U \big((1-\varepsilon) x + \varepsilon y\big) > y^{T} U \big((1-\varepsilon) x + \varepsilon y\big)$$
holds for all sufficiently small positive ε. Equivalently, writing $u(a, b) = a^{T} U b$:
(1) $u(x, x) \ge u(y, x), \ \forall y$
(2) $u(x, y) > u(y, y)$ if $u(x, x) = u(y, x)$, $y \ne x$
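The two conditions can be checked mechanically for a pure strategy in a symmetric game. A Python sketch, assuming `U[a][b]` is the payoff u(a, b) of an a-player meeting a b-player; it only tests pure mutants y, which is a simplification, since the definition quantifies over all mixed y. The one-shot Prisoner's Dilemma serves as test data, and the function name is illustrative.

```python
def is_ess_pure(U, x):
    """Check ESS conditions (1) and (2) for pure strategy x against pure mutants y."""
    for y in range(len(U)):
        if y == x:
            continue
        if U[x][x] < U[y][x]:                          # condition (1) fails
            return False
        if U[x][x] == U[y][x] and U[x][y] <= U[y][y]:  # condition (2) fails
            return False
    return True

# One-shot Prisoner's Dilemma, strategies 0 = C, 1 = D.
PD = [[3, 0],
      [5, 1]]
print(is_ess_pure(PD, 1), is_ess_pure(PD, 0))  # True False
```

D passes because u(D, D) = 1 > u(C, D) = 0; C fails because a D-mutant earns u(D, C) = 5 > u(C, C) = 3.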
ESS

An ESS is defined for a population with a large number of individuals.
The individuals cannot control their strategies, and may not be aware of the game they are playing.
An ESS is the result of natural selection.
Like NE, ESS can only tell us that a state is robust to evolution; it cannot tell us how the population reaches such a state.
ESS in IPD





Tit For Tat cannot be invaded by “wiliness” strategies, such as Always Defect.
TFT can be invaded by “goodness” strategies, such as “Always Cooperate”, “Tit For Two Tats” and “Suspicious Tit For Tat”.
Tit For Tat is not a strict ESS.
“Always Cooperate” can be invaded by “Always Defect”.
“Always Defect” is an ESS.
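Some of these claims can be checked against ESS conditions (1) and (2) using a symmetric game whose "strategies" are whole IPD strategies and whose payoffs are 10-round match totals. A Python sketch under those assumptions (three strategies only, no noise, 10 rounds; all names are illustrative).

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

STRATEGIES = {
    "TFT": lambda me, opp: opp[-1] if opp else "C",
    "AllC": lambda me, opp: "C",
    "AllD": lambda me, opp: "D",
}

def score(strategy_a, strategy_b, rounds=10):
    """A's total score against B over `rounds` rounds."""
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        total += PAYOFF[(a, b)][0]
        hist_a.append(a)
        hist_b.append(b)
    return total

# u[(x, y)]: payoff to an x-player meeting a y-player in a 10-round match.
u = {(x, y): score(STRATEGIES[x], STRATEGIES[y])
     for x in STRATEGIES for y in STRATEGIES}

def ess_violations(x):
    """Mutants y for which ESS condition (1) or (2) fails against resident x."""
    return [y for y in STRATEGIES if y != x and
            (u[(x, x)] < u[(y, x)] or
             (u[(x, x)] == u[(y, x)] and u[(x, y)] <= u[(y, y)]))]

for x in STRATEGIES:
    print(x, ess_violations(x))
# TFT ['AllC']
# AllC ['TFT', 'AllD']
# AllD []
```

TFT fails only condition (2), and only against Always Cooperate, which earns exactly as much as TFT does (30 in every pairing between them), so TFT is not a strict ESS. Always Cooperate is invaded by Always Defect (50 > 30), and Always Defect resists both mutants in this small pool (10 > 9 and 10 > 0).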
References
Drew Fudenberg, Jean Tirole, Game Theory, The MIT Press, 1991.
Robert Axelrod, The evolution of strategies in the iterated Prisoners' Dilemma, in L. Davis (ed.), Genetic Algorithms and Simulated Annealing, Morgan Kaufmann, Los Altos, CA, 1987.
Richard Dawkins, The Selfish Gene, Oxford University Press.
Concluding Remarks
We have only touched the tip of game theory:
Basic Concepts
Nash Equilibrium
Iterated Prisoner’s Dilemma
Evolutionarily Stable Strategy

Concluding Remarks

Many interesting topics deserve to be studied and further investigated:
Cooperative games
Incomplete information games
Dynamic games
Combinatorial games
Learning in games
…
Thank you!