Non-Cooperative Game Theory

• To define a game, you need to know three
things:
– The set of players
– The strategy sets of the players (i.e., the actions
they can take)
– The payoff functions
Game A
                       Column Player
                       Red      Black
Row Player    Red      2,2      5,0
              Black    0,5      3,3

Players: Row Player and Column Player.
Strategy sets for each player: Red or Black.
Payoffs for each player, for each possible outcome: each cell lists (Payoff to Row, Payoff to Column).
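As a minimal sketch, the three ingredients of a game (players, strategy sets, payoff functions) can be written down directly for Game A. The variable and function names below are illustrative choices, not part of the slides.

```python
from itertools import product

# Game A: each player chooses Red or Black.
players = ("Row", "Column")
strategies = {"Row": ("Red", "Black"), "Column": ("Red", "Black")}

# Payoffs: (row_choice, column_choice) -> (payoff to Row, payoff to Column).
payoffs = {
    ("Red", "Red"): (2, 2),
    ("Red", "Black"): (5, 0),
    ("Black", "Red"): (0, 5),
    ("Black", "Black"): (3, 3),
}


def payoff(player, profile):
    """Payoff to `player` when `profile` = (row_choice, column_choice) is played."""
    return payoffs[profile][players.index(player)]


# Every possible outcome is one combination of strategies, one per player.
outcomes = list(product(strategies["Row"], strategies["Column"]))

print(payoff("Row", ("Red", "Black")))  # 5
print(len(outcomes))                    # 4
```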
Game B
                       Column Player
                       Red      Black
Row Player    Red      4,4      0,1
              Black    1,0      1,1
• What happened in these two games?
– What were the strategies?
– What were the outcomes?
• Why did we get these outcomes?
• Should we have expected these outcomes?
• In other words: how do we solve these games?
Solving Games
• We are looking for the equilibrium.
• What is equilibrium?
– Equilibrium is a strategy combination
where no one player has an incentive to
change her strategy given the strategies of
the other players.
• Huh?
Game A
                       Column Player
                       Red      Black
Row Player    Red      2,2      5,0
              Black    0,5      3,3
Nash Equilibrium (NE)
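• Formally, a set of strategies $(s_i^*, s_{-i}^*)$ forms a
NE if, for every player $i$ and every alternative strategy $s_i$,
$\pi_i(s_i, s_{-i}^*) \le \pi_i(s_i^*, s_{-i}^*)$,
where $\pi_i$ is player $i$'s payoff and $s_{-i}^*$ denotes the strategies of the other players.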
• Note that the equilibrium is defined
in terms of strategies, not payoffs.
• Why is this a solution? Because it’s
a rest point - no incentive for one
player to change unilaterally.
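The definition can be checked mechanically. The sketch below assumes a two-player game stored as a dictionary from (row choice, column choice) to a payoff pair; `is_nash` is a hypothetical helper name. It tests each cell of Game A and Game B for profitable unilateral deviations.

```python
def is_nash(payoffs, rows, cols, r, c):
    """True if neither player gains by deviating unilaterally from (r, c)."""
    row_ok = all(payoffs[(alt, c)][0] <= payoffs[(r, c)][0] for alt in rows)
    col_ok = all(payoffs[(r, alt)][1] <= payoffs[(r, c)][1] for alt in cols)
    return row_ok and col_ok


choices = ("Red", "Black")
game_a = {("Red", "Red"): (2, 2), ("Red", "Black"): (5, 0),
          ("Black", "Red"): (0, 5), ("Black", "Black"): (3, 3)}
game_b = {("Red", "Red"): (4, 4), ("Red", "Black"): (0, 1),
          ("Black", "Red"): (1, 0), ("Black", "Black"): (1, 1)}

ne_a = [(r, c) for r in choices for c in choices
        if is_nash(game_a, choices, choices, r, c)]
ne_b = [(r, c) for r in choices for c in choices
        if is_nash(game_b, choices, choices, r, c)]

print(ne_a)  # [('Red', 'Red')]
print(ne_b)  # [('Red', 'Red'), ('Black', 'Black')]
```

Note that the check works on strategies: (Red, Red) in Game A is an equilibrium even though both players would earn more at (Black, Black).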
How Do We Find NE?
• Elimination of Dominated Strategies.
• A strategy is dominated if the player has another
strategy that always provides a higher payoff, no
matter what the other players do.
• If you cross off all dominated strategies,
sometimes you are left with only the NE.
Game A
                       Column Player
                       Red      Black
Row Player    Red      2,2      5,0
              Black    0,5      3,3
Repeated elimination can find the NE
            Left     Center   Right
Top         3,2      5,4      4,3
Middle      1,6      4,2      2,5
Bottom      1,3      6,3      5,4
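A minimal Python sketch of repeated elimination for the 3x3 game above, assuming the first number in each cell is the row player's payoff. The function names are illustrative, and the loop removes one strictly dominated strategy at a time until none is left.

```python
def strictly_dominated(payoffs, own, other, as_row):
    """Return a strategy in `own` that some other strategy strictly beats, or None."""
    idx = 0 if as_row else 1

    def u(s, t):
        profile = (s, t) if as_row else (t, s)
        return payoffs[profile][idx]

    for s in own:
        for better in own:
            if better != s and all(u(better, t) > u(s, t) for t in other):
                return s
    return None


def iterated_elimination(payoffs, rows, cols):
    rows, cols = list(rows), list(cols)
    while True:
        s = strictly_dominated(payoffs, rows, cols, as_row=True)
        if s is not None:
            rows.remove(s)
            continue
        s = strictly_dominated(payoffs, cols, rows, as_row=False)
        if s is not None:
            cols.remove(s)
            continue
        return rows, cols


game = {
    ("Top", "Left"): (3, 2), ("Top", "Center"): (5, 4), ("Top", "Right"): (4, 3),
    ("Middle", "Left"): (1, 6), ("Middle", "Center"): (4, 2), ("Middle", "Right"): (2, 5),
    ("Bottom", "Left"): (1, 3), ("Bottom", "Center"): (6, 3), ("Bottom", "Right"): (5, 4),
}

survivors = iterated_elimination(game, ["Top", "Middle", "Bottom"],
                                 ["Left", "Center", "Right"])
print(survivors)  # (['Bottom'], ['Right'])
```

In this run Middle goes first (dominated by Top), then Left (dominated by Right), then Top, then Center, leaving (Bottom, Right) with payoffs 5,4.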
• Elimination of dominated strategies only
works if the strategies are strictly dominated
– Always worse, not just equal to or worse
            Left     Center   Right
Top         3,2      5,4      4,3
Middle      3,6      4,2      2,5
Bottom      1,3      6,3      5,4
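To see why strictness matters, the sketch below (helper names are hypothetical) uses the table above, in which Middle ties with Top against Left. Middle is only weakly dominated, yet (Middle, Left) passes the equilibrium check, so crossing Middle off would discard a NE.

```python
rows = ["Top", "Middle", "Bottom"]
cols = ["Left", "Center", "Right"]
game = {
    ("Top", "Left"): (3, 2), ("Top", "Center"): (5, 4), ("Top", "Right"): (4, 3),
    ("Middle", "Left"): (3, 6), ("Middle", "Center"): (4, 2), ("Middle", "Right"): (2, 5),
    ("Bottom", "Left"): (1, 3), ("Bottom", "Center"): (6, 3), ("Bottom", "Right"): (5, 4),
}


def weakly_dominated_rows(game, rows, cols):
    """Rows that are never better, and sometimes worse, than some other row."""
    found = []
    for s in rows:
        for other in rows:
            if other == s:
                continue
            gaps = [game[(other, c)][0] - game[(s, c)][0] for c in cols]
            if all(g >= 0 for g in gaps) and any(g > 0 for g in gaps):
                found.append(s)
                break
    return found


def is_nash(game, rows, cols, r, c):
    """True if neither player gains from a unilateral deviation at (r, c)."""
    return (all(game[(a, c)][0] <= game[(r, c)][0] for a in rows)
            and all(game[(r, a)][1] <= game[(r, c)][1] for a in cols))


equilibria = [(r, c) for r in rows for c in cols if is_nash(game, rows, cols, r, c)]

print(weakly_dominated_rows(game, rows, cols))  # ['Middle']
print(equilibria)  # [('Middle', 'Left'), ('Bottom', 'Right')]
```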
Sometimes there aren’t dominated strategies,
so you have to check for NE cell by cell
“Battle of the Sexes”
              Scream   The Beach
Scream        2,1      0,0
The Beach     0,0      1,2
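One way to organize the cell-by-cell check is to mark each player's best responses and keep the cells marked for both players. The sketch below assumes the first payoff in each cell belongs to the row player; `best_responses` is an illustrative name.

```python
def best_responses(payoffs, rows, cols):
    """Cells where Row is best-responding, and cells where Column is."""
    row_best = set()
    for c in cols:
        top = max(payoffs[(r, c)][0] for r in rows)
        row_best |= {(r, c) for r in rows if payoffs[(r, c)][0] == top}
    col_best = set()
    for r in rows:
        top = max(payoffs[(r, c)][1] for c in cols)
        col_best |= {(r, c) for c in cols if payoffs[(r, c)][1] == top}
    return row_best, col_best


movies = ("Scream", "The Beach")
battle = {
    ("Scream", "Scream"): (2, 1), ("Scream", "The Beach"): (0, 0),
    ("The Beach", "Scream"): (0, 0), ("The Beach", "The Beach"): (1, 2),
}

row_best, col_best = best_responses(battle, movies, movies)
equilibria = sorted(row_best & col_best)
print(equilibria)  # [('Scream', 'Scream'), ('The Beach', 'The Beach')]
```

A cell is a NE exactly when it is a best response for both players, so this game has two equilibria and neither is singled out by dominance.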
Sometimes there aren’t any NE in pure strategies
“Evens and Odds”
              1 finger   2 fingers
1 finger      1,-1       -1,1
2 fingers     -1,1       1,-1
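A short sketch, with illustrative names, that lists who wants to switch in each cell of Evens and Odds. It assumes the first payoff belongs to the row player. Every cell has a player with a profitable unilateral deviation, so no cell is a rest point.

```python
options = ("1 finger", "2 fingers")
cells = {
    ("1 finger", "1 finger"): (1, -1), ("1 finger", "2 fingers"): (-1, 1),
    ("2 fingers", "1 finger"): (-1, 1), ("2 fingers", "2 fingers"): (1, -1),
}


def deviators(cells, options, r, c):
    """Names of the players who gain by switching unilaterally at (r, c)."""
    names = []
    if any(cells[(alt, c)][0] > cells[(r, c)][0] for alt in options):
        names.append("Row")
    if any(cells[(r, alt)][1] > cells[(r, c)][1] for alt in options):
        names.append("Column")
    return names


report = {cell: deviators(cells, options, *cell) for cell in cells}
for cell, names in report.items():
    print(cell, "->", names)
```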
We can use the “Normal” or matrix form if:
– There are only 2 (sometimes 3) players
– There are a finite number of strategies
– Actions are approximately simultaneous
If actions are sequential, we must use another
form, the “Extensive” form:
– Still only really feasible for 2 or 3 players,
although it can accommodate “chance”
– Still must have a finite number of strategies
Extensive Form Games
• Use a game “tree” to depict the order in which
players make decisions and the choices that they
have at each decision point.
• Decision points are called “nodes”.
• Players’ strategies or choices branch off from
each decision node.
• At the end of each branch on the game tree are
the payoffs the players would receive if that
branch were the path followed.
US vs. Saudi Arabia Oil “Game”
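[Game tree: the US moves first, choosing Quota, Nothing, or Tariff; Saudi Arabia then responds with R or N. Terminal payoffs as listed on the slide: 90,80; 100,60; 75,50; 100,60; 40,80; 50,100.]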
Solving Extensive Form Games
• Nash Equilibrium has the same meaning in
extensive form games as in normal form games.
• There is also another solution concept in
extensive form games, the Subgame Perfect
Equilibrium (SPE), which has some
advantages over Nash Equilibrium.
US vs. Saudi Arabia Oil “Game”
[Game tree: the US vs. Saudi Arabia tree shown again. The US chooses Quota, Nothing, or Tariff; Saudi Arabia responds with R or N.]
Subgame = part of larger game that can stand alone as a game itself.
Subgame Perfect Equilibrium
• A subgame can be defined for any node other
than a terminal (payoff) node, and includes all
of the subsequent “branches” of the tree that
emanate from that node.
• For a strategy to be a Subgame Perfect
Equilibrium (SPE) strategy, it can only contain
actions that are optimal for their respective
subgames.
[Game tree: the US vs. Saudi Arabia tree shown again. The US chooses Quota, Nothing, or Tariff; Saudi Arabia responds with R or N.]
To find all of the Subgame Perfect Equilibria:
• For each subgame, determine the optimal strategy.
• Find the optimal strategy for the “pruned” tree.
[Game tree: the US vs. Saudi Arabia tree shown again. The US chooses Quota, Nothing, or Tariff; Saudi Arabia responds with R or N.]
Compare Subgame Perfect Equilibria (SPE) to NE:
NE can include incredible threats, as long as unilateral
changes are not optimal.
Example: the US plays Quota; Saudi Arabia plays R if Quota or Tariff, N if Nothing
[Game tree: the US vs. Saudi Arabia tree shown again. The US chooses Quota, Nothing, or Tariff; Saudi Arabia responds with R or N.]
Another Example
[Game tree: the Entrant moves first, choosing Stay Out or Enter; the Incumbent then chooses Low P or High P. Terminal payoffs as listed on the slide: 2,2; -1,0; 0,5; 0,0.]
• Find optimal strategy for each subgame (prune the tree).
• Find Entrant’s optimal action.
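The two steps can be sketched in Python. Important assumption: the flattened slide does not preserve which payoff belongs to which branch, so the assignment below is one plausible reading and is hypothetical, as is the convention that each pair is (Entrant payoff, Incumbent payoff).

```python
# ASSUMED branch-to-payoff assignment, (Entrant, Incumbent); not confirmed by the slide.
tree = {
    "Enter": {"High P": (2, 2), "Low P": (-1, 0)},
    "Stay Out": {"High P": (0, 5), "Low P": (0, 0)},
}


def backward_induction(tree):
    # Step 1: in each subgame the Incumbent picks the price that maximizes its own payoff.
    incumbent_plan = {
        entry: max(prices, key=lambda p: prices[p][1])
        for entry, prices in tree.items()
    }
    # Step 2: prune the tree to those choices and let the Entrant pick its best branch.
    pruned = {entry: tree[entry][incumbent_plan[entry]] for entry in tree}
    entrant_move = max(pruned, key=lambda e: pruned[e][0])
    return entrant_move, incumbent_plan, pruned[entrant_move]


move, plan, outcome = backward_induction(tree)
print(move, plan, outcome)
```

Under this assumed assignment the Incumbent prices high in both subgames, so the Entrant enters. A plan of "Low P if Enter" would be an incredible threat, because the Incumbent would not carry it out once entry has happened.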
Repeated Games
• In repeated games, strategies are much richer.
• In a one-shot Prisoner’s Dilemma game, players can either
cooperate or defect.
• In a repeated game, players choose whether to cooperate or
defect each period.
• Players can have strategies that are contingent on the other
player's actions.
– Cooperate if the other player cooperated last period.
– Defect if the other player has ever defected.
• Note: In repeated games, must discount future payoffs.
– $(1/(1+r))^t = \delta^t$ is the discount factor for period $t$.
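A small sketch of discounting, assuming the current period is t = 0 (so it is not discounted) and that r is the per-period interest rate. The function names are illustrative.

```python
def discount_factor(r):
    """Per-period discount factor delta = 1 / (1 + r)."""
    return 1.0 / (1.0 + r)


def present_value(payoffs, r):
    """Sum of delta**t * payoff over periods t = 0, 1, 2, ..."""
    delta = discount_factor(r)
    return sum(delta ** t * x for t, x in enumerate(payoffs))


# With r = 0.25, delta = 0.8, so three periods of payoff 3 are worth
# 3 + 0.8 * 3 + 0.64 * 3 today (about 7.32).
print(round(present_value([3, 3, 3], 0.25), 2))
```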
Solving Repeated Games
• If the game has a finite horizon (that is, it ends after a
specified number of rounds), you use backwards induction.
– Start by finding the optimal strategy in the last period
– Move to the next to the last period, and find the optimal
strategy, recognizing the effects on the final round.
• If the game has an infinite horizon, you can't use
backwards induction because there is no last period.
• To solve infinite horizon games, you check different
strategies to see if they meet the requirements of
equilibrium.
– For each player, changing strategies unilaterally will
not make the player better off.
Solving Repeated Prisoner’s Dilemma Games
• For finite horizon repeated PD, use backwards induction.
– In the last period, always optimal to defect
– If your action in the next-to-the-last period does not
affect the optimal strategy in the last period, you do
better by defecting in the next to the last period
– And so on….
• For finite horizon repeated PD, collusion cannot be sustained in equilibrium.
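The backwards-induction argument can be sketched as follows. The stage payoffs are the Prisoner's Dilemma values used in these slides (2,2 for mutual defection, 3,3 for mutual cooperation, 5 and 0 for unilateral defection); the function names and the choice of delta are illustrative. Because the continuation payoff is the same whatever is played today, it never changes the best responses in the current period.

```python
ACTIONS = ("D", "C")
STAGE = {("D", "D"): (2, 2), ("D", "C"): (5, 0),
         ("C", "D"): (0, 5), ("C", "C"): (3, 3)}


def stage_equilibria(continuation, delta):
    """Pure NE of one period when the same continuation payoff follows every cell."""
    total = {cell: (u1 + delta * continuation[0], u2 + delta * continuation[1])
             for cell, (u1, u2) in STAGE.items()}
    return [
        (r, c) for r in ACTIONS for c in ACTIONS
        if all(total[(a, c)][0] <= total[(r, c)][0] for a in ACTIONS)
        and all(total[(r, a)][1] <= total[(r, c)][1] for a in ACTIONS)
    ]


def solve_finite_pd(periods, delta):
    continuation = (0.0, 0.0)  # nothing follows the last period
    plan = []
    for period in range(periods, 0, -1):  # start with the last period
        (cell,) = stage_equilibria(continuation, delta)  # unique in this game
        plan.append((period, cell))
        u1, u2 = STAGE[cell]
        continuation = (u1 + delta * continuation[0], u2 + delta * continuation[1])
    return list(reversed(plan))


print(solve_finite_pd(3, 0.9))
# [(1, ('D', 'D')), (2, ('D', 'D')), (3, ('D', 'D'))]
```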
Solving Repeated Prisoner’s Dilemma Games
• For infinite horizon repeated PD, consider different
strategies.
• “Grim Trigger” strategy:
– Cooperate as long as other player cooperates, but once
he defects, defect forever.
– His defection “triggers” the punishment.
– “Grim” because punishment lasts forever.
• To check if there is a symmetric equilibrium with trigger
strategies:
– Make sure that cooperating is better than defecting if
other player has cooperated.
– Make sure that “punishment” is a credible threat, that
you will actually go through with it.
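A minimal simulation of the grim trigger strategy, written as a function of the opponent's past actions. The representation ("C" and "D" strings, a `play` helper) is an illustrative choice.

```python
def grim_trigger(opponent_history):
    """Cooperate until the opponent has ever defected, then defect forever."""
    return "D" if "D" in opponent_history else "C"


def always_defect(opponent_history):
    return "D"


def play(strategy_1, strategy_2, rounds):
    """Play the repeated game; each strategy sees only the other player's past moves."""
    h1, h2 = [], []
    for _ in range(rounds):
        a1 = strategy_1(h2)
        a2 = strategy_2(h1)
        h1.append(a1)
        h2.append(a2)
    return "".join(h1), "".join(h2)


print(play(grim_trigger, grim_trigger, 5))   # ('CCCCC', 'CCCCC')
print(play(grim_trigger, always_defect, 5))  # ('CDDDD', 'DDDDD')
```

Two grim-trigger players cooperate forever. Against a defector, the trigger fires after the first period and the punishment never ends.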
Prisoner’s Dilemma
                          Column Player
                          Defect   Cooperate
Row Player    Defect      2,2      5,0
              Cooperate   0,5      3,3
When Are Trigger Strategies NE?
• Assume the other player is also using a trigger strategy.
• If neither has defected, both cooperate this period.
• If you follow the trigger strategy, i.e. cooperate, you get C
this period (the payoff from cooperation) and you get C
each period in the future.
– An infinite stream of payments of C can be written as
$\frac{1}{1-\delta} C$, where $\delta$ is the discount factor.
• If you defect, you get D this period (the increased payoff
from unilateral defection) but in all future periods you get
P (the punishment payoff level).
– Total earnings thus are $D + \frac{\delta}{1-\delta} P$.
• Thus following the strategy is optimal if:
$\frac{1}{1-\delta} C > D + \frac{\delta}{1-\delta} P$.
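For reference, both present values are geometric series. The derivation below assumes a discount factor $0 < \delta < 1$, counts the current period as $t = 0$, and in the last step assumes $D > P$ (true in the Prisoner's Dilemma), so that dividing by $D - P$ keeps the direction of the inequality.

```latex
\begin{align*}
\sum_{t=0}^{\infty} \delta^{t} C &= \frac{1}{1-\delta}\,C, \\
D + \sum_{t=1}^{\infty} \delta^{t} P &= D + \frac{\delta}{1-\delta}\,P.
\end{align*}

\begin{align*}
\frac{1}{1-\delta}\,C > D + \frac{\delta}{1-\delta}\,P
  &\iff C > (1-\delta)\,D + \delta P \\
  &\iff \delta\,(D - P) > D - C \\
  &\iff \delta > \frac{D - C}{D - P}.
\end{align*}
```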
When Are Trigger Strategies NE? (cont.)
• The condition $\frac{1}{1-\delta} C > D + \frac{\delta}{1-\delta} P$ can be
rewritten as:
$\delta > \frac{D - C}{D - P}$
• So the discount factor, $\delta$, must be sufficiently large for
collusion to be sustainable.
• How do we interpret this?
– A high discount factor means that payoffs in the future
are relatively important.
– You are willing to forsake immediate, but transitory
gains from defection for higher payoffs in the future.
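As a worked example, the sketch below plugs in the Prisoner's Dilemma payoffs used in these slides (C = 3, D = 5, P = 2). The function names are illustrative, and exact fractions are used to avoid rounding at the threshold.

```python
from fractions import Fraction


def critical_discount_factor(C, D, P):
    """Smallest delta at which cooperating is at least as good as defecting."""
    return Fraction(D - C, D - P)


def cooperate_value(C, delta):
    """Present value of receiving C every period, starting now."""
    return C / (1 - delta)


def defect_value(D, P, delta):
    """Present value of D now and the punishment payoff P in every later period."""
    return D + delta * P / (1 - delta)


C, D, P = 3, 5, 2
print(critical_discount_factor(C, D, P))  # 2/3

patient = Fraction(3, 4)
impatient = Fraction(1, 2)
print(cooperate_value(C, patient), defect_value(D, P, patient))      # 12 11
print(cooperate_value(C, impatient), defect_value(D, P, impatient))  # 6 7
```

With delta = 3/4 cooperation pays more (12 versus 11); with delta = 1/2 the one-time gain from defecting wins (7 versus 6).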
When Are Trigger Strategies NE? (cont.)
• Is punishment a credible threat?
• Once again, assume the other player is also using a trigger
strategy.
• If either has defected, both will punish this period.
• If you follow the trigger strategy, i.e. punish, you get P
this period and you get P each period in the future.
• If you don’t punish, you will get a lower payoff, since
defecting is a best response to the other player
defecting.
• Therefore the punishment is a credible threat.