Transcript slides2

Extensive-Form Game Abstraction with Bounds
Outline
•Motivation
•Abstractions of extensive-form games
•Theoretical guarantees on abstraction quality
•Computing abstractions
• Hardness
• Algorithms
•Experiments
[Figure: abstraction pipeline. Real game → (automated) abstraction → abstracted game → equilibrium-finding algorithm → ε-Nash equilibrium → map to real game → Nash equilibrium.]
Why is game abstraction difficult?
•Abstraction pathologies
• Refining an abstraction can sometimes yield a worse solution
• Every equilibrium of the refined abstraction can be worse
•Tuomas presented numerous papers on game abstraction without solution quality bounds
•Lossless abstractions are too large
•Particularly difficult to analyze in extensive-form games
• Information sets cut across subtrees
• A player might be best responding to nodes in different parts of the game tree
Extensive-form games - definition
•Game tree.
•Branches denote actions.
•Information sets.
•Payoff at leaves.
•Perfect recall.
• Players remember past actions.
•Imperfect recall.
• Players might forget past actions.
[Figure: example game tree with a chance node C (probabilities 1/3 and 2/3), P1 and P2 information sets with actions such as l/r, L/R, c/d, and e/f, and payoffs at the leaves (e.g., 8, 1.5, 0, -3, -6).]
Counterfactual value
•Defined for each information set 𝐼.
•Expected value from information set.
• Assumes that player plays to reach 𝐼.
• Rescales by probability of reaching 𝐼.
•Example:
• Uniform distribution everywhere.
• Bottom information set.
• Reach probability: 1/6.
• Conditional node distribution: (1/2, 1/2).
• 𝑉(𝐼) = 1/2 ∗ 1/2 ∗ 8 − 1/2 ∗ 1/2 ∗ 6 = 0.5.
[Figure: the example game tree, with the bottom P1 information set highlighted.]
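A minimal sketch of this calculation, assuming the reading of the example tree where the two nodes in the bottom information set lead to payoffs 8/0 and 0/−6 under actions e and f (the data layout is illustrative, not from the slides):

```python
# Counterfactual value of the bottom information set I in the slide's example.
node_distribution = [0.5, 0.5]          # P(node | I), after rescaling by the 1/6 reach probability
leaf_payoffs = [                        # payoff of actions e and f at each node in I (assumed)
    {"e": 8.0, "f": 0.0},
    {"e": 0.0, "f": -6.0},
]

def counterfactual_value(strategy):
    """V(I) under the given action probabilities at I."""
    return sum(
        p_node * sum(strategy[a] * u for a, u in payoffs.items())
        for p_node, payoffs in zip(node_distribution, leaf_payoffs)
    )

uniform = {"e": 0.5, "f": 0.5}
print(counterfactual_value(uniform))    # 0.5, matching V(I) on the slide
```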
Regret
•Defined for each action 𝑎.
•Change in expected value when taking 𝑎.
• Holds everything else constant.
•Example:
• Uniform distribution everywhere.
• Bottom information set.
• 𝑉(𝐼) = 1/2 ∗ 1/2 ∗ 8 − 1/2 ∗ 1/2 ∗ 6 = 0.5.
• To get 𝑟(𝑒), set 𝜎1′(𝑒) = 1.
• 𝑟(𝑒) = 𝑉𝑒(𝐼) − 𝑉(𝐼) = 3.5.
[Figure: the example game tree, with action 𝑒 at the bottom P1 information set highlighted.]
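Using the same assumed payoffs as in the previous sketch, the regret of e is simply the value when e is forced minus the current value:

```python
# r(e) = V_e(I) - V(I) in the slide's example; payoffs as assumed above.
v_e = 0.5 * 8.0 + 0.5 * 0.0   # value of I when e is played with probability 1
v_I = 0.5                     # counterfactual value under the uniform strategy
print(v_e - v_I)              # 3.5, matching r(e) on the slide
```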
Abstraction
•Goal: reduce the number of decision variables while maintaining low regret.
•Method: merge information sets / remove actions.
•Constraints:
• Must define a bijection between nodes.
• Nodes in the bijection must have the same ancestor sequences over the other players.
• Must have the same descendant sequences over all players.
•Might create imperfect recall.
• Worse bounds.
[Figure: the example game tree next to an abstracted game in which information sets are merged; mapped leaves can have slightly different payoffs.]
Abstraction payoff error
•Quantify error in abstraction.
•Measure similarity of abstraction and full game.
•Based on bijection.
•Maximum difference between leaf nodes:
• First mapping: 1
• Second mapping: 2
•Formally:
• Leaf node: 𝜖^𝑅(𝑠) = |𝑢(𝑠) − 𝑢(𝑠′)|
• Player node: 𝜖^𝑅(𝑠) = max_𝑐 𝜖^𝑅(𝑐)
• Nature node: 𝜖^𝑅(𝑠) = Σ_𝑐 𝑝(𝑐) 𝜖^𝑅(𝑐)
[Figure: the example game tree and its abstraction, with the leaf payoffs that induce the mapping errors.]
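A sketch of this bottom-up recursion, under an assumed minimal node representation (the field names and the `mapped_utility` shortcut for the leaf bijection are illustrative, not the paper's data structures):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                       # "leaf", "player", or "nature"
    utility: float = 0.0            # payoff at a leaf of the real game
    mapped_utility: float = 0.0     # payoff of the abstract leaf it maps to
    probs: List[float] = field(default_factory=list)   # chance probabilities
    children: List["Node"] = field(default_factory=list)

def payoff_error(node: Node) -> float:
    """epsilon^R(s) from the slide, computed bottom-up."""
    if node.kind == "leaf":
        return abs(node.utility - node.mapped_utility)
    child_errors = [payoff_error(c) for c in node.children]
    if node.kind == "nature":
        # Probability-weighted sum over children at chance nodes.
        return sum(p * e for p, e in zip(node.probs, child_errors))
    # Maximum over children at player nodes.
    return max(child_errors)

# Tiny usage example: one chance node over two mapped leaves.
leaf1 = Node("leaf", utility=8.0, mapped_utility=9.0)
leaf2 = Node("leaf", utility=-6.0, mapped_utility=-6.0)
root = Node("nature", probs=[1/3, 2/3], children=[leaf1, leaf2])
print(payoff_error(root))   # 1/3 * 1 + 2/3 * 0 ≈ 0.33
```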
Abstraction chance node error
•Quantify error in abstraction.
•Measure similarity of abstraction and full game.
•Based on bijection.
•Maximum difference between leaf nodes:
• First mapping: 1
• Second mapping: 2
•Formally:
• Player node: 𝜖^0(𝑠) = max_𝑐 𝜖^0(𝑐)
• Nature node: 𝜖^0(𝑠) = Σ_𝑐 |𝑝(𝑐) − 𝑝(𝑐′)|
[Figure: the example game tree and its abstraction, with chance node probabilities (1/3, 2/3) that differ across the mapping.]
Abstraction chance distribution error
•Quantify error in abstraction.
•Measure similarity of abstraction and full game.
•Based on bijection.
•Maximum difference between leaf nodes:
• First mapping: 1
• Second mapping: 2
•Formally:
• Infoset node: 𝜖^0(𝑠) = |𝑝(𝑠|𝐼) − 𝑝(𝑠′|𝐼′)|
• Infoset: 𝜖^0(𝐼) = Σ_{𝑠∈𝐼} 𝜖^0(𝑠)
[Figure: the example game tree and its abstraction, with the chance probabilities (1/3, 2/3) that induce the distribution error.]
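A one-function sketch of the information-set distribution error, assuming the conditional probabilities are given as parallel lists over the mapped nodes:

```python
# epsilon^0(I) = sum over nodes s in I of |p(s|I) - p(s'|I')|.
def infoset_chance_error(p_given_I, p_given_I_prime):
    return sum(abs(p - q) for p, q in zip(p_given_I, p_given_I_prime))

print(infoset_chance_error([2/3, 1/3], [1/2, 1/2]))   # 1/3
```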
Bounds on abstraction quality
Given:
◦ Original perfect-recall game
◦ Abstraction that satisfies our constraints
◦ Abstraction strategy with bounded regret on each action
We get:
◦ An 𝜖-Nash equilibrium in full game
◦ Perfect recall abstraction error for player 𝑖:
◦ 2𝜖_𝑖^𝑅 + Σ_{𝑗∈ℋ_0} 𝜖_𝑗^0 𝑊 + Σ_{𝑗∈ℋ_𝑖} 2𝜖_𝑗^0 𝑊 + 𝑟_𝑖
◦ Imperfect-recall abstraction error:
◦ Same as for perfect recall, but ℋ_𝑖 times larger
◦ Linearly worse in game depth
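Written out in one line (a restatement of the bound above; reading 𝑊 as the game's payoff range and ℋ_0, ℋ_𝑖 as the sets of levels at which nature and player 𝑖 act is an assumed interpretation of the slide's notation):

```latex
\[
  \epsilon_i \;\le\; 2\epsilon_i^{R}
    + \sum_{j \in \mathcal{H}_0} \epsilon_j^{0}\, W
    + \sum_{j \in \mathcal{H}_i} 2\epsilon_j^{0}\, W
    + r_i
\]
```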
Complexity and structure
Complexity:
◦ NP-hard to minimize our bound (both for perfect and imperfect recall)
◦ Determining whether two trees are topologically mappable is graph isomorphism complete
Decomposition:
◦ Level-by-level decomposition is, in general, impossible
◦ There might be no legal abstraction that is identifiable through single-level abstraction alone
Algorithms
Single-level abstraction:
◦ Assumes a set of legal-to-merge information sets
◦ Equivalent to clustering with an unusual objective function
◦ Forms a metric space
◦ Immediately yields a 2-approximation algorithm for chance-free abstraction
◦ Chance-only abstraction gives a new objective function not considered in the clustering literature (see the sketch below)
◦ Weighted sum over elements, with each taking the maximum intra-cluster distance
Integer programming for whole tree:
◦ Variables represent merging nodes and/or information sets
◦ #variables quadratic in tree size
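A minimal sketch of the chance-only clustering objective just described; the weight function, distance function, and cluster representation are all assumptions for illustration:

```python
# Each element is charged weight(x) times the largest distance to any other
# element in its cluster; the objective sums these charges over all elements.
def clustering_objective(clusters, dist, weight):
    total = 0.0
    for cluster in clusters:
        for x in cluster:
            worst = max((dist(x, y) for y in cluster if y != x), default=0.0)
            total += weight(x) * worst
    return total

# Example: two clusters of numbers, unit weights, absolute-difference distance.
print(clustering_objective([[1, 2], [5]],
                           dist=lambda a, b: abs(a - b),
                           weight=lambda x: 1.0))   # 1 + 1 + 0 = 2
```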
Perfect recall IP experiments
•5 cards
• 2 kings
• 2 jacks
• 1 queen
•Limit hold’em
• 2 players
• 1 private card dealt to each
• 1 public card dealt
• Betting after cards are dealt in each round
• 2 raises per round
Signal tree
•Tree representing nature actions that are independent of player actions
•Actions available to players must be independent of these
•Abstraction of the signal tree leads to a valid abstraction of the full game tree
Experiments that minimize tree size
Experiments that minimize bound
Imperfect-recall single-level experiments
Game:
◦ Die-roll poker.
◦ Poker-like game that uses dice.
◦ Correlated die rolls (e.g., if P1 rolls a 3, then P2 is more likely to roll a number close to 3; see the sketch after this slide).
Game order:
◦ Each player rolls a private 4-sided die.
◦ Betting happens.
◦ Each player rolls a second private 4-sided die.
◦ Another round of betting.
◦ Models games where players get individual noisy and/or shared imperfect signals.
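An illustrative sketch of correlated 4-sided die rolls of this kind; the specific correlation rule (weighting faces inversely to their distance from the first roll) is an assumption for illustration, not the game's actual definition:

```python
import random

def correlated_roll(first_roll, sides=4):
    # Faces closer to first_roll get higher weight, so the second roll
    # tends to land near the first one.
    faces = list(range(1, sides + 1))
    weights = [1.0 / (1 + abs(face - first_roll)) for face in faces]
    return random.choices(faces, weights=weights)[0]

p1_roll = random.randint(1, 4)
p2_roll = correlated_roll(p1_roll)
print(p1_roll, p2_roll)
```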
Experimental setup
Abstraction:
◦ Compute bound-minimizing abstraction of the second round of die rolls.
◦ Relies on integer-programming formulation.
◦ Apply the counterfactual regret minimization (CFR) algorithm (see the regret-matching sketch below).
◦ Gives solution with bounded regret on each action.
◦ Compute actual regret in full game.
◦ Compare to bound from our theoretical result.
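For reference, the per-information-set update at the core of CFR is regret matching; a minimal sketch (not the authors' implementation) is:

```python
def regret_matching(cumulative_regrets):
    # The next strategy is proportional to the positive parts of the
    # cumulative regrets; with no positive regret, fall back to uniform.
    positives = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positives)
    if total == 0.0:
        return [1.0 / len(positives)] * len(positives)
    return [p / total for p in positives]

print(regret_matching([3.5, -1.0]))   # [1.0, 0.0]
```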
Imperfect-recall experiments
Comparison to prior results
Lanctot, Gibson, Burch, Zinkevich, and Bowling. ICML12
◦ Bounds also for imperfect-recall abstractions
◦ Only for the CFR algorithm
◦ Allow only utility error
◦ Utility error exponentially worse (𝑂(𝑏^ℎ) vs. 𝑂(ℎ))
◦ Do not take chance weights into account
◦ Very nice experiments for the utility-error-only case
Kroer and Sandholm. EC14
◦ Bounds only for perfect-recall abstractions
◦ Do not have linear dependence on height
Imperfect-recall work builds on both papers
◦ The model of abstraction is an extension of the ICML12 paper
◦ The analysis uses techniques from the EC14 paper
◦ Our experiments are for the utility+chance outcome error case