Extensive-Form Game Abstraction with Bounds

Outline
•Motivation
•Abstractions of extensive-form games
•Theoretical guarantees on abstraction quality
•Computing abstractions
 • Hardness
 • Algorithms
•Experiments

[Diagram] Real game → Abstraction (automated) → Abstracted game → Equilibrium-finding algorithm → ε-Nash equilibrium → Map to real game → Nash equilibrium

Why is game abstraction difficult?
•Abstraction pathologies
 • Sometimes, refining an abstraction can make it worse
 • Every equilibrium of the refined abstraction can be worse
•Tuomas presented numerous papers on game abstraction without solution-quality bounds
•Lossless abstractions are too large
•Extensive-form games are particularly difficult to analyze
 • Information sets cut across subtrees
 • A player might be best responding to nodes in different parts of the game tree

Extensive-form games - definition
•Game tree.
•Branches denote actions.
•Information sets.
•Payoffs at leaves.
•Perfect recall.
 • Players remember past actions.
•Imperfect recall.
 • Players might forget past actions.
[Figure: example game tree with a chance node C (branch probabilities 1/3 and 2/3), P1 and P2 decision nodes with actions l, r, L, R, c, d, e, f, and leaf payoffs including 0, 8, 1.5, −3, −6, 6.]

Counterfactual value
•Defined for each information set I.
•Expected value from the information set.
 • Assumes that the player plays to reach I.
 • Rescales by the probability of reaching I.
•Example (on the tree above):
 • Uniform distribution everywhere.
 • Bottom information set.
 • Reach probability: 1/6.
 • Conditional node distribution: (1/2, 1/2).
 • V(I) = (1/2)(1/2)(8) − (1/2)(1/2)(6) = 0.5.

Regret
•Defined for each action a.
•Change in expected value when taking a.
 • Holds everything else constant.
•Example:
 • Uniform distribution everywhere, bottom information set: V(I) = (1/2)(1/2)(8) − (1/2)(1/2)(6) = 0.5.
 • To get r(e), set σ1′(e) = 1.
 • r(e) = V_e(I) − V(I) = 3.5.
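The counterfactual-value and regret computations above can be sketched in a few lines. This is an illustrative sketch only: the information set is hand-coded, and the leaf payoffs below are simplified assumptions (not the full slide tree), so the regret value here will not match the slide's full-tree number.

```python
# Counterfactual value of an information set, given the conditional
# distribution over its nodes and the strategy at the set.
def counterfactual_value(node_probs, action_probs, payoffs):
    return sum(
        p_node * sum(p_a * u for p_a, u in zip(action_probs, node_payoffs))
        for p_node, node_payoffs in zip(node_probs, payoffs)
    )

# Hypothetical bottom information set: two nodes, actions (e, f),
# assumed payoffs (8, 0) at one node and (-6, 0) at the other.
node_probs = [0.5, 0.5]        # conditional node distribution
uniform = [0.5, 0.5]           # uniform strategy over (e, f)
payoffs = [[8, 0], [-6, 0]]

v_I = counterfactual_value(node_probs, uniform, payoffs)
print(v_I)  # 0.5, matching the slide's V(I)

# Regret of action e: deviate to playing e with probability 1,
# holding everything else constant.
v_e = counterfactual_value(node_probs, [1.0, 0.0], payoffs)
r_e = v_e - v_I
```

The regret of an action is just the deviation value minus the current counterfactual value, computed with everything outside the information set held fixed.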
Abstraction
•Goal: reduce the number of decision variables while maintaining low regret.
•Method: merge information sets and/or remove actions.
•Constraints:
 • Must define a bijection between nodes.
 • Nodes in the bijection must have the same ancestor sequences over the other players.
 • Nodes in the bijection must have the same descendant sequences over all players.
•Might create imperfect recall.
 • Leads to worse bounds.
[Figure: example game tree and a candidate abstraction of it.]

Abstraction payoff error
•Quantifies error in the abstraction.
•Measures similarity of the abstraction and the full game.
•Based on the bijection.
•Maximum difference between mapped leaf nodes:
 • First mapping: 1
 • Second mapping: 2
•Formally:
 • Leaf node: ε^R(s) = |u(s) − u(s′)|
 • Player node: ε^R(s) = max_c ε^R(c)
 • Nature node: ε^R(s) = Σ_c p(c) ε^R(c)

Abstraction chance node error
•Quantifies error in the abstraction.
•Measures similarity of the abstraction and the full game.
•Based on the bijection.
•Maximum difference between mapped leaf nodes:
 • First mapping: 1
 • Second mapping: 2
•Formally:
 • Player node: ε^0(s) = max_c ε^0(c)
 • Nature node: ε^0(s) = Σ_c |p(c) − p(c′)|

Abstraction chance distribution error
•Quantifies error in the abstraction.
•Measures similarity of the abstraction and the full game.
•Based on the bijection.
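The payoff-error recursion ε^R can be sketched as a post-order walk over the mapped tree. The tagged-tuple tree encoding below is a hypothetical one of my own, chosen for brevity; the three cases of the recursion follow the formal definitions (leaf: absolute utility difference, player node: max over children, nature node: probability-weighted sum).

```python
# Sketch of the abstraction payoff-error recursion epsilon^R.
# Hypothetical node encoding:
#   ('leaf', u, u_mapped)               -- utility in full game and abstraction
#   ('player', [children])              -- player decision node
#   ('nature', [(prob, child), ...])    -- chance node

def payoff_error(node):
    kind = node[0]
    if kind == 'leaf':
        _, u, u_mapped = node
        return abs(u - u_mapped)                          # |u(s) - u(s')|
    if kind == 'player':
        return max(payoff_error(c) for c in node[1])      # max_c eps(c)
    if kind == 'nature':
        return sum(p * payoff_error(c) for p, c in node[1])  # sum_c p(c) eps(c)
    raise ValueError(f"unknown node kind: {kind}")

# Tiny made-up example: a nature node over two player nodes, each with
# one mapped leaf that differs by 1 and one that matches exactly.
tree = ('nature', [
    (0.5, ('player', [('leaf', 8, 9), ('leaf', 0, 0)])),    # max(1, 0) = 1
    (0.5, ('player', [('leaf', -6, -7), ('leaf', 0, 0)])),  # max(1, 0) = 1
])
print(payoff_error(tree))  # 0.5*1 + 0.5*1 = 1.0
```

The chance-error recursion ε^0 has the same shape, with the leaf case replaced by a difference of chance probabilities at nature nodes.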
•Maximum difference between mapped leaf nodes:
 • First mapping: 1
 • Second mapping: 2
•Formally:
 • Infoset node: ε^0(s) = |p(s|I) − p(s′|I′)|
 • Infoset: ε^0(I) = Σ_{s∈I} ε^0(s)

Bounds on abstraction quality
Given:
◦ Original perfect-recall game
◦ Abstraction that satisfies our constraints
◦ Abstraction strategy with bounded regret on each action
We get:
◦ An ε-Nash equilibrium in the full game
◦ Perfect-recall abstraction error for player i:
 ◦ 2ε_i^R + Σ_{j∈ℋ_0} ε_j^0 W + Σ_{j∈ℋ_i} 2ε_j^0 W + r_i
◦ Imperfect-recall abstraction error:
 ◦ Same form as for perfect recall, but with an extra multiplicative factor
 ◦ Linearly worse in game depth

Complexity and structure
Complexity:
◦ NP-hard to minimize our bound (both for perfect and imperfect recall)
◦ Determining whether two trees are topologically mappable is graph-isomorphism complete
Decomposition:
◦ Level-by-level decomposition is, in general, impossible
◦ There might be no legal abstractions identifiable through only single-level abstraction

Algorithms
Single-level abstraction:
◦ Assumes a set of legal-to-merge information sets
◦ Equivalent to clustering with an unusual objective function
◦ Forms a metric space
◦ Immediately yields a 2-approximation algorithm for chance-free abstraction
◦ Chance-only abstraction gives a new objective function not considered in the clustering literature
 ◦ Weighted sum over elements, with each element taking the maximum intra-cluster distance
Integer programming for the whole tree:
◦ Variables represent merging nodes and/or information sets
◦ Number of variables is quadratic in tree size

Perfect-recall IP experiments
• 5 cards
 • 2 kings
 • 2 jacks
 • 1 queen
• Limit hold'em rules
 • 2 players
 • 1 private card dealt to each
 • 1 public card dealt
 • Betting after cards are dealt in each round
 • 2 raises per round

Signal tree
• Tree representing nature actions that are independent of player actions
• Actions available to players must be independent of these
• An abstraction of the signal tree leads to a valid abstraction of the full game tree

Experiments that minimize tree size

Experiments that minimize bound

Imperfect-recall single-level experiments
Game:
◦ Die-roll poker.
◦ A poker-like game that uses dice.
◦ Correlated die rolls (e.g., if P1 rolls a 3, then P2 is more likely to roll a number close to 3).
◦ Models games where players get individual noisy and/or shared imperfect signals.
Game order:
◦ Each player rolls a private 4-sided die.
◦ Betting happens.
◦ Each player rolls a second private 4-sided die.
◦ Another round of betting.

Experimental setup
Abstraction:
◦ Compute a bound-minimizing abstraction of the second round of die rolls.
◦ Relies on the integer-programming formulation.
◦ Apply the counterfactual regret minimization (CFR) algorithm.
◦ Gives a solution with bounded regret on each action.
◦ Compute actual regret in the full game.
◦ Compare to the bound from our theoretical result.

Imperfect-recall experiments

Comparison to prior results
Lanctot, Gibson, Burch, Zinkevich, and Bowling (ICML 2012):
◦ Bounds also for imperfect-recall abstractions
◦ Only for the CFR algorithm
◦ Allow only utility error
◦ Utility error exponentially worse (O(b^h) vs. O(h))
◦ Do not take chance weights into account
◦ Very nice experiments for the utility-error-only case
Kroer and Sandholm (EC 2014):
◦ Bounds only for perfect-recall abstractions
◦ Do not have linear dependence on height
The imperfect-recall work builds on both papers:
◦ The abstraction model is an extension of the ICML 2012 paper
◦ The analysis uses techniques from the EC 2014 paper
◦ Our experiments are for the utility+chance outcome error case
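The CFR step in the experimental setup maintains cumulative regrets at each information set and derives strategies by regret matching. Below is a minimal sketch of the regret-matching update at a single information set; it is illustrative only (a real CFR implementation traverses the game tree and computes counterfactual action values, whereas here the action values are simply supplied).

```python
# Regret matching: play each action with probability proportional to its
# positive cumulative regret; fall back to uniform if no positive regret.
def regret_matching(cum_regrets):
    positives = [max(r, 0.0) for r in cum_regrets]
    total = sum(positives)
    if total <= 0.0:
        n = len(cum_regrets)
        return [1.0 / n] * n
    return [p / total for p in positives]

# Accumulate each action's regret against the strategy's expected value.
def update(cum_regrets, action_values, strategy):
    ev = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - ev) for r, v in zip(cum_regrets, action_values)]

# Toy example: two actions with fixed values 1.0 and 0.0; regret matching
# quickly concentrates all probability on the better action.
regrets = [0.0, 0.0]
for _ in range(100):
    strategy = regret_matching(regrets)
    regrets = update(regrets, [1.0, 0.0], strategy)
print(regret_matching(regrets))  # [1.0, 0.0]
```

In full CFR these per-infoset updates run simultaneously at every information set, with the action values replaced by counterfactual values computed from the current strategy profile; the average strategy over iterations is what carries the bounded-regret guarantee used in the experiments.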