Transcript: slides1
Convex Optimization for Sequential Game Solving

Overview
• Sequence-form transformation
• Bilinear saddle-point problems
• EGT/Mirror prox
• Smoothing techniques for sequential games
• Sampling techniques
• Some experimental results

Extensive-form games
• Game tree; branches denote actions
• Information sets
• Payoffs at the leaves
• Nash equilibrium: min_{s₁} max_{s₂} f(s₁, s₂)

[Game-tree figure: chance node C branches with probability 1/3 (left) and 2/3 (right). Left branch: P1 chooses l (payoff 0) or r; after r, P2 chooses c or d; P1 then chooses e or f, with leaf payoffs 6, 0 after c and 0, −6 after d. Right branch: P1 chooses L or R; R ends with payoff 1.5; after L, P2 chooses c or d, with leaf payoffs −3 and 6. The same tree is repeated on the following slides.]

Behavioral strategies
• Specify a distribution over actions at each information set
• Strategy:
  ◦ p_l + p_r = 1
  ◦ p_L + p_R = 1
  ◦ p_e + p_f = 1
  ◦ p_a ≥ 0 for every action a
• Utility for a player:
  ◦ Probability-weighted sum over leaf nodes
  ◦ Not linear, even when the other player is held fixed

Sequence form
• Technique for obtaining a linear formulation
• Information-set probabilities sum to the probability of the player's last sequence
• Exploits the perfect-recall property

Sequence form
• P1 constraints:
  ◦ x_l + x_r = 1
  ◦ x_L + x_R = 1
  ◦ x_e + x_f = x_r
• P2 constraints:
  ◦ y_c + y_d = 1
• Utility:
  ◦ Weighted sum over leaves
  ◦ Example weight for the leaf reached via r, e, and c: (1/3) · y_c · x_e · 6
• Recovering the behavioral strategy:
  ◦ p_e = x_e / x_r

Sequence form
• Sequence-form strategy spaces:
  ◦ X for P1
  ◦ Y for P2
  ◦ Specified by linear equalities, so both are convex polytopes
• Utility matrix A:
  ◦ Encodes the utility function for P2
  ◦ Non-zero entries correspond to sequences preceding leaf nodes
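The sequence-form construction above can be sketched numerically. A minimal sketch, assuming the sequence ordering (∅, l, r, L, R, e, f) for P1 and (∅, c, d) for P2; the helper function `sequence_form_strategy` is hypothetical, not from the slides:

```python
import numpy as np

# Utility matrix A for the example game, as given on the utility-matrix slide.
# Non-zero entries correspond to sequence pairs preceding leaf nodes.
A = np.array([
    [0.0,  0.0,  0.0],   # x_empty
    [0.0,  0.0,  0.0],   # x_l
    [0.0,  0.0,  0.0],   # x_r
    [0.0, -3.0,  6.0],   # x_L
    [1.5,  0.0,  0.0],   # x_R
    [0.0,  6.0,  0.0],   # x_e
    [0.0,  0.0, -6.0],   # x_f
])

def sequence_form_strategy(p_l, p_L, p_e):
    """Map behavioral probabilities to a sequence-form vector for P1."""
    x_l, x_r = p_l, 1.0 - p_l                 # first info set: x_l + x_r = 1
    x_L, x_R = p_L, 1.0 - p_L                 # second info set: x_L + x_R = 1
    x_e, x_f = p_e * x_r, (1.0 - p_e) * x_r   # dilated: x_e + x_f = x_r
    return np.array([1.0, x_l, x_r, x_L, x_R, x_e, x_f])

x = sequence_form_strategy(p_l=0.5, p_L=0.25, p_e=0.8)
y = np.array([1.0, 0.25, 0.75])  # P2: y_c + y_d = 1

# Sequence-form constraints hold by construction.
assert abs(x[1] + x[2] - 1.0) < 1e-12
assert abs(x[5] + x[6] - x[2]) < 1e-12

# Expected utility is bilinear in (x, y) in sequence form.
u = x @ A @ y

# Recovering the behavioral strategy: p_e = x_e / x_r.
p_e = x[5] / x[2]
```

The key point the sketch illustrates is that the nonlinearity of behavioral strategies is absorbed into the variables themselves (x_e is a *sequence* probability, p_e · x_r), leaving the utility bilinear.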
• Optimization problem:
  ◦ min_{x∈X} max_{y∈Y} xᵀAy

Utility matrix A

          y_∅   y_c   y_d
  x_∅      0     0     0
  x_l      0     0     0
  x_r      0     0     0
  x_L      0    −3     6
  x_R     1.5    0     0
  x_e      0     6     0
  x_f      0     0    −6

Equilibrium-finding algorithms
• Our objective: min_{x∈X} max_{y∈Y} xᵀAy
• Solvable by linear programming
  ◦ Best method when possible
  ◦ The LP often won't fit in memory
• Iterative ε-Nash equilibrium methods are preferred in practice
  ◦ Regret-based methods such as counterfactual regret minimization (CFR)
  ◦ First-order methods (FOMs); we will focus on these

Bilinear saddle-point formulation
• Sequence-form problem:
  ◦ min_{x∈X} max_{y∈Y} xᵀAy
• From the perspective of a single player:
  ◦ min_x f(x), where f(x) = max_y xᵀAy
  ◦ max_y φ(y), where φ(y) = min_x xᵀAy
• f(⋅) is convex, φ(⋅) is concave
  ◦ Could apply e.g. subgradient descent
  ◦ But can we do better?

Smoothed function approximation
• Nonsmooth functions:
  ◦ f(x) = max_y xᵀAy
  ◦ φ(y) = min_x xᵀAy
• Add a smooth term:
  ◦ f_{μ_Y}(x) = max_y xᵀAy − μ_Y d_Y(y)
  ◦ φ_{μ_X}(y) = min_x xᵀAy + μ_X d_X(x)
  ◦ d_Y, d_X are differentiable and strongly convex on Y, X; they are called distance-generating functions (DGFs)
  ◦ μ_Y, μ_X are smoothing parameters

Conditions on DGFs
• Differentiable on X
• Strongly convex with modulus 1:
  ◦ d(αx + (1−α)x′) ≤ αd(x) + (1−α)d(x′) − (1/2)α(1−α)‖x − x′‖², ∀x, x′ ∈ X
• Equivalently, for differentiable functions:
  ◦ d(x′) ≥ d(x) + ⟨∇d(x), x′ − x⟩ + (1/2)‖x′ − x‖², ∀x, x′ ∈ X
• Equivalently, for twice-differentiable functions:
  ◦ hᵀ∇²d(x)h ≥ ‖h‖², ∀x ∈ X, h ∈ ℝⁿ

Example smoothing function
• Example: X is the n-dimensional simplex
  ◦ Δⁿ = {x : ‖x‖₁ = 1, x ≥ 0}
• The negative entropy function is a DGF for this set
  ◦ d_X(x) = Σᵢ xᵢ log xᵢ
  ◦ Strong convexity w.r.t. ‖⋅‖₁: Σᵢ hᵢ²/xᵢ = (Σᵢ xᵢ)(Σᵢ hᵢ²/xᵢ) ≥ (Σᵢ |hᵢ|)² = ‖h‖₁²
• Diameter Ω = ln n
• Strong convexity modulus σ = 1
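For the simplex-with-entropy example above, the smoothed best-response problem has a closed-form solution: max_y (xᵀAy − μ Σᵢ yᵢ log yᵢ) is attained at the softmax of Aᵀx/μ, and its value is the log-sum-exp of the payoffs. A minimal sketch; the 3×2 matrix and the point x are made up for illustration:

```python
import numpy as np

A = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])  # hypothetical payoff matrix
x = np.array([0.2, 0.5, 0.3])                          # a point in the 3-simplex
mu = 0.1                                               # smoothing parameter

g = A.T @ x  # coefficients of the linear objective in y

# Nonsmooth best-response value: f(x) = max over the simplex of x^T A y.
f = g.max()

# Entropy-smoothed value f_mu(x) = max_y x^T A y - mu * sum_i y_i log y_i.
# Closed form: mu * log(sum_i exp(g_i / mu)), attained at the softmax of g/mu.
z = np.exp((g - g.max()) / mu)          # shift by the max for numerical stability
y_star = z / z.sum()                    # smoothed best response
f_mu = g.max() + mu * np.log(z.sum())   # smoothed value (log-sum-exp)

# The smoothed value over-approximates f by at most mu * ln(n),
# matching the diameter bound Omega = ln(n) on the slide.
n = len(g)
assert f <= f_mu <= f + mu * np.log(n) + 1e-12

# The smoothed best response also gives the gradient: grad f_mu(x) = A y*.
grad = A @ y_star
```

This gradient formula is what makes the smoothed problem tractable for first-order methods: evaluating A y* replaces the nonsmooth max with a differentiable surrogate.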
Effect of smoothing
[Figure: plots of f(x) = max_y yᵀ(ax + b) for a small example (vector entries include −2, −0.1, 0.4, −0.9), without and with a smoothing term −w(y); the smooth term rounds off the kinks of the piecewise-linear function.]

Algorithms
• Excessive gap technique (EGT) and Mirror prox:
  ◦ Gradient-descent-like algorithms on the smoothed f, φ functions
  ◦ Assume access to smoothing functions d_X, d_Y
• EGT rate: (4‖A‖ / T) · √(D_X D_Y / (σ_X σ_Y))
  ◦ ‖A‖ is a condition measure of the game
  ◦ D_X is a measure of the size of X: D_X = max_{x,x′} d_X(x) − d_X(x′)
  ◦ σ_X is the strong convexity parameter of d_X
  ◦ Mirror prox has a similar rate
  ◦ Error decreases as 1/T; CFR's error decreases as 1/√T

Smoothing for sequential games
• Strategy-space polytopes X, Y are more complex than the simplex
• Treeplex: a tree of simplexes, each scaled by its parent variable

Distance-generating functions for treeplexes
• Use the entropy function on each individual simplex
• For each simplex j, dilate it by the parent sequence:
  ◦ d_j(x) = x_{p_j} Σᵢ (xᵢ / x_{p_j}) log(xᵢ / x_{p_j}), for x / x_{p_j} ∈ Δ_j
• Take a weighted sum over simplexes to get a DGF for the treeplex:
  ◦ d(x) = Σⱼ βⱼ d_j(x^j), x ∈ X, x^j ∈ Δ_j
• Weights βⱼ = 2^{d_j} M_j, where:
  ◦ d_j is the depth under simplex j
  ◦ M_j is the maximum value of the ℓ₁ norm over the subtree at simplex j
• The dilated entropy function with these weights is strongly convex over any treeplex X

Sampling
• EGT and Mirror prox require the gradients:
  ◦ ∇f_{μ_Y}(x) = Ay
  ◦ ∇φ_{μ_X}(y) = Aᵀx
  ◦ Computing them requires traversing the entire game tree
• Instead, we can sample at a subset of (or possibly all) information sets to get an estimate ξ of Ay
  ◦ Apply the stochastic mirror prox algorithm
  ◦ Optimally trades off the variance-dependent 1/√T term and the deterministic 1/T term in the rate
  ◦ Requires E[ξ] = Ay
  ◦ Obtained by sampling according to the current y

Sampled gradient ξ
• P2 strategy: y_c = 1/4, y_d = 3/4
• Sample e.g. nature and the opponent
• Example for C going right and P2 choosing d:
  ◦ ξ_∅ = 0, ξ_l = 0, ξ_r = 0, ξ_L = 6, ξ_R = 1.5, ξ_e = 0, ξ_f = 0

Experiments - game
• Leduc hold'em:
  ◦ Simplified limit Texas hold'em game
  ◦ Deck has k unique cards, with two copies of each card
  ◦ Each player is dealt a single private card
  ◦ One community card is dealt
  ◦ A betting round occurs before and after the community card is dealt
  ◦ Each betting round allows up to three raises

Experiments
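Returning to the sampling slides: the unbiasedness requirement E[ξ] = Ay can be checked empirically. A minimal sketch that samples only the opponent's action according to the current y (the slides' example additionally samples nature, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)

# Utility matrix A from the example game; rows are P1 sequences
# (empty, l, r, L, R, e, f), columns are P2 sequences (empty, c, d).
A = np.array([
    [0.0,  0.0,  0.0],
    [0.0,  0.0,  0.0],
    [0.0,  0.0,  0.0],
    [0.0, -3.0,  6.0],
    [1.5,  0.0,  0.0],
    [0.0,  6.0,  0.0],
    [0.0,  0.0, -6.0],
])
y = np.array([1.0, 0.25, 0.75])  # P2 strategy: y_c = 1/4, y_d = 3/4

def sampled_gradient(rng):
    """Unbiased estimate xi of A @ y: sample P2's action (c or d)
    according to y and take the corresponding column of A."""
    a = 1 + rng.choice(2, p=[y[1], y[2]])  # column index of c or d
    return A[:, 0] + A[:, a]               # the empty sequence always has weight 1

# Averaging many samples recovers A @ y, i.e. E[xi] = A y.
estimate = np.mean([sampled_gradient(rng) for _ in range(100_000)], axis=0)
exact = A @ y
assert np.allclose(estimate, exact, atol=0.1)
```

Because each draw touches only one column of A, a full tree traversal is replaced by cheap noisy estimates, which is what stochastic mirror prox consumes.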