
Convex Optimization for
Sequential Game Solving
Overview
•Sequence-form transformation
•Bilinear saddle-point problems
•EGT/Mirror prox
•Smoothing techniques for sequential games
•Sampling techniques
•Some experimental results
Extensive-form games
•Game tree
•Branches denote actions
•Information sets
•Payoff at leaves
•Nash equilibrium:
• min_{s1} max_{s2} f(s1, s2)

[Figure: the example game tree. A chance node C leads with probability 1/3 to a P1 node with actions l (payoff 0) and r, and with probability 2/3 to a P1 node with actions L and R (payoff 1.5). After r and after L, P2 chooses c or d (a single P2 information set). After L, c pays -3 and d pays 6. After r, P2's choice leads to a P1 information set with actions e and f; the four leaves pay 6 (c,e), 0 (c,f), 0 (d,e), -6 (d,f).]
Behavioral strategies
•Specify a distribution over actions at each information set
•Strategy:
• p_l + p_r = 1
• p_L + p_R = 1
• p_e + p_f = 1
• p_a ≥ 0 for every action a
•Utility for a player:
• Probability-weighted sum over leaf nodes
• Not linear, even when the other player is held fixed
• Example: the leaf with payoff 6 contributes (1/3)·p_r·p_c·p_e·6, and the product p_r·p_e is quadratic in P1's own variables (see the sketch below)

[Figure: the example game tree, as above.]
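To make the nonlinearity concrete, here is a minimal sketch (function and variable names are ours, not from the slides) that evaluates the expected leaf payoff of the example tree directly from behavioral probabilities:

```python
# Expected leaf payoff in the example tree as a function of behavioral
# strategies (p_* for P1, q_* for P2). Chance plays the l/r subtree
# with probability 1/3 and the L/R subtree with probability 2/3.
def expected_payoff(p_l, p_r, p_L, p_R, p_e, p_f, q_c, q_d):
    left = p_l * 0 + p_r * (q_c * (p_e * 6 + p_f * 0)
                            + q_d * (p_e * 0 + p_f * (-6)))
    right = p_L * (q_c * (-3) + q_d * 6) + p_R * 1.5
    # The term p_r * p_e is a product of two of P1's own variables,
    # so the utility is not linear in P1's behavioral strategy.
    return (1/3) * left + (2/3) * right

print(expected_payoff(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.25, 0.75))
```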
Sequence form
•Technique for obtaining a linear formulation.
•Exploits the perfect recall property.
•Information-set probabilities sum to the probability of the player's last sequence.

[Figure: the example game tree, as above.]
Sequence form
•P1 constraints:
• x_l + x_r = 1
• x_L + x_R = 1
• x_e + x_f = x_r
•P2 constraints:
• y_c + y_d = 1
•Utility:
• Weighted sum over leaves.
• Weights: e.g. the leaf with payoff 6 has weight (1/3)·y_c·x_e, contributing (1/3)·y_c·x_e·6, which is linear in each player's variables.
•Recovering behavioral strategy (see the conversion sketch below):
• p_e = x_e / x_r

[Figure: the example game tree, as above.]
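A small sketch (names are ours) of the behavioral-to-sequence-form map for P1 in the example game, and of the recovery p_e = x_e / x_r:

```python
# Behavioral -> sequence form for P1 in the example game (a sketch;
# the dictionary keys are our own labels, not from the slides).
def to_sequence_form(p_l, p_r, p_L, p_R, p_e, p_f):
    x = {"root": 1.0}
    x["l"], x["r"] = p_l, p_r              # x_l + x_r = 1
    x["L"], x["R"] = p_L, p_R              # x_L + x_R = 1
    x["e"], x["f"] = p_r * p_e, p_r * p_f  # x_e + x_f = x_r
    return x

def behavioral_e(x):
    # Recover p_e = x_e / x_r (well-defined whenever x_r > 0).
    return x["e"] / x["r"]

x = to_sequence_form(0.5, 0.5, 0.5, 0.5, 0.25, 0.75)
assert abs(behavioral_e(x) - 0.25) < 1e-12
```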
Sequence form
•Sequence-form strategy spaces:
• X for P1.
• Y for P2.
• Specified by linear equalities and nonnegativity constraints, so both are convex polytopes.
•Utility matrix A:
• Encodes the utility function for P2.
• Non-zero entries correspond to sequences preceding leaf nodes.
•Optimization problem:
• min_{x∈X} max_{y∈Y} x^T A y
Utility matrix 𝐴
        y_∅    y_c    y_d
x_∅     0      0      0
x_l     0      0      0
x_r     0      0      0
x_L     0      -3     6
x_R     1.5    0      0
x_e     0      6      0
x_f     0      0      -6

min_{x∈X} max_{y∈Y} x^T A y

[Figure: the example game tree, as above.]
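The matrix above, transcribed into numpy so the bilinear objective can be evaluated directly (a sketch; the sequence ordering is ours):

```python
import numpy as np

# Sequence order: x = (x_0, x_l, x_r, x_L, x_R, x_e, x_f),
#                 y = (y_0, y_c, y_d), where "0" is the empty sequence.
A = np.array([
    [0.0,  0.0,  0.0],   # x_0
    [0.0,  0.0,  0.0],   # x_l
    [0.0,  0.0,  0.0],   # x_r
    [0.0, -3.0,  6.0],   # x_L
    [1.5,  0.0,  0.0],   # x_R
    [0.0,  6.0,  0.0],   # x_e
    [0.0,  0.0, -6.0],   # x_f
])

# Any feasible sequence-form strategies, e.g. uniform behavioral play:
x = np.array([1.0, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25])
y = np.array([1.0, 0.5, 0.5])
print(x @ A @ y)  # bilinear objective x^T A y
```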
Equilibrium-finding algorithms
•Our objective:
• min_{x∈X} max_{y∈Y} x^T A y
•Solvable by linear programming (see the sketch below)
• Best method when possible
• LP often won't fit in memory
•Iterative ε-Nash equilibrium methods are preferred in practice
• Regret-based methods such as counterfactual regret minimization (CFR)
• First-order methods (FOMs); we will focus on these
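As a non-authoritative sketch of the LP route for the example game: dualize the inner max and solve the combined LP with scipy.optimize.linprog. The matrices E and F below encode the two players' sequence-form constraints from the earlier slides.

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix from the slide (rows: x_0,x_l,x_r,x_L,x_R,x_e,x_f;
# cols: y_0,y_c,y_d).
A = np.array([[0,0,0],[0,0,0],[0,0,0],[0,-3,6],[1.5,0,0],[0,6,0],[0,0,-6.0]])

# Sequence-form constraints E x = e and F y = f.
E = np.array([[ 1, 0, 0, 0, 0, 0, 0],    # x_0 = 1
              [-1, 1, 1, 0, 0, 0, 0],    # x_l + x_r = x_0
              [-1, 0, 0, 1, 1, 0, 0],    # x_L + x_R = x_0
              [ 0, 0,-1, 0, 0, 1, 1.0]]) # x_e + x_f = x_r
e = np.array([1.0, 0, 0, 0])
F = np.array([[ 1, 0, 0],                # y_0 = 1
              [-1, 1, 1.0]])             # y_c + y_d = y_0
f = np.array([1.0, 0])

# min_x max_{y: Fy=f, y>=0} x^T A y  ==  min_{x,v} f^T v
#   s.t. F^T v >= A^T x,  E x = e,  x >= 0   (LP duality on the inner max)
n, m = A.shape
k = F.shape[0]
c = np.concatenate([np.zeros(n), f])                 # objective: f^T v
A_ub = np.hstack([A.T, -F.T])                        # A^T x - F^T v <= 0
b_ub = np.zeros(m)
A_eq = np.hstack([E, np.zeros((E.shape[0], k))])
bounds = [(0, None)] * n + [(None, None)] * k        # x >= 0, v free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=e, bounds=bounds)
print("game value:", res.fun, "x:", res.x[:n])
```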
Bilinear saddle-point formulation
Bilinear saddle-point formulation
Sequence-form problem:
◦ min_{x∈X} max_{y∈Y} x^T A y
From the perspective of a single player:
◦ min_x f(x), where f(x) = max_{y∈Y} x^T A y
◦ max_y φ(y), where φ(y) = min_{x∈X} x^T A y
f(⋅) is convex, φ(⋅) is concave
◦ Could apply e.g. subgradient descent (sketched below)
◦ But can we do better?
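A minimal sketch of projected subgradient descent on f for the simplex case (a simplification: treeplexes would need their own projection; the helper names are ours). A subgradient of f at x is A y*, where y* is any best response.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0)

def subgradient_descent(A, T=2000):
    # Projected subgradient method on f(x) = max_y x^T A y for a matrix
    # game over simplexes.
    n, m = A.shape
    x = np.full(n, 1.0 / n)
    avg = np.zeros(n)
    for t in range(1, T + 1):
        y_star = np.eye(m)[np.argmax(x @ A)]     # inner best response
        g = A @ y_star                           # subgradient of f at x
        x = project_simplex(x - g / np.sqrt(t))  # step size ~ 1/sqrt(t)
        avg += x
    return avg / T

print(subgradient_descent(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```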
Smoothed function approximation
Nonsmooth functions:
◦ f(x) = max_{y∈Y} x^T A y
◦ φ(y) = min_{x∈X} x^T A y
Add a smooth term:
◦ f_{μ_Y}(x) = max_{y∈Y} x^T A y − μ_Y d_Y(y)
◦ φ_{μ_X}(y) = min_{x∈X} x^T A y + μ_X d_X(x)
◦ d_Y, d_X are differentiable and strongly convex on Y, X; they are called distance-generating functions (DGFs)
◦ μ_Y, μ_X are smoothing parameters (the simplex case is sketched below)
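When Y is a simplex and d_Y is the negative entropy (introduced on the next slides), the smoothed max has a closed form; a sketch using scipy:

```python
import numpy as np
from scipy.special import logsumexp, softmax

def smoothed_max(A, x, mu):
    # f_mu(x) = max_{y in simplex} x^T A y - mu * sum_i y_i log y_i.
    # With the negative-entropy DGF this max has a closed form:
    # value = mu * logsumexp(A^T x / mu), maximizer = softmax(A^T x / mu).
    g = A.T @ x
    return mu * logsumexp(g / mu), softmax(g / mu)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
value, y_star = smoothed_max(A, np.array([0.7, 0.3]), mu=0.1)
print(value, y_star)
```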
Conditions on DGFs
Differentiable on 𝑋
Strongly convex modulus 1:
◦ d(αx + (1−α)x′) ≤ α d(x) + (1−α) d(x′) − (1/2) α(1−α) ‖x − x′‖², ∀x, x′ ∈ X
Equivalently, for differentiable functions:
◦ d(x′) ≥ d(x) + ⟨∇d(x), x′ − x⟩ + (1/2) ‖x′ − x‖², ∀x, x′ ∈ X
Equivalently, for twice-differentiable functions:
◦ hᵀ ∇²d(x) h ≥ ‖h‖², ∀x ∈ X, h ∈ ℝⁿ
Example smoothing function
•Example: X is the n-dimensional simplex
• Δ_n = {x : ‖x‖₁ = 1, x ≥ 0}.
•The negative entropy function is a DGF for this:
• d_X(x) = Σᵢ xᵢ log xᵢ.
• Strong convexity w.r.t. ‖·‖₁ (checked numerically below): hᵀ ∇²d_X(x) h = Σᵢ hᵢ²/xᵢ = (Σᵢ xᵢ)(Σᵢ hᵢ²/xᵢ) ≥ (Σᵢ |hᵢ|)² = ‖h‖₁² by Cauchy–Schwarz.
•Diameter Ω = ln n.
•Strong convexity σ = 1.
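A quick numerical sanity check of the strong-convexity bound above (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.dirichlet(np.ones(5))      # random point in the simplex
    h = rng.standard_normal(5)
    # Hessian of d(x) = sum_i x_i log x_i is diag(1/x_i), so
    # h^T H h = sum_i h_i^2 / x_i, which must dominate ||h||_1^2.
    assert (h**2 / x).sum() >= np.abs(h).sum()**2 - 1e-9
```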
Effect of smoothing
[Figure: plot of f(x) = max_{y∈Δ₃} yᵀ(ax − b), with a = (−2, −0.1, 2) and b = (1, 0.4, −0.9), a piecewise-linear max of three affine functions, shown next to the smoothed version f(x) = max_{y∈Δ₃} yᵀ(ax − b) − w(y), in which the smoothing term w(y) rounds off the kinks.]
Algorithms
Excessive gap technique (EGT) and Mirror prox:
◦ Gradient descent-like algorithms on the smoothed f, φ functions
◦ Assume access to smoothing functions d_X, d_Y
◦ EGT rate: (4‖A‖/T) · √(D_X D_Y / (σ_X σ_Y))
◦ ‖A‖ is an operator norm of A that acts as a condition measure of the game
◦ D_X measures the size of X: D_X = max_{x,x′∈X} [d_X(x) − d_X(x′)]
◦ σ_X is the strong convexity parameter of d_X
◦ Mirror prox achieves a similar rate (a sketch follows below)
◦ The error decreases as 1/T; for CFR it decreases as 1/√T
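A minimal mirror prox sketch for the matrix-game case, where both strategy sets are simplexes and the DGF is entropy (so the prox steps become multiplicative updates); the treeplex case would swap in the dilated entropy and its prox mapping. The step size choice is a safe assumption, not the tuned constant from the theory.

```python
import numpy as np

def mirror_prox(A, T=1000):
    # Mirror prox for min_x max_y x^T A y over two simplexes, with the
    # entropy DGF. Each iteration takes an extrapolation (prox) step and
    # then an update step using gradients at the extrapolated point.
    n, m = A.shape
    x, y = np.full(n, 1 / n), np.full(m, 1 / m)
    eta = 1.0 / (2.0 * np.abs(A).max())       # conservative step size
    xs, ys = np.zeros(n), np.zeros(m)
    for _ in range(T):
        u = x * np.exp(-eta * (A @ y));   u /= u.sum()   # extrapolation
        v = y * np.exp( eta * (A.T @ x)); v /= v.sum()
        x = x * np.exp(-eta * (A @ v));   x /= x.sum()   # update
        y = y * np.exp( eta * (A.T @ u)); y /= y.sum()
        xs += u; ys += v
    return xs / T, ys / T    # averaged iterates converge at rate O(1/T)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = mirror_prox(A)
print("saddle-point gap:", (A.T @ x_bar).max() - (A @ y_bar).min())
```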
Smoothing for
sequential games
Strategy space polytopes
𝑋, 𝑌 are more complex than simplex.
Tree of simplexes, scaled by parent variable.
Distance-generating function for treeplexes
Use entropy function on each individual simplex
For each simplex 𝑗, dilate it by the parent sequence:
◦ d_j(x) = x_{p_j} Σ_{i∈Δ_j} (x_i / x_{p_j}) log(x_i / x_{p_j}), where p_j is the parent sequence of simplex Δ_j
Take a weighted sum over simplexes to get a d.g.f. for the treeplex:
◦ d(x) = Σ_j β_j d_j(x^j), x ∈ X, where x^j is the block of x on Δ_j
Weights β_j = 2^{d_j} M_j:
◦ d_j is the depth under simplex j
◦ M_j is the maximum value of the l₁ norm over the subtree at simplex j
The dilated entropy function with these weights is strongly convex over any treeplex X (a sketch follows below)
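A sketch of the dilated entropy for P1's treeplex in the example game. The simplex structure is hard-coded, and the weights passed in are illustrative placeholders rather than the β_j = 2^{d_j} M_j values above (which is our reading of the slide's weight formula):

```python
import numpy as np

# P1's treeplex in the example game. Indexing:
# x = (x_0, x_l, x_r, x_L, x_R, x_e, x_f); simplexes as (parent, members):
SIMPLEXES = [(0, [1, 2]),   # {l, r},  parent x_0
             (0, [3, 4]),   # {L, R},  parent x_0
             (2, [5, 6])]   # {e, f},  parent x_r

def dilated_entropy(x, betas):
    # Uses x_p * sum_i (x_i/x_p) log(x_i/x_p) = sum_i x_i log(x_i/x_p).
    total = 0.0
    for beta, (p, members) in zip(betas, SIMPLEXES):
        for i in members:
            if x[i] > 0:  # 0 * log 0 = 0 convention
                total += beta * x[i] * np.log(x[i] / x[p])
    return total

x = np.array([1.0, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25])
print(dilated_entropy(x, betas=[2.0, 1.0, 1.0]))  # illustrative weights
```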
Sampling
EGT and Mirror prox require the gradient:
◦ ∇f_{μ_Y}(x) = A ȳ, where ȳ attains the smoothed max
◦ ∇φ_{μ_X}(y) = Aᵀ x̄, where x̄ attains the smoothed min
◦ Requires traversing the entire game tree
Instead, we can sample at a subset of (or possibly all) information sets to get an estimate ξ of A y
◦ Apply the stochastic mirror prox algorithm
◦ Optimally trades off the variance-based 1/√T rate and the deterministic 1/T rate
◦ Requires E[ξ] = A y
◦ Obtained by sampling according to the current y
Sampled gradient 𝜉
•P2 strategy: y_c = 1/4, y_d = 3/4
•Sample e.g. nature and opponent
•Example for C going right, and P2 choosing d (an unbiased-sampling sketch follows below):

ξ_∅   ξ_l   ξ_r   ξ_L   ξ_R   ξ_e   ξ_f
0     0     0     6     1.5   0     0

[Figure: the example game tree, as above.]
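A minimal unbiased estimator in the spirit of this slide, sampling only P2's action at the {c, d} information set (the slide's example also samples chance, which additionally requires folding the chance probabilities into A to keep E[ξ] = A y; we omit that here):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0,0,0],[0,0,0],[0,0,0],[0,-3,6],[1.5,0,0],[0,6,0],[0,0,-6.0]])

def sampled_gradient(A, y, rng):
    # Sample P2's action according to the current y and return xi = A z,
    # where z is the resulting deterministic sequence-form vector.
    # Since E[z] = y, linearity gives E[xi] = A y.
    c = rng.random() < y[1]               # choose c w.p. y_c, else d
    z = np.array([1.0, float(c), float(not c)])
    return A @ z

y = np.array([1.0, 0.25, 0.75])           # y_c = 1/4, y_d = 3/4
est = np.mean([sampled_gradient(A, y, rng) for _ in range(10000)], axis=0)
print(est, A @ y)                          # estimator mean approaches A y
```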
Experiments - game
Leduc hold’em.
◦ Simplified limit Texas hold'em game.
◦ Deck has k unique cards, with two copies of each card.
◦ Each player is dealt a single private card.
◦ One community card is dealt.
◦ A betting round occurs before and after the community card is dealt.
◦ Each betting round allows up to three raises.
Experiments