Learning Cooperative Games Maria-Florina Balcan, Ariel D. Procaccia and Yair Zick (to appear in IJCAI 2015)



Learning Cooperative Games
Maria-Florina Balcan, Ariel D. Procaccia and Yair Zick
(to appear in IJCAI 2015)
Cooperative Games
Players divide into coalitions to perform tasks.
Coalition members can freely divide profits.
How should profits be divided?
Cooperative Games
A set of players $N = \{1, \dots, n\}$
Characteristic function $v: 2^N \to \mathbb{R}_+$
• $v(S)$ – the value of a coalition $S$.
Imputation: a vector $\mathbf{x} \in \mathbb{R}^n$ satisfying
efficiency: $\sum_{i \in N} x_i = v(N)$,
and individual rationality: $x_i \ge v(\{i\})$.
Cooperative Games
A game $\mathcal{G} = \langle N, v \rangle$ is called simple if
$v(S) \in \{0, 1\}$ for all $S \subseteq N$.
$\mathcal{G}$ is monotone if for any $S \subseteq T \subseteq N$:
$v(S) \le v(T)$
The Core
An imputation $\mathbf{x}$ is in the core if
$x(S) = \sum_{i \in S} x_i \ge v(S)$ for all $S \subseteq N$
• Each subset of players gets at least what it can make on its own.
• A notion of stability: no coalition can gain by deviating.
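For intuition, a brute-force core-membership check might look like the following sketch (illustrative, not from the paper; `v` is assumed to be a callable on frozensets of players):

```python
from itertools import combinations

def in_core(x, v, n):
    """Brute-force core check: x is in the core of (N, v) iff
    sum(x) == v(N) and x(S) >= v(S) for every coalition S.
    Exponential in n -- for small games only."""
    players = range(n)
    if abs(sum(x) - v(frozenset(players))) > 1e-9:    # efficiency
        return False
    for r in range(1, n + 1):
        for S in combinations(players, r):
            if sum(x[i] for i in S) < v(frozenset(S)) - 1e-9:
                return False                           # S can deviate
    return True
```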
Learning Coalitional Values
I want the forest cleared of threats!
Learning Coalitional Values
I'll pay my men fairly to do it.
Learning Coalitional Values
But, what can they do?
Learning Coalitional Values
I know nothing!
Learning Coalitional Values
Let me observe what the scouting missions do.
[Figure: observed scouting missions with values 0, 100, 50, 150]
Learning Cooperative Games
We want to find a stable outcome, but the
valuation function is unknown.
Can we, using a small number of samples,
find a payoff division that is
likely to be stable?
PAC Learning
We are given $m$ samples from an (unknown) function $v: 2^N \to \mathbb{R}$:
$(S_1, v(S_1)), \dots, (S_m, v(S_m))$
Given these samples, find a function $v^*: 2^N \to \mathbb{R}$ that approximates $v$.
Need to make some structural assumptions on $v$ (e.g., $v$ is a linear classifier).
PAC Learning
Probably approximately correct: observing $m$ i.i.d. samples from a distribution $\mathcal{D}$,
with probability $1 - \delta$ (probably), output a function that is wrong on at most an $\varepsilon$-measure of sets sampled from $\mathcal{D}$ (approximately correct).
PAC Stability
Probably approximately stable: observing $m$ i.i.d. samples from a distribution $\mathcal{D}$,
with probability $1 - \delta$ (probably), output a payoff vector that is unstable against at most an $\varepsilon$-measure of sets sampled from $\mathcal{D}$ (approximately stable),
โ€ฆ or output that the core is empty.
Stability via Learnability
Theorem: let $v^*$ be an $(\varepsilon, \delta)$-PAC approximation of $v$;
if $\mathbf{x}^* \in \mathrm{core}(v^*)$ then w.p. $\ge 1 - \delta$,
$\Pr_{S \sim \mathcal{D}}[x^*(S) < v(S)] < \varepsilon$
Some caveats:
1. We still need to guarantee that $x^*(N) \le v(N)$ (we often can).
2. We need to handle cases where $\mathrm{core}(v^*) = \emptyset$ but $\mathrm{core}(v) \neq \emptyset$.
Stability via Learnability
So, if we can PAC learn a class $\mathcal{C}$, we can PAC stabilize $\mathcal{C}$.
Is there another way of achieving PAC stability?
For some classes of games, the core has a simple
structure.
Simple Games
PAC Stability in Simple Games
Simple games are generally hard to learn [Procaccia &
Rosenschein 2006].
But their core has a very simple structure.
Fact: the core of a simple game $\mathcal{G} = \langle N, v \rangle$ is not empty if and only if $\mathcal{G}$ has veto players (players who belong to every winning coalition),
in which case any division of payoffs among the veto players is in the core.
No need to learn the structure of the game, just identify
the veto players!
Simple Games
[Figure slides: a sequence of observed winning coalitions, each containing Sam]
PAC Stability in Simple Games
Only Sam appeared in all observed winning coalitions:
he is likely to be a veto player; pay him everything.
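A minimal sketch of this idea (illustrative only; the paper's procedure also handles the case where no candidate survives):

```python
def likely_veto_players(winning_coalitions):
    """Intersect all observed winning coalitions; the survivors are
    likely veto players, and splitting v(N) among them is PAC stable.
    An empty result suggests the core may be empty.
    Assumes at least one winning coalition was observed."""
    coalitions = iter(winning_coalitions)
    candidates = set(next(coalitions))
    for W in coalitions:
        candidates &= set(W)
    return candidates
```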
PAC Stability in Simple Games
Theorem: simple games are PAC stabilizable (though
they are not generally PAC learnable).
What about other classes of games?
We investigate both PAC learnability and PAC stability
of some common classes of cooperative games.
Network Flow Games
• We are given a weighted, directed graph.
[Figure: a weighted, directed graph with source s and sink t]
• Players are edges; the value of a coalition is the value of the max flow it can pass from s to t.
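For concreteness, the value function can be written down directly; a sketch assuming the networkx library is available (not part of the paper):

```python
import networkx as nx

def coalition_value(capacities, coalition, s, t):
    """Network flow game: a coalition of edges is worth the max s-t
    flow it can route on its own.  `capacities` maps (u, v) -> capacity;
    `coalition` is a set of (u, v) edges."""
    G = nx.DiGraph()
    for u, v in coalition:
        G.add_edge(u, v, capacity=capacities[(u, v)])
    if s not in G or t not in G:
        return 0.0
    return nx.maximum_flow_value(G, s, t)
```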
Network Flow Games
Theorem: network flow games are not
efficiently PAC learnable unless RP = NP.
Proof idea: we show that a similar class of
games (min-sum games) is not efficiently
learnable (the reduction from them to network
flows is easy).
Network Flow Games
Min-sum games: the class of $k$-min-sum games is the class of games defined by $k$ vectors $\mathbf{w}^1, \dots, \mathbf{w}^k \in \mathbb{R}^n$:
$f(S) = \min_{\ell = 1, \dots, k} \sum_{i \in S} w^\ell_i$
1-min-sum games: linear functions.
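In code, a $k$-min-sum game is just a pointwise minimum of $k$ additive games (illustrative sketch):

```python
def min_sum_value(weight_vectors, S):
    """k-min-sum game: the minimum over the k weight vectors of the
    coalition's additive value sum_{i in S} w[l][i]."""
    return min(sum(w[i] for i in S) for w in weight_vectors)
```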
Network Flow Games
Proof idea:
It is known that $k$-clause-CNF formulas (CNF formulas with $k$ clauses) are hard to learn if $k > 1$.
We reduce hardness for $k$-clause-CNF formulas to hardness for $(k+1)$-min-sum games:
1. Map the formula to a game, $\phi \to f_\phi$, turning the labeled assignments $(\mathbf{x}^1, \phi(\mathbf{x}^1)), \dots, (\mathbf{x}^m, \phi(\mathbf{x}^m))$ into samples of $f_\phi$.
2. Learn $f^*$ that PAC approximates $f_\phi$.
3. Construct a $k$-clause CNF $\phi^*$ from $f^*$.
4. Argue that $\phi^*$ PAC approximates $\phi$.
Network Flow Games
Network flow games are generally hard to learn.
But, if we limit ourselves to path queries, they
are easy to learn!
Theorem: the class of network flow games is
PAC learnable (and PAC stabilizable) when we
are limited to path queries.
Network Flow Games
[Figure: the example graph, shown over several animation frames as learned edge capacities are updated from path queries]
Proof idea:
Suppose we are given the input
$(p_1, \mathit{flow}(p_1)), \dots, (p_m, \mathit{flow}(p_m))$
Define for every $e \in E$:
$w^*_e = \max_{j : e \in p_j} \mathit{flow}(p_j)$
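A sketch of this learner, assuming $\mathit{flow}(p)$ is the bottleneck capacity of path $p$ (names are hypothetical):

```python
def learn_edge_weights(path_samples):
    """Path-query learner: set w*_e to the largest flow ever observed
    on a sampled path through e.  Each w*_e lower-bounds the true
    capacity, and the learned network reproduces every observed path
    flow exactly (the bottleneck edge of each path is pinned down)."""
    w_star = {}
    for path, flow in path_samples:
        for e in path:
            w_star[e] = max(w_star.get(e, 0.0), flow)
    return w_star
```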
Threshold Task Games
[Chalkiadakis et al., 2011]
Each agent has a weight $w_i$.
A finite set of tasks $\mathcal{T}$, each with a value $V(t)$ and a threshold $q(t)$.
A set $S \subseteq N$ can complete a task $t$ if $w(S) \ge q(t)$.
Value of a set: the most valuable task that it can complete.
Weighted voting games: a single task of value 1.
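A sketch of the TTG value function (illustrative; names are hypothetical):

```python
def ttg_value(weights, tasks, S):
    """Threshold task game: v(S) is the value of the most valuable task
    whose threshold the coalition's total weight meets.
    `tasks` is a list of (value, threshold) pairs."""
    w_S = sum(weights[i] for i in S)
    return max((value for value, threshold in tasks if w_S >= threshold),
               default=0.0)
```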
Threshold Task Games
Theorem: let $k$-TTG be the class of TTGs with $k$ tasks; then $k$-TTG is PAC learnable.
Proof idea:
1. Let $TTG_k(Q)$ be the class of TTGs with $k$ tasks whose values are known ($Q = \{V_1, \dots, V_k\}$). First show that $TTG_k(Q)$ is PAC learnable.
2. If after $m$ samples from a TTG $v$ we saw the value set $Q$, then w.p. $\ge 1 - \delta$, $\Pr_{S \sim \mathcal{D}}[v(S) \notin Q] < \varepsilon$.
3. Combining these observations: after enough samples we are likely to know the values in $Q$; we can then pretend that our input is from $TTG_k(Q)$ and learn a game for it. That game PAC approximates $v$.
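Step 2 is easy to operationalize; a minimal sketch:

```python
def observed_value_set(samples):
    """Step 2 in miniature: the distinct coalition values seen so far.
    After enough samples, a fresh coalition's value lies outside this
    set with probability < epsilon (w.p. >= 1 - delta)."""
    return sorted({value for _, value in samples})
```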
Additional Results
Induced Subgraph Games [Deng &
Papadimitriou, 1994]: PAC learnable, PAC
stabilizable if edge weights are non-negative.
[Figure: a graph with edge weights; in an induced subgraph game, $v(S)$ is the total weight of the edges with both endpoints in $S$]
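A sketch of the value function under the standard definition (illustrative):

```python
def induced_subgraph_value(edge_weights, S):
    """Induced subgraph game: v(S) is the total weight of edges with
    both endpoints in S.  `edge_weights` maps
    frozenset({u, v}) -> weight."""
    S = set(S)
    return sum(w for edge, w in edge_weights.items() if edge <= S)
```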
Additional Results
Coalitional Skill Games [Bachrach et al., 2008]:
generally hard to learn (but possible under some
structural assumptions).
$\mathcal{S}$ – a set of skills
$S_i \subseteq \mathcal{S}$: the skills of agent $i \in N$
$K_t \subseteq \mathcal{S}$: the skills required by task $t$
$T(S) = \{t : K_t \subseteq \bigcup_{i \in S} S_i\}$: the set of tasks that $S$ can complete.
$v(S)$ is a function of $T(S)$ (we look at several variants).
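A sketch of $T(S)$ (illustrative; skill labels are arbitrary hashables):

```python
def completable_tasks(agent_skills, task_requirements, S):
    """Coalitional skill game: T(S) is the set of tasks whose required
    skills are covered by the union of the coalition's skills.
    agent_skills[i] and task_requirements[t] are sets of skills."""
    pooled = set().union(*(agent_skills[i] for i in S)) if S else set()
    return {t for t, required in task_requirements.items()
            if required <= pooled}
```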
Additional Results
MC-nets [Ieong & Shoham, 2005]: learning MC-nets is hard (the disjoint-DNF problem).
A list of $k$ rules of the form
$x_i \wedge x_j \wedge \neg x_k \to v$
"if $S$ contains $i$ and $j$, but does not contain $k$, award it a value of $v$"
Value of $S$: the sum of its evaluations on the rules.
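A sketch of MC-net evaluation, with each rule encoded as a (positives, negatives, value) triple of literal sets (a hypothetical encoding):

```python
def mc_net_value(rules, S):
    """MC-net: sum the values of all rules S satisfies.  A rule
    (positives, negatives, value) fires when every positive literal is
    in S and no negative literal is."""
    S = set(S)
    return sum(value for positives, negatives, value in rules
               if positives <= S and not (negatives & S))
```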
Conclusions
Handling uncertainty in cooperative games is
important!
- Gateway to their applicability.
- Can we circumvent hardness of PAC learning and
directly obtain PAC stable outcomes (like we did in
simple games)?
- What about distributional assumptions?
Thank you!
Questions?
Additional Slides
Shattering Dimension and Learning
Given a class of functions $\mathcal{C}$ that take values in $\{0,1\}$, and a set $\mathcal{S} = \{S_1, \dots, S_m\}$ of $m$ sets, we say that $\mathcal{C}$ shatters $\mathcal{S}$ if for every vector $\mathbf{b} \in \{0,1\}^m$ there is some function $f_{\mathbf{b}} \in \mathcal{C}$ such that
$\forall j = 1, \dots, m: f_{\mathbf{b}}(S_j) = b_j$
Intuitively: $\mathcal{C}$ is complex enough to label the sets in $\mathcal{S}$ in any way possible.
$\mathrm{VCdim}(\mathcal{C}) = \max\{m \mid \mathcal{C} \text{ can shatter a set of size } m\}$
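For intuition, a brute-force shattering check over a finite class (illustrative only; exponential in $m$):

```python
def shatters(functions, sets):
    """C (a finite list of 0/1-valued callables) shatters `sets` iff
    every labeling in {0,1}^m is realized by some function in C."""
    realized = {tuple(f(S) for S in sets) for f in functions}
    return len(realized) == 2 ** len(sets)
```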
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$, and $\mathrm{VCdim}(\mathcal{C})$ to $(\varepsilon, \delta)$-learn a class of boolean functions $\mathcal{C}$.
Shattering Dimension and Learning
If $\mathcal{C}$ takes real values, we cannot use the VC dimension.
Given a set of sets $\mathcal{S} = \{S_1, \dots, S_m\}$ of size $m$, and a list of real values $\mathbf{r} = (r_1, \dots, r_m)$, we say that $\mathcal{C}$ shatters $(\mathcal{S}, \mathbf{r})$ if for every $\mathbf{b} \in \{0,1\}^m$ there exists some function $f_{\mathbf{b}} \in \mathcal{C}$ such that
$\forall j$ with $b_j = 0$: $f_{\mathbf{b}}(S_j) < r_j$
$\forall j$ with $b_j = 1$: $f_{\mathbf{b}}(S_j) \ge r_j$
The pseudo-dimension of $\mathcal{C}$:
$\mathrm{Pdim}(\mathcal{C}) = \max\{m \mid \mathcal{C} \text{ can shatter a tuple } (\mathcal{S}, \mathbf{r}) \text{ of size } m\}$
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$, and $\mathrm{Pdim}(\mathcal{C})$ to $(\varepsilon, \delta)$-learn a class of real-valued functions $\mathcal{C}$.
Reverse Engineering a Game
I have a (known) game $v: 2^N \to \mathbb{R}$.
I tell you that it belongs to some class $\mathcal{C}$:
- it's a $k$-vector WVG
- it's a network flow game
- it's a succinct MC-net
But I'm not telling you what the parameters are!
Can you recover them, using active/passive learning?