Learning Cooperative Games Maria-Florina Balcan, Ariel D. Procaccia and Yair Zick (to appear in IJCAI 2015)
Cooperative Games
Players divide into
coalitions to perform tasks
Coalition members can
freely divide profits.
How should profits be divided?
Cooperative Games
A set of players - N = {1, …, n}
Characteristic function - v : 2^N → ℝ₊
• v(S) – value of a coalition S.
Imputation: a vector x ∈ ℝⁿ satisfying
efficiency: Σ_{i∈N} x_i = v(N)
and individual rationality: x_i ≥ v({i})
Cooperative Games
A game G = ⟨N, v⟩ is called simple if
v(S) ∈ {0, 1}
G is monotone if for any S ⊆ T ⊆ N:
v(S) ≤ v(T)
The Core
An imputation x is in the core if
x(S) = Σ_{i∈S} x_i ≥ v(S), ∀S ⊆ N
• Each subset of players is getting at least what it can make on its own.
• A notion of stability; no coalition can profitably deviate.
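Since the core condition above is a finite set of inequalities, it can be checked by brute force. A minimal sketch, not from the talk — the function name `is_in_core`, the dict payoff representation, and the numerical tolerance are our own choices, and the enumeration is exponential in n, so this is only for tiny games:

```python
from itertools import combinations

def is_in_core(x, v, n):
    """Check whether payoff vector x is in the core of the game (N, v).

    x: dict mapping player -> payoff; v: a function from frozenset of
    players to a value; n: number of players.
    """
    players = range(1, n + 1)
    grand = frozenset(players)
    # Efficiency: the payoffs must sum to v(N).
    if abs(sum(x[i] for i in players) - v(grand)) > 1e-9:
        return False
    # Coalitional rationality: every coalition gets at least its own value.
    for r in range(1, n + 1):
        for S in combinations(players, r):
            if sum(x[i] for i in S) < v(frozenset(S)) - 1e-9:
                return False
    return True
```

For example, in the three-player majority game (v(S) = 1 iff |S| ≥ 2) the equal split fails the check for every two-player coalition, reflecting that game's empty core.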
Learning Coalitional Values
I want the
forest cleared
of threats!
Learning Coalitional Values
Iโll pay my men
fairly to do it.
Learning Coalitional Values
But, what can
they do?
Learning Coalitional Values
I know
nothing!
Learning Coalitional Values
Let me observe
what the scouting
missions do
(figure: observed scouting-mission values of 0, 100, 50 and 150)
Learning Cooperative Games
We want to find a stable outcome, but the
valuation function is unknown.
Can we, using a small number of samples,
find a payoff division that is
likely to be stable?
PAC Learning
We are given m samples from an (unknown) function v : 2^N → ℝ
(S₁, v(S₁)), …, (S_m, v(S_m))
Given these samples, find a function v* : 2^N → ℝ that approximates v.
Need to make some structural assumptions on v (e.g. v is a linear classifier)
PAC Learning
Probably approximately correct: observing m i.i.d. samples from a distribution D,
with probability 1 − δ (probably), I am going to output a function that is wrong on at most a measure ε of sets sampled from D (approximately correct).
PAC Stability
Probably approximately stable: observing m i.i.d. samples from a distribution D,
with probability 1 − δ (probably), output a payoff vector that is unstable against at most a measure ε of sets sampled from D (approximately stable),
… or output that the core is empty.
Stability via Learnability
Theorem: let v* be an (ε, δ)-PAC approximation of v; if x* ∈ core(v*) then w.p. ≥ 1 − δ,
Pr_{S∼D}[x*(S) < v(S)] < ε
Some caveats:
1. Need to still guarantee that x*(N) ≤ v(N) (we often can)
2. Need to handle cases where core(v*) = ∅ but core(v) ≠ ∅.
Stability via Learnability
So, if we can PAC learn a game, we can PAC stabilize it.
Is there another way of achieving PAC stability?
For some classes of games, the core has a simple
structure.
Simple Games
PAC Stability in Simple Games
Simple games are generally hard to learn [Procaccia &
Rosenschein 2006].
But, their core has a very simple structure.
Fact: the core of a simple game G = ⟨N, v⟩ is not empty if and only if G has veto players,
in which case any division of payoffs among the veto players is in the core.
No need to learn the structure of the game, just identify
the veto players!
PAC Stability in Simple Games
Only Sam appeared in all observed winning coalitions:
he is likely to be a veto player; pay him everything.
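The heuristic on this slide — a veto player must belong to every winning coalition, so intersect the observed winning coalitions — can be sketched as follows. The helper names and the even split among surviving candidates are our own choices for illustration:

```python
def candidate_veto_players(samples, n):
    """Given samples [(coalition, value)] from a simple game, return the
    players appearing in every observed winning coalition -- the
    candidate veto players."""
    candidates = set(range(1, n + 1))
    for S, val in samples:
        if val == 1:  # a winning coalition must contain every veto player
            candidates &= set(S)
    return candidates

def pac_stable_payoff(samples, n):
    """Split the grand coalition's value evenly among the candidate veto
    players; report a (likely) empty core if no candidate survives."""
    vetoes = candidate_veto_players(samples, n)
    if not vetoes:
        return None  # no player survived: the core is likely empty
    return {i: (1 / len(vetoes) if i in vetoes else 0)
            for i in range(1, n + 1)}
```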
PAC Stability in Simple Games
Theorem: simple games are PAC stabilizable (though
they are not generally PAC learnable).
What about other classes of games?
We investigate both PAC learnability and PAC stability
of some common classes of cooperative games.
Network Flow Games
โข We are given a weighted, directed graph
(figure: a weighted, directed graph with source s, sink t, and edge capacities)
โข Players are edges; value of a coalition is the
value of the max. flow it can pass from s to t.
Network Flow Games
Theorem: network flow games are not
efficiently PAC learnable unless RP = NP.
Proof idea: we show that a similar class of
games (min-sum games) is not efficiently
learnable (the reduction from them to network
flows is easy).
Network Flow Games
Min-sum games: the class of k-min-sum games is the class of games defined by k vectors
w¹, …, wᵏ ∈ ℝⁿ
v(S) = min_{ℓ=1,…,k} Σ_{i∈S} w_i^ℓ
1-min-sum games: linear functions.
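Evaluating a k-min-sum game from the definition above is immediate; a minimal sketch, with weight vectors represented as player→weight dicts (our own representation):

```python
def min_sum_value(S, weight_vectors):
    """Value of coalition S in a k-min-sum game: the minimum, over the k
    weight vectors, of the coalition's total weight under that vector.

    weight_vectors: list of dicts mapping player -> weight."""
    return min(sum(w[i] for i in S) for w in weight_vectors)
```

With a single weight vector (k = 1) this reduces to a linear (additive) game, matching the last line of the slide.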
Network Flow Games
Proof Idea:
It is known that k-clause-CNF formulas (CNF formulas with k clauses) are hard to learn if k > 1.
We reduce hardness for k-CNF formulas to hardness for (k + 1)-min-sum:
• given samples (x₁, c(x₁)), …, (x_m, c(x_m)) from a CNF c, map each assignment x_j to a coalition S_j;
• learn v* that PAC approximates the game v_c;
• construct a k-clause CNF c* from v*;
• argue that c* PAC approximates c.
Network Flow Games
Network flow games are generally hard to learn.
But, if we limit ourselves to path queries, they
are easy to learn!
Theorem: the class of network flow games is
PAC learnable (and PAC stabilizable) when we
are limited to path queries.
Network Flow Games
(figure: the weighted directed graph from before, annotated with observed path flows)
Proof idea:
Suppose we are given the input
(P₁, flow(P₁)), …, (P_m, flow(P_m))
Define for every e ∈ E
w*_e = max_{j : e ∈ P_j} flow(P_j)
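The w*_e rule from the proof idea is a one-pass maximum over the observed path flows. A minimal sketch, representing paths as tuples of edges (our own encoding); since the flow of a single path is the minimum capacity along it, each observed flow is a valid lower bound on every edge capacity on that path, so taking maxima never overestimates:

```python
def learn_edge_capacities(path_samples):
    """Reconstruct lower-bound edge capacities from path queries.

    path_samples: list of (path, flow) pairs, where path is a tuple of
    edges and flow is the max flow that path alone can carry.
    For each edge e, set w*_e to the maximum observed flow over all
    sampled paths containing e; unobserved edges default to 0.
    """
    w_star = {}
    for path, flow in path_samples:
        for e in path:
            w_star[e] = max(w_star.get(e, 0), flow)
    return w_star
```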
Threshold Task Games
[Chalkiadakis et al., 2011]
Each agent has a weight w_i
A finite set of tasks 𝒯; each task t with a value V(t) and a threshold q(t).
A set S ⊆ N can complete a task t if w(S) ≥ q(t).
Value of a set: most valuable task that it can complete.
Weighted voting games: single task of value 1.
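A coalition's value in a TTG follows directly from the definition above; a minimal sketch, encoding each task as a (value, threshold) pair (our own representation):

```python
def ttg_value(S, weights, tasks):
    """Value of coalition S in a threshold task game: the value of the
    most valuable task whose threshold the coalition's total weight
    meets, or 0 if it can complete no task.

    weights: dict agent -> weight; tasks: list of (value, threshold)."""
    total = sum(weights[i] for i in S)
    doable = [value for value, threshold in tasks if total >= threshold]
    return max(doable, default=0)
```

A weighted voting game is the special case `tasks = [(1, q)]` for a quota q.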
Threshold Task Games
Theorem: let k-TTG be the class of TTGs with k tasks; then k-TTG is PAC learnable.
Proof Idea:
1. TTG_V(k): class of TTGs with k tasks whose values are known (V = V₁, …, V_k). First show that TTG_V(k) is PAC learnable.
2. If after m samples from a TTG v we saw the value set V, then w.p. ≥ 1 − δ, Pr[v(S) ∉ V] < ε.
3. Combining these observations, we know that after enough samples we are likely to know the values of v; we can then pretend that our input is from TTG_V(k), and learn a game for it. That game PAC approximates v.
Additional Results
Induced Subgraph Games [Deng &
Papadimitriou, 1994]: PAC learnable, PAC
stabilizable if edge weights are non-negative.
(figure: a weighted graph on the players; a coalition's value is the total weight of the edges between its members)
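In an induced subgraph game a coalition's value is the total weight of the edges among its members; a minimal sketch, with edges keyed by frozensets of endpoints (our own encoding):

```python
def induced_subgraph_value(S, edge_weights):
    """v(S): total weight of edges with both endpoints in S, i.e. the
    weight of the subgraph induced by the coalition S.

    edge_weights: dict mapping frozenset({u, v}) -> weight."""
    return sum(w for e, w in edge_weights.items() if e <= S)
```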
Additional Results
Coalitional Skill Games [Bachrach et al., 2008]:
generally hard to learn (but possible under some
structural assumptions).
𝒮 – a set of skills
S_i ⊆ 𝒮: the skills of agent i ∈ N
K_t ⊆ 𝒮: the skills required by task t
T(C) = {t : K_t ⊆ ∪_{i∈C} S_i}: the set of tasks that C can complete.
v(C) is a function of T(C) (we look at several variants).
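Computing T(C) from the definition above is a straightforward coverage check; a minimal sketch (the dict representations are our own):

```python
def completable_tasks(C, agent_skills, task_skills):
    """T(C): tasks whose required skills are covered by the union of the
    coalition's skills.

    agent_skills: dict agent -> set of skills;
    task_skills: dict task -> set of required skills."""
    # Union of the skills held by the coalition's members.
    covered = set().union(*(agent_skills[i] for i in C))
    return {t for t, K in task_skills.items() if K <= covered}
```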
Additional Results
MC-nets [Ieong & Shoham, 2005]: learning MC-nets is hard (disjoint DNF problem).
A list of m rules of the form
x_i ∧ x_j ∧ ¬x_k → v
"if S contains i and j, but does not contain k, award it a value of v"
Value of S: sum of its evaluations on rules.
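Evaluating an MC-net on a coalition follows the quoted rule semantics; a minimal sketch, representing each rule as (positive literals, negated literals, value) — an encoding of our own:

```python
def mc_net_value(S, rules):
    """Value of coalition S under an MC-net: the sum of the values of
    all rules that apply, i.e. rules whose positive literals are all in
    S and whose negated literals are all absent from S.

    rules: list of (positives, negatives, value), with positives and
    negatives given as sets of players."""
    return sum(v for pos, neg, v in rules
               if pos <= S and not (neg & S))
```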
Conclusions
Handling uncertainty in cooperative games is
important!
- Gateway to their applicability.
- Can we circumvent hardness of PAC learning and
directly obtain PAC stable outcomes (like we did in
simple games)?
- What about distributional assumptions?
Thank you!
Questions?
Additional Slides
Shattering Dimension and Learning
Given a class of functions 𝒞 that take values in {0, 1}, and a set 𝒮 = S₁, …, S_m of m sets, we say that 𝒞 shatters 𝒮 if for every vector b ∈ {0, 1}^m, there is some function f_b ∈ 𝒞 such that
∀j = 1, …, m: f_b(S_j) = b_j
Intuitively: 𝒞 is complex enough in order to label the sets in 𝒮 in any way possible.
VCdim(𝒞) = max{m | 𝒞 can shatter a set of size m}
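For small, explicitly given classes the shattering condition can be checked exhaustively; a minimal brute-force sketch (the helper name is our own):

```python
from itertools import product

def shatters(functions, sets):
    """Check that a class of {0,1}-valued functions shatters a list of
    sets: every labeling b in {0,1}^m must be realized by some function
    in the class."""
    # All labelings of the sets actually achieved by the class.
    labelings = {tuple(f(S) for S in sets) for f in functions}
    return all(b in labelings for b in product((0, 1), repeat=len(sets)))
```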
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in 1/ε, log(1/δ) and VCdim(𝒞) to (ε, δ)-learn a class of boolean functions 𝒞.
Shattering Dimension and Learning
If 𝒞 takes real values, we cannot use the VC dimension.
Given a set of sets 𝒮 = S₁, …, S_m of size m, and a list of real values 𝒓 = r₁, …, r_m, we say that 𝒞 shatters (𝒮, 𝒓) if for every b ∈ {0, 1}^m there exists some function f_b ∈ 𝒞 such that
∀j such that b_j = 0: f_b(S_j) < r_j
∀j such that b_j = 1: f_b(S_j) ≥ r_j
The pseudo-dimension of 𝒞:
Pdim(𝒞) = max{m | 𝒞 can shatter a tuple (𝒮, 𝒓) of size m}
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in 1/ε, log(1/δ) and Pdim(𝒞) to (ε, δ)-learn a class of real-valued functions 𝒞.
Reverse Engineering a Game
I have a (known) game v : 2^N → ℝ
I tell you that it belongs to some class 𝒞:
- it's a k-vector WVG
- it's a network flow game
- it's a succinct MC-net
But I'm not telling you what the parameters are!
Can you recover them? Using active/passive learning?