Learning Cooperative Games Maria-Florina Balcan, Ariel D. Procaccia and Yair Zick (to appear in IJCAI 2015)



Learning Cooperative Games
Maria-Florina Balcan, Ariel D. Procaccia and Yair Zick
(to appear in IJCAI 2015)
Cooperative Games
Players divide into coalitions to perform tasks.
Coalition members can freely divide profits.
How should profits be divided?
Cooperative Games
A set of players $N = \{1, \dots, n\}$
Characteristic function $v: 2^N \to \mathbb{R}_+$
• $v(S)$ – the value of a coalition $S$.
Imputation: a vector $\mathbf{x} \in \mathbb{R}^n$ satisfying
efficiency: $\sum_{i \in N} x_i = v(N)$,
and individual rationality: $x_i \ge v(\{i\})$.
Cooperative Games
A game $\mathcal{G} = \langle N, v \rangle$ is called simple if
$v(S) \in \{0, 1\}$ for all $S \subseteq N$.
$\mathcal{G}$ is monotone if for any $S \subseteq T \subseteq N$:
$v(S) \le v(T)$
The Core
An imputation $\mathbf{x}$ is in the core if
$x(S) = \sum_{i \in S} x_i \ge v(S)$ for all $S \subseteq N$
• Each subset of players gets at least what it can make on its own.
• A notion of stability: no coalition can gain by deviating.
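For intuition, a brute-force core-membership check might look like the following sketch (illustrative, not from the paper; `v` is assumed to be a callable on frozensets of players):

```python
from itertools import combinations

def in_core(x, v, n):
    """Brute-force core check: x is in the core of (N, v) iff
    sum(x) == v(N) and x(S) >= v(S) for every coalition S.
    Exponential in n -- for small games only."""
    players = range(n)
    if abs(sum(x) - v(frozenset(players))) > 1e-9:    # efficiency
        return False
    for r in range(1, n + 1):
        for S in combinations(players, r):
            if sum(x[i] for i in S) < v(frozenset(S)) - 1e-9:
                return False                           # S can deviate
    return True
```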
Learning Coalitional Values
I want the forest cleared of threats!
Learning Coalitional Values
I'll pay my men fairly to do it.
Learning Coalitional Values
But, what can they do?
Learning Coalitional Values
I know nothing!
Learning Coalitional Values
Let me observe what the scouting missions do.
[Figure: observed scouting missions with values 0, 100, 50, 150]
Learning Cooperative Games
We want to find a stable outcome, but the
valuation function is unknown.
Can we, using a small number of samples,
find a payoff division that is
likely to be stable?
PAC Learning
We are given $m$ samples from an (unknown) function $v: 2^N \to \mathbb{R}$:
$(S_1, v(S_1)), \dots, (S_m, v(S_m))$
Given these samples, find a function $v^*: 2^N \to \mathbb{R}$ that approximates $v$.
Need to make some structural assumptions on $v$ (e.g., $v$ is a linear classifier).
PAC Learning
Probably approximately correct: observing $m$ i.i.d. samples from a distribution $\mathcal{D}$,
with probability $1 - \delta$ (probably), output a function that is wrong on at most an $\varepsilon$-measure of sets sampled from $\mathcal{D}$ (approximately correct).
PAC Stability
Probably approximately stable: observing $m$ i.i.d. samples from a distribution $\mathcal{D}$,
with probability $1 - \delta$ (probably), output a payoff vector that is unstable against at most an $\varepsilon$-measure of sets sampled from $\mathcal{D}$ (approximately stable),
โ€ฆ or output that the core is empty.
Stability via Learnability
Theorem: let $v^*$ be an $(\varepsilon, \delta)$-PAC approximation of $v$;
if $\mathbf{x}^* \in \mathrm{core}(v^*)$ then w.p. $\ge 1 - \delta$,
$\Pr_{S \sim \mathcal{D}}[x^*(S) < v(S)] < \varepsilon$
Some caveats:
1. We still need to guarantee that $x^*(N) \le v(N)$ (we often can).
2. We need to handle cases where $\mathrm{core}(v^*) = \emptyset$ but $\mathrm{core}(v) \neq \emptyset$.
Stability via Learnability
So, if we can PAC learn a class $\mathcal{C}$, we can PAC stabilize $\mathcal{C}$.
Is there another way of achieving PAC stability?
For some classes of games, the core has a simple
structure.
Simple Games
PAC Stability in Simple Games
Simple games are generally hard to learn [Procaccia &
Rosenschein 2006].
But their core has a very simple structure.
Fact: the core of a simple game $\mathcal{G} = \langle N, v \rangle$ is not empty if and only if $\mathcal{G}$ has veto players (players who belong to every winning coalition),
in which case any division of payoffs among the veto players is in the core.
No need to learn the structure of the game, just identify
the veto players!
Simple Games
[Figure slides: a sequence of observed winning coalitions, each containing Sam]
PAC Stability in Simple Games
Only Sam appeared in all observed winning coalitions:
he is likely to be a veto player; pay him everything.
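A minimal sketch of this idea (illustrative only; the paper's procedure also handles the case where no candidate survives):

```python
def likely_veto_players(winning_coalitions):
    """Intersect all observed winning coalitions; the survivors are
    likely veto players, and splitting v(N) among them is PAC stable.
    An empty result suggests the core may be empty.
    Assumes at least one winning coalition was observed."""
    coalitions = iter(winning_coalitions)
    candidates = set(next(coalitions))
    for W in coalitions:
        candidates &= set(W)
    return candidates
```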
PAC Stability in Simple Games
Theorem: simple games are PAC stabilizable (though
they are not generally PAC learnable).
What about other classes of games?
We investigate both PAC learnability and PAC stability
of some common classes of cooperative games.
Network Flow Games
• We are given a weighted, directed graph.
[Figure: a weighted, directed graph with source s and sink t]
• Players are edges; the value of a coalition is the value of the max flow it can pass from s to t.
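For concreteness, the value function can be written down directly; a sketch assuming the networkx library is available (not part of the paper):

```python
import networkx as nx

def coalition_value(capacities, coalition, s, t):
    """Network flow game: a coalition of edges is worth the max s-t
    flow it can route on its own.  `capacities` maps (u, v) -> capacity;
    `coalition` is a set of (u, v) edges."""
    G = nx.DiGraph()
    for u, v in coalition:
        G.add_edge(u, v, capacity=capacities[(u, v)])
    if s not in G or t not in G:
        return 0.0
    return nx.maximum_flow_value(G, s, t)
```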
Network Flow Games
Theorem: network flow games are not
efficiently PAC learnable unless RP = NP.
Proof idea: we show that a similar class of
games (min-sum games) is not efficiently
learnable (the reduction from them to network
flows is easy).
Network Flow Games
Min-sum games: the class of $k$-min-sum games is the class of games defined by $k$ vectors $\mathbf{w}^1, \dots, \mathbf{w}^k \in \mathbb{R}^n$:
$f(S) = \min_{\ell = 1, \dots, k} \sum_{i \in S} w^\ell_i$
1-min-sum games: linear functions.
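In code, a $k$-min-sum game is just a pointwise minimum of $k$ additive games (illustrative sketch):

```python
def min_sum_value(weight_vectors, S):
    """k-min-sum game: the minimum over the k weight vectors of the
    coalition's additive value sum_{i in S} w[l][i]."""
    return min(sum(w[i] for i in S) for w in weight_vectors)
```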
Network Flow Games
Proof idea:
It is known that $k$-clause-CNF formulas (CNF formulas with $k$ clauses) are hard to learn if $k > 1$.
We reduce hardness for $k$-clause-CNF formulas to hardness for $(k+1)$-min-sum games:
1. Map the formula to a game, $\phi \to f_\phi$, turning the labeled assignments $(\mathbf{x}^1, \phi(\mathbf{x}^1)), \dots, (\mathbf{x}^m, \phi(\mathbf{x}^m))$ into samples of $f_\phi$.
2. Learn $f^*$ that PAC approximates $f_\phi$.
3. Construct a $k$-clause CNF $\phi^*$ from $f^*$.
4. Argue that $\phi^*$ PAC approximates $\phi$.
Network Flow Games
Network flow games are generally hard to learn.
But, if we limit ourselves to path queries, they
are easy to learn!
Theorem: the class of network flow games is
PAC learnable (and PAC stabilizable) when we
are limited to path queries.
Network Flow Games
[Figure: the example graph, shown over several animation frames as learned edge capacities are updated from path queries]
Proof idea:
Suppose we are given the input
$(p_1, \mathit{flow}(p_1)), \dots, (p_m, \mathit{flow}(p_m))$
Define for every $e \in E$:
$w^*_e = \max_{j : e \in p_j} \mathit{flow}(p_j)$
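A sketch of this learner, assuming $\mathit{flow}(p)$ is the bottleneck capacity of path $p$ (names are hypothetical):

```python
def learn_edge_weights(path_samples):
    """Path-query learner: set w*_e to the largest flow ever observed
    on a sampled path through e.  Each w*_e lower-bounds the true
    capacity, and the learned network reproduces every observed path
    flow exactly (the bottleneck edge of each path is pinned down)."""
    w_star = {}
    for path, flow in path_samples:
        for e in path:
            w_star[e] = max(w_star.get(e, 0.0), flow)
    return w_star
```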
Threshold Task Games
[Chalkiadakis et al., 2011]
Each agent has a weight $w_i$.
A finite set of tasks $\mathcal{T}$, each with a value $V(t)$ and a threshold $q(t)$.
A set $S \subseteq N$ can complete a task $t$ if $w(S) \ge q(t)$.
Value of a set: the most valuable task that it can complete.
Weighted voting games: a single task of value 1.
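A sketch of the TTG value function (illustrative; names are hypothetical):

```python
def ttg_value(weights, tasks, S):
    """Threshold task game: v(S) is the value of the most valuable task
    whose threshold the coalition's total weight meets.
    `tasks` is a list of (value, threshold) pairs."""
    w_S = sum(weights[i] for i in S)
    return max((value for value, threshold in tasks if w_S >= threshold),
               default=0.0)
```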
Threshold Task Games
Theorem: let $k$-TTG be the class of TTGs with $k$ tasks; then $k$-TTG is PAC learnable.
Proof idea:
1. Let $TTG_k(Q)$ be the class of TTGs with $k$ tasks whose values are known ($Q = \{V_1, \dots, V_k\}$). First show that $TTG_k(Q)$ is PAC learnable.
2. If after $m$ samples from a TTG $v$ we saw the value set $Q$, then w.p. $\ge 1 - \delta$, $\Pr_{S \sim \mathcal{D}}[v(S) \notin Q] < \varepsilon$.
3. Combining these observations: after enough samples we are likely to know the values in $Q$; we can then pretend that our input is from $TTG_k(Q)$ and learn a game for it. That game PAC approximates $v$.
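Step 2 is easy to operationalize; a minimal sketch:

```python
def observed_value_set(samples):
    """Step 2 in miniature: the distinct coalition values seen so far.
    After enough samples, a fresh coalition's value lies outside this
    set with probability < epsilon (w.p. >= 1 - delta)."""
    return sorted({value for _, value in samples})
```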
Additional Results
Induced Subgraph Games [Deng &
Papadimitriou, 1994]: PAC learnable, PAC
stabilizable if edge weights are non-negative.
[Figure: a graph with edge weights; in an induced subgraph game, $v(S)$ is the total weight of the edges with both endpoints in $S$]
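A sketch of the value function under the standard definition (illustrative):

```python
def induced_subgraph_value(edge_weights, S):
    """Induced subgraph game: v(S) is the total weight of edges with
    both endpoints in S.  `edge_weights` maps
    frozenset({u, v}) -> weight."""
    S = set(S)
    return sum(w for edge, w in edge_weights.items() if edge <= S)
```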
Additional Results
Coalitional Skill Games [Bachrach et al., 2008]:
generally hard to learn (but possible under some
structural assumptions).
$\mathcal{S}$ – a set of skills
$S_i \subseteq \mathcal{S}$: the skills of agent $i \in N$
$K_t \subseteq \mathcal{S}$: the skills required by task $t$
$T(S) = \{t : K_t \subseteq \bigcup_{i \in S} S_i\}$: the set of tasks that $S$ can complete.
$v(S)$ is a function of $T(S)$ (we look at several variants).
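A sketch of $T(S)$ (illustrative; skill labels are arbitrary hashables):

```python
def completable_tasks(agent_skills, task_requirements, S):
    """Coalitional skill game: T(S) is the set of tasks whose required
    skills are covered by the union of the coalition's skills.
    agent_skills[i] and task_requirements[t] are sets of skills."""
    pooled = set().union(*(agent_skills[i] for i in S)) if S else set()
    return {t for t, required in task_requirements.items()
            if required <= pooled}
```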
Additional Results
MC-nets [Ieong & Shoham, 2005]: learning MC-nets is hard (the disjoint-DNF problem).
A list of $k$ rules of the form
$x_i \wedge x_j \wedge \neg x_k \to v$
"if $S$ contains $i$ and $j$, but does not contain $k$, award it a value of $v$"
Value of $S$: the sum of its evaluations on the rules.
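A sketch of MC-net evaluation, with each rule encoded as a (positives, negatives, value) triple of literal sets (a hypothetical encoding):

```python
def mc_net_value(rules, S):
    """MC-net: sum the values of all rules S satisfies.  A rule
    (positives, negatives, value) fires when every positive literal is
    in S and no negative literal is."""
    S = set(S)
    return sum(value for positives, negatives, value in rules
               if positives <= S and not (negatives & S))
```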
Conclusions
Handling uncertainty in cooperative games is
important!
- Gateway to their applicability.
- Can we circumvent hardness of PAC learning and
directly obtain PAC stable outcomes (like we did in
simple games)?
- What about distributional assumptions?
Thank you!
Questions?
Additional Slides
Shattering Dimension and Learning
Given a class of functions $\mathcal{C}$ that take values in $\{0,1\}$, and a set $\mathcal{S} = \{S_1, \dots, S_m\}$ of $m$ sets, we say that $\mathcal{C}$ shatters $\mathcal{S}$ if for every vector $\mathbf{b} \in \{0,1\}^m$ there is some function $f_{\mathbf{b}} \in \mathcal{C}$ such that
$\forall j = 1, \dots, m: f_{\mathbf{b}}(S_j) = b_j$
Intuitively: $\mathcal{C}$ is complex enough to label the sets in $\mathcal{S}$ in any way possible.
$\mathrm{VCdim}(\mathcal{C}) = \max\{m \mid \mathcal{C} \text{ can shatter a set of size } m\}$
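For intuition, a brute-force shattering check over a finite class (illustrative only; exponential in $m$):

```python
def shatters(functions, sets):
    """C (a finite list of 0/1-valued callables) shatters `sets` iff
    every labeling in {0,1}^m is realized by some function in C."""
    realized = {tuple(f(S) for S in sets) for f in functions}
    return len(realized) == 2 ** len(sets)
```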
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$, and $\mathrm{VCdim}(\mathcal{C})$ to $(\varepsilon, \delta)$-learn a class of boolean functions $\mathcal{C}$.
Shattering Dimension and Learning
If $\mathcal{C}$ takes real values, we cannot use the VC dimension.
Given a set of sets $\mathcal{S} = \{S_1, \dots, S_m\}$ of size $m$, and a list of real values $\mathbf{r} = (r_1, \dots, r_m)$, we say that $\mathcal{C}$ shatters $(\mathcal{S}, \mathbf{r})$ if for every $\mathbf{b} \in \{0,1\}^m$ there exists some function $f_{\mathbf{b}} \in \mathcal{C}$ such that
$\forall j$ with $b_j = 0$: $f_{\mathbf{b}}(S_j) < r_j$
$\forall j$ with $b_j = 1$: $f_{\mathbf{b}}(S_j) \ge r_j$
The pseudo-dimension of $\mathcal{C}$:
$\mathrm{Pdim}(\mathcal{C}) = \max\{m \mid \mathcal{C} \text{ can shatter a tuple } (\mathcal{S}, \mathbf{r}) \text{ of size } m\}$
Shattering Dimension and Learning
Claim: we only need a number of samples polynomial in $\frac{1}{\varepsilon}$, $\log\frac{1}{\delta}$, and $\mathrm{Pdim}(\mathcal{C})$ to $(\varepsilon, \delta)$-learn a class of real-valued functions $\mathcal{C}$.
Reverse Engineering a Game
I have a (known) game $v: 2^N \to \mathbb{R}$.
I tell you that it belongs to some class $\mathcal{C}$:
- it's a $k$-vector WVG
- it's a network flow game
- it's a succinct MC-net
But I'm not telling you what the parameters are!
Can you recover them, using active/passive learning?