
Structured Models for
Multi-Agent Interactions
Daphne Koller
Stanford University
Joint work with Brian Milch, U.C. Berkeley
Scaling Up
• Question:
– Modeling and solving small games is already hard
– How can we scale up to larger ones?
• Answer:
– Real-world situations have a lot of structure
– Otherwise people wouldn’t be able to handle them
• Goal: construct
– languages based on structured representations,
allowing compact models of complex situations
– algorithms that exploit this structure to support
effective reasoning
Representations of Games
[Diagram: normal-form payoff matrix indexed by strategies of player I and strategies of player II]
• Normal form
– basic units: strategies
– game representation loses all structure
– matrix size exponentially larger than game tree
• Extensive form
– basic units: events
– game structure explicitly encodes time, information
– game tree size can still be very large
Representation & Inference
Minimax linear program for two-player zero-sum games
[Romanovskii, 1962; Koller, Megiddo & von Stengel, 1994]
Applied to abstract 2-player poker [Koller + Pfeffer]
[Chart: solution time (sec) vs. size of tree (up to 20000 nodes), comparing the normal form and the sequence form]
MAID Representation
[MAID diagram for a sales example: Resource Allocation, A Sales Strategy, B Sales Strategy, Sales-A, Sales-B, Commission Cost, Commission, Revenue]
• MAID form
– basic units: variables & dependencies between them
– game structure explicitly encodes time, information, independence
– can be exponentially smaller than game tree
– game structure supports new forms of decomposition & backward induction
• solving can be exponentially more efficient than extensive form
Outline
• Probabilistic Reasoning: Bayesian networks [Pearl, Jensen, …]
• Influence Diagrams
• Strategic Relevance
• Exploiting Structure for Solving Games
Probability Distributions
• Probabilistic model (e.g., a la Savage):
– set of possible states in which the world can be;
– probability distribution over this space.
• State: assignment of values to variables
– diseases, symptoms, predisposing factors, …
• Problem:
– n variables ⇒ 2^n states (or more);
– representing the joint distribution is infeasible.
Bayesian Network
[Diagram: Burglary, Earthquake → Alarm; Alarm → PhoneCall; Earthquake → Newscast]
CPD for Alarm, a function Val(B,E) → Δ(Val(A)):
  B  E  | P(a | B,E)  P(¬a | B,E)
  b  e  |    0.8         0.2
  b ¬e  |    0.6         0.4
 ¬b  e  |    0.2         0.8
 ¬b ¬e  |    0.01        0.99
nodes = random variables
edges = direct probabilistic influence
Network structure encodes conditional independencies:
PhoneCall is independent of Burglary given Alarm
BN Semantics: Probability Model
[Diagram: the BN over B, E, A, C, N shown above]
qualitative BN structure + local probability models = full joint distribution over domain
P(b, e, a, c, n) = P(b) P(e) P(a | b, e) P(c | a) P(n | e)   (a numeric sketch follows below)
• Compact & natural representation:
– nodes have ≤ k parents ⇒ ≤ 2^k·n vs. 2^n parameters
– parameters natural and easy to elicit.
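To make the chain-rule factorization concrete, here is a minimal Python sketch (not from the talk). The Alarm CPT follows the table reconstructed above; the priors for Burglary and Earthquake and the CPDs for PhoneCall and Newscast are illustrative placeholders.

```python
# Minimal sketch of the BN chain rule for the alarm network.
# The Alarm CPT follows the table above; all other numbers are
# illustrative placeholders, not values from the talk.
from itertools import product

P_B = {True: 0.01, False: 0.99}   # P(Burglary)       (illustrative)
P_E = {True: 0.02, False: 0.98}   # P(Earthquake)     (illustrative)
P_A = {                           # P(Alarm=true | B, E), from the CPT above
    (True, True): 0.8, (True, False): 0.6,
    (False, True): 0.2, (False, False): 0.01,
}
P_C = {True: 0.7, False: 0.05}    # P(Call=true | Alarm)   (illustrative)
P_N = {True: 0.9, False: 0.001}   # P(Newscast=true | E)   (illustrative)

def joint(b, e, a, c, n):
    """P(b, e, a, c, n) = P(b) P(e) P(a | b, e) P(c | a) P(n | e)."""
    p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p_c = P_C[a] if c else 1.0 - P_C[a]
    p_n = P_N[e] if n else 1.0 - P_N[e]
    return P_B[b] * P_E[e] * p_a * p_c * p_n

if __name__ == "__main__":
    # 12 stored numbers define all 2^5 = 32 joint entries, which sum to 1.
    total = sum(joint(*vals) for vals in product([True, False], repeat=5))
    print(joint(True, False, True, True, False), round(total, 10))
```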
BN Semantics: Independencies
• The graph structure of the BN implies a set of conditional independence assumptions
– satisfied by every distribution over this graph
[Diagram: the same BN over B, E, A, C, N]
Burglary and Earthquake independent
Burglary and Call independent given Alarm
Newscast and Alarm independent given Earthquake
BN Semantics: Dependencies
• BN structure also specifies potential dependencies
– those that might hold for some distribution over the graph
[Diagram: the same BN over B, E, A, C, N]
• Burglary and Earthquake dependent given Alarm
Active paths
[Diagram: example network over A, B, C, D illustrating active paths]
• B, C can be dependent
• B, C are independent given A
• B, C can be dependent given A, D
• Probabilistic influence "flows" along "active" paths
• "d-separation" if there is no active path
Simple linear-time algorithm for testing conditional independence using only graphical structure (a structural test is sketched below):
• Sound: d-separation ⇒ independence for all P
• Complete: no d-separation ⇒ dependence for almost all P
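As a concrete illustration of a purely structural independence test, here is a minimal Python sketch using the standard moralized-ancestral-graph construction (a simpler, quadratic method, not the linear-time procedure referred to above); the `parents`-dict graph encoding is an assumption for illustration.

```python
# Minimal d-separation check via the moralized ancestral graph
# (a standard quadratic construction, not the linear-time algorithm
# mentioned above).  The DAG is a dict mapping each node to its parents.
from collections import deque

def ancestors(parents, nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, X, Y, Z):
    """True iff X is d-separated from Y given Z in the DAG `parents`."""
    keep = ancestors(parents, set(X) | set(Y) | set(Z))
    # Moralize the ancestral subgraph: connect each node to its parents
    # and "marry" every pair of parents of a common child.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, []) if p in keep]
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])
    # Remove the conditioning nodes, then test undirected reachability.
    blocked = set(Z)
    seen = set(X) - blocked
    frontier = deque(seen)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False            # an active path exists
        for w in adj[v]:
            if w not in blocked and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

if __name__ == "__main__":
    # Alarm network: B, E -> A; A -> C; E -> N.
    g = {"B": [], "E": [], "A": ["B", "E"], "C": ["A"], "N": ["E"]}
    print(d_separated(g, {"C"}, {"B"}, {"A"}))  # True: C indep. of B given A
    print(d_separated(g, {"B"}, {"E"}, {"A"}))  # False: B, E dependent given A
```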
CPCS network: ≈ 2^1000 states
Bayesian Networks
• Explicit representation of domain structure
• Cognitively intuitive compact models of complex
domains
• Same model allows relevant probabilities to be
computed in any evidence state
• Algorithms that exploit structure for effective
inference even in very large models
Outline
• Probabilistic Reasoning: Bayesian networks
• Influence Diagrams [Howard, Shachter, Jensen, …]
• Strategic Relevance
• Exploiting Structure for Solving Games
Example: The Tree Killer
• Alice wants a patio, but the benefit outweighs the
cost only if she gets an ocean view
• Bob’s tree blocks her view
• Alice chooses whether to poison the tree
• Tree may become sick
• Bob chooses whether to call a tree doctor
– Alice can see whether tree doctor comes
• Alice chooses whether to build her patio
• Tree may die when winter comes
Standard Representation:
Game Tree
Poison Tree?
Tree Sick?
Call Tree Doctor?
Build Patio?
Tree Dead?
5 levels; 2^5 = 32 terminal nodes
Multi-Agent Influence Diagrams (MAIDs)
Influence diagram representation easily extended to multiple agents
[MAID for the "tree killer" example: Spike Tree, TreeSick, Tree Doctor, TreeDead, Build Patio, Cost, Tree, View]
Decision Nodes
• Incoming edges are information edges
– variables whose values the agent knows when deciding
– agent's strategy can depend on values of parents
• Each parent instantiation u ∈ Val(Parents(D)) is an information set
• Perfect recall: if D1 precedes D2
– at D2 the agent remembers:
• his decision at D1
• everything he knew at D1
– formally: {D1} ∪ Parents(D1) ⊆ Parents(D2)
– usually perfect recall edges are implicit, not drawn
[Diagram: tree-killer MAID fragment with Spike Tree, TreeSick, Tree Doctor, Build Patio]
Strategies
• Strategy σ at D:
– A pure (deterministic) strategy specifies an action at D for every information set u
– A behavior strategy specifies a distribution over actions for every u
• Strategy σ specifies a distribution P(D | Parents(D))
– turns a decision node into a chance node
– information parents play exactly the same role as parents of a chance node
MAID Semantics
• MAID M defines a set of possible strategy profiles
• M plus any strategy profile σ defines a BN M[σ]
– Each decision node D becomes a chance node, with σ[D] as its CPD
• M[σ] defines a probability distribution, from which we can derive an expected utility for each agent:
EU_a(σ) = Σ_{U ∈ U_a} Σ_{u ∈ Val(U)} P_{M[σ]}(U = u) · u
• Thus, a MAID defines a mapping from strategy profiles to expected utility vectors (a numeric sketch follows below)
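A minimal sketch of this semantics on a tiny hypothetical one-decision example (not from the talk): once a strategy profile supplies a CPD for every decision node, M[σ] is an ordinary BN, and the expected utility is a sum over joint assignments.

```python
# Minimal sketch of EU_a(σ): once a strategy profile supplies a CPD for
# every decision node, M[σ] is an ordinary BN, and expected utility is a
# sum over joint assignments.  The tiny example (Deal -> Bet -> U) and all
# numbers are illustrative, not from the talk.
from itertools import product

P_DEAL = {"hi": 0.5, "lo": 0.5}                      # chance node

sigma_bet = {                              # decision rule = CPD P(Bet | Deal)
    "hi": {"bet": 0.9, "fold": 0.1},
    "lo": {"bet": 0.2, "fold": 0.8},
}

def utility(deal, bet):                              # utility node U(Deal, Bet)
    return {("hi", "bet"): 2.0, ("hi", "fold"): 0.0,
            ("lo", "bet"): -1.0, ("lo", "fold"): 0.0}[(deal, bet)]

def expected_utility(decision_rule):
    """EU(σ): sum of P_{M[σ]}(assignment) * utility over all assignments."""
    eu = 0.0
    for deal, bet in product(P_DEAL, ["bet", "fold"]):
        p = P_DEAL[deal] * decision_rule[deal][bet]  # BN chain rule in M[σ]
        eu += p * utility(deal, bet)
    return eu

if __name__ == "__main__":
    print(expected_utility(sigma_bet))   # 0.5*0.9*2 + 0.5*0.2*(-1) = 0.8
```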
Readability
[MAID for a simplified poker game: P1 Hand, P2 Hand, Flop Cards, Card 4, a sequence of Bet decisions for each player, and utility U]
Compactness
["Road" example MAID: for each plot (1W, 1E, 2W, 2E, 3W, 3E) a Suitability chance node, a Building decision node, and a Util node]
Compactness
• Assume all variables have three values
• Each decision node observes three variables
• Number of information sets per agent: 3^3 = 27
• Size of MAID:
– n chance nodes of "size" 3
– n decision nodes of "size" 27·3
⇒ 54n
• Size of game tree: 3^{2n}
– 2n splits, each over three values
• Size of normal (matrix) form: ≈ (3^{27})^n
– n players, each with 3^27 pure strategies
Outline
• Probabilistic Reasoning: Bayesian networks
• Influence Diagrams
• Strategic Relevance
• Exploiting Structure for Solving Games
Optimality and Equilibrium
• Let E be a subset of D_a, and let σ_E be a partial strategy over E
• Is σ_E the best partial strategy for agent a to adopt?
– Depends on the decision rules at the other decision nodes
• σ_E is optimal for a strategy profile σ if for all partial strategies σ'_E over E:
EU_a((σ_{-E}, σ_E)) ≥ EU_a((σ_{-E}, σ'_E))
• A strategy profile σ is a Nash equilibrium if for every agent a, σ_{D_a} is optimal for σ
MAIDs and Games
• A MAID is equivalent to a game tree: it defines a
mapping from strategy profiles to payoff vectors
• Finding equilibria in the MAID is equivalent to
finding equilibria in the game tree
• One way to find equilibrium in MAID:
– construct the game tree
– solve the game
Incurs exponential blowup in representation size
• Question: can we find equilibria in a MAID directly?
Local Optimization
• Consider finding a decision rule for a single decision node D that is optimal for σ
• For each instantiation pa of Pa(D), must find P* that maximizes:
Σ_{d ∈ Val(D)} P*(d | pa) Σ_{U ∈ U_a} Σ_{u ∈ Val(U)} P_{M[σ]}(U = u | d, pa) · u
• Some decision rules in σ may not affect this maximization problem (a sketch of this local step follows below)
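A minimal sketch of this local step, under the assumption that an inference routine already supplies the inner expectation Σ_U Σ_u P_{M[σ]}(U = u | d, pa) · u as a callable `eu_given(d, pa)`; the function name and the example numbers are illustrative, not the talk's code.

```python
# Minimal sketch of the local optimization step for one decision D with
# everything else in σ held fixed.  `eu_given(d, pa)` stands in for the
# inference call computing the inner expectation over the agent's utility
# nodes; the example numbers are illustrative, not from the talk.

def optimal_decision_rule(actions, parent_instantiations, eu_given):
    """Return a decision rule {pa: {action: prob}} maximizing local EU."""
    rule = {}
    for pa in parent_instantiations:
        best = max(actions, key=lambda d: eu_given(d, pa))
        # A maximizing pure rule always exists, so put all mass on `best`.
        rule[pa] = {d: (1.0 if d == best else 0.0) for d in actions}
    return rule

if __name__ == "__main__":
    # Toy stand-in for the expected-utility oracle (illustrative numbers).
    eu_table = {("bet", "hi"): 2.0, ("fold", "hi"): 0.0,
                ("bet", "lo"): -1.0, ("fold", "lo"): 0.0}
    rule = optimal_decision_rule(["bet", "fold"], ["hi", "lo"],
                                 lambda d, pa: eu_table[(d, pa)])
    print(rule)   # bet when the observation is "hi", fold when it is "lo"
```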
Strategic Relevance
• Intuitively, D relies on D’ if we need to know the
decision rule for D’ in order to determine the
optimal decision rule for D.
• We define a relevance graph, with:
– a node for each decision
– an edge from D to D’ if D relies on D’
Examples I: Information
[Four small MAIDs over decisions D, D' and a utility node U, illustrating when D relies on D': "simultaneous move", "don't care", "perfect info", "perfect enough"]
Examples II: Simple Card Game
[MAID: chance node Deal, decisions Bet1 and Bet2, utility node U]
Bet2 relies on Bet1 even though Bet2 observes Bet1:
– Bet1 can depend on Deal
– Deal influences U
– Need probability model of Bet1 to derive posterior on Deal and compute expectation over U
Decision D can rely on D' even if D' is observed at D! (see the worked sketch below)
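A small worked sketch (with illustrative numbers, not from the talk) of why the posterior on Deal needs Bet1's decision rule: by Bayes' rule, P(Deal | Bet1) depends on P(Bet1 | Deal), so changing Bet1's rule changes what Bet2 should believe and hence how it should bet.

```python
# Why Bet2 relies on Bet1: the posterior P(Deal | Bet1) is computed by
# Bayes' rule from Bet1's decision rule P(Bet1 | Deal).  All numbers are
# illustrative, not from the talk.

P_DEAL = {"hi": 0.5, "lo": 0.5}

def posterior_deal(bet1_rule, observed_bet1="bet"):
    """P(Deal | Bet1) proportional to P(Deal) * P(Bet1 | Deal)."""
    unnorm = {deal: P_DEAL[deal] * bet1_rule[deal][observed_bet1]
              for deal in P_DEAL}
    z = sum(unnorm.values())
    return {deal: p / z for deal, p in unnorm.items()}

if __name__ == "__main__":
    aggressive = {"hi": {"bet": 0.9, "fold": 0.1},   # bets on almost anything
                  "lo": {"bet": 0.8, "fold": 0.2}}
    tight      = {"hi": {"bet": 0.9, "fold": 0.1},   # bets only on good hands
                  "lo": {"bet": 0.1, "fold": 0.9}}
    # Same observation, different decision rules -> different posteriors,
    # hence a different best response for Bet2.
    print(posterior_deal(aggressive))   # {'hi': ~0.53, 'lo': ~0.47}
    print(posterior_deal(tight))        # {'hi': 0.9,  'lo': 0.1}
```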
Examples III: Decoupled Utilities
[MAID: chance node Deal, decisions Bet1 and Bet2, and a separate utility node for each player]
Bet2 relies on Bet1 even though Bet1 does not influence Bet2's utility:
– Bet1 can depend on Deal
– Deal influences U
– Need probability model of Bet1 to derive posterior on Deal and compute expectation over U
Examples IV: Tree Killer
[Tree-killer MAID (Poison Tree, TreeSick, Tree Doctor, TreeDead, Build Patio, Cost, Tree, View) and its relevance graph over the decisions Poison Tree, Tree Doctor, Build Patio]
s-Reachability
D relies on D' (D' is relevant to D) when there exists a utility node U such that the CPD of D' influences P(U | D, Pa(D)).
• D' is s-reachable from D if there is some utility node U of D's agent among the descendants of D such that, if a new parent D̂' were added to D', there would be an active path from D̂' to U given D and Pa(D).
s-Reachability
Nodes that D relies on are the nodes that are s-reachable from D.
Theorem: s-reachability is sound & complete for strategic relevance
• Sound: no s-reachability ⇒ strategic irrelevance for all P, U
• Complete: s-reachability ⇒ relevance for some P, U
Theorem: Can build the relevance graph in quadratic time using only the structure of the MAID (a sketch of this construction follows below)
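A minimal sketch of how the relevance graph could be built directly from the definition of s-reachability. The data layout (`parents`, `decisions`, `agent_of`, `utilities_of`) and the delegated d-separation test `d_sep` (for example, the sketch given earlier) are assumptions for illustration, not the talk's API; for brevity it checks all of the agent's utility nodes rather than only those downstream of D.

```python
# Minimal sketch of building the relevance graph from the MAID structure.
# `parents` maps every node to its parent list, `decisions` lists the
# decision nodes, `agent_of[D]` names D's owner, and `utilities_of[agent]`
# lists that agent's utility nodes; the d-separation test `d_sep` is passed
# in (e.g. the sketch shown earlier).  All of these names are assumptions
# for illustration, not the talk's API.

def s_reachable(parents, D, D2, util_nodes, d_sep):
    """Is D2 s-reachable from D?  Add a dummy parent to D2 and test whether
    it is d-connected to some utility node given {D} plus Pa(D)."""
    dummy = ("dummy-parent-of", D2)
    aug = {v: list(ps) for v, ps in parents.items()}
    aug[dummy] = []
    aug[D2] = aug.get(D2, []) + [dummy]
    given = {D} | set(parents.get(D, []))
    return any(not d_sep(aug, {dummy}, {U}, given) for U in util_nodes)

def relevance_graph(parents, decisions, agent_of, utilities_of, d_sep):
    """Edges D -> D2 meaning 'D relies on D2'."""
    return {D: [D2 for D2 in decisions
                if D2 != D
                and s_reachable(parents, D, D2,
                                utilities_of[agent_of[D]], d_sep)]
            for D in decisions}
```

If the resulting graph is acyclic, its topological order drives the backward-induction procedure described next.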
Outline
• Probabilistic Reasoning: Bayesian networks
• Influence Diagrams
• Strategic Relevance
• Exploiting Structure for Solving Games
Intuition: Backward Induction
• D' observes D
• Can optimize decision rule at D' without knowing decision rule at D
• Having optimized D', can optimize D
• D doesn't care about D'
• Can optimize decision rule at D without knowing decision rule at D'
• Having optimized D, can optimize D'
[Diagrams: two two-decision MAIDs over D, D' and utility U, with their acyclic relevance graphs]
Generalized Backward Induction
Idea: Solve decisions in order of the relevance graph
Generalized Backward Induction:
• Choose a decision node D that relies on no other
• Find an optimal strategy for D by maximizing its local expected utility
• Replace D by a chance node
[Diagram: a two-decision MAID and its relevance graph, before and after D is replaced by a chance node]
Finding Equilibria: Acyclic Relevance Graphs
[Relevance graph: decisions D1, D2, …, Dn-1, Dn in topological order]
• Choose any strategy profile σ for D1,…,Dn-1
• Derive a decision rule δ for Dn that is optimal for σ
• Node Dn does not rely on the preceding ones
⇒ δ is optimal for any other strategy profile as well!
• We can now set δ as the CPD for Dn
• And continue by optimizing Dn-1
Generalized Backward Induction
• Given a topological sort D1,…,Dn of the relevance graph:
• Begin with an arbitrary fully mixed strategy profile σ
• For i = n down to 1:
– Find a decision rule δ for Di that is optimal for σ
• Decision rules at previous decisions were fixed earlier
• Decision rules at subsequent decisions are irrelevant
– Let σ(Di) = δ
Theorem: If the relevance graph of a MAID is acyclic, it can be solved by generalized backward induction, and the result is a pure-strategy Nash equilibrium (a code sketch of this loop follows below)
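A minimal code sketch of this loop, assuming the relevance graph is given as a dict `relies_on` (decision → decisions it relies on) and that a `best_response(D, sigma)` routine performs the local optimization step; both names are assumptions, not the talk's code.

```python
# Minimal sketch of generalized backward induction for an acyclic
# relevance graph.  `relies_on[D]` lists the decisions D relies on, and
# `best_response(D, sigma)` stands in for the local-optimization step;
# both names are assumptions for illustration, not the talk's code.
from graphlib import TopologicalSorter

def backward_induction(relies_on, initial_profile, best_response):
    """Optimize each decision only after everything it relies on is fixed."""
    sigma = dict(initial_profile)        # arbitrary fully mixed start
    for D in TopologicalSorter(relies_on).static_order():
        # Decisions D relies on were already optimized; the rest are
        # irrelevant to D's local optimization.
        sigma[D] = best_response(D, sigma)
    return sigma

if __name__ == "__main__":
    # Tiny illustrative run: D1 relies on D2; D2 relies on nothing.
    relies_on = {"D1": ["D2"], "D2": []}
    demo = lambda D, sigma: f"optimal rule for {D}"
    print(backward_induction(relies_on, {"D1": "mixed", "D2": "mixed"}, demo))
```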
When is the
Relevance Graph Acyclic?
• Single-agent influence diagrams with perfect recall
• Multi-agent games with perfect information
• Some games with imperfect information
– e.g., Tree Killer example
But in many MAIDs the relevance graph has cycles…
Cyclic Relevance Graphs
Question: What if the relevance graph is cyclic?
• Strongly connected component (SCC):
– maximal subgraph s.t. there is a directed path between every pair of nodes
• The decisions in each SCC require each other
– They must be optimized together
• Different SCCs can be solved separately
Generalized Backward Induction
Given a topological sort C1,…,Cm of the SCCs in the relevance graph:
• Begin with an arbitrary fully mixed strategy profile σ
• For i = m down to 1:
– Construct the reduced MAID M[σ_{-Ci}]
• Strategies for previous SCCs were selected before
• Strategies for subsequent SCCs are irrelevant
– Create the game tree for M[σ_{-Ci}]
– Use a game solver to find an equilibrium strategy profile τ for Ci in this reduced game
– Let σ(Ci) = τ
Theorem: If we find an equilibrium for each SCC, the result is an equilibrium for the whole game (a code sketch follows below)
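A minimal code sketch of the component-level loop: compute the SCCs of the relevance graph (Kosaraju's algorithm here), then solve them in an order where every component a decision relies on is handled first. The `solve_component(C, sigma)` callable stands in for "build the reduced game for C under sigma and hand it to an equilibrium solver" and is an assumption, not the talk's code.

```python
# Minimal sketch of the component-level loop.  SCCs of the relevance graph
# are found with Kosaraju's algorithm; `solve_component(C, sigma)` stands
# in for "build the reduced game for component C under sigma and hand it
# to an equilibrium solver" and is an assumption, not the talk's code.
from graphlib import TopologicalSorter

def strongly_connected_components(graph):
    """Kosaraju's algorithm; `graph` maps each node to its successors."""
    order, seen = [], set()

    def finish_order(v):
        seen.add(v)
        for w in graph.get(v, []):
            if w not in seen:
                finish_order(w)
        order.append(v)

    for v in graph:
        if v not in seen:
            finish_order(v)

    rev = {v: [] for v in graph}                 # reversed graph
    for v, ws in graph.items():
        for w in ws:
            rev.setdefault(w, []).append(v)

    comp_of, comps = {}, []
    for v in reversed(order):
        if v in comp_of:
            continue
        stack, members = [v], []
        comp_of[v] = len(comps)
        while stack:
            x = stack.pop()
            members.append(x)
            for w in rev.get(x, []):
                if w not in comp_of:
                    comp_of[w] = len(comps)
                    stack.append(w)
        comps.append(members)
    return comps, comp_of

def solve_by_components(relies_on, initial_profile, solve_component):
    comps, comp_of = strongly_connected_components(relies_on)
    # Condensation: component i depends on component j if some decision in
    # i relies on a decision in j; relied-upon components are solved first.
    cond = {i: set() for i in range(len(comps))}
    for v, ws in relies_on.items():
        for w in ws:
            if comp_of[v] != comp_of[w]:
                cond[comp_of[v]].add(comp_of[w])
    sigma = dict(initial_profile)
    for i in TopologicalSorter(cond).static_order():
        sigma.update(solve_component(comps[i], sigma))
    return sigma
```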
“Road” Relevance Graph
[Relevance graph over the decisions for plots 1W, 1E, 2W, 2E, 3W, 3E]
Note: Reduced games over SCCs are not subgames!
Experiment: "Road" Example
[Chart: running time (seconds) of the backward induction algorithm vs. number of plots of land, for up to 40 plots]
Reminder, for n = 4:
• Tree size: 6561 nodes
• Matrix size: 4.7 × 10^27
For n = 40:
• Tree size: 1.47 × 10^38 nodes
Cutting Cycles
• Idea: enumerate the possible values d of some decision D
– Once we determine D, the residual MAID has an acyclic relevance graph
– Solve the residual MAID using generalized backward induction
– Check whether the combined strategy with d is an equilibrium
• May need to instantiate several decision nodes to cut the cycle
• Can deal with each SCC separately
Theorem: Can find all pure-strategy equilibria in time linear in the # of SCCs and exponential in the max # of decisions required to cut all loops in a component (see the sketch below)
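A minimal sketch of the enumeration idea, assuming the cut decisions are restricted to pure actions and that `solve_residual(fixed)` and `is_equilibrium(profile)` are available stand-ins (for example, the backward-induction sketch above plus an equilibrium check); none of these names come from the talk.

```python
# Minimal sketch of cycle cutting: enumerate pure actions for the decisions
# chosen to break the relevance-graph cycles, solve the residual (now
# acyclic) problem for everything else, and keep the combinations that
# survive an equilibrium check.  `solve_residual(fixed)` and
# `is_equilibrium(profile)` are assumed stand-ins, not the talk's code.
from itertools import product

def pure_equilibria_by_cutting(cut_decisions, actions_of,
                               solve_residual, is_equilibrium):
    """cut_decisions: decisions instantiated to break the cycles.
    actions_of[D]: the possible pure actions of decision D."""
    equilibria = []
    for combo in product(*(actions_of[D] for D in cut_decisions)):
        fixed = dict(zip(cut_decisions, combo))      # candidate pure rules
        profile = {**fixed, **solve_residual(fixed)}
        if is_equilibrium(profile):
            equilibria.append(profile)
    return equilibria
```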
Irrelevant Information
What if B can observe A's decision, even though it is completely irrelevant to him?
[MAID from the sales example: Resource Allocation, A Sales Strategy, B Sales Strategy, Sales-A, Sales-B, Commission Cost, Commission, Revenue]
• We can automatically
– analyze relevance based on graph structure
– eliminate irrelevant information edges
• In associated tree, safe merging of information sets
• Leads to exponential decrease in # of decisions to
optimize in influence diagram!
Related Work
• Suryadi and Gmytrasiewicz (1999) use multi-agent
influence diagrams, but with recursive modeling
• Milch and Koller (2000) use the MAID
representation described here, but have no
algorithm for finding equilibria
• Nilsson and Lauritzen (2000) discuss limited-memory influence diagrams (LIMIDs) and derive s-reachability, but do not apply it to the multi-agent case
• La Mura (2000) proposes game networks, with an
undirected notion of strategic dependence
Future Work
• Take advantage of structure within SCCs
• Represent asymmetric scenarios
compactly
• Detect irrelevant observations
Computational Game Theory
Game theory: Past
• Expert analysis of:
– “Prototypical” examples
that highlight key issues
– Abstracted problems for
big organizations
• Simplified examples
– small enough to be
analyzed by hand
Game theory: Future
• Autonomous agents
interacting economically
• Decision support systems
for consumers
• Complex problems:
– many relevant variables
– interacting decisions
Goals: Make game theory
• a broadly usable tool even for lay people
• a formal basis for interacting autonomous agents
by allowing real-world games to be easily represented
and solved.
Conclusions
• Multi-agent influence diagrams:
– compact intuitive language for multi-agent interactions
– basic units: variables rather than strategies or events
• MAIDs make explicit structure that is lost in game
trees
• Can exploit structure to find equilibria efficiently
– sometimes exponentially faster than existing algorithms
• Exciting question:
What else does structure buy us?
http://robotics.stanford.edu/~koller
[email protected]