Influence Diagrams for Robust Decision Making in Multiagent Settings
Prashant Doshi
University of Georgia, USA
http://thinc.cs.uga.edu
Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)
Muthu Chandrasekaran, Doctoral student
Yingke Chen, Post-doctoral student
Influence diagram (ID)
(Figure: ID with chance node S, observation Oi, decision Ai, and utility Ri)
ID for decision making where the state may be partially observable
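To make the single-agent case concrete, here is a minimal sketch (not from the talk) of evaluating such an ID for the tiger problem by summing out the chance node for each setting of the decision node; the belief and payoff numbers are illustrative assumptions.

```python
# Minimal sketch of evaluating a one-shot influence diagram for the tiger problem:
# chance node S (tiger location), decision node Ai, utility node Ri.
# Belief and payoffs are illustrative assumptions, not the talk's numbers.

belief = {"tiger-left": 0.5, "tiger-right": 0.5}          # P(S)

utility = {                                               # Ri(S, Ai)
    ("tiger-left", "open-right"): 10,  ("tiger-left", "open-left"): -100,
    ("tiger-right", "open-left"): 10,  ("tiger-right", "open-right"): -100,
    ("tiger-left", "listen"): -1,      ("tiger-right", "listen"): -1,
}

def expected_utility(action):
    """Sum out the chance node S for a fixed setting of the decision node."""
    return sum(p * utility[(s, action)] for s, p in belief.items())

actions = ["listen", "open-left", "open-right"]
print(max(actions, key=expected_utility))                 # optimal decision rule
```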
How do we generalize IDs
to multiagent settings?
Adversarial tiger problem
(MAID: chance node Tiger loc; for each agent i, j: a Growl observation, an Open-or-Listen decision, and a reward R)
Multiagent influence diagram (MAID)
MAIDs offer a richer representation for a game and may be transformed into a normal- or extensive-form game (Koller&Milch01)
A strategy of an agent is an assignment of a decision rule to every decision node of that agent
A strategy profile is in Nash equilibrium if each agent's strategy in the profile is optimal given the strategies of the others
Expected utility of a strategy profile to agent i is the sum of the expected utilities at each of i's decision nodes
Strategic relevance
Consider two strategy profiles which differ in the
decision rule at D’ only. A decision node, D,
strategically relies on another, D’, if D‘s decision rule
does not remain optimal in both profiles.
Is there a way of finding all decision
nodes that are strategically relevant to
D using the graphical structure?
Yes, s-reachability
Analogous to d-separation for determining
conditional independence in BNs
Evaluating whether a decision rule at D is optimal in a
given strategy profile involves removing decision nodes
that are not s-relevant to D and transforming the decision
and utility nodes into chance nodes
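A minimal sketch of this evaluation for a one-shot adversarial tiger MAID: j's decision node is fixed to its decision rule, so it behaves like a chance node, and i's expected utility is computed by summing out the chance nodes. The reward function and probabilities below are illustrative assumptions, not the talk's numbers.

```python
# Sketch: checking whether i's decision rule is optimal in a given strategy profile
# by fixing j's decision rule and summing out chance nodes. Numbers are illustrative.

states = {"tiger-left": 0.5, "tiger-right": 0.5}
j_rule = {"listen": 0.8, "open-left": 0.1, "open-right": 0.1}   # j's fixed decision rule

def reward_i(s, ai, aj):
    """Illustrative reward for i; the dependence on aj stands in for the game's coupling."""
    open_reward = {"open-left": "tiger-right", "open-right": "tiger-left"}
    if ai == "listen":
        return -1 if aj == "listen" else -3      # assumed penalty when j opens a door
    return 10 if s == open_reward[ai] else -100

def eu_i(ai):
    return sum(ps * pj * reward_i(s, ai, aj)
               for s, ps in states.items() for aj, pj in j_rule.items())

i_rule = "listen"                                 # the decision rule being checked
print(eu_i(i_rule) >= max(eu_i(a) for a in ["listen", "open-left", "open-right"]))
```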
What if the agents are using differing models of
the same game to make decisions, or are
uncertain about the mental models
others are using?
Let agent i believe with probability p that j will listen, and with 1 - p that j will play the best response
Analogously, j believes that i will open a door with probability q, and otherwise play the best response
Network of IDs (NID) (Gal&Pfeffer08)
(Top-level block: p weights the Listen block and q weights the Open block)
Block L (Listen): distribution over {L, OL, OR} = (0.9, 0.05, 0.05)
Block O (Open): distribution over {L, OL, OR} = (0.1, 0.45, 0.45)
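As a simplified illustration of what the top level induces, the sketch below mixes the two blocks' action distributions with an assumed weight p; in the actual NID the second component would be j's computed best response rather than a fixed block.

```python
# Simplified sketch: i's marginal prediction of j's action under the NID top level,
# mixing Block L and Block O. The weight p is an assumed value; the block
# distributions are the ones shown on the slide.
p = 0.7
block_L = {"L": 0.9, "OL": 0.05, "OR": 0.05}
block_O = {"L": 0.1, "OL": 0.45, "OR": 0.45}

prediction = {a: p * block_L[a] + (1 - p) * block_O[a] for a in block_L}
print(prediction)   # {'L': 0.66, 'OL': 0.17, 'OR': 0.17}
```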
Let agent i believe with probability p that j will likely listen, and with 1 - p that j will play the best response
Analogously, j believes that i will mostly open a door with probability q, and otherwise play the best response
Top-level Block -- MAID
MAID representation for the NID
(Nodes: Tiger loc^TL; for each agent: Growl^TL, Open-or-Listen^TL, R^TL; model nodes Mod[j;Di], Mod[i;Dj]; best-response nodes BR[i]^TL, BR[j]^TL; block decision nodes Open^O, Listen^L)
MAIDs and NIDs
Rich languages for games based on IDs that model problem structure by exploiting conditional independence
MAIDs and NIDs
Focus is on computing equilibrium,
which does not allow for best response
to a distribution of non-equilibrium behaviors
Do not model dynamic games
Generalize IDs to dynamic interactions
in multiagent settings
Challenge: Other agents could be
updating beliefs and changing strategies
Level l I-ID
(Figure: I-ID for the tiger problem with nodes Tiger loc, Growli, Open-or-Listeni, Ri, Open-or-Listenj, and the model node Mj,l-1)
Model node Mj,l-1: models of agent j at level l-1
Policy link (dashed arrow): distribution over the other agent's actions given its models
Belief on Mj,l-1: Pr(Mj,l-1|s)
Members of the model node Mj,l-1
Different chance nodes (Aj1, Aj2) are the solutions of the models mj,l-1^1, mj,l-1^2
Mod[Mj] represents the different models of agent j
mj,l-1^1, mj,l-1^2 could be I-IDs, IDs, or simple distributions
CPT of the chance node Aj is a multiplexer: it assumes the distribution of each of the action nodes (Aj1, Aj2) depending on the value of Mod[Mj]
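A minimal sketch of the multiplexer and of collapsing the model node into a single prediction of j's action; the model names, prior, and distributions are illustrative assumptions.

```python
# Sketch of the multiplexer CPT of chance node Aj inside the model node:
# conditioned on Mod[Mj] = m, Aj takes on the action distribution produced by
# solving model m. Names and numbers below are illustrative assumptions.

model_prior = {"m1": 0.6, "m2": 0.4}          # Pr(Mod[Mj]), from i's belief over j's models
solutions = {                                  # Pr(Aj | Mod[Mj] = m), i.e. nodes Aj1, Aj2
    "m1": {"listen": 0.9, "open-left": 0.05, "open-right": 0.05},
    "m2": {"listen": 0.1, "open-left": 0.45, "open-right": 0.45},
}

def marginal_over_actions(prior, sols):
    """Collapse the multiplexer: Pr(Aj) = sum_m Pr(Mod=m) * Pr(Aj | m)."""
    actions = next(iter(sols.values()))
    return {a: sum(prior[m] * sols[m][a] for m in prior) for a in actions}

print(marginal_over_actions(model_prior, solutions))
```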
Could I-IDs be extended over time? We must address the challenge
Interactive dynamic influence diagram (I-DID)
(Figure: two time slices with nodes S^t, S^t+1, Ai^t, Ai^t+1, Aj^t, Aj^t+1, Oi^t, Oi^t+1, Ri, and model nodes Mj,l-1^t, Mj,l-1^t+1 connected by the model update link)
How do we implement the model
update link?
(Figure: implementing the model update link. The model node Mj,l-1^t contains models mj,l-1^t,1 and mj,l-1^t,2 with action nodes Aj1, Aj2; at t+1 it expands to updated models mj,l-1^t+1,1 through mj,l-1^t+1,4 with action nodes Aj1 through Aj4, one for each combination of j's action Aj^t and observation Oj1, Oj2)
These updated models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations
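A minimal sketch of the model update, assuming j's models are POMDP-style belief states: each model at t spawns one updated model per action-observation pair at t+1 via a standard Bayesian belief update. The transition and observation functions are supplied by the caller; this is not the talk's exact implementation.

```python
# Sketch of the model-update link with POMDP-style belief-state models.
# T, O and the model representation are illustrative assumptions.

def belief_update(b, a, o, T, O, states):
    """b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    unnorm = {s2: O[(o, s2, a)] * sum(T[(s2, s, a)] * b[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())
    return {s2: v / z for s2, v in unnorm.items()} if z > 0 else dict(b)

def expand_model_node(models, actions, observations, T, O, states):
    """Models at t+1: one updated belief per (model, action, observation) at t."""
    return [belief_update(b, a, o, T, O, states)
            for b in models for a in actions for o in observations]
```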
Recap
Daphne Koller and Brian Milch, "Multi-Agent Influence Diagrams for Representing and Solving Games", Games and Economic Behavior, 45(1):181-221, 2003
Ya'akov Gal and Avi Pfeffer, "Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes", Journal of AI Research, 33:109-147, 2008
Prashant Doshi, Yifeng Zeng and Qiongyu Chen, "Graphical Models for Interactive POMDPs: Representations and Solutions", Journal of AAMAS, 18(3):376-416, 2009
How large is the behavioral model space?
General definition: a mapping from the agent's history of observations to its actions, H → Δ(Aj)
Uncountably infinite
Let's assume computable models
Countable
A very large portion of the model space is not computable!
Daniel Dennett
Philosopher and Cognitive Scientist
Intentional stance
Ascribe beliefs, preferences and intent to
explain others’ actions
(analogous to theory of mind - ToM)
Organize the mental models
Intentional models: e.g., POMDP θj = ⟨ bj, Aj, Tj, Ωj, Oj, Rj, OCj ⟩, where the elements other than the belief bj form the frame (solved using DIDs; may give rise to recursive modeling; cf. BDI, ToM)
Subintentional models: e.g., Δ(Aj), finite state controller, plan
Finite model space grows as the interaction progresses
Growth in the model space
The other agent may receive any one of |Ωj| observations
At time steps 0, 1, 2, ..., t: |Mj| → |Mj||Ωj| → |Mj||Ωj|^2 → ... → |Mj||Ωj|^t
Exponential: the general model space is large and grows exponentially as the interaction progresses
It would be great if we could compress this space!
Lossless: no loss in value to the modeler
Lossy: flexible loss in value for greater compression
Model space compression is useful in many areas:
1. Sequential decision making in multiagent settings using I-DIDs
2. Bayesian plan recognition
3. Games of imperfect information
General and domain-independent approach for compression
Establish equivalence relations that partition the model
space and retain representative models from each
equivalence class
Approach #1: Behavioral equivalence
(Rathanasabapathy et al.06,Pynadath&Marsella07)
Intentional models whose complete solutions
are identical are considered equivalent
Approach #1: Behavioral equivalence
Behaviorally minimal set of models
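A minimal sketch of pruning to a behaviorally minimal set: models whose complete solutions are identical share an equivalence class and only one representative is kept. Here solve() is a hypothetical stand-in for solving a model's DID, and representing a policy tree as a hashable value is an assumption.

```python
# Sketch of behavioral equivalence: bucket models by their complete solutions
# (policy trees) and keep one representative per bucket.
from collections import defaultdict

def behaviorally_minimal(models, solve):
    classes = defaultdict(list)
    for m in models:
        classes[solve(m)].append(m)            # identical policy trees -> same class
    return [ms[0] for ms in classes.values()]  # one representative per class
```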
Approach #1: Behavioral equivalence
Lossless
Works when intentional models have
differing frames
Approach #1: Behavioral equivalence
Impact on I-DIDs in multiagent settings
(Results on the multiagent tiger and multiagent MM problems)
Approach #1: Behavioral equivalence
Utilize model solutions (policy trees) for
mitigating model growth
Model representatives that are not BE may become BE from the next step onwards
Preemptively identify such models and do not update all of them
Approach #2: Revisit BE
(Zeng et al.11,12)
Intentional models whose partial depth-d solutions are identical
and vectors of updated beliefs at the leaves of the partial trees
are identical are considered equivalent
Lossless if frames are identical
Sufficient but not necessary
Approach #2: (ε,d)-Behavioral equivalence
Two models are (ε,d)-BE if their partial depth-d solutions are identical and the vectors of updated beliefs at the leaves of the partial trees differ by ε
Models are (0.33,1)-BE
Lossy
Approach #2: ε-Behavioral equivalence
Lemma (Boyen&Koller98): KL divergence between two distributions
in a discrete Markov stochastic process reduces or remains the
same after a transition, with the mixing rate acting as a discount
factor
Mixing rate represents the minimal amount by which the posterior
distributions agree with each other after one transition
Property of a problem and may be pre-computed
Approach #2: ε-Behavioral equivalence
Given the mixing rate and a bound, ε, on the divergence between two belief vectors, the lemma allows computing the depth, d, at which the bound is reached
Compare two solutions up to depth d for equality
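A minimal sketch of deriving the comparison depth, assuming the divergence contracts by a factor of (1 - F) per transition as the lemma suggests; D0, a bound on the initial divergence, is an assumed parameter.

```python
# Sketch: comparison depth d from mixing rate F and bound epsilon, assuming the
# divergence contracts by (1 - F) per transition. D0 is an assumed initial bound.
import math

def comparison_depth(F, epsilon, D0=1.0):
    if F >= 1.0:
        return 1                    # divergence is 0 after one step
    if F <= 0.0:
        return None                 # no contraction; fall back to the horizon
    return max(1, math.ceil(math.log(epsilon / D0) / math.log(1.0 - F)))

print(comparison_depth(F=0.5, epsilon=0.05))   # -> 5 with these assumed values
```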
Approach #2: ε-Behavioral equivalence
Impact on dt-planning in multiagent settings
(Results on the multiagent concert problem; mixing rate F = 0.5 acts as the discount factor)
On a UAV reconnaissance problem in a 5x5 grid, allows the solution to scale to a 10-step look ahead in 20 minutes
Approach #2: ε-Behavioral equivalence
What is the value of d when some problems
exhibit F with a value of 0 or 1?
F=1 implies that the KL divergence is 0 after one step:
Set d = 1
F=0 implies that the KL divergence does not reduce:
Arbitrarily set d to the horizon
Approach #3: Action equivalence
(Zeng et al.09,12)
Intentional or subintentional models whose
predictions at time step t (action distributions)
are identical are considered equivalent at t
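A minimal sketch of grouping models by action equivalence at a time step; predict() is a hypothetical stand-in for solving a model far enough to read off its action distribution at that step.

```python
# Sketch of action equivalence at step t: models with matching predicted action
# distributions (up to rounding) share a class, so the retained set is bounded
# by the number of distinct predictions.
from collections import defaultdict

def action_equivalence_classes(models, predict, ndigits=6):
    classes = defaultdict(list)
    for m in models:
        dist = predict(m)                                          # {action: probability}
        key = tuple(sorted((a, round(p, ndigits)) for a, p in dist.items()))
        classes[key].append(m)
    return [ms[0] for ms in classes.values()]
```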
Approach #3: Action equivalence
Lossy
Works when intentional models have
differing frames
Approach #3: Action equivalence
Impact on dt-planning in multiagent settings
AE bounds the model space at each time step to the number of distinct actions
(Results on the multiagent tiger problem)
Approach #4: Influence equivalence
(related to Witwicki&Durfee11)
Intentional or subintentional models whose predictions
at time step t influence the subject agent’s plan
identically are considered equivalent at t
Regardless of whether the other agent opened the left or right door, the tiger resets, thereby affecting the agent's plan identically
Approach #4: Influence equivalence
Influence may be measured as the change in the
subject agent’s belief due to the action
Group more models at time step t compared to AE
Lossy
Compression due to approximate equivalence may violate ACC (the absolute continuity condition)
Regain ACC by appending a covering model
to the compressed set of representatives
Open questions
N > 2 agents
Under what conditions could equivalent
models belonging to different agents be
grouped together into an equivalence class?
Can we avoid solving models by using
heuristics for identifying approximately
equivalent models?
Modeling Strategic Human Intent
Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)
Adam Goodie, Professor of Psychology, UGA
Yingke Chen, Doctoral student
Hua Mao, Doctoral student
Muthu Chandrasekaran, Doctoral student
Xia Qu, Doctoral student
Roi Ceren, Doctoral student
Matthew Meisel, Doctoral student
Computational modeling of probability
judgment in stochastic games
Computational modeling of human
recursive thinking in sequential games
Human strategic reasoning is generally
hobbled by low levels of recursive thinking
(Stahl&Wilson95,Hedden&Zhang02,Camerer et al.04,Ficici&Pfeffer08)
(I think what you think that I think...)
You are Player I and II is human. Will you move or stay?
(Figure: extensive-form game tree in which players I and II alternately choose Move or Stay, with payoffs for I and II shown at the terminal nodes, e.g., Payoff for I: 3, Payoff for II: 1)
Less than 40% of the sample population performed the rational action!
Thinking about how others think (...) is hard in general contexts
(Figure: Move/Stay game tree in which players I and II alternate; payoffs for I are decimals, e.g., 0.6, 0.4, 0.2, 0.8, and the payoff for II is 1 minus the decimal)
About 70% of the sample population performed the rational action in this simpler and strictly competitive game
Simplicity, competitiveness and embedding the task
in intuitive representations seem to facilitate
human reasoning
(Flobbe et al.08, Meijering et al.11, Goodie et al.12)
Myopic opponents default to staying (level 0) while
predictive opponents think about the player’s
decision (level 1)
3-stage game
Can we computationally model these strategic
behaviors using process models?
Yes! Using a parameterized Interactive
POMDP framework
Notice that the achievement score increases as more games are played, indicating learning of the opponent models
Learning is slow and partial
Replace I-POMDP's normative Bayesian belief update with Bayesian learning that underweights evidence, parameterized by a learning parameter
Notice the presence of rationality errors in the participants’
choices (action is inconsistent with prediction)
Errors appear to reduce with time
Replace I-POMDP's normative expected utility maximization with a quantal response model that selects actions proportional to their utilities, parameterized by a precision parameter
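A minimal sketch of the two parameterized replacements; the functional forms and the symbols gamma (evidence underweighting) and lambda (quantal response precision) are standard modeling assumptions, not necessarily the exact forms or notation used in the study.

```python
# Sketch of an underweighted Bayesian update and a quantal response choice rule.
# Functional forms and parameter names are standard assumptions.
import math

def underweighted_update(prior, likelihood, gamma):
    """Underweight evidence: posterior proportional to prior * likelihood**gamma."""
    unnorm = {m: prior[m] * (likelihood[m] ** gamma) for m in prior}
    z = sum(unnorm.values())
    return {m: v / z for m, v in unnorm.items()}

def quantal_response(utilities, lam):
    """Select actions with probability proportional to exp(lambda * utility)."""
    exps = {a: math.exp(lam * u) for a, u in utilities.items()}
    z = sum(exps.values())
    return {a: v / z for a, v in exps.items()}

print(underweighted_update({"myopic": 0.5, "predictive": 0.5},
                           {"myopic": 0.2, "predictive": 0.8}, gamma=0.5))
print(quantal_response({"move": 0.6, "stay": 0.4}, lam=3.0))
```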
Underweighting evidence during
learning and quantal response for
choice have prior psychological support
Use participants' predictions of the other's action to learn the learning parameter, and participants' actions to learn the precision parameter
Use participants' actions to learn both parameters
Let the parameter vary linearly
Insights revealed by process modeling:
1. Much evidence that participants did not make rote use of backward induction (BI), and instead engaged in recursive thinking
2. Rationality errors cannot be ignored when modeling human decision making, and they may vary
3. Evidence that participants could be attributing surprising observations of others' actions to their rationality errors
Open questions:
1. What is the impact on strategic thinking if action outcomes
are uncertain?
2. Is there a damping effect on reasoning levels if participants need to concomitantly think ahead in time?
Suite of general and domain-independent
approaches for compressing agent model
spaces based on equivalence
Computational modeling of human behavioral
data pertaining to strategic thinking
2. Bayesian plan recognition under uncertainty
Plan recognition literature has paid scant attention to
finding general ways of reducing the set of feasible
plans (Carberry, 01)
3. Games of imperfect information (Bayesian games)
Real-world applications often involve many player types
Examples
• Ad hoc coordination in a spontaneous team
• Automated Poker player agent
Model space compression facilitates equilibrium
computation