Influence Diagrams for Robust Decision Making in Multiagent Settings Prashant Doshi University of Georgia, USA.
Influence Diagrams for Robust Decision Making in Multiagent Settings. Prashant Doshi, University of Georgia, USA, http://thinc.cs.uga.edu. With Yifeng Zeng (Reader, Teesside Univ.; previously Assoc. Prof., Aalborg Univ.), Muthu Chandrasekaran (doctoral student) and Yingke Chen (post-doctoral student).

An influence diagram (ID) contains a chance node S for the state, a decision node A_i, an observation node O_i and a utility node R_i; it supports decision making where the state may be partially observable. How do we generalize IDs to multiagent settings?

Adversarial tiger problem: each agent hears a growl (Growl_i, Growl_j), decides to open a door or listen (Open-or-Listen_i, Open-or-Listen_j), and receives a reward (R_i, R_j), all tied to the tiger's location (Tiger loc).

Multiagent influence diagram (MAID): MAIDs offer a richer representation of a game and may be transformed into a normal- or extensive-form game (Koller & Milch 01). A strategy of an agent is an assignment of a decision rule to every decision node of that agent. The expected utility of a strategy profile to agent i is the sum of the expected utilities at each of i's decision nodes. A strategy profile is in Nash equilibrium if each agent's strategy in the profile is optimal given the others' strategies (see the sketch below).

Strategic relevance: consider two strategy profiles which differ in the decision rule at D' only. A decision node D strategically relies on another node D' if D's decision rule does not remain optimal in both profiles. Is there a way of finding all decision nodes that are strategically relevant to D using the graphical structure? Yes: s-reachability, which is analogous to d-separation for determining conditional independence in BNs.

Evaluating whether a decision rule at D is optimal in a given strategy profile involves removing decision nodes that are not s-relevant to D and transforming the decision and utility nodes into chance nodes.
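To make the equilibrium condition above concrete, here is a minimal Python sketch. It assumes the MAID has already been evaluated down to expected-utility payoffs over each agent's decision rules for a single-shot tiger game; the rule names and payoff numbers are illustrative, not taken from the talk.

```python
# Minimal sketch: checking whether a pure strategy profile is a Nash
# equilibrium once a MAID has been reduced to payoff matrices over
# decision rules. Payoff numbers are illustrative only.
import itertools

# payoffs[(rule_i, rule_j)] = (EU_i, EU_j)
payoffs = {
    ("listen", "listen"): (-2.0, -2.0),
    ("listen", "open"):   (-1.0, -5.0),
    ("open",   "listen"): (-5.0, -1.0),
    ("open",   "open"):   (-4.0, -4.0),
}
rules = ["listen", "open"]

def is_nash(profile):
    """A profile is in Nash equilibrium if each agent's decision rule
    is optimal given the other's rule."""
    ri, rj = profile
    eu_i, eu_j = payoffs[profile]
    best_i = all(eu_i >= payoffs[(alt, rj)][0] for alt in rules)
    best_j = all(eu_j >= payoffs[(ri, alt)][1] for alt in rules)
    return best_i and best_j

equilibria = [p for p in itertools.product(rules, rules) if is_nash(p)]
print(equilibria)   # -> [('listen', 'listen')] with these illustrative payoffs
```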
What if the agents are using differing models of the same game to make decisions, or are uncertain about the mental models others are using? In the adversarial tiger game, let agent i believe with probability p that j will likely listen and with probability 1 - p that j will play its best response; analogously, j believes that i will mostly open a door with probability q and otherwise play its best response.

Network of IDs (NID) (Gal & Pfeffer 08): a top-level block points with probability p to Block L ("likely listen") and with probability q to Block O ("mostly open"). Over the actions (L, OL, OR), Block L prescribes (0.9, 0.05, 0.05) and Block O prescribes (0.1, 0.45, 0.45). The top-level block is itself a MAID (the adversarial tiger MAID above). The NID can in turn be represented as a MAID in which mod nodes Mod[j;D_i] and Mod[i;D_j] select between the block behaviors (Listen^L, Open^O) and the top-level best responses (BR[j]^TL, BR[i]^TL) that feed the top-level decision nodes Open-or-Listen^TL_i and Open-or-Listen^TL_j.

MAIDs and NIDs are rich languages for games based on IDs that model problem structure by exploiting conditional independence. However, their focus is on computing equilibrium, which does not allow for a best response to a distribution of non-equilibrium behaviors, and they do not model dynamic games.

We instead generalize IDs to dynamic interactions in multiagent settings. Challenge: other agents could be updating beliefs and changing strategies.

A level-l interactive ID (I-ID) augments agent i's ID with a model node M_{j,l-1} holding the models of agent j at level l-1, and a policy link (dashed arrow) that yields the distribution over the other agent's actions given its models; i's belief over the model node is Pr(M_{j,l-1}|s).

Members of the model node: the models m^1_{j,l-1} and m^2_{j,l-1} could be I-IDs, IDs or simple distributions, the chance nodes A^1_j and A^2_j hold the solutions of these models, and Mod[M_j] represents the different models of agent j. The CPT of the chance node A_j is a multiplexer: it assumes the distribution of each of the action nodes (A^1_j, A^2_j) depending on the value of Mod[M_j].

Could I-IDs be extended over time? We must address the challenge above. The interactive dynamic influence diagram (I-DID) unrolls the I-ID over time slices (S^t, S^{t+1}; A_i^t, A_i^{t+1}; A_j^t, A_j^{t+1}; O_i^t, O_i^{t+1}) and connects the model nodes M^t_{j,l-1} and M^{t+1}_{j,l-1} with a model update link.

How do we implement the model update link? The models in M^{t+1}_{j,l-1} differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations; for instance, two models at time t paired with two possible observations O^1_j and O^2_j expand into up to four updated models at t+1 (a sketch appears after the recap below).

Recap: Daphne Koller and Brian Milch, "Multi-Agent Influence Diagrams: A Formalism for Representing and Solving Games", Games and Economic Behavior, 45(1):181-221, 2003. Ya'akov Gal and Avi Pfeffer, "Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes", Journal of AI Research, 33:109-147, 2008. Prashant Doshi, Yifeng Zeng and Qiongyu Chen, "Graphical Models for Interactive POMDPs: Representations and Solutions", Journal of AAMAS, 18(3):376-416, 2009.
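A minimal sketch of one way the model update link can be implemented, assuming each candidate model's solution prescribes j's action and one updated model is spawned per observation j may receive. The `Model` class and the `solve` and `update_belief` stand-ins are illustrative, not the I-DID implementation from the talk.

```python
# Sketch of the model-update link: every model of agent j at time t is
# expanded into one updated model per observation j may receive, each
# differing only in its (updated) belief.
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    belief: tuple      # j's belief over the physical state
    frame: str         # j's frame (actions, transition, observation, reward, ...)

def update_models(models_t, observations_j, solve, update_belief):
    """Return the time t+1 model set produced by the model-update link."""
    models_t1 = []
    for m in models_t:
        a = solve(m)                   # action prescribed by m's solution
        for o in observations_j:       # any one of |Omega_j| observations
            models_t1.append(Model(belief=update_belief(m, a, o), frame=m.frame))
    return models_t1                   # |M_j| grows by a factor of |Omega_j|

# Toy usage with stub solve/update functions and made-up numbers.
m0 = Model(belief=(0.5, 0.5), frame="tiger_frame")
expanded = update_models(
    [m0], ["growl_left", "growl_right"],
    solve=lambda m: "listen",
    update_belief=lambda m, a, o: (0.85, 0.15) if o == "growl_left" else (0.15, 0.85),
)
print(len(expanded))   # 1 model x 2 observations -> 2 updated models
```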
How large is the behavioral model space? General definition: a behavioral model is a mapping from the agent's history of observations to (a distribution over) its actions, i.e., a function from histories to Delta(A_j); this space is uncountably infinite. Let us assume computable models: the space becomes countable, but a very large portion of the model space is therefore not computable!

Daniel Dennett (philosopher and cognitive scientist) proposed the intentional stance: ascribe beliefs, preferences and intent to explain others' actions (analogous to theory of mind, ToM).

Organize the mental models into intentional and subintentional models. Intentional models, which may give rise to recursive modeling, include, e.g., a POMDP <b_j, A_j, T_j, Omega_j, O_j, R_j, OC_j>, whose elements other than the belief b_j form the frame (implemented here using DIDs), as well as BDI and ToM models. Subintentional models include, e.g., a distribution over actions Delta(A_j), a finite state controller, or a plan.

Even a finite model space grows as the interaction progresses: the other agent may receive any one of |Omega_j| observations, so |M_j| models at time 0 become |M_j||Omega_j| at time 1, |M_j||Omega_j|^2 at time 2, ..., |M_j||Omega_j|^t at time t. The general model space is large and grows exponentially as the interaction progresses. It would be great if we could compress this space! Lossless compression incurs no loss in value to the modeler; lossy compression trades a flexible amount of loss in value for greater compression.

Model space compression is expansively useful in many areas: 1. sequential decision making in multiagent settings using I-DIDs; 2. Bayesian plan recognition; 3. games of imperfect information. Our general and domain-independent approach for compression: establish equivalence relations that partition the model space and retain representative models from each equivalence class.

Approach #1: Behavioral equivalence (Rathnasabapathy et al. 06, Pynadath & Marsella 07). Intentional models whose complete solutions are identical are considered equivalent, yielding a behaviorally minimal set of models. Lossless, and works when intentional models have differing frames. Its impact on I-DIDs in multiagent settings was evaluated on the multiagent tiger and multiagent MM problems. We also utilize the model solutions (policy trees) for mitigating model growth: model representatives that are not BE may become BE from the next step onwards; preemptively identify such models and do not update all of them.

Approach #2: Revisit BE (Zeng et al. 11, 12). Intentional models whose partial depth-d solutions are identical and whose vectors of updated beliefs at the leaves of the partial trees are identical are considered equivalent. Lossless if frames are identical; the condition is sufficient but not necessary.

(epsilon,d)-Behavioral equivalence: two models are (epsilon,d)-BE if their partial depth-d solutions are identical and the vectors of updated beliefs at the leaves of the partial trees differ by no more than epsilon (in the slide's example, the models are (0.33,1)-BE). Lossy.

epsilon-Behavioral equivalence. Lemma (Boyen & Koller 98): the KL divergence between two distributions in a discrete Markov stochastic process reduces or remains the same after a transition, with the mixing rate acting as a discount factor. The mixing rate represents the minimal amount by which the posterior distributions agree with each other after one transition; it is a property of the problem and may be pre-computed. Given the mixing rate and a bound, epsilon, on the divergence between two belief vectors, the lemma allows computing the depth, d, at which the bound is reached; we then compare two solutions up to depth d for equality.
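A hedged sketch of how the comparison depth d might be computed from the mixing rate F: it assumes the Boyen-Koller style contraction takes the form KL_{t+1} <= (1 - F) * KL_t and that an upper bound kl0 on the initial divergence is available; the F = 1 and F = 0 boundary cases follow the rules discussed below.

```python
# Hedged sketch: smallest depth d at which the divergence bound epsilon
# is reached, assuming the KL divergence contracts by (1 - F) per step.
import math

def comparison_depth(F, epsilon, kl0, horizon):
    if F >= 1.0:          # divergence vanishes after one step
        return 1
    if F <= 0.0:          # divergence never contracts
        return horizon    # arbitrarily set d to the horizon
    # smallest d with kl0 * (1 - F)**d <= epsilon
    d = math.ceil(math.log(epsilon / kl0) / math.log(1.0 - F))
    return max(1, min(d, horizon))

print(comparison_depth(F=0.5, epsilon=0.1, kl0=1.0, horizon=10))  # -> 4
```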
Approach #2, impact on dt-planning in multiagent settings (multiagent concert problem, discount factor F = 0.5). On a UAV reconnaissance problem in a 5x5 grid, epsilon-BE allows the solution to scale to a 10-step look ahead in 20 minutes.

What is the value of d when some problems exhibit an F of 0 or 1? F = 1 implies that the KL divergence is 0 after one step: set d = 1. F = 0 implies that the KL divergence does not reduce: arbitrarily set d to the horizon.

Approach #3: Action equivalence (Zeng et al. 09, 12). Intentional or subintentional models whose predictions at time step t (action distributions) are identical are considered equivalent at t. Lossy, and works when intentional models have differing frames. Impact on dt-planning in multiagent settings (multiagent tiger): AE bounds the model space at each time step to the number of distinct actions.

Approach #4: Influence equivalence (related to Witwicki & Durfee 11). Intentional or subintentional models whose predictions at time step t influence the subject agent's plan identically are considered equivalent at t; for example, regardless of whether the other agent opened the left or right door, the tiger resets, thereby affecting the agent's plan identically. Influence may be measured as the change in the subject agent's belief due to the action. IE groups more models at time step t than AE; lossy.

Compression due to approximate equivalence may violate ACC; ACC can be regained by appending a covering model to the compressed set of representatives.

Open questions: What about N > 2 agents; under what conditions could equivalent models belonging to different agents be grouped together into an equivalence class? Can we avoid solving models by using heuristics for identifying approximately equivalent models?

Modeling Strategic Human Intent. With Yifeng Zeng (Reader, Teesside Univ.; previously Assoc. Prof., Aalborg Univ.), Adam Goodie (Professor of Psychology, UGA), and doctoral students Yingke Chen, Hua Mao, Muthu Chandrasekaran, Xia Qu, Roi Ceren and Matthew Meisel.

Two threads: computational modeling of probability judgment in stochastic games, and computational modeling of human recursive thinking in sequential games. Human strategic reasoning ("I think what you think that I think...") is generally hobbled by low levels of recursive thinking (Stahl & Wilson 95, Hedden & Zhang 02, Camerer et al. 04, Ficici & Pfeffer 08).

Example task: you are Player I and Player II is human; will you move or stay? [Game tree: alternating move/stay decisions by Players I and II, with a payoff pair for the two players at each terminal cell.] Less than 40% of the sample population performed the rational action! Thinking about how others think (and so on) is hard in general contexts.

[Second game tree: as above, but Player I's payoff is a decimal and Player II's payoff is 1 minus that decimal.] About 70% of the sample population performed the rational action in this simpler and strictly competitive game. Simplicity, competitiveness and embedding the task in intuitive representations seem to facilitate human reasoning (Flobbe et al. 08, Meijering et al. 11, Goodie et al. 12).

In the 3-stage game, myopic opponents default to staying (level 0) while predictive opponents think about the player's decision (level 1). Can we computationally model these strategic behaviors using process models? Yes! (A backward-induction sketch of the myopic vs. predictive distinction follows below.)
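As a concrete illustration of the level-0 vs. level-1 opponent distinction in a 3-stage move/stay game of the kind shown above, here is a minimal backward-induction sketch; the payoff cells are hypothetical and are not the values used in the experiments.

```python
# Hypothetical payoff cells (payoff_I, payoff_II) for a 3-stage
# move/stay game: "stay" ends the game at the current cell, "move"
# passes control onward, and a final "move" ends at the last cell.
cells = [(3, 2), (4, 1), (1, 4), (2, 3)]   # cells 1..4, illustrative only

def player1_final(cell_idx):
    """Player I's last decision: stay at the current cell or move to the next."""
    stay, move = cells[cell_idx], cells[cell_idx + 1]
    return "stay" if stay[0] >= move[0] else "move"

def player2_choice(cell_idx, level):
    """Player II: level 0 defaults to staying; level 1 predicts Player I's
    final decision and best-responds."""
    if level == 0:
        return "stay"
    stay_payoff = cells[cell_idx][1]
    continuation = player1_final(cell_idx + 1)
    move_payoff = cells[cell_idx + 1 if continuation == "stay" else cell_idx + 2][1]
    return "stay" if stay_payoff >= move_payoff else "move"

def player1_first(opponent_level):
    """Player I's first decision, best-responding to its model of Player II."""
    stay_payoff = cells[0][0]
    reply = player2_choice(1, opponent_level)
    if reply == "stay":
        move_payoff = cells[1][0]
    else:
        cont = player1_final(2)
        move_payoff = cells[2][0] if cont == "stay" else cells[3][0]
    return "stay" if stay_payoff >= move_payoff else "move"

print(player1_first(opponent_level=0), player1_first(opponent_level=1))  # -> move stay
```

With these illustrative payoffs, Player I should move against a myopic opponent but stay against a predictive one, which is exactly the kind of opponent-model-dependent choice the experiments probe.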
We model these behaviors using a parameterized Interactive POMDP (I-POMDP) framework. Notice that the achievement score increases as more games are played, indicating learning of the opponent models, but the learning is slow and partial: we therefore replace the I-POMDP's normative Bayesian belief update with Bayesian learning that underweights evidence, governed by a learning parameter. Notice also the presence of rationality errors in the participants' choices (the action is inconsistent with the prediction), which appear to reduce with time: we therefore replace the I-POMDP's normative expected utility maximization with a quantal response model that selects actions in proportion to their utilities, governed by a precision parameter. Underweighting evidence during learning and quantal response for choice both have prior psychological support. We either use participants' predictions of the other's action to learn the learning parameter and participants' actions to learn the quantal response parameter, or use participants' actions to learn both parameters, letting the quantal response parameter vary linearly. (A sketch of both replacements appears at the end of this section.)

Insights revealed by process modeling: 1. There is much evidence that participants did not make rote use of backward induction (BI) and instead engaged in recursive thinking. 2. Rationality errors cannot be ignored when modeling human decision making, and they may vary. 3. There is evidence that participants could be attributing surprising observations of others' actions to their rationality errors.

Open questions: 1. What is the impact on strategic thinking if action outcomes are uncertain? 2. Is there a damping effect on reasoning levels if participants need to concomitantly think ahead in time?

In summary: a suite of general and domain-independent approaches for compressing agent model spaces based on equivalence, and computational modeling of human behavioral data pertaining to strategic thinking.

Applications of model space compression, continued: 2. Bayesian plan recognition under uncertainty: the plan recognition literature has paid scant attention to finding general ways of reducing the set of feasible plans (Carberry 01). 3. Games of imperfect information (Bayesian games): real-world applications often involve many player types, for example ad hoc coordination in a spontaneous team and an automated Poker player agent; model space compression facilitates equilibrium computation.
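A minimal sketch of the two replacements described above: an evidence-underweighting belief update and a quantal response choice rule. The parameter names (gamma for the evidence weight, lam for the response precision) and the specific functional forms are illustrative assumptions, not necessarily those used in the study.

```python
# Sketch: underweighted Bayesian learning and quantal response choice.
import numpy as np

def underweighted_update(prior, likelihoods, gamma=0.5):
    """Bayesian update with the likelihood raised to gamma <= 1, so each
    observation moves the belief less than a full Bayes update would."""
    posterior = prior * likelihoods ** gamma
    return posterior / posterior.sum()

def quantal_response(utilities, lam=2.0):
    """Choice probabilities proportional to exp(lam * utility): higher-utility
    actions are more likely but not chosen with certainty (softmax)."""
    scaled = lam * (utilities - np.max(utilities))
    probs = np.exp(scaled)
    return probs / probs.sum()

# Belief over two opponent models (e.g., myopic vs. predictive) after
# observing an action that is more likely under the predictive model.
belief = underweighted_update(np.array([0.5, 0.5]), np.array([0.2, 0.8]))
print(belief)                                        # shifts slowly toward the predictive model
print(quantal_response(np.array([1.0, 3.0, 2.0])))   # noisy rather than strict maximization
```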