Influence Diagrams for Robust Decision Making in Multiagent Settings

TRANSCRIPT

  • Slide 1
  • Influence Diagrams for Robust Decision Making in Multiagent Settings
  • Slide 2
  • Prashant Doshi, University of Georgia, USA
  • Slide 3
  • http://thinc.cs.uga.edu
  • Slide 4
  • Yingke Chen (postdoctoral student), Yifeng Zeng (Reader, Teesside Univ.; previously Assoc. Prof., Aalborg Univ.), Muthu Chandrasekaran (doctoral student)
  • Slide 5
  • Influence diagram
  • Slide 6
  • ID for decision making where the state may be partially observable. (Figure: chance nodes S and O_i, decision node A_i, utility node R_i.)
  • Slide 7
  • How do we generalize IDs to multiagent settings?
  • Slide 8
  • Adversarial tiger problem
  • Slide 9
  • Multiagent influence diagram (MAID) (Koller & Milch 01). MAIDs offer a richer representation for a game and may be transformed into a normal- or extensive-form game. A strategy of an agent is an assignment of a decision rule to every decision node of that agent. (Figure: MAID for the adversarial tiger problem with decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j and Tiger loc, and utility nodes R_i and R_j.)
  • Slide 10
  • The expected utility of a strategy profile to agent i is the sum of the expected utilities at each of i's decision nodes. A strategy profile is in Nash equilibrium if each agent's strategy in the profile is optimal given the others' strategies. (Figure: the same tiger-problem MAID.)
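In symbols, a standard formalization consistent with the slide's wording (σ denotes a strategy profile, D_i agent i's decision nodes, and EU_D the expected utility accrued at decision node D):

$$
EU_i(\sigma) = \sum_{D \in \mathcal{D}_i} EU_D(\sigma), \qquad
\sigma \text{ is a Nash equilibrium} \iff \forall i:\ \sigma_i \in \arg\max_{\sigma_i'} EU_i(\sigma_i', \sigma_{-i}).
$$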
  • Slide 11
  • Strategic relevance: consider two strategy profiles which differ in the decision rule at D′ only. A decision node D strategically relies on another, D′, if D's decision rule does not remain optimal in both profiles.
  • Slide 12
  • Is there a way of finding all decision nodes that are strategically relevant to D using the graphical structure? Yes: s-reachability, analogous to d-separation for determining conditional independence in BNs.
  • Slide 13
  • Evaluating whether a decision rule at D is optimal in a given strategy profile involves removing decision nodes that are not s-relevant to D and transforming the decision and utility nodes into chance nodes. (Figure: the same tiger-problem MAID.)
  • Slide 14
  • What if the agents are using differing models of the same game to make decisions, or are uncertain about the mental models others are using?
  • Slide 15
  • Let agent i believe with probability p that j will listen, and with probability 1 - p that j will play the best-response decision. Analogously, j believes that i will open a door with probability q, and otherwise play the best response. (Figure: the tiger-problem MAID.)
  • Slide 16
  • Network of IDs (NID) (Gal & Pfeffer 08). Let agent i believe with probability p that j will likely listen, and with probability 1 - p that j will play the best-response decision. Analogously, j believes that i will mostly open a door with probability q, and otherwise play the best response. (Figure: NID with a top-level block and blocks L and O, reached with probabilities p and q, each prescribing a distribution over Listen, Open left and Open right.)
  • Slide 17
  • Let agent i believe with probability p that j will likely listen, and with probability 1 - p that j will play the best-response decision. Analogously, j believes that i will mostly open a door with probability q, and otherwise play the best response. (Figure: the top-level block of the NID is the tiger-problem MAID.)
  • Slide 18
  • MAID representation for the NID. (Figure: the top-level MAID augmented with model nodes Mod[j; D_i] and Mod[i; D_j], best-response nodes BR[i]^TL and BR[j]^TL, and the decision nodes Open^O and Listen^L from blocks O and L.)
  • Slide 19
  • MAIDs and NIDs: rich languages for games, based on IDs, that model problem structure by exploiting conditional independence.
  • Slide 20
  • MAIDs and NIDs: the focus is on computing equilibria, which does not allow a best response to a distribution of non-equilibrium behaviors; they do not model dynamic games.
  • Slide 21
  • Generalize IDs to dynamic interactions in multiagent settings
  • Slide 22
  • Challenge: Other agents could be updating beliefs and changing strategies
  • Slide 23
  • Model node M_{j,l-1}: models of agent j at level l-1. Policy link (dashed arrow): distribution over the other agent's actions given its models. Belief on M_{j,l-1}: Pr(M_{j,l-1} | s). (Figure: level-l I-ID for the tiger problem, with the model node M_{j,l-1} informing the chance node Open or Listen_j.)
  • Slide 24
  • Members of the model node: the chance nodes A_j^1 and A_j^2 hold the solutions of the models m_{j,l-1}^1 and m_{j,l-1}^2, and Mod[M_j] represents the different models of agent j. The models m_{j,l-1}^1 and m_{j,l-1}^2 could be I-IDs, IDs or simple distributions.
  • Slide 25
  • The CPT of the chance node A_j is a multiplexer: it assumes the distribution of each of the action nodes (A_j^1, A_j^2) depending on the value of Mod[M_j].
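A minimal sketch of such a multiplexer CPT, not the authors' implementation; the function name, model representation and the illustrative numbers are assumptions:

```python
from typing import Dict, List

def multiplexer_cpt(model_action_dists: List[Dict[str, float]],
                    mod_value: int) -> Dict[str, float]:
    """P(A_j | Mod[M_j] = mod_value): A_j simply copies the action
    distribution produced by solving the selected model of j."""
    return model_action_dists[mod_value]

# Illustrative distributions over j's actions from two candidate models.
dists = [
    {"L": 0.9, "OL": 0.05, "OR": 0.05},   # solution of model m_j^1
    {"L": 0.1, "OL": 0.45, "OR": 0.45},   # solution of model m_j^2
]
print(multiplexer_cpt(dists, 0))  # j's predicted actions under model m_j^1
```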
  • Slide 26
  • Could I-IDs be extended over time? We must address the challenge
  • Slide 27
  • Model update link: connects the model node M_{j,l-1}^t to M_{j,l-1}^{t+1} across time slices. (Figure: two time slices of the network, with nodes A_i^t, O_i^t, S^t, A_j^t, M_{j,l-1}^t, R_i and their t+1 counterparts.)
  • Slide 28
  • Interactive dynamic influence diagram (I-DID)
  • Slide 29
  • How do we implement the model update link?
  • Slide 30
  • (Figure: implementation of the model update link. The two models m_{j,l-1}^{t,1} and m_{j,l-1}^{t,2} in Mod[M_j^t] expand, via j's actions A_j^t and observations O_j, into the four models m_{j,l-1}^{t+1,1} through m_{j,l-1}^{t+1,4} in Mod[M_j^{t+1}].)
  • Slide 31
  • (Figure: the same expansion as on the previous slide.) These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations.
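A minimal sketch (my reading of the slides, with assumed types and helper signatures) of how the model update link grows the model node: each model of j at time t is expanded by j's prescribed action and every possible observation, and the new models differ only in their updated beliefs:

```python
from typing import Callable, List, Tuple

Belief = Tuple[float, ...]        # e.g., a belief over the tiger's location
Model = Tuple[Belief, str]        # (belief, frame): an intentional model of j

def update_models(models: List[Model],
                  policy: Callable[[Belief], str],
                  observations: List[str],
                  belief_update: Callable[[Belief, str, str], Belief]) -> List[Model]:
    """Expand each model by j's action and every possible observation."""
    updated = []
    for belief, frame in models:
        action = policy(belief)                        # j's prescribed action
        for obs in observations:
            updated.append((belief_update(belief, action, obs), frame))
    return updated

# With |M_j| models and |Omega_j| observations, one step yields |M_j| * |Omega_j|
# models -- the growth quantified later in the talk.
```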
  • Slide 32
  • Recap
  • Slide 33
  • Prashant Doshi, Yifeng Zeng and Qiongyu Chen, Graphical Models for Interactive POMDPs: Representations and Solutions, Journal of AAMAS, 18(3):376-416, 2009. Daphne Koller and Brian Milch, Multi-Agent Influence Diagrams for Representing and Solving Games, Games and Economic Behavior, 45(1):181-221, 2003. Yaakov Gal and Avi Pfeffer, Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes, Journal of AI Research, 33:109-147, 2008.
  • Slide 34
  • How large is the behavioral model space?
  • Slide 35
  • General definition: a mapping from the agent's history of observations to its actions.
  • Slide 36
  • How large is the behavioral model space? It is the space of mappings from observation histories to (distributions over) actions, Δ(A_j)^H, which is uncountably infinite.
  • Slide 37
  • How large is the behavioral model space? Let's assume computable models: countable. A very large portion of the model space is not computable!
  • Slide 38
  • Daniel Dennett, philosopher and cognitive scientist. Intentional stance: ascribe beliefs, preferences and intent to explain others' actions (analogous to theory of mind, ToM).
  • Slide 39
  • Organize the mental models Intentional models Subintentional models
  • Slide 40
  • Organize the mental models. Intentional models: e.g., POMDP ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM. Subintentional models. Frame (may give rise to recursive modeling).
  • Slide 41
  • Organize the mental models. Intentional models: e.g., POMDP ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM. Subintentional models: e.g., a distribution Δ(A_j), a finite state controller, a plan. Frame.
  • Slide 42
  • Finite model space grows as the interaction progresses
  • Slide 43
  • Growth in the model space: the other agent may receive any one of |Ω_j| observations, so the number of models grows as |M_j|, |M_j||Ω_j|, |M_j||Ω_j|^2, ..., |M_j||Ω_j|^t at time steps 0, 1, 2, ..., t.
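For concreteness, a tiny illustration of this sequence with hypothetical sizes (the numbers are not from the talk):

```python
# Growth of j's model space when each of the initial models branches on every
# one of j's possible observations at each time step.
num_models, num_obs, horizon = 10, 2, 6   # hypothetical |M_j|, |Omega_j|, t
sizes = [num_models * num_obs ** t for t in range(horizon + 1)]
print(sizes)   # [10, 20, 40, 80, 160, 320, 640]: exponential in t
```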
  • Slide 44
  • Growth in the model space: exponential.
  • Slide 45
  • General model space is large and grows exponentially as the interaction progresses
  • Slide 46
  • It would be great if we could compress this space! Lossless: no loss in value to the modeler. Lossy: flexible loss in value for greater compression.
  • Slide 47
  • Expansive usefulness of model space compression to many areas: 1. Sequential decision making in multiagent settings using I-DIDs. 2. Bayesian plan recognition. 3. Games of imperfect information.
  • Slide 48
  • General and domain-independent approach for compression: establish equivalence relations that partition the model space and retain representative models from each equivalence class.
  • Slide 49
  • Approach #1: Behavioral equivalence (Rathanasabapathy et al. 06, Pynadath & Marsella 07). Intentional models whose complete solutions are identical are considered equivalent.
  • Slide 50
  • Approach #1: Behavioral equivalence Behaviorally minimal set of models
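A minimal sketch of this partitioning, assuming models' solutions (policy trees) can be represented as hashable values; solve is a stand-in for whatever solver produces a model's policy tree:

```python
from collections import defaultdict

def behaviorally_minimal(models, solve):
    """Group models whose complete solutions are identical and keep one
    representative per equivalence class."""
    classes = defaultdict(list)
    for m in models:
        classes[solve(m)].append(m)            # identical solution -> same class
    return [members[0] for members in classes.values()]
```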
  • Slide 51
  • Approach #1: Behavioral equivalence. Lossless. Works when intentional models have differing frames.
  • Slide 52
  • Approach #1: Behavioral equivalence. Impact on I-DIDs in multiagent settings. (Figures: results on the multiagent tiger and multiagent MM problems.)
  • Slide 53
  • Approach #1: Behavioral equivalence. Utilize model solutions (policy trees) for mitigating model growth: model representatives that are not BE may become BE from the next step onwards; preemptively identify such models and do not update all of them.
  • Slide 54
  • Thank you for your time
  • Slide 55
  • Approach #2: Revisit BE (Zeng et al. 11, 12). Intentional models whose partial depth-d solutions are identical and whose vectors of updated beliefs at the leaves of the partial trees are identical are considered equivalent. Sufficient but not necessary. Lossless if frames are identical.
  • Slide 56
  • Approach #2: (ε, d)-Behavioral equivalence. Two models are (ε, d)-BE if their partial depth-d solutions are identical and the vectors of updated beliefs at the leaves of the partial trees differ by at most ε. (Figure: the two illustrated models are (0.33, 1)-BE.) Lossy.
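A minimal sketch of the (ε, d)-BE test under assumed representations: a depth-d partial policy tree is any hashable value, the leaf beliefs are given in a fixed order, and the divergence is measured here with the max-norm (the slides do not pin down the measure):

```python
def eps_d_equivalent(tree1, leaf_beliefs1, tree2, leaf_beliefs2, eps):
    """True if the depth-d partial solutions match and corresponding leaf
    belief vectors differ by at most eps in every component."""
    if tree1 != tree2:
        return False
    return all(max(abs(p - q) for p, q in zip(b1, b2)) <= eps
               for b1, b2 in zip(leaf_beliefs1, leaf_beliefs2))
```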
  • Slide 57
  • Approach #2: (ε, d)-Behavioral equivalence. Lemma (Boyen & Koller 98): the KL divergence between two distributions in a discrete Markov stochastic process reduces or remains the same after a transition, with the mixing rate acting as a discount factor. The mixing rate represents the minimal amount by which the posterior distributions agree with each other after one transition; it is a property of the problem and may be pre-computed.
  • Slide 58
  • Approach #2: (ε, d)-Behavioral equivalence. Given the mixing rate and a bound ε on the divergence between two belief vectors, the lemma allows computing the depth d at which the bound is reached. Compare two solutions up to depth d for equality.
  • Slide 59
  • Approach #2: (ε, d)-Behavioral equivalence. Impact on dt-planning in multiagent settings. (Figure: results on the multiagent concert problem with discount factor F = 0.5.) On a UAV reconnaissance problem in a 5x5 grid, it allows the solution to scale to a 10-step look ahead in 20 minutes.
  • Slide 60
  • Approach #2: (ε, d)-Behavioral equivalence. What is the value of d when some problems exhibit F with a value of 0 or 1? F = 1 implies that the KL divergence is 0 after one step: set d = 1. F = 0 implies that the KL divergence does not reduce: arbitrarily set d to the horizon.
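A sketch of how d could be computed from these statements; it assumes (my reading, not the authors' exact formula) that the divergence contracts by a factor (1 - F) per transition, which is consistent with the F = 0 and F = 1 cases above:

```python
def comparison_depth(initial_div: float, F: float, eps: float, horizon: int) -> int:
    """Smallest depth d at which the divergence bound eps is reached."""
    if F >= 1.0:
        return 1            # divergence vanishes after one step
    if F <= 0.0:
        return horizon      # no contraction: arbitrarily set d to the horizon
    d, div = 0, initial_div
    while div > eps:
        div *= (1.0 - F)    # assumed per-step contraction by (1 - F)
        d += 1
    return d

print(comparison_depth(initial_div=1.0, F=0.5, eps=0.1, horizon=10))  # -> 4
```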
  • Slide 61
  • Approach #3: Action equivalence (Zeng et al. 09, 12). Intentional or subintentional models whose predictions at time step t (action distributions) are identical are considered equivalent at t.
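A minimal sketch of the resulting grouping, with predict standing in for whatever procedure yields a model's predicted action distribution at step t (the rounding is an assumption to make distributions hashable):

```python
from collections import defaultdict

def action_equivalence_classes(models, predict, precision=6):
    """Group models whose predicted action distributions at step t coincide."""
    classes = defaultdict(list)
    for m in models:
        key = tuple(sorted((a, round(p, precision)) for a, p in predict(m).items()))
        classes[key].append(m)
    return list(classes.values())
```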
  • Slide 62
  • Approach #3: Action equivalence
  • Slide 63
  • Approach #3: Action equivalence. Lossy. Works when intentional models have differing frames.
  • Slide 64
  • Impact on dt-planning in multiagent settings. (Figure: results on the multiagent tiger problem.) AE bounds the model space at each time step to the number of distinct actions.
  • Slide 65
  • Approach #4: Influence equivalence (related to Witwicki & Durfee 11). Intentional or subintentional models whose predictions at time step t influence the subject agent's plan identically are considered equivalent at t. For example, regardless of whether the other agent opened the left or right door, the tiger resets, thereby affecting the agent's plan identically.
  • Slide 66
  • Approach #4: Influence equivalence. Influence may be measured as the change in the subject agent's belief due to the action. Groups more models at time step t compared to AE. Lossy.
  • Slide 67
  • Compression due to approximate equivalence may violate the absolute continuity condition (ACC). Regain ACC by appending a covering model to the compressed set of representatives.
  • Slide 68
  • Open questions
  • Slide 69
  • N > 2 agents: under what conditions could equivalent models belonging to different agents be grouped together into an equivalence class?
  • Slide 70
  • Can we avoid solving models by using heuristics for identifying approximately equivalent models?
  • Slide 71
  • Modeling Strategic Human Intent
  • Slide 72
  • Yifeng Zeng (Reader, Teesside Univ.; previously Assoc. Prof., Aalborg Univ.), Yingke Chen (doctoral student), Hua Mao (doctoral student), Muthu Chandrasekaran (doctoral student), Xia Qu (doctoral student), Roi Ceren (doctoral student), Matthew Meisel (doctoral student), Adam Goodie (Professor of Psychology, UGA)
  • Slide 73
  • Computational modeling of human recursive thinking in sequential games Computational modeling of probability judgment in stochastic games
  • Slide 74
  • Human strategic reasoning ("I think what you think that I think...") is generally hobbled by low levels of recursive thinking (Stahl & Wilson 95, Hedden & Zhang 02, Camerer et al. 04, Ficici & Pfeffer 08).
  • Slide 75
  • You are Player I and II is human. Will you move or stay? (Figure: extensive-form move/stay game showing the player to move at each stage and the payoff pairs for I and II at each terminal node.)
  • Slide 76
  • Less than 40% of the sample population performed the rational action!
  • Slide 77
  • Thinking about how others think (...) is hard in general contexts
  • Slide 78
  • (Figure: a move/stay game in which Player I's terminal payoffs are 0.6, 0.4, 0.2 and 0.8, and Player II's payoff is 1 minus the shown decimal.)
  • Slide 79
  • About 70% of the sample population performed the rational action in this simpler and strictly competitive game
  • Slide 80
  • Simplicity, competitiveness and embedding the task in intuitive representations seem to facilitate human reasoning (Flobbe et al. 08, Meijering et al. 11, Goodie et al. 12).
  • Slide 81
  • 3-stage game. Myopic opponents default to staying (level 0), while predictive opponents think about the player's decision (level 1).
  • Slide 82
  • Can we computationally model these strategic behaviors using process models?
  • Slide 83
  • Yes! Using a parameterized Interactive POMDP framework
  • Slide 84
  • Replace the I-POMDP's normative Bayesian belief update with Bayesian learning that underweights evidence, governed by a learning parameter. Notice that the achievement score increases as more games are played, indicating learning of the opponent models. Learning is slow and partial.
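A sketch of one common way to model such evidence underweighting; the exponent form and the name gamma are assumptions on my part (the slide's parameter symbol is garbled in this transcript):

```python
def underweighted_update(prior, likelihood, gamma):
    """Posterior over opponent models with the likelihood raised to gamma:
    gamma = 1 recovers the normative Bayesian update; smaller gamma
    discounts the observed evidence, so learning is slower and partial."""
    unnorm = {m: prior[m] * (likelihood[m] ** gamma) for m in prior}
    z = sum(unnorm.values())
    return {m: v / z for m, v in unnorm.items()}

print(underweighted_update({"myopic": 0.5, "predictive": 0.5},
                           {"myopic": 0.9, "predictive": 0.1}, gamma=0.3))
```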
  • Slide 85
  • Replace the I-POMDP's normative expected utility maximization with a quantal response model that selects actions proportional to their utilities, governed by a second parameter. Notice the presence of rationality errors in the participants' choices (an action inconsistent with the prediction). Errors appear to reduce with time.
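A sketch of the common logit form of quantal response; the softmax form and the name lam are assumptions, since the slide's parameter symbol is garbled here:

```python
import math

def quantal_response(utilities, lam):
    """Choice probabilities proportional to exp(lam * utility): large lam
    approaches utility maximization, lam = 0 gives uniform random choice."""
    weights = {a: math.exp(lam * u) for a, u in utilities.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

print(quantal_response({"move": 2.0, "stay": 1.0}, lam=1.5))
```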
  • Slide 86
  • Underweighting evidence during learning and quantal response for choice have prior psychological support
  • Slide 87
  • Use participants' predictions of the other's actions to learn the evidence-underweighting parameter, and participants' own actions to learn the quantal response parameter.
  • Slide 88
  • Use participants' actions to learn both parameters, letting one of them vary linearly.
  • Slide 89
  • Insights revealed by process modeling: 1. Much evidence that participants did not make rote use of backward induction (BI), but instead engaged in recursive thinking. 2. Rationality errors cannot be ignored when modeling human decision making, and they may vary. 3. Evidence that participants could be attributing surprising observations of others' actions to their rationality errors.
  • Slide 90
  • Open questions: 1. What is the impact on strategic thinking if action outcomes are uncertain? 2. Is there a damping effect on reasoning levels if participants need to concomitantly think ahead in time?
  • Slide 91
  • A suite of general and domain-independent approaches for compressing agent model spaces based on equivalence. Computational modeling of human behavioral data pertaining to strategic thinking.
  • Slide 92
  • 2. Bayesian plan recognition under uncertainty: the plan recognition literature has paid scant attention to finding general ways of reducing the set of feasible plans (Carberry 01).
  • Slide 93
  • 3. Games of imperfect information (Bayesian games): real-world applications often involve many player types. Examples: ad hoc coordination in a spontaneous team; an automated Poker player agent.
  • Slide 94
  • 3. Games of imperfect information (Bayesian games): real-world applications often involve many player types. Model space compression facilitates equilibrium computation.