CS 416 Artificial Intelligence
Lecture 21: Making Complex Decisions (Chapter 17)

Mapping MDPs to POMDPs
The belief state after executing action a and observing observation o is:
– Your belief vector, b', at state s', evaluates to:
$$b'(s') = \alpha \, P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)$$
  where α is the normalizing constant that makes b' sum to 1
– Call this b' = FORWARD(b, a, o)
But this requires knowing the observation, o

Predicting future observation
Probability of perceiving o given that:
• we start in belief state b
• action a was executed
• s' ranges over the set of potentially reached states
$$P(o \mid a, b) = \sum_{s'} P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)$$

Predicting new belief state
Previously we predicted the observation… now we predict the new belief state
• τ(b, a, b') is the probability of reaching b' from b given action a
• It is built from the deterministic update P(b' | o, a, b):
  – = 1 if b' = FORWARD(b, a, o)
  – = 0 otherwise

Another way to write
For all observations o that (combined with knowledge of action a and previous belief state b) lead to a new belief state equal to b', sum the probability of perceiving o:
$$\tau(b, a, b') = \sum_{o} P(b' \mid o, a, b)\, P(o \mid a, b) = \sum_{o} P(b' \mid o, a, b) \sum_{s'} P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)$$

Game Theory
Multiagent games with simultaneous moves
• First, study games with one move
  – bankruptcy proceedings
  – auctions
  – economics
  – war gaming

Definition of a game
• The players
• The actions
• The payoff matrix
  – provides the utility to each player for each combination of actions

Game theory strategies
Strategy == policy
• What do you do?
  – pure strategy: you do the same thing all the time
  – mixed strategy: you rely on some randomized policy to select an action
• Strategy profile
  – the assignment of strategies to players

Game theoretic solutions
What's a solution to a game?
• All players select a "rational" strategy
• Note that we're not analyzing one particular game, but the outcomes that accumulate over a series of played games

Prisoner's Dilemma
Alice and Bob are caught red-handed at the scene of a crime
• both are interrogated separately by the police
• the penalty for both if they both confess is 5 years
• the penalty for both if neither confesses is 1 year
• if one testifies against the other
  – the implicated gets 10 years
  – the snitch gets 0 years
What do you do to act selfishly?
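Returning to the POMDP belief update at the start of this lecture, the Python sketch below implements FORWARD(b, a, o), P(o | a, b), and τ(b, a, b') for a tiny two-state model. The state and observation names, the action `stay`, and the tables `T` and `O` are hypothetical values chosen only for illustration. Beliefs are plain dicts, and the test b' = FORWARD(b, a, o) is done up to a small tolerance because beliefs are floating-point.

```python
# Minimal POMDP belief-update sketch over a HYPOTHETICAL two-state model.
STATES = ["s0", "s1"]
OBSERVATIONS = ["o0", "o1"]

# Hypothetical transition model: T[a][s][s2] = P(s2 | s, a)
T = {
    "stay": {"s0": {"s0": 0.9, "s1": 0.1}, "s1": {"s0": 0.1, "s1": 0.9}},
}

# Hypothetical sensor model: O[s2][o] = P(o | s2)
O = {"s0": {"o0": 0.8, "o1": 0.2}, "s1": {"o0": 0.3, "o1": 0.7}}


def predict(b, a):
    """For each s', compute sum_s P(s' | s, a) b(s)."""
    return {s2: sum(T[a][s][s2] * b[s] for s in STATES) for s2 in STATES}


def prob_obs(o, a, b):
    """P(o | a, b) = sum_{s'} P(o | s') sum_s P(s' | s, a) b(s)."""
    pred = predict(b, a)
    return sum(O[s2][o] * pred[s2] for s2 in STATES)


def forward(b, a, o):
    """b'(s') = alpha * P(o | s') * sum_s P(s' | s, a) b(s)."""
    pred = predict(b, a)
    unnorm = {s2: O[s2][o] * pred[s2] for s2 in STATES}
    z = sum(unnorm.values())  # equals P(o | a, b); alpha = 1 / z
    return {s2: p / z for s2, p in unnorm.items()}


def tau(b, a, b_new, tol=1e-9):
    """tau(b, a, b') = sum_o P(b' | o, a, b) P(o | a, b), where
    P(b' | o, a, b) is 1 if FORWARD(b, a, o) matches b' (within tol), else 0."""
    total = 0.0
    for o in OBSERVATIONS:
        p_o = prob_obs(o, a, b)
        if p_o == 0.0:
            continue  # o cannot be perceived, so FORWARD is undefined for it
        b_o = forward(b, a, o)
        if all(abs(b_o[s] - b_new[s]) <= tol for s in STATES):
            total += p_o
    return total
```

With b = {"s0": 0.5, "s1": 0.5}, this model perceives o0 with probability 0.55, and τ(b, "stay", FORWARD(b, "stay", "o0")) is also 0.55, because o0 is the only observation that leads to that belief.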
Prisoner's dilemma payoff matrix
(each cell lists Alice's payoff, Bob's payoff; payoff = minus years in prison)
                  Bob: testify    Bob: refuse
Alice: testify    (-5, -5)        (0, -10)
Alice: refuse     (-10, 0)        (-1, -1)

Prisoner's dilemma strategy
Alice's strategy
• If Bob testifies
  – best option is to testify (-5)
• If Bob refuses
  – best option is to testify (0)
Testifying is a dominant strategy

Prisoner's dilemma strategy
Bob's strategy
• If Alice testifies
  – best option is to testify (-5)
• If Alice refuses
  – best option is to testify (0)
Testifying is a dominant strategy

Rationality
Both players seem to have clear strategies
• Both testify
  – game outcome would be (-5, -5)

Dominance of strategies
Comparing strategies
• Strategy s can strongly dominate s'
  – the outcome of s is always better than the outcome of s', no matter what the other player does
  – testifying strongly dominates refusing for both Bob and Alice
• Strategy s can weakly dominate s'
  – the outcome of s is better than the outcome of s' for at least one action of the opponent and no worse for the others

Pareto Optimal
Pareto optimality comes from economics
• An outcome can be Pareto optimal
  – textbook: there is no alternative outcome that all players would prefer
  – I prefer: an outcome that cannot be improved for anyone without disadvantaging at least one group
Is the testify outcome (-5, -5) Pareto optimal?

Is (-5, -5) Pareto optimal?
Is there an outcome that improves the result without disadvantaging any group? How about (-1, -1) from (refuse, refuse)?
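The dominance and Pareto tests above can be checked mechanically. This sketch encodes the prisoner's dilemma payoffs from the slides (payoff = minus years in prison) and tests strong dominance, weak dominance, and Pareto optimality. The function names are hypothetical, and `pareto_optimal` uses the usual "at least as good for every player and strictly better for at least one" test.

```python
# Prisoner's dilemma payoffs (Alice, Bob), indexed by (alice_action, bob_action).
ACTIONS = ("testify", "refuse")
PAYOFF = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def utility(player, mine, theirs):
    """Payoff to `player` (0 = Alice, 1 = Bob) playing `mine` against `theirs`."""
    profile = (mine, theirs) if player == 0 else (theirs, mine)
    return PAYOFF[profile][player]


def strongly_dominates(player, s, s_prime):
    """s is strictly better than s' against every opponent action."""
    return all(utility(player, s, t) > utility(player, s_prime, t) for t in ACTIONS)


def weakly_dominates(player, s, s_prime):
    """s is never worse than s' and strictly better against at least one opponent action."""
    diffs = [utility(player, s, t) - utility(player, s_prime, t) for t in ACTIONS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def pareto_optimal(outcome):
    """No other outcome is at least as good for everyone and strictly better for someone."""
    for other in PAYOFF.values():
        no_worse = all(x >= y for x, y in zip(other, outcome))
        better = any(x > y for x, y in zip(other, outcome))
        if no_worse and better:
            return False
    return True
```

Running it confirms that testifying strongly (and hence weakly) dominates refusing for both players, that (-5, -5) is not Pareto optimal, and that (-1, -1) is.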
Dominant strategy equilibrium
(-5, -5) represents a dominant strategy equilibrium
• neither player has an incentive to deviate from the dominant strategy
  – If Alice assumes Bob keeps executing his current strategy, she will only lose more by switching; likewise for Bob
• Imagine this as a local optimum in outcome space
  – each dimension of outcome space is one player's choice
  – any unilateral movement away from the dominant strategy equilibrium in this space results in a worse outcome for the player who moves

Thus the dilemma…
Now we see the problem
• Outcome (-5, -5) is Pareto dominated by outcome (-1, -1)
  – achieving the Pareto optimal outcome requires diverging from the local optimum at the strategy equilibrium
• Tough situation… the Pareto optimal outcome would be nice, but it is unlikely, since each player risks losing more

Nash Equilibrium
John Nash studied game theory in the 1950s
• Proved that every finite game has at least one equilibrium (possibly in mixed strategies)
  – If there is a set of strategies with the property that no player can benefit by changing her strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium
• Every dominant strategy equilibrium is a Nash equilibrium

Another game
• A hardware manufacturer chooses between CD and DVD format for its next game platform
• A software manufacturer chooses between CD and DVD format for its next title

No dominant strategy
• Verify that there is no dominant strategy
Yet two Nash equilibria exist
• Outcome 1: (DVD, DVD)… (9, 9)
• Outcome 2: (CD, CD)… (5, 5)
If either player unilaterally changes strategy, that player will be worse off

We still have a problem
Two Nash equilibria, but which is selected?
• If players fail to select the same strategy, both will lose
  – they could "agree" to select the Pareto optimal solution, which seems reasonable
  – they could coordinate

Brief Introduction
Zero-sum game
• Payoffs in each cell of the payoff matrix sum to 0
• The Nash equilibrium in such cases may be a mixed strategy
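A pure-strategy Nash equilibrium is a profile where no player gains by a unilateral switch, which is easy to check by enumeration in small games. The sketch below applies that test to the prisoner's dilemma and to the CD/DVD game. The slides give only the matching-format payoffs, so the mismatched payoffs of (-1, -1) are hypothetical placeholders. It also uses matching pennies, a hypothetical zero-sum example not taken from the slides, to show a game with no pure equilibrium, and computes the row player's mixed strategy by making the opponent indifferent (a standard 2x2 technique, valid only when no pure equilibrium exists). All function names are hypothetical.

```python
from itertools import product


def pure_nash_equilibria(payoff, rows, cols):
    """All (row, col) profiles where neither player gains by switching alone.
    payoff[(r, c)] = (row player's utility, column player's utility)."""
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoff[(r, c)]
        row_ok = all(payoff[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoff[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def zero_sum_2x2_row_mix(payoff, rows, cols):
    """Probability the row player puts on rows[0] so that the column player is
    indifferent; assumes a 2x2 zero-sum game with no pure equilibrium."""
    a = payoff[(rows[0], cols[0])][0]
    b = payoff[(rows[0], cols[1])][0]
    c = payoff[(rows[1], cols[0])][0]
    d = payoff[(rows[1], cols[1])][0]
    return (d - c) / (a - b - c + d)


PRISONERS = {
    ("testify", "testify"): (-5, -5), ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0), ("refuse", "refuse"): (-1, -1),
}

# Hardware maker (row) vs. software maker (column). Diagonal payoffs are from
# the slides; the mismatched-format payoffs (-1, -1) are HYPOTHETICAL.
FORMATS = {
    ("DVD", "DVD"): (9, 9), ("DVD", "CD"): (-1, -1),
    ("CD", "DVD"): (-1, -1), ("CD", "CD"): (5, 5),
}

# HYPOTHETICAL zero-sum example (matching pennies): each cell sums to 0.
PENNIES = {
    ("heads", "heads"): (1, -1), ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1), ("tails", "tails"): (1, -1),
}
```

With these payoffs, enumeration finds (testify, testify) for the prisoner's dilemma, both (DVD, DVD) and (CD, CD) for the format game, and none for matching pennies, whose equilibrium is instead a 50/50 mix.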