CS 416
Artificial Intelligence
Lecture 21
Making Complex Decisions
Chapter 17
Mapping MDPs to POMDPs
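After executing action a and receiving observation o, the belief state is updated:
– Your belief vector, b’, at state, s’, evaluates to:
$b'(s') = \alpha \, P(o \mid s') \sum_{s} P(s' \mid s, a) \, b(s)$, where $\alpha$ normalizes b’ so that it sums to 1
– Call this b’ = FORWARD (b, a, o)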
But this requires knowing the observation, o
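The belief update on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard textbook form of FORWARD (predict with the transition model, weight by the observation model, normalize). The two-state world, the action name "go", and the tables T and O are hypothetical and are not part of the lecture.

```python
def forward(b, a, o, T, O):
    """Return the updated belief b' after taking action a and observing o.

    b: dict state -> probability (current belief)
    T: dict (s, a) -> dict s' -> P(s' | s, a)   (assumed transition model)
    O: dict s' -> dict o -> P(o | s')           (assumed observation model)
    """
    states = list(b)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 before seeing o.
        predicted = sum(T[(s, a)].get(s2, 0.0) * b[s] for s in states)
        # Correction step: weight by how likely o is in s2.
        unnormalized[s2] = O[s2].get(o, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under b and a")
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical two-state world used only for illustration.
T = {
    ("left", "go"): {"left": 0.1, "right": 0.9},
    ("right", "go"): {"left": 0.9, "right": 0.1},
}
O = {
    "left": {"see_left": 0.8, "see_right": 0.2},
    "right": {"see_left": 0.2, "see_right": 0.8},
}
b = {"left": 0.5, "right": 0.5}
b_next = forward(b, "go", "see_right", T, O)
```

With a uniform starting belief, the observation alone shifts the belief toward "right". The normalizing total computed inside forward is exactly the probability of the observation, which is why the update cannot be evaluated until o is known.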
Predicting future observation
Probability of perceiving o, given
• starting in belief state b
• action a was executed
• s’ ranges over the states that may be reached
$P(o \mid a, b) = \sum_{s'} P(o \mid s') \sum_{s} P(s' \mid s, a) \, b(s)$
Predicting new belief state
Previously we predicted observation…
Now predict new belief state
• τ(b, a, b’)
– prob of reaching b’ from b given action a
• For a known observation o, P(b’ | o, a, b)
– == 1 if b’ = FORWARD (b, a, o)
– == 0 otherwise
Another way to write
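For all observations, o, that (when combined with
the knowledge of action, a, and previous belief state, b)
lead to a new belief state == b’, sum the probabilities P(o | a, b):
$\tau(b, a, b') = \sum_{o \,:\, \mathrm{FORWARD}(b, a, o) = b'} P(o \mid a, b)$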
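The observation prediction and the belief-to-belief transition from the last two slides can be sketched together. This is a self-contained illustration under assumed models: the tables T and O, the action "go", and the helper names predict, observation_prob, and tau are all hypothetical. Beliefs are compared with a small tolerance because they are floating-point vectors.

```python
def predict(b, a, T):
    """Distribution over next states s' before any observation is seen."""
    states = list(b)
    return {s2: sum(T[(s, a)].get(s2, 0.0) * b[s] for s in states)
            for s2 in states}


def observation_prob(b, a, o, T, O):
    """P(o | a, b): sum over s' of P(o | s') times the predicted P(s')."""
    return sum(O[s2].get(o, 0.0) * p for s2, p in predict(b, a, T).items())


def forward(b, a, o, T, O):
    """Belief after action a and observation o (assumes P(o | a, b) > 0)."""
    norm = observation_prob(b, a, o, T, O)
    return {s2: O[s2].get(o, 0.0) * p / norm
            for s2, p in predict(b, a, T).items()}


def tau(b, a, b_target, T, O, observations, tol=1e-9):
    """Probability of reaching b_target from b under action a.

    Sums P(o | a, b) over the observations o whose update lands on b_target.
    """
    total = 0.0
    for o in observations:
        p_o = observation_prob(b, a, o, T, O)
        if p_o == 0.0:
            continue  # an impossible observation contributes nothing
        b_new = forward(b, a, o, T, O)
        if all(abs(b_new[s] - b_target[s]) <= tol for s in b):
            total += p_o
    return total


# Hypothetical two-state world used only for illustration.
T = {
    ("left", "go"): {"left": 0.1, "right": 0.9},
    ("right", "go"): {"left": 0.9, "right": 0.1},
}
O = {
    "left": {"see_left": 0.8, "see_right": 0.2},
    "right": {"see_left": 0.2, "see_right": 0.8},
}
observations = ["see_left", "see_right"]
b = {"left": 0.9, "right": 0.1}
target = forward(b, "go", "see_right", T, O)
p_reach = tau(b, "go", target, T, O, observations)
```

In this example only one observation leads to the target belief, so tau equals the probability of that single observation. A belief that no observation produces gets probability 0.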
Game Theory
Multiagent games with simultaneous moves
• First, study games with one move
– bankruptcy proceedings
– auctions
– economics
– war gaming
Definition of a game
• The players
• The actions
• The payoff matrix
– provides the utility to each player for each combination of
actions
Game theory strategies
Strategy == policy
• What do you do?
– pure strategy: a deterministic policy; you always choose the same action
– mixed strategy: a randomized policy; you select each action with some probability
• Strategy Profile
– The assignment of strategies to players
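One way to make these terms concrete is to treat a strategy as a probability distribution over actions. The sketch below is only an illustration: the action names "A" and "B", the player names, and the helper choose are hypothetical.

```python
import random

# A strategy maps each action to the probability of choosing it.
pure_strategy = {"A": 1.0, "B": 0.0}    # always plays the same action
mixed_strategy = {"A": 0.5, "B": 0.5}   # randomizes between the actions


def choose(strategy, rng):
    """Sample one action according to the strategy's probabilities."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# A strategy profile assigns one strategy to each player.
profile = {"player1": pure_strategy, "player2": mixed_strategy}

rng = random.Random(0)  # seeded so the run is repeatable
moves = {player: choose(strategy, rng) for player, strategy in profile.items()}
```

A pure strategy is just the special case in which one action has probability 1, so the same representation covers both kinds.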
Game theoretic solutions
What’s a solution to a game?
• All players select a “rational” strategy
• Note that we’re not analyzing one particular game, but the
outcomes that accumulate over a series of played games
Prisoner’s Dilemma
Alice and Bob are caught red-handed at the scene
of a crime
• both are interrogated separately by the police
• if both testify against each other, each gets 5 years
• if neither testifies, each gets 1 year
• if only one testifies against the other
– the implicated gets 10 years
– the snitch gets 0 years
What do you do to act selfishly?
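Prisoner’s dilemma payoff matrix
Payoffs are (Alice, Bob); each year in prison counts as -1
                  Bob: testify    Bob: refuse
Alice: testify    (-5, -5)        (0, -10)
Alice: refuse     (-10, 0)        (-1, -1)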
Prisoner’s dilemma strategy
Alice’s Strategy
• If Bob testifies
– best option is to testify (-5 rather than -10)
• If Bob refuses
– best option is to testify (0 rather than -1)
Testifying is a dominant strategy
Prisoner’s dilemma strategy
Bob’s Strategy
• If Alice testifies
– best option is to testify (-5 rather than -10)
• If Alice refuses
– best option is to testify (0 rather than -1)
Testifying is a dominant strategy
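The case analysis on these two slides can be checked mechanically. The sketch below encodes the penalties stated earlier as negative utilities, ordered (Alice, Bob), and computes each player's best response to every action of the other. The function and variable names are illustrative.

```python
# Payoffs as (Alice, Bob), indexed by (Alice's action, Bob's action).
payoff = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}
actions = ["testify", "refuse"]


def alice_best_response(bob_action):
    """Alice's utility-maximizing action when Bob's action is fixed."""
    return max(actions, key=lambda a: payoff[(a, bob_action)][0])


def bob_best_response(alice_action):
    """Bob's utility-maximizing action when Alice's action is fixed."""
    return max(actions, key=lambda b: payoff[(alice_action, b)][1])


# Best response to each possible action of the other player.
alice_choices = {b: alice_best_response(b) for b in actions}
bob_choices = {a: bob_best_response(a) for a in actions}
```

Both dictionaries map every opponent action to "testify", which is what it means for testifying to be a dominant strategy for each player.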
Rationality
Both players seem to have clear strategies
• Both testify
– game outcome would be (-5, -5)
Dominance of strategies
Comparing strategies
• Strategy s can strongly dominate s’
– the outcome of s is always better than the outcome of s’
no matter what the other player does
 testifying strongly dominates refusing for Bob and Alice
• Strategy s can weakly dominate s’
– the outcome of s is better than the outcome of s’ for at
least one action of the opponent and no worse for the others
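Both definitions translate directly into comparisons over the opponent's actions. The sketch below is one possible encoding. Alice's utilities come from the prisoner's dilemma, while the second table (actions "x", "y", "l", "r") is a hypothetical example made up to show weak dominance without strong dominance.

```python
def strongly_dominates(s, s_prime, opponent_actions, utility):
    """s is strictly better than s_prime against every opponent action."""
    return all(utility(s, o) > utility(s_prime, o) for o in opponent_actions)


def weakly_dominates(s, s_prime, opponent_actions, utility):
    """s is never worse than s_prime and strictly better at least once."""
    no_worse = all(utility(s, o) >= utility(s_prime, o)
                   for o in opponent_actions)
    better_once = any(utility(s, o) > utility(s_prime, o)
                      for o in opponent_actions)
    return no_worse and better_once


# Alice's utilities in the prisoner's dilemma: (own action, Bob's action).
alice = {
    ("testify", "testify"): -5, ("testify", "refuse"): 0,
    ("refuse", "testify"): -10, ("refuse", "refuse"): -1,
}
pd_strong = strongly_dominates(
    "testify", "refuse", ["testify", "refuse"], lambda s, o: alice[(s, o)])

# Hypothetical game: "x" ties with "y" against "l" and wins against "r".
table = {("x", "l"): 1, ("x", "r"): 2, ("y", "l"): 1, ("y", "r"): 0}
x_strong = strongly_dominates("x", "y", ["l", "r"], lambda s, o: table[(s, o)])
x_weak = weakly_dominates("x", "y", ["l", "r"], lambda s, o: table[(s, o)])
```

Under these definitions every strongly dominant strategy is also weakly dominant, but the hypothetical table shows that the converse does not hold.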
Pareto Optimal
Pareto optimality comes from economics
• An outcome can be Pareto optimal
– textbook: no alternative outcome that all players would
prefer
– I prefer: the best that could be accomplished without
disadvantaging at least one group
Is the testify outcome (-5, -5) Pareto Optimal?
Is (-5, -5) Pareto Optimal?
Is there an outcome that improves on (-5, -5) for some player
without disadvantaging any other?
How about (-1, -1) from (refuse, refuse)?
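The question on this slide can be answered by comparing every pair of outcomes. The sketch below assumes the second phrasing of Pareto optimality given above: an outcome is Pareto dominated if some other outcome is at least as good for every player and strictly better for at least one. The function names are illustrative.

```python
# Prisoner's dilemma outcomes as (Alice, Bob) utilities.
outcomes = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def pareto_dominates(u, v):
    """u is no worse than v for every player and better for at least one."""
    return (all(x >= y for x, y in zip(u, v))
            and any(x > y for x, y in zip(u, v)))


def pareto_optimal(outcomes):
    """Profiles whose outcome is not Pareto dominated by any other outcome."""
    return {
        profile
        for profile, u in outcomes.items()
        if not any(pareto_dominates(v, u) for v in outcomes.values())
    }


optimal = pareto_optimal(outcomes)
```

Only (testify, testify) is excluded from the result. The two outcomes in which one player goes free are also Pareto optimal, since improving the other player's outcome would cost the one who goes free.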
Dominant strategy equilibrium
(-5, -5) represents a dominant strategy equilibrium
• neither player has an incentive to deviate from the dominant strategy
– If Alice assumes Bob keeps his current strategy, she will
only lose more by switching
 likewise for Bob
• Imagine this as a local optimum in outcome space
– each dimension of outcome space is a dimension of a player’s choice
– any unilateral movement from the dominant strategy equilibrium in this space
results in a worse outcome for the player who moved
Thus the dilemma…
Now we see the problem
• Outcome (-5, -5) is Pareto dominated by outcome (-1, -1)
– Reaching this Pareto optimal outcome requires both players to diverge
from the local optimum at the strategy equilibrium
• Tough situation… the Pareto optimal outcome would be nice, but it is
unlikely since each player risks losing more by refusing
Nash Equilibrium
John Nash studied game theory in the 1950s
• Proved that every finite game has at least one equilibrium, possibly in mixed strategies
– If there is a set of strategies with the property that no
player can benefit by changing her strategy while the other
players keep their strategies unchanged, then that set of
strategies and the corresponding payoffs constitute a
Nash equilibrium
• Every dominant strategy equilibrium is a Nash equilibrium
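The definition above suggests a direct test for pure strategies: fix everyone else's action and see whether any single player can do better by switching. The sketch below applies that test to the prisoner's dilemma. The function name is_nash and the data layout are illustrative choices.

```python
def is_nash(profile, actions, payoff):
    """True if no player gains by unilaterally changing their own action.

    profile: tuple with one action per player
    actions: list of available actions for each player
    payoff:  dict profile -> tuple of utilities, one per player
    """
    for player in range(len(profile)):
        current = payoff[profile][player]
        for alternative in actions[player]:
            deviated = list(profile)
            deviated[player] = alternative  # only this player changes
            if payoff[tuple(deviated)][player] > current:
                return False
    return True


# Prisoner's dilemma, payoffs as (Alice, Bob).
payoff = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}
actions = [["testify", "refuse"], ["testify", "refuse"]]

both_testify = is_nash(("testify", "testify"), actions, payoff)
both_refuse = is_nash(("refuse", "refuse"), actions, payoff)
```

The check confirms that (testify, testify) is an equilibrium, while (refuse, refuse) is not, because either player could improve from -1 to 0 by testifying.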
Another game
• Hardware manufacturer chooses between CD and DVD
format for next game platform
• Software manufacturer chooses between CD and DVD
format for next title
No dominant strategy
• Verify that there is no dominant strategy
Yet two Nash equilibria exist
Outcome 1: (DVD, DVD)… (9, 9)
Outcome 2: (CD, CD)… (5, 5)
If either player unilaterally changes strategy, that
player will be worse off
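Both claims about this game can be checked by enumeration. The transcript gives only the payoffs for the matching outcomes, (9, 9) and (5, 5). The two mismatched entries in the sketch below are assumed values, chosen so that a format mismatch hurts both companies. Payoffs are ordered (hardware, software).

```python
from itertools import product

actions = ["dvd", "cd"]
# Indexed by (hardware choice, software choice).
payoff = {
    ("dvd", "dvd"): (9, 9),
    ("cd", "cd"): (5, 5),
    ("dvd", "cd"): (-3, -1),  # assumed: mismatch payoffs are not in the slides
    ("cd", "dvd"): (-4, -1),  # assumed: mismatch payoffs are not in the slides
}


def best_responses(player, other_action):
    """Set of the player's best actions against a fixed opponent action."""
    def utility(own):
        profile = (own, other_action) if player == 0 else (other_action, own)
        return payoff[profile][player]
    best = max(utility(a) for a in actions)
    return {a for a in actions if utility(a) == best}


def has_dominant_strategy(player):
    """True if one action is a best response to every opponent action."""
    common = set(actions)
    for other in actions:
        common &= best_responses(player, other)
    return bool(common)


# A profile is a pure Nash equilibrium if each action is a best response
# to the other player's action.
nash = [
    (h, s) for h, s in product(actions, actions)
    if h in best_responses(0, s) and s in best_responses(1, h)
]
no_dominant = not has_dominant_strategy(0) and not has_dominant_strategy(1)
```

With these assumed values each player's best response simply matches the other's format, so neither has a dominant strategy, yet both matching profiles survive the equilibrium check.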
We still have a problem
Two Nash equilibria, but which is selected?
• If the players fail to select the same strategy, both will lose
– they could “agree” to select the Pareto optimal solution
 that seems reasonable
– they could coordinate
Brief Introduction
Zero-sum game
• Payoffs in each cell of payoff matrix sum to 0
• The Nash equilibrium in such cases may be a mixed strategy
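A small example shows why a zero-sum game may need a mixed strategy. The sketch below uses matching pennies, a game that is not part of this lecture and is assumed here purely for illustration: the row player wins 1 when the coins match and loses 1 otherwise, and the column player receives the negation.

```python
# Row player's payoffs, indexed by (row action, column action).
# The column player's payoff is the negation, so each cell sums to 0.
row_payoff = {
    ("heads", "heads"): 1, ("heads", "tails"): -1,
    ("tails", "heads"): -1, ("tails", "tails"): 1,
}
actions = ["heads", "tails"]


def has_pure_equilibrium():
    """True if some pure profile leaves neither player wanting to switch."""
    for r in actions:
        for c in actions:
            row_ok = all(row_payoff[(r, c)] >= row_payoff[(r2, c)]
                         for r2 in actions)
            col_ok = all(-row_payoff[(r, c)] >= -row_payoff[(r, c2)]
                         for c2 in actions)
            if row_ok and col_ok:
                return True
    return False


def expected_row_payoff(p_heads, column_action):
    """Row player's expected payoff when playing heads with prob p_heads."""
    return (p_heads * row_payoff[("heads", column_action)]
            + (1 - p_heads) * row_payoff[("tails", column_action)])


pure_exists = has_pure_equilibrium()
vs_heads = expected_row_payoff(0.5, "heads")
vs_tails = expected_row_payoff(0.5, "tails")
```

No pure profile is stable, because one player always wants to switch. When the row player mixes 50/50, the column player gets the same expected payoff from either action and has no profitable deviation, and by symmetry the same holds the other way round.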