www.cs.indiana.edu


GAME PLAYING 2
THIS LECTURE
Alpha-beta pruning
Games with chance
Partially observable games

NONDETERMINISM

Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX)
[Figure: MAX's play followed by MIN's possible plays]
MAX cannot tell what move will be played
Instead of a single path, the agent must construct an entire plan
MAX must decide what to play for BOTH these outcomes
MINIMAX BACKUP
[Figure: game tree with levels MAX's turn, MIN's turn, MAX's turn; leaf utilities +1, 0, -1 are backed up level by level to the root]
DEPTH-FIRST MINIMAX ALGORITHM

MAX-Value(S)
1. If Terminal?(S) return Result(S)
2. Return max_{S' ∈ SUCC(S)} MIN-Value(S')

MIN-Value(S)
1. If Terminal?(S) return Result(S)
2. Return min_{S' ∈ SUCC(S)} MAX-Value(S')

MINIMAX-Decision(S)
Return action leading to state S' ∈ SUCC(S) that maximizes MIN-Value(S')
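The three procedures above can be sketched in Python. This is a minimal illustration, not the lecture's code: the game representation (a state is either a number, meaning a terminal utility for MAX, or a list of successor states) and the sample tree are assumptions made for the example.

```python
# Hypothetical representation: a state is a number (terminal utility for MAX)
# or a list of successor states.

def is_terminal(state):
    return not isinstance(state, list)

def max_value(state):
    # MAX-Value(S): terminal result, or the max over successors of MIN-Value.
    if is_terminal(state):
        return state
    return max(min_value(child) for child in state)

def min_value(state):
    # MIN-Value(S): terminal result, or the min over successors of MAX-Value.
    if is_terminal(state):
        return state
    return min(max_value(child) for child in state)

def minimax_decision(state):
    # Index of the successor that maximizes MIN-Value (stands in for "action").
    return max(range(len(state)), key=lambda i: min_value(state[i]))

# Made-up two-ply tree: MAX moves at the root, MIN at the next level.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = max_value(tree)
best_move = minimax_decision(tree)
print(root_value, best_move)
```

Here the three MIN nodes back up 3, 2, and 2, so MAX prefers the first move.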
REAL-TIME GAME PLAYING WITH
EVALUATION FUNCTION
e(s): function indicating estimated favorability of a state to MAX
Keep track of depth, and add this line after the terminal test:
If depth(s) = cutoff, return e(s)
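A depth-limited variant might look like the sketch below. The nested-list state representation and the placeholder evaluation function `e` (which simply scores every unresolved position as 0) are assumptions for illustration; a real e(s) would be game-specific.

```python
# Hypothetical representation: a state is a number (terminal utility)
# or a list of successor states.

def is_terminal(state):
    return not isinstance(state, list)

def e(state):
    # Placeholder evaluation function: treat any unresolved position as neutral.
    return 0

def max_value(state, depth, cutoff):
    if is_terminal(state):
        return state
    if depth == cutoff:          # the added line, placed after the terminal test
        return e(state)
    return max(min_value(c, depth + 1, cutoff) for c in state)

def min_value(state, depth, cutoff):
    if is_terminal(state):
        return state
    if depth == cutoff:
        return e(state)
    return min(max_value(c, depth + 1, cutoff) for c in state)

tree = [[3, [9, 9]], [2, 4]]
shallow = max_value(tree, 0, 2)   # the subtree [9, 9] is estimated by e(s)
deep = max_value(tree, 0, 3)      # deep enough to reach every leaf
print(shallow, deep)
```

With the cutoff at depth 2 the unexplored subtree is scored 0, which changes the backed-up value at the root; this is the price of searching in bounded time.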
CAN WE DO BETTER?

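Yes! Much better!
[Figure: the root MAX node has value ≥ 3 after its first child backs up 3; the second child, a MIN node, has value ≤ -1 after its first leaf returns -1, so its remaining children are pruned]
This part of the tree can't have any effect on the value that will be backed up to the root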
EXAMPLE
[Figure sequence: a small game tree searched depth-first; the leaves 2, 1, and -1 are evaluated in that order]
β = 2 at the first MIN node after its first leaf
The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase
β = 1 at the same MIN node after its second leaf
α = 1 at the parent MAX node
The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease
β = -1 at the next MIN node after its first leaf
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors
ALPHA-BETA PRUNING
Explore the game tree to depth h in depth-first manner
Back up alpha and beta values whenever possible
Prune branches that can't lead to changing the final decision

ALPHA-BETA ALGORITHM
Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued
Discontinue the search below a MAX node N if its alpha value is ≥ the beta value of a MIN ancestor of N
Discontinue the search below a MIN node N if its beta value is ≤ the alpha value of a MAX ancestor of N

EXAMPLE
[Figure sequence: step-by-step alpha-beta search, left to right, on a game tree whose levels alternate MAX, MIN, MAX, MIN, MAX, MIN from the root down. Leaf values, left to right: 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2. Successive slides annotate the nodes with backed-up values and mark the pruned branches; the final slide shows 1 at the root.]
HOW MUCH DO WE GAIN?

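Consider these two cases:
[Figure: two cases, each with α = 3 at the MAX node. Case 1: the MIN child's first leaf is -1, giving β = -1 ≤ α, so the rest of that subtree is pruned at once. Case 2: the MIN child's first leaf is 4, giving β = 4 > α, so the search continues to the next leaf, -1.]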
HOW MUCH DO WE GAIN?



Assume a game tree of uniform branching factor b
Minimax examines O(b^h) nodes, so does alpha-beta in the worst-case
The gain for alpha-beta is maximum when:
  The children of a MAX node are ordered in decreasing backed up values
  The children of a MIN node are ordered in increasing backed up values
Then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^(3h/4))
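To get a feel for these growth rates, the snippet below plugs in one branching factor and depth. The values b = 10 and h = 6 are made up for illustration, and the results are orders of magnitude only, since the O(·) bounds hide constant factors.

```python
# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
b, h = 10, 6

minimax_nodes = b ** h               # O(b^h): plain minimax, alpha-beta worst case
best_case = b ** (h / 2)             # O(b^(h/2)): alpha-beta with perfect ordering
random_order = b ** (3 * h / 4)      # ~O(b^(3h/4)): alpha-beta with random ordering

print(minimax_nodes, round(best_case), round(random_order))
```

Read the other way around: with perfect ordering, alpha-beta can search roughly twice as deep as minimax for the same node budget.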
ALPHA-BETA IMPLEMENTATION

MAX-Value(S,α,β)
1. If Terminal?(S) return Result(S)
2. For all S' ∈ SUCC(S)
3.   α ← max(α, MIN-Value(S',α,β))
4.   If α ≥ β, then return α
5. Return α

MIN-Value(S,α,β)
1. If Terminal?(S) return Result(S)
2. For all S' ∈ SUCC(S)
3.   β ← min(β, MAX-Value(S',α,β))
4.   If α ≥ β, then return β
5. Return β

Alpha-Beta-Decision(S)
Return action leading to state S' ∈ SUCC(S) that maximizes MIN-Value(S',-∞,+∞)
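A direct Python transcription of this pseudocode is sketched below. The nested-list game representation (a number is a terminal utility, a list holds successor states) and the sample tree are assumptions made for the example, not part of the lecture.

```python
import math

# Hypothetical representation: a state is a number (terminal utility)
# or a list of successor states.

def is_terminal(state):
    return not isinstance(state, list)

def max_value(state, alpha, beta):
    if is_terminal(state):
        return state
    for child in state:
        alpha = max(alpha, min_value(child, alpha, beta))
        if alpha >= beta:        # a MIN ancestor will never allow this much
            return alpha
    return alpha

def min_value(state, alpha, beta):
    if is_terminal(state):
        return state
    for child in state:
        beta = min(beta, max_value(child, alpha, beta))
        if alpha >= beta:        # a MAX ancestor already has something better
            return beta
    return beta

def alpha_beta_decision(state):
    # Index of the successor that maximizes MIN-Value(S', -inf, +inf).
    return max(range(len(state)),
               key=lambda i: min_value(state[i], -math.inf, math.inf))

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = max_value(tree, -math.inf, math.inf)
best_move = alpha_beta_decision(tree)
print(root_value, best_move)
```

On this tree the second MIN node is abandoned after its first leaf (2), because α = 3 is already established at the root; the leaves 4 and 6 are never examined.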
HEURISTIC ORDERING OF NODES

Order the nodes below the root according to the
values backed-up at the previous iteration
OTHER IMPROVEMENTS
Adaptive horizon + iterative deepening
Extended search: Retain k>1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help deal with the “horizon effect”)
Singular extension: If a move is obviously better than the others in a node at horizon h, then expand this node along this move
Use transposition tables to deal with repeated states
Null-move search

GAMES OF CHANCE
Dice games: backgammon, Yahtzee, craps, …
Card games: poker, blackjack, …
Is there a fundamental difference between the nondeterminism in chess-playing vs. the nondeterminism in a dice roll?
[Figure: game tree with alternating layers MAX, CHANCE, MIN, CHANCE, MAX]
EXPECTED VALUES
The utility of a MAX/MIN node in the game tree is the max/min of the utility values of its successors
The expected utility of a CHANCE node in the game tree is the probability-weighted average of the utility values of its successors

CHANCE nodes: ExpectedValue(s) = Σ_{s' ∈ SUCC(s)} ExpectedValue(s') P(s')
Compare to
MAX nodes: MinimaxValue(s) = max_{s' ∈ SUCC(s)} MinimaxValue(s')
MIN nodes: MinimaxValue(s) = min_{s' ∈ SUCC(s)} MinimaxValue(s')
ADVERSARIAL GAMES OF CHANCE
E.g., Backgammon
MAX nodes, MIN nodes, CHANCE nodes
Expectiminimax search
Backup step:
  MAX = maximum of children
  CHANCE = average of children
  MIN = minimum of children
  CHANCE = average of children
4 levels of the game tree separate each of MAX's turns!
Evaluation function? Pruning?
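The backup step can be sketched as a single recursive function. The tree encoding used here is an assumption for illustration: a number is a terminal utility, `("max", [...])` and `("min", [...])` hold child nodes, and `("chance", [(p, child), ...])` pairs each child with its probability.

```python
# Hypothetical encoding of an expectiminimax tree (see the lead-in above).

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                                   # terminal utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Probability-weighted average of the children's values.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Made-up tree: MAX picks a move, a fair coin is flipped, then MIN replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [4, 2])), (0.5, ("min", [6, 8]))]),
    ("chance", [(0.5, ("min", [1, 9])), (0.5, ("min", [5, 7]))]),
])
root_value = expectiminimax(tree)
print(root_value)
```

The first move is worth 0.5 * 2 + 0.5 * 6 = 4 and the second 0.5 * 1 + 0.5 * 5 = 3, so MAX prefers the first.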

GENERALIZING MINIMAX VALUES

Utilities can be continuous numerical values, rather than +1,0,-1
  Allows maximizing the amount of “points” (e.g., $) rewarded instead of just achieving a win
Rewards associated with terminal states
Costs can be associated with certain decisions at non-terminal states (e.g., placing a bet)

ROULETTE

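“Game tree” only has depth 2
  Place a bet (or make no bet)
  Observe the roulette wheel
[Figure: the action "Bet: Red, $5" leads to a chance node with two outcomes: Red with probability 18/38 and value +10, Not red with probability 20/38 and value 0. The alternative action is "No bet".]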
CHANCE NODE BACKUP

Expected value:
For k children, with backed up values v1,…,vk and probabilities p1,…,pk
Chance node value = p1 * v1 + p2 * v2 + … + pk * vk
Example: Bet: Red, $5
Value: 18/38 * 10 + 20/38 * 0 ≈ 4.74
[Figure: chance node for "Bet: Red, $5" with outcomes Red (probability 18/38, value +10) and Not red (probability 20/38, value 0)]
MAX/CHANCE NODES
MAX should pick the action leading to the node with the highest value
[Figure: MAX node with two actions, each leading to a chance node]
Bet: Red, $5: Red (18/38) +10, Not red (20/38) 0; value 4.74
Bet: 17, $5: 17 (1/38) +150, Not 17 (37/38) 0; value 3.95 = 150/38
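The roulette numbers can be checked with a few lines of Python. This is a sketch using the probabilities and payoffs from the slides; the dictionary layout and function name are choices made for the example, and exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction

def chance_value(outcomes):
    # Chance node backup: p1 * v1 + p2 * v2 + ... + pk * vk
    return sum(p * v for p, v in outcomes)

# Each bet maps to its chance node: a list of (probability, value) pairs.
bets = {
    "Bet: Red, $5": [(Fraction(18, 38), 10), (Fraction(20, 38), 0)],
    "Bet: 17, $5": [(Fraction(1, 38), 150), (Fraction(37, 38), 0)],
}

values = {name: chance_value(outcomes) for name, outcomes in bets.items()}
best = max(values, key=values.get)   # MAX picks the highest-valued chance node

for name, value in values.items():
    print(name, round(float(value), 2))
print("MAX chooses:", best)
```

The two chance nodes evaluate to 180/38 and 150/38, so MAX picks the bet on red.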
A SLIGHTLY MORE COMPLEX EXAMPLE
Two fair coins
Pay $1 to start, at which point both are flipped
Can flip up to two coins again, at a cost of $1 each
Payout: $5 for HH, $1 for HT or TH, $0 for TT
[Figure sequence: the game tree for this example, with backed-up values filled in over successive slides. At each decision node (states HT and TT are shown) the player chooses between Done and flipping a coin again (Flip H or Flip T); each flip leads to a chance node whose two outcomes have probability 1/2. Leaf values are payout minus total cost (e.g., 1-1=0 for Done at HT, 5-2=3 for HH after one extra flip, -1 for Done at TT); chance nodes take the expected value of their children and decision nodes take the maximum.]
CARD GAMES
Blackjack (6-deck), video poker: similar to coin-flipping game
But in many card games, need to keep track of history of dealt cards in state because it affects future probabilities
  One-deck blackjack
  Bridge
  Poker

PARTIALLY OBSERVABLE GAMES

Partial observability
  Don’t see entire state (e.g., other players’ hands)
  “Fog of war”
Examples:
  Kriegspiel (see R&N)
  Battleship
  Stratego

OBSERVATION OF THE REAL WORLD
[Figure: the real world in some state produces percepts; the agent interprets the percepts in the representation language, e.g., On(A,B), On(B,Table), Handempty]
Percepts can be user’s inputs, sensory data (e.g., image pixels), information received from other agents, ...
SECOND SOURCE OF UNCERTAINTY:
IMPERFECT OBSERVATION OF THE WORLD
Observation of the world can be:
Partial, e.g., a vision sensor can’t see through obstacles (lack of percepts)
  [Figure: two rooms, R1 and R2] The robot may not know whether there is dust in room R2
Ambiguous, e.g., percepts have multiple possible interpretations
  [Figure: blocks A, B, C] On(A,B) ∨ On(A,C)
Incorrect
PARTIALLY-OBSERVABLE CARD GAMES

One possible strategy:
  Consider all possible deals given observed information
  Solve each deal as a fully-observable problem
  Choose the move that has the best average minimax value
“Averaging over clairvoyance”
[Why doesn’t this always work?]
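The strategy above can be sketched as follows. Everything here is hypothetical scaffolding: `deals` stands for the hidden-card assignments consistent with what has been observed (assumed equally likely), and `minimax_value(move, deal)` stands for a fully-observable solver that returns the value of playing `move` if `deal` were the true deal.

```python
def average_over_deals(moves, deals, minimax_value):
    """Return the move with the best average fully-observable minimax value."""
    def average(move):
        return sum(minimax_value(move, deal) for deal in deals) / len(deals)
    return max(moves, key=average)

# Toy stand-in for a solver: a lookup table of made-up values.
table = {
    ("safe", "deal 1"): 1, ("safe", "deal 2"): 1,
    ("risky", "deal 1"): 3, ("risky", "deal 2"): -2,
}
choice = average_over_deals(
    ["safe", "risky"], ["deal 1", "deal 2"], lambda m, d: table[(m, d)]
)
print(choice)
```

Note that each call to the solver assumes the deal is fully visible from then on, so the average never credits moves whose only purpose is to gather or conceal information.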

BELIEF STATE


A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions, e.g.:
[Figure: an example belief state]
To plan a course of actions, the agent searches a space of belief states, instead of a space of states
SENSOR MODEL
State space S
The sensor model is a function
SENSE: S → 2^S
that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s)
Example: Assume our vacuum robot can perfectly sense the room it is in and if there is dust in it. But it can’t sense if there is dust in the other room
[Figure: SENSE applied to two example states and the resulting belief states]
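The vacuum robot's sensor model can be written out explicitly. The state encoding below, a tuple `(robot_room, dust_in_R1, dust_in_R2)`, is an assumption made for this sketch; the slides show states only as pictures.

```python
from itertools import product

# Hypothetical encoding: (robot_room, dust_in_R1, dust_in_R2).
STATES = set(product(("R1", "R2"), (False, True), (False, True)))

def local_dust(state):
    room, dust_r1, dust_r2 = state
    return dust_r1 if room == "R1" else dust_r2

def sense(state):
    # SENSE: S -> 2^S. The robot knows its room and whether that room is
    # dusty, so every state agreeing on those two facts remains possible.
    return {s for s in STATES
            if s[0] == state[0] and local_dust(s) == local_dust(state)}

belief = sense(("R1", True, False))
print(sorted(belief))
```

The returned belief state contains two states: the robot is in a dusty R1 either way, but R2 may or may not contain dust.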
VACUUM ROBOT ACTION MODEL
Right either moves the robot
right, or does nothing
Suck picks up the dirt in the
room, if any, and always
does the right thing
Left always moves the robot to the
left, but it may occasionally deposit
dust in the right room
• The robot perfectly senses the
room it is in and whether there is
dust in it
• But it can’t sense if there is dust in
the other room
TRANSITION BETWEEN BELIEF STATES

Suppose the robot is initially in state: [figure]
After sensing this state, its belief state is: [figure]
Just after executing Left, its belief state will be: [figure]
After sensing the new state, its belief state will be one of two sets [figures]: one if there is no dust in R1, the other if there is dust in R1
TRANSITION BETWEEN BELIEF STATES

Playing a “game against nature”
[Figure: the action Left leads to two observation branches, Clean(R1) and ¬Clean(R1)]
After receiving an observation, the robot will have one of these two belief states
AND/OR TREE OF BELIEF STATES
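[Figure: AND/OR tree of belief states with actions Left, Right, and Suck; one branch loops back to an earlier belief state, and the Suck branches reach goal belief states]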
An action is applicable to a belief state B if its
preconditions are achieved in all states in B
A goal belief state is one in which all states are goal states
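The belief-state transitions from the preceding slides can be sketched in Python. The state encoding `(robot_room, dust_in_R1, dust_in_R2)`, the starting belief state, and the goal (no dust in either room) are assumptions for illustration; the action outcomes follow the vacuum robot action model described earlier.

```python
# Hypothetical encoding: (robot_room, dust_in_R1, dust_in_R2).

def observe(state):
    # The percept: the robot's room and whether that room is dusty.
    room, dust_r1, dust_r2 = state
    return (room, dust_r1 if room == "R1" else dust_r2)

def results(state, action):
    # Set of states that `action` may produce from `state`.
    room, dust_r1, dust_r2 = state
    if action == "Right":    # either moves right or does nothing
        return {("R2", dust_r1, dust_r2), state}
    if action == "Left":     # always moves left, may deposit dust in R2
        return {("R1", dust_r1, dust_r2), ("R1", dust_r1, True)}
    if action == "Suck":     # always cleans the current room
        if room == "R1":
            return {("R1", False, dust_r2)}
        return {("R2", dust_r1, False)}
    raise ValueError(f"unknown action: {action}")

def predict(belief, action):
    # Belief state just after executing the action, before sensing.
    return set().union(*(results(s, action) for s in belief))

def update(belief, percept):
    # Keep only the states consistent with what was sensed.
    return {s for s in belief if observe(s) == percept}

def is_goal_belief(belief):
    # A goal belief state: every state in it is a goal state (assumed: no dust).
    return all(not d1 and not d2 for _, d1, d2 in belief)

# Assumed start: the robot is in a clean R2 and knows nothing about R1.
belief = {("R2", False, False), ("R2", True, False)}
after_left = predict(belief, "Left")
no_dust_r1 = update(after_left, ("R1", False))
dust_r1 = update(after_left, ("R1", True))
print(len(after_left), sorted(no_dust_r1), sorted(dust_r1))
```

After Left the belief state grows to four states; sensing then splits it into the two alternatives, one per possible observation, which is exactly the branching an AND/OR tree must plan for.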
RECAP
Alpha-beta pruning: reduce complexity of minimax to O(b^(h/2)) ideally, O(b^(3h/4)) typically
Games with chance
  Expected values: averaging over probabilities
A 2nd source of uncertainty: partial observability
  Reason about sets of states: belief state
Much more on latter 2 topics later

PROJECT PROPOSAL (OPTIONAL)
Mandatory: instructor’s advance approval
Project title, team members
1/2 to 1 page description
  Specific topic (problem you are trying to solve, topic of survey, etc)
  Why did you choose this topic?
  Methods (researched in advance, sources of references)
  Expected results
Email to me by 9/20
HOMEWORK

Reading: R&N 6