Adversarial Search


CSC 412: AI
Adversarial Search
Bikramjit Banerjee
Partly based on material available from the internet
Game search problems
- Search problems
  - Only the problem solver's actions can change the state of the environment
- Game search problems
  - Multiple problem solvers (players) acting on the same environment
- Players' actions can be
  - Cooperative: common goal state
  - Adversarial: a win for one player is a loss for the other
    - Example: zero-sum games like chess, tic-tac-toe
- A whole spectrum exists between purely adversarial and purely cooperative games
- We first look at adversarial two-player games with turn-taking
Game Playing: State of the art
- Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
- Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation, and used undisclosed methods for extending some lines of search up to 40 ply.
- Othello: human champions refuse to compete against computers, which are too good.
- Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
Two Player Games (Max vs. Min)
- Max always moves first
- Min is the opponent
- States? Boards faced by Max/Min
- Actions? Players' moves
- Goal test? Terminal board test
- Path cost? Utility function for each player
Search tree: Alternate move games
[Figure: tic-tac-toe game tree. Max places an X at the root level, Min places an O at the next level, and the players alternate down to terminal states, whose utilities from Max's perspective (e.g. -1 for a Min win, +1 for a Max win) label the leaves.]
A simple abstract game
[Figure: a two-ply game tree. Max chooses among actions A1, A2, A3; Min replies with A11–A13, A21–A23, or A31–A33, reaching terminal utilities 3, 12, 8, 2, 4, 6, 14, 5, 2.]
An action by one player is called a ply; two ply (an action and a counter-action) are called a move.
The Minimax Algorithm
- Generate the game tree down to the terminal nodes
- Apply the utility function to the terminal nodes
- For a set S of sibling nodes, pass up to the parent
  - the lowest value in S if the parent is a Min node
  - the largest value in S if the parent is a Max node
- Recursively do the above, until the backed-up values reach the initial state
- The value of the initial state is the minimum score that Max is guaranteed to attain against an optimal opponent
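The backing-up procedure above can be sketched in a few lines of Python. The nested-list tree representation (an internal node is a list of children, a leaf is a utility value) is my own assumption for illustration, not part of the slides:

```python
def minimax(node, maximizing):
    """Return the backed-up minimax value of a game-tree node.

    A node is either a terminal utility (an int) or a list of child nodes.
    `maximizing` is True when it is Max's turn to move at this node.
    """
    if isinstance(node, int):          # terminal: apply the utility function
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The abstract game from the slides: Max picks A1/A2/A3, Min replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))  # 3 -> Max's best move is A1
```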
Minimax Decision
[Figure: the abstract game tree with backed-up values. The Min nodes under A1, A2, A3 receive the values 3, 2, 2 from their leaves (3, 12, 8), (2, 4, 6), (14, 5, 2); the Max root receives the value 3.]
In this game Max's best move is A1, because he is guaranteed a score of at least 3.
Properties of Minimax
- Complete? Yes (if tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity? O(b^m)
- Space complexity? O(bm) (depth-first exploration)
- For chess, b ≈ 35, m ≈ 100 for "reasonable" games, so finding the optimal solution using Minimax is infeasible
- Potential improvements to Minimax running time:
  - Depth limited search
  - α-β pruning
Depth limited Minimax
- One possible solution is to do depth limited Minimax search
- Search the game tree as deep as you can in the given time
  - Evaluate the fringe nodes with the utility function
  - Back up the values to the root
  - Choose the best move, repeat
[Figure: a full game tree with a cutoff line. We would like to do Minimax on this full game tree... but we don't have time, so we will explore it to some manageable depth.]
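The cutoff idea can be sketched as a small variant of minimax. The game-specific callables (`children`, `evaluate`, `is_terminal`) are hypothetical parameters I introduce for illustration; the slides do not prescribe an interface:

```python
def depth_limited_minimax(state, depth, maximizing, children, evaluate, is_terminal):
    """Minimax down to a fixed depth; fringe nodes get the evaluation function.

    `children(state)` lists successor states, `evaluate(state)` scores a
    fringe node, `is_terminal(state)` detects real end-of-game positions.
    """
    if is_terminal(state) or depth == 0:       # cutoff: evaluate the fringe
        return evaluate(state)
    vals = [depth_limited_minimax(c, depth - 1, not maximizing,
                                  children, evaluate, is_terminal)
            for c in children(state)]
    return max(vals) if maximizing else min(vals)

# Toy usage: states are ints, each state has two "moves", eval = identity.
v = depth_limited_minimax(1, 2, True,
                          children=lambda s: [2 * s, 2 * s + 1],
                          evaluate=lambda s: s,
                          is_terminal=lambda s: False)
print(v)  # 6
```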
Example Utility Function: Tic Tac Toe
Assume Max is using "X".
e(n) = +∞ if n is a win for Max,
       -∞ if n is a win for Min,
       else (number of rows, columns and diagonals available to Max) - (number of rows, columns and diagonals available to Min)
[Figure: two example boards, one with e(n) = 6 - 4 = 2 and one with e(n) = 4 - 3 = 1.]
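This evaluation function is easy to code directly. The 9-character-string board encoding is my own assumption; a line counts as "available" to a player when the opponent has no mark on it:

```python
import math

LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],      # rows
         [0, 3, 6], [1, 4, 7], [2, 5, 8],      # columns
         [0, 4, 8], [2, 4, 6]]                 # diagonals

def e(board):
    """Slide evaluation: board is a 9-char string of 'X', 'O', ' '.
    Max plays 'X'."""
    def wins(p):
        return any(all(board[i] == p for i in line) for line in LINES)
    if wins('X'):
        return math.inf
    if wins('O'):
        return -math.inf
    open_x = sum(all(board[i] != 'O' for i in line) for line in LINES)
    open_o = sum(all(board[i] != 'X' for i in line) for line in LINES)
    return open_x - open_o

# X in the centre, O on the top edge: 6 lines open to X, 4 open to O.
print(e(" O  X    "))  # 2, the same value as the slide's first example
```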
Example Utility Function: Chess I
Assume Max is "White", and assume each piece has the following values:
pawn = 1; knight = 3; bishop = 3; rook = 5; queen = 9
Let w = sum of the values of the white pieces, and b = sum of the values of the black pieces. Then
e(n) = (w - b) / (w + b)
Note that this value ranges between -1 and 1.
Example Utility Function: Chess II
The previous evaluation function naively gave the same weight to a piece regardless of its position on the board.
Let Xi be the number of squares the ith piece attacks. Then
w = piece1value × X1 + piece2value × X2 + ...
(and similarly for b); e(n) is the same as before.
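Both chess evaluations can be sketched together. The piece-list representations here (piece letters for Chess I, `(letter, squares_attacked)` pairs for Chess II) are assumptions of mine, not from the slides:

```python
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material(pieces):
    """Chess I: plain material sum for one side.
    `pieces` is a list of piece letters, e.g. ['P', 'P', 'N']."""
    return sum(PIECE_VALUES[p] for p in pieces)

def mobility_weighted(pieces):
    """Chess II: weight each piece by the number of squares it attacks.
    `pieces` is a list of (letter, squares_attacked) pairs."""
    return sum(PIECE_VALUES[p] * x for p, x in pieces)

def e(w, b):
    """Normalized evaluation (w - b) / (w + b), in [-1, 1]."""
    return (w - b) / (w + b) if w + b else 0.0

# Material only: White has a rook, Black has a knight and a pawn.
print(e(material(['R']), material(['N', 'P'])))  # (5 - 4) / 9 ≈ 0.111
```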
Utility Functions
- The ability to play a good game is highly dependent on the evaluation function
- How do we come up with good evaluation functions?
  - Interview an expert
  - Machine learning
[Figure: an analogy between supervised classification ("What class is this object?", over labeled examples of class A and class B) and learning an evaluation function ("Who would win this game?", over example positions labeled as wins for White or wins for Black).]
α-β Pruning
- We have seen how to use Minimax search to play an optimal game
- We have seen that because of time limitations we may have to use a cutoff depth to make the search tractable. Using a cutoff causes problems because of the "horizon" effect
- Is there some way we can search deeper in the same amount of time?
- Yes! Use Alpha-Beta Pruning
[Figure: a game tree illustrating the horizon effect — the best move found before the cutoff turns out to have only losing children, while the game-winning move lies just beyond the cutoff.]
α-β Pruning
"If you have an idea that is surely bad, don't take the time to see how truly awful it is"
-- Pat Winston
[Figure: α-β pruning on the abstract game tree. The Min node under A1 returns 3; once the Min node under A2 sees the leaf 2 (≤ 3), its remaining children A22 and A23 are pruned, since Max will never choose A2 anyway. The root value is still 3.]
α-β pruning: Another example
[Figure: a step-by-step α-β trace on a three-level MAX/MIN/MAX tree with leaf values 4, 3, 6, 2, 2, 1, 9, 5, 3, 1, 5, 4, 7, 5; as the bounds tighten, several subtrees are cut off and their nodes are never explored. Move-ordering heuristic: generate higher values first below a MAX level, and lower values first below a MIN level. Example courtesy of Dr. Milos Hauskrecht.]
α-β Pruning
- Guaranteed to compute the same value for the root as Minimax
- In the worst case α-β does NO pruning, examining b^d leaf nodes, where each node has b children and a d-ply search is performed
- In the best case, α-β will examine only ~2b^(d/2) leaf nodes. Hence, if you hold the number of leaf nodes fixed, you can search twice as deep as Minimax
- The best case occurs when each player's best move is the leftmost alternative (i.e., the first child generated). So, at MAX nodes the child with the largest value is generated first, and at MIN nodes the child with the smallest value is generated first → order the operators carefully
- In the chess program Deep Blue, they found empirically that α-β pruning meant the average branching factor at each node was ~6 instead of ~35-40
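The pruning rule can be sketched as a small extension of minimax, again over nested-list trees (a representation I assume for illustration). A branch is abandoned as soon as the window [α, β] closes:

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """α-β search: returns exactly the minimax value of the root,
    but stops expanding children that cannot affect the decision.
    alpha = best value Max can force so far; beta = best for Min."""
    if isinstance(node, int):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:      # remaining siblings cannot matter
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:      # Max would never let play reach here
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3, the same answer as plain Minimax
```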
Non Zero-sum games
- Similar to minimax:
  - Utilities are now tuples, one entry per player
  - Each player maximizes their own entry at each node
  - Propagate (or back up) tuples from children
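The tuple-backup idea can be sketched as follows (a max^n-style sketch; the nested-list tree with tuple leaves and the round-robin turn order are my assumptions):

```python
def maxn(node, player, n_players=2):
    """Back up utility tuples: the player to move at each node picks the
    child whose backed-up tuple is largest in that player's own entry."""
    if isinstance(node, tuple):        # terminal: one utility per player
        return node
    children = [maxn(child, (player + 1) % n_players, n_players)
                for child in node]
    return max(children, key=lambda t: t[player])

# Two players; player 0 moves at the root, player 1 at the next level.
tree = [[(1, 6), (3, 5)], [(4, 1), (2, 4)]]
print(maxn(tree, 0))  # (2, 4): player 1 backs up (1, 6) and (2, 4); player 0 prefers 2 over 1
```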
Stochastic 2-player games
- E.g. backgammon
- Expectiminimax
  - Environment is an extra player that moves after each agent
  - At chance nodes take expectations; otherwise like minimax
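Expectiminimax can be sketched by adding one node kind to minimax. The tagged-tuple tree representation below is my own assumption for illustration:

```python
def expectiminimax(node):
    """Node kinds:
      ('max', children) / ('min', children)  -- agent decision nodes
      ('chance', [(prob, child), ...])       -- the environment's move
      a bare number                          -- terminal utility
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: take the expectation over outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# A fair coin flip decides which Min position Max would end up facing.
tree = ('max', [('chance', [(0.5, ('min', [3, 12])),
                            (0.5, ('min', [2, 4]))]),
                8])
print(expectiminimax(tree))  # max(0.5*3 + 0.5*2, 8) = 8
```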
Stochastic 2-player games
- Dice rolls increase b: 21 possible rolls with 2 dice
- Backgammon has ≈ 20 legal moves per position
- Depth 4 search examines 20 × (21 × 20)^3 ≈ 1.5 × 10^9 nodes
- As depth increases, the probability of reaching a given node shrinks
  - So the value of lookahead is diminished
  - So limiting depth is less damaging
  - But pruning is less possible
- TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play