Game Playing
Chapter 6
• Game Playing and AI
• Game Playing as Search
• Greedy Search Game Playing
• Minimax
• Alpha-Beta Pruning
©2001-2004 James D. Skrentny, from notes by C. Dyer et al.
Game Playing and AI
Why would game playing be a good problem for AI research?
Game playing is non-trivial:
  players need “human-like” intelligence
  games can be very complex (e.g., chess, go)
  it requires decision making within limited time
Games often are:
  well-defined and repeatable
  easy to represent
  fully observable and limited environments
Games let us directly compare humans and computers.
Game Playing and AI

                            Deterministic         Chance
  perfect info              checkers, go,         backgammon, monopoly,
  (fully observable)        others?               others?
  imperfect info            any?                  dominoes, bridge,
  (partially observable)                          others?
Game Playing as Search
Consider two-player, turn-taking board games
  e.g., tic-tac-toe, checkers, chess
  adversarial, zero-sum
  board configurations: unique arrangements of pieces
Representing these as a search problem:
  states: board configurations
  edges: legal moves
  initial state: start board configuration
  goal state: winning/terminal board configuration
Game Playing as Search: Game Tree
What's the new aspect to the search problem?
There's an opponent that we cannot control!
[Figure: partial tic-tac-toe game tree; X's possible first moves branch to boards on which O replies, and so on.]
How can this be handled?
Game Playing as Search: Complexity
Assume the opponent's moves can be predicted given the computer's moves.
How complex would search be in this case?
  worst case: O(b^d) for branching factor b and depth d
  Tic-Tac-Toe: ~5 legal moves, at most 9 moves per game: 5^9 = 1,953,125 states
  Chess: ~35 legal moves, ~100 moves per game: 35^100 ≈ 10^154 states, but only ~10^40 legal states
Common games produce enormous search trees.
Greedy Search Game Playing
A utility function maps each terminal state of the board to a numeric value
corresponding to the value of that state to the computer:
  positive for winning; larger positive values are better for the computer
  negative for losing; larger negative values are better for the opponent
  zero for a draw
  typical ranges (loss to win): -infinity to +infinity, or -1.0 to +1.0
Greedy Search Game Playing
Expand each branch to the terminal states.
Evaluate the utility of each terminal state.
Choose the move that results in the board configuration with the maximum value.
[Figure: game tree, board evaluations from the computer's perspective.
The computer's possible moves lead to B, C, D, E; the opponent's possible moves
lead to the terminal states. Taking the maximum down each branch:
B = -5 (F = -7, G = -5); C = 9 (H = 3, I = 9, J = -6);
D = 2 (K = 0, L = 2); E = 3 (M = 1, N = 3, O = 2).]
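A minimal sketch of a one-ply greedy chooser in the style of the minimax pseudocode given later on the "Minimax: Algorithm with SBE" slide; it scores each immediate successor directly, whereas the slide's example reads the maximum terminal value up each branch. Node, staticEvaluation, and the successor iterator are assumed primitives, not part of the original slides:

Node greedyMove(Node s) {
    Node best = null;
    int bestValue = Integer.MIN_VALUE;
    while (s.hasMoreSuccessors()) {
        Node child = s.getNextSuccessor();
        int value = staticEvaluation(child);   // board value for the computer
        if (value > bestValue) {               // keep the highest-valued board
            bestValue = value;
            best = child;
        }
    }
    return best;   // move to the highest-valued successor, ignoring the opponent's reply
}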
Greedy Search Game Playing
Assuming a reasonable search space, what's the problem with greedy search?
It ignores what the opponent might do!
e.g., the computer chooses C (maximum value 9, at I),
but the opponent then chooses J (value -6) and defeats the computer.
[Figure: the same tree as on the previous slide.]
Minimax: Idea
Assume the worst (i.e., the opponent plays optimally),
and that there are two plies until the terminal states.
If high utility numbers favor the computer,
the computer should choose maximizing moves.
If low utility numbers favor the opponent,
a smart opponent chooses minimizing moves.
Minimax: Idea
The computer assumes that after it moves, the opponent will choose the minimizing move.
It chooses its best move considering both its move and the opponent's best move.
[Figure: the same tree with minimax values, board evaluations from the
computer's perspective: B = -7 (min of F = -7, G = -5);
C = -6 (min of H = 3, I = 9, J = -6); D = 0 (min of K = 0, L = 2);
E = 1 (min of M = 1, N = 3, O = 2); A = max(-7, -6, 0, 1) = 1,
so the computer moves to E.]
Minimax: Passing Values up Game Tree
Explore the game tree to the terminal states.
Evaluate the utility of the terminal states.
The computer chooses the move that puts the board in the best configuration
for it, assuming the opponent makes her best moves on her turns:
  start at the leaves
  assign a value to each parent node as follows:
    use the minimum of the children when it is the opponent's move
    use the maximum of the children when it is the computer's move
Deeper Game Trees
Minimax can be generalized to more than two plies.
Values are backed up in minimax fashion.
[Figure: four-ply game tree, alternating computer (max) and opponent (min)
levels, with minimax values backed up from the terminal states:
  A (computer, max) = 3
    B (opponent, min) = -5
      F (max) = 4:  N = 4;  O (min) = -5: W = -3, X = -5
      G = -5
    C (opponent, min) = 3
      H = 3;  I = 8;  J (max) = 9: P = 9, Q = -6
    D = 0 (terminal)
    E (opponent, min) = -7
      K (max) = 5: R = 0, S = 3, T = 5
      L = 2
      M (max) = -7: U = -7, V = -9]
Minimax: Direct Algorithm
For each move by the computer:
1. Perform a depth-first search to the terminal states
2. Evaluate each terminal state
3. Propagate the minimax values upwards:
   if it is the opponent's move, back up the minimum value of the children
   if it is the computer's move, back up the maximum value of the children
4. Choose the move with the maximum of the minimax values of the children
Note:
• minimax values gradually propagate upwards as the DFS proceeds,
  i.e., minimax values propagate up in "left-to-right" fashion
• minimax values for each sub-tree are backed up "as we go",
  so only O(b·d) nodes need to be kept in memory at any time
Minimax: Algorithm Complexity
Assume all terminal states are at depth d.
Space complexity? depth-first search, so O(b·d)
Time complexity? branching factor b, so O(b^d)
Time complexity is a major problem!
The computer typically has only a finite amount of time to make a move.
Minimax: Algorithm Complexity
The direct minimax algorithm is impractical;
instead, do a depth-limited search to ply (depth) m.
What's the problem with stopping at an arbitrary ply?
  the evaluation is defined only for terminal states,
  but we need to know the value of non-terminal states
A static board evaluator (SBE) function uses heuristics
to estimate the value of non-terminal states.
Minimax: Static Board Evaluator (SBE)
A static board evaluation function estimates how good a board configuration
is for the computer:
  it reflects the computer's chances of winning from that state
  it must be easy to calculate from the board configuration
For example, chess:
  SBE = α * materialBalance + β * centerControl + γ * …
  material balance = value of white pieces - value of black pieces
  (pawn = 1, rook = 5, queen = 9, etc.)
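A minimal sketch of the materialBalance feature above: the pawn, rook, and queen values are the slide's, while the knight and bishop values (3 each) are conventional assumptions filling in the slide's "etc.":

int materialBalance(int[] whiteCounts, int[] blackCounts) {
    // piece order: pawn, knight, bishop, rook, queen
    int[] pieceValue = { 1, 3, 3, 5, 9 };
    int balance = 0;
    for (int p = 0; p < pieceValue.length; p++)
        balance += pieceValue[p] * (whiteCounts[p] - blackCounts[p]);
    return balance;   // positive favors white
}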
Minimax: Static Board Evaluator (SBE)
Typically, one subtracts how good the board configuration is for the opponent
from how good it is for the computer.
The SBE should be symmetric:
  if the SBE gives X for one player, it should give -X for the opponent
The SBE must be consistent:
  it must agree with the utility function when calculated at terminal nodes
Minimax: Algorithm with SBE

int minimax(Node s, int depth, int limit) {
    if (isTerminal(s) || depth == limit)   // base case
        return staticEvaluation(s);
    Vector v = new Vector();
    // do minimax on the successors of s and save their values
    while (s.hasMoreSuccessors())
        v.addElement(minimax(s.getNextSuccessor(), depth + 1, limit));
    if (isComputersTurn(s))
        return maxOf(v);   // computer's move: return max of children
    else
        return minOf(v);   // opponent's move: return min of children
}
Minimax: Algorithm with SBE
The same as direct minimax, except it
  only goes to depth m
  estimates non-terminal states using the SBE function
How would this algorithm perform at chess?
  if it could look ahead ~4 pairs of moves (i.e., 8 ply),
  it would be consistently beaten by average players
  if it could look ahead ~8 pairs, as done on a typical PC,
  it is as good as a human master
Recap
Can't minimax search to the end of the game;
if we could, then choosing a move would be easy.
The SBE isn't perfect at estimating;
if it were, we could just choose the best move without searching.
Since neither is feasible for interesting games,
combine the minimax and SBE concepts:
  minimax to depth m
  use the SBE to estimate the value of board configurations
Alpha-Beta Pruning Idea
Some branches of the game tree won't be taken if playing against a smart opponent.
Use pruning to ignore those branches.
While doing DFS of the game tree, keep track of:
  alpha at maximizing levels (computer's moves):
    the highest SBE value seen so far (initialized to -infinity);
    a lower bound on the state's evaluation
  beta at minimizing levels (opponent's moves):
    the lowest SBE value seen so far (initialized to +infinity);
    an upper bound on the state's evaluation
Alpha-Beta Pruning Idea
Beta cutoff: pruning occurs when maximizing,
if the child's alpha >= the parent's beta.
  Why stop expanding children? The opponent won't allow the computer to take this move.
Alpha cutoff: pruning occurs when minimizing,
if the parent's alpha >= the child's beta.
  Why stop expanding children? The computer has a better move than this.
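The slides describe the cutoff tests in prose but give code only for plain minimax; below is a hedged sketch of the alpha-beta variant in the same style, reusing the assumed primitives (Node, isTerminal, staticEvaluation, isComputersTurn, and the successor iterator) from the "Minimax: Algorithm with SBE" slide:

int alphaBeta(Node s, int depth, int limit, int alpha, int beta) {
    if (isTerminal(s) || depth == limit)      // base case
        return staticEvaluation(s);
    if (isComputersTurn(s)) {                 // maximizing level: raise alpha
        while (s.hasMoreSuccessors()) {
            int v = alphaBeta(s.getNextSuccessor(), depth + 1, limit, alpha, beta);
            if (v > alpha) alpha = v;         // highest value seen so far
            if (alpha >= beta) break;         // beta cutoff: opponent won't allow this
        }
        return alpha;
    } else {                                  // minimizing level: lower beta
        while (s.hasMoreSuccessors()) {
            int v = alphaBeta(s.getNextSuccessor(), depth + 1, limit, alpha, beta);
            if (v < beta) beta = v;           // lowest value seen so far
            if (alpha >= beta) break;         // alpha cutoff: computer has a better move
        }
        return beta;
    }
}
// initial call: alphaBeta(root, 0, limit, Integer.MIN_VALUE, Integer.MAX_VALUE)

On the example tree that follows, this returns 3 at A while pruning X, Q, and the subtree under M.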
Alpha-Beta Search Example
minimax(A,0,4): alpha initialized to -infinity.
Expand A? Yes: A has successors, and there is no cutoff test for the root.
(call stack: A)
[Figure: the search runs on the four-ply tree from the Deeper Game Trees slide,
with depth limit 4; each following step notes the alpha/beta updates.]
Alpha-Beta Search Example
minimax(B,1,4): beta initialized to +infinity.
Expand B? Yes: A's alpha >= B's beta is false, so no alpha cutoff.
(call stack: B, A)
Alpha-Beta Search Example
minimax(F,2,4): alpha initialized to -infinity.
Expand F? Yes: F's alpha >= B's beta is false, so no beta cutoff.
(call stack: F, B, A)
Alpha-Beta Search Example
minimax(N,3,4): N is a terminal state; evaluate and return its SBE value, 4.
(call stack: N, F, B, A)
Alpha-Beta Search Example
back to minimax(F,2,4): alpha = 4, since 4 >= -infinity (maximizing).
Keep expanding F? Yes: F's alpha >= B's beta is false, so no beta cutoff.
(call stack: F, B, A)
Alpha-Beta Search Example
minimax(O,3,4): beta initialized to +infinity.
Expand O? Yes: F's alpha >= O's beta is false, so no alpha cutoff.
(call stack: O, F, B, A)
Alpha-Beta Search Example
minimax(W,4,4): W is a non-terminal state at the depth limit;
evaluate and return its SBE value, -3.
(call stack: W, O, F, B, A)
Alpha-Beta Search Example
back to minimax(O,3,4): beta = -3, since -3 <= +infinity (minimizing).
Keep expanding O? No: F's alpha (4) >= O's beta (-3) is true, so alpha cutoff.
(call stack: O, F, B, A)
Alpha-Beta Search Example
Why? The smart opponent will choose W or worse, so O's upper bound is -3,
and the computer already has the better move at N.
O's remaining child X is pruned.
(call stack: O, F, B, A)
Alpha-Beta Search Example
back to minimax(F,2,4): alpha doesn't change, since -3 < 4 (maximizing).
Keep expanding F? No: F has no more successors; F returns 4.
(call stack: F, B, A)
Alpha-Beta Search Example
back to minimax(B,1,4): beta = 4, since 4 <= +infinity (minimizing).
Keep expanding B? Yes: A's alpha >= B's beta is false, so no alpha cutoff.
(call stack: B, A)
Alpha-Beta Search Example
minimax(G,2,4): G is a terminal state; evaluate and return its SBE value, -5.
(call stack: G, B, A)
Alpha-Beta Search Example
back to minimax(B,1,4): beta = -5, since -5 <= 4 (minimizing).
Keep expanding B? No: B has no more successors; B returns -5.
(call stack: B, A)
Alpha-Beta Search Example
back to minimax(A,0,4): alpha = -5, since -5 >= -infinity (maximizing).
Keep expanding A? Yes: A has more successors, and there is no cutoff test for the root.
(call stack: A)
Alpha-Beta Search Example
minimax(C,1,4): beta initialized to +infinity.
Expand C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.
(call stack: C, A)
Alpha-Beta Search Example
minimax(H,2,4): H is a terminal state; evaluate and return its SBE value, 3.
(call stack: H, C, A)
Alpha-Beta Search Example
back to minimax(C,1,4): beta = 3, since 3 <= +infinity (minimizing).
Keep expanding C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.
(call stack: C, A)
Alpha-Beta Search Example
minimax(I,2,4): I is a terminal state; evaluate and return its SBE value, 8.
(call stack: I, C, A)
Alpha-Beta Search Example
back to minimax(C,1,4): beta doesn't change, since 8 > 3 (minimizing).
Keep expanding C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.
(call stack: C, A)
Alpha-Beta Search Example
minimax(J,2,4): alpha initialized to -infinity.
Expand J? Yes: J's alpha >= C's beta is false, so no beta cutoff.
(call stack: J, C, A)
Alpha-Beta Search Example
minimax(P,3,4): P is a terminal state; evaluate and return its SBE value, 9.
(call stack: P, J, C, A)
Alpha-Beta Search Example
back to minimax(J,2,4): alpha = 9, since 9 >= -infinity (maximizing).
Keep expanding J? No: J's alpha (9) >= C's beta (3) is true, so beta cutoff.
(call stack: J, C, A)
Alpha-Beta Search Example
Why? The computer would choose P or better, so J's lower bound is 9,
but the smart opponent won't let the computer take the move to J
(the opponent already has the better move at H).
J's remaining child Q is pruned.
(call stack: J, C, A)
Alpha-Beta Search Example
back to minimax(C,1,4): beta doesn't change, since 9 > 3 (minimizing).
Keep expanding C? No: C has no more successors; C returns 3.
(call stack: C, A)
Alpha-Beta Search Example
back to minimax(A,0,4): alpha = 3, since 3 >= -5 (maximizing).
Keep expanding A? Yes: A has more successors, and there is no cutoff test for the root.
(call stack: A)
Alpha-Beta Search Example
minimax(D,1,4): D is a terminal state; evaluate and return its SBE value, 0.
(call stack: D, A)
Alpha-Beta Search Example
back to minimax(A,0,4): alpha doesn't change, since 0 < 3 (maximizing).
Keep expanding A? Yes: A has more successors, and there is no cutoff test for the root.
(call stack: A)
Alpha-Beta Search Example
How does the algorithm finish searching the tree?
(call stack: A)
Alpha-Beta Search Example
Expanding E: child K (max) expands R = 0, S = 3, T = 5, so K returns 5 and
E's beta becomes 5; then terminal child L = 2 lowers E's beta to 2.
Stop expanding E, since A's alpha (3) >= E's beta (2) is true: alpha cutoff.
Why? The smart opponent will choose L or worse, so E's upper bound is 2,
and the computer already has the better move at C.
E's remaining subtree (M, with children U and V) is pruned.
(call stack: A)
Alpha-Beta Search Example
Result: the computer chooses the move to C.
[Figure: the finished tree; green: terminal states, red: pruned states,
blue: non-terminal state at the depth limit.]
Game Playing
Chapter 6
• Alpha-Beta Effectiveness
• Other Issues
• Linear Evaluation Functions
• Non-Deterministic Games
• Case Studies
Alpha-Beta Effectiveness
Effectiveness depends on the order in which successors are examined.
What ordering gives more effective pruning?
  pruning is more effective if the best successors are examined first
  Best case: each player's best move is examined first (left-most)
  Worst case: ordered so that no pruning occurs; no improvement over exhaustive search
In practice, performance is closer to the best case than to the worst case.
Alpha-Beta Effectiveness
If the opponent's best move were examined first, more pruning would result:
[Figure: the E subtree from the example, searched in two orders under A
(alpha = 3). As searched above, K is expanded fully (alpha = 5) before
L = 2 gives the cutoff at E (beta = 2). If L = 2 were examined first, E's
beta would drop to 2 immediately, and the cutoff would occur before K's
or M's subtrees were expanded at all.]
Alpha-Beta Effectiveness
In practice alpha-beta often gives O(b^(d/2)) rather than O(b^d),
the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2).
For example, chess:
  goes from b ~ 35 to b ~ 6
  permits a much deeper search in the same time
  makes computer chess competitive with humans
Other Issues: Dealing with Limited Time
In real games, there is usually a time limit T on making a move.
How do we take this into account?
  we cannot stop alpha-beta midway and expect to use its results with any confidence
  so we could set a conservative depth limit that guarantees
  we will find a move in time < T
  but then the search may finish early,
  and the opportunity to do more search is wasted
Other Issues: Dealing with Limited Time
In practice, iterative deepening is used:
  run alpha-beta search with an increasing depth limit
  when the clock runs out, use the solution found
  by the last completed alpha-beta search
  (i.e., the deepest search that was completed)
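A minimal sketch of this loop, assuming a hypothetical alphaBetaRootSearch wrapper that runs the depth-limited search and returns the best root move, and a Move type; neither is from the slides. A real implementation would also abort the in-progress search the moment time expires:

Move iterativeDeepening(Node root, long timeLimitMillis) {
    long start = System.currentTimeMillis();
    Move best = null;
    for (int limit = 1; ; limit++) {
        Move candidate = alphaBetaRootSearch(root, limit);   // depth-limited search
        if (System.currentTimeMillis() - start >= timeLimitMillis)
            break;            // out of time: discard the possibly rushed result
        best = candidate;     // this depth completed in time; keep its move
    }
    return best;              // move from the deepest search that completed
}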
Other Issues: The Horizon Effect
Sometimes disaster lurks just beyond the search depth:
e.g., the computer captures the queen,
but a few moves later the opponent checkmates.
The computer has a limited horizon; it cannot see
that this significant event could happen.
How do you avoid catastrophic losses due to "short-sightedness"?
  quiescence search
  secondary search
Other Issues: The Horizon Effect
Quiescence Search
  when the SBE value is changing frequently, look deeper than the limit,
  looking for the point when the game "quiets down"
Secondary Search
  1. find the best move looking to depth d
  2. look k steps beyond to verify that it still looks good
  3. if it doesn't, repeat step 2 for the next best move
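A hedged sketch of quiescence search in the style of the slides' minimax code; isQuiet is an assumed game-specific test (e.g., no captures pending), and a practical version would also cap how far past the nominal limit the search may run:

int quiescence(Node s, int depth, int limit) {
    if (isTerminal(s) || (depth >= limit && isQuiet(s)))
        return staticEvaluation(s);        // evaluate only "quiet" positions
    Vector v = new Vector();               // same container style as the minimax slide
    while (s.hasMoreSuccessors())          // keep searching past the limit until quiet
        v.addElement(quiescence(s.getNextSuccessor(), depth + 1, limit));
    return isComputersTurn(s) ? maxOf(v) : minOf(v);
}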
Other Issues: Book Moves
Build a database of opening moves, end games, and studied configurations.
If the current state is in the database, use the database:
  to determine the next move
  to evaluate the board
Otherwise, do alpha-beta search.
Linear Evaluation Functions
The static board evaluation function estimates how good the current board
configuration is for the computer:
  it is a heuristic function of the board's features, i.e., function(f1, f2, f3, …, fn)
  the features are numeric characteristics:
    feature 1, f1, is the number of white pieces
    feature 2, f2, is the number of black pieces
    feature 3, f3, is f1/f2
    feature 4, f4, is an estimate of the "threat" to the white king
    etc.
Linear Evaluation Functions
A linear evaluation function of the features is a weighted sum of f1, f2, f3, …:
  w1 * f1 + w2 * f2 + w3 * f3 + … + wn * fn
  where f1, f2, …, fn are the features
  and w1, w2, …, wn are the weights
More important features get more weight.
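A minimal sketch of the weighted sum; extracting the feature values is game-specific and assumed to be done by the caller:

double linearSBE(double[] features, double[] weights) {
    double value = 0.0;
    for (int i = 0; i < features.length; i++)
        value += weights[i] * features[i];   // w1*f1 + w2*f2 + ... + wn*fn
    return value;
}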
Linear Evaluation Functions
The quality of play depends directly on the quality of the evaluation function.
To build an evaluation function we have to:
  1. construct good features using expert domain knowledge
  2. pick or learn good weights
Linear Evaluation Functions
How could we learn these weights?
Basic idea: play lots of games against an opponent,
and for every move (or game) look at the
  error = true outcome - evaluation function
  if the error is positive (underestimating),
  adjust the weights to increase the evaluation function
  if the error is zero, do nothing
  if the error is negative (overestimating),
  adjust the weights to decrease the evaluation function
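A hedged sketch of one such update, a simple gradient-style rule: the slides describe only the direction of the adjustment, so the learning-rate parameter and the per-feature scaling here are illustrative assumptions (linearSBE is the weighted sum sketched earlier):

void updateWeights(double[] weights, double[] features,
                   double trueOutcome, double learningRate) {
    double error = trueOutcome - linearSBE(features, weights);
    for (int i = 0; i < weights.length; i++)
        weights[i] += learningRate * error * features[i];
    // error > 0 (underestimating): the evaluation is nudged up
    // error < 0 (overestimating): the evaluation is nudged down
}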
Non-Deterministic Games
How do some games involve chance?
  roll of dice
  spin of a game wheel
  deal of cards from a shuffled deck
How can we handle games of chance?
The game tree representation is extended to include chance nodes:
  1. computer moves
  2. chance nodes
  3. opponent moves
Non-Deterministic Games
e.g., extended game tree representation:
[Figure: A (max, alpha = -infinity) has two 50/50 chance nodes as children;
each chance node has two equally likely (.5/.5) min-node children,
with backed-up values 2 (from terminals 7, 2) and 6 (from 9, 6) under the
left chance node, and 0 (from 5, 0) and -4 (from 8, -4) under the right.]
Non-Deterministic Games
Weight each score by the probability that the move occurs.
Use the expected value for a move: the probability-weighted sum
over the possible random outcomes.
[Figure: same tree; left 50/50 chance node = .5 * 2 + .5 * 6 = 4,
right 50/50 chance node = .5 * 0 + .5 * (-4) = -2.]
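A hedged sketch of the expected-value backup at a chance node, again in the slides' pseudocode style; ChanceNode, Outcome, and their accessors are assumed types, and minimax is the routine from the earlier slide:

double expectedValue(ChanceNode c, int depth, int limit) {
    double ev = 0.0;
    for (Outcome o : c.outcomes())        // each possible random result
        ev += o.probability()
              * minimax(o.resultingState(), depth + 1, limit);
    return ev;   // e.g. .5 * 2 + .5 * 6 = 4 for the left 50/50 node above
}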
Non-Deterministic Games
Choose the move with the highest expected value.
[Figure: same tree; A (max) takes alpha = 4 from the left 50/50 chance node,
since 4 > -2.]
Non-Deterministic Games
Non-determinism increases the branching factor:
  e.g., 21 possible rolls with 2 dice
The value of look-ahead diminishes: as depth increases,
the probability of reaching a given node decreases,
and alpha-beta pruning is less effective.
TDGammon:
  depth-2 search
  very good heuristic
  plays at world-champion level
Case Studies: Learned to Play Well
Checkers:
  A. L. Samuel, "Some Studies in Machine Learning Using the Game of
  Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959
Learned by playing thousands of times against a copy of itself.
Used only an IBM 704 with 10,000 words of RAM, magnetic tape,
and a clock speed of 1 kHz.
Successful enough to compete well at human tournaments.
Case Studies: Learned to Play Well
Backgammon:
  G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play
  Backgammon," Artificial Intelligence, 39(3):357-390, 1989
Also learns by playing against copies of itself.
Uses a non-linear evaluation function: a neural network.
Rates among the top three players in the world.
Case Studies: Playing Grandmaster Chess
"Deep Blue" (IBM)
  parallel processor, 32 nodes
  each node has 8 dedicated VLSI "chess chips"
  can search 200 million configurations/second
  uses minimax, alpha-beta, and sophisticated heuristics
  in 2001 searched to 14 ply (i.e., 7 pairs of moves)
  can avoid the horizon effect by searching as deep as 40 ply
  uses book moves
Case Studies: Playing Grandmaster Chess
Kasparov vs. Deep Blue, May 1997
  six-game full-regulation match sponsored by ACM
  Kasparov lost the match 2.5 to 3.5 (1 win, 2 losses, 3 draws)
This was a historic achievement for computer chess: the first time
a computer became the best chess player on the planet.
Note that Deep Blue plays by "brute force" (i.e., raw power from computer
speed and memory). It uses relatively little that is similar to human
intuition and cleverness.
Case Studies: Playing Grandmaster Chess
[Figure: chess ratings, 1966-1997, on a scale from 1200 to 3000: computer
programs, Deep Thought and then Deep Blue, climb toward and finally reach
the level of Garry Kasparov (then World Champion), near 2800.]
Case Studies: Other Deterministic Games
Checkers/Draughts
  world champion is Chinook
  beats any human (beat Tinsley in 1994)
  uses alpha-beta search and book moves (>443 billion)
Othello
  computers easily beat world experts
Go
  branching factor b ~ 360, very large!
  $2 million prize for any system that can beat a world expert
Summary
Game playing is best modeled as a search problem.
Search trees for games represent alternating computer/opponent moves.
Evaluation functions estimate the quality of a given board configuration
for each player:
  - good for the opponent
  0 neutral
  + good for the computer
Summary
Minimax is a procedure that chooses moves by assuming that the opponent
always chooses their best move.
Alpha-beta pruning is a procedure that can eliminate large parts of the
search tree, thus enabling the search to go deeper.
For many well-known games, computer algorithms using heuristic search
can match or out-perform human world experts.
Conclusion
Game playing was initially thought to be a good area for AI research.
But brute force has proven to be better than a lot of knowledge engineering:
  more high-speed hardware issues than AI issues
  simplifying the AI part enabled scaling up of the hardware
Game playing is a good test-bed for machine learning.
Perhaps machines don't have to think like us?