Games
CPSC 386 Artificial Intelligence
Ellen Walker
Hiram College
Why Games?
• Small number of rules
• Well-defined knowledge set
• Easy to evaluate performance
• Large search spaces (too large for exhaustive search)
• Fame & Fortune, e.g. Chess
Example Games & Best Computer Players (sec. 6.6 w/updates)
• Chess - Deep Blue (beat Kasparov); Deep Junior (tied Kasparov); Hydra (scheduled to play British champion for 80,000 pounds)
• Checkers - Chinook (world champion)
• Go - Goemate, Go4++ (rated “weak amateur”)
• Othello - Iago (world championship level), Logistello (defeated world champion, now retired)
• Backgammon - TD-Gammon (neural network that learns to play using “reinforcement learning”)
Properties of Games
• Two-Player
• Zero-sum
– If it’s good for one player, it’s bad for the opponent
and vice versa
• Perfect information
– All relevant information is apparent to both players
(no hidden cards)
Game as Search Problem
– State space search
• Each potential board or game position is a state
• Each possible move is an operation
• Space can be BIG:
– large branching factor (chess avg. 35)
– deep search to the end of a game (chess avg. 50 ply)
– Components of any search technique
• Move generator (successor function)
• Terminal test (end of game?)
• Utility function (win, lose or draw?)
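
A minimal Python skeleton of these three components (the function names and the +1/-1/0 convention are illustrative placeholders; each game supplies its own definitions):

# Skeleton of the components any game-search technique needs.
# A "state" is one board/game position; each game fills these in.

def successors(state, player):
    """Move generator: return the states reachable by one move of `player`."""
    raise NotImplementedError

def is_terminal(state):
    """Terminal test: has the game ended?"""
    raise NotImplementedError

def utility(state):
    """Utility: e.g. +1 for a MAX win, -1 for a MIN win, 0 for a draw."""
    raise NotImplementedError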
Game Tree
• Root is initial state
• Next level is all of first player’s moves
• Next level is all of second player’s moves
• Example: Tic Tac Toe
– Root: 9 blank squares
– Level 1: 3 different boards (corner, center and edge X)
– Level 2 below center: 2 different boards (corner, edge)
– Etc.
• Utility function: win for X is 1, win for O is -1
– X is Maximizer, O is minimizer
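
A concrete sketch of those ingredients for Tic Tac Toe, encoding a board as a 9-character string (the encoding and the names are illustrative, not from the slides):

# Tic Tac Toe: board is a 9-char string indexed 0-8, ' ' = blank square.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def utility(board):
    """+1 if X (the maximizer) has won, -1 if O (the minimizer) has won, else 0."""
    return {'X': 1, 'O': -1, None: 0}[winner(board)]

def successors(board, player):
    """Move generator: one child board per blank square."""
    for i, square in enumerate(board):
        if square == ' ':
            yield board[:i] + player + board[i + 1:]

root = ' ' * 9                                  # root: 9 blank squares
print(sum(1 for _ in successors(root, 'X')))    # level 1 has 9 children (3 distinct up to symmetry)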
Minimax Strategy
• Max’s goal: get to 1
• Min’s goal: get to -1
• Max’s strategy
– Choose moves that will lead to a win, even though min is
trying to block
• Minimax value of a node (backed up value):
– If N is terminal, use the utility value
– If N is a Max move, take max of successors
– If N is a Min move, take min of successors
Minimax Values: 2-Ply Example
[Figure: a 2-ply game tree with utility values at the leaves and backed-up minimax values at the MIN and MAX nodes above them]
Minimax Algorithm
• Depth-first search to bottom of tree
• As search “unwinds”, compute backed up
values
• Backed-up value of the root determines which move to take.
• Assumes:
– Both players are playing this strategy (optimally)
– Tree is small enough to search completely
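
A sketch of the algorithm on a toy tree, where a state is either an integer (a terminal utility) or a tuple of child states; the tree and its values below are made up for illustration, not the figure above:

def minimax_value(state, maximizing):
    """Backed-up value: utility at a terminal node, else max (MAX to move)
    or min (MIN to move) of the successors' backed-up values."""
    if isinstance(state, int):               # terminal test for this toy encoding
        return state                         # utility of a terminal state
    values = [minimax_value(child, not maximizing) for child in state]
    return max(values) if maximizing else min(values)

def minimax_decision(state):
    """Choose the root move (child index) with the highest backed-up value."""
    return max(range(len(state)), key=lambda i: minimax_value(state[i], False))

# 2-ply example: MAX at the root, MIN at the middle level, utilities at the leaves.
tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
print(minimax_value(tree, True), minimax_decision(tree))   # -> 3 0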
Alpha-Beta Pruning
• We don’t really have to look at all subtrees!
• Recognize when a position can never be
chosen in minimax no matter what its children
are
– Max (3, Min(2,x,y) …) is always ≥ 3
– Min (2, Max(3,x,y) …) is always ≤ 2
– We know this without knowing x and y!
Alpha-Beta Pruning
• Alpha = the value of the best choice we’ve
found so far for MAX (highest)
• Beta = the value of the best choice we’ve
found so far for MIN (lowest)
• When maximizing, cut off values lower than
Alpha
• When minimizing, cut off values greater than
Beta
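
A sketch of minimax with alpha-beta cutoffs, using the same toy encoding as before (integers are terminal utilities, tuples are internal nodes); the example values are illustrative:

import math

def alphabeta(state, maximizing, alpha=-math.inf, beta=math.inf):
    """Backed-up value with alpha-beta pruning."""
    if isinstance(state, int):                # terminal: return its utility
        return state
    if maximizing:
        value = -math.inf
        for child in state:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)         # best choice found so far for MAX
            if value >= beta:                 # MIN above would never allow this node
                break
        return value
    else:
        value = math.inf
        for child in state:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)           # best choice found so far for MIN
            if value <= alpha:                # MAX above would never choose this node
                break
        return value

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
print(alphabeta(tree, True))                  # -> 3, without examining every leaf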
Alpha-Beta Example
[Figure: alpha-beta search on an example game tree; subtrees that are pruned are marked x, and one MIN node is cut off with a backed-up value <= 1]
Notes on Alpha-Beta Pruning
• Effectiveness depends on order of
successors (middle vs. last node of 2-ply
example)
• If we can evaluate the best successor first, search is O(b^(d/2)) instead of O(b^d)
• This means that in the same amount of time,
alpha-beta search can search twice as deep!
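
One common way to get a good ordering is to sort successors by a cheap static estimate before recursing; a sketch (order_children and estimate are illustrative names, meant to be plugged into the alpha-beta loop above):

def order_children(children, maximizing, estimate):
    """Examine apparently-best successors first so alpha-beta cuts off sooner."""
    return sorted(children, key=estimate, reverse=maximizing)

# e.g. in alphabeta, iterate over order_children(state, maximizing, estimate)
# instead of the raw tuple of children.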
Optimizing Minimax Search
• Use alpha-beta cutoffs
– Evaluate most promising moves first
• Remember prior positions, reuse their backed-up
values
– Transposition table (like closed list in A*)
• Avoid generating equivalent states (e.g. 4 different
first corner moves in tic tac toe)
• But, we still can’t search a game like chess to the
end!
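
A sketch of a transposition table as a dictionary keyed on a canonical form of the position (canonical(), standing in for symmetry reduction such as folding the 4 corner openings together, is an assumption):

transposition = {}   # (canonical position, side to move) -> backed-up value

def canonical(state):
    """Placeholder: map equivalent states (e.g. rotations/reflections) to one key."""
    return state

def cached_value(state, maximizing, compute):
    """Reuse a previously backed-up value instead of searching the subtree again."""
    key = (canonical(state), maximizing)
    if key not in transposition:
        transposition[key] = compute(state, maximizing)
    return transposition[key]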
When you can’t search to the end
• Replace terminal test (end of game) by cutoff
test (don’t search deeper)
• Replace utility function (win/lose/draw) by
heuristic evaluation function that estimates
results on the best path below this board
– Like A* search, good evaluation functions mean
good results (and vice versa)
• Replace move generator by plausible move
generator (don’t consider “dumb” moves)
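
Putting those three replacements together, a depth-limited sketch (the game object and its evaluate and plausible_moves helpers are assumptions standing in for the heuristic evaluation function and the plausible move generator):

def h_minimax(state, maximizing, depth, game, max_depth):
    """Minimax with a cutoff test and a heuristic evaluation at the frontier."""
    if game.is_terminal(state):
        return game.utility(state)             # true win/lose/draw value
    if depth >= max_depth:                     # cutoff test replaces terminal test
        return game.evaluate(state)            # heuristic estimate replaces utility
    children = game.plausible_moves(state, maximizing)   # skip "dumb" moves
    values = [h_minimax(child, not maximizing, depth + 1, game, max_depth)
              for child in children]
    return max(values) if maximizing else min(values)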
Good evaluation functions…
• Order terminal states in the same order as
the utility function
• Don’t take too long (we want to search as
deep as possible in limited time)
• Should be as accurate as possible (estimate
chances of winning from that position…)
– Human knowledge (e.g. material value)
– Known solution (e.g. endgame)
– Pre-searched examples (take features, average
value of endgame of all games with that feature)
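
A sketch of the material-value idea for a chess-like game, using the conventional piece values (the board encoding here is an assumption):

# Conventional material values; the king is not counted, since checkmate is
# handled by the terminal test rather than the evaluation function.
PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def material_evaluation(pieces):
    """Estimate a position as (MAX's material) - (MIN's material).
    `pieces` is a list of (piece_letter, owner) pairs, owner in {'MAX', 'MIN'}."""
    score = 0
    for piece, owner in pieces:
        score += PIECE_VALUE[piece] if owner == 'MAX' else -PIECE_VALUE[piece]
    return score

# MAX is up a knight -> positive estimate for the maximizer.
print(material_evaluation([('Q', 'MAX'), ('N', 'MAX'), ('Q', 'MIN')]))   # -> 3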
How Deep to Search?
• Until time runs out (the original application of Iterative
Deepening!)
• Until values don’t seem to change (quiescence)
• Deep enough to avoid the horizon effect (a delaying tactic that pushes the inevitable beyond the depth of the search)
• Singular extensions - search best (apparent) paths
deeper than others
– Tends to limit horizon effect, since these are the moves that
will exhibit it
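
A sketch of “search until time runs out” with iterative deepening, keeping the move from the deepest iteration that finished (the time budget and the depth-limited search function passed in are assumptions):

import time

def iterative_deepening_decision(state, depth_limited_decision, time_budget=1.0):
    """Run depth-limited search at depth 1, 2, 3, ... until time runs out,
    returning the move chosen by the deepest completed iteration."""
    deadline = time.monotonic() + time_budget
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        best_move = depth_limited_decision(state, depth)   # e.g. alpha-beta cut off at `depth`
        depth += 1
    return best_move

This sketch only checks the clock between iterations; a real player would also abort the iteration in progress when the deadline hits and fall back to the last completed result.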