Adversarial Search


Chapter 6, Sec. 1 - 4
Outline
Optimal decisions
α-β pruning
Imperfect, real-time decisions
Games vs. search problems
Competitive multiagent environments, in which the agents' goals are in
conflict, give rise to adversarial search problems -- games.
Games in AI: deterministic, turn-taking, 2-player, zero-sum games of
perfect information – i.e., deterministic, fully observable environments in which
there are two agents whose actions must alternate and in which the utility values at the
end of the game are always equal and opposite.
"Unpredictable" opponent  solution is a strategy specifying a move for
every possible opponent reply.
Time limits  unlikely to find goal, must approximate.
–
–
–
–
how to make the best possible use of time?
The optimal move and an algorithm for finding it
the techniques for choosing a good move within the time limits
Pruning allows us to ignore portions of the search tree that make no difference
to the final choice
– Heuristic evaluation functions allow us to approximate the true utility
of a state without doing a complete search.
Types of games
                        | deterministic                 | chance
Perfect information     | chess, checkers, go, othello  | backgammon, monopoly
Imperfect information   |                               | bridge, poker, scrabble, nuclear war
Optimal decisions in games
A game can be formally defined as a kind of search problem with
the following components:
– Initial state: includes the board position and identifies the player to
move
– Successor function: returns a list of (move, state) pairs, each
indicating a legal move and resulting state.
– Terminal test: determines when the game is over – terminal states.
– Utility function (also called an objective function or payoff function):
gives a numeric value for the terminal states, e.g., +x, -x, or 0.
A game tree for tic-tac-toe:
– Play alternates between MAX's placing an X and MIN's placing an O until
we reach leaf nodes corresponding to terminal states.
– The # on each leaf node: the utility value of the terminal state from the
point of view of MAX.
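To make the components concrete, here is a minimal Python sketch of tic-tac-toe cast in exactly these terms (the names and encoding are illustrative, not from the slides):

```python
# A minimal formalization of tic-tac-toe (illustrative names, not from the slides).
from typing import List, Optional, Tuple

Board = Tuple[Optional[str], ...]   # 9 cells: 'X', 'O', or None

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def initial_state() -> Tuple[Board, str]:
    """Empty board; MAX ('X') is the player to move."""
    return (None,) * 9, 'X'

def successors(board: Board, player: str) -> List[Tuple[int, Board]]:
    """Successor function: all legal (move, resulting state) pairs."""
    return [(i, board[:i] + (player,) + board[i + 1:])
            for i in range(9) if board[i] is None]

def winner(board: Board) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(board: Board) -> bool:
    return winner(board) is not None or all(cell is not None for cell in board)

def utility(board: Board) -> int:
    """Terminal value from MAX's point of view: +1 win, -1 loss, 0 draw."""
    return {'X': 1, 'O': -1}.get(winner(board), 0)
```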
Continued... Optimal decisions
Cf.) In a normal search problem, the optimal solution would be a sequence
of moves leading to a goal state with minimum path cost.
In a game, a player must instead find a contingent strategy, specifying a move
for every possible response by the opponent.
An optimal strategy leads to outcomes at least as good as any other
strategy when one is playing an infallible opponent.
In a game tree of tic-tac-toe, the optimal strategy can be determined by
examining the minimax value of each node, which is the utility (for MAX)
of being in the corresponding state, assuming that both players play
optimally from there to the end.
MINIMAX-VALUE(n) =
    UTILITY(n)                                   if n is a terminal state
    max_{s ∈ Successors(n)} MINIMAX-VALUE(s)     if n is a MAX node
    min_{s ∈ Successors(n)} MINIMAX-VALUE(s)     if n is a MIN node
Given a choice, MAX prefers to move to a state of maximum value,
whereas MIN prefers a state of minimum value.
Game tree
(2-player, deterministic, turns)
The top node: the initial state.
Alternating moves by MAX (X) and MIN (O) follow until we eventually reach terminal states,
which can be assigned utilities according to the rules of the game.
Minimax algorithm
Compute the minimax decision from the current state.
Perfect play for deterministic, perfect-information games.
Idea: choose move to position with highest minimax value
= best achievable payoff against best play
E.g., a 2-ply game.
Minimax algorithm
-- the minimax decision from the current state
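A compact Python sketch of the minimax decision, assuming the successors, terminal_test, and utility functions from the illustrative tic-tac-toe sketch above (MAX plays 'X' and is assumed to move at the root):

```python
# Minimax decision over the illustrative tic-tac-toe interface above.
def max_value(board) -> int:
    if terminal_test(board):
        return utility(board)
    return max(min_value(s) for _, s in successors(board, 'X'))

def min_value(board) -> int:
    if terminal_test(board):
        return utility(board)
    return min(max_value(s) for _, s in successors(board, 'O'))

def minimax_decision(board) -> int:
    """Return the move (cell index) with the highest minimax value for MAX."""
    return max(successors(board, 'X'), key=lambda ms: min_value(ms[1]))[0]
```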
Properties of minimax
Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)
The # of game states it has to examine is exponential in
the # of moves.
For chess, b ≈ 35, m ≈ 100 for "reasonable" games
⇒ exact solution completely infeasible
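A quick magnitude check of that infeasibility claim, using the slide's numbers:

```python
# Size of an exhaustive chess search: about 35^100 positions.
import math
print(int(100 * math.log10(35)) + 1)   # 155 -> 35^100 is a 155-digit number
```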
α-β pruning
Compute the correct minimax decision w/o looking at every node
in the game tree => reduce the # of states examined.
Prunes away branches of the game tree that can’t possibly
influence the final decision.
General principle:
Consider a node n somewhere in the tree, s.t. Player has a choice
of moving to that node. If Player has a better choice m either at
the parent node of n or at any choice point further up, then n will
never be reached in actual play, so the subtree below n can be pruned.
α-β pruning example
(five figures step through the pruning of a game tree)
Properties of α-β pruning
Pruning does not affect final result
Good move ordering improves the effectiveness of pruning
-- effectiveness is highly dependent on the order in which successors are examined.
It might be worthwhile to try to examine first the successors that are
likely to be best.
With "perfect ordering" -- best-first examination of successors --
time complexity = O(b^(m/2))
⇒ effective branching factor √b
⇒ doubles the depth of search within the same amount of time
⇒ can easily reach depth 8 and play good chess.
With random ordering of successors, the total # of nodes examined will be
roughly O(b^(3m/4)).
A simple example of the value of reasoning about which
computations are relevant (a form of metareasoning)
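The depth-doubling claim can be sanity-checked with the node budget from the resource-limits slide below (10^6 nodes, b = 35; values assumed from the slides):

```python
# Depth reachable within a ~10^6-node budget, b = 35:
# plain minimax explores O(b^m) nodes; perfectly ordered alpha-beta O(b^(m/2)).
import math
budget, b = 1e6, 35
print(math.log(budget, b))       # ~3.9 -> about depth 4 without pruning
print(2 * math.log(budget, b))   # ~7.8 -> about depth 8 with perfect ordering
```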
Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at
any choice point along the path for MAX
β is the value of the best (i.e., lowest-value) choice found so far at
any choice point along the path for MIN.
If the value v of a node is worse than α, MAX will avoid it ⇒ prune that branch;
symmetrically, MIN prunes nodes whose value is worse than β.
The α-β algorithm
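A minimal Python sketch of the algorithm, written against the illustrative tic-tac-toe interface from earlier (MAX plays 'X'):

```python
# Alpha-beta search over the illustrative tic-tac-toe interface above.
# alpha: value of the best choice found so far for MAX along the path;
# beta:  value of the best choice found so far for MIN along the path.
import math

def ab_max_value(board, alpha, beta):
    if terminal_test(board):
        return utility(board)
    v = -math.inf
    for _, s in successors(board, 'X'):
        v = max(v, ab_min_value(s, alpha, beta))
        if v >= beta:            # MIN above would never allow this node -> prune
            return v
        alpha = max(alpha, v)
    return v

def ab_min_value(board, alpha, beta):
    if terminal_test(board):
        return utility(board)
    v = math.inf
    for _, s in successors(board, 'O'):
        v = min(v, ab_max_value(s, alpha, beta))
        if v <= alpha:           # MAX above already has a better choice -> prune
            return v
        beta = min(beta, v)
    return v

def alphabeta_decision(board):
    """Same move as minimax_decision, but examines far fewer nodes."""
    return max(successors(board, 'X'),
               key=lambda ms: ab_min_value(ms[1], -math.inf, math.inf))[0]
```

The only difference from plain minimax is the (α, β) window: a MAX node returns early once v ≥ β, and a MIN node once v ≤ α, because the player above would never let play reach such a node.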
Resource limits
Suppose we have 100 seconds and can explore 10^4 nodes/sec
⇒ 10^6 nodes per move
Alpha-beta pruning still has to search all the way to terminal
states for at least a portion of the search space.
Standard approach:
– cutoff test:
Cut off the search earlier and decide when to apply
evaluation function, turning nonterminal nodes into terminal
leaves.
e.g., depth limit (perhaps add quiescence search)
– evaluation function
= estimated desirability (utility) of a position
Evaluation functions
Returns an estimate of the expected utility of the game from a position.
Requirements
– The eval-fn should order the terminal states in the same way as the true utility
function; otherwise, an agent using it might select suboptimal moves
even if it can see ahead all the way to the end of the game.
– The computation must not take too long.
– For nonterminal states, the eval-fn should be strongly correlated with the actual
chances of winning => it makes a guess about the final outcome.
The features calculated by the eval-fn define categories of states: the states in
each category have the same values for all the features.
Each category can then be summarized by a single value reflecting the proportion of
states with each outcome
– the weighted average or expected value.
With an expected value determined for each category, the eval-fn works for any state.
– But it is too much work to estimate the probabilities of winning for so many categories.
Instead, compute separate numerical contributions from each feature and then combine
them to find the total value -- a weighted linear function:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
Evaluation functions
For chess, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black queens), etc.
The features and their weights come from human chess-playing experience.
The weights of the eval-fn can also be estimated by machine learning
techniques.
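A sketch of such an evaluation function (the one-character piece encoding is hypothetical; w1 = 9 is the slide's queen weight, and the remaining weights are the standard material values):

```python
# Weighted linear evaluation, Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Board encoding is hypothetical: a sequence of piece chars, uppercase = White.
def material_diff(board, piece):
    """f_i(s): White's count of `piece` minus Black's."""
    return (sum(sq == piece for sq in board)
            - sum(sq == piece.lower() for sq in board))

WEIGHTED_FEATURES = [
    (9.0, lambda b: material_diff(b, 'Q')),   # queens (slide's w1 = 9)
    (5.0, lambda b: material_diff(b, 'R')),   # rooks
    (3.0, lambda b: material_diff(b, 'B')),   # bishops
    (3.0, lambda b: material_diff(b, 'N')),   # knights
    (1.0, lambda b: material_diff(b, 'P')),   # pawns
]

def eval_fn(board) -> float:
    """Estimated utility of a nonterminal position, from White's (MAX's) view."""
    return sum(w * f(board) for w, f in WEIGHTED_FEATURES)
```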
Cutting off search
Modify α-β-SEARCH so that it calls the Eval-fn when it cuts
off the search.
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval.
E.g.) Apply a depth limit => the amount of time used
won't exceed what the rules of the game allow.
Does it work in practice?
b^m = 10^6, b = 35 ⇒ m = 4
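A minimal sketch of the cutoff version, applying the two substitutions above to the alpha-beta sketch from earlier (the depth limit is illustrative, and eval_fn stands for any evaluation function suited to the game, such as the weighted linear one above):

```python
# Depth-limited alpha-beta: Cutoff? replaces Terminal?, Eval replaces Utility.
# Assumes the game interface sketched earlier and some suitable eval_fn.
import math

DEPTH_LIMIT = 4   # with b = 35, 35^4 is roughly the 10^6-node budget

def cutoff_test(board, depth):
    return depth >= DEPTH_LIMIT or terminal_test(board)

def ab_max_cutoff(board, depth, alpha, beta):
    if cutoff_test(board, depth):
        return eval_fn(board)        # Eval in place of Utility at the frontier
    v = -math.inf
    for _, s in successors(board, 'X'):
        v = max(v, ab_min_cutoff(s, depth + 1, alpha, beta))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v

def ab_min_cutoff(board, depth, alpha, beta):
    if cutoff_test(board, depth):
        return eval_fn(board)
    v = math.inf
    for _, s in successors(board, 'O'):
        v = min(v, ab_max_cutoff(s, depth + 1, alpha, beta))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v
```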
Deterministic games in practice
Checkers:
Chinook ended the 40-year reign of human world champion Marion Tinsley
in 1994. Used a precomputed endgame database defining perfect play
for all positions involving 8 or fewer pieces on the board, a total of 444
billion positions.
Chess:
Deep Blue defeated human world champion Garry Kasparov in a six-game
match in 1997. Deep Blue searches 200 million positions per second, uses
very sophisticated evaluation, and undisclosed methods for extending some
lines of search up to 40 ply.
Othello:
human champions refuse to compete against computers, who are too
good.
Go:
human champions refuse to compete against computers, who are too
bad. In go, b > 300, so most programs use pattern knowledge bases to
suggest plausible moves.
Summary
Games are fun to work on!
They illustrate several important points about AI
– perfection is unattainable ⇒ must approximate
– good idea to think about what to think about