CS 5368: Artificial Intelligence
Fall 2010
Search Part 4
9/13/2011
Mohan Sridharan
Slides adapted from Dan Klein
This Lecture
 Adversarial search!
Adversarial Search
 Pacman is adversarial by definition.
 Agents have to contend with ghosts.
Game Playing
 Many different kinds of games!
 Axes:
 Deterministic or stochastic?
 One, two, or more players?
 Perfect information (can you see the state)?
 Need algorithms for computing a strategy (policy) that suggests a move in each state.
Deterministic Games
 Many possible formalizations, one is:
 States: S (start at s0)
 Players: P = {1...N} (usually take turns)
 Actions: A (may depend on player / state)
 Transition function: S × A → S
 Terminal test: S → {t, f}
 Terminal utilities: S × P → R
 Solution for a player is a policy: S → A
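As a rough illustration, the components above map onto a small Python interface; the names here (Game, initial_state, to_move, and so on) are hypothetical, chosen to mirror the formalization rather than taken from the slides:

class Game:
    # Hypothetical interface mirroring the formalization above.
    def initial_state(self): ...       # s0 in S
    def to_move(self, s): ...          # which player in P = {1..N} acts in s
    def actions(self, s): ...          # A; may depend on player / state
    def result(self, s, a): ...        # transition function: S × A → S
    def is_terminal(self, s): ...      # terminal test: S → {t, f}
    def utility(self, s, p): ...       # terminal utilities: S × P → R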
Deterministic Single-Player?
 Deterministic single-player, perfect information:
 Know the rules and action results.
 Know when you win.
 Freecell, 8-Puzzle, Rubik's cube.
 Just search!
 Slight reinterpretation:
 Each node stores a value: the best outcome it can reach.
 This is the maximal outcome of its children.
 No path sums (utilities at end).
 After search, can pick the move that leads to the best node.
[Figure: search tree whose leaves are labeled lose, win, lose.]
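As a sketch, the value-backup view of single-player search, using the hypothetical Game interface above with a single maximizing player:

def best_value(game, state, player=1):
    # A node's value is the best outcome it can reach:
    # the max over its children's values; utilities only at terminals.
    if game.is_terminal(state):
        return game.utility(state, player)
    return max(best_value(game, game.result(state, a), player)
               for a in game.actions(state))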
Deterministic Two-Player
 Tic-tac-toe, chess, checkers.
 Zero-sum games:
 One player maximizes the result.
 The other minimizes the result.
 Chess!
 Minimax search:
 A state-space search tree.
 Players take turns.
 Each layer, or ply, consists of a round of moves*.
 Choose the move to the position with the highest minimax value = best achievable utility against best play.
[Figure: minimax tree alternating max and min layers; leaf values 8, 2, 5, 6.]
* Slightly different from the book definition.
Tic-tac-toe Game Tree
Minimax Example
Minimax Basics
 Minimax(n) is the utility for MAX of being in state n, assuming both players play optimally from there to the end of the game!
 Minimax of a terminal state is its utility:

Minimax(s) =
  Utility(s)                                    if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s, a))    if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s, a))    if Player(s) = MIN
 Simple recursive search computation and value backups
(Figure 5.3).
Minimax Search
function Minimax-Decision(state) returns an action
  return argmax_{a ∈ Actions(state)} Min-Value(Result(state, a))
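In Python, a minimal minimax sketch along these lines, again using the hypothetical Game interface from earlier; utilities are taken from MAX's point of view, which suffices for zero-sum games:

def minimax(game, state):
    # Minimax value of `state`, assuming optimal play on both sides.
    if game.is_terminal(state):
        return game.utility(state, 'MAX')
    values = [minimax(game, game.result(state, a)) for a in game.actions(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)

def minimax_decision(game, state):
    # The action leading to the successor with the highest minimax value.
    return max(game.actions(state),
               key=lambda a: minimax(game, game.result(state, a)))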
Minimax Properties
 Optimal against a perfect player. Otherwise?
 Time complexity? O(b^m)
 Space complexity? O(bm)
 For chess, b ≈ 35, m ≈ 100:
 Exact solution is completely infeasible.
 Do we need to explore the whole tree?
[Figure: minimax tree with max and min layers; leaf values 10, 10, 9, 100.]
Resource Limits
 Cannot search to the leaves!
 Depth-limited search:
 Search a limited depth of the tree.
 Replace utilities with an evaluation function for non-terminal positions.
 Guarantee of optimal play is gone!
 More plies make a BIG difference.
 Example:
 Have 100 seconds, can explore 10K nodes/sec.
 Can check 1M nodes per move.
 α-β reaches about depth 8: a decent chess program.
[Figure: depth-limited tree with max and min layers; values below the cutoff are unknown (?).]
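A sketch of the depth-limited variant, with a heuristic eval_fn replacing utilities at the cutoff (same assumed interface as before):

def depth_limited_minimax(game, state, depth, eval_fn):
    # Stop at a fixed depth and estimate non-terminal values with eval_fn.
    if game.is_terminal(state):
        return game.utility(state, 'MAX')
    if depth == 0:
        return eval_fn(state)
    values = [depth_limited_minimax(game, game.result(state, a), depth - 1, eval_fn)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)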
Evaluation Functions
 Function that scores non-terminals.
 Ideal function: returns the utility of the position.
 In practice: typically a weighted linear sum of features:
Eval(s) = w1·f1(s) + w2·f2(s) + ... + wn·fn(s)
 E.g. f1(s) = (num white queens - num black queens).
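As a toy rendering of such a weighted linear sum (the feature and weight here are made up for illustration):

def linear_eval(state, features, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(state) for w, f in zip(weights, features))

# e.g., a single material feature, weighted at 9 points per queen:
# queen_diff = lambda s: s.num_white_queens - s.num_black_queens
# linear_eval(state, features=[queen_diff], weights=[9.0])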
Evaluation for Pacman
Can Pacman Starve?
 Knows score will go up by eating the dot now.
 Knows score will go up just as much by eating the dot later on.
 No point-scoring opportunities after eating the dot.
 Therefore, waiting seems just as good as eating!
Iterative Deepening
Iterative deepening uses DFS as a subroutine:
1. Do a DFS that only searches for paths of length 1 or less (DFS gives up on any path of length 2).
2. If "1" failed, do a DFS that only searches paths of length 2 or less.
3. If "2" failed, do a DFS that only searches paths of length 3 or less.
...and so on.
Do we want to do this for multiplayer games? (A sketch follows.)
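The pattern in a few lines of Python; depth_limited_dfs is a hypothetical helper that returns a solution, or None when the bound is exceeded:

def iterative_deepening(problem):
    # Run depth-limited DFS with an ever-growing depth bound.
    depth = 1
    while True:
        solution = depth_limited_dfs(problem, depth)  # hypothetical helper
        if solution is not None:
            return solution
        depth += 1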
α-β Pruning
 At node n: if the player has a better choice m at the parent node of n, or further up, then n will never be reached!
α-β Pruning
 General configuration:
 α is the best value that MAX can get at any choice point along the current path.
 If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children.
 Define β similarly for MIN.
 Figure 5.7 in textbook.
[Figure: alternating Player / Opponent layers, with node n several plies down.]
α-β Pruning Pseudocode
(See the α-β search pseudocode in Figure 5.7 of the textbook; a Python sketch follows.)
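A compact Python sketch of α-β pruning, in the same style as the earlier minimax sketch; this is an illustration under the same assumed interface, not the textbook's exact pseudocode:

def alpha_beta(game, state, alpha=float('-inf'), beta=float('inf')):
    # Minimax value of `state`, skipping branches that cannot change the answer.
    if game.is_terminal(state):
        return game.utility(state, 'MAX')
    if game.to_move(state) == 'MAX':
        v = float('-inf')
        for a in game.actions(state):
            v = max(v, alpha_beta(game, game.result(state, a), alpha, beta))
            if v >= beta:             # MIN above would never allow this branch
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for a in game.actions(state):
            v = min(v, alpha_beta(game, game.result(state, a), alpha, beta))
            if v <= alpha:            # MAX above would never allow this branch
                return v
            beta = min(beta, v)
        return v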
α-β Pruning Properties
 Pruning has no effect on the final result!
 Good move ordering improves the effectiveness of pruning (Section 5.3.1).
 With "perfect ordering":
 Time complexity drops to O(b^(m/2)).
 Doubles the solvable depth.
 Full search of, e.g., chess is still hopeless.
 A simple example of meta-reasoning: reasoning about which computations are relevant.
Non-Zero-Sum Games
 Similar to minimax:
 Utilities are now tuples.
 Each player maximizes their own entry at each node.
 Propagate (or back up) values from children.
[Figure: three-player game tree with utility tuples (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5).]
Expectimax Search Trees
 What if action results are unknown?
 In solitaire, the next card is unknown.
 In minesweeper, the mine locations.
 In Pacman, the ghosts!
 Can do Expectimax search:
 Chance nodes, like min nodes, except the outcome is uncertain.
 Calculate expected utilities.
 Chance nodes take the average (expectation) of the values of their children.
 Later: formalize the underlying problem as a Markov Decision Process.
[Figure: expectimax tree, a max layer over chance nodes; leaf values 10, 4, 5, 7.]
Maximum Expected Utility
 Why should we average utilities? Why not minimax?
 Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge.
 General principle for decision making.
 Often taken as the definition of rationality.
 We will see this idea over and over in this course!
Reminder: Probabilities
 Random variables and probability distributions.
 Example: traffic on freeway?
 Random variable: T = whether there is traffic.
 Val(T) = {none, light, heavy}
 Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
 Some laws of probability:
 Probabilities always lie between 0 and 1.
 Probabilities over all possible outcomes sum to one.
 As we get more evidence, probabilities may change:
 P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60
 Uncertainty everywhere!
Reminder: Expectations
 We can define a function f(X) of a random variable X.
 The expected value of a function is its average value, weighted by the probability distribution over inputs.
 Example: How long to get to the airport?
 Length of driving time as a function of traffic:
L(none) = 20, L(light) = 30, L(heavy) = 60
 What is my expected driving time?
 P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
 E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)
 E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35.
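The same computation in Python, with the slide's numbers:

# Expected driving time: average of L(t), weighted by P(T = t).
P = {'none': 0.25, 'light': 0.5, 'heavy': 0.25}
L = {'none': 20, 'light': 30, 'heavy': 60}
expected_time = sum(P[t] * L[t] for t in P)  # 35.0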
Utilities
 Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences.
 Where do utilities come from?
 In a game, may be simple (+1/-1).
 Utilities summarize the agent's goals.
 Theorem: any set of preferences between outcomes can be summarized as a utility function (under certain conditions).
 In general, we hard-wire utilities and let actions emerge (why not let agents decide their own utilities?).
 Utilities used in Markov networks!
Expectimax Search
 Expectimax search: a probabilistic model of behavior:
 The model can be a simple uniform distribution (roll a die).
 The model can be sophisticated and require significant computation.
 A node for every outcome out of our control: opponent or environment.
 The model might say that adversarial actions are likely!
 For any state, assume we have a distribution to assign probabilities to opponent actions or environment outcomes!
Having a probabilistic belief about an agent's action does not mean that agent is flipping coins!
Expectimax Pseudocode
def value(s):
    # Dispatch on node type; terminal nodes are scored by the evaluation.
    if is_terminal(s): return evaluation(s)
    if is_max_node(s): return max_value(s)
    if is_exp_node(s): return exp_value(s)

def max_value(s):
    # Max node: best value among the successors.
    return max(value(t) for t in successors(s))

def exp_value(s):
    # Chance node: probability-weighted average of successor values.
    return sum(probability(s, t) * value(t) for t in successors(s))

[Figure: expectimax tree; leaf values 8, 4, 5, 6.]
Expectimax Tree
Expectimax for Pacman
 Ghosts are not trying to minimize Pacman's score; they are part of the environment.
 Pacman has a belief (distribution) over how ghosts will act.
 Can we see minimax as a special case of expectimax?
 What would Pacman's computation be if ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly?
 End up calculating belief distributions over opponents' belief distributions over your belief distributions.
Expectimax Evaluation
 Minimax search: scale does not matter:
 Just want better states to have higher evaluations (get the ordering right).
 Insensitivity to monotonic transformations!
 Expectimax: need the magnitudes to be meaningful.
 Sensitivity to scaling!
 The evaluation function should be a positive linear transform of the expected utility of a position.
Expectimax Evaluation
 What are the values at the root node?
 See Figure 5.12.
[Figure: two expectimax trees, one with leaf values 0, 40, 20, 30 and one with those leaves squared (x → x²): 0, 1600, 400, 900. The monotonic transform changes which action looks best.]
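A quick numeric check of this sensitivity, assuming uniform chance nodes as in the figure:

# Two chance nodes; squaring the leaves flips which one looks better.
left, right = [0, 40], [20, 30]
avg = lambda xs: sum(xs) / len(xs)
print(avg(left), avg(right))      # 20.0 25.0  -> right child is better
print(avg([x * x for x in left]),
      avg([x * x for x in right]))  # 800.0 650.0 -> now left child is better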
Mixed Layer Types
 E.g. Backgammon.
 Expectiminimax:
 Environment is an extra player that moves after each agent.
 Chance nodes take expectations; otherwise like minimax (Section 5.5).

Expectiminimax(s) =
  Utility(s)                                 if Terminal-Test(s)
  max_{a} Expectiminimax(Result(s, a))       if Player(s) = MAX
  min_{a} Expectiminimax(Result(s, a))       if Player(s) = MIN
  Σ_r P(r) · Expectiminimax(Result(s, r))    if Player(s) = CHANCE
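And a direct transcription into Python, extending the earlier sketches; chance_outcomes(s), yielding (probability, state) pairs, is an assumed helper:

def expectiminimax(game, state):
    # Minimax with an extra CHANCE player that averages over its outcomes.
    if game.is_terminal(state):
        return game.utility(state, 'MAX')
    player = game.to_move(state)
    if player == 'CHANCE':
        return sum(p * expectiminimax(game, s2)
                   for p, s2 in game.chance_outcomes(state))
    values = [expectiminimax(game, game.result(state, a))
              for a in game.actions(state)]
    return max(values) if player == 'MAX' else min(values)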
Stochastic Two-Player
 Dice rolls increase b: 21 possible rolls with 2 dice.
 Backgammon ≈ 20 legal moves.
 Depth 4: 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes.
 As depth increases, the probability of reaching a given node shrinks:
 Value of look-ahead is diminished.
 Limiting depth is less damaging.
 But pruning is less possible.
 Time and space complexity?
 TD-Gammon: depth-2 search + a good evaluation function + reinforcement learning = world-champion level play.
What Next?
 Get ready for applications of probability and Bayesian reasoning!
 Upcoming topics:
 Bayesian networks.
 Markov Decision Processes.