CS 294-5: Statistical Natural Language Processing
CS 5368: Artificial Intelligence
Fall 2010
Search Part 4
9/13/2011
Mohan Sridharan
Slides adapted from Dan Klein
1
This Lecture
Adversarial search!
2
Adversarial Search
Pacman is adversarial by definition.
Agents have to contend with ghosts.
3
Game Playing
Many different kinds of games!
Axes:
Deterministic or stochastic?
One, two, or more players?
Perfect information (can you see the state)?
Need algorithms for computing a strategy (policy) which
suggests a move in each state.
4
Deterministic Games
Many possible formalizations, one is:
States: S (start at s0)
Players: P={1...N} (usually take turns)
Actions: A (may depend on player / state)
Transition Function: S × A → S
Terminal Test: S → {t, f}
Terminal Utilities: S × P → R
Solution for a player is a policy: S → A
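As a sketch, the formalization above maps onto a small Python interface. All names here (the class `Game` and its methods) are illustrative, chosen to mirror the components S, P, A, the transition function, terminal test, and terminal utilities; they are not from the slides:

```python
# Illustrative sketch of the deterministic-game formalization above.
# Every method name is hypothetical; each mirrors one component of
# the formal definition.

class Game:
    def initial_state(self):          # s0 in S
        raise NotImplementedError
    def player(self, s):              # whose turn it is, in P = {1..N}
        raise NotImplementedError
    def actions(self, s):             # legal actions A (may depend on s)
        raise NotImplementedError
    def result(self, s, a):           # transition function: S x A -> S
        raise NotImplementedError
    def is_terminal(self, s):         # terminal test: S -> {t, f}
        raise NotImplementedError
    def utility(self, s, p):          # terminal utilities: S x P -> R
        raise NotImplementedError
```

A concrete game (tic-tac-toe, chess) would subclass this and fill in the six methods.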
5
Deterministic Single-Player?
Deterministic single-player, perfect
information:
Know the rules and action results.
Know when you win.
Free-cell, 8-Puzzle, Rubik’s cube.
Just search!
Slight reinterpretation:
Each node stores a value: the best
outcome it can reach.
This is the maximal outcome of its
children.
No path sums (utilities at end).
After search, can pick move that
leads to best node.
(Search tree diagram: leaf nodes labeled lose, win, lose.)
6
Deterministic Two-Player
Tic-tac-toe, chess, checkers.
Zero-sum games:
One player maximizes the result.
The other minimizes the result.
Chess!
Minimax search:
A state-space search tree.
Players take turns.
Each layer, or ply, consists of a round of
moves*.
Choose move to position with highest
minimax value = best achievable utility
against best play.
(Example tree with node values 8, 2, 5, 6.)
* Slightly different from the book definition.
7
Tic-tac-toe Game Tree
8
Minimax Example
9
Minimax Basics
Minimax(s) is the utility for MAX of being in state s,
assuming both players play optimally from there to the
end of the game!
Minimax of a terminal state is its utility.

Minimax(s) =
  Utility(s)                                  if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s, a))  if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s, a))  if Player(s) = MIN
Simple recursive search computation and value backups
(Figure 5.3).
10
Minimax Search
function Minimax-Decision(state) returns an action
  return arg max_{a ∈ Actions(state)} Min-Value(Result(state, a))
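A minimal runnable version of Minimax-Decision. As an assumption for illustration, a state is a nested tuple: internal nodes are tuples of children (one per action), and leaves are terminal utilities:

```python
def minimax_decision(state):
    """Return the index of the action with the highest minimax value."""
    successors = list(enumerate(state))
    best_action, _ = max(successors, key=lambda p: min_value(p[1]))
    return best_action

def max_value(state):
    if not isinstance(state, tuple):          # terminal: utility is the number
        return state
    return max(min_value(s) for s in state)   # MAX picks the best child

def min_value(state):
    if not isinstance(state, tuple):
        return state
    return min(max_value(s) for s in state)   # MIN picks the worst child
```

For example, on the tree `((3, 12, 8), (2, 4, 6), (14, 5, 2))` the three MIN nodes have values 3, 2, and 2, so `minimax_decision` picks action 0.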
11
Minimax Properties
Optimal against a perfect player. Otherwise?
Time complexity? O(b^m)
Space complexity? O(bm)
For chess, b ≈ 35, m ≈ 100.
Exact solution is completely infeasible.
Do we need to explore the whole tree?
12
Resource Limits
Cannot search to leaves!
Depth-limited search:
Search a limited depth of tree.
Replace utilities with an evaluation function for
non-terminal positions.
Guarantee of optimal play is gone!
More plies makes a BIG difference
Example:
Have 100 seconds, can explore 10K nodes/sec.
So we can check 1M nodes per move.
α-β pruning reaches about depth 8: a decent chess program.
13
Evaluation Functions
Function which scores non-terminals.
Ideal function: returns the utility of the position.
In practice: typically a weighted linear sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
E.g. f1(s) = (num white queens – num black queens).
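A small sketch of such a weighted linear evaluation function. The chess-like features and the state encoding (a dict of piece counts) are made up for illustration:

```python
# Sketch of a weighted linear evaluation function:
#   Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)

def linear_eval(state, features, weights):
    """Score a non-terminal state as a weighted sum of feature values."""
    return sum(w * f(state) for f, w in zip(features, weights))

# Hypothetical features over a state dict like {"wq": 1, "bq": 0, ...}:
queen_diff = lambda s: s["wq"] - s["bq"]   # f1: queen advantage
pawn_diff  = lambda s: s["wp"] - s["bp"]   # f2: pawn advantage

score = linear_eval({"wq": 1, "bq": 0, "wp": 5, "bp": 6},
                    [queen_diff, pawn_diff], [9.0, 1.0])
# 9.0 * 1 + 1.0 * (-1) = 8.0
```

The weights encode how much each feature matters (a queen is worth roughly nine pawns in the usual chess convention).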
14
Evaluation for Pacman
15
Can Pacman Starve?
Knows score will go up by eating the
dot now.
Knows score will go up just as much
by eating the dot later on.
No point-scoring opportunities after
eating the dot.
Therefore, waiting seems just as
good as eating!
16
Iterative Deepening
Iterative deepening uses DFS as a
subroutine:
…
b
1. Do a DFS which only searches paths of length 1 or
less (DFS gives up on any path of length 2).
2. If “1” failed, do a DFS which only searches paths
of length 2 or less.
3. If “2” failed, do a DFS which only searches paths
of length 3 or less.
….and so on.
Do we want to do this for multiplayer
games?
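The loop above can be sketched as follows; the tree representation (a start node, a goal test, and a successor function) is an assumption for illustration:

```python
from itertools import count

def depth_limited_search(node, is_goal, successors, limit):
    """DFS that gives up on any path longer than `limit` edges."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return None
    for child in successors(node):
        path = depth_limited_search(child, is_goal, successors, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening_search(start, is_goal, successors):
    """Re-run depth-limited DFS with limits 0, 1, 2, ... until one succeeds."""
    for limit in count(0):
        path = depth_limited_search(start, is_goal, successors, limit)
        if path is not None:
            return path
```

Note that if no goal is reachable this loops forever, which is the usual caveat for iterative deepening on infinite (or effectively infinite) trees.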
17
α-β Pruning
At node n: if the player has a better choice m at the parent
node of n, or at any choice point further up, n will never be reached!
19
α-β Pruning
General configuration:
α is the best value that MAX can
get at any choice point along the
current path.
If n becomes worse than α, MAX
will avoid it, so we can stop
considering n's other children.
Define β similarly for MIN.
(Diagram: alternating Player / Opponent layers with node n deep
in the tree; Figure 5.7 in textbook.)
20
α-β Pruning Pseudocode
(Pseudocode figure not transcribed: it tracks a running value v
together with the bounds α and β.)
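A minimal sketch of α-β pruning over the same nested-tuple trees used for minimax (leaves are numbers, internal nodes are tuples of children; that representation is an assumption):

```python
import math

def alpha_beta_value(state, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `state`, skipping branches that cannot matter.

    alpha: best value MAX can already guarantee on the current path.
    beta:  best value MIN can already guarantee on the current path.
    """
    if not isinstance(state, tuple):           # terminal leaf
        return state
    if maximizing:
        v = -math.inf
        for child in state:
            v = max(v, alpha_beta_value(child, alpha, beta, False))
            if v >= beta:                      # MIN above will never allow this
                return v                       # prune remaining children
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in state:
            v = min(v, alpha_beta_value(child, alpha, beta, True))
            if v <= alpha:                     # MAX above will never allow this
                return v
            beta = min(beta, v)
        return v
```

On `((3, 12, 8), (2, 4, 6), (14, 5, 2))` this returns 3, exactly the minimax value; pruning never changes the result, only how much of the tree gets explored.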
21
- Pruning Properties
Pruning has no effect on final result!
Good move ordering improves effectiveness of pruning.
(Section 5.3.1).
With “perfect ordering”:
Time complexity drops to O(b^(m/2)).
Doubles the solvable depth.
Full search of, e.g., chess is still hopeless.
A simple example of meta-reasoning: reasoning about
which computations are relevant.
22
Non-Zero-Sum Games
Similar to minimax:
Utilities are now
tuples.
Each player
maximizes their own
entry at each node.
Propagate (or back up) values from children.
(Example tree with utility tuples: (1,2,6), (4,3,2), (6,1,2),
(7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5).)
23
Expectimax Search Trees
What if action results are unknown?
In solitaire, next card is unknown.
In minesweeper, mine locations.
In pacman, the ghosts!
Can do Expectimax search:
Chance nodes, like min nodes, except the
outcome is uncertain.
Calculate expected utilities.
Chance nodes take average (expectation) of
value of children.
Later: formalize the underlying problem
as a Markov Decision Process.
24
Maximum Expected Utility
Why should we average utilities? Why not minimax?
Principle of maximum expected utility: an agent should choose the
action which maximizes its expected utility, given its knowledge.
General principle for decision making.
Often taken as the definition of rationality.
We will see this idea over and over in this course!
25
Reminder: Probabilities
Random variables and probability distributions.
Example: traffic on freeway?
Random variable: T = whether there is traffic.
Val(T) = {none, light, heavy}
Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
Some laws of probability:
Probabilities are real numbers between 0 and 1.
Probabilities over all possible outcomes sum to one.
As we get more evidence, probabilities may change:
P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60
Uncertainty everywhere!
26
Reminder: Expectations
We can define function f(X) of a random variable X.
The expected value of a function is its average value, weighted by the
probability distribution over inputs.
Example: How long to get to the airport?
Length of driving time as a function of traffic:
L(none) = 20, L(light) = 30, L(heavy) = 60
What is my expected driving time?
P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)
E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35.
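The same computation in a few lines of Python, using the distribution and driving times from the example above:

```python
# Expected driving time: E[L(T)] = sum over outcomes t of L(t) * P(t).
P = {"none": 0.25, "light": 0.5, "heavy": 0.25}   # distribution over traffic
L = {"none": 20, "light": 30, "heavy": 60}        # driving time per outcome

expected_time = sum(L[t] * P[t] for t in P)
# 20*0.25 + 30*0.5 + 60*0.25 = 35.0
```

This is exactly what a chance node does in expectimax: average the children's values, weighted by their probabilities.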
27
Utilities
Utilities are functions from outcomes (states of the world) to real
numbers that describe an agent’s preferences.
Where do utilities come from?
In a game, may be simple (+1/-1).
Utilities summarize the agent’s goals.
Theorem: any set of preferences between outcomes can be
summarized as a utility function (under certain conditions).
In general, we hard-wire utilities and let actions emerge (why not let
agents decide their own utilities?).
Utilities used in Markov networks!
28
Expectimax Search
Expectimax search: a probabilistic
model of behavior:
Model can be a simple uniform
distribution (roll a die).
Model can be sophisticated and
require significant computation.
A node for every outcome out of our
control: opponent or environment.
The model might say that
adversarial actions are likely!
For any state, assume we have a
distribution to assign probabilities to
opponent actions or environment
outcomes!
Having a probabilistic belief about an agent’s
action does not mean that agent is flipping coins!
29
Expectimax Pseudocode
def value(s)
if s is a max node return maxValue(s)
if s is an exp node return expValue(s)
if s is a terminal node return evaluation(s)
def maxValue(s)
values = [value(s’) for s’ in successors(s)]
return max(values)
def expValue(s)
values = [value(s’) for s’ in successors(s)]
weights = [probability(s, s’) for s’ in successors(s)]
return expectation(values, weights)
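A runnable version of the pseudocode above, again using nested tuples for states (leaves are numbers, and layers alternate between max and chance). Assuming, for simplicity, that chance nodes weight their children uniformly, which is one possible probability model:

```python
def expectimax_value(state, maximizing=True):
    """Expectimax over nested tuples that alternate max and chance layers."""
    if not isinstance(state, tuple):               # terminal leaf
        return state
    children = [expectimax_value(s, not maximizing) for s in state]
    if maximizing:                                 # max node: best child
        return max(children)
    # chance node: expectation, here under a uniform distribution
    return sum(children) / len(children)
```

For example, on `((8, 4), (5, 6))` the two chance nodes evaluate to 6.0 and 5.5, so the max node at the root has value 6.0.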
30
Expectimax Tree
31
Expectimax for Pacman
Ghosts are not trying to minimize Pacman's score; they are part of
the environment.
Pacman has a belief (distribution) over how ghosts will act.
Can we see minimax as a special case of expectimax?
What would pacman’s computation be if ghosts were doing 1-ply
minimax and taking the result 80% of the time, otherwise moving
randomly?
End up calculating belief distributions over opponents’ belief
distributions over your belief distributions.
Can get unmanageable very quickly!
32
Expectimax Evaluation
Minimax search: scale does not matter:
Just want better states to have higher evaluations (get the
ordering right).
Insensitivity to monotonic transformations!
Expectimax: need the magnitudes to be meaningful.
Sensitive to scaling!
The evaluation function should be a positive linear transform of the
expected utility of a position.
33
Expectimax Evaluation
(Example: leaf utilities 0, 40, 20, 30 versus their squares
0, 1600, 400, 900. What are the values at the root node in each
case? See Figure 5.12.)
Mixed Layer Types
E.g. Backgammon.
Expectiminimax
Environment is an extra player that moves after each agent.
Chance nodes take expectations, otherwise like minimax.
Section 5.5.
Expectiminimax(s) =
  Utility(s)                                      if Terminal-Test(s)
  max_a Expectiminimax(Result(s, a))              if Player(s) = MAX
  min_a Expectiminimax(Result(s, a))              if Player(s) = MIN
  Σ_r P(r) · Expectiminimax(Result(s, r))         if Player(s) = CHANCE
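A sketch of this recurrence in Python. The node encoding is an assumption made for illustration: numeric leaves are terminal utilities, and internal nodes are tagged tuples:

```python
def expectiminimax(node):
    """Evaluate a game tree mixing max, min, and chance layers.

    Node encoding (hypothetical, for this sketch):
      number                          -> terminal utility
      ('max', [child, ...])           -> MAX to move
      ('min', [child, ...])           -> MIN to move
      ('chance', [(p, child), ...])   -> chance node, p = outcome probability
    """
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance: sum over outcomes r of P(r) * Expectiminimax(Result(s, r))
    return sum(p * expectiminimax(c) for p, c in children)
```

For instance, a root MAX node choosing between a fair 2/8 gamble (value 5.0) and a MIN node over {7, 6} (value 6) evaluates to 6.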
35
Stochastic Two-Player
Dice rolls increase b: 21 possible rolls with 2
dice.
Backgammon has ≈ 20 legal moves.
Depth 4: 20 × (21 × 20)^3 ≈ 1.5 × 10^9 nodes.
As depth increases, probability of reaching a
given node shrinks:
Value of look-ahead is diminished.
Limiting depth is less damaging
But pruning is less possible
Time and space complexity?
TD-Gammon: depth-2 search, a good evaluation
function + RL: world-champion-level play.
36
What Next?
Get ready for applications of probability and Bayesian
reasoning!
Upcoming topics:
Bayesian networks.
Markov Decision Processes.
37