CS 416 Artificial Intelligence
Lecture 8
Adversarial Search
Chapter 6

Chess Match – Spring 2003
Ends in a 3-3 Draw

Adversarial Search
Problems involving
• Multiple agents
• Competitive environments
• Agents have conflicting goals
Also called games

Since the dawn of time?
Oldest known written fair-division problem
Talmud – Jewish Oral Law dating to first century
• A Bankruptcy Case
– A man married three wives, and in each marriage contract he promised a different amount of money upon his death: one wife gets $100, another gets $200, the third gets $300
– When he died, his estate was worth less than $600
What do you do?

Bankruptcy law
• Modern bankruptcy law gives each claimant a share of the estate proportional to the individual claim, regardless of the size of the estate
– A receives 100/600 * estate_holdings
– B receives 200/600 * estate_holdings
– C receives 300/600 * estate_holdings

Bankruptcy law
Rabbi Nathan in Mishnah section of Talmud

Claim ↓ / Estate →    100     200     300
100                   33.3    50      50
200                   33.3    75      100
300                   33.3    75      150

This allocation was not understood until recently

Unexplained until 1984
Aumann and Maschler (Israeli mathematicians)
• Realistically, when you die, people could come out of the woodwork saying you owe them money. Some could coalesce into deceptive groups. How can we reduce the incentives (rewards) of forming such groups?
• Minimize largest dissatisfaction among all possible coalitions
• A common fair-division problem
– http://www.math.gatech.edu/~hill/publications/cv.dir/madevice.pdf

Garment Principle
Two people claim a garment worth $100
• One claims the entire garment belongs to him
• The other claims half the garment is his
The one claiming the full garment gets $75
The one claiming half gets $25
Why?

Minimizing maximum dissatisfaction
• The one who wants the entire garment cedes nothing to the other and thus wants $100.
• The one who wants half the garment would be perfectly happy to cede $50 to the other.
– But a split of 50/50 would make one person unhappy and the other perfectly happy
How to make them equally unhappy?

A $100 Garment
                          Person 1   Person 2
Requested amount          100        50
Ceded from competitor     50         0
Split what remains        25         25
Sum of ceded and split    75         25

Game Theory
Studied by mathematicians, economists, and people in finance
In AI we limit games to:
• deterministic
• turn-taking
• two-player
• zero-sum
• perfect information

Games
“Shall we play a game?”
Let’s play tic-tac-toe

Tic-Tac-Toe game tree
MAX’s first move
MIN’s first move
Each layer is a ply

What data do we need to play?
Initial State
• How does the game start?
Successor Function
• A list of legal (move, state) pairs for each state
Terminal Test
• Determines when game is over
Utility Function
• Provides numeric value for all terminal states

Minimax strategy
Optimal Strategy
• Leads to outcomes at least as good as any other strategy when playing an infallible opponent
• Pick the option that minimizes the maximum damage your opponent can do
– minimize the worst-case outcome
– because your skillful opponent will certainly find the most damaging move

Minimax Algorithm
• MinimaxValue(n) =
– Utility(n), if n is a terminal state
– max of MinimaxValue(s) over all successors s, if n is a MAX node
– min of MinimaxValue(s) over all successors s, if n is a MIN node
This is the optimal strategy assuming both players play optimally from there until the end of the game

A two-ply example
MIN considers minimizing how much it loses…

A two-ply example
MAX considers minimizing how much it loses…

Minimax Algorithm
We wish to identify the minimax decision at the root
• Recursive evaluation of all nodes in game tree
• Time complexity = O(b^m), for branching factor b and maximum depth m

Feasibility of minimax?
How about a nice game of chess?
• Avg branching = 35 and avg # moves = 50 for each player
– O(35^100) time complexity ≈ 10^154 nodes
– About 10^40 distinct nodes
Minimax is impractical if directly applied to chess

Pruning minimax tree
Are there times when you know you need not explore a particular move?
• When the move is poor?
• Poor compared to what?
• Poor compared to what you have explored so far

Alpha-beta pruning
α – the value of the best (highest) choice found so far in the search for MAX
β – the value of the best (lowest) choice found so far in the search for MIN
• Order of considering successors matters
– If possible, consider best successors first

Alpha-beta pruning
[Figure: alpha-beta pruning walkthrough. Recoverable annotations:]
– MAX knows that 3 is the worst case for this node.
– MAX knows that it can accomplish a score of at least 3.
– MAX has an option of going to node B with a min payoff of 3. MAX will never take action C, and culling is possible.

Alpha-beta pruning
• Without pruning
– O(b^d) nodes to explore
• With a good heuristic pruner (consider part (f) of figure)
– O(b^(d/2))
– Chess can drop from O(35^100) to O(6^100)
• With a random heuristic
– O(b^(3d/4))

Real-time decisions
What if you don’t have enough time to explore the entire search tree?
• We cannot search all the way down to a terminal state for all decision sequences
• Use a heuristic to approximate (guess) the eventual terminal state

Evaluation Function (Estimator)
The heuristic that estimates expected utility
• Cannot take too long (otherwise recurse to get answer)
• It should preserve the ordering among terminal states
– otherwise it can cause bad decision making
• Define features of game state that assist in evaluation
– what are features of chess?

Truncating minimax search
When do you recurse or use the evaluation function?
• Cutoff-Test(state, depth) returns 1 or 0
– When 1 is returned, use the evaluation function

When do you cut off?
• When exploring beyond a certain depth
– The horizon effect

When do you cut off?
• Cut off if state is stable or quiescent (more predictable)

When do you cut off?
• Cut off moves you know are bad (forward pruning)

Benefits of truncation
Comparing Chess
Number of plies that can be considered per unit time
• Using minimax: 5 ply
• Average human: 6-8 ply
• Using alpha-beta: 10 ply
• Intelligent pruning: 14 ply

Games with chance
How to include chance in game tree?
• Add chance nodes

Expectiminimax
Expectiminimax(n) =
• Utility(n), if n is a terminal state
• max over s ∈ Successors(n) of EXPECTIMINIMAX(s), if n is a MAX node
• min over s ∈ Successors(n) of EXPECTIMINIMAX(s), if n is a MIN node
• sum over s ∈ Successors(n) of P(s) * EXPECTIMINIMAX(s), if n is a chance node

Pruning
Can we prune search in games of chance?
• Think about alpha-beta pruning
– With alpha-beta, we don’t explore nodes that we know are worse than what we know we can accomplish
– With randomness, we never really know what we will accomplish
– Chance node values are the average of their successors
Thus it is hard to use alpha-beta

History of Games
Chess, Deep Blue
• IBM: 30 RS/6000 computers with 480 custom VLSI chess chips
• Deep Thought design came from Campbell and Hsu at CMU
• 126 million nodes / s
• 30 billion positions per move
• routinely reaching depth of 14
• iterative deepening alpha-beta search

Deep Blue
• evaluation function had 8000 features
• 4000 opening moves in memory
• 700,000 grandmaster games from which recommendations were extracted
• many endgames solved for all five-piece combinations

Checkers
Arthur Samuel of IBM, 1952
• program learned by playing against itself
• beat a human in 1962 (but the human clearly made an error)
• 19 KB of memory
• 0.000001 GHz processor

Checkers
Chinook, Jonathan Schaeffer, 1990
• Alpha-beta search on regular PCs
• database of all 444 billion endgame positions with 8 or fewer pieces
• Played against Marion Tinsley
– world champion for over 40 years
– lost only 3 games in 40 years
– Chinook won two games, but lost the match
• Rematch with Tinsley was left incomplete for health reasons
– Chinook became world champion

Othello
Smaller search space (5 to 15 legal moves)
Humans are no match for computers

Backgammon
Gerry Tesauro, TD-Gammon, 1992
• Reliably ranked among the top three players in the world
• Learned to play through playing against itself
– Reinforcement Learning

Discussion
How reasonable is minimax?
• perfectly performing opponent
• perfect knowledge of leaf node evaluations
• strong assumptions

Building alpha-beta tree
Can we restrict the size of the game tree?
• alpha-beta will blindly explore the tree in depth-first fashion even if only one move is possible from the root
• even if multiple moves are possible, can we use a quick search to eliminate some entirely?
• utility vs. time tradeoff to decide when to explore new branches or to stay with what you have

Metareasoning
Reasoning about reasoning
• alpha-beta is one example
– think before you think
– think about the utility of thinking about something before you think about it
– don’t think about choices you don’t have to think about

Goal-directed reasoning / planning
Minimax starts from the root and moves forward using combinatorial search
What about starting at the goal and working backward?
• We talked about the difficulty of identifying goal states in bidirectional search
• We do not know how to combine the two in a practical way
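
Worked sketch: minimax and alpha-beta
The MinimaxValue recursion and the alpha-beta cutoff from the earlier slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's own code: the game tree is a hypothetical two-ply tree written as nested lists, a bare number stands for the utility of a terminal state, and the names TREE, minimax, alphabeta and visited are made up for this example.

```python
import math

# Hypothetical two-ply tree (an assumption for illustration):
# the root is a MAX node, its three children are MIN nodes,
# and each number is the utility of a terminal state.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def minimax(node, is_max):
    """Plain MinimaxValue: evaluates every node in the tree."""
    if isinstance(node, (int, float)):  # terminal state: return its utility
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha-beta pruning.

    alpha: best (highest) value found so far for MAX on this path.
    beta:  best (lowest) value found so far for MIN on this path.
    visited: optional list that records each terminal state evaluated.
    """
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # MIN above would never allow this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:  # MAX above already has something at least as good
            break
    return best


seen = []
print(minimax(TREE, True))                   # minimax value at the root
print(alphabeta(TREE, True, visited=seen))   # same value, fewer evaluations
print(len(seen), "of 9 terminal states evaluated")
```

On this tree both functions return 3, but alpha-beta evaluates only 7 of the 9 terminal states: once the second MIN node sees a 2, MAX already has 3 available from the first branch, so the remaining two leaves of that node are skipped. Reordering the children changes how much gets pruned, which is the point of the "consider best successors first" bullet.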