Transcript: Lecture 17, CS 540 Fall 2015 (Shavlik), 10/27/15, Week 8
Today’s Topics
• Remember: no discussing the exam until next Tues! OK to stop by Thurs 5:45-7:15pm for HW3 help
• More BN Practice (from the Fall 2014 CS 540 Final)
• BNs for Playing Nannon (http://nannon.com/)
• Exploration vs. Exploitation Tradeoff
• Stationarity
• Nannon Class Tourney?
• Read: Sections 18.6 (skim), 18.7, & 18.9 (artificial neural networks [ANNs] and support vector machines [SVMs])

From Fall 2014 Final
[BN diagram not shown; from the CPTs and the worked answers: A and B are root nodes with P(A) = 0.3 and P(B) = 0.8, C has parents A and B, and D has parents A, B, and C.]

  A      B      P(C=true | A, B)
  false  false  0.2
  false  true   0.4
  true   false  0.6
  true   true   0.8

  A      B      C      P(D=true | A, B, C)
  false  false  false  0.1
  false  false  true   0.2
  false  true   false  0.3
  false  true   true   0.4
  true   false  false  0.5
  true   false  true   0.6
  true   true   false  0.7
  true   true   true   0.8

What is the probability that A and C are true but B and D are false?
  = P(A) (1 – P(B)) P(C | A ∧ ¬B) (1 – P(D | A ∧ ¬B ∧ C))
  = 0.3 (1 – 0.8) 0.6 (1 – 0.6)

What is the probability that A is false, B is true, and D is true?
  = P(¬A ∧ B ∧ D)
  = P(¬A ∧ B ∧ ¬C ∧ D) + P(¬A ∧ B ∧ C ∧ D)
  = process ‘complete world states’ as in the first question

What is the probability that C is true given A is false, B is true, and D is true?
  = P(C | ¬A ∧ B ∧ D)
  = P(C ∧ ¬A ∧ B ∧ D) / P(¬A ∧ B ∧ D)
  = process as in the first and second questions
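The ‘complete world states’ bookkeeping is easy to mechanize. Below is a minimal, illustrative Java sketch (not part of any provided course code; the class and variable names are made up) that answers all three questions by multiplying out the BN factorization, using the CPT entries above and the priors P(A) = 0.3, P(B) = 0.8 recovered from the worked answer.

  // Illustrative only: enumerate complete world states for the Fall 2014 BN.
  public class BnEnumeration {
      static final double P_A = 0.3, P_B = 0.8;          // priors taken from the worked answer
      // P(C=true | A, B), indexed [A][B] with false = 0, true = 1.
      static final double[][] P_C = { { 0.2, 0.4 }, { 0.6, 0.8 } };
      // P(D=true | A, B, C), indexed [A][B][C].
      static final double[][][] P_D = {
          { { 0.1, 0.2 }, { 0.3, 0.4 } },
          { { 0.5, 0.6 }, { 0.7, 0.8 } }
      };

      // Probability of one complete world state, via the BN factorization
      // P(A) P(B) P(C | A, B) P(D | A, B, C).
      static double joint(boolean a, boolean b, boolean c, boolean d) {
          double p  = (a ? P_A : 1 - P_A) * (b ? P_B : 1 - P_B);
          double pC = P_C[a ? 1 : 0][b ? 1 : 0];
          double pD = P_D[a ? 1 : 0][b ? 1 : 0][c ? 1 : 0];
          return p * (c ? pC : 1 - pC) * (d ? pD : 1 - pD);
      }

      public static void main(String[] args) {
          // Q1: P(A ∧ ¬B ∧ C ∧ ¬D) is a single complete world state.
          System.out.println("Q1 = " + joint(true, false, true, false));   // 0.3 * 0.2 * 0.6 * 0.4

          // Q2: P(¬A ∧ B ∧ D): sum over the unmentioned variable C.
          double q2 = joint(false, true, false, true) + joint(false, true, true, true);
          System.out.println("Q2 = " + q2);

          // Q3: P(C | ¬A ∧ B ∧ D) = P(C ∧ ¬A ∧ B ∧ D) / P(¬A ∧ B ∧ D).
          System.out.println("Q3 = " + joint(false, true, true, true) / q2);
      }
  }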
From Fall 2014 Final
Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below; use pseudo examples.

  Ex #  A      B      C      Output
  1     True   False  True   True
  2     False  True   False  True
  3     False  True   True   True
  4     False  False  True   False

  Prob(Output = True  | A = False, B = False, C = False)
  -------------------------------------------------------  =  _________________
  Prob(Output = False | A = False, B = False, C = False)

Assume FOUR pseudo examples (ffft, tttt, ffff, tttf). Then:

    P(¬A | Out) P(¬B | Out) P(¬C | Out) P(Out)           (3 / 5) (2 / 5) (2 / 5) (5 / 8)
  = ------------------------------------------------  =  -------------------------------
    P(¬A | ¬Out) P(¬B | ¬Out) P(¬C | ¬Out) P(¬Out)        (2 / 3) (2 / 3) (1 / 3) (3 / 8)

The Big Picture (not a BN, but like mini-max)
- The provided s/w gives you the set of (annotated) legal moves
- If there are zero or one legal moves, the s/w passes or makes the only possible move
[Diagram: the current NANNON board branches to possible next boards ONE, TWO, THREE, ...]

Four Effects of MOVES (before → after):
  HIT:     _XO_   →  __X_
  BREAK:   _XX_   →  _X_X
  EXTEND:  _X_XX  →  __XXX
  CREATE:  _X_X_  →  __XX_

Choose the move that gives the best odds of winning.

Reinforcement Learning (RL) vs. Supervised Learning
• Nannon is Really an RL Task
• We’ll Treat it as a SUPERVISED ML Task
  – All moves in winning games are considered GOOD
  – All moves in losing games are considered BAD
• Noisy Data, but Good Play Still Results
• ‘Random-Move’ & Hand-Coded Players Provided
• Provided Code Can Make 10^6 Moves/Sec

What to Compute? Multiple Possibilities (Pick Only One)
  P(move in winning game | current board ∧ chosen move)      OR
  P(move in winning game | next board)                       OR
  P(move in winning game | next board ∧ prev board)          OR
  P(move in winning game | next board ∧ effect of move)      OR
  etc.
(Effect of move = hit, break, extend, create, or some combo.)

‘Raw’ Random Variables Representing the Board
• What is on board cell i (X, O, or empty)
• # of home pieces for X and for O
• # of safe pieces for X and for O
• Board size varies (L cells); the number of pieces each player has also varies (K pieces)
• Full joint size for the above = K (K+1) × 3^L × (K+1) K
  – for L = 12 and K = 5, |full joint| = 900 × 3^12 = 478,296,900
• You can also create ‘derived’ features, e.g., ‘inDanger’

Some possible ways of encoding the move:
- die value
- which of the 12 (?) possible effect combos occurred
- moved from cell i to cell j (L^2 possibilities; some not possible with a 6-sided die)
- how many possible moves there were (L – 2 possibilities)

HW3: Draw a BN, then Implement the Calculation for that BN (also do NB)
[BN diagram: WIN is the parent of features S1, S2, S3, …, Sn]

               [ ∏ P(Si = valuei | WIN=true)  ] × P(WIN=true)
  Odds(WIN) =  -----------------------------------------------
               [ ∏ P(Si = valuei | WIN=false) ] × P(WIN=false)

  (the product runs over all features, i = 1 to n)

Recall: choose the move that gives the best odds of winning.

Going Slightly Beyond NB
Same network, except S1 now also has S2 as a parent (so S1 is conditioned on S2 as well as WIN):

               P(S1 = ? | S2 = ? ∧ WIN)  × [ ∏ P(Si = ? | WIN)  ] × P(WIN)
  Odds(WIN) =  -------------------------------------------------------------
               P(S1 = ? | S2 = ? ∧ ¬WIN) × [ ∏ P(Si = ? | ¬WIN) ] × P(¬WIN)

  Here the PRODUCT runs from i = 2 to n.

Going Slightly Beyond NB (Part 2)
A little bit of joint probability!

               P(S1 = ? ∧ S2 = ? | WIN)  × [ ∏ P(Si = ? | WIN)  ] × P(WIN)
  Odds(WIN) =  -------------------------------------------------------------
               P(S1 = ? ∧ S2 = ? | ¬WIN) × [ ∏ P(Si = ? | ¬WIN) ] × P(¬WIN)

  Here the PRODUCT runs from i = 3 to n. Used:
    P(S1 = ? ∧ S2 = ? | WIN) = P(S1 = ? | S2 = ? ∧ WIN) × P(S2 = ? | WIN)
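For concreteness, here is a minimal, hypothetical Java sketch (not the provided HW3 code; every identifier is made up) of how the plain Naive Bayes odds from the HW3 slide above could be computed from win/lose counts, using simple Laplace-style smoothing in the spirit of the m-estimates mentioned in the snippets that follow. The countWin/countLose arrays play the role of count arrays like homeX_win and homeX_lose declared below.

  // Hypothetical sketch of the Naive Bayes odds computation for move selection.
  public class NaiveBayesOdds {
      // featureValue[i]  = observed value of feature Si for a candidate move
      // countWin[i][v]   = how often Si = v among moves from WINNING games
      // countLose[i][v]  = how often Si = v among moves from LOSING games
      // winExamples / loseExamples = number of training moves from winning / losing games
      public static double oddsOfWin(int[] featureValue,
                                     int[][] countWin, int[][] countLose,
                                     int winExamples, int loseExamples) {
          double odds = (double) winExamples / loseExamples;   // prior odds P(WIN) / P(¬WIN)
          for (int i = 0; i < featureValue.length; i++) {
              int v = featureValue[i];
              int numValues = countWin[i].length;              // how many values Si can take
              // Laplace-style smoothing: pretend each value was seen once per class.
              double pGivenWin  = (countWin[i][v]  + 1.0) / (winExamples  + numValues);
              double pGivenLose = (countLose[i][v] + 1.0) / (loseExamples + numValues);
              odds *= pGivenWin / pGivenLose;     // multiply in P(Si=v | WIN) / P(Si=v | ¬WIN)
          }
          return odds;   // pick the candidate move whose next board has the largest odds
      }
  }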
Some Possible Java Code Snippets for NB

  private static int boardSize = 6;   // Default width of board.
  private static int pieces    = 3;   // Default #pieces per player.
  …
  int homeX_win[]  = new int[pieces + 1];   // Holds p(homeX=? | win).
  int homeX_lose[] = new int[pieces + 1];   // Holds p(homeX=? | !win).
  int safeX_win[]  = new int[pieces];       // NEED TO ALSO DO FOR ‘0’!
  int safeX_lose[] = new int[pieces];       // Be sure to initialize!
  int board_win[][]  = new int[boardSize][3];
  int board_lose[][] = new int[boardSize][3];
  int wins   = 1;   // Remember m-estimates.
  int losses = 1;

Exploration vs. Exploitation Tradeoff
• We are not getting iid data, since the data we get depends on the moves we choose
• Always doing what we currently think is best (exploitation) might be a local minimum
• So we should try out seemingly non-optimal moves now and then (exploration), even though they are likely to lose the game (see the ε-greedy sketch at the end of this transcript)
• Think about learning how to get from home to work: there are many possible routes, so try various ones now and then, but most days take what has been best in the past
• Simple solution for HW3: observe 100,000 games where two random-move choosers play each other (a ‘burn-in’ phase)

Stationarity
• What About the Fact that the Opponent also Learns?
• That Changes the Probability Distributions We are Trying to Estimate!
• However, We’ll Assume that the Probability Distribution Remains Unchanged (i.e., is Stationary) While We Learn

Have a Class Tourney?
• Everyone Plays Everyone Else, Many Times, across Various Combos of Board Size × #Pieces
• Won’t Impact Course Grade
• Opt In or Opt Out? (Student names not shared)
• Exceedingly Slow, Memory-Hogging, or Crashing Code Disqualified
• Yahoo Research Sponsored this in the Past, but that’s Not Appropriate Here (since most of you have jobs)
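To make the exploration-vs-exploitation bullets concrete, here is a minimal, hypothetical ε-greedy move chooser in Java (illustrative only; it is not part of the provided Nannon code, and the generic move type, the scorer interface, and the epsilon value are all assumptions): with probability epsilon it explores by picking a random legal move, and otherwise it exploits by picking the legal move with the best estimated odds of winning.

  import java.util.List;
  import java.util.Random;
  import java.util.function.ToDoubleFunction;

  // Illustrative epsilon-greedy move selection (hypothetical names and types).
  class EpsilonGreedyChooser {
      private final double epsilon;            // e.g., 0.05 = explore on 5% of moves
      private final Random rng = new Random();

      EpsilonGreedyChooser(double epsilon) { this.epsilon = epsilon; }

      // legalMoves comes from the provided s/w; oddsOfWin is whatever scorer the
      // learner implements (e.g., the Naive Bayes odds sketched earlier).
      <M> M choose(List<M> legalMoves, ToDoubleFunction<M> oddsOfWin) {
          if (legalMoves.isEmpty()) return null;                 // the s/w handles passes
          if (rng.nextDouble() < epsilon) {                      // EXPLORE: random legal move
              return legalMoves.get(rng.nextInt(legalMoves.size()));
          }
          M best = legalMoves.get(0);                            // EXPLOIT: best estimated odds
          double bestOdds = oddsOfWin.applyAsDouble(best);
          for (M m : legalMoves) {
              double odds = oddsOfWin.applyAsDouble(m);
              if (odds > bestOdds) { best = m; bestOdds = odds; }
          }
          return best;
      }
  }

During the suggested 100,000-game burn-in of random-vs-random play, epsilon is effectively 1; afterwards a small epsilon (say, 0.05) keeps a little exploration going while mostly exploiting the learned odds.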