Transcript: Lecture 17

Today’s Topics
• Remember: no discussing the exam until next Tues!
  (OK to stop by Thurs 5:45-7:15pm for HW3 help)
• More BN Practice (from the Fall 2014 CS 540 Final)
• BNs for Playing Nannon (http://nannon.com/)
• Exploration vs. Exploitation Tradeoff
• Stationarity
• Nannon Class Tourney?
• Read: Sections 18.6 (skim), 18.7, & 18.9
  (artificial neural networks [ANNs] and support vector machines [SVMs])
From Fall 2014 Final
What is the probability that A and C are true but B and D are false?
What is the probability that A is false, B is true, and D is true?
[BN structure, from the figure: A and B are root nodes; C has parents A and B; D has parents A, B, and C. Per the solution on the next slide, P(A=true) = 0.3 and P(B=true) = 0.8.]

CPT for C:
  A      B      P(C=true | A, B)
  false  false  0.2
  false  true   0.4
  true   false  0.6
  true   true   0.8

CPT for D:
  A      B      C      P(D=true | A, B, C)
  false  false  false  0.1
  false  false  true   0.2
  false  true   false  0.3
  false  true   true   0.4
  true   false  false  0.5
  true   false  true   0.6
  true   true   false  0.7
  true   true   true   0.8
What is the probability that C is true given A is false, B is true, and D is true?
From Fall 2014 Final
What is the probability that A and C are true but B and D are false?
= P(A) × (1 – P(B)) × P(C | A ˄ ¬B) × (1 – P(D | A ˄ ¬B ˄ C)) = 0.3 × (1 – 0.8) × 0.6 × (1 – 0.6) = 0.0144
What is the probability that A is false, B is true, and D is true?
= P(¬A ˄ B ˄ D) = P(¬A ˄ B ˄ ¬C ˄ D) + P(¬A ˄ B ˄ C ˄ D); process the ‘complete world states’ like the first question
What is the probability that C is true given A is false, B is true, and D is true?
= P(C | ¬A ˄ B ˄ D) = P(C ˄ ¬A ˄ B ˄ D) / P(¬A ˄ B ˄ D); process like the first and second questions
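As a concrete check, here is a minimal Java sketch (not part of any provided code) that enumerates complete world states using the CPTs above and the priors P(A)=0.3, P(B)=0.8 taken from the first answer; all names are illustrative.

// Illustrative check of the three exam answers; not part of any provided code.
public class BnEnumeration {
    // Priors taken from the worked answer; CPT entries from the tables above.
    static final double P_A = 0.3, P_B = 0.8;
    // pCgivenAB[a][b] = P(C=true | A=a, B=b), indexed 0=false, 1=true.
    static final double[][] pCgivenAB = { {0.2, 0.4}, {0.6, 0.8} };
    // pDgivenABC[a][b][c] = P(D=true | A=a, B=b, C=c).
    static final double[][][] pDgivenABC = {
        { {0.1, 0.2}, {0.3, 0.4} },
        { {0.5, 0.6}, {0.7, 0.8} }
    };

    // Probability of one complete world state (a, b, c, d), each 0=false or 1=true.
    static double worldProb(int a, int b, int c, int d) {
        double p = (a == 1 ? P_A : 1 - P_A) * (b == 1 ? P_B : 1 - P_B);
        p *= (c == 1 ? pCgivenAB[a][b] : 1 - pCgivenAB[a][b]);
        p *= (d == 1 ? pDgivenABC[a][b][c] : 1 - pDgivenABC[a][b][c]);
        return p;
    }

    public static void main(String[] args) {
        // Q1: P(A, ¬B, C, ¬D) -- a single complete world state.
        System.out.println("Q1 = " + worldProb(1, 0, 1, 0));          // 0.0144

        // Q2: P(¬A, B, D) -- sum out C over its two values.
        double q2 = worldProb(0, 1, 0, 1) + worldProb(0, 1, 1, 1);
        System.out.println("Q2 = " + q2);                             // 0.1904

        // Q3: P(C | ¬A, B, D) = P(C, ¬A, B, D) / P(¬A, B, D).
        System.out.println("Q3 = " + worldProb(0, 1, 1, 1) / q2);     // about 0.4706
    }
}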
From Fall 2014 Final
Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below and use pseudo examples.

  Ex #   A      B      C      Output
  1      True   False  True   True
  2      False  True   False  True
  3      False  True   True   True
  4      False  False  True   False

  Prob(Output = True  | A = False, B = False, C = False)
  ------------------------------------------------------- = _________________
  Prob(Output = False | A = False, B = False, C = False)
From Fall 2014 Final
  Prob(Output = True  | A = False, B = False, C = False)
  -------------------------------------------------------
  Prob(Output = False | A = False, B = False, C = False)

      P(¬A | Out) × P(¬B | Out) × P(¬C | Out) × P(Out)           (3/5) × (2/5) × (2/5) × (5/8)
  =  -------------------------------------------------------  =  -------------------------------  =  0.06 / (1/18)  =  1.08
      P(¬A | ¬Out) × P(¬B | ¬Out) × P(¬C | ¬Out) × P(¬Out)        (2/3) × (2/3) × (1/3) × (3/8)
(The counts above assume FOUR pseudo examples: ffft, tttt, ffff, tttf.)
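A small stand-alone Java sketch of this count-and-divide computation (illustrative only; the array layout and names are mine, not the HW3 code):

// Illustrative stand-alone check of the Naive Bayes ratio; not the HW3 code.
public class NaiveBayesRatio {
    public static void main(String[] args) {
        // Training examples: {A, B, C, Output}, true = 1, false = 0.
        int[][] data = {
            {1, 0, 1, 1},   // ex 1
            {0, 1, 0, 1},   // ex 2
            {0, 1, 1, 1},   // ex 3
            {0, 0, 1, 0},   // ex 4
            // FOUR pseudo examples: ffft, tttt, ffff, tttf.
            {0, 0, 0, 1}, {1, 1, 1, 1}, {0, 0, 0, 0}, {1, 1, 1, 0}
        };

        // Counts of feature=false given each output value, plus output counts.
        double[] outCount = new double[2];           // [0] = Output false, [1] = Output true
        double[][] falseCount = new double[3][2];    // falseCount[feature][output]

        for (int[] ex : data) {
            int out = ex[3];
            outCount[out]++;
            for (int f = 0; f < 3; f++)
                if (ex[f] == 0) falseCount[f][out]++;
        }

        double total = data.length;
        double num = outCount[1] / total, den = outCount[0] / total;
        for (int f = 0; f < 3; f++) {
            num *= falseCount[f][1] / outCount[1];   // P(feature=false | Output=true)
            den *= falseCount[f][0] / outCount[0];   // P(feature=false | Output=false)
        }
        System.out.println("ratio = " + (num / den));  // prints 1.08 (up to rounding)
    }
}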
The Big Picture (not a BN, but like mini-max)
- provided s/w gives you the set of (annotated) legal moves
- if there are zero or one, the s/w passes or makes the only possible move

[Diagram: the Current NANNON Board branches to Possible Next Board ONE, TWO, and THREE; choose the move that gives the best odds of winning.]

Four Effects of MOVES
  HIT:      _XO_   →  __X_
  BREAK:    _XX_   →  _X_X
  EXTEND:   _X_XX  →  __XXX
  CREATE:   _X_X_  →  __XX_
Reinforcement Learning (RL) vs. Supervised Learning
• Nannon is Really an RL Task
• We’ll Treat it as a SUPERVISED ML Task
  – All moves in winning games considered GOOD
  – All moves in losing games considered BAD
  (a labeling sketch follows this list)
• Noisy Data, but Good Play Still Results
• ‘Random Move’ & Hand-Coded Players Provided
• Provided Code can make 10^6 Moves/Sec
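A minimal sketch of that GOOD/BAD labeling, assuming a hypothetical per-game record of decisions; the Decision and Counters types and all names are illustrative, not the provided Nannon API:

import java.util.List;

// Sketch: after a game ends, every decision the learner made is credited to the
// game's outcome. All types and names here are hypothetical, not the provided s/w.
class MoveLabeling {
    /** One decision point recorded during a game. */
    static class Decision {
        int[] boardFeatures;
        int chosenMove;
        Decision(int[] boardFeatures, int chosenMove) {
            this.boardFeatures = boardFeatures;
            this.chosenMove = chosenMove;
        }
    }

    /** Placeholder for count tables like those shown later in this lecture. */
    interface Counters {
        void updateWin(int[] boardFeatures, int chosenMove);
        void updateLose(int[] boardFeatures, int chosenMove);
    }

    /** All moves in a winning game counted as GOOD; all moves in a losing game as BAD. */
    static void labelGame(List<Decision> decisions, boolean wonGame, Counters counters) {
        for (Decision d : decisions) {
            if (wonGame) counters.updateWin(d.boardFeatures, d.chosenMove);
            else         counters.updateLose(d.boardFeatures, d.chosenMove);
        }
    }
}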
What to Compute?
Multiple Possibilities (Pick only One)
P(move in winning game | current board ˄ chosen move)   OR
P(move in winning game | next board)   OR
P(move in winning game | next board ˄ prev board)   OR
P(move in winning game | next board ˄ effect of move)   OR
Etc.
[‘effect of move’ = hit, break, extend, create, or some combo]
‘Raw’ Random Variables Representing the Board

[Diagram of the raw random variables:]
- # of Home Pieces for X
- # of Home Pieces for O
- # of Safe Pieces for X
- # of Safe Pieces for O
- What is on Board Cell i (X, O, or empty), for each of the L cells

Board size varies (L cells)
Number of pieces each player has also varies (K pieces)

Full Joint Size for Above = K × (K+1) × 3^L × (K+1) × K
- for L=12 and K=5, |full joint| = 900 × 3^12 = 478,296,900

You can also create ‘derived’ features, eg, ‘inDanger’

Some Possible Ways of Encoding the Move
- die value
- which of the 12 (?) possible effect combos occurred
- moved from cell i to cell j (L^2 possibilities; some not possible with a 6-sided die)
- how many possible moves there were (L – 2 possibilities)
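A quick sanity check of the full-joint size formula above (illustrative only, not part of any provided code):

// Sanity check of the full-joint size formula for L=12, K=5.
public class JointSize {
    public static void main(String[] args) {
        int L = 12, K = 5;                        // board cells, pieces per player
        long cellStates = (long) Math.pow(3, L);  // 3^L = 531,441 when L = 12
        System.out.println((long) K * (K + 1) * cellStates * (K + 1) * K);  // 478296900
    }
}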
HW3: Draw a BN then Implement the Calculation for that BN (also do NB)

[BN: WIN is the parent of S1, S2, S3, …, Sn]

              [ ∏ P(Si = value_i | WIN=true) ]  × P(WIN=true)
Odds(WIN) =  --------------------------------------------------
              [ ∏ P(Si = value_i | WIN=false) ] × P(WIN=false)

Recall: Choose the move that gives the best odds of winning
Going Slightly Beyond NB
[BN: WIN is the parent of S1, S2, S3, …, Sn, and S2 is also a parent of S1]

              P(S1 = ? | S2 = ? ˄ WIN)  × [ ∏ P(Si = ? | WIN) ]  × P(WIN)
Odds(WIN) =  --------------------------------------------------------------
              P(S1 = ? | S2 = ? ˄ ¬WIN) × [ ∏ P(Si = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT is from 2 to n
Going Slightly Beyond NB (Part 2)
[BN as on the previous slide: WIN is the parent of S1, S2, S3, …, Sn, and S2 is also a parent of S1]

              P(S1 = ? ˄ S2 = ? | WIN)  × [ ∏ P(Si = ? | WIN) ]  × P(WIN)
Odds(WIN) =  --------------------------------------------------------------
              P(S1 = ? ˄ S2 = ? | ¬WIN) × [ ∏ P(Si = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT is from 3 to n

A little bit of joint probability! Used:
  P(S1 = ? ˄ S2 = ? | WIN) = P(S1 = ? | S2 = ? ˄ WIN) × P(S2 = ? | WIN)
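In terms of the count arrays on the next slide, the only change this model needs is an extra index on S1’s table for S2’s value; a hypothetical sketch (array sizes and names are illustrative):

// Hypothetical counter layout for the model above; all names and sizes illustrative.
public class BeyondNbCounters {
    static final int NUM_S1_VALUES = 4, NUM_S2_VALUES = 3;

    // Plain naive Bayes: one count per value of S1, kept separately for wins/losses.
    int[] s1_win  = new int[NUM_S1_VALUES];
    int[] s1_lose = new int[NUM_S1_VALUES];

    // "Slightly beyond NB": S1 also depends on S2, so count (S1, S2) pairs.
    int[][] s1s2_win  = new int[NUM_S1_VALUES][NUM_S2_VALUES];
    int[][] s1s2_lose = new int[NUM_S1_VALUES][NUM_S2_VALUES];

    /** Estimate P(S1 = v1 | S2 = v2 and WIN) from the win counts. */
    double probS1GivenS2AndWin(int v1, int v2) {
        double columnTotal = 0;                       // all winning-game moves with S2 = v2
        for (int v = 0; v < NUM_S1_VALUES; v++) columnTotal += s1s2_win[v][v2];
        return s1s2_win[v1][v2] / columnTotal;        // add pseudo counts to avoid 0/0
    }
}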
Some Possible Java Code Snippets for NB

private static int boardSize = 6;  // Default width of board.
private static int pieces    = 3;  // Default # pieces per player.
…
int homeX_win[]  = new int[pieces + 1];  // Counts for estimating p(homeX=? | win).
int homeX_lose[] = new int[pieces + 1];  // Counts for estimating p(homeX=? | !win).
int safeX_win[]  = new int[pieces];      // NEED TO ALSO DO FOR ‘O’!
int safeX_lose[] = new int[pieces];      // Be sure to initialize!
int board_win[][]  = new int[boardSize][3];
int board_lose[][] = new int[boardSize][3];
int wins   = 1;                          // Remember m-estimates.
int losses = 1;
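One way counters like these might be used, shown as a sketch rather than the provided code; the single homeX feature, the update hook, and all names are assumptions:

// Sketch only: turn win/lose counts into Odds(WIN) for one feature (homeX).
// Extend odds() with one likelihood factor per feature you actually use.
public class NannonNaiveBayes {
    static final int PIECES = 3;

    // m-estimates: start every count at 1 instead of 0.
    int[] homeXWin  = onesArray(PIECES + 1);
    int[] homeXLose = onesArray(PIECES + 1);
    int wins = 1, losses = 1;

    static int[] onesArray(int n) {
        int[] a = new int[n];
        java.util.Arrays.fill(a, 1);
        return a;
    }

    /** Called once per recorded move after the game ends. */
    void update(int homeXCount, boolean wonGame) {
        if (wonGame) { homeXWin[homeXCount]++;  wins++;  }
        else         { homeXLose[homeXCount]++; losses++; }
    }

    /** Odds(WIN | homeX = count); exact m-estimate bookkeeping is up to you. */
    double odds(int homeXCount) {
        double pWin  = (double) wins   / (wins + losses);
        double pLose = (double) losses / (wins + losses);
        double likeWin  = (double) homeXWin[homeXCount]  / wins;
        double likeLose = (double) homeXLose[homeXCount] / losses;
        return (likeWin * pWin) / (likeLose * pLose);
    }
}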
Exploration vs. Exploitation Tradeoff
• We are not getting iid data, since the data we get depends on the moves we choose
• Always doing what we currently think is best (exploitation) might be a local minimum
• So we should try out seemingly non-optimal moves now and then (exploration), but we are likely to lose those games (see the ε-greedy sketch after this list)
• Think about learning how to get from home to work: many possible routes; try various ones now and then, but most days take what has been best in the past
• Simple sol’n for HW3: observe 100,000 games where two random-move choosers play each other (‘burn-in’ phase)
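One standard way to mix exploration and exploitation (not required for HW3, where the random-vs-random burn-in above is enough) is ε-greedy; a minimal sketch, with the Move type and scoreOdds() as assumed placeholders:

import java.util.List;
import java.util.Random;

// General epsilon-greedy sketch, not something HW3 requires: with small
// probability epsilon pick a random legal move, otherwise pick the move
// whose resulting board has the best learned Odds(WIN).
public class EpsilonGreedyChooser {
    private final Random rng = new Random();
    private final double epsilon;              // e.g., 0.05 = explore 5% of the time

    public EpsilonGreedyChooser(double epsilon) { this.epsilon = epsilon; }

    interface Move { double scoreOdds(); }     // hypothetical: odds of winning after this move

    public Move choose(List<Move> legalMoves) {
        if (legalMoves.isEmpty()) return null;                 // the s/w handles the pass
        if (rng.nextDouble() < epsilon)                        // explore
            return legalMoves.get(rng.nextInt(legalMoves.size()));
        Move best = legalMoves.get(0);                         // exploit
        for (Move m : legalMoves)
            if (m.scoreOdds() > best.scoreOdds()) best = m;
        return best;
    }
}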
Stationarity
• What About the Fact that the Opponent also Learns?
• That Changes the Probability Distributions We are Trying to Estimate!
• However, We’ll Assume that the Prob Distribution Remains Unchanged (ie, is Stationary) While We Learn
Have a Class Tourney?
• Everyone Plays Everyone Else, Many Times across Various Combos of Board Size × #Pieces
• Won’t Impact Course Grade
• Opt In or Opt Out? (Student names not shared)
• Exceedingly Slow, Memory-Hogging, or Crashing Code Disqualified
• Yahoo Research Sponsored in Past but Not Appropriate Here (since most of you have jobs)