Introduction - Stanford University

General Game Playing
Lecture 3
Game Description
Michael Genesereth / Nat Love
Spring 2006
Finite Synchronous Games
Finite environment
Environment with finitely many states
One initial state and one or more terminal states
Finite Players
Fixed finite number of players
Each with finitely many “actions”
Each with one or more goal states
Synchronous Update
All players move on all steps (some no ops)
Environment changes only in response to moves
2
Example
[Figure: state transition graph over states s1 through s11; arcs are labeled with the joint moves aa, ab, ba, bb.]
3
Example Revisited
S ={s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11}
I1={a, b} I2={a, b}
u(s1,a,a,s2)
u(s1,a,b,s3)
u(s1,b,a,s4)
u(s1,b,b,s1)
u(s2,a,a,s5)
u(s2,a,b,s3)
u(s4,a,a,s7)
u(s4,a,b,s3)
u(s6,a,a,s5)
u(s6,a,b,s9)
u(s6,b,a,s7)
u(s6,b,b,s3)
u(s7,a,a,s10)
u(s8,a,b,s5)
u(s8,b,b,s11)
u(s9,a,a,s8)
u(s9,a,b,s11)
u(s10,b,a,s9)
u(s10,b,b,s11)
u(s11,a,a,s11)
i=s1
T ={s3, s8, s11}
G1={s8, s11} G2={s3, s7}
4
Games as Mathematical Structures
An n-player game is a structure with components:
S - finite set of states
I1, …, In - finite sets of actions, one for each player
u ⊆ S × I1 × ... × In × S - update relation
i ∈ S - initial game state
T ⊆ S - the terminal states
G1, ..., Gn - where Gi ⊆ S - the goal relations
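The structure above can be sketched as a small data type. This is a minimal illustration, not part of the lecture: the names `Game` and `play` are hypothetical, and storing u as a dictionary assumes that each state and joint move has exactly one successor. Only the update tuples for s1 and s2 from the earlier two-player example are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    states: frozenset     # S
    actions: tuple        # (I1, ..., In), one action set per player
    update: dict          # u, stored as (state, joint move) -> next state
    initial: str          # i
    terminal: frozenset   # T
    goals: tuple          # (G1, ..., Gn), one goal-state set per player


def play(game, joint_moves):
    """Apply joint moves from the initial state, stopping at a terminal state."""
    state = game.initial
    for move in joint_moves:
        if state in game.terminal:
            break
        state = game.update[(state, move)]
    return state


# Partial copy of the example game: only the tuples for s1 and s2 are listed.
example = Game(
    states=frozenset(f"s{k}" for k in range(1, 12)),
    actions=(frozenset("ab"), frozenset("ab")),
    update={
        ("s1", ("a", "a")): "s2",
        ("s1", ("a", "b")): "s3",
        ("s1", ("b", "a")): "s4",
        ("s1", ("b", "b")): "s1",
        ("s2", ("a", "a")): "s5",
        ("s2", ("a", "b")): "s3",
    },
    initial="s1",
    terminal=frozenset({"s3", "s8", "s11"}),
    goals=(frozenset({"s8", "s11"}), frozenset({"s3", "s7"})),
)

final = play(example, [("b", "b"), ("a", "a"), ("a", "b")])
print(final, final in example.goals[0], final in example.goals[1])
# prints: s3 False True
```

The play s1, s1, s2, s3 ends in a terminal state that is a goal state for player 2 but not for player 1.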
5
Direct Description
Good News: Since all of the games that we are
considering are finite, it is possible in principle to
communicate game information in the form of
sets of objects and tuples of objects.
Problem: Size of description. Even though
everything is finite, these sets can be large.
6
Relational Nets
7
States versus Propositions
In many cases, worlds are best thought of in terms
of propositions, e.g. whether a particular light is on
or off. Actions often affect a subset of propositions.
States represent all possible ways the world can be.
As such, the number of states is exponential in the
number of such propositions, and the action tables
are correspondingly large.
Idea - represent propositions directly and describe
actions in terms of their effects on individual
propositions rather than entire states.
8
Tic-Tac-Toe
[Figure: Tic-Tac-Toe board with two X marks and one O mark.]
9
Relational States
cell(1,1,x)
cell(1,2,b)
cell(1,3,b)
cell(2,1,b)
cell(2,2,o)
cell(2,3,b)
cell(3,1,b)
cell(3,2,b)
cell(3,3,x)
control(black)
10
Transitions
cell(1,1,x)
cell(1,2,b)
cell(1,3,b)
cell(2,1,b)
cell(2,2,o)
cell(2,3,b)
cell(3,1,b)
cell(3,2,b)
cell(3,3,x)
control(black)
Joint move: white does noop, black does mark(1,3)
cell(1,1,x)
cell(1,2,b)
cell(1,3,o)
cell(2,1,b)
cell(2,2,o)
cell(2,3,b)
cell(3,1,b)
cell(3,2,b)
cell(3,3,x)
control(white)
11
Logical Encoding
init(cell(1,1,b))
init(cell(1,2,b))
init(cell(1,3,b))
init(cell(2,1,b))
init(cell(2,2,b))
init(cell(2,3,b))
init(cell(3,1,b))
init(cell(3,2,b))
init(cell(3,3,b))
init(control(white))
legal(W,mark(X,Y)) :- true(cell(X,Y,b)) &
true(control(W))
legal(white,noop) :- true(cell(X,Y,b)) &
true(control(black))
legal(black,noop) :- true(cell(X,Y,b)) &
true(control(white))
…
12
Syntax of Relational Logic
13
Syntax of Relational Logic
Vocabulary:
Object Variables: X, Y, Z
Object Constants: a, b, c
Function Constants: f, g, h
Relation Constants: p, q, r
Logical Operators: ~, &, |, :-, #
Terms:
Variables: X, Y, Z
Object Constants: a, b, c
Functional Terms: f(a), g(a,b), h(a,b,c)
Sentences:
Simple Sentences: p(a,g(a,b),c)
Logical Sentences: r(X,Y) :- p(X,Y) & ~q(Y)
14
Safety
A rule is safe if and only if every variable in the head appears
in some positive subgoal in the body.
Safe Rule:
r(x, y) :- p(x, y) & q(y, z)
Unsafe Rule:
r(x, z) :- p(x, y) & q(y, x)
In GDL, we require all rules to be safe.
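The safety condition can be checked mechanically. The sketch below is an illustration under stated assumptions: rules are encoded as hypothetical tuples, terms are flat (no function terms), variables are written in upper case as in the syntax slide, and both subgoals of each sample rule are read as positive.

```python
def is_variable(term):
    """Assumed convention: variables start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def is_safe(head, body):
    """head: (relation, args); body: list of (is_positive, relation, args)."""
    positive_vars = {
        term
        for is_positive, _relation, args in body
        if is_positive
        for term in args
        if is_variable(term)
    }
    head_vars = {term for term in head[1] if is_variable(term)}
    # Safe iff every head variable occurs in some positive subgoal.
    return head_vars <= positive_vars


safe_rule = (("r", ("X", "Y")),
             [(True, "p", ("X", "Y")), (True, "q", ("Y", "Z"))])
unsafe_rule = (("r", ("X", "Z")),
               [(True, "p", ("X", "Y")), (True, "q", ("Y", "X"))])

print(is_safe(*safe_rule), is_safe(*unsafe_rule))
# prints: True False
```

In the unsafe rule, Z appears in the head but in no subgoal, so no substitution from the database can bind it.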
15
Dependency Graph
The dependency graph for a set of rules is a directed graph in
which (1) the nodes are the relations mentioned in the heads
and bodies of the rules and (2) there is an arc from a node p to
a node q whenever p occurs within the body of a rule in which q
is in the head.
r(x, y) :- p(x, y), q(x, y)
s(x, y) :- r(x, y)
s(x, z) :- r(x, y), t(y, z)
t(x, z) :- s(x, y), s(y, x)
[Figure: dependency graph over the relations p, q, r, s, t.]
A set of rules is recursive if its dependency graph contains a
cycle. Otherwise, it is non-recursive.
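The definition translates directly into code. This is a minimal sketch, assuming each rule is reduced to its head relation and the list of relations in its body; the function names are hypothetical.

```python
def dependency_graph(rules):
    """rules: list of (head_relation, [body_relations]); returns arcs (p, q)."""
    return {(body_rel, head) for head, body in rules for body_rel in body}


def is_recursive(rules):
    """True iff the dependency graph contains a cycle."""
    successors = {}
    for p, q in dependency_graph(rules):
        successors.setdefault(p, set()).add(q)

    def reachable(start):
        # Nodes reachable from start by following at least one arc.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return any(node in reachable(node) for node in successors)


# The four sample rules, reduced to relation names.
rules = [
    ("r", ["p", "q"]),
    ("s", ["r"]),
    ("s", ["r", "t"]),
    ("t", ["s", "s"]),
]

print(sorted(dependency_graph(rules)))
# prints: [('p', 'r'), ('q', 'r'), ('r', 's'), ('s', 't'), ('t', 's')]
print(is_recursive(rules), is_recursive(rules[:3]))
# prints: True False
```

The cycle is between s and t; dropping the rule for t removes it.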
16
Recursion
A set of rules is recursive if its dependency graph contains a
cycle. Otherwise, it is non-recursive.
r(x, y) :- p(x, y), q(x, y)
s(x, y) :- r(x, y)
s(x, z) :- r(x, y), t(y, z)
t(x, z) :- s(x, y), s(y, x)
[Figure: dependency graph over the relations p, q, r, s, t.]
17
Stratified Negation
The negation in a set of rules is said to be stratified if there is
no recursive cycle in the dependency graph involving a
negation.
Stratified Negation:
Negation that is not stratified:
In GDL, we require that all negations be stratified.
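A stratification test can be sketched by marking each arc of the dependency graph as positive or negative and rejecting any negative arc that lies on a cycle. The two rule sets below are hypothetical illustrations, not the slide's own examples, and the tuple encoding is an assumption.

```python
def is_stratified(rules):
    """rules: list of (head_relation, [(body_relation, is_negated), ...])."""
    successors = {}
    negative_arcs = set()
    for head, body in rules:
        for relation, is_negated in body:
            successors.setdefault(relation, set()).add(head)
            if is_negated:
                negative_arcs.add((relation, head))

    def reaches(src, dst):
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # A negative arc (p, q) lies on a cycle iff q can reach p again.
    return not any(reaches(head, relation) for relation, head in negative_arcs)


# Hypothetical example: negation only on relations defined elsewhere.
stratified = [("childless", [("person", False), ("father", True), ("mother", True)])]
# Hypothetical example: p depends negatively on r, and r depends on p.
unstratified = [("p", [("q", False), ("r", True)]), ("r", [("p", False)])]

print(is_stratified(stratified), is_stratified(unstratified))
# prints: True False
```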
18
Extensional and Intensional Relations
Database applications start with a partial database, i.e.
sentences for some relations (extensional relations) and not
others (intensional relations). Rules are then written to define
the intensional relations in terms of the extensional relations.
[Figure: rules defining the intensional relations in terms of the extensional relations.]
Given an extensional database and a set of rules, we can obtain
the database’s closure as follows.
19
Single Rule
The value of a single non-recursive rule on a database D is the
set of all rule heads obtained by consistently substituting
ground terms from D for variables in such a way that the
substituted subgoals are all in D.
Sample Rule:
q(x, z) :- p(x, y), p(y, z)
Database:
{p(a, b), p(b, c), p(c, d)}
Extension:
{q(a, c), q(b, d)}
21
Multiple Rules
The value of a set of rules with a common relation on a
database D is the union of the values on the individual rules.
Sample Rules:
q(x, y) :- p(x, y)
q(x, z) :- p(x, y), p(y, z)
Sample Database:
{p(a, b), p(b, c), p(c, d)}
Value:
{q(a, b), q(b, c), q(c, d), q(a, c), q(b, d)}
22
Multiple Relations
The value of a set of non-recursive rules with different head
relations is obtained by evaluating the rules in the order in which
their head relations appear in the corresponding dependency graph.
Sample Rules:
s(x, y) :- p(x, y), q(x, y)
t(x, z) :- s(x, y), r(y, z)
Value Computation:
{p(a, b), p(b, c), q(b, c), r(c, d)}
{s(b, c)}
{t(b, d)}
23
Recursion
To compute the value of a recursive rule, start with the empty
relation. Compute the value using multiple rule computation.
Iterate till no new tuples are added.
Sample Rules:
q(x, y) :- p(x, y)
q(x, z) :- q(x, y), q(y, z)
Value Computation:
{p(a, b), p(b, c), p(c, d)}
{q(a, b), q(b, c), q(c, d)}
{q(a, c), q(b, d)}
{q(a, d)}
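The iteration can be sketched as a naive fixpoint loop. This is an illustrative sketch specialised to the two sample rules; the function name `evaluate` and the encoding of tuples as Python pairs are assumptions.

```python
def evaluate(p):
    """Naive fixpoint for q(x,y) :- p(x,y) and q(x,z) :- q(x,y), q(y,z)."""
    q, rounds = set(), []
    while True:
        # Apply both rules to the current value of q.
        derived = set(p) | {
            (x, z) for (x, y1) in q for (y2, z) in q if y1 == y2
        }
        added = derived - q
        if not added:          # no new tuples: fixpoint reached
            return q, rounds
        rounds.append(added)
        q |= added


p = {("a", "b"), ("b", "c"), ("c", "d")}
q, rounds = evaluate(p)
for added in rounds:
    print(sorted(added))
# prints:
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
# [('a', 'c'), ('b', 'd')]
# [('a', 'd')]
```

Each printed line is the set of tuples added in one iteration, matching the value computation on the slide.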
24
Negation
There are various ways to compute the value of negative rules.
In classical negation, a negation is true only if the negated
sentence is known to be false (i.e. there must be rules
concluding negated sentences). This is the norm in
computational logic systems. In GDL, we do not have such
rules.
In negation as failure, a negation is true if and only if the
negated sentence is not known to be true. This is the norm in
database systems.
25
Negation as Failure Example
Definition:
childless(x) :- person(x), ~father(x), ~mother(x)
Value Computation:
{person(joe), person(bill), father(joe)}
{childless(bill)}
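Negation as failure amounts to a membership test against the database. The sketch below assumes the definition is read as "a person who is neither a father nor a mother"; the set-of-pairs encoding and the helper `known` are hypothetical.

```python
# Database of ground sentences, encoded as (relation, argument) pairs.
database = {("person", "joe"), ("person", "bill"), ("father", "joe")}


def known(relation, x):
    """A sentence is known to be true iff it is in the database."""
    return (relation, x) in database


# Negation as failure: ~father(x) holds iff father(x) is not known to be true.
childless = {
    x
    for relation, x in database
    if relation == "person"
    and not known("father", x)
    and not known("mother", x)
}

print(childless)
# prints: {'bill'}
```

Nothing in the database says that bill is not a father; the conclusion follows only because father(bill) is absent.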
26
Game Description Language
27
Game-Independent Vocabulary
Object Constants:
0, 1, 2, 3, … - numbers
Relation Constants:
role(player)
init(proposition)
true(proposition)
next(proposition)
legal(player,action)
does(player,action)
goal(player,score)
terminal
28
Tic-Tac-Toe Vocabulary
Object constants:
white, black - players
x, o, b - marks
Function Constants:
mark(number,number) --> action
cell(number,number,mark) --> proposition
control(player) --> proposition
Relation Constants:
row(number,mark)
column(number,mark)
diagonal(mark)
line(mark)
open
29
Extensional and Intensional Relations
Extensional Relations:
does(player,action)
true(proposition)
Intensional Relations:
role(player)
init(proposition)
legal(player,action)
next(proposition)
goal(player,score)
terminal
30
Roles
role(white)
role(black)
31
Initial State
init(cell(1,1,b))
init(cell(1,2,b))
init(cell(1,3,b))
init(cell(2,1,b))
init(cell(2,2,b))
init(cell(2,3,b))
init(cell(3,1,b))
init(cell(3,2,b))
init(cell(3,3,b))
init(control(white))
32
Legality
legal(W,mark(X,Y)) :- true(cell(X,Y,b)) &
true(control(W))
legal(white,noop) :- true(cell(X,Y,b)) &
true(control(black))
legal(black,noop) :- true(cell(X,Y,b)) &
true(control(white))
33
Physics
next(cell(M,N,x)) :- does(white,mark(M,N))
next(cell(M,N,o)) :- does(black,mark(M,N))
next(cell(M,N,Z)) :- true(cell(M,N,Z)) & Z#b
next(cell(M,N,b)) :- does(W,mark(J,K)) &
true(cell(M,N,b)) & (M#J | N#K)
next(control(white)) :- true(control(black))
next(control(black)) :- true(control(white))
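The update rules can be traced by hand on the transition shown earlier (white does noop, black marks cell 1,3). The sketch below is a hand translation into Python, not a GDL interpreter: the tuple encoding of propositions and the function `next_state` are assumptions, and nonblank cells are assumed to persist wherever the new mark lands.

```python
MARK_OF = {"white": "x", "black": "o"}


def next_state(state, moves):
    """state: set of proposition tuples; moves: dict from role to action."""
    result = set()
    marked = None
    for role, action in moves.items():
        if action != "noop":
            _name, m, n = action
            marked = (m, n)
            result.add(("cell", m, n, MARK_OF[role]))   # the new mark
    for prop in state:
        if prop[0] == "cell":
            _name, m, n, z = prop
            if z != "b":
                result.add(prop)                        # marks persist
            elif marked is not None and (m, n) != marked:
                result.add(prop)                        # other blanks stay blank
        elif prop == ("control", "black"):
            result.add(("control", "white"))
        elif prop == ("control", "white"):
            result.add(("control", "black"))
    return result


def board_state(marks, control):
    cells = {("cell", m, n, marks.get((m, n), "b"))
             for m in (1, 2, 3) for n in (1, 2, 3)}
    return cells | {("control", control)}


before = board_state({(1, 1): "x", (2, 2): "o", (3, 3): "x"}, "black")
after = next_state(before, {"white": "noop", "black": ("mark", 1, 3)})
expected = board_state({(1, 1): "x", (2, 2): "o", (3, 3): "x", (1, 3): "o"}, "white")
print(after == expected)
# prints: True
```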
34
Supporting Concepts
row(M,W) :- true(cell(M,1,W)) &
true(cell(M,2,W)) &
true(cell(M,3,W))
column(N,W) :- true(cell(1,N,W)) &
true(cell(2,N,W)) &
true(cell(3,N,W))
diagonal(W) :- true(cell(1,1,W)) &
true(cell(2,2,W)) &
true(cell(3,3,W))
diagonal(W) :- true(cell(1,3,W)) &
true(cell(2,2,W)) &
true(cell(3,1,W))
line(W) :- row(M,W)
line(W) :- column(N,W)
line(W) :- diagonal(W)
open :- true(cell(M,N,b))
35
Goals and Termination
goal(white,100) :- line(x)
goal(white,50) :- ~line(x) & ~line(o) & ~open
goal(white,0) :- line(o)
goal(black,100) :- line(o)
goal(black,50) :- ~line(x) & ~line(o) & ~open
goal(black,0) :- line(x)
terminal :- line(W)
terminal :- ~open
36
More Tedious Details
37
No Built-in Assumptions
What we see:
next(cell(M,N,x)) :- does(white,mark(M,N)) &
true(cell(M,N,b))
What the player sees:
next(welcoul(M,N,himenoing)) :- does(himenoing,dukepse(M,N)) &
true(welcoul(M,N,lorenchise))
38
Knowledge Interchange Format
Knowledge Interchange Format (KIF) is a standard for the
programmatic exchange of knowledge represented in relational logic.
The syntax is a prefix version of the standard syntax.
Some operators are renamed: not, and, or.
KIF is case-insensitive. Variables are prefixed with ?.
r(X,Y) <= p(X,Y) & ~q(Y)
(<= (r ?x ?y) (and (p ?x ?y) (not (q ?y))))
(<= (r ?x ?y) (p ?x ?y) (not (q ?y)))
Semantics is the same.
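The translation to prefix form is mechanical. The following sketch is a hypothetical helper, not part of any KIF tooling: expressions are nested Python tuples, and a string starting with an upper-case letter is treated as a variable.

```python
def to_kif(expr):
    """expr: a constant or variable string, or a tuple (functor, arg1, ...)."""
    if isinstance(expr, str):
        # Variables get the ? prefix and are lower-cased.
        return "?" + expr.lower() if expr[:1].isupper() else expr
    return "(" + " ".join([expr[0]] + [to_kif(arg) for arg in expr[1:]]) + ")"


def rule_to_kif(head, body):
    """Render head <= body1 & body2 & ... with the conjunction left implicit."""
    return "(<= " + " ".join(to_kif(e) for e in [head] + body) + ")"


kif = rule_to_kif(("r", "X", "Y"), [("p", "X", "Y"), ("not", ("q", "Y"))])
print(kif)
# prints: (<= (r ?x ?y) (p ?x ?y) (not (q ?y)))
```

The output matches the second prefix form on the slide, where the body literals follow the head without an explicit and.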
39
Agent Communication Language
Start Message
(start id role (s1 … sn) startclock playclock)
Play Message
(play id (a1 ... ak))
Stop Message
(stop id (a1 ... ak))
40
41
Propositional Nets
42
Buttons and Lights
[Figure: three lights p, q, r and three buttons a, b, c.]
43
Relational States
[Figure: example states, each drawn as a set of the propositions p, q, r.]
44
State Machine
[Figure: state machine whose states are drawn as sets of the propositions p, q, r.]
45
Logical Encoding
init(q)
legal(robot,a)
legal(robot,b)
legal(robot,c)
next(p) :- does(robot,a) & ~true(p)
next(p) :- does(robot,b) & true(q)
next(p) :- does(robot,c) & true(p)
next(q) :- does(robot,a) & true(q)
next(q) :- does(robot,b) & true(p)
next(q) :- does(robot,c) & true(q)
next(r) :- does(robot,a) & true(r)
next(r) :- does(robot,b) & true(r)
next(r) :- does(robot,c) & true(q)
goal :- true(p) & ~true(q) & true(r)
term :- true(p) & ~true(q) & true(r)
46
Buttons and Lights Formalization
S ={s1, s2, s3, s4, s5, s6, s7, s8}
I = {a, b, c}
u(s1,a,s5)
u(s2,a,s2)
u(s3,a,s3)
u(s4,a,s4)
u(s000,b,s010)
u(s000,b,s010)
u(s000,b,s010)
u(s000,b,s010)
u(s000,c,s001)
u(s000,c,s001)
u(s000,c,s001)
u(s000,c,s001)
I = s1
T = {s8}
G = {s8}
47
Buttons and Lights Formalization
P ={p, q, r}
I = {a, b, c}
u(s1,a,s5)
u(s2,a,s2)
u(s3,a,s3)
u(s4,a,s4)
u(s000,b,s010)
u(s000,b,s010)
u(s000,b,s010)
u(s000,b,s010)
u(s000,c,s001)
u(s000,c,s001)
u(s000,c,s001)
u(s000,c,s001)
I = s1
T = {s8}
G = {s8}
48
Buttons and Lights
[Figure: four lights p, q, r, s and four buttons a, b, c, d.]
50
Buttons and Lights Formalization
S ={s000, s001, s010, s011, s100, s101, s110, s111}
I = {a, b, c}
u(s000,a,s100) u(s000,b,s010) u(s000,c,s001)
u(s001,a,s001) u(s000,b,s010) u(s000,c,s001)
u(s010,a,s010) u(s000,b,s010) u(s000,c,s001)
u(s011,a,s011) u(s000,b,s010) u(s000,c,s001)
I = s0
T = {sF}
G = {sF}
51
States versus Features
In many cases, worlds are best thought of in terms
of features, e.g. red or green, left or right, high or
low. Actions often affect a subset of features.
States represent all possible ways the world can be.
As such, the number of states is exponential in the
number of “features” of the world, and the action
tables are correspondingly large.
Idea - represent features directly and describe how
actions change individual features rather than entire
states.
52
Propositional Net Components
Propositions
p
q
r
Connectives
Transitions
53
Propositional Net
54
Markings
55
Inputs
56
Enablement
?
57
Update
58
Buttons and Lights
[Figure: two lights p, q and two buttons a, b.]
Pressing button a toggles p.
Pressing button b interchanges p and q.
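The two button behaviours can be written as a state update on the pair (p, q). This is a minimal sketch; the function name `press` and the tuple encoding of the state are assumptions.

```python
def press(state, button):
    """state: (p, q) as booleans; returns the state after pressing a button."""
    p, q = state
    if button == "a":
        return (not p, q)      # a toggles p
    if button == "b":
        return (q, p)          # b interchanges p and q
    raise ValueError(f"unknown button: {button}")


state = (False, False)
trace = [state]
for button in "aba":
    state = press(state, button)
    trace.append(state)

print(trace)
# prints: [(False, False), (True, False), (False, True), (True, True)]
```

From both lights off, the sequence a, b, a turns both lights on.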
59
Propositional Net for Buttons and Lights
[Figure: propositional net with nodes p, a, q.]
60
Propositional Nets and State Machines
Propositional Nets as State Machines
s110
State Machines as Propositional Nets
One proposition per state
Only one proposition is true at each point in time
61
Comparison
Propositional Nets vs State Machines
Expressively equivalent and interconvertible
State Machines can be exponentially larger
e.g. state machine for Tic-Tac-Toe has 5478 states
propositional net has 45 propositions
Propositional Nets vs Petri Nets
Propositional Nets are computable
(equivalent to Petri nets with finitely many tokens)
Propositional Nets are composable
without revealing inner details of components
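The state count for Tic-Tac-Toe can be checked by enumeration. The sketch below assumes that x moves first and that play stops as soon as either mark has a line or the board is full; the board is encoded as a 9-character string, which is an illustrative choice.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def has_line(board, mark):
    return any(all(board[i] == mark for i in line) for line in LINES)


def reachable_boards():
    """All boards reachable from the empty board under the assumptions above."""
    start = "b" * 9
    seen, frontier = {start}, [(start, "x")]
    while frontier:
        board, mover = frontier.pop()
        if has_line(board, "x") or has_line(board, "o") or "b" not in board:
            continue                       # terminal: no successors
        for i, cell in enumerate(board):
            if cell == "b":
                child = board[:i] + mover + board[i + 1:]
                if child not in seen:
                    seen.add(child)
                    frontier.append((child, "o" if mover == "x" else "x"))
    return seen


print(len(reachable_boards()))
# prints: 5478
```

The player in control is determined by the number of marks on the board, so counting boards is the same as counting states.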
62
Object Nets
[Figure: net with relations p, q, r and a connective labeled 2,1 and 1.3.]
63
64
Relational Nets
65
Propositional Net Fragment
[Figure: propositional net fragment with nodes x11 through x33, o11 through o33, xr1 through xr3, and or1 through or3.]
66
Relational Nets
Decompose states into “relations”.
[Figure: a state s1 decomposed into relations p, q, r, each drawn as a table of tuples over a, b, c, d, e.]

Use relational operators to capture
behavior.

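[Figure: relations p, q, r drawn as tables, with a connective labeled 2,1 and 1.3.]
p = {(a,d), (b,d), (c,e)}
q = {(d,g), (e,h), (f,h)}
r = {(a,g), (b,g), (c,h)}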
67
Comparison
Relational Nets vs Propositional Nets
Expressively equivalent and interconvertible
Number of Tuples = Number of Propositions
Fewer Relations than propositions
Fewer connectives
Relational Nets vaguely related to RMDPs
68
Logical Encoding
69
Relational Net
[Figure: relational net with relations p, q, r and a connective labeled 2,1 and 1.3.]
70
Possible Relational Net Encoding
Relational Net Fragment
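[Figure: relations p, q, r drawn as tables, with a connective labeled 2,1 and 1.3.]
p = {(a,d), (b,d), (c,e)}
q = {(d,g), (e,h), (f,h)}
r = {(a,g), (b,g), (c,h)}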
Encoding
r(X,Z) :- p(X,Y) & q(Y,Z)
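The encoding is a join of p and q on the shared variable Y. The sketch below evaluates it on the tuples of the figure, read row by row as p = {(a,d), (b,d), (c,e)} and q = {(d,g), (e,h), (f,h)}; that reading of the figure is an assumption.

```python
p = {("a", "d"), ("b", "d"), ("c", "e")}
q = {("d", "g"), ("e", "h"), ("f", "h")}

# r(X,Z) :- p(X,Y) & q(Y,Z): match column 2 of p with column 1 of q,
# then keep columns 1 and 3 of the combined tuple.
r = {(x, z) for (x, y1) in p for (y2, z) in q if y1 == y2}

print(sorted(r))
# prints: [('a', 'g'), ('b', 'g'), ('c', 'h')]
```

The tuple (f,h) of q contributes nothing because no tuple of p ends in f.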
71
Actual Relational Net Encoding
Relational Net Fragment
[Figure: relational net with relations p, q, r and a connective labeled 2,1 and 1.3.]
Encoding without delay:
true(r(X,Z)) :- true(p(X,Y)) &
true(q(Y,Z))
Encoding with delay:
next(r(X,Z)) :- true(p(X,Y)) &
true(q(Y,Z))
72
Tic-Tac-Toe
[Figure: Tic-Tac-Toe board with two X marks and one O mark.]
73
Partial Propositional Net for Tic-Tac-Toe
mark(1,1)
cell(1,1,x)
cell(1,1,b)
mark(1,2)
cell(1,2,x)
cell(1,2,b)
mark(1,3)
cell(1,3,x)
cell(1,3,b)
74
Logical Description
Direct encoding in relational logic:
next(cell(1,1,x)) <=
does(mark(1,1)) &
true(cell(1,1,b))
Use of variables to compact description:
next(cell(M,N,x)) <=
does(mark(M,N)) &
true(cell(M,N,b))
Game-specific “views” / “macros”:
row(M,W) <=
true(cell(M,1,W)) &
true(cell(M,2,W)) &
true(cell(M,3,W))
75
Syntax of Relational Logic
Object Variables: X, Y, Z
Object Constants: a, b, c
Function Constants: f, g, h
Relation Constants: p, q, r
Logical Operators: ~, &, |, :-, distinct
Terms: X, Y, Z, a, b, c, f(a), g(a,b), h(a,b,c)
Relational Sentences: p(a,b)
Logical Sentences: r(X,Y) <= p(X,Y) & ~q(Y)
An expression is ground iff it contains no variables.
The Herbrand base is the set of all ground relational sentences.
76
Legality
legal(W,mark(X,Y)) <=
true(cell(X,Y,b)) &
true(control(W))
legal(white,noop) <=
true(cell(X,Y,b)) &
true(control(o))
legal(black,noop) <=
true(cell(X,Y,b)) &
true(control(x))
77
Update
next(cell(M,N,x)) <=
does(white,mark(M,N)) &
true(cell(M,N,b))
next(cell(M,N,o)) <=
does(black,mark(M,N)) &
true(cell(M,N,b))
next(cell(M,N,W)) <=
true(cell(M,N,W)) &
distinct(W,b)
next(cell(M,N,b)) <=
does(W,mark(J,K)) &
true(cell(M,N,b)) &
(distinct(M,J) | distinct(N,K))
78
Update (continued)
next(control(x)) <=
true(control(o))
next(control(o)) <=
true(control(x))
79
Goals
goal(white,100) <= line(x)
goal(white,0) <= line(o)
goal(black,100) <= line(o)
goal(black,0) <= line(x)
line(W) <= row(M,W)
line(W) <= column(N,W)
line(W) <= diagonal(W)
80
Supporting Concepts
row(M,W) <=
true(cell(M,1,W)) &
true(cell(M,2,W)) &
true(cell(M,3,W))
column(N,W) <=
true(cell(1,N,W)) &
true(cell(2,N,W)) &
true(cell(3,N,W))
diagonal(W) <=
true(cell(1,1,W)) &
true(cell(2,2,W)) &
true(cell(3,3,W))
diagonal(W) <=
true(cell(1,3,W)) &
true(cell(2,2,W)) &
true(cell(3,1,W))
81
Termination
terminal <= line(W)
terminal <= ~open
open <= true(cell(M,N,b))
82
83
Completeness
Of necessity, game descriptions are logically
incomplete in that they do not uniquely specify the
moves of the players.
Every game description contains complete
definitions for legality, termination, goalhood, and
update in terms of the primitive moves and the does
relation.
The upshot is that in every state every player can
determine legality, termination, goalhood and, given
a joint move, can update the state.
84
Playability
A game is playable if and only if every player has at
least one legal move in every non-terminal state.
Note that in chess, if a player cannot move, it is a
stalemate. Fortunately, this is a terminal state.
In GGP, we guarantee that every game is playable.
85
Winnability
A game is strongly winnable if and only if, for some
player, there is a sequence of individual moves of
that player that leads to a terminating goal state for
that player.
A game is weakly winnable if and only if, for every
player, there is a sequence of joint moves of the
players that leads to a terminating goal state for that
player.
In GGP, every game is weakly winnable, and all
single player games are strongly winnable.
86
Comparison to Extensive Normal Form
In Extensive Normal Form, a game is modeled as a
tree with actions of one player at each node.
In State Machine Form, a game is modeled as a
graph and players’ moves are all synchronous.
In GGP, a game must be described formally. While
ENF and SMF are expressively equivalent for finite
games, SMF descriptions are simpler.
Some players may create game trees from game
descriptions; however, searching game graphs can
be more efficient.
87
Programme for Today
State Machines
Propositional Nets
Relational Nets
Tabular Encoding
Logical Encoding
88
Game Model
An n-player game is a structure with components:
S - finite set of states
I1, …, In - finite sets of actions, one for each player
l1, ..., ln - where li ⊆ Ii × S - the legality relations
u ⊆ S × I1 × ... × In × S - update relation
i ∈ S - initial game state
T ⊆ S - the terminal states
G1, ..., Gn - where Gi ⊆ S - the goal relations
89
Propositional Nets and State Machines
Define states in terms of propositions.
s110
s
p
q
r
Use propositional connectives to capture behavior.
p
r
q
90
Markings
A marking for a propositional net is a function from
the propositions to boolean values.
m: P → {true, false}
91
Acceptability
A marking is acceptable iff it obeys the logical
properties of all connectives.
Negation with input x and output y:
m(y)=true ⇔ m(x)=false
Conjunction with inputs x and y and output z:
m(z)=true ⇔ m(x)=true ∧ m(y)=true
Disjunction with inputs x and y and output z:
m(z)=true ⇔ m(x)=true ∨ m(y)=true
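Acceptability can be checked connective by connective. The sketch below assumes each condition is an equivalence between the output and the stated combination of inputs; the net, the proposition names, and the triple encoding of connectives are hypothetical.

```python
def acceptable(marking, connectives):
    """marking: dict proposition -> bool; connectives: (kind, inputs, output)."""
    for kind, inputs, output in connectives:
        values = [marking[x] for x in inputs]
        if kind == "not":
            expected = not values[0]
        elif kind == "and":
            expected = all(values)
        elif kind == "or":
            expected = any(values)
        else:
            raise ValueError(f"unknown connective: {kind}")
        if marking[output] != expected:
            return False
    return True


# Hypothetical net: np is the negation of p, and r is the conjunction of np and q.
net = [("not", ["p"], "np"), ("and", ["np", "q"], "r")]
good = {"p": False, "np": True, "q": True, "r": True}
bad = {**good, "r": False}

print(acceptable(good, net), acceptable(bad, net))
# prints: True False
```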
92
Update
A transition is enabled by a marking m iff all of its
inputs are marked true.
The update for a marking m is the partial marking
m* that assigns true to the outputs of all transitions
enabled by m and false to the outputs of all other
transitions.
A successor m’ of a marking m is any complete,
acceptable marking consistent with m*.
93
Example
p
r
q
94
Logical Encoding
cell(1,1,x)
cell(1,2,b)
cell(1,3,b)
cell(2,1,b)
cell(2,2,o)
cell(2,3,b)
cell(3,1,b)
cell(3,2,b)
cell(3,3,x)
control(black)
noop
mark(1,3)
cell(1,1,x)
cell(1,2,b)
cell(1,3,o)
cell(2,1,b)
cell(2,2,o)
cell(2,3,b)
cell(3,1,b)
cell(3,2,b)
cell(3,3,x)
control(white)
95
Arguing for Evaluation Function
Assume evaluation function f partitions states into n
categories S1, …, Sn.
Consider probabilities p1, …, pn of winning in each
category. (More generally, consider expected utilities
u1, …, un.) Use these probabilities (utilities) as
evaluation function values for the corresponding
categories.
Choosing a move that leads to a category with
maximal value maximizes chances of winning.
96
Evaluation Functions
An ideal evaluation function is one that reflects the
expected utility of each state. (In the case of win-lose
games, it is the probability of winning.)
For each terminal state, it is the payoff in that state.
For each nonterminal state, it is the maximum of the
expected utilities of the legal actions in that state.
(The expected utility of an action in a state is the sum
of the expected values of the states resulting from that
action weighted by probabilities of the opponents’
actions.)
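The recursive definition can be sketched for a single step with one opponent. Everything specific here is an assumption made for illustration: the opponent is taken to choose uniformly at random, and the table of successor values is invented.

```python
def expected_utility(action, opponent_actions, value):
    """Average value of the resulting states under a uniform opponent."""
    total = sum(value[(action, other)] for other in opponent_actions)
    return total / len(opponent_actions)


def state_value(my_actions, opponent_actions, value):
    """Value of a nonterminal state: the best expected utility over legal actions."""
    return max(expected_utility(a, opponent_actions, value) for a in my_actions)


# Hypothetical values of the states reached by each joint move.
value = {("a", "a"): 100, ("a", "b"): 0, ("b", "a"): 75, ("b", "b"): 50}

print(expected_utility("a", ["a", "b"], value))
# prints: 50.0
print(state_value(["a", "b"], ["a", "b"], value))
# prints: 62.5
```

Action b has the higher expected value even though action a can reach the best single outcome.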
97
Evaluation Functions
Choosing moves that maximize expected value
NB: Different priors possible. Random is common.
98