Projection Methods

Projection Methods
(Symbolic tools we have used to do…)
Ron Parr
Duke University
Joint work with:
Carlos Guestrin (Stanford)
Daphne Koller (Stanford)
Overview
• Why?
– MDPs need value functions
– Value function approximation
– “Good” approximate value functions
• How
– Approximation architecture
– Dot products in large state spaces
– Expectation in large state spaces
– Orthogonal projection
– MAX in large state spaces
Why You Need Value Functions
Given current configuration:
• Expected value of all widgets produced by factory
• Expected number of steps before failure
DBN - MDPs
[Figure: two-slice DBN with state variables X, Y, Z at times t and t+1, plus an action node.]
Adding rewards
[Figure: the DBN extended with reward nodes R1 and R2, each attached to a small subset of the variables.]
Rewards have small sets of parent variables too.
The total reward is the sum of the sub-rewards: R = R_1 + R_2
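A tiny sketch of a factored reward of this kind (the parent sets and values are illustrative): each sub-reward depends on a small set of variables, and the total reward is their sum.

# Sketch: a factored reward with small parent sets (illustrative values).
def r1(state):                                  # R1 depends only on z
    return 1.0 if state["z"] else 0.0

def r2(state):                                  # R2 depends only on x and y
    return 2.0 if (state["x"] and state["y"]) else 0.0

def total_reward(state):
    """R = R1 + R2."""
    return r1(state) + r2(state)

print(total_reward({"x": True, "y": True, "z": False}))   # 2.0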
Computing Values
V = \gamma P V + R
where V is the value function and P is the symbolic transition model (the DBN).
Q: Does V have a convenient, compact form?
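On an explicitly enumerated MDP this fixed point is just a linear solve; a numpy sketch with illustrative P, R, and discount. The catch is that V has one entry per state, which is exponential in the number of state variables.

import numpy as np

# Sketch: exact evaluation V = gamma*P*V + R, i.e. solve
# (I - gamma*P) V = R, on a tiny explicit MDP (illustrative values).
gamma = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
R = np.array([0.0, 1.0, 2.0])

V = np.linalg.solve(np.eye(3) - gamma * P, R)
print(V)   # one value per state: exponentially many entries in a factored model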
Compact Models = Compact V?
[Figure: the DBN with variables X, Y, Z unrolled over time steps t through t+3, with reward R = +1, and the resulting table of values over assignments to x, y, z.]
Enter Value Function Approximation
• Not enough structure for exact, symbolic
methods in many domains
• Our approach:
– Combine symbolic methods with VFA
– Define a restricted class of value functions
– Find the “best” in that class
– Bound error
Linearly Decomposable Value Functions
Note: overlapping is allowed!
Approximate high-dimensional functions with a combination of lower-dimensional functions.
Motivation: multi-attribute utility theory (Keeney & Raiffa)
Decomposable Value Functions
Linear combination of functions:
\tilde{V}(s) = \sum_i w_i h_i(s)
• Each h_i has as its domain a small set of variables
• Each h_i is a feature of a complex system
– status of a machine
– inventory of a store
• Also: think of each h_i as a basis function
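A minimal sketch of what such a decomposition looks like in code; the state variables, features, and weights here are illustrative.

# Sketch: a linearly decomposable value function over three binary
# state variables x, y, z. Each basis function h_i looks at only a
# small subset of the variables; the weights are illustrative.

def h1(state):            # feature of machine status: depends only on x
    return 1.0 if state["x"] else 0.0

def h2(state):            # feature of inventory level: depends on y and z
    return 2.0 if (state["y"] and not state["z"]) else 0.5

weights = [0.7, 1.3]      # w_1, w_2 (illustrative values)
basis = [h1, h2]

def v_approx(state):
    """Approximate value: V~(s) = sum_i w_i * h_i(s)."""
    return sum(w * h(state) for w, h in zip(weights, basis))

print(v_approx({"x": True, "y": True, "z": False}))   # 0.7*1.0 + 1.3*2.0 = 3.3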
Matrix Form
\hat{V} = A w assigns a value to every state.
A has one row per state and one column per basis function (k basis functions in total):

A = [ h_1(s_1)  h_2(s_1)  ...
      h_1(s_2)  h_2(s_2)  ...
        ...                  ]

Note for linear algebra fans: \hat{V} is a linear function in the column space of h_1, ..., h_k.
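A small numpy sketch of this construction (the state space, basis functions, and weights are illustrative); in the factored setting A is never built explicitly, since it has one row per state.

import numpy as np
from itertools import product

# Sketch: the |S| x k basis matrix A for three binary state variables
# (x, y, z) and two restricted-domain basis functions. Illustrative
# only -- the point of the symbolic methods is to avoid enumerating
# the 2^n states like this.
states = list(product([0, 1], repeat=3))        # all (x, y, z) assignments
basis = [
    lambda s: float(s[0]),                      # h_1 depends only on x
    lambda s: float(s[1] and not s[2]),         # h_2 depends only on y, z
]

A = np.array([[h(s) for h in basis] for s in states])   # shape (8, 2)
w = np.array([0.7, 1.3])                                # illustrative weights
V_hat = A @ w                                           # a value for every state
print(V_hat)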
Defining a fixed point
V = \gamma P V + R                          (standard fixed point equation)
\hat{V} = A w = \Pi(\gamma P A w + R)       (fixed point with approximation; \Pi is the projection operator)
We use orthogonal projection to force \hat{V} to have the desired form.
Solving for the fixed point
w = (A^T A - \gamma A^T P A)^{-1} A^T R
LSTD [Bradtke & Barto 96]; cost O(k^2 n)
Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]
Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value-iteration-based methods.
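For intuition, a numpy sketch of this fixed-point solve on an explicitly enumerated toy MDP (the transition matrix, reward, basis, and discount are illustrative); the point of the symbolic operations on the next slides is to assemble the same k x k system without ever forming P or A explicitly.

import numpy as np

# Sketch: solve w = (A^T A - gamma * A^T P A)^{-1} A^T R on a tiny
# explicit MDP. Everything here (P, R, A, gamma) is illustrative.
gamma = 0.9
P = np.array([[0.8, 0.2, 0.0],      # |S| x |S| transition matrix
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
R = np.array([0.0, 1.0, 2.0])       # reward per state
A = np.array([[1.0, 0.0],           # |S| x k basis matrix
              [1.0, 1.0],
              [0.0, 1.0]])

lhs = A.T @ A - gamma * A.T @ P @ A      # k x k
rhs = A.T @ R                            # k x 1
w = np.linalg.solve(lhs, rhs)
V_hat = A @ w                            # approximate value function
print(w, V_hat)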
Key Operations
• Backprojection of a basis function: P h_i
• Dot product of two restricted-domain basis functions: h_i \cdot h_j
If these two operations can be done efficiently:
w = (A^T A - \gamma A^T P A)^{-1} A^T R
(A^T A and A^T P A are k \times k; A^T R is k \times 1)
Solution cost for k basis functions: a k \times k matrix inversion.
Backprojection = 1-step Expectation
[Figure: DBN with state variables X, Y, Z at times t and t+1.]
h_1 = f(z)   \Rightarrow   P h_1 = f(y, z)
Important: single step of lookahead only - no more.
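A sketch of what this one-step expectation looks like concretely, assuming binary variables and a CPT P(z' | y, z) as in the figure (the CPT values and h_1 are illustrative): backprojecting a function of Z yields a function over the parents of Z only.

import numpy as np

# Sketch: backprojection (one-step expectation) of a basis function
# h_1 that depends only on Z, through a DBN in which Z' has parents
# {Y, Z}. The CPT values and h_1 are illustrative.

# cpt[y, z, z'] = P(Z'=z' | Y=y, Z=z)
cpt = np.array([[[0.9, 0.1], [0.4, 0.6]],
                [[0.7, 0.3], [0.2, 0.8]]])

h1 = np.array([0.0, 1.0])            # h_1 evaluated at the next-state value z'

# (P h_1)(y, z) = sum_{z'} P(z' | y, z) * h_1(z')
Ph1 = cpt @ h1                       # shape (2, 2): a function of (y, z) only
print(Ph1)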
Efficient dot product
Need to compute: \sum_s h_i(s) h_j(s)
e.g.: h_1 = f(x), h_2 = f(y)
[Figure: h_1, h_2, and h_1 \cdot h_2 tabulated over the assignments to x and (y, z).]
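One way to exploit the restricted domains (a sketch assuming binary state variables, not necessarily the exact bookkeeping from the talk): sum over the joint assignments of the variables either function touches, then multiply by the number of assignments to the untouched variables.

from itertools import product

# Sketch: dot product of two restricted-domain functions over n binary
# state variables, without enumerating all 2^n states.
# Values and variable sets are illustrative.
n = 20                               # total number of state variables
dom1, dom2 = ("x",), ("y",)          # restricted domains of h_1 and h_2

def h1(assign):                      # h_1 = f(x)
    return 1.0 if assign["x"] else 0.0

def h2(assign):                      # h_2 = f(y)
    return 2.0 if assign["y"] else 0.5

joint = sorted(set(dom1) | set(dom2))
total = 0.0
for values in product([0, 1], repeat=len(joint)):
    assign = dict(zip(joint, values))
    total += h1(assign) * h2(assign)

# Each joint assignment is shared by 2^(n - |joint|) full states.
dot = total * 2 ** (n - len(joint))
print(dot)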
Symbolic Linear VFA
• Incurs only one step's worth of representation blowup
– Solve directly for the fixed point
– Contrast with bisimulation / structured DP: exact, but iterative - the representation grows with each step
• No a priori quality guarantees
• A posteriori quality guarantees
Error Bounds
How are we doing?
\|V^* - \hat{V}\|_\infty \le \frac{1}{1-\gamma} \|\hat{V} - (\gamma P \hat{V} + R)\|_\infty
(\gamma P \hat{V} + R is the one-step lookahead expected value; the norm on the right is the max one-step error.)
Claim:
• Equivalent to maximizing a sum of restricted-domain functions
• Use a cost network (Dechter 99)
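For completeness, a short sketch of why the bound holds, using the fixed-point equation for the exact value function, V^* = \gamma P V^* + R, and the fact that \|P v\|_\infty \le \|v\|_\infty for a stochastic matrix P:

\|V^* - \hat{V}\|_\infty
  \le \|V^* - (\gamma P \hat{V} + R)\|_\infty + \|(\gamma P \hat{V} + R) - \hat{V}\|_\infty
  = \gamma \|P (V^* - \hat{V})\|_\infty + \|\hat{V} - (\gamma P \hat{V} + R)\|_\infty
  \le \gamma \|V^* - \hat{V}\|_\infty + \|\hat{V} - (\gamma P \hat{V} + R)\|_\infty

Rearranging gives the bound above.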
Cost Networks
• Can use variable elimination to maximize over the state space [Bertele & Brioschi ’72]:
\max_{A,B,C,D} f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D)
= \max_{A,B,C} \big[ f_1(A,B) + f_2(A,C) + \max_D [ f_3(C,D) + f_4(B,D) ] \big]
= \max_{A,B,C} \big[ f_1(A,B) + f_2(A,C) + g_1(B,C) \big]
Here we need only 16, instead of 64, sum operations.
[Figure: cost network over A, B, C, D with factors f_1(A,B), f_2(A,C), f_3(C,D), f_4(B,D).]
As in Bayes nets, maximization is exponential in the size of the largest factor; NP-hard in general.
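A minimal Python sketch of this variable-elimination maximization for the four-variable example above (variables are binary; the factor values are illustrative):

from itertools import product

# Sketch: eliminate D first, then maximize over A, B, C, as in the
# derivation above. Variables are binary; factor values illustrative.
f1 = lambda a, b: a + 2 * b          # f_1(A, B)
f2 = lambda a, c: 3 * a * c          # f_2(A, C)
f3 = lambda c, d: c - d              # f_3(C, D)
f4 = lambda b, d: 2 * b * d          # f_4(B, D)

# Eliminate D: g_1(B, C) = max_D [ f_3(C, D) + f_4(B, D) ]
g1 = {(b, c): max(f3(c, d) + f4(b, d) for d in (0, 1))
      for b in (0, 1) for c in (0, 1)}

# Maximize the remaining factored function over A, B, C.
best = max(f1(a, b) + f2(a, c) + g1[(b, c)]
           for a, b, c in product((0, 1), repeat=3))

# Check against brute force over all 16 assignments.
brute = max(f1(a, b) + f2(a, c) + f3(c, d) + f4(b, d)
            for a, b, c, d in product((0, 1), repeat=4))
assert best == brute
print(best)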
Checkpoint
• Starting with:
– Factored model (DBN)
– Restricted value function space
(restricted domain basis functions)
• Find fixed point in restricted space
• Bound solution quality a posteriori
• But: the fixed point may not have the lowest max-norm error
Max-norm Error Minimization
A w \approx R + \gamma P A w
w^* = \arg\min_w \|(A - \gamma P A) w - R\|_\infty = \arg\min_w \|H w - b\|_\infty
• General max-norm error minimization
• Symbolic operation over large state spaces
General Max-norm Error Minimization
• Algorithm for finding:
w^* = \arg\min_w \|H w - b\|_\infty
Solve by linear programming [Cheney ’82]:
Variables: w_1, \ldots, w_k, \phi
Minimize: \phi
Subject to: \phi \ge \max_s \Big[ \sum_{i=1}^k w_i h_i(s) - b(s) \Big]  and  \phi \ge \max_s \Big[ b(s) - \sum_{i=1}^k w_i h_i(s) \Big]
Symbolic max-norm minimization
• For fixed weights w, compute the max-norm:
\phi = \|H w - b\|_\infty = \max_s \Big| \sum_i w_i h_i(s) - b(s) \Big|
Naively this is a maximization over exponentially many states. However, if the basis and target are functions of only a few variables, we can do it efficiently!
Cost networks can maximize over large state spaces efficiently when the function is factored:
\max_{X_1, \ldots, X_n} \sum_i f_i(C_i), where C_i \subseteq \{X_1, \ldots, X_n\}
Representing the Constraints
• Explicit representation is exponential (|S| = 2^n):
\phi \ge \Big| \sum_{i=1}^k w_i h_i(s) - b(s) \Big|, \quad \forall s \in S
• If basis and target are factored, we can use cost networks to represent the constraints:
\phi \ge \max_{A,B,C} \big[ f_1(A,B) + f_2(A,C) + \max_D [ f_3(C,D) + f_4(B,D) ] \big]
\phi \ge \max_{A,B,C} \big[ f_1(A,B) + f_2(A,C) + g_1(B,C) \big], where g_1(B,C) = \max_D [ f_3(C,D) + f_4(B,D) ]
Conclusions
• Value function approximation w/error bounds
• Symbolic operations (no sampling!)
• Methods over large state spaces
– Orthogonal Projection
– Max-norm error minimization
• Tools over large state spaces
– Expectation
– Dot product
– Max