BWSN Status Update - Carnegie Mellon University

Transcript

Nonmyopic Active Learning of Gaussian Processes:
An Exploration–Exploitation Approach
Andreas Krause, Carlos Guestrin
Carnegie Mellon University
River monitoring

[Figure: pH value vs. position along transect (m) in the mixing zone of the San Joaquin and Merced rivers, collected with NIMS (Kaiser et al., UCLA).]

- Want to monitor ecological condition of river
- Need to decide where to make observations!
Observation selection for spatial prediction

[Figure: pH value vs. horizontal position, showing the observations, the prediction with confidence bands, and the unobserved process.]

Gaussian processes

- Allow prediction at unobserved locations (regression)
- Allow estimating uncertainty in prediction
Mutual Information [Caselton & Zidek 1984]

- Finite set of possible locations V
- For any subset A ⊆ V, can compute
  MI(A) = H(X_{V\A}) − H(X_{V\A} | X_A)
  i.e., entropy of the uninstrumented locations before sensing minus entropy of the uninstrumented locations after sensing
- Want: A* = argmax MI(A) subject to |A| ≤ k

Finding A* is an NP-hard optimization problem
The greedy algorithm

- Want to find: A* = argmax_{|A|=k} MI(A)
- Greedy algorithm:
  - Start with A = ∅
  - For i = 1 to k:
    - s* := argmax_s MI(A ∪ {s})
    - A := A ∪ {s*}
Theorem [ICML 2005, with Carlos Guestrin, Ajit Singh]:
MI(A_greedy) ≥ (1 − 1/e) MI(A_opt)
The result of the greedy algorithm is within a constant factor (~63%) of the optimal solution.
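
As a concrete illustration, here is a minimal Python sketch of the greedy rule for a zero-mean GP specified by its kernel matrix K over the discretized locations; the entropy and MI helpers are our own illustration, not code from the paper:

import numpy as np

def gaussian_entropy(cov):
    # Differential entropy of a Gaussian: 0.5 * logdet(2*pi*e * cov).
    if cov.shape[0] == 0:
        return 0.0
    return 0.5 * np.linalg.slogdet(2 * np.pi * np.e * cov)[1]

def mi_score(K, A):
    # MI(A) = H(X_{V\A}) - H(X_{V\A} | X_A) for a zero-mean GP with kernel matrix K.
    if not A:
        return 0.0
    B = [v for v in range(K.shape[0]) if v not in A]
    K_AA = K[np.ix_(A, A)]
    K_BA = K[np.ix_(B, A)]
    conditional = K[np.ix_(B, B)] - K_BA @ np.linalg.solve(K_AA, K_BA.T)
    return gaussian_entropy(K[np.ix_(B, B)]) - gaussian_entropy(conditional)

def greedy_mi(K, k):
    # Start with A = empty set; add the location with the largest MI score k times.
    A = []
    for _ in range(k):
        s_star = max((s for s in range(K.shape[0]) if s not in A),
                     key=lambda s: mi_score(K, A + [s]))
        A.append(s_star)
    return A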
A priori vs. sequential

- Greedy algorithm finds a near-optimal a priori set: sensors are placed before making observations
- In many practical cases, we want to sequentially select observations: select the next observation depending on the previous observations made
- Focus of the talk!
Sequential design

[Figure: an observation policy π drawn as a decision tree: first observe X5; depending on its value (e.g. 21 or 17), next observe X3 (= 16) or X7 (= 19), and so on. One branch yields MI(X5 = 17, X3 = 16, X7 = 19) = 3.4; alternative queries X2, X12, X23 score MI(…) = 2.1 and MI(…) = 2.4; overall MI(π) = 3.1.]

- Observed variables depend on previous measurements and on the observation policy π
- MI(π) = expected MI score over the outcomes of the observations
Is sequential better?

- Sets are very simple policies. Hence:
  max_A MI(A) ≤ max_π MI(π) subject to |A| = |π| = k
- Key question addressed in this work: how much better is sequential vs. a priori design?
- Main motivation:
  - Performance guarantees about sequential design?
  - A priori design is logistically much simpler!
GPs slightly more formally

- Set of locations V
- Joint distribution P(X_V)
- For any A ⊆ V, P(X_A) is Gaussian

[Figure: pH value vs. position along transect (m), with the GP over X_V.]
GP defined by:
- Prior mean μ(s) [often constant, e.g., 0]
- Kernel K(s,t)

Example: squared exponential kernel, e.g. K(s,t) = θ1 exp(−|s − t|² / θ2²)

[Figure: correlation vs. distance for the squared exponential kernel; θ1 sets the variance (amplitude), θ2 the bandwidth.]
Known parameters

- Known parameters θ (bandwidth, variance, etc.): mutual information does not depend on the observed values
- No benefit in sequential design! max_A MI(A) = max_π MI(π)
Unknown parameters

- Unknown parameters θ: Bayesian approach, prior P(Θ = θ)
- Mutual information does depend on the observed values! The policy π depends on observations!
- Sequential design can be better: max_A MI(A) ≤ max_π MI(π)
- (Assume Θ discretized in this talk)
Key intuition of our main result

[Figure: MI scale from 0, marking MI(A*) for the best set and MI(π*) for the best policy; how large is this gap? The gap depends on H(Θ).]

- If θ known: MI(A*) = MI(π*), i.e. no gap!
- If θ "almost" known: MI(A*) ≈ MI(π*)
- "Almost" known means H(Θ) small
How big is the gap?

Theorem: As H(Θ) → 0, the gap between MI(π*) and MI(A*) vanishes.
If H(Θ) is small, there is no point in active learning: we can concentrate on finding the best set A*!
Near-optimal policy if parameter approximately known

- Use greedy algorithm to optimize the expected score
  MI(A_greedy | Θ) = Σ_θ P(θ) MI(A_greedy | θ)
- Corollary [using our result from ICML 05]: if parameters are almost known, we can find a near-optimal sequential policy.
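
The corollary's objective simply averages MI over the discretized parameter prior. A one-line sketch (thetas, prior, and mi_given_theta are illustrative placeholders, not names from the paper):

def expected_mi(A, thetas, prior, mi_given_theta):
    # MI(A | Theta) = sum over theta of P(theta) * MI(A | theta)
    return sum(prior[theta] * mi_given_theta(A, theta) for theta in thetas)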
What if parameters are unknown?

[Figure: with known parameters, the gap between the result of the greedy algorithm (~63% of optimal) and the optimal sequential plan is ≈ 0.]
Exploration—Exploitation for GPs

Analogy to reinforcement learning:
- Parameters: RL: transition model P(S_{t+1} | S_t, A_t) and reward Rew(S_t); GPs: kernel parameters θ
- (Almost) known parameters (exploitation): RL: find near-optimal policy by solving the MDP! GPs: find near-optimal policy by finding the best set
- Unknown parameters (exploration): RL: try to quickly learn parameters! Need to waste only polynomially many robots! GPs: try to quickly learn parameters. How many samples do we need?
Info-gain exploration (IGE)

- Gap depends on H(Θ)
- Intuitive heuristic: greedily select s* = argmax_s H(Θ) − H(Θ | X_s)
- No sample complexity bounds
- Does not directly try to improve spatial prediction
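
A sketch of the IGE rule for a discretized parameter Θ; the likelihood table interface (lik_for), which also discretizes the outcome of X_s into bins, is an assumption for illustration:

import numpy as np

def discrete_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def info_gain(prior, likelihoods):
    # H(Theta) - H(Theta | X_s), with X_s discretized into bins:
    # prior[i] = P(theta_i); likelihoods[i, x] = P(X_s = x | theta_i).
    prior = np.asarray(prior, dtype=float)
    lik = np.asarray(likelihoods, dtype=float)
    marginal = prior @ lik                      # P(X_s = x)
    expected_posterior_entropy = 0.0
    for x in range(lik.shape[1]):
        if marginal[x] <= 0:
            continue
        posterior = prior * lik[:, x] / marginal[x]
        expected_posterior_entropy += marginal[x] * discrete_entropy(posterior)
    return discrete_entropy(prior) - expected_posterior_entropy

def ige_select(candidates, prior, lik_for):
    # Greedily select s* = argmax_s H(Theta) - H(Theta | X_s).
    return max(candidates, key=lambda s: info_gain(prior, lik_for(s)))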
Implicit exploration (IE)

- Sequential greedy algorithm: given previous observations X_A = x_A, greedily select
  s* = argmax_s MI({s} | X_A = x_A, Θ)
- Contrary to a priori greedy, this algorithm takes observations into account (updates parameters)
- Proposition: H(Θ | X_π) ≤ H(Θ): "information never hurts" for policies
- No sample complexity bounds

Neither of the two strategies has sample complexity bounds. Is there any way to get them?
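
One IE step might look like the sketch below: update the parameter posterior with the data gathered so far, then pick the conditional-MI maximizer (update_posterior and conditional_mi are hypothetical helpers):

def ie_step(candidates, observations, posterior, update_posterior, conditional_mi):
    # Fold the observations made so far into P(theta | X_A = x_A) ...
    posterior = update_posterior(posterior, observations)
    # ... then greedily select s* = argmax_s MI({s} | X_A = x_A, Theta),
    # averaging the conditional MI over the current parameter posterior.
    s_star = max(candidates,
                 key=lambda s: sum(p * conditional_mi(s, observations, theta)
                                   for theta, p in posterior.items()))
    return s_star, posterior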
Learning the bandwidth

[Figure: kernel bandwidth around a sensor; sensors within the bandwidth are correlated, sensors outside the bandwidth are ≈ independent.]

Can narrow down the kernel bandwidth by sensing within and outside the bandwidth distance!
Hypothesis testing: Distinguishing two bandwidths

- Squared exponential kernel

[Figure: correlation vs. distance under BW = 1 and BW = 3; the correlation gap between the two hypotheses is largest at a particular distance.]

Choose pairs of samples at the distance where the correlation gap is largest to test correlation!
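
A minimal sketch of such a pairwise test, assuming the paired observations have been standardized (zero mean, unit variance) so that the mean product estimates the correlation:

import numpy as np

def correlation_test(pairs, corr_small_bw, corr_large_bw):
    # pairs: array of (x_s, x_t) observations taken at the chosen distance,
    # standardized so that the mean product estimates the correlation.
    pairs = np.asarray(pairs, dtype=float)
    r_hat = np.mean(pairs[:, 0] * pairs[:, 1])
    # Accept the bandwidth hypothesis whose predicted correlation is closer.
    if abs(r_hat - corr_large_bw) < abs(r_hat - corr_small_bw):
        return "large bandwidth"
    return "small bandwidth"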
Hypothesis testing: Sample complexity

Theorem: To distinguish bandwidths with minimum gap ε in correlation and error < δ, we need a certain number of independent samples (bound given in the paper).

- In GPs, samples are dependent, but "almost" independent samples suffice! (details in paper)
- Other tests can be used for variance/noise etc.
- What if we want to distinguish more than two bandwidths?
Hypothesis testing: Searching for bandwidth

- Find the "most informative split" at the posterior median

[Figure: P(θ) over candidate bandwidths 1–5; tests "BW > 2?", "BW > 3?" split the posterior at its median.]

- Testing policy π_ITE needs only logarithmically many tests!
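
This amounts to a binary search over the discretized bandwidths, splitting at the posterior median each round, so only O(log n) tests are needed. A sketch (run_test is a stand-in for the pairwise correlation test above):

def search_bandwidth(bandwidths, prior, run_test):
    # Binary search over candidate bandwidths, splitting at the posterior
    # median each round; needs only O(log n) tests.
    candidates = sorted(bandwidths)
    while len(candidates) > 1:
        total = sum(prior[b] for b in candidates)
        cum = 0.0
        for i, b in enumerate(candidates[:-1]):   # split closest to the median
            cum += prior[b]
            if cum >= total / 2:
                break
        if run_test(candidates[i]):               # test "BW > split?"
            candidates = candidates[i + 1:]
        else:
            candidates = candidates[:i + 1]
    return candidates[0]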
Hypothesis testing: Exploration Theorem

Theorem: If we have tests with error < δ_T, then the hypothesis-testing exploration policy π_ITE needs only logarithmically many samples.

- δ_T: error probability of the hypothesis tests
- π_ITE: hypothesis-testing exploration policy
Exploration—Exploitation Algorithm

- Exploration phase:
  - Sample according to exploration policy
  - Compute bound on gap between best set and best policy
  - If bound < specified threshold, go to exploitation phase; otherwise continue exploring
- Exploitation phase:
  - Use a priori greedy algorithm to select remaining samples
- For hypothesis testing, guaranteed to proceed to exploitation after logarithmically many samples!
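
A high-level sketch of the two-phase loop, reusing the earlier sketches; all function names are illustrative, not from the paper:

def explore_exploit(budget, threshold, explore_policy, gap_bound, greedy_select):
    observations = []
    # Exploration phase: sample until the bound on the gap between the best
    # set and the best policy drops below the threshold.
    while len(observations) < budget and gap_bound(observations) >= threshold:
        observations.append(explore_policy(observations))
    # Exploitation phase: a priori greedy for the remaining samples.
    remaining = budget - len(observations)
    return observations + greedy_select(observations, remaining)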
Results

Temperature data

[Figure: sensor deployment map in an office building (sensor IDs 1–54: offices, lab, kitchen, conference room, server room, etc.); plots of parameter uncertainty and RMS error vs. number of observations for the three exploration strategies.]

- ITE: hypothesis testing
- IE: implicit exploration
- IGE: parameter info-gain

None of the strategies dominates the others; usefulness depends on the application.
River data

- pH data from Merced river
- Isotropic process is a bad fit
- Need a nonstationary approach
Nonstationarity by spatial partitioning

[Figure: region membership weights (0 to 1) vs. coordinates (m), 0–50 m.]

- Partition into regions
- Isotropic GP for each region, weighted by region membership
- Final GP is a spatially varying linear combination (see the sketch below)
- Exploration—Exploitation approach applies to nonstationary models as well!
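
One way to read "spatially varying linear combination" as a kernel is K(s,t) = Σ_i w_i(s) w_i(t) K_i(s,t) with region membership weights w_i. The sketch below assumes Gaussian membership weights, which may differ from the paper's exact construction:

import numpy as np

def se_kernel(s, t, variance, bandwidth):
    # Isotropic squared exponential kernel for one region.
    return variance * np.exp(-np.abs(s - t) ** 2 / (2.0 * bandwidth ** 2))

def region_weight(s, center, width):
    # Smooth region-membership weight (assumed Gaussian bump).
    return np.exp(-np.abs(s - center) ** 2 / (2.0 * width ** 2))

def nonstationary_kernel(s, t, regions):
    # K(s,t) = sum_i w_i(s) * w_i(t) * K_i(s,t): a spatially varying
    # linear combination of per-region isotropic GPs.
    total = 0.0
    for r in regions:
        w_s = region_weight(s, r["center"], r["width"])
        w_t = region_weight(t, r["center"], r["width"])
        total += w_s * w_t * se_kernel(s, t, r["variance"], r["bandwidth"])
    return total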
Nonstationary GPs

[Figure: stationary fit vs. nonstationary fit on the river data.]

- Nonstationary model fits much better
- Problem: parameter space blows up exponentially in the number of regions
- Solution: variational approximation (BK-style) allows efficient approximate inference (details in paper)
Results on river data

[Figure: RMS error vs. number of observations (0–40) on pH data from the Merced river (Kaiser et al.); curves for sequential (IE) isotropic, a priori nonstationary, and sequential (IE) nonstationary.]

Nonstationary model + active learning lead to lower RMS error.
Conclusions

- Nonmyopic approach towards active learning in GPs
- If parameters known, greedy algorithm achieves near-optimal exploitation
- If parameters unknown, perform exploration:
  - Implicit exploration
  - Explicit exploration using information gain
  - Explicit exploration using hypothesis tests, with logarithmic sample complexity bounds!
- Each exploration strategy has its own advantages
- Can use the gap bound to compute when to stop exploring
- Presented extensive evaluation on real-world data
- See yesterday's poster for more details