Transcript talk

Nonmyopic Adaptive Informative
Path Planning for Multiple Robots
Amarjeet Singh (UCLA)
Andreas Krause (Caltech)
William Kaiser (UCLA)
rsrg@caltech
..where theory and practice collide
Monitoring rivers and lakes [IJCAI ‘07]
Need to monitor large spatial phenomena
Temperature, nutrient distribution, fluorescence, …
NIMS (Kaiser et al., UCLA)
Can only make a limited number of measurements!
[Figure: depth vs. location across lake; color indicates actual temperature, shown next to the predicted temperature]
Use robotic sensors to cover large areas
Predict at unobserved locations
Where should we sense to get the most accurate predictions?
Urban Search & Rescue
[Figure: helicopters with their detection range and detected survivors]
How can we coordinate multiple search & rescue
helicopters to quickly locate moving survivors?
Related work
Information gathering problems considered in
Experimental design (Lindley ’56, Robbins ’52…), Value of
information (Howard ’66), Spatial statistics (Cressie ’91, …),
Machine Learning (MacKay ’92, …), Robotics (Sim&Roy ’05, …),
Sensor Networks (Zhao et al ’04, …), Operations Research
(Nemhauser ’78, …)
Existing algorithms typically
Heuristics: No guarantees! Can do arbitrarily badly.
Find optimal solutions (Mixed integer programming, POMDPs):
Very difficult to scale to bigger problems.
Want algorithms that have theoretical guarantees
and scale to large problems!
How to quantify collected information?
Sensing quality function F(A) assigns utility to a set A of locations, e.g.,
expected reduction in MSE for predictions based on a GP model
[Figure: two candidate sets with F(A1) = 4 and F(A2) = 10]
Want to pick sensing locations A ⊆ V to maximize F(A)
Selecting sensing locations
Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A)
Typically NP-hard!
Greedy algorithm:
Start with A = ∅
For i = 1 to k
s* := argmax_s F(A ∪ {s})
A := A ∪ {s*}
How well does the greedy algorithm do?
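The greedy loop translates directly to code. A minimal Python sketch, with a hypothetical grid-coverage function standing in for the sensing quality F:

```python
def greedy_select(V, F, k):
    """Greedy algorithm from the slide: start with A = {} and k times
    add the element with the largest marginal gain in F."""
    A = set()
    for _ in range(k):
        s_star = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(s_star)
    return A

# Hypothetical sensing quality: number of grid cells within Manhattan
# distance 1 of some chosen sensor location (illustrative only).
cells = [(x, y) for x in range(5) for y in range(5)]

def coverage(A):
    return sum(any(abs(cx - sx) + abs(cy - sy) <= 1 for (sx, sy) in A)
               for (cx, cy) in cells)

A = greedy_select(cells, coverage, k=3)
print(len(A), coverage(A))
```

Each iteration scans all remaining candidates, so the sketch costs O(k |V|) evaluations of F; lazy evaluation (keeping stale marginal gains in a priority queue) is the usual speedup.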
Key observation: Diminishing returns
[Figure: selection A = {Y1, Y2}: adding a new observation Y' helps a lot (large improvement);
selection B = {Y1, …, Y5}: adding Y' doesn't help much (small improvement)]
Many sensing quality functions are submodular*:
Information gain [Krause & Guestrin '05]
Expected mean squared error [Das & Kempe '08]
Detection time / likelihood [Krause et al. '08]
…
Submodularity: for A ⊆ B, F(A ∪ {Y'}) − F(A) ≥ F(B ∪ {Y'}) − F(B)
*See paper for details
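The diminishing-returns inequality can be verified exhaustively on a toy instance. A small sketch using a set-cover objective, a standard submodular example (the sensor/target sets below are made up for illustration):

```python
from itertools import combinations

# Each of 6 hypothetical sensors covers a few targets.
covers = {0: {0, 1}, 1: {1, 2}, 2: {2, 3, 4}, 3: {4, 5}, 4: {5, 6}, 5: {6, 7}}

def F(A):
    out = set()
    for s in A:
        out |= covers[s]
    return len(out)

# Check F(A ∪ {y}) - F(A) >= F(B ∪ {y}) - F(B) for all A ⊆ B and y ∉ B.
ground = list(covers)
ok = True
for rb in range(len(ground) + 1):
    for B in map(set, combinations(ground, rb)):
        for ra in range(len(B) + 1):
            for A in map(set, combinations(sorted(B), ra)):
                for y in ground:
                    if y not in B:
                        ok &= F(A | {y}) - F(A) >= F(B | {y}) - F(B)
print(ok)  # → True
```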
Selecting sensing locations
Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A)
Typically NP-hard!
Greedy algorithm:
Start with A = ∅
For i = 1 to k
s* := argmax_s F(A ∪ {s})
A := A ∪ {s*}
Theorem [Nemhauser et al. '78]: F(AG) ≥ (1 − 1/e) F(OPT)
Greedy is near-optimal!
Challenges for informative path planning
Use robots to monitor the environment
Not enough to select the best k locations A for a given F(A). Need to
… take into account the cost of traveling between locations
… cope with environments that change over time
… efficiently coordinate multiple agents
Want to scale to very large problems and have guarantees
Outline and Contributions
Path constraints
Dynamic environments
Multi-robot coordination
Informative path planning
So far: max F(A) s.t. |A| ≤ k selects the most informative locations,
but those locations might be far apart!
Robot needs to travel between the selected locations.
Locations V are nodes in a graph;
C(A) = cost of cheapest path connecting the nodes A
[Figure: graph over nodes s1, …, s11 with edge costs]
New problem: max F(A) s.t. C(A) ≤ B
Known as the submodular orienteering problem.
Best known algorithms (Chekuri & Pal '05, Singh et al. '07) are superpolynomial!
Greedy algorithm fails arbitrarily badly!
Can we exploit additional structure to get better algorithms?
Additional structure: Locality
If A, B are observation sets close by, then F(A ∪ B) < F(A) + F(B)
If A, B are observation sets at least r apart, then F(A ∪ B) ≈ F(A) + F(B)
[we only assume F(A ∪ B) ≥ γ (F(A) + F(B))]
[Figure: clusters A1, A2 and B1, B2, separated by distance r]
Call such an F (r, γ)-local
Sensors that are far apart are approximately independent
Holds for many objective functions (e.g., GPs with decaying covariance)
We showed locality is empirically valid!
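A tiny numeric illustration of locality, using a disk-coverage objective as a stand-in (the paper's objectives are GP-based; the grid, radius, and locations here are illustrative):

```python
# Coverage counted on a discretized 20 x 10 region; each sensor covers a
# disk of radius rho. Sets more than 2*rho apart have disjoint coverage,
# so F(A ∪ B) = F(A) + F(B) exactly (i.e., gamma = 1 at that separation),
# while overlapping sets are strictly subadditive.
GRID = [(x / 10, y / 10) for x in range(200) for y in range(100)]

def F(A, rho=1.0):
    return sum(any((px - ax) ** 2 + (py - ay) ** 2 <= rho ** 2
                   for (ax, ay) in A) for (px, py) in GRID)

far_A, far_B = [(3.0, 5.0)], [(16.0, 5.0)]     # more than 2*rho apart
near_A, near_B = [(3.0, 5.0)], [(3.5, 5.0)]    # overlapping disks

assert F(far_A + far_B) == F(far_A) + F(far_B)      # independent
assert F(near_A + near_B) < F(near_A) + F(near_B)   # subadditive overlap
```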
The pSPIELOR Algorithm
based on sensor placement algorithm by Krause, Guestrin, Gupta, Kleinberg IPSN ‘06
pSPIEL: Efficient nonmyopic algorithm
(padded Sensor Placements at Informative and cost-Effective Locations)
Select starting and ending locations s1 and sB
Decompose the sensing region into small, well-separated clusters
Solve the cardinality-constrained problem per cluster (greedily)
Combine the cluster solutions using an orienteering algorithm
Smooth the resulting path
[Figure: candidate locations g_{i,j} grouped into clusters 1–4, combined into a path from S1 to B]
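The steps above can be caricatured in code. A greatly simplified sketch, not the actual pSPIEL: grid cells stand in for the padded cluster decomposition, a nearest-neighbor tour stands in for the orienteering subroutine, and all names and parameters are hypothetical:

```python
import math

def pspiel_sketch(V, F, start, end, k=2, cell=5.0):
    # 1) Decompose the region into well-separated clusters (grid cells).
    clusters = {}
    for v in V:
        clusters.setdefault((int(v[0] // cell), int(v[1] // cell)), []).append(v)
    # 2) Solve the cardinality-constrained problem per cluster, greedily.
    picks = []
    for nodes in clusters.values():
        A = set()
        for _ in range(min(k, len(nodes))):
            s = max((n for n in nodes if n not in A), key=lambda n: F(A | {n}))
            A.add(s)
        picks.extend(A)
    # 3) Combine the cluster solutions into one path (nearest-neighbor
    #    tour here, in place of the orienteering algorithm).
    path, rest = [start], set(picks)
    while rest:
        nxt = min(rest, key=lambda p: math.dist(path[-1], p))
        path.append(nxt)
        rest.remove(nxt)
    path.append(end)  # 4) smoothing of the path is omitted
    return path

V = [(1.0, 1.0), (2.0, 2.0), (8.0, 1.0), (9.0, 2.0)]
path = pspiel_sketch(V, F=lambda A: len(A), start=(0.0, 0.0),
                     end=(10.0, 0.0), k=1)
```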
Guarantees for pSPIELOR
based on results by Krause, Guestrin, Gupta, Kleinberg IPSN ‘06
Theorem: For (r, γ)-local submodular F, pSPIEL finds a path A with
submodular utility F(A) ≥ Ω(γ) OPT_F
path length C(A) ≤ O(r) OPT_C
*See paper for details
pSPIEL Results: Search & Rescue
Sensor Planning Research Challenge:
coordination of multiple mobile sensors to detect survivors of a major urban disaster
Buildings obstruct the viewfield of the camera
F(A) = expected # of people detected
[Figure: expected number of survivors rescued vs. number of timesteps;
pSPIEL clearly above Greedy and the heuristic of Chao et al.]
pSPIEL outperforms existing algorithms for informative path planning
Outline and Contributions
Path constraints: pSPIELOR exploits (r, γ)-locality to near-optimally solve submodular orienteering
Dynamic environments
Multi-robot coordination
Dynamic environments
So far: max_A F(A) s.t. C(A) ≤ B
Assumes we know the sensing quality F in advance
Plan a fixed (nonadaptive) path / placement A
In practice:
Model unknown; need to learn as we go
Environment changes dynamically
→ Active learning: find an adaptive policy π that modifies the solution based on observations
Gigantic POMDP (intractable)
Can we efficiently find a good solution?
Sequential sensing
[Figure: sensing policy π as a decision tree over observations; e.g., the
branch with X5 = 17, X3 = 16, X7 = 19 has utility F(X5 = 17, X3 = 16, X7 = 19) = 3.4,
other branches have F(…) = 2.1 and 2.4, giving expected utility F(π) = 3.1]
F(π) = expected utility over the outcomes of the observations
Want to pick sensing policy π to maximize F(π)
NAÏVE Algorithm [Singh, Krause, Kaiser, IJCAI '09]
At each timestep t:
Plan nonadaptive solution A* = argmax_A Ft(A) (efficient, e.g., using pSPIEL)
Execute the first step of the nonadaptive solution
Receive observations obs
Update sensing quality: Ft+1(A) = Ft(A | obs) for all A
Defines a Nonmyopic Adaptive informatIVE policy: NAIVE
How well does this policy compare to the optimal policy?
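The loop can be sketched abstractly; `plan`, `execute_first_step`, and `condition` below are hypothetical stand-ins (in the paper, the planner would be pSPIEL on the current objective):

```python
def naive(plan, execute_first_step, condition, F, T):
    """NAIVE loop from the slide: replan a nonadaptive solution each
    timestep, execute only its first step, then condition F on what
    was observed."""
    visited = []
    for _ in range(T):
        path = plan(F)                   # A* = argmax_A F_t(A)
        obs = execute_first_step(path)   # move one step, sense
        visited.append(path[0])
        F = condition(F, obs)            # F_{t+1}(A) = F_t(A | obs)
    return visited

# Toy instance: F is a dict of per-location scores; observing a location
# drops its score to 0, so the policy replans toward the next best one.
visited = naive(
    plan=lambda F: sorted(F, key=F.get, reverse=True),
    execute_first_step=lambda path: path[0],
    condition=lambda F, obs: {k: (0 if k == obs else v) for k, v in F.items()},
    F={"a": 3, "b": 2, "c": 1},
    T=2,
)
# visited == ["a", "b"]
```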
Guarantees for NAÏVE-pSPIEL
[Singh, Krause, Kaiser, IJCAI '09]
Theorem (see paper for details): At every timestep t it holds that
Ft(NAIVE) = Ω(1) Ft(OPT) − O(H(Θ | obs))
where Ft(OPT) is the value of the optimal policy OPT and H(Θ | obs) is the
remaining uncertainty about the model parameters Θ.
Key idea: replace Ft by Gt(π) = Ft(π) + λ I(Θ; π),
where λ ≥ 0 is an (application-specific) learning rate parameter.
Need to trade off exploration (reducing H(Θ)) and exploitation (maximizing F(A))
Exploration-exploitation tradeoff
[Figure: expected number of survivors rescued vs. number of timesteps,
for λ = 0, 0.1, 0.5, 0.9; λ = 0.1 performs best]
Intermediate values of λ lead to the best performance
Results: Search & Rescue
[Figure: expected number of survivors rescued vs. number of timesteps;
NAIVE-pSPIELOR and NAIVE-Greedy above pSPIELOR and Greedy]
Adaptive planning leads to significant performance improvement!
Example paths
[Figure: two panels of planned paths over the initial survivor locations
(distances in pixels), from the same starting location;
left: Greedy algorithm, right: pSPIELOR]
Results: environmental monitoring
Monitor photosynthetically active regions under the forest canopy
F(A) = # of "critical" regions covered
[Figure: % of critical locations observed vs. number of timesteps; NAIVE-pSPIEL above pSPIEL]
Adaptive planning leads to significant performance improvement!
Outline and Contributions
Path constraints: pSPIELOR exploits (r, γ)-locality to near-optimally solve submodular orienteering
Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain a near-optimal adaptive policy
Multi-robot coordination
Multi-robot coordination
max_{π1, …, πk} F(π1 ∪ π2 ∪ … ∪ πk)
s.t. C(π1) ≤ B, C(π2) ≤ B, …, C(πk) ≤ B
[Figure: k robot paths π1, …, πk from s to t]
Can use the single-robot algorithm to plan a joint policy,
but complexity increases exponentially with the number of robots
Sequential allocation
Use pSPIEL to find policy π1 for the first robot:
max_{π1} F(π1) s.t. C(π1) ≤ B
Optimize for the second robot (π2), committing to the nodes in π1:
max_{π2} F(π1 ∪ π2) s.t. C(π2) ≤ B
…
Optimize for the k-th robot (πk), committing to the nodes in π1, …, πk−1:
max_{πk} F(π1 ∪ π2 ∪ … ∪ πk) s.t. C(πk) ≤ B
[Figure: paths π1, …, πk from s to t]
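Sequential allocation is a thin wrapper around any single-robot planner. A minimal sketch, with a hypothetical budgeted greedy planner standing in for pSPIEL and a toy coverage objective:

```python
def sequential_allocation(V, F, budget, k, single_robot_plan):
    """Plan robots one at a time; robot i maximizes the joint objective
    given the nodes already committed by robots 1..i-1."""
    committed, plans = set(), []
    for _ in range(k):
        A_i = single_robot_plan(V, lambda A: F(committed | A), budget)
        plans.append(A_i)
        committed = committed | A_i
    return plans

def greedy_plan(V, G, budget):
    # Stand-in single-robot planner: pick up to `budget` nodes greedily
    # (cost here is just the number of nodes, not path length).
    A = set()
    for _ in range(budget):
        s = max((v for v in V if v not in A), key=lambda v: G(A | {v}))
        if G(A | {s}) == G(A):
            break
        A.add(s)
    return A

# Toy coverage objective over 6 candidate nodes.
covers = {0: {0, 1}, 1: {1, 2}, 2: {2, 3, 4}, 3: {4, 5}, 4: {5, 6}, 5: {6, 7}}
def F(A):
    out = set()
    for s in A:
        out |= covers[s]
    return len(out)

plans = sequential_allocation(list(covers), F, budget=1, k=2,
                              single_robot_plan=greedy_plan)
# The second robot avoids re-covering what the first robot committed to.
```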
Performance comparison
Greedy selection of nodes with no path cost constraint: arbitrarily poor
Sequential allocation with NAÏVE-pSPIELOR for multi-robot policy planning:
Theorem: Reward_SA / Reward_Opt ≥ 1 / (1 + κ), where κ = O(1/γ)
Works for any single-robot adaptive path planning algorithm!
Independent of the number of robots used!
Key tool for analysis:
Extension of submodular functions to adaptive policies
Multi-robot results
[Figure: average number of survivors rescued vs. number of timesteps, for 1, 2, and 3 robots]
Diminishing returns as the number of robots increases
Conclusions
New algorithm pSPIELOR for nonadaptive informative path planning with (r, γ)-local submodular functions
New algorithm NAÏVE-pSPIELOR for adaptive informative path planning, via an implicit exploration-exploitation analysis
Extension to multiple robots by sequential allocation
All algorithms perform well on real-world problems