Transcript Slide 1

High-Precision Globally-Referenced Position and
Attitude via a Fusion of Visual SLAM, Carrier-Phase-Based GPS, and Inertial Measurements
Daniel Shepard and Todd Humphreys
2014 IEEE/ION PLANS Conference, Monterey, CA | May 8, 2014
Overview
 Globally-Referenced Visual SLAM
 Motivating Application: Augmented Reality
 Estimation Architecture
 Bundle Adjustment (BA)
 Simulation Results for BA
2 of 21
Stand-Alone Visual SLAM
 Produces high-precision estimates of
 Camera motion (with ambiguous scale for monocular SLAM)
 A map of the environment
 Limited in application due to lack of a global reference
[1] G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces,” in 6th IEEE and ACM International
Symposium on Mixed and Augmented Reality. IEEE, 2007, pp. 225–234.
3 of 21
Visual SLAM with Fiducial Markers
 Globally-referenced solution if fiducial markers are
globally-referenced
 Requires substantial infrastructure and/or mapping effort
 Microsoft’s augmented reality maps (TED2010[2])
[2] B. A. y Arcas, “Blaise Aguera y Arcas demos augmented-reality maps,” TED, Feb. 2010,
http://www.ted.com/talks/blaise aguera.html.
4 of 21
Can globally-referenced position and
attitude (pose) be recovered from
combining visual SLAM and GPS?
5 of 21
Observability of Visual SLAM + GPS
 No GPS positions: Translation ✗, Rotation ✗, Scale ✗
 1 GPS position: Translation ✓, Rotation ✗, Scale ✗
 2 GPS positions: Translation ✓, Rotation ~ (unresolved about the antenna baseline), Scale ✓
 3 GPS positions: Translation ✓, Rotation ✓, Scale ✓
(A degree-of-freedom count supporting these marks is sketched after this slide.)
6 of 21
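One way to see the pattern above is to count constraints against the seven degrees of freedom (3 translation, 3 rotation, 1 scale) of the similarity transform tying the visual SLAM frame V to the global frame G; the count below is an editor's reconstruction of the standard argument, not text from the slide.

\[
\boldsymbol{x}_G = \boldsymbol{x}_G^V + s\, R\!\left(\boldsymbol{q}_G^V\right) \boldsymbol{x}_V
\]

Each globally referenced antenna position supplies three scalar constraints: one position pins down translation only; two positions additionally fix scale but leave the rotation about the baseline joining them free (the "~" above); three non-collinear positions constrain all seven degrees of freedom.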
Combined Visual SLAM and CDGPS
 CDGPS anchors visual SLAM to
a global reference frame
 Can add an IMU to improve
dynamic performance (not
required!)
 Can be made inexpensive
 Requires little infrastructure
Very Accurate!
7 of 21
Motivating Application: Augmented Reality
 Augmenting a live view of the world
with computer-generated sensory
input to enhance one’s current
perception of reality[3]
 Current applications are limited by
lack of accurate global pose
 Potential uses in
 Construction
 Real-Estate
 Gaming
 Social Media
[3] M. Graham, M. Zook, and A. Boulton, "Augmented reality in urban places: contested content and the duplicity of code," Transactions of the Institute of British Geographers.
8 of 21
Estimation Architecture Motivation
 Sensors:
 Camera
 Two GPS antennas
(reference and mobile)
 IMU
 How can the information from these sensors best be
combined to estimate the camera pose and a map of
the environment?
 Real-time operation
 Computational burden vs. precision
9 of 21
Sensor Fusion Approach
 Tighter coupling = higher precision, but increased
computational burden
[Diagram: four candidate fusion architectures built from IMU, Visual SLAM, and CDGPS blocks, arranged from loosest to tightest coupling]
10 of 21
The Optimal Estimator
11 of 21
IMU only for Pose Propagation
12 of 21
Tightly-Coupled Architecture
13 of 21
Loosely-Coupled Architecture
14 of 21
Hybrid Batch/Sequential Estimator
 Only geographically diverse frames (keyframes) in batch estimator
15 of 21
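The keyframe rule above (only geographically diverse frames enter the batch estimator) can be illustrated with a simple distance-threshold test. The sketch below is an editor's illustration; the function name and the 0.25 m spacing (borrowed from the simulation scenario later in the talk) are assumptions, not the authors' actual selection logic.

```python
import numpy as np

def is_new_keyframe(cam_pos_G, keyframe_positions_G, min_separation_m=0.25):
    """Accept a frame as a keyframe only if the camera has moved far enough
    from every existing keyframe (threshold value is illustrative)."""
    if not keyframe_positions_G:
        return True  # first frame always becomes a keyframe
    dists = np.linalg.norm(np.asarray(keyframe_positions_G) - cam_pos_G, axis=1)
    return bool(np.min(dists) >= min_separation_m)
```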
Bundle Adjustment State and Measurements
 State Vector:
\[
\boldsymbol{X}_{BA} = \begin{bmatrix} \boldsymbol{c}^T & \boldsymbol{p}^T \end{bmatrix}^T, \qquad
\boldsymbol{c} = \begin{bmatrix} \cdots & (\boldsymbol{x}_G^{C_i})^T & (\boldsymbol{q}_G^{C_i})^T & \cdots \end{bmatrix}^T, \qquad
\boldsymbol{p} = \begin{bmatrix} \cdots & (\boldsymbol{x}_G^{p_j})^T & \cdots \end{bmatrix}^T
\]
 Measurement Models:
 CDGPS Positions:
\[
\boldsymbol{x}_G^{A_i} = \boldsymbol{h}_x\!\left(\boldsymbol{x}_G^{C_i}, \boldsymbol{q}_G^{C_i}\right) + \boldsymbol{w}_{x_i}
= \boldsymbol{x}_G^{C_i} + R\!\left(\boldsymbol{q}_G^{C_i}\right)\boldsymbol{x}_C^{A} + \boldsymbol{w}_{x_i}
\]
 Image Feature Measurements:
\[
\boldsymbol{s}_{I_i}^{p_j} = \boldsymbol{h}_s\!\left(\boldsymbol{x}_{C_i}^{p_j}\right) + \boldsymbol{w}_{I_i}^{p_j}
= \begin{bmatrix} x_{C_i}^{p_j} / z_{C_i}^{p_j} \\[2pt] y_{C_i}^{p_j} / z_{C_i}^{p_j} \end{bmatrix} + \boldsymbol{w}_{I_i}^{p_j},
\qquad
\boldsymbol{x}_{C_i}^{p_j} = \begin{bmatrix} x_{C_i}^{p_j} & y_{C_i}^{p_j} & z_{C_i}^{p_j} \end{bmatrix}^T
= R\!\left(\boldsymbol{q}_G^{C_i}\right)^T \left(\boldsymbol{x}_G^{p_j} - \boldsymbol{x}_G^{C_i}\right)
\]
16 of 21
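As a concrete illustration of the two measurement models above, the sketch below predicts the antenna position and the normalized image coordinates of a point feature from a candidate camera pose. The quaternion convention (body-to-global rotation) and the function names are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix for a unit quaternion q = [w, x, y, z] (convention assumed)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def predict_cdgps_position(x_G_C, q_G_C, x_C_A):
    """h_x: antenna position in G = camera position + rotated camera-to-antenna lever arm."""
    return x_G_C + quat_to_rot(q_G_C) @ x_C_A

def predict_image_feature(x_G_C, q_G_C, x_G_p):
    """h_s: project a point feature into normalized image coordinates [x/z, y/z]."""
    x_C_p = quat_to_rot(q_G_C).T @ (x_G_p - x_G_C)  # point expressed in the camera frame
    return x_C_p[:2] / x_C_p[2]
```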
Bundle Adjustment Cost Minimization
 Weighted least-squares cost function
 Employs robust weight functions to handle outliers
\[
\operatorname*{arg\,min}_{\boldsymbol{X}_{BA}} \; \frac{1}{2} \sum_{i=1}^{N} \left[
\left\lVert \Delta\boldsymbol{x}_G^{A_i} \right\rVert^2
+ \sum_{j=1}^{M} w_V\!\left(\left\lVert \Delta\boldsymbol{s}_{I_i}^{p_j} \right\rVert\right)
\left\lVert \Delta\boldsymbol{s}_{I_i}^{p_j} \right\rVert^2 \right]
\]
\[
\Delta\boldsymbol{x}_G^{A_i} = R_{\boldsymbol{x}_G^{A_i}}^{-1/2}\!\left(\boldsymbol{x}_G^{A_i} - \hat{\boldsymbol{x}}_G^{A_i}\right),
\qquad
\Delta\boldsymbol{s}_{I_i}^{p_j} = R_{\boldsymbol{s}_{I_i}^{p_j}}^{-1/2}\!\left(\boldsymbol{s}_{I_i}^{p_j} - \hat{\boldsymbol{s}}_{I_i}^{p_j}\right)
\]
 Sparse Levenberg-Marquardt algorithm
 Computational complexity linear in number of point features, but
cubic in number of keyframes
17 of 21
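The robust weighting in the cost above can be illustrated with a Huber-style weight applied to the whitened image residual. This is an editor's sketch of one common choice; the slide does not name the specific weight function w_V, and the threshold value below is an assumption.

```python
import numpy as np

def huber_weight(residual_norm, k=1.5):
    """One common robust weight w_V: unity for small residuals, decaying for outliers.
    The Huber form and threshold k are illustrative, not taken from the slides."""
    return 1.0 if residual_norm <= k else k / residual_norm

def vision_cost_term(s_meas, s_pred, R_s):
    """Whitened, robustly weighted contribution of one image feature to the BA cost."""
    whitened = np.linalg.solve(np.linalg.cholesky(R_s), s_meas - s_pred)  # ~ R^{-1/2} residual
    r = np.linalg.norm(whitened)
    return 0.5 * huber_weight(r) * r**2
```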
Bundle Adjustment Initialization
 Initialize BA based on stand-alone visual SLAM solution
and CDGPS positions
 Determine similarity transform relating coordinate systems
\[
\operatorname*{arg\,min}_{\boldsymbol{x}_G^V,\, \boldsymbol{q}_G^V,\, s} \; \frac{1}{2} \sum_{i=1}^{N}
\left\lVert \boldsymbol{x}_G^{A_i} - \boldsymbol{x}_G^V
- R\!\left(\boldsymbol{q}_G^V\right)\!\left( s\,\boldsymbol{x}_V^{C_i} + R\!\left(\boldsymbol{q}_V^{C_i}\right)\boldsymbol{x}_C^{A} \right) \right\rVert^2
\]
 Generalized form of Horn’s transform[4]
 Rotation: Rotation that best aligns deviations from mean camera
position
 Scale: A ratio of metrics describing spread of camera positions
 Translation: Difference in mean antenna position
[4] B. K. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
18 of 21
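The similarity-transform fit above admits a closed-form solution. The sketch below uses the SVD-based Umeyama/Kabsch formulation, which is equivalent in spirit to Horn's quaternion method cited in [4]; for brevity it omits the camera-to-antenna lever arm that the slide's generalized form includes, so it is an illustration rather than the authors' initialization routine.

```python
import numpy as np

def fit_similarity_transform(x_V, x_G):
    """Closed-form s, R, t aligning SLAM-frame camera positions x_V (N x 3)
    to globally referenced antenna positions x_G (N x 3): x_G ~ s * R @ x_V + t.
    SVD-based (Umeyama-style) variant of Horn's method; lever arm omitted."""
    mu_V, mu_G = x_V.mean(axis=0), x_G.mean(axis=0)
    dV, dG = x_V - mu_V, x_G - mu_G                  # deviations from mean positions
    U, S, Vt = np.linalg.svd(dG.T @ dV)              # cross-covariance decomposition
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflection
    R = U @ D @ Vt                                   # rotation best aligning the deviations
    s = np.trace(np.diag(S) @ D) / np.sum(dV**2)     # scale: ratio of spread metrics
    t = mu_G - s * R @ mu_V                          # translation: difference of means
    return s, R, t
```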
Simulation Scenario for BA
 Simulations investigating
estimability included in paper
 Hallway Simulation:
[Figure: simulated trajectory through a hallway, with points labeled A, B, C, and D]
 Measurement errors:
 2 cm std for CDGPS
 1 pixel std for vision
 Keyframes every 0.25 m
 242 keyframes
 1310 point features
 Three scenarios:
1. GPS available
2. GPS lost when hallway entered
3. GPS reacquired when hallway exited
19 of 21
Simulation Results for BA
20 of 21
Summary
 Hybrid batch/sequential estimator for loosely-coupled
visual SLAM and CDGPS with IMU for state propagation
 Compared to optimal estimator
 Outlined algorithm for BA (batch)
 Presented a novel technique for initialization of BA
 BA simulations
 Demonstrated positioning accuracy of ~1 cm and attitude accuracy of ~0.1° in areas of GPS availability
 Attained slow drift during GPS unavailability (0.4% drift over 50 m)
21 of 21
Navigation Filter
 State Vector:
\[
\boldsymbol{X}_F = \begin{bmatrix} (\boldsymbol{x}_G^{C})^T & (\boldsymbol{v}_G^{C})^T & (\boldsymbol{b}_B^{f})^T & (\boldsymbol{q}_G^{C})^T & (\boldsymbol{b}_B^{\omega})^T \end{bmatrix}^T
\]
 Propagation Step:
 Standard EKF propagation step using accelerometer and gyro
measurements
 Accelerometer and gyro biases modeled as first-order Gauss-Markov processes
 More information in paper …
22 of 21
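A minimal sketch of the propagation step described above, assuming a simple discrete-time model with first-order Gauss-Markov bias dynamics. The quaternion convention, gravity vector, correlation time, and function names are illustrative assumptions, and covariance propagation is omitted; this is not the filter from the paper.

```python
import numpy as np

def quat_to_rot(q):
    """Body-to-global rotation matrix for unit quaternion q = [w, x, y, z] (convention assumed)."""
    w, x, y, z = q
    return np.array([[1-2*(y*y+z*z), 2*(x*y-w*z),   2*(x*z+w*y)],
                     [2*(x*y+w*z),   1-2*(x*x+z*z), 2*(y*z-w*x)],
                     [2*(x*z-w*y),   2*(y*z+w*x),   1-2*(x*x+y*y)]])

def propagate_mean(x, v, q, b_f, b_w, f_meas, w_meas, dt,
                   tau=300.0, g_G=np.array([0.0, 0.0, -9.81])):
    """One EKF mean-propagation step using accelerometer (f_meas) and gyro (w_meas) samples.
    Biases follow first-order Gauss-Markov dynamics; tau and g_G are assumed values."""
    a_G = quat_to_rot(q) @ (f_meas - b_f) + g_G          # bias-corrected acceleration in G
    x_new = x + v * dt + 0.5 * a_G * dt**2               # position
    v_new = v + a_G * dt                                 # velocity
    w = w_meas - b_w                                     # bias-corrected body angular rate
    ang = np.linalg.norm(w) * dt
    axis = w / max(np.linalg.norm(w), 1e-12)
    dq = np.concatenate([[np.cos(ang / 2)], np.sin(ang / 2) * axis])
    qw, qx, qy, qz = q
    dw, dxq, dyq, dzq = dq
    q_new = np.array([qw*dw - qx*dxq - qy*dyq - qz*dzq,  # Hamilton product q ⊗ dq
                      qw*dxq + qx*dw + qy*dzq - qz*dyq,
                      qw*dyq - qx*dzq + qy*dw + qz*dxq,
                      qw*dzq + qx*dyq - qy*dxq + qz*dw])
    decay = np.exp(-dt / tau)                            # Gauss-Markov mean decay
    return x_new, v_new, q_new / np.linalg.norm(q_new), decay * b_f, decay * b_w
```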
Navigation Filter (cont.)
 Measurement Update Step:
 Image feature measurements from all non-keyframes
 Temporarily augment the state with point feature positions
 Prior from map produced by BA
 Must ignore cross-covariances ⇒ filter inconsistency
 Similar block diagonal structure in the normal equations as BA
\[
\begin{bmatrix} U_F & W_F \\ W_F^T & V_F \end{bmatrix}
\begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix}
=
\begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix} U_F - W_F V_F^{-1} W_F^T & 0 \\ W_F^T & V_F \end{bmatrix}
\begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix}
=
\begin{bmatrix} I & -W_F V_F^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix}
\]
23 of 21
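The block elimination above is a Schur complement on the point-feature block. Below is a small sketch of how that solve might look, written with dense matrices for clarity; in practice V_F is block-diagonal and is inverted block by block, which is where the computational savings come from.

```python
import numpy as np

def solve_schur(U_F, W_F, V_F, eps_F, eps_p):
    """Solve the block normal equations by eliminating the point-feature block:
    (U_F - W_F V_F^{-1} W_F^T) dX = eps_F - W_F V_F^{-1} eps_p,
    then back-substitute for the point-feature corrections dp."""
    V_inv = np.linalg.inv(V_F)                      # block-diagonal in practice
    S = U_F - W_F @ V_inv @ W_F.T                   # Schur complement
    dX = np.linalg.solve(S, eps_F - W_F @ V_inv @ eps_p)
    dp = V_inv @ (eps_p - W_F.T @ dX)               # back-substitution for point features
    return dX, dp
```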
Simulation Results for BA (cont.)
24 of 21