Transcript Slide 1
High-Precision Globally-Referenced Position and
Attitude via a Fusion of Visual SLAM, Carrier-Phase-Based GPS, and Inertial Measurements
Daniel Shepard and Todd Humphreys
2014 IEEE/ION PLANS Conference, Monterey, CA | May 8, 2014
Overview
Globally-Referenced Visual SLAM
Motivating Application: Augmented Reality
Estimation Architecture
Bundle Adjustment (BA)
Simulation Results for BA
2 of 21
Stand-Alone Visual SLAM
Produces high-precision estimates of
Camera motion (with ambiguous scale for monocular SLAM)
A map of the environment
Limited in application due to lack of a global reference
[1] G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces,” in 6th IEEE and ACM International
Symposium on Mixed and Augmented Reality. IEEE, 2007, pp. 225–234.
3 of 21
Visual SLAM with Fiducial Markers
Globally-referenced solution if the fiducial markers are
themselves globally referenced
Requires substantial infrastructure and/or mapping effort
Microsoft’s augmented reality maps (TED2010[2])
[2] B. A. y Arcas, “Blaise Aguera y Arcas demos augmented-reality maps,” TED, Feb. 2010,
http://www.ted.com/talks/blaise_aguera.html.
4 of 21
Can globally-referenced position and
attitude (pose) be recovered from
combining visual SLAM and GPS?
5 of 21
Observability of Visual SLAM + GPS
No GPS positions: translation, rotation, and scale all unobservable
1 GPS position: translation observable; rotation and scale unobservable
2 GPS positions: translation and scale observable; rotation observable except about the baseline between the two positions
3 GPS positions (non-collinear): translation, rotation, and scale all observable
6 of 21
Combined Visual SLAM and CDGPS
CDGPS anchors visual SLAM to
a global reference frame
Can add an IMU to improve
dynamic performance (not
required!)
Can be made inexpensive
Requires little infrastructure
Very Accurate!
7 of 21
Motivating Application: Augmented Reality
Augmenting a live view of the world
with computer-generated sensory
input to enhance one’s current
perception of reality[3]
Current applications are limited by
lack of accurate global pose
Potential uses in
Construction
Real-Estate
Gaming
Social Media
[3] Graham, M., Zook, M., and Boulton, A. "Augmented reality in urban places: contested content and the duplicity of
code." Transactions of the Institute of British Geographers.
8 of 21
Estimation Architecture Motivation
Sensors:
Camera
Two GPS antennas
(reference and mobile)
IMU
How can the information from these sensors best be
combined to estimate the camera pose and a map of
the environment?
Real-time operation
Computational burden vs. precision
9 of 21
Sensor Fusion Approach
Tighter coupling = higher precision, but increased
computational burden
[Figure: four candidate architectures, each combining IMU, visual SLAM, and CDGPS, arranged from loosely to tightly coupled]
10 of 21
The Optimal Estimator
11 of 21
IMU only for Pose Propagation
12 of 21
Tightly-Coupled Architecture
13 of 21
Loosely-Coupled Architecture
14 of 21
Hybrid Batch/Sequential Estimator
Only geographically diverse frames (keyframes) in batch estimator
15 of 21
Bundle Adjustment State and Measurements
State Vector:
$$\boldsymbol{X}_{BA} = \begin{bmatrix} \boldsymbol{c}^T & \boldsymbol{p}^T \end{bmatrix}^T, \quad \boldsymbol{c} = \begin{bmatrix} \cdots & \boldsymbol{x}_G^{C_i\,T} & \boldsymbol{q}_G^{C_i\,T} & \cdots \end{bmatrix}^T, \quad \boldsymbol{p} = \begin{bmatrix} \cdots & \boldsymbol{x}_G^{p_j\,T} & \cdots \end{bmatrix}^T$$
Measurement Models:
CDGPS Positions:
$$\boldsymbol{x}_G^{A_i} = \boldsymbol{h}_x\!\left(\boldsymbol{x}_G^{C_i}, \boldsymbol{q}_G^{C_i}\right) + \boldsymbol{w}_{x_i} = \boldsymbol{x}_G^{C_i} + R\!\left(\boldsymbol{q}_G^{C_i}\right)^T \boldsymbol{x}_C^{A} + \boldsymbol{w}_{x_i}$$
Image Feature Measurements:
$$\boldsymbol{s}_{I_i}^{p_j} = \boldsymbol{h}_s\!\left(\boldsymbol{x}_{C_i}^{p_j}\right) + \boldsymbol{w}_{I_i}^{p_j} = \begin{bmatrix} x_{C_i}^{p_j} / z_{C_i}^{p_j} \\ y_{C_i}^{p_j} / z_{C_i}^{p_j} \end{bmatrix} + \boldsymbol{w}_{I_i}^{p_j}, \qquad \boldsymbol{x}_{C_i}^{p_j} = R\!\left(\boldsymbol{q}_G^{C_i}\right)\left(\boldsymbol{x}_G^{p_j} - \boldsymbol{x}_G^{C_i}\right)$$
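The two measurement models above can be sketched in code (a minimal illustration, not from the slides; the quaternion convention, with R(q) mapping global-frame vectors into the camera frame, is an assumption):

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix R(q) for a unit quaternion q = [w, x, y, z].
    Convention (an assumption): R(q) maps global-frame vectors into
    the camera frame."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y + w*z),     2*(x*z - w*y)],
        [2*(x*y - w*z),     1 - 2*(x*x + z*z), 2*(y*z + w*x)],
        [2*(x*z + w*y),     2*(y*z - w*x),     1 - 2*(x*x + y*y)],
    ])

def h_x(x_G_C, q_G_C, x_C_A):
    """CDGPS antenna position: camera position plus the camera-frame
    antenna lever arm x_C_A rotated into the global frame."""
    return x_G_C + quat_to_rot(q_G_C).T @ x_C_A

def h_s(x_G_p, x_G_C, q_G_C):
    """Image feature: point rotated into the camera frame, then
    projected onto the normalized image plane."""
    x_C_p = quat_to_rot(q_G_C) @ (x_G_p - x_G_C)
    return x_C_p[:2] / x_C_p[2]
```

Noise terms are omitted; the measurement covariances enter through the weighted cost on the next slide.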
16 of 21
Bundle Adjustment Cost Minimization
Weighted least-squares cost function
Employs robust weight functions to handle outliers
$$\operatorname*{argmin}_{\boldsymbol{X}_{BA}} \; \frac{1}{2} \sum_{i=1}^{N} \left[ \left\| \Delta\boldsymbol{x}_G^{A_i} \right\|^2 + \sum_{j=1}^{M} w_V\!\left( \Delta\boldsymbol{s}_{I_i}^{p_j} \right) \left\| \Delta\boldsymbol{s}_{I_i}^{p_j} \right\|^2 \right]$$
where the whitened residuals are
$$\Delta\boldsymbol{x}_G^{A_i} = R_{\boldsymbol{x}_G^{A_i}}^{-1/2} \left( \boldsymbol{x}_G^{A_i} - \hat{\boldsymbol{x}}_G^{A_i} \right), \qquad \Delta\boldsymbol{s}_{I_i}^{p_j} = R_{\boldsymbol{s}_{I_i}^{p_j}}^{-1/2} \left( \boldsymbol{s}_{I_i}^{p_j} - \hat{\boldsymbol{s}}_{I_i}^{p_j} \right)$$
Sparse Levenberg-Marquardt algorithm
Computational complexity is linear in the number of point features, but
cubic in the number of keyframes
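A toy sketch of the robust reweighting idea: a Huber-style weight function standing in for w_V, inside an iteratively reweighted least-squares line fit (the threshold and the scalar IRLS formulation are illustrative assumptions, not the slides' exact algorithm):

```python
import numpy as np

def huber_weight(r, k=1.345):
    """Robust weight: 1 for small residuals, k/|r| for large ones,
    so outliers are down-weighted instead of dominating the cost.
    The threshold k is an illustrative choice."""
    a = np.maximum(np.abs(r), 1e-12)  # avoid divide-by-zero
    return np.minimum(1.0, k / a)

def irls_line_fit(x, y, iters=20):
    """Iteratively reweighted least squares for y ~ a*x + b: each pass
    solves a weighted normal system with weights computed from the
    previous pass's residuals (a scalar stand-in for the BA cost)."""
    A = np.column_stack([x, np.ones_like(x)])
    theta = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(iters):
        w = huber_weight(y - A @ theta)
        theta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return theta
```

In BA the same reweighting is applied per image-feature residual inside each Levenberg-Marquardt iteration.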
17 of 21
Bundle Adjustment Initialization
Initialize BA based on stand-alone visual SLAM solution
and CDGPS positions
Determine similarity transform relating coordinate systems
$$\operatorname*{argmin}_{\boldsymbol{x}_G^V,\, \boldsymbol{q}_G^V,\, s} \; \frac{1}{2} \sum_{i=1}^{N} \left\| \boldsymbol{x}_G^{A_i} - \boldsymbol{x}_G^V - R\!\left(\boldsymbol{q}_G^V\right)^T \left( s\,\boldsymbol{x}_V^{C_i} + R\!\left(\boldsymbol{q}_V^{C_i}\right)^T \boldsymbol{x}_C^A \right) \right\|^2$$
Generalized form of Horn’s transform[4]
Rotation: Rotation that best aligns deviations from mean camera
position
Scale: A ratio of metrics describing spread of camera positions
Translation: Difference in mean antenna position
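The rotation/scale/translation recipe above can be sketched as follows (an illustrative implementation: an SVD-based Kabsch rotation stands in for Horn's quaternion solution, and the antenna lever-arm term is omitted for brevity):

```python
import numpy as np

def similarity_align(p_V, p_G):
    """Estimate s, R, t such that p_G ~ s * R @ p_V + t, mapping
    visual-frame positions p_V (N x 3) to global positions p_G."""
    mu_V, mu_G = p_V.mean(axis=0), p_G.mean(axis=0)
    dV, dG = p_V - mu_V, p_G - mu_G
    # Rotation: best aligns deviations from the mean positions
    U, _, Vt = np.linalg.svd(dG.T @ dV)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ D @ Vt
    # Scale: ratio of the spreads of the two point sets
    s = np.sqrt((dG**2).sum() / (dV**2).sum())
    # Translation: difference in mean position after rotation/scaling
    t = mu_G - s * R @ mu_V
    return s, R, t
```

The recovered transform seeds BA with a globally-referenced initial guess.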
[4] B. K. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
18 of 21
Simulation Scenario for BA
Simulations investigating estimability are included in the paper
Hallway Simulation:
[Figure: simulated trajectory through a hallway with labeled points A-D]
Measurement errors:
2 cm std for CDGPS
1 pixel std for vision
Keyframes every 0.25 m
242 keyframes
1310 point features
Three scenarios:
1. GPS available
2. GPS lost when hallway entered
3. GPS reacquired when hallway exited
19 of 21
Simulation Results for BA
20 of 21
Summary
Hybrid batch/sequential estimator for loosely-coupled
visual SLAM and CDGPS with IMU for state propagation
Compared to optimal estimator
Outlined algorithm for BA (batch)
Presented a novel technique for initialization of BA
BA simulations
Demonstrated positioning accuracy of ~1 cm and attitude
accuracy of ~0.1° in areas of GPS availability
Attained slow drift during GPS unavailability (0.4% drift over 50 m)
21 of 21
Navigation Filter
State Vector:
$$\boldsymbol{X}_F = \begin{bmatrix} \boldsymbol{x}_G^{C\,T} & \boldsymbol{v}_G^{C\,T} & \boldsymbol{b}_B^{f\,T} & \boldsymbol{q}_G^{C\,T} & \boldsymbol{b}_B^{\omega\,T} \end{bmatrix}^T$$
Propagation Step:
Standard EKF propagation step using accelerometer and gyro
measurements
Accelerometer and gyro biases modeled as first-order Gauss-
Markov processes
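A minimal sketch of one discrete propagation step for such a bias state (the correlation time and steady-state sigma are illustrative parameters, not values from the paper):

```python
import numpy as np

def propagate_gauss_markov(b, P, dt, tau, sigma_ss):
    """One discrete propagation step for a first-order Gauss-Markov
    bias: b_dot = -b/tau + w. tau is the correlation time and sigma_ss
    the steady-state standard deviation. Returns the propagated mean
    and variance."""
    phi = np.exp(-dt / tau)           # scalar state transition
    Q = sigma_ss**2 * (1.0 - phi**2)  # discrete-time process noise
    return phi * b, phi**2 * P + Q
```

As dt grows, the variance approaches sigma_ss squared, capturing the bounded drift of an IMU bias (unlike a pure random walk).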
More information in paper …
22 of 21
Navigation Filter (cont.)
Measurement Update Step:
Image feature measurements from all non-keyframes
Temporarily augment the state with point feature positions
Prior from map produced by BA
Must ignore cross-covariances ⇒ filter inconsistency
Normal equations have a block-diagonal structure similar to BA's
$$\begin{bmatrix} U_F & W_F \\ W_F^T & V_F \end{bmatrix} \begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} U_F - W_F V_F^{-1} W_F^T & 0 \\ W_F^T & V_F \end{bmatrix} \begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix} = \begin{bmatrix} I & -W_F V_F^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix}$$
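The elimination above can be sketched numerically (a dense toy implementation; in the actual filter and BA, V_F is block-diagonal, so its inverse is computed cheaply block-by-block):

```python
import numpy as np

def solve_schur(U, W, V, eps_F, eps_p):
    """Solve [[U, W], [W.T, V]] [dX; dp] = [eps_F; eps_p] by first
    eliminating the point-feature block dp (Schur complement), then
    back-substituting. V is inverted densely here for illustration."""
    Vinv = np.linalg.inv(V)
    S = U - W @ Vinv @ W.T                        # reduced pose system
    dX = np.linalg.solve(S, eps_F - W @ Vinv @ eps_p)
    dp = Vinv @ (eps_p - W.T @ dX)                # back-substitution
    return dX, dp
```

This is the structure that keeps cost linear in the number of point features: only the reduced pose system must be solved densely.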
23 of 21
Simulation Results for BA (cont.)
24 of 21