Uncertainty in Data Classification, Pose Estimation and 3D Reconstruction
for Cross-Camera and Multiple Sensor Scenarios
Suresh Lodha
Professor, Computer Science
University of California
Santa Cruz
• Classification of aerial lidar data into buildings, trees,
roads, and grass
-- Supervised parametric classification based on the Expectation-Maximization algorithm using a mixture of Gaussian models
-- Confusion matrix computed and classified data visualized
• 3D reconstruction using multi-cameras such as pinhole and
omnidirectional
-- Generic imaging model used for hybrid camera scenarios
-- Uncertainty computed and reduced using bundle adjustment
algorithms
-- 3D reconstruction of outdoor scenes and comparison with ground
truth
• Stereo and lidar-based pose estimation with uncertainty
-- Adaptive stereo-based pose estimation with uncertainty
-- Adaptive lidar-based pose estimation with uncertainty
Data Classification of LiDAR Data using Different Features
[Figures: classification results using (a) height and height variation, (b) 4 features all from LiDAR, (c) 5 features including images]
Structure from Multi-Camera Scenarios
[Figures: pinhole image, omni image, texture-mapped model, top view of the points]
Pose Estimation with Uncertainty
[Figure: aerial view with stereo-based and LiDAR-based registration]
• Stereo-based approach captures terrain undulations.
• LiDAR-based approach seems better at turns.
Data Classification Problem
Input: LiDAR, aerial image, DEM
Classes:
-- Buildings
-- High vegetation
-- Roads
-- Grass
Overview and Key Questions
• Data registration
– interpolated LiDAR registered with DEM and aerial images
• Algorithm?
– EM Algorithm: how many modes?
– which data features?
(height, height texture, multiple returns, LiDAR intensity
returns, luminance)
• Uncertainty
– quantify and visualize
Previous Work on Aerial LiDAR Data Classification
• Axelsson (1999) – curvature-based methods based on MDL using height, LiDAR intensity returns, and multiple returns (terrain, buildings, electrical power lines; no discussion of quality of results)
• Filin (2002) – unsupervised clustering in a 7-tuple feature space using height, tangent plane, height differences, etc. (categorized into high/low vegetation and smooth/planar surfaces)
• Song et al. (2002) – separation of different materials such as trees, grass, roads, and roofs based on interpolated intensity data (focus is on identifying the most suitable interpolation technique)
Aerial LiDAR Data

Dataset                        UCSC          SC City
Area (sq. km.)                 10.76         21.53
No. of points                  13,196,563    23,028,512
Average point spacing (m)*     0.26          0.26
Point density (points/m²)      1.23          1.07

* Point spacing = average nearest-neighbor distance
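The derived figures in the table above (point density = number of points / area) can be checked with a few lines; the dictionary below simply restates the table's numbers.

```python
# Sanity-check the dataset table: point density = points / area.
datasets = {
    "UCSC":    {"area_km2": 10.76, "n_points": 13_196_563},
    "SC City": {"area_km2": 21.53, "n_points": 23_028_512},
}

for name, d in datasets.items():
    density = d["n_points"] / (d["area_km2"] * 1e6)  # points per square meter
    print(f"{name}: {density:.2f} points/m^2")       # 1.23 and 1.07, as tabulated
```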
UCSC Campus
• Provider: Airborne1 Inc.
• Acquired: November 2001
• Geo-referencing: NAD83 SPCS, CA III
• Laser wavelength: 1064 nm
• Laser pulse rate: 25 kHz
Santa Cruz City
Previous Work on Classification – Pattern Classification
• Supervised classification
  – Parametric class-conditional density estimation
    • Parametric models for each class, p(x|i, Θ)
    • Simple Gaussian models unsuitable
    • Mixture density
  – Non-parametric density estimation
    • Less training data
    • K-nearest neighbor
    • Others
• Unsupervised segmentation
  – K-means
Data Classification – Pipeline
Class design → Feature extraction → Training → Testing → Accuracy assessment
Data Classification – Datasets
Baskin Engineering, Oakes College, College Eight, Physical Planning, Crown College, Porter College, East Field House, Science Hill, Family Student Housing, Theatre Arts
Data Classification – Features
• Luminance
• Intensity
• Height
• Height variation
• Multiple return difference
Data Classification – Training
Supervised classification: Expectation-Maximization (EM) algorithm
• Build a 1D histogram from the training sample for each class.
• Estimate the class-conditional density p(x|i) with the EM algorithm for i = 1 (trees), i = 2 (grass), i = 3 (roads), i = 4 (roofs).
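The EM training step described above — fitting a mixture density p(x|i) per class from a 1D feature — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation; the bimodal "height" data and all parameter values are toy choices.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Fit a k-component 1D Gaussian mixture with EM (illustrative sketch)."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means
    var = np.full(k, x.var() / k)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of component j for each sample
        r = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var

# Toy per-class feature: a bimodal "height" sample (e.g. grass vs. trees)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.3, 0.1, 500), rng.normal(8.0, 1.0, 500)])
w, mu, var = em_gmm_1d(x)
```

With well-separated modes the recovered means land near the two generating means; classification then picks the class whose fitted density assigns a sample the highest value.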
Data Classification – Results: Leave-One-Out Test
[Figures: classification results using (a) just height and height variation, (b) just LiDAR data (no aerial image), (c) all five features]
Data Classification – Results
[Confusion matrices; C = number of classes, N = number of test samples]
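Given a confusion matrix with C classes and N test samples, overall and per-class accuracy follow directly from the diagonal. The matrix below is hypothetical, purely to illustrate the bookkeeping; it is not the paper's result.

```python
import numpy as np

# Hypothetical C = 4 confusion matrix (buildings, trees, roads, grass);
# rows are true classes, columns are predicted classes. Counts are invented.
conf = np.array([
    [90,  5,  3,  2],
    [ 4, 88,  2,  6],
    [ 6,  1, 85,  8],
    [ 2,  7,  9, 82],
])

N = conf.sum()                            # number of test samples
overall_acc = np.trace(conf) / N          # fraction on the diagonal
per_class_acc = conf.diagonal() / conf.sum(axis=1)
```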
Data Classification:
Observations and Analysis
• Height variation important for classification of high vegetation areas.
• Light features (luminance, intensity) are necessary for separating low
vegetation (grass) and roads.
• Height feature is extremely important for accurate overall classification.
Questions?
-- What is the accuracy of classification using only aerial lidar data?
-- Can this accuracy be improved using learning algorithms?
-- Can this accuracy be improved further using spatial coherence?
Problem: Generic Structure-from-Motion
Input: images from two different cameras (pinhole, stereo, omnidirectional, non-central, etc.)
Goals: motion estimation, pose estimation, 3D reconstruction of the scene; uncertainty reduction using bundle adjustment
[Figure: pinhole and omni images with the reconstruction]
Motivation
• Pinhole – spatial resolution, texture
• Stereo – absolute scale, motion estimation, spatial resolution
• Omnidirectional – motion estimation, large field of view
• Non-central – absolute scale, motion estimation
“Hybrid scenarios have advantages”
Example Scenarios
1) Video conferencing / Surveillance systems – omni cameras can scan large
regions and pinhole cameras can focus on specific high priority scenes.
2) Large scale modeling – one omni image can globally register several
pinhole images and avoid local error accumulation.
Proposed Algorithm Pipeline
Cam-1 and Cam-2 images of a scene → Generic Calibration → Motion Estimation → Structure Recovery → Bundle adjustment algorithms (proposed ray-point, proposed reprojection, parametric reprojection) → Parametric reconstruction of the scene → Analysis (comparison with ground truth for the scene, if available)
Related Work
Generic calibration:
• M.D. Grossberg and S.K. Nayar, “A general imaging model and a method for finding its parameters”, ICCV 2001.
• P. Sturm and S. Ramalingam, “A generic method for camera calibration”, ECCV 2004.
Cross-camera scenarios:
• C. Geyer and K. Daniilidis, “A unifying theory of central panoramic systems and practical implications”, ECCV 2000.
• P. Sturm, “Mixing catadioptric and perspective cameras”, OMNIVIS 2002.
• J. Yu and L. McMillan, “General linear cameras”, ECCV 2004.
Structure-from-motion:
• R.I. Hartley and A. Zisserman, “Multiple View Geometry in Computer Vision”, 2000.
• B. Micusik and T. Pajdla, “3D metric reconstruction from uncalibrated omnidirectional images”, ACCV 2004.
• H. Bakstein and T. Pajdla, “An overview of non-central cameras”, Research Report CTU-CMP-2000-14.
• R. Pless, “Using many cameras as one”, CVPR 2003.
Bundle adjustment:
• B. Triggs, P. McLauchlan, R.I. Hartley, and A. Fitzgibbon, “Bundle Adjustment – A Modern Synthesis”, Workshop on Vision Algorithms: Theory and Practice, 2000.
Camera Calibration Using the Generic Imaging Model
[Figure: a pinhole camera looking at a curved reflective surface; a pixel on the image plane maps to a 3D ray of points that are seen]
Basic task: given a point in the image, compute the corresponding ray in 3D.
Existing approaches:
• parametric (i.e., defined by a few intrinsic parameters)
• mostly single viewpoint
• custom-made for specific camera models
Generic imaging model: non-parametric association of a 3D ray with every image pixel.
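The generic imaging model's pixel-to-ray association can be pictured as a plain lookup table. In this minimal sketch a pinhole model fills the table only for illustration (the tiny image size and focal length are invented); in practice any calibration procedure, including the board-based one described later, could populate it.

```python
import numpy as np

# Generic imaging model sketch: a non-parametric table mapping each pixel
# (u, v) to a 3D ray given as (origin, unit direction).
W, H, f = 4, 3, 2.0   # tiny image and focal length (illustrative values)
rays = {}
for v in range(H):
    for u in range(W):
        d = np.array([u - W / 2, v - H / 2, f])       # pinhole back-projection
        rays[(u, v)] = (np.zeros(3), d / np.linalg.norm(d))

origin, direction = rays[(2, 1)]   # look up the ray for one pixel
```

The point of the non-parametric table is that nothing downstream needs to know whether the rays came from a pinhole, stereo, or omnidirectional camera.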
Generic Calibration for Any Type of Camera
[Figure: calibration boards at relative poses (R1,T1) and (R2,T2); the ray through pixel x meets a point X on each board]
• Known-motion generic calibration [Grossberg and Nayar 2001]
• Unknown-motion generic calibration [Sturm and Ramalingam 2003]
• 3 boards are needed in the case of unknown motion.
• The collinearity constraint is applied to compute the motion.
Calibration of Different Cameras
[Figure: boards used in the calibration and the recovered ray geometry for pinhole, stereo, and omnidirectional cameras]
Uncertainty Reduction in 3D Reconstruction using Bundle Adjustment
Ray-point (relatively straightforward):
• We minimize the distance between the 3D point and the intersecting rays.
• Easily extends to non-parametrically calibrated cameras and non-central cameras.
Reprojection-based bundle adjustment:
• There is no parametric association between a pixel and its ray.
• Optimization methods such as Levenberg-Marquardt iteration use the derivatives of the reprojection errors, which are difficult to compute in a non-parametric scenario.
Approach: treat any camera as a cluster of central cameras.
• Divide all the rays into k central clusters of rays (k = 1 for a pinhole camera, k = 2 for a stereo camera, k = n for an oblique camera, etc.).
• Intersect a plane with each central cluster.
• Apply reprojection-based bundle adjustment on each central cluster.
Issues: choice of the plane; clustering.
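The ray-point objective — minimize the summed squared distance from a 3D point to the intersecting rays — has a closed-form least-squares solution. A minimal sketch; the two rays below are invented for the example and are assumed to have unit direction vectors.

```python
import numpy as np

def intersect_rays(origins, dirs):
    """Least-squares 3D point minimizing the summed squared distance to a
    set of rays (origin, unit direction). Solves the normal equations of
    the ray-point objective."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, dirs):
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane perpendicular to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two perpendicular rays meeting at (1, 1, 0)
origins = [np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0])]
dirs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
X = intersect_rays(origins, dirs)   # → [1, 1, 0]
```

Because only (origin, direction) pairs are needed, the same routine applies unchanged to non-parametrically calibrated and non-central cameras, which is the advantage the slide points out.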
Experiments: Different Cross-Camera Scenarios
Results: percentage errors and difference measures for indoor and outdoor scenes
• Percentage errors in the pinhole and omni scenario (house)
• Percentage errors in the stereo and omni scenario (objects)
• Difference measures in the pinhole and omni scenario (Stevensons appts, UCSC)
Discussion: Uncertainty in 3D Reconstruction using Cross-Camera Scenarios
• Generic 3D reconstruction algorithm works for cross-camera scenarios (we do not have to design camera-specific parametric algorithms)
• Reprojection-based bundle adjustment algorithm
reduces uncertainty in 3D reconstruction
Questions?
• Multiple hybrid camera scenarios
• Partially overlapping scenes
• Dynamic scenes
Stereo Camera and Lidar-Based
Pose Computation with
Uncertainty
• To compute poses for a mobile sensor unit for
indoor and outdoor scenes.
• Compute and visualize the uncertainty
associated with pose computation.
Previous Work

Group (years)                   Equipment                                          Vision/DOF                 Lidar/DOF   Environment    Vision Unc.   Lidar Unc.   Task/Objective
Lowe et al. (2000-2004)         Tri-camera, odometry                               Y/5                        N           Indoor         Y             -            SLAM
Thrun et al. (2000-2004)        Horizontal lidar, odometry, mosaic images/sonar    Y/3 (/3 overhead images)   Y/3         (In/out)door   Y (PF)        Y (PF)       SLAM
Zakhor et al. (2001-2004)       Horizontal lidar, aerial data, odometry            N                          Y/(3,5)     Outdoor        -             Y (PF)       3D Reconstruction
Neumann/You et al. (1996-2004)  Vision-based fiducials, inertial sensors           Y/6 (fiducials)            N           (In/out)door   Y             -            Augmented Reality
Ours                            Stereo camera, horizontal lidar                    Y/6                        Y/3         (In/out)door   Y             Y            3D Reconstruction
Data Acquisition
• We have mounted our equipment (laptop, GPS unit, stereo camera, horizontal LiDAR, vertical LiDAR) on a movable trolley.
• Allows for easy motion indoors and on streets, narrow paths, and undulating terrain.
[Figure: trolley-mounted sensors with coordinate axes X, Y, Z]
Adaptive stereo-based approach – Overview
Initialization
• Find a suitable stereo pair for the global coordinate frame. Call this Frame 1; set i = 1.
• Find left-right frame matches and triangulate. Use the stereo camera baseline to obtain 3D points.
Select plausible next frame
• Find a suitable second stereo frame (criterion: good matches).
Estimate motion for plausible next frame
• Find matches between the left image of frame i and the left image of the plausible frame.
• Triangulate matches and estimate motion up to scale.
• Scale the subset of triangulated points to match the left-right matches of frame i (scale recovery).
Adaptive pose computation
• Recover scaled motion between frame i and the plausible next frame.
• If the motion is within an acceptable threshold, accept the plausible frame and set i = i+1.
Adaptive stereo-based approach – Initialization
• Find an initial stereo frame to set the global coordinate frame.
• Frame should have enough (>30) good L-R matches (+ve depths).
• Compute 3D structure (triangulate L-R matches) and recover scale using the stereo camera baseline length.
[Figure: L and R images of frame i = 1]
Adaptive stereo-based approach – Select plausible next frame
• Examine the next frame.
• Find L-R matches.
• Compute 3D structure (triangulate L-R matches) and discard points with -ve z values.
• Ensure final #matches > 30.
[Figure: L and R images of frame i and candidate frame i+1]
Adaptive stereo-based approach – Estimate motion for plausible next frame
• Find L-L matches and triangulate (up to scale).
• Recover scale using previously computed 3D L-R points from frame i.
• Using the recovered scale, compute relative motion.
[Figure: L and R images of frame i and candidate frame i+1]
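The scale-recovery step can be sketched by comparing inter-point distances between the metric L-R structure of frame i and the up-to-scale L-L structure: distance ratios are invariant to rigid motion, so their median estimates the scale. The points below are synthetic and assumed already matched; this is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def recover_scale(pts_metric, pts_upto_scale):
    """Estimate the scale relating an up-to-scale reconstruction to metric
    3D points, using ratios of inter-point distances. The corresponding
    points are assumed already matched one-to-one."""
    ratios = []
    n = len(pts_metric)
    for a in range(n):
        for b in range(a + 1, n):
            d_m = np.linalg.norm(pts_metric[a] - pts_metric[b])
            d_s = np.linalg.norm(pts_upto_scale[a] - pts_upto_scale[b])
            if d_s > 1e-9:
                ratios.append(d_m / d_s)
    return float(np.median(ratios))   # median is robust to bad matches

pts = np.random.default_rng(0).normal(size=(10, 3))   # metric L-R points
scaled = 0.4 * pts                                    # up-to-scale copy
s = recover_scale(pts, scaled)                        # ≈ 2.5
```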
Adaptive stereo-based approach – Adaptive pose computation
• If the computed displacement is within pre-defined (min/max) thresholds, accept the plausible frame as frame i+1.
• Reduces local error accumulation.
• If the motion is too large, extrapolate from frame i and re-initialize pose computation.
[Figure: displacement between frame i and accepted frame i+1]
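The acceptance test above can be sketched as a simple threshold check on the recovered displacement. The threshold values and the function name are illustrative, not from the slides.

```python
import numpy as np

# Sketch of the adaptive acceptance test: a candidate frame is accepted
# only if the recovered displacement lies between two thresholds.
MIN_DISP, MAX_DISP = 0.05, 1.0   # meters (illustrative values)

def classify_motion(t):
    """t: recovered translation between frame i and the candidate frame."""
    d = np.linalg.norm(t)
    if d < MIN_DISP:
        return "skip"      # too little motion: keep looking
    if d > MAX_DISP:
        return "reinit"    # too large: extrapolate and re-initialize
    return "accept"        # candidate becomes frame i+1

print(classify_motion(np.array([0.3, 0.0, 0.1])))  # prints "accept"
```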
Adaptive stereo-based approach
[Figure: left and right images of frame i and the next frame]
• Find frame i L-R matches and scaled structure.
• Find next frame L-R matches and unscaled structure.
• Find L-L matches and recover scale from frame i L-R structure.
Adaptive LiDAR-based approach – Overview
Initialization
• Start from the first horizontal scan.
• Assume an initial motion estimate.
Initial pose estimate
• Using the motion estimate, extrapolate the previous pose.
Grid-refined pose estimate
• Use scan matching to find the best pose in the neighborhood of the initial pose estimate.
Final pose estimate
• Perform gradient descent to refine the pose estimate further.
Adaptive pose computation
• Reject the pose if the displacement is less than a threshold.
Adaptive LiDAR-based approach – Scan Matching
• Pose refinement done by maximizing the quality Q of the match between two scans.
• Convert scan 1 into a set of lines.
• Apply the pose transformation to scan 2 points.
• Compute the minimum distance d_i between each scan 2 point and the scan 1 lines.
• Quality: Q = Σ_i e^(−d_i²)
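The quality metric Q = Σ_i e^(−d_i²) can be computed directly from point-to-line distances; a minimal 2D sketch (the line and point below are invented for the example):

```python
import numpy as np

def point_line_dist(p, a, b):
    """Distance from 2D point p to the infinite line through a and b."""
    u = (b - a) / np.linalg.norm(b - a)
    v = p - a
    return abs(v[0] * u[1] - v[1] * u[0])   # |2D cross product| with unit u

def match_quality(scan2_pts, scan1_lines):
    """Q = sum_i exp(-d_i^2), d_i = min distance of each scan-2 point to
    the scan-1 line set (the slide's quality metric)."""
    Q = 0.0
    for p in scan2_pts:
        d = min(point_line_dist(p, a, b) for a, b in scan1_lines)
        Q += np.exp(-d ** 2)
    return Q

# A point lying exactly on the x-axis line contributes exp(0) = 1.
lines = [(np.array([0.0, 0.0]), np.array([1.0, 0.0]))]
print(match_quality([np.array([0.5, 0.0])], lines))  # prints 1.0
```

Each point contributes at most 1, and the contribution decays rapidly with distance — which is also why, as the next slide notes, the metric downweights large errors.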
Adaptive LiDAR-based approach – Scan Matching
[Figure: initial match and final match]
• The quality metric has flaws:
  – Joining lines pairwise can be noisy.
  – Matching might not always be to the expected line.
  – The metric gives less importance to larger errors.
• However, overall we got better results with this metric than with a few others.
Adaptive LiDAR-based approach – Pose Uncertainty
• Numerically compute the change in the error measurements d_i for slight changes in pose.
• Compute the Jacobian:

  J = [ ∂d_1/∂x  ...  ∂d_1/∂z
            ...
        ∂d_n/∂x  ...  ∂d_n/∂z ]

• Compute the pose covariance matrix as (JᵀJ)⁻¹.
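The numerical Jacobian and pose covariance can be sketched as below. The error function and the 3-DOF pose parameterization [x, z, pan] are illustrative stand-ins, not the paper's implementation, and the covariance is (JᵀJ)⁻¹ up to the residual-variance scale factor.

```python
import numpy as np

def pose_covariance(err_fn, pose, eps=1e-5):
    """Finite-difference Jacobian of the error measurements w.r.t. the
    pose, and the resulting covariance (J^T J)^-1."""
    d0 = err_fn(pose)
    J = np.zeros((len(d0), len(pose)))
    for j in range(len(pose)):
        dp = pose.copy()
        dp[j] += eps
        J[:, j] = (err_fn(dp) - d0) / eps   # numerical partial derivatives
    return np.linalg.inv(J.T @ J)

# Hypothetical 3-DOF pose [x, z, pan] acting on a few 2D scan points;
# the "errors" are the displacements the pose induces on each point.
pts = np.array([[1.0, 0.1], [2.0, -0.2], [3.0, 0.05]])

def errors(pose):
    x, z, pan = pose
    c, s = np.cos(pan), np.sin(pan)
    res = []
    for px, pz in pts:
        res.append(c * px - s * pz + x - px)  # x-displacement residual
        res.append(s * px + c * pz + z - pz)  # z-displacement residual
    return np.array(res)

cov = pose_covariance(errors, np.array([0.0, 0.0, 0.0]))
```

The eigenvectors of the covariance give the axes of the uncertainty ellipsoids shown in the results slides, and the eigenvalues their extents.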
Results and Discussion – Indoor scene
[Figure: non-adaptive vs. adaptive stereo registration]
• Uncertainty ellipses aligned along the z-axis.
Results and Discussion – Indoor scene
[Figure: non-adaptive vs. adaptive LiDAR-based registration; aerial view]
• Non-adaptive LiDAR-based registration: choppiness and an indistinct dispensing machine.
• Adaptive approach: the dispensing machine is much clearer, and the registration is smoother.
• Uncertainty ellipses show more uncertainty along the x-axis.
Results and Discussion – Outdoor scene
[Figure: stereo-based and LiDAR-based registration]
• Stereo-based approach captures terrain undulations.
• LiDAR-based approach seems better at turns.
Results and Discussion – Motion Statistics

Stereo-based registration:
Motion statistic                               Cafe       C-8dorm
# stereo pairs captured                        143        417
# poses computed                               78         281
Time (s): total                                125.74     356.17
Time (s): average                              1.61       1.26
Avg. translation speed (cm/s): X               20.11      43.21
Avg. translation speed (cm/s): Y               0.22       2.69
Avg. translation speed (cm/s): Z (depth)       -1.87      16.30
Avg. rotation, Rodrigues formula (rad): tilt   0.0004     -0.0002
Avg. rotation, Rodrigues formula (rad): pan    -0.0025    -0.0045
Avg. rotation, Rodrigues formula (rad): roll   -0.0002    -0.0017

LiDAR-based registration:
Motion statistic                               Cafe       C-8dorm
# horizontal scans captured                    439        1025
# poses computed                               52         321
Time (s): total                                116.59     417.52
Time (s): average                              2.29       1.30
Avg. translation speed (cm/s): X               23.53      40.94
Avg. translation speed (cm/s): Z (depth)       0.17       -0.29
Avg. rotation, pan (rad)                       -0.0088    -0.0062

Both approaches give similar trolley speeds in the x-direction (the direction of major motion). The indoor scene has mostly pan; the outdoor scene has roll too.
Results and Discussion – Uncertainty Statistics

Stereo pose computation:
Uncertainty statistic                          Cafe                      C8-dorm
Translation amount (cm): avg                   [0.35, 0.04, 0.01]        [1.51, 0.04, 0.01]
Translation amount (cm): max                   [2.41, 0.27, 0.02]        [56.70, 0.65, 0.05]
Average axes: 1                                [-0.02, -0.07, 0.94]      [-0.03, -0.01, 0.95]
Average axes: 2                                [0.13, 0.37, 0.07]        [0.23, 0.30, 0.03]
Average axes: 3                                [0.73, 0.11, 0.04]        [0.60, 0.15, 0.03]
Rotation (Rodrigues notation): avg (rad)       [0.005, 0.001, 0.00]      [0.012, 0.003, 0.00]
Rotation: most significant direction           [0.091, -0.041, 0.992]    [0.30, 0.09, 0.0886]
Max. uncertainty along the z-axis, in translation and rotation.

LiDAR pose computation:
Uncertainty statistic                          Cafe             C8-dorm
Translation amount (cm): avg                   [0.94, 0.63]     [0.05, 0.04]
Translation amount (cm): max                   [4.12, 2.83]     [0.45, 0.28]
Average axes: 1                                [0.57, -0.45]    [0.40, -0.43]
Average axes: 2                                [0.21, 0.65]     [0.20, 0.53]
Rotation (Rodrigues notation): avg (rad)       [0.0011]         [0.00]
Rotation (Rodrigues notation): max (rad)       [0.0087]         [0.00]
Uncertainty distributed more evenly, but more along the x-direction.
Typical Uncertainty Ellipsoids
[Figures: uncertainty ellipsoids for Stereo/Cafe, Stereo/C8, LiDAR/Cafe, LiDAR/C8; axes are scaled to 1 cm]
Discussion: Sensor Pose Estimation with Uncertainty
• Stereo camera uncertainty is mostly along the z-axis; LiDAR uncertainty is mostly along the x-axis (fusion can reduce uncertainty in pose estimation).
• Inertial sensors and motion models can further reduce pose uncertainty.
• Global context (and annotations by mobile users) can further reduce uncertainty.
Publications (1)
• Sanjit Jhala and Suresh K. Lodha, “Stereo and Lidar-Based Pose Estimation with Uncertainty for 3D Reconstruction”, to appear in the Proceedings of the Vision, Modeling, and Visualization Conference, Stanford, Palo Alto, CA, November 2004.
• Srikumar Ramalingam, Suresh K. Lodha, and Peter Sturm, “A Generic Structure-from-Motion Algorithm for Cross-Camera Scenarios”, Proceedings of the OmniVis (Omnidirectional Vision, Camera Networks, and Non-Classical Cameras) Conference, Prague, Czech Republic, May 2004.
• Amin Charaniya, Roberto Manduchi, and Suresh K. Lodha, “Supervised Parametric Classification of Aerial Lidar Data”, Proceedings of the IEEE Workshop on Real-Time Sensors and Their Use, Washington, DC, June 2004.
• Hemanth Singamsetty and Suresh K. Lodha, “An Integrated Geospatial Data Acquisition System for Reconstructing 3D Environments”, to appear in the Proceedings of the IASTED Conference on Advances in Computer Science and Technology (ACST), St. Thomas, Virgin Islands, USA, November 2004.
Publications (2)
• Sanjit Jhala and Suresh K. Lodha, “On-line Learning of Motion Patterns using an Expert Learning Framework”, Proceedings of the IEEE Workshop on Learning in Computer Vision and Pattern Recognition, Washington, DC, June 2004.
• Suresh K. Lodha, Nikolai M. Faaland, Grant Wong, Amin Charaniya, Srikumar Ramalingam, and Arthur Keller, “Consistent Visualization and Querying of Geospatial Databases by a Location-Aware Mobile Agent”, Proceedings of the Computer Graphics International Conference 2003, Tokyo, Japan, July 2003.
• Srikumar Ramalingam and Suresh K. Lodha, “Adaptive Enhancement of 3D Scenes using Hierarchical Registration of Texture-Mapped Models”, Proceedings of 3DIM 2003, October 2003.
• Suresh K. Lodha, Krishna M. Roskin, and Jose C. Renteria, “Hierarchical Topology Preserving Compression of Terrains”, Visual Computer, 2003.
Publications (3)
• Suresh K. Lodha, Nikolai M. Faaland, and Jose Renteria, “Hierarchical Topology Preserving Compression of 3D Vector Fields using Bintree and Triangular Quadtrees”, IEEE Transactions on Visualization and Computer Graphics, Vol. 9, No. 4, October 2003, pages 433-442.
• Christopher Campbell, Michael M. Shafae, Suresh K. Lodha, and Dominic W. Massaro, “Discriminating Visible Speech Tokens using Multi-Modality”, Proceedings of the International Conference on Auditory Display (ICAD), Boston, MA, July 2003.
• Amin Charaniya and Suresh K. Lodha, “Speech Interface for Geo-Spatial Visualization”, Proceedings of the Conference on Computer Science and Technology (CST), Cancun, Mexico, May 2003.
• Lilly Spirkovska and Suresh Lodha, “Audio-Visual Situational Awareness for General Aviation Pilots”, Proceedings of the SPIE Conference on Visualization and Data Analysis, January 2003, Vol. 5009.
Collaborations
• Syracuse (Pramod Varshney et al.)
– probabilistic uncertain particle movement
• Georgia Tech (Bill Ribarsky et al.)
– integration of uncertainty within VGIS
• USC (Neumann et al.)
– aerial lidar data, GPS infrastructure, pose estimation
with uncertainty
• UC, Berkeley (Zakhor et al.)
– sensor pose estimation with uncertainty
People
• Graduate students: Karthik-Kumar Arun-Kumar, Amin Charaniya, Alex D’Angelo, Sanjit Jhala, Srikumar Ramalingam, Jose Renteria, Krishna Roskin, Hemanth Singamsetty, Shailaja Vats, Yongqin Xiao
• Undergraduate students: Andrew Ames, Jason Bane, Adam Bickett, Nikolai Faaland, Darren Fitzpatrick, Krishna Roskin, Michael Shafae, Grant Wong
• Researchers: Christopher Campbell, Arthur Keller, Roberto Manduchi, Dominic Massaro, Peter Sturm (INRIA)
Acknowledgements
• MURI Grant
• Airborne1 Corporation
• NSF
Thank You!