Transcript Document
Tutorial organized by Andrew Comport and Adrien Bartoli. Nice, September 22.
• Introduction: Robot Vision (Philippe Martinet)
• Unifying Vision and Control (Selim Benhimane)
• Efficient Keypoint Recognition (Vincent Lepetit)
• Multi-camera and Model-based Robot Vision (Andrew Comport)
• Visual SLAM for Spatially Aware Robots (Walterio Mayol-Cuevas)
• Outdoor Visual SLAM for Robotics (Kurt Konolige)
• Advanced Vision in Deformable Environments (Adrien Bartoli)

Visual SLAM and Spatial Awareness
SLAM = Simultaneous Localisation and Mapping
• An overview of some methods currently used for SLAM with computer vision.
• Recent work on enabling more stable and/or robust mapping in real-time.
• Work aiming to provide better scene understanding in the context of SLAM: Spatial Awareness.
• Here we concentrate on "small" working areas where GPS, odometry and other traditional sensors are not operational or available.

Spatial Awareness
• SA: a key cognitive competence that permits efficient motion and task planning.
• Even from an early age we use spatial awareness: the toy has not vanished, it is behind the sofa. I can point to where the entrance to the building is, but I can't tell how many doors there are from here to there.
• SLAM offers a rigorous way to implement and manage SA.

Wearable personal assistants
Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.02/Videos/wearableslam2.mpg
Mayol, Davison and Murray 2003

SLAM
Key historical reference: Smith, R.C. and Cheeseman, P., "On the Representation and Estimation of Spatial Uncertainty", The International Journal of Robotics Research 5(4): 56-68, 1986. Proposed a stochastic framework to maintain the relationships (uncertainties) between features in the map.
"Our knowledge of the spatial relationships among objects is inherently uncertain. A man-made object does not match its geometric model exactly because of manufacturing tolerances. Even if it did, a sensor could not measure the geometric features, and thus locate the object exactly, because of measurement errors. And even if it could, a robot using the sensor cannot manipulate the object exactly as intended, because of hand positioning errors…" [Smith, Self and Cheeseman 1986]

SLAM
A problem that has been identified for several years, central to mobile robot navigation and branching into other fields like wearable computing and augmented reality.

SLAM: Simultaneous Localisation And Mapping
Aim to:
• Localise the camera under perspective projection (6DOF: rotation and translation from a reference view), predicting and updating its location as the camera moves.
• Simultaneously estimate a 3D map of features (e.g. 3D points), updating feature positions as they are re-measured.
Implemented using: Extended Kalman Filter, particle filters, SIFT, edgelets, etc.
State representation, with first-order uncertainty, as in [Davison 2003]. (A sketch of this representation is given at the end of this section.)

Challenges for visual SLAM
On the computer vision side, improving data association: ensuring a match is a true positive.
• Representations and parameterizations to enhance mapping while staying within real-time.
• Alternative frameworks for mapping: can we extend the area of operation?
• Better scene understanding.

For data association, an earlier approach
• Small (e.g. 11x11) image patches around salient points to represent features.
• Normalized Cross Correlation (NCC) to detect features. (A second sketch at the end of this section illustrates this.)
• Small patches + accurate search regions lead to fast camera pose estimation.
• Depth by projecting the hypothesis at different depths.
See: A. Davison, Real-Time Simultaneous Localisation and Mapping with a Single Camera, ICCV 2003.
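As a rough illustration of the [Davison 2003] state representation with first-order uncertainty: the filter stacks the camera state and all 3D feature positions into one vector, with a single dense covariance matrix coupling them. A minimal sketch, assuming an EKF with the 13-parameter constant-velocity camera state; names and dimensions are illustrative, not Davison's code:

```python
import numpy as np

# Minimal sketch of a MonoSLAM-style EKF state (after Davison 2003).
# Camera state x_v: 3D position, orientation quaternion, linear and angular
# velocity (13 parameters, constant-velocity motion model); each feature y_i
# is a 3D point. One dense covariance P couples camera and features, so a
# single measurement updates everything (first-order uncertainty).

n_features = 4
x_v = np.zeros(13)
x_v[3:7] = np.array([1.0, 0.0, 0.0, 0.0])   # identity orientation quaternion
y = np.zeros(3 * n_features)                # stacked 3D feature points
x = np.concatenate([x_v, y])                # full state vector
P = np.eye(x.size) * 1e-4                   # full (dense) covariance matrix

def ekf_update(x, P, z, h, H, R):
    """Generic EKF measurement update: z is a measured image position of a
    feature, h(x) its prediction, H the Jacobian of h, R the image noise."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - h(x))                  # state correction
    P = (np.eye(x.size) - K @ H) @ P        # covariance correction
    return x, P
```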
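And the patch matching itself: a generic zero-mean NCC score scanned over a search region (here simplified to a rectangle rather than the filter's predicted ellipse); a sketch with illustrative thresholds, not the original implementation:

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation between two equal-size patches;
    values near 1 indicate a likely match."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def search_patch(image, template, region, threshold=0.8):
    """Scan an (x0, y0, x1, y1) search region for the best NCC match of a
    small (e.g. 11x11) template; return its position or None."""
    h, w = template.shape
    x0, y0, x1, y1 = region
    best, best_xy = -1.0, None
    for yy in range(y0, y1 - h + 1):
        for xx in range(x0, x1 - w + 1):
            score = ncc(image[yy:yy + h, xx:xx + w], template)
            if score > best:
                best, best_xy = score, (xx, yy)
    return best_xy if best >= threshold else None
```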
However…
• Simple patches are insufficient for large viewpoint or scale variations.
• Small patches help speed, but are prone to mismatch.
• Search regions can't always be trusted (camera occlusion, motion blur).
Possible solutions: use a better feature description, or other types of features, e.g. edge information.

SIFT [D. Lowe, IJCV 2004]
• Find maxima in scale space to locate keypoints. If used for tracking, this may be wasteful!
• Around each keypoint, build an invariant local descriptor (a 128-element vector) using gradient histograms.

[Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]
• Uses SIFT-like descriptors (histograms of gradients) around Harris corners.
• Gets scale from SLAM: "predictive SIFT".
Video at http://www.cs.bris.ac.uk/Publications/attachment-delivery.jsp?id=9

[Eade and Drummond, BMVC2006]
Edgelets:
• Locally straight sections of the gradient image.
• Parameterized as a 3D point + direction.
• Avoid regions of conflict (e.g. close parallel edges).
• Deal with multiple matches through robust estimation.
Video at http://mi.eng.cam.ac.uk/~ee231/bmvcmovie.avi

RANSAC [Fischler and Bolles 1981]: RANdom SAmple Consensus
(Figure: a RANSAC fit versus a least-squares fit in the presence of gross "outliers"; least squares is dragged away by the outliers.)
• Select a random sample of points.
• Propose a model (hypothesis) based on the sample.
• Assess the fitness of the hypothesis on the rest of the data.
• Repeat until the maximum number of iterations or a fitness threshold is reached.
• Keep the best hypothesis, and potentially refine it with all inliers.
(A sketch of this loop is given at the end of this section.)

OK but…
Having rich descriptors, or even multiple kinds of features, may still lead to wrong data associations (mismatches). If we pass to the SLAM system every measurement we think is good, it can be catastrophic. Better to be able to recover from failure than to think it won't fail!

[Williams, Smith and Reid ICRA2007]
• Camera relocalisation using small 2D patches + RANSAC to compute pose.
• Uses the 3-point algorithm, giving up to 4 possible poses; verifies them using Matas' Td,d test.
• Adds a "supervisor" between the visual measurements and the SLAM system.
In brief, while within the real-time limit: if lost, select 3 matches and compute a pose; if the pose is consistent, carry on, otherwise resample. (A second sketch at the end of this section outlines this loop.)
Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.04/Videos/relocalisation_icra_07.mpg
Also see recent work [Williams, Klein and Reid ICCV2007] using randomised trees rather than simple 2D patches.

Relocalisation based on appearance hashing
• Use a hash function to index similar descriptors (Brown et al 2005).
• Fast and memory efficient (only an index needs to be saved per descriptor).
• Quantize the result of Haar masks to form the hash key. (A third sketch at the end of this section illustrates the idea.)
Video at: http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000939
Chekhlov et al 2008
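To make the RANSAC loop above concrete, here is a generic sketch for the line-fitting example in the figure; the sample size, tolerance and iteration count are illustrative:

```python
import numpy as np

def ransac_line(points, n_iters=100, inlier_tol=0.02):
    """Robustly fit a 2D line to an (N, 2) point array: hypothesise from a
    minimal 2-point sample, score by inlier count, refine on all inliers."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Minimal random sample: two points define a line hypothesis.
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        normal = np.array([-d[1], d[0]])
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue
        normal /= norm
        # 2. Assess fitness: count points within the tolerance band.
        dist = np.abs((points - p) @ normal)
        inliers = dist < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # 3. Refine with all inliers: least-squares direction via SVD.
    inl = points[best_inliers]
    _, _, vt = np.linalg.svd(inl - inl.mean(axis=0))
    return inl.mean(axis=0), vt[0], best_inliers   # point, direction, mask
```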
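The relocalisation "supervisor" loop can be outlined as below; find_matches, solve_p3p and pose_is_consistent are hypothetical placeholders standing in for the patch matcher, the 3-point pose solver, and the Td,d-style verification, not an existing API:

```python
import random

def relocalise(frame, feature_map, time_budget, now,
               find_matches, solve_p3p, pose_is_consistent):
    """Sketch of the relocalisation supervisor: while lost and within the
    real-time budget, hypothesise a pose from 3 matches and verify it.
    All callables are injected placeholders."""
    matches = find_matches(frame, feature_map)    # small 2D patch matches
    deadline = now() + time_budget
    while now() < deadline and len(matches) >= 3:
        sample = random.sample(matches, 3)
        for pose in solve_p3p(sample):            # up to 4 candidate poses
            if pose_is_consistent(pose, matches, frame):
                return pose                       # recovered: carry on tracking
    return None                                   # still lost; try next frame
```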
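And a toy illustration of appearance hashing: a couple of Haar-like mask responses on a patch, each quantized to a few bits and packed into an integer key that indexes a bucket of feature ids. The masks and quantization here are invented for illustration, not those of Chekhlov et al 2008:

```python
import numpy as np
from collections import defaultdict

def haar_hash(patch, bits_per_mask=2):
    """Quantize Haar-like mask responses on a patch (intensities assumed in
    [0, 1]) into a small integer hash key."""
    h, w = patch.shape
    responses = [
        patch[:, : w // 2].mean() - patch[:, w // 2 :].mean(),  # left-right
        patch[: h // 2, :].mean() - patch[h // 2 :, :].mean(),  # top-bottom
    ]
    key = 0
    levels = 2 ** bits_per_mask
    for r in responses:                      # r lies in [-1, 1] here
        q = int(np.clip((r + 1.0) / 2.0 * levels, 0, levels - 1))
        key = (key << bits_per_mask) | q
    return key

# Index maps hash key -> feature ids; only an index is stored per descriptor.
index = defaultdict(list)
patch = np.random.default_rng(0).random((16, 16))
index[haar_hash(patch)].append("feature-42")
```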
Parallel Tracking and Mapping
[Klein and Murray, Parallel Tracking and Mapping for Small AR Workspaces, Proc. International Symposium on Mixed and Augmented Reality, 2007]
• Decouple mapping from tracking; run them in separate threads on a multi-core CPU.
• Mapping is based on key-frames, processed using batch Bundle Adjustment.
• The map is initialised from a stereo pair (using the 5-point algorithm).
• New points are initialised with an epipolar search.
• Large numbers (thousands) of points can be mapped in a small workspace.
(Diagram: two threads. On CPU1, tracking repeats "detect features, compute camera pose, draw graphics" for every frame; on CPU2, mapping updates the map asynchronously. A threading sketch is given at the end of this section.)
Video at http://www.robots.ox.ac.uk/ActiveVision/Videos/index.html
[Klein and Murray, 2007]

So far we have mentioned that
Maps are sparse collections of low-level features:
• Points (Davison et al., Chekhlov et al.)
• Edgelets (Eade and Drummond)
• Lines (Smith et al., Gee and Mayol-Cuevas)
Full correlation between features and camera:
• Maintain the full covariance matrix.
• Loop closure: the effects of measurements are propagated to all features in the map.
• The increase in state size limits the number of features.

Commonly in Visual SLAM
• Emphasis is on localization and less on the mapping output.
• SLAM should avoid making "beautiful" maps (there are other, better methods for that!).
• Very few examples exist of improving the awareness element, e.g. Castle and Murray BMVC 07 on known-object recognition within SLAM.

Better spatial awareness through higher-level structural inference
Types of structure:
• Coplanar points → planes
• Collinear edgelets → lines
• Intersecting lines → junctions
Our contribution:
• A method for augmenting the SLAM map with planar and line structures.
• An evaluation of the method in a simulated scene: discovering the trade-off between efficiency and accuracy.

Discovering structure within SLAM
Gee, Chekhlov, Calway and Mayol-Cuevas, 2008

Plane Representation [Gee et al 2007]
A plane is parameterized in the world frame by an origin point $m = (x, y, z)$ and two basis vectors $c(\theta_1, \phi_1)$, $c(\theta_2, \phi_2)$, each defined by a pair of angles:

$c(\theta_i, \phi_i) = (\cos\phi_i \sin\theta_i,\; \sin\phi_i,\; \cos\phi_i \cos\theta_i)^T$

The plane normal is perpendicular to the two basis vectors; the plane adds 7 parameters (3 + 2 + 2) to the state.

Plane Initialisation [Gee et al 2007]
1. Discover planes using RANSAC over a thresholded subset of the map. (A sketch of this step is given at the end of this section.)
2. Initialise the plane in the state using the best-fit plane parameters $s$, found from an SVD of the inliers.
3. Augment the state covariance $P$ with the new plane: appending the measurement noise $R$ and multiplying by the Jacobian $J$ populates the cross-covariance terms,

$P_{new} = J \begin{pmatrix} P & 0 \\ 0 & R \end{pmatrix} J^T, \qquad J = \begin{pmatrix} I & 0 \\ \frac{\partial s}{\partial v} \cdots \frac{\partial s}{\partial m_1} \cdots \frac{\partial s}{\partial m_n} & \frac{\partial s}{\partial z} \end{pmatrix}$

The state covariance size increases by 7 after adding a plane.

Adding Points to a Plane [Gee et al 2007]
1. Decide whether a point lies on the plane (its distance $d$ to the plane is within $\sigma_{max}$).
2. Add the point by projecting it onto the plane and transforming the state and covariance, $P_{new} = J P J^T$, where the Jacobian $J$ has identity blocks for the unchanged states and derivatives $\partial r/\partial v$, $\partial r/\partial m_i$ for the projected point $r$.
3. Decide whether to fix the point on the plane.
• The state size decreases by 1 for each point added to the plane.
• Fixing points in the plane reduces the state size by 2 for each fixed point.
• The state becomes smaller than the original if more than 7 points are added to the plane, offsetting the 7 new plane parameters.

Plane Observation [Gee et al 2007]
1. A direct observation of the plane cannot be made.
2. Transform its points to 3D world space.
3. Project the points into the image and match them against the predicted observations.
4. The covariance matrix embodies the constraints between plane, camera and points.

Discovering planes in SLAM
Video at: http://www.cs.bris.ac.uk/~gee
Gee et al. 2007

Mean error and state reduction with planes
(Figure: mean error and state-size reduction for planes, averaged over 30 runs.) Gee et al 2008

Discovering 3D lines
Video at: http://www.cs.bris.ac.uk/~gee

An example application
Video at http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000745
Chekhlov et al. 2007
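As promised above, a sketch of the PTAM-style two-thread split: the tracking thread must finish every frame, while the mapping thread refines key-frames asynchronously. The worker logic is a trivial placeholder, not PTAM's actual code:

```python
import queue
import threading
import time

# Key-frames flow from the per-frame tracking thread to the mapping thread.
keyframes = queue.Queue()
world_map = {"keyframes": []}

def track(frames):
    """CPU1: per-frame loop of detect features / compute pose / draw."""
    for frame in frames:
        pose = ("pose-for", frame)        # placeholder pose computation
        if frame % 10 == 0:               # placeholder key-frame test
            keyframes.put((frame, pose))  # hand the frame to the mapper

def build_map():
    """CPU2: consume key-frames, add points, run batch bundle adjustment."""
    while True:
        frame, pose = keyframes.get()
        world_map["keyframes"].append((frame, pose))   # placeholder update

threading.Thread(target=build_map, daemon=True).start()
track(range(100))
time.sleep(0.1)                           # let the mapper drain the queue
print(len(world_map["keyframes"]), "key-frames mapped")
```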
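And a minimal sketch of the plane-discovery step referenced in the initialisation slide: a generic RANSAC over the map points followed by an SVD best-fit on the inliers. The thresholds are illustrative and this is not Gee et al's implementation:

```python
import numpy as np

def best_fit_plane(points):
    """Least-squares plane through (M, 3) points via SVD: returns a point on
    the plane and its normal (the direction of smallest variance)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def discover_plane(map_points, n_iters=200, tol=0.01, min_inliers=8):
    """RANSAC over map points: hypothesise a plane from 3 random points,
    count points within tol of it, then refine the best set with SVD."""
    rng = np.random.default_rng(0)
    best, best_count = None, 0
    for _ in range(n_iters):
        sample = map_points[rng.choice(len(map_points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                   # degenerate (collinear) sample
            continue
        n /= norm
        dist = np.abs((map_points - sample[0]) @ n)
        inliers = dist < tol
        if inliers.sum() > best_count:
            best, best_count = inliers, inliers.sum()
    if best is None or best_count < min_inliers:
        return None
    return best_fit_plane(map_points[best])   # origin and normal for the state
```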
Other interesting recent work
• Active search and matching, or knowing what to measure: Davison ICCV 2005; Chli and Davison ECCV 2008.
• Submapping, to better manage the scalability problem: Clemente et al RSS 2007; Eade and Drummond BMVC 2008.
And the work presented in this tutorial:
• Randomised trees: Vincent Lepetit
• SFM: Andrew Comport

Software tools:
• http://www.doc.ic.ac.uk/~ajd/Scene/index.html <MonoSLAM code for Linux, works out of the box>
• http://www.robots.ox.ac.uk/~gk/PTAM/ <Parallel tracking and mapping>
• http://www.openslam.org/ <for SLAM algorithms, mainly from the robotics community>
• http://www.robots.ox.ac.uk/~SSS06/ <SLAM literature and some software in Matlab>

Recommended intro reading:
• Yaakov Bar-Shalom, X. Rong Li and Thiagalingam Kirubarajan, Estimation with Applications to Tracking and Navigation, Wiley-Interscience, 2001.
• Hugh Durrant-Whyte and Tim Bailey, Simultaneous Localisation and Mapping (SLAM): Part I, The Essential Algorithms, Robotics and Automation Magazine, June 2006.
• Tim Bailey and Hugh Durrant-Whyte, Simultaneous Localisation and Mapping (SLAM): Part II, State of the Art, Robotics and Automation Magazine, September 2006.
• Andrew Davison, Ian Reid, Nicholas Molton and Olivier Stasse, MonoSLAM: Real-Time Single Camera SLAM, IEEE Trans. PAMI, 2007.
• Andrew Calway, Andrew Davison and Walterio Mayol-Cuevas, Slides of Tutorial on Visual SLAM, BMVC 2007, available at: http://www.cs.bris.ac.uk/Research/Vision/Realtime/bmvctutorial/

Some Challenges
• Deal with larger maps.
• Obtain maps that are task-meaningful (manipulation, AR, metrology).
• Use different feature kinds in an informed way.
• Benefit from other approaches such as SFM, but keep efficiency.
• Incorporate semantics and beyond-geometric scene understanding.

Fin