Multi-Modal Place Recognition and Pose Estimation for Autonomous Rovers in Unstructured Environments
School of Electrical Engineering
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
122
Abstract
Autonomous navigation in planetary-like environments is difficult for three reasons: GPS signals are absent, semantic structure is limited, and repetitive textures make scenes visually ambiguous. Traditional place recognition and localization methods either rely on dense maps and structured environments, or they provide only place retrieval without estimating full 6-DoF (degrees of freedom) poses. This limits their use in real-time Simultaneous Localization and Mapping (SLAM) for planetary exploration. This thesis addresses the problem by developing a multi-modal system that performs both place recognition and relative pose estimation in unstructured environments. The proposed approach fuses transformer-based visual features from DINOv2 with LiDAR-derived 3D descriptors from SONATA. It aligns the two in 3D space to produce interpretable correspondences. Place retrieval is performed by aggregating DINOv2 descriptors with SALAD and searching via FAISS indexing. The method is evaluated on the Etna volcano dataset, which is representative of planetary terrain. Results show better retrieval performance than NetVLAD and TransVPR. They also show more stable pose estimation than handcrafted or regression-based baselines. Multi-modal fusion improved robustness in low-texture and low-light scenes, supporting the value of combining vision and LiDAR. The system produces interpretable outputs and meets real-time SLAM constraints for retrieval, although pose estimation needs further optimization. These findings highlight the feasibility of delivering explainable 6-DoF poses for SLAM in extreme environments, with potential applications in exploration, disaster response, and agriculture.
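The retrieval step summarized above consists of SALAD-aggregated DINOv2 descriptors searched with FAISS. At its core, this is nearest-neighbour search over fixed-length global descriptors. The plain-Python sketch below illustrates that idea and makes the following assumptions:

- Each image has already been reduced to one global descriptor vector.
- Similarity is cosine, computed as the inner product of L2-normalized vectors.
- The brute-force search stands in for an exact flat FAISS index such as `IndexFlatIP`.

The thesis may use a different index type or metric. The class and method names are hypothetical, and this is not the thesis implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length descriptor")
    return [x / norm for x in vec]


class FlatInnerProductIndex:
    """Exact (brute-force) inner-product search over unit-norm global descriptors.

    Hypothetical stand-in for a flat FAISS index; assumes descriptors such as
    SALAD-aggregated DINOv2 features are already computed per image.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []
        self.place_ids: List[int] = []

    def add(self, place_id: int, descriptor: Sequence[float]) -> None:
        """Store a normalized database descriptor together with its place ID."""
        if len(descriptor) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(descriptor)}")
        self.vectors.append(l2_normalize(descriptor))
        self.place_ids.append(place_id)

    def search(self, query: Sequence[float], k: int = 1) -> List[Tuple[int, float]]:
        """Return up to k (place_id, cosine similarity) pairs, best match first."""
        q = l2_normalize(query)
        scores = (
            (sum(a * b for a, b in zip(q, v)), pid)
            for v, pid in zip(self.vectors, self.place_ids)
        )
        return [(pid, score) for score, pid in heapq.nlargest(k, scores)]


# Toy 3-D descriptors; real global descriptors are much higher-dimensional.
index = FlatInnerProductIndex(dim=3)
index.add(10, [1.0, 0.0, 0.0])
index.add(11, [0.0, 1.0, 0.0])
index.add(12, [0.7, 0.7, 0.0])
print([pid for pid, _ in index.search([0.9, 0.1, 0.0], k=2)])  # [10, 12]
```

Under these assumptions, the top-ranked place IDs would be the loop-closure candidates handed to the pose-estimation stage. A flat index keeps the search exact. Approximate FAISS indexes trade some recall for speed on large maps.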
Supervisor
Zhou, Quan
Thesis advisor
Jensfelt, Patric
Folkesson, John
Keywords
multi-modal place recognition, six degrees of freedom pose estimation, Simultaneous Localization and Mapping (SLAM) Integration, transformer-based encoders, Light Detection and Ranging (LiDAR), DINO version 2, feature aggregation, Sinkhorn Algorithm for Locally Aggregated Descriptors (SALAD), unstructured planetary environments