HSCNet++ : Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer
Access rights
openAccess
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Language
en
Pages
21
Series
International Journal of Computer Vision, Volume 132, issue 7, pp. 2530-2550
Abstract
Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12-Scenes, and Cambridge Landmarks datasets, as well as on the combined indoor scenes.
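The coarse-to-fine idea in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the scene is partitioned into regions with known 3D centers, that a classifier scores the regions for a pixel, and that a regressor outputs a residual offset from the chosen region center. All names and values (`REGION_CENTERS`, `coarse_to_fine_coordinate`, the scores and residual) are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the coarse classification decision)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def coarse_to_fine_coordinate(
    region_scores: Sequence[float],
    region_centers: Sequence[Vec3],
    residual: Vec3,
) -> Vec3:
    """Combine a coarse region label with a fine residual offset.

    Assumption: the final scene coordinate is the center of the
    predicted region plus a regressed residual.
    """
    label = argmax(region_scores)       # coarse step: pick a region
    cx, cy, cz = region_centers[label]  # 3D center of that region
    rx, ry, rz = residual               # fine step: offset within the region
    return (cx + rx, cy + ry, cz + rz)


# Hypothetical values for a single pixel.
REGION_CENTERS = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (4.0, 1.0, 1.0)]
scores = [0.1, 0.7, 0.2]            # classifier output over the 3 regions
residual = (0.25, -0.5, 0.125)      # regressor output for this pixel

coord = coarse_to_fine_coordinate(scores, REGION_CENTERS, residual)
print(coord)  # (2.25, -0.5, 1.125)
```

In such a scheme the regressor only has to resolve position within a small region, which is one plausible reading of why a hierarchical design copes better with large, ambiguous environments than direct regression.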
Citation
Wang, S, Laskar, Z, Melekhov, I, Li, X, Zhao, Y, Tolias, G & Kannala, J 2024, 'HSCNet++ : Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer', International Journal of Computer Vision, vol. 132, no. 7, pp. 2530-2550. https://doi.org/10.1007/s11263-023-01982-9