Multimodal fusion for sensorimotor control in steering angle prediction


Access rights

Embargoed access

URL

Journal Title

Journal ISSN

Volume Title

A1 Original article in a scientific journal
Embargo ends: 2025-09-09


Date

2023-11

Major/Subject

Mcode

Degree programme

Language

en

Pages

16

Series

Engineering Applications of Artificial Intelligence, Volume 126

Abstract

Efficient reasoning about the spatial and temporal structure of the environment is crucial for perception in autonomous driving, particularly in an end-to-end approach. Although different sensor modalities are employed to capture the complex nature of the environment, each has its limitations; for example, frame-based RGB cameras are susceptible to variations in illumination conditions. These sensor-level limitations can be addressed with sensor fusion techniques, enabling the learning of efficient feature representations for end-to-end autonomous perception. In this study, we address the end-to-end perception problem by fusing a frame-based RGB camera with an event camera to improve the learned representation for predicting lateral control. To achieve this, we propose a convolutional encoder–decoder architecture called DRFuser. DRFuser encodes the features from both sensor modalities and leverages self-attention to fuse the frame-based RGB and event camera features in the encoder part. The decoder unrolls the learned features to predict lateral control, specifically in the form of a steering angle. We extensively evaluate the proposed method on three datasets: our own collected dataset, the Davis Driving Dataset, and the simulated EventScape dataset. The results demonstrate the generalization capability of our method on both real-world and simulated data. We observe qualitative and quantitative improvements in predicting lateral control when the event camera is fused with the frame-based RGB camera. Notably, our method outperforms state-of-the-art techniques on the Davis Driving Dataset, achieving a 5.6% improvement in root mean square error (RMSE).
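To illustrate the kind of fusion the abstract describes, the snippet below is a minimal PyTorch sketch of a self-attention-based fusion of RGB and event-camera features inside a convolutional encoder–decoder that regresses a steering angle. It is not the published DRFuser: the class name DRFuserSketch, the layer sizes, the number of attention heads, and the pooling/decoder head are all illustrative assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn


class DRFuserSketch(nn.Module):
    """Illustrative sketch (not the published DRFuser): convolutional
    encoders per modality, self-attention fusion, and a small decoder
    head that predicts a single steering angle."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Per-modality convolutional encoders (assumed structure).
        self.rgb_encoder = self._make_encoder(3, embed_dim)
        self.event_encoder = self._make_encoder(3, embed_dim)
        # Self-attention over the concatenated token sequence fuses modalities.
        self.fusion = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Decoder head: map fused features to one steering-angle value.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    @staticmethod
    def _make_encoder(in_channels: int, out_channels: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, rgb: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # Encode each modality: (B, C, H, W) -> (B, D, H', W').
        f_rgb = self.rgb_encoder(rgb)
        f_evt = self.event_encoder(events)
        # Flatten spatial maps into token sequences: (B, H'*W', D).
        t_rgb = f_rgb.flatten(2).transpose(1, 2)
        t_evt = f_evt.flatten(2).transpose(1, 2)
        tokens = torch.cat([t_rgb, t_evt], dim=1)
        # Self-attention lets RGB and event tokens attend to one another.
        fused, _ = self.fusion(tokens, tokens, tokens)
        # Pool the fused tokens and regress the steering angle.
        return self.decoder(fused.mean(dim=1))


if __name__ == "__main__":
    model = DRFuserSketch()
    rgb = torch.randn(2, 3, 128, 256)     # batch of RGB frames
    events = torch.randn(2, 3, 128, 256)  # batch of event-frame representations
    print(model(rgb, events).shape)       # torch.Size([2, 1])
```

Concatenating the per-modality token sequences before a single self-attention layer is one simple way to let each modality's features attend to the other's; the actual paper's encoder-side fusion may differ in depth and in how the event stream is represented.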

Description

Keywords

Other note

Citation

Munir, F, Azam, S, Yow, K-C, Lee, B-G & Jeon, M 2023, 'Multimodal fusion for sensorimotor control in steering angle prediction', Engineering Applications of Artificial Intelligence, vol. 126, 107087. https://doi.org/10.1016/j.engappai.2023.107087