Exploring Contextual Representation and Multi-modality for End-to-end Autonomous Driving

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Azam, Shoaib [en_US]
dc.contributor.author: Munir, Farzeen [en_US]
dc.contributor.author: Kyrki, Ville [en_US]
dc.contributor.author: Kucner, Tomasz Piotr [en_US]
dc.contributor.author: Jeon, Moongu [en_US]
dc.contributor.author: Pedrycz, Witold [en_US]
dc.contributor.department: Department of Electrical Engineering and Automation [en]
dc.contributor.groupauthor: Intelligent Robotics [en]
dc.contributor.groupauthor: Mobile Robotics [en]
dc.contributor.organization: Gwangju Institute of Science and Technology [en_US]
dc.contributor.organization: University of Alberta [en_US]
dc.date.accessioned: 2024-06-14T07:46:52Z
dc.date.available: 2024-06-14T07:46:52Z
dc.date.issued: 2024-09 [en_US]
dc.description.abstract: Learning contextual and spatial environmental representations enhances an autonomous vehicle's hazard anticipation and decision-making in complex scenarios. Recent perception systems enhance spatial understanding with sensor fusion but often lack global environmental context. Humans, when driving, naturally employ neural maps that integrate various factors such as historical data, situational subtleties, and behavioral predictions of other road users to form a rich contextual understanding of their surroundings. This neural-map-based comprehension is integral to making informed decisions on the road. In contrast, even with their significant advancements, autonomous systems have yet to fully harness this depth of human-like contextual understanding. Motivated by this, our work draws inspiration from human driving patterns and seeks to formalize the sensor fusion approach within an end-to-end autonomous driving framework. We introduce a framework that integrates three cameras (left, right, and center) to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation. The sensor data is fused and encoded using a self-attention mechanism, leading to an auto-regressive waypoint prediction module. We treat feature representation as a sequential problem, employing a vision transformer to distill the contextual interplay between sensor modalities. The efficacy of the proposed method is experimentally evaluated in both open- and closed-loop settings. Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset. In closed-loop evaluations on CARLA's Town05 Long and Longest6 benchmarks, the proposed method improves driving performance and route completion, and reduces infractions. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 13
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Azam, S., Munir, F., Kyrki, V., Kucner, T. P., Jeon, M. & Pedrycz, W. 2024, 'Exploring Contextual Representation and Multi-modality for End-to-end Autonomous Driving', Engineering Applications of Artificial Intelligence, vol. 135, 108767. https://doi.org/10.1016/j.engappai.2024.108767 [en]
dc.identifier.doi: 10.1016/j.engappai.2024.108767 [en_US]
dc.identifier.issn: 0952-1976
dc.identifier.other: PURE UUID: 5d77ac89-3c09-43b3-9726-06ca3dc009cb [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/5d77ac89-3c09-43b3-9726-06ca3dc009cb [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85195421264&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/148460472/1-s2.0-S0952197624009254-main.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/128715
dc.identifier.urn: URN:NBN:fi:aalto-202406144304
dc.language.iso: en [en]
dc.publisher: Elsevier Ltd
dc.relation.ispartofseries: Engineering Applications of Artificial Intelligence
dc.rights: openAccess [en]
dc.subject.keyword: Vision transformer [en_US]
dc.subject.keyword: Imitation learning [en_US]
dc.subject.keyword: Attention [en_US]
dc.subject.keyword: Vision-centric autonomous driving [en_US]
dc.subject.keyword: Contextual representation [en_US]
dc.title: Exploring Contextual Representation and Multi-modality for End-to-end Autonomous Driving [en]
dc.type: A1 Original article in a scientific journal [fi]
dc.type.version: publishedVersion