Vision Transformer for Learning Driving Policies in Complex and Dynamic Environments

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kargar, Eshagh [en_US]
dc.contributor.author: Kyrki, Ville [en_US]
dc.contributor.department: Department of Electrical Engineering and Automation [en]
dc.contributor.groupauthor: Intelligent Robotics [en]
dc.date.accessioned: 2022-08-17T09:39:23Z
dc.date.available: 2022-08-17T09:39:23Z
dc.date.issued: 2022-07-19 [en_US]
dc.description: Publisher Copyright: © 2022 IEEE.
dc.description.abstract: Driving in a complex and dynamic urban environment is a difficult task that requires a complex decision policy. To make informed decisions, one needs to gain an understanding of the long-range context and the importance of other vehicles. In this work, we propose to use Vision Transformer (ViT) to learn a driving policy in urban settings with bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism helps to learn an attention map for the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms to learn faster, with increased performance and less data than baselines. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 7
dc.format.extent: 1558-1564
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Kargar, E & Kyrki, V 2022, Vision Transformer for Learning Driving Policies in Complex and Dynamic Environments. in 2022 IEEE Intelligent Vehicles Symposium, IV 2022. IEEE Intelligent Vehicles Symposium, Proceedings, vol. 2022-June, IEEE, pp. 1558-1564, IEEE Intelligent Vehicles Symposium, Aachen, Germany, 05/06/2022. https://doi.org/10.1109/IV51971.2022.9827348 [en]
dc.identifier.doi: 10.1109/IV51971.2022.9827348 [en_US]
dc.identifier.isbn: 9781665488211
dc.identifier.other: PURE UUID: f97c446f-205c-4c27-9f50-a701649f9376 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/f97c446f-205c-4c27-9f50-a701649f9376 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85135372101&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/86913182/Kyrki_ViT_IVS.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/116109
dc.identifier.urn: URN:NBN:fi:aalto-202208174926
dc.language.iso: en [en]
dc.relation.ispartof: IEEE Intelligent Vehicles Symposium [en]
dc.relation.ispartofseries: 2022 IEEE Intelligent Vehicles Symposium, IV 2022 [en]
dc.relation.ispartofseries: IEEE Intelligent Vehicles Symposium, Proceedings [en]
dc.relation.ispartofseries: Volume 2022-June [en]
dc.rights: openAccess [en]
dc.title: Vision Transformer for Learning Driving Policies in Complex and Dynamic Environments [en]
dc.type: Conference article in proceedings [fi]
dc.type.version: acceptedVersion