Human-in-the-loop evaluation of autonomous driving agents in CARLA: A comparative study of end-to-end and vision–language–action models
School of Electrical Engineering
Master's thesis
Language
en
Pages
48
Abstract
Recent advances in end-to-end and vision–language–action (VLA) driving models have increased the need for careful evaluation. However, most existing benchmarks rely on scripted, rule-based non-player characters (NPCs), which often fail to capture the interactive and sometimes strategic behavior of human drivers. As a result, these benchmarks leave a gap in evaluating safety under genuine human interaction. To address this gap, the thesis introduces a human-in-the-loop (HIL) benchmark in the CARLA simulator. By integrating a Logitech G29 steering wheel with the Pretrained CARLA Leaderboard Agents (PCLA) framework, the benchmark enables real-time interaction between a human driver and autonomous agents. Four representative agents are evaluated: three end-to-end models (NEAT, TransFuser, InterFuser) and one VLA model (SimLingo). Experiments cover three safety-critical micro-scenarios: Following Safety, Static Blocking, and Dynamic Cut-in. Quantitative results show that Dynamic Cut-in is consistently the most challenging scenario: most agents fail to avoid collisions, indicating limitations in prediction horizon and reaction latency under human-induced disturbances. Among the models, NEAT exhibits the most stable longitudinal control. Qualitative analysis further identifies two recurring issues in the VLA agent: language outputs that refer to irrelevant or non-existent objects, and a language–action mismatch in which seemingly correct intent does not translate into safe control. In addition, several agents behave unsafely after a collision, continuing to apply throttle after impact. These findings confirm that HIL testing exposes failure modes that are rarely triggered in NPC-based evaluations, motivating more robust HIL protocols and refined safety metrics for next-generation driving models.
Supervisor
Kyrki, Ville
Thesis advisor
Kurita, Shuhei
Azam, Shoaib