A comparison between humans and AI at recognizing objects in unusual poses
Access rights
openAccess
CC BY
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2025-01
Language
en
Pages
32
Series
Transactions on Machine Learning Research, Volume 2025, issue January, pp. 1-32
Abstract
Deep learning is closing the gap with human vision on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen in unusual poses. We find that humans excel at recognizing objects in such poses. In contrast, state-of-the-art deep networks for vision (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) and state-of-the-art large vision-language models (Claude 3.5, Gemini 1.5, GPT-4, SigLIP) are systematically brittle on unusual poses, with the exception of Gemini, which shows excellent robustness in this condition. As we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) are necessary to identify objects in unusual poses. An analysis of the error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. In conclusion, our comparison reveals that humans are overall more robust than deep networks and that they rely on different mechanisms for recognizing objects in unusual poses. Understanding the nature of the mental processes taking place during extra viewing time may be key to reproducing the robustness of human vision in silico. All code and data are available at https://github.com/BRAIN-Aalto/unusual_poses.
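The kind of evaluation the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' actual pipeline (see the linked repository for that): it assumes Python with torch/torchvision, uses an off-the-shelf ConvNeXt checkpoint (one of the architecture families tested, though not necessarily the exact weights), takes a hypothetical input image school_bus.jpg, and substitutes simple in-plane rotation for the paper's 3D unusual poses.

```python
import torch
from torchvision import models
from torchvision.transforms import functional as TF
from PIL import Image

# Pretrained classifier from one of the architecture families tested in
# the paper (assumed checkpoint; the paper's exact weights may differ).
weights = models.ConvNeXt_Tiny_Weights.DEFAULT
model = models.convnext_tiny(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]

def top1(img: Image.Image) -> str:
    """Return the model's top-1 ImageNet label for a PIL image."""
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return labels[logits.argmax(dim=1).item()]

img = Image.open("school_bus.jpg").convert("RGB")  # hypothetical input

# In-plane rotation is only a crude stand-in for the paper's 3D unusual
# poses, but it illustrates the test: same object, unusual orientation.
for angle in (0, 90, 180):
    rotated = TF.rotate(img, angle, expand=True)
    print(f"{angle:>3} deg: {top1(rotated)}")
```

A brittle model will typically keep the correct label at 0 degrees but switch to an unrelated class at 90 or 180 degrees, which is the failure mode the paper probes systematically.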
Description
Publisher Copyright: © 2025, Transactions on Machine Learning Research. All rights reserved.
Citation
Ollikka, N., Abbas, A., Perin, A., Kilpeläinen, M. & Deny, S. 2025, 'A comparison between humans and AI at recognizing objects in unusual poses', Transactions on Machine Learning Research, vol. 2025, no. January, pp. 1-32. <https://arxiv.org/abs/2402.03973>