A comparison between humans and AI at recognizing objects in unusual poses

Access rights

openAccess
CC BY
publishedVersion

A1 Original article in a scientific journal

Date

2025-01

Language

en

Pages

32

Series

Transactions on Machine Learning Research, Volume 2025, Issue January, pp. 1-32

Abstract

Deep learning is closing the gap with human vision on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen in unusual poses. We find that humans excel at recognizing objects in such poses. In contrast, state-of-the-art deep networks for vision (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) and state-of-the-art large vision-language models (Claude 3.5, Gemini 1.5, GPT-4, SigLIP) are systematically brittle on unusual poses, with the exception of Gemini, which shows excellent robustness in this condition. As we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) are necessary to identify objects in unusual poses. An analysis of the error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. In conclusion, our comparison reveals that humans are overall more robust than deep networks and that they rely on different mechanisms for recognizing objects in unusual poses. Understanding the nature of the mental processes taking place during extra viewing time may be key to reproducing the robustness of human vision in silico. All code and data are available at https://github.com/BRAIN-Aalto/unusual_poses.
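
The sketch below illustrates the kind of network evaluation the abstract describes: scoring a pretrained ImageNet classifier (ConvNeXt, one of the architectures listed above) on a single image of an object in an unusual pose. This is a minimal illustration, not the authors' pipeline; the image filename is a hypothetical placeholder, and the actual code and data are in the linked repository.

```python
import torch
from torchvision import models
from torchvision.models import ConvNeXt_Base_Weights
from PIL import Image

# Load a pretrained ConvNeXt, one of the network families evaluated in the paper
weights = ConvNeXt_Base_Weights.IMAGENET1K_V1
model = models.convnext_base(weights=weights)
model.eval()

# Use the preprocessing the weights were trained with (resize, crop, normalize)
preprocess = weights.transforms()

# Hypothetical placeholder image of an object in an unusual pose,
# not a file from the authors' dataset
image = Image.open("unusual_pose_example.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

# Report the top-5 predicted ImageNet classes and their probabilities
top5 = probs.topk(5)
for p, idx in zip(top5.values[0], top5.indices[0]):
    print(f"{weights.meta['categories'][idx]}: {p.item():.3f}")
```

A comparison like the paper's would aggregate such per-image correctness over many unusual-pose images and set it against human responses collected under controlled exposure times.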

Description

Publisher Copyright: © 2025, Transactions on Machine Learning Research. All rights reserved.

Citation

Ollikka, N., Abbas, A., Perin, A., Kilpeläinen, M. & Deny, S. 2025, 'A comparison between humans and AI at recognizing objects in unusual poses', Transactions on Machine Learning Research, vol. 2025, no. January, pp. 1-32. <https://arxiv.org/abs/2402.03973>