Multi-Scale Fusion for Object Representation
Access rights
openAccess
CC BY
Creative Commons license
Except where otherwise noted, this item's license is described as openAccess
acceptedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
17
Series
13th International Conference on Learning Representations (ICLR), pp. 61970-61986
Abstract
Representing images or videos as object-level feature vectors, rather than pixel-level feature maps, facilitates advanced visual tasks. Object-Centric Learning (OCL) primarily achieves this by reconstructing the input under the guidance of a Variational Autoencoder (VAE) intermediate representation, which drives so-called slots to aggregate as much object information as possible. However, existing VAE guidance does not explicitly address the fact that objects can vary in pixel size, while models typically excel at specific pattern scales. We propose Multi-Scale Fusion (MSF) to enhance VAE guidance for OCL training. To ensure objects of all sizes fall within the VAE's comfort zone, we adopt the image pyramid, which produces intermediate representations at multiple scales. To foster scale-invariance/variance in object super-pixels, we devise inter/intra-scale fusion, which augments low-quality object super-pixels of one scale with corresponding high-quality super-pixels from another scale. On standard OCL benchmarks, our technique improves mainstream methods, including state-of-the-art diffusion-based ones. The source code is available at https://github.com/Genera1Z/MultiScaleFusion.
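The image pyramid mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `downsample_2x` and `image_pyramid` are hypothetical, the 2x2 average pooling is an assumed resampling choice, and the sketch covers only the construction of multiple scales, not the VAE encoding or the inter/intra-scale fusion.

```python
def downsample_2x(image):
    """Halve height and width by averaging each 2x2 block.

    Assumption: average pooling stands in for whatever resampling
    the actual method uses. `image` is a list of rows of floats
    with even height and width.
    """
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def image_pyramid(image, num_scales):
    """Return [full-res, 1/2-res, 1/4-res, ...] with `num_scales` levels.

    Assumes both dimensions are divisible by 2 ** (num_scales - 1).
    """
    levels = [image]
    for _ in range(num_scales - 1):
        levels.append(downsample_2x(levels[-1]))
    return levels


# Toy 4x4 single-channel "image" with pixel values 0..15.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = image_pyramid(image, 3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)  # [(4, 4), (2, 2), (1, 1)]
```

In a pipeline like the one the abstract outlines, each pyramid level would be encoded separately, so that an object too large or too small at one scale appears at a more favourable pixel size at another.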
Description
Publisher Copyright: © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
Citation
Zhao, R, Wang, V, Kannala, J & Pajarinen, J 2025, Multi-Scale Fusion for Object Representation. in 13th International Conference on Learning Representations (ICLR). Curran Associates Inc., pp. 61970-61986, International Conference on Learning Representations, Singapore, Singapore, 24/04/2025. < https://openreview.net/forum?id=nobDw4d1k7 >
