Slot Attention with Re-Initialization and Self-Distillation

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.author: Zhao, Rongzhen
dc.contributor.author: Zhao, Yi
dc.contributor.author: Kannala, Juho
dc.contributor.author: Pajarinen, Joni
dc.contributor.department: Department of Electrical Engineering and Automation (en)
dc.contributor.department: Department of Computer Science (en)
dc.contributor.groupauthor: Robot Learning (en)
dc.contributor.groupauthor: Computer Science Professors (en)
dc.contributor.groupauthor: Computer Science - Visual Computing (VisualComputing) - Research area (en)
dc.contributor.groupauthor: Computer Science - Artificial Intelligence and Machine Learning (AIML) - Research area (en)
dc.contributor.groupauthor: Professorship Kannala Juho (en)
dc.contributor.organization: Department of Computer Science
dc.date.accessioned: 2025-11-12T06:42:25Z
dc.date.available: 2025-11-12T06:42:25Z
dc.date.issued: 2025-10-27
dc.description.abstract: Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into a reconstruction of the input, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS): i) we reduce redundancy in the aggregated slots and re-initialize extra aggregation to update the remaining slots; ii) we drive the poor attention map at the first aggregation iteration to approximate the good one at the last iteration, enabling self-distillation. Experiments demonstrate that DIAS achieves state-of-the-art results on OCL tasks such as object discovery and recognition, while also improving advanced visual prediction and reasoning. Our source code and model checkpoints are available at https://github.com/Genera1Z/DIAS. (en)
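The aggregation step the abstract refers to, iterative competitive cross attention with slots as queries, can be sketched as follows. This is a simplified NumPy illustration of the generic Slot Attention mechanism, not the paper's DIAS method: all names and sizes are illustrative, and the learned projections and GRU/MLP slot update used in practice are omitted.

```python
# Simplified sketch of competitive cross attention (Slot Attention core).
# Hypothetical shapes; real implementations add learned projections and a GRU update.
import numpy as np

def slot_attention_step(slots, feats):
    """One aggregation iteration: slots (S, D) act as queries over features (N, D)."""
    scale = slots.shape[-1] ** -0.5
    logits = slots @ feats.T * scale               # (S, N) slot-feature similarity
    # Competition: softmax over the SLOT axis, so slots compete for each superpixel.
    attn = np.exp(logits - logits.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)  # (S, N), each column sums to 1
    # Each slot is updated as a weighted mean of the features it won.
    weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
    return weights @ feats, attn                   # updated slots, attention map

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))   # 16 superpixel features, dimension 8
slots = rng.normal(size=(4, 8))    # 4 randomly initialized slots
for _ in range(3):                 # iterative refinement
    slots, attn = slot_attention_step(slots, feats)
```

The softmax over the slot axis is what makes the attention "competitive": because each feature's attention mass is split among slots, redundant slots can steal mass from informative ones, which is the failure mode the paper's re-initialization targets.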
dc.description.version: Peer reviewed (en)
dc.format.extent: 8
dc.format.mimetype: application/pdf
dc.identifier.citation: Zhao, R, Zhao, Y, Kannala, J & Pajarinen, J 2025, Slot Attention with Re-Initialization and Self-Distillation. in Proceedings of the 33rd ACM International Conference on Multimedia. ACM, pp. 4185-4192, ACM International Conference on Multimedia, Dublin, Ireland, 27/10/2025. https://doi.org/10.1145/3746027.3755339 (en)
dc.identifier.doi: 10.1145/3746027.3755339
dc.identifier.isbn: 979-8-4007-2035-2
dc.identifier.other: PURE UUID: 8a801cad-1f44-42c7-bcdc-b7b0588e1be5
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/8a801cad-1f44-42c7-bcdc-b7b0588e1be5
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/200184159/Slot_Attention_with_Re-Initialization_and_Self-Distillation.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/140618
dc.identifier.urn: URN:NBN:fi:aalto-202511128769
dc.language.iso: en (en)
dc.relation.fundinginfo: We acknowledge the support of the Finnish Center for Artificial Intelligence (FCAI), a Research Council of Finland flagship program. We thank the Research Council of Finland for funding the projects ADEREHA (grant no. 353198), BERMUDA (362407), PROFI7 (352788), and MARL (357301). We also appreciate CSC - IT Center for Science, Finland, for granting access to the supercomputers Mahti and Puhti, as well as LUMI, owned by the European High Performance Computing Joint Undertaking (EuroHPC JU) and hosted by CSC Finland in collaboration with the LUMI consortium. Furthermore, we acknowledge the computational resources provided by the Aalto Science-IT project through the Triton cluster.
dc.relation.ispartof: ACM International Conference on Multimedia (en)
dc.relation.ispartofseries: Proceedings of the 33rd ACM International Conference on Multimedia (en)
dc.relation.ispartofseries: pp. 4185-4192 (en)
dc.rights: openAccess (en)
dc.rights: CC BY
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword: object representation
dc.subject.keyword: object-centric learning
dc.subject.keyword: slot attention
dc.subject.keyword: visual prediction
dc.subject.keyword: visual reasoning
dc.title: Slot Attention with Re-Initialization and Self-Distillation (en)
dc.type: A4 Artikkeli konferenssijulkaisussa [A4 Article in a conference publication] (fi)
dc.type.version: publishedVersion

Files

Original bundle

Name: Slot_Attention_with_Re-Initialization_and_Self-Distillation.pdf
Size: 2.17 MB
Format: Adobe Portable Document Format