Optimizing the Performance of Text Classification Models by Improving the Isotropy of the Embeddings using a Joint Loss Function

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Attieh, Joseph [en_US]
dc.contributor.author: Zewoudie, Abraham [en_US]
dc.contributor.author: Vlassov, Vladimir [en_US]
dc.contributor.author: Flanagan, Adrian [en_US]
dc.contributor.author: Bäckström, Tom [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.department: Department of Communications and Networking [en]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.editor: Fink, Gernot A. [en_US]
dc.contributor.editor: Jain, Rajiv [en_US]
dc.contributor.editor: Kise, Koichi [en_US]
dc.contributor.editor: Zanibbi, Richard [en_US]
dc.contributor.groupauthor: Speech Interaction Technology [en]
dc.contributor.organization: Department of Computer Science [en_US]
dc.contributor.organization: KTH Royal Institute of Technology [en_US]
dc.contributor.organization: Huawei Technologies [en_US]
dc.date.accessioned: 2023-08-23T06:08:40Z
dc.date.available: 2023-08-23T06:08:40Z
dc.date.embargo: info:eu-repo/date/embargoEnd/2024-08-19 [en_US]
dc.date.issued: 2023-08-19 [en_US]
dc.description.abstract: Recent studies show that the spatial distribution of the sentence representations generated from pre-trained language models is highly anisotropic. This degrades the performance of the models on the downstream task. Most methods improve the isotropy of the sentence embeddings by refining the corresponding contextual word representations, then deriving the sentence embeddings from these refined representations. In this study, we propose to improve the quality of the sentence embeddings extracted from the [CLS] token of pre-trained language models by improving the isotropy of the embeddings. We add one feed-forward layer between the model and the downstream task layers, and we train it using a novel joint loss function. The proposed approach results in embeddings with better isotropy that generalize better on the downstream task. Experimental results on three GLUE datasets with classification as the downstream task show that our proposed method is on par with the state of the art, as it achieves performance gains of around 2–3% on the downstream tasks compared to the baseline. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 16
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Attieh, J, Zewoudie, A, Vlassov, V, Flanagan, A & Bäckström, T 2023, Optimizing the Performance of Text Classification Models by Improving the Isotropy of the Embeddings using a Joint Loss Function. in G A Fink, R Jain, K Kise & R Zanibbi (eds), Document Analysis and Recognition – ICDAR 2023 - 17th International Conference, Proceedings. Lecture notes in computer science, Springer, pp. 121-136, International Conference on Document Analysis and Recognition, San Jose, California, United States, 21/08/2023. https://doi.org/10.1007/978-3-031-41734-4_8 [en]
dc.identifier.doi: 10.1007/978-3-031-41734-4_8 [en_US]
dc.identifier.isbn: 978-3-031-41734-4
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.other: PURE UUID: 9e4c72f1-3ffc-4dfa-9f43-644941a65869 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/9e4c72f1-3ffc-4dfa-9f43-644941a65869 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85173582381&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/112050557/3475.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/122658
dc.identifier.urn: URN:NBN:fi:aalto-202308235004
dc.language.iso: en [en]
dc.publisher: Springer
dc.relation.ispartof: International Conference on Document Analysis and Recognition [en]
dc.relation.ispartofseries: 17th International Conference on Document Analysis and Recognition (ICDAR 2023) [en]
dc.relation.ispartofseries: Lecture notes in computer science [en]
dc.rights: openAccess [en]
dc.title: Optimizing the Performance of Text Classification Models by Improving the Isotropy of the Embeddings using a Joint Loss Function [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
