Multimodal Humor Detection and Social Perception Prediction

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Bijoy, Mehedi Hasan [en_US]
dc.contributor.author: Porjazovski, Dejan [en_US]
dc.contributor.author: Phan, Nhan [en_US]
dc.contributor.author: Huang, Guangpu [en_US]
dc.contributor.author: Grósz, Tamás [en_US]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Speech Recognition [en]
dc.date.accessioned: 2024-12-17T16:15:37Z
dc.date.available: 2024-12-17T16:15:37Z
dc.date.issued: 2024-10-28 [en_US]
dc.description: Publisher Copyright: © 2024 Copyright held by the owner/author(s).
dc.description.abstract: Parallel audio-visual-text data contain a vast amount of information, so it is essential to develop machine learning algorithms that can utilise them efficiently. In this work, we investigated unimodal and multimodal solutions for the MuSe Humor and Perception challenges. Our main goal was to explicitly show the contribution of each modality in the multimodal systems. In addition, for the Humor challenge, we examined the effect of extending the input context and smoothing the framewise predictions. For the Perception challenge, we trained an attention-encoder-decoder model to predict all perceived labels with a single model. During the challenge, the best results were achieved by a fusion of unimodal and multimodal systems: AUC = 0.8645 for Humor and a mean Pearson's correlation ρ = 0.3550 for Perception. By investigating the multimodal systems, we found that using only part of the video for model training can be beneficial, suggesting that valuable information is concentrated in certain parts of the video. The implementation of our models and experiments can be found at https://github.com/aalto-speech/MuSe-2024. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Bijoy, M H, Porjazovski, D, Phan, N, Huang, G, Grósz, T & Kurimo, M 2024, Multimodal Humor Detection and Social Perception Prediction. in MuSe 2024 - Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop : Social Perception and Humor, Co-Located with: MM 2024. MuSe 2024 - Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Co-Located with: MM 2024, ACM, pp. 60-64, Multimodal Sentiment Analysis Challenge and Workshop, Melbourne, Australia, 28/10/2024. https://doi.org/10.1145/3689062.3689376 [en]
dc.identifier.doi: 10.1145/3689062.3689376 [en_US]
dc.identifier.isbn: 9798400711992
dc.identifier.other: PURE UUID: 11e57792-1214-45b6-a10d-92d9326440b4 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/11e57792-1214-45b6-a10d-92d9326440b4 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/167387995/3689062.3689376.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/132371
dc.identifier.urn: URN:NBN:fi:aalto-202412177848
dc.language.iso: en [en]
dc.relation.ispartof: Multimodal Sentiment Analysis Challenge and Workshop: Multimodal Sentiment Analysis Challenge and Workshop [en]
dc.relation.ispartofseries: MuSe 2024 - Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Co-Located with: MM 2024 [en]
dc.relation.ispartofseries: pp. 60-64 [en]
dc.rights: openAccess [en]
dc.rights: CC BY [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword: Affective Computing [en_US]
dc.subject.keyword: Humor Detection [en_US]
dc.subject.keyword: Multimodal Fusion [en_US]
dc.subject.keyword: Multimodal Sentiment Analysis [en_US]
dc.subject.keyword: Social Perception [en_US]
dc.title: Multimodal Humor Detection and Social Perception Prediction [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) [fi]
dc.type.version: publishedVersion

Files

Original bundle

Name: 3689062.3689376.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format