Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Vaaras, Einari [en_US]
dc.contributor.author: Ahlqvist-Björkroth, Sari [en_US]
dc.contributor.author: Drossos, Konstantinos [en_US]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.organization: Tampere University [en_US]
dc.contributor.organization: University of Turku [en_US]
dc.date.accessioned: 2021-12-01T07:50:51Z
dc.date.available: 2021-12-01T07:50:51Z
dc.date.issued: 2021 [en_US]
dc.description: Publisher Copyright: Copyright © 2021 ISCA.
dc.description.abstract: Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. Analyzing the emotional content of speech in such a massive dataset requires an automatic speech emotion recognition (SER) system. However, there are no emotion labels or existing in-domain SER systems available for this purpose. In this paper, we introduce this initially unannotated, large-scale, real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying an SER system in a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning on this task. We show that the best-performing models achieve an unweighted average recall (UAR) of 73.4% for binary valence classification and 73.2% for binary arousal classification. The results also show that active learning achieves the most consistent performance of the three approaches. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Vaaras, E, Ahlqvist-Björkroth, S, Drossos, K & Räsänen, O 2021, Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit. in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Proceedings of the Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 526-530, Interspeech, Brno, Czech Republic, 30/08/2021. https://doi.org/10.21437/Interspeech.2021-303 [en]
dc.identifier.doi: 10.21437/Interspeech.2021-303 [en_US]
dc.identifier.isbn: 9781713836902
dc.identifier.issn: 2308-457X
dc.identifier.issn: 1990-9772
dc.identifier.other: PURE UUID: 4c7b5ec4-2f1d-489d-93e7-ab89a124adc2 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/4c7b5ec4-2f1d-489d-93e7-ab89a124adc2 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/76318991/Automatic_Analysis_of_the_Emotional_Content_of_Speech_in_Daylong_Child_Centered_Recordings_from_a_Neonatal_Intensive_Care_Unit.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/111350
dc.identifier.urn: URN:NBN:fi:aalto-2021120110500
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 [en]
dc.relation.ispartofseries: pp. 526-530 [en]
dc.relation.ispartofseries: Proceedings of the Annual Conference of the International Speech Communication Association [en]
dc.rights: openAccess [en]
dc.subject.keyword: Daylong audio [en_US]
dc.subject.keyword: Lena recorder [en_US]
dc.subject.keyword: Real-world audio [en_US]
dc.subject.keyword: Speech analysis [en_US]
dc.subject.keyword: Speech emotion recognition [en_US]
dc.title: Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) [fi]
dc.type.version: publishedVersion

Files

Original bundle

Name: Automatic_Analysis_of_the_Emotional_Content_of_Speech_in_Daylong_Child_Centered_Recordings_from_a_Neonatal_Intensive_Care_Unit.pdf
Size: 713.31 KB
Format: Adobe Portable Document Format