Data augmentation strategies for neural network F0 estimation

DC Field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Airaksinen, Manu | en_US
dc.contributor.author | Juvela, Lauri | en_US
dc.contributor.author | Alku, Paavo | en_US
dc.contributor.author | Räsänen, Okko | en_US
dc.contributor.department | Department of Signal Processing and Acoustics | en
dc.contributor.groupauthor | Jorma Skyttä's Group | en
dc.contributor.groupauthor | Speech Communication Technology | en
dc.date.accessioned | 2019-06-03T14:16:37Z
dc.date.available | 2019-06-03T14:16:37Z
dc.date.issued | 2019-05-01 | en_US
dc.description.abstract | This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies fall into two groups: additive noise and channel-based augmentation, and vocoder-based augmentation. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of the ground truth F0 used for training the neural network, as well as to expand the diversity of the training data in terms of F0 patterns and the vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can greatly increase the noise robustness of the trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 5
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Airaksinen, M, Juvela, L, Alku, P & Räsänen, O 2019, Data augmentation strategies for neural network F0 estimation. In 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019; Brighton; United Kingdom; 12-17 May 2019: Proceedings, 8683041, IEEE International Conference on Acoustics Speech and Signal Processing, IEEE, pp. 6485-6489, IEEE International Conference on Acoustics, Speech, and Signal Processing, Brighton, United Kingdom, 12/05/2019. https://doi.org/10.1109/ICASSP.2019.8683041 | en
dc.identifier.doi | 10.1109/ICASSP.2019.8683041 | en_US
dc.identifier.isbn | 978-1-4799-8132-8
dc.identifier.isbn | 978-1-4799-8131-1
dc.identifier.issn | 1520-6149
dc.identifier.issn | 2379-190X
dc.identifier.other | PURE UUID: a3de2b16-b5b1-40b4-9b0e-8b75b0fa276c | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/a3de2b16-b5b1-40b4-9b0e-8b75b0fa276c | en_US
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85068966502&partnerID=8YFLogxK
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/33983314/ELEC_Airaksinen_Data_augmentation_ICASSP19.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/38336
dc.identifier.urn | URN:NBN:fi:aalto-201906033421
dc.language.iso | en | en
dc.relation.ispartof | IEEE International Conference on Acoustics, Speech, and Signal Processing | en
dc.relation.ispartofseries | 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019; Brighton; United Kingdom; 12-17 May 2019: Proceedings | en
dc.relation.ispartofseries | pp. 6485-6489 | en
dc.relation.ispartofseries | IEEE International Conference on Acoustics Speech and Signal Processing | en
dc.rights | openAccess | en
dc.title | Data augmentation strategies for neural network F0 estimation | en
dc.type | A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) | fi
dc.type.version | acceptedVersion

Files