Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.author: Seshadri, Shreyas [en_US]
dc.contributor.author: Casillas, Marisa [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.date.accessioned: 2018-12-10T10:30:31Z
dc.date.available: 2018-12-10T10:30:31Z
dc.date.issued: 2018-01-01 [en_US]
dc.description.abstract: Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Räsänen, O, Seshadri, S & Casillas, M 2018, Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. in Proceedings of Interspeech. vol. 2018-September, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association (ISCA), pp. 1200-1204, Interspeech, Hyderabad, India, 02/09/2018. https://doi.org/10.21437/Interspeech.2018-1047 [en]
dc.identifier.doi: 10.21437/Interspeech.2018-1047 [en_US]
dc.identifier.issn: 2308-457X
dc.identifier.other: PURE UUID: ce49edc2-81a2-4267-a1d9-516b50fdbbf4 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/ce49edc2-81a2-4267-a1d9-516b50fdbbf4 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85054995553&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/29109614/ELEC_rasanen_et_al_comparison_of_syllabification_interspeech.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/35274
dc.identifier.urn: URN:NBN:fi:aalto-201812106289
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech [en]
dc.relation.ispartofseries: Volume 2018-September, pp. 1200-1204 [en]
dc.relation.ispartofseries: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH [en]
dc.rights: openAccess [en]
dc.subject.keyword: Daylong recordings [en_US]
dc.subject.keyword: Language acquisition [en_US]
dc.subject.keyword: Noise robustness [en_US]
dc.subject.keyword: Syllabification [en_US]
dc.subject.keyword: Word count estimation [en_US]
dc.title: Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
