SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Seshadri, Shreyas [en_US]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Jorma Skyttä's Group [en]
dc.date.accessioned: 2019-09-20T11:16:14Z
dc.date.available: 2019-09-20T11:16:14Z
dc.date.issued: 2019-09 [en_US]
dc.description.abstract: Automatic syllable count estimation (SCE) is used in a variety of applications, ranging from speaking rate estimation to detecting social activity from wearable microphones and developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic digital signal processing (DSP) methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have applied modern machine learning to the SCE task. This letter presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Seshadri, S & Räsänen, O 2019, 'SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech', IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1359-1363. https://doi.org/10.1109/LSP.2019.2929415 [en]
dc.identifier.doi: 10.1109/LSP.2019.2929415 [en_US]
dc.identifier.issn: 1070-9908
dc.identifier.issn: 1558-2361
dc.identifier.other: PURE UUID: cdd5a0b7-c735-4657-b7a6-a95dff84fc45 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/cdd5a0b7-c735-4657-b7a6-a95dff84fc45 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/36761199/Syllable_Counter.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/40348
dc.identifier.urn: URN:NBN:fi:aalto-201909205373
dc.language.iso: en [en]
dc.publisher: IEEE
dc.relation.fundinginfo: This work was supported by the Academy of Finland under Grants 312105 and 314602. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Tomoki Toda.
dc.relation.ispartofseries: IEEE Signal Processing Letters [en]
dc.relation.ispartofseries: Volume 26, issue 9, pp. 1359-1363 [en]
dc.rights: openAccess [en]
dc.subject.keyword: syllable count estimation [en_US]
dc.subject.keyword: end-to-end learning [en_US]
dc.subject.keyword: deep learning [en_US]
dc.subject.keyword: speech processing [en_US]
dc.subject.keyword: SEGMENTATION [en_US]
dc.title: SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech [en]
dc.type: A1 Original article in a scientific journal
dc.type.version: acceptedVersion

Files