Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

dc.contributor Aalto-yliopisto fi
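dc.contributor Aalto University en
dc.contributor.author Räsänen, Okko
dc.contributor.author Seshadri, Shreyas
dc.contributor.author Karadayi, Julien
dc.contributor.author Riebling, Eric
dc.contributor.author Bunce, John
dc.contributor.author Cristia, Alejandrina
dc.contributor.author Metze, Florian
dc.contributor.author Casillas, Marisa
dc.contributor.author Rosemberg, Celia
dc.contributor.author Bergelson, Elika
dc.contributor.author Soderstrom, Melanie
dc.date.accessioned 2019-09-03T13:46:54Z
dc.date.available 2019-09-03T13:46:54Z
dc.date.issued 2019-10-01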
dc.identifier.citation Räsänen, O, Seshadri, S, Karadayi, J, Riebling, E, Bunce, J, Cristia, A, Metze, F, Casillas, M, Rosemberg, C, Bergelson, E & Soderstrom, M 2019, 'Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech', Speech Communication, vol. 113, pp. 63-80. en
dc.identifier.issn 0167-6393
dc.identifier.issn 1872-7182
dc.identifier.other PURE UUID: 870c0a6b-f491-44aa-97e8-bf0a1f9668bb
dc.identifier.other PURE ITEMURL:
dc.identifier.other PURE LINK:
dc.identifier.other PURE FILEURL:
dc.description.abstract Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable ASR systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech. 
As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE. en
dc.format.extent 18
dc.format.extent 63-80
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Elsevier
dc.relation.ispartofseries Speech Communication en
dc.relation.ispartofseries Volume 113 en
dc.rights openAccess en
dc.subject.other Software en
dc.subject.other Modelling and Simulation en
dc.subject.other Communication en
dc.subject.other Language and Linguistics en
dc.subject.other Linguistics and Language en
dc.subject.other Computer Vision and Pattern Recognition en
dc.subject.other Computer Science Applications en
dc.subject.other 113 Computer and information sciences en
dc.title Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech en
dc.type A1 Original article in a scientific journal en
dc.description.version Peer reviewed en
dc.contributor.department Jorma Skyttä Group
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department CNRS
dc.contributor.department Carnegie Mellon University
dc.contributor.department University of Manitoba
dc.contributor.department Max Planck Institute for Psycholinguistics
dc.contributor.department Consejo Nacional de Investigaciones Científicas y Técnicas
dc.contributor.department Duke University
dc.contributor.department Department of Signal Processing and Acoustics en
dc.subject.keyword Automatic syllabification
dc.subject.keyword Daylong recordings
dc.subject.keyword Language acquisition
dc.subject.keyword Noise robustness
dc.subject.keyword Word count estimation
dc.subject.keyword Software
dc.subject.keyword Modelling and Simulation
dc.subject.keyword Communication
dc.subject.keyword Language and Linguistics
dc.subject.keyword Linguistics and Language
dc.subject.keyword Computer Vision and Pattern Recognition
dc.subject.keyword Computer Science Applications
dc.subject.keyword 113 Computer and information sciences
dc.identifier.urn URN:NBN:fi:aalto-201909035123
dc.identifier.doi 10.1016/j.specom.2019.08.005
dc.type.version publishedVersion

Files in this item

There are no open access files associated with this item.
