A computational model of early language acquisition from audiovisual experiences of young infants


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Räsänen, Okko
dc.contributor.author Khorrami, Khazar
dc.date.accessioned 2020-01-02T14:10:33Z
dc.date.available 2020-01-02T14:10:33Z
dc.date.issued 2019-01-01
dc.identifier.citation Räsänen, O. & Khorrami, K. 2019, A computational model of early language acquisition from audiovisual experiences of young infants. In Proceedings of Interspeech, vol. 2019-September, Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association, pp. 3594-3598, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1523 en
dc.identifier.issn 2308-457X
dc.identifier.other PURE UUID: d81fc929-f314-4bfc-9b2d-1150831d9676
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/d81fc929-f314-4bfc-9b2d-1150831d9676
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85074720850&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/38782654/ELEC_Rasanen_Computational_model_INTERSPEECH.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/42246
dc.description.abstract Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech. However, the feasibility of this hypothesis in terms of real-world infant experiences has remained unclear. This paper presents a step towards a more realistic test of the multimodal bootstrapping hypothesis by describing a neural network model that can learn word segments and their meanings from referentially ambiguous acoustic input. The model is tested on recordings of real infant-caregiver interactions, using utterance-level labels for concrete visual objects that were attended by the infant when the caregiver spoke an utterance containing the name of the object, and using random visual labels for utterances in the absence of attention. The results show that the beginnings of lexical knowledge may indeed emerge from individually ambiguous learning scenarios. In addition, the hidden layers of the network show gradually increasing selectivity to phonetic categories as a function of layer depth, resembling models trained for phone recognition in a supervised manner. en
dc.format.extent 5
dc.format.extent 3594-3598
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof Interspeech en
dc.relation.ispartofseries Proceedings of Interspeech en
dc.relation.ispartofseries Volume 2019-September en
dc.relation.ispartofseries Interspeech - Annual Conference of the International Speech Communication Association en
dc.rights openAccess en
dc.title A computational model of early language acquisition from audiovisual experiences of young infants en
dc.type A4 Artikkeli konferenssijulkaisussa (Article in conference proceedings) fi
dc.description.version Peer reviewed en
dc.contributor.department Dept Signal Process and Acoust
dc.contributor.department Tampere University
dc.subject.keyword Computational modeling
dc.subject.keyword L1 acquisition
dc.subject.keyword Language acquisition
dc.subject.keyword Lexical learning
dc.subject.keyword Phonetic learning
dc.identifier.urn URN:NBN:fi:aalto-202001021357
dc.identifier.doi 10.21437/Interspeech.2019-1523
dc.type.version publishedVersion

