
Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Rouhe, Aku
dc.contributor.author Kaseva, Tuomas
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2020-09-04T07:45:12Z
dc.date.available 2020-09-04T07:45:12Z
dc.date.issued 2020-05
dc.identifier.citation Rouhe, A, Kaseva, T & Kurimo, M 2020, Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, 9053998, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 7064-7068, IEEE International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053998 en
dc.identifier.isbn 9781509066315
dc.identifier.issn 1520-6149
dc.identifier.issn 2379-190X
dc.identifier.other PURE UUID: 17b37434-aaee-43d8-9718-888409b279c4
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/17b37434-aaee-43d8-9718-888409b279c4
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85089246234&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/51084049/Rouhe_Speaker_aware_training_of_attention_based_end_to_end_speech_recognition.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/46322
dc.description | openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract In speaker-aware training, a speaker embedding is appended to DNN input features. This allows the DNN to effectively learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings, such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors. en
dc.format.extent 5
dc.format.extent 7064-7068
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD
dc.relation.ispartof IEEE International Conference on Acoustics, Speech and Signal Processing en
dc.relation.ispartofseries 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings en
dc.relation.ispartofseries Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing en
dc.rights openAccess en
dc.title Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings en
dc.type A4 Artikkeli konferenssijulkaisussa fi
dc.description.version Peer reviewed en
dc.contributor.department Dept Signal Process and Acoust
dc.subject.keyword end-to-end speech recognition
dc.subject.keyword speaker embedding
dc.subject.keyword speaker-adaptation
dc.subject.keyword speaker-aware training
dc.identifier.urn URN:NBN:fi:aalto-202009045265
dc.identifier.doi 10.1109/ICASSP40776.2020.9053998
dc.type.version acceptedVersion


Files in this item

There are no open access files associated with this item.
