dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Rouhe, Aku |
dc.contributor.author | Kaseva, Tuomas |
dc.contributor.author | Kurimo, Mikko |
dc.date.accessioned | 2020-09-04T07:45:12Z |
dc.date.available | 2020-09-04T07:45:12Z |
dc.date.issued | 2020-05 |
dc.identifier.citation | Rouhe, A., Kaseva, T. & Kurimo, M. 2020, Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, 9053998, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 7064-7068, IEEE International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053998 | en
dc.identifier.isbn | 9781509066315 |
dc.identifier.issn | 1520-6149 |
dc.identifier.issn | 2379-190X |
dc.identifier.other | PURE UUID: 17b37434-aaee-43d8-9718-888409b279c4 |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/17b37434-aaee-43d8-9718-888409b279c4 |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85089246234&partnerID=8YFLogxK |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/51084049/Rouhe_Speaker_aware_training_of_attention_based_end_to_end_speech_recognition.pdf |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/46322 |
dc.description | openaire: EC/H2020/780069/EU//MeMAD |
dc.description.abstract | In speaker-aware training, a speaker embedding is appended to DNN input features. This allows the DNN to effectively learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings, such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors. | en
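A minimal sketch of the input construction described in the abstract: a per-utterance speaker embedding (e.g. an x-vector or i-vector) is tiled across time and concatenated to the frame-level acoustic features before they enter the encoder. This is an illustrative assumption written in PyTorch; the function name, dimensions, and tensors are hypothetical and not taken from the paper.

import torch

def append_speaker_embedding(features: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
    # features: (num_frames, feat_dim), e.g. 80-dim filterbank frames for one utterance
    # spk_emb:  (emb_dim,), one utterance-level speaker embedding (x-vector or i-vector)
    # returns:  (num_frames, feat_dim + emb_dim), speaker-aware encoder input
    num_frames = features.size(0)
    # Repeat the utterance-level embedding for every frame, then concatenate along the feature axis.
    tiled = spk_emb.unsqueeze(0).expand(num_frames, -1)
    return torch.cat([features, tiled], dim=-1)

# Illustrative usage with random tensors standing in for real features and an x-vector.
fbank = torch.randn(300, 80)    # 300 frames of 80-dim features (hypothetical)
xvector = torch.randn(512)      # one 512-dim speaker embedding (hypothetical)
encoder_input = append_speaker_embedding(fbank, xvector)
print(encoder_input.shape)      # torch.Size([300, 592])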
dc.format.extent | 5 |
dc.format.extent | 7064-7068 |
dc.format.mimetype | application/pdf |
dc.language.iso | en | en
dc.relation | info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD |
dc.relation.ispartof | IEEE International Conference on Acoustics, Speech and Signal Processing | en
dc.relation.ispartofseries | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings | en
dc.relation.ispartofseries | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing | en
dc.rights | openAccess | en
dc.title | Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings | en
dc.type | A4 Artikkeli konferenssijulkaisussa | fi
dc.description.version | Peer reviewed | en
dc.contributor.department | Dept Signal Process and Acoust |
dc.subject.keyword | end-to-end speech recognition |
dc.subject.keyword | speaker embedding |
dc.subject.keyword | speaker-adaptation |
dc.subject.keyword | speaker-aware training |
dc.identifier.urn | URN:NBN:fi:aalto-202009045265 |
dc.identifier.doi | 10.1109/ICASSP40776.2020.9053998 |
dc.type.version | acceptedVersion |