dc.contributor |
Aalto-yliopisto |
fi |
dc.contributor |
Aalto University |
en |
dc.contributor.author |
Smit, Peter |
|
dc.contributor.author |
Gangireddy, Siva |
|
dc.contributor.author |
Enarvi, Seppo |
|
dc.contributor.author |
Virpioja, Sami |
|
dc.contributor.author |
Kurimo, Mikko |
|
dc.date.accessioned |
2018-02-09T10:07:28Z |
|
dc.date.available |
2018-02-09T10:07:28Z |
|
dc.date.issued |
2018 |
|
dc.identifier.citation |
Smit, P, Gangireddy, S, Enarvi, S, Virpioja, S & Kurimo, M 2018, Aalto system for the 2017 Arabic multi-genre broadcast challenge. In Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on. IEEE, pp. 338-345, IEEE Automatic Speech Recognition and Understanding Workshop, Okinawa, Japan, 16/12/2017. https://doi.org/10.1109/ASRU.2017.8268955 |
en |
dc.identifier.other |
PURE UUID: e4001435-8e01-43c8-9a67-603ce87e962c |
|
dc.identifier.other |
PURE ITEMURL: https://research.aalto.fi/en/publications/e4001435-8e01-43c8-9a67-603ce87e962c |
|
dc.identifier.other |
PURE FILEURL: https://research.aalto.fi/files/15224073/smit2017mgb.pdf |
|
dc.identifier.uri |
https://aaltodoc.aalto.fi/handle/123456789/30015 |
|
dc.description.abstract |
We describe the speech recognition systems we created for MGB-3, the 3rd Multi-Genre Broadcast challenge, which this year consisted of building a system for transcribing Egyptian Dialect Arabic speech, using a large audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, a combination of different acoustic models, language models and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, which was the lowest in the competition. On the older MGB-2 task, which was run again to indicate progress, we also achieved the lowest error rate: 13.2%. These results come from combining state-of-the-art speech recognition methods: simple dialect adaptation for a Time-Delay Neural Network (TDNN) acoustic model (-27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional -5%), and system combination with Minimum Bayes Risk (MBR) decoding (yet another -10%). We also explored the use of morph and character language models, which was particularly beneficial in providing a rich pool of systems for the MBR decoding. |
en |
dc.format.extent |
338-345 |
|
dc.format.mimetype |
application/pdf |
|
dc.language.iso |
en |
en |
dc.relation.ispartofseries |
Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on |
en |
dc.rights |
openAccess |
en |
dc.title |
Aalto system for the 2017 Arabic multi-genre broadcast challenge |
en |
dc.type |
A4 Artikkeli konferenssijulkaisussa |
fi |
dc.description.version |
Peer reviewed |
en |
dc.contributor.department |
Dept Signal Process and Acoust |
|
dc.identifier.urn |
URN:NBN:fi:aalto-201802091512 |
|
dc.identifier.doi |
10.1109/ASRU.2017.8268955 |
|
dc.type.version |
acceptedVersion |
|