Aalto system for the 2017 Arabic multi-genre broadcast challenge

dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Smit, Peter | en_US
dc.contributor.author | Gangireddy, Siva | en_US
dc.contributor.author | Enarvi, Seppo | en_US
dc.contributor.author | Virpioja, Sami | en_US
dc.contributor.author | Kurimo, Mikko | en_US
dc.contributor.department | Department of Signal Processing and Acoustics | en
dc.contributor.groupauthor | Centre of Excellence in Computational Inference, COIN | en
dc.contributor.groupauthor | Speech Recognition | en
dc.date.accessioned | 2018-02-09T10:07:28Z |
dc.date.available | 2018-02-09T10:07:28Z |
dc.date.issued | 2018 | en_US
dc.description.abstract | We describe the speech recognition systems we have created for MGB-3, the 3rd Multi Genre Broadcast challenge, which this year consisted of a task of building a system for transcribing Egyptian Dialect Arabic speech, using a big audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, which was a combination of different acoustic models, language models and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, which was the lowest in the competition. Also on the old MGB-2 task, which was run again to indicate progress, we achieved the lowest error rate: 13.2%. The result is a combination of the application of state-of-the-art speech recognition methods such as simple dialect adaptation for a Time-Delay Neural Network (TDNN) acoustic model (-27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional -5%), and system combination with Minimum Bayes Risk (MBR) decoding (yet another -10%). We also explored the use of morph and character language models, which was particularly beneficial in providing a rich pool of systems for the MBR decoding. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 338-345 |
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Smit, P, Gangireddy, S, Enarvi, S, Virpioja, S & Kurimo, M 2018, Aalto system for the 2017 Arabic multi-genre broadcast challenge. In Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on. IEEE, pp. 338-345, IEEE Automatic Speech Recognition and Understanding Workshop, Okinawa, Japan, 16/12/2017. https://doi.org/10.1109/ASRU.2017.8268955 | en
dc.identifier.doi | 10.1109/ASRU.2017.8268955 | en_US
dc.identifier.other | PURE UUID: e4001435-8e01-43c8-9a67-603ce87e962c | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/e4001435-8e01-43c8-9a67-603ce87e962c | en_US
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/15224073/smit2017mgb.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/30015 |
dc.identifier.urn | URN:NBN:fi:aalto-201802091512 |
dc.language.iso | en | en
dc.relation.ispartofseries | Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on | en
dc.rights | openAccess | en
dc.title | Aalto system for the 2017 Arabic multi-genre broadcast challenge | en
dc.type | A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) | fi
dc.type.version | acceptedVersion |
