Aalto system for the 2017 Arabic multi-genre broadcast challenge

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Smit, Peter
dc.contributor.author Gangireddy, Siva
dc.contributor.author Enarvi, Seppo
dc.contributor.author Virpioja, Sami
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2018-02-09T10:07:28Z
dc.date.available 2018-02-09T10:07:28Z
dc.date.issued 2018
dc.identifier.citation Smit, P., Gangireddy, S., Enarvi, S., Virpioja, S. & Kurimo, M. 2018, 'Aalto system for the 2017 Arabic multi-genre broadcast challenge', in Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on, IEEE, pp. 338-345, IEEE Automatic Speech Recognition and Understanding Workshop, Okinawa, Japan, 16/12/2017. https://doi.org/10.1109/ASRU.2017.8268955 en
dc.identifier.other PURE UUID: e4001435-8e01-43c8-9a67-603ce87e962c
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/e4001435-8e01-43c8-9a67-603ce87e962c
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/15224073/smit2017mgb.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/30015
dc.description.abstract We describe the speech recognition systems we created for MGB-3, the 3rd Multi-Genre Broadcast challenge, which this year posed the task of building a system for transcribing Egyptian dialectal Arabic speech, using a large audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, a combination of different acoustic models, language models, and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, the lowest in the competition. On the earlier MGB-2 task, which was run again to measure progress, we also achieved the lowest error rate: 13.2%. The result comes from combining state-of-the-art speech recognition methods such as simple dialect adaptation of a Time-Delay Neural Network (TDNN) acoustic model (-27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional -5%), and system combination with Minimum Bayes Risk (MBR) decoding (a further -10%). We also explored morph and character language models, which were particularly beneficial in providing a rich pool of systems for the MBR decoding. en
dc.format.extent 338-345
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartofseries Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on en
dc.rights openAccess en
dc.title Aalto system for the 2017 Arabic multi-genre broadcast challenge en
dc.type A4 Artikkeli konferenssijulkaisussa (article in conference proceedings) fi
dc.description.version Peer reviewed en
dc.contributor.department Dept Signal Process and Acoust
dc.identifier.urn URN:NBN:fi:aalto-201802091512
dc.identifier.doi 10.1109/ASRU.2017.8268955
dc.type.version acceptedVersion
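
The abstract above highlights system combination with Minimum Bayes Risk (MBR) decoding over a pool of systems built from word, morph, and character language models. The snippet below is a minimal, illustrative sketch of approximate MBR hypothesis selection over a pooled N-best list, assuming word-level Levenshtein distance as the loss; the function names, posterior weights, and toy hypotheses are hypothetical, and this is not the authors' implementation.

# Illustrative sketch only: approximate MBR selection over a pooled N-best list.
# All names and data below are made up for demonstration purposes.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution
    return d[-1]

def mbr_select(nbest):
    """Pick the hypothesis with the lowest expected word error.

    `nbest` is a list of (posterior_weight, hypothesis_string) pairs pooled
    from all component systems; the weights should sum to one.
    """
    best, best_risk = None, float("inf")
    for _, cand in nbest:
        risk = sum(w * edit_distance(other.split(), cand.split())
                   for w, other in nbest)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best

# Toy example with invented hypotheses and posteriors.
pool = [(0.40, "the broadcast news program"),
        (0.35, "the broadcast news programs"),
        (0.25, "a broadcast news program")]
print(mbr_select(pool))

In this approximation the pooled N-best list serves as both the hypothesis space and the evidence space, so the selected hypothesis is simply the candidate with the lowest expected word error under the pooled posteriors.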

