Browsing by Author "Kurimo, Mikko, Prof., Aalto University, Finland"
Towards Efficient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training (Aalto University, 2013)
Pylkkönen, Janne; Kurimo, Mikko, Prof., Aalto University, Finland; Department of Information and Computer Science; School of Science; Oja, Erkki, Prof., Aalto University, Finland

Automatic speech recognition has been widely studied and is already applied in everyday use. Nevertheless, recognition performance is still a bottleneck in many practical applications of large-vocabulary continuous speech recognition: either the recognition speed is insufficient, or errors in the recognition result limit the applications. This thesis studies two aspects of speech recognition, decoding and the training of acoustic models, to improve recognition performance under different conditions.

A major part of this thesis studies discriminative training of acoustic models. The emphasis is on the most popular algorithm for discriminative model estimation, the extended Baum-Welch algorithm. The thesis points out theoretical connections between the algorithm and general constrained optimization. It also proposes new control methods for the algorithm, which are shown to improve the robustness of the acoustic models in several large-vocabulary speech recognition tasks. Discriminative training methods are widely applied in state-of-the-art speech recognizers, which use the prevalent hidden Markov models for acoustic modeling. Therefore, the proposed methods have many immediate practical applications.

The speech recognition system developed at Aalto University was used and significantly improved during the research for this thesis. The thesis gives an overview of that system and describes its decoder in more detail.
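For readers unfamiliar with the extended Baum-Welch algorithm mentioned above, the sketch below shows its standard textbook update for the mean and variance of a single one-dimensional Gaussian. It is not the thesis's proposed control method: the smoothing constant `D` is simply supplied by the caller, and the names `GaussStats` and `ebw_update` are hypothetical. Numerator statistics are assumed to come from the reference transcription and denominator statistics from competing hypotheses.

```python
from dataclasses import dataclass


@dataclass
class GaussStats:
    """Occupancy-weighted sufficient statistics for one 1-D Gaussian (hypothetical helper)."""
    gamma: float  # total occupancy: sum_t gamma_t
    x: float      # first-order statistic: sum_t gamma_t * o_t
    x2: float     # second-order statistic: sum_t gamma_t * o_t**2


def ebw_update(num: GaussStats, den: GaussStats, mu: float, var: float, D: float):
    """One standard extended Baum-Welch update of a 1-D Gaussian.

    num/den: statistics from the numerator (reference) and denominator
    (competing hypotheses). D: smoothing constant; larger D keeps the
    new parameters closer to the old ones (mu, var).
    """
    denom = num.gamma - den.gamma + D
    if denom <= 0:
        raise ValueError("D too small: effective occupancy is not positive")
    mu_new = (num.x - den.x + D * mu) / denom
    var_new = (num.x2 - den.x2 + D * (var + mu * mu)) / denom - mu_new * mu_new
    if var_new <= 0:
        raise ValueError("D too small: updated variance is not positive")
    return mu_new, var_new


if __name__ == "__main__":
    mu, var = ebw_update(GaussStats(10, 5, 15), GaussStats(8, 2, 10), mu=0.0, var=1.0, D=20.0)
    print(mu, var)
```

With empty denominator statistics and `D = 0` the update reduces to the ordinary maximum-likelihood estimate; choosing `D` is exactly the kind of control decision whose robustness the thesis studies.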
In speech recognition systems, the decoder combines information from the statistical models of acoustics and language to search for the word sequence that best matches the input speech. The thesis proposes new methods for speeding up this search without loss of recognition accuracy.
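As a rough illustration of such a search, the sketch below runs a time-synchronous Viterbi search over a toy state graph. It combines acoustic log-likelihoods with language-model log-probabilities weighted by a scale factor and applies beam pruning, a standard way of reducing search effort. This is not the decoder of the thesis or its proposed speed-up methods: the function name `viterbi_beam`, the dictionary-based graph, and the `lm_scale` and `beam` parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_beam(
    log_ac: Sequence[Sequence[float]],    # log_ac[t][s]: acoustic log-likelihood of frame t in state s
    log_lm: Dict[int, Dict[int, float]],  # log_lm[s][s2]: language-model log-prob of moving s -> s2
    log_init: Sequence[float],            # log-prob of starting in each state
    lm_scale: float = 1.0,                # weight of LM scores relative to acoustic scores
    beam: float = 10.0,                   # prune hypotheses scoring worse than (best - beam)
) -> Tuple[List[int], float]:
    # Frame 0: one hypothesis (score, state path) per reachable start state.
    active = {
        s: (log_init[s] + log_ac[0][s], [s])
        for s in range(len(log_init))
        if math.isfinite(log_init[s])
    }
    for t in range(1, len(log_ac)):
        # Beam pruning: discard hypotheses far below the current best score.
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        new: Dict[int, Tuple[float, List[int]]] = {}
        for s, (score, path) in active.items():
            for s2, lm in log_lm.get(s, {}).items():
                # Combined score: acoustic log-likelihood plus scaled LM log-probability.
                cand = score + lm_scale * lm + log_ac[t][s2]
                # Viterbi recombination: keep only the best hypothesis per state.
                if s2 not in new or cand > new[s2][0]:
                    new[s2] = (cand, path + [s2])
        active = new
    score, path = max(active.values(), key=lambda h: h[0])
    return path, score
```

A narrower `beam` expands fewer hypotheses per frame and so runs faster, at the risk of pruning away the correct path; this is the speed/accuracy trade-off that decoder improvements aim to shift.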