Segment phoneme classification from speech under noisy conditions: Using amplitude-frequency modulation based two-dimensional auto-regressive features with deep neural networks
School of Electrical Engineering | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
TLT - Master’s Programme in Communications Engineering (TS2005)
Abstract: This thesis investigates, at the acoustic-phonetic level, the noise robustness of features derived from the AM-FM analysis of speech signals. Robustness is assessed through the segment classification of phonemes using various neural network models. The analysis is then extended to compare the AM-FM based features, under the same noise conditions, with traditional features such as Mel-frequency cepstral coefficients (MFCC). We begin with an important aspect of the segment phoneme classification experiments: the architectures and training strategies of the neural network models used. These experiments showed that the models differ in their training pattern. Before over-fitting, models that undergo pre-training train for many more epochs than their counterparts that do not. Taking this difference into account, and based on phoneme classification rate, the Gaussian restricted Boltzmann machine and the single layer perceptron are selected as the best performing models of the pre-trained and non-pre-trained groups, respectively. Using these two models, segment phoneme classification experiments under different noise conditions are performed for both the AM-FM based and the traditional features. The experiments showed that AM-FM based frequency domain linear prediction features, with or without feature compensation, are more robust than the traditional features in the classification of 61 phonemes under white noise and 0 dB signal-to-noise ratio (SNR) conditions. However, when the 61 phonemes are folded into 39 phoneme classes, the results are ambiguous under all noise conditions, and no single feature emerges as the most robust.
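The abstract reports results under white noise at 0 dB SNR, but the exact noise-mixing procedure is not described here. The sketch below shows one common way to corrupt a clean signal with white Gaussian noise at a target SNR. It assumes SNR is defined over the whole signal as a ratio of average powers. The function names and the test tone are hypothetical illustrations and are not taken from the thesis.

```python
import math
import random


def add_white_noise(signal, target_snr_db, seed=None):
    """Return `signal` plus white Gaussian noise scaled to `target_snr_db` (assumed whole-signal SNR)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Average power of the clean signal and of the unscaled noise draw.
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power we need.
    target_noise_power = p_signal / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    return [x + gain * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR (dB) of `noisy` relative to the `clean` reference."""
    noise = [y - x for x, y in zip(clean, noisy)]
    p_signal = sum(x * x for x in clean)
    p_noise = sum(n * n for n in noise)
    return 10.0 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    # Hypothetical 1 kHz tone sampled at 16 kHz, standing in for a speech segment.
    fs = 16000
    clean = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
    noisy = add_white_noise(clean, target_snr_db=0.0, seed=1)
    print(f"measured SNR: {measure_snr_db(clean, noisy):.2f} dB")
```

At 0 dB SNR the added noise has the same average power as the signal. This is why it is a demanding test condition for phoneme classification features.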
Thesis advisor: Gowda, Dhananjaya
Keywords: robust speech recognition, AM-FM based features, segment phoneme classification, deep neural networks