Segment phoneme classification from speech under noisy conditions: Using amplitude-frequency modulation based two-dimensional auto-regressive features with deep neural networks

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Gowda, Dhananjaya
dc.contributor.author: Rangslang, Rijuban
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.supervisor: Alku, Paavo
dc.date.accessioned: 2016-08-26T09:09:18Z
dc.date.available: 2016-08-26T09:09:18Z
dc.date.issued: 2016-08-24
dc.description.abstract: This thesis investigates, at the acoustic-phonetic level, the noise robustness of features derived from the AM-FM analysis of speech signals. The analysis is carried out with several neural network models and is based on the segment classification of phonemes. It is further extended to compare, under similar noise conditions, the robustness of the AM-FM based features with that of traditional features such as Mel-frequency cepstral coefficients (MFCCs). We begin with an important aspect of segment phoneme classification experiments: a study of the architectures and training strategies of the neural network models used. These experiments showed that the models differ in their training patterns: models that undergo pre-training train for many more epochs before over-fitting than their counterparts that are not pre-trained. Taking this difference into account, and based on the phoneme classification rate, the Gaussian restricted Boltzmann machine and the single-layer perceptron are selected as the best-performing models of the pre-trained and non-pre-trained groups, respectively. Using these two models as classifiers, segment phoneme classification experiments are performed under different noise conditions for both the AM-FM based and the traditional features. The experiments showed that AM-FM based frequency domain linear prediction features, with or without feature compensation, are more robust than the traditional features in classifying 61 phonemes under white-noise and 0 dB signal-to-noise ratio (SNR) conditions. However, when the 61 phonemes are folded into 39 phonemes, the results are inconclusive under all noise conditions, and no single feature is consistently the most robust. [en]
dc.format.extent: 64+6
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/21633
dc.identifier.urn: URN:NBN:fi:aalto-201608263089
dc.language.iso: en [en]
dc.location: P1 [fi]
dc.programme: TLT - Master’s Programme in Communications Engineering (TS2005) [fi]
dc.programme.major: Signal Processing [fi]
dc.programme.mcode: S3013 [fi]
dc.rights.accesslevel: openAccess
dc.subject.keyword: robust speech recognition [en]
dc.subject.keyword: AM-FM based features [en]
dc.subject.keyword: segment phoneme classification [en]
dc.subject.keyword: deep neural networks [en]
dc.title: Segment phoneme classification from speech under noisy conditions: Using amplitude-frequency modulation based two-dimensional auto-regressive features with deep neural networks [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.okm: G2 Pro gradu, diplomityö
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
dc.type.publication: masterThesis
local.aalto.idinssi: 54301
local.aalto.openaccess: yes
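The abstract refers to AM-FM based frequency domain linear prediction (FDLP) features, but this record does not describe how they are computed. As a hedged illustration of the general FDLP idea only, and not the thesis's feature pipeline (which also involves two-dimensional auto-regressive modelling and optional feature compensation, neither shown here), the Python sketch below fits a linear predictor to the discrete cosine transform (DCT) of a signal segment and evaluates the resulting all-pole model over [0, π). Because the DCT turns time into a frequency-like axis, that response approximates the segment's squared temporal (Hilbert) envelope, i.e. its AM component. The function name `fdlp_envelope`, the default model order of 20, and the biased autocorrelation estimate are illustrative assumptions; the code relies on NumPy and SciPy.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20, n_points=None):
    """Estimate the squared Hilbert envelope of x with an all-pole FDLP model.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    x = np.asarray(x, dtype=float)
    n_points = n_points or len(x)
    # DCT-II of the time signal: linear prediction is applied to this sequence.
    c = dct(x, type=2, norm="ortho")
    # Biased autocorrelation of the DCT sequence for lags 0..order.
    full = np.correlate(c, c, mode="full")
    r = full[len(c) - 1 : len(c) + order]
    # Solve the Yule-Walker (Toeplitz) normal equations for the predictor.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    poly = np.concatenate(([1.0], -a))          # A(z) = 1 - sum_i a_i z^-i
    gain = r[0] - np.dot(a, r[1 : order + 1])   # prediction-error power
    # Evaluate gain / |A(e^jw)|^2 on [0, pi); w maps linearly onto time [0, N).
    w = np.linspace(0.0, np.pi, n_points, endpoint=False)
    response = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ poly
    return gain / np.abs(response) ** 2
```

Applied to a tone burst centred in a segment, the returned envelope should peak near the middle of the segment. A full feature extractor would typically apply this per sub-band and then summarize the envelopes into features; those steps are outside this sketch.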
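The abstract reports results under white-noise and 0 dB signal-to-noise ratio (SNR) conditions. This record does not say how the noisy speech was generated, so the following Python sketch only illustrates one common approach, under stated assumptions: Gaussian white noise is scaled so that 10·log10(P_signal / P_noise) equals the target SNR measured over the whole signal, and is then added to the speech. The helper name `add_white_noise` and its `seed` parameter are hypothetical. At 0 dB the added noise has the same average power as the signal.

```python
import numpy as np


def add_white_noise(x, snr_db, seed=None):
    """Return x corrupted by white Gaussian noise at the requested SNR (dB).

    Hypothetical helper for illustration; SNR is measured over the whole signal.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(x.shape)
    signal_power = np.mean(x ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 log10(P_signal / P_noise), so rescale the noise power
    # to P_signal / 10^(SNR_dB / 10) before adding it.
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return x + scale * noise
```

Measuring power over the whole signal gives a global SNR; segmental or speech-active-only SNR definitions would scale the noise differently.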
Files
Original bundle
Name: master_Rangslang_Rijuban_2016.pdf
Size: 2.35 MB
Format: Adobe Portable Document Format