Deep Learning for Automatic Classification of Speech Intensity Modes

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Kadiri, Sudarsana
dc.contributor.author: Ansari, Luna
dc.contributor.school: Perustieteiden korkeakoulu (School of Science) [fi]
dc.contributor.supervisor: Alku, Paavo
dc.date.accessioned: 2023-12-18T20:13:13Z
dc.date.available: 2023-12-18T20:13:13Z
dc.date.issued: 2023-12-11
dc.description.abstract: Speech intensity is one of the fundamental phenomena in speech processing. As a concept, speech intensity and its regulation help capture various aspects of, and changes in, the mechanisms of the human speech production system. Apart from a few studies, little is known about the automatic classification of vocal intensity categories. This study investigates the classification of speech intensity categories by applying machine learning (ML) and deep learning (DL) classifiers in conjunction with different spectral features. The data set analysed consists of speech recordings from 50 speakers reciting 25 sentences in four speech intensities (soft, normal, loud and very loud). Specifically, four spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) and their combinations are investigated. Seven classifiers are explored: three ML classifiers (Support Vector Machine (SVM), Random Forest (RF) and AdaBoost) and four DL classifiers (Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Bidirectional Long Short-Term Memory (BiLSTM)). The best classification performance, an accuracy of 76%, is achieved with the BiLSTM classifier using the combination of all features (dynamic MFCCs, spectrogram and mel-spectrogram). [en]
dc.format.extent: 48
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/125053
dc.identifier.urn: URN:NBN:fi:aalto-202312187421
dc.language.iso: en [en]
dc.programme: Master’s Programme in Computer, Communication and Information Sciences [fi]
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence [fi]
dc.programme.mcode: SCI3044 [fi]
dc.subject.keyword: speech communication [en]
dc.subject.keyword: vocal intensity [en]
dc.subject.keyword: machine learning [en]
dc.subject.keyword: deep learning [en]
dc.subject.keyword: sound pressure level [en]
dc.subject.keyword: deep neural networks [en]
dc.title: Deep Learning for Automatic Classification of Speech Intensity Modes [en]
dc.type: G2 Pro gradu, diplomityö (G2 Master's thesis) [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Ansari_Luna_2023.pdf
Size: 3.03 MB
Format: Adobe Portable Document Format