Deep Learning for Automatic Classification of Speech Intensity Modes
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Date
2023-12-11
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
48
Abstract
Speech intensity is one of the fundamental phenomena in speech processing. As a concept, speech intensity and its regulation help capture various aspects of, and changes in, the mechanisms of the human speech production system. Apart from a few studies, little is known about the automatic classification of vocal intensity categories. This study investigates speech intensity category classification by applying machine learning (ML) and deep learning (DL) classifiers in conjunction with different spectral features. The data set analysed consists of speech recordings from 50 speakers reciting 25 sentences in four speech intensities (soft, normal, loud and very loud). Specifically, four spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) and their combinations are investigated. Three ML classifiers (Support Vector Machine (SVM), Random Forest (RF) and AdaBoost) and four DL classifiers (Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Bidirectional Long Short-Term Memory (BiLSTM)) are explored. The best classification performance, an accuracy of 76%, is achieved with the combination of all features (dynamic MFCCs, spectrogram and mel-spectrogram) using the BiLSTM classifier.
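The abstract does not say how the dynamic MFCCs are computed. A common reading is that static coefficients are extended with delta and delta-delta features obtained by a regression over neighbouring frames. The following is a minimal sketch under that assumption, in plain Python; the function names `delta` and `add_dynamic`, the window `width=2`, and the edge handling (repeating the first and last frame) are illustrative choices, not details taken from the thesis.

```python
def delta(frames, width=2):
    """Regression-based delta features.

    frames: list of per-frame coefficient vectors (lists of floats),
            e.g. static MFCCs computed elsewhere.
    width:  number of frames on each side used in the regression
            (assumed value, not taken from the thesis).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Normalisation term of the standard delta regression formula.
    denom = 2 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for k in range(n_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so that edge frames are repeated.
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def add_dynamic(frames, width=2):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With 13 static coefficients per frame, `add_dynamic` would yield 39-dimensional vectors, which could then be fed frame by frame to a sequence classifier such as a BiLSTM.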
Supervisor
Alku, Paavo
Thesis advisor
Kadiri, Sudarsana
Keywords
speech communication, vocal intensity, machine learning, deep learning, sound pressure level, deep neural networks