Deep Learning for Automatic Classification of Speech Intensity Modes

School of Science | Master's thesis

Date

2023-12-11

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

48

Abstract

Speech intensity is one of the fundamental phenomena in speech processing. Speech intensity and its regulation help capture various aspects of, and changes in, the mechanisms of the human speech production system. Apart from a few studies, little is known about the automatic classification of vocal intensity categories. This study investigates speech intensity category classification by applying machine learning (ML) and deep learning (DL) classifiers in conjunction with different spectral features. The data set analysed consists of speech recordings from 50 speakers reciting 25 sentences in four speech intensities (soft, normal, loud and very loud). Specifically, four spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) and their combinations are investigated. Seven classifiers are explored: three ML classifiers (Support Vector Machine (SVM), Random Forest (RF) and AdaBoost) and four DL classifiers (Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Bidirectional Long Short-Term Memory (BiLSTM)). The best classification performance, an accuracy of 76%, is achieved with the combination of all features (dynamic MFCCs, spectrogram and mel-spectrogram) using the BiLSTM classifier.
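The abstract does not say how the dynamic MFCCs are computed. As a rough illustration only, the sketch below follows one common convention, in which each static frame is extended with regression-based delta and delta-delta coefficients. The function names `delta` and `add_dynamic`, the window width of 2 and the edge-padding choice are assumptions made for this example and are not taken from the thesis.

```python
def delta(frames, width=2):
    """Regression-based delta over time for a list of feature frames.

    frames: list of equal-length lists (one list of coefficients per frame).
    width:  number of neighbouring frames used on each side (assumed value).
    Edges are handled by repeating the first and last frame.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at the last frame
                prv = frames[max(t - k, 0)][d]      # clamp at the first frame
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic(frames, width=2):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


if __name__ == "__main__":
    # Toy input: 7 frames with a single coefficient that rises linearly.
    ramp = [[float(t)] for t in range(7)]
    print(delta(ramp)[3])        # slope at an interior frame
    print(add_dynamic(ramp)[3])  # static + delta + delta-delta
```

With 13 static coefficients per frame, for example, this convention would yield 39 values per frame; whether the thesis uses that dimensionality is not stated in the abstract.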

Supervisor

Alku, Paavo

Thesis advisor

Kadiri, Sudarsana

Keywords

speech communication, vocal intensity, machine learning, deep learning, sound pressure level, deep neural networks
