Automatic classification of vocal intensity category from speech

Access rights
openAccess
Journal Title
Journal ISSN
Volume Title
Conference article in proceedings
This publication is imported from Aalto University research portal.
Date
2023
Language
en
Pages
5
Series
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’23)
Abstract
Regulation of vocal intensity is a fundamental phenomenon in speech communication. Vocal intensity can be quantified using sound pressure level (SPL), which can be measured easily by recording a standard calibration signal together with the speech and comparing the energy of the recorded speech signal with that of the calibration tone. Unfortunately, speech recordings are mostly conducted without an SPL calibration signal, and speech signals are saved to databases using arbitrary amplitude scales. Therefore, neither the SPL nor the intensity category (e.g. soft or loud phonation) of a saved speech signal can be determined afterwards. Even though the original level information of speech is lost when the signal is presented on an arbitrary amplitude scale, the speech signal contains other acoustic cues of vocal intensity. In the current study, we investigate machine learning and deep learning-based methods for automatic classification of the vocal intensity category when the input speech is expressed on an arbitrary amplitude scale. A new gender-balanced database consisting of speech produced in four vocal intensity categories (soft, normal, loud, and very loud) was first recorded. Support vector machine and deep neural network (DNN) models were used to develop automatic classification systems using spectrograms, mel-spectrograms, and mel-frequency cepstral coefficients as features. The DNN classifier using the mel-spectrogram showed the best classification accuracy of about 90%. The database is made publicly available at https://bit.ly/3tLPGRx
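The calibration-based SPL measurement described in the abstract (comparing the energy of the recorded speech with that of a calibration tone of known level) can be sketched as below. This is an illustrative assumption, not code from the paper: the function name and the 94 dB calibrator level (a common acoustic-calibrator reference) are hypothetical choices.

```python
import numpy as np

def estimate_spl(speech, calibration_tone, cal_spl_db=94.0):
    """Estimate the SPL of a speech signal recorded on an arbitrary
    amplitude scale, given a calibration tone recorded with the same
    equipment at a known SPL (here assumed to be 94 dB).

    The dB offset between the two signals' mean energies is added to
    the known calibrator level.
    """
    e_speech = np.mean(np.square(speech))          # mean energy of speech
    e_cal = np.mean(np.square(calibration_tone))   # mean energy of tone
    return cal_spl_db + 10.0 * np.log10(e_speech / e_cal)

# Illustrative usage with synthetic signals: a speech-stand-in at half
# the calibration tone's amplitude has one quarter of its energy,
# i.e. about 6 dB below the calibrator level.
fs = 16000
t = np.arange(fs) / fs
cal = np.sin(2 * np.pi * 1000 * t)            # 1 kHz calibration tone
speech = 0.5 * np.sin(2 * np.pi * 220 * t)    # stand-in "speech" signal
spl = estimate_spl(speech, cal)               # ≈ 94 - 6.02 ≈ 87.98 dB
```

Once the arbitrary scale is calibrated this way, the SPL (and hence the intensity category) follows directly; the paper's contribution is recovering the category when no such calibration tone was recorded.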
Description
Keywords
Other note
Citation
Kodali, M., Kadiri, S., Laaksonen, L. & Alku, P. 2023, 'Automatic classification of vocal intensity category from speech', in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’23), IEEE, Rhodes Island, Greece, 04/06/2023. https://doi.org/10.1109/ICASSP49357.2023.10097160