Automatic classification of vocal intensity categories from amplitude-normalized speech signals by comparing acoustic features and classifier models

Access rights

openAccess
CC BY

Creative Commons license

Except where otherwise noted, this item's license is described as openAccess
publishedVersion

A1 Original article in a scientific journal

Language

en

Pages

27

Series

Speech Communication, Volume 174

Abstract

Regulation of vocal intensity is a fundamental phenomenon in speech communication. Speakers use different intensity categories (e.g., soft, normal, and loud voice) to express different vocal emotions or to communicate in noisy conditions or over varying distances. Vocal intensity categories have been studied in fundamental speech research, but much less is known about their automatic classification. This study investigates the classification of vocal intensity categories from speech signals in a scenario where the original level information of speech is absent and the signal is presented on a normalized amplitude scale. Different acoustic features were studied together with machine learning (ML) and deep learning (DL) classifiers using two different labeling approaches. Speech signals recorded from 50 speakers reciting sentences in four intensity categories (soft, normal, loud, and very loud) were analyzed. Altogether, 15 feature sets including different cepstral, spectral, and handcrafted (eGeMAPS) features were compared. Three ML classifiers (support vector machine, random forest, and AdaBoost) and four DL classifiers (deep neural network, convolutional neural network, recurrent neural network, and bidirectional long short-term memory network) were compared. The best classification accuracy of 86.0% was obtained by combining the best-performing cepstral and spectral features and using the bidirectional long short-term memory classifier.
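
The amplitude-normalized scenario described in the abstract can be illustrated with a minimal sketch. The abstract does not state which normalization the authors used, so peak normalization is assumed here purely for illustration. The helper names (`peak_normalize`, `rms_db`) and the synthetic sine-wave "utterances" are hypothetical and do not come from the article.

```python
import numpy as np

def peak_normalize(signal, peak=1.0):
    """Scale a signal so that its largest absolute sample equals `peak`.

    Assumption: peak normalization stands in for whatever normalization
    the study applied; the abstract does not specify the method.
    """
    max_abs = np.max(np.abs(signal))
    if max_abs == 0:
        return signal.copy()  # silent input: nothing to scale
    return signal * (peak / max_abs)

def rms_db(signal):
    """Root-mean-square level of a signal in decibels (relative to 1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(signal ** 2)))

# Two synthetic "utterances" that differ only in level (hypothetical data).
fs = 16000
t = np.arange(fs) / fs
soft = 0.05 * np.sin(2 * np.pi * 120 * t)
loud = 0.80 * np.sin(2 * np.pi * 120 * t)

soft_n = peak_normalize(soft)
loud_n = peak_normalize(loud)

# Level difference before and after normalization.
level_gap_before = rms_db(loud) - rms_db(soft)
level_gap_after = rms_db(loud_n) - rms_db(soft_n)
```

In this toy case the two signals become indistinguishable after normalization, because they differ only by a gain factor. Real soft and loud speech also differ in spectral and cepstral characteristics, which is the kind of information the compared feature sets would have to rely on once the level cue is removed.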

Citation

Kodali, M, Ansari, L, Kadiri, S, Narayanan, S & Alku, P 2025, 'Automatic classification of vocal intensity categories from amplitude-normalized speech signals by comparing acoustic features and classifier models', Speech Communication, vol. 174, 103288. https://doi.org/10.1016/j.specom.2025.103288
