Wavelet scattering network features for intensity category classification and prediction of SPL from speech
Access rights
openAccess
acceptedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
5
Series
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’25)
Abstract
Speakers change vocal intensity in daily life to communicate over long distances and to express vocal emotions. Humans produce speech using different intensity categories (e.g. soft, normal and loud voice) and they can regulate intensity across a wide sound pressure level (SPL) range. Knowing the intensity category or the SPL of speech is beneficial in speech-based biomarking of health. Recent studies have explored vocal intensity category classification and SPL prediction from speech that has been recorded without SPL calibration information and is presented on an arbitrary amplitude scale. Using speech signals in such a scenario, this study investigates wavelet scattering network (WSN) features in two tasks: (1) classification of speech into four intensity categories (soft, normal, loud, very loud), a multi-class classification task, and (2) prediction of SPL, a regression task. In the former task, the WSN features showed absolute accuracy improvements of 4-14% compared to reference features. In the latter task, the WSN features improved the prediction of SPL by an average of 1-2 dB compared to the reference features.
Citation
Kodali, M, Kadiri, S, Narayanan, S & Alku, P 2025, Wavelet scattering network features for intensity category classification and prediction of SPL from speech. in B D Rao, I Trancoso, G Sharma & N B Mehta (eds), Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’25). Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, IEEE International Conference on Acoustics, Speech, and Signal Processing, Hyderabad, India, 06/04/2025. https://doi.org/10.1109/ICASSP49660.2025.10888824