Vocal effort compensation for MFCC feature extraction in a shouted versus normal speaker recognition task
Access rights
openAccess
acceptedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Jokinen, Emma
Saeidi, Rahim
Kinnunen, Tomi
Alku, Paavo
Language
en
Pages
11
Series
Computer Speech and Language, Volume 53, pp. 1-11
Abstract
In shouting, speakers use increased vocal effort to convey spoken messages over distance or above environmental noise. For automatic speaker recognition systems trained on normal speech, shouting causes a severe vocal effort mismatch between enrollment and test, which reduces recognition performance. In this study, two compensation methods are proposed to tackle the mismatch in a shouted versus normal speaker recognition task. These techniques are applied in the feature extraction stage of a speaker recognition system to bring the spectral envelopes of shouts closer to those of normal speech. The techniques modify the all-pole power spectrum of the MFCC computation chain with shouted-to-normal compensation filtering that is obtained using a GMM-based statistical mapping. In an evaluation using a state-of-the-art i-vector-based recognition system, the proposed techniques provided considerable improvements in identification rates compared to the case when shouted speech spectra were not processed.
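As a rough illustration of where a compensation filter of this kind could sit in the MFCC chain, the Python sketch below computes an all-pole (linear prediction) power spectrum for one frame, multiplies it by a per-bin gain, and then applies the usual mel filterbank, logarithm and DCT. It is not the authors' implementation: the GMM-based mapping that produces the filter is not reproduced, `comp_gain` is a hypothetical input supplied by the caller, and the function names, LP order, FFT size and filter counts are all assumptions chosen for the example.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """All-pole coefficients a[0..order] (a[0] = 1) and residual energy
    from the autocorrelation method (Levinson-Durbin recursion)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def allpole_power_spectrum(frame, order=20, n_fft=512):
    """Power spectrum err / |A(e^{jw})|^2 of the all-pole model of a frame."""
    a, err = lpc_autocorr(frame * np.hamming(len(frame)), order)
    return err / np.abs(np.fft.rfft(a, n_fft)) ** 2

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (i + 0.5) / n) @ x

def compensated_mfcc(frame, comp_gain, sr=16000, order=20,
                     n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs from an all-pole spectrum scaled by comp_gain.

    comp_gain is a hypothetical per-bin power gain (length n_fft // 2 + 1);
    in the paper the filter comes from a GMM-based mapping, which this
    sketch does not implement.
    """
    spec = allpole_power_spectrum(frame, order, n_fft) * comp_gain
    mel_energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    return dct2(np.log(mel_energies + 1e-12), n_ceps)
```

Passing a vector of ones as `comp_gain` leaves the features uncompensated, which gives a convenient baseline when comparing against any candidate filter. A constant gain changes only the zeroth coefficient, so any useful compensation has to vary across frequency.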
Citation
Jokinen, E, Saeidi, R, Kinnunen, T & Alku, P 2019, 'Vocal effort compensation for MFCC feature extraction in a shouted versus normal speaker recognition task', Computer Speech and Language, vol. 53, pp. 1-11. https://doi.org/10.1016/j.csl.2018.06.002