Comparison of spectral tilt measures for sentence prominence in speech — Effects of dimensionality and adverse noise conditions
Access rights
openAccess
acceptedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Date
2018-10-01
Language
en
Pages
16
Series
Speech Communication, Volume 103, pp. 11-26
Abstract
Linguistic prominence in speech is known to correlate with the acoustic measures of energy, F0, and duration. In contrast, the role of spectral tilt in the realization of prominence has remained more inconsistent between previous empirical investigations. This may be partially due to the lack of a standard method for quantifying spectral tilt or due to difficulties in estimating the acoustical source of spectral tilt, the glottal flow, from continuous speech. These issues have rendered interpretations and comparisons between studies difficult. In addition, (i) little is known about the robustness of tilt estimators for prominence detection in the case when speech is not clean but corrupted, as in real life, by environmental noise or telephone transmission (i.e. degradation caused by bandpass filtering and quantization noise). Moreover, (ii) little attention has been paid to multidimensional representations of source spectrum that can potentially incorporate more information about the phonation style than purely scalar measures. In this work, we study spectral tilt in signaling prominence in spoken Dutch and French under different levels of additive noise, and for telephone-band coded speech, and compare several one-dimensional tilt measures that have been previously encountered in the literature as well as multidimensional tilt measures. We also compare spectral tilt measures with other standard acoustic correlates for prominence, namely, energy, F0, and duration. Our results provide further empirical support for the finding that tilt is a systematic correlate of prominence in Dutch, that the role is smaller in French, and that energy, F0, and duration appear still to be the most robust features for discriminating prominent and non-prominent words. 
In addition, our results show that there are notable differences between different tilt measures at different levels of noise, and that multidimensional representations for tilt improve class separability from the scalar measures.
Keywords
Prosody, Sentence prominence, Acoustic measures, Spectral tilt, Noise robustness, DNN
Citation
Kakouros, S, Räsänen, O & Alku, P 2018, 'Comparison of spectral tilt measures for sentence prominence in speech — Effects of dimensionality and adverse noise conditions', Speech Communication, vol. 103, pp. 11-26. https://doi.org/10.1016/j.specom.2018.08.002