Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Airaksinen, Manu [en_US]
dc.contributor.author: Juvela, Lauri [en_US]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.author: Alku, Paavo [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.date.accessioned: 2018-12-10T10:35:13Z
dc.date.available: 2018-12-10T10:35:13Z
dc.date.issued: 2018-09-02 [en_US]
dc.description.abstract: Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, a frame length of 20-30 ms is often used. However, this kind of conventional frame-based spectral analysis is oblivious to the broader temporal context of the signal and is prone to degradation by, for example, environmental noise. In this paper, we propose a new frame-based linear prediction (LP) analysis method that includes a regularization term penalizing energy differences in consecutive frames of an all-pole spectral envelope model. This integrates the slowly varying nature of the vocal tract into the analysis. Objective evaluations of feature distortion and phonetic representational capability were performed by studying the properties of the mel-frequency cepstral coefficient (MFCC) representations computed from different spectral estimation methods under noisy conditions using the TIMIT database. The results show that the proposed time-regularized LP approach exhibits superior MFCC distortion behavior while also achieving the greatest average separability of phoneme categories among the compared methods. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 701-705
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Airaksinen, M, Juvela, L, Räsänen, O & Alku, P 2018, Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech. In Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 701-705, Interspeech, Hyderabad, India, 02/09/2018. https://doi.org/10.21437/Interspeech.2018-1230 [en]
dc.identifier.doi: 10.21437/Interspeech.2018-1230 [en_US]
dc.identifier.issn: 2308-457X
dc.identifier.other: PURE UUID: f24dd479-0b01-437f-bc6e-735249925b2f [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/f24dd479-0b01-437f-bc6e-735249925b2f [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/28749604/ELEC_airaksinen_et_al_Interspeech.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/35351
dc.identifier.urn: URN:NBN:fi:aalto-201812106366
dc.language.iso: en [en]
dc.publisher: International Speech Communication Association
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech [en]
dc.relation.ispartofseries: Interspeech - Annual Conference of the International Speech Communication Association [en]
dc.rights: openAccess [en]
dc.title: Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
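
The abstract describes the time-regularized LP idea only at a high level. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes covariance-method LP within each frame and a quadratic penalty lam * ||c_k - c_(k-1)||^2 on the predictor coefficients of consecutive frames. By Parseval's theorem, this penalty equals the energy of the difference between consecutive inverse-filter frequency responses, so it is only a stand-in for the paper's envelope-energy penalty. Minimizing the summed cost jointly over all frames yields one block-tridiagonal linear system, and lam = 0 reduces it to conventional, independent per-frame LP. The function names (`frame_signal`, `lp_design`, `time_regularized_lp`, `allpole_power_spectrum`), the synthetic AR(2) test signal, and the value of lam are hypothetical illustrations.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into (possibly overlapping) frames; trailing samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def lp_design(frame, order):
    """Covariance-method LP data: predict frame[n] from frame[n-1], ..., frame[n-order]."""
    X = np.stack([frame[order - i:len(frame) - i] for i in range(1, order + 1)], axis=1)
    y = frame[order:]
    return X, y


def time_regularized_lp(frames, order, lam):
    """Jointly estimate predictor coefficients c_k for all frames (ASSUMED cost):

        sum_k ||y_k - X_k c_k||^2 + lam * sum_k ||c_k - c_{k-1}||^2

    Since A_k(z) = 1 - sum_i c_k[i] z^-i, Parseval gives ||c_k - c_{k-1}||^2 =
    energy of A_k(e^jw) - A_{k-1}(e^jw). Returns an array of shape (n_frames, order).
    """
    K = len(frames)
    M = np.zeros((K * order, K * order))  # block-tridiagonal normal-equation matrix
    b = np.zeros(K * order)
    eye = np.eye(order)
    for k, frame in enumerate(frames):
        X, y = lp_design(frame, order)
        s = slice(k * order, (k + 1) * order)
        M[s, s] += X.T @ X  # per-frame least-squares term
        b[s] = X.T @ y
        if k > 0:  # penalty couples frame k to frame k-1
            p = slice((k - 1) * order, k * order)
            M[s, s] += lam * eye
            M[p, p] += lam * eye
            M[s, p] -= lam * eye
            M[p, s] -= lam * eye
    return np.linalg.solve(M, b).reshape(K, order)


def allpole_power_spectrum(c, n_fft=512):
    """Unit-gain all-pole power spectrum 1/|A(e^jw)|^2 on n_fft//2 + 1 bins."""
    a = np.concatenate(([1.0], -np.asarray(c)))
    return 1.0 / np.abs(np.fft.rfft(a, n_fft)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, order = 16000, 12
    # Synthetic stationary AR(2) "resonance" driven by white noise, then additive noise.
    e = rng.standard_normal(fs)
    x = np.zeros_like(e)
    for n in range(2, len(e)):
        x[n] = 1.6 * x[n - 1] - 0.9 * x[n - 2] + e[n]
    noisy = x + 0.5 * x.std() * rng.standard_normal(len(x))
    frames = frame_signal(noisy, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop at 16 kHz
    for lam in (0.0, 1e3):
        C = time_regularized_lp(frames, order, lam)
        jitter = np.sum(np.diff(C, axis=0) ** 2)
        print(f"lam={lam:g}: frame-to-frame coefficient change {jitter:.3f}")
```

The MFCC stage used in the evaluation is not shown; a typical pipeline would apply a mel filterbank, a logarithm, and a DCT to each frame's all-pole power spectrum. A dense solve keeps the sketch short; a banded or block-tridiagonal solver would scale linearly with the number of frames.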
