Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification

Access rights

openAccess

A1 Original article in a scientific journal

Date

2022-01

Language

en

Series

Applied Acoustics, Volume 188

Abstract

Speakers exhibit dialectal traits in speech at the sub-segmental, segmental, and supra-segmental levels. Any feature representation for dialect classification should appropriately represent these traits. Traditional segmental features such as mel-frequency cepstral coefficients (MFCCs) fail to represent sub-segmental and supra-segmental dialectal traits. This study proposes frequency domain linear prediction cepstral coefficients (FDLPCCs) for dialect classification, motivated by the long temporal summarization that FDLP performs during pole estimation. The i-vectors and x-vectors derived from both the baseline features (MFCCs, linear prediction cepstral coefficients (LPCCs), perceptual LPCCs (PLPCCs), and RASTA-filtered PLPCCs (PLPCC-Rs)) and the proposed FDLPCC features are used to identify dialects, with a support vector machine (SVM) and a feed-forward neural network (FFNN) as classifiers. The proposed FDLPCC features outperform the baseline MFCCs and PLPCC-Rs (the best of the LPCC variants) by absolute improvements in unweighted average recall (UAR) of 3.4% and 3.9%, respectively, with the i-vector + SVM system, and of 1.6% and 4.6%, respectively, with the i-vector + FFNN system. The proposed and baseline features are also found to carry complementary information. Furthermore, the results of the current study are compared with those of previous studies, and the current systems are found to perform better.
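The abstract names two technical ingredients without spelling them out: FDLP, whose poles are estimated over a whole long segment rather than a short frame, and the UAR metric used to report results. The Python sketch here illustrates both under stated assumptions; it is not the authors' FDLPCC pipeline. It applies autocorrelation linear prediction to the DCT of one full-band segment (the generic FDLP formulation, with no sub-band decomposition, windowing, or i-vector/x-vector stages), evaluates the resulting all-pole model as an approximation of the squared Hilbert envelope, converts the predictor coefficients to cepstra with the standard LPC-to-cepstrum recursion, and computes UAR as the unweighted mean of per-class recalls. The function names, the model order of 20, and the synthetic test signal are hypothetical choices for illustration.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def lp_autocorrelation(seq, order):
    """Autocorrelation-method LP: returns (alpha, gain2) for the model
    seq[k] ~ sum_i alpha[i] * seq[k - 1 - i]."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(seq[: n - lag], seq[lag:]) for lag in range(order + 1)])
    # Solve the Toeplitz normal equations R alpha = r[1:] (Levinson recursion in SciPy)
    alpha = solve_toeplitz(r[:order], r[1 : order + 1])
    gain2 = r[0] - np.dot(alpha, r[1 : order + 1])  # prediction-error power
    return alpha, gain2


def fdlp_envelope(x, order=20):
    """Generic FDLP sketch (hypothetical helper): LP on the DCT of a long segment
    yields an all-pole model of its squared Hilbert (temporal) envelope, so the
    poles summarize the whole segment at once."""
    c = dct(np.asarray(x, dtype=float), type=2, norm="ortho")
    alpha, gain2 = lp_autocorrelation(c, order)
    a = np.concatenate(([1.0], -alpha))  # A(z) = 1 - sum alpha_i z^-i
    # len(x) points on [0, pi): frequency point n maps back to time sample n
    _, h = freqz([np.sqrt(gain2)], a, worN=len(x))
    return np.abs(h) ** 2, alpha


def lp_to_cepstrum(alpha, n_ceps):
    """Cepstrum of the all-pole model 1/A(z), A(z) = 1 - sum alpha_k z^-k:
    c_n = alpha_n + sum_{k=1}^{n-1} (k/n) c_k alpha_{n-k}, with alpha_n = 0 for n > p.
    The gain term c_0 is omitted."""
    p = len(alpha)
    ceps = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * ceps[k] * alpha[n - k - 1]
        ceps[n] = acc
    return ceps[1:]


def unweighted_average_recall(y_true, y_pred):
    """UAR: recall computed per class, then averaged with equal class weights."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))


if __name__ == "__main__":
    n = np.arange(1000)
    # Hypothetical test signal: a Gaussian-windowed tone centred on sample 600
    x = np.exp(-(((n - 600) / 20.0) ** 2)) * np.cos(0.3 * np.pi * n)
    env, alpha = fdlp_envelope(x, order=20)
    print("envelope peak at sample", int(np.argmax(env)))
    print("first cepstra:", lp_to_cepstrum(alpha, 5))
    print("UAR:", unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1]))
```

For the synthetic burst, the envelope peak should fall close to sample 600, because a single set of poles, estimated over the entire segment, captures where the energy lies in time. The UAR example scores 5/6 (about 0.833), whereas plain accuracy would be 0.75; averaging recall per class keeps a majority dialect from dominating the score.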

Description

Funding Information: The first author would like to thank the University Grants Commission India (Project No. 3582/(NET-NOV2017)) for supporting her PhD. The second author would like to thank the Academy of Finland (Projects 313390 and 330139) for supporting his stay in Finland as a Research Fellow.

Funding Information: Rashmi Kethireddy received a Bachelor of Technology degree from Kakatiya Institute of Technology and Science, Warangal, India, in 2011, with a specialization in Information Technology (IT). She then worked in IT services for two years. She subsequently received a Master of Technology degree from Osmania University, Hyderabad, India, in 2017, with a specialization in Computer Science Engineering. She qualified in the University Grants Commission National Eligibility Test (UGC-NET) and was hence awarded a Junior Research Fellowship (JRF) and a Senior Research Fellowship (SRF). She is currently a Ph.D. scholar at the International Institute of Information Technology, Hyderabad (IIIT-H). Her research interests include speech signal processing, acoustic analysis, machine learning, speech dialectal challenges, and speech dialect identification.

Publisher Copyright: © 2021 The Author(s)

Keywords

Dialect classification, Frequency domain linear prediction, i-vectors, Long temporal variations, x-vectors

Citation

Kethireddy, R, Kadiri, S R & Gangashetty, S V 2022, 'Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification', Applied Acoustics, vol. 188, 108553. https://doi.org/10.1016/j.apacoust.2021.108553