Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification

A1 Original article in a scientific journal
Date
2022-01
Language
en
Series
Applied Acoustics, Volume 188
Abstract
Speakers exhibit dialectal traits in speech at sub-segmental, segmental, and supra-segmental levels. Any feature representation for dialect classification should appropriately represent these dialectal traits. Traditional segmental features such as mel-frequency cepstral coefficients (MFCCs) fail to represent sub-segmental and supra-segmental dialectal traits. This study proposes using frequency domain linear prediction cepstral coefficients (FDLPCCs) for dialect classification, motivated by their long temporal summarization during pole estimation. The i-vectors and x-vectors derived from both the baseline features (MFCCs, linear prediction cepstral coefficients (LPCCs), perceptual LPCCs (PLPCCs), and RASTA-filtered PLPCCs (PLPCC-Rs)) and the proposed FDLPCC features are used to identify dialects, with a support vector machine (SVM) and a feed-forward neural network (FFNN) as classifiers. The proposed FDLPCC features are shown to outperform the baseline MFCCs and PLPCC-Rs (the best among the LPCC variants), with absolute improvements in unweighted average recall (UAR) of 3.4% and 3.9%, respectively, with the i-vector + SVM system, and of 1.6% and 4.6%, respectively, with the i-vector + FFNN system. The proposed and baseline features are also found to carry complementary information. Furthermore, the results of this study are compared with those of previous studies and are found to be better.
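The abstract reports results in unweighted average recall (UAR), which is the mean of the per-class recalls, so each dialect counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric in Python; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class gets equal weight."""
    correct = defaultdict(int)  # correctly classified items per true class
    total = defaultdict(int)    # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy example with hypothetical dialect labels "a" and "b".
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
uar = unweighted_average_recall(y_true, y_pred)
```

In this toy case the recall is 3/4 for class "a" and 1/2 for class "b", giving a UAR of 0.625, whereas plain accuracy (4/6) would be pulled toward the larger class.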
Description
Funding Information: The first author would like to thank the University Grants Commission India (Project No. 3582/(NET-NOV2017)) for supporting her PhD. The second author would like to thank the Academy of Finland (Projects 313390 and 330139) for supporting his stay in Finland as a Research Fellow. Rashmi Kethireddy received a Bachelor of Technology degree from Kakatiya Institute of Technology and Science, Warangal, India, in 2011, with a specialization in Information Technology (IT). She then worked in IT services for two years. She subsequently received a Master of Technology degree from Osmania University, Hyderabad, India, in 2017, with a specialization in Computer Science Engineering. She qualified in the University Grants Commission National Eligibility Test (UGC-NET) and was awarded a Junior Research Fellowship (JRF) and a Senior Research Fellowship (SRF). She is currently a Ph.D. scholar at the International Institute of Information Technology, Hyderabad (IIIT-H). Her research interests include speech signal processing, acoustic analysis, machine learning, speech dialectal challenges, and speech dialect identification. Publisher Copyright: © 2021 The Author(s)
Keywords
Dialect classification, Frequency domain linear prediction, i-vectors, Long temporal variations, x-vectors
Citation
Kethireddy, R, Kadiri, S R & Gangashetty, S V 2022, 'Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification', Applied Acoustics, vol. 188, 108553. https://doi.org/10.1016/j.apacoust.2021.108553