Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kethireddy, Rashmi [en_US]
dc.contributor.author: Kadiri, Sudarsana Reddy [en_US]
dc.contributor.author: Gangashetty, Suryakanth V. [en_US]
dc.contributor.department: Dept Signal Process and Acoust [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.contributor.organization: International Institute of Information Technology Hyderabad [en_US]
dc.contributor.organization: Koneru Lakshmaiah Education Foundation [en_US]
dc.date.accessioned: 2021-12-31T13:56:59Z
dc.date.available: 2021-12-31T13:56:59Z
dc.date.issued: 2022-01 [en_US]
dc.description: Funding Information: The first author would like to thank the University Grants Commission India (Project No. 3582/(NET-NOV2017)) for supporting her PhD. The second author would like to thank the Academy of Finland (Projects 313390 and 330139) for supporting his stay in Finland as a Research Fellow. Rashmi Kethireddy received a Bachelor of Technology degree from Kakatiya Institute of Technology and Science, Warangal, India, in 2011, with a specialization in Information Technology (IT). She then worked in IT services for two years. She subsequently received a Master of Technology degree from Osmania University, Hyderabad, India, in 2017, with a specialization in Computer Science Engineering. She qualified in the University Grants Commission National Eligibility Test (UGC-NET) and was awarded a Junior Research Fellowship (JRF) and a Senior Research Fellowship (SRF). She is currently a Ph.D. scholar at the International Institute of Information Technology, Hyderabad (IIIT-H). Her research interests include speech signal processing, acoustic analysis, machine learning, speech dialectal challenges, and speech dialect identification. Publisher Copyright: © 2021 The Author(s)
dc.description.abstract: Speakers exhibit dialectal traits in speech at sub-segmental, segmental, and supra-segmental levels. Any feature representation for dialect classification should appropriately represent these dialectal traits. Traditional segmental features such as mel-frequency cepstral coefficients (MFCCs) fail to represent sub-segmental and supra-segmental dialectal traits. This study proposes using frequency domain linear prediction cepstral coefficients (FDLPCCs) for dialect classification, motivated by their long temporal summarization during pole estimation. The i-vectors and x-vectors derived from both baseline features (MFCCs, linear prediction cepstral coefficients (LPCCs), perceptual LPCCs (PLPCCs), and RASTA-filtered PLPCCs (PLPCC-R)) and the proposed FDLPCC features are used for identifying dialects, with a support vector machine (SVM) and a feed-forward neural network (FFNN) as classifiers. The proposed FDLPCC features are shown to perform better than the baseline MFCC and PLPCC-R features (PLPCC-R being the best among the LPCC variants), with absolute improvements in unweighted average recall (UAR) of 3.4% and 3.9%, respectively, for the i-vector + SVM system, and of 1.6% and 4.6%, respectively, for the i-vector + FFNN system. It is also found that complementary information exists between the proposed and baseline features. Furthermore, the results of the current study are compared with those of previous studies and are found to be better. [en]
dc.description.version: Peer reviewed [en]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Kethireddy, R, Kadiri, S R & Gangashetty, S V 2022, 'Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification', Applied Acoustics, vol. 188, 108553. https://doi.org/10.1016/j.apacoust.2021.108553 [en]
dc.identifier.doi: 10.1016/j.apacoust.2021.108553 [en_US]
dc.identifier.issn: 0003-682X
dc.identifier.other: PURE UUID: 51b053d2-630d-42a9-8445-2bbb97b44a37 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/51b053d2-630d-42a9-8445-2bbb97b44a37 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85121220456&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/77296551/Kethireddy_Exploration_of_temporal_dynamics.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/111979
dc.identifier.urn: URN:NBN:fi:aalto-2021123111119
dc.language.iso: en [en]
dc.publisher: Elsevier Limited
dc.relation.ispartofseries: Applied Acoustics [en]
dc.relation.ispartofseries: Volume 188 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Dialect classification [en_US]
dc.subject.keyword: Frequency domain linear prediction [en_US]
dc.subject.keyword: i-vectors [en_US]
dc.subject.keyword: Long temporal variations [en_US]
dc.subject.keyword: x-vectors [en_US]
dc.title: Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification [en]
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 Original article in a scientific journal) [fi]
dc.type.version: publishedVersion