Browsing by Author "Tirronen, Saska"
Item: Detection and Multi-class Classification of Voice Disorders from Speech Recordings (2022-01-24)
Tirronen, Saska; Kadiri, Sudarsana; Perustieteiden korkeakoulu; Alku, Paavo

Automatic detection of voice disorders from speech signals has the potential to improve the reliability of medical diagnosis. Most earlier studies have focused on the binary detection of disorders without discriminating between different disorder types. In this thesis, a systematic examination of different speaking tasks, audio features, and classifiers was conducted in the contexts of binary detection and multi-class classification. The goal was to find the system that achieves the best classification performance and to study the complementary information between different speaking tasks and features. The examined speaking tasks were the sustained pronunciation of a vowel and the pronunciation of a sentence. The examined features included a set of cepstral coefficients and perturbation measures. Several commonly used classifiers were included. The primary multi-class classifier in this thesis was a hierarchical classifier, which has rarely been studied in this domain. The hierarchy is a sequence of increasingly detailed classifications based on a practical scenario: first, classification was performed between healthy and disordered speech, followed by classification between hyperfunctional dysphonia and vocal fold paresis. The results indicate that the proposed hierarchical system performs comparably to or better than traditionally used multi-class systems, achieving multi-class classification accuracies of 59.00% and 62.31% for female and male speakers, respectively. The best accuracies in the first step of the hierarchy were 78.58% and 79.87% for female and male speakers, respectively. In the classification between the disorder types, the best accuracies were 66.20% and 73.11% for female and male speakers, respectively. In addition, this thesis reports several findings regarding the performance of different speaking tasks, features, and classifiers.
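As an illustration of the two-step hierarchy described in the entry above, the sketch below trains one SVM to separate healthy from disordered speech, and a second SVM, fitted only on disordered samples, to separate hyperfunctional dysphonia from vocal fold paresis. This is a minimal sketch, not the thesis code; the label names, the pre-computed feature matrix X, and the choice of scikit-learn RBF-kernel SVMs are assumptions.

```python
# Minimal sketch of the two-step hierarchical classifier described above.
# Assumes pre-computed feature vectors X and string labels y in
# {"healthy", "dysphonia", "paresis"}; all names here are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_hierarchy(X, y):
    # Step 1: healthy vs. disordered (binary detection).
    y_binary = np.where(y == "healthy", "healthy", "disordered")
    detector = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    detector.fit(X, y_binary)

    # Step 2: disorder type, trained only on the disordered samples.
    mask = y != "healthy"
    typer = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    typer.fit(X[mask], y[mask])
    return detector, typer

def predict_hierarchy(detector, typer, X):
    # Route each sample through the hierarchy: samples detected as
    # disordered receive a second, finer-grained classification.
    pred = detector.predict(X).astype(object)
    disordered = pred == "disordered"
    if disordered.any():
        pred[disordered] = typer.predict(X[disordered])
    return pred
```

A practical consequence of this structure, noted in the abstracts, is that each step can use its own features and classifier, optimized for that step's task alone.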
Item: Hierarchical Multi-class Classification of Voice Disorders Using Self-supervised Models and Glottal Features (IEEE, 2023-02-06)
Tirronen, Saska; Kadiri, Sudarsana; Alku, Paavo; Department of Information and Communications Engineering; Speech Communication Technology

Previous studies on the automatic classification of voice disorders have mostly investigated the binary classification task, which aims to distinguish pathological voice from healthy voice. Using multi-class classifiers, however, more fine-grained identification of voice disorders can be achieved, which is more helpful for clinical practitioners. Unfortunately, there is little publicly available training data for many voice disorders, which lowers the classification performance on data from unseen speakers. Earlier studies have shown that the use of glottal source features can reduce data redundancy in the detection of laryngeal voice disorders. Another approach to tackle the problems caused by the scarcity of training data is to utilize deep learning models, such as wav2vec 2.0 and HuBERT, that have been pre-trained on larger databases. Since the aforementioned approaches have not been thoroughly studied in the multi-class classification of voice disorders, they are studied jointly in the present work. In addition, we study a hierarchical classifier, which enables task-wise feature optimization and more efficient utilization of data.

In this work, the aforementioned three approaches are compared with traditional mel-frequency cepstral coefficient (MFCC) features and one-vs-rest and one-vs-one SVM classifiers. The results in a 3-class classification problem between healthy voice and two laryngeal disorders (hyperfunctional dysphonia and vocal fold paresis) indicate that all the studied methods outperform the baselines. The best performance was achieved by using features from wav2vec 2.0 LARGE together with hierarchical classification. The balanced classification accuracy of the system was 62.77% for male speakers and 55.36% for female speakers, outperforming the baseline systems by absolute improvements of 15.76% and 6.95% for male and female speakers, respectively.

Item: Kognitiivisen robotiikan ja keinoälyn mahdollisuudet työelämässä [The opportunities of cognitive robotics and artificial intelligence in working life] (2017-09-15)
Tirronen, Saska; Vartiainen, Matti; Perustieteiden korkeakoulu; Mäki, Eerikki

Item: Utilizing WAV2VEC in database-independent voice disorder detection (2023)
Tirronen, Saska; Javanmardi, Farhad; Kodali, Manila; Kadiri, Sudarsana; Alku, Paavo; Speech Communication Technology; Department of Information and Communications Engineering

Automatic detection of voice disorders from acoustic speech signals can help to improve the reliability of medical diagnosis. However, the real-life environment in which speech signals are recorded for diagnosis can differ from the environment in which the detection system's training data was originally collected. This mismatch between recording conditions can decrease detection performance in practical scenarios. In this work, we propose to use a pre-trained wav2vec 2.0 model as a feature extractor to build automatic detection systems for voice disorders. The embeddings from the first layers of the context network contain information about phones, and these features are useful in voice disorder detection. We evaluate the performance of the wav2vec features in single-database and cross-database scenarios to study their generalizability to unseen speakers and recording conditions. The results indicate that the wav2vec features generalize better than popular spectral and cepstral baseline features.
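Both wav2vec-based entries above describe extracting utterance-level features from a pre-trained wav2vec 2.0 model, with useful information sitting in the early layers of the context network. Below is a minimal sketch of that idea, assuming the Hugging Face transformers implementation, the public "facebook/wav2vec2-base" checkpoint (the IEEE paper used LARGE), mean-pooling over time, and an early layer index; none of these choices are confirmed as the authors' exact pipeline.

```python
# Illustrative sketch (not the authors' exact pipeline): utterance-level
# features from an early layer of wav2vec 2.0's context network.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_NAME = "facebook/wav2vec2-base"  # assumed checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME).eval()

def wav2vec_embedding(waveform, sample_rate=16_000, layer=1):
    # waveform: 1-D float array sampled at 16 kHz, as wav2vec 2.0 expects.
    # layer indexes hidden_states: 0 is the input to the transformer,
    # higher indices are transformer layer outputs. The choice of an
    # early layer follows the abstracts; the exact index is an assumption.
    inputs = feature_extractor(waveform, sampling_rate=sample_rate,
                               return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Mean-pool over the time axis to get one fixed-size vector per utterance.
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0).numpy()
```

The resulting vectors could then feed a downstream classifier, for example the SVM hierarchy sketched earlier in this listing.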
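For comparison, the baselines named in the IEEE paper (MFCC features with one-vs-rest and one-vs-one SVM classifiers) could be set up along the lines below. The librosa-based MFCC summary, the per-utterance mean and standard deviation pooling, and the untuned hyperparameters are assumptions for illustration, not the papers' exact configuration.

```python
# Hypothetical baseline setup: MFCC summary features with one-vs-rest
# and one-vs-one SVM decompositions of the 3-class task.
import librosa
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

def mfcc_summary(waveform, sample_rate=16_000, n_mfcc=13):
    # Per-utterance summary: mean and standard deviation of each MFCC
    # coefficient over time, a common fixed-size representation.
    mfcc = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Two standard multi-class decompositions of the same binary SVM.
ovr_classifier = OneVsRestClassifier(SVC(kernel="rbf"))
ovo_classifier = OneVsOneClassifier(SVC(kernel="rbf"))
```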