A machine learning-based singing timbre evaluation system using a small amount of data

School of Electrical Engineering | Master's thesis

Date

2023-06-12

Major/Subject

Speech and Language Technology

Mcode

ELEC3068

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

en

Pages

43+7

Abstract

In singing education, a person's timbre is crucial for providing feedback and improving vocal performance. Traditional methods of evaluating singing timbre require face-to-face assessment by a professional, which is time-consuming. In addition, the evaluation of singing timbre involves multiple vague, perception-based descriptions, which makes the feedback hard to understand and act on. This master's thesis therefore aimed to develop a linear regression-based machine learning system that estimates a person's singing timbre from a small amount of data, as a tool for both students and educators. It also proposed five perceptual dimensions (twang-loft, front-back, clearness, cleanliness, and nasal voice), intended to cover all aspects of singing timbre, as a possible future standard. The system was trained and tested on 165 audio files recorded from male and female volunteers. The singers comprised 18 sopranos, 91 altos, 26 tenors, and 30 basses. Each group sang the same traditional Chinese folk song, Songbie, in a key suited to that group. The average accuracy across all dimensions for the four volunteer groups (soprano, alto, tenor, and bass) was 63.7%, 71.5%, 74.5%, and 82.8%, respectively. In general, this indicates that male data could be estimated more accurately than female data. Among the dimensions, cleanliness had the best result (82.4%). These results take into account potential influencing factors such as clip length, the use of a filter, and the choice of audio features. Overall, the preliminary results show good performance in estimating timbre from singing voices using a linear regression machine learning system with a small amount of data.
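
As a rough illustration of the approach the abstract describes, the following sketch fits a linear regression that maps per-clip audio features to scores on the five perceptual dimensions. It is not the thesis implementation: the feature count, the synthetic feature and rating values, the train/test split, and the function names `fit_linear_regression` and `predict` are all assumptions made for this example. Only the number of clips (165) and the dimension names come from the abstract.

```python
import numpy as np

# Dimension names as listed in the abstract.
DIMENSIONS = ["twang-loft", "front-back", "clearness", "cleanliness", "nasal voice"]


def add_bias(features):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_regression(features, ratings):
    """Least-squares fit; returns weights of shape (n_features + 1, n_dimensions)."""
    weights, *_ = np.linalg.lstsq(add_bias(features), ratings, rcond=None)
    return weights


def predict(weights, features):
    """Predict one score per perceptual dimension for each clip."""
    return add_bias(features) @ weights


# Synthetic stand-in data (assumption): real features would be extracted
# from the recordings, and real ratings would come from human evaluators.
rng = np.random.default_rng(0)
n_clips, n_features = 165, 13  # 13 features per clip is a hypothetical choice
features = rng.normal(size=(n_clips, n_features))
hidden_weights = rng.normal(size=(n_features, len(DIMENSIONS)))
ratings = features @ hidden_weights + 0.1 * rng.normal(size=(n_clips, len(DIMENSIONS)))

# Hypothetical split: first 130 clips for training, remaining 35 for testing.
train, test = slice(0, 130), slice(130, None)
weights = fit_linear_regression(features[train], ratings[train])
predicted = predict(weights, features[test])
mean_abs_error = float(np.mean(np.abs(predicted - ratings[test])))
```

Because the model is linear and has few parameters per dimension, it can be fitted on a data set of this size. How the thesis computed its accuracy figures is not stated in the abstract, so the error measure here is only a placeholder.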

Supervisor

Alku, Paavo

Thesis advisor

Alku, Paavo

Keywords

singing, timbre, voice, evaluation, machine learning
