Incorporating Global Context in Automatic Chord Transcription: A Transformer-Based Decoding Approach
School of Electrical Engineering | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
105
Abstract
Automatic chord transcription (ACT) from audio is a challenging task in the field of music information retrieval, with practical applications ranging from music analysis to song transcription. While early research focused on signal processing and handcrafted features for algorithmic solutions, later work has leveraged machine learning techniques, particularly deep learning models, to improve ACT performance. However, the great majority of existing research has focused on local observations, generally neglecting the broader harmonic context that gives meaning to chord progressions. This thesis investigates the use of attention-based models, particularly transformer encoders, to enhance ACT by modeling and integrating global harmonic information in the prediction process. Rather than training an end-to-end audio-to-transcription model, this study proposes a framework that employs a shallow transformer encoder to refine the predictions of an existing acoustic model that operates on limited temporal context. The computational cost of self-attention is reduced by aggregating frame-level predictions into a beat-synchronous representation, which also enables the model to capture metric aspects of harmony. Additionally, the proposed pipeline includes a key detection model to inform the choice of enharmonic spelling of pitches, addressing a commonly overlooked issue in ACT. The approach is evaluated on a dataset of popular songs compiled ad hoc for this study from a commercial library of guitar chord transcriptions. Results indicate that the proposed method improves the accuracy of chord transcription, offering a lightweight alternative to existing end-to-end transformer models. This work contributes to the advancement of ACT by emphasizing global harmonic structure while limiting the computational requirements for real-world usability.
Supervisor
Zhou, Quan
Thesis advisor
Fontanelli, Daniele
Klapuri, Anssi