Incorporating Global Context in Automatic Chord Transcription: A Transformer-Based Decoding Approach

School of Electrical Engineering | Master's thesis

Language

en

Pages

105

Abstract

Automatic chord transcription (ACT) from audio is a challenging task in music information retrieval, with practical applications ranging from music analysis to song transcription. Early research relied on signal processing and handcrafted features, whereas later work has leveraged machine learning, particularly deep learning models, to improve ACT performance. However, the great majority of existing research has focused on local observations, generally neglecting the broader harmonic context that gives meaning to chord progressions. This thesis investigates the use of attention-based models, particularly transformer encoders, to enhance ACT by modeling and integrating global harmonic information in the prediction process. Rather than training an end-to-end audio-to-transcription model, it proposes a framework in which a shallow transformer encoder refines the predictions of an existing acoustic model that operates on limited temporal context. The computational cost of self-attention is mitigated by aggregating frame-level predictions into a beat-synchronous representation, which also enables the model to capture metric aspects of harmony. Additionally, the proposed pipeline includes a key detection model that informs the enharmonic spelling of pitches, addressing a commonly overlooked issue in ACT. The approach is evaluated on a dataset of popular songs compiled ad hoc for this study from a commercial library of guitar chord transcriptions. Results indicate that the proposed method improves chord transcription accuracy, offering a lightweight alternative to existing end-to-end transformer models. This work contributes to the advancement of ACT by emphasizing global harmonic structure while limiting the computational requirements for real-world usability.
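
The beat-level aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `beat_sync_average`, the use of a plain mean as the pooling operation, and the fallback for beats that contain no frame are all assumptions made here for illustration. The sketch averages frame-level chord probabilities within each beat interval, which shortens the sequence that a later self-attention layer would have to process.

```python
from bisect import bisect_left


def beat_sync_average(frame_probs, hop_seconds, beat_times):
    """Average frame-level chord probabilities within each beat interval.

    frame_probs: list of per-frame probability vectors (lists of floats).
    hop_seconds: spacing between frames; frame i is placed at i * hop_seconds.
    beat_times: sorted beat onsets in seconds; beat k spans
                [beat_times[k], beat_times[k + 1]).
    Hypothetical helper: mean pooling is an assumption, not the thesis method.
    """
    frame_times = [i * hop_seconds for i in range(len(frame_probs))]
    n_classes = len(frame_probs[0])
    pooled = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        lo = bisect_left(frame_times, start)  # first frame at or after start
        hi = bisect_left(frame_times, end)    # first frame at or after end
        if lo == hi:
            # No frame falls inside this beat: reuse the next available
            # frame, or the last frame if the beat lies past the audio.
            segment = [frame_probs[min(lo, len(frame_probs) - 1)]]
        else:
            segment = frame_probs[lo:hi]
        pooled.append(
            [sum(f[c] for f in segment) / len(segment) for c in range(n_classes)]
        )
    return pooled


# Four frames at a 0.5 s hop, pooled into two one-second beats.
frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
beats = beat_sync_average(frames, 0.5, [0.0, 1.0, 2.0])
```

Because the cost of self-attention grows quadratically with sequence length, replacing frames with beats in this way reduces the work of the attention layers considerably, and each position in the pooled sequence then corresponds to one metric unit.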

Supervisor

Zhou, Quan

Thesis advisor

Fontanelli, Daniele
Klapuri, Anssi
