Hit or not? Predicting and explaining hit potential of songs in the Finnish music market using Spotify data

School of Business | Master's thesis

Language

en

Pages

61

Abstract

This thesis addresses three research questions. First, we study how machine learning can be used to predict a song's hit potential in the Finnish music market. Second, we ask what the best way is to explain a prediction made by a black-box machine learning model. Third, we examine whether songs with high hit scores are heterogeneous. We defined hits as songs that appeared in the yearly Finnish Top75 ranking between 1990 and 2019. Non-hits were sampled from the Spotify API. The features used in our study were pre-aggregated Echo Nest features, which are based on signal processing. Echo Nest features can be fetched for any song available on Spotify. Based on the literature review, we chose SHAP to explain hit potential. In addition, building on the conditional permutation feature importance method, we developed a Repeated Conditional Feature Importance Ranking approach for feature selection, which we used when developing our predictive model. The best-performing model for hit prediction was a CatBoost model. On an unbalanced test set, it achieved an ROC AUC of 0.93, accuracy of 84%, recall of 84% and precision of 3%. Using hierarchical clustering, we found heterogeneity among the songs with high hit scores.
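
A precision of 3% alongside a recall of 84% can look contradictory, so the following minimal sketch shows how the two can coexist when hits are rare. It is an illustration only: the abstract does not state the share of hits in the test set, so the `prevalence` values are hypothetical, and the specificity of 0.84 is an assumption (when non-hits dominate, overall accuracy is close to specificity). The function name `precision_from_rates` is introduced here for illustration and is not from the thesis.

```python
def precision_from_rates(recall: float, specificity: float, prevalence: float) -> float:
    """Precision implied by recall, specificity and the share of positives (hits)."""
    true_pos = recall * prevalence                          # hits correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)    # non-hits wrongly flagged
    return true_pos / (true_pos + false_pos)


# Recall is taken from the abstract; specificity and prevalence are assumptions.
for prevalence in (0.5, 0.05, 0.006):
    p = precision_from_rates(recall=0.84, specificity=0.84, prevalence=prevalence)
    print(f"hit share {prevalence:.1%}: precision {p:.1%}")
```

Under these assumptions, precision falls from 84% on a balanced set to roughly 3% when hits make up about 0.6% of the test set, because false positives among the many non-hits outnumber the true hits.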

Thesis advisor

Malo, Pekka
