A multilabel classification approach to predicting lines of business in B2B insurance

School of Business | Master's thesis

Language

en

Pages

68

Abstract

This thesis examines how machine learning can predict B2B customer purchase behavior in the insurance sector. The study uses firmographic data, digital acquisition data, and website behavior data to characterize customers and their interactions along the digital journey. The main objective is to predict which Lines of Business a customer will purchase after submitting a digital quote request. This is treated as a multilabel classification problem because a customer may be interested in buying more than one product. Three models are evaluated: Logistic Regression, Random Forest, and XGBoost. Model performance was assessed using macro-averaged metrics and statistical tests on cross-validated results. Of the three, XGBoost performs best, achieving the highest F1 score and the lowest Hamming Loss and handling label imbalance best. To understand how the model makes predictions, SHAP analysis was used; it shows that website behavior and firmographic variables are the most influential features. As a practical application, the final model is intended for use at the case company to support digitally assisted sales from quote form requests and to improve understanding of B2B customers.
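
The two headline metrics named in the abstract, macro-averaged F1 and Hamming Loss, can be illustrated with a minimal sketch in plain Python. The label matrices below are invented toy data, not results from the thesis, and the three columns stand for hypothetical Lines of Business; the function names are likewise illustrative assumptions.

```python
from typing import List

Labels = List[List[int]]  # one row per customer, one 0/1 column per Line of Business


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data (hypothetical): 4 customers, 3 Lines of Business.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(f"Hamming Loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Macro F1:     {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging gives every label equal weight regardless of how often it occurs, which is one reason it is commonly preferred when labels are imbalanced.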

Supervisor

Malo, Pekka

Thesis advisor

Lindén, Jani
