A multilabel classification approach to predicting lines of business in B2B insurance
School of Business
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Language: English
Pages: 68
Abstract
This thesis examines how machine learning can predict the purchase behavior of B2B customers in the insurance sector. The study uses firmographic data, digital acquisition data, and website behavior data to capture customers' demographic details and their interaction behavior in digital journeys. The main objective is to predict which Lines of Business a customer will purchase after submitting a digital quote request. This is treated as a multilabel classification problem, because a customer may be interested in buying more than one product. The three models evaluated are Logistic Regression, Random Forest, and XGBoost. Model performance was assessed with statistical tests on cross-validation results and with macro-averaged metrics. Of the three, XGBoost performs best: it achieves the highest F1 score and the lowest Hamming loss, and it handles label imbalance best. To understand how the model makes its predictions, SHAP analysis was used; it shows that website behavior and firmographic variables are the most influential features. As a practical application, the final model is intended for use at the case company to support digitally assisted sales from quote form requests and to improve its understanding of B2B customers.
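The abstract evaluates a multilabel target with Hamming loss and macro-averaged F1. The following is a minimal, dependency-free sketch of how those two metrics are computed from binary indicator matrices, with one column per Line of Business. It is not the thesis's own code. The three label names, the 0.5 decision threshold, and the toy probabilities are hypothetical and are used only for illustration.

```python
from typing import List

# Binary indicator matrix: one row per quote request, one column per
# Line of Business (LoB); 1 = purchased, 0 = not purchased.
Matrix = List[List[int]]


def predict_labels(probs: List[List[float]], threshold: float = 0.5) -> Matrix:
    """Turn per-label probabilities into a multilabel prediction.

    Each LoB is decided independently (hypothetical 0.5 threshold), so a
    row can end up with zero, one, or several positive labels.
    """
    return [[int(p >= threshold) for p in row] for row in probs]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Share of (request, LoB) cells that are predicted incorrectly."""
    cells = [(t, p) for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]
    return sum(t != p for t, p in cells) / len(cells)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-LoB F1, so rare LoBs weigh as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label never present and never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data with three made-up LoBs: property, liability, motor.
probs = [[0.8, 0.1, 0.4], [0.2, 0.7, 0.1], [0.9, 0.3, 0.2], [0.1, 0.2, 0.6]]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = predict_labels(probs)

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")  # Hamming loss: 0.167
print(f"Macro F1:     {macro_f1(y_true, y_pred):.3f}")      # Macro F1:     0.778
```

Hamming loss counts every individual label decision, so a model can score well on it while still missing rare products. Macro F1 averages per-label scores without weighting, so it exposes weak performance on infrequent Lines of Business. This is why the two metrics together say something about label imbalance. In practice, scikit-learn's `hamming_loss` and `f1_score(..., average="macro")` compute the same quantities on indicator arrays; the hand-written versions here only make the arithmetic visible.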