At a time of heavy investment in the customer experience of both products and services, companies that want to prosper in highly competitive environments can no longer afford to ignore the opportunity to tailor digital interactions to individual customers. Whether or not explicit customer data is available, detecting and accounting for the characteristics of end users is fundamental to achieving an acceptable level of personalization across touchpoints. In the information age, data-driven solutions play a crucial role in this race.
This research was carried out at Sanoma Media Finland Oy. The objective of the study is to explore a set of user profiling techniques, based on machine learning models, that learn to segment the user base according to a number of different criteria.
The implemented methods differ in architecture, data sources, and user representations. The user representations include purely interaction-based methods, such as Item2Vec, as well as combinations of semantic representations of article content computed through language models, such as FinBERT. All of the representations have been evaluated in both single-task and multi-task learning setups, and their performance is generally at least comparable to the company's existing baseline.
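As a minimal sketch of the kind of setup described above, and not the implementation used in the study, the following assumes that a user is represented by the mean of the embeddings of the articles they have read, and that a single shared hidden layer feeds one softmax head per segmentation criterion. The article embeddings are random placeholders standing in for Item2Vec or language-model vectors; the task names `criterion_a` and `criterion_b`, the layer sizes, and the reading histories are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder article embeddings (stand-ins for Item2Vec or language-model vectors).
n_articles, dim = 6, 4
article_vecs = rng.normal(size=(n_articles, dim))

def user_vector(read_article_ids, article_vecs):
    """Represent a user as the mean embedding of the articles they read."""
    return article_vecs[read_article_ids].mean(axis=0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multitask_forward(x, params):
    """Shared hidden layer followed by one softmax head per profiling task."""
    h = np.tanh(x @ params["W_shared"] + params["b_shared"])
    return {task: softmax(h @ W + b) for task, (W, b) in params["heads"].items()}

# Untrained, randomly initialised parameters; hypothetical tasks with 3 and 2 classes.
hidden = 8
params = {
    "W_shared": rng.normal(size=(dim, hidden)),
    "b_shared": np.zeros(hidden),
    "heads": {
        "criterion_a": (rng.normal(size=(hidden, 3)), np.zeros(3)),
        "criterion_b": (rng.normal(size=(hidden, 2)), np.zeros(2)),
    },
}

# Hypothetical reading histories: lists of article indices per user.
histories = [[0, 2, 5], [1, 3]]
X = np.stack([user_vector(h, article_vecs) for h in histories])
probs = multitask_forward(X, params)
```

In a multi-task setup of this shape, the shared layer is trained on the summed losses of all heads, so each segmentation criterion can benefit from signal learned for the others; a single-task variant would simply keep one head.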
Ultimately, the evidence shows that the combination of a multi-task learning architecture, informative user and article representations, and a fairly large amount of data is what enables some of the proposed methods to outperform the baseline while reducing the resources needed.