Combined approaches to fraud detection. A case study in Ecommerce Industry

School of Business | Master's thesis
Degree programme
Information and Service Management (ISM)
The electronic commerce industry emerged only in recent decades, yet it has prospered tremendously and reshaped the global economy with a range of new digital products and services that were once beyond imagination. As online shopping has progressed, fraudulent activities have evolved alongside it and caused heavy losses to both business profits and customers' funds. Fraud detection has therefore remained a prominent topic, attracting rapidly growing attention from practitioners and specialized researchers alike. Fighting fraud demands considerable time, effort and cost, because several major challenges hinder research in this area as well as the predictive performance of modeling techniques. These challenges include highly imbalanced datasets, concept drift in the behavior of both customers and fraudsters, the delayed availability of supervised information (labels), and other real-world problems. The state-of-the-art solutions to these challenges are then reviewed systematically at three levels: data-level approaches, algorithm-level techniques and assessment-level methods. Building on these solutions, the thesis conducts an empirical study on an e-commerce dataset to examine how effectively different combined techniques classify online fraud. To this end, a methodological framework is proposed for the modeling experiments, pairing four resampling methods, namely random oversampling (ROS), random undersampling (RUS), the synthetic minority oversampling technique (SMOTE) and SmoteTomek, with three learning algorithms: Logistic Regression, Random Forest and eXtreme Gradient Boosting (XGBoost). The research indicates that combining these techniques handles online fraud detection well and promises to relieve much of the pressure on online merchants. The findings also reaffirm the validity of resampling techniques and offer insight into which assessment metrics are appropriate in imbalanced scenarios.
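To make the point about assessment metrics under imbalance concrete, the following minimal Python sketch compares accuracy with precision, recall and F1 for the fraud class. It is illustrative only: the labels describe a hypothetical set of 1,000 orders with 10 frauds, not the thesis dataset or its results, and the function names are hypothetical.

```python
# Illustrative sketch: accuracy versus fraud-focused metrics on hypothetical labels.
# Label 1 = fraud (positive class), 0 = legitimate.


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the fraud class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # legitimate order wrongly flagged
        elif t == positive:
            fn += 1  # fraud that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical data: 990 legitimate orders followed by 10 frauds.
    y_true = [0] * 990 + [1] * 10
    never_flags = [0] * 1000
    # Flags 20 legitimate orders and 8 of the 10 frauds.
    flags_some = [0] * 970 + [1] * 28 + [0] * 2
    print(imbalance_metrics(y_true, never_flags))  # accuracy 0.99, recall 0.0
    print(imbalance_metrics(y_true, flags_some))   # accuracy 0.978, recall 0.8
```

A model that never flags fraud reaches 0.99 accuracy yet has zero recall, while a model that catches 8 of the 10 frauds scores lower accuracy (0.978) but reaches a recall of 0.8 and an F1 of about 0.42. This is why fraud-focused metrics such as recall, precision and F1 are often preferred over plain accuracy when the fraud class is rare.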
On the one hand, resampling methods have a clear impact on classification performance in highly imbalanced scenarios. On the other hand, although none of the modeling techniques clearly outperforms the others, each shows its own advantages and prospects for further research. Lastly, the empirical results reveal certain characteristics of online fraud behavior that may serve as a useful reference for future work in the online fraud detection domain.
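The four resampling methods named in the abstract can be sketched in dependency-free Python as below. This is an illustrative sketch under stated assumptions, not the thesis's implementation: samples are hypothetical (features, label) pairs with label 1 for fraud, distances are Euclidean, every method targets a 1:1 class ratio, and the SmoteTomek step here simply drops both members of each Tomek link after SMOTE. Library implementations, such as those in imbalanced-learn, offer more options (sampling ratios, neighbour search, which link members to remove).

```python
import math
import random

# Illustrative sketch only: samples are (features, label) pairs, label 1 = fraud.
# All function names are hypothetical; each method targets a 1:1 class ratio.


def split(data):
    """Separate minority (fraud, label 1) and majority (label 0) samples."""
    minority = [s for s in data if s[1] == 1]
    majority = [s for s in data if s[1] == 0]
    return minority, majority


def random_oversample(data, rng):
    """ROS: duplicate randomly chosen minority samples until classes are equal."""
    minority, majority = split(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(data, rng):
    """RUS: keep a random majority subset as large as the minority class."""
    minority, majority = split(data)
    return rng.sample(majority, len(minority)) + minority


def nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote(data, rng, k=5):
    """SMOTE: interpolate new frauds between a fraud and one of its k nearest frauds."""
    minority, majority = split(data)
    feats = [x for x, _ in minority]
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(feats)
        neighbour = rng.choice(nearest(base, feats, k))
        gap = rng.random()  # position along the base-neighbour segment
        new = tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        synthetic.append((new, 1))
    return majority + minority + synthetic


def remove_tomek_links(data):
    """Drop both members of each Tomek link: opposite-class mutual nearest neighbours.

    Brute-force O(n^2) neighbour search, adequate for toy data only.
    """
    def nn_index(i):
        return min((j for j in range(len(data)) if j != i),
                   key=lambda j: math.dist(data[i][0], data[j][0]))

    nn = [nn_index(i) for i in range(len(data))]
    drop = {i for i in range(len(data))
            if data[i][1] != data[nn[i]][1] and nn[nn[i]] == i}
    return [s for i, s in enumerate(data) if i not in drop]


def smote_tomek(data, rng, k=5):
    """SMOTE oversampling followed by Tomek-link cleaning."""
    return remove_tomek_links(smote(data, rng, k))


def class_counts(data):
    """Return (legitimate, fraud) counts."""
    return (sum(1 for _, y in data if y == 0), sum(1 for _, y in data if y == 1))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 95 legitimate and 5 fraudulent 2-D points.
    data = ([((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(95)]
            + [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(5)])
    for name, method in [("ROS", random_oversample), ("RUS", random_undersample),
                         ("SMOTE", smote), ("SmoteTomek", smote_tomek)]:
        # ROS and SMOTE yield (95, 95); RUS yields (5, 5); SmoteTomek may be smaller.
        print(name, class_counts(method(data, rng)))
```

ROS and SMOTE both raise the fraud class to the size of the majority class, but ROS only repeats existing rows while SMOTE interpolates new points between neighbouring frauds; RUS instead discards most legitimate rows, which is cheap but throws information away. In practice, resampling is applied to the training split only, so that evaluation still reflects the real class ratio.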
Thesis advisor
Seppälä, Tomi
fraud detection, online fraud, imbalanced learning, resampling techniques, concept drift, ensemble methods, assessment metrics, ecommerce industry, e-commerce