Browsing by Author "Malo, Pekka"
Now showing 1 - 20 of 108
Item Analysis of service culture and service recovery at Helsinki municipal health care: Text analytics on customer feedback-response dialogues (2022) Laaksonen, Suvi; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Analyzing Commodity Codes of Import Data (2020) Anttila, Jasmina; Malo, Pekka; Erästö, Panu; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Application of hierarchical agglomerative and k-means clustering on product sales data for forecasting (2022) Jain, Roma; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
This master’s thesis applies two clustering methods to item or product sales data of a grocery retailer to potentially achieve greater business value in terms of improved quality of the demand forecasts. The main theoretical contribution of this thesis is to add to the existing information on time series clustering, with a focus on grocery retail product sales data. The practical contribution of this study is to recommend a suitable clustering method, based on the empirical analysis performed on the retailer’s data, which will potentially improve the forecast accuracy of products. Weekly sales data for the years 2018 and 2019 for 3379 products are collected from one of the stores of a major European grocery retail chain to apply Hierarchical Agglomerative Clustering (HAC) (Ward, 1963) and K-means (MacQueen, 1967) clustering methods on their sales time series. After clustering, sample time series from the clusters are visually analyzed for similarities in their behavior. To find empirical evidence of the effect of clustering, forecasts are computed for the clustered time series for a test period of 9 weeks, that is, January and February 2020. These aggregated forecasts are then disaggregated to item level and forecast accuracies are calculated for each time series.
Recommendations on suitable clustering methods are provided based on a comparison between the forecast accuracies achieved by using each of the clustering methods and forecast accuracies achieved by the default forecasting approach currently used by the retailer’s forecast support system (FSS). The results of this study suggest that clustering does not identify clear groups of time series similar in behavior in the retailer’s data. The default forecasting approach performs better than HAC by providing better forecast accuracies for 1847 items out of the total of 3379 items. Likewise, 1691 items, marginally more than half of the total, get better forecast accuracy with the default forecasting approach than with K-means for 5 or more weeks of the test period. In conclusion, clustering methods randomly assign time series to clusters and there is no clear evidence of logical grouping of the time series. Therefore, this study recommends against using clustering as a means of improving forecasts and instead suggests exploring more advanced forecasting techniques as an avenue for future research.
Item Application of machine learning to predict occurrence of accidents at Finnish construction sites (2022) Häkkä, Ina; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
The construction industry is one of the most hazardous industries due to the high frequency of accidents and several safety hazards on the construction site. Frequent safety hazards are caused by the unique, complex and dynamic nature of construction projects, including the constantly changing physical environment on construction sites, physically demanding working conditions and continuously evolving construction technology that may relocate or reform the possibility of error rather than eliminating it outright.
Efforts to decrease accident frequency and mitigate safety risks on construction sites require enhanced focus on safety culture, practices and equipment as well as proactive safety management while assessing the effects of individual, managerial and environmental factors on safety performance. The use of Artificial Intelligence (AI) to develop diagnostic and predictive models is deemed the next revolutionary advancement for occupational safety in the construction industry. This study is conducted as a case study on one of Finland’s most prominent construction companies, aiming to make versatile use of its data to develop a Machine Learning (ML) model capable of predicting the occurrence of accidents. The data includes environmental, managerial and project-specific factors from over 600 construction projects in Finland. Several studies aiming at construction accident prediction with Machine Learning exist; however, the majority of previous studies have lacked access to data on non-accident cases to complement records of accidents. Thus, these studies have focused on, for example, predicting accident severity rather than the occurrence of accidents, limiting their contribution to proactive safety-risk management and accident prevention. One of the greatest strengths and contributions of this thesis is its ability to combine data on accident and non-accident cases, thus providing Machine Learning models with examples that enable predictive classification of incidents on construction sites. The findings show that while uninterpretable models may have significant predictive skill for this research problem, as the KNN model provided the most accurate predictions, interpretable models are badly needed to increase awareness of the factors that are most influential in predicting the occurrence of accidents.
While obtaining mostly satisfactory results, none of the models was able to provide excellent results with the selected predictor variables, which indicates that more extensive experimentation is required regarding feature selection. Furthermore, formulating accident occurrence prediction as a multiclass classification problem may improve the accuracy and applicability of the model.
Item Applying machine learning models to identify potential in micro and small segment enterprises in Finland (2022) Pajarinen, Konsta; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Applying SHAP to explain packet discards in LTE network (2020) Sood, Nitesh; Viitasaari, Lauri; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Are all issues created equal? (2019) Lääkkölä, Roope; Wallenius, Jyrki; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Assessing economic benefits and costs of carbon sinks in boreal rotation forestry (Elsevier, 2024-09) Parkatti, Vesa-Pekka; Suominen, Antti; Tahvonen, Olli; Malo, Pekka; Department of Information and Service Management; School Common, BIZ; Department of Information and Service Management; University of Helsinki
We study the optimal enhancement of forest carbon sinks via forest management changes in boreal even-aged Scots pine (Pinus sylvestris) forests. The economic–ecological stand-level optimization model integrates a statistical–empirical individual-tree growth model with a comprehensive model for carbon in living trees, wood products, and soil. We use reinforcement learning to optimize for rotation length, thinning timing, and thinning intensity. Carbon dioxide (CO2) pricing has a notable effect on the optimal solutions and on the corresponding CO2 flows and carbon stocks.
Under a 1% interest rate, increasing the CO2 price from zero to €100 increases the discounted carbon sink by 83% and the total steady-state carbon stock by 122%. Increasing the CO2 price decreases the economic significance of thinning, and, with a high enough CO2 price, the stand is harvested only with clear-cuts, which are further postponed by CO2 price increases. Decreasing stand volume or total C stock cannot be taken as a sign of an overly mature stand. Depending on the CO2 price and interest rate, the economic benefit–cost ratio of additional carbon sinks via forest management changes varies between 1.9 and 3.7. Overall, the results reveal a high potential to increase the role of boreal managed forests in climate change mitigation.
Item Association of phone app usage with personalities (2017-12-11) Li, Na; Liu, Yong; Insinööritieteiden korkeakoulu; Malo, Pekka
Apps are an indispensable part of our life, especially at the current time when the smartphone has become ubiquitous. Research on mobile behaviour has shown that 85% of our time is spent using apps on smartphones, offering the opportunity for massive unobtrusive data collection. Recent research suggests that app usage can be affected by users’ personality. However, little research has explored the relationship between personality and app usage. The objective of this thesis is to determine the association between personality and app usage, as well as to observe the relationship between app usage and social capital. In order to accomplish this, the personality traits and social capital of 29 consenting Chinese students were identified using a self-evaluated questionnaire survey, and information on app usage was collected using the AWARE app. The data were analysed using descriptive statistics and the Partial Least Square Path Model (PLS-PM). Correlation analysis of the descriptive statistics indicated some evidence for a relationship between personality traits and specific app usage.
The result from PLS-PM showed several associations between personality and app usage. For example, the ‘neurotic’ personality, who worries about things, tends to use weather apps more frequently. The ‘openness to experience’ personality, who likes to reflect and play with ideas, tends to spend more time on shopping apps. A person with a low conscientiousness score, which indicates weaker organizational skills, tends to spend more time on entertainment, music and setting apps. Surprisingly, however, no strong association was found between extrovert personality and social networking apps in this study. A weak association was also discovered between social capital and app usage: for example, longer usage of Wechat (a popular communication app in China) may be linked to high scores in maintained social capital, and more frequent usage of Zhifubao (a popular mobile pay app in China) may improve online bridging social capital. However, results from PLS-PM need careful interpretation because of the small sample size and sample collection bias. The result of the association between personality and app usage will not only help app designers to develop optimal app user interfaces for target users, but also enhance recommendations by more accurately and efficiently optimizing the relevance of the suggestions offered to users.
Item AutoML: Comparing performance with human-designed solutions in Kaggle competitions (2024) Holopainen, Aleksi; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
The adoption of Machine Learning (ML) has been a vital point of interest for organizations globally, but its adoption has been slowed down by high costs related to expert personnel and computational power. However, as high computational power has become cheaper and more available, a solution is emerging that addresses the need for the technical skills required of ML experts: AutoML.
AutoML tools aim to automate the ML pipeline so that domain experts can also start to develop their own predictive models, thus further democratizing ML. This paper surveys different techniques used to automate the pipeline and compares results gained by using a newly released AutoML tool against human-designed solutions by utilizing Kaggle competitions. The results are also benchmarked against other frameworks based on the study by Erickson et al. (2020). Furthermore, it proposes a theoretical framework that can be used to assess an ML task’s difficulty while testing AutoML tools. The research consisted of taking part in 10 relatively recent competitions that had a large number of submissions and included binary classification, regression, and multiclass classification ML tasks. Based on the results, the utilized AutoML tool was on average better than a third of the human competitors. The research indicated that having a larger dataset, relatively more numerical features, and the task being binary classification had a negative impact on the framework’s performance. Compared to the other 6 frameworks, it had below-average results. To summarise, using only AutoML tools to create a model is fast but comes at a notable cost to the model’s performance.
Item A business news event detection algorithm with an application to the forest industry (2021) Nguyen, Khang; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
The forest industry is an important industry that generates billions of euros and employs millions of workers. However, it lacks a particular type of business intelligence enjoyed by other industries, namely the extraction of knowledge from online articles. Despite many studies on this subject, no relevant study exists for the forestry industry due to the lack of a usable dataset.
This thesis proposes an event detection algorithm for online articles that can be applied to both general business news and forest industry news. To that end, three research questions are examined. The first concerns the creation of a robust dataset that is inclusive of forest industry news. The second concerns the feasibility of building an event detection algorithm to recognize and classify both general business and forest industry news. The last concerns proposing an optimally performing model for the said algorithm. To build an event detection algorithm, machine learning methods, particularly natural language processing, are used. The proposed solution comprises contextualized word embeddings and a classification model. Those word embeddings are created with BERT, a state-of-the-art model for text handling from Google. For model performance tuning, one approach is implemented to address the class imbalance problem. The evaluation shows that the proposed solution delivers a strong result, which indicates promising practical implementations in the forest industry. Companies in the industry should potentially be able to enjoy an aspect of business intelligence that has been employed in other industries. This thesis is the first to empirically examine the links between online news articles, event detection, and the forest industry. The thesis’s contributions are twofold. First, the thesis provides an annotated dataset for use with different machine learning methods.
Secondly, it complements literature on the feasibility of an event detection algorithm applicable to both business and forestry industry news.
Item A case study of application of Machine Learning in dissolving pulp production process (2019) Nguyen, Minh Quan; Malo, Pekka; Kuosmanen, Timo; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Comparative Study of Methods for Bankruptcy Prediction: Empirical Evidence from Finland (2020) Heikkinen, Teemu; Rantala, Kalle; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
In business analytics and the financial world, bankruptcy prediction has been an interesting and widely researched topic over the past few decades. The accuracy of bankruptcy predictions plays a crucial role for financiers, business owners, shareholders, and supply chain managers alike. With much on the line, being able to predict bankruptcies is the basis for timely and well-founded strategic business decisions. Academic research has developed bankruptcy prediction models falling into two major categories: statistical and machine learning models. Statistical models include logistic regression and multiple discriminant analysis, and machine learning models include neural networks, decision trees, and support vector machines, to name a few. While the research on bankruptcy prediction has yielded numerous different prediction models, there are no clear winners. Each of the models has its pros and cons, and academic research has reached contradicting results while comparing the same models with each other. This thesis aims to further compare the prediction accuracy of the most popular bankruptcy prediction models with Finnish private limited manufacturing company data. This thesis compares the following models: Altman Z-score, Logistic Regression, two Decision Trees (C5 and CART), Neural Networks, and Support Vector Machines (SVM).
The results show that with the sample data chosen, SVM is the best performing bankruptcy prediction method, measured both in terms of overall prediction accuracy and F-measure. SVM provides the most accurate predictions both in short-term and long-term predictions. Logistic regression provides the second-best accuracy, falling just behind SVM by a small margin. It is worthwhile mentioning, however, that the differences in the models’ prediction accuracies and F-measures are relatively small during the first year prior to bankruptcy. SVM and logistic regression seem to sustain their prediction performance better than the other models when the prediction horizon gets longer. Yet, when the prediction horizon is stretched to five years or more, it seems that no model provides results that would be more accurate than flipping a coin. The study contributes toward a more thorough understanding of the advantages and disadvantages of the bankruptcy prediction models, and delivers insights on how the various models perform compared to each other with the Finnish private manufacturing company data.
Item Comparative Study of Methods for Bankruptcy Prediction: Empirical Evidence from Finland (2020) Rantala, Kalle; Heikkinen, Teemu; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
In business analytics and the financial world, bankruptcy prediction has been an interesting and widely researched topic over the past few decades. The accuracy of bankruptcy predictions plays a crucial role for financiers, business owners, shareholders, and supply chain managers alike. With much on the line, being able to predict bankruptcies is the basis for timely and well-founded strategic business decisions. Academic research has developed bankruptcy prediction models falling into two major categories: statistical and machine learning models.
Statistical models include logistic regression and multiple discriminant analysis, and machine learning models include neural networks, decision trees, and support vector machines, to name a few. While the research on bankruptcy prediction has yielded numerous different prediction models, there are no clear winners. Each of the models has its pros and cons, and academic research has reached contradicting results while comparing the same models with each other. This thesis aims to further compare the prediction accuracy of the most popular bankruptcy prediction models with Finnish private limited manufacturing company data. This thesis compares the following models: Altman Z-score, Logistic Regression, two Decision Trees (C5 and CART), Neural Networks, and Support Vector Machines (SVM). The results show that with the sample data chosen, SVM is the best performing bankruptcy prediction method, measured both in terms of overall prediction accuracy and F-measure. SVM provides the most accurate predictions both in short-term and long-term predictions. Logistic regression provides the second-best accuracy, falling just behind SVM by a small margin. It is worthwhile mentioning, however, that the differences in the models’ prediction accuracies and F-measures are relatively small during the first year prior to bankruptcy. SVM and logistic regression seem to sustain their prediction performance better than the other models when the prediction horizon gets longer. Yet, when the prediction horizon is stretched to five years or more, it seems that no model provides results that would be more accurate than flipping a coin.
The study contributes toward a more thorough understanding of the advantages and disadvantages of the bankruptcy prediction models, and delivers insights on how the various models perform compared to each other with the Finnish private manufacturing company data.
Item A context-aware approach to user profiling with interactive preference learning (Helsinki: Aalto-yliopiston kauppakorkeakoulu, 2010) Malo, Pekka; Siitari, Pyry; School of Business; Kauppakorkeakoulu
Item Convex Quantile Regression For Traffic Congestion Modelling (2019) Ylimartimo, Juho; Kuosmanen, Timo; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Precise and reliable prediction of highway traffic and performance becomes ever more important as the global car fleet continues to grow. Although traffic big data is more abundant than ever, traffic management is struggling to keep up with the immense growth. At the center of the global effort to fight the chilling economic effect of congestion is a century-old framework known as traffic flow theory. This thesis argues that the tools of this framework do not utilize the present levels of computing power to the maximum and that there is a gap in modelling approaches regarding some recently discovered methodologies such as convex quantile regression, the method that this thesis proposes. This thesis explores the history of traffic flow theory, explains the methodologies that arise from the framework, and then presents constructive criticism in the form of a novel modelling approach that, as far as the author knows, is unlike anything that has ever been done in the framework so far. To test the convex quantile regression method’s predictive capabilities, we use traffic flow data from Finnish highway sensors. The convex quantile regression method has two functions. Firstly, the method can be used to build statistical confidence intervals for a measure known as highway capacity.
Secondly, the method can be used as a traffic breakdown detection method. The results show a notable difference in the performances of some famous Finnish ring roads and highways in Greater Helsinki between summers and winters in terms of traffic breakdown counts. The results also show a change in the general stochastic characteristics of highway traffic between the seasons.
Item Customer lifetime value prediction and customer segmentation in online retail (2023) Sallinen, Atte; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Item Customer purchasing behavior data analysis in a B2B setting (2024) Lundström, Mia; Malo, Pekka; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
This thesis explores some common topics in customer purchasing behavior and the data analytics around it. It has been discussed in previous research that customers are often misclassified, leading to poor sales and marketing approaches. Additionally, the B2B point of view receives less attention as most purchasing behavior literature focuses on B2C e-commerce markets. To expand the scope of research around customer purchasing behavior, this thesis aims to answer two research questions. Firstly, buying patterns are investigated to cluster customers into similarly behaving segments based on transactional order data regarding engine spare parts and overhaul services. Secondly, the future potential profitability of the customers is predicted based on a statistical BTYD (Buy ‘til You Die) analysis. The findings from these research questions are then combined to form cohesive conclusions for the case company’s further sales activities. Based on the clustering analysis, the case company does appear to have distinctively different customers when it comes to purchasing behavior. This data-based clustering should be taken into consideration, as there also seems to be a correlation with future sales potential.
Together, the clustering analysis and predictions for the future customer values provide a sound guideline regarding which customers to focus on.
Item Customers’ review mining for cell phones based on sentiment analysis (2019) Yang, Jiayan; Malo, Pekka; Erästö, Panu; Tieto- ja palvelujohtamisen laitos; Kauppakorkeakoulu; School of Business
Natural Language Processing is the subfield of AI that helps computers understand human language. It is one of the significant applications of AI in business. Research on NLP started in the 1950s and has made breakthroughs in recent decades, and it is believed to grow at a high pace in the future. The technology makes influential and substantial contributions to numerous fields such as machine translation and speech recognition, and it makes AI more available to human life. In this thesis, we will focus on one NLP task, sentiment analysis, and attempt to use it to analyze and predict customers’ satisfaction with cell phones based on their textual reviews. Sentiment analysis is contextual mining of text, and it aims to detect and predict the viewpoint of customers on products and services. It is a rapidly developing technique applied to all kinds of fields, such as political issues, marketing strategy, and media communication. In the past, traditional machine learning methods combined with feature engineering were used in sentiment analysis. However, the emerging technique, deep learning, has achieved state-of-the-art results in some NLP tasks and become the center of academic attention, gradually replacing the traditional methods. The new method also brings more insight and possibilities for further research in this field. In the thesis, we will implement both traditional statistical methods and neural network methods on customers’ textual reviews of cell phones, and carry out the sentiment analysis at the sentence level.
Subsequently, the performance of different models will be compared using quantitative metrics, and the optimal one will be selected. Also, we will discuss the details of the strengths and weaknesses of the models and assess the reliability of the experiment and models in different aspects, for instance, the reasonableness of the assumptions and potential limitations. Our experiment is a classic multi-class problem based on an imbalanced dataset. We found that the deep learning models outperform most of the traditional machine learning models in the experiment, and the workable models have similar performance. The MaxEntropy model with BoW embedding is considered the best one, given that it beats the other models in multiple comparisons using different metrics. However, other models outperform it when evaluated by some other metrics, in which case the winner changes. We also discuss the limitations of the experiment and scrutinize further development.
Item Data Quality in Business and Regulatory Reporting: a Case Study of a Nordic Financial Services Company (2017) Pulkki, Sari; Malo, Pekka; Tieto- ja palvelutalouden laitos; Kauppakorkeakoulu; School of Business