Real-Time Match Outcome Prediction in Elite Soccer

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2021-12-13

Major/Subject

Data Science

Mcode

SCI3095

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

62 + 15

Abstract

Over the years, match outcome prediction in soccer has become a prominent subject in both academic and industrial research. In a field where data is becoming increasingly accessible, making accurate forecasts about an infamously unpredictable game is a challenging task for state-of-the-art machine learning models. Applications are found not only in the sports betting industry but also within the clubs themselves, as a team's results play a predominant role in its ecosystem. Most studies to date focus on pre-match predictions. However, the rapid development of companies dedicated to collecting fine-grained soccer data is opening up new opportunities in real-time analysis, in which predicted outcome probabilities can be continuously updated based on the match context or a team's tactical changes. This study proposes two novel approaches to the real-time match outcome prediction task in soccer. The first approach (MC model) applies Monte Carlo methods to the shot generation and conversion processes to simulate the match from any given time. The inter-arrival times of shots and their conversion probabilities are modelled using Weibull and logistic regression models respectively, whose parameters are learned via Markov Chain Monte Carlo algorithms. The second approach (MLP model) is an extension that uses a multilayer perceptron to build upon the learnt generation and conversion parameters and various other soccer KPIs. The models' predictive power is measured against Dixon and Coles' approach on the 2018-19 and 2019-20 English Premier League seasons. The study demonstrates that the MLP model clearly outperforms both this baseline and the MC model in real-time settings, even when the number of training matches is highly restricted. In addition, goal difference and time are shown to be the two KPIs that stand out in terms of permutation importance score, which would tend to rule out, for the match prediction problem, models that cannot account for similar match progression features. Finally, the study illustrates a use case of real-time predictions for post-match analysis and discusses several possible improvements.
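
To make the simulation idea behind the MC model concrete, the following is a minimal sketch and not the thesis implementation. It assumes each team's shots arrive with Weibull-distributed inter-arrival times and that every shot is converted with a fixed probability obtained from a logistic function. All names (`simulate_goals`, `outcome_probabilities`) and all parameter values are hypothetical; in the thesis the parameters are learned via MCMC, whereas here they are hard-coded. As a further simplification, the shot clock is restarted at the minute the simulation begins.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_goals(rng, minutes_left, shape, scale, p_goal):
    """Simulate one team's goals over the remaining minutes of one match.

    Shots arrive with Weibull(shape, scale) inter-arrival times (in minutes);
    each shot becomes a goal with probability p_goal.
    """
    elapsed, goals = 0.0, 0
    while True:
        elapsed += scale * rng.weibull(shape)  # time until the next shot
        if elapsed >= minutes_left:
            return goals
        if rng.random() < p_goal:
            goals += 1


def outcome_probabilities(minute, score, home, away,
                          n_sims=5000, match_length=90.0, seed=0):
    """Estimate (home win, draw, away win) probabilities from a given minute."""
    rng = np.random.default_rng(seed)
    minutes_left = max(match_length - minute, 0.0)
    counts = np.zeros(3)
    for _ in range(n_sims):
        h = score[0] + simulate_goals(rng, minutes_left, **home)
        a = score[1] + simulate_goals(rng, minutes_left, **away)
        counts[0 if h > a else 1 if h == a else 2] += 1
    return counts / n_sims


# Hypothetical parameters, chosen only for illustration.
home_params = dict(shape=1.1, scale=7.0, p_goal=sigmoid(-2.2))
away_params = dict(shape=1.1, scale=8.0, p_goal=sigmoid(-2.3))

probs = outcome_probabilities(minute=60, score=(1, 1),
                              home=home_params, away=away_params)
print(dict(zip(["home win", "draw", "away win"], probs.round(3))))
```

Calling `outcome_probabilities` repeatedly as the minute and score change gives a continuously updated forecast in the spirit of the real-time setting described above. A faithful version would replace the constant conversion probability with a per-shot logistic regression and draw the parameters from their posterior samples.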

Description

Supervisor

Babbar, Rohit

Thesis advisor

Karavolos, Daniel

Keywords

sports, soccer, outcome prediction, deep learning, Bayesian inference, Weibull
