How accurately does the expected goals model reflect goalscoring and success in football?

School of Business | Bachelor's thesis

Date

2020

Degree programme

Information and Service Management (Tieto- ja palvelujohtaminen)

Language

en

Pages

28

Abstract

This thesis discusses the expected goals model for football and assesses how well the model explains match results and score lines, as well as teams' success at the end of the season. The objective is to evaluate the expected goals model in order to fill a gap in the academic literature. Filling this gap allows the model and its outputs to be used more accurately in the media and in real-world problems where the model is applicable. Additionally, an attempt is made to improve the results of the expected goals model by running the estimated scores through the Poisson distribution. The Poisson distribution is frequently used in the academic literature in this field, since goals in single football matches have been observed to be Poisson distributed. In addition to the Poisson-distributed expected goals model, two other models estimating the outcomes and score lines of individual matches and two probability-based methods were used as benchmarks. A shots-on-target-based model and a naïve aggregate model were chosen as simple models for estimating the results and scores of individual matches. Market odds probabilities and probabilities produced by the Elo rating system were used to further benchmark the performance of the expected goals model. The data used in this study were collected from the four biggest leagues in football: the English Premier League, the Spanish La Liga, the German Bundesliga and the Italian Serie A. The study covers the five seasons from 2014-15 to 2018-19, totalling 7230 matches. The data were collected into and modified with Excel. Comparisons indicated that while none of the models could predict or estimate the results and score lines of single matches particularly well, the Poisson-distributed expected goals model came the closest, with a minimal difference to the standard expected goals model.
Additionally, it was found that the model is biased toward estimating too many draws, while underestimating the number of matches in which one or both teams fail to score. However, when it came to estimating the success of teams, i.e. the number of points accumulated by the end of the season, the standard version of expected goals fared best through its derivative output, expected points. All in all, the results imply that while it is extremely difficult to model football match outcomes accurately, the expected goals model can give some insight into what lies behind the score line and is a valuable analysis tool.
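
The Poisson step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each team's goal count is an independent Poisson variable whose mean is that team's expected goals for the match, and that expected points follow from the resulting win and draw probabilities under the usual 3/1/0 points rule. The function names, the `max_goals` cutoff and the example xG values are hypothetical; the record does not state the exact procedure used in the thesis.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the mean goal count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumption: home and away goals are independent Poisson variables
    with means home_xg and away_xg. Score lines above max_goals are
    ignored and the result is renormalised to sum to 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        p_home = poisson_pmf(h, home_xg)
        for a in range(max_goals + 1):
            p = p_home * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


def expected_points(home_xg, away_xg):
    """Expected league points per team: 3 for a win, 1 for a draw."""
    home_win, draw, away_win = outcome_probabilities(home_xg, away_xg)
    return 3 * home_win + draw, 3 * away_win + draw


if __name__ == "__main__":
    # Hypothetical xG values for a single match.
    hw, d, aw = outcome_probabilities(1.8, 1.1)
    print(f"home win {hw:.3f}, draw {d:.3f}, away win {aw:.3f}")
    print("expected points (home, away):", expected_points(1.8, 1.1))
```

Summing such per-match expected points over a season gives one possible estimate of a team's final points total, which is the kind of quantity the abstract compares against actual league points.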

Thesis advisor

Seppälä, Tomi

Keywords

sports analytics, football analytics, expected goals, football
