Efficient estimation of selection-induced bias in Bayesian model selection

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Vehtari, Aki
dc.contributor.author: McLatchie, Yann
dc.contributor.school: Perustieteiden korkeakoulu (fi)
dc.contributor.supervisor: Vehtari, Aki
dc.date.accessioned: 2023-10-15T17:08:21Z
dc.date.available: 2023-10-15T17:08:21Z
dc.date.issued: 2023-10-09
dc.description.abstract: Model selection aims to identify a sufficiently well-performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible bias when the cross-validation estimates of predictive performance are marred by excessive noise. In finite-data regimes, cross-validated estimates can encourage the statistician to select one model over another when it is not actually better for future data. While this bias remains negligible in the case of few models, when the pool of candidates grows, and model selection decisions are compounded (as in forward search), the expected magnitude of selection-induced bias is likely to grow too. This paper introduces an efficient approach to estimate and correct selection-induced bias based on order statistics. Numerical experiments demonstrate the reliability of our approach in estimating both selection-induced bias and over-fitting along compounded model selection decisions, with specific application to forward search. This work represents a lightweight alternative to more computationally expensive approaches to correcting selection-induced bias, such as nested cross-validation and the bootstrap. Our approach rests on several theoretical assumptions, and we provide a diagnostic to help understand when these may not be valid and when to fall back on safer, albeit more computationally expensive, approaches. The accompanying code facilitates its practical implementation and fosters further exploration in this area. (en)
dc.format.extent: 36 + 8
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/124044
dc.identifier.urn: URN:NBN:fi:aalto-202310156387
dc.language.iso: en (en)
dc.programme: Master's Programme in Computer, Communication and Information Sciences (fi)
dc.programme.major: Machine Learning, Data Science, and Artificial Intelligence (fi)
dc.programme.mcode: SCI3044 (fi)
dc.subject.keyword: selection-induced bias (en)
dc.subject.keyword: Bayesian model selection (en)
dc.subject.keyword: cross-validation (en)
dc.subject.keyword: bias correction (en)
dc.title: Efficient estimation of selection-induced bias in Bayesian model selection (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: yes
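
The abstract's claim that selection-induced bias grows with the number of candidate models can be illustrated with a small simulation. The sketch below is not the thesis's estimator or its accompanying code. It is a toy setup under loudly stated assumptions: every candidate has the same true performance (zero), the performance estimates are independent Gaussian noise (real cross-validation estimates are typically correlated), and the function names `simulate_selection_bias` and `expected_max_approx` are hypothetical. It compares the simulated bias of "pick the best-looking model" with Blom's classical order-statistic approximation to the expected maximum of independent normal draws.

```python
import random
import statistics


def simulate_selection_bias(n_models, noise_sd=1.0, n_reps=2000, seed=0):
    """Mean over-optimism of the best-looking of `n_models` equally good models.

    Assumption (illustrative only): all candidates have true performance 0,
    and each estimate is that truth plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    best_estimates = []
    for _ in range(n_reps):
        estimates = [rng.gauss(0.0, noise_sd) for _ in range(n_models)]
        # Selecting the maximum estimate is the model selection step.
        best_estimates.append(max(estimates))
    # The selected model's true performance is 0, so the average selected
    # estimate equals the selection-induced bias in this toy setup.
    return statistics.fmean(best_estimates)


def expected_max_approx(n_models, noise_sd=1.0):
    """Blom's approximation to the expected maximum of n iid normal draws."""
    alpha = 0.375
    p = (n_models - alpha) / (n_models - 2 * alpha + 1)
    return noise_sd * statistics.NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for k in (1, 2, 10, 50):
        sim = simulate_selection_bias(k)
        approx = expected_max_approx(k)
        print(f"K={k:3d}  simulated bias={sim:.3f}  order-statistic approx={approx:.3f}")
```

With a single candidate the bias is close to zero, and it increases as candidates are added, which is the qualitative behaviour the abstract describes. The independence assumption is a simplification chosen to keep the sketch short.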

Files

Original bundle

Name: master_McLatchie_Yann_2023.pdf
Size: 928.92 KB
Format: Adobe Portable Document Format