Addressing statistical and computational challenges in extreme multilabel classification with unbiased estimators, macro-averaged metrics, and hardware-aware implementations
School of Science | Doctoral thesis (article-based) | Defence date: 2025-10-02
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language: en
Pages: 93 + app. 209
Series: Aalto University publication series Doctoral Theses, 180/2025
Abstract
This thesis tackles statistical and computational challenges in extreme multilabel classification (XMC), that is, in tasks where the label space is gigantic, possibly comprising millions of labels. Such problems are plagued by missing labels and data scarcity, particularly in the form of tail labels, and the enormous label space turns operations that are cheap in typical machine learning problems, such as calculating the loss in the classification layer, into computationally challenging tasks.

To address the missing-label problem, this thesis derives unbiased estimators for generic multilabel loss functions under the assumption that a propensity model is available. A critical look is taken at the propensity model in widespread use in the current XMC literature, in particular at the problematic double role of propensities, which are used both to compensate for missing labels and as a measure of performance on infrequent tail labels. As an alternative, macro-averaged performance metrics are proposed, and prediction algorithms aiming to optimize these metrics in two different inference frameworks are presented.

On the computational side, the thesis presents a new approach for training linear extreme classifiers, still an important baseline, significantly faster than before, owing to a new weight initialization scheme and code that is aware of the memory layout of modern NUMA processors. Additionally, it presents a novel way to exploit weight sparsity already at the training stage to reduce on-device memory consumption. This is achieved by combining dynamic sparse training algorithms with an efficient weight storage format that at the same time allows for a fast implementation of matrix multiplication.
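The propensity-based unbiased estimation mentioned in the abstract can be illustrated for label-wise decomposable losses. The sketch below is a hypothetical illustration, not the thesis code: it assumes the standard missing-label model in which a truly positive label j is observed with known propensity p_j, and negatives are never observed as positives. The function names and signature are invented for this example.

```python
def unbiased_loss(y_obs, propensities, loss_pos, loss_neg):
    """Propensity-scored estimate of a label-wise decomposable loss.

    y_obs        : list of 0/1 *observed* labels (positives may be missing)
    propensities : per-label P(observed | truly positive), assumed known
    loss_pos     : per-label loss the classifier incurs if the true label is 1
    loss_neg     : per-label loss the classifier incurs if the true label is 0

    Per label j the estimator is
        (y_obs_j / p_j) * loss_pos_j + (1 - y_obs_j / p_j) * loss_neg_j.
    Since E[y_obs_j] = p_j * y_true_j, its expectation over the observation
    process equals the loss evaluated on the true (fully observed) labels.
    """
    total = 0.0
    for y, p, lp, ln in zip(y_obs, propensities, loss_pos, loss_neg):
        w = y / p  # inverse-propensity weight; 0 for unobserved labels
        total += w * lp + (1.0 - w) * ln
    return total
```

Unbiasedness can be checked exactly by enumerating the observation outcomes: for a truly positive label with propensity p, the estimator averages to loss_pos, since p * ((1/p) * lp + (1 - 1/p) * ln) + (1 - p) * ln = lp; for a true negative, the label is never observed and the estimator is exactly loss_neg.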
Supervising professor
Marttinen, Pekka, Prof., Aalto University, Department of Computer Science, Finland

Thesis advisor
Babbar, Rohit, Prof., University of Bath, UK
Parts
- [Publication 1]: Mohammadreza Qaraei, Erik Schultheis, Priyanshu Gupta, and Rohit Babbar. Convex surrogates for unbiased loss functions in extreme classification with missing labels. In WWW ’21: Proceedings of the Web Conference 2021, Ljubljana, pages 3711–3720, April 2021.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202108098273
  DOI: 10.1145/3442381.3450139
- [Publication 2]: Erik Schultheis and Rohit Babbar. Unbiased Loss Functions for Multilabel Classification with Missing Labels. Accepted for publication in Transactions on Machine Learning Research, September 2025.
  DOI: 10.48550/arXiv.2109.11282
- [Publication 3]: Erik Schultheis, Rohit Babbar, Marek Wydmuch, and Krzysztof Dembczynski. On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1547–1557, August 2022.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202208244984
  DOI: 10.1145/3534678.3539466
- [Publication 4]: Erik Schultheis and Rohit Babbar. Speeding-up one-versus-all training for extreme classification via mean-separating initialization. Machine Learning, volume 111, issue 11, pages 3953–3976, November 2022.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202211096451
  DOI: 10.1007/s10994-022-06228-2
- [Publication 5]: Erik Schultheis and Rohit Babbar. Towards Memory-Efficient Training for Extremely Large Output Spaces – Learning with 670k Labels on a Single Commodity GPU. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 689–704, September 2023.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202408065234
  DOI: 10.1007/978-3-031-43418-1_41
- [Publication 6]: Nasib Ullah, Erik Schultheis, Mike Lasby, Yani Ioannou, and Rohit Babbar. Navigating Extremes: Dynamic Sparsity in Large Output Spaces. In Advances in Neural Information Processing Systems, Vol. 37, 2024.
- [Publication 7]: Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, and Krzysztof Dembczynski. Generalized test utilities for long-tail performance in extreme multi-label classification. In Advances in Neural Information Processing Systems, Vol. 36, 2023.
  DOI: 10.48550/arXiv.2311.05081
- [Publication 8]: Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, and Krzysztof Dembczyński. Consistent algorithms for multilabel classification with macro-at-k metrics. In The Twelfth International Conference on Learning Representations, May 2024.
  DOI: 10.48550/arXiv.2401.16594