Addressing statistical and computational challenges in extreme multilabel classification with unbiased estimators, macro-averaged metrics, and hardware-aware implementations

School of Science | Doctoral thesis (article-based) | Defence date: 2025-10-02

Language

en

Pages

93 + app. 209

Series

Aalto University publication series Doctoral Theses, 180/2025

Abstract

This thesis tackles statistical and computational challenges in extreme multilabel classification (XMC) problems, that is, in tasks where the label space is gigantic, possibly in the millions of labels. Such problems are plagued by missing labels and data scarcity, particularly in the form of tail labels, and the enormous label space turns operations that are cheap in typical machine learning problems, such as calculating the loss in the classification layer, into computationally challenging tasks. Towards addressing the missing-label problem, this thesis derives unbiased estimators for generic multilabel loss functions under the assumption that a propensity model is available. A critical look at the propensity model that is in widespread use in the current XMC literature is provided, in particular regarding the problematic double role of using propensities both to compensate for missing labels and as a measure of performance on infrequent tail labels. As an alternative, macro-averaged performance metrics are proposed, and prediction algorithms aiming to optimize these metrics in two different inference frameworks are presented. The thesis presents a new approach to train linear extreme classifiers, still an important baseline, significantly faster than before, owing to a new weight initialization scheme and code that is aware of the memory layout of modern NUMA processors. Additionally, it presents a novel way to exploit weight sparsity, already at the training stage, to reduce the on-device memory consumption. This is achieved by combining dynamic sparse training algorithms with an efficient weight storage format that at the same time allows for a fast implementation of matrix multiplication.
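
As a rough illustration of the unbiased-estimator idea mentioned in the abstract, the sketch below shows one standard way to debias a per-label loss with a known propensity. It is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a single binary label, that a true positive is observed with probability `propensity`, and that negatives are never observed as positives. The function names (`bce`, `unbiased_loss`, `expected_unbiased_loss`) are hypothetical and chosen only for this example.

```python
import math


def bce(y, p_hat):
    """Binary cross-entropy for one label; y is 0 or 1, p_hat is in (0, 1)."""
    return -(y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))


def unbiased_loss(y_obs, p_hat, propensity, loss=bce):
    """Propensity-weighted loss computed from the *observed* label.

    Assumed noise model (not taken from the record itself):
    P(y_obs = 1 | y_true = 1) = propensity, P(y_obs = 1 | y_true = 0) = 0.
    """
    w = y_obs / propensity  # equals 1/propensity for an observed positive, else 0
    return w * loss(1, p_hat) + (1 - w) * loss(0, p_hat)


def expected_unbiased_loss(y_true, p_hat, propensity, loss=bce):
    """Exact expectation of unbiased_loss over the observation noise."""
    q = y_true * propensity  # probability that the label is observed as positive
    return (q * unbiased_loss(1, p_hat, propensity, loss)
            + (1 - q) * unbiased_loss(0, p_hat, propensity, loss))
```

Under these assumptions, the expectation of `unbiased_loss` over the observation noise equals the loss on the true label, for both true positives and true negatives. Note that the weight `1 - 1/propensity` on the negative-label term is negative for an observed positive whenever `propensity < 1`, so the estimate for a single example can itself be negative.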

Description

Supervising professor

Marttinen, Pekka, Prof., Aalto University, Department of Computer Science, Finland

Thesis advisor

Babbar, Rohit, Prof., University of Bath, UK

Parts

  • [Publication 1]: Mohammadreza Qaraei, Erik Schultheis, Priyanshu Gupta, and Rohit Babbar. Convex surrogates for unbiased loss functions in extreme classification with missing labels. In WWW ’21: Proceedings of the Web Conference 2021, Ljubljana, pages 3711–3720, April 2021.
    DOI: 10.1145/3442381.3450139
  • [Publication 2]: Erik Schultheis and Rohit Babbar. Unbiased Loss Functions for Multilabel Classification with Missing Labels. Accepted for publication in Transactions on Machine Learning Research, September 2025.
    DOI: 10.48550/arXiv.2109.11282
  • [Publication 3]: Erik Schultheis, Rohit Babbar, Marek Wydmuch, Krzysztof Dembczyński. On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1547–1557, August 2022.
    DOI: 10.1145/3534678.3539466
  • [Publication 4]: Erik Schultheis, Rohit Babbar. Speeding-up one-versus-all training for extreme classification via mean-separating initialization. Machine Learning, volume 111, issue 11, pages 3953–3976, November 2022.
    DOI: 10.1007/s10994-022-06228-2
  • [Publication 5]: Erik Schultheis, Rohit Babbar. Towards Memory-Efficient Training for Extremely Large Output Spaces–Learning with 670k Labels on a Single Commodity GPU. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 689–704, September 2023.
    DOI: 10.1007/978-3-031-43418-1_41
  • [Publication 6]: Nasib Ullah, Erik Schultheis, Mike Lasby, Yani Ioannou, Rohit Babbar. Navigating Extremes: Dynamic Sparsity in Large Output Spaces. In Advances in Neural Information Processing Systems, Vol. 37, 2024.
  • [Publication 7]: Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński. Generalized test utilities for long-tail performance in extreme multi-label classification. In Advances in Neural Information Processing Systems, Vol. 36, 2023.
    DOI: 10.48550/arXiv.2311.05081
  • [Publication 8]: Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński. Consistent algorithms for multilabel classification with macro-at-k metrics. In The Twelfth International Conference on Learning Representations, May 2024.
    DOI: 10.48550/arXiv.2401.16594
