No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets
Access rights
openAccess
CC BY
publishedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
30
Series
Proceedings of Machine Learning Research, Volume 267, pp. 11405-11434
Abstract
Benchmark datasets have proved pivotal to the success of graph learning, and good benchmark datasets are crucial to guide the development of the field. Recent research has highlighted problems with graph-learning datasets and benchmarking practices, revealing, for example, that methods which ignore the graph structure can outperform graph-based approaches. Such findings raise two questions: (1) What makes a good graph-learning dataset, and (2) how can we evaluate dataset quality in graph learning? Our work addresses these questions. As the classic evaluation setup uses datasets to evaluate models, it does not apply to dataset evaluation. Hence, we start from first principles. Observing that graph-learning datasets uniquely combine two modes, graph structure and node features, we introduce RINGS, a flexible and extensible mode-perturbation framework to assess the quality of graph-learning datasets based on dataset ablations, i.e., quantifying differences between the original dataset and its perturbed representations. Within this framework, we propose two measures, performance separability and mode complementarity, as evaluation tools, each assessing the capacity of a graph dataset to benchmark the power and efficacy of graph-learning methods from a distinct angle. We demonstrate the utility of our framework for dataset evaluation via extensive experiments on graph-level tasks and derive actionable recommendations for improving the evaluation of graph-learning methods. Our work opens new research directions in data-centric graph learning, and it constitutes a step toward the systematic evaluation of evaluations.
Description
Publisher Copyright: © 2025 by the author(s).
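The abstract describes mode perturbations only at a high level. The following is a minimal sketch of the general idea, assuming a single undirected graph stored as a binary NumPy adjacency matrix with a zero diagonal and a node-feature matrix with one row per node. It is not the authors' RINGS implementation: the function names `perturb_structure` and `perturb_features` and the mode labels ("empty", "complete", "random", "shuffled") are hypothetical and chosen for illustration.

```python
import numpy as np


def perturb_structure(adj, mode, rng=None):
    """Return a perturbed copy of a symmetric 0/1 adjacency matrix.

    Assumption (not from the source): adj has a zero diagonal, and the
    mode names below are illustrative labels only.
    """
    n = adj.shape[0]
    if mode == "original":
        return adj.copy()
    if mode == "empty":
        # Drop every edge, so only node features carry information.
        return np.zeros_like(adj)
    if mode == "complete":
        # Connect every pair of distinct nodes.
        return np.ones_like(adj) - np.eye(n, dtype=adj.dtype)
    if mode == "random":
        # Rewire: keep the number of undirected edges, pick pairs uniformly.
        rng = np.random.default_rng() if rng is None else rng
        rows, cols = np.triu_indices(n, k=1)
        n_edges = int(adj[rows, cols].sum())
        chosen = rng.choice(len(rows), size=n_edges, replace=False)
        out = np.zeros_like(adj)
        out[rows[chosen], cols[chosen]] = 1
        return out + out.T
    raise ValueError(f"unknown structure mode: {mode}")


def perturb_features(x, mode, rng=None):
    """Return a perturbed copy of an (n_nodes, n_features) feature matrix."""
    if mode == "original":
        return x.copy()
    if mode == "empty":
        # Replace all features with zeros, so only structure carries information.
        return np.zeros_like(x)
    if mode == "shuffled":
        # Reassign feature rows to nodes at random, breaking the alignment
        # between structure and features while keeping the feature values.
        rng = np.random.default_rng() if rng is None else rng
        return x[rng.permutation(x.shape[0])]
    raise ValueError(f"unknown feature mode: {mode}")


if __name__ == "__main__":
    # Toy path graph on three nodes with two features per node.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    x = np.arange(6.0).reshape(3, 2)
    rng = np.random.default_rng(0)
    for mode in ("original", "empty", "complete", "random"):
        print(mode, int(perturb_structure(adj, mode, rng).sum()) // 2, "edges")
```

Under these assumptions, a dataset ablation would train and evaluate the same model on the original graphs and on each perturbed version, then compare the resulting performance. How that comparison is quantified is the subject of the paper's two measures and is not reproduced here.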
Citation
Coupette, C, Wayland, J, Simons, E & Rieck, B 2025, 'No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets', Proceedings of Machine Learning Research, vol. 267, pp. 11405-11434. < https://proceedings.mlr.press/v267/coupette25a.html >