Clustering energy performance certificates : a methodology for selecting representative buildings in scalable energy simulations
Access rights
openAccess
CC BY
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
16
Series
Energy and Buildings, Volume 350
Abstract
This study presents a methodology for selecting representative buildings through clustering of Energy Performance Certificate (EPC) features. The six-phase workflow includes EPC attribute preparation; clustering with K-Medoids, Agglomerative clustering, and Gaussian Mixture Models (GMM); and internal validation using the Silhouette, Calinski-Harabasz, and Davies-Bouldin indices. An EPC database of educational buildings in Helsinki is used to demonstrate applicability to scalable energy simulations. To assess thermal validity, regression models (Linear, Random Forest, and XGBoost) were trained within clusters to predict District Heating (DH) demand from outdoor temperature, achieving higher accuracy than global models. Additionally, DH clustering was compared to EPC-derived labels using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). Formal statistical tests (ANOVA and Kruskal–Wallis with false discovery rate (FDR) correction) confirmed that EPC attributes differ significantly between clusters, and medoid buildings were shown to represent cluster means, with r ranging from 0.92 to 0.98. Results show that EPC-based clusters capture thermal behaviour, enabling the selection of representative buildings without continuous monitoring. Across internal cluster validity indices, Agglomerative clustering often performed best, while externally, GMM showed the strongest alignment with DH-based clusters (ARI = 0.555, NMI = 0.571). Four clusters were identified from linkage distance and thermal performance. Feature importance analysis highlighted air leakage rate as the dominant predictor, with UA value and EP value also influential. This generalizable methodology enables meaningful building grouping and simulation targeting without detailed metering, supporting energy policy and retrofit planning.
Description
Publisher Copyright: © 2025 The Author(s).
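The clustering, internal validation, and medoid-selection steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes scikit-learn and NumPy, uses synthetic data from make_blobs as a stand-in for real EPC attributes, assumes Ward linkage for the Agglomerative step, and omits K-Medoids because it is not part of core scikit-learn. The helper names internal_scores and medoid_indices are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    normalized_mutual_info_score,
    pairwise_distances,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for EPC attributes (e.g. air leakage rate, UA value, EP value).
# Assumption: the real study uses an EPC database that is not reproduced here.
X_raw, _ = make_blobs(n_samples=200, n_features=3, centers=4, random_state=0)
X = StandardScaler().fit_transform(X_raw)

N_CLUSTERS = 4  # the abstract reports four clusters


def internal_scores(X, labels):
    """Compute the three internal validity indices named in the abstract."""
    return {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }


def medoid_indices(X, labels):
    """Map each cluster label to the row index of its medoid.

    The medoid is the member with the smallest total Euclidean distance
    to all other members of the same cluster.
    """
    medoids = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        dist = pairwise_distances(X[members])
        medoids[int(k)] = int(members[dist.sum(axis=1).argmin()])
    return medoids


# Two of the clustering methods named in the abstract (Ward linkage is an assumption).
agglo_labels = AgglomerativeClustering(n_clusters=N_CLUSTERS, linkage="ward").fit_predict(X)
gmm_labels = GaussianMixture(n_components=N_CLUSTERS, random_state=0).fit_predict(X)

scores = {
    "agglomerative": internal_scores(X, agglo_labels),
    "gmm": internal_scores(X, gmm_labels),
}

# Candidate representative buildings: one medoid per cluster.
representatives = medoid_indices(X, agglo_labels)

# External agreement between two labelings. The study compares EPC-derived labels
# with DH-based clusters; here the two synthetic labelings are compared instead.
ari = adjusted_rand_score(agglo_labels, gmm_labels)
nmi = normalized_mutual_info_score(agglo_labels, gmm_labels)
```

On real data, X would hold the prepared EPC attributes, and the second labeling passed to adjusted_rand_score and normalized_mutual_info_score would come from clustering measured DH demand.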
Citation
Hajian, H, Leiber, C, Härkönen, K, Mannila, H & Ferrantelli, A 2026, 'Clustering energy performance certificates : a methodology for selecting representative buildings in scalable energy simulations', Energy and Buildings, vol. 350, 116592. https://doi.org/10.1016/j.enbuild.2025.116592