Clustering energy performance certificates : a methodology for selecting representative buildings in scalable energy simulations

Access rights

openAccess
CC BY
publishedVersion

A1 Original article in a scientific journal

Language

en

Pages

16

Series

Energy and Buildings, Volume 350

Abstract

This study presents a methodology for selecting representative buildings by clustering Energy Performance Certificate (EPC) features. The six-phase workflow includes EPC attribute preparation; clustering with K-Medoids, Agglomerative clustering, and a Gaussian Mixture Model (GMM); and internal validation using the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices. An EPC database of educational buildings in Helsinki is used to demonstrate the methodology's applicability to scalable energy simulations. To assess thermal validity, regression models (Linear, Random Forest, and XGBoost) were trained within each cluster to predict District Heating (DH) demand from outdoor temperature, achieving higher accuracy than global models. Additionally, clusters derived from DH data were compared with the EPC-derived labels using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). Formal statistical tests of differentiation (ANOVA and Kruskal–Wallis, with false discovery rate (FDR) correction) confirmed that EPC attributes differ significantly between clusters, and the medoid buildings were shown to represent their cluster means, with r ranging from 0.92 to 0.98. The results show that EPC-based clusters capture thermal behaviour, enabling representative buildings to be selected without continuous monitoring. Across the internal cluster validity indices, Agglomerative clustering often performed best, whereas in the external comparison GMM showed the strongest alignment with the DH-based clusters (ARI = 0.555, NMI = 0.571). Four clusters were identified based on linkage distance and thermal performance. Feature importance analysis highlighted the air leakage rate as the dominant predictor, with the UA value and EP value also influential. This generalizable methodology enables meaningful grouping of buildings and targeted simulation without detailed metering, supporting energy policy and retrofit planning.
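The core clustering and validation steps described above can be illustrated with a small sketch. This is a minimal, standard-library-only Python example, not the authors' implementation. It uses a simple alternating (Voronoi-iteration) K-Medoids with farthest-first initialisation instead of a full PAM search, the mean Silhouette coefficient for internal validation, and the Adjusted Rand Index for comparison against a reference labelling. The function names (`standardise`, `k_medoids`, `silhouette`, `adjusted_rand_index`), the three example features, and all values in `buildings` and `dh_labels` are hypothetical toy data chosen for illustration.

```python
import math
from collections import Counter


def standardise(rows):
    """Z-score each column so large-unit features (e.g. UA value) do not dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def k_medoids(X, k, max_iter=100):
    """Alternating (Voronoi-iteration) K-Medoids; returns medoid row indices and labels."""
    n = len(X)
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Farthest-first initialisation, starting from the most central point.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: D[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def silhouette(X, labels):
    """Mean Silhouette coefficient (requires at least two clusters)."""
    n = len(X)
    sizes = Counter(labels)
    total = 0.0
    for i in range(n):
        if sizes[labels[i]] == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(X[i], X[j]) for j in range(n)
                if j != i and labels[j] == labels[i]) / (sizes[labels[i]] - 1)
        b = min(sum(math.dist(X[i], X[j]) for j in range(n) if labels[j] == c) / sizes[c]
                for c in sizes if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n


def adjusted_rand_index(u, v):
    """Adjusted Rand Index between two labellings of the same items."""
    def pairs(counts):
        return sum(math.comb(c, 2) for c in counts)

    index = pairs(Counter(zip(u, v)).values())
    sum_u, sum_v = pairs(Counter(u).values()), pairs(Counter(v).values())
    expected = sum_u * sum_v / math.comb(len(u), 2)
    max_index = (sum_u + sum_v) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labellings trivial
    return (index - expected) / (max_index - expected)


# Hypothetical toy EPC attributes per building: [air leakage rate, UA value, EP value].
buildings = [
    [1.0, 800, 100], [1.2, 850, 105], [0.9, 780, 98], [1.1, 820, 102],
    [4.0, 2000, 220], [4.2, 2100, 230], [3.8, 1950, 210], [4.1, 2050, 225],
]
# Hypothetical labels from a separate clustering of measured DH demand.
dh_labels = [0, 0, 0, 0, 1, 1, 1, 1]

X = standardise(buildings)
medoids, labels = k_medoids(X, k=2)
print("representative buildings (medoid rows):", medoids)
print("silhouette:", round(silhouette(X, labels), 3))
print("ARI vs DH labels:", adjusted_rand_index(labels, dh_labels))
```

Because every medoid is an actual row of the input, each cluster's representative is a real building that can be passed on to a detailed simulation, which is what makes K-Medoids a natural fit for this selection task. In practice, library implementations would normally be preferred; for example, scikit-learn's `sklearn.metrics` module provides `silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`, `adjusted_rand_score`, and `normalized_mutual_info_score`.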

Description

Publisher Copyright: © 2025 The Author(s).

Citation

Hajian, H, Leiber, C, Härkönen, K, Mannila, H & Ferrantelli, A 2026, 'Clustering energy performance certificates : a methodology for selecting representative buildings in scalable energy simulations', Energy and Buildings, vol. 350, 116592. https://doi.org/10.1016/j.enbuild.2025.116592