Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorSui, Jinpingen_US
dc.contributor.authorLiu, Zhenen_US
dc.contributor.authorLiu, Lien_US
dc.contributor.authorJung, Alexen_US
dc.contributor.authorLi, Xiangen_US
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.groupauthorProfessorship Jung Alexanderen
dc.contributor.groupauthorHelsinki Institute for Information Technology (HIIT)en
dc.contributor.organizationNational University of Defense Technologyen_US
dc.date.accessioned2021-03-22T07:08:30Z
dc.date.available2021-03-22T07:08:30Z
dc.date.issued2022en_US
dc.description.abstractIn an era of ubiquitous large-scale evolving data streams, data stream clustering (DSC) has received considerable attention because the scale of the data streams far exceeds the capacity of expert human analysts. It has been observed that high-dimensional data are usually distributed in a union of low-dimensional subspaces. In this article, we propose a novel sparse representation-based DSC algorithm, called evolutionary dynamic sparse subspace clustering (EDSSC). It can cope with the time-varying nature of the subspaces underlying evolving data streams, such as subspace emergence, disappearance, and recurrence. The proposed EDSSC consists of two phases: 1) static learning and 2) online clustering. In the first phase, we propose a data structure for storing a statistical summary of the data stream, called the EDSSC summary, which better addresses the trade-off between two conflicting goals: 1) retaining more points for the accuracy of subspace clustering (SC) and 2) discarding more points for the efficiency of DSC. Because we further propose an algorithm to estimate the number of subspaces, EDSSC does not need to know this number in advance. In the second phase, we propose a more suitable index, called the average sparsity concentration index (ASCI), which dramatically improves clustering accuracy compared with the conventionally used sparsity concentration index (SCI). In addition, we propose a subspace evolution detection model based on the Page-Hinkley test, with which appearing, disappearing, and recurring subspaces can be detected and adapted to. Extensive experiments on real-world data streams show that EDSSC outperforms state-of-the-art online SC approaches.en
dc.description.versionPeer revieweden
dc.format.mimetypeapplication/pdfen_US
dc.identifier.citationSui, J, Liu, Z, Liu, L, Jung, A & Li, X 2022, 'Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams', IEEE Transactions on Cybernetics, vol. 52, no. 6, pp. 4173-4186. https://doi.org/10.1109/TCYB.2020.3023973en
dc.identifier.doi10.1109/TCYB.2020.3023973en_US
dc.identifier.issn2168-2267
dc.identifier.issn2168-2275
dc.identifier.otherPURE UUID: 7564147c-97a0-41ff-b8a9-b3ba433c9892en_US
dc.identifier.otherPURE ITEMURL: https://research.aalto.fi/en/publications/7564147c-97a0-41ff-b8a9-b3ba433c9892en_US
dc.identifier.otherPURE LINK: http://www.scopus.com/inward/record.url?scp=85097185993&partnerID=8YFLogxKen_US
dc.identifier.otherPURE FILEURL: https://research.aalto.fi/files/117569910/SCI_Sui_etal_IEEE_Transactions_on_Cybernetics_2022.pdfen_US
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/103217
dc.identifier.urnURN:NBN:fi:aalto-202103222495
dc.language.isoenen
dc.publisherIEEE
dc.relation.ispartofseriesIEEE Transactions on Cyberneticsen
dc.rightsopenAccessen
dc.titleDynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streamsen
dc.typeA1 Original article in a scientific journalen
dc.type.versionacceptedVersion