Comparison of dynamic cluster methods

No Thumbnail Available

URL

Journal Title

Journal ISSN

Volume Title

Perustieteiden korkeakoulu | Master's thesis

Date

2020-08-18

Department

Major/Subject

Computer Science

Mcode

SCI3042

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

69+13

Series

Abstract

Traditional clustering algorithms are popular for data analysis. However, by nature, they are designed for static objects. There are many applications where objects change their states over time. If to apply traditional clustering methods on every timestamp, this may produce totally different clusters on each timestamp. Instead, it would be preferable to obtain clusters smoothly evolving over time. Starting with Chakrabarti et al. \cite{Chakrabarti} in 2006, the era of dynamic clusters came to the world. Since then, evolutionary versions of clusters for popular algorithms were designed: Kmeans, hierarchical, spectral, DBSCAN. They assume that the algorithm cost function consists of a traditional algorithm cost and a temporal smoothness. Then Xu et al. \cite{AdaptiveEvolutionary} proposed an adaptive framework where smoothing was applied to a proximity matrix followed by static clustering. In the thesis, we compare the performance of the traditional algorithms Kmeans, spectral and DBSCAN with their evolutionary and adaptive modifications. For the experiments, two datasets were chosen: a synthetic, generated from four Gaussian distributions and simulated four seasons, and a Foursquare, with six seasons for venues in San Francisco. The criteria for performance evaluation are clustering goodness and smoothness between consecutive timestamps. Also, the similarity between clustering results was estimated. For the synthetic data, although the adaptive and evolutionary algorithms show best values, the traditional Kmeans and spectral algorithms differ from them neglectfully. This means there is no need to apply sophisticated algorithms. The results on the Foursquare data show that the evolutionary spectral versions perform better than the other algorithms taking into account a goodness-smoothness trade-off.

Description

Supervisor

Gionis, Aristides

Thesis advisor

Vasara, Petri

Keywords

evolution clustering, dynamic clustering, temporal smoothness, foursquare, clustering

Other note

Citation