Community detection in complex networks: the role of node metadata

School of Science | Doctoral thesis (article-based) | Defence date: 2017-11-03
88 + app. 82
Aalto University publication series DOCTORAL DISSERTATIONS, 52/2017
Recently, it has been recognized that problems lying between order and chaos require a new scientific language and new models. Network science has emerged as a promising interdisciplinary field that studies the properties of all kinds of systems that emerge from the interactions of a large number of elements or constituents. A particularly interesting feature of complex networks is the presence of communities: groups of nodes that have more connections among themselves than to the rest of the network. Communities provide insight into the structure of the whole system and into the immediate environment of each node, such as circles of friends or functionally related genes, and they have also been shown to play a role in various processes on networks. For these reasons, numerous community detection algorithms have been proposed that take the network structure as input and return the communities to which the nodes belong. As the field of community detection matured, more scrutiny was applied to old and new algorithms. Researchers were no longer satisfied with good results on simple, almost toy examples; more evidence was sought for the applicability of the algorithms in the real world. At the same time, larger and more complex network datasets were becoming available, in which the need to identify meso-scale structures was even greater. A straightforward way to test the algorithms is to compare their results with known community assignments of the nodes, which are taken to correspond to metadata labels on the nodes. In the first part of this dissertation, a large number of algorithms were tested on a large number of labeled networks from different domains. Weak correspondences between metadata and communities indicate that more care has to be taken when using metadata as community labels. The relationship between node metadata and communities is perhaps more complex than was earlier assumed, but this does not mean that it is absent.
The second part of this dissertation presents a novel approach for incorporating metadata into community detection without assuming their usefulness. This approach makes it possible to discriminate between metadata that are aligned with the community structure and those that are not. The third part of this dissertation proposes the use of the stochastic blockmodel for modeling the citation networks of journals. The model is able to capture the rich structures present in the data while being simple, intuitive, and applicable to huge networks (millions of nodes and links). By splitting data spanning more than a hundred years into separate time windows, it was possible to track the evolution of science over time. Using the model presented in the second part of the dissertation, the usefulness of the classification of journals into subject categories as a predictor of citation flows was evaluated.
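The comparison between detected communities and metadata labels mentioned in the abstract can be illustrated with a small sketch. Normalized mutual information (NMI) is one common similarity measure for two partitions of the same node set; the abstract does not say which measure the dissertation uses, so this choice is an assumption. The function name and the toy label lists are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two labelings of the same nodes, normalized by the mean entropy.

    Illustrative only: NMI is an assumed choice of similarity measure,
    not necessarily the one used in the dissertation.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must not be empty")

    count_a = Counter(labels_a)                  # community sizes in labeling A
    count_b = Counter(labels_b)                  # community sizes in labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # sizes of the overlaps

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum(
        (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in count_ab.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if entropy_a + entropy_b == 0:
        # Both labelings put every node in a single group, so they agree.
        return 1.0
    return 2 * mutual_info / (entropy_a + entropy_b)


# Hypothetical toy data: algorithm output versus metadata labels.
detected = [0, 0, 0, 1, 1, 1]
metadata = ["a", "a", "a", "b", "b", "b"]
print(normalized_mutual_information(detected, metadata))
```

A value near 1 means the detected communities and the metadata labels describe almost the same grouping, while a value near 0 means the two are close to independent, which is the kind of weak correspondence the abstract reports.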
Supervising professor
Kaski, Kimmo, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Fortunato, Santo, Prof., Indiana University, USA
Kivelä, Mikko, Assistant Prof., Aalto University, Department of Computer Science, Finland
complex networks, community detection, citation networks
Other note
  • [Publication 1]: Darko Hric, Richard K. Darst, Santo Fortunato. Community detection in networks: Structural communities versus ground truth. Physical Review E, Volume 90, Issue 6, pages 062805, December 2014.
    DOI: 10.1103/PhysRevE.90.062805
  • [Publication 2]: Darko Hric, Tiago P. Peixoto, Santo Fortunato. Network Structure, Metadata, and the Prediction of Missing Nodes and Annotations. Physical Review X, Volume 6, Issue 3, pages 031038, September 2016.
    DOI: 10.1103/PhysRevX.6.031038
  • [Publication 3]: Darko Hric, Kimmo Kaski, Mikko Kivelä. Stochastic Block Model Reveals the Map of Citation Patterns and Their Evolution in Time, submitted for peer review, May 2017.