Maximizing diversity over clustered data*

Loading...
Thumbnail Image

Access rights

openAccess

URL

Journal Title

Journal ISSN

Volume Title

A4 Artikkeli konferenssijulkaisussa

Date

2020

Major/Subject

Mcode

Degree programme

Language

en

Pages

9
649-657

Series

Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020

Abstract

Maximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets.

Description

| openaire: EC/H2020/871042/EU//SoBigData-PlusPlus

Keywords

Other note

Citation

Zhang, G & Gionis, A 2020, Maximizing diversity over clustered data* . in C Demeniconi & N Chawla (eds), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 . Society for Industrial and Applied Mathematics, pp. 649-657, SIAM International Conference on Data Mining, Cincinnati, Ohio, United States, 07/05/2020 . https://doi.org/10.1137/1.9781611976236.73