Maximizing diversity over clustered data*
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Zhang, Guangyi | en_US |
dc.contributor.author | Gionis, Aristides | en_US |
dc.contributor.department | Department of Computer Science | en |
dc.contributor.editor | Demeniconi, Carlotta | en_US |
dc.contributor.editor | Chawla, Nitesh | en_US |
dc.contributor.groupauthor | Helsinki Institute for Information Technology (HIIT) | en |
dc.contributor.groupauthor | Adj. Prof. Gionis Aris group | en |
dc.contributor.organization | KTH Royal Institute of Technology | en_US |
dc.date.accessioned | 2020-09-11T06:29:27Z | |
dc.date.available | 2020-09-11T06:29:27Z | |
dc.date.issued | 2020 | en_US |
dc.description | | openaire: EC/H2020/871042/EU//SoBigData-PlusPlus | |
dc.description.abstract | Maximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 9 | |
dc.format.extent | 649-657 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Zhang, G & Gionis, A 2020, Maximizing diversity over clustered data* . in C Demeniconi & N Chawla (eds), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 . Society for Industrial and Applied Mathematics, pp. 649-657, SIAM International Conference on Data Mining, Cincinnati, Ohio, United States, 07/05/2020 . https://doi.org/10.1137/1.9781611976236.73 | en |
dc.identifier.doi | 10.1137/1.9781611976236.73 | en_US |
dc.identifier.isbn | 9781611976236 | |
dc.identifier.other | PURE UUID: 0d17f0e7-046f-41da-acf2-4e6befe6edeb | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/0d17f0e7-046f-41da-acf2-4e6befe6edeb | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85089176616&partnerID=8YFLogxK | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/51096494/Zhang_Gionis_Maximizing.1.9781611976236.73.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/46409 | |
dc.identifier.urn | URN:NBN:fi:aalto-202009115332 | |
dc.language.iso | en | en |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/871042/EU//SoBigData-PlusPlus | en_US |
dc.relation.ispartof | SIAM International Conference on Data Mining | en |
dc.relation.ispartofseries | Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 | en |
dc.rights | openAccess | en |
dc.title | Maximizing diversity over clustered data* | en |
dc.type | A4 Artikkeli konferenssijulkaisussa | fi |
dc.type.version | publishedVersion |