Maximizing diversity over clustered data*

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorZhang, Guangyien_US
dc.contributor.authorGionis, Aristidesen_US
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.editorDemeniconi, Carlottaen_US
dc.contributor.editorChawla, Niteshen_US
dc.contributor.groupauthorHelsinki Institute for Information Technology (HIIT)en
dc.contributor.groupauthorAdj. Prof. Gionis Aris groupen
dc.contributor.organizationKTH Royal Institute of Technologyen_US
dc.date.accessioned2020-09-11T06:29:27Z
dc.date.available2020-09-11T06:29:27Z
dc.date.issued2020en_US
dc.description| openaire: EC/H2020/871042/EU//SoBigData-PlusPlus
dc.description.abstractMaximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets.en
dc.description.versionPeer revieweden
dc.format.extent9
dc.format.extent649-657
dc.format.mimetypeapplication/pdfen_US
dc.identifier.citationZhang, G & Gionis, A 2020, Maximizing diversity over clustered data* . in C Demeniconi & N Chawla (eds), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 . Society for Industrial and Applied Mathematics, pp. 649-657, SIAM International Conference on Data Mining, Cincinnati, Ohio, United States, 07/05/2020 . https://doi.org/10.1137/1.9781611976236.73en
dc.identifier.doi10.1137/1.9781611976236.73en_US
dc.identifier.isbn9781611976236
dc.identifier.otherPURE UUID: 0d17f0e7-046f-41da-acf2-4e6befe6edeben_US
dc.identifier.otherPURE ITEMURL: https://research.aalto.fi/en/publications/0d17f0e7-046f-41da-acf2-4e6befe6edeben_US
dc.identifier.otherPURE LINK: http://www.scopus.com/inward/record.url?scp=85089176616&partnerID=8YFLogxKen_US
dc.identifier.otherPURE FILEURL: https://research.aalto.fi/files/51096494/Zhang_Gionis_Maximizing.1.9781611976236.73.pdfen_US
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/46409
dc.identifier.urnURN:NBN:fi:aalto-202009115332
dc.language.isoenen
dc.relationinfo:eu-repo/grantAgreement/EC/H2020/871042/EU//SoBigData-PlusPlusen_US
dc.relation.ispartofSIAM International Conference on Data Miningen
dc.relation.ispartofseriesProceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020en
dc.rightsopenAccessen
dc.titleMaximizing diversity over clustered data*en
dc.typeA4 Artikkeli konferenssijulkaisussafi
dc.type.versionpublishedVersion

Files