Provable randomized rounding for minimum-similarity diversification

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorOrdozgoiti, Brunoen_US
dc.contributor.authorMahadevan, Ananthen_US
dc.contributor.authorMatakos, Antonisen_US
dc.contributor.authorGionis, Aristidesen_US
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.groupauthorAdj. Prof. Gionis Aris groupen
dc.contributor.organizationUniversity of Helsinkien_US
dc.contributor.organizationKTH Royal Institute of Technologyen_US
dc.date.accessioned2022-01-26T07:48:25Z
dc.date.available2022-01-26T07:48:25Z
dc.date.issued2022-03en_US
dc.description| openaire: EC/H2020/834862/EU//REBOUND | openaire: EC/H2020/871042/EU//SoBigData-PlusPlus
dc.description.abstractWhen searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.en
dc.description.versionPeer revieweden
dc.format.mimetypeapplication/pdfen_US
dc.identifier.citationOrdozgoiti, B, Mahadevan, A, Matakos, A & Gionis, A 2022, ' Provable randomized rounding for minimum-similarity diversification ', Data Mining and Knowledge Discovery, vol. 36, no. 2, pp. 709-738 . https://doi.org/10.1007/s10618-021-00811-2en
dc.identifier.doi10.1007/s10618-021-00811-2en_US
dc.identifier.issn1384-5810
dc.identifier.otherPURE UUID: 814f8454-42e5-4f35-8f5d-ae609b6dc226en_US
dc.identifier.otherPURE ITEMURL: https://research.aalto.fi/en/publications/814f8454-42e5-4f35-8f5d-ae609b6dc226en_US
dc.identifier.otherPURE LINK: http://www.scopus.com/inward/record.url?scp=85122726042&partnerID=8YFLogxKen_US
dc.identifier.otherPURE FILEURL: https://research.aalto.fi/files/78530861/Provable_randomized_rounding_for_minimum_similarity_diversification.pdfen_US
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/112556
dc.identifier.urnURN:NBN:fi:aalto-202201261457
dc.language.isoenen
dc.publisherSPRINGER
dc.relationinfo:eu-repo/grantAgreement/EC/H2020/871042/EU//SoBigData-PlusPlusen_US
dc.relation.ispartofseriesData Mining and Knowledge Discoveryen
dc.rightsopenAccessen
dc.subject.keywordDiversificationen_US
dc.subject.keywordQuadratic programmingen_US
dc.subject.keywordRandomized roundingen_US
dc.subject.keywordRecommender systemsen_US
dc.titleProvable randomized rounding for minimum-similarity diversificationen
dc.typeA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessäfi
dc.type.versionpublishedVersion

Files