Algorithms for finding orders and analyzing sets of chains

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorUkkonen, Antti
dc.contributor.departmentTietojenkäsittelytieteen laitosfi
dc.date.accessioned2012-08-20T06:30:03Z
dc.date.available2012-08-20T06:30:03Z
dc.date.issued2008
dc.description.abstractRankings of items are a useful concept in a variety of applications, such as clickstream analysis, some voting methods, bioinformatics, and other fields of science such as paleontology. This thesis addresses two problems related to such data. The first problem is about finding orders, while the second one is about analyzing sets of orders. We address two different tasks in the problem of finding orders. We can find orders either by computing an aggregate of a set of known orders, or by constructing an order for a previously unordered data set. For the first task we show that bucket orders, a subclass of partial orders, are a useful structure for summarizing sets of orders. We formulate an optimization problem for finding such partial orders, show that it is NP-hard, and give an efficient randomized algorithm for finding approximate solutions to it. Moreover, we show that the expected cost of a solution found by the randomized algorithm differs from the optimal solution only by a constant factor. For the second approach we propose a simple method for sampling orders for 0–1 vectors that is based on the consecutive ones property. For analyzing orders, we discuss three different methods. First, we give an algorithm for clustering sets of orders. The algorithm is a variant of Lloyd's iteration for solving the k-means problem. We also give two different approaches for mapping orders to vectors in a high-dimensional Euclidean space. These mappings are used on one hand for clustering, and on the other hand for creating two dimensional visualizations (scatterplots) for sets of orders. Finally, we discuss randomization testing in case of orders. To this end we propose an MCMC algorithm for creating random sets of orders that preserve certain well defined properties of a given set of orders. The random data sets can be used to assess the statistical significance of the results obtained e.g. by clustering.en
dc.format.extentVerkkokirja (2036 KB, 132 s.)
dc.format.mimetypeapplication/pdf
dc.identifier.isbn978-951-22-9374-2
dc.identifier.isbn978-951-22-9373-5 (printed)#8195;
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/4493
dc.identifier.urnURN:ISBN:978-951-22-9374-2
dc.language.isoenen
dc.publisherTeknillinen korkeakouluen
dc.subject.keyworddata analysisen
dc.subject.keywordrankingen
dc.subject.keywordapproximation algorithmen
dc.subject.keywordclusteringen
dc.subject.keywordvisualizationen
dc.subject.keyworddimensionality reductionen
dc.subject.keywordMCMCen
dc.subject.keywordrandomization testingen
dc.subject.otherComputer scienceen
dc.titleAlgorithms for finding orders and analyzing sets of chainsen
dc.typeG4 Monografiaväitöskirjafi
dc.type.dcmitypetexten
dc.type.ontasotVäitöskirja (monografia)fi
dc.type.ontasotDoctoral dissertation (monograph)en
local.aalto.digiauthask
local.aalto.digifolderAalto_67222

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
isbn9789512293742.pdf
Size:
1.94 MB
Format:
Adobe Portable Document Format