Browsing by Author "Ukkonen, Antti"
Now showing 1 - 7 of 7
- Results Per Page
- Sort Options
- Algorithms for finding orders and analyzing sets of chains
Doctoral dissertation (monograph)(2008) Ukkonen, AnttiRankings of items are a useful concept in a variety of applications, such as clickstream analysis, some voting methods, bioinformatics, and other fields of science such as paleontology. This thesis addresses two problems related to such data. The first problem is about finding orders, while the second one is about analyzing sets of orders. We address two different tasks in the problem of finding orders. We can find orders either by computing an aggregate of a set of known orders, or by constructing an order for a previously unordered data set. For the first task we show that bucket orders, a subclass of partial orders, are a useful structure for summarizing sets of orders. We formulate an optimization problem for finding such partial orders, show that it is NP-hard, and give an efficient randomized algorithm for finding approximate solutions to it. Moreover, we show that the expected cost of a solution found by the randomized algorithm differs from the optimal solution only by a constant factor. For the second approach we propose a simple method for sampling orders for 0–1 vectors that is based on the consecutive ones property. For analyzing orders, we discuss three different methods. First, we give an algorithm for clustering sets of orders. The algorithm is a variant of Lloyd's iteration for solving the k-means problem. We also give two different approaches for mapping orders to vectors in a high-dimensional Euclidean space. These mappings are used on one hand for clustering, and on the other hand for creating two dimensional visualizations (scatterplots) for sets of orders. Finally, we discuss randomization testing in case of orders. To this end we propose an MCMC algorithm for creating random sets of orders that preserve certain well defined properties of a given set of orders. The random data sets can be used to assess the statistical significance of the results obtained e.g. by clustering. - Approximate Top-K Retrieval from Hidden Relations
Faculty of Information and Natural Sciences |(2010) Ukkonen, AnttiWe consider the evaluation of approximate top-k queries from relations with a-priori unknown values. Such relations can arise for example in the context of expensive predicates, or cloud-based data sources. The task is to find an approximate top-k set that is close to the exact one while keeping the total processing cost low. The cost of a query is the sum of the costs of the entries that are read from the hidden relation. A novel aspect of this work is that we consider prior information about the values in the hidden matrix. We propose an algorithm that uses regression models at query time to assess whether a row of the matrix can enter the top-k set given that only a subset of its values are known. The regression models are trained with existing data that follows the same distribution as the relation subjected to the query. To evaluate the algorithm and to compare it with a method proposed previously in literature, we conduct experiments using data from a context sensitive Wikipedia search engine. The results indicate that the proposed method outperforms the baseline algorithms in terms of the cost while maintaining a high accuracy of the returned results. - Data mining techniques for discovering partial orders
Helsinki University of Technology | Master's thesis(2004) Ukkonen, AnttiOlkoon jonkin mielivaltaisen prosessin lopputulos joukko S joukon SIGMA alkioiden järjestyksiä. Nämä järjestykset voivat koskea kaikkia SIGMA:n alkioita tai vain jotakin sen osajoukkoa. Oletetaan, että kaikki järjestykset joukossa S ovat tuntemattomien osittainjärjestysten generoimia. Järjestys on osittainjärjestyksen P generoima, mikäli se on P:n lineaarinen laajennus tai voidaan muodostaa sellaisesta jättämällä joitakin SIGMA:n alkioita pois. Tässä diplomityössä suunnitellaan algoritmeja näiden piilevien osittainjärjestysten etsintään. Menetelmillä ei ole osittainjärjestyksistä a priori tietoa ja ne käsittelevät ainoastaan joukkoa S. Yhden ja useamman osittainjärjestyksen etsintä tehdään erikseen. Yhden osittainjärjestyksen löytämiseksi esitellään algoritmi GREEDY-ORDER. Tämä algoritmi maksimoi kustannusfunktiota lisäämällä konstruoitavaan osittainjärjestykseen vain sellaiset järjestetyt parit (i, j), joiden frekvenssi on suurempi kuin päinvastaisella järjestyksellä (j, i). Useamman osittainjärjestyksen etsintä ratkaistaan ryhmittelemällä S ensin siten, että samaan ryhmään kuuluvat järjestykset ovat saman osittainjärjestyksen generoimia. Lopuksi GREEDY-ORDER algoritmia sovelletaan kuhunkin ryhmään. Ryhmittelyä varten esitetään kaksi menetelmää. Ensimmäinen vaihtoehto perustuu syötettä S esittävän tietokannan suljettujen kattavien joukkojen laskentaan ja näiden perusteella määriytyvän joukon peitto-ongelman ratkaisemiseen. Toinen menetelmä perustuu paikalliseen hakuun kaikkien mahdollisten ryhmittelyjen joukosta. Haussa pyritään maksimoimaan sopivaa painofunktiota, mikä tapahtuu joko hill climbing -algoritmilla tai simuloidulla jäähdytyksellä. - Isotope effect on flow oscillations in simulations of ohmic tokamak discharges
Perustieteiden korkeakoulu | Bachelor's thesis(2016-02-24) Ukkonen, Antti - Learning Embeddings from Probabilistic Triplet Comparisons
Perustieteiden korkeakoulu | Master's thesis(2018-10-08) Mojsilovic, StefanLearning from relative similarity comparisons has gained interest in the data science community in the past 20 years. We introduce a new way to capture relative similarity comparisons called probabilistic triplets that alleviates extreme decisions under high uncertainty, and provides finer-grained information than ordinary triplets. We describe a new method called t-SPTE that finds an embedding of objects in a Euclidean space using probabilistic triplets datasets as its input. The problem is formulated as a least squares optimization of differences between the labeled triplet probabilities and the triplet probabilities coming from the stochastic neighborhood model in the embedding space. We experimentally show that our approach improves upon previous methods, notably t-STE, needing less labeled triplets and producing higher quality embeddings. - Potential consequences of hypothetical domestic nuclear power plant accidents
Perustieteiden korkeakoulu | Master's thesis(2019-06-18) Ukkonen, AnttiKnowledge of the potential consequences of a nuclear power plant accident is an important aspect on emergency preparedness. The protective actions during a nuclear emergency are based on the radiological consequences of the release. This thesis studies three hypothetical nuclear power accident scenarios with different magnitudes. The operating Finnish power reactor units Loviisa 1&2 and Olkiluoto 1&2 are included in the study. Modeling of the consequences is based on historical weather data from years 2012-2015 retrieved with AROME and HARMONIE operative weather forecast models. Dispersion and deposition calculations are done with SILAM dispersion model. Dose rates and doses are calculated with threat assessment tool TIUKU, developed by the Finnish Radiation and Nuclear Safety Authority. Post-processing of the data is done with Python and its computational libraries. The results are compared to operational intervention levels and dose criteria. The sufficiency of emergency planning zones (EPZs) is analysed as well. Comparison shows that the protective actions are needed outside the emergency planning zones in the worst scenario studied, otherwise the EPZs suite the studied scenarios. - Randomization algorithms for large sparse networks
A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä(2019-05-30) Puolamäki, Kai; Henelius, Andreas; Ukkonen, AnttiIn many domains it is necessary to generate surrogate networks, e.g., for hypothesis testing of different properties of a network. Generating surrogate networks typically requires that different properties of the network are preserved, e.g., edges may not be added or deleted and edge weights may be restricted to certain intervals. In this paper we present an efficient property-preserving Markov chain Monte Carlo method termed CycleSampler for generating surrogate networks in which (1) edge weights are constrained to intervals and vertex strengths are preserved exactly, and (2) edge and vertex strengths are both constrained to intervals. These two types of constraints cover a wide variety of practical use cases. The method is applicable to both undirected and directed graphs. We empirically demonstrate the efficiency of the CycleSampler method on real-world data sets. We provide an implementation of CycleSampler in R, with parts implemented in C.