### Browsing by Author "Kubjas, Kaie"

Now showing 1 - 20 of 33

###### Results Per Page

###### Sort Options

Item 3D Genome Reconstruction from Partially Phased Hi-C Data(Springer, 2024-04) Cifuentes, Diego; Draisma, Jan; Henriksson, Oskar; Korchmaros, Annachiara; Kubjas, Kaie; Department of Mathematics and Systems Analysis; Algebra and Discrete Mathematics; Statistics and Mathematical Data Science; Georgia Institute of Technology; University of Bern; University of Copenhagen; Leipzig UniversityThe 3-dimensional (3D) structure of the genome is of significant importance for many cellular processes. In this paper, we study the problem of reconstructing the 3D structure of chromosomes from Hi-C data of diploid organisms, which poses additional challenges compared to the better-studied haploid setting. With the help of techniques from algebraic geometry, we prove that a small amount of phased data is sufficient to ensure finite identifiability, both for noiseless and noisy data. In the light of these results, we propose a new 3D reconstruction method based on semidefinite programming, paired with numerical algebraic geometry and local optimization. The performance of this method is tested on several simulated datasets under different noise levels and with different amounts of phased data. We also apply it to a real dataset from mouse X chromosomes, and we are then able to recover previously known structural features.Item An Algebraic and Combinatorial Perspective on Rank-one Tensor Completion(2018-12-04) Rendon Jaramillo, Mateo; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieWe investigate matrix completion and tensor completion from two different ap- proaches. Matrix completion is introduced as a particular instance of the Rank Minimization Problem, and a heuristic for solving this optimization problem is induced by the nuclear norm of a matrix [9]. Minimizing the nuclear norm of matrices agreeing with a set of observed entries yields a reconstruction of low rank. Moreover, we explore some guarantees on the least number of entries that allow exact reconstruction [6]. The same problem is studied using the tools of algebra and combinatorics. A cor- respondence between partial matrices and bipartite graphs allows certain prop- erties of bipartite graphs to characterize completion to rank one [13]. Moreover, a local approach for completion is possible and the finitely completable entries from a partial matrix are computed by a matroid closure [22]. An analogous development is considered for the case of tensor completion. For- mulating an optimization approach for tensor completion leads us to study the rank and nuclear norm for tensors. Computing these quantities, as well as other tensor properties, was found to be an NP-hard problem [10, 28]. However, effi- cient heuristics for tensor completion based on the nuclear norm of the flattenings of a tensor have been proposed in [11, 19, 30, 37, 41]. The algebraic and combinatorial approach is revisited for rank-one tensor com- pletion. A characterization of rank one tensors using minors can be described as the kernel of a parameterization map. Generated by binomials, the kernel reveals connections to toric geometry and other combinatorial objects that are used to describe the conditions for rank-one tensor completion [21].Item Algebraic Geometry of Deep Polynomial Neural Networks(2022-09-02) Elenius, Theo; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieItem Algebraic methods for maximum likelihood estimation(2017-06-30) Malinen, Aki; Kubjas, Kaie; Perustieteiden korkeakoulu; Camilla Hollanti, CamillaItem Best nonnegative rank-2 matrix approximations(2024-06-18) Lindy, Etna; Sodomaco, Luca; Perustieteiden korkeakoulu; Kubjas, KaieA nonnegative rank-$r$ matrix factorization (NMF) of $X \in \RRp^{m\times n}$ takes the form $X= WH$, where $W \in \RRp^{m\times r}$, $H \in \RRp^{r\times n}$, and the matrices $W$ and $H$ are nonnegative. The nonnegative rank of the matrix $X$, denoted by $\nrnk{X}$, is the smallest $r$ for which such a decomposition exists. When the nonnegative matrix $X$ has rank two, we have $\rnk{X} = \nrnk{X}$. In the NMF-$r$ problem, one approximates a given datamatrix $U\in \RRp^{m\times n}$ with a matrix of nonnegative rank-$r$. The best approximation is the closest point among the nonnegative rank-$r$ matrices. The NMF problem has a variety of applications in data analysis, including image processing, data mining, and signal processing. The nonnegativity of the components $W$ and $H$ naturally highlights repeating features from the data, since the terms in the product cannot cancel out. In this thesis, I inspect the NMF-2 problem for small matrices of size $3\times 4$ and $4\times 4$. The problem is approached algebraically to ensure the optimality of the solution: when the number of critical points (Euclidean distance degree) is known in advance, one may ensure that all the critical points have been successfully found in the numerical computations. We redefine the NMF-2 problem as multiple rank-2 optimization problems with forced zeros. The zeros of the matrix are called the zero pattern. The NMF-2 problem has been previously studied, and a connection between the rank-2 optimum and the NMF-2 optimum through sampling matrices of size $3\times 3$ was witnessed. Furthermore, some zero patterns were never encountered as the optimum in the sampling. In this thesis, I continue the experiments for $3\times 4$ and $4\times 4$ matrices to see whether these observations still hold in the larger setting. To find the best nonnegative rnak-2 matrix approximation, we first identify the relevant zero patterns, compute the ED degrees for the corresponding algebraic varieties, and finally solve the problem numerically for uniformly random integer matrices. We were able to characterize the relevant zero patterns to three types of patterns, and prove that no other patterns may occur as an optimum in the $3\times 3$, $3\times 4$, and $4\times 4$ cases. We also investigated further the connection between the negative entries of the rank-2 optimum and the optimum pattern, but based on the numerical results, the optimum pattern cannot be directly read from the negative entries of the rank-2 optimum as witnessed in the sampling of $3\times 3$ matrices.Item Classifying critical points of Euclidean distance from real plane curves(2021-01-14) Lahti, Aleksi; Sodomaco, Luca; Perustieteiden korkeakoulu; Kubjas, KaieItem Determinantal Generalizations of Instrumental Variables(DE GRUYTER, 2018-03) Weihs, Luca; Robinson, Bill; Dufresne, Emilie; Kenkel, Jennifer; Kubjas, Kaie; McGee, Reginald L., II; Nguyen, Nhan; Robeva, Elina; Drton, Mathias; University of Washington; Denison University; University of Nottingham; University of Utah; Statistics and Mathematical Data Science; Ohio State University; University of Montana; Massachusetts Institute of Technology MIT; Department of Mathematics and Systems AnalysisLinear structural equation models relate the components of a random vector using linear interdependencies and Gaussian noise. Each such model can be naturally associated with a mixed graph whose vertices correspond to the components of the random vector. The graph contains directed edges that represent the linear relationships between components, and bidirected edges that encode unobserved confounding. We study the problem of generic identifiability, that is, whether a generic choice of linear and confounding effects can be uniquely recovered from the joint covariance matrix of the observed random vector. An existing combinatorial criterion for establishing generic identifiability is the half-trek criterion (HTC), which uses the existence of trek systems in the mixed graph to iteratively discover generically invertible linear equation systems in polynomial time. By focusing on edges one at a time, we establish new sufficient and necessary conditions for generic identifiability of edge effects extending those of the HTC. In particular, we show how edge coefficients can be recovered as quotients of subdeterminants of the covariance matrix, which constitutes a determinantal generalization of formulas obtained when using instrumental variables for identification.Item Estimating the Dimension of Algebraic Varieties From Samples(2021-10-29) Valve, Venla; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieItem Euclidean Distance Geometry and Its Applications in 3D Genome Reconstruction(2020-12-18) Pajala, Aapo; Kubjas, Kaie; Perustieteiden korkeakoulu; Korte, RiikkaItem Exact solutions in log-concave maximum likelihood estimation(ACADEMIC PRESS, 2023-02) Grosdos, Alexandros; Heaton, Alexander; Kubjas, Kaie; Kuznetsova, Olga; Scholten, Georgy; Sorea, Miruna Ştefana; Osnabrück University; Max Planck Institute for Mathematics in the Sciences; Statistics and Mathematical Data Science; Department of Mathematics and Systems Analysis; North Carolina State UniversityWe study probability density functions that are log-concave. Despite the space of all such densities being infinite-dimensional, the maximum likelihood estimate is the exponential of a piecewise linear function determined by finitely many quantities, namely the function values, or heights, at the data points. We explore in what sense exact solutions to this problem are possible. First, we show that the heights given by the maximum likelihood estimate are generically transcendental. For a cell in one dimension, the maximum likelihood estimator is expressed in closed form using the generalized W-Lambert function. Even more, we show that finding the log-concave maximum likelihood estimate is equivalent to solving a collection of polynomial-exponential systems of a special form. Even in the case of two equations, very little is known about solutions to these systems. As an alternative, we use Smale's α-theory to refine approximate numerical solutions and to certify solutions to log-concave density estimation.Item Finding polynomial equations from samples(2021-12-30) Puhto, Inari; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieItem Generalization of Descartes' rule of signs to multivariate polynomials with real exponents(2024-06-18) Kekäläinen, Oula; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieIn this thesis, we consider a generalization of Descartes' rule of signs to multivariate polynomials with real exponents defined in the positive orthant. Descartes' rule of signs bounds the number of positive real roots that a univariate polynomial has by the number of sign changes in the coefficient sequence of the polynomial. This generalization is due to Elisanda Feliu and Mate Telek and it aims to bound the number of connected components, where a polynomial takes negative values in the positive orthant with the data from the exponents and signs of the coefficients. We identify the exponents of a multivariate polynomial as vectors and use their geometry to form the bounds. To each exponent vector we assign a sign, the sign of the corresponding coefficient. The bounds then depend on both the geometry of the exponent vectors and their signs. We present existing results for this generalization in a self-contained form. Using the existing results, we categorize polynomials with a fixed number of terms, up to four terms. We examine the overlap between conditions that show that a polynomial has at most one connected component where it obtains negative values. We also discuss whether existing ideas and bounds could be extended to polynomials that currently we do not have bounds for.Item The geometry of rank-one tensor completion(Society for Industrial and Applied Mathematics (SIAM), 2017) Kahle, Thomas; Kubjas, Kaie; Kummer, Mario; Rosen, Zvi; Otto von Guericke University Magdeburg; Statistics and Mathematical Data Science; Max Planck Institute for Mathematics in the Sciences; University of Pennsylvania; Department of Mathematics and Systems AnalysisThe geometry of the set of restrictions of rank-one tensors to some of their coordinates is studied. This gives insight into the problem of rank-one completion of partial tensors. Particular emphasis is put on the semialgebraic nature of the problem, which arises for real tensors with constraints on the parameters. The algebraic boundary of the completable region is described for tensors parametrized by probability distributions and where the number of observed entries equals the number of parameters. If the observations are on the diagonal of a tensor of format $d\times\dots\times d$, the complete semialgebraic description of the completable region is found.Item Holonomic least angle regression(2017-10-03) Härkönen, Marc; Sei, Tomonari; Kubjas, Kaie; Perustieteiden korkeakoulu; Hollanti, CamillaOne of the main problems studied in statistic is the fitting of models. If we assume that the "true" distribution belongs to some model, we can use data and observation to find the distribution that fits the best. Ideally, we would like to explain a large dataset with as few parameters as possible. This is because models with few parameters are simpler, so computation becomes faster and interpreting the model also becomes easier. The downside is that simpler models tend to have larger errors than less simple ones. In generalized linear models each parameter corresponds to one covariate. One might then ask which of these actually useful, or the most "impactful" in our fitting procedure, and how do we actually decide which covariates to include in the model. There have been numerous attempts at automatizing this process. Most notably, the Least Angle Regression algorithm, or LARS, by Efron et al. [2004] is a computationally efficient algorithm that ranks the covariates of a linear model. The LARS algorithm was extended by Hirose and Komaki [2010] for a class of distributions in the generalized linear model by using properties of the manifold of exponential families as dually flat manifolds. However, this extension assumes that the normalizing constant of the joint distribution of observations is "easy" to compute. This is often not the case, for excample the normalizing constant may contain complicated integral. We circumvent this issue if normalizing constant satisfies a holonomic system. In this thesis, we present a modification of the holonomic gradient method [Nakayama et al., 2011] and add it to the extended LARS algorithm. We call this the holonomic extended least angle regression algorithm, or HELARS. The algorithm was implemented using the statistical software R, and was tested with real and simulated datasets.Item Homotopy and homology of topological spaces: their functoriality and relations between them(2024-06-17) Yang, Michael; Orlich, Milo; Perustieteiden korkeakoulu; Kubjas, KaieTopology can be seen as a study of many generalizations. Since virtually any set of elements together with a defined topology can be considered to be a topological space, it would be natural for there to be many ways to generalize these spaces, i.e., unify them into groups of similarities. Perhaps the most useful criteria for such grouping would be homotopy equivalence, the notion that spaces that can be continuously and reversibly deformed into each other are equivalent. Its importance can be seen in many properties that are homotopy invariant, which means that these properties yield the same result for spaces that are homotopy equivalent. The homotopy groups and homology groups of a topological space are concepts that form the cornerstones of algebraic topology, and as it happens to be, they are also homotopy invariant properties. Hence, most calculations in analyzing the nature of a topological space $X$ begin with obtaining the homotopy and homology groups, which associates the space with a property that can be used to compare it to other toplogical spaces. Calculating these groups from their very definitions is generally straightforward for simpler spaces. However, for more complicated spaces, it is in most cases way more efficient to instead use certain derived tools and constructions to simplify the calculations. The thesis will begin with definitions of the $n$-th homotopy group $\pi_n(X)$ and homology group $\hm{n}(X)$ for a topological space $X$, and then move on to establishing a clear categorical relation between the groups $\pi_1(X)$ and $\hm{1}(X)$. Their homotopy invariance will be proven and this property will be utilized to find the homotopy and homology groups of a “sideways eight" graph in $\rr^2$, with the help of the Seifert--Van Kampen theorem for fundamental groups and the Mayer--Vietoris exact sequence for homology groups. The resulting fundamental group and first homology group for the graph are obtained through an elaborate process, with the aims of demonstrating the usefulness of the concepts involved.Item Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry(Society for Industrial and Applied Mathematics (SIAM), 2022-02-28) Belyaeva, Anastasiya; Kubjas, Kaie; Sun, Lawrence J.; Uhler, Caroline; Statistics and Mathematical Data Science; Department of Mathematics and Systems AnalysisThe spatial organization of the genome in the cell nucleus plays an important role for gene regulation, replication of the deoxyribonucleic acid (DNA), and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, and Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this paper, we study the problem of reconstructing the three-dimensional (3D) organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from the distance measurements derived from contact frequencies in diploid organisms. In fact, there areinfinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higher-order interactions) and prove identifiability under these conditions. Furthermore, we provide semidefinite programming formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higher-order contact frequency data and show that we can recover known genome organization patterns.Item Introduction to Persistent Homology(2022-05-31) Anttonen, Akseli; Kubjas, Kaie; Perustieteiden korkeakoulu; Kubjas, KaieItem Mathematical Modelling of Mental Rotation(2022-12-16) Koponen, Matias; Deny, Stéphane; Perustieteiden korkeakoulu; Kubjas, KaieItem Maximal number of subsets occurring as subsets(2021-11-11) Ruoppa, Lassi; Kohonen, Jukka; Perustieteiden korkeakoulu; Kubjas, KaieItem Maximum Likelihood Estimation of Symmetric Group-Based Models via Numerical Algebraic Geometry(Springer New York, 2019-02-15) Kosta, Dimitra; Kubjas, Kaie; University of Glasgow; Statistics and Mathematical Data Science; Department of Mathematics and Systems AnalysisPhylogenetic models admit polynomial parametrization maps in terms of the root distribution and transition probabilities along the edges of the phylogenetic tree. For symmetric continuous-time group-based models, Matsen studied the polynomial inequalities that characterize the joint probabilities in the image of these parametrizations (Matsen in IEEE/ACM Trans Comput Biol Bioinform 6:89–95, 2009). We employ this description for maximum likelihood estimation via numerical algebraic geometry. In particular, we explore an example where the maximum likelihood estimate does not exist, which would be difficult to discover without using algebraic methods.