Exploring the structure in deep networks: Group, manifold and category theory
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
77
Abstract
Modern deep learning has achieved remarkable success in recent years, yet we lack a comprehensive understanding of why it performs well in some tasks while failing in others. This thesis develops a mathematical framework for understanding and designing neural networks through the lenses of group theory, differential geometry, and category theory.
We begin by analyzing the symmetry structure of parameter spaces. For the standard deep learning architecture of linear layers, nonlinear activations, and regularization, we prove that the linear part possesses maximal $\mathrm{GL}_n(\mathbb{R})$ symmetry. Nonlinear activations break this symmetry to proper subgroups; we analyze ReLU and sigmoid as examples. We then study how regularization under different norms, in particular Schatten-$p$ norms and entry-wise $\ell_p$ norms, affects the symmetry. This analysis connects the choice of activation and regularization to the geometry of the representations we want to learn.
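The symmetry statements above can be checked numerically on a toy two-layer network. The following is a minimal sketch, not the thesis's construction: the layer sizes, the helper names (`reparam`, `linear_net`, `relu_net`), and the choice of a positive diagonal scaling composed with a permutation as a ReLU-compatible transformation are all assumptions made for illustration. It reparametrizes the hidden layer by $(W_2, W_1) \mapsto (W_2 A^{-1}, A W_1)$ and compares outputs, then compares how the Frobenius (Schatten-2), nuclear (Schatten-1), and entry-wise $\ell_1$ norms respond to orthogonal and permutation transformations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # hidden width (illustrative choice)

W1 = rng.normal(size=(n, 3))   # first layer:  R^3 -> R^n
W2 = rng.normal(size=(2, n))   # second layer: R^n -> R^2
X = rng.normal(size=(3, 32))   # a batch of 32 inputs, one per column

def relu(z):
    return np.maximum(z, 0.0)

def linear_net(W2, W1, X):
    return W2 @ (W1 @ X)

def relu_net(W2, W1, X):
    return W2 @ relu(W1 @ X)

def reparam(W2, W1, A):
    """Act on the hidden layer: (W2, W1) -> (W2 A^{-1}, A W1)."""
    return W2 @ np.linalg.inv(A), A @ W1

# A generic invertible matrix (a Gaussian matrix is invertible almost surely).
A_generic = rng.normal(size=(n, n))

# A permutation composed with a positive diagonal scaling (assumed example
# of a transformation that commutes with ReLU).
P = np.eye(n)[rng.permutation(n)]
D = np.diag(rng.uniform(0.5, 2.0, size=n))
A_monomial = P @ D

# Purely linear network: any invertible A leaves the function unchanged.
linear_generic_ok = np.allclose(
    linear_net(*reparam(W2, W1, A_generic), X), linear_net(W2, W1, X))

# ReLU network: a generic A changes the function, the monomial one does not.
relu_generic_ok = np.allclose(
    relu_net(*reparam(W2, W1, A_generic), X), relu_net(W2, W1, X))
relu_monomial_ok = np.allclose(
    relu_net(*reparam(W2, W1, A_monomial), X), relu_net(W2, W1, X))

# Regularizers: Schatten norms depend only on singular values, so they are
# invariant under orthogonal Q; the entry-wise l1 norm generally is not,
# but it is invariant under permutations.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix
fro_invariant = np.isclose(np.linalg.norm(Q @ W1, "fro"), np.linalg.norm(W1, "fro"))
nuc_invariant = np.isclose(np.linalg.norm(Q @ W1, "nuc"), np.linalg.norm(W1, "nuc"))
l1_invariant_orth = np.isclose(np.abs(Q @ W1).sum(), np.abs(W1).sum())
l1_invariant_perm = np.isclose(np.abs(P @ W1).sum(), np.abs(W1).sum())

print(linear_generic_ok, relu_generic_ok, relu_monomial_ok)
print(fro_invariant, nuc_invariant, l1_invariant_orth, l1_invariant_perm)
```

The batch of inputs is used so that the ReLU comparison is not decided by a single input whose pre-activations all happen to share one sign. Which subgroups the thesis actually derives for ReLU and sigmoid is not stated in the abstract; the monomial example here is only one familiar case.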
We then introduce Path Equivariant Networks (PENs), which generalize classical group equivariance from the point-wise constraint $F(g \cdot x) = \rho(g) \cdot F(x)$ to path-wise constraints on manifolds. We prove that classical group equivariance arises as a special case under certain conditions. As an extension of this idea, we introduce a content-pose decomposition, which factors the data manifold into a symmetry-carrying pose (living in the group $G$) and a symmetry-free content (living in the quotient $U = X/G$).
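The point-wise constraint $F(g \cdot x) = \rho(g) \cdot F(x)$ and the idea of a content-pose split can be illustrated in the simplest classical setting. The sketch below is an assumption-laden toy, not an implementation of PENs: it takes $G = \mathbb{Z}_n$ acting on length-$n$ signals by cyclic shifts, uses circular convolution as the equivariant map with $\rho$ equal to the same shift action, and defines a hypothetical `decompose` that reads the pose off the position of the largest entry (which requires that entry to be unique).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)        # a signal on Z_n
kernel = rng.normal(size=n)   # fixed convolution kernel (illustrative)

def act(g, x):
    """Action of g in Z_n on a signal: cyclic shift by g positions."""
    return np.roll(x, g)

def F(x):
    """Circular convolution with the fixed kernel, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel)))

def decompose(x):
    """Split x into (pose, content).

    pose:    index of the largest entry, an element of Z_n
    content: x shifted so that its largest entry sits at index 0,
             a representative of the orbit of x in X/G
    Assumes the largest entry is unique.
    """
    pose = int(np.argmax(x))
    content = np.roll(x, -pose)
    return pose, content

def recompose(pose, content):
    """Inverse of decompose: apply the pose to the content."""
    return np.roll(content, pose)

g = 3

# Classical equivariance: F(g . x) == g . F(x).
equivariant = np.allclose(F(act(g, x)), act(g, F(x)))

# Content is invariant under the group; pose transforms by the group law.
pose, content = decompose(x)
pose_g, content_g = decompose(act(g, x))
content_invariant = np.allclose(content, content_g)
pose_shifts = pose_g == (pose + g) % n
reconstructs = np.allclose(recompose(pose, content), x)

print(equivariant, content_invariant, pose_shifts, reconstructs)
```

Canonicalizing by the largest entry is only one convenient way to pick an orbit representative for this toy group; the abstract does not say how the thesis constructs the decomposition on general data manifolds.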
Finally, we provide a categorical formalization where equivariant maps are natural transformations between functors. The naturality condition captures the essence of symmetry-preserving computation.
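For orientation, the standard textbook form of this statement can be written out as follows; the thesis may work with a different or more general categorical setup, so the notation here ($\mathbf{B}G$ for the group $G$ viewed as a one-object category with object $\ast$, and functors $X, Y \colon \mathbf{B}G \to \mathbf{Set}$ encoding two $G$-actions) is an assumption. A natural transformation $X \Rightarrow Y$ has a single component $F \colon X(\ast) \to Y(\ast)$, and naturality is exactly the commuting square below (typeset assuming `amsmath`).

```latex
\[
\begin{array}{ccc}
X(\ast) & \xrightarrow{\;F\;} & Y(\ast) \\
{\scriptstyle X(g)}\big\downarrow & & \big\downarrow{\scriptstyle Y(g)} \\
X(\ast) & \xrightarrow{\;F\;} & Y(\ast)
\end{array}
\qquad
F \circ X(g) = Y(g) \circ F \quad \text{for all } g \in G .
\]
```

Writing $X(g)(x) = g \cdot x$ and $Y(g) = \rho(g)$, the equation on the right is the familiar constraint $F(g \cdot x) = \rho(g) \cdot F(x)$.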
This work contributes to a theoretical foundation for the view that designing a neural network is fundamentally a choice of which structures in the data to retain.