Browsing by Author "Kurimo, Mikko, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland"
Emergence of representations in natural data (Aalto University, 2017)
Väyrynen, Jaakko; Creutz, Mathias, Dr., University of Helsinki, Finland; Signaalinkäsittelyn ja akustiikan laitos; Department of Signal Processing and Acoustics; Sähkötekniikan korkeakoulu; School of Electrical Engineering; Kurimo, Mikko, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
This dissertation models natural image and language data with data-driven methods, with a focus on interpreting the emergent representations. Cognitive development and processing learn to handle input from the surrounding environment. Similarly, data-driven methods offer a flexible way to find exploratory views of the data. Independent Component Analysis (ICA) is a proven unsupervised method, especially in the field of neural signal processing. Under the assumption of statistical independence, it can extract cognitively relevant source signals from seemingly garbled signal mixtures. The concept is closely related to sparse coding, which is neurobiologically efficient and offers one view of how sensory information is processed in the brain. In the analysis of small video segments, another statistical concept, temporal coherence, is applied and the results are compared to those of ICA. The learned representations share major characteristics with those measured in early processing in the visual cortex. A unified model that combines sparseness, temporal coherence and topological organization is introduced. With similar methodological tools, the focus then shifts to natural language data, using only minimal preprocessing in order to create language-independent methods. The meaning of words can be modeled with vector space models built from contextual co-occurrence information collected from a large corpus. In contrast to classical methods that rely on second-order statistics, the ICA method can reveal the underlying sparse structure and make the representation more interpretable.
In addition to validating the applied unsupervised methodology, the experimental results indicate that the parametrization of the data has a very large effect on the representation learned. With the developed analysis tools, the structure learned is matched to syntactic and semantic features at different levels. For translated sentence pairs, the result is a multilingual representation for words. The increased sparsity of the representations learned is validated by further nonlinear thresholding. The findings can be utilized to build distributional models for words which match better with semantic theories of word classes and relationships among word meanings in natural language processing tasks where more interpretability is desired.
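As a rough illustration of the contextual co-occurrence idea mentioned in the abstract, the following minimal sketch counts context words within a fixed window and compares the resulting word vectors by cosine similarity. It is not the dissertation's implementation: the toy corpus, the window size of 2, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example, and the ICA step that the dissertation applies to such vectors is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it.

    Hypothetical helper: tokenization is plain lowercasing and whitespace
    splitting, which stands in for 'minimal preprocessing'.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (an assumption for illustration; a real study would use a large corpus).
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

vectors = cooccurrence_vectors(toy_corpus, window=2)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])

print(f"cat~dog:  {sim_cat_dog:.3f}")
print(f"cat~milk: {sim_cat_milk:.3f}")
```

In this toy setting, "cat" and "dog" occur in similar contexts and therefore end up closer to each other than "cat" and "milk". The raw count vectors are what a method such as ICA would subsequently transform to look for a sparser, more interpretable representation.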