From insights to innovations : data mining, visualization, and user interfaces

Doctoral thesis (article-based)
76, [78]
Dissertations in computer and information science. Report D, 5
This thesis is about data mining (DM) and visualization methods for gaining insight into multidimensional data. Novel exploratory data analysis tools and adaptive user interfaces are developed by tailoring and combining existing DM and visualization methods to serve different applications. The thesis presents new visual data mining (VDM) methods that are also implemented in software toolboxes and applied to industrial and biomedical signals.

First, we propose a method that has been applied to investigating industrial process data. The self-organizing map (SOM) is combined with scatterplots using traditional color linking or interactive brushing. The original contribution is to apply color linked or brushed scatterplots and the SOM to visually survey local dependencies between a pair of attributes in different parts of the SOM. Clusters can be visualized on a SOM with different colors, and we also present how a color coding can be obtained automatically by using a proximity preserving projection of the SOM model vectors.

Second, we present a new method for (interactive) visualization of cluster structures in a SOM. By using a contraction model, the regular grid of a SOM visualization is smoothly changed toward a presentation that better shows the proximities in the data space.

Third, we propose a novel VDM method for investigating the reliability of estimates resulting from a stochastic independent component analysis (ICA) algorithm. The method can also be extended to other problems of a similar kind. As a benchmarking task, we rank independent components estimated on a biomedical data set recorded from the brain and obtain a reasonable result.

We also utilize DM and visualization for mobile-awareness and personalization. We explore how to infer information about the usage context from features derived from sensory signals. The signals originate from a mobile phone with on-board sensors for ambient physical conditions. In previous studies, the signals have been transformed into descriptive (fuzzy or binary) context features. In this thesis, we present how the features can be transformed into higher-level patterns, contexts, by rather simple statistical methods: we propose and test minimum-variance cost time series segmentation, ICA, and principal component analysis (PCA) for this purpose. Both time series segmentation and PCA revealed meaningful contexts from the features in a visual data exploration.

We also present a novel type of adaptive soft keyboard whose aim is an ergonomically better, more comfortable keyboard. The method starts from a conventional keypad layout but gradually shifts the keys into new positions according to the user's grasp and typing pattern.

Related to the applications, we present two algorithms that can be used in a general context. First, we describe a binary mixing model for independent binary sources. The model resembles the ordinary ICA model, but the summation is replaced by the Boolean operator OR and the multiplication by AND. We propose a new heuristic method for estimating the binary mixing matrix and analyze its performance experimentally. The method works for signals that are sparse enough. We also discuss differences in the results when using different objective functions in the FastICA estimation algorithm. Second, we propose "global iterative replacement" (GIR), a novel, greedy variant of a merge-split segmentation method. Its performance compares favorably to that of the traditional top-down binary split segmentation algorithm.
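As an illustration of the binary mixing model described in the abstract, the following minimal sketch shows only the forward (mixing) direction: each observed signal is the OR over sources of (mixing coefficient AND source value). It is not the thesis's estimation method. The function name `boolean_mix`, the matrix layout (rows of `A` are observations, rows of `S` are sources), and the toy values are assumptions made for this example.

```python
def boolean_mix(A, S):
    """Boolean analogue of x = A s: x[i][t] = OR_j (A[i][j] AND S[j][t]).

    A: binary mixing matrix, one row per observed signal (assumed layout).
    S: binary sources, one row per source, one column per time step.
    """
    n_src = len(S)
    n_t = len(S[0])
    return [
        [int(any(row[j] and S[j][t] for j in range(n_src))) for t in range(n_t)]
        for row in A
    ]


# Toy example (hypothetical values): three observations of two sparse sources.
A = [[1, 0],   # observation 0 sees only source 0
     [1, 1],   # observation 1 sees both sources
     [0, 1]]   # observation 2 sees only source 1
S = [[1, 0, 0, 1],
     [0, 0, 1, 1]]

X = boolean_mix(A, S)
for row in X:
    print(row)
```

In the last column both sources are active at once, and observation 1 still reads 1 rather than 2. OR discards that overlap, so sparser sources overlap less often and lose less information in the mixture.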
adaptive user interface, adaptive keyboard, context-awareness, data mining, independent component analysis, information visualization, mobile-awareness, proximity preserving projection, self-organizing map, sensor fusion, time series segmentation, visual data mining
Other note
  • Johan Himberg, Jussi Ahola, Esa Alhoniemi, Juha Vesanto and Olli Simula, 2001. The self-organizing map as a tool in knowledge engineering. In: Nikhil R. Pal (editor), Pattern Recognition in Soft Computing Paradigm, volume 2 in FLSI Soft Computing Series, pages 38-65. [article1.pdf] © 2001 World Scientific Publishing Company. By permission.
  • Johan Himberg, 1998. Enhancing SOM-based data visualization by linking different data projections. In: Proceedings of the First International Symposium on Intelligent Data Engineering and Learning (IDEAL'98). Hong Kong, 14-16 October 1998, pages 427-434. [article2.pdf] © 1998 Springer-Verlag. By permission.
  • Johan Himberg, 2000. A SOM based cluster visualization and its application for false coloring. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000). Como, Italy, 24-27 July 2000, volume 3, pages 587-592. [article3.pdf] © 2000 IEEE. By permission.
  • Johan Himberg and Aapo Hyvärinen, 2003. Icasso: software for investigating the reliability of ICA estimates by clustering and visualization. In: Proceedings of the 13th IEEE International Workshop on Neural Networks for Signal Processing (NNSP 2003). Toulouse, France, 17-19 September 2003, pages 259-268. [article4.pdf] © 2003 IEEE. By permission.
  • Johan Himberg, Jani Mäntyjärvi and Panu Korpipää, 2001. Using PCA and ICA for exploratory data analysis in situation awareness. In: Proceedings of the 2001 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2001). Baden-Baden, Germany, 20-22 August 2001, pages 127-131. [article5.pdf] © 2001 IEEE. By permission.
  • Johan Himberg and Aapo Hyvärinen, 2001. Independent component analysis for binary data: an experimental study. In: Proceedings of the 3rd International Conference on Independent Component Analysis and Blind Signal Separation (ICA 2001). San Diego, California, USA, 9-12 December 2001, pages 552-556. [article6.pdf] © 2001 by authors.
  • Johan Himberg, Kalle Korpiaho, Heikki Mannila, Johanna Tikanmäki and Hannu T. T. Toivonen, 2001. Time series segmentation for context recognition in mobile devices. In: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM 2001). San Jose, California, USA, 29 November - 2 December, 2001, pages 203-210. [article7.pdf] © 2001 IEEE. By permission.
  • Johan Himberg, Jonna Häkkilä, Jani Mäntyjärvi and Petri Kangas, 2003. On-line personalization of a touch screen based keyboard. In: Proceedings of the 8th International Conference on Intelligent User Interfaces (IUI 2003). Miami, Florida, USA, 12-15 January 2003, pages 77-84.