Text mining with the WEBSOM
Doctoral thesis (article-based)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Date
2000-12-11
Language
en
Pages
54, [81]
Series
Acta polytechnica Scandinavica. Ma, Mathematics and computing series, 110
Abstract
The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps, and outliers, can be communicated naturally using spatial relationships, shading, and colors. In the WEBSOM method the self-organizing map (SOM) algorithm is used to automatically organize very large and high-dimensional collections of text documents onto two-dimensional map displays. The map forms a document landscape where similar documents appear close to each other at points of the regular map grid. The landscape can be labeled with automatically identified descriptive words that convey properties of each area and also act as landmarks during exploration. With the help of an HTML-based interactive tool, the ordered landscape can be used in browsing the document collection and in performing searches on the map. An organized map offers an overview of an unknown document collection, helping the user familiarize herself with the domain. Map displays that are already familiar can be used as visual frames of reference for conveying properties of unknown text items. Static, thematically arranged document landscapes provide meaningful backgrounds for dynamic visualizations of, for example, time-related properties of the data. Search results can be visualized in the context of related documents. Experiments on document collections of various sizes, text types, and languages show that the WEBSOM method is scalable and generally applicable. Preliminary results in a text retrieval experiment indicate that, even when the additional value provided by the visualization is disregarded, the document maps perform at least comparably with more conventional retrieval methods.
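The core idea in the abstract, placing similar documents at nearby points of a regular two-dimensional grid, can be illustrated with a small self-organizing map. The following is a minimal sketch and not the WEBSOM implementation: the toy four-term document vectors, the function names `train_som` and `best_matching_unit`, the 3x3 grid, and the linearly decaying learning-rate and neighborhood schedules are all assumptions chosen for illustration. The document encoding, labeling, and scaling techniques of the thesis are not reproduced here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose model vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(vectors, rows, cols, epochs=40, seed=0):
    """Train a rows x cols SOM on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialized model vector per node of the regular grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            # Assumed schedules: both learning rate and neighborhood shrink linearly.
            alpha = 0.5 * (1.0 - frac)
            sigma = max(0.5, 0.5 * max(rows, cols) * (1.0 - frac))
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Nodes near the winner on the grid are pulled toward x more strongly.
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = alpha * math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical toy "documents": weights of four terms, two rough topics.
docs = {
    "doc_a": [1.0, 1.0, 0.0, 0.0],
    "doc_b": [1.0, 0.8, 0.0, 0.0],
    "doc_c": [0.0, 0.0, 1.0, 1.0],
    "doc_d": [0.0, 0.0, 0.8, 1.0],
}
som = train_som(list(docs.values()), rows=3, cols=3)
placement = {name: best_matching_unit(som, vec) for name, vec in docs.items()}
for name, node in placement.items():
    print(name, "->", node)
```

Each document is assigned to the grid node with the nearest model vector, so the printed coordinates play the role of positions on the document landscape. On a realistic collection the vectors would have thousands of dimensions and the grid many more nodes, which is where the scaling concerns discussed in the thesis come in.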
Keywords
self-organizing map, document maps, visual user interfaces, information exploration, text retrieval, large text collections
Parts
- Lagus, K., Kaski, S., Honkela, T., and Kohonen, T. (1996). Browsing digital libraries with the aid of self-organizing maps. Proceedings of the Fifth International World Wide Web Conference WWW5, May 6-10, Paris, France, pp. 71-79. [article1.pdf] © 1996 authors.
- Lagus, K., Honkela, T., Kaski, S., and Kohonen, T. (1996). Self-organizing maps of document collections: a new approach to interactive exploration. In Simoudis, E., Han, J., and Fayyad, U., editors, Proceedings of the Second International Conference on Knowledge Discovery & Data Mining (KDD'96), pp. 238-243. AAAI Press, Menlo Park, CA. [article2.pdf] © 1996 AAAI. Reprinted with permission.
- Lagus, K. (1998). Generalizability of the WEBSOM method to document collections of various types. In Proceedings of 6th European Congress on Intelligent Techniques & Soft Computing (EUFIT'98), vol. 1, pp. 210-214, Verlag Mainz, Aachen, Germany. [article3.pdf] © 1998 authors.
- Kaski, S., Honkela, T., Lagus, K., and Kohonen, T. (1998). WEBSOM – self-organizing maps of document collections. Neurocomputing, vol. 21, pp. 101-117. [article4.pdf] © 1998 Elsevier Science. Reprinted with permission.
- Lagus, K. and Kaski, S. (1999). Keyword selection method for characterizing text document maps. In Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN'99), vol. 1, pp. 371-376. IEE Press, London. [article5.pdf] © 1999 IEE. Reprinted with permission.
- Lagus, K., Honkela, T., Kaski, S., and Kohonen, T. (1999). WEBSOM for textual data mining. Artificial Intelligence Review, vol. 13, issue 5/6, pp. 345-364. [article6.pdf] © 1999 Kluwer Academic Publishers. Reprinted with permission.
- Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., and Saarela, A. (2000). Self organization of a massive text document collection. IEEE Transactions on Neural Networks, Special Issue on Neural Networks for Data Mining and Knowledge Discovery, vol. 11, pp. 574-585. [article7.pdf] © 2000 IEEE. Reprinted with permission. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
- Lagus, K. (2000). Text retrieval using self-organized document maps. Technical Report A61, Helsinki University of Technology, Laboratory of Computer and Information Science. ISBN 951-22-5145-0. [article8.pdf] © 2000 author.