Text mining with the WEBSOM

Doctoral thesis (article-based)

Date

2000-12-11

Language

en

Pages

54, [81]

Series

Acta polytechnica Scandinavica. Ma, Mathematics and computing series, 110

Abstract

The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps, and outliers, can be communicated naturally using spatial relationships, shading, and colors. In the WEBSOM method, the self-organizing map (SOM) algorithm is used to automatically organize very large and high-dimensional collections of text documents onto two-dimensional map displays. The map forms a document landscape where similar documents appear close to each other at points of the regular map grid. The landscape can be labeled with automatically identified descriptive words that convey properties of each area and also act as landmarks during exploration. With the help of an HTML-based interactive tool, the ordered landscape can be used in browsing the document collection and in performing searches on the map. An organized map offers an overview of an unknown document collection, helping the user familiarize herself with the domain. Map displays that are already familiar can be used as visual frames of reference for conveying properties of unknown text items. Static, thematically arranged document landscapes provide meaningful backgrounds for dynamic visualizations of, for example, time-related properties of the data. Search results can be visualized in the context of related documents. Experiments on document collections of various sizes, text types, and languages show that the WEBSOM method is scalable and generally applicable. Preliminary results in a text retrieval experiment indicate that, even when the additional value provided by the visualization is disregarded, the document maps perform at least comparably with more conventional retrieval methods.
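
The core idea in the abstract, placing documents on a regular two-dimensional grid so that similar documents land near each other, can be illustrated with a very small self-organizing map. The sketch below is not the WEBSOM implementation: the normalized word-count encoding, the grid size, the learning-rate and neighborhood schedules, and all function names (bag_of_words, best_matching_unit, train_som) are assumptions chosen for illustration, and the thesis articles describe considerably more elaborate document encodings and speedups for large collections.

```python
import math
import random


def bag_of_words(docs):
    """Encode each document as a unit-length word-count vector (illustrative encoding)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Return the (row, col) grid position whose model vector is closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((a - b) ** 2 for a, b in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best


def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small SOM with a Gaussian neighborhood (assumed, simple schedules)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):
            frac = step / total
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly toward zero
            sigma = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))  # shrinking radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * sigma ** 2))
                # Pull each unit toward x, more strongly near the winning unit.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    docs = [
        "neural network learning",
        "neural network training",
        "stock market prices",
        "stock market trading",
    ]
    _, vectors = bag_of_words(docs)
    som = train_som(vectors)
    for doc, vec in zip(docs, vectors):
        print(best_matching_unit(som, vec), doc)
```

Running the script prints one grid position per document. Browsing or searching on such a map then amounts to looking up which documents share or neighbor a given grid position.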

Keywords

self-organizing map, document maps, visual user interfaces, information exploration, text retrieval, large text collections

Parts

  • Lagus, K., Kaski, S., Honkela, T., and Kohonen, T. (1996). Browsing digital libraries with the aid of self-organizing maps. Proceedings of the Fifth International World Wide Web Conference WWW5, May 6-10, Paris, France, pp. 71-79. [article1.pdf] © 1996 authors.
  • Lagus, K., Honkela, T., Kaski, S., and Kohonen, T. (1996). Self-organizing maps of document collections: a new approach to interactive exploration. In Simoudis, E., Han, J., and Fayyad, U., editors, Proceedings of the Second International Conference on Knowledge Discovery & Data Mining (KDD'96), pp. 238-243. AAAI Press, Menlo Park, CA. [article2.pdf] © 1996 AAAI. Reprinted with permission.
  • Lagus, K. (1998). Generalizability of the WEBSOM method to document collections of various types. In Proceedings of 6th European Congress on Intelligent Techniques & Soft Computing (EUFIT'98), vol. 1, pp. 210-214, Verlag Mainz, Aachen, Germany. [article3.pdf] © 1998 authors.
  • Kaski, S., Honkela, T., Lagus, K., and Kohonen, T. (1998). WEBSOM – self-organizing maps of document collections. Neurocomputing, vol. 21, pp. 101-117. [article4.pdf] © 1998 Elsevier Science. Reprinted with permission.
  • Lagus, K. and Kaski, S. (1999). Keyword selection method for characterizing text document maps. In Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN'99), vol. 1, pp. 371-376. IEE Press, London. [article5.pdf] © 1999 IEE. Reprinted with permission.
  • Lagus, K., Honkela, T., Kaski, S., and Kohonen, T. (1999). WEBSOM for textual data mining. Artificial Intelligence Review, vol. 13, issue 5/6, pp. 345-364. [article6.pdf] © 1999 Kluwer Academic Publishers. Reprinted with permission.
  • Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., and Saarela, A. (2000). Self organization of a massive text document collection. IEEE Transactions on Neural Networks, Special Issue on Neural Networks for Data Mining and Knowledge Discovery, vol. 11, pp. 574-585. [article7.pdf] © 2000 IEEE. Reprinted with permission. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
  • Lagus, K. (2000). Text retrieval using self-organized document maps. Technical Report A61, Helsinki University of Technology, Laboratory of Computer and Information Science. ISBN 951-22-5145-0. [article8.pdf] © 2000 author.

Permanent link to this item

https://urn.fi/urn:nbn:fi:tkk-002573