Deriving a Rule Set from a Large Set of Data
Helsinki University of Technology
Master's thesis
Date
2006
Major/Subject
Information Technology
Mcode
T-115
Language
en
Pages
37
Abstract
The acquisition of correct data is of great importance for all data mining tasks. Errors in product data can be very costly for a company, so improving data quality is a high priority. Making the acquisition process more efficient can also remove a possible bottleneck in product management. This work presents methods for finding rules and correlations in the data. Special emphasis is placed on methods capable of handling large amounts of data and on preprocessing the data so that it is easier to handle. Clustering is used to divide the data into smaller subsets that can be processed more efficiently than the whole data set; this also makes it easier to find local patterns in the data. The clustering is implemented using self-organizing maps. Both correlation analysis and association rules are used to find rules in the data, and both methods can be applied either globally to the whole data set or locally to the individual clusters. The methods are then applied to a product data set provided by Nokia Networks, where the goal is to predict data needed by an Enterprise Resource Planning (ERP) system from data in a Product Data Management (PDM) system.
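The abstract describes clustering the data with a self-organizing map and then mining association rules both globally and locally within each cluster. The Python sketch here illustrates that idea on a toy example; it is not the implementation used in the thesis. Assumptions, all hypothetical: the attribute names `A` to `D` and the records are invented, the map is one-dimensional with a Gaussian neighbourhood and farthest-point initialisation (chosen only for determinism), rules are limited to a single antecedent and a single consequent, and the support and confidence thresholds are illustrative. Correlation analysis is not shown.

```python
import math
import random
from itertools import permutations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_matching_unit(units, x):
    """Index of the prototype vector closest to x."""
    return min(range(len(units)), key=lambda j: sq_dist(units[j], x))


def init_units(data, n_units):
    # Farthest-point initialisation: an assumption made for determinism;
    # random or PCA-based initialisation is more common for SOMs.
    units = [list(data[0])]
    while len(units) < n_units:
        far = max(data, key=lambda x: min(sq_dist(u, x) for u in units))
        units.append(list(far))
    return units


def train_som(data, n_units=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Online training of a 1-D self-organizing map (Gaussian neighbourhood)."""
    rng = random.Random(seed)
    units = init_units(data, n_units)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.01)  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):     # visit samples in random order
            bmu = best_matching_unit(units, x)
            for j, w in enumerate(units):
                # Units near the winner on the map grid move more towards x.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return units


def mine_rules(transactions, min_support=0.6, min_confidence=0.8):
    """Single-item rules a -> b whose support and confidence meet the thresholds."""
    if not transactions:
        return []
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        support, confidence = n_ab / n, n_ab / n_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical binary product attributes and toy records.
ATTRS = ["A", "B", "C", "D"]
records = [{"A", "B"}] * 4 + [{"C", "D"}] * 4
vectors = [[1.0 if a in r else 0.0 for a in ATTRS] for r in records]

# 1) Cluster the records with the SOM; each map unit acts as one cluster.
units = train_som(vectors, n_units=2)
clusters = {}
for r, v in zip(records, vectors):
    clusters.setdefault(best_matching_unit(units, v), []).append(r)

# 2) Mine rules globally on all records and locally within each cluster.
global_rules = mine_rules(records)
local_rules = {k: mine_rules(ts) for k, ts in sorted(clusters.items())}

print("global:", global_rules)
for k, rules in local_rules.items():
    print(f"cluster {k}:", rules)
```

With these thresholds the global pass finds no rules, because each attribute pair co-occurs in only half of the records (support 0.5). Each cluster, by contrast, yields A → B and B → A, or C → D and D → C, with support and confidence 1.0. This is the kind of local pattern that clustering the data first is intended to expose.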
Supervisor
Simula, Olli
Thesis advisor
Silvola, Risto
Keywords
self-organizing map, itseorganisoiva kartta, självorganiserande karta, clustering, klusterointi, kluster, association rules, assosiaatio, associationsregler, product data, säännöt, produktdata, tuotedata