Deriving a Rule Set from a Large Set of Data

Helsinki University of Technology | Master's thesis

Date

2006

Major/Subject

Information technology

Mcode

T-115

Language

en

Pages

37

Abstract

Acquiring correct data is essential for every data mining task. Errors in product data can be very costly for a company, so improving data quality is a high priority. Making the acquisition process more efficient can also remove a possible bottleneck in product management. This work presents methods for finding rules and correlations in data, with special emphasis on methods that can handle large amounts of data and on preprocessing that makes the data easier to handle. Clustering divides the data into smaller data sets that can be processed more efficiently than the whole data set and makes local patterns easier to find. The clustering is implemented using self-organizing maps. Rules are sought using both correlation analysis and association rules, and both methods can be applied globally to the whole data set or locally to individual clusters. The methods are then applied to a product data set provided by Nokia Networks, where the goal is to predict the data needed for an Enterprise Resource Planning system from data in a Product Data Management system.
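
The difference between global and local rule mining described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that cluster labels have already been produced by an earlier clustering step (for example a self-organizing map), restricts itself to rules with a single antecedent, and uses hypothetical attribute names (`unit`, `lifecycle`, `erp_procurement`), toy records, and arbitrary thresholds chosen only for illustration.

```python
from collections import Counter, defaultdict


def mine_rules(records, target, min_support=0.3, min_confidence=0.8):
    """Find single-antecedent rules (attr=value) -> (target=value).

    support    = fraction of all records with both antecedent and consequent
    confidence = fraction of antecedent records that also have the consequent
    """
    n = len(records)
    antecedent_counts = Counter()  # occurrences of each attr=value
    pair_counts = Counter()        # co-occurrences of attr=value with a target value
    for rec in records:
        for attr, value in rec.items():
            if attr == target:
                continue
            antecedent_counts[(attr, value)] += 1
            pair_counts[((attr, value), rec[target])] += 1

    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


def mine_rules_per_cluster(records, labels, target, **thresholds):
    """Apply mine_rules separately to the records of each cluster."""
    groups = defaultdict(list)
    for rec, label in zip(records, labels):
        groups[label].append(rec)
    return {label: mine_rules(group, target, **thresholds)
            for label, group in groups.items()}


# Hypothetical toy data: PDM-style attributes plus one ERP-style target field.
records = [
    {"unit": "pcs", "lifecycle": "released", "erp_procurement": "buy"},
    {"unit": "pcs", "lifecycle": "released", "erp_procurement": "buy"},
    {"unit": "pcs", "lifecycle": "released", "erp_procurement": "buy"},
    {"unit": "pcs", "lifecycle": "released", "erp_procurement": "make"},
    {"unit": "pcs", "lifecycle": "released", "erp_procurement": "make"},
    {"unit": "license", "lifecycle": "released", "erp_procurement": "make"},
]
# Assumed output of a prior clustering step, one label per record.
labels = [0, 0, 0, 1, 1, 1]

global_rules = mine_rules(records, target="erp_procurement")
local_rules = mine_rules_per_cluster(records, labels, target="erp_procurement")

print("global:", global_rules)
for label, rules in sorted(local_rules.items()):
    print("cluster", label, rules)
```

On this toy data no rule reaches the confidence threshold globally, because `unit=pcs` is split between both target values, while within each cluster the same antecedent predicts the target consistently. This is the kind of local pattern that clustering is meant to expose.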

Supervisor

Simula, Olli

Thesis advisor

Silvola, Risto

Keywords

self-organizing map, clustering, association rules, rules, product data
