Extending data mining techniques for frequent pattern discovery: trees, low-entropy sets, and crossmining

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Mannila, Heikki, Prof.
dc.contributor.author Heikinheimo, Hannes
dc.date.accessioned 2012-08-24T07:42:15Z
dc.date.available 2012-08-24T07:42:15Z
dc.date.issued 2010
dc.identifier.isbn 978-952-60-3004-3 (electronic)
dc.identifier.isbn 978-952-60-3003-6 (printed)
dc.identifier.issn 1797-5069
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/4738
dc.description.abstract The idea of frequent pattern discovery is to find frequently occurring events in large databases. Such data mining techniques can be useful in various domains. For instance, in recommendation and e-commerce systems, frequently occurring combinations of purchased products are essential for modeling user preferences. In the ecological domain, frequently co-occurring groups of species can provide insight into species interaction dynamics. Over the past few years, most frequent pattern mining research has concentrated on the efficiency (speed) of mining algorithms. However, it has been argued within the community that while the efficiency of the mining task is no longer a bottleneck, there is still an urgent need for methods that derive compact yet high-quality results with good application properties. This thesis aims to address that need. The first part of the thesis introduces a new class of tree patterns for expressing hierarchies of general and more specific attributes in unstructured binary data. The new pattern class is shown to have advantageous properties and to discover relationships in data that cannot be expressed with traditional frequent itemset or association rule patterns alone. The second and third parts of the thesis discuss the use of entropy as a score measure for frequent pattern mining. They define a new pattern class, low-entropy sets, which can express more general types of occurrence structure than frequent itemsets can. The concept also extends easily to tree patterns. Furthermore, experiments show that when the minimum description length (MDL) principle is used to select low-entropy sets, the resulting pattern collections are in most cases much smaller than those obtained with frequent itemsets. The fourth part of the thesis examines the idea of crossmining itemsets, that is, relating itemsets to numerical variables in a database of mixed data types.
The problem is formally defined and shown to be NP-hard, although it can be solved approximately within a constant factor of the optimal solution. Experiments show that the algorithm finds itemsets that convey structure in both the binary and the numerical parts of the data. en
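The entropy score mentioned in the abstract can be illustrated with a short Python sketch. It assumes the definition commonly used for low-entropy sets: the entropy of an attribute set X is the Shannon entropy of the empirical joint distribution of X's values in the data, H(X) = -Σ_x p(X = x) log2 p(X = x), and a set is kept when H(X) is at most a user-given threshold. Because adding an attribute can never decrease joint entropy, an Apriori-style levelwise search can prune candidates. The function names (`entropy`, `support`, `low_entropy_sets`), the threshold parameter `max_h`, and the toy data are hypothetical; this is not the thesis's implementation, and the MDL-based pattern selection step is not shown.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, attrs):
    """Joint Shannon entropy (in bits) of the binary attributes `attrs`,
    estimated from the empirical distribution of their value patterns in `rows`."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def support(rows, attrs):
    """Frequent-itemset support: fraction of rows in which every attribute is 1."""
    return sum(all(row[a] == 1 for a in attrs) for row in rows) / len(rows)


def low_entropy_sets(rows, n_attrs, max_h):
    """Levelwise (Apriori-style) search for all attribute sets with entropy <= max_h.

    Joint entropy never decreases when an attribute is added, so every subset
    of a low-entropy set is also low-entropy; candidates with a high-entropy
    subset are pruned without computing their entropy.
    """
    level = [(a,) for a in range(n_attrs) if entropy(rows, (a,)) <= max_h]
    result = list(level)
    k = 1  # size of the sets in `level`
    while level:
        prev = set(level)
        candidates = set()
        for x, y in combinations(level, 2):
            if x[:-1] == y[:-1]:  # join sorted k-sets sharing their first k-1 attributes
                cand = tuple(sorted(set(x) | set(y)))
                if all(sub in prev for sub in combinations(cand, k)):
                    candidates.add(cand)
        level = sorted(c for c in candidates if entropy(rows, c) <= max_h)
        result.extend(level)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy data: attribute 1 copies attribute 0, attribute 2 is its
    # complement, and attribute 3 is unrelated to the others.
    rows = [(1, 1, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 0)]
    print(low_entropy_sets(rows, 4, 1.0))
    # [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
    print(entropy(rows, (0, 2)), support(rows, (0, 2)))  # 1.0 0.0
```

In the toy data, attribute 2 is always the complement of attribute 0, so the itemset {0, 2} never occurs (support 0), yet the attribute set {0, 2} has low entropy. This is one example of occurrence structure that a frequent itemset cannot capture but an entropy score can.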
dc.format.extent Online book (1837 KB, 103 pp.)
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Aalto-yliopiston teknillinen korkeakoulu en
dc.relation.ispartofseries TKK dissertations in information and computer science, 15 en
dc.subject.other Computer science
dc.title Extending data mining techniques for frequent pattern discovery: trees, low-entropy sets, and crossmining en
dc.type G4 Doctoral dissertation (monograph) en
dc.contributor.school Aalto-yliopiston teknillinen korkeakoulu fi
dc.contributor.department Tietojenkäsittelytieteen laitos fi
dc.contributor.department Department of Information and Computer Science en
dc.subject.keyword data analysis en
dc.subject.keyword frequent patterns en
dc.subject.keyword trees en
dc.subject.keyword entropy en
dc.subject.keyword minimum description length en
dc.subject.keyword pattern selection en
dc.subject.keyword clustering en
dc.subject.keyword mining mixed data types en
dc.identifier.urn URN:ISBN:978-952-60-3004-3
dc.type.dcmitype text en
dc.type.ontasot Doctoral dissertation (monograph) en
dc.contributor.supervisor Mannila, Heikki, Prof.

