Multiple hypothesis testing in pattern discovery

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Hanhijärvi, Sami
dc.contributor.author Puolamäki, Kai
dc.contributor.author Garriga, Gemma C.
dc.date.accessioned 2011-11-28T13:24:13Z
dc.date.available 2011-11-28T13:24:13Z
dc.date.issued 2009
dc.identifier.isbn 978-952-248-181-8
dc.identifier.issn 1797-5042
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/901
dc.description.abstract The problem of multiple hypothesis testing arises when more than one hypothesis is tested simultaneously for statistical significance. This situation is very common in data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework so that it can be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test. en
dc.format.extent 31
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Helsinki University of Technology en
dc.publisher Teknillinen korkeakoulu fi
dc.relation.ispartofseries TKK reports in information and computer science en
dc.relation.ispartofseries 21 en
dc.subject.other Computer science en
dc.title Multiple hypothesis testing in pattern discovery en
dc.type D4 Published development or research report or study en
dc.contributor.school Faculty of Information and Natural Sciences en
dc.contributor.school Informaatio- ja luonnontieteiden tiedekunta fi
dc.contributor.department Department of Information and Computer Science en
dc.contributor.department Tietojenkäsittelytieteen laitos fi
dc.subject.keyword multiple hypothesis testing en
dc.subject.keyword randomization en
dc.subject.keyword empirical p-values en
dc.subject.keyword frequent itemsets en
dc.subject.keyword pattern mining en
dc.identifier.urn urn:nbn:fi:tkk-013062
dc.type.dcmitype text en
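
To make the abstract's terms concrete, the following is a minimal sketch of two standard building blocks: an empirical p-value computed from randomized samples, and Holm's step-down procedure, which controls the FWER in the strong sense. It is not the method proposed in the report, whose details are not given in this record. The function names and example numbers are assumptions chosen for illustration.

```python
def empirical_p_value(observed, null_samples):
    """Conservative empirical p-value from randomization.

    Counts how many null (randomized) statistics are at least as extreme
    as the observed one; the +1 terms keep the p-value strictly positive.
    """
    exceed = sum(1 for s in null_samples if s >= observed)
    return (1 + exceed) / (1 + len(null_samples))


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure (a standard FWER-controlling method).

    Returns a list of booleans, True where the hypothesis is rejected.
    NOTE: illustrative stand-in, not the report's own algorithm.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # Stop at the first non-rejection.
    return reject


# Hypothetical p-values for four patterns (for example, four itemsets).
print(holm_reject([0.001, 0.04, 0.03, 0.5]))
```

With n randomized samples the smallest attainable empirical p-value is 1/(n+1), so n has to be large relative to the number of hypotheses for any of them to pass a threshold such as alpha/m.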

