Advances in the theory of nearest neighbor distributions
Loading...
Journal Title
Journal ISSN
Volume Title
Aalto-yliopiston teknillinen korkeakoulu |
Doctoral thesis (monograph)
Checking the digitized thesis and permission for publishing
Instructions for the author
Instructions for the author
Author
Date
2010
Major/Subject
Mcode
Degree programme
Language
en
Pages
Verkkokirja (1586 KB, 127 s.)
Series
TKK dissertations in information and computer science,
18
Abstract
A large part of non-parametric statistical techniques are in one way or another related to the geometric properties of random point sets. This connection is present both in the design of estimators and theoretical convergence studies. One such relation between geometry and probability occurs in the application of non-parametric techniques for computing information theoretic entropies: it has been shown that the moments of the nearest neighbor distance distributions for a set of independent identically distributed random variables are asymptotically characterized by the Rényi entropies of the underlying probability density. As entropy estimation is a problem of major importance, this connection motivates an extensive study of nearest neighbor distances and distributions. In this thesis, new results in the theory of nearest neighbor distributions are derived using both geometric and probabilistic proof techniques. The emphasis is on results that are useful for finite samples and not only in the asymptotic limit of an infinite sample. Previously, in the literature it has been shown that after imposing sufficient regularity assumptions, the moments of the nearest neighbor distances can be approximated by invoking a Taylor series argument providing the connection to the Rényi entropies. However, the theoretical results provide limited understanding to the nature of the error in the approximation. As a central result of the thesis, it is shown that if the random points take values in a compact set (e.g. according to the uniform distribution), then under sufficient regularity, a higher order moment expansion is possible. Asymptotically, the result completely characterizes the error for the original low order approximation. Instead of striving for exact computation of the moments through a Taylor series expansion, in some cases inequalities are more useful. In the thesis, it is shown that concrete upper and lower bounds can be established under general assumptions. In fact, the upper bounds rely only on a geometric analysis. The thesis also contains applications to two problems in nonparametric statistics, residual variance and Rényi entropy estimation. A well-established nearest neighbor entropy estimator is analyzed and it is shown that by taking the boundary effect into account, estimation bias can be significantly reduced. Secondly, the convergence properties of a recent residual variance estimator are analyzed.Description
Keywords
nearest neighbor, computational geometry, entropy, residual variance, correlation, nonparametric