Effect of different distance measures in result of cluster analysis

School of Engineering | Master's thesis
Degree programme
Master’s Programme in Geomatics
The objective of this master's thesis was to explore distance measures that can be used in clustering and to evaluate how the choice of distance measure in the K-medoid clustering method affects the clustering output. The distance measures used in this research are the Euclidean, squared Euclidean, Manhattan, Chebyshev and Mahalanobis distances. To achieve the research objective, the K-medoid method was applied with each distance measure to a spatial dataset to explore the information revealed by each measure. The effect of each distance measure on the output is documented, and the outputs are compared with each other to reveal the differences between the measures. The study starts with a literature review of the cluster analysis process, in which the necessary steps for performing cluster analysis are explained. The literature section also describes different clustering methods and the particular characteristics of each, which serve as the basis for the choice of clustering method. A description and analysis of the data follow, and then an interpretation of the clustering result and its use for terrain analysis. Terrain analysis is significant in the forest industry, the military and crisis management, and is usually concerned with the off-road mobility of a vehicle or a group of vehicles between given locations. In terrain analysis, clustering can be used to group similar areas and determine the off-road mobility of a particular vehicle. This result can be further categorized according to the suitability of the items in each cluster and interpreted using expert evaluation to reveal useful information about mobility in a terrain. Cluster validation measures were applied to the clustering output to determine the differences between the distance measures.
The findings of this study indicate that, in the study area, the clustering results differ to some degree when different distance measures are used. These differences are then interpreted with the help of the input dataset and expert opinion to understand the effect of the distance measures on the dataset. Finally, the study provides a basis for mobility analysis with the help of the clustering output.
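As an illustration of the approach described in the abstract, the following is a minimal sketch of the five named distance measures and of a K-medoid clustering that accepts any of them through a precomputed distance matrix. It is not the thesis implementation. The function names `pairwise_distances` and `k_medoids`, the `init` parameter, the simple alternating update (rather than a full PAM swap search), the estimation of the Mahalanobis covariance from the data itself, and the toy coordinates are all assumptions made for this example.

```python
import numpy as np

def pairwise_distances(X, metric):
    """Return the n x n distance matrix for the rows of X under `metric`."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, d)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "sqeuclidean":
        return (diff ** 2).sum(axis=-1)
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)
    if metric == "mahalanobis":
        # Assumption: the inverse covariance is estimated from X itself.
        vi = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
        sq = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)
        return np.sqrt(np.maximum(sq, 0.0))       # clip tiny negative round-off
    raise ValueError(f"unknown metric: {metric}")

def k_medoids(D, k, init=None, max_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix D.

    Returns (medoid indices, cluster label per point). This is a
    simplified scheme, not the full PAM swap procedure.
    """
    n = D.shape[0]
    if init is None:
        medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
    else:
        medoids = np.array(init, dtype=int)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # Update step: pick the member with the smallest total
                # distance to the other members of the same cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

if __name__ == "__main__":
    # Toy 2-D points (hypothetical), two visibly separate groups.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for metric in ("euclidean", "sqeuclidean", "manhattan",
                   "chebyshev", "mahalanobis"):
        D = pairwise_distances(X, metric)
        medoids, labels = k_medoids(D, k=2, init=[0, 3])
        print(metric, medoids, labels)
```

Working from a precomputed distance matrix keeps the clustering step independent of the distance measure, so the same routine can be rerun for each measure and the resulting labels compared directly.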
Virrantaus, Kirsi-Kanerva
Thesis advisor
Nikander, Jussi
cluster analysis, distance measures, K-medoid clustering, terrain analysis