Data-Efficient Vision Transformer for modeling forest biomass and tree structures from hyperspectral imagery
Journal Title
Journal ISSN
Volume Title
Perustieteiden korkeakoulu (School of Science)
Master's thesis
Author
Date
2021-10-18
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
72
Series
Abstract
Hyperspectral and multispectral imaging technologies for remote sensing have gained considerable prominence in recent years, owing to the rich spectral and geographical information they capture in comparison with RGB or greyscale imaging. Remotely sensed data are employed in various tasks, such as monitoring Earth's surface, environmental risk analysis, and forest monitoring and modeling. However, collecting, processing, and utilizing such data for predictive modeling remains an arduous task in modern-day machine learning, due to factors such as the complex and imbalanced nature of hyperspectral imagery (HSI), the variability of its spectral and spatial features, and abundant noise in the spectral channels. In the AIROBEST project, a data processing scheme and a novel convolutional neural network were proposed for predicting several forest variables. Although the results were fairly accurate, there remained room to improve the classification performance. This thesis addresses that challenge by applying the Vision Transformer (ViT) to modeling forest biomass and tree structures. We adopt the distillation training technique, together with several data augmentation techniques, so that the ViT generalizes well to unseen data. We carry out several experiments to study the effect of various image and patch size combinations and of the augmentation techniques on the model's performance. The resulting accuracies are recorded and compared with the benchmark results from AIROBEST. Altogether, the baseline Vision Transformer yields an overall accuracy of 87.38% and a mean accuracy of 73.35%, exceeding the benchmark by a margin of 3.9% in overall accuracy and 1.53% in mean accuracy. Lastly, the thesis discusses the drawbacks of the work and provides suggestions to potentially overcome the shortcomings and improve the current results.
Description
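The image and patch size combinations studied in the abstract determine how a hyperspectral window is turned into ViT input tokens. As a minimal NumPy sketch of that tokenization step, the function below splits a hypothetical (H, W, C) hyperspectral cube into flattened non-overlapping patch tokens; the 32x32 window size, 110 spectral bands, and 8x8 patch size are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def patchify(cube, patch):
    """Split an (H, W, C) hyperspectral cube into flattened non-overlapping
    patch tokens of shape (num_patches, patch*patch*C), as in ViT tokenization."""
    H, W, C = cube.shape
    assert H % patch == 0 and W % patch == 0, "window must be divisible by patch"
    tokens = (cube
              .reshape(H // patch, patch, W // patch, patch, C)   # split rows/cols
              .transpose(0, 2, 1, 3, 4)                           # group by patch grid
              .reshape(-1, patch * patch * C))                    # flatten each patch
    return tokens

# hypothetical sizes: a 32x32 window with 110 spectral bands, 8x8 patches
cube = np.random.rand(32, 32, 110)
tokens = patchify(cube, 8)
print(tokens.shape)  # (16, 7040): 4x4 patch grid, each patch 8*8*110 values
```

Each token would then be linearly projected and fed to the transformer encoder; varying `patch` against the window size trades off sequence length against per-token detail, which is the axis the thesis's experiments explore.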
Supervisor
Laaksonen, Jorma
Thesis advisor
Anwer, Rao Muhammad
Naseer, Muzammal
Keywords
machine learning, remote sensing, hyperspectral image, attention, vision transformer, neural network