Data-Efficient Vision Transformer for modeling forest biomass and tree structures from hyperspectral imagery

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Anwer, Rao Muhammad
dc.contributor.advisor: Naseer, Muzammal
dc.contributor.author: Bin Shafaat, Ahmed
dc.contributor.school: Perustieteiden korkeakoulu (fi)
dc.contributor.supervisor: Laaksonen, Jorma
dc.date.accessioned: 2021-10-24T17:10:15Z
dc.date.available: 2021-10-24T17:10:15Z
dc.date.issued: 2021-10-18
dc.description.abstract: Hyperspectral and multispectral imaging technologies for remote sensing have gained considerable popularity in the modern technology era because they capture far richer geographical information than RGB or greyscale imaging technologies. Remotely sensed data have been employed in various tasks, such as monitoring the Earth's surface, environmental risk analysis, and forest monitoring and modeling. However, collecting, processing, and utilizing these data for predictive modeling remains an arduous task in modern-day machine learning due to factors such as the complex and imbalanced nature of hyperspectral imagery (HSI), the variability of spectral and spatial features, and abundant noise in the spectral channels. In the AIROBEST project, a data processing scheme and a novel convolutional neural network were proposed for predicting several forest variables. Although the results were fairly accurate, there was room to improve the classification performance. This thesis addresses this challenge by applying the Vision Transformer (ViT) to modeling forest biomass and tree structures. We adopt distillation training together with several data augmentation techniques so that the ViT generalizes well to unseen data. We carry out several experiments to study the effect of different image and patch size combinations and augmentation techniques on the model's performance, and compare the resulting accuracies with the benchmark results from AIROBEST. Altogether, the baseline Vision Transformer yields an overall accuracy of 87.38% and a mean accuracy of 73.35%, exceeding the benchmark results by margins of 3.9% and 1.53%, respectively. Lastly, the thesis discusses the limitations of the work and suggests ways to potentially overcome them and improve the current results. (en)
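The abstract describes a ViT with patch-based tokenization of hyperspectral tiles and DeiT-style distillation training. Below is a minimal sketch of those two ingredients, assuming a PyTorch implementation; the tile size (27), patch size (9), band count (110), and embedding dimension (192) are illustrative placeholders, not the thesis's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbedding(nn.Module):
    """Split a hyperspectral tile into non-overlapping patches and project
    each patch to the transformer embedding dimension."""
    def __init__(self, image_size=27, patch_size=9, in_bands=110, embed_dim=192):
        super().__init__()
        self.num_patches = (image_size // patch_size) ** 2
        # A strided convolution is the standard way to embed ViT patches.
        self.proj = nn.Conv2d(in_bands, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, bands, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, D)

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels,
                           alpha=0.5):
    """DeiT-style hard-label distillation: the class token is trained on the
    ground-truth labels and the distillation token on the teacher's
    hard predictions."""
    ce = F.cross_entropy(cls_logits, labels)
    distill = F.cross_entropy(dist_logits, teacher_logits.argmax(dim=1))
    return (1.0 - alpha) * ce + alpha * distill

# Example forward pass on a dummy batch of 110-band tiles.
tiles = torch.randn(4, 110, 27, 27)
tokens = PatchEmbedding()(tiles)               # -> (4, 9, 192)

Varying image_size and patch_size in this sketch changes the token count and per-token receptive field, which is the trade-off the abstract's image/patch size experiments explore.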
dc.format.extent: 72
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/110589
dc.identifier.urn: URN:NBN:fi:aalto-202110249767
dc.language.iso: en (en)
dc.programme: Master’s Programme in Computer, Communication and Information Sciences (en)
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence (en)
dc.programme.mcode: SCI3044 (fi)
dc.subject.keyword: Machine Learning (en)
dc.subject.keyword: remote sensing (en)
dc.subject.keyword: hyperspectral image (en)
dc.subject.keyword: attention (en)
dc.subject.keyword: vision transformer (en)
dc.subject.keyword: neural network (en)
dc.title: Data-Efficient Vision Transformer for modeling forest biomass and tree structures from hyperspectral imagery (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: no
