Analyzing the effect of human alignment using Bilinear Layer-wise relevance propagation
School of Electrical Engineering | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
55
Abstract
Aligning deep neural networks with human perception is increasingly recognized as important for producing AI systems that are both accurate and transparent. In this work, we use BiLRP, an extension of Layer-wise Relevance Propagation (LRP) that systematically decomposes similarity scores across neural network layers. Unlike standard LRP, BiLRP targets pairwise comparisons, highlighting precisely which input features on each side contribute to a model's notion of similarity. Through a series of experiments, we uncover notable differences between unaligned and human-aligned models. For example, while unaligned models tend to rely on narrow cues such as local textures, applying an alignment layer (for example, to a ResNet50 model) encourages the network to incorporate broader contextual cues. In one representative case, we observe that alignment increases the relevance attributed to contextual (background) features by roughly 5–10% in our tests, illustrating how object–background relationships can be reshaped to better match human intuition. Overall, our findings with BiLRP provide insight into how model representations evolve from purely data-driven toward more human-like interpretations. By systematically analyzing how alignment alters both model behavior and internal representations, this work offers a detailed evaluation of the alignment process and highlights the trade-offs that arise between performance, interpretability, and context awareness.
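To make the decomposition concrete, the sketch below illustrates the kind of computation BiLRP performs as described in the abstract: the dot-product similarity between two embeddings is decomposed into a map of pairwise input contributions by running one relevance pass per embedding dimension on each input and combining the results. This is a minimal, assumption-laden sketch, not the implementation evaluated in the thesis: the names (bilrp, embed, x1, x2) are illustrative, and Gradient×Input is used as a simple stand-in for proper LRP propagation rules.

    import torch

    def bilrp(embed, x1, x2):
        # Decompose the similarity s = <embed(x1), embed(x2)> into a
        # pairwise map R[i, j]: the joint contribution of input feature i
        # of x1 and feature j of x2 to s.
        x1 = x1.detach().clone().requires_grad_(True)
        x2 = x2.detach().clone().requires_grad_(True)
        z1, z2 = embed(x1), embed(x2)  # embeddings of size m

        def relevance_passes(z, x):
            # One backward pass per embedding dimension k yields R_{i<-k}.
            # Gradient×Input stands in here for an actual LRP pass.
            rows = [torch.autograd.grad(z[k], x, retain_graph=True)[0] * x
                    for k in range(z.shape[0])]
            return torch.stack(rows).flatten(1)  # shape (m, num_features)

        R1, R2 = relevance_passes(z1, x1), relevance_passes(z2, x2)
        # BiLRP combination: R[i, j] = sum_k R1[k, i] * R2[k, j]
        return torch.einsum('ki,kj->ij', R1, R2)

    # Toy usage: a small embedding network and two random inputs.
    net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                              torch.nn.Linear(16, 4))
    R = bilrp(net, torch.randn(8), torch.randn(8))
    print(R.shape)         # torch.Size([8, 8]): pairwise contribution map
    print(R.sum().item())  # approximately recovers the similarity score

Summing R over one axis gives a per-feature relevance map for one input conditioned on the other, which is how object versus background contributions to similarity can be compared between unaligned and aligned models.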
Supervisor
Zhou, Quan
Thesis advisor
Müller, Klaus-Robert
Montavon, Grégoire