Analyzing the effect of human alignment using Bilinear Layer-wise Relevance Propagation

School of Electrical Engineering | Master's thesis

Language

en

Pages

55

Abstract

Aligning deep neural networks with human perception is increasingly recognized as important for producing AI systems that are both accurate and transparent. In this work, we use BiLRP, an extension of Layer-wise Relevance Propagation (LRP) that systematically decomposes similarity scores across neural network layers. Unlike standard LRP, BiLRP explains pairwise comparisons, highlighting precisely which input features on both sides contribute to a model's notion of similarity. Through a series of experiments, we uncover notable differences between unaligned and human-aligned models. For example, whereas unaligned models tend to rely on narrow cues such as local textures, applying an alignment layer (for example, to a ResNet50 model) encourages the network to incorporate broader contextual cues. In one representative case, alignment increases the relevance attributed to contextual (background) features by roughly 5–10% in our tests, illustrating how object–background relationships can be reshaped to better match human intuition. Overall, our findings with BiLRP provide insight into how model representations evolve from purely data-driven toward more human-like interpretations. By systematically analyzing how alignment alters both model behavior and internal representations, this work provides a detailed evaluation of the alignment process using BiLRP and highlights the trade-offs that arise between performance, interpretability, and context awareness.
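The abstract describes BiLRP as decomposing a similarity score onto pairs of input features from the two compared inputs. As an illustrative sketch only (not the thesis's implementation, which targets deep networks such as ResNet50), the simplest case is a single linear embedding, where the layer-wise decomposition collapses to a closed form and the pairwise relevances sum exactly back to the similarity score:

```python
# Minimal BiLRP sketch for a one-layer linear embedding phi(x) = W x.
# Illustrative assumption: dimensions, W, and the inputs are arbitrary toy data.
import numpy as np

rng = np.random.default_rng(0)

d, k = 6, 4                      # input and embedding dimensions
W = rng.normal(size=(k, d))      # linear embedding
x1 = rng.normal(size=d)          # first input
x2 = rng.normal(size=d)          # second input

# Similarity is the dot product of the two embeddings.
similarity = float((W @ x1) @ (W @ x2))

# BiLRP attributes the similarity to pairs of input features (i, j):
#   R[i, j] = sum_k (W[k, i] * x1[i]) * (W[k, j] * x2[j])
R = np.einsum('ki,kj->ij', W * x1, W * x2)

# Conservation: the pairwise relevances sum back to the similarity score,
# which is what makes the heatmaps an exact decomposition.
assert np.isclose(R.sum(), similarity)
```

For deep networks, BiLRP instead propagates second-order relevance layer by layer, but the conservation property demonstrated here is the same one that licenses reading the resulting pairwise heatmaps as an exact decomposition of the similarity score.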

Supervisor

Zhou, Quan

Thesis advisor

Müller, Klaus-Robert
Montavon, Grégoire
