Limits of audio-visual congruence using 3D videos in VR

School of Electrical Engineering (Sähkötekniikan korkeakoulu) | Master's thesis

Mcode

ELEC3030

Language

en

Pages

51+14

Abstract

In AR and VR experiences, virtual acoustic rendering is often accompanied by visual rendering. In such situations, the aural and visual presentations should give the same impression of where a sound source is located, i.e., they should be perceived as congruent. This congruence is directly connected to the extensively investigated ventriloquism effect. However, few studies on ventriloquism offer insight into audio-visual congruence, because the listening test paradigms that the two questions require differ slightly. Additionally, both topics have rarely been investigated in VR/AR. This thesis presents a new listening test that determines the perceived audio-visual congruence of audio-visual stimuli in VR and relates these results to the participants' localization performance. The test compares experienced vs. inexperienced listeners, a visual rendering of a 3D loudspeaker model vs. a synchronized human avatar, and loudspeaker playback vs. non-individualized headphone rendering. All conditions are tested with both horizontal and vertical offsets between the audio and visual rendering. After the need for audio-visual speech stimuli suitable for this test was identified, a new, high-quality audio-visual speech corpus was recorded as part of this thesis. For horizontal offsets, the results show that the avatar rendering significantly increased perceived congruence. Experienced listeners were stricter, but only when the loudspeaker model was shown. Moreover, a correlation between localization precision and perceived congruence was found for the human avatar rendering. For vertical offsets, the angular range of congruence was generally large and localization errors were high. This thesis contributes audio-visual congruence ranges and localization errors in VR. The findings have implications for applications such as AR telepresence.
Furthermore, due to the diversity of its content, the newly recorded audio-visual speech corpus might be useful in future listening experiments.

Supervisor

Lokki, Tapio

Thesis advisor

Meyer-Kahlen, Nils
