Enabling technologies for audio augmented reality systems

School of Science | Doctoral thesis (article-based) | Defence date: 2014-05-02
113 + app. 59
Aalto University publication series DOCTORAL DISSERTATIONS, 39/2014
Audio augmented reality (AAR) refers to technology that embeds computer-generated auditory content into a user's real acoustic environment. An AAR system has specific requirements that set it apart from regular human-computer interfaces: an audio playback system to allow the simultaneous perception of real and virtual sounds; motion tracking to enable interactivity and location-awareness; the design and implementation of auditory display to deliver AAR content; and spatial rendering to display spatialised AAR content. This thesis presents a series of studies on enabling technologies to meet these requirements.

A binaural headset with integrated microphones is assumed as the audio playback system, as it allows mobility and precise control over the ear input signals. Here, user position and orientation tracking methods are proposed that rely on speech signals recorded at the binaural headset microphones. To evaluate the proposed methods, the head orientations and positions of three conferees engaged in a discussion were tracked. The binaural microphones improved tracking performance substantially. The proposed methods are applicable to acoustic tracking with other forms of user-worn microphones.

Results from a listening test investigating the effect of auditory display parameters on user performance are reported. The parameters studied were derived from the design choices to be made when implementing auditory display. The results indicate that users are able to detect a sound sample among distractors and estimate sample numerosity accurately with both speech and non-speech audio, if the samples are presented with adequate temporal separation. Whether or not samples were separated spatially had no effect on user performance. However, with spatially separated samples, users were able to detect a sample among distractors and simultaneously localise it. The results of this study are applicable to a variety of AAR applications that require conveying sample presence or numerosity.

Spatial rendering is commonly implemented by convolving virtual sounds with head-related transfer functions (HRTFs). Here, a framework is proposed that interpolates HRTFs measured at arbitrary directions and distances. The framework employs Delaunay triangulation to group HRTFs into subsets suitable for interpolation and barycentric coordinates as interpolation weights. The proposed interpolation framework allows the real-time rendering of virtual sources in the near field via HRTFs measured at various distances.
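The use of barycentric coordinates as interpolation weights can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that measurement positions are given as 3-D Cartesian points, that the enclosing tetrahedron has already been selected (for example from a Delaunay triangulation), and that each measurement is stored as a plain list of numbers that can be blended linearly. All function names and the toy data are hypothetical.

```python
# Sketch only: barycentric weights inside one tetrahedron of measurement points.
# Names, data layout and the linear blend are assumptions for illustration.

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def barycentric_weights(p, verts):
    """Return the four barycentric weights of point p for tetrahedron verts.

    Solves T w = p - v3 by Cramer's rule, where column k of T is v_k - v3.
    The fourth weight follows from the constraint that the weights sum to 1.
    """
    v0, v1, v2, v3 = verts
    cols = [[a - b for a, b in zip(v, v3)] for v in (v0, v1, v2)]
    rhs = [a - b for a, b in zip(p, v3)]
    T = [[cols[k][r] for k in range(3)] for r in range(3)]
    d = det3(T)
    if abs(d) < 1e-12:
        raise ValueError("degenerate tetrahedron")
    w = []
    for k in range(3):
        # Replace column k of T with the right-hand side.
        Tk = [[rhs[r] if c == k else T[r][c] for c in range(3)] for r in range(3)]
        w.append(det3(Tk) / d)
    w.append(1.0 - sum(w))
    return w


def interpolate(p, verts, responses):
    """Blend the four responses stored at verts with barycentric weights."""
    w = barycentric_weights(p, verts)
    if min(w) < -1e-9:
        raise ValueError("target point lies outside the tetrahedron")
    n = len(responses[0])
    return [sum(wk * h[i] for wk, h in zip(w, responses)) for i in range(n)]


# Toy data: four measurement positions and two-sample placeholder responses.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
responses = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
target = (0.25, 0.25, 0.25)
weights = barycentric_weights(target, verts)
blended = interpolate(target, verts, responses)
```

The weights are non-negative and sum to one whenever the target lies inside the tetrahedron, so the blend reproduces a measured response exactly at a measurement point. Which representation of the HRTF is blended (for instance time-domain responses or magnitude spectra) is a separate design choice that this sketch leaves open.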
Supervising professor
Savioja, Lauri, Prof., Aalto University, Department of Media Technology, Finland
Thesis advisor
Lokki, Tapio, Assoc. prof., Aalto University, Department of Media Technology, Finland
Puolamäki, Kai, Dr., Aalto University, Department of Information and Computer Science, Finland
audio augmented reality, acoustic tracking, auditory display, HRTF interpolation
Other note
  • [Publication 1]: H. Gamper, S. Tervo and T. Lokki. Head orientation tracking using binaural headset microphones. In Proc. Int. Conv. Audio Engineering Society, New York, NY, USA, paper number 8538, October 2011.
  • [Publication 2]: H. Gamper, S. Tervo and T. Lokki. Speaker tracking for teleconferencing via binaural headset microphones. In Proc. Int. Workshop on Acoustic Signal Enhancement (IWAENC), Aachen, Germany, 4 pages (online proceedings), September 2012.
  • [Publication 3]: H. Gamper, C. Dicke, M. Billinghurst and K. Puolamäki. Sound sample detection and numerosity estimation using auditory display. ACM Transactions on Applied Perception, Vol. 10(1), pages 1–18,
    DOI: http://dx.doi.org/10.1145/2422105.2422109, February 2013.
  • [Publication 4]: H. Gamper. Selection and interpolation of head-related transfer functions for rendering moving virtual sound sources. In Proc. Int. Conf. Digital Audio Effects (DAFx), Maynooth, Ireland, 7 pages (online proceedings), September 2013.
  • [Publication 5]: H. Gamper. Head-related transfer function interpolation in azimuth, elevation, and distance. J. Acoust. Soc. America, 134(6), pages EL547–EL554, December 2013.