Multichannel speaker diarization with arbitrary microphone arrays

Access rights

openAccess
acceptedVersion

A4 Article in conference proceedings

Language

en

Series

AES Europe 2023: 154th Audio Engineering Society Convention

Abstract

Speaker diarization remains a field with potential for improvement. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. In the present work, a framework is designed to evaluate the combination of speaker embeddings with Time Difference Of Arrival (TDOA) values. Speaker embeddings are extracted using two popular pre-trained models, ECAPA-TDNN and x-vectors. TDOA values for every speech segment are calculated using the Generalized Cross-Correlation (GCC) method with phase transform (PHAT) weighting (GCC-PHAT). The outputs of the GCC-PHAT and deep neural network (DNN) systems are fused by concatenation and used as the input to spectral clustering. The objective of the proposed framework is to evaluate the potential of exploiting the microphone arrays available in meetings and to investigate the complementary information between TDOA values and speaker embeddings. The system is evaluated on two different datasets: the AVLab Speaker Localization dataset and a multichannel dataset created in the context of the present work. Furthermore, an additional dataset recorded with the embedded microphones of mobile phones is created and openly distributed to assist research groups in finding solutions to complex problems such as speaker localization and diarization with arbitrary arrays comprising microphones of different characteristics and quality.
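The GCC-PHAT step and the concatenation-based fusion mentioned in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' code: the function names, the naive O(N^2) DFT (used instead of an FFT only to keep the example dependency-free), the conversion of TDOAs to samples, and the `tdoa_weight` scaling factor are illustrative assumptions. The speaker embedding is assumed to have been extracted beforehand (for example by a pre-trained ECAPA-TDNN or x-vector model).

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; a dependency-free stand-in for an FFT (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns a list of complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_tdoa(sig, ref, fs, max_lag=None):
    """Delay of `sig` relative to `ref` in seconds, estimated with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = []
    for s, r in zip(spec_sig, spec_ref):
        c = s * r.conjugate()
        mag = abs(c)
        # PHAT weighting: divide by the magnitude so only the phase (the delay) remains.
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = [v.real for v in idft(cross)]
    # In practice max_lag would be bounded by mic spacing / speed of sound * fs.
    max_lag = n // 2 if max_lag is None else min(max_lag, n // 2)
    # Non-negative lags sit at the start of `corr`; negative lags wrap to the end.
    best_lag = max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])
    return best_lag / fs


def fuse_features(embedding, tdoas_seconds, fs, tdoa_weight=1.0):
    """Concatenate a segment's speaker embedding with its TDOA values.

    ASSUMPTION: TDOAs are converted to samples and scaled by the hypothetical
    `tdoa_weight`; the abstract only states that the features are concatenated.
    """
    return list(embedding) + [tdoa_weight * t * fs for t in tdoas_seconds]


# Usage: a noise burst that reaches the second microphone 3 samples later.
rng = random.Random(0)
fs = 16000
ref = [rng.gauss(0.0, 1.0) for _ in range(64)]
sig = [0.0] * 3 + ref
tdoa = gcc_phat_tdoa(sig, ref, fs)
features = fuse_features([0.12, -0.40, 0.33], [tdoa], fs)  # toy 3-dim "embedding"
```

In a full pipeline along the lines described above, one TDOA would be computed per microphone pair for each speech segment, and the fused vectors of all segments would then be passed to a spectral clustering step (for instance scikit-learn's `SpectralClustering`), which this sketch does not implement. A production version would use an FFT rather than the naive DFT.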

Description

Publisher Copyright: © 2023 AES Europe. All Rights Reserved.

Citation

Xylogiannis, P, Vryzas, N, Bountourakis, V & Dimoulas, C 2023, Multichannel speaker diarization with arbitrary microphone arrays. in AES Europe 2023: 154th Audio Engineering Society Convention. AES Europe 2023: 154th Audio Engineering Society Convention, Curran Associates Inc., Audio Engineering Society Convention, Espoo, Finland, 13/05/2023. <http://www.aes.org/e-lib/browse.cfm?elib=22114>