Multichannel speaker diarization with arbitrary microphone arrays
Access rights
openAccess
acceptedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Language
en
Series
AES Europe 2023: 154th Audio Engineering Society Convention
Abstract
Speaker diarization remains a field with potential for improvement. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. In the present work, a framework is designed to evaluate the combination of speaker embeddings with time difference of arrival (TDOA) values. Speaker embeddings are extracted using two popular pre-trained models, ECAPA-TDNN and x-vectors. TDOA values for every speech segment are calculated using the generalized cross-correlation (GCC) method with phase transform (PHAT) weights (GCC-PHAT). The outputs of the GCC-PHAT and deep neural network (DNN) systems are fused by concatenation and used as the input to spectral clustering. The objective of the proposed framework is to evaluate the potential of exploiting available microphone arrays in meetings and to investigate the complementary information between TDOA values and speaker embeddings. The system is evaluated on two different datasets: the AVLab Speaker Localization dataset and a multichannel dataset created in the context of the present work. Furthermore, an additional dataset recorded with the embedded microphones of mobile phones is created and openly distributed to help research groups find solutions to complex problems such as speaker localization and diarization with arbitrary arrays comprising microphones of different characteristics and quality.
Description
Publisher Copyright: © 2023 AES Europe. All Rights Reserved.
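As a rough illustration of the processing chain named in the abstract, the sketch below estimates a TDOA between two microphone channels with GCC-PHAT and concatenates per-segment TDOA values with speaker embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `gcc_phat_tdoa` and `fuse_features`, the zero-padding length, the `max_tau` parameter, and the embedding dimension used in the demo are all hypothetical, and any feature scaling applied before spectral clustering is not specified in the abstract and is therefore left out.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds, with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    # Zero-pad so the circular cross-correlation does not wrap around.
    n = len(sig) + len(ref)
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, discard magnitude.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def fuse_features(embeddings, tdoas):
    """Concatenate per-segment embeddings (N x D) with per-segment TDOAs (N x P)."""
    embeddings = np.asarray(embeddings, dtype=float)
    tdoas = np.asarray(tdoas, dtype=float)
    return np.concatenate((embeddings, tdoas), axis=1)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    delay = 5  # samples; synthetic delay used only for this demo
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    print("estimated delay in samples:", round(gcc_phat_tdoa(sig, ref, fs) * fs))

    # Placeholder embeddings (3 segments, 192 dimensions) and 2 mic-pair TDOAs.
    fused = fuse_features(np.zeros((3, 192)), np.zeros((3, 2)))
    print("fused feature shape:", fused.shape)
```

The fused matrix would then be passed to a clustering step; the abstract names spectral clustering, which is omitted here to keep the example dependent on NumPy only.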
Citation
Xylogiannis, P, Vryzas, N, Bountourakis, V & Dimoulas, C 2023, Multichannel speaker diarization with arbitrary microphone arrays. In AES Europe 2023: 154th Audio Engineering Society Convention. AES Europe 2023: 154th Audio Engineering Society Convention, Curran Associates Inc., Audio Engineering Society Convention, Espoo, Finland, 13/05/2023. <http://www.aes.org/e-lib/browse.cfm?elib=22114>