SphereDiar: An Effective Speaker Diarization System for Meeting Data

Access rights

openAccess

A4 Article in conference proceedings

Date

2019-12-01

Language

en

Pages

8
373-380

Series

2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Abstract

In this paper, we present SphereDiar, a speaker diarization system composed of three novel subsystems: the Sphere-Speaker (SS) neural network, designed for speaker embedding extraction; a segmentation method called Homogeneity Based Segmentation (HBS); and a clustering algorithm called Top Two Silhouettes (Top2S). The system is evaluated on a set of over 200 manually transcribed multiparty meetings. The evaluation reveals that the system can be further simplified by omitting HBS. Furthermore, we illustrate that SphereDiar achieves state-of-the-art results on two different meeting data sets.

Description

openaire: EC/H2020/780069/EU//MeMAD

Keywords

segmentation, silhouette coefficients, speaker diarization, speaker embeddings, spherical K-means

Citation

Kaseva, T, Rouhe, A & Kurimo, M 2019, SphereDiar: An Effective Speaker Diarization System for Meeting Data. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings, 9003967, IEEE, pp. 373-380, IEEE Automatic Speech Recognition and Understanding Workshop, Singapore, Singapore, 15/12/2019. https://doi.org/10.1109/ASRU46091.2019.9003967