DIVE: End-to-End Speech Diarization via Iterative Speaker Embedding

Type: Article

Publication Date: 2021-12-13

Citations: 6

DOI: https://doi.org/10.1109/asru51503.2021.9688178

Abstract

We introduce DIVE, an end-to-end speaker diarization system. DIVE presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting their voice activity conditioned on the extracted representations. This strategy intrinsically resolves the speaker ordering ambiguity without requiring the classical permutation invariant training loss. In contrast with prior work, our model does not rely on pretrained speaker representations and jointly optimizes all parameters of the system with a multi-speaker voice activity loss. DIVE does not require the training speaker identities and allows efficient window-based training. Importantly, our loss explicitly excludes unreliable speaker turn boundaries from training, which is adapted to the standard collar-based Diarization Error Rate (DER) evaluation. Overall, these contributions yield a system redefining the state-of-the-art on the CALLHOME benchmark, with 6.7% DER compared to 7.8% for the best alternative.
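
The idea of excluding speaker turn boundaries from a multi-speaker voice activity loss can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `collar_mask` and `masked_vad_loss`, the collar expressed in frames, the single mask shared by all speakers, and the binary cross-entropy form are all assumptions made for illustration.

```python
import numpy as np


def collar_mask(labels: np.ndarray, collar_frames: int) -> np.ndarray:
    """Return a per-frame mask (1.0 = keep, 0.0 = ignore).

    labels: binary array of shape (num_speakers, num_frames).
    Frames within `collar_frames` of any speaker turn boundary are ignored.
    (Hypothetical helper; sharing one mask across speakers is an assumption.)
    """
    num_frames = labels.shape[1]
    keep = np.ones(num_frames, dtype=bool)
    # A boundary lies between frames t-1 and t when any speaker's label changes.
    changes = np.any(labels[:, 1:] != labels[:, :-1], axis=0)
    for t in np.flatnonzero(changes) + 1:
        lo = max(0, t - collar_frames)
        hi = min(num_frames, t + collar_frames)
        keep[lo:hi] = False
    return keep.astype(np.float64)


def masked_vad_loss(probs: np.ndarray, labels: np.ndarray,
                    mask: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy summed over speakers, averaged over kept frames.

    (Assumed loss form; the abstract only states that boundaries are excluded.)
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(probs) + (1 - labels) * np.log(1.0 - probs))
    per_frame = bce.sum(axis=0)
    return float((per_frame * mask).sum() / max(mask.sum(), 1.0))


# Two speakers, eight frames: speaker 0 stops after frame 3, speaker 1 starts at frame 6.
labels = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 1, 1]])
mask = collar_mask(labels, collar_frames=1)
print(mask)  # [1. 1. 1. 0. 0. 0. 0. 1.]

# Predictions that are 90% confident in the correct label on every frame.
probs = np.where(labels == 1, 0.9, 0.1)
print(masked_vad_loss(probs, labels, mask))
```

With a mask of this kind, prediction errors that fall inside the collar do not contribute to the loss, which mirrors how collar-based DER scoring ignores the same regions at evaluation time.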

Locations

  • arXiv (Cornell University)
  • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Similar Works

Title Year Authors
DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding 2021 Neil Zeghidour
Olivier Teboul
David Grangier
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer 2023 Zhengyang Chen
Bing Han
Shuai Wang
Yanmin Qian
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer 2024 Zhengyang Chen
Bing Han
Shuai Wang
Yanmin Qian
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios 2024 Juan Ignacio Álvarez-Trejos
Beltrán Labrador
Alicia Lozano-Díez
pyannote.audio: neural building blocks for speaker diarization 2019 Hervé Bredin
Ruiqing Yin
Juan Manuel Coria
Grégory Gelly
Pavel Korshunov
Marvin Lavechin
Diego Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
Pyannote.Audio: Neural Building Blocks for Speaker Diarization 2020 Hervé Bredin
Ruiqing Yin
Juan Manuel Coria
Grégory Gelly
Pavel Korshunov
Marvin Lavechin
Diego Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
End-to-End Neural Diarization: From Transformer to Conformer 2021 Yi Chieh Liu
Eun‐Jung Han
Chul Lee
Andreas Stolcke
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization 2024 Jenthe Thienpondt
Kris Demuynck
Multi-channel Conversational Speaker Separation via Neural Diarization 2023 Hassan Taherian
DeLiang Wang
Multi-Channel Conversational Speaker Separation via Neural Diarization 2024 Hassan Taherian
DeLiang Wang
Self-supervised Speaker Diarization 2022 Yehoshua Dissen
Felix Kreuk
Joseph Keshet
End-to-End Neural Speaker Diarization with Permutation-Free Objectives 2019 Yusuke Fujita
Naoyuki Kanda
Shota Horiguchi
Kenji Nagamatsu
Shinji Watanabe
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization 2024 Xiang Li
Vivek Govindan
Rohit Paturi
Sundararajan Srinivasan

Works Cited by This (24)

Title Year Authors
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification 2015 Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MUSAN: A Music, Speech, and Noise Corpus 2015 David Snyder
Guoguo Chen
Daniel Povey
Permutation invariant training of deep models for speaker-independent multi-talker speech separation 2017 Dong Yu
Morten Kolbæk
Zheng‐Hua Tan
Jesper Jensen
WaveNet: A Generative Model for Raw Audio 2016 Aäron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alexander Graves
Nal Kalchbrenner
Andrew Senior
Koray Kavukcuoglu
Fully Supervised Speaker Diarization 2019 Aonan Zhang
Quan Wang
Zhenyao Zhu
John Paisley
Chong Wang
All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis 2019 Thilo von Neumann
Keisuke Kinoshita
Marc Delcroix
Shoko Araki
Tomohiro Nakatani
Reinhold Haeb‐Umbach
Speaker Diarization with LSTM 2018 Quan Wang
Carlton Downey
Li Wan
P. Mansfield
Ignacio Lopez Moreno
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization 2019 Qingjian Lin
Ruiqing Yin
Ming Li
Hervé Bredin
Claude Barras
End-to-End Neural Speaker Diarization with Permutation-Free Objectives 2019 Yusuke Fujita
Naoyuki Kanda
Shota Horiguchi
Kenji Nagamatsu
Shinji Watanabe
End-to-End Neural Speaker Diarization with Self-Attention 2019 Yusuke Fujita
Naoyuki Kanda
Shota Horiguchi
Yawen Xue
Kenji Nagamatsu
Shinji Watanabe