DIVE: End-to-End Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization system. DIVE presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting their voice activity conditioned on the extracted representations. This strategy intrinsically resolves the speaker ordering ambiguity without requiring the classical permutation invariant training …
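
To make the iterative idea concrete, here is a toy sketch of the general loop the abstract describes: build one speaker representation, predict that speaker's voice activity conditioned on it, then repeat for the next speaker. It is not the DIVE model. The function names (`cosine`, `iterative_diarization`), the cosine-similarity rule, the `threshold` and `max_speakers` parameters, and the choice of the first unexplained frame as a seed are all assumptions made for illustration; DIVE itself is a trained end-to-end system whose details are not given in the text above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def iterative_diarization(frames, max_speakers=4, threshold=0.8):
    """Hypothetical toy loop: one speaker representation per iteration.

    frames: list of per-frame embedding vectors (assumed to be given).
    Returns (speakers, activity), where activity[k][t] is True when
    speaker k is judged active in frame t.
    """
    explained = [False] * len(frames)
    speakers, activity = [], []
    for _ in range(max_speakers):
        # Assumed selection rule: seed on the first frame no speaker explains yet.
        seed = next((i for i, done in enumerate(explained) if not done), None)
        if seed is None:
            break  # every frame is attributed to some speaker
        # Build the speaker representation: mean of frames similar to the seed.
        members = [f for f in frames if cosine(f, frames[seed]) >= threshold]
        members = members or [frames[seed]]  # guard against a zero-vector seed
        rep = [sum(col) / len(members) for col in zip(*members)]
        # Predict voice activity conditioned on that representation.
        vad = [cosine(f, rep) >= threshold for f in frames]
        vad[seed] = True  # guarantee progress on each iteration
        speakers.append(rep)
        activity.append(vad)
        explained = [done or active for done, active in zip(explained, vad)]
    return speakers, activity


frames = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05]]
speakers, activity = iterative_diarization(frames)
```

In this toy version, speakers come out in the order in which they are first encountered, so the output has a single well-defined ordering. That gives an intuition for how an iterative formulation can sidestep the speaker ordering ambiguity, although the mechanism DIVE actually uses may differ.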