DIVE: End-to-End Speech Diarization via Iterative Speaker Embedding
We introduce DIVE, an end-to-end speaker diarization system. DIVE presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting their voice activity conditioned on the extracted representations. This strategy intrinsically resolves the speaker ordering ambiguity without requiring the classical permutation invariant training …