Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
In this paper, we propose a quality-aware end-to-end audio-visual neural speaker diarization framework, which comprises three key techniques. First, our audio-visual model takes both audio and visual features as inputs, utilizing a series of binary classification output layers to simultaneously identify the activities of all speakers. This end-to-end framework is …