Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation

Target-speaker speech recognition aims to recognize the speech of a target speaker in noisy environments that contain background noise and interfering speakers. This work presents a joint framework that combines time-domain target-speaker speech extraction with a Recurrent Neural Network Transducer (RNN-T). To stabilize joint training, we propose a multi-stage training strategy that pre-trains and fine-tunes each …