Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data
Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and in on-device applications. Because these models must transcribe speech with minimal latency, they are constrained to be causal, with no access to future context, unlike their non-streaming counterparts. Consequently, streaming models usually perform worse than non-streaming …