Towards Fast and Accurate Streaming End-To-End ASR
End-to-end (E2E) models fold the acoustic, pronunciation and language models of a conventional automatic speech recognition (ASR) system into a single neural network with far fewer parameters, making them suitable for on-device applications. For example, recurrent neural network transducer (RNN-T) as a streaming E2E …