A Better and Faster End-to-End Model for Streaming ASR
End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the E2E model still tends to delay its predictions towards the end and thus has much higher partial latency …