A Better and Faster end-to-end Model for Streaming ASR

End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the E2E model still tends to delay its predictions towards the end and thus has much higher partial latency …