FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting quickly without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches, including Early and Late Penalties [1] and Constrained Alignments [2], [3], penalize emission delay by …