Optimizing Expected Word Error Rate via Sampling for Speech Recognition
State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using cross-entropy (CE) or connectionist temporal classification (CTC). sMBR training …