Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition

Type: Article

Publication Date: 2020-10-25

Citations: 45

DOI: https://doi.org/10.21437/interspeech.2020-1557

Abstract

In this work, we propose a novel and efficient minimum word error rate (MWER) training method for the RNN-Transducer (RNN-T). Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, our proposed method re-calculates and sums the scores of all possible alignments for each hypothesis in the N-best list. The hypothesis probability scores and back-propagated gradients are computed efficiently using the forward-backward algorithm. Moreover, the proposed method decouples the decoding and training processes, so we can perform offline parallel decoding and MWER training iteratively on each subset. Experimental results show that this semi-on-the-fly method is 6 times faster than the on-the-fly method while yielding a similar WER improvement (3.6%) over a baseline RNN-T model. The proposed MWER training also effectively reduces the high deletion errors (9.2% WER reduction) introduced by RNN-T models when an EOS token is added for the endpointer. Further improvement can be achieved by using a proposed RNN-T rescoring method to re-rank hypotheses and an external RNN-LM for additional rescoring. The best system achieves a 5% relative improvement on an English test set of real far-field recordings and an 11.6% WER reduction on music-domain utterances.
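The core of the objective described above can be sketched in a few lines: given an N-best list, the hypothesis scores (summed over all alignments, as the paper computes via the forward-backward algorithm) are renormalized over the list, and the loss is the expected edit distance, typically with the mean error subtracted as a baseline. This is a minimal illustrative sketch, not the paper's implementation; the function name and the mean-subtraction baseline are assumptions for illustration.

```python
import math

def mwer_loss(hyp_log_probs, hyp_word_errors):
    """Expected word-error loss over an N-best list.

    hyp_log_probs: total log-probability of each hypothesis (in the paper,
    summed over all alignments via the forward-backward algorithm).
    hyp_word_errors: edit distance of each hypothesis to the reference.
    """
    # Renormalize scores over the N-best list (softmax of log-probs).
    m = max(hyp_log_probs)
    exps = [math.exp(lp - m) for lp in hyp_log_probs]
    z = sum(exps)
    posteriors = [e / z for e in exps]
    # Subtract the mean error so the loss is relative to the list average,
    # a common variance-reduction baseline in MWER training.
    mean_err = sum(hyp_word_errors) / len(hyp_word_errors)
    return sum(p * (w - mean_err)
               for p, w in zip(posteriors, hyp_word_errors))
```

Minimizing this quantity pushes probability mass toward the hypotheses with below-average word error, which is what distinguishes MWER training from maximum-likelihood training on the reference alone.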

Locations

  • arXiv (Cornell University)
  • Interspeech 2020

Similar Works

  • Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition (2020): Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2019): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2020): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
  • On Addressing Practical Challenges for RNN-Transducer (2021): Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
  • Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer (2022): Cong-Thanh Do, Mohan Li, Rama Doddipatla
  • Efficient Training of Neural Transducer for Speech Recognition (2022): Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
  • Improving RNN Transducer Modeling for End-to-End Speech Recognition (2019): Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2021): Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2020): Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition (2022): Yusuke Shinohara, Shinji Watanabe
  • A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms (2020): Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sung-Soo Kim, Abhinav Garg, Changwoo Han
  • EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding (2015): Yajie Miao, Mohammad Gowayyed, Florian Metze

Works Cited by This (9)

  • Sequence Transduction with Recurrent Neural Networks (2012): Alex Graves
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019): Daniel Park, William Chan, Yu Zhang, Chung‐Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
  • Streaming End-to-end Speech Recognition for Mobile Devices (2019): Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang
  • Optimizing Expected Word Error Rate via Sampling for Speech Recognition (2017): Matt Shannon
  • Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer (2017): Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
  • Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models (2018): Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung‐Cheng Chiu, Anjuli Kannan
  • The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation (2018): Mia Xu Chen, Orhan Fırat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar
  • Towards Fast and Accurate Streaming End-To-End ASR (2020): Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2020): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu