Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition

Type: Article

Publication Date: 2020-10-25

Citations: 45

DOI: https://doi.org/10.21437/interspeech.2020-1557

Abstract

In this work, we propose a novel and efficient minimum word error rate (MWER) training method for the RNN-Transducer (RNN-T). Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, our proposed method re-calculates and sums the scores of all possible alignments for each hypothesis in the N-best list. The hypothesis probability scores and back-propagated gradients are computed efficiently using the forward-backward algorithm. Moreover, the proposed method decouples the decoding and training processes, so we can perform offline parallel decoding and MWER training iteratively on each subset. Experimental results show that this semi-on-the-fly method is 6 times faster than the on-the-fly method while yielding a similar WER improvement (3.6%) over a baseline RNN-T model. The proposed MWER training also effectively reduces the high deletion errors (9.2% WER reduction) introduced by RNN-T models when an EOS token is added for the endpointer. Further improvement can be achieved by using a proposed RNN-T rescoring method to re-rank hypotheses and an external RNN-LM for additional rescoring. The best system achieves a 5% relative improvement on an English test set of real far-field recordings and an 11.6% WER reduction on music-domain utterances.
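The core of the objective described above can be sketched in a few lines: given an N-best list, the hypothesis scores (summed over all alignments, as the paper computes via the forward-backward algorithm) are renormalized over the list, and the loss is the expected edit distance, typically with the mean error subtracted as a baseline. This is a minimal illustrative sketch, not the paper's implementation; the function name and the mean-subtraction baseline are assumptions for illustration.

```python
import math

def mwer_loss(hyp_log_probs, hyp_word_errors):
    """Expected word-error loss over an N-best list.

    hyp_log_probs: total log-probability of each hypothesis (in the paper,
    summed over all alignments via the forward-backward algorithm).
    hyp_word_errors: edit distance of each hypothesis to the reference.
    """
    # Renormalize scores over the N-best list (softmax of log-probs).
    m = max(hyp_log_probs)
    exps = [math.exp(lp - m) for lp in hyp_log_probs]
    z = sum(exps)
    posteriors = [e / z for e in exps]
    # Subtract the mean error so the loss is relative to the list average,
    # a common variance-reduction baseline in MWER training.
    mean_err = sum(hyp_word_errors) / len(hyp_word_errors)
    return sum(p * (w - mean_err)
               for p, w in zip(posteriors, hyp_word_errors))
```

Minimizing this quantity pushes probability mass toward the hypotheses with below-average word error, which is what distinguishes MWER training from maximum-likelihood training on the reference alone.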

Locations

  • arXiv (Cornell University)
  • Interspeech 2020

Similar Works

  • Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition (2020): Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2019): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2020): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
  • On Addressing Practical Challenges for RNN-Transducer (2021): Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
  • Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer (2022): Cong-Thanh Do, Mohan Li, Rama Doddipatla
  • Efficient Training of Neural Transducer for Speech Recognition (2022): Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
  • Improving RNN Transducer Modeling for End-to-End Speech Recognition (2019): Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2021): Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2020): Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition (2022): Yusuke Shinohara, Shinji Watanabe
  • A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms (2020): Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sung-Soo Kim, Abhinav Garg, Changwoo Han
  • EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding (2015): Yajie Miao, Mohammad Gowayyed, Florian Metze

Works Cited by This (9)

  • Sequence Transduction with Recurrent Neural Networks (2012): Alex Graves
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019): Daniel Park, William Chan, Yu Zhang, Chung‐Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
  • Streaming End-to-end Speech Recognition for Mobile Devices (2019): Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang
  • Optimizing Expected Word Error Rate via Sampling for Speech Recognition (2017): Matt Shannon
  • Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer (2017): Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
  • Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models (2018): Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung‐Cheng Chiu, Anjuli Kannan
  • The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation (2018): Mia Xu Chen, Orhan Fırat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar
  • Towards Fast and Accurate Streaming End-To-End ASR (2020): Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2020): Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu