On Addressing Practical Challenges for RNN-Transducer

Type: Article

Publication Date: 2021-12-13

Citations: 13

DOI: https://doi.org/10.1109/asru51503.2021.9688101

Abstract

This paper proposes several methods to address practical challenges in deploying RNN Transducer (RNN-T) based speech recognition systems. The challenges are adapting a well-trained RNN-T model to a new domain without collecting audio data, and obtaining time stamps and confidence scores at the word level. We solve the first challenge with a data splicing method that concatenates speech segments extracted from the source domain data. To get time stamps, a phone prediction branch that shares the encoder is added to the RNN-T model for the purpose of forced alignment. Finally, we obtain word-level confidence scores by utilizing several types of features calculated during decoding and from a confusion network. Evaluated on Microsoft production data, the splicing data adaptation method achieves 58.03% and 15.25% relative word error rate reduction over the baseline and over adaptation with the text-to-speech method, respectively. The proposed time stamping method achieves an average word timing difference of less than 50 milliseconds from the ground truth alignment while maintaining recognition accuracy. We also obtain high confidence annotation performance with limited computation cost.
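
The splicing idea in the abstract, building audio for new-domain text by concatenating speech segments taken from source-domain recordings, can be illustrated with a minimal Python sketch. The abstract does not state the segment unit or how segments are selected, so the word-level segments, the random choice among candidates, and the names `build_segment_bank` and `splice_utterance` are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def build_segment_bank(utterances):
    """Index source-domain audio by word (assumed segment unit).

    utterances: iterable of (samples, alignment) pairs, where samples is a
    list of audio sample values and alignment is a list of
    (word, start, end) tuples giving sample index ranges.
    """
    bank = defaultdict(list)
    for samples, alignment in utterances:
        for word, start, end in alignment:
            bank[word].append(samples[start:end])
    return bank


def splice_utterance(text, bank, rng):
    """Concatenate one stored segment per word of the target-domain text.

    Returns the spliced sample list, or None if some word has no segment
    in the bank (a hypothetical fallback policy).
    """
    pieces = []
    for word in text.split():
        candidates = bank.get(word)
        if not candidates:
            return None
        # Assumption: pick any matching segment at random.
        pieces.extend(rng.choice(candidates))
    return pieces


# Toy source-domain data: two "recordings" with word alignments.
source = [
    ([0.1, 0.2, 0.3, 0.4, 0.5], [("play", 0, 2), ("music", 2, 5)]),
    ([0.6, 0.7, 0.8], [("stop", 0, 1), ("music", 1, 3)]),
]
bank = build_segment_bank(source)
audio = splice_utterance("stop music", bank, random.Random(0))
missing = splice_utterance("play jazz", bank, random.Random(0))
```

Under these assumptions, each spliced sample list would be paired with its target-domain text as adaptation training data. A real system would operate on waveforms or acoustic features rather than toy lists.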

Locations

  • arXiv (Cornell University)
  • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Similar Works

  • On Addressing Practical Challenges for RNN-Transducer (2021). Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
  • Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition (2020). Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
  • Improving RNN Transducer Modeling for End-to-End Speech Recognition (2019). Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong
  • Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition (2020). Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
  • Efficient Training of Neural Transducer for Speech Recognition (2022). Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
  • Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network (2021). Janne Pylkkönen, Antti Ukkonen, Juho Kilpikoski, Samu Tamminen, Hannes Heikinheimo
  • Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers (2021). Juntae Kim, Jeehye Lee
  • Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers (2022). Juntae Kim, Jeehye Lee
  • On Language Model Integration for RNN Transducer based Speech Recognition (2021). Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
  • On Language Model Integration for RNN Transducer Based Speech Recognition (2022). Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
  • Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition (2019). Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
  • Word-Level Confidence Estimation for RNN Transducers (2021). Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran
