Toward Streaming ASR with Non-Autoregressive Insertion-Based Model

Type: Article

Publication Date: 2021-08-27

Citations: 3

DOI: https://doi.org/10.21437/interspeech.2021-1131

Abstract

Neural end-to-end (E2E) models have become a promising technique for building practical automatic speech recognition (ASR) systems. When building such a system, one important issue is how to segment the audio so that streaming input or long recordings can be handled. After audio segmentation, an ASR model with a small real-time factor (RTF) is preferable because it lowers the latency of the system. Recently, E2E ASR based on non-autoregressive models has become a promising approach, since it can decode an N-length token sequence in fewer than N iterations. We propose a system that combines audio segmentation with non-autoregressive ASR to achieve high accuracy at a low RTF. An insertion-based model is used as the non-autoregressive ASR. In addition, instead of chaining separate models for segmentation and ASR, we introduce a new architecture that performs audio segmentation and non-autoregressive ASR with a single neural network. Experimental results on Japanese and English datasets show that the method achieves a reasonable trade-off between accuracy and RTF compared with baseline autoregressive Transformer and connectionist temporal classification (CTC) models.
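The two quantities the abstract relies on, the real-time factor and the iteration count of insertion-based decoding, can be illustrated with a small Python sketch. This is not the authors' model: the function names are hypothetical, the neural network is replaced by an oracle that already knows the target, and the balanced-binary-tree insertion order is an assumption borrowed from the Insertion Transformer line of work. It only shows why inserting several tokens in parallel can finish an N-token sequence in far fewer than N iterations.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = decoding time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def oracle_insertion_decode(target: list[str]) -> tuple[list[str], int]:
    """Toy insertion-based decoding with an oracle instead of a neural network.

    Each iteration inserts, in parallel, the middle token of every gap that
    still has missing tokens (balanced binary tree order, an assumption).
    Returns the decoded sequence and the number of iterations used.
    """
    # Each hypothesis entry is (index_in_target, token), kept sorted by index.
    hyp: list[tuple[int, str]] = []
    iterations = 0
    while len(hyp) < len(target):
        # Sentinels -1 and len(target) mark the outer edges of the sequence.
        placed = [-1] + [i for i, _ in hyp] + [len(target)]
        new = []
        for left, right in zip(placed, placed[1:]):
            if right - left > 1:  # this gap still has missing tokens
                mid = (left + right) // 2
                new.append((mid, target[mid]))
        hyp = sorted(hyp + new)
        iterations += 1
    return [tok for _, tok in hyp], iterations


if __name__ == "__main__":
    tokens = "the cat sat on the mat today".split()  # 7 tokens
    decoded, iters = oracle_insertion_decode(tokens)
    print(decoded == tokens, iters)  # True 3
    print(real_time_factor(0.5, 10.0))  # 0.5 s of compute for 10 s of audio
```

In this toy setting a 7-token sequence is completed in 3 parallel iterations, whereas a left-to-right autoregressive decoder would need one step per token. A real model may insert in a different order and need more iterations.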

Locations

  • arXiv (Cornell University)
  • Interspeech 2021

Similar Works

  • Toward Streaming ASR with Non-Autoregressive Insertion-based Model (2020). Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi
  • End-to-End ASR and Audio Segmentation with Non-autoregressive Insertion-based model (2020). Yuya Fujita, Shinji Watanabe, Motoi Omachi
  • Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models (2021). Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe
  • Semi-Autoregressive Streaming ASR With Label Context (2023). Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury
  • Semi-Autoregressive Streaming ASR with Label Context (2024). Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury
  • Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model (2020). Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin
  • Insertion-Based Modeling for End-to-End Automatic Speech Recognition (2020). Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang
  • Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition (2022). Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Shuai Zhang, Zhengqi Wen
  • CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR (2024). Wenbo Zhao, Z.G. Li, Chuan Yu, Zhijian Ou
  • Delay-penalized transducer for low-latency streaming ASR (2022). Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey
  • Streaming Target-Speaker ASR with Neural Transducer (2022). Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki
  • Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition (2020). Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
  • Delay-Penalized Transducer for Low-Latency Streaming ASR (2023). Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey
  • An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR (2021). Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
  • WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit (2021). Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
  • Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition (2020). Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen

Works Cited by This (22)

  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015). Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos
  • Non-Autoregressive Neural Machine Translation (2017). Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher
  • Levenshtein Transformer (2019). Jiatao Gu, Changhan Wang, Jake Zhao
  • KERMIT: Generative Insertion-Based Modeling for Sequences (2019). William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit
  • Insertion Transformer: Flexible Sequence Generation via Insertion Operations (2019). Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit
  • Streaming End-to-end Speech Recognition for Mobile Devices (2019). Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang
  • ESPnet: End-to-End Speech Processing Toolkit (2018). Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen
  • Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (2019). Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
  • A Comparative Study on Transformer vs RNN in Speech Applications (2019). Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang
  • Two-Pass End-to-End Speech Recognition (2019). Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu