Low Latency End-to-End Streaming Speech Recognition with a Scout Network

Type: Preprint

Publication Date: 2020-01-01

Citations: 22

DOI: https://doi.org/10.48550/arxiv.2003.10369

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Low Latency End-to-End Streaming Speech Recognition with a Scout Network (2020). Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou.
  • Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition (2020). Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei.
  • Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition (2023). Mohan Li, Rama Doddipatla, Cătălin Zorilă.
  • Streaming automatic speech recognition with the transformer model (2020). Niko Moritz, Takaaki Hori, Jonathan Le Roux.
  • Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames (2023). Chengdong Liang, Xiao-Lei Zhang, Binbin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fuping Pan.
  • Streaming Automatic Speech Recognition with the Transformer Model (2020). Niko Moritz, Takaaki Hori, Jonathan Le Roux.
  • Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames (2022). Chengdong Liang, Xiao-Lei Zhang, Binbin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fuping Pan.
  • Reducing the Latency of End-to-End Streaming Speech Recognition Models with a Scout Network (2020). Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou.
  • Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset (2020). Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li.
  • Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset (2021). Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li.
  • Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition (2020). Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Dong Chen.
  • WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit (2021). Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei.
  • WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit (2021). Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei.
  • Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition (2023). Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg.
  • Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture (2020). Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan.
  • Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR (2022). Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe.
  • Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition (2020). Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian M. W. Chan, Michael L. Seltzer.
  • Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition (2020). Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen.
  • Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR (2022). Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe.

Cited by (21)

  • Parallelizing Legendre Memory Unit Training (2021). Narsimha Chilkuri, Chris Eliasmith.
  • Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition (2020). Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie.
  • Recent Advances in End-to-End Automatic Speech Recognition (2022). Jinyu Li.
  • Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR (2021). Junkun Chen, Mingbo Ma, Renjie Zheng, Liang Huang.
  • Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data (2020). Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung‐Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao.
  • Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition (2020). Maarten Van Segbroeck, Sri Harish Mallidi, Brian King, I‐Ming Chen, Gurpreet Chadha, Roland Maas.
  • FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization (2020). Jiahui Yu, Chung‐Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu.
  • A Better and Faster End-to-End Model for Streaming ASR (2020). Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung‐Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin.
  • Visualization: The Missing Factor in Simultaneous Speech Translation (2022). Sara Papi, Matteo Negri, Marco Turchi.
  • Dissecting User-Perceived Latency of On-Device E2E Speech Recognition (2021). Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen.
  • VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording (2021). Hirofumi Inaguma, Tatsuya Kawahara.
  • FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization (2021). Jiahui Yu, Chung‐Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu.
  • Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition (2021). Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer.
  • Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (2020). Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer.
  • Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition (2021). Niko Moritz, Takaaki Hori, Jonathan Le Roux.
  • Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation (2021). Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh.
  • Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios (2021). Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer.
  • Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution (2022). Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja.
  • AV Taris: Online Audio-Visual Speech Recognition (2020). George Sterpu, Naomi Harte.
  • A Better and Faster end-to-end Model for Streaming ASR (2021). Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung‐Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin.

Citing (22)

  • First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs (2014). Andrew L. Maas, Awni Hannun, Daniel Jurafsky, Andrew Y. Ng.
  • Sequence Transduction with Recurrent Neural Networks (2012). Alex Graves.
  • Speech recognition with deep recurrent neural networks (2013). Alex Graves, Abdelrahman Mohamed, Geoffrey E. Hinton.
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015). Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos.
  • An Online Attention-based Model for Speech Recognition (2018). Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu.
  • Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping (2019). Linhao Dong, Feng Wang, Bo Xu.
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019). Daniel Park, William Chan, Yu Zhang, Chung‐Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le.
  • End-to-end attention-based large vocabulary speech recognition (2016). Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philémon Brakel, Yoshua Bengio.
  • Attention is All you Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin.
  • Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer (2017). Kanishka Rao, Haşim Sak, Rohit Prabhavalkar.
  • Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (2018). Taku Kudo.
  • Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (2019). Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
  • Monotonic Chunkwise Attention (2018). Chung‐Cheng Chiu, Colin Raffel.
  • A Comparative Study on Transformer vs RNN in Speech Applications (2019). Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang.
  • Towards Online End-to-end Transformer Automatic Speech Recognition (2019). Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe.
  • Transformer-Based Acoustic Modeling for Hybrid Speech Recognition (2020). Yongqiang Wang, Abdelrahman Mohamed, Dieu Ngan Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang.
  • Semantic Mask for Transformer based End-to-End Speech Recognition (2019). Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou.
  • Synchronous Transformers for End-to-End Speech Recognition (2019). Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen.
  • Transformer ASR with Contextual Block Processing (2019). Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe.
  • Streaming Automatic Speech Recognition with the Transformer Model (2020). Niko Moritz, Takaaki Hori, Jonathan Le Roux.
  • Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss (2020). Qian Zhang, Lu Han, Haşim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar.
  • Attention-Based Models for Speech Recognition (2015). Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio.