| Streaming End-to-end Speech Recognition for Mobile Devices | 2019 | Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang | 3 |
| Attention Is All You Need | 2017 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin | 3 |
| Joint CTC-attention based end-to-end speech recognition using multi-task learning | 2017 | Suyoun Kim, Takaaki Hori, Shinji Watanabe | 3 |
| Sequence Transduction with Recurrent Neural Networks | 2012 | Alex Graves | 2 |
| Listen, Attend and Spell | 2015 | William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals | 2 |
| Towards Fast and Accurate Streaming End-To-End ASR | 2020 | Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu | 2 |
| Conformer: Convolution-augmented Transformer for Speech Recognition | 2020 | Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu | 2 |
| FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization | 2021 | Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu | 2 |
| State-of-the-Art Speech Recognition with Sequence-to-Sequence Models | 2018 | Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina | 2 |
| Attention-Based Models for Speech Recognition | 2015 | Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio | 2 |
| Improving RNN Transducer Modeling for End-to-End Speech Recognition | 2019 | Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong | 2 |
| A New Training Pipeline for an Improved Neural Transducer | 2020 | Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney | 2 |
| Generalized End-to-End Loss for Speaker Verification | 2018 | Li Wan, Quan Wang, Alan Papir, Ignacio López Moreno | 1 |
| Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model | 2018 | Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao | 1 |
| Exploring Attention Mechanism for Acoustic-based Classification of Speech Utterances into System-directed and Non-system-directed | 2019 | Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett | 1 |
| An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model | 2018 | Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar | 1 |
| Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer | 2017 | Kanishka Rao, Haşim Sak, Rohit Prabhavalkar | 1 |
| Massively Multilingual Adversarial Speech Recognition | 2019 | Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky | 1 |
| Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition | 2017 | Taesup Kim, Inchul Song, Yoshua Bengio | 1 |
| Lattice rescoring strategies for long short term memory language models in speech recognition | 2017 | Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu | 1 |
| Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models | 2018 | Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan | 1 |
| Exploring neural transducers for end-to-end speech recognition | 2017 | Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, Zhenyao Zhu | 1 |
| Deep Residual Learning for Small-Footprint Keyword Spotting | 2018 | Raphael Tang, Jimmy Lin | 1 |
| Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures | 2018 | Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian, Dong Yu | 1 |
| Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model | 2019 | Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee | 1 |
| End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning | 2019 | Pavel Denisov, Ngoc Thang Vu | 1 |
| On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition | 2019 | Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li | 1 |
| VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking | 2019 | Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Jia Ye, Ignacio López Moreno | 1 |
| Optimizing Speech Recognition For The Edge | 2019 | Yuan Shangguan, Jian Li, Qiao Liang, Raziel Álvarez, Ian McGraw | 1 |
| A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency | 2020 | Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Álvarez, Zhifeng Chen | 1 |
| SpecAugment on Large Scale Datasets | 2020 | Daniel Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu | 1 |
| Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss | 2020 | Qian Zhang, Lu Han, Haşim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar | 1 |
| Hybrid Autoregressive Transducer (HAT) | 2020 | Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley | 1 |
| Personal VAD: Speaker-Conditioned Voice Activity Detection | 2020 | Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio López Moreno | 1 |
| Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters | 2020 | Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert | 1 |
| VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition | 2020 | Quan Wang, Ignacio López Moreno, Mert Sağlam, Kevin R. Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika | 1 |
| Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers | 2020 | Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka | 1 |
| SpEx: Multi-Scale Time Domain Speaker Extraction Network | 2020 | Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li | 1 |
| Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview | 2020 | Peter Bell, Joachim Fainberg, Ondřej Klejch, Jinyu Li, Steve Renals, Paweł Świętojański | 1 |
| Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording | 2020 | Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdoğan, John R. Hershey, Nima Mesgarani | 1 |
| Transformer-Transducers for Code-Switched Speech Recognition | 2021 | Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff | 1 |
| A Better and Faster end-to-end Model for Streaming ASR | 2021 | Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin | 1 |
| Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation | 2021 | Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu | 1 |
| Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging | 2021 | Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath | 1 |
| Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset | 2021 | Chen Xie, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li | 1 |
| Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition | 2021 | Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer | 1 |
| Personalized Keyphrase Detection Using Speaker and Environment Information | 2021 | Rajeev V. Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw | 1 |
| Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition | 2021 | Jason Pelecanos, Quan Wang, Ignacio López Moreno | 1 |
| Scaling End-to-End Models for Large-Scale Multilingual ASR | 2021 | Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, Wei Huang, Min Ma, Junwen Bai | 1 |
| Cross-Attention Conformer for Context Modeling in Speech Enhancement for ASR | 2021 | Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He | 1 |