Hybrid Autoregressive Transducer (HAT)

Type: Article

Publication Date: 2020-04-09

Citations: 127

DOI: https://doi.org/10.1109/icassp40776.2020.9053600

Abstract

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model, which can be used to decide whether inference with an external language model is beneficial. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to state-of-the-art approaches.
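For readers who want the mechanics behind the abstract: HAT replaces the single softmax of a conventional RNN-T joint network with a Bernoulli blank distribution and a separate label softmax, so the label distribution can also be evaluated with the acoustic contribution removed in order to score an internal language model. Below is a minimal numpy sketch of that factorization; the additive joint network, the zeroed-encoder internal-LM evaluation, and all names (hat_posteriors, W_blank, W_label) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative dimensions and parameters (not from the paper).
H, V = 8, 16                       # joint hidden size, vocabulary size
rng = np.random.default_rng(0)
W_blank = rng.normal(size=H)       # scalar blank score head
W_label = rng.normal(size=(H, V))  # label score head

def hat_posteriors(enc_t, pred_u):
    """HAT factorization: Bernoulli blank + separate label softmax."""
    h = np.tanh(enc_t + pred_u)                   # simple additive joint network
    b = sigmoid(W_blank @ h)                      # P(blank | t, u)
    p_label = (1.0 - b) * softmax(W_label.T @ h)  # P(y | t, u)
    return b, p_label

def internal_lm(pred_u):
    """Label distribution with the acoustic contribution removed
    (encoder input zeroed), used here to score the internal LM."""
    h = np.tanh(pred_u)
    return softmax(W_label.T @ h)

enc_t = rng.normal(size=H)   # encoder output at frame t
pred_u = rng.normal(size=H)  # prediction-network output after u labels
b, p = hat_posteriors(enc_t, pred_u)
print(b + p.sum())           # ~1.0: blank and label mass form one posterior
```

The perplexity of internal_lm over training transcripts is the kind of quality measure the abstract refers to: a strong internal LM suggests little headroom for fusion with an external LM, while a weak one suggests inference with an external LM may help.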

Locations

  • arXiv (Cornell University)
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Similar Works

  • Hybrid Autoregressive Transducer (HAT) (2020) - Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2021) - Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2020) - Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR (2024) - Hainan Xu, Travis Bartley, Vladimir Bataev, Boris Ginsburg
  • A Non-autoregressive Model for Joint STT and TTS (2025) - Vishal Sunder, Brian Kingsbury, George Saon, Samuel Thomas, Slava Shechtman, Hagai Aronowitz, Eric Fosler-Lussier, Luis Lastras
  • Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks (2023) - Yun Tang, Anna Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden Tomasello, Juan Pino
  • Attention-based Transducer for Online Speech Recognition (2020) - Bin Wang, Yan Yin, Hui Lin
  • Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition (2022) - Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Shuai Zhang, Zhengqi Wen
  • An improved hybrid CTC-Attention model for speech recognition (2018) - Zhe Yuan, Zhuoran Lyu, Jiwei Li, Xi Zhou
  • Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models (2021) - Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
  • Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition (2024) - Keyu An, Zerui Li, Zhifu Gao, Shiliang Zhang
  • Efficient Training of Neural Transducer for Speech Recognition (2022) - Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
  • End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results (2014) - Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
  • Self-Attention Linguistic-Acoustic Decoder (2018) - Santiago Pascual, Antonio Bonafonte, Joan Serrà
  • Modular Hybrid Autoregressive Transducer (2023) - Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, Wei Huang

Works That Cite This (106)

  • Alignment Restricted Streaming Recurrent Neural Network Transducer (2021) - Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer
  • Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion (2021) - Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli
  • Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models (2021) - Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2021) - Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • Lookup-Table Recurrent Language Models for Long Tail Speech Recognition (2021) - Wen-Chin Huang, Tara N. Sainath, Cal Peyser, Shankar Kumar, David Rybach, Trevor Strohman
  • An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition (2023) - Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen
  • Improving Scheduled Sampling for Neural Transducer-Based ASR (2023) - Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura
  • A New Training Pipeline for an Improved Neural Transducer (2020) - Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
  • Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition (2021) - Guangzhi Sun, Chao Zhang, Philip C. Woodland
  • Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator (2024) - Guangzhi Sun, Chao Zhang, Philip C. Woodland

Works Cited by This (15)

  • Sequence Transduction with Recurrent Neural Networks (2012) - Alex Graves
  • On Using Monolingual Corpora in Neural Machine Translation (2015) - Çağlar Gülçehre, Orhan Fırat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
  • Memoir on the Probability of the Causes of Events (1986) - Pierre Simon Laplace
  • Sequence Level Training with Recurrent Neural Networks (2015) - Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
  • Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition (2017) - Hagen Soltau, Hank Liao, Haşim Sak
  • Towards Better Decoding and Language Model Integration in Sequence to Sequence Models (2017) - Jan Chorowski, Navdeep Jaitly
  • Comparison of Decoding Strategies for CTC Acoustic Models (2017) - Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel
  • Model Unit Exploration for Sequence-to-Sequence Speech Recognition (2019) - Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen
  • Streaming End-to-end Speech Recognition for Mobile Devices (2019) - Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang
  • An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model (2018) - Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar