Towards Better Decoding and Language Model Integration in Sequence to Sequence Models

Type: Article

Publication Date: 2017-08-16

Citations: 334

DOI: https://doi.org/10.21437/interspeech.2017-343

Abstract

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates on the Wall Street Journal dataset: without a separate language model we reach 10.6% WER, while together with a trigram language model we reach 6.7% WER.
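
The abstract compresses two concrete techniques: a label-smoothing-style fix for the overconfidence problem, and a coverage-aware fusion of the seq2seq scores with an external trigram LM during beam search. Below is a minimal, self-contained sketch of how such scoring could look; the function names, default weights, and the frame-count coverage proxy are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch (assumed names and weights, not the paper's exact code)
# of the two remedies the abstract describes.

def label_smoothed_nll(log_probs, target_idx, smoothing=0.1):
    """Cross-entropy with uniform label smoothing.

    Reserving `smoothing` probability mass for non-target characters keeps
    the model from becoming overconfident, so beam search can still benefit
    from exploring alternative hypotheses.
    """
    vocab_size = len(log_probs)
    nll = -log_probs[target_idx]                # standard negative log-likelihood
    uniform_nll = -sum(log_probs) / vocab_size  # expected NLL of a uniform target
    return (1.0 - smoothing) * nll + smoothing * uniform_nll


def fused_score(seq2seq_logp, lm_logp, frames_covered,
                lm_weight=0.5, coverage_weight=0.1):
    """Beam-search hypothesis score with external LM fusion.

    Adding a coverage reward (here: a count of input frames the attention
    has covered) counteracts the incomplete, prematurely terminated
    transcriptions that appear when only the LM term is weighted up.
    """
    return seq2seq_logp + lm_weight * lm_logp + coverage_weight * frames_covered
```

In this sketch, raising `lm_weight` alone would favour short hypotheses (each extra character can only lower the LM score), which is exactly the truncation failure the abstract mentions; the coverage term pushes back by rewarding hypotheses that account for more of the input.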

Locations

  • arXiv (Cornell University)
  • Interspeech 2017

Similar Works

  • Towards better decoding and language model integration in sequence to sequence models (2016) - Jan Chorowski, Navdeep Jaitly
  • An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model (2018) - Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar
  • An analysis of incorporating an external language model into a sequence-to-sequence model (2017) - Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar
  • Cold Fusion: Training Seq2Seq Models Together with Language Models (2017) - Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
  • Cold Fusion: Training Seq2Seq Models Together with Language Models (2018) - Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
  • Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition (2018) - Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda
  • Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition (2019) - Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda
  • OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models (2018) - Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius
  • Improved Training of End-to-end Attention Models for Speech Recognition (2018) - Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
  • A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition (2018) - Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N. Sainath, Karen Livescu
  • Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling (2018) - Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori
  • Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard (2020) - Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury
  • SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech (2024) - Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

Works That Cite This (190)

  • SoftSeg: Advantages of soft versus binary training for image segmentation (2021) - Charley Gros, Andréanne Lemay, Julien Cohen-Adad
  • On the Limit of English Conversational Speech Recognition (2021) - Zoltán Tüske, George Saon, Brian Kingsbury
  • Multi-Stream End-to-End Speech Recognition (2019) - Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Heřmanský
  • On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (2021) - Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
  • StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR (2021) - Hirofumi Inaguma, Tatsuya Kawahara
  • Back-Translation-Style Data Augmentation for End-to-End ASR (2018) - Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernández Astudillo, Kazuya Takeda
  • Two-Pass End-to-End Speech Recognition (2019) - Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu
  • Neural Machine Translation (2020) - Philipp Koehn
  • Espresso: A Fast End-to-End Neural Speech Recognition Toolkit (2019) - Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
  • No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models (2018) - Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu

Works Cited by This (25)

  • EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding (2015) - Yajie Miao, Mohammad Gowayyed, Florian Metze
  • Listen, Attend and Spell (2015) - William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
  • Effective Approaches to Attention-based Neural Machine Translation (2015) - Thang Luong, Hieu Pham, Christopher D. Manning
  • On Using Monolingual Corpora in Neural Machine Translation (2015) - Çağlar Gülçehre, Orhan Fırat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
  • Deep Speech: Scaling up end-to-end speech recognition (2014) - Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates
  • End-to-End Attention-based Large Vocabulary Speech Recognition (2015) - Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philémon Brakel, Yoshua Bengio
  • Sequence to Sequence Learning with Neural Networks (2014) - Ilya Sutskever, Oriol Vinyals, Quoc V. Le
  • Neural Machine Translation by Jointly Learning to Align and Translate (2014) - Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
  • Rethinking the Inception Architecture for Computer Vision (2016) - Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015) - Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos