Scaling End-to-End Models for Large-Scale Multilingual ASR

Type: Article

Publication Date: 2021-12-13

Citations: 42

DOI: https://doi.org/10.1109/asru51503.2021.9687871

Abstract

Building ASR models for many languages is a challenging multi-task learning problem due to large cross-lingual variation and heavily unbalanced data. Existing work has shown positive transfer from high-resource to low-resource languages. However, degradation on high-resource languages is commonly observed, caused by interference from the heterogeneous multilingual data and reduced per-language capacity. We conduct a capacity study on a 15-language task in which the amount of data per language varies from 7.6K to 53.5K hours. We adopt GShard [1] to efficiently scale up to 10B parameters. Empirically, we find that (1) scaling the number of model parameters is an effective way to relieve the capacity bottleneck: our 500M-parameter model already outperforms monolingual baselines, and scaling it to 1B and 10B parameters brings further quality gains; (2) larger models are not only more data efficient but also more efficient in training cost as measured in TPU days: the 1B-parameter model reaches the same accuracy in 34% of the training time of the 500M-parameter model; (3) given a fixed capacity budget, adding depth works better than adding width, and large encoders do better than large decoders; and (4) with continued training, large models can be adapted to new languages and domains.
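Finding (3) rests on simple parameter accounting: because per-layer cost grows roughly quadratically with model width, a fixed budget can buy either a deep, narrow stack or a shallow, wide one. The sketch below illustrates this trade-off; the layer formula and the specific sizes are illustrative assumptions, not the paper's exact Conformer configurations.

```python
# Hedged sketch: spending a fixed parameter budget on depth vs. width.
# Assumes a plain Transformer-style encoder layer (illustrative, not the
# paper's Conformer): self-attention (4 projection matrices) + a
# feed-forward block with a 4x expansion (2 matrices). Biases, embeddings,
# and normalization parameters are ignored for simplicity.

def transformer_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of one encoder layer."""
    attention = 4 * d_model * d_model            # Q, K, V, output projections
    ffn = 2 * d_model * (ffn_mult * d_model)     # up- and down-projection
    return attention + ffn

def model_params(num_layers: int, d_model: int) -> int:
    return num_layers * transformer_layer_params(d_model)

# Per-layer cost scales with d_model**2, so halving the width lets us
# quadruple the depth for the same total budget.
deep_narrow = model_params(num_layers=48, d_model=1024)
shallow_wide = model_params(num_layers=12, d_model=2048)

print(f"deep/narrow : {deep_narrow / 1e9:.2f}B params")   # 0.60B
print(f"shallow/wide: {shallow_wide / 1e9:.2f}B params")  # 0.60B
```

Both configurations land on an identical budget (about 0.6B parameters here); the paper's finding is that, at equal cost, the deeper configuration tends to reach better quality.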

Locations

  • arXiv (Cornell University)
  • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Similar Works

  • Scaling End-to-End Models for Large-Scale Multilingual ASR (2021). Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma
  • Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities (2023). Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Manh Duc Le, Michael L. Seltzer
  • Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities (2022). Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Manh Duc Le, Michael L. Seltzer
  • LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language (2024). Çağrı Toraman
  • Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning (2023). Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin
  • Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters (2020). Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
  • EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models (2024). Shaoxiong Ji, Zihao Li, Indraneil Paul, Jouni Paavola, Peiqin Lin, Pinzhen Chen, Dayyán O'Brien, Hengyu Luo, Hinrich Schütze, Jörg Tiedemann
  • Distilling a Pretrained Language Model to a Multilingual ASR Model (2022). Kwanghee Choi, Hyung-Min Park
  • Targeted Multilingual Adaptation for Low-resource Language Families (2024). Carlton Downey, Terra Blevins, Dhwani Serai, Dwija Parikh, Shane Steinert-Threlkeld
  • Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models (2024). James Vo
  • Extrapolating Large Language Models to Non-English by Aligning Languages (2023). Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li
  • Tricks for Training Sparse Translation Models (2021). Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James H. Cross, Michael Lewis, Angela Fan
  • Language models scale reliably with over-training and on downstream tasks (2024). Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Chengyu Fang, Jeffrey Li, Sedrick Scott Keh
  • Exploring Design Choices for Building Language-Specific LLMs (2024). Atula Tejaswi, Nilesh Gupta, Eunsol Choi
  • MaLA-500: Massive Language Adaptation of Large Language Models (2024). Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze
  • Aya 23: Open Weight Releases to Further Multilingual Progress (2024). Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Kelly Marchisio, Sebastian Ruder
  • Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning (2022). Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Dong Li, Furu Wei, Vishrav Chaudhary, Song Xia
  • LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models (2024). Xi Chen, Songyang Zhang, Qibing Bai, Chaoyu Chen, Satoshi Nakamura

Works That Cite This (29)

  • Towards Zero-Shot Code-Switched Speech Recognition (2023). Brian Yan, Matthew Wiesner, Ondřej Klejch, Preethi Jyothi, Shinji Watanabe
  • UML: A Universal Monolingual Output Layer for Multilingual ASR (2023). Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang
  • FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech (2023). Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara E. Rivera, Ankur Bapna
  • Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks (2022). Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma
  • Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer (2023). Kunal Dhawan, Dima Rekesh, Boris Ginsburg
  • BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition (2022). Yu Zhang, Daniel Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang
  • A Configurable Multilingual Model is All You Need to Recognize All Languages (2022). Long Zhou, Jinyu Li, Eric Sun, Shujie Liu
  • Learning ASR Pathways: A Sparse Multilingual ASR Model (2023). Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli
  • A Language Agnostic Multilingual Streaming On-Device ASR System (2022). Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He
  • Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (2023). Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma

Works Cited by This (32)

  • Sequence Transduction with Recurrent Neural Networks (2012). Alex Graves
  • Listen, Attend and Spell (2015). William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
  • TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2016). Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin
  • Joint CTC-attention based end-to-end speech recognition using multi-task learning (2017). Suyoun Kim, Takaaki Hori, Shinji Watanabe
  • Multilingual End-to-End Speech Recognition with a Single Transformer on Low-Resource Languages (2018). Shiyu Zhou, Shuang Xu, Bo Xu
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  • Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling (2019). Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019). Daniel Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
  • Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges (2019). Naveen Arivazhagan, Ankur Bapna, Orhan Fırat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry
  • Streaming End-to-end Speech Recognition for Mobile Devices (2019). Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Álvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang