Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Type: Article

Publication Date: 2023-01-01

Citations: 3

DOI: https://doi.org/10.18653/v1/2023.findings-acl.338

Abstract

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MINIJOINT, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MINIPOST, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.3x less compute on average.
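
The abstract describes both how a mini-model is obtained (MINIJOINT attaches a secondary MLM head at a middle layer during pretraining) and how adaptation then proceeds (new language-specific embeddings are trained against the shallow prefix and plugged into the aligned full model). The plain-PyTorch sketch below illustrates that flow; the class and function names (MiniJointEncoder, adapt_to_new_language), layer counts, and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of mini-model adaptation, assuming a BERT-style MLM encoder.
import torch
import torch.nn as nn


class MiniJointEncoder(nn.Module):
    """MLM encoder with a secondary MLM head at a middle layer, so the first
    `mini_layers` layers can later act as a shallow, aligned mini-model."""

    def __init__(self, vocab_size=32000, d_model=768, n_layers=12, mini_layers=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
            for _ in range(n_layers)
        )
        self.mini_layers = mini_layers
        self.head_full = nn.Linear(d_model, vocab_size)  # primary MLM head (top layer)
        self.head_mini = nn.Linear(d_model, vocab_size)  # secondary MLM head (middle layer)

    def forward(self, input_ids):
        h = self.embeddings(input_ids)
        mini_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.mini_layers:
                # During joint pretraining, an MLM loss is also applied here,
                # which keeps the shallow prefix aligned with the full model.
                mini_logits = self.head_mini(h)
        return self.head_full(h), mini_logits


def adapt_to_new_language(model, new_vocab_size, batches, lr=1e-3):
    """Learn new language-specific embeddings over the mini-model only.

    The pretrained body stays frozen, and each update runs forward/backward
    through just the first `mini_layers` layers, which is where the compute
    saving over standard full-depth embedding training comes from."""
    for p in model.parameters():
        p.requires_grad = False  # keep the pretrained transformer body frozen

    new_emb = nn.Embedding(new_vocab_size, model.embeddings.embedding_dim)
    opt = torch.optim.Adam(new_emb.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # ignores positions labeled -100 by default

    for input_ids, labels in batches:  # labels: masked-token targets, -100 elsewhere
        h = new_emb(input_ids)
        for layer in model.layers[: model.mini_layers]:  # shallow mini-model
            h = layer(h)
        logits = model.head_mini(h)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Because mini-model and large model are aligned, the cheaply trained
    # embeddings can simply be plugged into the full-depth encoder.
    model.embeddings = new_emb
    return model
```

For MINIPOST, the analogous sketch would instead start from a regular pretrained encoder, extract and freeze its first few layers, and learn a small number of parameters on top so that the shallow stack approximates the deep model's representations, after which the same embedding-only adaptation step applies.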

Locations

  • arXiv (Cornell University)
  • Findings of the Association for Computational Linguistics: ACL 2023

Similar Works

  • Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training (2022) - Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe
  • Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models (2024) - James Vo
  • WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models (2022) - Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz
  • DEPT: Decoupled Embeddings for Pre-training Language Models (2024) - Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F. Shen, Xinchi Qiu, Dongqi Cai, Yan Gao, Nicholas D. Lane
  • LangSAMP: Language-Script Aware Multilingual Pretraining (2024) - Yihong Liu, Haotian Ye, Chunlan Ma, Mingyang Wang, Hinrich Schütze
  • Efficiently Adapting Pretrained Language Models To New Languages (2023) - Zoltan Csaki, Pian Pawakapan, Urmish Thakker, Qiantong Xu
  • bert2BERT: Towards Reusable Pretrained Language Models (2021) - Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Feng-Yu Wang, Zhi Wang, Xiao Dong Chen, Zhiyuan Liu, Qun Liu
  • Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning (2021) - Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang
  • Rethinking embedding coupling in pre-trained language models (2020) - Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
  • Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning (2022) - Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
  • MaLA-500: Massive Language Adaptation of Large Language Models (2024) - Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze
  • SAS: Self-Augmentation Strategy for Language Model Pre-training (2021) - Yifei Xu, Jingqiao Zhang, He Ru, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Wu
  • SAS: Self-Augmentation Strategy for Language Model Pre-training (2022) - Yifei Xu, Jingqiao Zhang, He Ru, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Wu
  • UNKs Everywhere: Adapting Multilingual Language Models to New Scripts (2021) - Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  • UNKs Everywhere: Adapting Multilingual Language Models to New Scripts (2020) - Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  • Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch (2023) - Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li
  • Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization (2024) - Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar