Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Type: Article

Publication Date: 2023-01-01

Citations: 3

DOI: https://doi.org/10.18653/v1/2023.findings-acl.338

Abstract

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MINIJOINT, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MINIPOST, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.3x less compute on average.
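
The abstract describes both how a mini-model is obtained (MINIJOINT attaches a secondary MLM head at a middle layer during pretraining) and how adaptation then proceeds (new language-specific embeddings are trained against the shallow prefix and plugged into the aligned full model). The plain-PyTorch sketch below illustrates that flow; the class and function names (MiniJointEncoder, adapt_to_new_language), layer counts, and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of mini-model adaptation, assuming a BERT-style MLM encoder.
import torch
import torch.nn as nn


class MiniJointEncoder(nn.Module):
    """MLM encoder with a secondary MLM head at a middle layer, so the first
    `mini_layers` layers can later act as a shallow, aligned mini-model."""

    def __init__(self, vocab_size=32000, d_model=768, n_layers=12, mini_layers=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
            for _ in range(n_layers)
        )
        self.mini_layers = mini_layers
        self.head_full = nn.Linear(d_model, vocab_size)  # primary MLM head (top layer)
        self.head_mini = nn.Linear(d_model, vocab_size)  # secondary MLM head (middle layer)

    def forward(self, input_ids):
        h = self.embeddings(input_ids)
        mini_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.mini_layers:
                # During joint pretraining, an MLM loss is also applied here,
                # which keeps the shallow prefix aligned with the full model.
                mini_logits = self.head_mini(h)
        return self.head_full(h), mini_logits


def adapt_to_new_language(model, new_vocab_size, batches, lr=1e-3):
    """Learn new language-specific embeddings over the mini-model only.

    The pretrained body stays frozen, and each update runs forward/backward
    through just the first `mini_layers` layers, which is where the compute
    saving over standard full-depth embedding training comes from."""
    for p in model.parameters():
        p.requires_grad = False  # keep the pretrained transformer body frozen

    new_emb = nn.Embedding(new_vocab_size, model.embeddings.embedding_dim)
    opt = torch.optim.Adam(new_emb.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # ignores positions labeled -100 by default

    for input_ids, labels in batches:  # labels: masked-token targets, -100 elsewhere
        h = new_emb(input_ids)
        for layer in model.layers[: model.mini_layers]:  # shallow mini-model
            h = layer(h)
        logits = model.head_mini(h)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Because mini-model and large model are aligned, the cheaply trained
    # embeddings can simply be plugged into the full-depth encoder.
    model.embeddings = new_emb
    return model
```

For MINIPOST, the analogous sketch would instead start from a regular pretrained encoder, extract and freeze its first few layers, and learn a small number of parameters on top so that the shallow stack approximates the deep model's representations, after which the same embedding-only adaptation step applies.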

Locations

  • arXiv (Cornell University)
  • Findings of the Association for Computational Linguistics: ACL 2023

Similar Works

  • Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training (2022) - Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe
  • Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models (2024) - James Vo
  • WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models (2022) - Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz
  • DEPT: Decoupled Embeddings for Pre-training Language Models (2024) - Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F. Shen, Xinchi Qiu, Dongqi Cai, Yan Gao, Nicholas D. Lane
  • LangSAMP: Language-Script Aware Multilingual Pretraining (2024) - Yihong Liu, Haotian Ye, Chunlan Ma, Mingyang Wang, Hinrich Schütze
  • Efficiently Adapting Pretrained Language Models To New Languages (2023) - Zoltan Csaki, Pian Pawakapan, Urmish Thakker, Qiantong Xu
  • bert2BERT: Towards Reusable Pretrained Language Models (2021) - Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Feng-Yu Wang, Zhi Wang, Xiao Dong Chen, Zhiyuan Liu, Qun Liu
  • Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning (2021) - Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang
  • Rethinking embedding coupling in pre-trained language models (2020) - Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
  • Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning (2022) - Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
  • MaLA-500: Massive Language Adaptation of Large Language Models (2024) - Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze
  • SAS: Self-Augmentation Strategy for Language Model Pre-training (2021) - Yifei Xu, Jingqiao Zhang, He Ru, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Wu
  • SAS: Self-Augmentation Strategy for Language Model Pre-training (2022) - Yifei Xu, Jingqiao Zhang, He Ru, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Wu
  • UNKs Everywhere: Adapting Multilingual Language Models to New Scripts (2021) - Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  • UNKs Everywhere: Adapting Multilingual Language Models to New Scripts (2020) - Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  • Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch (2023) - Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li
  • Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization (2024) - Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar