Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings while keeping the transformer body frozen. Despite updating only a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full …
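To make the baseline concrete, here is a minimal sketch of the prior-work setup the abstract describes: a pretrained MLM whose transformer body is frozen while a fresh embedding matrix is trained for the target language. The model name, vocabulary size, and learning rate below are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch (PyTorch + Hugging Face Transformers) of the
# prior-work baseline: freeze the transformer body of a pretrained
# MLM and train only a new embedding matrix for the new language.
import torch
from transformers import XLMRobertaForMaskedLM

model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")

# Swap in an embedding matrix sized for the new language's vocabulary
# (this also resizes the tied MLM output head), then re-initialize it
# so the new embeddings are learned from scratch.
NEW_VOCAB_SIZE = 30_000  # hypothetical target-language vocabulary
model.resize_token_embeddings(NEW_VOCAB_SIZE)
model.get_input_embeddings().weight.data.normal_(
    mean=0.0, std=model.config.initializer_range
)

# Freeze everything, then re-enable gradients for the embeddings only.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-4)

# Even though only the embeddings are updated, every training step
# still runs a full forward and backward pass through the frozen
# body -- the compute cost that mini-model adaptation aims to avoid.
```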