An Empirical Comparison of Vocabulary Expansion and Initialization
Approaches for Language Models
Language Models (LMs) excel at natural language processing tasks in English but perform worse in most other languages. This problem is commonly tackled by continually pre-training and fine-tuning these models on the target languages. A significant issue in this process is the limited vocabulary coverage in the original model's tokenizer, …