An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models

Language Models (LMs) excel at natural language processing tasks in English but perform worse in most other languages. This gap is commonly addressed by continually pre-training and fine-tuning these models on the target languages. A significant issue in this process is the limited vocabulary coverage in the original model's tokenizer, …