Mitigating Language-Level Performance Disparity in mPLMs via Teacher
Language Selection and Cross-lingual Self-Distillation
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-lingual tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies have endeavored to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning the mPLM with …