DEPT: Decoupled Embeddings for Pre-training Language Models

Language-model pre-training benefits from broad data mixtures, which improve performance across domains and languages. However, training on such heterogeneous text corpora is difficult and computationally expensive. Because these data sources differ in lexical, syntactic, and semantic properties, they cause negative interference, also known as the "curse of multilinguality". …
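The abstract is truncated above, but the title names the core idea. As a rough illustration only, the sketch below shows one plausible reading of "decoupled embeddings": a separate token-embedding table per data source (e.g., per language or domain) wrapped around a shared transformer trunk, so lexical parameters are isolated while the body is trained jointly. All names here (`DecoupledEmbeddingLM`, `num_sources`, `source_id`) are hypothetical assumptions for illustration, not the paper's actual architecture or API.

```python
import torch
import torch.nn as nn


class DecoupledEmbeddingLM(nn.Module):
    """Shared transformer trunk with one embedding table per data source.

    Hypothetical sketch: not the paper's implementation.
    """

    def __init__(self, vocab_size: int = 1000, d_model: int = 64,
                 num_sources: int = 3, num_layers: int = 2):
        super().__init__()
        # One token-embedding table per data source, so lexical parameters
        # from heterogeneous corpora do not share (and interfere in) one matrix.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(vocab_size, d_model) for _ in range(num_sources)]
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers)  # shared trunk

    def forward(self, token_ids: torch.Tensor, source_id: int) -> torch.Tensor:
        h = self.embeddings[source_id](token_ids)  # source-specific lookup
        h = self.body(h)                           # shared parameters
        # Weight-tied output projection using the same source's table.
        return h @ self.embeddings[source_id].weight.T


model = DecoupledEmbeddingLM()
batch = torch.randint(0, 1000, (2, 16))        # a batch drawn from source 1
logits = model(batch, source_id=1)
print(logits.shape)                            # torch.Size([2, 16, 1000])
```

Under this reading, each training batch comes from a single source and only that source's embedding table receives gradients, while the shared body sees every batch; this is one simple way a model could decouple vocabularies across a heterogeneous mixture.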