Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on multilingual sentence embeddings. In contrast to previous approaches, which …