
Sequence-Level Knowledge Distillation


Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing …
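As a rough sketch of the word-level distillation objective the abstract refers to (Hinton et al., 2015), the student is trained to match the teacher's softened output distribution via cross-entropy. The temperature value and toy logits below are illustrative assumptions, and this is the word-level loss only, not the paper's sequence-level variant:

```python
import math

def softmax(logits, temperature=1.0):
    # Softmax with temperature scaling; higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge" over classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened distribution and the
    # student's softened distribution, computed per output position.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Toy logits over a 3-word vocabulary: a student that agrees with the
# teacher incurs a lower loss than one that disagrees.
teacher = [4.0, 1.0, 0.5]
agreeing_student = [3.5, 1.2, 0.4]
disagreeing_student = [0.2, 3.0, 1.0]
assert distillation_loss(agreeing_student, teacher) < distillation_loss(disagreeing_student, teacher)
```

In practice this loss is averaged over all target-sequence positions and often mixed with the standard negative log-likelihood on the gold targets.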