Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing …
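To make the cited knowledge-distillation idea concrete, the sketch below shows the standard soft-target objective of Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution rather than only the one-hot labels. This is a minimal NumPy illustration on toy logits, not the paper's implementation; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's softened distribution (the word-level soft-target loss).
    Minimized when the student's distribution matches the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy example: one position over a 3-word vocabulary.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.0, 2.0, -1.0]])
loss = distillation_loss(student, teacher)
```

By Gibbs' inequality the loss is smallest when the student reproduces the teacher's distribution exactly, at which point it equals the teacher's (softened) entropy; any mismatch increases it.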