Compressing Visual-linguistic Model via Knowledge Distillation


Despite exciting progress in pre-training for visual-linguistic (VL) representations, very few aspire to a small VL model. In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model. The major challenge arises from the inconsistent regional visual tokens extracted from …
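To make the knowledge-distillation setup concrete, below is a minimal, generic KD sketch in PyTorch. It is not the paper's method: the function name, the temperature/weight values, and the plain soft-label KL term are illustrative assumptions, and the sketch omits the paper's central concern of aligning inconsistent regional visual tokens between teacher and student.

```python
# A minimal sketch of standard soft-label knowledge distillation (assumed,
# generic formulation; not the paper's VL-specific distillation objectives).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-label KD term with the usual hard-label loss."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    # F.kl_div expects log-probabilities as input and probabilities as target.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature smoothing
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical usage with a large frozen teacher and a small student:
# with torch.no_grad():
#     t_logits = teacher(batch)   # large VL model, frozen during distillation
# s_logits = student(batch)       # compact model being trained
# loss = distillation_loss(s_logits, t_logits, batch_labels)
# loss.backward()
```

The temperature `T` softens both distributions so the student also learns from the teacher's relative confidences across classes, while `alpha` trades off imitation of the teacher against fitting the ground-truth labels.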