DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Transformers have been successfully applied to computer vision owing to their powerful self-attention-based modeling capacity. However, the excellent performance of transformers depends heavily on enormous numbers of training images. Thus, a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, termed DearKD, …