AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of tasks recently. While achieving excellent performance, they still require relatively intensive computational cost that scales up drastically as the numbers of patches, self-attention heads and transformer blocks increase. In this paper, we argue that …