ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition

Vision Transformers (ViTs) have achieved remarkable success in various computer vision tasks. However, ViTs incur a high computational cost due to their inherent reliance on multi-head self-attention (MHSA), prompting efforts to accelerate them for practical applications. To this end, recent works aim to reduce the number of tokens, mainly focusing …