MetaFormer is Actually What You Need for Vision

Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token mixer module contributes most to their competence. However, recent works show that the attention-based module in Transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, …
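
As a rough illustration of the abstract's point, and not the paper's actual code, the sketch below shows a generic block in which the token mixer is a pluggable module: swap in attention, a spatial MLP, or something simpler, while the norms, channel MLP, and residual connections stay fixed. The class names (`MetaFormerBlock`, `SpatialMLP`) and the dimensions in the usage example are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class MetaFormerBlock(nn.Module):
    """A generic block with a pluggable token mixer (attention, spatial MLP, ...)."""

    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer          # any module mixing info across tokens
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.channel_mlp = nn.Sequential(       # standard channel MLP
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):                       # x: (batch, tokens, dim)
        x = x + self.token_mixer(self.norm1(x)) # mix across tokens
        x = x + self.channel_mlp(self.norm2(x)) # mix across channels
        return x


class SpatialMLP(nn.Module):
    """A simple spatial MLP token mixer: one linear layer over the token axis."""

    def __init__(self, num_tokens):
        super().__init__()
        self.fc = nn.Linear(num_tokens, num_tokens)

    def forward(self, x):                       # (batch, tokens, dim)
        return self.fc(x.transpose(1, 2)).transpose(1, 2)


if __name__ == "__main__":
    # Hypothetical sizes: 196 tokens (14x14 patches), 64 channels.
    block = MetaFormerBlock(dim=64, token_mixer=SpatialMLP(num_tokens=196))
    out = block(torch.randn(2, 196, 64))
    print(out.shape)  # torch.Size([2, 196, 64])
```

The design choice this sketch highlights is the one the abstract describes: the surrounding block structure is held constant while only the token-mixing module changes, which is what allows attention to be replaced by a spatial MLP (or another mixer) without rethinking the rest of the architecture.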