Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their application in downstream tasks. In this paper, we take a deep look at the internal structure of …