CAT: Content-Adaptive Image Tokenization

Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity. To address this, we introduce the Content-Adaptive Tokenizer (CAT), which dynamically adjusts representation capacity based on image content and encodes simpler images into fewer tokens. We design a caption-based …
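
To make the idea of encoding simpler images into fewer tokens concrete, here is a minimal Python sketch. It is not the CAT implementation. It assumes a complexity score in [0, 1] is already available for each image, and that the tokenizer can downsample each side by one of a few factors. The function names, the thresholds, and the factors 8, 16, and 32 are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; names, thresholds, and factors are assumptions,
# not taken from the CAT abstract.

def choose_downsample_factor(complexity: float) -> int:
    """Map a complexity score in [0, 1] to a per-side downsampling factor."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if complexity < 0.33:
        return 32  # simple image: compress aggressively, few tokens
    if complexity < 0.66:
        return 16  # medium complexity
    return 8       # complex image: compress less, keep more tokens


def token_count(height: int, width: int, factor: int) -> int:
    """Number of latent tokens when each side is downsampled by `factor`."""
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by the factor")
    return (height // factor) * (width // factor)


def adaptive_token_count(height: int, width: int, complexity: float) -> int:
    """Token count under the content-adaptive rule sketched above."""
    factor = choose_downsample_factor(complexity)
    return token_count(height, width, factor)
```

Under these assumed settings, a 256×256 image is encoded into 64, 256, or 1024 tokens for complexity scores of 0.1, 0.5, and 0.9 respectively, whereas a fixed tokenizer with factor 16 would always produce 256.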