PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs. To meet the needs of different tasks, existing high-performance models usually process images and videos separately with different token compression strategies, limiting the …