PVC: Progressive Visual Token Compression for Unified Image and Video
Processing in Large Vision-Language Models
Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs. To meet the needs of different tasks, existing high-performance models usually process images and videos separately with different token compression strategies, limiting the …