VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

The recent surge in high-quality visual instruction tuning samples from closed-source vision-language models (VLMs) such as GPT-4V has accelerated the release of open-source VLMs across various model sizes. However, improving performance by scaling VLMs to larger sizes brings significant computational challenges, especially for deployment on resource-constrained devices like mobile platforms …