Ask a Question

Prefer a chat interface with context about you and your work?

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

Video large language models (Video-LLMs) have made significant progress in understanding videos. However, processing multiple frames leads to lengthy visual token sequences, presenting challenges such as the limited context length cannot accommodate the entire video, and the inclusion of irrelevant frames hinders visual perception. Hence, effective frame selection is crucial. …