MDP3: A Training-free Approach for List-wise Frame Selection in
Video-LLMs
MDP3: A Training-free Approach for List-wise Frame Selection in
Video-LLMs
Video large language models (Video-LLMs) have made significant progress in understanding videos. However, processing multiple frames leads to lengthy visual token sequences, presenting challenges such as the limited context length cannot accommodate the entire video, and the inclusion of irrelevant frames hinders visual perception. Hence, effective frame selection is crucial. …