Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital, since they allow us to communicate our thoughts and perceive the world around us. There has been considerable interest in creating video-language understanding systems with human-like senses since a …