Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

Video question answering requires models to understand and reason about both complex video and language data in order to derive correct answers. Existing efforts have focused on designing sophisticated cross-modal interactions to fuse information from the two modalities, while encoding the video and question holistically as frame and …