Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

Vision-and-language (V&L) reasoning requires perceiving visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding the relative locations of objects, i.e., implicitly learning the geometry of the …