Scene Text Visual Question Answering

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. …