Exploring Human-Like Attention Supervision in Visual Question Answering
Attention mechanisms have been widely applied to the Visual Question Answering (VQA) task because they help a model focus on the areas of interest in both the visual and the textual input. To answer a question correctly, the model needs to selectively target different areas of an image, which suggests that an attention-based model may …