Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
In Visual Question Answering, most existing approaches represent an image with a pre-trained CNN and then combine the uninterpretable CNN features with the question to predict the answer. Although such end-to-end models may report promising performance, they rarely provide any insight, apart from the answer, …