Learning Convolutional Text Representations for Visual Question Answering
Learning Convolutional Text Representations for Visual Question Answering
Visual question answering (VQA) is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks (CNNs) while texts are typically modeled through recurrent neural networks (RNNs). In this work, we perform a detailed …