Ask a Question

Prefer a chat interface with context about you and your work?

Deep visual-semantic alignments for generating image descriptions

Deep visual-semantic alignments for generating image descriptions

We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image …