Ask a Question

Prefer a chat interface with context about you and your work?

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as the and of. Other words that may seem …