Supervising Neural Attention Models for Video Captioning by Human Gaze Data
The attention mechanisms in deep neural networks are inspired by human attention, which sequentially focuses on the most relevant parts of the information over time to generate prediction output. The attention parameters in those models are implicitly trained in an end-to-end manner, yet there have been few attempts to explicitly …