Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
In this paper, we address the problem of describing the visual contents of a video sequence with natural language. Unlike previous video captioning work, which mainly exploits cues from the video contents to generate a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both forward …