Memory-Attended Recurrent Network for Video Captioning

Typical video captioning techniques follow the encoder-decoder framework, which can focus only on the single source video being processed. A potential disadvantage of this design is that it cannot capture the multiple visual contexts of a word that appears in more than one relevant video in the training data. To tackle …