Ask AI a math question

Related Paper

Memory-Attended Recurrent Network for Video Captioning

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed. A potential disadvantage of such design is that it cannot capture the multiple visual context information of a word appearing in more than one relevant videos in training data. To tackle …

Ask a Question