End-to-End Video Captioning With Multitask Reinforcement Learning
Although end-to-end (E2E) learning has led to impressive progress on a variety of visual understanding tasks, it is often impeded by hardware constraints (e.g., GPU memory) and is prone to overfitting. When it comes to video captioning, one of the most challenging benchmark tasks in computer vision, those limitations of …