Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning
Video captioning aims to automatically generate natural language descriptions of video content and has drawn considerable attention in recent years. Generating accurate and fine-grained captions requires not only understanding the global content of a video, but also capturing detailed object information. Meanwhile, video representations have a great impact on …