Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Diverse video captioning aims to generate a set of sentences that describe a given video from various aspects. Mainstream methods are trained on independent video-caption pairs drawn from the ground-truth set, without exploiting the relationships among captions within that set, which results in low diversity among the generated captions. Unlike these methods, …