Weakly Supervised Temporal Adjacent Network for Language Grounding
Temporal language grounding (TLG) is a fundamental and challenging problem in vision and language understanding. Existing methods mainly focus on the fully supervised setting, which requires temporal boundary labels for training and therefore suffers from expensive annotation costs. In this work, we focus on weakly supervised TLG, where multiple description sentences …