Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from the context is important for distinguishing the target from other objects, particularly when the target belongs to the same category as other objects in the image. However, most previous methods underestimate such information. …