Relationship-Embedded Representation Learning for Grounding Referring Expressions
Grounding referring expressions in images aims to locate, within an image, the object instance that a referring expression describes. It requires a joint understanding of natural language and image content, and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core …