Ask a Question

Prefer a chat interface with context about you and your work?

Identity-Aware Textual-Visual Matching with Latent Co-attention

Identity-Aware Textual-Visual Matching with Latent Co-attention

Textual-visual matching aims at measuring similarities between sentence descriptions and images. Most existing methods tackle this problem without effectively utilizing identity-level annotations. In this paper, we propose an identity-aware two-stage framework for the textual-visual matching problem. Our stage-1 CNN-LSTM network learns to embed cross-modal features with a novel Cross-Modal Cross-Entropy …