Prefer a chat interface with context about you and your work?
CLAREL: Classification via retrieval loss for zero-shot learning
We address the problem of learning cross-modal representations. We propose an instance-based deep metric learning approach in joint visual and textual space. The key novelty of this paper is that it shows that using per-image semantic supervision leads to substantial improvement in zero-shot performance over using class-only supervision. We also …