NLIP: Noise-Robust Language-Image Pre-training

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval, and image captioning. However, their success relies heavily on the scale and quality of web-crawled data, which naturally contain substantial incomplete and noisy information (e.g., wrong or irrelevant content). Existing …