LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the …