LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) a deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge; (2) an overfitting tendency towards base categories, with the …