MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large
Vision-Language Models
Visual preference alignment involves training Large Vision-Language Models (LVLMs) to predict human preferences between visual inputs. This is typically achieved by using labeled datasets of chosen/rejected pairs and employing optimization algorithms like direct preference optimization (DPO). Existing visual alignment methods, primarily designed for single-image scenarios, struggle to effectively handle the …
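The DPO objective mentioned above can be illustrated with a minimal sketch. Note this is a generic single-pair DPO loss for illustration only, not the paper's multi-image training code; the function name, argument names, and the choice of `beta` are assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one chosen/rejected pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen
    reference model; beta scales how far the policy may drift from
    the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): shrinks as the policy ranks the chosen
    # response increasingly above the rejected one (relative to the
    # reference model's ranking)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With identical policy and reference log-probabilities the margin is zero and the loss is log 2; raising the chosen response's policy log-probability lowers the loss.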