InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Despite the promising performance of Large Vision Language Models (LVLMs) in visual understanding, they occasionally generate incorrect outputs. Reward models (RMs), used with reinforcement learning or test-time scaling, can potentially improve generation quality, but a critical gap remains: publicly available multi-modal RMs for LVLMs are scarce, and the implementation …