RoDE: Linear Rectified Mixture of Diverse Experts for Food Large
Multi-Modal Models
Large Multi-modal Models (LMMs) have significantly advanced a variety of vision-language tasks. The scalability and availability of high-quality training data play a pivotal role in the success of LMMs. In the realm of food, while comprehensive food datasets such as Recipe1M offer an abundance of ingredient and recipe information, they …