Ask a Question

Prefer a chat interface with context about you and your work?

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models

Large Multi-modal Models (LMMs) have significantly advanced a variety of vision-language tasks. The scalability and availability of high-quality training data play a pivotal role in the success of LMMs. In the realm of food, while comprehensive food datasets such as Recipe1M offer an abundance of ingredient and recipe information, they …