FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint
Textual and Visual Clues
Multi-modal reasoning plays a vital role in bridging the gap between textual and visual information, enabling a deeper understanding of the context. This paper presents the Feature Swapping Multi-modal Reasoning (FSMR) model, designed to enhance multi-modal reasoning through feature swapping. FSMR leverages a pre-trained visual-language model as an encoder, accommodating …