Improving Multi-modal Large Language Model through Boosting Vision
Capabilities
We focus on improving the visual understanding capability of vision-language models. We propose \textbf{Arcana}, a multimodal language model that introduces two crucial techniques. First, we present Multimodal LoRA (MM-LoRA), a module designed to enhance the decoder. Unlike traditional language-driven decoders, MM-LoRA consists of two parallel LoRAs -- one …