SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
We propose SPHINX-X, an extensive Multimodal Large Language Model (MLLM) series built upon SPHINX. To improve architectural and training efficiency, we modify the SPHINX framework in three ways: we remove redundant visual encoders, we bypass fully-padded sub-images with skip tokens, and we simplify multi-stage training into a one-stage, all-in-one paradigm. To fully unleash the …
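The abstract states only that fully-padded sub-images are bypassed with skip tokens. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a 2x2 sub-image grid, a hypothetical tile size of 448 pixels, an image resized (keeping its aspect ratio) so its longer side spans the grid and placed at the top-left of a square padded canvas, and a hypothetical "<skip>" token string. Each sub-image that contains no real pixels is replaced by the skip token instead of being sent through the visual encoder.

```python
from typing import List

SKIP_TOKEN = "<skip>"  # hypothetical placeholder; the real token is not given in the abstract


def plan_sub_images(height: int, width: int, grid: int = 2, tile: int = 448) -> List[str]:
    """For each sub-image (row-major order), return "encode" or SKIP_TOKEN.

    Assumptions (illustrative only): the image is scaled so its longer side
    equals grid * tile, aspect ratio preserved, and placed at the top-left of
    a square canvas; everything outside the scaled image is padding.
    """
    canvas = grid * tile
    scale = canvas / max(height, width)
    content_h = round(height * scale)  # rows of real pixels on the canvas
    content_w = round(width * scale)   # columns of real pixels on the canvas

    plan = []
    for r in range(grid):
        for c in range(grid):
            top, left = r * tile, c * tile
            # With top-left placement, a tile holds real pixels iff its
            # top-left corner falls inside the content region.
            has_content = top < content_h and left < content_w
            plan.append("encode" if has_content else SKIP_TOKEN)
    return plan


if __name__ == "__main__":
    # A wide 100x400 image fills only the top row of the 2x2 grid.
    print(plan_sub_images(100, 400))
```

For the wide image in the usage example, the bottom two sub-images are pure padding, so only two tiles are encoded and the other two each cost a single skip token. This is where the efficiency gain comes from under these assumptions.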