ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling …