TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

While open-source multi-modal language models perform well on simple question-answering tasks, they often fail on complex questions that require multiple capabilities, such as fine-grained recognition, visual grounding, and reasoning, and that demand multi-step solutions. We present TACO, a family of multi-modal large action models designed to improve performance on …