Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

Robot manipulation policies have shown unsatisfactory action performance when confronted with novel tasks or object instances. Hence, the capability to automatically detect and self-correct failed actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning …