Self-Corrected Multimodal Large Language Model for End-to-End Robot
Manipulation
Robot manipulation policies have shown unsatisfactory action performance when confronted with novel tasks or object instances. Hence, the capability to automatically detect and self-correct failed actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning …