Spatiotemporal Predictive Pre-training for Robotic Motor Control

Type: Preprint

Publication Date: 2024-03-08

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2403.05304

Abstract

Robotic motor control requires the ability to predict the dynamics of environments and interaction objects. However, advanced self-supervised pre-trained visual representations (PVRs) for robotic motor control, which leverage large-scale egocentric videos, often focus solely on learning static content features of sampled image frames. This neglects the crucial temporal motion cues in human video data, which implicitly encode key knowledge about sequentially interacting with and manipulating environments and objects. In this paper, we present a simple yet effective visual pre-training framework for robotic motor control, termed STP, that jointly performs spatiotemporal predictive learning on large-scale video data. STP samples paired frames from video clips and adheres to two key designs in a multi-task learning manner. First, we perform spatial prediction on the masked current frame to learn content features. Second, we use the future frame, masked at an extremely high ratio, as a condition and, based on the masked current frame, conduct temporal prediction of the future frame to capture motion features. These efficient designs ensure that our representation focuses on motion information while still capturing spatial details. We carry out the largest-scale evaluation of PVRs for robotic motor control to date, encompassing 21 tasks on a real-world Franka robot arm and in 5 simulated environments. Extensive experiments demonstrate the effectiveness of STP, and further post-pre-training and hybrid pre-training unleash its generality and data efficiency.
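The abstract describes a multi-task objective: a spatial reconstruction loss on a masked current frame plus a temporal prediction loss on a future frame masked at an extremely high ratio. The sketch below illustrates that loss structure only, with toy numpy arrays standing in for patch embeddings; the patch count, masking ratios (0.75 and 0.95), loss weight, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches: int, ratio: float) -> np.ndarray:
    """Boolean mask over patch tokens; True = masked (to be predicted)."""
    n_mask = int(round(num_patches * ratio))
    idx = rng.permutation(num_patches)[:n_mask]
    mask = np.zeros(num_patches, dtype=bool)
    mask[idx] = True
    return mask

def stp_objective(cur_patches, fut_patches, recon_cur, recon_fut,
                  mask_cur, mask_fut, lam=1.0):
    """Multi-task loss: MSE on masked current-frame patches (spatial
    prediction) plus MSE on masked future-frame patches (temporal
    prediction), as in the two-branch design the abstract describes."""
    spatial = np.mean((recon_cur[mask_cur] - cur_patches[mask_cur]) ** 2)
    temporal = np.mean((recon_fut[mask_fut] - fut_patches[mask_fut]) ** 2)
    return spatial + lam * temporal

# Toy example: 196 patch tokens (a 14x14 grid), 768-dim embeddings.
P, D = 196, 768
cur, fut = rng.normal(size=(P, D)), rng.normal(size=(P, D))
mask_cur = random_mask(P, 0.75)   # moderate masking of the current frame
mask_fut = random_mask(P, 0.95)   # extremely high masking of the future frame
# Zero "reconstructions" stand in for a decoder's outputs.
loss = stp_objective(cur, fut, np.zeros((P, D)), np.zeros((P, D)),
                     mask_cur, mask_fut)
```

In a real model the reconstructions would come from a shared encoder with two prediction heads, so the weighted sum trains one representation on both tasks jointly.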

Locations

  • arXiv (Cornell University)

Similar Works

  • Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning (2023). Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long
  • Masked Visual Pre-training for Motor Control (2022). Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
  • Self-Supervised Visual Planning with Temporal Skip Connections (2017). Frederik Ebert, Chelsea Finn, Alex X. Lee, Sergey Levine
  • Deep Visual Foresight for Planning Robot Motion (2017). Chelsea Finn, Sergey Levine
  • Deep Visual Foresight for Planning Robot Motion (2016). Chelsea Finn, Sergey Levine
  • DMotion: Robotic Visuomotor Control with Unsupervised Forward Model Learned from Videos (2021). Haoqi Yuan, Ruihai Wu, Andrew Zhao, Haipeng Zhang, Zihan Ding, Hao Dong
  • Robot Learning with Sensorimotor Pre-training (2023). Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik
  • Time-Agnostic Prediction: Predicting Predictable Video Frames (2018). Dinesh Jayaraman, Frederik Ebert, Alexei A. Efros, Sergey Levine
  • Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots (2023). Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das, Debasish Ghose
  • Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations (2024). Yucheng Hu, Yanjiang Guo, Pengchao Wang, Xiaoyu Chen, Yen‐Jen Wang, Jianke Zhang, Koushil Sreenath, Chaochao Lu, Jianyu Chen
  • EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation (2025). Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Peng Gao, Hongsheng Li, Mike Yao, Guang-hui Ren
  • R3M: A Universal Visual Representation for Robot Manipulation (2022). Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
  • Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods (2023). Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong
  • Motion-Scenario Decoupling for Rat-Aware Video Position Prediction: Strategy and Benchmark (2023). Xiaofeng Liu, Jiaxin Gao, Yaohua Liu, Risheng Liu, Nenggan Zheng

Works That Cite This (0)


Works Cited by This (0)
