Path integral guided policy search

Yevgen Chebotar, Mrinal Kalakrishnan, Ali Abdullah Yahya, Adrian Li, Stefan Schaal, Sergey Levine

Type: Article

Publication Date: 2017-05-01

Citations: 135

DOI: https://doi.org/10.1109/icra.2017.7989384

Abstract

We present a policy search method for learning complex feedback control policies that map from high-dimensional sensory inputs to motor torques, for manipulation tasks with discontinuous contact dynamics. We build on a prior technique called guided policy search (GPS), which iteratively optimizes a set of local policies for specific instances of a task, and uses these to train a complex, high-dimensional global policy that generalizes across task instances. We extend GPS in the following ways: (1) we propose the use of a model-free local optimizer based on path integral stochastic optimal control (PI²), which enables us to learn local policies for tasks with highly discontinuous contact dynamics; and (2) we enable GPS to train on a new set of task instances in every iteration by using on-policy sampling: this increases the diversity of the instances that the policy is trained on, and is crucial for achieving good generalization. We show that these contributions enable us to learn deep neural network policies that can directly perform torque control from visual input. We validate the method on a challenging door opening task and a pick-and-place task, and we demonstrate that our approach substantially outperforms the prior LQR-based local policy optimizer on these tasks. Furthermore, we show that on-policy sampling significantly increases the generalization ability of these policies.
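
As a rough illustration of the model-free local optimizer mentioned in the abstract, the sketch below shows the probability-weighted averaging step that PI²-style methods are generally built on: sampled parameters are re-weighted by the exponentiated negative cost of their rollouts, so no dynamics model or cost gradient is needed. It is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the temperature `eta`, and the toy samples and costs are all hypothetical.

```python
import math

def pi2_weights(costs, eta=1.0):
    """Normalized weights proportional to exp(-cost / eta).

    Lower-cost rollouts receive higher weight. `eta` is an assumed
    temperature parameter, not a value taken from the paper.
    """
    s_min = min(costs)
    # Subtracting the minimum cost keeps exp() numerically stable
    # without changing the normalized weights.
    unnorm = [math.exp(-(s - s_min) / eta) for s in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def pi2_update_mean(samples, costs, eta=1.0):
    """Probability-weighted average of sampled parameter vectors."""
    weights = pi2_weights(costs, eta)
    dim = len(samples[0])
    return [sum(w * x[d] for w, x in zip(weights, samples))
            for d in range(dim)]

# Toy usage: three sampled 2-D parameter vectors with made-up rollout costs.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
costs = [5.0, 1.0, 5.0]
new_mean = pi2_update_mean(samples, costs, eta=1.0)
```

In this toy case the second sample has the lowest cost, so the updated mean moves most of the way toward it. Because the update only needs rollout costs, it does not rely on smooth dynamics, which is the property the abstract highlights for contact-rich tasks.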

Locations

arXiv (Cornell University)

Similar Works

Title	Year	Authors
Path Integral Guided Policy Search	2016	Yevgen Chebotar, Mrinal Kalakrishnan, Ali Abdullah Yahya, Adrian Li, Stefan Schaal, Sergey Levine
Guided Policy Search via Approximate Mirror Descent	2016	William Montgomery, Sergey Levine
Learning Robust Manipulation Skills with Guided Policy Search via Generative Motor Reflexes	2018	Philipp Ennen, Pia Bresenitz, René Vossen, Frank Hees
Learning Robust Manipulation Skills with Guided Policy Search via Generative Motor Reflexes	2019	Philipp Ennen, Pia Bresenitz, René Vossen, Frank Hees
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization	2016	Chelsea Finn, Sergey Levine, Pieter Abbeel
Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning	2023	Hien X. Bui, Michael Posa
Guided Policy Search as Approximate Mirror Descent	2016	William Montgomery, Sergey Levine
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning	2017	Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav S. Sukhatme, Stefan Schaal, Sergey Levine
Learning contact-rich manipulation skills with guided policy search	2015	Sergey Levine, Nolan Wagener, Pieter Abbeel
A Comparison of Action Spaces for Learning Manipulation Tasks	2019	Patrick Varin, Lev Grossman, Scott Kuindersma
Learning Dexterous Manipulation from Suboptimal Experts	2020	Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori
Learning Contact-Rich Manipulation Skills with Guided Policy Search	2015	Sergey Levine, Nolan Wagener, Pieter Abbeel
Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation	2024	H. A. Le, Manal M. Gabriel, Tai Hoang, Gerhard Neumann, Ngo Anh Vien
LVIS: Learning from Value Function Intervals for Contact-Aware Robot Controllers	2018	Robin Deits, Twan Koolen, Russ Tedrake

Cited by (83)

Title	Year	Authors
Optimal adaptive inspection and maintenance planning for deteriorating structural systems	2021	Elizabeth Bismut, Dániel Straub
Supervised Learning and Reinforcement Learning of Feedback Models for Reactive Behaviors: Tactile Feedback Testbed	2020	Giovanni Sutanto, Katharina Rombach, Yevgen Chebotar, Zhe Su, Stefan Schaal, Gaurav S. Sukhatme, Franziska Meier
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks	2019	Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Reinforcement Learning on Variable Impedance Controller for High-Precision Robotic Assembly	2019	Jianlan Luo, Eugen Solowjow, Chengtao Wen, Juan Aparicio Ojea, Alice M. Agogino, Aviv Tamar, Pieter Abbeel
Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning	2021	He Ba, Jiajun Fan, Xian Guo, Jianye Hao
Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system	2018	Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, Emanuel Todorov
Model-based Lookahead Reinforcement Learning	2019	Zhang-Wei Hong, Joni Pajarinen, Jan Peters
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks	2019	Péter Várnai, Dimos V. Dimarogonas
Curiosity-Driven Experience Prioritization via Density Estimation	2019	Rui Zhao, Volker Tresp
Region Growing Curriculum Generation for Reinforcement Learning	2018	Artem Molchanov, Karol Hausman, Gaurav S. Sukhatme
Robot Playing Kendama with Model-Based and Model-Free Reinforcement Learning	2020	Shidi Li
End-To-End Robotic Reinforcement Learning without Reward Engineering	2019	Avi Singh, Larry Yang, Chelsea Finn, Sergey Levine
Collective robot reinforcement learning with distributed asynchronous guided policy search	2017	Ali Abdullah Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar, Sergey Levine
ACDER: Augmented Curiosity-Driven Experience Replay	2020	Boyao Li, Tao Lü, Jiayi Li, Ning Lu, Yinghao Cai, Shuo Wang
Learning Modular Robot Control Policies	2023	Julian Whitman, Matthew Travers, Howie Choset
Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning	2020	Zhenhui Ye, Yining Chen, Guanghua Song, Bowei Yang, Shen Fan
Reinforcement and Imitation Learning for Diverse Visuomotor Skills	2018	Yuke Zhu, Ziyu Wang, Josh Merel, Andrei A. Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas
A Survey of Behavior Learning Applications in Robotics -- State of the Art and Perspectives	2019	Alexander Fabisch, Christoph Petzoldt, Marc Otto, Frank Kirchner
Hybrid Control for Learning Motor Skills	2021	Ian Abraham, Alexander Broad, Allison Pinosky, Brenna Argall, Todd Murphey
Reinforcement learning using expectation maximization based guided policy search for stochastic dynamics	2021	Prakash Mallick, Zhiyong Chen, Mohsen Zamani
Unsupervised Perceptual Rewards for Imitation Learning	2017	Pierre Sermanet, Kelvin Xu, Sergey Levine
Guided Policy Improvement for Satisfying STL Tasks using Funnel Adaptation	2020	Péter Várnai, Dimos V. Dimarogonas
A multilevel approach for stochastic nonlinear optimal control	2020	Ajay Jasra, Jeremy Heng, Yaxian Xu, Adrian N. Bishop
Reinforcement Learning Using Expectation Maximization Based Guided Policy Search for Stochastic Dynamics	2020	Prakash Mallick, Zhiyong Chen, Mohsen Zamani
Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention	2021	Abhishek Gupta, Justin Yu, Tony Z. Zhao, Vikash Kumar, Aaron Rovinsky, Kelvin Xu, T. Devlin, Sergey Levine
Local Policy Optimization for Trajectory-Centric Reinforcement Learning	2020	Patrik Kolaric, Devesh K. Jha, Arvind U. Raghunathan, Frank L. Lewis, Mouhacine Benosman, Diego Romeres, Daniel Nikovski
Deep predictive policy training using reinforcement learning	2017	Ali Ghadirzadeh, Atsuto Maki, Danica Kragić, Mårten Björkman
How to train your robot with deep reinforcement learning: lessons we have learned	2021	Julian Ibarz, Jie Tan, Chelsea Finn, Mrinal Kalakrishnan, Peter Pástor, Sergey Levine
Motion Planning Networks: Bridging the Gap Between Learning-Based and Classical Motion Planners	2020	Ahmed H. Qureshi, Yinglong Miao, Anthony Simeonov, Michael C. Yip
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks	2020	Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation	2021	Zhenshan Bing, Matthias Brucker, Fabrice O. Morin, Rui Li, Xiaojie Su, Kai Huang, Alois Knoll
Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation	2020	Zhenshan Bing, Matthias Brucker, Fabrice O. Morin, Kai Huang, Alois Knoll
Benchmarking Model-Based Reinforcement Learning	2019	Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba
Planning under Uncertainty to Goal Distributions	2020	Adam Conkey, Tucker Hermans
robo-gym – An Open Source Toolkit for Distributed Deep Reinforcement Learning on Real and Simulated Robots	2020	Matteo Lucchi, Friedemann Zindler, Stephan Mühlbacher-Karrer, Horst Pichler
Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control	2023	Alessandro Saviolo, Jonathan Frey, Abhishek Rathod, Moritz Diehl, Giuseppe Loianno

Citing (8)

Title	Year	Authors
Deep learning in neural networks: An overview	2014	Jürgen Schmidhuber
Path Integral Policy Improvement with Covariance Matrix Adaptation	2012	Freek Stulp, Olivier Sigaud
Learning contact-rich manipulation skills with guided policy search	2015	Sergey Levine, Nolan Wagener, Pieter Abbeel
End-to-End Training of Deep Visuomotor Policies	2015	Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
Guided Policy Search as Approximate Mirror Descent	2016	William Montgomery, Sergey Levine
Continuous control with deep reinforcement learning	2016	Timothy Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Going Further with Point Pair Features	2016	Stefan Hinterstoißer, Vincent Lepetit, Naresh Rajkumar, Kurt Konolige