Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

Type: Article

Publication Date: 2017-07-01

Citations: 190

DOI: https://doi.org/10.1109/cvpr.2017.751


Abstract

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motion. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with the RGB-D modality. We use a Recurrent Neural Network-based encoder-decoder framework to predict these sequences of flows. We argue that, for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatiotemporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets, such as NTU RGB+D and MSR Daily Activity 3D. Our framework applies to any input modality, i.e., RGB, depth, and RGB-D videos.
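The encoder-decoder idea in the abstract can be illustrated with a deliberately tiny, untrained forward pass in plain Python. It is a sketch under stated assumptions, not the authors' model: each frame is assumed to be already reduced to a feature vector (no convolutional front end is reproduced), a plain tanh RNN stands in for whatever recurrent cell the paper actually uses, and each atomic 3D flow is assumed to be a flattened coarse grid of (dx, dy, dz) vectors. `FlowSeqEncoderDecoder`, `sequence_mse` and all sizes are hypothetical names and values chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights (placeholder for a real initializer)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FlowSeqEncoderDecoder:
    """Hypothetical toy model: a tanh-RNN encoder reads per-frame feature
    vectors; a tanh-RNN decoder emits one flattened 3D-flow map per step."""

    def __init__(self, feat_dim, hid_dim, flow_dim, seed=0):
        rng = random.Random(seed)
        self.hid_dim, self.flow_dim = hid_dim, flow_dim
        self.enc_in = init_matrix(hid_dim, feat_dim, rng)
        self.enc_rec = init_matrix(hid_dim, hid_dim, rng)
        self.dec_in = init_matrix(hid_dim, flow_dim, rng)
        self.dec_rec = init_matrix(hid_dim, hid_dim, rng)
        self.readout = init_matrix(flow_dim, hid_dim, rng)

    @staticmethod
    def _step(w_in, w_rec, x, h):
        # One recurrent update: h' = tanh(W_in x + W_rec h).
        return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]

    def encode(self, frame_feats):
        # Read the input frames (here a pair) in order; the final hidden
        # state plays the role of the learned video representation.
        h = [0.0] * self.hid_dim
        for x in frame_feats:
            h = self._step(self.enc_in, self.enc_rec, x, h)
        return h

    def decode(self, h, num_steps):
        # Unroll the decoder, feeding back its own previous prediction.
        prev = [0.0] * self.flow_dim
        flows = []
        for _ in range(num_steps):
            h = self._step(self.dec_in, self.dec_rec, prev, h)
            prev = matvec(self.readout, h)
            flows.append(prev)
        return flows


def sequence_mse(pred, target):
    """Mean squared error over every step and every flow component."""
    if len(pred) != len(target):
        raise ValueError("sequence lengths differ")
    total, count = 0.0, 0
    for p_t, y_t in zip(pred, target):
        if len(p_t) != len(y_t):
            raise ValueError("flow dimensions differ")
        for p, y in zip(p_t, y_t):
            total += (p - y) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # flow_dim = 12: an assumed 2x2 grid of cells, each holding (dx, dy, dz).
    model = FlowSeqEncoderDecoder(feat_dim=8, hid_dim=16, flow_dim=12, seed=0)
    frame_pair = [[0.1 * i for i in range(8)], [0.2 * i for i in range(8)]]
    code = model.encode(frame_pair)
    pred = model.decode(code, num_steps=4)
    target = [[0.0] * 12 for _ in range(4)]  # placeholder "ground-truth" flows
    print(len(code), len(pred), len(pred[0]))  # 16 4 12
    print(sequence_mse(pred, target))
```

Training would minimize `sequence_mse` between the decoded flows and 3D flows computed from the RGB-D frames, typically by backpropagation through time; that step is omitted here. Feeding the decoder its own previous prediction is one common choice for sequence decoders, assumed here rather than taken from the paper. The quantity one would reuse downstream, for example as input to an activity classifier, is the encoder output `code`.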

Locations

arXiv (Cornell University)
Infoscience (École Polytechnique Fédérale de Lausanne)

Similar Works

Title	Year	Authors
Unsupervised Learning of Long-Term Motion Dynamics for Videos	2017	Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei
Im2Flow: Motion Hallucination from Static Images for Action Recognition	2017	Ruohan Gao, Bo Xiong, Kristen Grauman
Im2Flow: Motion Hallucination from Static Images for Action Recognition	2018	Ruohan Gao, Bo Xiong, Kristen Grauman
ActionFlowNet: Learning Motion Representation for Action Recognition	2018	Joe Yue-Hei Ng, Jonghyun Choi, Jan Neumann, Larry S. Davis
Long-Term Temporal Convolutions for Action Recognition	2016	Gül Varol, Ivan Laptev, Cordelia Schmid
Long-Term Temporal Convolutions for Action Recognition	2017	Gül Varol, Ivan Laptev, Cordelia Schmid
D3D: Distilled 3D Networks for Video Action Recognition	2020	Jonathan Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
Video Representation Learning by Recognizing Temporal Transformations	2020	Simon Jenni, Givi Meishvili, Paolo Favaro
DynamoNet: Dynamic Action and Motion Network	2019	Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen
Masked Motion Encoding for Self-Supervised Video Representation Learning	2022	Xinyu Sun, Peihao Chen, Liangwei Chen, Thomas H. Li, Mingkui Tan, Chuang Gan
D3D: Distilled 3D Networks for Video Action Recognition	2018	Jonathan Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
Masked Motion Encoding for Self-Supervised Video Representation Learning	2023	Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan
Hidden Two-Stream Convolutional Networks for Action Recognition	2017	Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video	2016	Dinesh Jayaraman, Kristen Grauman
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video	2015	Dinesh Jayaraman, Kristen Grauman
Action Recognition Using Volumetric Motion Representations	2019	Michael Peven, Gregory D. Hager, Austin Reiter
Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics	2019	Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

Cited by (95)

Title	Year	Authors
EEGFuseNet: Hybrid Unsupervised Deep Feature Characterization and Fusion for High-Dimensional EEG With an Application to Emotion Recognition	2021	Zhen Liang, Rushuang Zhou, Li Zhang, Linling Li, Gan Huang, Zhiguo Zhang, Shin Ishii
DYAN: A Dynamical Atoms-Based Network for Video Prediction	2018	WenQian Liu, Abhishek Sharma, Octavia Camps, Mario Sznaier
Contrastive Multiview Coding	2020	Yonglong Tian, Dilip Krishnan, Phillip Isola
Convolutional Relational Machine for Group Activity Recognition	2019	Sina Mokhtarzadeh Azar, Mina Ghadimi Atigh, Ahmad Nickabadi, Alexandre Alahi
Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey	2020	Longlong Jing, Yingli Tian
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation	2023	Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran
Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition	2020	Shihao Xu, Haocong Rao, Xiping Hu, Bin Hu
Video-based Human Action Recognition using Deep Learning: A Review	2022	Hieu H. Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín
Forecasting Hands and Objects in Future Frames	2019	Chenyou Fan, Jangwon Lee, Michael S. Ryoo
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	2021	Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin
Dual Contrastive Learning for Spatio-temporal Representation	2022	Shuangrui Ding, Rui Qian, Hongkai Xiong
Video Representation Learning by Recognizing Temporal Transformations	2020	Simon Jenni, Givi Meishvili, Paolo Favaro
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey	2019	Longlong Jing, Yingli Tian
Graph Distillation for Action Detection with Privileged Modalities	2017	Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei
Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning	2018	Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah
Viewpoint Invariant Action Recognition using RGB-D Videos	2017	Jian Liu, Naveed Akhtar, Ajmal Mian
Unsupervised Human 3D Pose Representation with Viewpoint and Pose Disentanglement	2020	Qiang Nie, Ziwei Liu, Yunhui Liu
Home Action Genome: Cooperative Compositional Action Understanding	2021	Nishant Rai, Haofeng Chen, Jingwei Ji, Rishi Desai, Kazuki Kozuka, Shun Ishizaka, Ehsan Adeli, Juan Carlos Niebles
Video Generation From Single Semantic Label Map	2019	Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
Motion-supervised Co-Part Segmentation	2021	Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition	2022	Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
Adversarial Self-supervised Learning for Semi-supervised 3D Action Recognition	2020	Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng
Learning to recognise 3D human action from a new skeleton‐based representation using deep convolutional neural networks	2018	Huy‐Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín
Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition	2022	Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang
Dual Motion GAN for Future-Flow Embedded Video Prediction	2017	Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing
Predicting Deeper into the Future of Semantic Segmentation	2017	Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun
Unsupervised Bi-directional Flow-based Video Generation from one Snapshot	2019	Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Exploiting deep residual networks for human action recognition from skeletal data	2018	Huy‐Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín
Viewpoint Invariant Action Recognition Using RGB-D Videos	2018	Jian Liu, Naveed Akhtar, Ajmal Mian
Static and Dynamic Concepts for Self-supervised Video Representation Learning	2022	Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
Improving Video Generation for Multi-functional Applications	2017	Bernhard Kratzwald, Zhiwu Huang, Danda Pani Paudel, U. Dinesh Acharya, Luc Van Gool
Improving Spatiotemporal Self-supervision by Deep Reinforcement Learning	2018	Uta Büchler, Biagio Brattoli, Björn Ommer
Towards an Understanding of Our World by GANing Videos in the Wild	2017	Bernhard Kratzwald, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool
Graph Distillation for Action Detection with Privileged Modalities	2018	Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei
A Comprehensive Study of Deep Video Action Recognition	2020	Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
Recurrent Flow-Guided Semantic Forecasting	2019	Adam M. Terwilliger, Garrick Brazil, Xiaoming Liu
Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning	2021	Siyuan Yang, Jun Liu, Shijian Lu, M.H. Er, Alex C. Kot
Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition	2020	Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, Bin Hu
