Unsupervised Learning of Long-Term Motion Dynamics for Videos

Type: Article

Publication Date: 2017-07-01

Citations: 190

DOI: https://doi.org/10.1109/cvpr.2017.751

Abstract

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, depth, and RGB-D videos.
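The abstract does not say how the atomic 3D flows are obtained from RGB-D data. The block below is a minimal sketch, not the authors' pipeline. It assumes a pinhole camera with known intrinsics and an already computed 2D optical flow. A tracked pixel is lifted into 3D using its depth in two consecutive frames, and the difference between the two 3D points is one atomic flow. Chaining these steps produces a sequence of 3D flows of the kind the decoder is described as predicting. The names `Intrinsics`, `back_project`, `atomic_3d_flow` and `flow_sequence`, and the callable depth-map and flow-field interfaces, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical interfaces (assumptions, not from the paper): a depth map returns
# metric depth at a (sub-)pixel location, and a flow field returns the 2D
# optical-flow vector (du, dv) in pixels at that location.
DepthMap = Callable[[float, float], float]
FlowField = Callable[[float, float], Tuple[float, float]]


@dataclass(frozen=True)
class Intrinsics:
    """Assumed pinhole camera parameters, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, z: float, k: Intrinsics) -> Vec3:
    """Invert u = fx * x / z + cx and v = fy * y / z + cy to get a camera-space point."""
    return ((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z)


def atomic_3d_flow(u: float, v: float, flow: FlowField, depth_t: DepthMap,
                   depth_t1: DepthMap, k: Intrinsics) -> Tuple[Vec3, Tuple[float, float]]:
    """3D displacement of the point seen at (u, v) from frame t to t+1, plus its new pixel."""
    du, dv = flow(u, v)
    u1, v1 = u + du, v + dv
    p0 = back_project(u, v, depth_t(u, v), k)
    p1 = back_project(u1, v1, depth_t1(u1, v1), k)
    return (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]), (u1, v1)


def flow_sequence(u: float, v: float, flows: Sequence[FlowField],
                  depths: Sequence[DepthMap], k: Intrinsics) -> List[Vec3]:
    """Chain atomic 3D flows for one tracked pixel over len(flows) frame pairs."""
    if len(depths) != len(flows) + 1:
        raise ValueError("need exactly one more depth map than flow fields")
    sequence: List[Vec3] = []
    for t, flow in enumerate(flows):
        # Follow the pixel to its position in frame t+1 and record the 3D step.
        step, (u, v) = atomic_3d_flow(u, v, flow, depths[t], depths[t + 1], k)
        sequence.append(step)
    return sequence


if __name__ == "__main__":
    cam = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    # Toy scene: constant 2 m depth, every pixel moves 10 px to the right per frame.
    steps = flow_sequence(320.0, 240.0, [lambda u, v: (10.0, 0.0)] * 3,
                          [lambda u, v: 2.0] * 4, cam)
    print(steps)  # three steps, each close to (0.04, 0.0, 0.0)
```

Because the endpoint of step t and the start of step t+1 sample the same depth map at the same pixel, the atomic flows telescope: their sum equals the end-to-end 3D displacement of the tracked point. A dense variant would apply the same arithmetic at every pixel rather than to a single track.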

Locations

  • arXiv (Cornell University)
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)

Similar Works

  • Unsupervised Learning of Long-Term Motion Dynamics for Videos (2017). Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei.
  • Im2Flow: Motion Hallucination from Static Images for Action Recognition (2017). Ruohan Gao, Bo Xiong, Kristen Grauman.
  • Im2Flow: Motion Hallucination from Static Images for Action Recognition (2018). Ruohan Gao, Bo Xiong, Kristen Grauman.
  • ActionFlowNet: Learning Motion Representation for Action Recognition (2018). Joe Yue-Hei Ng, Jonghyun Choi, Jan Neumann, Larry S. Davis.
  • Long-term Temporal Convolutions for Action Recognition (2016). Gül Varol, Ivan Laptev, Cordelia Schmid.
  • Long-Term Temporal Convolutions for Action Recognition (2017). Gül Varol, Ivan Laptev, Cordelia Schmid.
  • D3D: Distilled 3D Networks for Video Action Recognition (2020). Jonathan Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar.
  • Video Representation Learning by Recognizing Temporal Transformations (2020). Simon Jenni, Givi Meishvili, Paolo Favaro.
  • DynamoNet: Dynamic Action and Motion Network (2019). Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen.
  • Masked Motion Encoding for Self-Supervised Video Representation Learning (2022). Xinyu Sun, Peihao Chen, Liangwei Chen, Thomas H. Li, Mingkui Tan, Chuang Gan.
  • D3D: Distilled 3D Networks for Video Action Recognition (2018). Jonathan Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar.
  • Masked Motion Encoding for Self-Supervised Video Representation Learning (2023). Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan.
  • Hidden Two-Stream Convolutional Networks for Action Recognition (2017). Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann.
  • Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video (2016). Dinesh Jayaraman, Kristen Grauman.
  • Slow and steady feature analysis: higher order temporal coherence in video (2015). Dinesh Jayaraman, Kristen Grauman.
  • Action Recognition Using Volumetric Motion Representations (2019). Michael Peven, Gregory D. Hager, Austin Reiter.
  • Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics (2019). Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu.

Cited by (95)

  • EEGFuseNet: Hybrid Unsupervised Deep Feature Characterization and Fusion for High-Dimensional EEG With an Application to Emotion Recognition (2021). Zhen Liang, Rushuang Zhou, Li Zhang, Linling Li, Gan Huang, Zhiguo Zhang, Shin Ishii.
  • DYAN: A Dynamical Atoms-Based Network for Video Prediction (2018). WenQian Liu, Abhishek Sharma, Octavia Camps, Mario Sznaier.
  • Contrastive Multiview Coding (2020). Yonglong Tian, Dilip Krishnan, Phillip Isola.
  • Convolutional Relational Machine for Group Activity Recognition (2019). Sina Mokhtarzadeh Azar, Mina Ghadimi Atigh, Ahmad Nickabadi, Alexandre Alahi.
  • Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey (2020). Longlong Jing, Yingli Tian.
  • FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation (2023). Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran.
  • Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition (2020). Shihao Xu, Haocong Rao, Xiping Hu, Bin Hu.
  • Video-based Human Action Recognition using Deep Learning: A Review (2022). Hieu H. Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín.
  • Forecasting Hands and Objects in Future Frames (2019). Chenyou Fan, Jangwon Lee, Michael S. Ryoo.
  • Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization (2021). Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin.
  • Dual Contrastive Learning for Spatio-temporal Representation (2022). Shuangrui Ding, Rui Qian, Hongkai Xiong.
  • Video Representation Learning by Recognizing Temporal Transformations (2020). Simon Jenni, Givi Meishvili, Paolo Favaro.
  • Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey (2019). Longlong Jing, Yingli Tian.
  • Graph Distillation for Action Detection with Privileged Modalities (2017). Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei.
  • Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning (2018). Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah.
  • Viewpoint Invariant Action Recognition using RGB-D Videos (2017). Jian Liu, Naveed Akhtar, Ajmal Mian.
  • Unsupervised Human 3D Pose Representation with Viewpoint and Pose Disentanglement (2020). Qiang Nie, Ziwei Liu, Yunhui Liu.
  • Home Action Genome: Cooperative Compositional Action Understanding (2021). Nishant Rai, Haofeng Chen, Jingwei Ji, Rishi Desai, Kazuki Kozuka, Shun Ishizaka, Ehsan Adeli, Juan Carlos Niebles.
  • Video Generation From Single Semantic Label Map (2019). Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang.
  • Motion-supervised Co-Part Segmentation (2021). Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe.
  • TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition (2022). Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin.
  • Adversarial Self-supervised Learning for Semi-supervised 3D Action Recognition (2020). Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng.
  • Learning to recognise 3D human action from a new skeleton‐based representation using deep convolutional neural networks (2018). Huy‐Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín.
  • Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition (2022). Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang.
  • Dual Motion GAN for Future-Flow Embedded Video Prediction (2017). Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing.
  • Predicting Deeper into the Future of Semantic Segmentation (2017). Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun.
  • Unsupervised Bi-directional Flow-based Video Generation from one Snapshot (2019). Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy.
  • Exploiting deep residual networks for human action recognition from skeletal data (2018). Huy‐Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastín.
  • Viewpoint Invariant Action Recognition Using RGB-D Videos (2018). Jian Liu, Naveed Akhtar, Ajmal Mian.
  • Static and Dynamic Concepts for Self-supervised Video Representation Learning (2022). Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin.
  • Improving Video Generation for Multi-functional Applications (2017). Bernhard Kratzwald, Zhiwu Huang, Danda Pani Paudel, U. Dinesh Acharya, Luc Van Gool.
  • Improving Spatiotemporal Self-supervision by Deep Reinforcement Learning (2018). Uta Büchler, Biagio Brattoli, Björn Ommer.
  • Towards an Understanding of Our World by GANing Videos in the Wild (2017). Bernhard Kratzwald, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool.
  • Graph Distillation for Action Detection with Privileged Modalities (2018). Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei.
  • A Comprehensive Study of Deep Video Action Recognition (2020). Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li.
  • Recurrent Flow-Guided Semantic Forecasting (2019). Adam M. Terwilliger, Garrick Brazil, Xiaoming Liu.
  • Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning (2021). Siyuan Yang, Jun Liu, Shijian Lu, M.H. Er, Alex C. Kot.
  • Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition (2020). Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, Bin Hu.

Citing (27)

  • Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015). Xingjian Shi, Zhourong Chen, Hao Wang, Dit‐Yan Yeung, Wai Kin Wong, Wang‐chun Woo.
  • Video (language) modeling: a baseline for generative models of natural videos (2014). Marc’Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michaël Mathieu, Ronan Collobert, Sumit Chopra.
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). Karen Simonyan, Andrew Zisserman.
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015). Sergey Ioffe, Christian Szegedy.
  • HOPC: Histogram of Oriented Principal Components of 3D Pointclouds for Action Recognition (2014). Hossein Rahmani, Arif Mahmood, Du Q. Huynh, Ajmal Mian.
  • Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition (2015). Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj.
  • Beyond short snippets: Deep networks for video classification (2015). Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici.
  • Dense Optical Flow Prediction from a Static Image (2015). Jacob Walker, Abhinav Gupta, Martial Hebert.
  • Action recognition with trajectory-pooled deep-convolutional descriptors (2015). Limin Wang, Yu Qiao, Xiaoou Tang.
  • ImageNet Large Scale Visual Recognition Challenge (2015). Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein.
  • Striving for Simplicity: The All Convolutional Net (2014). Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller.
  • Sequence to Sequence Learning with Neural Networks (2014). Ilya Sutskever, Oriol Vinyals, Quoc V. Le.
  • Two-Stream Convolutional Networks for Action Recognition in Videos (2014). Karen Simonyan, Andrew Zisserman.
  • Pedestrian Detection with Unsupervised Multi-stage Feature Learning (2013). Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun.
  • Unsupervised Learning of Visual Representations Using Videos (2015). Xiaolong Wang, Abhinav Gupta.
  • Towards Viewpoint Invariant 3D Human Pose Estimation (2016). Albert Haque, Boya Peng, Zelun Luo, Alexandre Alahi, Serena Yeung, Li Fei-Fei.
  • UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild (2012). Khurram Soomro, Amir Zamir, Mubarak Shah.
  • Recurrent Attention Models for Depth-Based Person Identification (2016). Albert Haque, Alexandre Alahi, Li Fei-Fei.
  • Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification (2016). Ishan Misra, C. Lawrence Zitnick, Martial Hebert.
  • Generating Videos with Scene Dynamics (2016). Carl Vondrick, Hamed Pirsiavash, Antonio Torralba.
  • Adam: A Method for Stochastic Optimization (2014). Diederik P. Kingma, Jimmy Ba.
  • NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis (2016). Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang.
  • Unsupervised Visual Representation Learning by Context Prediction (2015). Carl Doersch, Abhinav Gupta, Alexei A. Efros.
  • Towards Good Practices for Very Deep Two-Stream ConvNets (2015). Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao.