End-to-End Human Pose and Mesh Reconstruction with Transformers

Kevin Lin, Lijuan Wang, Zicheng Liu

Type: Article

Publication Date: 2021-06-01

Citations: 506

DOI: https://doi.org/10.1109/cvpr46437.2021.00199

Abstract

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Compared to existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh models like SMPL, thus it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations like partial occlusions. METRO generates new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset.
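
The ideas named in the abstract (joint and vertex queries in one sequence, unrestricted self-attention between any two of them, and random masking of queries during training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the token sizes, the single attention head, the zero-vector masking, the 30% mask ratio, and the helper names `self_attention` and `mask_queries` are all assumptions made for illustration.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention with no topology mask:
    every token (joint or vertex) may attend to every other token."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def mask_queries(tokens, mask_ratio, rng):
    """Replace a random subset of query tokens with zeros (assumed masking scheme)."""
    n = tokens.shape[0]
    n_masked = int(round(mask_ratio * n))
    masked_idx = rng.choice(n, size=n_masked, replace=False)
    out = tokens.copy()
    out[masked_idx] = 0.0
    return out, masked_idx

rng = np.random.default_rng(0)
num_joints, num_vertices, dim = 14, 100, 64  # illustrative sizes only

# Joint queries followed by vertex queries, as one sequence.
tokens = rng.normal(size=(num_joints + num_vertices, dim))
masked_tokens, masked_idx = mask_queries(tokens, mask_ratio=0.3, rng=rng)

# Random projection weights stand in for learned parameters.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
w_out = rng.normal(size=(dim, 3)) / np.sqrt(dim)

hidden, attn = self_attention(masked_tokens, w_q, w_k, w_v)
coords = hidden @ w_out  # one 3D coordinate per token

# Joints and vertices are read out from the same forward pass.
joints_3d, vertices_3d = coords[:num_joints], coords[num_joints:]
```

Because the attention matrix is dense, a masked query can only be predicted from the other tokens, which is one plausible reading of why masking would help with partial occlusions.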

Locations

arXiv (Cornell University)
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Similar Works

Title	Year	Authors
End-to-End Human Pose and Mesh Reconstruction with Transformers	2020	Kevin Lin Lijuan Wang Zicheng Liu
Leveraging the Learnable Vertex-Vertex Relationship to Generalize Human Pose and Mesh Reconstruction for In-the-Wild Scenes	2022	Trung Tran-Quang Cuong Than-Cao Hai Nguyen-Thanh Hoang Si Hong
Leveraging the Learnable Vertex-Vertex Relationship to Generalize Human Pose and Mesh Reconstruction for In-the-Wild Scenes	2022	Trung Quang Tran Cuong Cao Than Hai Thanh Nguyen Hoang Si Hong
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction	2024	Kevin Lin Chung-Ching Lin Lin Liang Zicheng Liu Lijuan Wang
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction	2022	Kevin Lin Chung-Ching Lin Lin Liang Zicheng Liu Lijuan Wang
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation	2024	Xiangyu Xu Lijuan Liu Shuicheng Yan
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose	2022	Ce Zheng Matías Mendieta Pu Wang Aidong Lu Chen Chen
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose	2021	Ce Zheng Matías Mendieta Pu Wang Aidong Lu Chen Chen
PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery	2024	Wendi Yang Zihang Jiang Shang Zhao S. Kevin Zhou
Mesh Graphormer	2021	Kevin Lin Lijuan Wang Zicheng Liu
FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER	2023	Ce Zheng Matías Mendieta Taojiannan Yang Guo-Jun Qi Chen Chen
FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER	2022	Ce Zheng Matías Mendieta Taojiannan Yang Guo-Jun Qi Chen Chen
Dual Grid Net: hand mesh vertex regression from single depth maps	2019	Chengde Wan Thomas Probst Luc Van Gool Angela Yao
Learning Human Mesh Recovery in 3D Scenes	2023	Zehong Shen Zhi Cen Sida Peng Qing Shuai Hujun Bao Xiaowei Zhou
Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose	2020	Hongsuk Choi Gyeongsik Moon Kyoung Mu Lee
Pixel-Aligned Non-parametric Hand Mesh Reconstruction	2022	Shijian Jiang Guwen Han Danhang Tang Yang Zhou Xiang Li Jiming Chen Qi Ye
THOR-Net: End-to-end Graformer-based Realistic Two Hands and Object Reconstruction with Self-supervision	2023	Ahmed Tawfik Aboukhadra Jameel Malik Ahmed Elhayek Nadia Robertini Didier Stricker

Works That Cite This (202)

Title	Year	Authors
RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-Consistent Dataset	2023	Zhongjin Luo Shengcai Cai Jinguo Dong Ruibo Ming Liangdong Qiu Xiaohang Zhan Xiaoguang Han
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose	2022	Ce Zheng Matías Mendieta Pu Wang Aidong Lu Chen Chen
Regular Splitting Graph Network for 3D Human Pose Estimation	2023	Md. Tanvir Hassan A. Ben Hamza
FAN-Trans: Online Knowledge Distillation for Facial Action Unit Detection	2023	Jing Yang Jie Shen Yiming Lin Yordan Hristov Maja Pantić
End-to-end weakly-supervised single-stage multiple 3D hand mesh reconstruction from a single RGB image	2023	Jinwei Ren Jianke Zhu Jialiang Zhang
CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting	2023	Shaoxiang Guo Qing Cai Lin Qi Junyu Dong
Spatially Multi-conditional Image Generation	2023	Nikola Popović Ritika Chakraborty Danda Pani Paudel Thomas Probst Luc Van Gool
Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers	2023	Moritz Einfalt Katja Ludwig Rainer Lienhart
Learnable Human Mesh Triangulation for 3D Human Pose and Shape Estimation	2023	Sung-Ho Chun Sungbum Park Ju Yong Chang
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats	2023	István Sárándi Alexander Hermans Bastian Leibe

Works Cited by This (45)

Title	Year	Authors
Generalized procrustes analysis	1975	J. C. Gower
ImageNet Large Scale Visual Recognition Challenge	2015	Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael S. Bernstein
Neural Machine Translation by Jointly Learning to Align and Translate	2014	Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio
Deep Residual Learning for Image Recognition	2016	Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
A Decomposable Attention Model for Natural Language Inference	2016	Ankur P. Parikh Oscar Täckström Dipanjan Das Jakob Uszkoreit
Unite the People: Closing the Loop Between 3D and 2D Human Representations	2017	Christoph Lassner Javier Romero Martin Kiefel Federica Bogo Michael J. Black Peter Gehler
MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior	2018	Xiaowei Zhou Menglong Zhu Georgios Pavlakos Spyridon Leonardos Konstantinos G. Derpanis Kostas Daniilidis
Self-supervised Learning of Motion Capture	2017	Hsiao-Yu Fish Tung Hsiao-Wei Tung Ersin Yumer Katerina Fragkiadaki
Embodied hands	2017	Javier Romero Dimitrios Tzionas Michael J. Black
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision	2017	Dushyant Mehta Helge Rhodin Dan Casas Pascal Fua Oleksandr Sotnychenko Weipeng Xu Christian Theobalt