Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

Shuang Li, Yilun Du, Antonio Torralba, Josef Šivic, Bryan Russell

Type: Article

Publication Date: 2021-10-01

Citations: 14

DOI: https://doi.org/10.1109/iccv48922.2021.00186

Abstract

We introduce the task of weakly supervised learning for detecting human and object interactions in videos. Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object. To address these challenges, we introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage temporal continuity of the visual appearance of moving objects as a form of self-supervision. To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos. We demonstrate improved performance over weakly supervised baselines adapted to our task on our video dataset.

Locations

arXiv (Cornell University)
2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Similar Works

Title	Year	Authors
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions	2021	Shuang Li, Yilun Du, Antonio Torralba, Josef Šivic, Bryan Russell
Activity Driven Weakly Supervised Object Detection	2019	Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan
ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos	2021	Meng-Jiun Chiou, Chun-Yu Liao, Liwei Wang, Roger Zimmermann, Jiashi Feng
Detecting Human-Object Interaction with Mixed Supervision	2021	Suresh Kumaraswamy, Miaojing Shi, Ewa Kijak
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision	2023	Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah
Human-Object Interaction Detection via Weak Supervision	2021	Mert Kilickaya, A.W.M. Smeulders
FreeA: Human-object Interaction Detection using Free Annotation Labels	2024	Yuxiao Wang, Zhenao Wei, Xinyu Jiang, Lei Yu, Weiying Xue, Jinxiu Liu, Qi Liu
Learning to Detect Human-Object Interactions	2018	Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
Learning to Detect Human-Object Interactions	2017	Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
Video-based Human-Object Interaction Detection from Tubelet Tokens	2022	Danyang Tu, Wei Sun, Xiongkuo Min, Guangtao Zhai, Wei Shen
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos	2022	Arnav Chakravarthy, Zhiyuan Fang, Yezhou Yang
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos	2022	Salar Hosseini Khorasgani, Yuxuan Chen, Florian Shkurti
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection	2018	Chen Gao, Yuliang Zou, Jia‐Bin Huang
A Review of Human-Object Interaction Detection	2024	Yuxiao Wang, Qiwei Xiong, Lei Yu, Weiying Xue, Qi Liu, Zhenao Wei
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction	2018	Luowei Zhou, Nathan Louis, Jason J. Corso
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos	2020	Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid

Works That Cite This (2)

Title	Year	Authors
Interaction Region Visual Transformer for Egocentric Action Anticipation	2024	Debaditya Roy, Ramanathan Rajendiran, Basura Fernando
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	2023	Chuhan Zhang, Ankush Gupta, Andrew Zisserman

Works Cited by This (49)

Title	Year	Authors
Visual Semantic Role Labeling	2015	Saurabh Gupta, Jitendra Malik
A dataset for Movie Description	2015	Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele
Scikit-learn: Machine Learning in Python	2012	Fabián Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron J. Weiss, Vincent Dubourg
Detecting Visual Relationships with Deep Relational Networks	2017	Bo Dai, Yuqi Zhang, Dahua Lin
Weakly-Supervised Learning of Visual Relations	2017	Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Šivic
Improving Visual Relationship Detection Using Semantic Modeling of Scene Descriptions	2017	Stephan Baier, Yunpu Ma, Volker Tresp
Towards Automatic Learning of Procedures from Web Instructional Videos	2017	Luowei Zhou, Chenliang Xu, Jason J. Corso
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset	2018	Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction	2018	Luowei Zhou, Nathan Louis, Jason J. Corso
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features	2018	Xu Yang, Hanwang Zhang, Jianfei Cai