Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

Type: Article

Publication Date: 2021-10-01

Citations: 14

DOI: https://doi.org/10.1109/iccv48922.2021.00186

Abstract

We introduce the task of weakly supervised learning for detecting human and object interactions in videos. Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object. To address these challenges, we introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage temporal continuity of the visual appearance of moving objects as a form of self-supervision. To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos. We demonstrate improved performance over weakly supervised baselines adapted to our task on our video dataset.

Locations

  • arXiv (Cornell University)
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Similar Works

  • Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions (2021). Shuang Li, Yilun Du, Antonio Torralba, Josef Šivic, Bryan Russell
  • Activity Driven Weakly Supervised Object Detection (2019). Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan
  • ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos (2021). Meng-Jiun Chiou, Chun-Yu Liao, Liwei Wang, Roger Zimmermann, Jiashi Feng
  • Detecting Human-Object Interaction with Mixed Supervision (2021). Suresh Kumaraswamy, Miaojing Shi, Ewa Kijak
  • No More Shortcuts: Realizing the Potential of Temporal Self-Supervision (2023). Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah
  • Human-Object Interaction Detection via Weak Supervision (2021). Mert Kilickaya, A.W.M. Smeulders
  • FreeA: Human-object Interaction Detection using Free Annotation Labels (2024). Yuxiao Wang, Zhenao Wei, Xinyu Jiang, Lei Yu, Weiying Xue, Jinxiu Liu, Qi Liu
  • Learning to Detect Human-Object Interactions (2018). Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
  • Learning to Detect Human-Object Interactions (2017). Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
  • Video-based Human-Object Interaction Detection from Tubelet Tokens (2022). Danyang Tu, Wei Sun, Xiongkuo Min, Guangtao Zhai, Wei Shen
  • Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos (2022). Arnav Chakravarthy, Zhiyuan Fang, Yezhou Yang
  • SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos (2022). Salar Hosseini Khorasgani, Yuxuan Chen, Florian Shkurti
  • iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection (2018). Chen Gao, Yuliang Zou, Jia‐Bin Huang
  • A Review of Human-Object Interaction Detection (2024). Yuxiao Wang, Qiwei Xiong, Lei Yu, Weiying Xue, Qi Liu, Zhenao Wei
  • Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction (2018). Luowei Zhou, Nathan Louis, Jason J. Corso
  • Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos (2020). Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid