FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

Type: Article

Publication Date: 2022-01-01

Citations: 34

DOI: https://doi.org/10.1109/tip.2022.3217368

Abstract

Temporal action localization (TAL) is an important and challenging problem in video understanding. However, most existing TAL benchmarks are built upon coarse-grained action classes, which has two major limitations for this task. First, coarse-level actions can make localization models overfit to high-level context information and ignore the atomic action details in the video. Second, coarse action classes often lead to ambiguous annotations of temporal boundaries, which are inappropriate for temporal action localization. To tackle these problems, we develop a novel large-scale and fine-grained video dataset, named FineAction, for temporal action localization. In total, FineAction contains 103K temporal instances of 106 action categories, annotated in 17K untrimmed videos. Compared to existing TAL datasets, FineAction has the distinct characteristics of fine action classes with rich diversity, dense annotations of multiple instances, and co-occurring actions of different classes, which introduce new opportunities and challenges for temporal action localization. To benchmark FineAction, we systematically investigate the performance of several popular temporal localization methods on it and analyze in depth the influence of fine-grained instances on temporal action localization. As a minor contribution, we present a simple baseline approach for fine-grained action detection, which achieves an mAP of 13.17% on FineAction. We believe that FineAction can advance research on temporal action localization and beyond. The dataset is available at https://deeperaction.github.io/datasets/fineaction.

Locations

  • IEEE Transactions on Image Processing
  • arXiv (Cornell University)
  • PubMed

Similar Works

Action Title Year Authors
+ FineAction: A Fine-Grained Video Dataset for Temporal Action Localization 2021 Yi Liu
Limin Wang
Yali Wang
Xiao Ma
Yu Qiao
+ FineAction: A Fined Video Dataset for Temporal Action Localization. 2021 Yi Liu
Limin Wang
Xiao Ma
Yali Wang
Yu Qiao
+ Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions 2022 Zhi Li
Lu He
Huijuan Xu
+ AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions 2017 Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
Caroline Pantofaru
Yeqing Li
Sudheendra Vijayanarasimhan
George Toderici
Susanna Ricco
Rahul Sukthankar
+ HTNet: Anchor-free Temporal Action Localization with Hierarchical Transformers 2022 Tae-Kyung Kang
Gunhee Lee
Seong‐Whan Lee
+ Modeling Multi-Label Action Dependencies for Temporal Action Localization 2021 Praveen Tirupattur
Kevin Duarte
Yogesh Singh Rawat
Mubarak Shah
+ Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization 2022 Kun Xia
Le Wang
Sanping Zhou
Nanning Zheng
Wei Tang
+ PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points 2022 Jing Tan
Xiaotong Zhao
Xintian Shi
Bing Kang
Limin Wang
+ About Time: Advances, Challenges, and Outlooks of Action Understanding 2024 Alexandros Stergiou
Ronald Poppe
+ Temporal Action Segmentation: An Analysis of Modern Techniques 2022 Guodong Ding
Fadime Şener
Angela Yao
+ SF-Net: Single-Frame Supervision for Temporal Action Localization 2020 Fan Ma
Linchao Zhu
Yi Yang
Shengxin Zha
Gourab Kundu
Matt Feiszli
Zheng Shou
+ Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context 2021 Ziyi Liu
Le Wang
Wei Tang
Junsong Yuan
Nanning Zheng
Gang Hua
+ Open-Vocabulary Spatio-Temporal Action Detection 2024 Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
+ Action Temporal Localization in Untrimmed Videos via Multi-stage CNNs. 2016 Zheng Shou
Dongang Wang
Shih‐Fu Chang
+ Proposal-based Temporal Action Localization with Point-level Supervision 2023 Yuan Yin
Yifei Huang
Ryosuke Furuta
Yoichi Sato

Works That Cite This (10)

Action Title Year Authors
+ Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions 2022 Zhi Li
Lu He
Huijuan Xu
+ Open-Vocabulary Video Relation Extraction 2024 Wentao Tian
Zheng Wang
Yuqian Fu
Jingjing Chen
Lechao Cheng
+ Deep Learning-Based Action Detection in Untrimmed Videos: A Survey 2022 Elahe Vahdani
Yingli Tian
+ VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking 2023 Limin Wang
Bingkun Huang
Zhiyu Zhao
Tong Zhan
Yinan He
Yi Wang
Yali Wang
Yu Qiao
+ BasicTAD: An astounding RGB-Only baseline for temporal action detection 2023 Min Yang
Guo Chen
Yin-Dong Zheng
Tong Lü
Limin Wang
+ Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization 2023 Zhixi Cai
Shreya Ghosh
Abhinav Dhall
Tom Gedeon
Kalin Stefanov
Munawar Hayat
+ Fine-grained Activities of People Worldwide 2023 Jeffrey Byrne
Greg Castañón
Zhongheng Li
Gil J. Ettinger
+ Foundation Models for Video Understanding: A Survey 2024 Neelu Madan
Andreas Møgelmose
Rajat Modi
Yogesh S Rawat
Thomas B. Moeslund
+ Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization 2023 Songchun Zhang
Chunhui Zhao

Works Cited by This (46)

Action Title Year Authors
+ Learning Spatiotemporal Features with 3D Convolutional Networks 2015 Du Tran
Lubomir Bourdev
Rob Fergus
Lorenzo Torresani
Manohar Paluri
+ Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks 2017 Yu–Gang Jiang
Zuxuan Wu
Jun Wang
Xiangyang Xue
Shih‐Fu Chang
+ Two-Stream Convolutional Networks for Action Recognition in Videos 2014 Karen Simonyan
Andrew Zisserman
+ Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos 2017 Serena Yeung
Olga Russakovsky
Ning Jin
Mykhaylo Andriluka
Greg Mori
Li Fei-Fei
+ The THUMOS challenge on action recognition for videos “in the wild” 2016 Haroon Idrees
Amir Zamir
Yu‐Gang Jiang
Alex Gorban
Ivan Laptev
Rahul Sukthankar
Mubarak Shah
+ Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding 2016 Gunnar A. Sigurdsson
Gül Varol
Xiaolong Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
+ UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild 2012 Khurram Soomro
Amir Zamir
Mubarak Shah
+ Temporal Segment Networks: Towards Good Practices for Deep Action Recognition 2016 Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
+ YouTube-8M: A Large-Scale Video Classification Benchmark 2016 Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
George Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
+ CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos 2017 Zheng Shou
Jonathan Chan
Alireza Zareian
Kazuyuki Miyazawa
Shih‐Fu Chang