Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Type: Preprint

Publication Date: 2022-01-01

Citations: 27

DOI: https://doi.org/10.48550/arXiv.2210.06031

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • VindLU: A Recipe for Effective Video-and-Language Pretraining (2022). Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
  • VindLU: A Recipe for Effective Video-and-Language Pretraining (2023). Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
  • AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction (2024). Yuanbin Man, Ying Huang, Chengming Zhang, Bingzhe Li, Wei Niu, Miao Yin
  • TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding (2023). Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
  • Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models (2024). Jinhui Yi, Syed Talal Wasim, Yan-An Luo, Muzammal Naseer, Jürgen Gall
  • VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling (2021). Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
  • VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation (2021). Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Wang, William Yang Wang
  • Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training (2021). Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li
  • LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding (2024). Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
  • Temporal Perceiving Video-Language Pre-training (2023). Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
  • VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending (2023). Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
  • HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training (2020). Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
  • Clover: Towards A Unified Video-Language Alignment and Fusion Model (2023). Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji
  • Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding (2023). Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li
  • Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer (2023). Guangyi Chen, X Liu, Guangrun Wang, Kun Zhang, Philip H. S. Torr, Xiao-Ping Zhang, Yansong Tang
  • HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training (2022). Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qian Qi, Ji Zhang, Fei Huang

Works That Cite This (15)

  • Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning (2023). Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Šivic, Cordelia Schmid
  • CoVR: Learning Composed Video Retrieval from Web Video Captions (2024). Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
  • SINC: Self-Supervised In-Context Learning for Vision-Language Tasks (2023). Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
  • TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding (2023). Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
  • Test of Time: Instilling Video-Language Models with a Sense of Time (2023). Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek
  • Are current long-term video understanding datasets long-term? (2023). Ombretta Strafforello, Klamer Schutte, Jan van Gemert
  • Selective Structured State-Spaces for Long-Form Video Understanding (2023). Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Roszilah Hamid
  • EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone (2023). Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang
  • Long-range Multimodal Pretraining for Movie Understanding (2023). Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron
  • Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos (2023). Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

Works Cited by This (0)
