Expanding Language-Image Pretrained Models for General Video Recognition

Type: Preprint

Publication Date: 2022-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2208.02816

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Action Title Year Authors
+ Prompting Visual-Language Models for Efficient Video Understanding 2021 Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
+ Harvest Video Foundation Models via Efficient Post-Pretraining 2023 Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
+ EZ-CLIP: Efficient Zeroshot Video Action Recognition 2023 Shahzad Ahmad
Sukalpa Chanda
Yogesh Singh Rawat
+ Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners 2022 Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
Xudong Lin
Shuohang Wang
Ziyi Yang
Chenguang Zhu
Derek Hoiem
+ OmniVL:One Foundation Model for Image-Language and Video-Language Tasks 2022 Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
+ VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling 2021 Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
+ Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding 2023 Ruyang Liu
Jingjia Huang
Wei Gao
Thomas H. Li
Ge Li
+ InternVideo: General Video Foundation Models via Generative and Discriminative Learning 2022 Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
Zhiyu Zhao
Hongjie Zhang
Jilan Xu
Yi Liu
Zun Wang
+ Videoprompter: an ensemble of foundational models for zero-shot video understanding 2023 Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
+ Large Language Models are Good Prompt Learners for Low-Shot Image Classification 2023 Zhaoheng Zheng
Jingmin Wei
Xuefeng Hu
Haidong Zhu
Ram Nevatia
+ PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning 2024 Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
+ MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval 2022 Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
+ FILS: Self-Supervised Video Feature Prediction In Semantic Language Space 2024 Mona Ahmadian
Frank Guérin
Andrew Gilbert
+ Implicit Temporal Modeling with Learnable Alignment for Video Recognition 2023 Shuyuan Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Han Hu
Yu-Gang Jiang
+ Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization 2024 Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
Hao Jiang
Quzhe Huang
Chengru Song
Yuliang Liu
Di Zhang
+ Spatiotemporally Discriminative Video-Language Pre-Training with Text Grounding 2023 Yuanhao Xiong
L. Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
+ SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training 2023 Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
+ SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training 2022 Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
+ Video-LLaVA: Learning United Visual Representation by Alignment Before Projection 2023 Bin Lin
Bin Zhu
Ye Yang
Munan Ning
Peng Jin
Yuan Li

Works That Cite This (0)

Action Title Year Authors

Works Cited by This (0)

Action Title Year Authors