Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2310.05010

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Action Title Year Authors
+ Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data 2024 Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
+ Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization 2023 Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu-Gang Jiang
+ EZ-CLIP: Efficient Zeroshot Video Action Recognition 2023 Shahzad Ahmad
Sukalpa Chanda
Yogesh Singh Rawat
+ Frozen CLIP Models are Efficient Video Learners 2022 Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
+ OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning 2024 Mushui Liu
Bozheng Li
Yunlong Yu
+ Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition 2024 Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei‐Shi Zheng
+ MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge 2023 Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziński
Rameswar Panda
Rogério Feris
Hilde Kuehne
Horst Bischof
+ OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition 2023 Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
+ FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition 2024 Xiaohu Huang
Hao Zhou
Kun Yao
Kai Han
+ FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks 2022 Santiago Castro
Fabian Caba Heilbron
+ MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval 2022 Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
+ Revisiting Classifier: Transferring Vision-Language Models for Video Recognition 2023 Wenhao Wu
Zhun Sun
Wanli Ouyang
+ Revisiting Classifier: Transferring Vision-Language Models for Video Recognition 2022 Wenhao Wu
Zhun Sun
Wanli Ouyang
+ CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval 2021 Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
+ Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting 2023 Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
+ Fine-tuned CLIP Models are Efficient Video Learners 2022 Hanoona Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
+ Fine-tuned CLIP Models are Efficient Video Learners 2023 Hanoona Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
+ Orthogonal Temporal Interpolation for Zero-Shot Video Recognition 2023 Yan Zhu
Junbao Zhuo
Bin Ma
Jiajia Geng
Xiaoming Wei
Xiaolin Wei
Shuhui Wang

Works That Cite This (0)

Works Cited by This (0)
