| Conditional Prompt Learning for Vision-Language Models | 2022 | Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu | 3 |
| Class-Agnostic Object Detection with Multi-modal Transformer | 2022 | Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang | 3 |
| Decoupling Zero-Shot Semantic Segmentation | 2022 | Jian Ding, Nan Xue, Gui-Song Xia, Dengxin Dai | 3 |
| Learning to Prompt for Vision-Language Models | 2022 | Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu | 3 |
| DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | 2022 | Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu | 3 |
| UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | 2012 | Khurram Soomro, Amir Zamir, Mubarak Shah | 3 |
| Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | 2021 | Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui | 3 |
| Fine-Grained Visual Classification of Aircraft | 2013 | Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew B. Blaschko, Andrea Vedaldi | 2 |
| MaPLe: Multi-modal Prompt Learning | 2023 | Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan | 2 |
| Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models | 2022 | Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao | 2 |
| Expanding Language-Image Pretrained Models for General Video Recognition | 2022 | Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling | 2 |
| Multiview Transformers for Video Recognition | 2022 | Yan Shen, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid | 2 |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | 2020 | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly | 2 |
| Image Segmentation Using Text and Image Prompts | 2022 | Timo Lüddecke, Alexander S. Ecker | 2 |
| Learning to Prompt for Continual Learning | 2022 | Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister | 2 |
| Unsupervised Prompt Learning for Vision-Language Models | 2022 | Tony Jun Huang, Jack O. Chu, Fangyun Wei | 2 |
| Prompt Distribution Learning | 2022 | Yuning Lu, Jianzhuang Liu, Yonggang Zhang, Yajing Liu, Xinmei Tian | 2 |
| LiT: Zero-Shot Transfer with Locked-image text Tuning | 2022 | Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer | 2 |
| A Short Note about Kinetics-600 | 2018 | João Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, Andrew Zisserman | 2 |
| Language-driven Semantic Segmentation | 2022 | Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl | 2 |
| Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021 | Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques | 2 |
| Open-Vocabulary DETR with Conditional Matching | 2022 | Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy | 2 |
| Prompting Visual-Language Models for Efficient Video Understanding | 2022 | Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie | 2 |
| PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images | 2022 | Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma | 2 |
| SlowFast Networks for Video Recognition | 2019 | Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He | 2 |
| Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling | 2021 | Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li | 2 |
| Spatiotemporal Residual Networks for Video Action Recognition | 2016 | Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes | 2 |
| Prompt-aligned Gradient for Prompt Tuning | 2023 | Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang | 2 |
| The Kinetics Human Action Video Dataset | 2017 | Andrew Zisserman, João Carreira, Karen Simonyan, Will Kay, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, T.C. Green, Trevor Back | 2 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | 2017 | João Carreira, Andrew Zisserman | 2 |
| Is Space-Time Attention All You Need for Video Understanding? | 2021 | Gedas Bertasius, Heng Wang, Lorenzo Torresani | 2 |
| Rethinking the Inception Architecture for Computer Vision | 2016 | Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna | 2 |
| Learning Transferable Visual Models From Natural Language Supervision | 2021 | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark | 2 |
| EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification | 2019 | Patrick Helber, Benjamin Bischke, Andreas Dengel, Damian Borth | 2 |
| The “Something Something” Video Database for Learning and Evaluating Visual Common Sense | 2017 | Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, Joanna Materzyńska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, P.N. Yianilos, Moritz Mueller-Freitag | 2 |
| Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | 2021 | Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, Tom Duerig | 2 |
| CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features | 2019 | Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Seong Joon Oh, Youngjoon Yoo, Junsuk Choe | 2 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | 2021 | Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu | 2 |
| Natural Adversarial Examples | 2021 | Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song | 2 |
| ViViT: A Video Vision Transformer | 2021 | Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid | 2 |
| Describing Textures in the Wild | 2014 | Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, Andrea Vedaldi | 2 |
| Florence: A New Foundation Model for Computer Vision | 2021 | Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li | 2 |
| DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning | 2022 | Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy | 2 |
| Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | 2022 | Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan | 2 |
| The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | 2021 | Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Fengqiu Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo | 2 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | 2018 | Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri | 1 |
| Non-local Neural Networks | 2018 | Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He | 1 |
| Towards Universal Representation for Unseen Action Recognition | 2018 | Yi Zhu, Yang Long, Yu Guan, Shawn Newsam, Ling Shao | 1 |
| Two-Stream Convolutional Networks for Action Recognition in Videos | 2014 | Karen Simonyan, Andrew Zisserman | 1 |
| A Short Note on the Kinetics-700-2020 Human Action Dataset | 2020 | João Carreira, Eric Noland, Chloe Hillier, Andrew Zisserman | 1 |