Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu, Sihan Chen, Longteng Guo, H. Li, Xingjian He, Jing Liu
Type: Article
Publication Date: 2023-10-26
Citations: 0
DOI: https://doi.org/10.1145/3581783.3612388
Locations: arXiv (Cornell University)
Similar Works
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (2023). Zikang Liu, Sihan Chen, Longteng Guo, H. Li, Xingjian He, Jing Liu
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (2021). Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
Tackling VQA with Pretrained Foundation Models without Further Training (2023). Alvin De Jun Tan, Bingquan Shen
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models (2024). Yufei Zhan, Hongyin Zhao, Yousong Zhu, Fan Yang, Ming Tang, Jinqiao Wang
CompCap: Improving Multimodal Large Language Models with Composite Captions (2024). Xiaohong Chen, Satya Narayan Shukla, Mahmoud Azab, Ananya Singh, Qifan Wang, David Dawei Yang, Shengyun Peng, Hanchao Yu, Yan Shen, Xuewen Zhang
Unified Vision-Language Pre-Training for Image Captioning and VQA (2020). Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
Unified Vision-Language Pre-Training for Image Captioning and VQA (2019). Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption (2020). Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
PromptCap: Prompt-Guided Task-Aware Image Captioning (2022). Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts (2024). Övgü Özdemir, Erdem Akagündüz
Joint Image Captioning and Question Answering (2018). Jialin Wu, Zeyuan Hu, Raymond J. Mooney
Generating Question Relevant Captions to Aid Visual Question Answering (2019). Jialin Wu, Zeyuan Hu, Raymond J. Mooney
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (2022). Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022). Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi
CapsFusion: Rethinking Image-Text Data at Scale (2023). Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Xinlong Wang, Jingjing Liu
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory (2022). Ziniu Hu, Ahmet İşcen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi
Cited by (0)
Citing (23)
Show and tell: A neural image caption generator (2015). Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
VQA: Visual Question Answering (2015). Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Dense Captioning with Joint Inference and Visual Context (2017). Linjie Yang, Kevin Tang, Shuicheng Yan, Li-Jia Li
Self-Critical Sequence Training for Image Captioning (2017). Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, Vaibhava Goel
Generating Natural Questions About an Image (2016). Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (2019). Drew A. Hudson, Christopher D. Manning
A Corpus for Reasoning about Natural Language Grounded in Photographs (2019). Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi
DenseCap: Fully Convolutional Localization Networks for Dense Captioning (2016). Justin Johnson, Andrej Karpathy, Li Fei-Fei
Visual Question Generation as Dual Task of Visual Question Answering (2018). Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou
Cycle-Consistency for Robust Visual Question Answering (2019). Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh
LXMERT: Learning Cross-Modality Encoder Representations from Transformers (2019). Hao Tan, Mohit Bansal
Context and Attribute Grounded Dense Captioning (2019). Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
In Defense of Grid Features for Visual Question Answering (2020). Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (2021). Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering (2021). Jihyung Kil, Cheng Zhang, Dong Xuan, Wei‐Lun Chao
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (2021). Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (2021). Wenhui Wang, Hangbo Bao, Dong Li, Furu Wei
LiT: Zero-Shot Transfer with Locked-image text Tuning (2022). Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer
Flamingo: a Visual Language Model for Few-Shot Learning (2022). Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds
Training language models to follow instructions with human feedback (2022). Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray
All You May Need for VQA are Image Captions (2022). Soravit Changpinyo, Doron Kukliansy, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks (2023). Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang