Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

Type: Article

Publication Date: 2023-10-26

Citations: 0

DOI: https://doi.org/10.1145/3581783.3612388

Locations

  • arXiv (Cornell University)

Similar Works

  • Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (2023). Zikang Liu, Sihan Chen, Longteng Guo, H. Li, Xingjian He, Jing Liu
  • Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (2021). Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
  • Tackling VQA with Pretrained Foundation Models without Further Training (2023). Alvin De Jun Tan, Bingquan Shen
  • Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models (2024). Yufei Zhan, Hongyin Zhao, Yousong Zhu, Fan Yang, Ming Tang, Jinqiao Wang
  • CompCap: Improving Multimodal Large Language Models with Composite Captions (2024). Xiaohong Chen, Satya Narayan Shukla, Mahmoud Azab, Ananya Singh, Qifan Wang, David Dawei Yang, Shengyun Peng, Hanchao Yu, Yan Shen, Xuewen Zhang
  • Unified Vision-Language Pre-Training for Image Captioning and VQA (2020). Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
  • TAP: Text-Aware Pre-training for Text-VQA and Text-Caption (2020). Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
  • PromptCap: Prompt-Guided Task-Aware Image Captioning (2022). Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
  • Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts (2024). Övgü Özdemir, Erdem Akagündüz
  • Joint Image Captioning and Question Answering (2018). Jialin Wu, Zeyuan Hu, Raymond J. Mooney
  • Generating Question Relevant Captions to Aid Visual Question Answering (2019). Jialin Wu, Zeyuan Hu, Raymond J. Mooney
  • Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (2022). Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022). Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi
  • CapsFusion: Rethinking Image-Text Data at Scale (2023). Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Xinlong Wang, Jingjing Liu
  • REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory (2022). Ziniu Hu, Ahmet İşcen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi

Cited by (0)


Citing (23)

  • Show and tell: A neural image caption generator (2015). Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
  • VQA: Visual Question Answering (2015). Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
  • Dense Captioning with Joint Inference and Visual Context (2017). Linjie Yang, Kevin Tang, Shuicheng Yan, Li-Jia Li
  • Self-Critical Sequence Training for Image Captioning (2017). Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, Vaibhava Goel
  • Generating Natural Questions About an Image (2016). Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
  • GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (2019). Drew A. Hudson, Christopher D. Manning
  • A Corpus for Reasoning about Natural Language Grounded in Photographs (2019). Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi
  • DenseCap: Fully Convolutional Localization Networks for Dense Captioning (2016). Justin Johnson, Andrej Karpathy, Li Fei-Fei
  • Visual Question Generation as Dual Task of Visual Question Answering (2018). Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou
  • Cycle-Consistency for Robust Visual Question Answering (2019). Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (2019). Hao Tan, Mohit Bansal
  • Context and Attribute Grounded Dense Captioning (2019). Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
  • In Defense of Grid Features for Visual Question Answering (2020). Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
  • Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (2021). Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
  • Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering (2021). Jihyung Kil, Cheng Zhang, Dong Xuan, Wei‐Lun Chao
  • Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (2021). Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
  • VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (2021). Wenhui Wang, Hangbo Bao, Dong Li, Furu Wei
  • LiT: Zero-Shot Transfer with Locked-image text Tuning (2022). Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer
  • Flamingo: a Visual Language Model for Few-Shot Learning (2022). Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds
  • Training language models to follow instructions with human feedback (2022). Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray
  • All You May Need for VQA are Image Captions (2022). Soravit Changpinyo, Doron Kukliansy, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
  • Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks (2023). Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang