H. Li


Common Coauthors

Coauthor (papers together):
Zikang Liu (2)
Longteng Guo (2)
Xingjian He (2)
Jing Liu (2)
Sihan Chen (1)
Commonly Cited References
Show and tell: A neural image caption generator (2015). Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Referenced: 1

VQA: Visual Question Answering (2015). Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh. Referenced: 1

Dense Captioning with Joint Inference and Visual Context (2017). Linjie Yang, Kevin Tang, Shuicheng Yan, Li-Jia Li. Referenced: 1

Self-Critical Sequence Training for Image Captioning (2017). Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, Vaibhava Goel. Referenced: 1

Generating Natural Questions About an Image (2016). Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende. Referenced: 1

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (2019). Drew A. Hudson, Christopher D. Manning. Referenced: 1

A Corpus for Reasoning about Natural Language Grounded in Photographs (2019). Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi. Referenced: 1

DenseCap: Fully Convolutional Localization Networks for Dense Captioning (2016). Justin Johnson, Andrej Karpathy, Li Fei-Fei. Referenced: 1

Visual Question Generation as Dual Task of Visual Question Answering (2018). Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou. Referenced: 1

Cycle-Consistency for Robust Visual Question Answering (2019). Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh. Referenced: 1

LXMERT: Learning Cross-Modality Encoder Representations from Transformers (2019). Hao Tan, Mohit Bansal. Referenced: 1

Context and Attribute Grounded Dense Captioning (2019). Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao. Referenced: 1

In Defense of Grid Features for Visual Question Answering (2020). Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen. Referenced: 1

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. Referenced: 1

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (2021). Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut. Referenced: 1

Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering (2021). Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao. Referenced: 1

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (2021). Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman. Referenced: 1

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (2021). Wenhui Wang, Hangbo Bao, Dong Li, Furu Wei. Referenced: 1

LiT: Zero-Shot Transfer with Locked-image text Tuning (2022). Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer. Referenced: 1

Flamingo: a Visual Language Model for Few-Shot Learning (2022). Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds. Referenced: 1

Training language models to follow instructions with human feedback (2022). Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray. Referenced: 1

All You May Need for VQA are Image Captions (2022). Soravit Changpinyo, Doron Kukliansy, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut. Referenced: 1

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks (2023). Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang. Referenced: 1