VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Type: Preprint

Publication Date: 2023-01-01

Citations: 64

DOI: https://doi.org/10.48550/arxiv.2305.11175

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • GiT: Towards Generalist Vision Transformer through Universal Language Interface (2024). Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang
  • Unveiling Encoder-Free Vision-Language Models (2024). Haiwen Diao, Yufeng Cui, Xiaotong Li, Yueze Wang, Huchuan Lu, Xin-Long Wang
  • VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (2024). Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen
  • Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks (2023). Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang
  • Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models (2024). Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
  • Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks (2022). Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang
  • MoAI: Mixture of All Intelligence for Large Language and Vision Models (2024). Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
  • Visual Large Language Models for Generalized and Specialized Applications (2025). Y.-H. Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong
  • Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model (2023). Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi
  • VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models (2022). Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang
  • Vision-Language Models for Vision Tasks: A Survey (2023). Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
  • Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language (2023). William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, Amanpreet Singh
  • Why are Visually-Grounded Language Models Bad at Image Classification? (2024). Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung
  • Vision-Language Models for Vision Tasks: A Survey (2024). J Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
  • A Single Transformer for Scalable Vision-Language Modeling (2024). Yangyi Chen, Xingyao Wang, Hao Peng, Heng Ji
  • POINTS: Improving Your Vision-language Model with Affordable Strategies (2024). Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang, Le Tian, Xiao Zhou, Jie Zhou
  • Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark (2025). Alexis Roger, Prateek Humane, Daniel Z. Kaplan, Kshitij Gupta, Qi Sun, George Adamopoulos, Jong-Hwan Lim, Quentin Anthony, Eileen B. Fennell, Irina Rish
  • MiniVLM: A Smaller and Faster Vision-Language Model (2020). Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
  • Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding (2023). Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu
  • How Well Can Vision Language Models See Image Details? (2024). Chenhui Gou, Abdulwahab Felemban, Faizan Farooq Khan, Deyao Zhu, Jianfei Cai, Hamid Rezatofighi, Mohamed Elhoseiny

Works Cited by This (0)
